1) Create Model2 definition in ML_Core based on ndArray (see discussion below)
2) Add IRegression2 interface definition to ML_Core to fix problems with the current IRegression interface (see discussion in
3) Add LUCI_Rec and Field_Mapping to ML_Core (copy from LearningTrees)
4) Increment the version of ML_Core.
Discussion on item 1 follows:
Using N-Dimensional Arrays to represent models provides a more flexible model structure that will:
1) Eliminate complex encoding / decoding of models into 2D arrays (à la NumericField).
2) Support ensemble methods by allowing multiple models (homogeneous or heterogeneous) to be combined into meta-models without re-encoding them.
Create ndArray layout in Types
Create ndArray operations module
Make Model (Types) based on ndArray
Operations module would include:
- Insert – Insert an ndArray into another ndArray at a given location
- Extract – Extract an ndArray from a given location of an ndArray
- Filter – Filter the ndArray using wildcarded index values, e.g. [8.4.*.3] > 4.2
- ToNF – Convert a two-dimensional ndArray dataset to a NumericField dataset
- FromNF – Convert a NumericField dataset to a two-dimensional ndArray
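The operations above can be illustrated with a minimal Python sketch. It is not the proposed ECL implementation: it assumes a sparse representation in which each cell is keyed by its index path, and it assumes NumericField rows reduce to (id, number, value), leaving out any work-item field. All function names here are hypothetical.

```python
# Assumed representation (illustration only): an ndArray is a dict that maps
# an index path, e.g. (8, 4, 1, 3), to a numeric value.

def insert(target, source, at):
    """Return a copy of target with source placed under the index prefix `at`."""
    result = dict(target)
    for idx, value in source.items():
        result[tuple(at) + idx] = value
    return result

def extract(array, at):
    """Return the cells stored under prefix `at`, with the prefix stripped."""
    at = tuple(at)
    n = len(at)
    return {idx[n:]: v for idx, v in array.items() if idx[:n] == at}

def matches(idx, pattern):
    """True if idx has the pattern's length and agrees at every non-'*' position."""
    return len(idx) == len(pattern) and all(
        p == "*" or p == i for i, p in zip(idx, pattern)
    )

def filter_cells(array, pattern, predicate):
    """Keep cells whose index matches the wildcard pattern and whose value passes."""
    return {idx: v for idx, v in array.items()
            if matches(idx, pattern) and predicate(v)}

def to_nf(array):
    """Convert a two-dimensional ndArray to sorted (id, number, value) rows."""
    rows = []
    for idx, v in sorted(array.items()):
        if len(idx) != 2:
            raise ValueError("to_nf requires exactly two indexes per cell")
        rows.append((idx[0], idx[1], v))
    return rows

def from_nf(rows):
    """Convert (id, number, value) rows to a two-dimensional ndArray."""
    return {(rid, num): v for rid, num, v in rows}

# Usage: nest one model inside a meta-model, then query it.
tree = {(1,): 0.5, (2,): 1.5}
ensemble = insert({}, tree, at=(8, 4))
cells = {(8, 4, 1, 3): 5.0, (8, 4, 2, 3): 3.0, (8, 4, 1, 2): 9.0}
selected = filter_cells(cells, (8, 4, "*", 3), lambda v: v > 4.2)
```

In this sketch, Insert only prefixes the index paths of the inserted model, so its cells are carried into the meta-model unchanged, and Extract reverses the step by stripping the prefix.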
Proposed ndArray structure:
Notes on proposed ndArray structure:
This is a very flexible data structure that is more versatile (and simpler) than a traditional ND array:
- It can hold an ND array of any shape
- It supports Jagged ND Arrays – does not require a fixed length in any dimension.
- It supports arbitrary tree structures. Not only is the length of each dimension variable, but so is the number of dimensions, which allows for deeply unbalanced trees (or Jagged Dimensional Jagged Arrays).
- It could be constrained to any of the above standard data types (e.g. rectangular ND Arrays) by adding restrictions, but I see no reason to do so at this point, where its primary function is to contain models of arbitrary complexity and composition.
- It could be extended in several ways:
  - Allow string or numeric values
  - Provide self-documenting structure by allowing 'description' at each cell
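To make these notes concrete, here is a small Python sketch of how a single cell layout could hold jagged and tree-shaped data. The `Cell` class, its field names, and the helper functions are assumptions made for illustration; they are not the proposed ndArray layout itself. The optional `description` field and the string-or-number value mirror the possible extensions listed above.

```python
from dataclasses import dataclass
from itertools import product
from typing import Tuple, Union

@dataclass
class Cell:
    """Hypothetical cell: an index path of any length plus a value."""
    indexes: Tuple[int, ...]
    value: Union[float, str]      # extension: string or numeric values
    description: str = ""         # extension: self-documenting cells

cells = [
    Cell((1, 1), 0.2),
    Cell((1, 2), 0.7),
    Cell((1, 3), 0.1),                       # branch 1 holds three entries
    Cell((2, 1), 0.9),                       # branch 2 holds one: jagged
    Cell((3, 1, 4, 2), "leaf", "deep node"), # branch 3 is four levels deep
]

def shape_summary(cells):
    """Return (cell count per top-level index, maximum index depth)."""
    counts = {}
    for c in cells:
        counts[c.indexes[0]] = counts.get(c.indexes[0], 0) + 1
    depth = max(len(c.indexes) for c in cells)
    return counts, depth

def is_rectangular(cells, shape):
    """Optional restriction: every index in a 1-based `shape` occurs exactly once."""
    expected = set(product(*(range(1, n + 1) for n in shape)))
    return len(cells) == len(expected) and {c.indexes for c in cells} == expected

summary = shape_summary(cells)
rect = [Cell((i, j), 0.0) for i in (1, 2) for j in (1, 2)]
```

A check like `is_rectangular` shows how a stricter type such as a rectangular ND array could be layered on top as a validation step, without changing the underlying cell layout.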