Details
Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Description
1) Create Model2 definition in ML_Core based on ndArray (see discussion below)
2) Add IRegression2 interface definition to ML_Core to fix problems with the current IRegression interface (see discussion in ML-343)
3) Add LUCI_Rec and Field_Mapping to ML_Core (copy from LearningTrees)
4) Increment the version of ML_Core.
Discussion on item 1 follows:
Using N-Dimensional Arrays to represent models provides a more flexible model structure that will:
1) eliminate complex encoding / decoding of models into 2D arrays (à la NumericField).
2) support ensemble methods by allowing multiple models (homogeneous or heterogeneous) to be combined into meta-models without re-encoding them.
Tasks:
- Create ndArray layout in Types
- Create ndArray operations module
- Make Model (Types) based on ndArray
Operations module would include:
- Insert – Insert an ndArray into another ndArray at a given location
- Extract – Extract an ndArray from a given location of an ndArray
- Filter – Filter the ndArray using wildcarded index values e.g. [8.4.*.3] > 4.2
- ToNF – Convert a two dimensional ndArray dataset to a NumericField dataset
- FromNF – Convert a NumericField dataset to a two dimensional ndArray
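The five operations listed above can be modelled in a few lines. The following is a minimal Python sketch of the intended semantics, not the ECL implementation: the names `Cell`, `NFCell`, `insert`, `extract`, `filter_cells`, `from_nf` and `to_nf` are hypothetical, and it assumes that Filter matches on an index prefix with 0 as the wildcard, as the example usage in this ticket suggests.

```python
from typing import List, NamedTuple, Sequence, Tuple


class Cell(NamedTuple):
    """One ndArray cell: work item, value, and its index path."""
    wi: int
    value: float
    indexes: Tuple[int, ...]


class NFCell(NamedTuple):
    """Stand-in for a NumericField record (wi, id, number, value)."""
    wi: int
    id: int
    number: int
    value: float


def insert(arr: List[Cell], at: Sequence[int]) -> List[Cell]:
    """Place arr under the index path `at` by prepending `at` to every cell."""
    at = tuple(at)
    return [c._replace(indexes=at + c.indexes) for c in arr]


def extract(arr: List[Cell], at: Sequence[int]) -> List[Cell]:
    """Return the sub-array found under `at`, with the `at` prefix removed."""
    at = tuple(at)
    n = len(at)
    return [c._replace(indexes=c.indexes[n:]) for c in arr if c.indexes[:n] == at]


def filter_cells(arr: List[Cell], pattern: Sequence[int]) -> List[Cell]:
    """Keep cells whose leading indexes match `pattern`; 0 matches anything."""
    pattern = tuple(pattern)

    def matches(ix: Tuple[int, ...]) -> bool:
        return len(ix) >= len(pattern) and all(
            p == 0 or p == i for p, i in zip(pattern, ix))

    return [c for c in arr if matches(c.indexes)]


def from_nf(nf: List[NFCell], at: Sequence[int]) -> List[Cell]:
    """Convert NumericField-style rows to cells at `at` + [id, number]."""
    at = tuple(at)
    return [Cell(f.wi, f.value, at + (f.id, f.number)) for f in nf]


def to_nf(arr: List[Cell], at: Sequence[int]) -> List[NFCell]:
    """Convert the two-dimensional sub-array under `at` back to rows."""
    sub = extract(arr, at)
    if any(len(c.indexes) != 2 for c in sub):
        raise ValueError("sub-array under the given path is not two-dimensional")
    return [NFCell(c.wi, c.indexes[0], c.indexes[1], c.value) for c in sub]


# Usage: two 2D datasets become one model, which is then placed twice
# into a meta-model (two homogeneous models, for illustration only).
simple = [NFCell(1, 1, 1, 0.5), NFCell(1, 1, 3, 2.0)]
meta = [NFCell(1, 1, 1, 7.0)]
model = from_nf(simple, [1]) + from_nf(meta, [2])
meta_model = insert(model, [1]) + insert(model, [2])
```

Because every operation only rewrites the index path, combining models is plain concatenation and no value is ever re-encoded.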
Proposed ndArray structure:
EXPORT t_index := UNSIGNED4;
EXPORT t_ndIndexes := SET OF t_index;
EXPORT ndArray := RECORD
  t_work_item wi;
  t_fieldReal value;
  t_ndIndexes indexes;
END;
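To make the layout concrete, here is a small Python sketch of how a dense 2x2 matrix would be stored as one record per cell. `NdCell` and `from_dense` are hypothetical names that mirror the three fields of the proposed record; the sketch assumes 1-based indexes, consistent with zero being reserved as a wildcard.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class NdCell:
    wi: int                   # work item id (t_work_item)
    value: float              # cell value (t_fieldReal)
    indexes: Tuple[int, ...]  # 1-based coordinates (t_ndIndexes)


def from_dense(rows: List[List[float]], wi: int = 1) -> List[NdCell]:
    """Flatten a 2D list of lists into one NdCell per element."""
    return [NdCell(wi, float(v), (r, c))
            for r, row in enumerate(rows, start=1)
            for c, v in enumerate(row, start=1)]


cells = from_dense([[10.0, 20.0], [30.0, 40.0]])
```

Each cell carries its full coordinates, so the array has no separate shape or header record.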
Example Usage:
// Convert a 2D NumericField array into an ndArray. Put the array under
// index 1. E.g., id = 1, number = 1 would go to [1,1,1] in the ndArray.
DATASET(ndArray) myModel0 := ndArrayMod.FromNF(mySimpleModel, [1]);
// Do the same for a different 2D array under index 2.
// E.g., id = 1, number = 3 would go to [2,1,3].
DATASET(ndArray) myModel1 := ndArrayMod.FromNF(someMetaData, [2]);
// And another under index 3.
DATASET(ndArray) myModel2 := ndArrayMod.FromNF(someOtherData, [3]);
// Combine the three into a single ndArray.
DATASET(ndArray) myModel := myModel0 + myModel1 + myModel2;
// Now insert this model and another into a 'meta-model'.
DATASET(ndArray) metaMod0 := ndArrayMod.Insert(myModel, [1]);
DATASET(ndArray) metaMod1 := ndArrayMod.Insert(anotherModel, [2]);
// Now myModel[1,1,1] (the original NumericField id = 1, number = 1)
// goes to [1,1,1,1].
DATASET(ndArray) metaMod := metaMod0 + metaMod1;
// Now retrieve myModel from metaMod.
DATASET(ndArray) mod1 := ndArrayMod.Extract(metaMod, [1]);
// Get the original mySimpleModel from that model.
DATASET(ndArray) simpleModel1 := ndArrayMod.Extract(mod1, [1]);
// simpleModel1 is a 2D array. I can convert it back to NumericField if
// I want.
DATASET(NumericField) origModel := ndArrayMod.ToNF(simpleModel1, []);
// Or I could have done it in one step if I just wanted this NF dataset.
DATASET(NumericField) origModelDirect := ndArrayMod.ToNF(metaMod, [1,1]);
// Suppose I want a list of metaData[1,1] (the first piece of metaData for
// each model, assuming the 2 models were homogeneous). I would do:
DATASET(ndArray) metaDat1_1 := ndArrayMod.Filter(metaMod, [0,2,1,1]);
// The above would return items [1,2,1,1] and [2,2,1,1]. Zero is not a valid
// index value and is used to indicate wildcards in 'Filter'.
// Likewise, I could get the set of all metaData for all models:
DATASET(ndArray) allMetaDat := ndArrayMod.Filter(metaMod, [0,2]);
Notes on proposed ndArray structure:
This is a very flexible data structure that is more versatile (and simpler) than a traditional ND array:
- It can hold an ND array of any shape
- It supports Jagged ND Arrays – does not require a fixed length in any dimension.
- It supports arbitrary tree structures. Not only is the length of each dimension variable, but so is the number of dimensions, which allows for deeply unbalanced trees (or "jagged-dimensional jagged arrays").
- It could be constrained to any of the above standard data types (e.g. rectangular ND Arrays) by adding restrictions, but I see no reason to do so at this point, where its primary function is to contain models of arbitrary complexity and composition.
- It could be extended in several ways:
  - Allow string or numeric values
  - Provide self-documenting structure by allowing 'description' at each cell
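The jagged and tree-shaped cases described in these notes fall out of the representation without extra machinery. The Python sketch below is illustrative only: the sample values and the helpers `to_tree`, `depths` and `fanout` are hypothetical, and a plain dict from index path to value stands in for the ndArray dataset.

```python
from typing import Dict, List, Tuple

Index = Tuple[int, ...]

# One flat mapping from index path to value holds all three shapes at once.
cells: Dict[Index, float] = {
    (1, 1): 0.1, (1, 2): 0.2, (1, 3): 0.3,  # branch 1 has three entries
    (2, 1): 0.4,                            # branch 2 has only one (jagged)
    (3, 1, 1, 1): 0.5,                      # branch 3 is four levels deep
}


def to_tree(cells: Dict[Index, float]) -> dict:
    """Nest the flat mapping into dictionaries keyed by successive indexes."""
    tree: dict = {}
    for ix, value in cells.items():
        node = tree
        for i in ix[:-1]:
            node = node.setdefault(i, {})
        node[ix[-1]] = value
    return tree


def depths(cells: Dict[Index, float]) -> List[int]:
    """Distinct numbers of dimensions used by the cells."""
    return sorted({len(ix) for ix in cells})


def fanout(cells: Dict[Index, float], prefix: Index) -> int:
    """Number of distinct child indexes directly under `prefix`."""
    n = len(prefix)
    return len({ix[n] for ix in cells if ix[:n] == prefix and len(ix) > n})


tree = to_tree(cells)
```

Restricting the structure to a rectangular array would amount to requiring that `depths` return a single value and that `fanout` be equal for every prefix of the same length.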