Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed

Description
1) Create a Model2 definition in ML_Core based on ndArray (see discussion below).
2) Add an IRegression2 interface definition to ML_Core to fix problems with the current IRegression interface (see discussion in ML343).
3) Add LUCI_Rec and Field_Mapping to ML_Core (copied from LearningTrees).
4) Increment the version of ML_Core.
Discussion of item 1 follows:
Using N-dimensional arrays to represent models provides a more flexible model structure that will:
1) Eliminate complex encoding and decoding of models into 2D arrays (à la NumericField).
2) Support ensemble methods by allowing multiple models (homogeneous or heterogeneous) to be combined into metamodels without re-encoding the models.
Tasks:
- Create ndArray layout in Types
- Create ndArray operations module
- Make Model (Types) based on ndArray
The operations module would include:
- Insert: Insert an ndArray into another ndArray at a given location
- Extract: Extract an ndArray from a given location of an ndArray
- Filter: Filter the ndArray using wildcarded index values, e.g. [8.4.*.3] > 4.2
- ToNF: Convert a two-dimensional ndArray dataset to a NumericField dataset
- FromNF: Convert a NumericField dataset to a two-dimensional ndArray
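The operations listed above can be modelled in a few lines. The following is only a Python sketch of the intended semantics, not the ECL implementation: the `Cell` and `NF` types, the function names, and the choice to have Extract strip the prefix it matched are all assumptions made for illustration.

```python
from typing import NamedTuple, Tuple

class Cell(NamedTuple):
    """Hypothetical stand-in for one ndArray record."""
    wi: int                   # work item
    value: float
    indexes: Tuple[int, ...]  # 1-based position, one entry per dimension

class NF(NamedTuple):
    """Simplified stand-in for a NumericField record."""
    wi: int
    id: int
    number: int
    value: float

def insert(arr, at):
    """Place arr under the index prefix `at` by prepending it to every cell."""
    at = tuple(at)
    return [c._replace(indexes=at + c.indexes) for c in arr]

def extract(arr, at):
    """Return the cells under prefix `at`, with that prefix stripped."""
    at = tuple(at)
    n = len(at)
    return [c._replace(indexes=c.indexes[n:])
            for c in arr if c.indexes[:n] == at]

def from_nf(nf, at=()):
    """Map (id, number) to a 2D index, then place the result under `at`."""
    return insert([Cell(r.wi, r.value, (r.id, r.number)) for r in nf], at)

def to_nf(arr, at=()):
    """Extract the sub-array under `at`; it must be exactly two-dimensional."""
    sub = extract(arr, at)
    if any(len(c.indexes) != 2 for c in sub):
        raise ValueError("sub-array is not two-dimensional")
    return [NF(c.wi, c.indexes[0], c.indexes[1], c.value) for c in sub]

# Round trip: NumericField -> model under [1] -> metamodel under [1] -> back
simple = [NF(1, 1, 1, 0.5), NF(1, 1, 3, 2.0)]
model = from_nf(simple, [1])    # cells at [1,1,1] and [1,1,3]
meta = insert(model, [1])       # cells at [1,1,1,1] and [1,1,1,3]
restored = to_nf(meta, [1, 1])  # original NumericField rows
```

Because every cell carries its full index, Insert and Extract in this model only add or remove a prefix; no cell values are re-encoded.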
Proposed ndArray structure:
EXPORT t_index := UNSIGNED4;
EXPORT t_ndIndexes := SET OF t_index;
EXPORT ndArray := RECORD
  t_work_item wi;
  t_fieldReal value;
  t_ndIndexes indexes;
END;
Example Usage:
// Convert a 2D numeric field array into an ndArray. Put the array under
// index 1. E.g., id = 1, number = 1 would go to [1,1,1] in the ndArray
DATASET(ndArray) myModel0 := ndArrayMod.FromNF(mySimpleModel, [1]);
// Do the same for a different 2D array under index 2
// E.g., id = 1, number = 3 would go to [2,1,3]
DATASET(ndArray) myModel1 := ndArrayMod.FromNF(someMetaData, [2]);
// And another under index 3
DATASET(ndArray) myModel2 := ndArrayMod.FromNF(someOtherData, [3]);
// Combine the three into a single ndArray
DATASET(ndArray) myModel := myModel0 + myModel1 + myModel2;
// Now insert this model and another into a 'metamodel'
DATASET(ndArray) metaMod0 := ndArrayMod.Insert(myModel, [1]);
DATASET(ndArray) metaMod1 := ndArrayMod.Insert(anotherModel, [2]);
// Now myModel[1,1,1] (the original NumericField id = 1, number = 1)
// goes to [1,1,1,1]
DATASET(ndArray) metaMod := metaMod0 + metaMod1;
// Now retrieve myModel from metaMod
DATASET(ndArray) mod1 := ndArrayMod.Extract(metaMod, [1]);
// Get original mySimpleModel from that model
DATASET(ndArray) simpleModel1 := ndArrayMod.Extract(mod1, [1]);
// simpleModel1 is a 2D array. I can convert it back to NumericField if
// I want.
DATASET(NumericField) origModel := ndArrayMod.ToNF(simpleModel1, []);
// Or I could have done it in one step if I just wanted this NF dataset
DATASET(NumericField) origModel2 := ndArrayMod.ToNF(metaMod, [1,1]);
// Suppose I want a list of metaData[1,1] (the first piece of metaData for
// each model, assuming the 2 models were homogeneous). I would do:
DATASET(ndArray) metaDat1_1 := ndArrayMod.Filter(metaMod, [0,2,1,1]);
// The above would return items [1,2,1,1] and [2,2,1,1]. Zero is not a valid
// index value and is used to indicate wildcards in 'Filter'.
// Likewise, I could get the set of all metaData for all models:
DATASET(ndArray) allMetaDat := ndArrayMod.Filter(metaMod, [0,2]);
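The wildcard behaviour of Filter in the example can be illustrated with a small Python model. This is a sketch under assumptions, not the ECL implementation: cells are reduced to hypothetical `(value, indexes)` tuples, the pattern is treated as a prefix (as the `[0, 2]` call implies), and matching cells keep their full indexes (as the stated results `[1,2,1,1]` and `[2,2,1,1]` imply).

```python
def matches(indexes, pattern):
    """True if `indexes` starts with `pattern`; a 0 in the pattern matches any value."""
    if len(indexes) < len(pattern):
        return False
    return all(p == 0 or p == i for p, i in zip(pattern, indexes))

def nd_filter(cells, pattern):
    """Keep the cells whose indexes match the (possibly wildcarded) prefix."""
    return [c for c in cells if matches(c[1], pattern)]

# Illustrative metamodel: first index = model, second = section
# (1 = simple model, 2 = metadata, 3 = other data). Values are made up.
meta = [
    (0.5, (1, 1, 1, 1)),
    (7.0, (1, 2, 1, 1)),
    (8.0, (1, 2, 1, 2)),
    (9.0, (1, 3, 1, 1)),
    (0.7, (2, 1, 1, 1)),
    (6.0, (2, 2, 1, 1)),
]

first_meta = nd_filter(meta, (0, 2, 1, 1))  # metaData[1,1] of every model
all_meta = nd_filter(meta, (0, 2))          # all metadata of every model

print([idx for _, idx in first_meta])  # [(1, 2, 1, 1), (2, 2, 1, 1)]
print(len(all_meta))                   # 3
```

Using 0 as the wildcard works only because index values are 1-based, so 0 can never collide with a real position.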
Notes on proposed ndArray structure:
This is a very flexible data structure that is more versatile (and simpler) than a traditional ND array:
- It can hold an ND array of any shape.
- It supports jagged ND arrays: no dimension is required to have a fixed length.
- It supports arbitrary tree structures. Not only is the length of each dimension variable, but so is the number of dimensions, which allows for deeply unbalanced trees (or Jagged Dimensional Jagged Arrays).
- It could be constrained to any of the standard data types above (e.g. rectangular ND arrays) by adding restrictions, but I see no reason to do so at this point, where its primary function is to contain models of arbitrary complexity and composition.
- It could be extended in several ways:
  - Allow string or numeric values
  - Provide a self-documenting structure by allowing a 'description' at each cell
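To make the jagged and tree-shaped cases in these notes concrete, here is a small Python sketch that nests a flat list of cells into a tree. The `(value, indexes)` tuple layout and the `to_tree` helper are hypothetical, and the sketch assumes no cell's index is a prefix of another cell's index (a position is either a leaf or a branch, never both).

```python
def to_tree(cells):
    """Nest (value, indexes) cells into dicts keyed by index; leaves hold values."""
    root = {}
    for value, indexes in cells:
        node = root
        for i in indexes[:-1]:
            node = node.setdefault(i, {})  # descend, creating branches as needed
        node[indexes[-1]] = value
    return root

cells = [
    (1.0, (1, 1)),
    (2.0, (1, 2)),
    (3.0, (1, 3)),
    (4.0, (2, 1)),        # row 2 is shorter than row 1: jagged
    (5.0, (3, 1, 1, 1)),  # four indexes instead of two: unbalanced depth
]

tree = to_tree(cells)
print(tree)
# {1: {1: 1.0, 2: 2.0, 3: 3.0}, 2: {1: 4.0}, 3: {1: {1: {1: 5.0}}}}
```

Nothing in the flat representation had to change to accommodate the shorter row or the deeper branch; only the length of each cell's index list differs.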