Machine Learning Library / ML-349

Add IRegression2 and Model2 definitions to ML_Core

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 6.4.12
    • Component/s: ML_Core

      Description

      1) Create a Model2 definition in ML_Core based on ndArray (see discussion below).

      2) Add an IRegression2 interface definition to ML_Core to fix problems with the current IRegression interface (see discussion in ML-343).

      3) Add LUCI_Rec and Field_Mapping to ML_Core (copied from LearningTrees).

      4) Increment the version of ML_Core.

      Discussion on item 1 follows:

      Using N-Dimensional Arrays to represent models provides a more flexible model structure that will:
      1) eliminate complex encoding / decoding of models into 2D arrays (à la NumericField).
      2) support ensemble methods by combining multiple models (homogeneous or heterogeneous) into meta-models without re-encoding them.

      Tasks:

      • Create ndArray layout in Types
      • Create ndArray operations module
      • Make Model (Types) based on ndArray

      Operations module would include:

      • Insert – Insert an ndArray into another ndArray at a given location
      • Extract – Extract an ndArray from a given location of an ndArray
      • Filter – Filter the ndArray using wildcarded index values, e.g. [8,4,*,3] > 4.2
      • ToNF – Convert a two-dimensional ndArray dataset to a NumericField dataset
      • FromNF – Convert a NumericField dataset to a two-dimensional ndArray
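
      The index-path behaviour of Insert, Extract and Filter can be modelled outside ECL. The following is a minimal Python sketch of the semantics as I read them from this ticket, not the ML_Core implementation. The names Cell, insert, extract, filter_cells and matches are hypothetical, and treating a Filter pattern as a prefix match with 0 as the wildcard is an assumption based on the usage example in this ticket.

```python
from typing import List, Sequence, Tuple

# One cell: (work item, value, index path). Hypothetical Python stand-in
# for a single row of the proposed ndArray record.
Cell = Tuple[int, float, Tuple[int, ...]]

WILDCARD = 0  # zero is never a valid index, so it can mean "match anything"


def insert(cells: Sequence[Cell], at: Sequence[int]) -> List[Cell]:
    """Nest every cell under the index path `at` by prepending it."""
    prefix = tuple(at)
    return [(wi, value, prefix + idx) for wi, value, idx in cells]


def matches(idx: Sequence[int], pattern: Sequence[int]) -> bool:
    """True if `idx` starts with `pattern`, treating WILDCARD as any index."""
    if len(idx) < len(pattern):
        return False
    return all(p == WILDCARD or p == i for p, i in zip(pattern, idx))


def filter_cells(cells: Sequence[Cell], pattern: Sequence[int]) -> List[Cell]:
    """Keep cells whose index path matches `pattern`; paths are unchanged."""
    return [c for c in cells if matches(c[2], pattern)]


def extract(cells: Sequence[Cell], at: Sequence[int]) -> List[Cell]:
    """Return the sub-array stored under `at`, with that prefix stripped."""
    n = len(at)
    return [(wi, value, idx[n:]) for wi, value, idx in filter_cells(cells, at)]


# Two small arrays combined into a meta-model under indexes 1 and 2.
model = [(1, 0.5, (1, 1)), (1, 0.7, (1, 2))]
other = [(1, 9.0, (1, 1))]
meta = insert(model, [1]) + insert(other, [2])
```

      In this sketch Extract is simply the inverse of Insert (extract(meta, [1]) gives back model), while Filter leaves the index paths untouched so that the caller can still tell which sub-model each cell came from.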

      Proposed ndArray structure:

      EXPORT t_index := UNSIGNED4;
      EXPORT t_ndIndexes := SET OF t_index;
      EXPORT ndArray := RECORD
        t_work_item wi;
        t_fieldReal value;
        t_ndIndexes indexes;
      END;
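
      Because each row of this record carries its own full index path, a NumericField cell (wi, id, number, value) maps onto it by appending id and number to a chosen prefix. The sketch below models that mapping in Python under stated assumptions: the class and function names are hypothetical, the NumericField fields are assumed to be wi, id, number and value, and the second argument of FromNF / ToNF is assumed to be the index prefix under which the 2D array lives.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class NumericField:
    wi: int       # work item
    id: int       # record (row) id
    number: int   # field (column) number
    value: float


@dataclass(frozen=True)
class NdCell:
    wi: int
    value: float
    indexes: Tuple[int, ...]  # full index path of this cell


def from_nf(rows: Sequence[NumericField], at: Sequence[int] = ()) -> List[NdCell]:
    """Place a 2D NumericField array under prefix `at` as [*at, id, number]."""
    prefix = tuple(at)
    return [NdCell(r.wi, r.value, prefix + (r.id, r.number)) for r in rows]


def to_nf(cells: Sequence[NdCell], at: Sequence[int] = ()) -> List[NumericField]:
    """Read back the 2D array stored directly under prefix `at`."""
    prefix = tuple(at)
    n = len(prefix)
    out = []
    for c in cells:
        # Only cells exactly two levels below the prefix form a 2D array.
        if c.indexes[:n] == prefix and len(c.indexes) == n + 2:
            out.append(NumericField(c.wi, c.indexes[n], c.indexes[n + 1], c.value))
    return out


nf = [NumericField(1, 1, 1, 0.5), NumericField(1, 1, 3, 2.0)]
cells = from_nf(nf, [2])  # id = 1, number = 3 lands at (2, 1, 3)
```

      Storing the path per cell keeps the representation sparse: only cells that exist are stored, and no shape has to be declared up front.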

      Example Usage:

      // Convert a 2D numeric field array into an ndArray. Put the array under
      // index 1. E.g., id=1, number = 1 would go to [1,1,1] in the ndArray
      DATASET(ndArray) myModel0 := ndArrayMod.FromNF(mySimpleModel, [1]);
      // Do the same for a different 2D array under index 2
      // E.g., id = 1, number = 3 would go to [2, 1, 3]
      DATASET(ndArray) myModel1 := ndArrayMod.FromNF(someMetaData, [2]);
      // And another under index 3
      DATASET(ndArray) myModel2 := ndArrayMod.FromNF(someOtherData, [3]);
      // Combine the three into a single ndArray
      DATASET(ndArray) myModel := myModel0 + myModel1 + myModel2;
      // Now insert this model and another into a 'meta-model'
      DATASET(ndArray) metaMod0 := ndArrayMod.Insert(myModel, [1]);
      DATASET(ndArray) metaMod1 := ndArrayMod.Insert(anotherModel, [2]);
      // Now myModel[1,1,1] (the original NumericField id=1, number = 1)
      // goes to [1,1,1,1]
      DATASET(ndArray) metaMod := metaMod0 + metaMod1;
      // Now retrieve myModel from metaMod
      DATASET(ndArray) mod1 := ndArrayMod.Extract(metaMod, [1]);
      // Get original mySimpleModel from that model
      DATASET(ndArray) simpleModel1 := ndArrayMod.Extract(mod1, [1]);
      // simpleModel1 is a 2D array. I can convert it back to NumericField if
      // I want.
      DATASET(NumericField) origModel := ndArrayMod.ToNF(simpleModel1, []);
      // Or I could have done it in one step if I just wanted this NF dataset
      DATASET(NumericField) origModel2 := ndArrayMod.ToNF(metaMod, [1,1]);
      // Suppose I want a list of metaData[1,1] (the first piece of metaData for 
      // each model assuming the 2 models were homogeneous). I would do:
      DATASET(ndArray) metaDat1_1 := ndArrayMod.Filter(metaMod, [0, 2, 1, 1]);
      // The above would return items [1,2,1,1] and [2,2,1,1]. Zero is not a valid
      // index value and is used to indicate wildcards in 'filter'.
      // Likewise, I could get the set of all metaData for all models:
      DATASET(ndArray) allMetaDat := ndArrayMod.Filter(metaMod, [0, 2]);

      Notes on proposed ndArray structure:
      This is a very flexible data structure that is more versatile (and simpler) than a traditional ND array:

      • It can hold an ND array of any shape
      • It supports jagged ND arrays: no dimension requires a fixed length.
      • It supports arbitrary tree structures. Not only is the length of each dimension variable, but so is the number of dimensions, which allows for deeply unbalanced trees (or jagged-dimensional jagged arrays).
      • It could be constrained to any of the standard structures above (e.g. rectangular ND arrays) by adding restrictions, but I see no reason to do so at this point, since its primary function is to contain models of arbitrary complexity and composition.
      • It could be extended in several ways:
        • Allow string or numeric values
        • Provide self-documenting structure by allowing 'description' at each cell
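
      To make the jagged and tree-shaped cases in the notes above concrete, here is a small Python illustration. It is a sketch only: the cell layout (wi, value, index path) mirrors the proposed record, and the helper names depths and children are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Cell = Tuple[int, float, Tuple[int, ...]]  # (work item, value, index path)

# Branch 1 has three leaves, branch 2 has one (jagged), and branch 3 is
# two levels deeper than the others (unbalanced tree).
cells: List[Cell] = [
    (1, 0.5, (1, 1)),
    (1, 0.7, (1, 2)),
    (1, 0.9, (1, 3)),
    (1, 1.5, (2, 1)),
    (1, 4.0, (3, 1, 2, 1)),
]


def depths(cells: Sequence[Cell]) -> List[int]:
    """Distinct index-path lengths, i.e. the depths at which values occur."""
    return sorted({len(idx) for _, _, idx in cells})


def children(cells: Sequence[Cell], prefix: Sequence[int] = ()) -> Dict[int, int]:
    """For each child index directly under `prefix`, count the cells below it."""
    p = tuple(prefix)
    n = len(p)
    counts: Dict[int, int] = defaultdict(int)
    for _, _, idx in cells:
        if idx[:n] == p and len(idx) > n:
            counts[idx[n]] += 1
    return dict(counts)
```

      Nothing in the cell list declares a shape; the differing branch sizes and depths are implied entirely by the index paths, which is what lets heterogeneous models sit side by side.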

       

            People

            • Assignee: Roger Dev (rdev)
            • Reporter: Roger Dev (rdev)