  1. Machine Learning Library
  2. ML-350

Descriptive Stats Bundle



    • Improvement
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • 7.0.0
    • 7.0.0
    • None


      The primary target use for this bundle is the support of RAMPS. A secondary
      use is stand-alone analysis. This analysis is intended to complement the current analysis performed in the data hygiene features of the SALT tools.
      The data analysis assumes that a record set will consist of multiple independent data subsets or work items.

      The data exploration features will be:

      • simple statistics;
      • frequency or distribution analysis;
      • correlation analysis.

      The simple statistics will vary by the type of data. For cardinal or measurement values, the simple statistics will include:

      • mean
      • variance
      • skew
      • kurtosis
      • median

      For ordinal and nominal data values, the simple statistics will include trend and sequence statistics such as the Wald-Wolfowitz and Kolmogorov-Smirnov tests.
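
      The simple statistics for cardinal or measurement values, computed separately for each work item, could be sketched as below. This is a minimal illustration in Python rather than the bundle's own implementation. The function names, the (work_item, value) record layout, and the choice of population moments with excess kurtosis are all assumptions; the description does not say which estimators will be used.

```python
from collections import defaultdict
from statistics import fmean, median


def simple_stats(values):
    """Mean, variance, skew, kurtosis and median for one work item.

    Assumption: population central moments (divide by n) and excess
    kurtosis (normal distribution = 0). Other conventions exist.
    """
    n = len(values)
    mean = fmean(values)
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    m4 = sum((x - mean) ** 4 for x in values) / n  # fourth central moment
    return {
        "mean": mean,
        "variance": m2,
        # Skew and kurtosis are undefined for constant data; report 0.0 here.
        "skew": m3 / m2 ** 1.5 if m2 > 0 else 0.0,
        "kurtosis": m4 / m2 ** 2 - 3.0 if m2 > 0 else 0.0,
        "median": median(values),
    }


def stats_by_work_item(records):
    """Group (work_item, value) pairs and summarise each group independently."""
    groups = defaultdict(list)
    for work_item, value in records:
        groups[work_item].append(value)
    return {wi: simple_stats(vals) for wi, vals in groups.items()}
```

      Keeping the per-group function separate from the grouping step reflects the assumption that each work item is an independent data subset.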

      Frequency or distribution analysis will include:

      • equal interval histograms for cardinal and measurement data;
      • value histograms for ordinal and nominal data;
      • discrete value histograms for discretized cardinal or measurement data.
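
      The two histogram styles listed above could look roughly like the following sketch. The function names and the bins parameter are hypothetical, and placing the maximum value in the last interval is an assumed convention that the description does not specify.

```python
from collections import Counter


def equal_interval_histogram(values, bins):
    """Count values in `bins` equal-width intervals spanning [min, max].

    Returns a list of (lower_bound, upper_bound, count) tuples.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: a single degenerate interval.
        return [(lo, hi, len(values))]
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in values:
        # Clamp so the maximum value lands in the last interval.
        idx = min(int((x - lo) / width), bins - 1)
        counts[idx] += 1
    return [(lo + i * width, lo + (i + 1) * width, c)
            for i, c in enumerate(counts)]


def value_histogram(values):
    """Count occurrences of each distinct value (ordinal, nominal or discretized data)."""
    return dict(Counter(values))
```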

      Correlation analysis will include:

      • correlation analysis between pairs of nominal values or a nominal and any other value type treated as a nominal value using Kendall's tau-b measure;
      • correlation analysis between pairs of ordinals or an ordinal and a cardinal or measurement value type treated as an ordinal using Spearman's rho;
      • correlation analysis between pairs of cardinal or measurement values using Pearson's correlation coefficient.
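
      For reference, the three measures named above can be written compactly as in the sketch below. It assumes values have already been encoded as numbers, uses average ranks for ties in Spearman's rho, and uses a simple O(n^2) pair count for Kendall's tau-b. The function names are illustrative and none of this is taken from the bundle itself.

```python
import math


def pearson(xs, ys):
    """Pearson's correlation coefficient (undefined if either input is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rho: Pearson's coefficient applied to the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


def kendall_tau_b(xs, ys):
    """Kendall's tau-b, which adjusts the denominator for ties."""
    concordant = discordant = ties_x = ties_y = 0
    n = len(xs)
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = xs[i] - xs[j], ys[i] - ys[j]
            if dx == 0 and dy == 0:
                continue  # tied in both variables: excluded from every count
            if dx == 0:
                ties_x += 1
            elif dy == 0:
                ties_y += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    denom = math.sqrt((concordant + discordant + ties_x)
                      * (concordant + discordant + ties_y))
    return (concordant - discordant) / denom
```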

      A mechanism will be defined to specify the order of values for ordinal data.
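
      One simple form such a mechanism could take is an explicit, ordered list of labels that is turned into numeric ranks. The sketch below is only an illustration; make_ordinal_encoder and the example labels are hypothetical, not part of the bundle.

```python
def make_ordinal_encoder(ordered_labels):
    """Build an encoder from labels listed lowest to highest.

    The returned function maps each label to its 1-based rank.
    """
    if len(set(ordered_labels)) != len(ordered_labels):
        raise ValueError("ordinal labels must be unique")
    rank = {label: i + 1 for i, label in enumerate(ordered_labels)}

    def encode(values):
        # Raises KeyError for a label that was not declared in the ordering.
        return [rank[v] for v in values]

    return encode


# Hypothetical ordering for a three-level ordinal field.
encode = make_ordinal_encoder(["low", "medium", "high"])
codes = encode(["high", "low", "medium"])
```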

      A mechanism will be defined to optionally identify the presence of a sequencing field that will be used to resequence the records for the sequence tests. If the sequencing field is not defined, or if its values are not unique, the existing order of the data records will be used.
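
      The fallback rule for the sequencing field, followed by a sequence test, could be sketched as follows. Records are assumed to be dictionaries, and resequence, seq_field and runs_test_z are illustrative names. The runs test shown is the standard large-sample z statistic for the Wald-Wolfowitz test on a two-valued sequence; how the bundle would report the test is not stated in this description.

```python
import math


def resequence(records, seq_field=None):
    """Order records by seq_field, keeping the existing order as a fallback.

    The existing order is kept when no sequencing field is given or when
    its values are not unique.
    """
    if seq_field is None:
        return list(records)
    keys = [rec[seq_field] for rec in records]
    if len(set(keys)) != len(keys):
        return list(records)
    return sorted(records, key=lambda rec: rec[seq_field])


def runs_test_z(symbols):
    """Wald-Wolfowitz runs test z statistic for a two-valued sequence."""
    distinct = sorted(set(symbols))
    if len(distinct) != 2:
        raise ValueError("runs test needs exactly two distinct values")
    n1 = sum(1 for s in symbols if s == distinct[0])
    n2 = len(symbols) - n1
    n = n1 + n2
    # A new run starts at every change of symbol.
    runs = 1 + sum(1 for a, b in zip(symbols, symbols[1:]) if a != b)
    mu = 2 * n1 * n2 / n + 1
    var = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    if var == 0:
        raise ValueError("sequence too short for the runs test")
    return (runs - mu) / math.sqrt(var)
```

      A strongly negative z suggests fewer runs than expected under randomness (clustering), and a strongly positive z suggests more runs than expected (alternation).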




            timothyesler Tim Esler
            johnholt John Holt