Machine Learning Library / ML-426

Job hangs or runs out of memory when LOOP depth exceeds 68

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 7.0.0
    • Component/s: Learning Trees
    • Labels: None

    Description

      The process runs fine for up to 68 iterations.  On the 69th iteration (depending on the data), the job either fails to complete or runs out of memory.  This occurs when the training set contains non-separable data points, which force the process to run to max-depth.

      The allocated node ids are reorganized after every 32 iterations to avoid overflow, but under certain conditions the node id can wrap before 32 iterations have gone by.  The wrapped ids no longer match, which confounds the JOIN and can produce billions of output records.
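
      The wrap-around described above can be illustrated with a small sketch. It is not the library's code: it assumes, purely for illustration, that a child id is derived as 2 * parent + branch and stored in a 32-bit unsigned field (like UNSIGNED4). The names child_id and levels_until_wrap are hypothetical. Under that assumption, the number of levels that fit depends on how large the ids already are, so a fixed iteration count is not a safe guard.

```python
MASK32 = 0xFFFFFFFF  # largest value a 32-bit unsigned field (UNSIGNED4) can hold


def child_id(parent_id: int, branch: int, mask: int = MASK32) -> int:
    """Hypothetical allocation rule: child = 2 * parent + branch,
    silently truncated to the width of the field."""
    return (2 * parent_id + branch) & mask


def levels_until_wrap(start_id: int, mask: int = MASK32) -> int:
    """Count how many levels can be added below start_id before the
    untruncated id would no longer fit in the field."""
    levels = 0
    node = start_id
    while 2 * node + 1 <= mask:
        node = 2 * node + 1
        levels += 1
    return levels


# Two different parents produce the same child id once the field wraps,
# which is the kind of mismatch that would confuse a join on node id.
wrapped = child_id(0x80000001, 1)
unwrapped = child_id(1, 1)
print(wrapped == unwrapped)

# A larger starting id leaves room for fewer levels than a small one.
print(levels_until_wrap(1), levels_until_wrap(1000))
```

      In this sketch the collision between wrapped and unwrapped ids stands in for the id mismatch the description mentions; the real allocation scheme may differ.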

      The fix is two-fold:

      • Increase the size of the nodeId field from UNSIGNED4 to UNSIGNED8
      • Create a positive test for overflow, rather than depending on a fixed count of iterations.

      The nodeId field should still be constrained to <= 2**48, as this is the largest value that can be held in a Layout_Model2 field.
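
      A positive overflow test of the kind proposed in the fix might look like the following sketch. It is an illustration under stated assumptions, not the actual change: the growth rule (the next id is at most 2 * current + 1), the function names needs_renumbering and renumber, and the choice to compact ids to 1..N are all hypothetical. Only the 2**48 limit comes from the issue text.

```python
MAX_NODE_ID = 2**48  # limit stated in the issue for a Layout_Model2 field


def needs_renumbering(current_max_id: int, limit: int = MAX_NODE_ID) -> bool:
    """Positive overflow test: return True if the next iteration could
    produce an id above the limit. Assumes (hypothetically) that the
    largest possible next id is 2 * current_max_id + 1."""
    return 2 * current_max_id + 1 > limit


def renumber(ids):
    """Hypothetical reorganization step: map the ids in use to the
    compact range 1..N, preserving their relative order."""
    return {old: new for new, old in enumerate(sorted(set(ids)), start=1)}


# The check looks at the ids actually in use instead of counting iterations.
ids_in_use = [5, 900, 2**47]
if needs_renumbering(max(ids_in_use)):
    mapping = renumber(ids_in_use)
    ids_in_use = [mapping[i] for i in ids_in_use]
print(ids_in_use)
```

      Testing the current maximum id directly means the guard still fires when ids grow faster than expected, which a fixed count of 32 iterations cannot guarantee.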

       

          People

            Assignee: Roger Dev (rdev)
            Reporter: Roger Dev (rdev)
