  Machine Learning Library / ML-426

Job hangs or runs out of memory when LOOP depth exceeds 68

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 7.0.0
    • Component/s: Learning Trees
    • Labels: None

      Description

      The process runs correctly for up to 68 iterations. On the 69th iteration (depending on the data), the job hangs or runs out of memory. This occurs when the training set contains non-separable data points, which force the process to run to max-depth.

      The allocated node ids are reorganized every 32 iterations to avoid overflow, but under certain conditions the node id can wrap before 32 iterations have elapsed. The wrap causes a mismatch in ids and confounds the JOIN, potentially creating billions of output records.
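
To illustrate why a wrapped id inflates the JOIN output, here is a minimal Python sketch (the library itself is not written in Python). The ids, labels, and helper names are hypothetical and are not taken from the library; the only assumption is that an UNSIGNED4 field keeps the low 32 bits of a value.

```python
MASK32 = 0xFFFFFFFF  # an UNSIGNED4 field holds values modulo 2**32


def to_unsigned4(node_id: int) -> int:
    """Truncate an id the way a 32-bit unsigned field would."""
    return node_id & MASK32


def join_on_id(points, nodes):
    """Inner join of (point, node_id) rows with (node_id, label) rows."""
    labels_by_id = {}
    for node_id, label in nodes:
        labels_by_id.setdefault(node_id, []).append(label)
    return [(point, label)
            for point, node_id in points
            for label in labels_by_id.get(node_id, [])]


# Two distinct logical ids that differ by exactly 2**32 (hypothetical values).
true_nodes = [(5, "A"), (5 + 2**32, "B")]
true_points = [("p1", 5), ("p2", 5 + 2**32)]

# The same rows after being stored in a 32-bit field: both ids become 5.
stored_nodes = [(to_unsigned4(i), label) for i, label in true_nodes]
stored_points = [(p, to_unsigned4(i)) for p, i in true_points]

exact = join_on_id(true_points, true_nodes)        # one match per point
wrapped = join_on_id(stored_points, stored_nodes)  # every point matches every colliding node
```

With unique ids the join returns one row per point. Once ids collide, each point matches every node that shares the wrapped id, so the output grows multiplicatively with the number of collisions.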

      The fix is two-fold:

      • Increase the size of the nodeId field from UNSIGNED4 to UNSIGNED8.
      • Create a positive test for overflow, rather than depending on a fixed count of iterations.

      The nodeId field should still be constrained to <= 2**48, because that is the largest value that can be held in a Layout_Model2 field.
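
The difference between a fixed iteration count and a positive overflow test can be sketched in Python as follows. This is only an illustration under stated assumptions: the ticket does not describe how the library allocates ids, so the allocation rule (`child = parent * growth + 1`), the `growth` parameter, and all function names here are hypothetical. Only the 2**48 bound comes from the ticket.

```python
MAX_NODE_ID = 2**48  # upper bound on nodeId stated in the ticket


def next_level_would_overflow(max_id: int, growth: int = 2,
                              limit: int = MAX_NODE_ID) -> bool:
    """Positive test: could ids allocated for the next level exceed `limit`?

    Assumes (hypothetically) that a child id is at most growth * parent + growth.
    """
    return max_id * growth + growth > limit


def compact_ids(ids):
    """Renumber the live ids to 1..n, preserving order; returns old -> new."""
    return {old: new for new, old in enumerate(sorted(set(ids)), start=1)}


def grow(ids, levels: int, growth: int = 2):
    """Advance `levels` iterations, compacting only when the test fires."""
    for _ in range(levels):
        if next_level_would_overflow(max(ids), growth):
            mapping = compact_ids(ids)
            ids = [mapping[i] for i in ids]
        # Hypothetical allocation rule: one child per live node.
        ids = [i * growth + 1 for i in ids]
    return ids


# Run far past both 32 and 68 iterations; ids never exceed the bound.
final_ids = grow([1], levels=100)
```

Because the check looks at the actual maximum id rather than at how many iterations have passed, it still triggers when ids grow faster than expected, which is the case the fixed 32-iteration schedule missed.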


            People

            • Assignee: rdev Roger Dev
            • Reporter: rdev Roger Dev