HPCC-19608

Performance issue with LOOP with unbalanced data


    Details

    • Type: Bug
    • Status: Accepted
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Core Libraries
    • Labels:
      None
    • Compatibility:
      Minor

      Description

      I am training a neural network using back-propagation, and I was seeing severe performance problems on Thor.  I isolated the problem to the behavior of LOOP, though I haven’t been able to create a small test program that reproduces it.  Here’s the background:

      • The BP method used does not lend itself to parallelization, so I put all the data on the first node.
      • The algorithm consists of three levels of loop:
        • IterLoop – The number of training iterations (n = 100)
          • DataLoop – Loop through the datapoints, adjusting weights after each (n = 100)
            • FFLoop – Loop through the layers (n = 2)
            • DeltaLoop – Loop through the layers (n = 2)
      • Running on Thor takes hundreds of times longer than running on hThor.
      • Having determined that LOOP was the issue, I recoded all the LOOPs to use LOCAL ITERATE instead (see the sketch after this list).
      • Now running on Thor is only about twice as slow as hThor – I can live with that.
      • Next I tried to pin the problem down further, so I started putting the LOOPs back in.
      • I was able to change all the ITERATEs back to LOOP without substantially affecting performance, except for the DataLoop (the middle loop).
        • Whenever I change the DataLoop back to using LOOP, performance drops by orders of magnitude.
        • The performance degradation is accompanied by thousands of warnings displayed in ECL Watch.
        • Some of those warnings include Deadman Timer expiries, which may be related.
        • When that loop is done via ITERATE, I get no warnings.
      • Attached are two ZAP reports:
        • One with the DataLoop using ITERATE – NNIter (runtime 2:34).
        • One with the DataLoop using LOOP – NNLoop (runtime 2:37:54, roughly 60x longer).
        • All of the other loops are implemented with LOOP.
      • Keep in mind that all of the data is on node 1, so only that node has any work to do during the three levels of loops.
      • If you would like to look at the code in question, the differences are localized to the attribute IterStep (around line 360 in NeuralNetworks.ecl).
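
      For reference, here is a minimal ECL sketch of the two shapes of the DataLoop described above. The record layouts, the sample data, and the AdjustWeights helper are illustrative assumptions, not the code in the attached NeuralNetworks.ecl; only the LOOP form versus the LOCAL ITERATE form is the point.

      // Illustrative sketch only: layouts and AdjustWeights are stand-ins,
      // not the attached NeuralNetworks.ecl.
      PointRec := RECORD
          UNSIGNED4 id;
          REAL8     x;
          REAL8     target;
      END;

      WeightRec := RECORD
          UNSIGNED4 id;
          REAL8     w;
      END;

      rawData     := DATASET([{1, 0.1, 0.9}, {2, 0.4, 0.2}], PointRec);
      initWeights := DATASET([{1, 0.0}, {2, 0.0}], WeightRec);

      // All of the training data forced onto the first node, as described above.
      trainData := DISTRIBUTE(rawData, 0);

      // Stand-in for one back-propagation weight update from one datapoint.
      AdjustWeights(DATASET(WeightRec) ws, REAL8 x, REAL8 target) :=
          PROJECT(ws, TRANSFORM(WeightRec,
                                SELF.w := LEFT.w + 0.1 * x * (target - LEFT.w * x),
                                SELF   := LEFT));

      // DataLoop as a LOOP: pass COUNTER selects the datapoint.  This is the
      // shape that ran roughly 60x slower on Thor.
      loopWeights := LOOP(initWeights,
                          COUNT(trainData),
                          AdjustWeights(ROWS(LEFT),
                                        trainData[COUNTER].x,
                                        trainData[COUNTER].target));

      // DataLoop as a LOCAL ITERATE: the weight state rides along in a child
      // dataset and is threaded between consecutive rows on the node that
      // actually holds the data.
      StateRec := RECORD
          PointRec;
          DATASET(WeightRec) weights;
      END;

      withState := PROJECT(trainData,
                           TRANSFORM(StateRec,
                                     SELF.weights := DATASET([], WeightRec),
                                     SELF := LEFT));

      StateRec Step(StateRec l, StateRec r) := TRANSFORM
          prior := IF(EXISTS(l.weights), l.weights, initWeights);
          SELF.weights := AdjustWeights(prior, r.x, r.target);
          SELF := r;
      END;

      stepped     := ITERATE(withState, Step(LEFT, RIGHT), LOCAL);
      iterWeights := stepped[COUNT(stepped)].weights;

      OUTPUT(loopWeights, NAMED('ViaLoop'));
      OUTPUT(iterWeights, NAMED('ViaIterate'));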

      If the problem is inherent to the distributed LOOP functionality, then perhaps a LOCAL variant of LOOP could be made available.
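
      If that route were taken, usage might look like the following. The LOCAL flag on LOOP below is purely hypothetical, proposed syntax (LOOP does not accept it today); the other names reuse the sketch above. The intent is that each node would run the loop body over its own rows only, so a node holding no data would have nothing to coordinate on each iteration.

      // Hypothetical proposed syntax: LOOP does not currently take a LOCAL flag.
      localLoopWeights := LOOP(initWeights,
                               COUNT(trainData),
                               AdjustWeights(ROWS(LEFT),
                                             trainData[COUNTER].x,
                                             trainData[COUNTER].target),
                               LOCAL);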

        Attachments


            People

            • Assignee: rdev Roger Dev
            • Reporter: rdev Roger Dev
            • Votes: 0
            • Watchers: 4
