HPCC-10894

Loop can spuriously try to continue and time out if an iteration on one slave takes > 25 minutes

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.2.4
    • Component/s: Thor
    • Labels: None

      Description

      If more than 25 minutes elapsed between two slaves finishing their iteration of a global LOOP, the barrier code in the global loop spuriously interpreted the timeout as meaning 'proceed with next iteration'. As a result, thormaster either tried to start the next iteration prematurely or started an extra iteration that the slaves were not expecting.
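      The failure mode can be illustrated with a minimal Python sketch. This is not Thor code: `barrier_wait`, `buggy_should_proceed` and `checked_should_proceed` are hypothetical names, and raising an error on timeout is only one possible way to handle it, not necessarily the fix shipped in 4.2.4.

```python
BARRIER_TIMEOUT_SECS = 25 * 60  # the 25-minute window described in this issue


def barrier_wait(slave_finish_secs, timeout_secs=BARRIER_TIMEOUT_SECS):
    """Hypothetical barrier: True if every slave arrives within the timeout
    of the first arrival, False if the wait timed out."""
    spread = max(slave_finish_secs) - min(slave_finish_secs)
    return spread <= timeout_secs


def buggy_should_proceed(slave_finish_secs):
    # The return value is ignored, so a timeout looks the same as
    # "all slaves finished" and the master proceeds regardless.
    barrier_wait(slave_finish_secs)
    return True


def checked_should_proceed(slave_finish_secs):
    # Assumed handling: treat a timeout as an error rather than as
    # permission to start the next iteration.
    if not barrier_wait(slave_finish_secs):
        raise TimeoutError("barrier timed out; not all slaves finished the iteration")
    return True
```

      With finish times of 0 and 26 minutes, `buggy_should_proceed([0, 26 * 60])` still returns `True`, while `checked_should_proceed` raises `TimeoutError`.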

      In the case of starting an extra iteration, where the slaves have actually finished looping, the error seen is:

      serializeCreateContexts - Timeout receiving from slaves

      In the case where it times out on an intermediate iteration, unconsumed end-of-loop messages from previous iterations would be left pending. These would be consumed (early) by the next iteration, causing all subsequent iterations to start early on the master. You would likely hit timeouts similar to the one above, because graphs initializing on the master and the slaves expect messages within short time frames and do not receive them quickly enough.
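      The knock-on effect of a single spurious timeout can be shown with a toy model. The sketch below is not Thor code: `simulate_master` is a hypothetical function, each iteration is reduced to one end-of-loop message, and time is in arbitrary units.

```python
def simulate_master(done_times, timeout):
    """Toy model of the buggy master loop.

    done_times[i] is the time at which iteration i's end-of-loop message
    reaches the master. Returns a list of tuples
    (iteration, time_master_proceeds, index_of_message_consumed_or_None).
    """
    log = []
    now = 0
    next_msg = 0  # index of the oldest unconsumed end-of-loop message
    for iteration in range(len(done_times)):
        deadline = now + timeout
        if done_times[next_msg] <= deadline:
            # A message is available in time, but it may belong to an
            # earlier iteration than the one the master is waiting on.
            now = max(now, done_times[next_msg])
            log.append((iteration, now, next_msg))
            next_msg += 1
        else:
            # Bug being modelled: the timeout is treated as 'proceed'.
            now = deadline
            log.append((iteration, now, None))
    return log


# Iteration 0 finishes at t=30, after the timeout of 25.
log = simulate_master([30, 40, 50], timeout=25)
```

      Here `log` is `[(0, 25, None), (1, 30, 0), (2, 40, 1)]`: after the timeout on iteration 0, iteration 1 proceeds at t=30 on iteration 0's message although its own message only arrives at t=40, and every later iteration stays one message behind.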

            People

            • Assignee: Jake Smith (jakesmith)
            • Reporter: Jake Smith (jakesmith)