HPCC-27561 (project: HPCC)

Prevent the Thor manager watchdog from stopping (e.g. on a deserialization error) and leaving orphaned worker MP messages to build up


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Not specified
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 8.6.22, 7.12.112
    • Component/s: Thor
    • Labels: None

    Description

      The Thor manager watchdog runs at the start of each graph and waits for watchdog/progress packets from the workers.
      If an exception occurs while processing one of those packets, the watchdog stops.
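      A minimal sketch of a defensive receive loop, assuming a simplified model: a std::deque stands in for the MP queue of pending worker messages, and ProgressPacket, deserializeProgress and drainProgress are hypothetical names, not Thor's actual classes or the actual fix for this issue. It shows the idea of consuming each packet before processing it and catching exceptions per packet, so one bad packet is logged and skipped rather than ending the loop.

```cpp
#include <cstddef>
#include <deque>
#include <iostream>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for a worker progress packet; the real Thor
// packet format is not described in the issue.
struct ProgressPacket {
    int workerId;
    std::string payload;
};

// Hypothetical deserializer: throws on a malformed payload, modelling the
// deserialization error described in the issue.
int deserializeProgress(const ProgressPacket &pkt) {
    if (pkt.payload.empty() || pkt.payload[0] != 'P')
        throw std::runtime_error("bad progress packet from worker " +
                                 std::to_string(pkt.workerId));
    return std::stoi(pkt.payload.substr(1));  // may also throw on bad digits
}

// Drains every pending packet. Each packet is removed from the queue first
// and then processed inside its own try/catch, so a bad packet is logged and
// skipped instead of stopping the loop and leaving later packets unread.
// Returns the number of packets that failed to deserialize.
std::size_t drainProgress(std::deque<ProgressPacket> &pending, long &totalProgress) {
    std::size_t failures = 0;
    while (!pending.empty()) {
        ProgressPacket pkt = std::move(pending.front());
        pending.pop_front();  // always consume, even if processing fails
        try {
            totalProgress += deserializeProgress(pkt);
        } catch (const std::exception &e) {
            ++failures;
            std::cerr << "watchdog: ignoring packet: " << e.what() << '\n';
        }
    }
    return failures;
}
```

      For example, draining a queue that holds one malformed packet between two valid ones logs one error, returns 1, and still consumes all three packets, so the queue does not keep growing.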

      Workers continue to send progress packets, and the MP messaging system keeps all of them pending, waiting to be read.
      Over time this causes a massive buildup of pending messages, which wastes memory and, I think, also causes a huge slowdown in MP communication between the manager and workers
      (seen primarily as very slow sorts).
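      The buildup can be illustrated with a toy model (hypothetical, not derived from Thor's MP implementation): on each tick every worker sends one progress packet, and the manager drains the queue only while its watchdog is still running. Once the watchdog stops, the number of pending messages grows linearly with the number of workers and the elapsed time.

```cpp
#include <cstddef>

// Toy model, not Thor code: on each tick every worker sends one progress
// packet. While the watchdog is running (ticks before watchdogStopsAtTick)
// it reads everything pending; after it stops, nothing drains the queue.
std::size_t pendingAfter(std::size_t workers, std::size_t ticks,
                         std::size_t watchdogStopsAtTick) {
    std::size_t pending = 0;
    for (std::size_t t = 0; t < ticks; ++t) {
        pending += workers;          // workers keep sending regardless
        if (t < watchdogStopsAtTick)
            pending = 0;             // a live watchdog drains the queue
    }
    return pending;
}
```

      With 100 workers and 3600 ticks, a watchdog that stops after 10 ticks leaves 359,000 unread packets, while one that keeps running leaves none.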

      I believe this is being seen now because a serialization/deserialization issue related to the subfile stats has been introduced in recent builds.


          People

            Assignee: Jake Smith
            Reporter: Jake Smith
            Votes: 0
            Watchers: 4
