HPCC-29186

Many zombie run_ftslave processes

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Not specified
    • Resolution: Fixed
    • Affects Version/s: 8.10.18
    • Fix Version/s: 9.0.0
    • Component/s: FTSlave
    • Labels: None
    • Single hardware node

    Description

      Report from outside customer; email excerpts:

      • Server recently had dali/sasha corruption
      • Server becomes very slow (running commands, logging in, etc.)
      • Defunct slave processes build up over days
      • 9 days since the last restart of the HPCC cluster; there are now 53774 zombie processes (see zombie_pic enclosure)
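
      To confirm which parent process is accumulating the defunct children listed above, the zombies can be grouped by parent PID. The following is a minimal diagnostic sketch, not part of the original report. It assumes a Linux host with procps `ps`; the function names and the sample process names used when trying it out are hypothetical.

```python
import subprocess
from collections import Counter


def count_zombies_by_parent(ps_output: str) -> Counter:
    """Count defunct processes per (ppid, command).

    Expects the text produced by `ps -eo pid,ppid,stat,comm`,
    including its header row.
    """
    counts = Counter()
    for line in ps_output.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 3)
        if len(fields) < 4:
            continue  # blank or malformed line
        _pid, ppid, stat, comm = fields
        if stat.startswith("Z"):  # "Z" is the zombie (defunct) state
            name = comm.replace("<defunct>", "").strip()
            counts[(int(ppid), name)] += 1
    return counts


def run_ps() -> str:
    """Return the live process table (assumes procps `ps` is installed)."""
    result = subprocess.run(
        ["ps", "-eo", "pid,ppid,stat,comm"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

      Calling `count_zombies_by_parent(run_ps()).most_common(5)` on the affected node would list the parents holding the most unreaped children. A single parent PID owning nearly all of the defunct entries would point at that process as the one not collecting exit statuses.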

      I have skimmed through the logs for dfu, dali, sasha, but I can’t seem to find any clear reason why these processes are not terminating properly.

      In fact, based on what I have seen in the “ftslave” logs, the slaves appear to acknowledge a terminate call from the master, which is consistent with the processes becoming defunct, but I do not see why they are not being cleaned up afterwards.
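
      For background on the question above: on POSIX systems an exited child stays in the process table as a zombie until its parent collects the exit status with a `wait`-family call. So a slave that terminates cleanly will still show as defunct if whatever launched it never reaps it. The sketch below illustrates that general mechanism only. It is not the HPCC code or the fix applied for this issue, and the helper names are hypothetical.

```python
import os


def spawn_short_lived_child() -> int:
    """Fork a child that exits immediately; return the child's PID."""
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # child: exit at once, skipping parent cleanup handlers
    return pid  # parent: the child is a zombie once it exits, until reaped


def reap_exited_children() -> list:
    """Collect every child that has already exited, without blocking."""
    reaped = []
    while True:
        try:
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # this process has no children left
        if pid == 0:
            break  # children exist, but none has exited yet
        reaped.append(pid)
    return reaped
```

      A long-running parent would typically call something like `reap_exited_children()` periodically or in response to `SIGCHLD`. If it never does, each exited child remains as a defunct entry, which matches the symptom described here.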

      Small sample of ftslave log enclosed. A larger log file archive is available.

          People

            jakesmith Jake Smith
            dcamper Dan S. Camper