Details
- Bug
- Status: Resolved
- Not specified
- Resolution: Fixed
- 8.10.18
- None
- Single hardware node
Description
Report from outside customer; email excerpts:
- Server which had dali/sasha corruption recently
- Server becomes very slow (running commands, logging in, etc.)
- Defunct slave processes build up over days
- 9 days since the last restart of the HPCC cluster: there are 53774 zombie processes (see zombie_pic enclosure)
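For triage, it can help to break a zombie count like the one above down by parent PID, since the parent is the process that has not collected its children. The following is a minimal sketch, assuming the input is text captured from `ps -eo pid,ppid,stat,comm` on Linux; the function name, the sample rows, and the `parentd` process name are hypothetical and are not taken from the customer's system.

```python
from collections import Counter

def count_zombies_by_parent(ps_output: str) -> Counter:
    """Count zombie (state Z) processes per parent PID.

    Expects text in the layout of `ps -eo pid,ppid,stat,comm`:
    a header line, then one process per line.
    """
    counts = Counter()
    for line in ps_output.splitlines()[1:]:  # skip the header line
        fields = line.split(None, 3)
        if len(fields) < 3 or not fields[1].isdigit():
            continue  # ignore blank or malformed lines
        ppid, stat = fields[1], fields[2]
        if stat.startswith("Z"):  # Z marks a defunct (zombie) process
            counts[int(ppid)] += 1
    return counts

# Hypothetical sample data, for illustration only.
SAMPLE = """\
  PID  PPID STAT COMMAND
 4321     1 Sl   parentd
 5001  4321 Z    ftslave <defunct>
 5002  4321 Z    ftslave <defunct>
 5003  4321 S    ftslave
"""

if __name__ == "__main__":
    # Show the parent PIDs holding the most zombies.
    print(count_zombies_by_parent(SAMPLE).most_common(5))
```

If nearly all zombies share one parent PID, that parent is the likely place to look for a missing wait on its children.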
I have skimmed through the logs for dfu, dali, sasha, but I can’t seem to find any clear reason why these processes are not terminating properly.
In fact, based on what I have seen in the “ftslave” logs, the slaves appear to be acknowledging a terminate call from the master, which makes sense as the processes are becoming defunct, but I wonder why they are not being cleaned up.
Small sample of ftslave log enclosed. A larger log file archive is available.
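As general background to the question of why terminated slaves remain defunct: on POSIX systems a child that has exited stays in the process table as a zombie until its parent collects its exit status with a `wait` call. The sketch below illustrates that mechanism only; it is not a statement about what the HPCC code does or what the eventual fix was. It assumes Linux (it reads `/proc/<pid>/stat`), and all function names are hypothetical.

```python
import os
import time

def child_state(pid: int) -> str:
    """Return the one-letter process state from /proc/<pid>/stat (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    # The state field is the first token after the parenthesised command name.
    return data.rsplit(")", 1)[1].split()[0]

def spawn_and_exit() -> int:
    """Fork a child that exits immediately; the parent does not wait for it."""
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # child: terminate at once without running cleanup handlers
    return pid       # parent: the child's PID

def wait_until_zombie(pid: int, timeout: float = 5.0) -> bool:
    """Poll until the exited, unreaped child shows state Z, or time out."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if child_state(pid) == "Z":
            return True
        time.sleep(0.01)
    return False

def reap(pid: int) -> int:
    """Collect the child's exit status, which removes the zombie entry."""
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)

if __name__ == "__main__":
    pid = spawn_and_exit()
    print("zombie before reaping:", wait_until_zombie(pid))
    print("exit status:", reap(pid))
    print("still listed in /proc:", os.path.exists(f"/proc/{pid}"))
```

A long-running parent that spawns many children and never reaches the equivalent of `reap` for each of them would accumulate defunct entries in the way the report describes.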