Details
Bug
Status: Resolved
Major
Resolution: Fixed
3.10.8
None
Description
The dev platform cluster is up and running. From node 1, I restarted the cluster by running sudo /opt/HPCCSystems/hpcc-run.sh -a hpcc-init restart. The start script started roxie nodes 1 and 10 first (slightly surprising, since I thought dali was supposed to be the first thing started). Once dali was started, it was followed immediately by the other support nodes (esp, eclccserver, eclserver, etc.) and thor, and then by the rest of the roxie nodes.
Roxie nodes 1 and 10 started in about 1 minute (checked by running testsocket control:alive) and responded before thor had finished starting. Once the init system finished with thor, the other 48 roxie nodes were started. These 48 nodes took approximately 50 minutes to start. This suggests to me that there could be some contention connecting to dali (dali is on node 102).
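To put numbers on the per-node startup times described above, something like the following could be run from node 1 right after the restart. This is only a rough sketch, not part of the platform: the function names are made up for illustration, and the port to probe is an assumption (9876 is commonly the roxie query port, but check the environment configuration). It only detects that a node is accepting TCP connections, which is a weaker check than testsocket control:alive.

```python
import socket
import time


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, or timed out: treat the node as not ready yet.
        return False


def time_until_ready(hosts, port, poll_interval=5.0, max_wait=3600.0):
    """Poll each host until its port accepts connections.

    Returns a dict mapping host -> seconds elapsed since polling began,
    or None for hosts that never responded within max_wait.
    """
    start = time.monotonic()
    ready = {host: None for host in hosts}
    pending = set(hosts)
    while pending and time.monotonic() - start < max_wait:
        # sorted() copies the set, so removing hosts inside the loop is safe.
        for host in sorted(pending):
            if port_is_open(host, port):
                ready[host] = time.monotonic() - start
                pending.discard(host)
        if pending:
            time.sleep(poll_interval)
    return ready
```

Calling time_until_ready with the list of roxie node addresses and the roxie port, then sorting the result by elapsed time, would show whether the 48 slow nodes come up one after another (which would fit contention on dali) or all at roughly the same moment.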
I'm not sure whether this next thing is an issue: I also see "no more data files to copy" followed by "received data files to copy". It seems that roxie is trying to fully load and resolve data for queries one at a time.