  1. HPCC
  2. HPCC-18648

Thor sometimes fails to (re)start due to init_thorslave rsync hang/failure

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 6.4.6
    • Component/s: Init system
    • Labels: None

      Description

      Occasionally, Thor fails to (re)start on a cluster.

      The Thor master waits 15 minutes to connect to all slaves before giving up and exiting.

      The failure has been traced to cases where at least one host in the cluster fails to start its Thor slaves: the rsync command that init_thorslave runs at startup hangs or fails, so the slaves.tmp file does not exist.
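
      The failure mode described above (a copy command that hangs or fails, leaving an expected file missing) can be guarded against with a timeout, a bounded retry, and an explicit check that the file exists. The following is a minimal Python sketch of that idea, not the fix shipped in 6.4.6 and not the contents of init_thorslave. The helper names run_with_retries and fetch_slaves_file, and the default timeout, attempt count, and delay, are assumptions made for illustration.

```python
import subprocess
import time
from pathlib import Path


def run_with_retries(cmd, timeout_s=60.0, attempts=3, delay_s=5.0):
    """Run cmd, treating a hang (timeout) or non-zero exit as a failed attempt.

    Hypothetical helper: the defaults are illustrative, not values from HPCC.
    Returns True as soon as one attempt exits with status 0.
    """
    for attempt in range(1, attempts + 1):
        try:
            result = subprocess.run(cmd, timeout=timeout_s)
            if result.returncode == 0:
                return True
        except subprocess.TimeoutExpired:
            # subprocess.run kills the child before raising, so a hung
            # command cannot block startup indefinitely.
            pass
        if attempt < attempts:
            time.sleep(delay_s)
    return False


def fetch_slaves_file(copy_cmd, slaves_file, **retry_options):
    """Return True only if the copy succeeded and the target file exists.

    copy_cmd is whatever command is expected to produce slaves_file
    (for example an rsync invocation); both are supplied by the caller.
    """
    if not run_with_retries(copy_cmd, **retry_options):
        return False
    return Path(slaves_file).is_file()
```

      With a guard of this kind, a host whose copy step hangs reports the failure promptly instead of leaving the master to wait out its full connection timeout.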

       

            People

            • Assignee: Mark Kelly (mckellyln)
            • Reporter: Mark Kelly (mckellyln)
