HPCC-18648

Thor sometimes fails to (re)start due to init_thorslave rsync hang/failure

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 6.4.6
    • Component/s: Init system
    • Labels: None

    Description

      Occasionally Thor will fail to (re)start on a cluster.

      The Thor master waits 15 minutes for all slaves to connect before giving up and exiting.

      The failure has been traced to cases where at least one host in the cluster fails to start its Thor slaves: the rsync command that init_thorslave runs at startup hangs or fails, so the slaves.tmp file does not exist.
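
The failure mode described above (a copy step that can hang or fail, leaving a required file missing) can be guarded against in a general way. The following is a minimal Python sketch of that idea, not the fix that was applied for this issue, which the description does not detail. The function names `run_with_retries` and `fetch_required_file`, and the timeout, attempt and delay values, are hypothetical and chosen only for illustration.

```python
import os
import subprocess
import time


def run_with_retries(cmd, timeout_s=60.0, attempts=3, delay_s=5.0):
    """Run cmd; treat a hang (timeout) or a non-zero exit as a failed attempt.

    Hypothetical helper for illustration; the defaults are assumptions.
    """
    for attempt in range(1, attempts + 1):
        try:
            result = subprocess.run(cmd, timeout=timeout_s)
            if result.returncode == 0:
                return True
        except subprocess.TimeoutExpired:
            # subprocess.run kills the child before raising,
            # so a hung copy does not linger in the background.
            pass
        if attempt < attempts:
            time.sleep(delay_s)
    return False


def fetch_required_file(cmd, dest_path, **retry_opts):
    """Return True only if cmd succeeded and dest_path exists and is non-empty."""
    if not run_with_retries(cmd, **retry_opts):
        return False
    return os.path.isfile(dest_path) and os.path.getsize(dest_path) > 0
```

Here `cmd` would be the rsync invocation as an argument list and `dest_path` the local slaves.tmp path; both are left to the caller because the issue does not give the exact command. A caller that gets `False` back can log the error and exit immediately, so the problem is reported on the affected host instead of surfacing only as the master's 15-minute connection timeout.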

       

          People

            mckellyln Mark Kelly
            mckellyln Mark Kelly