HPCC-16120

Thor can fail to start because of script issue in kill_slaves

    Details

    • Type: Regression
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 6.0.0, 6.0.2, 6.0.4
    • Fix Version/s: 6.0.6
    • Component/s: Thor
    • Labels:
      None

      Description

      It looks like a regression slipped in with HPCC-14429:

              # we want to kill only slaves that have already been started in run_thor
              if [[ -r $instancedir/uslaves ]]; then
                  clusternodes=$(cat $instancedir/uslaves 2> /dev/null | wc -l)
                  $deploydir/frunssh $instancedir/slaves "/bin/sh -c '$deploydir/init_thorslave stop localhost $slavespernode $THORSLAVEPORT $slaveportinc $THORMASTER $THORMASTERPORT $LOG_DIR $instancedir $deploydir $THORNAME $PATH_PRE $logredirect'" -i:$SSHidentityfile -u:$SSHusername -pe:$SSHpassword -t:$SSHtimeout -a:$SSHretries -n:$clusternodes 2>&1
                  FRUNSSH_RC=$?
                  if [[ ${FRUNSSH_RC} -gt 0 ]]; then
                      log "Error ${FRUNSSH_RC} in frunssh"
                      log "Please check ${LOG_DIR}/frunssh for more details"
                      # clean up any slaves it was able to reach
                      log "Stopping ${component}"
                      kill_process ${PID_NAME} thormaster_${component} 30
                      unlock /var/lock/HPCCSystems/$component/${component}.lock
                      rm -f $INIT_PID_NAME $instancedir/slaves > /dev/null 2>&1
                      exit 255
                  fi
              fi
      

      The first if checks for the existence of 'uslaves', but the frunssh call then uses 'slaves'.
      This means that if the uslaves file exists at startup, frunssh fails and the process exits.

      I'm not sure how uslaves can get left behind (that is a separate issue), but it has been seen to happen, and it leaves Thor unable to restart.
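
      One way to avoid this kind of mismatch is to name the hosts file once and use that single variable for both the readability test and the call. The sketch below is only an illustration, not the change that shipped in 6.0.6: the function name stop_started_slaves, the hostsfile variable, and the runner argument (a stand-in for the real $deploydir/frunssh invocation) are all hypothetical, and it assumes 'uslaves' is the file that should be passed on.

```shell
#!/bin/bash
# Hypothetical sketch: test and use the same hosts file so the guard
# and the remote-stop call cannot drift apart.

# stop_started_slaves INSTANCEDIR RUNNER
#   RUNNER stands in for the real "$deploydir/frunssh ..." call; it
#   receives the hosts file and the node count as its two arguments.
stop_started_slaves() {
    local instancedir=$1
    local runner=$2
    # Assumption: uslaves is the list that should reach the runner.
    local hostsfile="$instancedir/uslaves"

    # No readable list means no slaves were started, so nothing to stop.
    [ -r "$hostsfile" ] || return 0

    # Count the lines; the arithmetic expansion strips any padding wc adds.
    local clusternodes=$(( $(wc -l < "$hostsfile") ))

    # The runner's exit status becomes the function's status, so the
    # caller can still inspect it the way the script inspects FRUNSSH_RC.
    "$runner" "$hostsfile" "$clusternodes"
}
```

      With this shape, a stale uslaves file still triggers the stop attempt, but the call operates on the file that was actually checked rather than on one that may not exist.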

            People

            • Assignee: Michael Gardner (Michael-Gardner)
            • Reporter: Jake Smith (jakesmith)
