Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 4.2.2
    • Fix Version/s: 4.2.4
    • Component/s: DFU Server
    • Labels: None

      Description

      http://10.144.84.12:8010/?inner=../FileSpray/GetDFUWorkunit%3Fwuid%3DD20140305-105139

      We are seeing an issue in uslm infra testing of 4.2.2 rc12 where DFU copies hang. The dfuserver log reports that it failed to connect to the correct slave, and the ftslave log on the thorslave shows that this is because the port is in use. The port is indeed in use, but it is being used by ftslave.
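For readers unfamiliar with the "port in use" failure, the following is a minimal sketch of the general condition, not of the ftslave code itself: a second listener that tries to bind a port already held by a first listener is refused by the operating system. The helper name `try_listen` and the use of the loopback address are assumptions made for illustration only.

```python
import errno
import socket

def try_listen(port, host="127.0.0.1"):
    """Return a listening socket, or None if the port is already in use."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        s.bind((host, port))
        s.listen(1)
        return s
    except OSError as e:
        s.close()
        if e.errno == errno.EADDRINUSE:
            return None  # same condition as "port in use" in the ftslave log
        raise

# Port 0 asks the OS for any free port; the first listener then holds it.
first = try_listen(0)
port = first.getsockname()[1]

# A second listener on the same port is refused while the first is alive.
second = try_listen(port)
print("second listener refused:", second is None)

first.close()
```

If this is what happened here, a listener that is still holding the port would explain why a newly started one cannot bind it, but the sketch does not establish the root cause of this issue.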

      DFU log:
      000202B2 2014-03-05 10:54:35.882 2986 24686 "Failed to connect to correct slave (0,14006/13919) - try again later"
      000202B3 2014-03-05 10:54:36.426 2986 24663 "Try to connect to slave 10.144.104.82:6407"
      000202B4 2014-03-05 10:54:36.427 2986 24663 "Failed to connect to correct slave (0,13969/13950) - try again later"
      000202B5 2014-03-05 10:54:36.517 2986 24740 "Try to connect to slave 10.144.104.40:6414"
      000202B6 2014-03-05 10:54:36.517 2986 24740 "Failed to connect to correct slave (0,14001/13866) - try again later"
      000202B7 2014-03-05 10:54:37.183 2986 24710 "Try to connect to slave 10.144.104.60:6401"
      000202B8 2014-03-05 10:54:37.184 2986 24710 "Failed to connect to correct slave (0,14014/13902) - try again later"
      000202B9 2014-03-05 10:54:37.342 2986 24750 "Try to connect to slave 10.144.104.73:6418"
      000202BA 2014-03-05 10:54:37.342 2986 24750 "Failed to connect to correct slave (0,14038/13951) - try again later"
      000202BB 2014-03-05 10:54:37.778 2986 24702 "Try to connect to slave 10.144.104.34:6412"

      Ftslave log:
      00000000 2014-03-05 10:54:27.008 31858 31858 "Starting ftslave 10.144.84.15 13969 0 6407 D20140305-105139 /var/log/HPCCSystems/dfuserver"
      00000001 2014-03-05 10:54:27.008 31858 31858 "Starting remote slave. Master=10.144.84.15 reply=13969 port=6407"
      00000002 2014-03-05 10:54:27.008 31858 31858 "Ready to listen. reply=13969 port=6407"
      00000003 2014-03-05 10:54:27.008 31858 31858 "ERROR: -7: /var/lib/jenkins/workspace/LN-Candidate-withplugins-4.2.2-rc12/LN/centos-5.7-x86_64/HPCC-Platform/common/remote/rmtspawn.cpp(369) : Failed to create master listener: : port in use
      Target: S>10.144.104.82, port = 6407, Raised in: /var/lib/jenkins/workspace/LN-Candidate-withplugins-4.2.2-rc12/LN/centos-5.7-x86_64/HPCC-Platform/system/jlib/jsocket.cpp, line 905"
      00000004 2014-03-05 10:54:27.042 31762 31762 "Process incoming connection. reply=13950 got(13969,10.144.84.15)"
      00000005 2014-03-05 10:54:27.093 31762 31762 "Ready to accept connection. reply=13950"
      00000004 2014-03-05 10:54:31.392 31858 31858 "Ready to listen. reply=13969 port=6407"
      00000005 2014-03-05 10:54:31.392 31858 31858 "ERROR: -7: /var/lib/jenkins/workspace/LN-Candidate-withplugins-4.2.2-rc12/LN/centos-5.7-x86_64/HPCC-Platform/common/remote/rmtspawn.cpp(369) : Failed to create master listener: : port in use

      Joe aborted and resubmitted the job and it completed, but a few jobs later he hit the same issue again.
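The number pairs in the DFU log can be cross-checked against the ftslave log. For example, `(0,13969/13950)` for 10.144.104.82:6407 pairs 13969, the reply value the new ftslave was started with, with 13950, the reply value of the process that accepted the connection. The sketch below extracts those pairs from dfuserver log lines. It assumes the fourth and fifth columns are process and thread IDs and that the two numbers are the expected and actual reply values; both are inferences from the excerpts above, and `find_mismatches` is a hypothetical helper, not part of the platform.

```python
import re

# Assumed layout: sequence, date, time, pid, tid, then the quoted message.
LINE = re.compile(r'^\S+ \S+ \S+ (\d+) (\d+) "(.*)"$')
TRY = re.compile(r"Try to connect to slave (\d+\.\d+\.\d+\.\d+):(\d+)")
FAIL = re.compile(r"Failed to connect to correct slave \((\d+),(\d+)/(\d+)\)")

def find_mismatches(lines):
    """Return (target, expected_reply, actual_reply) for each failed connect."""
    last_target = {}  # thread id -> (ip, port) of its most recent attempt
    out = []
    for raw in lines:
        m = LINE.match(raw.strip())
        if not m:
            continue  # skip continuation lines and truncated entries
        tid, msg = m.group(2), m.group(3)
        t = TRY.search(msg)
        if t:
            last_target[tid] = (t.group(1), int(t.group(2)))
            continue
        f = FAIL.search(msg)
        if f:
            out.append((last_target.get(tid), int(f.group(2)), int(f.group(3))))
    return out

sample = [
    '000202B2 2014-03-05 10:54:35.882 2986 24686 "Failed to connect to correct slave (0,14006/13919) - try again later"',
    '000202B3 2014-03-05 10:54:36.426 2986 24663 "Try to connect to slave 10.144.104.82:6407"',
    '000202B4 2014-03-05 10:54:36.427 2986 24663 "Failed to connect to correct slave (0,13969/13950) - try again later"',
]
mismatches = find_mismatches(sample)
for target, expected, actual in mismatches:
    print(target, expected, actual)
```

A failure whose matching "Try to connect" line falls outside the excerpt is reported with a target of `None`, since the slave address cannot be recovered from the failure line alone.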

            People

            • Assignee:
              ghalliday Gavin Halliday
              Reporter:
              rwagner42 Russell Wagner
            • Votes: 0
              Watchers: 2
