  1. HPCC
  2. HPCC-22081

PIPE "Failed to create process ..." error provides insufficient debug information

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 6.4.x
    • Fix Version/s: 7.4.0
    • Component/s: EclAgent, Roxie, Thor
    • Labels:
      None
    • Environment:
      6.4.2-rc3, 16-way, 4-instances, 1 slave/instance, 4 channels/slave, AWS m4.2xlarge for slave instances.

      Description

      One of our Thor clusters intermittently fails with the following error:

      <Error><source>eclagent</source><code>10003</code><message>System error: 10003: Graph graph13[142], pipethrough[148]: SLAVE #2 [10.53.56.43:20100]: Failed to create process in /var/lib/HPCCSystems/thor1/ for : /bin/bash -c "IFS=, read file REST;echo ${file} | MY_HDRROWCNT=1 HPCC_WUID=W20190507-033846 HPCC_NODE=1 HPCC_NODES=16 /bin/bash /opt/HPCCSystems/scripts/bin/read_ext_file_py.sh"</message></Error>

      In the log file the message appears as follows:

      005F98A1 2019-05-07 03:44:11.100 13359 9979 "ERROR: PipeWriterThread.3 - activity(ch=0, pipethrough, 148) : Graph graph13[142], pipethrough[148]: Failed to create process in /var/lib/HPCCSystems/thor1/ for : /bin/bash -c "IFS=, read file REST;echo ${file} | MY_HDRROWCNT=1 HPCC_WUID=W20190507-033846 HPCC_NODE=1 HPCC_NODES=16 /bin/bash /opt/HPCCSystems/scripts/bin/read_ext_file_py.sh""

      When rerun during working hours, the workunit completes without issues.

      Neither the HPCC logs nor the system message logs give any clue to the cause of the error. It would be great if the failing system call and its "errno" were recorded in one of the logs. The failure occurs only on a production system, and only at certain times when the system is busy and support staff are generally not available. The underlying cause is probably resource exhaustion of some kind, but we have no way to narrow down the possible root causes. Because this is a production system, our ability to "just try" a change is limited.
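
      To illustrate the kind of diagnostic being requested, here is a minimal Python sketch, not the HPCC platform code (which is C++) and not the fix that was eventually applied. The helper names `format_spawn_error` and `run_pipe_command`, and the bracketed suffix appended to the message, are hypothetical. It only shows the idea of catching the OS-level failure at process creation and logging the errno value, its symbolic name, and the call that failed alongside the existing message text.

```python
import errno
import os
import subprocess


def format_spawn_error(call, err, cwd, command):
    """Build a log line naming the failing call, the errno value, and its symbol.

    The bracketed suffix is a hypothetical format, not the one HPCC emits.
    """
    name = errno.errorcode.get(err.errno, "UNKNOWN")
    return (
        f"Failed to create process in {cwd} for : {command} "
        f"[{call} failed: errno={err.errno} ({name}): {os.strerror(err.errno)}]"
    )


def run_pipe_command(command, cwd):
    """Start `command` under /bin/bash -c in `cwd`.

    Returns (process, None) on success, or (None, message) when the
    process could not be created.
    """
    try:
        proc = subprocess.Popen(
            ["/bin/bash", "-c", command],
            cwd=cwd,
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
        )
    except OSError as err:
        # Popen surfaces fork/chdir/exec failures as OSError with errno set.
        return None, format_spawn_error("Popen (fork/exec)", err, cwd, command)
    return proc, None
```

      With a message of this shape, an intermittent failure on a busy node could be attributed to a specific condition, for example EAGAIN or ENOMEM from fork, or EMFILE when creating the pipes, instead of having to be reproduced.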

            People

            • Assignee:
              jakesmith Jake Smith
              Reporter:
              brianb644 Brian Bounds