Performance monitoring causing jobs to crash when calculating CPU usage

    Details

    • Type: Regression
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 7.2.0
    • Fix Version/s: 7.2.0
    • Component/s: JLib
    • Labels:
      None
    • Environment:
      Ubuntu 18.04 running in Parallels
      Build from commit d84e6b1c

      Description

      While running pretty much any job under Thor, the process terminates unexpectedly. I'm running a debug build, so I have a core with symbols. Stack trace:

      Core was generated by `./thorslave_mythor master=10.211.55.15:20000 slave=.:20100 slavenum=1 slaveproc'.
      Program terminated with signal SIGFPE, Arithmetic exception.
      #0  0x00007f60a9b1cd77 in CpuInfo::getPercentCpu (this=0x5595d2c60398) at /home/campda01/Projects/HPCC-Platform/system/jlib/jdebug.cpp:898
      898	    unsigned percent = (unsigned)(((total - idle) * 100) / idle);
      
      #0  0x00007f60a9b1cd77 in CpuInfo::getPercentCpu (this=0x5595d2c60398) at /home/campda01/Projects/HPCC-Platform/system/jlib/jdebug.cpp:898
      #1  0x00007f60a9b22c00 in CExtendedStats::getCPU (this=0x5595d2c60320) at /home/campda01/Projects/HPCC-Platform/system/jlib/jdebug.cpp:2048
      #2  0x00007f60a9b25335 in CMemoryUsageReporter::getSystemTraceInfo (this=0x5595d2c601e0, str=..., mode=9) at /home/campda01/Projects/HPCC-Platform/system/jlib/jdebug.cpp:2620
      #3  0x00007f60a9b1e2b9 in getSystemTraceInfo (str=..., mode=9) at /home/campda01/Projects/HPCC-Platform/system/jlib/jdebug.cpp:2785
      #4  0x00007f60ab9e1b03 in CGraphBase::executeSubGraph (this=0x5595d2cbde60, parentExtractSz=0, parentExtract=0x0)
          at /home/campda01/Projects/HPCC-Platform/thorlcr/graph/thgraph.cpp:1332
      #5  0x00007f60abc644e6 in CSlaveGraph::executeSubGraph (this=0x5595d2cbde60, parentExtractSz=0, parentExtract=0x0)
          at /home/campda01/Projects/HPCC-Platform/thorlcr/graph/thgraphslave.cpp:1160
      #6  0x00007f60ab9e907d in CJobChannel::runSubgraph (this=0x5595d2ce46d0, graph=..., parentExtractSz=0, parentExtract=0x0)
          at /home/campda01/Projects/HPCC-Platform/thorlcr/graph/thgraph.cpp:3021
      #7  0x00007f60abc66e5a in CJobSlaveChannel::runSubgraph (this=0x5595d2ce46d0, graph=..., parentExtractSz=0, parentExtract=0x0)
          at /home/campda01/Projects/HPCC-Platform/thorlcr/graph/thgraphslave.cpp:1802
      #8  0x00007f60ab9f1179 in CGraphExecutor::CGraphExecutorFactory::createNew()::CGraphExecutorThread::threadmain() (this=0x5595d2cd57d0)
          at /home/campda01/Projects/HPCC-Platform/thorlcr/graph/thgraph.cpp:2305
      #9  0x00007f60a9c0560d in CPooledThreadWrapper::run (this=0x5595d2cd5800) at /home/campda01/Projects/HPCC-Platform/system/jlib/jthread.cpp:908
      #10 0x00007f60a9c020d5 in Thread::begin (this=0x5595d2cd5800) at /home/campda01/Projects/HPCC-Platform/system/jlib/jthread.cpp:267
      #11 0x00007f60a9c01b1f in Thread::_threadmain (v=0x5595d2cd5800) at /home/campda01/Projects/HPCC-Platform/system/jlib/jthread.cpp:113
      #12 0x00007f60a92286db in start_thread (arg=0x7f5f7d7f2700) at pthread_create.c:463
      #13 0x00007f60a8f5188f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
      

      It appears that the protected member idle is zero, so the division by idle at jdebug.cpp:898 raises SIGFPE:

      unsigned CpuInfo::getPercentCpu() const
      {
          __uint64 total = getTotal();
          if (total == 0)
              return 0;
          unsigned percent = (unsigned)(((total - idle) * 100) / idle);
          if (percent > 100)
              percent = 100;
          return percent;
      }
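
The guard in the function above checks total, but the divisor is idle, so the division is unprotected whenever idle is zero. One possible correction, sketched below, divides by total instead, which the existing guard already proves non-zero. This is only an illustration, not the fix that was committed: CpuSample and its busy field are hypothetical stand-ins for CpuInfo and its real counters, and getTotal() is assumed to return the sum of all counters including idle.

```cpp
#include <cstdint>

// Hypothetical stand-in for CpuInfo; the real jlib class is not reproduced here.
struct CpuSample
{
    std::uint64_t busy = 0;   // assumption: all non-idle ticks lumped together
    std::uint64_t idle = 0;   // idle ticks

    // Assumption: the total is the sum of every counter, idle included.
    std::uint64_t getTotal() const { return busy + idle; }

    unsigned getPercentCpu() const
    {
        const std::uint64_t total = getTotal();
        if (total == 0)
            return 0;
        // Divide by total, not idle: total is known to be non-zero here,
        // whereas idle can legitimately be zero on a fully busy interval.
        unsigned percent = static_cast<unsigned>(((total - idle) * 100) / total);
        if (percent > 100)
            percent = 100;
        return percent;
    }
};
```

With this version, a sample with busy = 50 and idle = 0 yields 100 rather than faulting, and a sample with both counters at zero still returns 0 through the existing guard.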
      

            People

            • Assignee:
              ghalliday Gavin Halliday
            • Reporter:
              dcamper Dan S. Camper
