HPCC / HPCC-17420

Follow up crash in lookupjoin abort()

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 6.2.12
    • Fix Version/s: 6.4.0, 6.2.12
    • Component/s: Thor
    • Labels:
    • Environment:
      Alpha dev

      Description

      The Header build job is consistently failing in Alpha dev.

      Example WUs:

      http://alpha_staging_thor_esp.risk.regn.net:8010/?Wuid=W20170414-133216&Widget=WUDetailsWidget
      http://alpha_staging_thor_esp.risk.regn.net:8010/?Wuid=W20170414-130816&Widget=WUDetailsWidget
      http://alpha_staging_thor_esp.risk.regn.net:8010/?Wuid=W20170414-112900&Widget=WUDetailsWidget

      All 3 failed the same way. Below is a sample from one of them:

      W20170414-112900 - Slave 10.194.64.15

      000000A9 2017-04-14 13:01:14.787 12955 13038 "Processing graph - graph(graph2, 20)"
      000000AA 2017-04-14 13:01:14.934 12955 12955 "GraphAbort: W20170414-112900graph2"
      000000AB 2017-04-14 13:01:14.934 12955 12955 "Abort condition set - activity(ch=0, diskwrite, 25)"
      000000AC 2017-04-14 13:01:14.934 12955 12955 "Abort condition set - activity(ch=0, hashdistribute, 24)"
      000000AD 2017-04-14 13:01:14.934 12955 12955 "Abort condition set - activity(ch=0, lookupjoin, 23)"
      000000AE 2017-04-14 13:01:14.934 12955 12955 "================================================"
      000000AF 2017-04-14 13:01:14.934 12955 13038 " - graph(graph2, 20) : Error receiving actinit data for graph: 20"
      000000B0 2017-04-14 13:01:14.934 12955 12955 "Signal: 11 Segmentation fault"
      000000B1 2017-04-14 13:01:14.934 12955 12955 "Fault IP: 00007FCF4ABE0145"
      000000B2 2017-04-14 13:01:14.934 12955 13038 "End of sub-graph - graph(graph2, 20)"
      000000B3 2017-04-14 13:01:14.934 12955 12955 "Accessing: 0000000000000240"
      000000B4 2017-04-14 13:01:14.934 12955 13038 "HASHDISTRIB: kill - activity(ch=0, hashdistribute, 24)"
      000000B5 2017-04-14 13:01:14.934 12955 12955 "Registers:"
      000000B6 2017-04-14 13:01:14.934 12955 13038 " - graph(graph2, 20) : Error receiving actinit data for graph: 20"
      000000B7 2017-04-14 13:01:14.934 12955 12955 "EAX:0000000000000000 EBX:00000000017E9B00 ECX:0000000000000000 EDX:00000000017E9F28 ESI:0000000000000000 EDI:00000000017E9F28"
      000000B8 2017-04-14 13:01:14.934 12955 13038 "Graph Done - graph(graph2, 20)"
      000000B9 2017-04-14 13:01:14.934 12955 12955 "CS:EIP:0033:00007FCF4ABE0145"
      000000BA 2017-04-14 13:01:14.934 12955 12955 " ESP:00007FFC3BD15DB0 EBP:0000000000000000"
      000000BB 2017-04-14 13:01:14.934 12955 12955 "Stack[00007FFC3BD15DB0]: 000000000180F1C0 017EAA8000000000 00000000017EAA80 016F136000000000 00000000016F1360 4A20CD8400000000 00007FCF4A20CD84 34000BB400007FCF"
      000000BC 2017-04-14 13:01:14.934 12955 12955 "Stack[00007FFC3BD15DD0]: 00007FCF34000BB4 016F136000007FCF 00000000016F1360 017EAA8000000000 00000000017EAA80 016F136000000000 00000000016F1360 0180BBE000000000"
      000000BD 2017-04-14 13:01:14.934 12955 12955 "Stack[00007FFC3BD15DF0]: 000000000180BBE0 3BD15F1000000000 00007FFC3BD15F10 0180EE6000007FFC 000000000180EE60 4A894A8200000000 00007FCF4A894A82 0000000000007FCF"
      000000BE 2017-04-14 13:01:14.934 12955 12955 "Stack[00007FFC3BD15E10]: 0000000000000000 0040C5B500000000 000000000040C5B5 0001000100000000 00007FCF00010001 3BD162F000007FCF 00007FFC3BD162F0 3BD160C000007FFC"
      000000BF 2017-04-14 13:01:14.934 12955 12955 "Stack[00007FFC3BD15E30]: 00007FFC3BD160C0 3BD15F1000007FFC 00007FFC3BD15F10 3BD1609000007FFC 00007FFC3BD16090 3BD15F9000007FFC 00007FFC3BD15F90 0001001A00007FFC"
      000000C0 2017-04-14 13:01:14.934 12955 12955 "Stack[00007FFC3BD15E50]: 00007FFC0001001A 0000005400007FFC 00007FFC00000054 01773CB000007FFC 0000000001773CB0 3BD1610000000000 00007FFC3BD16100 3BD1605000007FFC"
      000000C1 2017-04-14 13:01:14.934 12955 12955 "Stack[00007FFC3BD15E70]: 00007FFC3BD16050 3BD15FB000007FFC 00007FFC3BD15FB0 016FB06000007FFC 00000000016FB060 3BD15EB000000000 00007FFC3BD15EB0 3BD15FD000007FFC"
      000000C2 2017-04-14 13:01:14.934 12955 12955 "Stack[00007FFC3BD15E90]: 00007FFC3BD15FD0 3BD15FF000007FFC 00007FFC3BD15FF0 BE35CA0100007FFC 00000000BE35CA01 4AE9CFEA00000000 00007FCF4AE9CFEA 0000000500007FCF"
      000000C3 2017-04-14 13:01:14.934 12955 12955 "Backtrace:"
      000000C4 2017-04-14 13:01:14.935 12955 12955 " /opt/HPCCSystems/lib/libjlib.so(+0xd0958) [0x7fcf4565e958]"
      000000C5 2017-04-14 13:01:14.935 12955 12955 " /opt/HPCCSystems/lib/libjlib.so(_Z13excsighandleriP7siginfoPv+0x33c) [0x7fcf456605cc]"
      000000C6 2017-04-14 13:01:14.935 12955 12955 " /lib64/libpthread.so.0(+0xf7e0) [0x7fcf443797e0]"
      000000C7 2017-04-14 13:01:14.935 12955 12955 " /opt/HPCCSystems/lib/libactivityslaves_lcr.so(_ZN23CLookupJoinActivityBaseI13CLookupManyHTE5abortEv+0xb5) [0x7fcf4abe0145]"
      000000C8 2017-04-14 13:01:14.935 12955 12955 " /opt/HPCCSystems/lib/libgraph_lcr.so(_ZN10CGraphBase5abortEP10IException+0x1a4) [0x7fcf4a20cd84]"
      000000C9 2017-04-14 13:01:14.935 12955 12955 " /opt/HPCCSystems/lib/libgraphslave_lcr.so(_ZN11CSlaveGraph5abortEP10IException+0x12) [0x7fcf4a894a82]"
      000000CA 2017-04-14 13:01:14.935 12955 12955 " ./thorslave_thor400_64b_linking(_ZN12CJobListener4mainEv+0x685) [0x40c5b5]"
      000000CB 2017-04-14 13:01:14.935 12955 12955 " ./thorslave_thor400_64b_linking(_Z9slaveMainRb+0x4c1) [0x4123f1]"
      000000CC 2017-04-14 13:01:14.935 12955 12955 " ./thorslave_thor400_64b_linking(main+0xe11) [0x40ec81]"
      000000CD 2017-04-14 13:01:14.936 12955 12955 " /lib64/libc.so.6(__libc_start_main+0xfd) [0x7fcf43ff4d1d]"
      000000CE 2017-04-14 13:01:14.936 12955 12955 " ./thorslave_thor400_64b_linking() [0x40f71d]"
      000000CF 2017-04-14 13:01:14.936 12955 12955 "ThreadList:
      7FCF42C30700 140528155035392 12958: CMPNotifyClosedThread
      7FCF4222F700 140528144545536 12959: CSocketBaseThread
      7FCF4182E700 140528134055680 12960: MP Connection Thread
      7FCF40E2D700 140528123565824 12961: CMemoryUsageReporter
      7FCE39FBE700 140523712800512 12962: CBackupHandler
      7FCE395BD700 140523702310656 12963: CGraphProgressHandler
      7FCE38BBC700 140523691820800 13006: BackgroundReleaseBufferThread
      7FCE2B5FE700 140523467695872 13037: ProcessSlaveActivity
      7FCE30CAD700 140523558590208 13038: CGraphExecutor pool

      /var/lib/HPCCSystems/thor400_64b_linking
      rw------ 1 hpcc hpcc 4317585408 Apr 14 13:01 core.12955
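
      The slave log above interleaves output from two threads: 12955, which writes the abort and segfault report, and 13038, which the ThreadList names as "CGraphExecutor pool". A minimal sketch for separating the two streams when reading such a log, assuming each entry follows the layout seen in the sample (sequence number, date, time, process ID, thread ID, quoted message). The function name `split_by_thread` and the regular expression are illustrative and not part of any HPCC tooling.

```python
import re
from collections import defaultdict

# Layout assumed from the sample log in this issue:
# <seq> <date> <time> <pid> <tid> "<message>"
LINE_RE = re.compile(
    r'^\s*(?P<seq>[0-9A-F]{8}) (?P<date>\S+) (?P<time>\S+) '
    r'(?P<pid>\d+) (?P<tid>\d+) "(?P<msg>.*)$'
)

def split_by_thread(lines):
    """Group log messages by thread ID, preserving their original order.

    Lines that do not match the assumed layout (for example the
    multi-line ThreadList entries) are skipped.
    """
    by_tid = defaultdict(list)
    for line in lines:
        m = LINE_RE.match(line)
        if m:
            # Drop the closing quote of the message, if present.
            by_tid[int(m.group("tid"))].append(m.group("msg").rstrip('"'))
    return dict(by_tid)

# Four consecutive entries copied from the log above.
sample = [
    '000000B0 2017-04-14 13:01:14.934 12955 12955 "Signal: 11 Segmentation fault"',
    '000000B1 2017-04-14 13:01:14.934 12955 12955 "Fault IP: 00007FCF4ABE0145"',
    '000000B2 2017-04-14 13:01:14.934 12955 13038 "End of sub-graph - graph(graph2, 20)"',
    '000000B3 2017-04-14 13:01:14.934 12955 12955 "Accessing: 0000000000000240"',
]

threads = split_by_thread(sample)
for tid, msgs in threads.items():
    print(tid, msgs)
```

      Grouping by thread ID keeps the crash report from thread 12955 contiguous, so the signal, fault address, registers and backtrace can be read without the graph executor's messages in between.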

      Tony Kirk, Lisa Frates

            People

            • Assignee:
              jakesmith Jake Smith
              Reporter:
              kev77log Kevin Logemann