HPCC / HPCC-21998

Crash in epoll thread during multiConnect if it fails to connect

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 7.2.12
    • Component/s: JLib
    • Labels:
      None

      Description

      gdb stack:

      Program terminated with signal 11, Segmentation fault.
      #0  0x00007fba567c2e31 in SocketElem::notifySelected (this=0x2105a88, socket=<optimized out>, selected=<optimized out>)
          at /mnt/disk1/jenkins/workspace/LN-with-Plugins-Spark-7.2.x-Nightly-Build/LN/centos-7.0-x86_64/HPCC-Platform/system/jlib/jsocket.cpp:5939
      5939	/mnt/disk1/jenkins/workspace/LN-with-Plugins-Spark-7.2.x-Nightly-Build/LN/centos-7.0-x86_64/HPCC-Platform/system/jlib/jsocket.cpp: No such file or directory.
      Missing separate debuginfos, use: debuginfo-install hpccsystems-platform-with-spark-7.2.5-closedown0.x86_64
      (gdb) where
      #0  0x00007fba567c2e31 in SocketElem::notifySelected (this=0x2105a88, socket=<optimized out>, selected=<optimized out>)
          at /mnt/disk1/jenkins/workspace/LN-with-Plugins-Spark-7.2.x-Nightly-Build/LN/centos-7.0-x86_64/HPCC-Platform/system/jlib/jsocket.cpp:5939
      #1  0x00007fba567c8d4f in CSocketEpollThread::run (this=0x7fba3c000ab0)
          at /mnt/disk1/jenkins/workspace/LN-with-Plugins-Spark-7.2.x-Nightly-Build/LN/centos-7.0-x86_64/HPCC-Platform/system/jlib/jsocket.cpp:5138
      #2  0x00007fba567e53ac in Thread::begin (this=0x7fba3c000ab0)
          at /mnt/disk1/jenkins/workspace/LN-with-Plugins-Spark-7.2.x-Nightly-Build/LN/centos-7.0-x86_64/HPCC-Platform/system/jlib/jthread.cpp:267
      #3  0x00007fba567e691e in Thread::_threadmain (v=0x7fba3c000ab0)
          at /mnt/disk1/jenkins/workspace/LN-with-Plugins-Spark-7.2.x-Nightly-Build/LN/centos-7.0-x86_64/HPCC-Platform/system/jlib/jthread.cpp:113
      #4  0x00007fba54f0bdd5 in start_thread () from /lib64/libpthread.so.0
      #5  0x00007fba54c34ead in clone () from /lib64/libc.so.6
      

      registers:

      (gdb) info registers
      rax            0x16	22
      rbx            0x2105a88	34626184
      rcx            0x2105a88	34626184
      rdx            0x6	6
      rsi            0x7fba480009d0	140438048606672
      rdi            0x0	0
      rbp            0x7fba480009d0	0x7fba480009d0
      rsp            0x7fba5150b340	0x7fba5150b340
      r8             0x7ffc02261490	140720344536208
      r9             0x0	0
      r10            0x7fba5150b2d0	140438204887760
      r11            0x0	0
      r12            0x7ffc02261490	140720344536208
      r13            0x7ffc02261490	140720344536208
      r14            0x7fba5150b460	140438204888160
      r15            0x7fba3c000ab0	140437847280304
      rip            0x7fba567c2e31	0x7fba567c2e31 <SocketElem::notifySelected(ISocket*, unsigned int)+273>
      eflags         0x10246	[ PF ZF IF RF ]
      cs             0x33	51
      ss             0x2b	43
      ds             0x0	0
      es             0x0	0
      fs             0x0	0
      gs             0x0	0
      

      slave log:

      00000002 2019-04-23 16:47:51.064 63133 63133 "Opened log file //10.173.160.103/var/log/HPCCSystems/thor_160/thormaster.2019_04_23.log"
      00000003 2019-04-23 16:47:51.064 63133 63133 "Build internal_7.2.5-closedown0[remotes/origin/candidate-7.2.x-0-g02cd65]"
      00000004 2019-04-23 16:47:51.064 63133 63133 "calling initClientProcess Port 20000"
      00000005 2019-04-23 16:47:51.088 63133 63133 "Checking cluster replicate nodes"
      00000006 2019-04-23 16:51:21.324 63133 63133 "multiConnect failed to 10.173.160.89:7600 with -1"
      00000007 2019-04-23 16:51:21.423 63133 63155 "================================================"
      00000008 2019-04-23 16:51:21.423 63133 63155 "Program:   10.173.160.103:/opt/HPCCSystems/bin/thormaster_lcr"
      00000009 2019-04-23 16:51:21.423 63133 63155 "Signal:    11 Segmentation fault"
      0000000A 2019-04-23 16:51:21.423 63133 63155 "Fault IP:  00007FBA567C2E31"
      0000000B 2019-04-23 16:51:21.423 63133 63155 "Accessing: 0000000000000000"
      0000000C 2019-04-23 16:51:21.423 63133 63155 "Backtrace:"
      0000000D 2019-04-23 16:51:21.426 63133 63155 "  /opt/HPCCSystems/lib/libjlib.so(+0x16ce31) [0x7fba567c2e31]"
      0000000E 2019-04-23 16:51:21.426 63133 63155 "  /opt/HPCCSystems/lib/libjlib.so(_ZN18CSocketEpollThread3runEv+0x30f) [0x7fba567c8d4f]"
      0000000F 2019-04-23 16:51:21.426 63133 63155 "  /opt/HPCCSystems/lib/libjlib.so(_ZN6Thread5beginEv+0x2c) [0x7fba567e53ac]"
      00000010 2019-04-23 16:51:21.426 63133 63155 "  /opt/HPCCSystems/lib/libjlib.so(_ZN6Thread11_threadmainEPv+0x1e) [0x7fba567e691e]"
      00000011 2019-04-23 16:51:21.426 63133 63155 "  /lib64/libpthread.so.0(+0x7dd5) [0x7fba54f0bdd5]"
      00000012 2019-04-23 16:51:21.426 63133 63155 "  /lib64/libc.so.6(clone+0x6d) [0x7fba54c34ead]"
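
The in-process backtrace above prints mangled symbols and module-relative offsets, while the gdb stack shows demangled names. A minimal sketch of how the two can be cross-checked, assuming a GCC or Clang toolchain that provides `abi::__cxa_demangle` in `<cxxabi.h>`. The helper names `demangle`, `loadBase` and `moduleOffset` are illustrative only and are not part of jlib:

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <string>
#include <cxxabi.h>

// Demangle an Itanium C++ ABI symbol name.
// Returns the input unchanged if it cannot be demangled.
inline std::string demangle(const std::string &mangled)
{
    int status = 0;
    // With a null output buffer, __cxa_demangle returns a malloc'd string.
    char *out = abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status);
    if (status != 0 || out == nullptr)
        return mangled;
    std::string result(out);
    std::free(out);
    return result;
}

// Load base of a shared object, derived from one backtrace frame:
// runtime address minus the "(+0x...)" offset printed for that frame.
inline std::uint64_t loadBase(std::uint64_t runtimeAddr, std::uint64_t offsetInModule)
{
    assert(runtimeAddr >= offsetInModule);
    return runtimeAddr - offsetInModule;
}

// Module-relative offset of any other runtime address in the same object.
inline std::uint64_t moduleOffset(std::uint64_t runtimeAddr, std::uint64_t base)
{
    assert(runtimeAddr >= base);
    return runtimeAddr - base;
}
```

For example, `demangle("_ZN18CSocketEpollThread3runEv")` gives `CSocketEpollThread::run()`, which matches gdb frame #1. `loadBase(0x7fba567c2e31, 0x16ce31)` gives `0x7fba56656000` for libjlib.so in this process. The unsymbolised first frame (`+0x16ce31`) is the same address gdb reports for `SocketElem::notifySelected`. That offset is the value a tool such as `addr2line -e libjlib.so` would take, provided matching debug information is installed.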
      

            People

            • Assignee:
              mckellyln Mark Kelly
              Reporter:
              jakesmith Jake Smith