  1. HPCC
  2. HPCC-23944

eclcc takes a long time to quit if k8s cluster stopped

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 7.8.8
    • Fix Version/s: 7.10.0
    • Component/s: cloud, eclccserver
    • Labels: None

      Description

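      Running stopall.sh while a compile was active caused eclccserver to stall on shutdown: it kept waiting for a job to terminate, even though the stopall script had already killed that job. In the thread dump below, threads 7 and 8 are blocked in "kubectl wait --for=condition=complete --timeout=10h", and thread 1 (main) is blocked in CThreadPool::joinAll with no timeout, waiting for those compile threads.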

      Thread 13 (Thread 0x7f6a767fc700 (LWP 30867)):
      #0 0x00007f6ac64f6618 in futex_abstimed_wait_cancelable (private=0, abstime=0x7f6a767fb100, clockid=0, expected=0, futex_word=0x7f6abc06b6a8) at ../sysdeps/nptl/futex-internal.h:320
      #1 do_futex_wait (sem=sem@entry=0x7f6abc06b6a8, abstime=abstime@entry=0x7f6a767fb100, clockid=0) at sem_waitcommon.c:112
      #2 0x00007f6ac64f6743 in __new_sem_wait_slow (sem=0x7f6abc06b6a8, abstime=0x7f6a767fb100, clockid=0) at sem_waitcommon.c:184
      #3 0x00007f6ac732f299 in Semaphore::wait (this=0x7f6abc06b6a8, timeout=1000) at /hpcc-dev/HPCC-Platform/system/jlib/jsem.cpp:88
      #4 0x00007f6ac73a37a0 in CLinuxPipeProcess::cStdErrorBufferThread::run (this=0x7f6abc06b5e0) at /hpcc-dev/HPCC-Platform/system/jlib/jthread.cpp:1863
      #5 0x00007f6ac7399d79 in Thread::begin (this=0x7f6abc06b5e0) at /hpcc-dev/HPCC-Platform/system/jlib/jthread.cpp:291
      #6 0x00007f6ac73994bc in Thread::_threadmain (v=0x7f6abc06b5e0) at /hpcc-dev/HPCC-Platform/system/jlib/jthread.cpp:137
      #7 0x00007f6ac64ec609 in start_thread (arg=<optimized out>) at pthread_create.c:477
      #8 0x00007f6ac6413103 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

      Thread 12 (Thread 0x7f6a76ffd700 (LWP 30855)):
      #0 0x00007f6ac63d6c6f in _GI__wait4 (pid=30856, stat_loc=0x7f6a76ffc4ac, options=0, usage=0x0) at ../sysdeps/unix/sysv/linux/wait4.c:27
      #1 0x00007f6ac739db35 in dowaitpid (pid=30856, mode=0) at /hpcc-dev/HPCC-Platform/system/jlib/jthread.cpp:1795
      #2 0x00007f6ac73a5a50 in CLinuxPipeProcess::run (this=0x7f6abc030a50) at /hpcc-dev/HPCC-Platform/system/jlib/jthread.cpp:2113
      #3 0x00007f6ac73a34d9 in CLinuxPipeProcess::cForkThread::run (this=0x7f6a7800c020) at /hpcc-dev/HPCC-Platform/system/jlib/jthread.cpp:1839
      #4 0x00007f6ac7399d79 in Thread::begin (this=0x7f6a7800c020) at /hpcc-dev/HPCC-Platform/system/jlib/jthread.cpp:291
      #5 0x00007f6ac73994bc in Thread::_threadmain (v=0x7f6a7800c020) at /hpcc-dev/HPCC-Platform/system/jlib/jthread.cpp:137
      #6 0x00007f6ac64ec609 in start_thread (arg=<optimized out>) at pthread_create.c:477
      #7 0x00007f6ac6413103 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

      Thread 11 (Thread 0x7f6a777fe700 (LWP 30837)):
      #0 0x00007f6ac64f6618 in futex_abstimed_wait_cancelable (private=0, abstime=0x7f6a777fd100, clockid=0, expected=0, futex_word=0x7f6a98006af8) at ../sysdeps/nptl/futex-internal.h:320
      #1 do_futex_wait (sem=sem@entry=0x7f6a98006af8, abstime=abstime@entry=0x7f6a777fd100, clockid=0) at sem_waitcommon.c:112
      #2 0x00007f6ac64f6743 in __new_sem_wait_slow (sem=0x7f6a98006af8, abstime=0x7f6a777fd100, clockid=0) at sem_waitcommon.c:184
      #3 0x00007f6ac732f299 in Semaphore::wait (this=0x7f6a98006af8, timeout=1000) at /hpcc-dev/HPCC-Platform/system/jlib/jsem.cpp:88
      #4 0x00007f6ac73a37a0 in CLinuxPipeProcess::cStdErrorBufferThread::run (this=0x7f6a98006a30) at /hpcc-dev/HPCC-Platform/system/jlib/jthread.cpp:1863
      #5 0x00007f6ac7399d79 in Thread::begin (this=0x7f6a98006a30) at /hpcc-dev/HPCC-Platform/system/jlib/jthread.cpp:291
      #6 0x00007f6ac73994bc in Thread::_threadmain (v=0x7f6a98006a30) at /hpcc-dev/HPCC-Platform/system/jlib/jthread.cpp:137
      #7 0x00007f6ac64ec609 in start_thread (arg=<optimized out>) at pthread_create.c:477
      #8 0x00007f6ac6413103 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

      Thread 10 (Thread 0x7f6a77fff700 (LWP 30826)):
      #0 0x00007f6ac63d6c6f in _GI__wait4 (pid=30827, stat_loc=0x7f6a77ffe4ac, options=0, usage=0x0) at ../sysdeps/unix/sysv/linux/wait4.c:27
      #1 0x00007f6ac739db35 in dowaitpid (pid=30827, mode=0) at /hpcc-dev/HPCC-Platform/system/jlib/jthread.cpp:1795
      #2 0x00007f6ac73a5a50 in CLinuxPipeProcess::run (this=0x7f6abc0301d0) at /hpcc-dev/HPCC-Platform/system/jlib/jthread.cpp:2113
      #3 0x00007f6ac73a34d9 in CLinuxPipeProcess::cForkThread::run (this=0x7f6a98002730) at /hpcc-dev/HPCC-Platform/system/jlib/jthread.cpp:1839
      #4 0x00007f6ac7399d79 in Thread::begin (this=0x7f6a98002730) at /hpcc-dev/HPCC-Platform/system/jlib/jthread.cpp:291
      #5 0x00007f6ac73994bc in Thread::_threadmain (v=0x7f6a98002730) at /hpcc-dev/HPCC-Platform/system/jlib/jthread.cpp:137
      #6 0x00007f6ac64ec609 in start_thread (arg=<optimized out>) at pthread_create.c:477
      #7 0x00007f6ac6413103 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

      Thread 9 (Thread 0x7f6a927fc700 (LWP 1058)):
      #0 futex_abstimed_wait_cancelable (private=0, abstime=0x0, clockid=0, expected=0, futex_word=0x55fd9b6af390) at ../sysdeps/nptl/futex-internal.h:320
      #1 do_futex_wait (sem=sem@entry=0x55fd9b6af390, abstime=0x0, clockid=0) at sem_waitcommon.c:112
      #2 0x00007f6ac64f64e8 in __new_sem_wait_slow (sem=0x55fd9b6af390, abstime=0x0, clockid=0) at sem_waitcommon.c:184
      #3 0x00007f6ac732f1a6 in Semaphore::wait (this=0x55fd9b6af390) at /hpcc-dev/HPCC-Platform/system/jlib/jsem.cpp:64
      #4 0x00007f6ac739f2bc in CPooledThreadWrapper::run (this=0x55fd9b6af2d0) at /hpcc-dev/HPCC-Platform/system/jlib/jthread.cpp:919
      #5 0x00007f6ac7399d79 in Thread::begin (this=0x55fd9b6af2d0) at /hpcc-dev/HPCC-Platform/system/jlib/jthread.cpp:291
      #6 0x00007f6ac73994bc in Thread::_threadmain (v=0x55fd9b6af2d0) at /hpcc-dev/HPCC-Platform/system/jlib/jthread.cpp:137
      #7 0x00007f6ac64ec609 in start_thread (arg=<optimized out>) at pthread_create.c:477
      #8 0x00007f6ac6413103 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

      Thread 8 (Thread 0x7f6a92ffd700 (LWP 1051)):
      #0 __libc_read (nbytes=1024, buf=0x7f6a92ffbe60, fd=12) at ../sysdeps/unix/sysv/linux/read.c:26
      #1 __libc_read (fd=12, buf=0x7f6a92ffbe60, nbytes=1024) at ../sysdeps/unix/sysv/linux/read.c:24
      #2 0x00007f6ac73a66a0 in CLinuxPipeProcess::read (this=0x7f6abc030a50, sz=1024, buf=0x7f6a92ffbe60) at /hpcc-dev/HPCC-Platform/system/jlib/jthread.cpp:2193
      #3 0x00007f6ac73bea33 in runExternalCommand (output=..., error=..., cmd=0x7f6a78008ba0 "kubectl wait --for=condition=complete --timeout=10h job/eclccserver-w20200428-114740", input=0x0) at /hpcc-dev/HPCC-Platform/system/jlib/jutil.cpp:1855
      #4 0x00007f6ac79bf92a in waitK8sJob (componentName=0x55fd9a6936dc "eclccserver", job=0x7f6abc013c60 "W20200428-114740", condition=0x7f6ac7a48333 "condition=complete") at /hpcc-dev/HPCC-Platform/common/workunit/workunit.cpp:13381
      #5 0x00007f6ac79c0148 in runK8sJob (componentName=0x55fd9a6936dc "eclccserver", wuid=0x7f6abc013c60 "W20200428-114740", job=0x7f6abc013c60 "W20200428-114740", del=true, extraParams=Python Exception <class 'AttributeError'> 'NoneType' object has no attribute 'pointer':
      empty std::__cxx11::list) at /hpcc-dev/HPCC-Platform/common/workunit/workunit.cpp:13425
      #6 0x000055fd9a68a579 in EclccCompileThread::threadmain (this=0x55fd9b6adb40) at /hpcc-dev/HPCC-Platform/ecl/eclccserver/eclccserver.cpp:467
      #7 0x00007f6ac739f51d in CPooledThreadWrapper::run (this=0x55fd9b6aea50) at /hpcc-dev/HPCC-Platform/system/jlib/jthread.cpp:932
      #8 0x00007f6ac7399d79 in Thread::begin (this=0x55fd9b6aea50) at /hpcc-dev/HPCC-Platform/system/jlib/jthread.cpp:291
      #9 0x00007f6ac73994bc in Thread::_threadmain (v=0x55fd9b6aea50) at /hpcc-dev/HPCC-Platform/system/jlib/jthread.cpp:137
      #10 0x00007f6ac64ec609 in start_thread (arg=<optimized out>) at pthread_create.c:477
      #11 0x00007f6ac6413103 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

      Thread 7 (Thread 0x7f6aa96f9700 (LWP 151)):
      #0 __libc_read (nbytes=1024, buf=0x7f6aa96f7e60, fd=11) at ../sysdeps/unix/sysv/linux/read.c:26
      #1 __libc_read (fd=11, buf=0x7f6aa96f7e60, nbytes=1024) at ../sysdeps/unix/sysv/linux/read.c:24
      #2 0x00007f6ac73a66a0 in CLinuxPipeProcess::read (this=0x7f6abc0301d0, sz=1024, buf=0x7f6aa96f7e60) at /hpcc-dev/HPCC-Platform/system/jlib/jthread.cpp:2193
      #3 0x00007f6ac73bea33 in runExternalCommand (output=..., error=..., cmd=0x7f6a98000c10 "kubectl wait --for=condition=complete --timeout=10h job/eclccserver-w20200428-114738", input=0x0) at /hpcc-dev/HPCC-Platform/system/jlib/jutil.cpp:1855
      #4 0x00007f6ac79bf92a in waitK8sJob (componentName=0x55fd9a6936dc "eclccserver", job=0x7f6abc06b6e0 "W20200428-114738", condition=0x7f6ac7a48333 "condition=complete") at /hpcc-dev/HPCC-Platform/common/workunit/workunit.cpp:13381
      #5 0x00007f6ac79c0148 in runK8sJob (componentName=0x55fd9a6936dc "eclccserver", wuid=0x7f6abc06b6e0 "W20200428-114738", job=0x7f6abc06b6e0 "W20200428-114738", del=true, extraParams=Python Exception <class 'AttributeError'> 'NoneType' object has no attribute 'pointer':
      empty std::__cxx11::list) at /hpcc-dev/HPCC-Platform/common/workunit/workunit.cpp:13425
      #6 0x000055fd9a68a579 in EclccCompileThread::threadmain (this=0x55fd9b6b5470) at /hpcc-dev/HPCC-Platform/ecl/eclccserver/eclccserver.cpp:467
      #7 0x00007f6ac739f51d in CPooledThreadWrapper::run (this=0x55fd9b6acbd0) at /hpcc-dev/HPCC-Platform/system/jlib/jthread.cpp:932
      #8 0x00007f6ac7399d79 in Thread::begin (this=0x55fd9b6acbd0) at /hpcc-dev/HPCC-Platform/system/jlib/jthread.cpp:291
      #9 0x00007f6ac73994bc in Thread::_threadmain (v=0x55fd9b6acbd0) at /hpcc-dev/HPCC-Platform/system/jlib/jthread.cpp:137
      #10 0x00007f6ac64ec609 in start_thread (arg=<optimized out>) at pthread_create.c:477
      #11 0x00007f6ac6413103 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

      Thread 6 (Thread 0x7f6aaaffd700 (LWP 31)):
      #0 futex_abstimed_wait_cancelable (private=0, abstime=0x0, clockid=0, expected=0, futex_word=0x55fd9b6a6560) at ../sysdeps/nptl/futex-internal.h:320
      #1 do_futex_wait (sem=sem@entry=0x55fd9b6a6560, abstime=0x0, clockid=0) at sem_waitcommon.c:112
      #2 0x00007f6ac64f64e8 in __new_sem_wait_slow (sem=0x55fd9b6a6560, abstime=0x0, clockid=0) at sem_waitcommon.c:184
      #3 0x00007f6ac732f1a6 in Semaphore::wait (this=0x55fd9b6a6560) at /hpcc-dev/HPCC-Platform/system/jlib/jsem.cpp:64
      #4 0x00007f6ac739f2bc in CPooledThreadWrapper::run (this=0x55fd9b6a64a0) at /hpcc-dev/HPCC-Platform/system/jlib/jthread.cpp:919
      #5 0x00007f6ac7399d79 in Thread::begin (this=0x55fd9b6a64a0) at /hpcc-dev/HPCC-Platform/system/jlib/jthread.cpp:291
      #6 0x00007f6ac73994bc in Thread::_threadmain (v=0x55fd9b6a64a0) at /hpcc-dev/HPCC-Platform/system/jlib/jthread.cpp:137
      #7 0x00007f6ac64ec609 in start_thread (arg=<optimized out>) at pthread_create.c:477
      #8 0x00007f6ac6413103 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

      Thread 5 (Thread 0x7f6ac2093700 (LWP 25)):
      #0 0x00007f6ac64f6618 in futex_abstimed_wait_cancelable (private=0, abstime=0x7f6ac20924f0, clockid=0, expected=0, futex_word=0x55fd9b6a6878) at ../sysdeps/nptl/futex-internal.h:320
      #1 do_futex_wait (sem=sem@entry=0x55fd9b6a6878, abstime=abstime@entry=0x7f6ac20924f0, clockid=0) at sem_waitcommon.c:112
      #2 0x00007f6ac64f6743 in __new_sem_wait_slow (sem=0x55fd9b6a6878, abstime=0x7f6ac20924f0, clockid=0) at sem_waitcommon.c:184
      #3 0x00007f6ac732f299 in Semaphore::wait (this=0x55fd9b6a6878, timeout=60000) at /hpcc-dev/HPCC-Platform/system/jlib/jsem.cpp:88
      #4 0x00007f6ac72037d5 in CMemoryUsageReporter::run (this=0x55fd9b6a67c0) at /hpcc-dev/HPCC-Platform/system/jlib/jdebug.cpp:2865
      #5 0x00007f6ac7399d79 in Thread::begin (this=0x55fd9b6a67c0) at /hpcc-dev/HPCC-Platform/system/jlib/jthread.cpp:291
      #6 0x00007f6ac73994bc in Thread::_threadmain (v=0x55fd9b6a67c0) at /hpcc-dev/HPCC-Platform/system/jlib/jthread.cpp:137
      #7 0x00007f6ac64ec609 in start_thread (arg=<optimized out>) at pthread_create.c:477
      #8 0x00007f6ac6413103 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

      Thread 4 (Thread 0x7f6ac2894700 (LWP 21)):
      #0 0x00007f6ac64f6618 in futex_abstimed_wait_cancelable (private=0, abstime=0x7f6ac2893490, clockid=0, expected=0, futex_word=0x7f6abc000fe0) at ../sysdeps/nptl/futex-internal.h:320
      #1 do_futex_wait (sem=sem@entry=0x7f6abc000fe0, abstime=abstime@entry=0x7f6ac2893490, clockid=0) at sem_waitcommon.c:112
      #2 0x00007f6ac64f6743 in __new_sem_wait_slow (sem=0x7f6abc000fe0, abstime=0x7f6ac2893490, clockid=0) at sem_waitcommon.c:184
      #3 0x00007f6ac732f299 in Semaphore::wait (this=0x7f6abc000fe0, timeout=1000) at /hpcc-dev/HPCC-Platform/system/jlib/jsem.cpp:88
      #4 0x00007f6ac73557b0 in CSocketEpollThread::run (this=0x7f6abc000ef0) at /hpcc-dev/HPCC-Platform/system/jlib/jsocket.cpp:5108
      #5 0x00007f6ac7399d79 in Thread::begin (this=0x7f6abc000ef0) at /hpcc-dev/HPCC-Platform/system/jlib/jthread.cpp:291
      #6 0x00007f6ac73994bc in Thread::_threadmain (v=0x7f6abc000ef0) at /hpcc-dev/HPCC-Platform/system/jlib/jthread.cpp:137
      #7 0x00007f6ac64ec609 in start_thread (arg=<optimized out>) at pthread_create.c:477
      #8 0x00007f6ac6413103 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

      Thread 3 (Thread 0x7f6ac30ce700 (LWP 18)):
      #0 0x00007f6ac64f749f in __libc_accept (fd=3, addr=..., len=0x7f6ac30cd334) at ../sysdeps/unix/sysv/linux/accept.c:26
      #1 0x00007f6ac7339550 in CSocket::accept (this=0x55fd9b637fb0, allowcancel=true, peerEp=0x7f6ac30cd460) at /hpcc-dev/HPCC-Platform/system/jlib/jsocket.cpp:1045
      #2 0x00007f6ac7522814 in CMPConnectThread::run (this=0x55fd9b6a5d90) at /hpcc-dev/HPCC-Platform/system/mp/mpcomm.cpp:2063
      #3 0x00007f6ac7399d79 in Thread::begin (this=0x55fd9b6a5d90) at /hpcc-dev/HPCC-Platform/system/jlib/jthread.cpp:291
      #4 0x00007f6ac73994bc in Thread::_threadmain (v=0x55fd9b6a5d90) at /hpcc-dev/HPCC-Platform/system/jlib/jthread.cpp:137
      #5 0x00007f6ac64ec609 in start_thread (arg=<optimized out>) at pthread_create.c:477
      #6 0x00007f6ac6413103 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

      Thread 2 (Thread 0x7f6ac38cf700 (LWP 17)):
      #0 futex_abstimed_wait_cancelable (private=0, abstime=0x0, clockid=0, expected=0, futex_word=0x55fd9b6a60b0) at ../sysdeps/nptl/futex-internal.h:320
      #1 do_futex_wait (sem=sem@entry=0x55fd9b6a60b0, abstime=0x0, clockid=0) at sem_waitcommon.c:112
      #2 0x00007f6ac64f64e8 in __new_sem_wait_slow (sem=0x55fd9b6a60b0, abstime=0x0, clockid=0) at sem_waitcommon.c:184
      #3 0x00007f6ac732f222 in Semaphore::wait (this=0x55fd9b6a60b0, timeout=4294967295) at /hpcc-dev/HPCC-Platform/system/jlib/jsem.cpp:75
      #4 0x00007f6ac753cfb4 in SimpleInterThreadQueueOf<INode, false>::qwait (this=0x55fd9b6a6038, sem=..., waiting=@0x55fd9b6a60d0: 1, timeout=4294967295, start=@0x7f6ac38ce4b4: 0) at /hpcc-dev/HPCC-Platform/system/mp/./../jlib/jqueue.tpp:318
      #5 0x00007f6ac75391f7 in SimpleInterThreadQueueOf<INode, false>::dequeue (this=0x55fd9b6a6038, timeout=4294967295) at /hpcc-dev/HPCC-Platform/system/mp/./../jlib/jqueue.tpp:426
      #6 0x00007f6ac752c295 in CMPNotifyClosedThread::run (this=0x55fd9b6a5f40) at /hpcc-dev/HPCC-Platform/system/mp/mpcomm.cpp:639
      #7 0x00007f6ac7399d79 in Thread::begin (this=0x55fd9b6a5f40) at /hpcc-dev/HPCC-Platform/system/jlib/jthread.cpp:291
      #8 0x00007f6ac73994bc in Thread::_threadmain (v=0x55fd9b6a5f40) at /hpcc-dev/HPCC-Platform/system/jlib/jthread.cpp:137
      #9 0x00007f6ac64ec609 in start_thread (arg=<optimized out>) at pthread_create.c:477
      #10 0x00007f6ac6413103 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

      Thread 1 (Thread 0x7f6ac38d6000 (LWP 1)):
      #0 futex_abstimed_wait_cancelable (private=0, abstime=0x0, clockid=0, expected=0, futex_word=0x7fff6b2f1ff0) at ../sysdeps/nptl/futex-internal.h:320
      #1 do_futex_wait (sem=sem@entry=0x7fff6b2f1ff0, abstime=0x0, clockid=0) at sem_waitcommon.c:112
      #2 0x00007f6ac64f64e8 in __new_sem_wait_slow (sem=0x7fff6b2f1ff0, abstime=0x0, clockid=0) at sem_waitcommon.c:184
      #3 0x00007f6ac732f222 in Semaphore::wait (this=0x7fff6b2f1ff0, timeout=4294967295) at /hpcc-dev/HPCC-Platform/system/jlib/jsem.cpp:75
      #4 0x00007f6ac73a203d in CThreadPool::joinWait (this=0x55fd9b6a9cb0, t=..., timeout=4294967295) at /hpcc-dev/HPCC-Platform/system/jlib/jthread.cpp:1224
      #5 0x00007f6ac73a2420 in CThreadPool::joinAll (this=0x55fd9b6a9cb0, del=false, timeout=4294967295) at /hpcc-dev/HPCC-Platform/system/jlib/jthread.cpp:1254
      #6 0x000055fd9a68c3fe in EclccServer::~EclccServer (this=0x7fff6b2f21e0, __in_chrg=<optimized out>) at /hpcc-dev/HPCC-Platform/ecl/eclccserver/eclccserver.cpp:663
      #7 0x000055fd9a683140 in main (argc=3, argv=0x7fff6b2f2398) at /hpcc-dev/HPCC-Platform/ecl/eclccserver/eclccserver.cpp:899
      (gdb)
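
      The trace shows the compile threads parked in a single blocking wait with a 10h timeout, so nothing can interrupt them when the server is asked to stop. The fix that went into 7.10.0 is not shown in this issue; the block below is only a minimal sketch of one way to avoid this kind of stall, by polling in short slices and checking an abort flag between polls. The names waitForJob, WaitResult, isComplete and abortRequested are hypothetical and are not part of HPCC-Platform. In real use, isComplete might wrap a short-timeout status check on the job, which is also an assumption.

```cpp
#include <algorithm>
#include <atomic>
#include <chrono>
#include <functional>
#include <thread>

// Possible outcomes of waiting for a job (hypothetical type, for illustration).
enum class WaitResult { Completed, Aborted, TimedOut };

// Hypothetical helper: instead of one long blocking wait, poll `isComplete`
// in short slices and check `abortRequested` before every poll, so that a
// shutdown request is noticed within roughly one slice.
WaitResult waitForJob(const std::function<bool()> &isComplete,
                      const std::atomic<bool> &abortRequested,
                      std::chrono::milliseconds totalTimeout,
                      std::chrono::milliseconds slice)
{
    using Clock = std::chrono::steady_clock;
    const auto deadline = Clock::now() + totalTimeout;
    for (;;)
    {
        // Shutdown takes priority over everything else.
        if (abortRequested.load())
            return WaitResult::Aborted;
        if (isComplete())
            return WaitResult::Completed;
        const auto now = Clock::now();
        if (now >= deadline)
            return WaitResult::TimedOut;
        // Sleep for one slice, but never past the overall deadline.
        const auto remaining =
            std::chrono::duration_cast<std::chrono::milliseconds>(deadline - now);
        std::this_thread::sleep_for(std::min(slice, remaining));
    }
}
```

      With a helper like this, the shutdown path would set the abort flag before joining the thread pool, so each compile thread returns within about one slice rather than waiting out the full timeout.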

            People

            • Assignee: Richard Chapman (richardkchapman)
            • Reporter: Richard Chapman (richardkchapman)