HPCC-9828

Package watcher deadlocking in notification/unsubscribe process

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 4.0
    • Fix Version/s: 4.0
    • Component/s: Roxie
    • Labels:

      Description

      The Roxie package watcher mechanism seems to deadlock on reload, sometimes and possibly always.
      It appears that a CDaliPackageWatcher::notify call can cause other CDaliPackageWatchers to unsubscribe. If two threads do this at the same time, each can end up holding the mutex the other is waiting for, and they deadlock.
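
      The pattern described above can be sketched in isolation. The following is a minimal illustration, not the Roxie code and not necessarily the change made for this issue: the Watcher class, its members, and runConcurrentNotifies are hypothetical names. It assumes each watcher guards its state with its own mutex, and it shows one common way to avoid the lock-order inversion, which is to copy the callback under the lock and release the lock before calling out.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical stand-in for a package watcher; not the HPCC class.
class Watcher
{
public:
    void setCallback(std::function<void()> cb)
    {
        std::lock_guard<std::mutex> guard(lock);
        callback = std::move(cb);
    }

    // Takes this watcher's own mutex, like unsubscribe() in the stacks.
    void unsubscribe()
    {
        std::lock_guard<std::mutex> guard(lock);
        ++unsubscribes;
    }

    // Deadlock-prone variant (not used here): holding `lock` for the whole
    // call, including cb(), lets thread 1 hold a.lock while waiting for
    // b.lock, and thread 2 hold b.lock while waiting for a.lock.
    void notify()
    {
        std::function<void()> cb;
        {
            std::lock_guard<std::mutex> guard(lock);
            cb = callback; // copy the target while protected
        }
        if (cb)
            cb(); // called with no watcher mutex held
    }

    int unsubscribeCalls()
    {
        std::lock_guard<std::mutex> guard(lock);
        return unsubscribes;
    }

private:
    std::mutex lock;
    std::function<void()> callback;
    int unsubscribes = 0;
};

// Two threads notify two watchers whose callbacks unsubscribe each other.
// Returns the total number of unsubscribe() calls made.
int runConcurrentNotifies(int iterations)
{
    Watcher a, b;
    a.setCallback([&b] { b.unsubscribe(); });
    b.setCallback([&a] { a.unsubscribe(); });

    std::thread t1([&a, iterations] { for (int i = 0; i < iterations; ++i) a.notify(); });
    std::thread t2([&b, iterations] { for (int i = 0; i < iterations; ++i) b.notify(); });
    t1.join();
    t2.join();
    return a.unsubscribeCalls() + b.unsubscribeCalls();
}
```

      Calling runConcurrentNotifies(1000) runs both notify loops to completion because neither thread ever holds one watcher's mutex while requesting the other's. The trade-off is that the callback may run after the watcher's state has changed, so real code would need to tolerate a notification arriving for a watcher that has just been unsubscribed.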

      Here are some stacks:

      > Thread 57 (Thread 0x7f2a40dfa700 (LWP 28455)):
      > #0 0x0000003c8ee0e054 in __lll_lock_wait () from /lib64/libpthread.so.0
      > #1 0x0000003c8ee093a3 in _L_lock_892 () from /lib64/libpthread.so.0
      > #2 0x0000003c8ee09287 in pthread_mutex_lock () from /lib64/libpthread.so.0
      > #3 0x00007f2aebad98e2 in CDaliPackageWatcher::unsubscribe() () from /opt/HPCCSystems/lib/libccd.so
      > #4 0x00007f2aebbb97ca in CRoxiePackageSetWatcher::~CRoxiePackageSetWatcher() () from /opt/HPCCSystems/lib/libccd.so
      > #5 0x00007f2aebac437a in CInterface::Release() const () from /opt/HPCCSystems/lib/libccd.so
      > #6 0x00007f2aebbb6450 in CRoxiePackageSetManager::reload() () from /opt/HPCCSystems/lib/libccd.so
      > #7 0x00007f2aebbb6729 in CRoxiePackageSetManager::notify(long long, char const*, SDSNotifyFlags, unsigned int, void const*) () from /opt/HPCCSystems/lib/libccd.so
      > #8 0x00007f2aebad91f7 in CDaliPackageWatcher::notify(long long, char const*, SDSNotifyFlags, unsigned int, void const*) () from /opt/HPCCSystems/lib/libccd.so
      > #9 0x00007f2ae7bb5e6c in CSDSSubscriberProxy::notify(MemoryBuffer&) () from /opt/HPCCSystems/lib/libdalibase.so
      > #10 0x00007f2ae7c3f288 in CDaliPublisherClient::processMessage(CMessageBuffer&) () from /opt/HPCCSystems/lib/libdalibase.so
      > #11 0x00007f2ae7c41505 in CMessageHandler<CDaliPublisherClient>::Chandler::main() () from /opt/HPCCSystems/lib/libdalibase.so
      > #12 0x00007f2aeb6338aa in CPooledThreadWrapper::run() () from /opt/HPCCSystems/lib/libjlib.so

      > Thread 58 (Thread 0x7f2a417fb700 (LWP 28454)):
      > #0 0x0000003c8ee0e054 in __lll_lock_wait () from /lib64/libpthread.so.0
      > #1 0x0000003c8ee093a3 in _L_lock_892 () from /lib64/libpthread.so.0
      > #2 0x0000003c8ee09287 in pthread_mutex_lock () from /lib64/libpthread.so.0
      > #3 0x00007f2aebad98e2 in CDaliPackageWatcher::unsubscribe() () from /opt/HPCCSystems/lib/libccd.so
      > #4 0x00007f2aebbb97ca in CRoxiePackageSetWatcher::~CRoxiePackageSetWatcher() () from /opt/HPCCSystems/lib/libccd.so
      > #5 0x00007f2aebac437a in CInterface::Release() const () from /opt/HPCCSystems/lib/libccd.so
      > #6 0x00007f2aebbb6450 in CRoxiePackageSetManager::reload() () from /opt/HPCCSystems/lib/libccd.so
      > #7 0x00007f2aebbb6729 in CRoxiePackageSetManager::notify(long long, char const*, SDSNotifyFlags, unsigned int, void const*) () from /opt/HPCCSystems/lib/libccd.so
      > #8 0x00007f2aebad91f7 in CDaliPackageWatcher::notify(long long, char const*, SDSNotifyFlags, unsigned int, void const*) () from /opt/HPCCSystems/lib/libccd.so
      > #9 0x00007f2ae7bb5e6c in CSDSSubscriberProxy::notify(MemoryBuffer&) () from /opt/HPCCSystems/lib/libdalibase.so
      > #10 0x00007f2ae7c3f288 in CDaliPublisherClient::processMessage(CMessageBuffer&) () from /opt/HPCCSystems/lib/libdalibase.so
      > #11 0x00007f2ae7c41505 in CMessageHandler<CDaliPublisherClient>::Chandler::main() () from /opt/HPCCSystems/lib/libdalibase.so
      > #12 0x00007f2aeb6338aa in CPooledThreadWrapper::run() () from /opt/HPCCSystems/lib/libjlib.so

      These two threads were blocking many other 'notify' threads, e.g.:

      > Thread 52 (Thread 0x7f2a1d7fb700 (LWP 28762)):
      > #0 0x0000003c8ee0e054 in __lll_lock_wait () from /lib64/libpthread.so.0
      > #1 0x0000003c8ee093a3 in _L_lock_892 () from /lib64/libpthread.so.0
      > #2 0x0000003c8ee09287 in pthread_mutex_lock () from /lib64/libpthread.so.0
      > #3 0x00007f2aebad91d7 in CDaliPackageWatcher::notify(long long, char const*, SDSNotifyFlags, unsigned int, void const*) () from /opt/HPCCSystems/lib/libccd.so
      > #4 0x00007f2ae7bb5e6c in CSDSSubscriberProxy::notify(MemoryBuffer&) () from /opt/HPCCSystems/lib/libdalibase.so
      > #5 0x00007f2ae7c3f288 in CDaliPublisherClient::processMessage(CMessageBuffer&) () from /opt/HPCCSystems/lib/libdalibase.so
      > #6 0x00007f2ae7c41505 in CMessageHandler<CDaliPublisherClient>::Chandler::main() () from /opt/HPCCSystems/lib/libdalibase.so
      > #7 0x00007f2aeb6338aa in CPooledThreadWrapper::run() () from /opt/HPCCSystems/lib/libjlib.so
      > #8 0x00007f2aeb6324bf in Thread::begin() () from /opt/HPCCSystems/lib/libjlib.so
      > #9 0x00007f2aeb630f9c in Thread::_threadmain(void*) () from /opt/HPCCSystems/lib/libjlib.so
      > #10 0x0000003c8ee07851 in start_thread () from /lib64/libpthread.so.0
      > #11 0x0000003c8e6e811d in clone () from /lib64/libc.so.6


            People

            • Assignee:
              jakesmith Jake Smith
              Reporter:
              jakesmith Jake Smith