The Roxie package watcher mechanism seems to deadlock on reload, sometimes and possibly always.
It appears that a CDaliPackageWatcher::notify call can cause other CDaliPackageWatcher instances to unsubscribe. If two threads do this at the same time, each can end up holding the mutex the other is waiting for, and they deadlock.
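The suspected lock pattern can be sketched in isolation. This is a minimal illustration only, not the actual ccd code: `Watcher`, `waitFor`, `notifyThenUnsubscribe` and `crossUnsubscribeDeadlocks` are hypothetical names, and the sketch assumes that each watcher holds its own mutex for the duration of notify and that unsubscribe takes the same mutex. It uses `try_lock` where the real code would call a blocking lock, so that it reports the cycle instead of hanging.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical stand-in for CDaliPackageWatcher: one mutex per watcher.
struct Watcher {
    std::mutex crit;
};

// Spin until `counter` reaches `target` (simple two-thread rendezvous).
static void waitFor(const std::atomic<int> &counter, int target) {
    while (counter.load() < target)
        std::this_thread::yield();
}

// Models notify() on `own` that ends up calling unsubscribe() on `other`.
// Returns true if the other watcher's mutex could be acquired.
static bool notifyThenUnsubscribe(Watcher &own, Watcher &other,
                                  std::atomic<int> &entered,
                                  std::atomic<int> &attempted) {
    assert(&own != &other);
    std::lock_guard<std::mutex> inNotify(own.crit); // assumed: held for the whole callback
    ++entered;
    waitFor(entered, 2);               // both threads are now inside "notify"
    bool got = other.crit.try_lock();  // the real code would block here
    if (got)
        other.crit.unlock();
    ++attempted;
    waitFor(attempted, 2);             // keep our own lock until the peer has tried too
    return got;
}

// Returns true when neither thread could take the other's mutex, i.e. when
// blocking locks in the same positions would have waited forever.
bool crossUnsubscribeDeadlocks() {
    Watcher a, b;
    std::atomic<int> entered{0}, attempted{0};
    bool gotB = true, gotA = true;
    std::thread t1([&] { gotB = notifyThenUnsubscribe(a, b, entered, attempted); });
    std::thread t2([&] { gotA = notifyThenUnsubscribe(b, a, entered, attempted); });
    t1.join();
    t2.join();
    return !gotA && !gotB;
}
```

Calling `crossUnsubscribeDeadlocks()` returns true in this sketch: each thread holds its own watcher's mutex while asking for the other's, which matches the shape of the two stacks that are stuck in unsubscribe underneath notify.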
Here are some stacks:
> Thread 57 (Thread 0x7f2a40dfa700 (LWP 28455)):
> #0 0x0000003c8ee0e054 in __lll_lock_wait () from /lib64/libpthread.so.0
> #1 0x0000003c8ee093a3 in _L_lock_892 () from /lib64/libpthread.so.0
> #2 0x0000003c8ee09287 in pthread_mutex_lock () from /lib64/libpthread.so.0
> #3 0x00007f2aebad98e2 in CDaliPackageWatcher::unsubscribe() () from /opt/HPCCSystems/lib/libccd.so
> #4 0x00007f2aebbb97ca in CRoxiePackageSetWatcher::~CRoxiePackageSetWatcher() () from /opt/HPCCSystems/lib/libccd.so
> #5 0x00007f2aebac437a in CInterface::Release() const () from /opt/HPCCSystems/lib/libccd.so
> #6 0x00007f2aebbb6450 in CRoxiePackageSetManager::reload() () from /opt/HPCCSystems/lib/libccd.so
> #7 0x00007f2aebbb6729 in CRoxiePackageSetManager::notify(long long, char const*, SDSNotifyFlags, unsigned int, void const*) () from /opt/HPCCSystems/lib/libccd.so
> #8 0x00007f2aebad91f7 in CDaliPackageWatcher::notify(long long, char const*, SDSNotifyFlags, unsigned int, void const*) () from /opt/HPCCSystems/lib/libccd.so
> #9 0x00007f2ae7bb5e6c in CSDSSubscriberProxy::notify(MemoryBuffer&) () from /opt/HPCCSystems/lib/libdalibase.so
> #10 0x00007f2ae7c3f288 in CDaliPublisherClient::processMessage(CMessageBuffer&) ()
> from /opt/HPCCSystems/lib/libdalibase.so
> #11 0x00007f2ae7c41505 in CMessageHandler<CDaliPublisherClient>::Chandler::main() ()
> from /opt/HPCCSystems/lib/libdalibase.so
> #12 0x00007f2aeb6338aa in CPooledThreadWrapper::run() () from /opt/HPCCSystems/lib/libjlib.so
> Thread 58 (Thread 0x7f2a417fb700 (LWP 28454)):
> #0 0x0000003c8ee0e054 in __lll_lock_wait () from /lib64/libpthread.so.0
> #1 0x0000003c8ee093a3 in _L_lock_892 () from /lib64/libpthread.so.0
> #2 0x0000003c8ee09287 in pthread_mutex_lock () from /lib64/libpthread.so.0
> #3 0x00007f2aebad98e2 in CDaliPackageWatcher::unsubscribe() () from /opt/HPCCSystems/lib/libccd.so
> #4 0x00007f2aebbb97ca in CRoxiePackageSetWatcher::~CRoxiePackageSetWatcher() () from /opt/HPCCSystems/lib/libccd.so
> #5 0x00007f2aebac437a in CInterface::Release() const () from /opt/HPCCSystems/lib/libccd.so
> #6 0x00007f2aebbb6450 in CRoxiePackageSetManager::reload() () from /opt/HPCCSystems/lib/libccd.so
> #7 0x00007f2aebbb6729 in CRoxiePackageSetManager::notify(long long, char const*, SDSNotifyFlags, unsigned int, void const*) () from /opt/HPCCSystems/lib/libccd.so
> #8 0x00007f2aebad91f7 in CDaliPackageWatcher::notify(long long, char const*, SDSNotifyFlags, unsigned int, void const*) () from /opt/HPCCSystems/lib/libccd.so
> #9 0x00007f2ae7bb5e6c in CSDSSubscriberProxy::notify(MemoryBuffer&) () from /opt/HPCCSystems/lib/libdalibase.so
> #10 0x00007f2ae7c3f288 in CDaliPublisherClient::processMessage(CMessageBuffer&) ()
> from /opt/HPCCSystems/lib/libdalibase.so
> #11 0x00007f2ae7c41505 in CMessageHandler<CDaliPublisherClient>::Chandler::main() ()
> from /opt/HPCCSystems/lib/libdalibase.so
> #12 0x00007f2aeb6338aa in CPooledThreadWrapper::run() () from /opt/HPCCSystems/lib/libjlib.so
These two threads were blocking many other 'notify' threads, e.g.:
> Thread 52 (Thread 0x7f2a1d7fb700 (LWP 28762)):
> #0 0x0000003c8ee0e054 in __lll_lock_wait () from /lib64/libpthread.so.0
> #1 0x0000003c8ee093a3 in _L_lock_892 () from /lib64/libpthread.so.0
> #2 0x0000003c8ee09287 in pthread_mutex_lock () from /lib64/libpthread.so.0
> #3 0x00007f2aebad91d7 in CDaliPackageWatcher::notify(long long, char const*, SDSNotifyFlags, unsigned int, void const*) () from /opt/HPCCSystems/lib/libccd.so
> #4 0x00007f2ae7bb5e6c in CSDSSubscriberProxy::notify(MemoryBuffer&) () from /opt/HPCCSystems/lib/libdalibase.so
> #5 0x00007f2ae7c3f288 in CDaliPublisherClient::processMessage(CMessageBuffer&) ()
> from /opt/HPCCSystems/lib/libdalibase.so
> #6 0x00007f2ae7c41505 in CMessageHandler<CDaliPublisherClient>::Chandler::main() ()
> from /opt/HPCCSystems/lib/libdalibase.so
> #7 0x00007f2aeb6338aa in CPooledThreadWrapper::run() () from /opt/HPCCSystems/lib/libjlib.so
> #8 0x00007f2aeb6324bf in Thread::begin() () from /opt/HPCCSystems/lib/libjlib.so
> #9 0x00007f2aeb630f9c in Thread::_threadmain(void*) () from /opt/HPCCSystems/lib/libjlib.so
> #10 0x0000003c8ee07851 in start_thread () from /lib64/libpthread.so.0
> #11 0x0000003c8e6e811d in clone () from /lib64/libc.so.6