Details
Type: Bug
Status: Resolved
Priority: Minor
Resolution: Timed Out
Affects Version/s: 6.0.0
None
None
Description
Sasha dumped core (signal 11, segmentation fault) while archiving workunits on the 190 cluster.
The node in question is 10.239.190.101.
Here is the relevant excerpt from the logs:
00001E73 2016-06-14 12:43:51.866 3701 3709 "ARCHIVE: Scanning WorkUnits limit=1000"
00001E74 2016-06-14 12:43:52.385 3701 3709 "ARCHIVE count=1001 ignored=436 later=7 nulltimes=0 protected=0"
00001E75 2016-06-14 12:43:52.385 3701 3709 "ARCHIVE: WorkUnits - 1 to archive, 0 to backup"
00001E76 2016-06-14 12:43:58.006 3701 3709 "================================================"
00001E77 2016-06-14 12:43:58.006 3701 3709 "Signal: 11 Segmentation fault"
00001E78 2016-06-14 12:43:58.006 3701 3709 "Fault IP: 00007F523795C1AE"
00001E79 2016-06-14 12:43:58.006 3701 3709 "Accessing: 0000000000000000"
00001E7A 2016-06-14 12:43:58.006 3701 3709 "Registers:"
00001E7B 2016-06-14 12:43:58.006 3701 3709 "EAX:737953434350482F EBX:00007F521863AAC0 ECX:000000032BCB6980 EDX:0000000072657673 ESI:0000000000000000 EDI:0000000000000001"
00001E7C 2016-06-14 12:43:58.006 3701 3709 "CS:EIP:0033:00007F523795C1AE"
00001E7D 2016-06-14 12:43:58.006 3701 3709 " ESP:00007F5231EA4800 EBP:00007F521863AAD8"
00001E7E 2016-06-14 12:43:58.006 3701 3709 "Stack[00007F5231EA4800]: 00007F521863AAD0 364CA6B100007F52 00007F52364CA6B1 1867099000007F52 00007F5218670990 1867099000007F52 00007F5218670990 3795C45000007F52"
00001E7F 2016-06-14 12:43:58.006 3701 3709 "Stack[00007F5231EA4820]: 00007F523795C450 180B5B2000007F52 00007F52180B5B20 31EA48C000007F52 00007F5231EA48C0 180B5B3800007F52 00007F52180B5B38 1863AAD000007F52"
00001E80 2016-06-14 12:43:58.006 3701 3709 "Stack[00007F5231EA4840]: 00007F521863AAD0 3B80022600007F52 00007F523B800226 0000000000007F52 00007F5200000000 31EA490000007F52 00007F5231EA4900 31EA4AE000007F52"
00001E81 2016-06-14 12:43:58.006 3701 3709 "Stack[00007F5231EA4860]: 00007F5231EA4AE0 1809D50000007F52 00007F521809D500 31EA495800007F52 00007F5231EA4958 31EA499800007F52 00007F5231EA4998 31EA48E000007F52"
00001E82 2016-06-14 12:43:58.007 3701 3709 "Stack[00007F5231EA4880]: 00007F5231EA48E0 1809F4B000007F52 000000011809F4B0 0064244000000001 0000000000642440 1809F30000000000 00007F521809F300 31EA495800007F52"
00001E83 2016-06-14 12:43:58.007 3701 3709 "Stack[00007F5231EA48A0]: 00007F5231EA4958 31EA494000007F52 00007F5231EA4940 31EA498000007F52 00007F5231EA4980 180320E000007F52 00007F52180320E0 0000000000007F52"
00001E84 2016-06-14 12:43:58.007 3701 3709 "Stack[00007F5231EA48C0]: 0000000000000000 FFFF000000000000 16BEEF0AFFFF0000 0000000016BEEF0A 0000000000000000 0041CF0C00000000 000000000041CF0C 3B8354E000000000"
00001E85 2016-06-14 12:43:58.007 3701 3709 "Stack[00007F5231EA48E0]: 00007F523B8354E0 186727E000007F52 00007F52186727E0 94D8231900007F52 00F0A84394D82319 0122126000F0A843 0000000001221260 0064339000000000"
00001E86 2016-06-14 12:43:58.007 3701 3709 "Backtrace:"
00001E87 2016-06-14 12:43:58.007 3701 3709 " /opt/HPCCSystems/lib/libjlib.so(+0xe09e8) [0x7f52378b19e8]"
00001E88 2016-06-14 12:43:58.007 3701 3709 " /opt/HPCCSystems/lib/libjlib.so(_Z13excsighandleriP7siginfoPv+0x21c) [0x7f52378b33fc]"
00001E89 2016-06-14 12:43:58.007 3701 3709 " /lib64/libpthread.so.0(+0xf710) [0x7f52367f3710]"
00001E8A 2016-06-14 12:43:58.007 3701 3709 " /opt/HPCCSystems/lib/libjlib.so(_ZN16CWorkQueueThread4postEP14IWorkQueueItem+0x7e) [0x7f523795c1ae]"
00001E8B 2016-06-14 12:43:58.007 3701 3709 " /opt/HPCCSystems/lib/libworkunit.so(_ZN14CLocalWorkUnit16cleanupAndDeleteEbbPK11StringArray+0x646) [0x7f523b800226]"
00001E8C 2016-06-14 12:43:58.007 3701 3709 " /opt/HPCCSystems/lib/libworkunit.so(_ZN13CDaliWorkUnit16cleanupAndDeleteEbbPK11StringArray+0x10) [0x7f523b819f80]"
00001E8D 2016-06-14 12:43:58.007 3701 3709 " /opt/HPCCSystems/lib/libworkunit.so(_ZN14CLocalWorkUnit15archiveWorkUnitEPKcbbbb+0x9aa) [0x7f523b7fe3da]"
00001E8E 2016-06-14 12:43:58.007 3701 3709 " saserver() [0x4167bb]"
00001E8F 2016-06-14 12:43:58.007 3701 3709 " saserver(_ZN17CWorkUnitArchiver13cWUBranchItem7archiveEv+0x52) [0x41bbd2]"
00001E90 2016-06-14 12:43:58.007 3701 3709 " saserver(_ZN15CBranchArchiver6actionEv+0x563) [0x41b6f3]"
00001E91 2016-06-14 12:43:58.008 3701 3709 " saserver(_ZN20CSashaArchiverServer3runEv+0xab4) [0x41e004]"
00001E92 2016-06-14 12:43:58.008 3701 3709 " /opt/HPCCSystems/lib/libjlib.so(_ZN6Thread5beginEv+0x2c) [0x7f5237957abc]"
00001E93 2016-06-14 12:43:58.008 3701 3709 " /opt/HPCCSystems/lib/libjlib.so(_ZN6Thread11_threadmainEPv+0x1e) [0x7f523795945e]"
00001E94 2016-06-14 12:43:58.008 3701 3709 " /lib64/libpthread.so.0(+0x79d1) [0x7f52367eb9d1]"
00001E95 2016-06-14 12:43:58.008 3701 3709 " /lib64/libc.so.6(clone+0x6d) [0x7f52365388fd]"
00001E96 2016-06-14 12:43:58.008 3701 3709 "ThreadList:
7F52350AB700 139991053940480 3702: CMPNotifyClosedThread
7F52346AA700 139991043450624 3703: CSocketBaseThread
7F5233CA9700 139991032960768 3704: MP Connection Thread
7F52332A8700 139991022470912 3706: LogMsgParentReceiver
7F522BFFF700 139990902241024 3707: LogMsgFilterReceiver
7F52328A7700 139991011981056 3708: CMemoryUsageReporter
7F5231EA6700 139991001491200 3709: CSashaArchiverServer
7F52314A5700 139990991001344 3710: CSashaSDSCoalescingServer
7F5230AA4700 139990980511488 3711: CSashaXRefServer
7F522B5FE700 139990891751168 3712: Stopped CSashaDaFSMonitorServer
7F522ABFD700 139990881261312 3713: Stopped CSashaQMonitorServer
7F522A1FC700 139990870771456 3714: CSashaExpiryServer
7F522B5FE700 139990891751168 3761: CStopThread
7F522ABFD700 139990881261312 13026: Member of thread pool: sachaCmdPool
"
The core file can be found at /var/lib/HPCCSystems/mysasha
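Returning to the register dump in the log excerpt: the EAX value does not look like a pointer. Read as eight little-endian bytes it is printable ASCII, which may hint that CWorkQueueThread::post was working with memory that had been overwritten by string data (a freed or corrupted object, for example). This is only an observation drawn from the logged values, not a confirmed diagnosis. The sketch below decodes each register that way; the helper name `decode_register` is made up for illustration.

```python
import re

# Register line copied verbatim from the log excerpt above.
REGISTER_LINE = (
    "EAX:737953434350482F EBX:00007F521863AAC0 ECX:000000032BCB6980 "
    "EDX:0000000072657673 ESI:0000000000000000 EDI:0000000000000001"
)


def decode_register(hex_value: str) -> str:
    """Render a 64-bit value as the bytes it occupies in little-endian
    memory, replacing non-printable bytes with '.'.

    Hypothetical helper, not part of any HPCC tooling.
    """
    raw = int(hex_value, 16).to_bytes(8, "little")
    return "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in raw)


# Map register name -> hex string, then decode each one.
registers = dict(re.findall(r"([A-Z]{3}):([0-9A-F]{16})", REGISTER_LINE))
decoded = {name: decode_register(value) for name, value in registers.items()}

for name, text in decoded.items():
    print(f"{name} {registers[name]} -> {text!r}")
```

With the values from this log, EAX decodes to "/HPCCSys" and EDX begins with "sver", whereas EBX does not decode to readable text and looks like an ordinary user-space address.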
Here is the information from the core:
Program terminated with signal 11, Segmentation fault.
#0 0x00007f523795c1ae in CWorkQueueThread::post(IWorkQueueItem*) () from /opt/HPCCSystems/lib/libjlib.so
Missing separate debuginfos, use: debuginfo-install hpccsystems-platform-6.0.0-2.x86_64
(gdb) where
#0 0x00007f523795c1ae in CWorkQueueThread::post(IWorkQueueItem*) () from /opt/HPCCSystems/lib/libjlib.so
#1 0x00007f523b800226 in CLocalWorkUnit::cleanupAndDelete(bool, bool, StringArray const*) ()
from /opt/HPCCSystems/lib/libworkunit.so
#2 0x00007f523b819f80 in CDaliWorkUnit::cleanupAndDelete(bool, bool, StringArray const*) ()
from /opt/HPCCSystems/lib/libworkunit.so
#3 0x00007f523b7fe3da in CLocalWorkUnit::archiveWorkUnit(char const*, bool, bool, bool, bool) ()
from /opt/HPCCSystems/lib/libworkunit.so
#4 0x00000000004167bb in _start ()
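To cross-check the gdb output against the Sasha log, the backtrace frame whose address equals the logged "Fault IP" can be picked out mechanically. The sketch below is a minimal example, assuming the `module(symbol+offset) [address]` frame layout seen in the log; `parse_frames` and the regular expression are illustrative names, not part of any HPCC tooling, and only the first few frames are included.

```python
import re

# "Fault IP" value copied from the log excerpt.
FAULT_IP = "00007F523795C1AE"

# First frames of the logged backtrace, with the log prefix removed.
BACKTRACE = """\
/opt/HPCCSystems/lib/libjlib.so(_Z13excsighandleriP7siginfoPv+0x21c) [0x7f52378b33fc]
/lib64/libpthread.so.0(+0xf710) [0x7f52367f3710]
/opt/HPCCSystems/lib/libjlib.so(_ZN16CWorkQueueThread4postEP14IWorkQueueItem+0x7e) [0x7f523795c1ae]
/opt/HPCCSystems/lib/libworkunit.so(_ZN14CLocalWorkUnit16cleanupAndDeleteEbbPK11StringArray+0x646) [0x7f523b800226]
"""

# Assumed frame layout: module(symbol+0xoffset) [0xaddress]; symbol may be empty.
FRAME_RE = re.compile(
    r"^(?P<module>\S+?)\((?P<symbol>[^+)]*)\+(?P<offset>0x[0-9a-f]+)\)"
    r" \[(?P<addr>0x[0-9a-f]+)\]$"
)


def parse_frames(text):
    """Return one dict per backtrace line that matches the assumed layout."""
    frames = []
    for line in text.splitlines():
        m = FRAME_RE.match(line.strip())
        if m:
            frames.append({
                "module": m["module"],
                "symbol": m["symbol"] or None,  # None when no symbol was logged
                "offset": int(m["offset"], 16),
                "addr": int(m["addr"], 16),
            })
    return frames


frames = parse_frames(BACKTRACE)
fault = int(FAULT_IP, 16)

# Frames whose return address equals the faulting instruction pointer.
faulting = [f for f in frames if f["addr"] == fault]

# Start of the faulting function: frame address minus the logged offset.
symbol_start = faulting[0]["addr"] - faulting[0]["offset"]

print(faulting[0]["module"], faulting[0]["symbol"], hex(symbol_start))
```

The matching frame carries the mangled name for CWorkQueueThread::post(IWorkQueueItem*), which agrees with frame #0 in the gdb output above.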