HPCC / HPCC-21935

Sporadic keyed join crash due to race condition in lazy IO caching code

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 7.2.6
    • Component/s: Thor
    • Labels:
      None

      Description

      We've seen this a few times in OBT and smoketest: keyed_join5 crashes.
      Here is an example stack trace:

      #0  0x0000003d0aa0f867 in ?? () from /lib64/libgcc_s.so.1
      #1  0x0000003d0aa10119 in _Unwind_Backtrace () from /lib64/libgcc_s.so.1
      #2  0x0000003d092febc6 in backtrace () from /lib64/libc.so.6
      #3  0x00007fd3ecc347bc in printStackReport (startIP=49) at /mnt/disk1/home/vamosax/build/CE/platform/HPCC-Platform/system/jlib/jexcept.cpp:1524
      #4  0x00007fd3ecc34e3c in excsighandler (signum=11, info=0x7fce227a6e70, extra=0x7fce227a6d40) at /mnt/disk1/home/vamosax/build/CE/platform/HPCC-Platform/system/jlib/jexcept.cpp:1140
      #5  <signal handler called>
      #6  0x0000000000000031 in ?? ()
      #7  0x00007fd3f08a2df2 in Release<IFileIO> (this=0x7fcf2c0dfbb0, processing=..., selected=0) at /mnt/disk1/home/vamosax/build/CE/platform/HPCC-Platform/thorlcr/activities/./../../system/jlib/jscm.hpp:46
      #8  ~Shared (this=0x7fcf2c0dfbb0, processing=..., selected=0) at /mnt/disk1/home/vamosax/build/CE/platform/HPCC-Platform/thorlcr/activities/./../../system/jlib/jscm.hpp:62
      #9  ~Owned (this=0x7fcf2c0dfbb0, processing=..., selected=0) at /mnt/disk1/home/vamosax/build/CE/platform/HPCC-Platform/thorlcr/activities/./../../system/jlib/jscm.hpp:99
      #10 createPartKeyIndex (this=0x7fcf2c0dfbb0, processing=..., selected=0) at /mnt/disk1/home/vamosax/build/CE/platform/HPCC-Platform/thorlcr/activities/keyedjoin/thkeyedjoinslave.cpp:1825
      #11 createPartKeyManager (this=0x7fcf2c0dfbb0, processing=..., selected=0) at /mnt/disk1/home/vamosax/build/CE/platform/HPCC-Platform/thorlcr/activities/keyedjoin/thkeyedjoinslave.cpp:1830
      #12 CKeyedJoinSlave::CKeyLookupLocalHandler::process (this=0x7fcf2c0dfbb0, processing=..., selected=0) at /mnt/disk1/home/vamosax/build/CE/platform/HPCC-Platform/thorlcr/activities/keyedjoin/thkeyedjoinslave.cpp:886
      #13 0x00007fd3f08ac87c in CKeyedJoinSlave::CLookupHandler::threadmain (this=0x7fcf2c0dfbb0) at /mnt/disk1/home/vamosax/build/CE/platform/HPCC-Platform/thorlcr/activities/keyedjoin/thkeyedjoinslave.cpp:747
      #14 0x000000000040daa0 in CThreaded::run() ()
      #15 0x00007fd3eccdadcc in Thread::begin (this=0x7fcf2c0dfc50) at /mnt/disk1/home/vamosax/build/CE/platform/HPCC-Platform/system/jlib/jthread.cpp:267
      #16 0x00007fd3eccdc32e in Thread::_threadmain (v=0x7fcf2c0dfc50) at /mnt/disk1/home/vamosax/build/CE/platform/HPCC-Platform/system/jlib/jthread.cpp:113
      

      The crash appears to originate in this piece of code (frames #7 to #10, in createPartKeyIndex):

              else
              {
                  /* NB: createKeyIndex here, will load the key immediately
                   * But that's okay, because we are only here on demand.
                   * The underlying IFileIO can later be closed by the file caching mechanism.
                   */
                  Owned<IFileIO> lazyIFileIO = queryThor().queryFileCache().lookupIFileIO(*this, indexName, filePart);
                  return createKeyIndex(filename, crc, *lazyIFileIO, false, false);
              }
      

      The crash occurs while releasing the IFileIO, when the Owned<IFileIO> goes out of scope. Possibly the object has already been destroyed by that point.
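
      This ticket does not show the actual fix. As an illustration of the general hazard only, here is a minimal sketch of a cache in which lookup and eviction run under one lock and every caller receives its own counted reference, so an eviction on another thread cannot destroy an object that is still in use. All names here (FakeFileIO, LazyFileCache, exerciseCache) are hypothetical and are not HPCC classes. std::shared_ptr stands in for the Owned<>/Release() reference counting seen in the stack.

```cpp
#include <atomic>
#include <cassert>
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for an IFileIO-like handle (not the HPCC interface).
struct FakeFileIO
{
    explicit FakeFileIO(std::string n) : name(std::move(n)) {}
    std::string name;
};

// Hypothetical lazy cache. The find-or-create step and eviction are both done
// under the same mutex, and lookup() returns a counted reference by value.
class LazyFileCache
{
public:
    std::shared_ptr<FakeFileIO> lookup(const std::string &name)
    {
        std::lock_guard<std::mutex> guard(lock);
        auto it = entries.find(name);
        if (it != entries.end())
            return it->second; // copy made under the lock: caller owns a reference
        auto io = std::make_shared<FakeFileIO>(name);
        entries.emplace(name, io);
        return io;
    }

    void evict(const std::string &name)
    {
        std::lock_guard<std::mutex> guard(lock);
        entries.erase(name); // drops only the cache's own reference
    }

private:
    std::mutex lock;
    std::map<std::string, std::shared_ptr<FakeFileIO>> entries;
};

// Runs concurrent lookups and evictions of the same entry and returns the
// number of iterations that completed across all threads.
int exerciseCache(unsigned numThreads, unsigned iterations)
{
    LazyFileCache cache;
    std::atomic<int> completed{0};
    auto worker = [&cache, &completed, iterations]()
    {
        for (unsigned i = 0; i < iterations; ++i)
        {
            std::shared_ptr<FakeFileIO> io = cache.lookup("part0");
            cache.evict("part0");          // may race with other threads' lookups
            assert(io->name == "part0");   // still valid: this thread holds a reference
            ++completed;
        }
    };
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < numThreads; ++t)
        pool.emplace_back(worker);
    for (auto &th : pool)
        th.join();
    return completed.load();
}
```

      Calling exerciseCache(4, 1000) runs four threads that repeatedly look up and evict the same entry. If lookup() handed out a raw pointer, or copied the reference after dropping the lock, the final release could touch freed memory, which would look much like frames #6 and #7 above.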

            People

            • Assignee: Jake Smith (jakesmith)
            • Reporter: Jake Smith (jakesmith)
