HPCC / HPCC-29557

DFU-copied file sometimes causes long-running reads on Thor that hang and must be aborted


Details

    Description

      When reading a file that was DFU-copied from another Thor cluster at the same build level, the read sometimes runs indefinitely. An example workunit is W20230519-135657. Here is the running subgraph about 15 minutes into the job, before it was aborted:

      Properties continued:

       

      Note the high SkewMaxDiskReadIO on node 96, which is also the node reported by NodeMinDiskRowsRead (the node that has read the fewest rows).

      Background: the file was copied by W20230519-135155, which simply calls STD.File.DfuPlusExec to launch DFU workunit D20230519-135156 with these options:

      <Options expireDays="30" maxConnections="200" noCommon="1" nosplit="1" overwrite="1" preserveCompression="1" replicate="1" transferBufferSize="10000000"/>

      The hanging workunit is just a straight read/write:

      b := dataset('~delivery::extractcache::hc::gcid8224024::fsidmq001::hcp::npi::w20230518-202439_b',recordof('~delivery::extractcache::hc::gcid8224024::fsidmq001::hcp::npi::w20230518-202439_b',lookup),thor);
      output(b,,'~prjl::npi::W20230518-202439_b',compressed,expire(3),overwrite,thor);

      Two other copies of the same data read correctly (listed as STD.File.DfuPlusExec workunit / DFU workunit / read-test workunit):

      W20230519-135008/D20230519-135010/W20230519-135648

      W20230519-135502/D20230519-135504/W20230519-135701

      These files appear to be identical:

       

      Here are the subgraph properties for W20230519-135648:

      A second run against the "b" file (W20230519-141817) shows that the problem is reproducible: the read stalls at the same record.

      Let us know if you need more info (archives, etc.).  We are at a loss to explain what is wrong with the copy/file/read.


            People

              jakesmith Jake Smith
              joecella Joe Cella
              Votes: 0
              Watchers: 5