HPCC / HPCC-18269

Inefficient CSV processing of long lines.


    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 7.0.0
    • Component/s: EclAgent, Roxie, Thor
    • Labels:


      The read-ahead for a CSV line started with a modest minimum request size (4k), then doubled that size until a complete line could be read.
      However, the code in all the engines that reads from a stream usually reads more than the minimum size asked for. Doubling the minimum size on subsequent requests therefore returned only what had already been read, until the doubled minimum finally exceeded the amount already available and forced the stream to actually read more.

      The upshot was that the loop's requests to the stream for more data did nothing for a number of iterations, and the same partial line was re-processed many times, until the stream read-ahead was finally asked for more than it already held.

      The fix is to ensure that if an incomplete line is read, the next request asks for more than the last stream read call returned.
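
The behaviour and the fix can be illustrated with a small Python simulation. This is a sketch under stated assumptions, not the HPCC code: `BufferedStream`, `peek`, `read_line`, `simulate`, the 64 KiB block size and the 200,000-byte line are all hypothetical names and values chosen for illustration. The only properties taken from the description are that the stream usually returns more than the minimum asked for, and that the fix bases the next request on what the last call returned.

```python
class BufferedStream:
    """Hypothetical stand-in for an engine's read-ahead stream."""

    def __init__(self, data: bytes, block_size: int = 64 * 1024):
        self.data = data
        self.block_size = block_size  # assumed physical read granularity
        self.buffered = 0             # bytes currently held in the read-ahead buffer
        self.physical_reads = 0       # how often the underlying source was read

    def peek(self, min_size: int) -> bytes:
        # Only go to the source when the buffer cannot satisfy the request.
        if self.buffered < min_size and self.buffered < len(self.data):
            # Reads are rounded up to whole blocks, so the caller usually
            # gets back more than the minimum it asked for.
            blocks = -(-min_size // self.block_size)  # ceiling division
            self.buffered = min(len(self.data), blocks * self.block_size)
            self.physical_reads += 1
        return self.data[:self.buffered]


def read_line(stream: BufferedStream, grow_from_returned: bool, initial: int = 4096):
    """Return (line, scans), where scans counts passes over the partial line."""
    min_size = initial
    scans = 0
    while True:
        chunk = stream.peek(min_size)
        scans += 1
        end = chunk.find(b"\n")
        if end >= 0:
            return chunk[:end], scans
        if len(chunk) < min_size:
            # The stream had less than requested: end of input, no terminator.
            return chunk, scans
        if grow_from_returned:
            # Fixed policy: always ask for more than the last call returned.
            min_size = max(min_size * 2, len(chunk) + 1)
        else:
            # Original policy: double the previous minimum, ignoring how
            # much data the stream actually handed back.
            min_size *= 2


def simulate(grow_from_returned: bool):
    data = b"x" * 200_000 + b"\n"  # one long CSV line (illustrative size)
    stream = BufferedStream(data)
    line, scans = read_line(stream, grow_from_returned)
    return line, scans, stream.physical_reads


if __name__ == "__main__":
    for fixed in (False, True):
        line, scans, reads = simulate(fixed)
        print(f"fixed={fixed}: line={len(line)} bytes, scans={scans}, physical reads={reads}")
```

In this toy setup the original policy scans the partial line more often than the stream performs physical reads, so several scans see no new data. With the fixed policy every scan follows a physical read that brought in new data.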

      This bug was present in all engines, in a couple of places, because the code involved had been cloned a number of times. The cloned code should be commoned up into a single shared implementation.




            • Assignee:
              jakesmith Jake Smith


              • Created: