HPCC / HPCC-13668

Introduce a non-partitioning spray implementation (e.g. round-robin)


    • Type: New Feature
    • Status: Unresourced
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: DFU Server
    • Labels:


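      Some formats are very expensive to partition; CSV with possible quoted terminators (the default) is one of them.
      The spray implementation must walk the source file from start to finish to establish record boundaries, and therefore split points. Whether a given terminator lies inside a quoted field depends on every quote character before it, so the spray cannot simply seek to an offset and look for the next terminator.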
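      A minimal Python sketch of why the scan has to be sequential. It assumes RFC 4180-style quoting (double quotes, with `""` as an escaped quote) and `\n` as the only record terminator; `record_boundaries` and `split_points` are hypothetical names for illustration, not DFU Server code.

```python
from bisect import bisect_left


def record_boundaries(data: bytes, quote: int = ord('"'), term: int = ord("\n")) -> list:
    """Return the offset just past each terminator that lies outside quotes.

    The scan has to start at offset 0: whether a newline ends a record
    depends on how many quote characters precede it.
    """
    boundaries = []
    in_quotes = False
    for i, b in enumerate(data):  # iterating bytes yields ints
        if b == quote:
            # An escaped quote ("") toggles the state twice, leaving it unchanged.
            in_quotes = not in_quotes
        elif b == term and not in_quotes:
            boundaries.append(i + 1)
    return boundaries


def split_points(data: bytes, parts: int) -> list:
    """For each even byte split, pick the first record boundary at or after it."""
    bounds = record_boundaries(data)  # full pass over the whole source
    points = []
    for k in range(1, parts):
        target = len(data) * k // parts
        idx = bisect_left(bounds, target)
        points.append(bounds[idx] if idx < len(bounds) else len(data))
    return points
```

      For `b'a,"x\ny"\nb,c\n'` the first newline sits inside a quoted field, so the only record boundaries are at offsets 8 and 12, and a two-way split lands at 8. Finding even one split point required reading every byte before it.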

      This can mean the partitioning time equals or exceeds the actual data transfer time.

      Since 5.0 we have had the option 'quotedTerminator=0' (see HPCC-10961), which allows partitioning points to be found quickly, with the caveat that if the CSV file does contain quoted terminators, record boundaries may well be broken.
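      A rough Python illustration of the trade-off, assuming that with quoted terminators disabled a split point is found by jumping to a nominal offset and taking the next terminator, ignoring quotes. `fast_split_point` is a hypothetical name, and the in-memory buffer stands in for a seek into the source file.

```python
def fast_split_point(data: bytes, target: int, term: int = ord("\n")) -> int:
    """Return the offset just past the first terminator at or after `target`.

    Quotes are ignored, so nothing before `target` needs to be read.
    If that terminator is inside a quoted field, the result is mid-record.
    """
    idx = data.find(bytes([term]), target)
    return len(data) if idx == -1 else idx + 1
```

      With `data = b'a,"x\ny"\nb,c\n'`, `fast_split_point(data, 3)` returns 5, which falls inside the quoted field `"x\ny"`, while a quote-aware scan of the same data only places boundaries at 8 and 12. This is the broken-record risk described above.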

      A spray implementation that forgoes partitioning and streams the source file to the target nodes in a round-robin fashion would be sensible.

      This may also be useful when spraying other formats where partitioning is either expensive or impractical. Zip files may be a good candidate.

      The implementation would:
      + read the source file up to the next record boundary, or until it has filled a reasonably sized send buffer of complete records.
      + send the collated records to a target node in the destination cluster (round-robin or similar).
      + repeat until the source file is exhausted.
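      The steps above might look roughly like this in Python. It is a sketch under assumptions, not a proposed DFU Server design: `record_batches` and `spray_round_robin` are hypothetical names, each target node is modelled as a plain callable that accepts a batch of bytes, and CSV quoting is assumed to be RFC 4180-style with `\n` terminators. Because the source is read once, front to back, tracking quote state to find complete records costs no extra pass.

```python
from typing import BinaryIO, Callable, Iterator, Sequence


def record_batches(src: BinaryIO, buffer_size: int,
                   quote: int = ord('"'), term: int = ord("\n")) -> Iterator[bytes]:
    """Yield batches of complete records, reading `buffer_size` bytes at a time."""
    pending = bytearray()
    in_quotes = False  # carried across reads, so each byte is scanned exactly once
    while True:
        chunk = src.read(buffer_size)
        if not chunk:
            break
        scan_from = len(pending)  # bytes before this were scanned on earlier reads
        pending += chunk
        last_end = 0  # offset just past the last complete record in `pending`
        for i in range(scan_from, len(pending)):
            b = pending[i]
            if b == quote:
                in_quotes = not in_quotes
            elif b == term and not in_quotes:
                last_end = i + 1
        if last_end:
            yield bytes(pending[:last_end])
            del pending[:last_end]  # keep any partial record for the next read
    if pending:
        yield bytes(pending)  # trailing record with no final terminator


def spray_round_robin(src: BinaryIO, targets: Sequence[Callable[[bytes], None]],
                      buffer_size: int = 1 << 20) -> None:
    """Send each batch of complete records to the next target in turn."""
    for n, batch in enumerate(record_batches(src, buffer_size)):
        targets[n % len(targets)](batch)
```

      One consequence of this sketch is that consecutive batches land on different nodes, so reading the target parts in order does not reproduce the source file's record order.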

      Gavin Halliday, Attila Vamos





              • Assignee: Jake Smith (jakesmith)
              • Votes: 0
              • Watchers: 4


                • Created: