HPCC-20100

Remote File Read Performance

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 7.0.0
    • Component/s: dafilesrv
    • Labels:
      None

      Description

      During a remote file read, dafilesrv currently returns 100 records at a time to the client. This is a significant bottleneck for read performance. Because of it, the Spark-HPCC connector reads roughly 20-100x slower (depending on record size) than the Spark-Thor POC connector.

      A simple solution would be to let the client set the maximum number of rows it wants to receive at one time. However, I think a better solution would be for the client to request a certain amount of data, e.g. "give me as many rows as you can fit into 4 MB".
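To make the "as many rows as fit into N bytes" idea concrete, here is a minimal sketch in Python. It is not dafilesrv code: the function name `batch_by_bytes`, the representation of rows as already-serialized `bytes`, and the choice to send an oversized row on its own are all assumptions made for illustration.

```python
from typing import Iterable, Iterator, List


def batch_by_bytes(rows: Iterable[bytes], max_bytes: int) -> Iterator[List[bytes]]:
    """Group serialized rows into batches whose total size stays within max_bytes.

    Hypothetical helper: a single row larger than max_bytes is emitted as a
    batch of one, so the stream always makes progress.
    """
    batch: List[bytes] = []
    used = 0
    for row in rows:
        # Close the current batch if adding this row would exceed the budget.
        if batch and used + len(row) > max_bytes:
            yield batch
            batch, used = [], 0
        batch.append(row)
        used += len(row)
    # Flush whatever is left once the input is exhausted.
    if batch:
        yield batch


if __name__ == "__main__":
    sample = [b"a" * 3, b"b" * 5, b"c" * 4, b"d" * 20, b"e"]
    for reply in batch_by_bytes(sample, max_bytes=10):
        print([len(r) for r in reply])
```

With a byte budget the number of rows per reply varies with record size, while the reply size stays roughly constant, which is the property the request is after.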

      The reasoning is that in most circumstances the client does not know the exact row size. If it tried to calculate the number of rows to request from its I/O limits, the result would be off for anything other than fixed-size records. Dafilesrv, however, has the information required.
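The following sketch illustrates why a client-side row count tends to miss its target when records are not fixed size. The row sizes are invented purely for illustration and the helper names are hypothetical; nothing here reflects measured HPCC data.

```python
from typing import Sequence


def rows_for_budget(budget_bytes: int, assumed_row_bytes: int) -> int:
    """Client-side guess: how many rows to request for a given byte budget."""
    return max(1, budget_bytes // assumed_row_bytes)


def bytes_returned(actual_row_sizes: Sequence[int], row_count: int) -> int:
    """Bytes actually transferred if the server returns the first row_count rows."""
    return sum(actual_row_sizes[:row_count])


if __name__ == "__main__":
    budget = 4 * 1024 * 1024                 # the 4 MB target from the description
    requested = rows_for_budget(budget, assumed_row_bytes=100)

    # Assumption for illustration: real rows turn out to be 1000 bytes each.
    actual = [1000] * requested
    print(requested, bytes_returned(actual, requested))
```

Here the client guesses 100-byte rows and requests 41943 of them, but if the rows are really 1000 bytes each the reply is about ten times the intended budget. A guess that is too high has the opposite effect and leaves the reply far smaller than the budget.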

              People

              • Assignee:
                jakesmith Jake Smith
                Reporter:
                mcmuja01 James McMullan
              • Votes:
                0
                Watchers:
                4
