HPCC / HPCC-19655

Splitter not populating input meta info (e.g. for known record counts/disk sizes) causing down stream inefficiencies


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 7.0.0
    • Component/s: Thor
    • Labels: None

      Description

      The current implementation of DISTRIBUTE(SKEW) appears to redistribute the data every time, regardless of how skewed the input dataset actually is. This request is to change that behavior so that the actual skew is checked first and the data is redistributed only if needed.

      This case is likely to come up most often with just-sprayed data files, but it may occur in other cases as well, and the check should be fairly inexpensive to perform. If the distribute can be bypassed, the gain is significant when the input dataset is large.
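
The requested check can be sketched as follows. This is only an illustration in Python, not Thor's implementation: the function names are hypothetical, the per-node row counts are assumed to be already known (for example from the input meta info mentioned in the title), and skew is assumed to mean how far the fullest node exceeds the mean, as a fraction of the mean. The definition Thor actually uses may differ.

```python
from typing import Sequence


def measured_skew(rows_per_node: Sequence[int]) -> float:
    """Fraction by which the fullest node exceeds the mean row count.

    Assumed definition for illustration only; hypothetical helper.
    """
    total = sum(rows_per_node)
    if total == 0:
        return 0.0  # empty input: nothing to rebalance
    mean = total / len(rows_per_node)
    return (max(rows_per_node) - mean) / mean


def needs_redistribute(rows_per_node: Sequence[int], max_skew: float) -> bool:
    """True only when the measured skew exceeds the caller's tolerance."""
    return measured_skew(rows_per_node) > max_skew


# Evenly sprayed file on four nodes: the distribute could be bypassed.
print(needs_redistribute([100, 100, 100, 100], max_skew=0.1))
# All rows on one node: the distribute is still required.
print(needs_redistribute([400, 0, 0, 0], max_skew=0.1))
```

Because the check only reads one count per node, its cost does not grow with the size of the dataset, which is consistent with the expectation above that it should be inexpensive.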


            People

            • Assignee: Jake Smith (jakesmith)
            • Reporter: Dan S. Camper (dcamper)
