HPCC-12275

Hash distribute option to gather most frequent keys

    Details

    • Type: New Feature
    • Status: Accepted
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:

      Description

      Sometimes a few values (e.g., NULL or invalid entries) can cause DISTRIBUTE to create highly skewed output. It would be useful if you could optionally request that a HASH DISTRIBUTE gather the top <N> most common values and report them at the end.

      To do this efficiently, it would probably need to record the top <2N> values and throw the bottom N values away whenever the table became full. Very common values would be likely to be retained...
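
      The 2N-candidate idea above can be sketched roughly as follows. This is only an illustration in Python, not HPCC code: the class name FrequentKeyTracker and its methods are hypothetical, and the pruning rule (keep the N highest counts) is one possible reading of "throw the bottom N values away". Because a discarded key restarts from zero if it reappears, the reported counts are lower bounds rather than exact totals.

```python
class FrequentKeyTracker:
    """Hypothetical sketch: approximate top-N key counter holding at most 2N candidates."""

    def __init__(self, n):
        if n < 1:
            raise ValueError("n must be at least 1")
        self.n = n
        self.counts = {}  # candidate key -> occurrences seen since it was last admitted

    def add(self, key):
        """Record one occurrence of key (e.g. the value being hashed for a row)."""
        if key in self.counts:
            self.counts[key] += 1
            return
        # New candidate: if the table already holds 2N keys, make room first.
        if len(self.counts) >= 2 * self.n:
            self._prune()
        self.counts[key] = 1

    def _prune(self):
        """Keep the N candidates with the highest counts and discard the rest."""
        ranked = sorted(self.counts.items(), key=lambda kv: kv[1], reverse=True)
        self.counts = dict(ranked[: self.n])

    def top(self):
        """Return up to N (key, count) pairs, most frequent first."""
        ranked = sorted(self.counts.items(), key=lambda kv: kv[1], reverse=True)
        return ranked[: self.n]


if __name__ == "__main__":
    # Assumed input: every third row has a NULL key, all other keys are unique.
    tracker = FrequentKeyTracker(2)
    for i in range(150):
        tracker.add(None if i % 3 == 0 else "k%d" % i)
    print(tracker.top())
```

      In this made-up stream the NULL key always outranks the one-off keys when the table is pruned, so it is never discarded and ends up first in the report. Keys of moderate frequency can still be evicted and undercounted, which is the trade-off for bounding memory at 2N entries.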

            People

            • Assignee: anybody (Available for anyone)
            • Reporter: ghalliday (Gavin Halliday)
            • Votes: 0
            • Watchers: 2
