HPCC-16476

Improve KEYED JOIN performance in Thor



      The performance of KEYED JOIN in Thor is sub-optimal, especially when retrieving large numbers of matches scattered all over the key. As KEL makes quite heavy use of keyed joins, we may find this becomes a significant issue.

      Some ideas:

      1. Where the order of the LHS dataset does not need to be preserved, pre-sorting and/or pre-distributing it may help with cache locality. This could be done by the ECL coder, or via a compile-time transformation, I suppose.
      2. Executing post-filtering and projection on the remote end would reduce the network traffic significantly. We can't always assume that the remote end is actually another node of the Thor cluster, but in the cases where it is we could perhaps link some of the Roxie activity code into the Thor slave activities. Eventually we might want to push this to dafilesrv too, but that's rather harder and longer term, whereas the Thor slave case might be a quicker win.
      3. A special case of the above, where the data does not need to be returned to the original slave, might also be possible (if the activity is marked as not needing to preserve order/distribution/grouping, and there are no LIMIT/ATMOST or similar constraints to worry about).
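
      To make the cache-locality argument in idea 1 concrete, here is a rough Python sketch (not Thor code). It models the key as fixed-size pages behind a small LRU page cache and counts page fetches for the same set of LHS lookups in their original order and after sorting. The page size, cache size, key pattern and the `count_page_misses` helper are all assumptions made up for illustration; they are not taken from the Thor implementation.

```python
from collections import OrderedDict

PAGE_SIZE = 100    # hypothetical: number of key values covered by one index page
CACHE_PAGES = 4    # hypothetical: number of pages the LRU cache can hold


def count_page_misses(lookup_keys):
    """Count index-page fetches for a sequence of lookups through an LRU cache."""
    cache = OrderedDict()
    misses = 0
    for key in lookup_keys:
        page = key // PAGE_SIZE
        if page in cache:
            cache.move_to_end(page)        # mark page as most recently used
        else:
            misses += 1                    # page must be fetched
            cache[page] = True
            if len(cache) > CACHE_PAGES:
                cache.popitem(last=False)  # evict least recently used page
    return misses


# Illustrative LHS: 500 lookups that cycle round 50 different pages,
# so consecutive rows almost never hit the same page.
lhs_keys = [(i * 100) % 5000 + i // 50 for i in range(500)]

unsorted_misses = count_page_misses(lhs_keys)
sorted_misses = count_page_misses(sorted(lhs_keys))

print("page fetches, original order:", unsorted_misses)
print("page fetches, sorted order:  ", sorted_misses)
```

      In this toy model the original order fetches a page for every one of the 500 lookups, while the sorted order fetches each of the 50 pages once. Real keys, caches and data distributions will behave differently, so this only shows the direction of the effect, not its size.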

      David Bayliss, anything to add?


              • Assignee:
                jakesmith Jake Smith
                richardkchapman Richard Chapman


                • Created: