From our lead analyst, an experienced ECL coder:
There are a few scenarios that I am trying to optimize. In all cases, I have a very large dataset that is well distributed. We want to maintain the distribution, but we don't care at all about the sort order of the results, because we'll either do a local sort immediately after the operation or just return its result.
- project(dataset, <expensive-transform>);
In this case, I think I want to use the UNORDERED and PARALLEL options. I think specifying LOCAL is redundant and specifying UNSTABLE is inapplicable because there is no sorting in the operation. Is this correct?
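To make the first case concrete, here is a minimal sketch of the PROJECT as I would write it. The record layout, inline data, and transform body are made up for illustration; the real dataset is far larger and the real transform far more expensive. Whether the engine actually acts on these options is part of what I am asking.

```ecl
// Hypothetical layout and inline data, standing in for the large, well-distributed dataset
Rec := RECORD
  UNSIGNED4 id;
  UNSIGNED4 score;
END;

ds := DISTRIBUTE(DATASET([{1, 10}, {2, 20}, {3, 30}], Rec), HASH32(id));

// Placeholder for the expensive transform
Rec ExpensiveXF(Rec L) := TRANSFORM
  SELF.score := L.score * 2;
  SELF := L;
END;

// UNORDERED: declares that the output record order is not significant
// PARALLEL:  asks for the activity to be evaluated in parallel
projectedDs := PROJECT(ds, ExpensiveXF(LEFT), UNORDERED, PARALLEL);

OUTPUT(projectedDs);
```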
- project(dataset(expensive-filter), <cheap-transform>);
Do the same optimizations apply here?
Related to questions 1 & 2: would the optimizations be any different if these were normalize operations instead of project operations?
- rollup(sort(dataset, <fields>, local), <expensive-transform>, <same-fields>, local);
I am thinking I should add the UNSTABLE option to the sort and add both the UNORDERED and PARALLEL options to the rollup. Do I have that right?
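As a sketch of what I mean for the sort/rollup case, with the options placed where I think they belong. The layout, field names, and transform are hypothetical placeholders, and I am assuming the field-list form of ROLLUP accepts the same trailing options as the other forms.

```ecl
// Hypothetical layout and inline data, standing in for the large, well-distributed dataset
Rec := RECORD
  STRING10  acct;
  UNSIGNED4 total;
END;

ds := DISTRIBUTE(DATASET([{'a', 1}, {'b', 2}, {'a', 3}], Rec), HASH32(acct));

// Placeholder for the expensive rollup transform: merges two records that share acct
Rec ExpensiveRollXF(Rec L, Rec R) := TRANSFORM
  SELF.total := L.total + R.total;
  SELF := L;
END;

// UNSTABLE: records with equal sort fields need not keep their input order
sortedDs := SORT(ds, acct, LOCAL, UNSTABLE);

// Same fields as the sort; UNORDERED and PARALLEL added to the rollup
rolledDs := ROLLUP(sortedDs, ExpensiveRollXF(LEFT, RIGHT), acct, LOCAL, UNORDERED, PARALLEL);

OUTPUT(rolledDs);
```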
Is there expanded documentation or discussion of these options anywhere?