To better support environments with a mix of cluster sizes that regularly cross read from one another, allow final outputs (not intermediate outputs) to be created with a defined number of parts (possibly using existing WIDTH attribute).
This will allow clusters of different sizes to more efficiently read the source files when the cluster is larger than the cluster that the outputs were originally create on.
Particularly relevant to cluster who use NAS type storage, e.g. in AWS/cloud type setups.
We already some [very] limited support for targeting outputs to be different widths (different # of partitions), by:
1) Targeting a different cluster of a different width via OUTPUT,CLUSTER(<cluster>);
If the target cluster is bigger, it simply pads the output file with blank parts.
Or if target is smaller, the target parts will be spread over the smaller set of slaves.
2) BUILD[index] , WIDTH(<w>), LOCAL);
<w> has to a factor of the cluster the query is running on and the index must be a local index.
There are caveats/problems that will need thinking through:
1) The implementation may reorder the output in order to efficiently break up the output from a single stream into parts (e.g. via round-robin)
There is no assumed order from the code generators point of view of an input file, so as long as users aren't assuming that the order is preserved this should be ok.
Alternatively (discussed before), we could introduce large block boundaries into each physical file, to allow them to be read from in parallel at different starting points.
e.g. every ~ 10MB mark a boundary and store in an index (unless fixed width records).
2) Indexes - an existing b-tree index could be subdivided, with each slave using a lower level as the top of it's index.
The TLK handling would encompass not only the physical TLK part, but also the b-tree levels above the slave node starting point.
i.e. determining which slaves performed lookups would involve a lookup into TLK+a few levels.
Something to discuss further.