The idea is to allow a Thor job to execute sub graphs on different numbers of machines depending on the resourcing requirements.
Some notes from the initial design discussion:
- Always run jobs with the same number of streams, but change the number of streams running on each node. E.g. 1 stream on 400 nodes or 40 streams on 10 nodes. This simplifies the potential issues with switching between different number of nodes. The number of logical streams used for a job can be configured for a system.
The streams might be executed in parallel or sequentially spending on what the requirements of the subgraph are (e.g., any global communication), and the number of cores allocated to a particular process.
- Each Thor master only executes a single job at a time.
- The ways that the system can be split are are dynamically configured. E.g.,
All of the following run 400 streams.
The machines can be allocated in the following slices.
3 400 nodes. (1 stream on each)
2 4 x 100nodes (4 streams on each)
2 8 x 50nodes (8 streams on each)
1 20 x 20nodes (20 streams on each)
The may be a limit on the maximum number of slices that can be active at any one time (e.g., any 6 of the above - e.g., 3x400,2x100,1x20 or 1x400,2x100,2x50,1x20).
A Thor running a job sends a request which contains a list of widths that it would like to use. This might be a combination of different sizes - e.g., 2 400 way + 3 50way, but at most 4 of the above.
It is returned elements from whichever slices are available. (e.g., 9x50node from 2 of the 50 node slices.)
Those subgraphs that have resources available are executed, and when they completed, the resources are released back. (Possibly keep onto them if more work can be done to avoid thrashing between jobs.)
- Think carefully about how nodes are matched to physical machines. Are they the same? Is there a M:1 mapping. If so, can we arrange the divisions so that files are normally available locally.