Details
-
Improvement
-
Status: Resolved
-
Minor
-
Resolution: Timed Out
-
None
-
None
-
Minor
Description
Currently the code generator handles graph resourcing, meaning that a large job will be broken up into separate subgraph units, with spill writes and reads introduced between them, so that Thor can run execution units that it can handle within specified resource limits.
Problem with the current approach:
+ eclcc has no information about the clusters it targets and the amount of resources activities in a graph use, depend on the specification of the cluster, e.g. width.
+ Even if it does know, it cannot know how much resources are actually used in most cases, e.g. how many records a SORT is going to use.
As a consequence, eclcc makes broad assumptions about width, memory available when splitting a job into subgraphs.
Problem with letting Thor handle the whole job (one big graph)
+ Many activities require or work best, if they have access to as much memory and/or other resources as possible.
If Thor must manage the efficient resourcing of contending resource requirements dynamically at query runtime, then it must effectively split the graph up itself.
However this could be done by spotting a low-resource condition and dynamically causing proceeding section of the subgraph to be spillled
Needs more though and discussion...