Uploaded image for project: 'HPCC'
  1. HPCC
  2. HPCC-12936

Discuss Thor managing graph resourcing, that is currently handled at codegen time.



    • Improvement
    • Status: Resolved
    • Minor
    • Resolution: Timed Out
    • None
    • None
    • Code Generator, Thor
    • Minor


      Currently the code generator handles graph resourcing, meaning that a large job will be broken up into separate subgraph units, with spill writes and reads introduced between them, so that Thor can run execution units that it can handle within specified resource limits.

      Problem with the current approach:
      + eclcc has no information about the clusters it targets and the amount of resources activities in a graph use, depend on the specification of the cluster, e.g. width.
      + Even if it does know, it cannot know how much resources are actually used in most cases, e.g. how many records a SORT is going to use.

      As a consequence, eclcc makes broad assumptions about width, memory available when splitting a job into subgraphs.

      Problem with letting Thor handle the whole job (one big graph)

      + Many activities require or work best, if they have access to as much memory and/or other resources as possible.

      If Thor must manage the efficient resourcing of contending resource requirements dynamically at query runtime, then it must effectively split the graph up itself.
      However this could be done by spotting a low-resource condition and dynamically causing proceeding section of the subgraph to be spillled

      Needs more though and discussion...




            jakesmith Jake Smith
            jakesmith Jake Smith
            0 Vote for this issue
            2 Start watching this issue