There are various problems with the code to generate common sub-expressions:
a) Conditional aliases may be evaluated too many times, or too early.
b) Unconditional aliases that are first used conditionally may be evaluated too many times.
c) The code to calculate where an alias should be evaluated can be O(N^2), which can cause issues for very large numbers of aliases.
So the proposal is to change the way that aliases are generated.
a) Generate a class for each alias of the form
a) All cses are passed by using references to each other
b) The constructors pass all the parameters that are required to evaluate an expression.
c) ctx may be required
d) There is a strong similarity between the parameters passed to an alias constructor and a colocal parent extract. (A parent extract could be passed instead.) There will also be quite a lot in common with HPCC-8333 (using C++ classes for rows) and with the code required to execute out-of-line functions.
e) An alias might require a pointer to a child query to be passed into it, so it can extract results.
f) Need to think carefully about ctx. Is it ever needed inside the cse? E.g., is it valid for an alias to call an out-of-line function? (It must be.) It probably needs to be added, but on demand.
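
As an illustration only, the points above could be sketched roughly as follows. This is not the generated code: Context, Alias1, Alias2, evaluateRow and evalCount are all hypothetical names, and the expressions are made up. The sketch assumes each alias caches its value on first use, takes its inputs through the constructor, and refers to other aliases by reference.

```cpp
#include <cassert>

// Hypothetical stand-in for ctx. It is passed only because an alias may need
// it, for instance to call an out-of-line function.
struct Context
{
    int scale;
};

// Hypothetical alias: alias1 := left * ctx.scale
class Alias1
{
public:
    Alias1(const Context & _ctx, int _left) : ctx(_ctx), left(_left) {}

    // Evaluated on first demand, then cached, so it is never evaluated early
    // and never more than once.
    int get()
    {
        if (!evaluated)
        {
            value = left * ctx.scale;
            evaluated = true;
            ++evalCount;
        }
        return value;
    }

    unsigned evalCount = 0;   // instrumentation for this sketch only

private:
    const Context & ctx;
    int left;
    int value = 0;
    bool evaluated = false;
};

// Hypothetical alias that depends on another alias, held by reference:
// alias2 := alias1 + alias1
class Alias2
{
public:
    explicit Alias2(Alias1 & _alias1) : alias1(_alias1) {}

    int get()
    {
        if (!evaluated)
        {
            value = alias1.get() + alias1.get();
            evaluated = true;
        }
        return value;
    }

private:
    Alias1 & alias1;
    int value = 0;
    bool evaluated = false;
};

// Hypothetical body of a generated function. alias1 is first reached through
// a conditional path (via alias2) and then used again directly.
int evaluateRow(bool cond, int left, unsigned & alias1Evaluations)
{
    Context ctx{3};
    Alias1 alias1(ctx, left);
    Alias2 alias2(alias1);
    int result = cond ? alias2.get() + alias1.get() : 0;
    alias1Evaluations = alias1.evalCount;
    return result;
}
```

In this sketch, constructing an alias object costs nothing beyond storing its inputs, so no separate analysis of where to evaluate the alias is needed: the first call to get() decides it. If the condition is false, alias1 is never evaluated at all.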
Problem: How does this interact with lazy evaluation of graphs? Do the cses actually have to become graph nodes?
- If all helpers were colocal by default (with a separate helper for the remote) then cses could be passed by reference to the child graphs in the parent extract, and executed on demand. If some were non-local it would require early execution (but see below).
Problem: What happens if a cse depends on a graph result, but another graph result depends on that cse? You will have a circular reference. Does that mean that all child query execution needs to be on demand?
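
To make the circularity concrete, here is a toy model, not a proposal for the real engine. LazyValue, onDemandExample and wholeGraphExampleDetectsCycle are hypothetical names. The sketch assumes every graph result and cse is a value computed on first demand, with an "in progress" state to detect re-entrant evaluation.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <utility>

// Hypothetical on-demand value with re-entrancy detection.
class LazyValue
{
public:
    explicit LazyValue(std::function<int()> _compute) : compute(std::move(_compute)) {}

    int get()
    {
        if (state == Done)
            return value;
        if (state == InProgress)
            throw std::logic_error("circular dependency");
        state = InProgress;
        value = compute();   // if this throws, the state stays InProgress (sketch only)
        state = Done;
        return value;
    }

private:
    enum State { NotStarted, InProgress, Done };
    std::function<int()> compute;
    State state = NotStarted;
    int value = 0;
};

// Each graph result is demanded individually:
// result1 <- cse <- result2 is a simple chain, so there is no cycle.
int onDemandExample()
{
    LazyValue result1([] { return 10; });
    LazyValue cse([&] { return result1.get() + 1; });
    LazyValue result2([&] { return cse.get() * 2; });
    return result2.get();
}

// The child query is executed as a single unit that produces every result.
// The cse needs result1, so it runs the whole graph, which needs the cse
// in order to produce result2.
bool wholeGraphExampleDetectsCycle()
{
    LazyValue * cseRef = nullptr;
    LazyValue wholeGraph([&] {
        int result2 = cseRef->get() * 2;   // result2 depends on the cse
        (void)result2;
        return 10;                         // result1
    });
    LazyValue cse([&] { return wholeGraph.get() + 1; });
    cseRef = &cse;
    try
    {
        cse.get();
    }
    catch (const std::logic_error &)
    {
        return true;
    }
    return false;
}
```

In this toy model the circular reference only appears when the child query is treated as one unit. With per-result demand the same dependencies evaluate cleanly, which is one reading of the question above.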