Currently there is some code in hqlcse.cpp(291) which prevents cses being created from the conditions in IF() dataset expressions.
Unfortunately this means that many examples generate much more code than they need to. Uncommenting the no_if significantly improves lots of examples. However, there is one (very significant) example, ncf*.eclxml, that gets worse.
The reason is a bit obscure.
In a child query there is code along the lines of:
x := EXISTS(ds, f(a));
y := IF(x, ....);
z := g(a) + y + h + i(a);
where a is a datarow expression.
When cses are not matched in IF conditions, x is not spotted as a cse. "a" is not spotted as a cse either (since one use is within a child query), but its first evaluation is outside the exists loop, and the use within the exists loop reuses that previous evaluation.
When cses are matched in IF conditions, x is spotted as a cse and ends up being evaluated first. That means a is first evaluated within the exists, and that evaluation isn't available to the subsequent uses outside it.
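The two evaluation orders described above can be illustrated with a toy model. This is only a sketch, not the code generator's actual behaviour: Scope and count_evaluations_of_a are hypothetical names, and the model assumes that a value evaluated in the outer scope is visible inside the exists loop but not the other way round.

```python
class Scope:
    """Toy evaluation scope (hypothetical): cached values are visible
    to this scope and to its children, never to its parent."""

    def __init__(self, parent=None):
        self.parent = parent
        self.cache = set()

    def has(self, name):
        scope = self
        while scope is not None:
            if name in scope.cache:
                return True
            scope = scope.parent
        return False


def count_evaluations_of_a(hoist_condition):
    """Count how often the datarow expression a is evaluated.

    hoist_condition=False models cses NOT being matched in IF conditions;
    hoist_condition=True models x being spotted as a cse and evaluated first.
    """
    evaluations = 0
    outer = Scope()

    def use_a(scope):
        nonlocal evaluations
        if not scope.has("a"):
            evaluations += 1        # a has to be evaluated in this scope
            scope.cache.add("a")

    def eval_x():
        inner = Scope(parent=outer)  # body of EXISTS(ds, f(a))
        use_a(inner)

    if hoist_condition:
        eval_x()       # x evaluated first: a is evaluated inside the exists
        use_a(outer)   # g(a): the inner evaluation is not visible here
        use_a(outer)   # i(a): reuses the outer evaluation
    else:
        use_a(outer)   # g(a): a is evaluated outside the exists loop
        eval_x()       # the exists loop reuses the outer evaluation
        use_a(outer)   # i(a): reuses the outer evaluation
    return evaluations
```

Under these assumptions, count_evaluations_of_a(False) gives one evaluation of a, while count_evaluations_of_a(True) gives two, which matches the regression described for the ncf example.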
The primary problem is that it is hard to spot cses within child queries (e.g., exists) because the meaning of the expression might differ depending on its context.
If there were a relatively simple fix in this case it would make a lot of code better. It might even be worth adding an option just for the queries that benefit.