We received a report from one of our SALT users that his internal linking module “lost” some records (i.e. the output dataset had fewer records than the input dataset). That should never happen, provided the field declared as the unique record identifier (RIDFIELD) is indeed unique. I have confirmed that the field is unique in his data.
I traced the loss to the following join…
The loss is visible in http://dataland_esp:8010/?Wuid=W20170219-133941&GraphName=graph17&SubGraphId=1383&SafeMode=false&Widget=GraphTreeWidget where 7,108,236,038 records go into the join on both the left and the right, and only 7,108,235,381 come out, a loss of 657 records.
Based on the history of the code, I asked the user to change this join from SMART to HASH and rerun. With HASH, the same number of records came out of the join as went in. This can be seen in http://dataland_esp:8010/?Wuid=W20170223-210259&GraphName=graph17&SubGraphId=1386&SafeMode=false&Widget=GraphTreeWidget
The ECL can be seen in context on line 189 of BIPV2_LGID3_V362.matches
This appears to be similar in some respects to
Within the SALT team we are tracking this as https://github.com/hpcc-systems/SALT/issues/2167