Details
Bug
Status: Resolved
Critical
Resolution: Fixed
8.6.x
None
Ubuntu 20.04
Description
Sample ECL code within plugins/nlp/README.md:
IMPORT nlp from lib_nlp;

text01 := 'The quick brown fox jumped over the lazy boy.';
parsedtext01 := nlp.AnalyzeText('taiparse',text01);
output(parsedtext01);

text02 := 'TAI has bought the American Medical Records Processing for more than $130 million dollars.';
parsedtext02 := nlp.AnalyzeText('corporate',text02);
output(parsedtext02);

text03 := 'Right middle lobe consolidation compatible with acute pneumonitis.';
parsedtext03 := nlp.AnalyzeText('taiparse',text03);
output(parsedtext03);

text04 := 'TAI\'s stock is up 4% from $58.33 a share to $60.66.';
parsedtext04 := nlp.AnalyzeText('corporate',text04);
output(parsedtext04);
When submitted against either hthor or thor, the workunit terminates unexpectedly. A ZAP report from the hthor invocation is attached.
I inspected some of the source code. plugins/nlp/lib_nlp.ecllib contains:
EXPORT nlp := SERVICE : plugin('nlp'), namespace('nlp'), library('nlp'), CPP, PURE
    string AnalyzeText(const string analyzer, const string txt) : cpp,pure,entrypoint='AnalyzeText';
END;
The C++ function in plugins/nlp/nlp.cpp is:
ECL_NLP_API void ECL_NLP_CALL AnalyzeText(size32_t & tgtLen, char * & tgt, size32_t anaLen, const char * ana, size32_t txtLen, const char * txt)
{
    {
        CriticalBlock block(cs);
        if (nlpEng == NULL)
        {
            nlpEng = new NLPEng();
        }
    }
    ostringstream sso;
    tgtLen = nlpEng->nlpEngAnalyze(ana,txt,sso);
    tgt = (char *) CTXMALLOC(parentCtx, tgtLen);
    memcpy_iflen(tgt, sso.str().c_str(), tgtLen);
}
The C++ code ignores the length parameters of the two string inputs (anaLen and txtLen) and appears to assume that ana and txt are null-terminated, which is not necessarily the case. I don't know whether this is the underlying cause of the workunit terminating unexpectedly or an unrelated issue.
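For illustration only, here is a minimal standalone sketch of how an entry point with the same parameter shape could honour the length parameters. It is not the actual fix. fakeAnalyze and analyzeTextSafe are hypothetical names, fakeAnalyze merely stands in for nlpEng->nlpEngAnalyze, std::malloc stands in for CTXMALLOC, and taking the output length from the stream contents is my own choice rather than something taken from the plugin source.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for NLPEng::nlpEngAnalyze: it takes NUL-terminated
// C strings and writes its result to a stream.
static void fakeAnalyze(const char *ana, const char *txt, std::ostream &out)
{
    out << ana << ':' << txt;
}

// Same parameter shape as the AnalyzeText entry point quoted above, but using
// anaLen and txtLen instead of assuming the inputs are NUL-terminated.
void analyzeTextSafe(uint32_t &tgtLen, char *&tgt,
                     uint32_t anaLen, const char *ana,
                     uint32_t txtLen, const char *txt)
{
    // Copy exactly the stated number of bytes; std::string supplies the
    // terminator that c_str() exposes.
    const std::string anaStr(ana, anaLen);
    const std::string txtStr(txt, txtLen);

    std::ostringstream sso;
    fakeAnalyze(anaStr.c_str(), txtStr.c_str(), sso);

    // Size the result from the text actually produced, so the copy below
    // cannot read past the end of the source buffer.
    const std::string result = sso.str();
    tgtLen = static_cast<uint32_t>(result.size());
    tgt = nullptr;
    if (tgtLen != 0)
    {
        tgt = static_cast<char *>(std::malloc(tgtLen)); // stand-in for CTXMALLOC
        assert(tgt != nullptr);
        std::memcpy(tgt, result.data(), tgtLen);
    }
}
```

Copying into std::string costs one extra allocation per argument, but it guarantees the callee sees a terminator immediately after the last valid byte. An input containing an embedded NUL would still be cut short by a callee that reads C strings.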