NLP Plugin: Fails to execute sample code

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • 8.6.x
    • 8.6.2
    • Component/s: Plugins
    • Labels: None
    • Environment: Ubuntu 20.04

    Description

      Sample ECL code within plugins/nlp/README.md:

      IMPORT nlp from lib_nlp; 
      
      text01 := 'The quick brown fox jumped over the lazy boy.';
      parsedtext01 := nlp.AnalyzeText('taiparse',text01);
      output(parsedtext01);
      
      text02 := 'TAI has bought the American Medical Records Processing for more than $130 million dollars.';
      parsedtext02 := nlp.AnalyzeText('corporate',text02);
      output(parsedtext02);
      
      text03 := 'Right middle lobe consolidation compatible with acute pneumonitis.';
      parsedtext03 := nlp.AnalyzeText('taiparse',text03);
      output(parsedtext03);
      
      text04 := 'TAI\'s stock is up 4% from $58.33 a share to $60.66.';
      parsedtext04 := nlp.AnalyzeText('corporate',text04);
      output(parsedtext04);
      

      When submitted against either hthor or thor, the workunit terminates unexpectedly. A ZAP report from the hthor run is attached.

      I inspected some of the source code. plugins/nlp/lib_nlp.ecllib contains:

      EXPORT nlp := SERVICE : plugin('nlp'), namespace('nlp'), library('nlp'), CPP, PURE
        string AnalyzeText(const string analyzer, const string txt) : cpp,pure,entrypoint='AnalyzeText';
      END;
      

      The C++ function in plugins/nlp/nlp.cpp is:

      ECL_NLP_API void ECL_NLP_CALL AnalyzeText(size32_t & tgtLen, char * & tgt, size32_t anaLen, const char * ana, size32_t txtLen, const char * txt)
      {
          {
              CriticalBlock block(cs);
              if (nlpEng == NULL) {
                  nlpEng = new NLPEng();
              }
          }
          ostringstream sso;
          tgtLen = nlpEng->nlpEngAnalyze(ana,txt,sso);
          tgt = (char *) CTXMALLOC(parentCtx, tgtLen);
          memcpy_iflen(tgt, sso.str().c_str(), tgtLen);
      }
      

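      The C++ code never uses anaLen or txtLen. It passes ana and txt straight to nlpEngAnalyze, which appears to treat them as null-terminated C strings. ECL STRING arguments arrive as a length plus a pointer and are not guaranteed to be null-terminated, so the engine may read past the end of either buffer. I don't know whether this is the underlying cause of the workunit terminating unexpectedly or an unrelated issue.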
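
One possible way to honor the input lengths is sketched below. It is a standalone illustration, not the fix that was applied to the plugin: analyzeCString is a hypothetical stand-in for an engine call that expects a null-terminated string, and toTerminated and safeAnalyze are names invented for this example. The idea is to copy each (length, pointer) pair into a std::string, whose c_str() is always null-terminated, before handing it on.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for an engine entry point that expects a
// null-terminated C string (assumption: the real engine behaves this way).
static std::size_t analyzeCString(const char *text)
{
    return std::strlen(text);
}

// Copy exactly len bytes from a length-prefixed buffer into a std::string.
// The resulting c_str() is guaranteed to end with a terminator.
static std::string toTerminated(std::size_t len, const char *data)
{
    if (len == 0)
        return std::string();   // an empty input may arrive with a null pointer
    assert(data != nullptr);
    return std::string(data, len);
}

// Wrapper that respects the supplied length before calling the engine.
static std::size_t safeAnalyze(std::size_t txtLen, const char *txt)
{
    const std::string text = toTerminated(txtLen, txt);
    return analyzeCString(text.c_str());
}
```

In the plugin, the same copy would presumably be applied to both ana and txt before the call to nlpEngAnalyze. The extra allocation costs a little, but it removes any dependence on what follows the input buffers in memory.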

          People

            dehilster David de Hilster
            dcamper Dan S. Camper