Machine Learning Library / ML-416

Implementation of CLDA Topic Modeling Algorithm in ECL-ML


    • Type: New Feature
    • Status: Active
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 7.0.0
    • Fix Version/s: None
    • Component/s: ecl-ml


      Taken from Lili's proposal; refer to that document for more details.

      DecData()  - Declare the project datasets. Import and clean them into a unified, valid format for Segmentation, and define a file-naming convention for the whole project.
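
The following is a minimal Python sketch of the cleaning and naming parts of DecData(), for illustration only. The tokenizer rule, the stop-word list, and the `~project::stage::part` pattern (modelled on HPCC logical file names) are all assumptions. `clean` and `file_name` are hypothetical helpers and are not part of the proposal or of ECL-ML.

```python
import re
from typing import List, Optional

# Hypothetical stop-word list; a real project would load a fuller one.
STOPWORDS = {"a", "an", "and", "of", "the", "to", "in"}

def clean(text: str) -> List[str]:
    """Lowercase, keep alphabetic tokens, drop stop words and tokens of <= 2 letters."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS and len(t) > 2]

def file_name(project: str, stage: str, part: Optional[int] = None) -> str:
    """Hypothetical naming convention: ~project::stage[::part]."""
    pieces = ["~" + project, stage]
    if part is not None:
        pieces.append(str(part))
    return "::".join(pieces)

print(clean("The LDA model, and THE topics!"))  # ['lda', 'model', 'topics']
print(file_name("clda", "segment", 0))          # ~clda::segment::0
```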

      Segmentation(S, attr)  - Partition the input corpus into S chunks based on the given attribute (attr). This implements CLDA's data-agnostic segmentation.
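
Here is one possible reading of Segmentation(S, attr), sketched in Python. It assumes each document is a dict with an `attr` field (here `year`). Distinct attribute values are sorted and spread over S contiguous buckets, so documents that share a value always land in the same chunk. The `segmentation` function and the document layout are hypothetical.

```python
from typing import Any, Dict, List

Doc = Dict[str, Any]

def segmentation(corpus: List[Doc], S: int, attr: str) -> List[List[Doc]]:
    """Split the corpus into S chunks by the value of `attr`.

    Sorted distinct values are mapped to bucket i * S // n_values,
    which yields contiguous, roughly equal-sized groups of values.
    """
    values = sorted({doc[attr] for doc in corpus})
    bucket_of = {v: i * S // len(values) for i, v in enumerate(values)}
    chunks: List[List[Doc]] = [[] for _ in range(S)]
    for doc in corpus:
        chunks[bucket_of[doc[attr]]].append(doc)
    return chunks

corpus = [
    {"year": 2018, "tokens": ["graph", "query"]},
    {"year": 2019, "tokens": ["index", "query"]},
    {"year": 2020, "tokens": ["topic", "model"]},
    {"year": 2021, "tokens": ["lda", "topic"]},
    {"year": 2019, "tokens": ["graph", "index"]},
]
chunks = segmentation(corpus, S=2, attr="year")
print([len(c) for c in chunks])  # [3, 2]
```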

      Models(S, Params) - Set up the global and local model parameters, such as the number of local topics Lj and the local alphas and betas for each of the S local models, and the global topic number K.
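
The parameters of Models(S, Params) could be held in a structure like the one below. This is a sketch only. The class names, the default beta of 0.01, and the 50 / Lj default for alpha (a common heuristic in the LDA literature) are assumptions and do not come from the proposal.

```python
from dataclasses import dataclass, field
from typing import List, Union

@dataclass
class LocalModel:
    L: int        # number of local topics Lj for this chunk
    alpha: float  # symmetric document-topic Dirichlet prior
    beta: float   # symmetric topic-word Dirichlet prior

@dataclass
class CLDAParams:
    K: int                                    # number of global topics
    local: List[LocalModel] = field(default_factory=list)

def models(S: int, K: int, L: Union[int, List[int]], beta: float = 0.01) -> CLDAParams:
    """Build one LocalModel per chunk; alpha defaults to 50 / Lj."""
    Ls = L if isinstance(L, list) else [L] * S
    if len(Ls) != S:
        raise ValueError("expected one local topic count per chunk")
    return CLDAParams(K=K, local=[LocalModel(L=lj, alpha=50.0 / lj, beta=beta) for lj in Ls])

params = models(S=3, K=5, L=[10, 20, 10])
print(params.local[1])  # LocalModel(L=20, alpha=2.5, beta=0.01)
```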

      sLDA(L) - Apply LDA to each of the S chunks using its local model parameters.
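
For sLDA(L), a small collapsed Gibbs sampler shows what running LDA on one chunk involves. This is a plain-Python illustration, not the ECL-ML implementation, and `lda_gibbs` and its arguments are hypothetical. Call it once per chunk with that chunk's Lj, alpha, and beta to get the S sets of local topics. Each topic is a word-to-probability map over the chunk's local vocabulary.

```python
import random
from typing import Dict, List

def lda_gibbs(docs: List[List[str]], L: int, alpha: float, beta: float,
              iters: int = 100, seed: int = 0) -> List[Dict[str, float]]:
    """Collapsed Gibbs sampling LDA on one chunk; returns L {word: prob} topics."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    wid = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    ndk = [[0] * L for _ in docs]      # document-topic counts
    nkw = [[0] * V for _ in range(L)]  # topic-word counts
    nk = [0] * L                       # total tokens per topic
    z = []                             # topic assignment of every token
    for d, doc in enumerate(docs):     # random initial assignments
        zd = []
        for w in doc:
            k = rng.randrange(L)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][wid[w]] += 1
            nk[k] += 1
        z.append(zd)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = wid[w], z[d][i]
                # Remove this token, then resample its topic from p(z = t | rest).
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                weights = [(ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + V * beta)
                           for t in range(L)]
                k = rng.choices(range(L), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1
    # Smoothed estimate of each topic's word distribution (phi).
    return [{vocab[v]: (nkw[k][v] + beta) / (nk[k] + V * beta) for v in range(V)}
            for k in range(L)]

chunk = [["topic", "model", "lda"], ["graph", "query", "index"], ["lda", "topic", "topic"]]
local_topics = lda_gibbs(chunk, L=2, alpha=0.5, beta=0.01, iters=50)
for t in local_topics:
    print(sorted(t, key=t.get, reverse=True)[:3])
```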

      getV(Corp) - Generate the global vocabulary V from the global corpus for Merge.
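
getV(Corp) reduces to collecting every distinct token of the global corpus. The sketch below uses a hypothetical `get_v` helper and also returns a word-to-column index, on the assumption that Merge needs a fixed column order. Any frequency filtering the proposal might intend is not shown.

```python
from typing import Dict, List, Tuple

def get_v(corp: List[List[str]]) -> Tuple[List[str], Dict[str, int]]:
    """Global vocabulary V (sorted distinct tokens) plus a word -> column-index map."""
    vocab = sorted({w for doc in corp for w in doc})
    return vocab, {w: i for i, w in enumerate(vocab)}

V, index = get_v([["lda", "topic"], ["graph", "topic"], ["query"]])
print(V)             # ['graph', 'lda', 'query', 'topic']
print(index["lda"])  # 1
```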

      Merge(V) - Merge all the local topic results into the global topic set U, aligned to the global vocabulary V.
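
The following sketches Merge(V), assuming each local topic is a dict that maps words to probabilities. Every local topic from every chunk becomes one row of U laid out over the global vocabulary, with zeros for words that chunk never saw. The function name and data layout are hypothetical.

```python
from typing import Dict, List

def merge(local_topics: List[List[Dict[str, float]]], V: List[str]) -> List[List[float]]:
    """Stack all local topics of all chunks into U, one row per topic over V."""
    index = {w: i for i, w in enumerate(V)}
    U = []
    for chunk_topics in local_topics:
        for topic in chunk_topics:
            row = [0.0] * len(V)  # words outside this chunk stay at 0
            for w, p in topic.items():
                row[index[w]] = p
            U.append(row)
    return U

local = [[{"a": 0.7, "b": 0.3}], [{"b": 0.5, "c": 0.5}, {"c": 1.0}]]
U = merge(local, ["a", "b", "c"])
print(U)  # [[0.7, 0.3, 0.0], [0.0, 0.5, 0.5], [0.0, 0.0, 1.0]]
```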

      Norm(U) - Normalize the global topic set U and transform the results for clustering.
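
Norm(U) does not specify which normalization is meant. The sketch below assumes per-row L2 (unit-length) normalization by default, with L1 available as an option. `norm` is a hypothetical helper. L2 is a natural choice before k-means: for unit vectors, squared Euclidean distance equals 2 minus twice the cosine similarity.

```python
import math
from typing import List

def norm(U: List[List[float]], kind: str = "l2") -> List[List[float]]:
    """Scale each topic vector to unit length ("l2") or unit sum ("l1")."""
    out = []
    for row in U:
        if kind == "l1":
            s = sum(abs(x) for x in row)
        elif kind == "l2":
            s = math.sqrt(sum(x * x for x in row))
        else:
            raise ValueError(f"unknown norm: {kind}")
        out.append([x / s for x in row] if s > 0 else list(row))  # leave zero rows as-is
    return out

print(norm([[3.0, 4.0]]))         # [[0.6, 0.8]]
print(norm([[1.0, 3.0]], "l1"))  # [[0.25, 0.75]]
```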

      Cluster(K) - Cluster the local topics in the global topic set U into K global topics.
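
The proposal does not name a clustering method for Cluster(K). This sketch assumes plain k-means (Lloyd's algorithm) with randomly sampled initial centroids. `cluster` is hypothetical, and a production version would likely add smarter seeding and multiple restarts.

```python
import random
from typing import List, Tuple

def _sqdist(a: List[float], b: List[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))

def _nearest(row: List[float], centroids: List[List[float]]) -> int:
    return min(range(len(centroids)), key=lambda k: _sqdist(row, centroids[k]))

def cluster(U: List[List[float]], K: int, iters: int = 100,
            seed: int = 0) -> Tuple[List[int], List[List[float]]]:
    """k-means over the rows of U: a global-topic label per local topic, plus K centroids."""
    if not 0 < K <= len(U):
        raise ValueError("K must be between 1 and the number of local topics")
    rng = random.Random(seed)
    centroids = [list(row) for row in rng.sample(U, K)]
    labels: List[int] = []
    for _ in range(iters):
        new_labels = [_nearest(row, centroids) for row in U]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        for k in range(K):
            members = [row for row, lab in zip(U, labels) if lab == k]
            if members:  # keep the old centroid if a cluster empties
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

U = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels, centroids = cluster(U, K=2)
print(labels[0] == labels[1], labels[2] == labels[3], labels[0] != labels[2])  # True True True
```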

      Topics(K,V) - Show human-readable results for the K topics based on the clustering results.
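
For Topics(K,V), a simple human-readable form lists the top words of each global topic. The sketch assumes each cluster centroid is a weight vector over the global vocabulary V. `topics` and `top_n` are hypothetical.

```python
from typing import List

def topics(centroids: List[List[float]], V: List[str], top_n: int = 10) -> List[str]:
    """One line per global topic listing its top_n highest-weight words."""
    lines = []
    for k, c in enumerate(centroids):
        top = sorted(range(len(V)), key=lambda i: c[i], reverse=True)[:top_n]
        lines.append(f"Topic {k}: " + ", ".join(V[i] for i in top))
    return lines

for line in topics([[0.1, 0.7, 0.2], [0.6, 0.1, 0.3]], ["graph", "lda", "query"], top_n=2):
    print(line)
# Topic 0: lda, query
# Topic 1: graph, query
```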

      Test and Documentation - Test the CLDA results and complete the project documentation.




            • Assignee:
              xulili01 Lili Xu
              lorraineachapman Lorraine Chapman


              • Created: