Taken from Lilys' proposal. Refer to that proposal for more details.
DecData() - Declare project datasets. Import and clean the datasets into a unified, valid format for Segmentation. Create a file naming convention for the whole project.
Segmentation(S, attr) - Partition the input corpus into S chunks based on the applied attribute (attr). This realizes the agnostic data segmentation of CLDA.
Models(S, Params) - Set up the global and local model parameters, such as the local topic number Lj and the local betas and alphas for each of the S local models, and the global topic number K.
sLDA(L) - Apply LDA to each of the S sub-chunks using the local model parameters.
getV(Corp) - Generate the global vocabulary V from the global corpus for Merge.
Merge(V) - Merge all the local topic results into the global topic set U based on the global vocabulary V.
Norm(U) - Normalize the global topic set U and transform the results for clustering.
Cluster(K) - Cluster all the local results, based on the global topic set U, into K global topics.
Topics(K,V) - Show human-readable results for the K topics based on the cluster results.
Test and Documentation - Test the results of CLDA and finish project documentation.
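
The data flow through Segmentation, getV, Merge and Norm can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the proposal's implementation: the function names (segmentation, get_v, merge, norm), the document layout (a dict with a "tokens" list plus attribute fields), and the choice of L2 normalization are all hypothetical. The local LDA step and the final clustering are left out; the local topics below are hand-written stand-ins for what sLDA would produce.

```python
import math
from collections import defaultdict

def segmentation(corpus, attr):
    """Partition documents into chunks keyed by the value of `attr`."""
    chunks = defaultdict(list)
    for doc in corpus:
        chunks[doc[attr]].append(doc)
    return dict(chunks)

def get_v(corpus):
    """Build the global vocabulary V as a sorted list of unique tokens."""
    return sorted({tok for doc in corpus for tok in doc["tokens"]})

def merge(local_topics, vocab):
    """Re-express each local topic as a row over the global vocabulary.

    Words that a local topic does not mention get weight 0.
    Assumes every word in a local topic also appears in `vocab`.
    """
    index = {word: i for i, word in enumerate(vocab)}
    rows = []
    for topic in local_topics:
        row = [0.0] * len(vocab)
        for word, weight in topic.items():
            row[index[word]] = weight
        rows.append(row)
    return rows

def norm(rows):
    """Scale each row to unit L2 length (assumed normalization)."""
    out = []
    for row in rows:
        length = math.sqrt(sum(x * x for x in row))
        out.append([x / length for x in row] if length else row[:])
    return out

# Toy corpus, segmented by a hypothetical "year" attribute.
corpus = [
    {"year": 2001, "tokens": ["gene", "dna", "cell"]},
    {"year": 2001, "tokens": ["dna", "protein"]},
    {"year": 2002, "tokens": ["neural", "network", "cell"]},
]
chunks = segmentation(corpus, "year")
V = get_v(corpus)

# Stand-in for local LDA output: one word -> weight dict per local topic.
local_topics = [
    {"gene": 0.6, "dna": 0.4},
    {"neural": 0.5, "network": 0.5},
]
U = norm(merge(local_topics, V))
```

Each row of U is a local topic expressed over the same global vocabulary, so rows from different chunks become directly comparable and can be handed to a clustering routine that groups them into K global topics.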