Topicalizer

Topicalizer is a tool that processes a set of SOAP API descriptors, groups the technical information they contain into semantically related categories, and records this categorization as RDF statements stored in a Sesame triple store. As a first step, the tool applies text processing procedures to the service descriptors to extract the relevant technical information (operations, documentation and datatypes). Once this information is available, the tool fits a probabilistic topic model, known as Online LDA, to infer a set of relevant categories (topics, that is, distributions over terms in a fixed vocabulary), and associates a probability distribution over those topics with each service operation it processes.
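The extraction step described above can be illustrated with a minimal sketch. It assumes the descriptors are WSDL 1.1 documents and uses only Python's standard library. The sample document and the `extract_operations` helper are hypothetical and are not part of Topicalizer itself.

```python
import xml.etree.ElementTree as ET

# WSDL 1.1 namespace (assumption: the SOAP descriptors are WSDL 1.1 files).
WSDL_NS = {"wsdl": "http://schemas.xmlsoap.org/wsdl/"}

# Hypothetical descriptor used only for illustration.
SAMPLE_WSDL = """<definitions xmlns="http://schemas.xmlsoap.org/wsdl/" name="WeatherService">
  <portType name="WeatherPort">
    <operation name="GetForecast">
      <documentation>Returns the weather forecast for a city.</documentation>
    </operation>
    <operation name="GetStations"/>
  </portType>
</definitions>"""


def extract_operations(wsdl_text):
    """Return (operation name, documentation text) pairs found in a WSDL string."""
    root = ET.fromstring(wsdl_text)
    operations = []
    for op in root.findall(".//wsdl:portType/wsdl:operation", WSDL_NS):
        doc = op.find("wsdl:documentation", WSDL_NS)
        # Operations without a <documentation> child get an empty string.
        text = doc.text.strip() if doc is not None and doc.text else ""
        operations.append((op.get("name"), text))
    return operations


for name, doc in extract_operations(SAMPLE_WSDL):
    print(name, "->", doc)
```

Each pair could then serve as one document of the corpus on which the topic model is fitted; how Topicalizer actually tokenizes and stores this text is not shown here.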

The Online LDA processing is based on the implementation of ONLINE VARIATIONAL BAYES FOR LATENT DIRICHLET ALLOCATION by Matthew D. Hoffman, which uses the online Variational Bayes (VB) algorithm presented in the paper “Online Learning for Latent Dirichlet Allocation” by Matthew D. Hoffman, David M. Blei, and Francis Bach.

The algorithm uses stochastic optimization to maximize the variational objective function for the Latent Dirichlet Allocation (LDA) topic model. In each iteration it looks at only a subset of the total corpus of documents (here, text files whose content has been extracted from Web API documentation archives), and is therefore able to find a locally optimal setting of the variational posterior over the topics more quickly than a batch VB algorithm could for large corpora.
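As a rough sketch of the stochastic update at the core of this approach, the snippet below shows the step-size schedule and the blending of the topic parameters described in the Hoffman, Blei and Bach paper. The function names, the default values for `eta`, `tau0` and `kappa`, and the assumption that the mini-batch sufficient statistics `sstats` have already been computed by an E-step are all illustrative; this is not the code shipped with Topicalizer.

```python
import numpy as np


def step_size(t, tau0=1.0, kappa=0.7):
    """Learning rate rho_t = (tau0 + t) ** -kappa for mini-batch number t."""
    return (tau0 + t) ** (-kappa)


def blend_topics(lam, sstats, t, corpus_size, batch_size,
                 eta=0.01, tau0=1.0, kappa=0.7):
    """Blend the current topic parameters with a mini-batch estimate.

    lam    : (topics x vocabulary) array of current variational parameters
    sstats : (topics x vocabulary) expected word counts from this mini-batch
             (assumed to come from an E-step that is not shown here)
    """
    rho = step_size(t, tau0, kappa)
    # Estimate of lambda as if the whole corpus looked like this mini-batch.
    lam_hat = eta + (corpus_size / batch_size) * sstats
    # Weighted average of the old parameters and the mini-batch estimate.
    return (1.0 - rho) * lam + rho * lam_hat


# Usage with made-up numbers: 2 topics, 3 vocabulary terms.
lam = np.ones((2, 3))
sstats = np.array([[2.0, 0.0, 1.0],
                   [0.0, 3.0, 0.0]])
lam = blend_topics(lam, sstats, t=1, corpus_size=1000, batch_size=10)
print(lam)
```

Because the step size shrinks as `t` grows, later mini-batches move the topics less and less, which is what lets the procedure settle on a locally optimal solution without revisiting the whole corpus at every iteration.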

Files/Directories provided:

You will need to have the numpy and scipy packages installed somewhere that Python can find them to use these scripts.

System Requirements:

Initial Settings

  1. In MySQL, create a database named service_registry.
  2. Deploy both of the Sesame Framework .war files to your servlet container. After you have deployed the Sesame Server webapp, you should be able to access it, by default, at the path /openrdf-sesame (/openrdf-sesame/home/overview.view for Apache Tomcat 7).
  3. Create a new Native Java Store with ID WebAPIModel in the Sesame Server, by accessing http://localhost:8080/openrdf-workbench/ -> New repository.
  4. Grant execute permission on the run.sh script. Open a terminal and type:

    $ chmod u+x run.sh
    

Running