Update 2

The first update for this week (3-9 March, 2008). I spent most of my time implementing the system outlined in the last update.

Completed Modules

Most of the blocks were already roughly complete, so the first few days went into testing, debugging and integration. As of today, the following modules are complete:

  • Term weightage. This works for both the input message and synonyms.
  • Synonyms. Fetching synonyms works without a hitch.
  • POS Tagger. Input messages are tagged and the contextual form is stored appropriately.

Progress on integration

I have defined some global variables to make the settings easier to manage. They are:

  • NUMBEROFKEYWORDS. Currently set to 3; the number of keywords selected from the input message.
  • NUMBEROFSYNONYMS. Currently set to 1; the number of synonyms selected for each word.
  • NUMBEROFLINES. Currently set to 3; the maximum number of poetry lines selected for each word.
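As a rough illustration, settings like these could live together as module-level constants. The sketch below uses Python, which is an assumption about the language; the constant names and values are the ones listed above, while the `current_settings` helper is hypothetical and only there to show the values being read in one place.

```python
# Hypothetical settings module; names and values mirror the list above.
NUMBEROFKEYWORDS = 3  # keywords selected from the input message
NUMBEROFSYNONYMS = 1  # synonyms selected for each keyword
NUMBEROFLINES = 3     # maximum number of poetry lines selected for each word


def current_settings():
    """Return the settings as a dict, e.g. for logging (illustrative helper)."""
    return {
        "NUMBEROFKEYWORDS": NUMBEROFKEYWORDS,
        "NUMBEROFSYNONYMS": NUMBEROFSYNONYMS,
        "NUMBEROFLINES": NUMBEROFLINES,
    }
```

Keeping the values in one place means a limit can be changed without touching the code that uses it.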

Except for the POS tagger, the modules above have been integrated and tested. For instance, given a phrase or message, the QueryHandler class will perform the following by calling the appropriate classes and functions:

  • Tag the message. The POS tagger is called, and it appends the appropriate tag to each word in the message. These tags are then read and extracted by the Tagger class.
  • Find and select significant words. Assign a tf-idf weight to each word in the message, and pick the most relevant ones.
  • Fetch synonyms for these words. The synonyms are retrieved from thefreedictionary.com.
  • Find and select significant words from this augmented set. Once again, assign a tf-idf weight to all the synonyms and pick the most significant one.
  • Select poetry lines in the database where the weight of the word is maximum. A quick note here: I am working on integrating the POS tags in the query at this stage.
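The keyword-selection and line-selection steps above can be sketched roughly as follows. This is a minimal Python sketch under several assumptions, not the actual QueryHandler code: the poetry database is replaced by a small in-memory list of tokenized lines, the idf term uses one common smoothed variant, and the function names (`tf_idf_weights`, `select_keywords`, `select_lines`) are hypothetical. POS tagging and the synonym lookup are left out.

```python
import math
from collections import Counter

NUMBEROFKEYWORDS = 3  # values taken from the settings listed earlier
NUMBEROFLINES = 3


def tf_idf_weights(tokens, corpus):
    """Return {word: tf-idf weight} for one message.

    `corpus` is a list of token lists standing in for the poetry database.
    """
    counts = Counter(tokens)
    n_docs = len(corpus)
    weights = {}
    for word, count in counts.items():
        tf = count / len(tokens)
        df = sum(1 for doc in corpus if word in doc)
        # Smoothed idf (an assumed variant): never divides by zero.
        idf = math.log((1 + n_docs) / (1 + df)) + 1
        weights[word] = tf * idf
    return weights


def select_keywords(tokens, corpus, k=NUMBEROFKEYWORDS):
    """Pick the k highest-weighted words; ties are broken alphabetically."""
    weights = tf_idf_weights(tokens, corpus)
    return sorted(weights, key=lambda w: (-weights[w], w))[:k]


def select_lines(word, corpus, n=NUMBEROFLINES):
    """Pick up to n lines in which `word` has the highest term frequency."""
    scored = [(doc.count(word) / len(doc), i)
              for i, doc in enumerate(corpus) if word in doc]
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [corpus[i] for _, i in scored[:n]]


# Toy data, invented for illustration only.
corpus = [
    "the sea is calm".split(),
    "the night is dark".split(),
    "the moon is bright".split(),
]
message = "the sea is dark tonight".split()

print(select_keywords(message, corpus))  # ['tonight', 'dark', 'sea']
print(select_lines("sea", corpus))       # [['the', 'sea', 'is', 'calm']]
```

Words that appear in every line, such as "the" and "is", get the lowest weight, so they drop out of the selection without an explicit stop-word list.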

Work in progress

The final step is to take into account the emotional weight of the message. This is a separate module, and it is the focus of my work for the rest of this week.

