Update 1

After the last presentation, we went back to the drawing board to think the system through. We decomposed it into smaller blocks and then designed the complete system around them.

Advanced Poetry Generation

Essentially, the system has three stages.

  • Pre-processing. Before any selection or generation is made, certain analyses could help improve the final result of the system. This includes identifying the context of each word in the input message, sorting the words according to importance and finding the emotional weight of the input.
  • Shortlisting. The information (words and context) from the previous stage is used, along with some further processing, to shortlist an initial set of poem lines for display to the user.
  • Selection and generation. The emotional weight is added to the mix to select the lines more closely aligned to the initial message.

The process, we believe, allows for the right balance between preserving the context of the message and presenting an original and meaningful poem.

System Overview

In our model, we integrate a number of techniques from different disciplines such as information retrieval and natural language understanding, and augment the system with emotional intelligence to generate a poem which is both meaningful and capable of entertaining the user.

The advanced poetry generation process consists of several additional stages. The system uses three different criteria to shortlist discrete sets of poem lines. The following diagram shows an overview of the process.

[Figure: advancedpoetry.jpg, overview of the advanced poetry generation process]

Term Importance

Given an input message, the words in the message are arranged according to their importance. The importance of a particular word is denoted by a numerical weight. This number is the tf-idf weight and is calculated as explained earlier. The modification is that in this case, the basic entity in the database is a poem line, and not a full poem. Hence, the tf and idf values are calculated with respect to a poem line.
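
As a rough illustration of ranking message words against a database of poem lines, here is a minimal sketch. The function name, the whitespace tokenisation, and the smoothed idf formula are assumptions made for this example; the exact formula is the one explained earlier and may differ. Here tf is taken from the input message and idf from the poem lines.

```python
import math
from collections import Counter

def tfidf_weights(message, poem_lines):
    """Rank the words of `message` by tf-idf, treating each poem line as one document.

    Hypothetical helper: tokenisation and idf smoothing are illustrative choices.
    """
    docs = [set(line.lower().split()) for line in poem_lines]
    n_docs = len(docs)
    words = message.lower().split()
    weights = {}
    for word, count in Counter(words).items():
        tf = count / len(words)                        # frequency within the message
        df = sum(1 for doc in docs if word in doc)     # poem lines containing the word
        idf = math.log((1 + n_docs) / (1 + df)) + 1    # smoothed so that df = 0 is safe
        weights[word] = tf * idf
    # Most important words first.
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)

lines = ["the rose is red", "the sky is blue", "the sea is deep"]
ranked = tfidf_weights("the red rose", lines)
```

In this toy database, "the" appears in every line and therefore ends up at the bottom of the ranking, while "red" and "rose" share the top.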

Word Sense Disambiguation

One key success factor of the system is the ability to make meaningful connections between the user input and the poem lines in the database, resulting in an original and meaningful poem. Word sense disambiguation is necessary for this, and it forms the second part of the analysis. The system uses a part-of-speech (POS) tagger for basic disambiguation. The tagger used in the Blogwall is the English POStagger [1], chosen primarily for its tagging speed and ease of integration.
The input message and each poem line in the database are tagged using the POS tagger. To avoid poems that do not make sense, these tags are used to pick only those poem lines that use a particular keyword, or one of its synonyms, in the same sense as in the input message.
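
The filtering step can be sketched as follows, assuming the lines have already been tagged. The function name, the data layout (lists of word/tag pairs), the synonym table, and the Penn Treebank style tags are all illustrative assumptions, not the actual output format of the tagger.

```python
def lines_matching_sense(keyword, tag, tagged_lines, synonyms=None):
    """Keep only the poem lines that use `keyword` (or a synonym) with the same POS tag.

    `tagged_lines` is a list of lines, each a list of (word, tag) pairs.
    `synonyms` is a hypothetical dict mapping a keyword to its synonyms.
    """
    candidates = {keyword} | set((synonyms or {}).get(keyword, ()))
    return [
        line
        for line in tagged_lines
        if any(word.lower() in candidates and line_tag == tag for word, line_tag in line)
    ]

tagged_lines = [
    [("I", "PRP"), ("love", "VBP"), ("you", "PRP")],                   # "love" as a verb
    [("my", "PRP$"), ("love", "NN"), ("is", "VBZ"), ("gone", "VBN")],  # "love" as a noun
]
noun_matches = lines_matching_sense("love", "NN", tagged_lines)
```

If the input message uses "love" as a noun, only the second line survives the filter.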

Emotional Weight

Analogous to the tf-idf weight described earlier, which ranks words in the input message according to importance, the third analysis is the calculation of an emotional weight. This attaches a numerical value to the mood or emotional content of the message.
The system maintains a database of words that can influence the emotional state of a sentence, along with the corresponding weight of each word along two axes: degree of arousal and degree of pleasantness. The weights are modeled after Russell’s dimensions of emotion [2]. In addition, a database of qualifiers and their corresponding multipliers is maintained. For instance, the phrase “not happy” results in the weights of the word “happy” being multiplied by negative one, which yields a result closer to the emotional weight of “sad”.

The system thus analyses the input message for such emotional words and qualifiers. Ultimately, a numerical value denoting the emotional weight is attached to the message. In the same manner, every poem line in the database is assigned a numerical emotional weight. The system then shortlists the lines whose weights are closest to the weight of the input message.
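
A minimal sketch of the calculation, under several assumptions: the lexicon entries and their values are invented for illustration, a qualifier is assumed to affect only the emotional word that directly follows it, and the weights of the emotional words are simply summed per axis.

```python
# Hypothetical lexicon: word -> (arousal, pleasantness). Values are invented.
EMOTION_LEXICON = {"happy": (0.5, 0.8), "sad": (-0.4, -0.7), "calm": (-0.6, 0.5)}
# Hypothetical qualifiers: word -> multiplier for the emotional word that follows.
QUALIFIERS = {"not": -1.0, "very": 1.5}

def emotional_weight(text):
    """Return an (arousal, pleasantness) pair for `text`."""
    arousal = pleasantness = 0.0
    multiplier = 1.0
    for word in text.lower().split():
        if word in QUALIFIERS:
            multiplier *= QUALIFIERS[word]      # stack consecutive qualifiers
        elif word in EMOTION_LEXICON:
            a, p = EMOTION_LEXICON[word]
            arousal += multiplier * a
            pleasantness += multiplier * p
            multiplier = 1.0                    # qualifier has been consumed
        else:
            multiplier = 1.0                    # qualifier did not precede an emotional word
    return (arousal, pleasantness)

weight = emotional_weight("I am not happy")
```

With these made-up values, "not happy" flips the sign of both axes for "happy", which places the message near "sad" in the arousal/pleasantness plane.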

Final Selection

These three processes all contribute to the final output. First, the significant words are augmented by fetching synonyms from the internet. A second round of tf-idf calculation then yields the most important words from this combined set. These words, together with the contextual tags from the POS tagger, are used to shortlist poem lines: only the poem lines that contain these words in the same context are kept. The final output to the user consists of the shortlisted lines that are closest in emotional weight to the input message.
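
The last step can be sketched as a nearest-neighbour pick over the shortlisted lines. The function name, the use of Euclidean distance in the arousal/pleasantness plane, the cut-off `k`, and the example weights are assumptions for illustration.

```python
import math

def select_closest_lines(message_weight, candidates, k=2):
    """Return the `k` candidate lines whose emotional weight is closest to the message.

    `candidates` is a list of (line_text, (arousal, pleasantness)) pairs that
    have already passed the keyword and POS filters.
    """
    ranked = sorted(candidates, key=lambda c: math.dist(message_weight, c[1]))
    return [line for line, _ in ranked[:k]]

candidates = [
    ("joy fills the morning air", (0.6, 0.9)),
    ("tears fall like rain", (-0.4, -0.7)),
    ("a quiet grey afternoon", (-0.5, -0.2)),
]
poem = select_closest_lines((-0.5, -0.8), candidates, k=2)
```

For a message weighted (-0.5, -0.8), the two gloomier lines are chosen and the joyful one is dropped.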

Comments

After the last presentation, we met a professor here at the School of Computing to discuss how best to apply POS tagging to this project. In addition, as I described earlier, we worked out a higher-level block design of the system at the same time. The description here is the result of the various discussions and meetings about how to innovatively combine the different tools and methods at our disposal.

Any comments welcome.

Bibliography
1. Yoshimasa Tsuruoka and Jun’ichi Tsujii. Bidirectional Inference with the Easiest-First Strategy for Tagging Sequence Data. In Proceedings of HLT/EMNLP, 2005, pp. 467-474.
2. Russell, J. A Circumplex Model of Affect. Journal of Personality and Social Psychology, Vol. 39(6), Dec. 1980, pp. 1161-1178.