[CSCI 203] Final Project: TextCloud

A text cloud is a collection of the most commonly used words within some body of text, usually with attention paid to avoiding extremely common words (e.g., “the”) and to unifying different forms of a single word (e.g., “aliens” and “alien” are treated as the same word). Text clouds give us a quick sense of the topic of a website by visualizing the words that occur most frequently on it, with each word displayed at a size proportional to its frequency.
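The counting step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the project's actual code: `STOP_WORDS` is a tiny hypothetical stop-word list, `naive_stem` is a deliberately crude suffix stripper that only merges simple plurals (it is not a real stemming algorithm such as Porter's), and `font_sizes` scales sizes linearly so the most frequent word gets a hypothetical `max_size`.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real text cloud would use a much longer one.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "are"}


def naive_stem(word: str) -> str:
    """Crude plural merger (an assumption, not a real stemmer): "aliens" -> "alien"."""
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def word_frequencies(text: str) -> Counter:
    """Lower-case the text, split it into words, drop stop words, merge plurals."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(naive_stem(w) for w in words if w not in STOP_WORDS)


def font_sizes(freqs: Counter, top_n: int = 10, max_size: float = 48.0) -> dict:
    """Give the top_n words a size proportional to their count."""
    top = freqs.most_common(top_n)
    if not top:
        return {}
    highest = top[0][1]
    return {word: max_size * count / highest for word, count in top}
```

On the input “The aliens landed. An alien waved, and the aliens left.”, “aliens” and “alien” merge into a single entry with a count of 3, while “the”, “an”, and “and” are dropped; “alien” then gets the full size of 48 and each word seen once gets 16.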

Text clouds are, in essence, a word-by-word summary of the contents of an article, book, or other work. Although the structure of the document is certainly lost, the relative frequency of particular words can be very useful for understanding the topic and genre of the text.

To skip CSCI 203, I have to complete its final project: generating a text cloud. It is a very interesting project that involves a lot of functional programming, and I like it a lot. The most challenging parts are the stemming and the use of VPython for the visual effects of the text cloud, since together they consume most of the time I spend on the project. My program constructs a 3D “tornado” text cloud: the word with the most appearances appears at the bottom of the tornado and then moves. Besides text, we can assign positions to other objects, such as images, which creates an interesting way of representing information.
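One way to get a “tornado” shape is to place words on a helix whose radius widens as it rises, with the most frequent word at the bottom. The sketch below computes only the coordinates, assuming a list of words already sorted from most to least frequent; every parameter (`rise`, `base_radius`, `spread`, `turn`, `phase`) is a hypothetical choice, not taken from the project. The resulting positions could then be handed to whatever VPython object draws each word (or an image), and changing `phase` from frame to frame is one assumed way to make the funnel spin.

```python
import math
from typing import List, Tuple

Position = Tuple[float, float, float]


def tornado_layout(
    ranked_words: List[str],
    rise: float = 1.0,          # vertical gap between consecutive words (assumed)
    base_radius: float = 0.5,   # radius at the bottom of the funnel (assumed)
    spread: float = 0.4,        # how much the radius grows per word (assumed)
    turn: float = math.pi / 4,  # angle between consecutive words (assumed)
    phase: float = 0.0,         # extra rotation; varying it over time spins the funnel
) -> List[Tuple[str, Position]]:
    """Place words on a widening helix, index 0 (most frequent) at the bottom."""
    layout = []
    for i, word in enumerate(ranked_words):
        angle = phase + i * turn
        radius = base_radius + i * spread
        x = radius * math.cos(angle)
        z = radius * math.sin(angle)
        y = i * rise  # treat y as the vertical axis
        layout.append((word, (x, y, z)))
    return layout
```

Growing the radius linearly with rank makes the rarer words fan out toward the top, which gives the funnel its tornado look while keeping the most frequent word anchored at the narrow base.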

Figure 1: TextCloud results

The code for this project can be found at my GitHub: csci203_textCloud