
Tag Cloud Guide

When to use a tag cloud

A tag cloud is a visualization of word frequencies. Our tag cloud enables you to see how frequently words appear in a given text, or see the relationship between a column of words and a column of numbers.

The size of each word corresponds to the quantity associated with it. For instance, if your dataset is a plot summary of "The Godfather", you are likely to see frequently occurring words like "corleone" and "mafia" drawn larger than words like "open" or "restaurant".
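To make the size-to-quantity relationship concrete, here is a minimal sketch that maps a word's count linearly onto a font size. The linear scale, the hypothetical `font_size` helper, the 10px to 48px range, and the counts are all made-up assumptions for illustration; the guide does not say which scaling formula the tag cloud actually uses.

```python
def font_size(count, min_count, max_count, min_px=10, max_px=48):
    """Map a count linearly onto a font size in pixels (hypothetical scale)."""
    if max_count == min_count:
        # Every word has the same count, so draw them all at the midpoint size.
        return (min_px + max_px) / 2
    fraction = (count - min_count) / (max_count - min_count)
    return min_px + fraction * (max_px - min_px)


# Made-up counts, for illustration only.
counts = {"corleone": 40, "mafia": 25, "restaurant": 2, "open": 1}
lo, hi = min(counts.values()), max(counts.values())
for word, n in counts.items():
    print(f"{word}: {font_size(n, lo, hi):.1f}px")
```

A linear scale is the simplest choice. Some tag clouds use a logarithmic or square-root scale instead, so that a few very frequent words do not dwarf everything else.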

How our tag cloud works

The Many Eyes tag cloud can show one of two kinds of data: free text, or a two-column table of tags and numbers.

If you choose free text, the tag cloud strips out punctuation, calculates the frequency of each word, and draws each word at a size based on its frequency. It also ignores common words in some languages, such as "the" in English.
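The free-text steps above (strip punctuation, count words, skip common words) can be sketched as follows. This is a simplified illustration, not the Many Eyes implementation. The `word_frequencies` helper is hypothetical, and the tiny `STOPWORDS` set is an assumption standing in for a real, much longer list.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny English stopword list.
STOPWORDS = {"the", "a", "an", "and", "of", "in", "is", "to", "it"}


def word_frequencies(text):
    """Lowercase the text, drop punctuation and stopwords, and count the words."""
    # A word is a run of letters/digits, optionally with inner apostrophes ("i've").
    words = re.findall(r"\w+(?:'\w+)*", text.lower())
    return Counter(w for w in words if w not in STOPWORDS)


freqs = word_frequencies("Whose woods these are I think I know. "
                         "To watch his woods fill up with snow.")
print(freqs.most_common(3))
```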

When you hover the mouse over a word, a tooltip shows how often that word occurs and the contexts in which it is used.
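The context shown in the tooltip is essentially a keyword-in-context lookup. A minimal sketch, assuming a context of a fixed number of words on either side of each occurrence (the hypothetical `contexts` helper and its `width` parameter are illustrative, not taken from the tag cloud itself):

```python
import re


def contexts(text, word, width=3):
    """Return each occurrence of `word` with up to `width` words on either side."""
    tokens = re.findall(r"\w+(?:'\w+)*", text)
    target = word.lower()
    snippets = []
    for i, tok in enumerate(tokens):
        if tok.lower() == target:
            left = tokens[max(0, i - width):i]
            right = tokens[i + 1:i + 1 + width]
            snippets.append(" ".join(left + [tok] + right))
    return snippets


poem = "Whose woods these are I think I know. To watch his woods fill up with snow."
for snippet in contexts(poem, "woods", width=2):
    print(snippet)
```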


Optionally, you can look at the frequency of pairs of consecutive words in the text by selecting the "2 word" radio button at the top.
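Counting pairs of consecutive words amounts to counting bigrams. The sketch below is an assumption-laden illustration: the hypothetical `bigram_frequencies` helper keeps every word (the guide does not say whether stopwords are removed in two-word mode) and lets pairs span punctuation such as commas.

```python
import re
from collections import Counter


def bigram_frequencies(text):
    """Count pairs of consecutive words, lowercased, as 'first second' strings."""
    words = re.findall(r"\w+(?:'\w+)*", text.lower())
    return Counter(f"{a} {b}" for a, b in zip(words, words[1:]))


text = "Some say the world will end in fire, Some say in ice."
print(bigram_frequencies(text).most_common(2))
```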


If you choose the two-column format, the tag cloud assumes that the text column contains the tags and the numeric column contains each tag's frequency. In this format, contextual information and two-word tag clouds are unavailable.


Search and selection

To look for specific tags in the tag cloud, click on the search box and start typing. Every time you hit a key, the cloud shows the tags that start with the letters you've typed.
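The search box behaves like a prefix filter that runs again on every keystroke. Here is a minimal sketch, assuming case-insensitive matching (the hypothetical `matching_tags` helper is illustrative only):

```python
def matching_tags(tags, query):
    """Return the tags that start with the typed query, ignoring case."""
    prefix = query.lower()
    return [tag for tag in tags if tag.lower().startswith(prefix)]


tags = ["woods", "wood", "snow", "Watch"]
# Simulate typing "w", then "wo", one keystroke at a time.
for typed in ("w", "wo"):
    print(typed, matching_tags(tags, typed))
```

An empty query matches every tag, so clearing the search box restores the full cloud.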


To highlight a tag in the cloud, click it and it will be marked in orange. To highlight more than one tag, hold down the Ctrl or Shift key as you click. Any modifications you make, including searches and highlights, are saved when you publish a tag cloud or comment on it.
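The highlighting rule can be modeled as a small update to a set of selected tags. In this sketch a plain click replaces the selection, and a click with Ctrl or Shift held adds to it. The hypothetical `update_selection` helper is an assumption, and so is the "add" behavior: the guide does not say whether a modified click on an already highlighted tag removes it.

```python
def update_selection(selected, tag, multi_select):
    """Return the new set of highlighted tags after a click on `tag`.

    `multi_select` is True when Ctrl or Shift is held down during the click.
    """
    if multi_select:
        # Keep the existing highlights and add the clicked tag.
        return selected | {tag}
    # A plain click highlights only the clicked tag.
    return {tag}


selection = update_selection(set(), "fire", multi_select=False)
selection = update_selection(selection, "ice", multi_select=True)
print(sorted(selection))
```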


Using the tag cloud to compare two texts

People often use tag clouds to compare two different bodies of text, so we made it easier to compare two texts in a single tag cloud. You can upload a text file containing two separate text fragments (see the next section for format details), and the tag cloud will show a view that compares the frequencies of the most common words in both texts. Here is a sample that compares the 2008 State of the Union address with the 2007 address:

The most frequent terms from the 2008 address are shown in orange, and those from the 2007 address in blue. Terms that occurred in both speeches are paired together. The size of each term indicates its relative frequency in its text. Hovering over a tag brings up its specific context, along with both its absolute and relative frequencies.
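The comparison view can be sketched as follows: take the most frequent terms from each text, compute each term's absolute and relative frequency in both texts, and mark terms that appear in both so they can be paired. Two points are assumptions. "Relative frequency" is taken as a term's count divided by the total number of counted words in that text, because the guide does not define it. The `compare_texts` helper and its `top_n` parameter are hypothetical.

```python
from collections import Counter


def compare_texts(counts_a, counts_b, top_n=50):
    """Pair the most frequent terms of two texts with absolute and relative counts."""
    total_a = sum(counts_a.values()) or 1  # avoid dividing by zero for empty texts
    total_b = sum(counts_b.values()) or 1
    terms = {w for w, _ in counts_a.most_common(top_n)}
    terms |= {w for w, _ in counts_b.most_common(top_n)}
    rows = []
    for term in sorted(terms):
        a, b = counts_a[term], counts_b[term]  # Counter returns 0 for missing terms
        rows.append({
            "term": term,
            "abs": (a, b),
            "rel": (a / total_a, b / total_b),
            "shared": a > 0 and b > 0,  # occurs in both texts, so draw as a pair
        })
    return rows


first = Counter(fire=2, ice=1, desire=1)
second = Counter(ice=2, hate=1, perish=1)
for row in compare_texts(first, second):
    print(row)
```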

Data requirements

The tag cloud can accept either free text or tabular data. If you choose tabular data, your dataset should have a single text column with one or more rows. You may put all of the text into a single row or split it across multiple rows. In this format, be careful to remove any tab characters from your text. If your dataset also has a numeric column, the tag cloud interprets the numbers in that column as quantities for the text in the same row.
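Stray tab characters matter because, in tabular data, a tab marks a column boundary. Here is a minimal sketch of preparing such a dataset, assuming tab-separated columns and replacing any tab inside a cell with a space. The `to_tsv` helper is hypothetical.

```python
def to_tsv(rows, header=("Text",)):
    """Build a tab-separated dataset, replacing stray tabs in each cell with spaces."""
    lines = ["\t".join(header)]
    for row in rows:
        cells = [str(cell).replace("\t", " ") for cell in row]
        lines.append("\t".join(cells))
    return "\n".join(lines)


print(to_tsv([("Whose woods\tthese are I think I know.",)]))
print(to_tsv([("Tag", 45), ("Cloud", 55)], header=("Word", "Occurrences")))
```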

If you wish to compare two fragments of text in a single tag cloud, precede each fragment with a single line consisting of a title flanked by dashes:

----------------My fragment's title------------------

The exact number of dashes doesn't matter as long as there are more than five. At the moment, a dataset can contain at most two fragments.
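A parser for this format can recognize a title line by its runs of dashes. The sketch below assumes that "more than five" applies to each side of the title, that text before the first title line is ignored, and that more than two fragments is an error. The `HEADER` pattern and the `split_fragments` helper are hypothetical.

```python
import re

# Assumed title-line pattern: six or more dashes, a title, six or more dashes.
HEADER = re.compile(r"^-{6,}(.+?)-{6,}$")
MAX_FRAGMENTS = 2


def split_fragments(text):
    """Split text into (title, body) pairs using dash-flanked title lines."""
    fragments = []
    for line in text.splitlines():
        match = HEADER.match(line.strip())
        if match:
            fragments.append((match.group(1).strip(), []))
        elif fragments:
            fragments[-1][1].append(line)
    if len(fragments) > MAX_FRAGMENTS:
        raise ValueError(f"at most {MAX_FRAGMENTS} fragments are supported")
    return [(title, "\n".join(lines)) for title, lines in fragments]


sample = """----------------Fragment 1----------------
Some say the world will end in fire,
Some say in ice.
----------------Fragment 2----------------
But if it had to perish twice,"""
for title, body in split_fragments(sample):
    print(title, "->", body.splitlines()[0])
```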

To keep performance at interactive levels, the tag cloud imposes an upper limit on the number of tags shown, although filters still act on all tags, whether or not they are visible. Commonly occurring words (also known as stopwords) in a number of languages (English, Deutsch, Français, Italiano, Español, Nederlands, Português, Русский and العربية) are automatically filtered out. Other languages and character sets (such as Hindi) should also work, but stopwords in those languages will not be filtered out.
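The distinction between the display limit and the filters comes down to order of operations: filter the full set of tags first, then keep only the most frequent matches for display. As a result, a rare tag that is hidden in the full view can still appear when you search for it. In this sketch the `visible_tags` helper and its `limit` of 100 are hypothetical; the guide does not state the actual limit.

```python
from collections import Counter


def visible_tags(counts, query="", limit=100):
    """Filter all tags by a search prefix, then keep only the `limit` most frequent."""
    matches = Counter({t: n for t, n in counts.items() if t.startswith(query.lower())})
    return matches.most_common(limit)


counts = Counter({"fire": 3, "ice": 2, "desire": 1})
print(visible_tags(counts, limit=2))        # "desire" is hidden by the limit
print(visible_tags(counts, "de", limit=2))  # but a search still finds it
```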

Here is a sample two-column dataset:

Word Occurrences
Tag 45
Cloud 55
Mass 10
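Reading the two-column sample above amounts to mapping each tag to its number. This sketch makes three assumptions: the columns are tab-separated, the first row is a header, and a tag that appears on more than one row has its numbers summed (the guide does not say how duplicates are handled). The `parse_two_column` helper is hypothetical.

```python
def parse_two_column(tsv):
    """Parse a tab-separated tag/number table into a {tag: frequency} dict."""
    frequencies = {}
    for line in tsv.strip().splitlines()[1:]:  # skip the header row
        tag, number = line.split("\t")
        frequencies[tag] = frequencies.get(tag, 0.0) + float(number)
    return frequencies


table = "Word\tOccurrences\nTag\t45\nCloud\t55\nMass\t10"
print(parse_two_column(table))
```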

Here is a sample free text dataset:

Text
Whose woods these are I think I know.
His house is in the village though;
He will not see me stopping here
To watch his woods fill up with snow.
My little horse must think it queer
To stop without a farmhouse near
Between the woods and frozen lake
The darkest evening of the year.

And here is a sample of a dataset consisting of two separate fragments:

----------------Fragment 1----------------
Some say the world will end in fire,
Some say in ice.
From what I've tasted of desire
I hold with those who favor fire.
----------------Fragment 2----------------
But if it had to perish twice,
I think I know enough of hate
To say that for destruction ice
Is also great
And would suffice.

Expert Notes

Tag clouds have several benefits: they are extremely simple, easy to read, and by their nature don't suffer from the labeling problems of bar charts, treemaps or bubble charts. Yet there is some controversy around tag clouds, partly due to their strong association with trendy web sites (one wag dubbed tag clouds the "mullets of Web 2.0").

More seriously, in tag clouds long words are emphasized over short words, and words whose letters contain many ascenders and descenders may receive undue attention as well. Indeed, recent work from CUE, to appear at CHI 2007, suggests that in some circumstances tag clouds are no more effective than simple lists.

How should you balance the potential pluses and minuses? In choosing a tag cloud, keep in mind the alternatives, especially a plain table, a bar chart, and a bubble chart. Our current view is that the legibility and potential data density of tag clouds make them well-suited to large texts and collections of tags.