WordSmith Tools Manual

Key Words display


The display shows

each key word

its frequency in the source text(s) in which these words are key (Freq. column)

the % that frequency represents

the number of texts it was present in

its frequency in the reference corpus (RC. Freq. column)

the reference corpus frequency as a %

BIC score indicating keyness

Log likelihood statistic of keyness

Log ratio statistic showing strength of keyness

p value

lemmas (any which have been joined to each other)

the user-defined set
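
The numeric columns listed above can all be derived from four counts: the word's frequency in the text(s), the total running words in the text(s), and the same two figures for the reference corpus. The sketch below is a minimal illustration, not WordSmith's own code. It assumes the standard log likelihood (G2) formula, a BIC approximation of LL minus ln(N) with N the combined corpus size, a binary-log ratio of relative frequencies with zero counts replaced by 0.5, and a p value from the chi-square distribution with one degree of freedom. The function name keyness_row and its dictionary keys are hypothetical, and WordSmith's exact formulas and rounding may differ.

```python
import math


def keyness_row(freq, text_tokens, rc_freq, rc_tokens):
    """Sketch of the figures shown for one word in a key words display.

    freq, text_tokens: the word's frequency and the total running words
        in the source text(s)
    rc_freq, rc_tokens: the same two counts for the reference corpus
    """
    total = text_tokens + rc_tokens
    combined = freq + rc_freq
    # Expected frequencies if the word were equally common in both corpora
    expected_text = text_tokens * combined / total
    expected_rc = rc_tokens * combined / total
    # Log likelihood (G2); a zero observed count contributes nothing
    ll = 0.0
    if freq > 0:
        ll += freq * math.log(freq / expected_text)
    if rc_freq > 0:
        ll += rc_freq * math.log(rc_freq / expected_rc)
    ll = max(2 * ll, 0.0)  # guard against tiny negative rounding errors
    # Assumed BIC approximation for one degree of freedom: LL - ln(N)
    bic = ll - math.log(total)
    # Log ratio: binary log of the ratio of relative frequencies;
    # a zero frequency is replaced by 0.5 (assumption)
    rel_text = max(freq, 0.5) / text_tokens
    rel_rc = max(rc_freq, 0.5) / rc_tokens
    log_ratio = math.log2(rel_text / rel_rc)
    # p value of LL read as a chi-square statistic with 1 degree of freedom
    p = math.erfc(math.sqrt(ll / 2))
    return {
        "freq_pct": 100 * freq / text_tokens,
        "rc_freq_pct": 100 * rc_freq / rc_tokens,
        "log_likelihood": ll,
        "bic": bic,
        "log_ratio": log_ratio,
        "p": p,
    }


# Illustrative counts: 1% of the text versus 0.1% of the reference corpus
row = keyness_row(freq=100, text_tokens=10_000, rc_freq=100, rc_tokens=100_000)
print(row)
```

Here the word is ten times more frequent (relatively) in the text, so its log ratio is log2(10), a little over 3.3.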




The criterion for what counts as "outstanding" is based on the minimum probability value selected before the key words were calculated: the smaller this value, the fewer key words appear in the display. Usually you won't want to handle more than about 40 key words.

The words appear sorted according to how outstanding their frequencies of occurrence are. Those near the top are outstandingly frequent.
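
The selection and ordering described above can be sketched as a filter followed by a sort. This is an illustration under assumptions, not WordSmith's implementation: it treats the probability setting as a threshold that each word's p value must not exceed, assumes log likelihood is the measure of how outstanding a word is, and ignores negative key words, which are listed separately. The function select_key_words, its max_words cap and the row layout are hypothetical, and the p and LL values are illustrative only.

```python
def select_key_words(rows, max_p, max_words=None):
    """Keep significant words and put the most outstanding first.

    rows: list of dicts with hypothetical keys "word", "p" and "log_likelihood"
    max_p: probability threshold chosen before computing key words
    max_words: optional cap on the number of key words returned
    """
    # A smaller threshold lets fewer words through
    kept = [r for r in rows if r["p"] <= max_p]
    # Assumption: outstandingness is measured by log likelihood
    kept.sort(key=lambda r: r["log_likelihood"], reverse=True)
    return kept if max_words is None else kept[:max_words]


# Illustrative rows, not real WordSmith output
rows = [
    {"word": "a", "p": 1e-3, "log_likelihood": 10.8},
    {"word": "b", "p": 1e-8, "log_likelihood": 32.8},
    {"word": "c", "p": 1e-12, "log_likelihood": 50.0},
]
print([r["word"] for r in select_key_words(rows, max_p=1e-6)])
print([r["word"] for r in select_key_words(rows, max_p=1e-10)])
```

Tightening max_p from 1e-6 to 1e-10 drops word "b", which mirrors the point that a smaller probability value yields fewer key words.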


Negative KWs

At the end of the listing you'll find any words which are outstandingly infrequent (negative key words), shown in a different colour.


In the Othello example, ALAS is the last ordinary (positive) KW: its frequency of 0.09% is about three times its frequency in the reference corpus (0.03%). OUR and WE each occur at 0.20% in the play Othello but much more often (0.37% and 0.40%) in the reference corpus, so they are considered negatively key: outstandingly infrequent. (The Log Ratio threshold does not apply to negative KWs.)
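
The positive/negative distinction in the Othello example can be sketched as a comparison of relative frequencies, applying a log ratio threshold to positive words only. This is a minimal illustration, assuming the word has already passed the significance test; classify and its min_log_ratio parameter are hypothetical names, and the percentages are the rounded figures quoted above, so the ratios are approximate.

```python
import math


def classify(text_pct, rc_pct, min_log_ratio=0.0):
    """Label a word that has already passed the significance test.

    text_pct, rc_pct: relative frequencies (%) in the text and the reference corpus
    min_log_ratio: hypothetical log ratio threshold for positive key words
    Returns "positive", "negative", or None if a positive word falls
    below the log ratio threshold.
    """
    if text_pct < rc_pct:
        # Outstandingly infrequent: the log ratio threshold is not applied
        return "negative"
    if math.log2(text_pct / rc_pct) >= min_log_ratio:
        return "positive"
    return None


# Relative frequencies (%) quoted in the Othello example
for word, text_pct, rc_pct in [("ALAS", 0.09, 0.03), ("OUR", 0.20, 0.37), ("WE", 0.20, 0.40)]:
    print(word, classify(text_pct, rc_pct), round(text_pct / rc_pct, 2))
```

With min_log_ratio set to 2 (four times more frequent), ALAS, at roughly three times its reference frequency, would no longer count as positively key, while OUR and WE would still be reported as negative.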


Limits, Strength, Interpretation

There is no upper limit to the Log_L column of a set of key words. Do not assume that the word with the highest log likelihood keyness value must be the most outstanding: keyness is computed purely statistically. There will be cases where several items are obviously equally key to a human reader, but the one found least often in the reference corpus and most often in the text itself will be at the top of the list.
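
Why log likelihood has no upper limit can be seen by holding the relative frequencies fixed and scaling up the amount of data. The sketch below assumes the standard G2 log likelihood formula (WordSmith's implementation may differ in detail); the function name log_likelihood and the counts are illustrative. The ratio of relative frequencies stays at 10 throughout, yet LL grows tenfold with each step, so a high LL reflects the weight of evidence more than the strength of the difference.

```python
import math


def log_likelihood(freq, text_tokens, rc_freq, rc_tokens):
    """Log likelihood (G2) keyness for one word; zero counts contribute 0."""
    total = text_tokens + rc_tokens
    combined = freq + rc_freq
    ll = 0.0
    for observed, tokens in ((freq, text_tokens), (rc_freq, rc_tokens)):
        if observed > 0:
            # Expected count if the word were equally common in both corpora
            expected = tokens * combined / total
            ll += observed * math.log(observed / expected)
    return 2 * ll


# Same relative frequencies each time (1% in the text, 0.1% in the
# reference corpus), but ten times more data at each step
for scale in (1, 10, 100):
    ll = log_likelihood(10 * scale, 1_000 * scale, 10 * scale, 10_000 * scale)
    print(f"scale {scale}: LL = {ll:.1f}")
```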

The log ratio statistic may help you understand the strength of keyness. Log ratio is a binary logarithm, so each additional point doubles the ratio: in this case the word handkerchief scores between 4 and 5, which suggests it occurs between 16 (2⁴) and 32 (2⁵) times more often in this play than in the whole set of Shakespeare plays used as the reference corpus. BIC scores relate the log likelihood score to the size of the two corpora being compared.
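
The two interpretations above can be sketched in a few lines. Converting a log ratio back into a "times more often" factor only requires raising 2 to that power. For BIC, this sketch assumes the common one-degree-of-freedom approximation LL minus ln(N), where N is the combined size of the two corpora; that is an assumption about WordSmith's calculation, not a documented formula. Both function names are hypothetical.

```python
import math


def times_more_often(log_ratio):
    """Log ratio is a binary logarithm, so 2 ** log_ratio gives the factor."""
    return 2 ** log_ratio


def bic_from_ll(ll, total_tokens):
    """Assumed BIC approximation (1 degree of freedom): LL - ln(N),
    where N is the combined size of the two corpora."""
    return ll - math.log(total_tokens)


# A log ratio between 4 and 5 means 16 to 32 times more often
print(times_more_often(4), times_more_often(5))
# The same LL earns a lower BIC when the corpora being compared are larger
print(bic_from_ll(50.0, 100_000), bic_from_ll(50.0, 10_000_000))
```

Under this approximation, the penalty subtracted from LL grows only logarithmically with corpus size.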


Source text

As its name suggests, choosing the source text tab takes you to a view of the source text(s).