The "key words" are calculated by comparing the frequency of each word in the word-list of the text you're interested in (study corpus) with the frequency of the same word in another word-list (comparison corpus). All words which appear in the smaller list are considered, unless they are in a stop list.
If "the" occurs, say, 5% of the time in the study corpus and 6% of the time in the comparison corpus, it will not turn out to be "key", though it may well be the most frequent word. If the text concerns the anatomy of spiders, it may well turn out that the names of the researchers, and the items spider, leg, eight, etc. are more frequent than they would otherwise be in your comparison corpus (unless your comparison corpus only concerns spiders!)
To compute the "keyness" of an item, the program therefore computes
•its frequency in the small word-list
•the number of running words in the small word-list
•its frequency in the comparison corpus
•the number of running words in the comparison corpus
and cross-tabulates these.
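As a rough sketch of that cross-tabulation, the two-term log-likelihood formula published on UCREL's log likelihood site can be written as below. The function name and the frequencies are invented for illustration; WordSmith's internal implementation may differ in detail.

```python
import math

def log_likelihood(freq_study, total_study, freq_ref, total_ref):
    """Log-likelihood (G2) for one word, cross-tabulating its frequency
    against the running-word totals of the two corpora (UCREL's
    two-term formulation; a sketch, not WordSmith's exact code)."""
    combined = freq_study + freq_ref
    total = total_study + total_ref
    # Expected frequencies if the word were equally likely in both corpora.
    expected_study = total_study * combined / total
    expected_ref = total_ref * combined / total
    g2 = 0.0
    if freq_study > 0:
        g2 += freq_study * math.log(freq_study / expected_study)
    if freq_ref > 0:
        g2 += freq_ref * math.log(freq_ref / expected_ref)
    return 2 * g2

# "spider" occurs 120 times in a 50,000-word study corpus but only
# 40 times in a 1,000,000-word comparison corpus:
print(round(log_likelihood(120, 50_000, 40, 1_000_000), 1))
```

The higher the score, the stronger the evidence that the word's frequency in the study corpus differs from what the comparison corpus would lead you to expect.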
A word will get into the listing here if it is unusually frequent (or unusually infrequent) in comparison with what one would expect on the basis of the comparison word-list.
Unusually infrequent key-words are called "negative key-words" and appear at the very end of your listing, in a different colour. Note that negative key-words will be omitted automatically from a keywords database and a plot.
Text dispersion keyness
Egbert and Biber (2019) propose that text dispersion key words can be computed by comparing the number of texts each word appears in, in both the study corpus and the reference corpus (instead of comparing word frequencies).
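A minimal sketch of that counting step (the document-frequency count only, not Egbert and Biber's full statistic; the function name and sample texts are invented):

```python
from collections import Counter

def text_dispersion(corpus_texts):
    """Count, for each word, how many texts it appears in
    (its document frequency). Each text counts at most once
    per word, however often the word recurs within it."""
    doc_freq = Counter()
    for text in corpus_texts:
        for word in set(text.lower().split()):
            doc_freq[word] += 1
    return doc_freq

study = ["the spider has eight legs",
         "a spider web",
         "legs of the spider"]
print(text_dispersion(study)["spider"])  # -> 3: found in all three texts
```

The same count is made for the reference corpus, and the two dispersion figures are then compared in place of raw frequencies.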
Statistical tests
Three statistical tests are computed:
•Ted Dunning's Log Likelihood test, which measures keyness in terms of statistical significance and is considered more appropriate than chi-square, especially when contrasting long texts or a whole genre against your reference corpus.
•Log Ratio: Andrew Hardie's procedure, emphasizing the size of the keyness effect as opposed to its statistical significance (related to the %DIFF procedure from Costas Gabrielatos & Anna Marchi, but producing smaller, easier-to-understand numbers). A value of 2 means the item is 4 times more frequent in the small word list than in the comparison corpus list; a value of 3 means it's 8 times more frequent; a value of 4 means it's 16 times more frequent.
•BIC Score. Effectively an alternative to p scores. It uses the log likelihood score and the size of the two corpora in its formula. You can leave your p value at 0.1 if you use BIC scores to assess keyness; that will help especially where the comparison corpus is fairly small, as it will tend to bring up more negative key words reflecting the nature of the comparison corpus. Costas Gabrielatos (2018) suggests that BIC scores can be interpreted thus:
below 0 | not trustworthy
0-2 | only worth a bare mention
2-6 | positive evidence
6-10 | strong
more than 10 | very strong
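The Log Ratio and BIC calculations can be sketched as follows. The function names and figures are invented, and the `bic` formula shown (G2 minus the natural log of the combined corpus size in running words) is the commonly cited version discussed in connection with Gabrielatos (2018); treat it as an illustration rather than WordSmith's exact code.

```python
import math

def log_ratio(freq_study, total_study, freq_ref, total_ref):
    """Andrew Hardie's Log Ratio: binary log of the ratio of the
    word's relative frequencies in the two corpora.
    A value of 2 means 4x more frequent, 3 means 8x, and so on."""
    ratio = (freq_study * total_ref) / (freq_ref * total_study)
    return math.log2(ratio)

def bic(g2, total_study, total_ref):
    """BIC score from a log-likelihood (G2) value and the combined
    corpus size; sketch of the formula interpreted via the
    Gabrielatos (2018) thresholds above."""
    return g2 - math.log(total_study + total_ref)

# A word relatively 8x more frequent in the study corpus:
print(log_ratio(80, 10_000, 100, 100_000))  # -> 3.0
# Its BIC score, given a G2 of 554.6 from the log-likelihood test:
print(round(bic(554.6, 50_000, 1_000_000), 1))
```

A Log Ratio of 3.0 and a BIC score far above 10 would both point to a strongly key item on the scale above.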
See UCREL's log likelihood site and SigEff calculator for more on these.
Words get accepted as key if they pass all statistical tests. If you want to ignore the Log ratio test, set its minimum to 0 in the settings. If you want to use the BIC score or simply ignore the Log Likelihood test, set the p value to 0.1.
Words which do not occur at all in the comparison corpus are treated as if they occurred 5.0e-324 times (0.0000000 and loads more zeroes before a 5). This number is so small that it does not affect the calculation materially, while not crashing the computer's processor with a division by zero.
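This substitution can be checked directly: 5.0e-324 is the smallest positive value a 64-bit float can represent, and using it in place of zero keeps a log-based comparison finite (the figure 0.05 below is an invented relative frequency):

```python
import math

# The smallest positive 64-bit float: substituted for a frequency of zero.
tiny = 5.0e-324
print(tiny > 0)  # True: still a positive number, so logs and ratios work

# A word with relative frequency 0.05 in the study corpus and "tiny"
# in the comparison corpus gets a huge but finite log ratio:
print(math.log2(0.05) - math.log2(tiny))  # large, finite value
```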