Definition of Keyness

The term "key word", though it is in common use, is not defined in Linguistics. This program identifies key words on a mechanical basis by comparing patterns of frequency. (A human being, on the other hand, may choose a phrase or a superordinate as a key word.)

A word is said to be "key" if

a) it occurs in the text at least as many times as the minimum frequency specified by the user;

b) its frequency in the text, compared with its frequency in a reference corpus, is such that the statistical probability computed by an appropriate procedure is smaller than or equal to the p value specified by the user;

c) the strength of its keyness is at least as great as the minimum log ratio set by the user.
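
The three conditions above can be sketched in Python. This is only an illustrative sketch, not the program's own code. It assumes that the "appropriate procedure" is the log-likelihood statistic in its common two-term form, with the p value taken from a chi-square distribution with one degree of freedom. It also assumes that log ratio is the base-2 logarithm of the ratio of relative frequencies, that a zero frequency is replaced by 0.5 before the ratio is computed, and that condition c) is applied to the absolute value so that negatively key words can pass. The function names and default thresholds are hypothetical.

```python
import math


def log_likelihood(freq_text, size_text, freq_ref, size_ref):
    """Two-term log-likelihood (G2) for one word; an assumed procedure."""
    total = size_text + size_ref
    combined = freq_text + freq_ref
    # Frequencies expected if the word were equally common in both corpora.
    expected_text = size_text * combined / total
    expected_ref = size_ref * combined / total
    g2 = 0.0
    for observed, expected in ((freq_text, expected_text), (freq_ref, expected_ref)):
        if observed > 0:
            g2 += observed * math.log(observed / expected)
    # Guard against tiny negative values caused by rounding.
    return max(0.0, 2.0 * g2)


def p_value(g2):
    """Upper-tail probability of a chi-square value with 1 degree of freedom."""
    return math.erfc(math.sqrt(g2 / 2.0))


def log_ratio(freq_text, size_text, freq_ref, size_ref):
    """Base-2 log of the ratio of relative frequencies (assumed definition)."""
    # Assumption: a zero frequency is replaced by 0.5 to avoid dividing by zero.
    rel_text = (freq_text or 0.5) / size_text
    rel_ref = (freq_ref or 0.5) / size_ref
    return math.log2(rel_text / rel_ref)


def is_key(freq_text, size_text, freq_ref, size_ref,
           min_freq=3, max_p=0.000001, min_log_ratio=1.5):
    """Apply conditions a), b) and c); the default thresholds are hypothetical."""
    if freq_text < min_freq:                                  # condition a)
        return False
    g2 = log_likelihood(freq_text, size_text, freq_ref, size_ref)
    if p_value(g2) > max_p:                                   # condition b)
        return False
    strength = abs(log_ratio(freq_text, size_text, freq_ref, size_ref))
    return strength >= min_log_ratio                          # condition c)
```

For example, is_key(50, 10_000, 100, 1_000_000) tests a word that occurs 50 times in a 10,000-word text and 100 times in a 1,000,000-word reference corpus.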


positive and negative keyness

A word which is positively key occurs more often than would be expected by chance in comparison with the reference corpus.

A word which is negatively key occurs less often than would be expected by chance in comparison with the reference corpus.
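
The direction of keyness can be illustrated by comparing relative frequencies. The sketch below is a hypothetical helper, not part of the program. It reports the direction only; whether the word counts as key at all still depends on the minimum frequency, p value and log ratio thresholds described earlier.

```python
def keyness_direction(freq_text, size_text, freq_ref, size_ref):
    """Return the direction in which a word's relative frequency differs.

    Hypothetical helper: it compares the word's share of the text with its
    share of the reference corpus and does not test for significance.
    """
    rel_text = freq_text / size_text
    rel_ref = freq_ref / size_ref
    if rel_text > rel_ref:
        return "positive"   # more frequent in the text than in the reference
    if rel_text < rel_ref:
        return "negative"   # less frequent in the text than in the reference
    return "neither"
```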


typical key words

KeyWords will usually throw up three kinds of words as "key".

First, there will be proper nouns. Proper nouns are often key in texts, though in a text about racing the program could wrongly identify as key the names of horses which are quite incidental to the story. This can be avoided by specifying a higher Minimum Frequency.


Second, there are key words that human beings would recognise. The program is quite good at finding these, and they give a good indication of the text's "aboutness". (All the same, the program does not group synonyms, and a word which only occurs once in a text may sometimes be "key" for a human being. And KeyWords will not identify key phrases unless you are comparing word-lists based on word clusters.)


Third, there are high-frequency words like "because" or "shall" or "already". These would not usually be identified by the reader as key. They may be key indicators more of style than of "aboutness". But the fact that KeyWords identifies such words should prompt you to go back to the text, perhaps with Concord (just choose Compute | Concordance), to investigate why such words have cropped up with unusual frequencies.


