
WordSmith Tools Manual


Relationships Display



The Relationships display has the following columns, and its scores are computed with various formulae:




Word 1: the first word in a pair, followed by Freq. (its frequency in the whole index).

Word 2: the other word in that pair, followed by Freq. (its frequency in the whole index). If you have computed "to right only", then Word 1 precedes Word 2.

Texts: the number of texts this pair was found in (there were 23 in the whole index).

Gap: the most typical distance between Word 1 and Word 2.

Joint: their joint frequency over the entire span (not just the joint frequency at the typical gap distance).


In line 7 of this display, BACKWARDS occurs 83 times in the whole index (based on Dickens novels), and FORWARDS 8 times. They occur together 62 times. The gap is 2 because backwards, in these data, typically comes 2 words away from forwards. The pair backwards * forwards occurs in 17 texts. (This search was computed using the "to right only" setting mentioned above.)
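The columns described above can be illustrated with a minimal Python sketch. It is not WordSmith's own code: the function name `pair_stats`, the default span of 5 and the toy texts are assumptions made for illustration, and the program may count joint occurrences within the span differently.

```python
from collections import Counter

def pair_stats(texts, word1, word2, span=5):
    """Count word2 within `span` words to the right of word1.

    texts: a list of token lists, one per text (hypothetical input format).
    """
    freq1 = freq2 = joint = n_texts = 0
    gaps = Counter()
    for tokens in texts:
        found_here = False
        for i, tok in enumerate(tokens):
            if tok == word2:
                freq2 += 1
            if tok == word1:
                freq1 += 1
                # "to right only": look only at positions after word1
                for gap in range(1, span + 1):
                    j = i + gap
                    if j < len(tokens) and tokens[j] == word2:
                        joint += 1
                        gaps[gap] += 1
                        found_here = True
        if found_here:
            n_texts += 1
    # Gap: the distance at which the pair was seen most often
    typical_gap = gaps.most_common(1)[0][0] if gaps else None
    return {"freq1": freq1, "freq2": freq2, "texts": n_texts,
            "gap": typical_gap, "joint": joint}

# Invented toy data, not the Dickens index
toy_texts = [
    ["backwards", "and", "forwards", "he", "went"],
    ["forwards", "then", "backwards", "and", "forwards"],
    ["no", "match", "here"],
]
stats = pair_stats(toy_texts, "backwards", "forwards")
print(stats)
```

Here Joint is summed over every distance in the span, while Gap reports only the single most common distance, which mirrors the distinction drawn in the column descriptions.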


As usual, the data can be sorted by clicking on the headers. Let's now sort by clicking on "Z score" first and "Word 1" second.




You get a double sort, main and secondary. Sometimes you will want to see how MI, Z score or another statistic reorders the whole list; at other times you will want to keep the words sorted alphabetically and sort by MI or Z score only within each word-type. Press Swap to switch the primary and secondary sorts.
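The effect of a double sort, and of swapping it, can be sketched in Python with a tuple sort key. The rows and Z score values below are invented for illustration and do not come from the display.

```python
# Invented rows: (Word 1, Word 2, Z score)
rows = [
    ("backwards", "forwards", 9.5),
    ("again", "once", 12.0),
    ("backwards", "and", 1.5),
    ("again", "and", 3.0),
]

# Primary: Z score (highest first); secondary: Word 1 (alphabetical)
by_z_then_word = sorted(rows, key=lambda r: (-r[2], r[0]))

# After "Swap": primary Word 1, secondary Z score within each word
by_word_then_z = sorted(rows, key=lambda r: (r[0], -r[2]))

for row in by_z_then_word:
    print(row)
print()
for row in by_word_then_z:
    print(row)
```

In the first ordering the statistic reshuffles the whole list; in the second each Word 1 keeps its alphabetical block and the statistic only orders the pairs inside that block.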





The order is not quite the same ... but not very different either. Both Freq. columns have fairly small numbers.


Here is the display sorted by MI3 Score (Oakes p. 172):




Much more frequent items have jumped to the top.
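Why frequent items rise under MI3 can be seen in a short sketch. It uses the common textbook forms of the two scores (MI as log2 of J × N / (F1 × F2), and MI3 with the joint frequency cubed); WordSmith's own formulae may differ in detail, for example in how the span is handled, and all counts below are invented.

```python
import math

def mi(joint, f1, f2, n):
    # Textbook mutual information: log2 of observed over expected joint frequency
    return math.log2(joint * n / (f1 * f2))

def mi3(joint, f1, f2, n):
    # MI3 cubes the joint frequency, which rewards pairs seen together often
    return math.log2(joint ** 3 * n / (f1 * f2))

N = 1_000_000  # invented corpus size in running words

# Invented counts: a rare but exclusive pair and a frequent, looser pair
rare = {"joint": 8, "f1": 10, "f2": 10}
frequent = {"joint": 2000, "f1": 5000, "f2": 20000}

for name, pair in (("rare", rare), ("frequent", frequent)):
    print(name, round(mi(n=N, **pair), 2), round(mi3(n=N, **pair), 2))
```

With these numbers the rare pair has the higher MI, but the frequent pair has the higher MI3.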


Now, by Log Likelihood (Dunning, 1993):




Here the Word 2 items are again very high frequency ones and we get at colligation (grammatical collocation). A T Score listing is fairly similar:



but a Dice score ordered list brings us back to results akin to the first two shown above:
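The contrast between T Score and Dice can be sketched in the same way. The formulas are the usual textbook versions (T as observed minus expected joint frequency divided by the square root of the observed, Dice as 2J / (F1 + F2)); they are assumptions here and may not match WordSmith's implementation exactly, and the counts are invented.

```python
import math

def t_score(joint, f1, f2, n):
    # Observed joint frequency minus expected, scaled by sqrt(observed)
    expected = f1 * f2 / n
    return (joint - expected) / math.sqrt(joint)

def dice(joint, f1, f2):
    # Share of the two words' occurrences that are joint occurrences
    return 2 * joint / (f1 + f2)

N = 1_000_000  # invented corpus size in running words

# Invented counts: (joint, freq of Word 1, freq of Word 2)
rare = (8, 10, 10)
frequent = (2000, 5000, 20000)

for name, (j, f1, f2) in (("rare", rare), ("frequent", frequent)):
    print(name, round(t_score(j, f1, f2, N), 2), round(dice(j, f1, f2), 2))
```

With these counts T Score puts the frequent pair first, while Dice puts the rare, exclusive pair first.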





The Log Ratio ordered list prioritises cases where nearly all cases of one word are found near the other, even if the other is a very frequent word. In the above screenshot, all 7 cases of tatters are found within 1 word of the other word, and recesses was found within 3 words of the other word in 6 of its 7 occurrences.
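The intuition behind this ordering can be illustrated with a simple proportion: the share of a word's occurrences that fall near its partner. This is only a sketch of the idea, not the Log Ratio formula itself, and the helper name `share_near_partner` is hypothetical. The counts are the two mentioned above.

```python
def share_near_partner(joint, freq):
    # Fraction of a word's occurrences that were found near the other word
    return joint / freq

tatters = share_near_partner(7, 7)    # all 7 cases found near the partner
recesses = share_near_partner(6, 7)   # 6 of the 7 cases

print(f"tatters:  {tatters:.2f}")
print(f"recesses: {recesses:.2f}")
```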


See also: Formulae, Mutual Information and other relationships, Computing Relationships, Making an Index List, Viewing Index Lists, WordList Help Contents.


See Oakes for further information about the various statistics offered.