
WordSmith Tools Manual


Relationships Display


 

The Relationships procedure contains a number of columns and uses various formulae:

 

MI_sortMI

 

Word 1: the first word in a pair, followed by Freq. (its frequency in the whole index).

Word 2: the other word in that pair, followed by Freq. (its frequency in the whole index). If you have computed "to right only", then Word 1 precedes Word 2.

Texts: the number of texts this pair was found in (there were 23 in the whole index).

Gap: the most typical distance between Word 1 and Word 2.

Joint: their joint frequency over the entire span (not just the joint frequency at the typical gap distance).
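To make the difference between Gap and Joint concrete, here is a minimal sketch in Python. It assumes a "to right only" search over a plain list of tokens and a span of 5 words; the function name gap_and_joint and the sample tokens are hypothetical, and the way WordSmith itself counts is not reproduced here.

```python
from collections import Counter

def gap_and_joint(tokens, word1, word2, span=5):
    """Hypothetical helper: count word2 up to `span` words to the right of word1."""
    distances = Counter()
    for i, tok in enumerate(tokens):
        if tok != word1:
            continue
        # Look only to the right of word1, up to `span` positions away.
        for j in range(i + 1, min(i + span + 1, len(tokens))):
            if tokens[j] == word2:
                distances[j - i] += 1
    joint = sum(distances.values())  # joint frequency over the entire span
    gap = distances.most_common(1)[0][0] if distances else None  # most typical distance
    return gap, joint

# Invented sample text, not taken from the Dickens index.
tokens = "backwards and forwards , backwards and forwards , backwards forwards".split()
gap, joint = gap_and_joint(tokens, "backwards", "forwards")
print(gap, joint)
```

In this sample the pair is found at distances 2, 2, 5 and 1, so Joint counts all four hits while Gap reports only the most common distance.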

 

In line 7 of this display, BACKWARDS occurs 83 times in the whole index (based on Dickens novels), and FORWARDS 8 times. They occur together 62 times. The gap is 2 because backwards, in these data, typically comes 2 words away from forwards. The pair backwards * forwards occurs in 17 texts. (This search was computed using the "to right only" setting mentioned above.)

 

As usual, the data can be sorted by clicking on the headers. Let's now sort by clicking on "Z score" first and "Word 1" second.

 

mutual_information_sort

 

You get a double sort, main and secondary, for two reasons. Sometimes you will want to see how MI, Z score or another statistic orders the whole list. At other times you will want to keep the words sorted alphabetically and sort by MI or Z score only within each word-type. Press Swap to switch the primary and secondary sorts.
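The effect of a primary and secondary sort, and of swapping them, can be sketched in Python with a tuple sort key. The rows and scores below are invented for illustration; this is not how WordSmith stores or sorts its data.

```python
# Each hypothetical row: (word1, word2, z_score)
rows = [
    ("backwards", "forwards", 5.0),
    ("apple", "pie", 9.0),
    ("apple", "tree", 3.0),
    ("backwards", "glance", 9.0),
]

# Primary: Z score (highest first); secondary: Word 1 (alphabetical).
by_z_then_word = sorted(rows, key=lambda r: (-r[2], r[0]))

# After "Swap": primary Word 1 (alphabetical); secondary Z score (highest first).
by_word_then_z = sorted(rows, key=lambda r: (r[0], -r[2]))

for row in by_z_then_word:
    print(row)
```

With the first ordering the score ranks the whole list; with the second, each Word 1 keeps its alphabetical place and the score only orders the pairs within that word.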

 

 

MI_sortZ

 

The order is not quite the same as in the first display, but not very different either. Both Freq. columns have fairly small numbers.
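Why MI and Z score both favour pairs with small frequencies can be seen in a small sketch. It uses common textbook forms of the two statistics and assumes the expected joint frequency is f1 × f2 / N, ignoring span width; the formulae WordSmith actually uses are given under Formulae and may differ. All counts are invented.

```python
import math

N = 1_000_000  # assumed total number of running words (hypothetical)

def expected(f1, f2, n=N):
    # Expected joint frequency if the two words were independent (assumption).
    return f1 * f2 / n

def mi(joint, f1, f2, n=N):
    # Mutual Information: log2 of observed over expected joint frequency.
    return math.log2(joint / expected(f1, f2, n))

def z_score(joint, f1, f2, n=N):
    # Simplified Z score: (observed - expected) / sqrt(expected).
    e = expected(f1, f2, n)
    return (joint - e) / math.sqrt(e)

# Hypothetical pairs: two rare words that nearly always co-occur,
# and two frequent words that co-occur often.
rare = dict(joint=8, f1=10, f2=12)
frequent = dict(joint=1000, f1=5000, f2=20000)

print(mi(**rare), mi(**frequent))
print(z_score(**rare), z_score(**frequent))
```

Under these assumptions both statistics rank the rare pair above the frequent one, because a tiny expected frequency inflates both the ratio and the standardised difference.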

 

Here is the display sorted by MI3 Score (Oakes p. 172):

 

MI_sortMI3

 

Much more frequent items have jumped to the top.
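The jump of frequent items can be illustrated with a sketch comparing MI with MI3, where the joint frequency is cubed. This assumes the common textbook forms and an expected frequency of f1 × f2 / N; it is not a statement of the exact formula WordSmith applies, and the counts are invented.

```python
import math

N = 1_000_000  # assumed total number of running words (hypothetical)

def mi(joint, f1, f2, n=N):
    # Mutual Information: log2(joint * N / (f1 * f2)).
    return math.log2(joint * n / (f1 * f2))

def mi3(joint, f1, f2, n=N):
    # MI3: as MI, but with the joint frequency cubed,
    # which gives more weight to pairs that co-occur often.
    return math.log2(joint ** 3 * n / (f1 * f2))

# Hypothetical pairs, not taken from the Dickens index.
rare = dict(joint=8, f1=10, f2=12)
frequent = dict(joint=1000, f1=5000, f2=20000)

print(mi(**rare), mi(**frequent))    # rare pair scores higher on MI
print(mi3(**rare), mi3(**frequent))  # frequent pair scores higher on MI3
```

Cubing the joint frequency is what lets the frequent pair overtake the rare one in this example.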

 

Now, by Log Likelihood (Dunning, 1993):

 

MI_sortLogL

 

Here the Word 2 items are again very high frequency ones and we get at colligation (grammatical collocation). A T Score listing is fairly similar:

 

MI_sortTScore

but a Dice score ordered list brings us back to results akin to the first two shown above:

 

 

MI_sort_Dice
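The contrast between Log Likelihood and T Score on one side and Dice on the other can be sketched as follows. The functions use common textbook forms (a 2 × 2 contingency table for Log Likelihood, (O − E) / √O for T Score, 2 × joint / (f1 + f2) for Dice) and assume an expected frequency of f1 × f2 / N; WordSmith's own formulae are listed under Formulae and may differ. The counts are invented.

```python
import math

N = 1_000_000  # assumed total number of running words (hypothetical)

def log_likelihood(joint, f1, f2, n=N):
    # Observed 2x2 table: word1 present/absent by word2 present/absent.
    observed = [[joint, f1 - joint], [f2 - joint, n - f1 - f2 + joint]]
    rows = [f1, n - f1]
    cols = [f2, n - f2]
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            e = rows[i] * cols[j] / n
            if o > 0:  # a zero cell contributes nothing
                g2 += o * math.log(o / e)
    return 2 * g2

def t_score(joint, f1, f2, n=N):
    # T score: (observed - expected) / sqrt(observed).
    return (joint - f1 * f2 / n) / math.sqrt(joint)

def dice(joint, f1, f2):
    # Dice coefficient: shared occurrences relative to both frequencies.
    return 2 * joint / (f1 + f2)

# Hypothetical pairs, not taken from the Dickens index.
rare = dict(joint=8, f1=10, f2=12)
frequent = dict(joint=1000, f1=5000, f2=20000)

print(log_likelihood(**rare), log_likelihood(**frequent))
print(t_score(**rare), t_score(**frequent))
print(dice(**rare), dice(**frequent))
```

With these invented counts Log Likelihood and T Score rank the frequent pair first, while Dice ranks the rare pair first, in line with the pattern described above.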

 

The Log Ratio ordered list

MI_sort_LogRatio

prioritises cases where nearly all cases of one word are found near the other, even if the other is a very frequent word. In the above screenshot, all 7 cases of tatters are found within 1 word, and recesses was found within 3 words in 6 out of 7 cases.

 

See also: Formulae, Mutual Information and other relationships, Computing Relationships, Making an Index List, Viewing Index Lists, WordList Help Contents.

 

See Oakes for further information about the various statistics offered.