Mutual Information and other similar statistics


The point of it

A Mutual Information (MI) score relates one word to another. For example, if "problem" is often found with "solve", the pair may have a high Mutual Information score. Usually, "the" will be found near "problem" much more often than "solve" is, so the procedure for calculating Mutual Information takes into account not just the most frequent words found near the word in question, but also whether each word is often found elsewhere, well away from the word in question. Since "the" is found very often indeed far away from "problem", the two will not tend to be related; that is, the pair will get a low MI score.
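The idea can be illustrated with a minimal sketch in Python. It uses the standard pointwise mutual information formula, which is not necessarily the exact variant WordSmith applies, and every count below is invented for illustration. The function name pointwise_mi and its parameters are hypothetical.

```python
import math

def pointwise_mi(joint, freq_x, freq_y, total_tokens):
    """Standard pointwise MI in bits: log2( P(x,y) / (P(x) * P(y)) ).

    joint        -- times x and y occur near each other
    freq_x/y     -- times each word occurs anywhere in the corpus
    total_tokens -- corpus size in running words
    """
    p_xy = joint / total_tokens
    p_x = freq_x / total_tokens
    p_y = freq_y / total_tokens
    return math.log2(p_xy / (p_x * p_y))

# Hypothetical counts in a 1,000,000-token corpus (not real data).
N = 1_000_000
FREQ_PROBLEM = 500

# "solve" is rare overall (200) but 40 of its occurrences are near "problem".
mi_solve = pointwise_mi(joint=40, freq_x=FREQ_PROBLEM, freq_y=200, total_tokens=N)

# "the" is near "problem" far more often (300), but it is everywhere (60,000).
mi_the = pointwise_mi(joint=300, freq_x=FREQ_PROBLEM, freq_y=60_000, total_tokens=N)

print(f"problem/solve: {mi_solve:.2f}")  # about 8.64
print(f"problem/the:   {mi_the:.2f}")    # about 3.32
```

With these made-up figures, "the" co-occurs with "problem" more than seven times as often as "solve" does, yet gets a much lower score, because its high frequency elsewhere in the corpus is discounted.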


There are several alternative statistics, which can give different results for the same pair of words.


The MI relationship is bilateral: in the case of "kith" and "kin", it doesn't distinguish between the virtual certainty of finding "kin" near "kith", and the much lower likelihood of finding "kith" near "kin".
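A small Python sketch may make the bilateral point concrete. It assumes the standard pointwise MI formula written in terms of counts, which may differ in detail from WordSmith's own computation, and the frequencies are invented. The score is the same whichever word is taken as the starting point, while the two conditional likelihoods are very different.

```python
import math

def mi_from_counts(joint, freq_a, freq_b, total_tokens):
    """Standard pointwise MI in bits, expressed with raw counts."""
    return math.log2((joint * total_tokens) / (freq_a * freq_b))

# Hypothetical counts (not real data): "kith" is rare and almost always
# occurs with "kin"; "kin" is commoner and usually occurs without "kith".
N = 1_000_000
FREQ_KITH = 10
FREQ_KIN = 200
JOINT = 9

# The MI score does not change when the two words swap roles.
mi_kith_kin = mi_from_counts(JOINT, FREQ_KITH, FREQ_KIN, N)
mi_kin_kith = mi_from_counts(JOINT, FREQ_KIN, FREQ_KITH, N)

# The directional likelihoods, by contrast, are far apart.
p_kin_given_kith = JOINT / FREQ_KITH   # 0.9: "kin" is nearly certain near "kith"
p_kith_given_kin = JOINT / FREQ_KIN    # 0.045: "kith" is unlikely near "kin"

print(mi_kith_kin == mi_kin_kith)            # True
print(p_kin_given_kith, p_kith_given_kin)    # 0.9 0.045
```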


There are various formulae for computing the strength of collocational relationships. The MI in WordSmith ("specific mutual information") is computed using a formula derived from Gaussier, Lange and Meunier, described in Oakes, p. 174; here the probability is based on total corpus size in tokens. Other measures of collocational relation are computed too; these are explained under Mutual Information Display.
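As a rough guide only, the standard pointwise definition of mutual information is worked through below to show what basing the probabilities on corpus size in tokens amounts to. This is the general textbook form, not a transcription of the formula in Oakes, which may differ in detail. The symbols are assumptions introduced here: f(x) and f(y) are the corpus frequencies of the two words, f(x, y) is the number of times they are found near each other, and N is the total number of tokens in the corpus.

```latex
\[
\mathrm{MI}(x, y)
  = \log_2 \frac{P(x, y)}{P(x)\,P(y)}
  = \log_2 \frac{f(x, y)/N}{\bigl(f(x)/N\bigr)\,\bigl(f(y)/N\bigr)}
  = \log_2 \frac{f(x, y)\,N}{f(x)\,f(y)}
\]
```

Because f(x) and f(y) appear only as a product in the denominator, swapping the two words leaves the score unchanged.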



The Relationships settings are found in the Controller under Main Settings | Advanced | Index or in a menu option in WordList.


See also: Mutual Information Display, Computing Mutual Information, Making an Index List, Viewing Index Lists, WordList Help Contents.


See Oakes for further information about Mutual Information, Dice, MI3 etc.

