Making an Index List

 

The process is the same as for making a word list, except that after choosing your texts and checking that you are happy with the index filename, you choose the bottom button here:

 

making an index wizard

In the screenshot above, the basic filename is shakespeare_plays. WordSmith will add .tokens and .types to this basic filename as it works, so two files are created for each index:

.tokens file: a large file containing information about the position of every word token in your text files.

.types file: a smaller file recording the individual word types.
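
As a rough illustration of the token/type distinction only, here is a minimal Python sketch of a positional index. WordSmith's actual .tokens and .types file formats are not described on this page, so the data structures and the function name build_index below are assumptions made for the example, not the program's real layout.

```python
from collections import defaultdict


def build_index(texts):
    """Build a toy positional index (hypothetical structure, not WordSmith's format).

    texts: dict mapping a text name to its content.
    Returns (tokens, types):
      tokens - list of (text_name, position, word), one entry per word token
      types  - dict mapping each word type to the token numbers where it occurs
    """
    tokens = []
    types = defaultdict(list)
    for name, content in texts.items():
        for position, word in enumerate(content.lower().split()):
            types[word].append(len(tokens))  # token number of this occurrence
            tokens.append((name, position, word))
    return tokens, dict(types)


tokens, types = build_index({"a.txt": "to be or not to be"})
```

In this toy example the six word tokens yield four word types. The token list grows with every running word, while the type list grows only when a new word form appears, which is consistent with the .tokens file being much the larger of the two.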

 

If you choose a basic filename which you have already used for an index, WordList will ask whether you want to add to it or start it afresh:

over-write index_filename

An index permits the computation of word clusters and Mutual Information scores for each word type. The screenshot below shows the progress bars for an index of the BNC; on a modern PC, indexing might run at about 2.8 million words per minute. The resulting BNC.tokens file was 1.6 GB in size and the BNC.types file was 26 MB.

 

making an index
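
The cluster and Mutual Information computations mentioned above can be sketched in a few lines of Python. This is only an illustration using the standard pointwise mutual information definition for adjacent word pairs; WordSmith's exact formula, span settings, and thresholds may differ, and the function names here are hypothetical.

```python
import math
from collections import Counter


def clusters(words, size=2):
    """Count contiguous word clusters (n-grams) of the given size."""
    return Counter(
        tuple(words[i:i + size]) for i in range(len(words) - size + 1)
    )


def mutual_information(words, x, y):
    """Pointwise MI for the adjacent pair (x, y): log2(P(x,y) / (P(x) * P(y)))."""
    n = len(words)
    word_freq = Counter(words)
    pair_freq = clusters(words, 2)[(x, y)]
    if pair_freq == 0:
        return float("-inf")  # the pair never occurs together
    p_xy = pair_freq / (n - 1)  # n - 1 adjacent pairs in n words
    p_x = word_freq[x] / n
    p_y = word_freq[y] / n
    return math.log2(p_xy / (p_x * p_y))


words = "to be or not to be".split()
pair_counts = clusters(words, 2)
score = mutual_information(words, "to", "be")
```

Both calculations need to know where each token sits relative to its neighbours, which is why they depend on an index rather than on a plain frequency list.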

 

Adding to an Index

To add to an existing index, choose some more texts and then choose File | New | Index. If the filename is already in use for an index, you will be asked whether to add to it or start it afresh, as shown above.

 

See also Using Index Lists, Viewing Index Lists, WordList Help Contents.
