
WordSmith Tools Manual


corpus sampler


The point of it

 

Texts downloaded from the Internet often seem to promise one thing but in fact contain another. A news story may mention an issue you are interested in only incidentally and be mainly about something else.

 

Use the sampler when you need to read the individual texts in your corpus to check whether they meet your purposes. This utility selects a random set of texts and can save it for sharing with others, so that you can agree on criteria.

 

How to do it

 

Choose the number of texts desired and press Random sample. The procedure looks at all the text files in the source folder and any sub-folders, and picks out that number at random.
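The selection step described above can be illustrated outside WordSmith with a minimal Python sketch. It is not the program's own code. The function name `random_sample`, the `.txt` pattern, and the optional `seed` parameter are assumptions made for illustration only.

```python
import random
from pathlib import Path


def random_sample(source_folder, n, pattern="*.txt", seed=None):
    """Pick n text files at random from source_folder and its sub-folders.

    Hypothetical helper: the pattern and seed arguments are assumptions,
    not WordSmith settings.
    """
    # rglob walks the source folder and every sub-folder beneath it.
    files = sorted(p for p in Path(source_folder).rglob(pattern) if p.is_file())
    if n > len(files):
        raise ValueError(f"asked for {n} texts but only {len(files)} found")
    # A seeded generator lets colleagues reproduce the same sample.
    rng = random.Random(seed)
    return rng.sample(files, n)
```

Passing the same `seed` on the same folder returns the same sample, which may help when several researchers need to review an identical set.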

 

[Screenshot: corpus_sampler]

 

Zipped text button

This generates a zipped file, with all the original folder structure of the sample texts preserved:

[Screenshot: unfiltered_zip]

In the .zip you will find a separate copy of each text file, as well as one large text file containing all the texts joined one after another. It also contains a plain text list of the text files, ready for you or fellow researchers to note down observations about each text. You might find it useful to copy this list into Excel for your record-keeping.
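A comparable archive can be sketched in Python with the standard `zipfile` module. This only illustrates the layout described above and is not how WordSmith builds its file. The function name `zip_sample` and the entry names `all_texts.txt` and `file_list.txt` are hypothetical.

```python
import zipfile
from pathlib import Path


def zip_sample(sample, source_folder, zip_path):
    """Write sampled texts to a .zip, keeping their folder structure.

    Hypothetical helper: the names all_texts.txt and file_list.txt are
    assumptions chosen for this sketch.
    """
    source = Path(source_folder)
    names, combined = [], []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sample:
            # Store each file under its path relative to the source folder.
            rel = Path(path).relative_to(source).as_posix()
            zf.write(path, arcname=rel)
            names.append(rel)
            combined.append(Path(path).read_text(encoding="utf-8", errors="replace"))
        # One large text with every sampled text joined one after another.
        zf.writestr("all_texts.txt", "\n\n".join(combined))
        # Plain text list of file names, one per line, for note-taking.
        zf.writestr("file_list.txt", "\n".join(names) + "\n")
    return names
```

The list file has one name per line, so it pastes into a single spreadsheet column with room for observations alongside.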

 

See also: relevance check.