WordSmith Tools Manual

corpus sampler

The point of it

 

Texts downloaded from the Internet often seem to promise one thing but in fact contain another. A news story may refer incidentally to an issue you are interested in but be mainly about something else.

 

Use the sampler when you need to read individual texts in your corpus to check whether they meet your purposes. This utility picks a random set of texts and can save that set for sharing with others, so that you can agree on criteria.

 

How to do it

In the Sample settings tab, choose the sample size desired, the types of text file and the folder they are in. Decide whether you want a sample from the top, bottom or middle of the collection (as sorted by file-date if from a folder, or from a list of files you already have).

 

corpus_sampler

 

Now in the Actions tab,

corpus_sampler_actions_tab

press Random sample from folder if you want the program to go through a set of texts in a folder (and its sub-folders), or Random Sample from a list if you have a plain text list of files. The procedure looks at all the text files and picks out the number you asked for at random. (If there aren't enough, you'll get what there are.)
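The procedure described above can be sketched in Python for readers who want to reproduce a comparable sample outside WordSmith. This is only an illustration under stated assumptions, not the program's own code: the function name `sample_from_folder`, the `pattern` and `seed` parameters, and the use of the file modification time as "file-date" are all hypothetical choices made here.

```python
import random
from pathlib import Path

def sample_from_folder(folder, sample_size, pattern="*.txt", seed=None):
    """Pick up to sample_size text files at random from folder and its sub-folders.

    Hypothetical helper, not WordSmith's implementation.
    """
    # rglob walks the folder and all of its sub-folders
    files = [p for p in Path(folder).rglob(pattern) if p.is_file()]
    rng = random.Random(seed)  # a fixed seed makes the sample repeatable
    # If there aren't enough files, return what there are
    k = min(sample_size, len(files))
    chosen = rng.sample(files, k)
    # Present the sample in file-date order (assumed: modification time)
    return sorted(chosen, key=lambda p: p.stat().st_mtime)
```

Calling `sample_from_folder("my_corpus", 25)` would return at most 25 paths; the folder name is a placeholder. Passing the same `seed` to colleagues is one way to let them draw the identical sample.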

 

Here the random procedure has chosen texts from anywhere within the total.

 

corpus_sampler_results

I have highlighted the text file dates in yellow: they are still in date order but only 25 were selected from a set starting in 2005 and ending in 2019.

 

RTF Copy button

Lets you save some or all of the sampled texts in RTF format, so that you can read and study them in Microsoft Word or similar.

 

Zipped text button

This generates a zipped file, with all the original folder structure of the sample texts preserved:

unfiltered_zip

In the .zip you find a separate copy of each text file, as well as one large text file containing all the texts joined one after another. It also contains a plain text list of the text files, ready for you or fellow researchers to note down observations about each text. You might find it useful to copy this list into Excel for your record-keeping.
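A bundle with the same three ingredients can be sketched with Python's standard `zipfile` module. This is a rough equivalent under stated assumptions, not what the Zipped text button actually runs: the function name `zip_sample` and the member names `all_texts.txt` and `file_list.txt` are invented for this example, and the texts are assumed to be readable as UTF-8.

```python
import zipfile
from pathlib import Path

def zip_sample(sample_paths, root, zip_path):
    """Write the sampled texts, one combined text and a file list into a .zip.

    Hypothetical helper; the member names below are assumptions.
    """
    root = Path(root)
    names, combined = [], []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for p in sample_paths:
            # Store each file under its path relative to root,
            # which preserves the original folder structure
            rel = Path(p).relative_to(root).as_posix()
            zf.write(p, arcname=rel)
            names.append(rel)
            combined.append(Path(p).read_text(encoding="utf-8", errors="replace"))
        # One large text: every sampled text joined one after another
        zf.writestr("all_texts.txt", "\n\n".join(combined))
        # Plain text list of the files, ready for annotation
        zf.writestr("file_list.txt", "\n".join(names) + "\n")
    return names
```

The file list has one path per line, so it pastes into a single spreadsheet column with room beside it for notes. If a sampled text were itself named `all_texts.txt` or `file_list.txt` at the top level, the names would clash, so a real script would want to choose different names.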

 

See also: relevance check. File Utilities also has a tool for moving or copying the structure of text files.