Aim


 


 

The purpose is to check whether one or more of the text files in your corpus does not belong there. This could be because

it has become corrupted, so that what used to be good text is now just random characters, or it has been cut much shorter because of disk problems
it isn't even in the same language as the rest of the corpus

 

The tool works in any language. It takes a known sample of good text (in whatever language your corpus uses) and compares that sample with every text file in your corpus.
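
The general idea of comparing a known good sample with each corpus file can be sketched roughly as follows. This is only an illustration and not the tool's actual method, which this page does not describe: the word-overlap measure, the function names (word_set, known_word_ratio, suspect_files) and the 0.3 threshold are all assumptions made for the example.

```python
import re

def word_set(text):
    """Return the set of lowercased word tokens in text (Unicode-aware)."""
    return set(re.findall(r"\w+", text.lower()))

def known_word_ratio(good_sample, candidate):
    """Share of the candidate's word tokens that also occur in the good sample."""
    known = word_set(good_sample)
    tokens = re.findall(r"\w+", candidate.lower())
    if not tokens:
        return 0.0  # an empty file shares nothing with the sample
    return sum(1 for t in tokens if t in known) / len(tokens)

def suspect_files(good_sample, corpus, threshold=0.3):
    """Names of texts whose known-word ratio falls below the (assumed) threshold."""
    return [name for name, text in corpus.items()
            if known_word_ratio(good_sample, text) < threshold]

# Hypothetical miniature corpus: one good file, one corrupted, one in another language.
good = "the cat sat on the mat and the dog lay on the rug"
corpus = {
    "ok.txt": "the dog sat on the mat",
    "junk.txt": "x9#q zz@@ 7fkq% wpl",
    "french.txt": "le chat est sur le tapis",
}
print(suspect_files(good, corpus))
```

A real good sample would need to be far larger than this one for the ratio to be meaningful, and a simple overlap measure like this says nothing about files that have been cut short; checking file length would be a separate step.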

 

See also: How to do it

Page url: http://www.lexically.net/downloads/version5/HTML/?aimofcorpuscorruptiondetector.htm