
Remove duplicates


The problem

Text files sometimes contain duplicate sections, either because the corpus has become corrupted through being copied many times onto different file-stores, or because the files were not edited effectively (e.g. a newspaper file contains several different editions). As a result, you may get a number of repeated concordance lines.

Solution

If you choose Edit | Remove Duplicates, Concord goes through your concordance lines, and if it finds any two whose stored text is identical, regardless of the filename, date etc., it marks one of them for deletion. That is, it checks all the "characters to save" to see whether the two lines are identical. If you set the number of characters to save to 150 or so, it is highly unlikely that false duplicates will be identified, since every single character, comma, space etc. would have to match.
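
The comparison described above can be sketched in Python. This is only an illustration of the idea, not Concord's actual code: the function name mark_duplicates, the (filename, text) pairs and the chars_to_save parameter are all assumptions made for the example, as is the choice to keep the first of two matching lines and mark the later one.

```python
def mark_duplicates(lines, chars_to_save=150):
    """Return the indices of concordance lines marked for deletion.

    `lines` is a list of (filename, text) pairs (a hypothetical layout).
    Only the first `chars_to_save` characters of the text are compared;
    the filename is ignored, as the help text describes.
    """
    seen = set()
    marked = []
    for index, (_filename, text) in enumerate(lines):
        key = text[:chars_to_save]
        if key in seen:
            # An identical stored line was already met: mark this one.
            marked.append(index)
        else:
            seen.add(key)
    return marked


lines = [
    ("edition1.txt", "the cat sat on the mat, and then it slept"),
    ("edition2.txt", "the cat sat on the mat, and then it slept"),
    ("edition2.txt", "the cat sat on the mat; and then it slept"),
]
marked = mark_duplicates(lines)
```

Here only the second line is marked: the third differs by a single punctuation character, so it is not treated as a duplicate. With a very small chars_to_save, such as 10, all three lines would share the same stored text and false duplicates would appear, which is why a value of 150 or so is suggested.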

Check before you zap...

At the end, Concord sorts all the lines so that you can see which ones match each other before you finally decide to zap the ones you really don't want.
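
As a rough illustration of why sorting helps at this stage, the sketch below orders lines by their text so that matching lines end up next to each other, with the marked copy listed after the one that is kept. The function name sort_for_review and the data layout are hypothetical, and the actual sort order Concord uses is not specified here.

```python
def sort_for_review(lines, marked):
    """Return (text, filename, is_marked) rows sorted for inspection.

    `lines` is a list of (filename, text) pairs and `marked` is a set of
    indices flagged for deletion (both hypothetical structures).
    """
    rows = [
        (text, filename, index in marked)
        for index, (filename, text) in enumerate(lines)
    ]
    # Sort by text first, then put the unmarked copy before the marked one.
    return sorted(rows, key=lambda row: (row[0], row[2]))


lines = [
    ("a.txt", "zebra crossing ahead"),
    ("b.txt", "apple pie recipe"),
    ("c.txt", "zebra crossing ahead"),
]
rows = sort_for_review(lines, marked={2})
for text, filename, is_marked in rows:
    print("DELETE?" if is_marked else "keep   ", filename, text)
```

After sorting, the two "zebra crossing ahead" lines sit together, so it is easy to check that the marked one really is a duplicate before deleting it.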
