Download Parser

Zoom Window Out
Larger Text | Smaller Text
Hide Page Header
Show Expanding Text
Print Topic
Send Mail Feedback
Share This Topic
Save Permalink URL

Navigation: » No topics above this level «

Filters

The point of it

A function to let you separate out texts which meet specific requirements. The procedure computes a score for each text based on whether it contains certain words or phrases or not. Some words count positively, others (which you want to exclude in the filter) count negatively. Eventually you can separate out all texts which score above some threshold.

How to do it

Filter Strings and syntax

filter_tab

Minimum word count

This is a simple method for avoiding excessively short texts.

This process also counts the spread of hits over the text. The text is notionally split into three segments and the hits must be spread over all of these segments.

Headlines search

If you check the within headlines only box, the search will be carried out only within the headlines text.

Filter now

Press this button to get your texts scored. What happens is each text is read, its words counted after stripping out any mark-up, and then each of the filter strings get searched for. The number of occurrences of each word or phrase is computed.The final score computed is the number of positive hits minus the number of negative hits (with ~), plus a computation of the amount of spread within the article where one extra point is given for each 33% of the text the hits are found in. Finally the total gets divided by the word count of the whole article.

The resulting list shows you each original article, plus its score (0.083 for the first article) and its word count (604).

filter_list_A

at the bottom of the list we get negative scores such as -0.061. Not enough positive words were found and a lot of the unwanted ones were present.

filter_list_bottom

Examining the results

Copy if score >=

If you press the Copy if score >= button, the program will copy any of the texts whose scores are equal to or greater than the value in the box, to a sub-folder of the FILTERS sub-folder. (If MOVE text-files is checked any filtered texts get moved, not just copied.) So you will find the article in a sub-folder of your parsed data matching the structure of each original article.

Text by Mike Scott, Help system by Help&Manual

Please enable JavaScript to view this site.

Download Parser