WordSmith Tools Manual

Overview

The point of it

The idea is to build up your own corpus of texts by downloading web pages found with the help of a search engine.

What you do

Just type a word or phrase, check the language, and press Download.

How it works

WebGetter visits the search engine you specify and retrieves the first 1000 or so results. In other words, it uses the search engine just as you would yourself, to get a list of useful references. It then sends out a robot to visit each web address and download the page from the original web site (not from the search engine's cache). Quite a few robots may be out searching for you at once; the advantage is that one slow download doesn't hold up all the others.
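
The manual does not describe how the robots are implemented. As a rough illustration only, the Python sketch below treats each robot as a worker thread from the standard `concurrent.futures` module and handles pages in the order they finish. `fetch_all` and `fake_fetch` are hypothetical names, and `fake_fetch` merely sleeps in place of a real HTTP request so that the example runs offline.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, List


def fetch_all(urls: List[str], fetch: Callable[[str], str], robots: int = 8) -> Dict[str, str]:
    """Download every URL using a pool of worker threads (the "robots").

    Pages are collected in the order they finish, so one slow download
    does not hold up the handling of the others. A download that raises
    an exception is skipped.
    """
    pages: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=robots) as pool:
        # One job per URL; up to `robots` jobs run at the same time.
        futures = {pool.submit(fetch, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                pages[url] = future.result()
            except Exception:
                continue  # Unreachable site, timeout, etc.: move on.
    return pages


def fake_fetch(url: str) -> str:
    """Hypothetical stand-in for an HTTP request; 'slow' URLs take longer."""
    time.sleep(0.5 if "slow" in url else 0.05)
    return f"<html><body>Page at {url}</body></html>"


if __name__ == "__main__":
    urls = ["http://slow.example/", "http://a.example/", "http://b.example/"]
    for url in fetch_all(urls, fake_fetch, robots=3):
        print(url)  # the slow page is printed last
```

With three workers, the two quick pages are collected while the slow one is still downloading. Handing pages on as they complete, rather than in the order they were requested, is what keeps a single slow site from delaying the rest.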

After downloading a web page, the WebGetter robot checks that it meets your requirements (in Settings) and cleans up the resulting text. If the page is big enough, it is saved to your hard disk as a file whose name closely resembles the web address.
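
The actual checks depend on your Settings, and the manual does not specify how the text is cleaned or how the file name is derived. The sketch below is therefore an assumption throughout: it strips markup with Python's standard `html.parser`, treats "big enough" as a hypothetical `min_chars` threshold on the cleaned text, and builds the file name by replacing characters that are awkward in file names with underscores. `clean_text`, `url_to_filename` and `save_if_big_enough` are illustrative names, not part of WordSmith Tools.

```python
import re
import tempfile
from html.parser import HTMLParser
from pathlib import Path
from typing import List, Optional


class TextExtractor(HTMLParser):
    """Collect the visible text of a page, skipping <script> and <style>."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: List[str] = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(data)


def clean_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse the runs of whitespace left behind by removed markup.
    return " ".join(" ".join(parser.parts).split())


def url_to_filename(url: str) -> str:
    """Turn a web address into a similar, file-system-safe name."""
    name = re.sub(r"^[A-Za-z][A-Za-z0-9+.-]*://", "", url)  # drop "http://" etc.
    name = re.sub(r"[^A-Za-z0-9._-]+", "_", name).strip("_")
    return name + ".txt"


def save_if_big_enough(url: str, html: str, folder: Path, min_chars: int = 2000) -> Optional[Path]:
    """Save the cleaned text only if it has at least `min_chars` characters.

    `min_chars` is a hypothetical threshold standing in for the size
    requirement set in WebGetter's Settings.
    """
    text = clean_text(html)
    if len(text) < min_chars:
        return None  # Too small: nothing is written.
    path = folder / url_to_filename(url)
    path.write_text(text, encoding="utf-8")
    return path


if __name__ == "__main__":
    page = "<html><body><p>" + "Some running text. " * 200 + "</p></body></html>"
    with tempfile.TemporaryDirectory() as folder:
        saved = save_if_big_enough("https://www.example.com/news/page.html?id=7", page, Path(folder))
        print(saved.name)  # www.example.com_news_page.html_id_7.txt
```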

When it runs out of references, WebGetter re-visits the search engine and gets some more.
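
Re-visiting the search engine amounts to paging through its results. The sketch below assumes a hypothetical `search(query, offset)` function that returns the next batch of result URLs (an empty list when there are no more); it illustrates the idea and is not WebGetter's own code. A generator keeps a queue of references, fetches another batch whenever the queue is empty, and skips addresses it has already handed out.

```python
from collections import deque
from itertools import islice
from typing import Callable, Deque, Iterator, List, Set


def references(query: str, search: Callable[[str, int], List[str]]) -> Iterator[str]:
    """Yield distinct result URLs, asking the search engine for another
    batch each time the current batch runs out."""
    queue: Deque[str] = deque()
    seen: Set[str] = set()
    offset = 0
    while True:
        if not queue:
            batch = search(query, offset)  # re-visit the engine for more
            if not batch:
                return  # the engine has nothing further to offer
            offset += len(batch)
            queue.extend(batch)
        url = queue.popleft()
        if url not in seen:  # the same page can appear in several batches
            seen.add(url)
            yield url


# Hypothetical search engine that returns results three at a time.
RESULTS = [f"http://site{i}.example/" for i in range(7)] + ["http://site0.example/"]


def fake_search(query: str, offset: int) -> List[str]:
    return RESULTS[offset:offset + 3]


if __name__ == "__main__":
    print(list(islice(references("corpus", fake_search), 4)))
```

Because `references` is a generator, a caller can stop whenever it has enough pages, for example with `itertools.islice`, and no further batches are requested from the engine.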

See also: Settings, Display, Limitations