Handling the British National Corpus

Header Tags

In the case of the BNC, the header supplies data about the length of the text, who wrote it and where it was found, who processed it for the BNC and when, copyright issues, etc.

The whole text file starts with <bncDoc and some form of identification:

in the case of text file A0T, and ends with

</bncDoc>

In the XML edition you will find xml in the header and, to conform to XML specifications, double quote characters are used around each attribute value:

The header section starts with something like this:

and ends with

</teiHeader>

You might wish to cut out the whole header when processing with WordSmith. To do so, see this section.
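
Outside WordSmith, the same idea can be sketched in a few lines of Python. This is only an illustration, not WordSmith's own method: it assumes the header opens with a tag beginning <teiHeader and closes with </teiHeader>, and the function name strip_header and the sample text are made up for the example.

```python
import re

# Assumption: the header opens with a tag beginning "<teiHeader" and
# closes with "</teiHeader>". Everything between them is removed,
# together with any whitespace directly after the closing tag.
HEADER_RE = re.compile(r"<teiHeader\b.*?</teiHeader>\s*", re.DOTALL)


def strip_header(text: str) -> str:
    """Return text without its first header block (unchanged if none is found)."""
    return HEADER_RE.sub("", text, count=1)


# Made-up miniature document, not real BNC content.
sample = (
    "<bncDoc>\n"
    "<teiHeader>\n<fileDesc>made-up header content</fileDesc>\n</teiHeader>\n"
    "<wtext>made-up body</wtext>\n"
    "</bncDoc>\n"
)

print(strip_header(sample))
```

A regular expression is used here rather than an XML parser so that the sketch does not depend on the file being well-formed XML; for the XML edition a parser would be a reasonable alternative.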
