Types of Tag Markup


You will need to specify how each tag type starts and ends, and you should be consistent in usage. Restrict yourself to symbols which otherwise do not appear in your texts.


eight special markers

Eight kinds of marker can be treated as significant for word lists: those which mark the start and end of headings, sections, sentences and paragraphs. Type them in the appropriate spaces when choosing Text Characteristics.


tags within 2 separators

These tags are often used to signal the part of speech of each word; they're also widely used in HTML, XML and SGML for "switches", e.g. <H1> to switch on Heading 1 style and </H1> to switch it off again. You should use the same opening and closing symbols, usually some kind of brackets, for all your tags (as the British National Corpus does using SGML or XML markup): <Noun>, <Verb>, <Pronoun>.
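
As a rough illustration outside WordSmith itself, the following Python sketch separates tags within 2 separators from the running words. The function name split_tags and the sample line are hypothetical, and the sketch assumes that < and > never occur in the text proper.

```python
import re

# Assumption: every tag opens with "<" and closes with ">", and neither
# symbol appears anywhere else in the text.
TAG = re.compile(r"<[^<>]+>")

def split_tags(text):
    """Return (list of tags found, text with the tags removed)."""
    tags = TAG.findall(text)
    plain = TAG.sub("", text)
    return tags, plain

tags, plain = split_tags("<H1>Types of <Noun>Tag <Noun>Markup</H1>")
print(tags)
print(plain)
```

If the separators did also appear in ordinary text, a pattern this simple would mistake that text for a tag, which is why symbols that otherwise do not appear in your texts are recommended.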


entity references

HTML, XML and SGML use so-called entity references for symbols which are outside the standard alphabet, e.g. &eacute;t&eacute; which represents été.

Specify these two types of markup by choosing Settings/Tag Lists, or Settings/Text Characteristics/Tags. You will then see a dialogue box offering Tags to Ignore and a Browse button.

The Tags to Ignore option allows you to specify tags which you do not want to see in the concordance or word list results.

The Tags to be INcluded option allows you to specify a tag file, containing tags which you do want to see in the concordance or word list results.

The Tags to be EXcluded option allows you to specify a different tag file, containing stretches of tags which you want to find and remove in the concordance or word list results.

The Tags to be Translated option allows you to specify entity references which you want to convert on the fly, such as &eacute;.
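
To show what translating entity references amounts to, here is a minimal Python sketch using the standard library's html.unescape. It is not how WordSmith performs the conversion; the function name translate_entities is hypothetical, and the sketch assumes the entities are standard HTML named references.

```python
import html

def translate_entities(text):
    """Replace HTML entity references with the characters they stand for."""
    # html.unescape handles named references such as &eacute; as well as
    # numeric ones such as &#233;.
    return html.unescape(text)

print(translate_entities("&eacute;t&eacute;"))
```

A corpus that uses its own private entity names would need an explicit lookup table instead.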


multimedia markers

Text files can be tagged for reference to sound or video files which you can hear or see. For example, a text might contain something like this: blah blah blah ...<a href=http://gandalf.hit.uib.no/c/l/32401-1.mp3> blah blah etc. A concordance on blah blah could pick up the tag so you can hear the source mp3 file. See defining multimedia tags.
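
The idea of picking up a multimedia tag can be sketched in Python as below. This is only an illustration: the function name media_links is hypothetical, and the pattern assumes markers of the form <a href=URL>, with or without quotation marks around the URL.

```python
import re

# Assumption: a multimedia marker looks like <a href=URL> and the URL
# contains no spaces, quotation marks or ">" characters.
MEDIA = re.compile(r'<a\s+href="?([^">\s]+)"?\s*>', re.IGNORECASE)

def media_links(text):
    """Return the URLs of all multimedia markers found in the text."""
    return MEDIA.findall(text)

sample = ("blah blah blah ...<a href=http://gandalf.hit.uib.no/c/l/32401-1.mp3>"
          " blah blah etc.")
print(media_links(sample))
```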


See also: Overview of Tags, Handling Tags, Making a Tag File, Showing Nearest Tags in Concord, Tag Concordancing, Viewing the Tags, Using Tags as Text Selectors, Concord Sound and Video, Guide to handling the BNC.



(A particular sub-variety of tags within 2 separators that is sometimes used has an underscore at the left and a space at the right, like this:

He_PRONOUN entered_VERB the_DET room_NOUN.

To process these, you will need to declare the underscore a valid character, or else convert your corpus to a format like this:

<PRONOUN>He <VERB>entered <DET>the <NOUN>room.)
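
One possible way to carry out that conversion is sketched below in Python. The function name underscore_to_brackets is hypothetical, and the sketch assumes that each tag consists of capital letters only and follows its word directly after a single underscore.

```python
import re

# Assumption: a tagged token is word + "_" + TAG, where the word contains
# no underscore and the tag is made up of capital letters A-Z.
WORD_TAG = re.compile(r"\b([^\W_]+)_([A-Z]+)\b")

def underscore_to_brackets(text):
    """Rewrite word_TAG tokens as <TAG>word."""
    return WORD_TAG.sub(r"<\2>\1", text)

print(underscore_to_brackets("He_PRONOUN entered_VERB the_DET room_NOUN."))
# prints: <PRONOUN>He <VERB>entered <DET>the <NOUN>room.
```

Tags containing digits or lower-case letters would need a wider pattern for the second group.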


