Making a Tag File

 

Tag Syntax

Each tag is case sensitive.

Tags conventionally begin with < and end with >, but the first and last characters of the tag can be any symbol.

You can use

 * to mean any sequence of characters;

 ? to mean any one character;

 # to mean any numerical digit.

 

Don't use [ to insert comments in a tag file, since [ is useful as a potential tag symbol. Use # to represent a number (e.g. <h#> will pick up <h5>, <h1>, etc.), ? to represent any single character (e.g. <?> will pick up <s>, <p>, etc.), and * to represent any number of characters (e.g. <u*> will pick up <u who=Fred>, <u who=Mariana>, etc.). Otherwise, prepare your tag list file in the same way as for Stop Lists.
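The wildcard rules above can be sketched in Python. This is only an illustration of the matching behaviour described here, not WordSmith's own implementation: the function names tag_pattern_to_regex and tag_matches are hypothetical, and the sketch assumes that # stands for exactly one digit and that * may also match an empty sequence.

```python
import re


def tag_pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate one tag-file entry into a case-sensitive regular expression.

    Assumptions (not confirmed by the manual): '#' is exactly one digit,
    and '*' may match zero characters.
    """
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*?")   # any sequence of characters
        elif ch == "?":
            parts.append(".")     # any one character
        elif ch == "#":
            parts.append(r"\d")   # one numerical digit
        else:
            parts.append(re.escape(ch))  # everything else is literal
    return re.compile("".join(parts))


def tag_matches(pattern: str, tag: str) -> bool:
    """Return True if the whole tag fits the pattern (case sensitive)."""
    return tag_pattern_to_regex(pattern).fullmatch(tag) is not None
```

For instance, tag_matches("<u*>", "<u who=Fred>") is true, while tag_matches("<h#>", "<H5>") is false because tags are case sensitive.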

 

Use Notepad or any other plain text editor to create a new .tag file. Write one entry on each line.

Any number of pre-defined tags can be stored, but the more you use, the more work WordSmith has to do, and the more time and memory it will take.

 

Mark-up to EXclude

 

(Screenshot: tags to include or exclude)

A tag file of mark-up to exclude is for stretches of mark-up like this:

<SCENE>A public library in London. A bald-headed man is sitting reading the News of the World.</SCENE>

where you want to exclude the whole stretch from your concordance or word list, e.g. because you're processing a play and want only the actors' words. Mark-up to exclude will cut out the whole string from the opening tag to the closing tag inclusive.

 

For the Shakespeare corpus, a set of tags to EXclude might be used.

(Screenshot: sample exclusion tag file)

(The idea is not to process any stage directions when processing the Shakespeare corpus.)

The syntax requires ></ or >*</ to be present.

Legal syntax examples would be:

<SCENE></SCENE>

<SCENE>*</SCENE>

<SCENE #>*</SCENE>

<HELLO?? #>*</GOODBYE>

(In this last example it'll cut only if <HELLO is followed by 2 characters, a space and a number then >, and if </GOODBYE> is found beyond that.)

<SCENE>*

</SCENE>

won't work, because both parts of the tag must be on the same line.

<SCENE>*<\SCENE>

won't work, because the slash must be /.
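The syntax rules above can be illustrated with a small Python sketch. It is not how WordSmith itself works internally; the names exclusion_regex and strip_excluded are hypothetical. It assumes that # is one digit, and that an entry written with ></ behaves the same as one written with >*</.

```python
import re


def _wildcards(text: str) -> str:
    """Turn *, ? and # into regex pieces; treat everything else literally."""
    table = {"*": ".*?", "?": ".", "#": r"\d"}
    return "".join(table.get(ch, re.escape(ch)) for ch in text)


def exclusion_regex(entry: str) -> "re.Pattern[str]":
    """Compile one 'mark-up to exclude' entry, or raise ValueError."""
    if "\n" in entry:
        raise ValueError("both parts of the tag must be on the same line")
    for separator in (">*</", "></"):
        if separator in entry:
            opening, closing = entry.split(separator, 1)
            opening += ">"           # restore the end of the opening tag
            closing = "</" + closing  # restore the start of the closing tag
            break
    else:
        raise ValueError("entry must contain ></ or >*</")
    # [\s\S]*? lets the excluded stretch run across line breaks in the text.
    return re.compile(_wildcards(opening) + r"[\s\S]*?" + _wildcards(closing))


def strip_excluded(text: str, entries) -> str:
    """Cut every stretch from opening to closing tag inclusive."""
    for entry in entries:
        text = exclusion_regex(entry).sub("", text)
    return text
```

With the entry <SCENE>*</SCENE>, a text such as "<SCENE>A public library in London.</SCENE>Hello there." is reduced to "Hello there.", and an entry written with <\SCENE> is rejected.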

Your installation includes a sample file (Documents\wsmith6\sample_lemma_exclude_tag.tag) which cuts out lemmas if constructed on the pattern <lemma tag="*>*</lemma>, i.e. with the word tag, an equals sign and a double-quote symbol, regardless of what is in the double quotes.

 

Mark-up to INclude

A tag file for tags to retain contains a simple list of all the tags you want to retain. Sample tag list files for BNC handling (e.g. bnc world.tag) are included with your installation (in your Documents\wsmith6 folder). You could make a new tag file by reading one of them in, altering it, and saving it under a new name.

 

Tags will by default be displayed in a standard tag colour (default=grey), but you can specify the foreground and background for tags which you want to be displayed differently by putting

/colour="foreground on background"

e.g. <noun> /colour="yellow on red"

Available colours:

'Black','White','Cream',

'Red','Maroon',

'Yellow',

'Navy','Blue','Light Blue','Sky Blue',

'Green','Olive','Dollar Green','Grey-Green','Lime',

'Purple','Light Purple',

'Grey','Silver','Light Grey','Dark Grey','Medium Grey'.

 

The colour names are not case sensitive (though the tags are). Note UK spelling of "grey" and "colour".

 

You can also put "/play media" if you want a given tag, when found in your text files, to be able to attempt to play a sound or video file. For example, with a tag like

<sound *> /colour="blue on yellow" /play media

and a text occurrence like

<sound c:\windows\Beethoven's 5th Symphony.wav>

or

<sound http://www.political_speeches.com/Mao_Ze_Dung.mp3>

you will be able to choose to hear the .wav or .mp3 file.

 

Finally, you can put in a descriptive label, using /description "label" like this:

<w NN*> /description "noun" /colour="Cream on Purple"

<ABSTRACT> /description "section"

<INTRODUCTION> /description "section"

<SECTION 1> /description "section"
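The options described so far (/colour, /play media and /description) can be read from a tag-file line along these lines. This is a minimal sketch for checking your own tag files, not WordSmith's parser; TagEntry and parse_tag_line are hypothetical names, and the colour list is copied from the list above.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Colour names from the list above, lower-cased (names are not case sensitive).
COLOURS = {
    "black", "white", "cream", "red", "maroon", "yellow",
    "navy", "blue", "light blue", "sky blue",
    "green", "olive", "dollar green", "grey-green", "lime",
    "purple", "light purple",
    "grey", "silver", "light grey", "dark grey", "medium grey",
}


@dataclass
class TagEntry:
    tag: str
    foreground: Optional[str] = None
    background: Optional[str] = None
    description: Optional[str] = None
    play_media: bool = False


# Options start at whitespace followed by one of the known /keywords,
# so tags containing spaces (e.g. <w NN*>) stay in one piece.
_OPTION_START = re.compile(r"\s+/(?=colour=|description\s|play media)")


def parse_tag_line(line: str) -> TagEntry:
    """Split one tag-file line into the tag and its optional settings."""
    line = line.strip()
    match = _OPTION_START.search(line)
    tag = line if match is None else line[:match.start()]
    options = "" if match is None else line[match.start():]
    entry = TagEntry(tag=tag)

    colour = re.search(r'/colour="([^"]+)"', options)
    if colour:
        fg, sep, bg = colour.group(1).lower().partition(" on ")
        if not sep or fg not in COLOURS or bg not in COLOURS:
            raise ValueError("unknown colour setting: " + colour.group(1))
        entry.foreground, entry.background = fg, bg

    description = re.search(r'/description\s+"([^"]*)"', options)
    if description:
        entry.description = description.group(1)

    entry.play_media = "/play media" in options
    return entry
```

The tag itself is kept exactly as written, since tags are case sensitive; only the colour names are lower-cased for comparison.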

 

Tagstring_only tags

You can also define two tags to mark the beginning and end of what will be shown in a concordance, using /tagstring_only as a signal. For example, if concordancing text containing titles marked out with <title> and </title>, you may want to see only the title text. You'd include in the tag file:

<title> /tagstring_only

</title> /tagstring_only

To get Concord to show only the text between these two, choose View | Tag string only in Concord's menu.
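As a rough illustration of the effect, the following Python sketch pulls out only the text between a pair of tagstring_only tags. It approximates what the Tag string only view shows and is not Concord's own code; the function name tag_strings is hypothetical.

```python
import re
from typing import List


def tag_strings(text: str, open_tag: str = "<title>",
                close_tag: str = "</title>") -> List[str]:
    """Return each stretch of text found between open_tag and close_tag."""
    pattern = re.compile(
        re.escape(open_tag) + r"(.*?)" + re.escape(close_tag),
        re.DOTALL,  # allow a title to run over a line break
    )
    return [found.strip() for found in pattern.findall(text)]
```

Given "<title>Hamlet</title> Some prose. <title>King Lear</title>", the function returns the two titles and drops the prose between them.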

 

Section tag

In the examples using "section", Concord's "Nearest Tag" will find the section however remote in the text file it may be.

This is particularly useful if, for example, you want to identify the speech of all characters in a play, have a list of the characters, and they are marked up appropriately in the text file.

<Romeo> /description "section"

<Mercutio> /description "section"

<Benvolio> /description "section"
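The idea of finding the section tag "however remote" can be sketched as a backwards search from a given position in the text. This assumes that "nearest" means the closest section tag before the search word, which is an assumption and not something this page states; nearest_section_tag is a hypothetical name, not part of WordSmith.

```python
from typing import List, Optional


def nearest_section_tag(text: str, position: int,
                        section_tags: List[str]) -> Optional[str]:
    """Return the section tag that occurs latest before `position`.

    Assumption: only tags preceding the position are considered.
    Returns None if no listed tag occurs before the position.
    """
    best_tag, best_index = None, -1
    for tag in section_tags:
        index = text.rfind(tag, 0, position)  # search backwards, any distance
        if index > best_index:
            best_tag, best_index = tag, index
    return best_tag
```

With the three character tags above, a word spoken after <Mercutio> is attributed to <Mercutio> even if the tag is thousands of words earlier in the file.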

 

 

Here is an example of what you see after selecting a tag file and pressing "Load". The first tag is a "play media" tag, as is shown by the icon. You can see the cream on purple colour for nouns too. The tag file (BNC World.tag) is included in your installation.

 

(Screenshot: viewing a loaded tag file)

 

Entity File (entities to be translated)

 

(Screenshot: entity file)

 

If you load it you might see something like this:

(Screenshot: entity file loaded)

 

A tag file for translation of one entity reference into another uses the following syntax: entity reference to be found + space + replacement. Examples:

&Eacute; É

&eacute; é

In the screenshot above, the sample tag file for translation (Documents\wsmith6\sgmltrns.tag) which is included with your installation has been loaded. You could make a new one by reading it in, altering it, and saving it under a new name.
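The entity syntax (entity reference + space + replacement) is simple enough to sketch in Python. This is only an illustration of the file format, not WordSmith's own routine; parse_entity_file and translate_entities are hypothetical names.

```python
from typing import Dict, Iterable


def parse_entity_file(lines: Iterable[str]) -> Dict[str, str]:
    """Read 'entity replacement' lines into a lookup table."""
    table = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        entity, separator, replacement = line.partition(" ")
        if separator:  # ignore lines with no space-separated replacement
            table[entity] = replacement
    return table


def translate_entities(text: str, table: Dict[str, str]) -> str:
    """Replace each entity reference found in the text (case sensitive)."""
    for entity, replacement in table.items():
        text = text.replace(entity, replacement)
    return text
```

Because the replacement is case sensitive, &Eacute; and &eacute; need separate entries, as in the examples above.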

 

See also: Overview of Tags, Handling Tags, Showing Nearest Tags in Concord, Tag Concordancing, Types of Tag, Viewing the Tags, Using Tags as Text Selectors, Guide to handling the BNC.
