Tag Syntax

Each tag is case sensitive.

Tags conventionally begin with < and end with >, but the first and last characters of a tag can be any symbol.

You can use

 * to mean any sequence of characters;

 ? to mean any one character;

 # to mean any numerical digit.

 

Don't use [ to insert comments in a tag file, since [ is useful as a potential tag symbol. Use # to represent a single digit (e.g. <h#> will pick up <h5>, <h1>, etc.), ? to represent any single character (e.g. <?> will pick up <s>, <p>, etc.), and * to represent any number of characters (e.g. <u*> will pick up <u who=Fred>, <u who=Mariana>, etc.). Otherwise, prepare your tag list file in the same way as for Stop Lists.
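The wildcard rules above can be illustrated with a short Python sketch that turns a tag-file entry into a regular expression. This is not WordSmith's own code, and the function names are hypothetical. It assumes that * may also match an empty run (matching as little as possible), that ? matches exactly one character and # exactly one digit 0-9, and that everything else is matched literally and case-sensitively.

```python
import re


def tag_pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a tag-file entry such as <h#> or <u*> into a compiled regex.

    Assumed wildcard meanings (hypothetical helper, not WordSmith's code):
      *  any run of characters, possibly empty, matched as briefly as possible
      ?  exactly one character
      #  exactly one digit 0-9
    Everything else is matched literally and case-sensitively.
    """
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*?")    # shortest run of any characters
        elif ch == "?":
            parts.append(".")      # exactly one character
        elif ch == "#":
            parts.append("[0-9]")  # exactly one digit
        else:
            parts.append(re.escape(ch))  # literal character
    return re.compile("".join(parts))


def find_tags(pattern: str, text: str) -> list[str]:
    """Return every stretch of text that the tag pattern picks up."""
    return [m.group(0) for m in tag_pattern_to_regex(pattern).finditer(text)]


print(find_tags("<h#>", "<h1>Title</h1><h5>Sub</h5><h10>"))  # ['<h1>', '<h5>']
print(find_tags("<?>", "<s><p><hi>"))                        # ['<s>', '<p>']
print(find_tags("<u*>", "<u who=Fred>Hi.<u who=Mariana>Yes."))
```

Because # stands for one digit, <h#> does not pick up <h10> under these assumptions; a second entry such as <h##> would be needed for that.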

 

Use Notepad or any other plain-text editor to create a new .tag file, writing one entry on each line.

Any number of pre-defined tags can be stored, but the more you use, the more work WordSmith has to do, and the more time and memory processing will take.

 

Mark-up to EXclude

 

(Screenshot: tags to include or exclude)

Use a tag file of mark-up to exclude for stretches of mark-up like this:

<SCENE>A public library in London. A bald-headed man is sitting reading the News of the World.</SCENE>

where you want to exclude the whole stretch from your concordance or word list, e.g. because you're processing a play and want only the actors' words. Mark-up to exclude cuts out the whole string, from the opening tag to the closing tag inclusive.

 

For the Shakespeare corpus, a set of tags to EXclude might be used.

(Screenshot: sample exclusion tag file)

(The idea is not to process any stage directions when processing the Shakespeare corpus.)

The syntax requires ></ or >*</ to be present.

Legal syntax examples would be:

<SCENE></SCENE>

<SCENE>*</SCENE>

<SCENE #>*</SCENE>

<HELLO?? #>*</GOODBYE>

(In this last example, the stretch is cut only if <HELLO is followed by any two characters, a space, a single digit and then >, and if </GOODBYE> is found after that.)

<SCENE>*

</SCENE>

won't work, because both parts of the tag must be on the same line.

<SCENE>*<\SCENE>

won't work, because the closing tag must use a forward slash (/), not a backslash (\).
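To make these exclusion rules concrete, here is a minimal Python sketch that checks an entry against the stated syntax rule (one line, containing ></ or >*</) and then cuts matching stretches from a text. It is an illustration under assumptions, not WordSmith's implementation: the function names are hypothetical, wildcards follow the same reading as in Tag Syntax (* shortest run, ? one character, # one digit), <SCENE></SCENE> is assumed to behave like <SCENE>*</SCENE>, and an excluded stretch is assumed to be allowed to span line breaks in the text being processed.

```python
import re


def wildcard_to_regex(entry: str) -> str:
    """Assumed wildcard reading: * any run (shortest), ? one char, # one digit."""
    parts = []
    for ch in entry:
        if ch == "*":
            parts.append(".*?")
        elif ch == "?":
            parts.append(".")
        elif ch == "#":
            parts.append("[0-9]")
        else:
            parts.append(re.escape(ch))
    return "".join(parts)


def is_legal_exclusion_entry(entry: str) -> bool:
    """Stated rule: a single line that contains ></ or >*</."""
    return "\n" not in entry and ("></" in entry or ">*</" in entry)


def exclude_markup(entry: str, text: str) -> str:
    """Cut each stretch from opening to closing tag inclusive (hypothetical helper)."""
    if not is_legal_exclusion_entry(entry):
        raise ValueError(f"illegal exclusion entry: {entry!r}")
    if ">*</" not in entry:
        # Assumption: <SCENE></SCENE> is treated like <SCENE>*</SCENE>.
        entry = entry.replace("></", ">*</", 1)
    # Assumption: DOTALL lets an excluded stretch run across line breaks.
    regex = re.compile(wildcard_to_regex(entry), re.DOTALL)
    return regex.sub("", text)


print(exclude_markup("<SCENE>*</SCENE>", "<SCENE>A public library.</SCENE>HAMLET: To be"))
print(is_legal_exclusion_entry("<SCENE>*<\\SCENE>"))  # False: backslash, no </
```

The shortest-match reading of * means that two separate <SCENE> stretches in one text are cut individually, leaving any dialogue between them intact.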

Your installation includes a sample file, Documents\wsmith7\sample_lemma_exclude_tag.tag, which cuts out lemmas marked up on the pattern <lemma tag="*>*</lemma>, i.e. with the word tag, an equals sign and a double-quote symbol, regardless of what is inside the double-quotes.

 

Mark-up to INclude

A tag file for tags to retain contains a simple list of all the tags you want to retain. Sample tag list files for BNC handling (e.g. BNC World.tag) are included with your installation, in your Documents\wsmith7 folder. To make a new tag file, you could read one of them in, alter it, and save it under a new name.

 

Colours

Sound and Video

Descriptive Label

Section Tag

Tagstring_only tags

 

 

Here is an example of what you see after selecting a tag file and pressing "Load". The first tag is a "play media" tag, as its icon shows. You can also see the cream-on-purple colouring used for nouns. This tag file (BNC World.tag) is included in your installation.

 

(Screenshot: viewing a loaded tag file)

 

Entity File (entities to be translated)

 

(Screenshot: entity file)

 

If you load it, you might see something like this:

(Screenshot: entity file loaded)

 

A tag file for translating one entity reference into another uses the following syntax: the entity reference to be found, a space, then the replacement. For example:

&Eacute; É

&eacute; é
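The entity-translation syntax (entity reference, a space, then the replacement) can be sketched in Python as follows. The function names are hypothetical, and this is not how WordSmith itself is implemented. The sketch assumes each line is split at its first space, that matching is case-sensitive (so &Eacute; and &eacute; stay distinct), and that all entities are replaced in a single pass so a replacement is never re-translated.

```python
import re


def load_entity_table(lines: list[str]) -> dict[str, str]:
    """Parse entries of the form: entity reference + space + replacement."""
    table: dict[str, str] = {}
    for line in lines:
        line = line.rstrip("\r\n")
        entity, sep, replacement = line.partition(" ")  # split at first space
        if entity and sep:  # skip blank or malformed lines
            table[entity] = replacement
    return table


def translate_entities(text: str, table: dict[str, str]) -> str:
    """Replace every listed entity reference in one pass (case-sensitive)."""
    if not table:
        return text
    # Longest references first, so a longer one wins over any prefix of it.
    alternatives = sorted(table, key=len, reverse=True)
    regex = re.compile("|".join(re.escape(e) for e in alternatives))
    return regex.sub(lambda m: table[m.group(0)], text)


table = load_entity_table(["&Eacute; É", "&eacute; é"])
print(translate_entities("&Eacute;cole, caf&eacute;", table))  # École, café
```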

In the screenshot above, the sample tag file for translation (Documents\wsmith7\sgmltrns.tag), which is included with your installation, has been loaded. You could make a new one by reading it in, altering it, and saving it under a new name.

 

See also: Overview of Tags, Handling Tags, Showing Nearest Tags in Concord, Tag Concordancing, Types of Tag, Viewing the Tags, Using Tags as Text Selectors, Guide to handling the BNC.
