By default, WordSmith uses all sections of all texts selected in Choose Texts but cuts out all angle-bracketed tags.
This box offers several preset alternatives which fill in the boxes below for you. Choosing British National Corpus World Edition (as in the screenshot), for example, will automatically put </teiHeader> into the Document header ends box below. You can also edit the options and their effects.
Markup to ignore
If you want to cut out unwanted tags, e.g. in HTML files, leave something like < > or [ ] or < >,[ ] in Markup to ignore. The "search span" is how far WordSmith should look for a closing symbol such as > after it finds a starting symbol such as <. (A limit is needed because these symbols might also be used in mathematics, where a < may never be followed by a matching >.)
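The effect of a search span can be illustrated with a minimal Python sketch. This is not WordSmith's own code: the function name strip_markup, its parameters and the default span of 1000 characters are all assumptions made for illustration. It removes a tag only when the closing symbol turns up within the span, and otherwise leaves the opening symbol in the text.

```python
def strip_markup(text, open_sym="<", close_sym=">", search_span=1000):
    """Remove open_sym ... close_sym stretches, but only when the closing
    symbol is found within search_span characters of the opening symbol.
    (Hypothetical helper; names and default span are illustrative.)"""
    out = []
    i = 0
    while i < len(text):
        if text.startswith(open_sym, i):
            start = i + len(open_sym)
            # Look no further than search_span characters past the opening symbol.
            j = text.find(close_sym, start, start + search_span)
            if j != -1:
                i = j + len(close_sym)  # skip the whole tag
                continue
        # Not a tag (or no closing symbol within the span): keep the character.
        out.append(text[i])
        i += 1
    return "".join(out)


print(strip_markup("<p>Hello <b>world</b></p>"))
print(strip_markup("if x < y then stop", search_span=20))
```

In the second call there is no > within 20 characters of the <, so the < is treated as ordinary text and the line is left unchanged.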
Markup to INclude or EXclude
See Making a Tag File.
Text Files and Mark-up
However, you can get WordSmith to use tags to select one section of a text and ignore the rest. This is "selecting within texts". You can also select between texts: that is, get WordSmith to look within the start of each text to see whether it meets certain criteria.
When you process a set of texts which usually contain a standard header (e.g. a copyright notice), you may wish to remove it automatically.
Ensure that some suitable tag is specified, as in the </teiHeader> example above. (If you choose Custom Settings above, you will get suitable choices automatically.) WordSmith removes the header by looking for the Document header ends mark-up and deleting all text up to that point. (If you have a header repeated in the same text file, WordSmith will need to be told what mark-up is used for Document header starts too, and you will need to choose Only Part of File to get such headers removed.)
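The header-cutting logic can be sketched in Python as follows. This is an illustration only, not WordSmith's implementation: the function names are hypothetical, and the sketch assumes that the end mark-up itself is deleted along with the header. The first function handles a single header at the top of a file; the second handles a header repeated within one file, which is why it needs the start mark-up as well.

```python
def remove_header(text, header_end="</teiHeader>"):
    """Delete everything up to and including the first header_end mark-up.
    (Hypothetical helper; assumes the end tag itself is also removed.)"""
    pos = text.find(header_end)
    if pos == -1:
        return text  # no header found: leave the text untouched
    return text[pos + len(header_end):]


def remove_repeated_headers(text, header_start, header_end):
    """Delete every header_start ... header_end stretch in the text."""
    kept = []
    i = 0
    while True:
        start = text.find(header_start, i)
        end = text.find(header_end, start + len(header_start)) if start != -1 else -1
        if start == -1 or end == -1:
            kept.append(text[i:])  # no further complete header: keep the rest
            break
        kept.append(text[i:start])  # keep the text before this header
        i = end + len(header_end)
    return "".join(kept)


print(remove_header("<teiHeader>copyright notice</teiHeader>Body text"))
print(remove_repeated_headers("<h>a</h>one<h>b</h>two", "<h>", "</h>"))
```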
The order in which these choices are handled
If you choose to select either between or within texts, WordSmith will check that each text file meets your requirements before doing your concordance, word list, etc. It will
1. Select between files, checking whether each one contains the words you've specified;
2. Cut out any section specified as a "section to cut";
3. If there are "sections to keep", cut out everything which is not within them;
4. Cut start of each line, if applicable;
5. Process any entity references you want to translate;
6. Ignore any tags not to be retained (see the "Mark-up to ignore" section of the screenshot above).
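The order of these six steps can be shown as a minimal Python sketch. It is an assumption-laden illustration rather than WordSmith's actual processing: the function process_text and all its parameters are hypothetical, step 1 searches the whole text rather than only its start, and tags are taken to be angle-bracketed.

```python
import re


def process_text(text, required_words=(), cut=None, keep=None,
                 line_prefix_len=0, entities=None, strip_tags=True):
    """Apply the six steps in order; return None if the file is rejected.
    cut and keep are (start_tag, end_tag) pairs. All names are illustrative."""
    # 1. Select between files: reject a text lacking any required word.
    if not all(word in text for word in required_words):
        return None
    # 2. Cut out any section specified as a "section to cut".
    if cut:
        pattern = re.escape(cut[0]) + r".*?" + re.escape(cut[1])
        text = re.sub(pattern, "", text, flags=re.DOTALL)
    # 3. If there are "sections to keep", drop everything outside them.
    if keep:
        pattern = re.escape(keep[0]) + r"(.*?)" + re.escape(keep[1])
        text = "\n".join(re.findall(pattern, text, flags=re.DOTALL))
    # 4. Cut the start of each line, if applicable.
    if line_prefix_len:
        text = "\n".join(line[line_prefix_len:] for line in text.split("\n"))
    # 5. Translate entity references.
    for entity, replacement in (entities or {}).items():
        text = text.replace(entity, replacement)
    # 6. Ignore any remaining tags.
    if strip_tags:
        text = re.sub(r"<[^<>]*>", "", text)
    return text


sample = "<head>skip me</head><body>fish &amp; <b>chips</b></body>"
print(process_text(sample, required_words=("body",),
                   cut=("<head>", "</head>"), keep=("<body>", "</body>"),
                   entities={"&amp;": "&"}))
```

The order matters: because tags are only ignored in the last step, they are still present when the sections to cut and keep are located in steps 2 and 3.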