BNC mark-up consists of:


header information
mark-up within the body of each text


By "mark-up" I mean any information other than the text itself. Take an email to your sister as an example. In the body of the message she would normally see only what you actually wrote. The full email, however, also carries further information in a format laid down by internet standards: the sender's address, the date and subject, details of the mail servers it passed through (including their IP addresses), and so on. We could consider this extra information "mark-up".


In the case of the BNC, the header supplies data about the length of the text, who wrote it and where it was found, who processed it for the BNC and when, copyright issues, etc.
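To make the idea of a header concrete, here is a minimal Python sketch that pulls a few metadata fields out of one. The fragment is hypothetical and heavily simplified. It is only loosely modelled on the TEI-style `<teiHeader>` of the BNC XML Edition. Real BNC headers are much richer, and the element paths and the helper name `header_summary` are illustrative assumptions, not a description of the actual BNC schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified header loosely modelled on a TEI-style
# <teiHeader>. Real BNC headers contain far more; element paths are assumptions.
HEADER = """<teiHeader>
  <fileDesc>
    <titleStmt>
      <title>Sample text: a hypothetical title</title>
      <respStmt><resp>Data capture</resp><name>Hypothetical Team</name></respStmt>
    </titleStmt>
    <extent>1234 words</extent>
    <sourceDesc>
      <bibl><author>A. N. Author</author><publisher>Hypothetical Press</publisher></bibl>
    </sourceDesc>
  </fileDesc>
</teiHeader>"""


def header_summary(xml_text):
    """Return a few metadata fields from a (simplified, hypothetical) header."""
    root = ET.fromstring(xml_text.strip())

    def text_of(path):
        # Look up one element relative to the header root; None if absent or empty.
        el = root.find(path)
        return el.text.strip() if el is not None and el.text else None

    return {
        "title": text_of("./fileDesc/titleStmt/title"),
        "extent": text_of("./fileDesc/extent"),  # length of the text
        "author": text_of("./fileDesc/sourceDesc/bibl/author"),  # who wrote it
        "processed_by": text_of("./fileDesc/titleStmt/respStmt/name"),
    }


if __name__ == "__main__":
    print(header_summary(HEADER))
```

None of this header information is part of "the text itself". It is data *about* the text, much like an email's header lines.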


Mark-up within the body consists mostly of part-of-speech tags, which tell you whether each word is a noun, verb, preposition, etc., together with some simple information about text structure: sentence and paragraph breaks. BNC mark-up does not try to tell you what the text looked like, e.g. where the pages broke, which words were in italics or where any illustrations were placed.
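The following Python sketch separates the words from the mark-up around them. It assumes a small, hypothetical fragment loosely modelled on the BNC XML Edition. In that fragment, `<p>` marks a paragraph, `<s>` marks a sentence, `<w>` marks a word and carries its CLAWS C5 part-of-speech tag in a `c5` attribute, and `<c>` marks punctuation. The sample sentences and the helper names `tagged_tokens` and `structure_counts` are illustrative only.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical fragment loosely modelled on the BNC XML Edition:
# <p> = paragraph, <s> = sentence, <w> = word with a C5 tag, <c> = punctuation.
BODY = """<div>
<p>
<s n="1"><w c5="AT0">The </w><w c5="NN1">cat </w><w c5="VVD">sat</w><c c5="PUN">.</c></s>
<s n="2"><w c5="PNP">It </w><w c5="VVD">slept</w><c c5="PUN">.</c></s>
</p>
</div>"""


def tagged_tokens(xml_text):
    """Return (token, tag) pairs in document order."""
    root = ET.fromstring(xml_text)
    pairs = []
    for el in root.iter():  # iter() walks elements in document order
        if el.tag in ("w", "c"):
            pairs.append(((el.text or "").strip(), el.get("c5")))
    return pairs


def structure_counts(xml_text):
    """Count paragraph and sentence elements: the only structure recorded here."""
    root = ET.fromstring(xml_text)
    return {
        "paragraphs": len(root.findall(".//p")),
        "sentences": len(root.findall(".//s")),
    }


if __name__ == "__main__":
    pairs = tagged_tokens(BODY)
    print(pairs)
    print(Counter(tag for _, tag in pairs))  # frequency of each POS tag
    print(structure_counts(BODY))
```

Stripping the tags gives back the plain text ("The cat sat. It slept."). Keeping them lets you ask questions such as how many past-tense verbs (`VVD`) a text contains.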


There's a list of the BNC tags here.

