Handling the British National Corpus 

Text FL4 (a spoken broadcast discussion) contains the utterance "So tonight the wraps are off and I think you'll agreeably surprised by what you hear!" (all editions read "you'll agreeably", without "be").

 

Body tags look like this:

 

XML edition (2007)

 

[Image: BNC XML fragment]

hw = head word (the lemmatised form); pos = part of speech. Note that the grammatical information appears twice: first as the detailed CLAWS 5 tag, then, after the head word, as a simpler part-of-speech label.
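To make the three pieces of information concrete, here is a minimal Python sketch that reads them from a short XML fragment. The fragment is a hand-written illustration, not copied from the corpus: the attribute name c5 for the CLAWS 5 tag, the head words and the simplified pos values are all assumptions about how the XML edition encodes this sentence. The function name word_rows is likewise hypothetical.

```python
import xml.etree.ElementTree as ET

# Illustrative fragment only: the attribute name "c5" and all attribute values
# below are assumptions, not text taken from the corpus files.
XML = (
    '<s n="3">'
    '<w c5="AV0" hw="so" pos="ADV">So </w>'
    '<w c5="AV0" hw="tonight" pos="ADV">tonight </w>'
    '<w c5="AT0" hw="the" pos="ART">the </w>'
    '<w c5="NN2" hw="wrap" pos="SUBST">wraps </w>'
    '</s>'
)

def word_rows(xml_text):
    """Return (word, CLAWS 5 tag, head word, simple pos) for each <w> element."""
    root = ET.fromstring(xml_text)
    rows = []
    for w in root.iter("w"):
        word = (w.text or "").strip()  # drop the trailing space kept in the mark-up
        rows.append((word, w.get("c5"), w.get("hw"), w.get("pos")))
    return rows

for row in word_rows(XML):
    print(row)
```

Each row shows the word form followed by its detailed tag, its head word and its simpler part of speech, so the double coding described above can be seen side by side.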

 

Earlier editions (1994, 2001)

 

<s n="3"><w AV0>So <w AV0>tonight <w AT0>the <w NN2>wraps <w VBB>are <w AVP>off 

<w CJC>and <w PNP>I <w VVB>think <w PNP>you<w VM0>'ll <w AV0>agreeably 

<w AJ0-VVD>surprised <w PRP>by <w DTQ>what <w PNP>you <w VVB>hear<c PUN>!

 

 

Everything here within <> brackets is a body tag.

 

You may want to use only a part of this mark-up when processing the corpus with WordSmith.
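As a rough illustration of what "using only a part" can mean, the Python sketch below takes the earlier-edition sentence shown above and either keeps the word tags as (element, tag, text) triples or strips every tag to leave plain text. It is not how WordSmith itself handles mark-up; the names SGML, TOKEN, tokens and plain_text are hypothetical, and the three source lines have been joined into one string.

```python
import re

# The sentence as printed above for the earlier (1994, 2001) editions,
# joined into a single string.
SGML = (
    '<s n="3"><w AV0>So <w AV0>tonight <w AT0>the <w NN2>wraps <w VBB>are <w AVP>off '
    '<w CJC>and <w PNP>I <w VVB>think <w PNP>you<w VM0>\'ll <w AV0>agreeably '
    '<w AJ0-VVD>surprised <w PRP>by <w DTQ>what <w PNP>you <w VVB>hear<c PUN>!'
)

# A <w TAG> or <c TAG> tag, followed by the text running up to the next "<".
TOKEN = re.compile(r'<([wc]) ([^>]+)>([^<]*)')

def tokens(line):
    """Return (element, tag, text) triples for every <w ...> and <c ...> tag."""
    return [(m.group(1), m.group(2), m.group(3)) for m in TOKEN.finditer(line)]

def plain_text(line):
    """Remove everything within <> brackets, keeping only the words."""
    return re.sub(r'<[^>]*>', '', line)

for element, tag, text in tokens(SGML):
    print(element, tag, repr(text))
print(plain_text(SGML))
```

The sentence tag `<s n="3">` is skipped by tokens because the pattern only accepts w and c elements, while plain_text removes it along with all the other tags.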