Chargrams is a tool for finding out which chargrams (sequences of N characters) are most frequent in a text or a set of texts.


A typical use is to check which chargrams are most frequent in a particular position, e.g. at the start of a word, in the middle of a word, or at the end.



Take, for example, 3-letter chargrams occurring in word-final position. ING is a well-known ending in English; HAT is a frequent 3-letter sequence at the end of words too.


How does it work?

Chargrams are computed from the valid characters of the text only. If a text contained "In 1845 there was a princess", the 3-character chargrams considered would be THE, HER, ERE, WAS, PRI, RIN, INC, NCE, CES, ESS. "In" and "a" are shorter than three letters, and the digits of "1845" are not counted as valid characters, so none of them contributes a chargram. Positions are computed relative to the original words, so THE is word-initial, ESS is word-final, and RIN is medial.
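The extraction described above can be sketched in a few lines of Python. This is only an illustration, not the tool's actual code: the function name chargrams_with_positions is hypothetical, "valid characters" is assumed to mean the letters A-Z, and a chargram that spans a whole word (such as WAS) is assumed to count as both word-initial and word-final.

```python
import re
from collections import Counter

def chargrams_with_positions(text, n=3):
    """Return (chargram, position) pairs for every run of letters in text."""
    pairs = []
    # Assumption: only ASCII letters are "valid"; anything else ends a word.
    for word in re.findall(r"[A-Za-z]+", text):
        word = word.upper()
        last_start = len(word) - n  # negative when the word is shorter than n
        for i in range(last_start + 1):
            gram = word[i:i + n]
            if i == 0:
                pairs.append((gram, "initial"))
            if i == last_start:
                pairs.append((gram, "final"))
            if 0 < i < last_start:
                pairs.append((gram, "medial"))
    return pairs

pairs = chargrams_with_positions("In 1845 there was a princess")
# Count how often each (chargram, position) pair occurs.
counts = Counter(pairs)
for (gram, position), count in counts.most_common():
    print(gram, position, count)
```

Run over a larger set of texts, the same Counter would give the most frequent chargrams per position.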

If punctuation is included, the sequences would also include IN_, N_1, _18, 184, etc., with _ standing for a space.
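A rough sketch of this variant, again in Python and again only under assumptions: the function name raw_chargrams is hypothetical, and it is assumed that the whole text is treated as one sequence in which every character is kept and spaces are written as _.

```python
def raw_chargrams(text, n=3):
    """Return all n-character windows of text, keeping non-letter characters."""
    # Assumption: spaces are shown as "_" and nothing is filtered out.
    normalized = text.upper().replace(" ", "_")
    return [normalized[i:i + n] for i in range(len(normalized) - n + 1)]

grams = raw_chargrams("In 1845 there was a princess")
print(grams[:4])  # ['IN_', 'N_1', '_18', '184']
```

In this mode chargrams can cross word boundaries, which is why sequences such as N_1 appear.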

