Sources of sound and video files
WordSmith does not provide or include corpora. However, there are specialised corpora such as NECTE, MICASE and ICE, and publicly available sources such as the TED Talks. You are expected to respect copyright provisions in all cases.
There is a lot of useful advice at the TED Open Translation Project, where you will find transcripts.
These text files in English (.en), Spanish (.es), Italian (.it) and Japanese (.ja) were downloaded from there and later converted using the Text Converter.