is a structured, typically large-scale collection of authentic language data (spoken or written), meant to represent a specific type of language use.
corpus
The data should reflect genuine language use, not artificially created samples. This is critical for drawing valid conclusions about language patterns.
• Authenticity
The corpus must represent the language variety or genre it is meant to reflect. For example, a corpus of academic English should contain texts from various academic disciplines and genres.
• Representativeness