lingu log

Multilayer tagging of corpora

What is tagging? How much tagging do we really need, and how much is enough? Are we doing it wrong?
Tagging is the process of identifying and marking segments in a language corpus, whether spoken or written.

Most corpora, such as the Brown Corpus, carry part-of-speech (POS) tagging. These corpora serve a single purpose, and they serve it well. What I have in mind is multilayer tagging of corpora, in which the corpus text and its annotations are kept in separate layers. For instance, consider the following example:


This multilayer approach can help us annotate and view different parts of a corpus.
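One common way to keep text and annotations in separate layers is standoff annotation. The text is stored once and never modified, and each layer records character offsets into it, so a new layer can be added without touching the others. What follows is a minimal sketch under that assumption, not a description of the proposal itself. The class `LayeredCorpus`, its methods, and the layer names `pos` and `phrase` are all hypothetical. The tags follow the Penn Treebank tagset purely for illustration; the Brown Corpus uses its own tagset.

```python
import re
from dataclasses import dataclass, field

@dataclass
class Span:
    start: int  # character offset where the span begins (inclusive)
    end: int    # character offset where the span ends (exclusive)
    label: str  # annotation value, e.g. a POS tag or a phrase label

@dataclass
class LayeredCorpus:  # hypothetical name, for illustration only
    text: str  # the raw text; no layer ever modifies it
    layers: dict[str, list[Span]] = field(default_factory=dict)  # layer name -> spans

    def add(self, layer: str, start: int, end: int, label: str) -> None:
        # Annotations live beside the text, so adding one never alters the text.
        if not (0 <= start < end <= len(self.text)):
            raise ValueError("span lies outside the text")
        self.layers.setdefault(layer, []).append(Span(start, end, label))

    def view(self, layer: str) -> list[tuple[str, str]]:
        # Return (covered text, label) pairs for a single layer, in text order.
        spans = sorted(self.layers.get(layer, []), key=lambda s: (s.start, s.end))
        return [(self.text[s.start:s.end], s.label) for s in spans]

    def labels_at(self, offset: int) -> dict[str, list[str]]:
        # Collect, per layer, every label whose span covers the given offset.
        return {
            name: [s.label for s in spans if s.start <= offset < s.end]
            for name, spans in self.layers.items()
        }

# Usage: two independent layers over the same sentence.
corpus = LayeredCorpus("The cat sat on the mat.")
pos_tags = ["DT", "NN", "VBD", "IN", "DT", "NN", "."]
# Naive tokenisation: runs of word characters, or single punctuation marks.
for m, tag in zip(re.finditer(r"\w+|[^\w\s]", corpus.text), pos_tags):
    corpus.add("pos", m.start(), m.end(), tag)
corpus.add("phrase", 0, 7, "NP")   # "The cat"
corpus.add("phrase", 8, 22, "VP")  # "sat on the mat"

print(corpus.view("pos"))
print(corpus.view("phrase"))
print(corpus.labels_at(5))  # every layer's label covering the "a" in "cat"
```

Because each layer only points into the shared text, one layer can be viewed on its own (`view`) or several can be read together at a given position (`labels_at`).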

I’m currently working on this proposal.