Codility

Classifying texts by CEFR level

How frequently a word occurs in the language corresponds to a certain level of command of that language
- The more frequent the word, the earlier in your learning path you will learn it

text → units → frequency (excluding IT terms?)

each unit's frequency in real language (corpus) → CEFR level?
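A minimal Python sketch of the first steps of this pipeline (text to units, then each unit's frequency in a corpus). It assumes that a unit is a single lowercase word and that the corpus is available as a word-count table. The `IT_TERMS` set, the regular expression and the function names are hypothetical illustrations, not part of the original design.

```python
import re
from collections import Counter

# Hypothetical exclusion list: the slides only ask whether IT terms
# should be excluded, so this set is an illustrative assumption.
IT_TERMS = {"api", "server", "database"}

def text_to_units(text: str, exclude: set[str] = IT_TERMS) -> list[str]:
    """Split a text into lowercase word units, dropping excluded terms."""
    # Assumption: a "unit" is a single word; multi-word units are ignored.
    words = re.findall(r"[a-z']+", text.lower())
    return [w for w in words if w not in exclude]

def unit_corpus_frequencies(units: list[str], corpus_counts: Counter) -> dict[str, int]:
    """Look up each distinct unit's absolute frequency in a reference corpus."""
    # dict.fromkeys removes duplicates while keeping first-seen order;
    # Counter returns 0 for units that never occur in the corpus.
    return {u: corpus_counts[u] for u in dict.fromkeys(units)}
```

With `Counter({"have": 3942, "hand": 431})` as the corpus table, `text_to_units("I have the server in my hand")` drops `server`, and the lookup returns 3942 for `have`, 431 for `hand` and 0 for units the table does not contain.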

General language corpus: Where does each unit rank when all units are sorted by absolute frequency?

CEFR specialized corpus: In which level does each unit appear with the most similar relative frequency?
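The two questions could be sketched as two small functions. This is a minimal sketch under assumptions the slides do not state: counts and per-level relative frequencies are already available as plain dictionaries, and the "target" relative frequency that the levels are compared against (for example, the unit's relative frequency in the text being classified) is a hypothetical choice. The function names are illustrative only.

```python
def rank_by_absolute_frequency(counts: dict[str, int]) -> dict[str, int]:
    """Return each unit's 1-based rank in the general corpus, most frequent first."""
    ordered = sorted(counts, key=lambda u: counts[u], reverse=True)
    return {unit: rank for rank, unit in enumerate(ordered, start=1)}

def closest_level(target: float, level_freqs: dict[str, float]) -> str:
    """Return the CEFR level whose relative frequency for a unit is closest to target.

    Assumption: level_freqs maps a level label (e.g. "A1") to the unit's
    relative frequency in that level of the CEFR specialized corpus, and
    target is whatever reference relative frequency is chosen for the unit.
    """
    return min(level_freqs, key=lambda lvl: abs(level_freqs[lvl] - target))
```

Using the smallest absolute difference is only one way to read "most similar"; a ratio or log-ratio comparison would be an equally plausible choice.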

This is just an example; expert advice would be needed to create these groups.

General language corpus (absolute frequency: the number of times the word appears in the general language corpus):
'have': 3942
'hand': 431

CEFR specialized corpus (relative frequency):
'have': 5/613
'hand': 1/613
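A short worked computation on the example figures, shown as a sketch. It uses only the numbers given above; what the denominator 613 counts (presumably the size of the relevant level sub-corpus) is not stated, so the fractions are simply kept exact with `fractions.Fraction`.

```python
from fractions import Fraction

# Absolute counts from the general language corpus example.
general_counts = {"have": 3942, "hand": 431}
# Relative frequencies from the CEFR specialized corpus example, kept exact.
cefr_relative = {"have": Fraction(5, 613), "hand": Fraction(1, 613)}

def order_by(freqs):
    """Return units sorted from most to least frequent."""
    return sorted(freqs, key=freqs.get, reverse=True)

# Both corpora put 'have' ahead of 'hand'.
general_order = order_by(general_counts)
cefr_order = order_by(cefr_relative)

# How many times more frequent 'have' is than 'hand' in each corpus.
general_ratio = general_counts["have"] / general_counts["hand"]  # about 9.15
cefr_ratio = cefr_relative["have"] / cefr_relative["hand"]       # exactly 5

for unit in cefr_relative:
    print(unit, round(float(cefr_relative[unit]), 4))
```

Both corpora agree on the order, but the gap between the two words is much wider in the general corpus (roughly 9 to 1) than in the CEFR example (5 to 1), which is the kind of difference the "most similar relative frequency" question would have to weigh.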

By msoutopico