..or how do you know it's Shakespeare?
Stylometry is the statistical analysis of variations in literary style between one writer or genre and another
Oxford Dictionary
[Stylometric studies,] in all their variety of material and method, have two features in common: the <...> texts they study have to be coaxed to yield numbers, and the numbers themselves have to be processed via statistics.
M. Eder, M. Kestemont, J. Rybicki. ‘Stylo’: a package for stylometric analyses
[The assumption] underlying stylometric studies is that authors have an unconscious as well as a conscious aspect to their style
Encyclopaedia of Statistical Sciences
Apparently, in the different styles of written and spoken language <...> the frequency of use of different types of words differs. Precise research in this area would help to establish the structural-grammatical, and in part the semantic, differences between styles <...>
V. V. Vinogradov (1938), Introduction to the Grammatical Study of the Word (Введение в грамматическое учение о слове)
Presumably, each national literature has its own famous unsolved attribution case, such as the Shakespearean canon, a collection of Polish erotic poems of the 16th century ascribed to Mikołaj Sęp Szarzyński, the Russian epic poem The Tale of Igor’s Campaign, and many others.
Eder M. (2011) Style-markers in authorship attribution: A cross-language study of the authorial fingerprint.
For two hundred years the debate has continued over what The Tale of Igor's Campaign is: a genuine Old Russian work or a skillful imitation of antiquity created in the 18th century. <...> The loss of the only manuscript copy of this work deprives researchers of the opportunity to analyze the handwriting, paper, ink, and other material characteristics of the original source. Under these conditions, the most solid basis for deciding whether The Tale of Igor's Campaign is authentic or forged turns out to be the language of the text itself.
A. A. Zaliznyak, "The Tale of Igor's Campaign": A Linguist's View ("Слово о полку Игореве": взгляд лингвиста).
Lorenzo Valla (1407 – 1457)
1851 — A. De Morgan suggests mean word-length as an authorship feature
1873 — New Shakespeare Society (Furnivall, Fleay et al.)
1887 — T. Mendenhall, The Characteristic Curves of Composition, the first known work on quantitative authorship attribution
1867 — L. Campbell, The Sophistes and Politicus of Plato
1880 — W. Dittenberger, Sprachliche Kriterien für die Chronologie der Platonischen Dialoge
1890 — W. Lutosławski, Principes de stylométrie
1915 — N. A. Morozov, Linguistic Spectra (Лингвистические спектры)
Acknowledges the importance of function words (see pic.)
1937 — Bolling, G.M. The Past Tense of 'To Be' in Homer
1938 — Carroll, J.B. Diversity of vocabulary and the harmonic series law of word-frequency distribution
The breakthroughs came in the 1960s, as usual
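The two earliest features in the timeline above, De Morgan's mean word length and Mendenhall's word-length curve, are simple enough to sketch in a few lines of Python. This is only an illustration under stated assumptions: the tokenizer (runs of letters, so "it's" splits into "it" and "s") and the function names are hypothetical choices, not the historical procedure, which was carried out by hand.

```python
import re
from collections import Counter

def words(text):
    # Assumed tokenizer: runs of letters (Unicode-aware), lowercased.
    return [w.lower() for w in re.findall(r"[^\W\d_]+", text)]

def mean_word_length(text):
    # De Morgan's suggested feature: average number of letters per word.
    ws = words(text)
    return sum(len(w) for w in ws) / len(ws) if ws else 0.0

def word_length_curve(text):
    # Mendenhall-style curve: share of words having each length.
    ws = words(text)
    counts = Counter(len(w) for w in ws)
    return {n: counts[n] / len(ws) for n in sorted(counts)}

sample = "To be or not to be"
print(mean_word_length(sample))
print(word_length_curve(sample))
```

Comparing such curves between a disputed text and candidate authors is the basic idea; a single average or curve is a weak signal on its own, so it is best read as a starting point.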
In summary, the following points are clear:
Most readers and critics behave as though common prepositions, conjunctions, personal pronouns, and articles — the parts of speech which make up at least a third of fictional works in English — do not really exist. But far from being a largely inert linguistic mass which has a simple but uninteresting function, these words and their frequency of use can tell us a great deal about the characters who speak them.
Preface to Computation into Criticism, 1987
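The point about common words can be made concrete with a small Python sketch that turns a text into relative frequencies of function words. The word list below is a deliberately tiny, hypothetical sample, and the lowercase ASCII tokenizer is an assumption suited only to English; neither is taken from Computation into Criticism.

```python
import re
from collections import Counter

# Hypothetical, deliberately short list; real studies use far longer ones.
FUNCTION_WORDS = ("the", "and", "of", "to", "a", "in", "i", "it", "that", "but")

def function_word_profile(text, vocabulary=FUNCTION_WORDS):
    # Assumed tokenizer: lowercase ASCII letter runs (English only).
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    total = len(tokens)
    if total == 0:
        return {w: 0.0 for w in vocabulary}
    # Relative frequency of each function word among all tokens.
    return {w: counts[w] / total for w in vocabulary}

profile = function_word_profile("The cat and the dog sat in the sun.")
print(profile["the"], profile["and"], profile["of"])
```

Profiles like this, computed per character or per author, are the kind of numbers that can then be compared statistically.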
V. V. Vinogradov (1961), The Problem of Authorship and the Theory of Styles (Проблема авторства и теория стилей)
Unabomber Theodore Kaczynski perpetrated a number of bomb attacks on universities and airlines between 1978 and 1995
Promised to stop if his 35,000-word anti-industrialist “manifesto” was published in major newspapers
Distinctive writing style and turns of phrase enabled him to be identified
Authorship of Primary Colors
Derek Bentley and his disputed murder ‘confession’
Adversarial stylometry