On December 15, Professor Alexei Lavrentiev of the École normale supérieure de Lyon gave MCU students and professors his second lecture on corpus linguistics, “Textometry and Distant Reading on TXM Platform.” The lecture explained how digital tools can assist linguists and philologists in processing and researching language corpora, a number of which the Professor discussed in his first lecture, “Humanities scholars and computer technologies: checkered history of relations.”
The Professor began with the definition of distant reading, an approach to research in the history of literature that uses the analytical and mathematical methods of economics and social studies. Franco Moretti, who coined the concept, presents it as an efficient way to process large amounts of literary data and thereby gain a comprehensive understanding of literary processes. However, the Professor noted that Moretti did not originate the method, as corpora have been processed since the Middle Ages. Then and later, the branch of language studies that explored the quantitative features of texts was called stylometry.
In the 2000s, the concept of lexicometry appeared in France but was soon replaced by textometry, since the research relates to the text as a whole, its structure and composition, which can be a rich resource for studying, inter alia, language, discourse, literature, history, and politics.
The Professor introduced the TXM Platform, which allows researchers to process massive textual data efficiently, to synchronize audio or video recordings with subtitles, and to develop parallel corpora. TXM includes tools for both qualitative and quantitative analysis. The former comprise concordances of words and their combinations, electronic books, including scans of manuscripts, and commentary on the content. The most widespread form of the latter is the frequency dictionary.
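To make the two kinds of tools more concrete, here is a minimal Python sketch of a frequency dictionary and a keyword-in-context concordance. It does not use TXM or reproduce its interface; the function names, the naive tokenizer, and the sample sentence are all assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    # Naive tokenizer (an assumption): lowercase, keep runs of word characters.
    return re.findall(r"\w+", text.lower())


def frequency_dictionary(tokens):
    # Map each word form to the number of times it occurs.
    return Counter(tokens)


def concordance(tokens, keyword, width=2):
    # Return (left context, keyword, right context) for every occurrence.
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines


# Invented sample text, not taken from any corpus mentioned in the lecture.
sample = "The comrade spoke. Another comrade listened to the speech."
tokens = tokenize(sample)
freq = frequency_dictionary(tokens)
kwic = concordance(tokens, "comrade")
```

The frequency dictionary answers the quantitative question of how often a word occurs, while the concordance returns each occurrence to its immediate context for qualitative reading.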
As an example of TXM use in research, the Professor presented a study carried out with colleagues from Russia and France, “Using TXM Platform for Research on Language Changes over Time: the Dynamics of Vocabulary and Punctuation in Russian Literary Texts.” The study focuses on Russian literary texts from 1900 to 1930. Its representative subcorpus includes 308 novels by Russian writers. The paper identifies the words most used in 1900-1913, 1914-1916, 1917-1922, and 1923-1930 and tracks their frequency in each period in order to find specific features in the development of the Russian language.
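The idea of comparing a word's frequency across periods can be sketched in a few lines of Python. This is only an illustration of the general principle, not the method used in the study: the function name, the normalization per 1,000 tokens, and the toy texts are all assumptions, and the numbers it yields say nothing about the real corpus.

```python
import re
from collections import Counter


def relative_frequency(texts_by_period, word, per=1000):
    # For each period, count the word and normalize by the period's size,
    # so that periods with different amounts of text remain comparable.
    result = {}
    for period, texts in texts_by_period.items():
        tokens = [t for text in texts for t in re.findall(r"\w+", text.lower())]
        count = Counter(tokens)[word]
        result[period] = per * count / len(tokens) if tokens else 0.0
    return result


# Invented toy data (hypothetical), standing in for period subcorpora.
toy_corpus = {
    "1900-1913": ["иван пришёл домой", "иван и товарищ"],
    "1917-1922": ["товарищ сказал товарищ ответил"],
}
rates = relative_frequency(toy_corpus, "товарищ")
```

Normalizing by the number of tokens matters because raw counts would mostly reflect how much text each period contributes rather than how the word's usage changed.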
The research reveals that Russian underwent its most drastic changes in 1917, which was predictable, as that was the year of the Russian Revolution, which affected the language among other things. For instance, the word товарищ (transliterated tovarisch, “comrade”) came to be used much more often and acquired the specific meaning of a compatriot who possesses equal rights and opportunities. By contrast, the traditional Russian name Иван (Ivan) turned out to be typical of the pre-revolutionary period.
After the lecture, the Professor took several questions from the audience. Professor Larisa Vikulova of the Department of Romance Philology at the Institute of Foreign Languages inquired about diachronic research on punctuation with the use of digital tools. Professor Lavrentiev said that his research interests include punctuation and that this was one of the reasons he became involved in digital humanities. The speaker noted that punctuation used to be considered non-systemic, so modern printed editions of medieval books feature modern punctuation, and the print format cannot combine the modern style with the original one. Here, digital editions come to the rescue, as they can contain both versions and allow them to be compared.
Daniil Sigachev, a second-year IFL student, asked Professor Lavrentiev which research directions a beginning scholar might take as a path into corpus linguistics and which methods are accessible to such a scholar. Professor Lavrentiev stressed that corpus studies involve several levels. Corpus-informed research presupposes that the researcher is aware of corpora and can extract the necessary data from them. A corpus-based study involves compiling a corpus and using it to test a hypothesis.
In conclusion, the lecture moderator, Associate Professor Oxana Dubnyakova of the Department of Romance Philology, highlighted the relevance of corpus linguistics, expressed hope that MCU students and teaching staff would contribute to the field, and thanked Professor Alexei Lavrentiev for sharing his thoughts and experience.