A French text-message corpus : 88milSMS. Synthesis and usage


Rachel Panckhurst, Cédric Lopez, Mathieu Roche (2020), A French text-message corpus : 88milSMS. Synthesis and usage. In “Corpus complexes Traitements, standardisation et analyse des corpus de communication médiée par les réseaux”, CORPUS, 21, to appear.

Abstract

In this article, we first briefly summarise the sud4science project and its data collection, the ensuing processing and analysis stages, and the resulting corpus, 88milSMS, through a synthesis of quotes and references to previous articles (§ 1). Second, we review the state of the art on research initiatives that use 88milSMS in various domains and frameworks, which will enable future cross-disciplinary insight (§ 2). Then, we present other usages of the 88milSMS corpus that we identified through surveys (§ 3). Finally, we suggest future paths for textual data collection and analysis.