References

Iterative Named Entity Recognition from a Syntactic Dependency Structure and the NERD Ontology

Reference

Cédric Lopez, Melissa Mekaoui, Kevin Aubry, Jean Bort and Philippe Garnier (2019) Reconnaissance d’entités nommées itérative sur une structure en dépendances syntaxiques avec l’ontologie NERD, Revue des Nouvelles Technologies de l’Information, RNTI-E-35, p. 81-92 (this work was presented in Metz at the EGC’19 conference).


Abstract

Named entity recognition (NER) seeks to locate named entities and classify them into predefined categories (persons, organizations, brand names, sports teams, etc.). NER is often considered one of the main modules for structuring a text. In this article, we describe our symbolic system, which is characterized by 1) the use of limited resources, and 2) the integration of results from other modules such as coreference resolution and relation extraction. The system builds on the output of a dependency parser and adopts an iterative execution flow that integrates results from other analysis blocks. At each iteration, candidate categories are generated, and all of them are carried into subsequent iterations. The advantage of this design is that the best candidate is selected only at the end of the process, so that all the elements provided by the different modules are taken into account. The system is compared to academic and industrial systems.
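
To make the idea of deferred selection concrete, here is a minimal Python sketch. It is not the system described in the publication: the three rule functions, the additive scoring, the weights and the example mentions are all assumptions chosen for illustration. It only shows how candidate categories can accumulate over iterations, with a single choice made at the end.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Candidates = Dict[str, Dict[str, float]]   # mention -> category -> score
Proposal = Tuple[str, str, float]          # (mention, category, score)
Module = Callable[[List[str], Candidates], Iterable[Proposal]]


def title_rule(mentions: List[str], candidates: Candidates) -> List[Proposal]:
    # Hypothetical rule: a civility title suggests a person.
    return [(m, "Person", 1.0) for m in mentions if m.startswith("Mr. ")]


def surface_rule(mentions: List[str], candidates: Candidates) -> List[Proposal]:
    # Hypothetical weak prior for a bare single-token mention.
    out: List[Proposal] = []
    for m in mentions:
        if " " not in m:
            out += [(m, "Organization", 0.4), (m, "Person", 0.3)]
    return out


def coref_rule(mentions: List[str], candidates: Candidates) -> List[Proposal]:
    # Hypothetical stand-in for coreference resolution: a short mention
    # inherits, at half weight, the candidates of a longer mention ending with it.
    out: List[Proposal] = []
    for short in mentions:
        for long in mentions:
            if long != short and long.endswith(" " + short):
                for category, score in candidates.get(long, {}).items():
                    out.append((short, category, 0.5 * score))
    return out


def recognize(mentions: List[str], modules: List[Module],
              n_iterations: int = 2) -> Dict[str, str]:
    candidates: Candidates = {}
    for _ in range(n_iterations):
        for module in modules:
            # Every proposal is kept; nothing is discarded between iterations.
            for mention, category, score in module(mentions, candidates):
                bucket = candidates.setdefault(mention, {})
                bucket[category] = bucket.get(category, 0.0) + score
    # Selection is deferred until every module has contributed.
    return {m: max(c, key=c.get) for m, c in candidates.items()}


mentions = ["Mr. Fogg", "Fogg"]
without_coref = recognize(mentions, [title_rule, surface_rule])
with_coref = recognize(mentions, [title_rule, surface_rule, coref_rule])
```

In this toy setting, the bare mention "Fogg" is labeled Organization when only the two local rules run, and Person once the coreference stand-in also contributes, which is the kind of late correction that deferring the choice allows.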


Resources

Wikipedia-ner: Download

Corpus developed by Emvista for named entity recognition. This corpus was built from Wikipedia abstracts. It consists of 587 abstracts and 3,125 named entities annotated with the BIO encoding and the concepts of the NERD ontology. See the publication for more details.
This corpus is released under the Creative Commons CC-BY-NC-SA and LGPL-LR licenses.
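
For readers unfamiliar with BIO encoding, the following Python sketch shows how a tagged token sequence can be grouped back into entities. In this scheme `B-` marks the first token of an entity, `I-` a continuation, and `O` a token outside any entity. The sentence, the label strings (`Person`, `Location`) and the function name `bio_to_spans` are illustrative assumptions; the exact labels and file layout of the corpus may differ.

```python
from typing import List, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (entity text, entity type) pairs."""
    spans: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_type = None

    def flush() -> None:
        # Close the entity being built, if any.
        if current_tokens:
            spans.append((" ".join(current_tokens), current_type))

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag always starts a new entity.
            flush()
            current_tokens = [token]
            current_type = tag[2:]
        elif tag.startswith("I-") and current_tokens and tag[2:] == current_type:
            # An I- tag of the same type extends the open entity.
            current_tokens.append(token)
        else:
            # "O", or an I- tag that does not continue the open entity,
            # closes whatever was open and is itself ignored.
            flush()
            current_tokens = []
            current_type = None
    flush()
    return spans


# Illustrative example, not taken from the corpus.
tokens = ["Jules", "Verne", "was", "born", "in", "Nantes", "."]
tags = ["B-Person", "I-Person", "O", "O", "O", "B-Location", "O"]
entities = bio_to_spans(tokens, tags)
```

Here `entities` holds two pairs, one for "Jules Verne" as a Person and one for "Nantes" as a Location.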

Le tour du monde en quatre-vingts jours, by Jules Verne, 1872: Download

This corpus in WML format was initially annotated and disseminated by the LIFAT, with 12 named entity types (persons, organizations, locations, vessels, facilities, oronyms, …). With the agreement of LIFAT, we provide a new version of this corpus in CSV format with a projection onto NERD ontology types (place, person, organization, product, …). In this version, 6,076 tokens are annotated with this ontology.
This corpus is released under the Creative Commons CC-BY-NC-SA and LGPL-LR licenses.

SMILK, linking natural language and data from the web

Reference

Cédric Lopez, Molka Tounsi Dhouib, Elena Cabrio, Catherine Faron-Zucker, Fabien Gandon, Frédérique Segond (2018) SMILK, trait d’union entre langue naturelle et données sur le web, Revue d’Intelligence Artificielle, vol. 32/3, p. 287-312


Abstract

As part of the SMILK Joint Lab, we studied the use of Natural Language Processing to: (1) enrich knowledge bases and link data on the web, and conversely (2) use this linked data to improve text analysis and the annotation of textual content, and to support knowledge extraction. The evaluation focused on brand-related information retrieval in the field of cosmetics. This article describes each step of our approach: the creation of ProVoc, an ontology to describe products and brands; the automatic population of a knowledge base mainly based on ProVoc from heterogeneous textual resources; and the evaluation of an application that takes the form of a browser plugin providing additional knowledge to users browsing the web.