Plugin Information
Version | 2.1.1 |
---|---|
Author | Dataiku (Alex COMBESSIE, Du PHAN, Hicham EL BOUKKOURI) |
Released | 2018-09 |
Last updated | 2025-01 |
License | Apache Software License |
Source code | Github |
Reporting issues | Github |
This plugin provides a tool for extracting named entities (e.g. people's names, dates, places) from your texts, which is useful for extracting knowledge from unstructured text.
The plugin comes with a single recipe that extracts entities using one of two possible models:
– spaCy: a faster but slightly less precise model. Another advantage of spaCy is its support for many languages.
– Flair: a slower but more precise model for Named Entity Recognition.
How to use
Named Entity Recognition recipe
This recipe extracts named entities such as LOC (location) and PER (person) from your texts. The default model is spaCy, which is available for 8 languages. To use a more precise (but slower) model for English texts, choose Flair.
Using the recipe is straightforward. Just plug in your dataset, select the column containing your texts and run the recipe!
Optionally, you can adjust some advanced settings. For example, you can choose Flair (available for English only) for more precise extraction. You can also choose how the extracted entities are presented: one column per entity type (default) or a single column containing a JSON object with all the entities.
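The two output formats can be pictured with a small sketch. It is not the plugin's code: it assumes the entities of one text are already available as hypothetical `(entity_text, entity_type)` pairs, and the `entity_<TYPE>` column names, the comma separator and the de-duplication of repeated mentions are illustrative assumptions rather than the plugin's actual schema.

```python
import json
from collections import defaultdict

# Hypothetical input: the entities found in ONE text, as (entity_text, entity_type) pairs.
Entity = tuple[str, str]


def group_by_type(entities: list[Entity]) -> dict[str, list[str]]:
    """Group entity mentions by type, keeping first-seen order and dropping duplicates."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for text, label in entities:
        if text not in grouped[label]:
            grouped[label].append(text)
    return dict(grouped)


def to_columns(entities: list[Entity]) -> dict[str, str]:
    """Format 1: one column per entity type (the column naming is an assumption)."""
    return {f"entity_{label}": ", ".join(values)
            for label, values in group_by_type(entities).items()}


def to_json_column(entities: list[Entity]) -> str:
    """Format 2: a single column holding one JSON object with all the entities."""
    return json.dumps(group_by_type(entities))


if __name__ == "__main__":
    ents = [("Alan Akbik", "PER"), ("Berlin", "LOC"), ("Alan Akbik", "PER")]
    print(to_columns(ents))      # {'entity_PER': 'Alan Akbik', 'entity_LOC': 'Berlin'}
    print(to_json_column(ents))  # {"PER": ["Alan Akbik"], "LOC": ["Berlin"]}
```

The per-type layout is convenient for filtering or grouping on one entity type, while the single JSON column keeps the schema fixed no matter which entity types appear in the data.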
Named Entity Visualization webapp
You can start the webapp from the main webapp menu, under Visual Webapp > Named Entity Visualization. Once the webapp backend has started, you can use spaCy to visualize the named entities in any input text.
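The webapp's own rendering code is not described here. As a rough illustration of what entity highlighting involves, the following sketch assumes entities arrive as character offsets plus a label (a spaCy span exposes this as `start_char`, `end_char` and `label_`) and wraps each span in an HTML `<mark>` tag. The `highlight` function and its markup are hypothetical; this is not spaCy's displaCy renderer.

```python
import html

# Hypothetical input: entities as (start_char, end_char, label) offsets into the text.
Span = tuple[int, int, str]


def highlight(text: str, entities: list[Span]) -> str:
    """Return HTML in which each entity is wrapped in <mark> and tagged with its label."""
    parts: list[str] = []
    cursor = 0
    for start, end, label in sorted(entities):
        if start < cursor:  # skip a span that overlaps the previous one
            continue
        parts.append(html.escape(text[cursor:start]))  # plain text before the entity
        parts.append(f"<mark>{html.escape(text[start:end])} <b>{html.escape(label)}</b></mark>")
        cursor = end
    parts.append(html.escape(text[cursor:]))  # trailing plain text
    return "".join(parts)


if __name__ == "__main__":
    print(highlight("Alan lives in Berlin.", [(0, 4, "PER"), (14, 20, "LOC")]))
    # <mark>Alan <b>PER</b></mark> lives in <mark>Berlin <b>LOC</b></mark>.
```

Escaping every piece of the input keeps user-supplied text from being interpreted as HTML when the result is displayed.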
References
Alan Akbik, Duncan Blythe, and Roland Vollgraf. Contextual String Embeddings for Sequence Labeling. In Proceedings of the 27th International Conference on Computational Linguistics (COLING), 2018.