Using Natural Language Processing to Extract Relations: A Systematic Mapping of Open Information Extraction Approaches

Context: for thousands of years, humans have expressed their knowledge in natural language and recorded it so that others can access it. Natural Language Processing (NLP) is a subarea of Artificial Intelligence (AI) that studies linguistic phenomena and uses computational methods to process texts written in natural language. More specific areas, such as Open Information Extraction (Open IE), were created to extract information, such as relation triples, from textual databases without prior knowledge of their context or structure. Recent research has grouped studies related to Open IE initiatives; however, some aspects of this domain remain unexplored.
Objective: this work aims to identify, in the literature, the main characteristics of Open IE approaches.
Method: to achieve this objective, we first updated an existing mapping study and then performed backward snowballing and a manual search to find publications by the researchers and research groups that conducted these studies. We also searched an electronic database specialized in NLP.
Results: the study identified a set of 159 studies proposing Open IE approaches. Data analysis showed a shift from supervised techniques to neural techniques. The study also showed that the most commonly used datasets are journalistic news corpora. Moreover, the preferred metrics for evaluating approaches are precision and recall.
Conclusion: many Open IE approaches have been published, and community interest in this topic is growing. Advances in AI and neural networks have enabled Open IE to extract relevant information from texts, which can later be used in other areas.
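The Results above name precision and recall as the preferred metrics for evaluating Open IE approaches. The following minimal sketch shows how these metrics can be computed over extracted relation triples. It assumes exact matching after lowercasing and whitespace normalization. The names `Triple`, `normalize`, and `precision_recall` are hypothetical and are not taken from any of the surveyed tools. Many Open IE benchmarks use more lenient, partial-overlap matching criteria, so treat this as an illustration of the metrics rather than a standard evaluation protocol.

```python
from typing import Iterable, NamedTuple, Tuple


class Triple(NamedTuple):
    """A relation triple (arg1, relation, arg2) extracted from a sentence."""
    arg1: str
    rel: str
    arg2: str


def normalize(t: Triple) -> Triple:
    # Hypothetical normalization: lowercase each element and collapse
    # whitespace so trivial formatting differences are not counted as errors.
    return Triple(*(" ".join(part.lower().split()) for part in t))


def precision_recall(predicted: Iterable[Triple],
                     gold: Iterable[Triple]) -> Tuple[float, float]:
    """Exact-match precision and recall of predicted triples against a gold set."""
    pred = {normalize(t) for t in predicted}
    ref = {normalize(t) for t in gold}
    correct = len(pred & ref)  # triples found in both sets
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(ref) if ref else 0.0
    return precision, recall


if __name__ == "__main__":
    gold = [Triple("Barack Obama", "was born in", "Hawaii"),
            Triple("Barack Obama", "is", "a politician")]
    predicted = [Triple("barack obama", "was born in", "Hawaii"),
                 Triple("Obama", "visited", "Paris")]
    print(precision_recall(predicted, gold))  # prints (0.5, 0.5)
```

In this example, one of the two predicted triples matches a gold triple after normalization. Both precision (1/2) and recall (1/2) are therefore 0.5.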

Presentation

Replication package

The Systematic Literature Review process needs to be transparent and to facilitate access to data, thereby enhancing reproducibility and auditability. Therefore, the spreadsheets for primary study selection, data extraction, and data analysis are available at the following link:

To check out our repository on GitHub, access:

Contents available

Resource Description Link
RQ1 When and where have the studies been published? Link
RQ2 Which pipelines (interactions of resources and tools) are used to extract relations from sentences? Link
RQ3 Which Open IE tools have been implemented? Link
RQ4 Which corpora were used to evaluate the relation extraction approaches? Link
RQ5 How was the quality of Open IE results evaluated? Link
RQ6 Which automatic evaluation techniques do the studies propose? Link
Publication Venues A complete list of the venues (e.g., conferences, journals) where studies about Open IE were published Link