Context: for thousands of years, humans have expressed their knowledge in natural language and recorded it so that others can access it. Natural Language Processing (NLP) is a subarea of Artificial Intelligence (AI) that studies linguistic phenomena and uses computational methods to process texts written in natural language. More specific areas, such as Open Information Extraction (Open IE), were created to extract information, such as relationship triples, from textual databases without prior information about their context or structure. Recently, research was conducted that grouped studies related to Open IE initiatives. However, some information about this domain remains to be explored.
Objective: this work aims to identify in the literature the main characteristics of Open IE approaches.
Method: to achieve the proposed objective, we first updated a mapping study and then performed backward snowballing and a manual search to find publications by the researchers and research groups that conducted these studies. In addition, we considered an electronic database specialized in NLP.
Results: the study resulted in a set of 159 studies proposing Open IE approaches. Data analysis showed a migration from the use of supervised techniques to neural techniques. The study also showed that the most commonly used data sets consist of journalistic news. Moreover, the preferred metrics for evaluating approaches are precision and recall.
Conclusion: many Open IE approaches have been published, and community interest in this topic is growing. Advances in AI and neural networks have allowed this technique to be used to extract relevant information from texts, which can later be used by other areas.
The systematic literature review process needs to be transparent and to facilitate access to data, thereby enhancing reproducibility and auditability. Therefore, the spreadsheets for primary study selection, data extraction, and data analysis are also available at the link:
To check out our repository on GitHub, access:
Resource | Description | Link |
---|---|---|
RQ1 | When and where have the studies been published? | Link |
RQ2 | Which pipelines (interaction of resources and tools) are used to extract relationships in sentences? | Link |
RQ3 | Which Open IE tools have been implemented? | Link |
RQ4 | Which corpora were used to evaluate the relation extraction approaches? | Link |
RQ5 | How was the quality of Open IE results evaluated? | Link |
RQ6 | Which automatic evaluation techniques does the study propose? | Link |
Publication Venues | Access a complete list of venues (e.g., conferences, journals) used to publish studies about Open IE | Link |
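
For readers less familiar with the precision and recall metrics mentioned in the Results and in RQ5, the following is a minimal sketch of how they could be computed over extracted relationship triples. It is not taken from any of the mapped studies: the function name `precision_recall`, the example triples, and the exact-match criterion are all assumptions made for illustration, and published Open IE benchmarks often use softer matching rules.

```python
from typing import Iterable, Tuple

# A relationship triple: (argument 1, relation, argument 2).
Triple = Tuple[str, str, str]


def precision_recall(predicted: Iterable[Triple],
                     gold: Iterable[Triple]) -> Tuple[float, float]:
    """Compute precision and recall under exact triple matching.

    Exact matching is an illustrative assumption; it counts a predicted
    triple as correct only if it is identical to a gold triple.
    """
    pred_set = set(predicted)
    gold_set = set(gold)
    true_positives = len(pred_set & gold_set)
    # precision = TP / (TP + FP); recall = TP / (TP + FN)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    return precision, recall


# Hypothetical example data, not drawn from any corpus in this study.
gold = [
    ("Einstein", "was born in", "Ulm"),
    ("Einstein", "developed", "relativity"),
]
predicted = [
    ("Einstein", "was born in", "Ulm"),
    ("Einstein", "lived in", "Ulm"),
]

p, r = precision_recall(predicted, gold)
print(f"precision={p:.2f} recall={r:.2f}")
```

In this sketch, one of the two predicted triples matches the gold set, and one of the two gold triples is recovered, so both metrics are 0.5.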