Concept Maps Mining for Text Summarization

Name: Camila Zacché de Aguiar
Type: MSc dissertation
Publication date: 31/03/2017
Advisor:

Name | Role
Davidson Cury | Advisor

Examining board:

Name | Role
Aline Villavicencio | External Examiner
Credine Silva de Menezes | External Examiner
Davidson Cury | Advisor
Elias Silva de Oliveira | Internal Examiner

Summary: "Concept maps are graphical tools for representing and constructing knowledge. Concepts and relationships form the basis of learning, and concept maps have therefore been used extensively in education, in different situations and for different purposes, one of which is the representation of written text. Even a complex, grammatically difficult text can be represented by a concept map containing only the concepts and relationships that express what the text states in a more complicated way.
However, building a concept map manually requires considerable time and effort to identify and structure knowledge, especially when the map should represent not the concepts of the author's cognitive structure but the concepts expressed in a text. Several technological approaches have therefore been proposed to facilitate the construction of concept maps from texts.
This dissertation proposes a new approach to automatically building concept maps as summaries of scientific texts. The summarization aims to produce a concept map that represents the text in condensed form while preserving its most important characteristics.
Summarization can ease the understanding of texts for students who are trying to cope with the cognitive overload caused by the growing amount of available textual information, a growth that can also hinder the construction of knowledge. We therefore hypothesized that summarizing a text as a concept map may help readers assimilate its knowledge, while reducing its complexity and the time needed to process it.
In this context, we reviewed the literature published between 1994 and 2016 on approaches to the semi-automatic or automatic construction of concept maps from texts. From this review, we built a categorization to better identify and analyze the features of these technological approaches. We also sought to identify the limitations of the related work and to gather its best features to inform our approach.
In addition, we present a process for Concept Map Mining organized around four issues of interest: Data Source Description, Domain Definition, Elements Identification, and Map Visualization.
To develop a computational architecture that automatically builds concept maps as summaries of academic texts, this research produced CMBuilder, a public online tool for the automatic construction of concept maps from texts, and ExtroutNLP, a public Java API containing libraries for information extraction and public services.
We used methods from natural language processing and information retrieval to reach this objective. The main task is to extract propositions of the form (concept, relation, concept) from the text. On that basis, the research introduces a pipeline comprising: grammar rules and depth-first search to extract concepts and the relations between them; preposition mapping, anaphora resolution, and named-entity exploitation for concept labeling; concept ranking based on frequency and map topology; and summarization of propositions based on graph topology. The approach also proposes using machine learning techniques (clustering and classification), combined with a thesaurus, to define the domain of the text and build a conceptual vocabulary for that domain.
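The abstract does not detail how ranking and summarization are computed. A minimal sketch in Python, assuming propositions have already been extracted as (concept, relation, concept) triples, scores each concept by a weighted mix of its text frequency and its degree in the map graph, then keeps only the propositions that link two top-ranked concepts. The function names, the `mentions` counts, the `alpha` weight, and the top-`k` rule are hypothetical illustrations, not CMBuilder's actual method.

```python
from collections import Counter

# Hypothetical sketch: a proposition is a (concept, relation, concept) triple.
Proposition = tuple[str, str, str]


def rank_concepts(props: list[Proposition], mentions: Counter,
                  alpha: float = 0.5) -> dict[str, float]:
    """Score concepts by mixing text frequency with degree in the map graph.

    mentions: how often each concept occurs in the source text (assumed given).
    alpha: weight of frequency versus topology (hypothetical parameter).
    """
    degree: Counter = Counter()
    for source, _relation, target in props:
        degree[source] += 1
        degree[target] += 1
    # Normalize both signals to [0, 1] so alpha acts as a meaningful weight.
    max_freq = max(mentions.values(), default=0) or 1
    max_deg = max(degree.values(), default=0) or 1
    return {c: alpha * mentions[c] / max_freq + (1 - alpha) * degree[c] / max_deg
            for c in degree}


def summarize(props: list[Proposition], scores: dict[str, float],
              k: int) -> list[Proposition]:
    """Keep propositions whose two concepts are both among the k best-scored concepts."""
    # Ties are broken alphabetically so the result is deterministic.
    top = set(sorted(scores, key=lambda c: (-scores[c], c))[:k])
    return [p for p in props if p[0] in top and p[2] in top]


if __name__ == "__main__":
    props = [
        ("concept map", "represents", "knowledge"),
        ("concept map", "contains", "concept"),
        ("concept map", "contains", "relation"),
        ("concept", "is linked by", "relation"),
        ("knowledge", "is built from", "concept"),
        ("student", "reads", "text"),
    ]
    mentions = Counter({"concept map": 6, "concept": 5, "relation": 3,
                        "knowledge": 3, "student": 1, "text": 2})
    scores = rank_concepts(props, mentions)
    print(summarize(props, scores, k=2))
```

Raising `alpha` favors frequently mentioned concepts; lowering it favors hub concepts that connect many propositions, which is one plausible reading of "ranking based on frequency and map topology".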
Finally, a qualitative analysis validating the quality of the concept maps built by CMBuilder shows 50% precision and 30% recall for English."
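The evaluation protocol is not described here. A minimal sketch of how precision and recall are commonly computed for extracted propositions, assuming an exact-match comparison against a hypothetical human-built reference map: precision is the number of correct propositions divided by the number extracted, and recall is the number of correct propositions divided by the number in the reference. The function name and the exact-match rule are assumptions for illustration.

```python
# Hypothetical sketch: compare extracted propositions with a reference map
# using exact matching of (concept, relation, concept) triples.
Proposition = tuple[str, str, str]


def precision_recall(extracted: list[Proposition],
                     reference: list[Proposition]) -> tuple[float, float]:
    """Return (precision, recall) of extracted propositions against a reference."""
    extracted_set, reference_set = set(extracted), set(reference)
    hits = len(extracted_set & reference_set)  # correctly extracted propositions
    precision = hits / len(extracted_set) if extracted_set else 0.0
    recall = hits / len(reference_set) if reference_set else 0.0
    return precision, recall
```

For instance, if 4 propositions are extracted and 2 of them appear in a 5-proposition reference map, precision is 2/4 = 0.5 and recall is 2/5 = 0.4. Exact matching is strict; a real evaluation might accept paraphrased concept labels, which this sketch does not attempt.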

© 2013 Universidade Federal do Espírito Santo. All rights reserved.
Av. Fernando Ferrari, 514 - Goiabeiras, Vitória - ES | CEP 29075-910