Name: Leonardo Santos Paulucio
Type: MSc dissertation
Publication date: 14/02/2022
Advisor:

Name Role
Thiago Oliveira dos Santos Advisor *

Examining board:

Name Role
Flávio Miguel Varejão Internal Examiner *
Patrick Marques Ciarelli External Examiner *
Thiago Oliveira dos Santos Advisor *

Summary: Natural Language Processing (NLP) has received increasing attention in the past
few years. In part, this is due to the huge flow of data made available every day
on the internet, which has increased the need for automatic tools capable of analyzing and
extracting relevant information, especially from text. In this context, text classification
has become one of the most studied tasks in the NLP domain. Its objective is to assign
predefined categories or labels to texts or sentences. Important applications include sentence
classification, sentiment analysis, and spam detection, among many others. This work proposes
an automatic system for categorizing products using only their titles. The proposed system
employs a state-of-the-art deep neural network to extract features from the titles,
which are then used as input to different machine learning models. The system is evaluated on the
large-scale Mercado Libre dataset, which has characteristics common to real-world
problems, such as imbalanced classes and unreliable labels, as well as a large number of
samples: 20,000,000 in total. The results show that the proposed system correctly
categorizes products with a balanced accuracy of 86.57% on the local test
split of the Mercado Libre dataset. It also surpassed fourth place in the public ranking of
the MeLi Data Challenge with a balanced accuracy of 91.19%, a difference of less than
1% from the winner.
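
The summary reports results as balanced accuracy, a metric suited to imbalanced classes. The summary does not give the exact formula, so the sketch below assumes the common definition: the mean of per-class recall. The function name and the toy labels are hypothetical and are not taken from the dissertation or the dataset.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true.

    Assumption: this is the common definition of balanced accuracy;
    the summary itself does not spell out the formula.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    total = defaultdict(int)    # number of samples per true class
    correct = defaultdict(int)  # correctly predicted samples per true class
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1

    # Average the recall of each class so every class weighs the same,
    # regardless of how many samples it has.
    return sum(correct[c] / total[c] for c in total) / len(total)


# Made-up, imbalanced toy labels (8 "phones", 2 "books").
y_true = ["phones"] * 8 + ["books"] * 2
y_pred = ["phones"] * 8 + ["phones", "books"]

plain = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(plain)                              # 9 of 10 samples are correct
print(balanced_accuracy(y_true, y_pred))  # mean of recalls 1.0 and 0.5
```

In this toy case plain accuracy is 0.9, while balanced accuracy is 0.75, because the error on the small class counts as much as the large class's perfect recall.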

© 2013 Universidade Federal do Espírito Santo. All rights reserved.
Av. Fernando Ferrari, 514 - Goiabeiras, Vitória - ES | CEP 29075-910