Thesis details | Informática

Publication date: 12/06/2024

Examining board:

Name	Role
DIEGO ROBERTO COLOMBO DIAS	Internal Examiner
GIOVANNI VENTORIM COMARELA	Chair
LEANDRO SANTIAGO DE ARAÚJO	External Examiner

Summary: Learning from label proportions (LLP) is a weakly supervised problem in which the data is divided into subsets called bags, and only the label proportions of each bag are known. LLP has many interesting applications in which individual information is available only as group statistics, for instance, election analysis. However, LLP is challenging in most of its variants, and most machine learning models struggle to achieve good classification performance. One of the main ideas for improving the performance of machine learning models in LLP is to reveal the labels of a few examples in the dataset, on the belief that even a few such anchors can help. From this belief came the proposal to combine LLP with active learning, a machine learning subfield concerned with obtaining the best performance while labeling as few examples as possible. In this work, we investigate the use of active learning in the learning from label proportions problem, focusing on low-budget regimes, i.e., problems in which the budget of labels to reveal is limited. We evaluate the models across a variety of experiments and variants, exploring the intricacies and tradeoffs of combining the two problems. Additionally, we extend the existing frameworks with uncertainty sampling and propose a novel score that combines uncertainty sampling with deep learning from label proportions.
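
To make the setting concrete, here is a minimal Python sketch, assuming a classifier that outputs per-instance class probabilities. It shows two standard building blocks, not the novel score proposed in this work: a DLLP-style bag loss (cross-entropy between a bag's known label proportions and the mean predicted class distribution of its instances) and entropy-based uncertainty sampling, which picks which instances to reveal under a fixed label budget. The function names are hypothetical and chosen for illustration only.

```python
import math
from typing import List, Sequence


def bag_proportion_loss(probs: Sequence[Sequence[float]],
                        proportion: Sequence[float],
                        eps: float = 1e-12) -> float:
    """DLLP-style bag loss (illustrative sketch).

    Averages the predicted class distributions of all instances in the bag
    and returns the cross-entropy between the known label proportion and
    that average. Only bag-level information is used; no instance labels.
    """
    n = len(probs)
    k = len(proportion)
    mean_pred = [sum(p[c] for p in probs) / n for c in range(k)]
    return -sum(proportion[c] * math.log(mean_pred[c] + eps) for c in range(k))


def entropy(p: Sequence[float], eps: float = 1e-12) -> float:
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(pi * math.log(pi + eps) for pi in p)


def select_by_uncertainty(probs: Sequence[Sequence[float]], budget: int) -> List[int]:
    """Uncertainty sampling: indices of the `budget` most uncertain instances.

    Instances are ranked by predictive entropy (highest first); these are
    the candidates whose labels would be revealed by an oracle.
    """
    order = sorted(range(len(probs)), key=lambda i: entropy(probs[i]), reverse=True)
    return order[:budget]


if __name__ == "__main__":
    # One bag of three instances, binary classification.
    bag = [[0.9, 0.1], [0.5, 0.5], [0.2, 0.8]]
    known_proportion = [2 / 3, 1 / 3]  # only this is observed in LLP
    print("bag loss:", bag_proportion_loss(bag, known_proportion))
    print("reveal:", select_by_uncertainty(bag, budget=1))  # [1]
```

With a budget of 1, the instance at index 1 is selected because its prediction `[0.5, 0.5]` has the highest entropy. In a low-budget regime, the revealed labels would then typically be added as an instance-level term alongside the bag loss; how the two terms are weighted is a design choice not specified here.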
