Novel semi-supervised algorithms based on extreme learning machine for unbalanced data streams with concept drift

Name: Carlos Alexandre Siqueira da Silva
Type: PhD thesis
Publication date: 06/08/2020

Name Role
Renato Antônio Krohling Advisor *

Examining board:

Name Role
Antônio de Pádua Carobrez External Examiner *
Celso Alberto Saibel Santos Internal Examiner *
Daniel Cruz Cavaliéri External Examiner *
Renato Antônio Krohling Advisor *
Vinicius Fernandes Soares Mota Internal Examiner *

Summary: Data streams are important sources of information nowadays. With the popularization of mobile devices and sensor systems that collect all kinds of data, more and more information is generated at an ever-increasing speed. This growth in data supply poses problems for traditional machine learning algorithms. Tasks such as classification, regression, and clustering present limitations when applied to very large datasets, data streams, or data that varies over time. In general, algorithms that work in one of these situations may not work in the others. In addition, data streams pose further challenges to machine learning algorithms. The high cost of labeling instances for training makes it difficult to use fully supervised classification algorithms. Unbalanced datasets tend to cause algorithms to ignore one or more classes. Moreover, concept drift in data streams requires algorithms to be retrained from time to time. To mitigate these problems, this thesis proposes semi-supervised online algorithms based on the Extreme Learning Machine (ELM). The first proposed algorithm, Semi-Supervised Online Elastic ELM (SSOE-ELM), outperforms others in the literature in accuracy and training time, and shows good results on unbalanced datasets. SSOE-ELM uses labeled and unlabeled samples for training and receives data sequentially in chunks of one or more instances, continuously updating the network. As an ELM-based algorithm, its training is in general very fast compared to gradient-descent-based algorithms. The second proposed algorithm, Semi-Supervised Online Elastic ELM with Forgetting Parameter (SSOE-FP-ELM), is an extension of SSOE-ELM for data streams with concept drift. SSOE-FP-ELM uses a hybrid forgetting parameter that considers labeled and unlabeled instances to detect cases of gradual and abrupt concept drift. Experimental results show that the two proposed algorithms outperform others in the literature in accuracy and generalization ability, making them suitable alternatives for data stream classification.
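To make the chunk-by-chunk update concrete, the following is a minimal sketch of a generic online sequential ELM with a forgetting factor, written with NumPy. It is not the SSOE-ELM or SSOE-FP-ELM algorithm from the thesis: it is fully supervised, uses plain ridge regularization, and applies a fixed forgetting factor rather than the hybrid forgetting parameter described above. The class name `OnlineELM`, its parameters, and the toy data are all assumptions made for illustration.

```python
import numpy as np


class OnlineELM:
    """Generic online sequential ELM (illustrative; not the thesis's method)."""

    def __init__(self, n_inputs, n_hidden, n_outputs,
                 reg=1.0, forgetting=1.0, seed=0):
        rng = np.random.default_rng(seed)
        # Hidden-layer weights are drawn at random and never trained.
        self.W = rng.normal(size=(n_inputs, n_hidden))
        self.b = rng.normal(size=n_hidden)
        # P tracks the inverse of the regularized matrix (H^T H + reg * I).
        self.P = np.eye(n_hidden) / reg
        # Output weights, the only trained parameters.
        self.beta = np.zeros((n_hidden, n_outputs))
        # forgetting < 1 down-weights older chunks; 1.0 keeps all history.
        self.forgetting = forgetting

    def _hidden(self, X):
        # Sigmoid activations of the random hidden layer.
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def partial_fit(self, X, T):
        """Update the output weights with one chunk (recursive least squares)."""
        H = self._hidden(X)
        lam = self.forgetting
        S = lam * np.eye(len(H)) + H @ self.P @ H.T
        K = self.P @ H.T @ np.linalg.inv(S)          # gain matrix
        self.beta = self.beta + K @ (T - H @ self.beta)
        self.P = (self.P - K @ H @ self.P) / lam
        return self

    def predict(self, X):
        return np.argmax(self._hidden(X) @ self.beta, axis=1)


# Toy stream (assumed data): two Gaussian blobs, fed in chunks of 20 instances.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2.0, 1.0, size=(100, 2)),
               rng.normal(2.0, 1.0, size=(100, 2))])
y = np.repeat([0, 1], 100)
order = rng.permutation(len(X))
X, y = X[order], y[order]
T = np.eye(2)[y]                                      # one-hot targets

model = OnlineELM(n_inputs=2, n_hidden=20, n_outputs=2)
for start in range(0, len(X), 20):
    model.partial_fit(X[start:start + 20], T[start:start + 20])

print("training accuracy:", np.mean(model.predict(X) == y))
```

With `forgetting=1.0` the recursive update reproduces the batch ridge-regression solution over all chunks seen so far, which is why no gradient descent is needed. Setting `forgetting` slightly below 1 makes older chunks count less, which is one common way to follow a drifting concept.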

Keywords: Machine learning; Semi-supervised learning; Extreme learning machine (ELM); Data streams; Concept drift; Unbalanced datasets.

