Thesis details | Informática

Name: JACSON RODRIGUES CORREIA DA SILVA
Type: PhD thesis
Publication date: 25/04/2023
Advisor:

Name	Role
THIAGO OLIVEIRA DOS SANTOS	Advisor *

Examining board:

Name	Role
ALBERTO FERREIRA DE SOUZA	Co-advisor *
CLAUDINE SANTOS BADUE	Internal Examiner *
THOMAS WALTER RAUBER	Internal Examiner *

Summary: Convolutional Neural Networks (CNNs) have been achieving state-of-the-art performance on a variety of
problems in recent years, leading to many companies developing neural-based products that require expensive data
acquisition, annotation, and model generation. To protect their models from being copied or attacked, companies
often deliver them as black-boxes accessible only through APIs, which must be secure, robust, and reliable across
different problem domains. However, recent studies have shown that state-of-the-art CNNs have vulnerabilities:
simple perturbations in input images can change the model's response, and even images unrecognizable to
humans can achieve a high level of confidence in the model's output. These methods need access to the model's
parameters, but there are studies showing how to generate a copy (imitation) of a model using its probabilities
(soft-labels) and problem domain data. By using the surrogate model, an adversary can perform attacks on the target
model with a higher possibility of success. We further explored these vulnerabilities. Our hypothesis is that by using
publicly available images (accessible to everyone) and responses that any model should provide (even black-boxes), it
is possible to copy a model achieving high performance. Therefore, we proposed a method called Copycat to explore
CNN classification models. Our main goal is to copy the model in two stages: first, querying it with random natural
images, such as those from ImageNet, and annotating them with the classes of maximum probability (hard-labels); then, using these
labeled images to train a Copycat model that should achieve similar performance to the target model. We evaluated
this hypothesis on seven real-world problems and against a cloud-based API. All Copycat models achieved
performance (F1-Score) above 96.4% when compared to target models. After achieving these results, we performed
several experiments to consolidate and evaluate our method. Furthermore, concerned about such vulnerability, we
also analyzed various existing defenses against the Copycat method. In these experiments, defenses that detect
attack queries did not work against our method, but defenses that use watermarking could identify the target model's
intellectual property. Thus, the method proved effective in model extraction: it was immune to the defenses from the
literature, being identified only by watermark defenses.
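
The two stages described in the summary can be illustrated with a toy sketch. This is not the thesis implementation: the CNN target is replaced here by a small hidden linear classifier, the natural images by random vectors, and the Copycat network by a multiclass perceptron. All names (query_target, train_copycat, and so on) are hypothetical and chosen only for illustration.

```python
import random

random.seed(0)
DIM, N_CLASSES = 5, 3

# Hypothetical black-box target: its weights stay hidden behind query_target().
_W_TARGET = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(N_CLASSES)]


def scores(weights, x):
    """Linear score of input x for each class."""
    return [sum(w_i * x_i for w_i, x_i in zip(row, x)) for row in weights]


def argmax(values):
    return max(range(len(values)), key=values.__getitem__)


def query_target(x):
    """Stand-in for the API: returns only the predicted class (hard label)."""
    return argmax(scores(_W_TARGET, x))


def random_input():
    """Stand-in for a random natural image unrelated to the problem domain."""
    return [random.gauss(0, 1) for _ in range(DIM)]


def train_copycat(samples, labels, epochs=20):
    """Train a multiclass perceptron on the (input, stolen label) pairs."""
    w = [[0.0] * DIM for _ in range(N_CLASSES)]
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            pred = argmax(scores(w, x))
            if pred != y:
                # Move the true class toward x and the wrong class away from it.
                for i in range(DIM):
                    w[y][i] += x[i]
                    w[pred][i] -= x[i]
    return w


# Stage 1: query the black box and record its hard labels.
queries = [random_input() for _ in range(2000)]
stolen_labels = [query_target(x) for x in queries]

# Stage 2: train the copy on the labeled queries.
copy_w = train_copycat(queries, stolen_labels)

# Measure how often the copy agrees with the target on unseen inputs.
held_out = [random_input() for _ in range(500)]
agreement = sum(
    argmax(scores(copy_w, x)) == query_target(x) for x in held_out
) / len(held_out)
print(f"agreement with target: {agreement:.3f}")
```

The copy never reads the target's weights or probabilities; it only sees the hard labels returned for its own queries, which is the access assumption stated in the summary.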
