Name: JOÃO FELIPE GOBETI CALENZANI
Publication date: 26/08/2025
Examining board:
| Name | Role |
|---|---|
| ALBERTO FERREIRA DE SOUZA | Chair |
| CLAUDINE SANTOS BADUE | Co-advisor |
| LUIS ANTONIO DE SOUZA JUNIOR | Internal Examiner |
| MARIELLA BERGER ANDRADE | External Examiner |
Summary: This work presents a comprehensive study on the effectiveness of GPT-4V, a multimodal
large language model with vision processing capabilities, in classifying driver behaviors
from video data. The research focuses on scenarios where only a limited number of frames
from each video are analyzed, exploring the feasibility of few-shot video classification
for driver monitoring applications. The work targets critical risk behaviors, including
yawning, smoking, mobile phone usage, distraction, and cases where the driver’s face
is not visible. To conduct the evaluation, a private dataset of annotated driver videos,
recorded in real-world conditions, was used alongside the public Driver Monitoring Dataset
(DMD). In the private dataset, GPT-4V achieved high classification accuracy, with 98.9%
for yawning, 98.4% for smoking, 95.7% for phone usage, 91.7% for distraction, and 94.1%
for “face not visible” events. For the public dataset, results included 90.9% accuracy
for “using cellphone” (recall: 76.6%, precision: 92.1%), 91.0% for “distraction” (recall:
93.1%, precision: 97.4%), and 98.2% for “yawning” (recall: 43.7%, precision: 87.5%). The
results demonstrate GPT-4V’s potential as an additional classification layer for Advanced
Driver-Assistance Systems (ADAS), capable of filtering false positives and enhancing event
detection in resource-constrained environments. Beyond benchmarking performance, this
thesis also documents the prompt engineering strategies developed to adapt GPT-4V for
structured, domain-specific classification tasks. The findings contribute to the growing
body of knowledge on the application of multimodal foundation models to road safety and
provide a basis for future work on integrating such models into real-time driver monitoring
systems.
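
The public-dataset results pair high accuracy with much lower recall for yawning. The minimal sketch below, assuming the standard binary definitions of accuracy, precision, and recall, shows how that combination can arise when the positive class is rare. The function name `binary_metrics` and all counts are hypothetical illustrations; they are not taken from the thesis.

```python
def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute accuracy, precision, and recall from confusion-matrix counts."""
    total = tp + fp + fn + tn
    # Accuracy counts every correct decision, including true negatives.
    accuracy = (tp + tn) / total if total else 0.0
    # Precision: of the clips flagged as the event, how many really were.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of the clips that really contain the event, how many were flagged.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall}


# Hypothetical counts for a rare event: 10 positive clips out of 200.
m = binary_metrics(tp=4, fp=1, fn=6, tn=189)
print({name: round(value, 3) for name, value in m.items()})
# {'accuracy': 0.965, 'precision': 0.8, 'recall': 0.4}
```

In this illustration, the many true negatives keep accuracy high even though most positive clips are missed. For rare behaviors, recall and precision are therefore more informative than accuracy alone.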
