Name: VICTOR NASCIMENTO NEVES
Publication date: 02/09/2025
Examining board:
| Name | Role |
|---|---|
| ALBERTO FERREIRA DE SOUZA | Chair |
| ANDRE GEORGHTON CARDOSO PACHECO | Internal Examiner |
| CLAUDINE SANTOS BADUE | Co-advisor |
| FRANCISCO DE ASSIS BOLDT | External Examiner |
Summary: This study presents a computational framework for integrating transformer-based Large
Language Models (LLMs) into clinical nephrology to assist in transcription, diagnostic
reasoning, and documentation. Audio recordings of 34 chronic kidney disease (CKD)
consultations were preprocessed, transcribed with Whisper, and refined using GPT-4o to
ensure accurate speaker attribution, medical terminology, and contextual coherence. The
refined transcripts were combined with structured patient data, including laboratory and
imaging results, for GPT-4o to generate diagnostic hypotheses, clinical impressions, and
suggested courses of action.
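
As a rough illustration of how a refined transcript and structured patient data might be combined into a single model input, the sketch below assembles a prompt string. The function name `build_prompt`, the prompt wording, and the example field name are assumptions made for illustration; the summary does not describe the actual prompt used in the study.

```python
import json


def build_prompt(transcript: str, patient_data: dict) -> str:
    """Combine a refined transcript with structured patient data (hypothetical layout)."""
    # Serialize lab and imaging results deterministically so the prompt is reproducible.
    data_block = json.dumps(patient_data, indent=2, ensure_ascii=False, sort_keys=True)
    return (
        "You are assisting a nephrologist.\n\n"
        "Consultation transcript:\n" + transcript.strip() + "\n\n"
        "Structured patient data (JSON):\n" + data_block + "\n\n"
        "Provide: (1) diagnostic hypotheses, (2) clinical impression, "
        "(3) suggested course of action."
    )


# Example with made-up values; the resulting string would be sent to the model.
prompt = build_prompt("Doctor: How are you feeling?", {"creatinine_mg_dl": 2.1})
```

Keeping the structured data in a separate, clearly delimited block is one way to let the model fall back on laboratory values when the transcript is incomplete.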
Transcription performance, evaluated with ROUGE-L F1 and cosine similarity, achieved
mean F1 scores of 0.42 for patient phrases, 0.54 for medical team phrases, and 0.60 overall,
with corresponding cosine similarities of 0.76, 0.89, and 0.91. The ingestion of structured
patient data proved essential for improving the robustness of downstream reasoning,
particularly in cases where noisy recordings reduced recall.
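
For readers unfamiliar with the two metrics, the following is a minimal sketch of how they can be computed. ROUGE-L F1 is derived from the longest common subsequence (LCS) of tokens between a reference and a candidate transcript. The cosine similarity here uses simple word-count vectors as a stand-in; the summary does not state which text representation the study used, so that choice, the whitespace tokenization, and all function names are assumptions.

```python
import math
from collections import Counter


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference: str, candidate: str) -> float:
    """ROUGE-L F1: harmonic mean of LCS-based precision and recall."""
    ref, cand = reference.lower().split(), candidate.lower().split()
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity over word-count vectors (assumed representation)."""
    a, b = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0
```

Because ROUGE-L rewards matching word order while cosine similarity does not, a transcript can score modestly on the former and still score high on the latter, which is consistent with the pattern of values reported above.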
Clinical output evaluation, performed by OpenBioLLM-70B and DeepSeek v3, classified
most results as either aligned with physician assessments or valid alternative interpretations,
with some instances where one evaluator preferred the physician’s output and the other
preferred GPT-4o’s. DeepSeek v3, a higher-capacity generalist model, demonstrated more
consistent and stringent evaluations than the smaller, domain-trained OpenBioLLM-70B,
suggesting that model size and reasoning ability can outweigh domain-specific training for
evaluation purposes.
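
One simple way to compare two LLM evaluators is to tally their paired verdicts per case and compute how often they agree. The sketch below assumes each evaluator emits one categorical label per consultation; the label strings and the function name `tally_verdicts` are hypothetical and not taken from the study.

```python
from collections import Counter


def tally_verdicts(verdicts_a, verdicts_b):
    """Count paired verdicts from two evaluators and return (pair counts, agreement rate)."""
    if len(verdicts_a) != len(verdicts_b):
        raise ValueError("both evaluators must judge the same cases")
    pairs = Counter(zip(verdicts_a, verdicts_b))
    agree = sum(n for (a, b), n in pairs.items() if a == b)
    total = len(verdicts_a)
    return pairs, (agree / total if total else 0.0)


# Made-up labels for three cases, for illustration only.
pairs, rate = tally_verdicts(
    ["aligned", "alternative", "physician"],
    ["aligned", "alternative", "gpt4o"],
)
```

The off-diagonal entries of `pairs` correspond to the split cases described above, where one evaluator preferred the physician's output and the other preferred the model's.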
The findings demonstrate that LLM-based pipelines can reduce clinicians’ administrative
workload while maintaining clinically relevant output. The proposed approach shows
potential for deployment in real-world workflows, provided that human oversight and
domain-expert validation remain integral to the process.
