

Title

Prompt engineering and diagnostic accuracy of multimodal large language models in thyroid fine‑needle aspiration cytology

 

Authors

Bibhas Saha Dalal¹,*, Kaushik Mukhopadhyay², Dwaipayan Roy³, Souvik Bhattacharya¹, Indranil Chakrabarti¹ & Santosh Kumar Mondal¹

 

Affiliation

¹Department of Pathology, All India Institute of Medical Sciences (AIIMS), Kalyani, West Bengal, India; ²Department of Pharmacology, All India Institute of Medical Sciences (AIIMS), Kalyani, West Bengal, India; ³Department of Computational and Data Sciences, Indian Institute of Science Education and Research (IISER), Kolkata, West Bengal, India; *Corresponding author

 

Email

Bibhas Saha Dalal - E-mail: bibhas.patho@aiimskalyani.edu.in

Kaushik Mukhopadhyay - E-mail: kaushik.pharm@aiimskalyani.edu.in

Dwaipayan Roy - E-mail: dwaipayan.roy@iiserkol.ac.in

Souvik Bhattacharya - E-mail: souvik.patho_pgt23@aiimskalyani.edu.in

Indranil Chakrabarti - E-mail: indranil.patho@aiimskalyani.edu.in

Santosh Kumar Mondal - E-mail: santosh.path@aiimskalyani.edu.in

 

Article Type

Research Article

 

Date

Received June 30, 2025; Revised June 30, 2025; Accepted June 30, 2025; Published June 30, 2025

 

Abstract

The role of large language models (LLMs) in fine-needle aspiration cytology (FNAC) image analysis remains uncertain. We evaluated two LLMs, ChatGPT-4o (OpenAI) and Claude 3.5 Sonnet (Anthropic), on 63 thyroid FNAC cases, each represented by eight microscopic images (Papanicolaou [Pap] and May-Grünwald-Giemsa [MGG] stains, 10× and 40×), using generic and structured prompts. Structured prompts improved Bethesda concordance and near-match rates, but inter-rater agreement remained poor (κ ≤ 0.09). Specificity reached 100% with structured prompts, but sensitivity dropped to ≤11.8% and misclassification persisted. LLMs show potential, but domain-specific training and validation are necessary before clinical use.
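
As a reading aid, the metrics quoted in the abstract (sensitivity, specificity and Cohen's κ) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' analysis code: the function names are hypothetical, κ is assumed to be the unweighted form, and every count and label is invented for the example and is not study data.

```python
from collections import Counter


def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from binary confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # share of truly positive cases that were flagged
    specificity = tn / (tn + fp)  # share of truly negative cases that were cleared
    return sensitivity, specificity


def cohen_kappa(labels_a, labels_b):
    """Unweighted Cohen's kappa between two raters' categorical labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of cases where both raters gave the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: computed from each rater's marginal label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical counts (NOT study data): no false positives, many false negatives.
sens, spec = sensitivity_specificity(tp=1, fn=9, tn=20, fp=0)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")

# Hypothetical Bethesda categories (NOT study data): the second rater calls
# every case "II", so raw agreement is 4/6 but kappa is no better than chance.
reference = ["II", "II", "II", "IV", "VI", "II"]
model = ["II", "II", "II", "II", "II", "II"]
print(f"kappa={cohen_kappa(reference, model):.2f}")
```

The second example shows why raw agreement and κ can diverge: a rater that assigns almost every case to the most common category matches the reference often, yet κ stays near zero because that much agreement is expected by chance. The same pattern yields high specificity alongside low sensitivity.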

 

Keywords

Fine-needle aspiration cytology, large language models, thyroid nodule, artificial intelligence, prompt engineering, diagnostic accuracy

 

Citation

Dalal et al. Bioinformation 21(6): 1317-1323 (2025)

 

Edited by

P Kangueane

 

ISSN

0973-2063

 

Publisher

Biomedical Informatics

 

License

This is an Open Access article which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. This is distributed under the terms of the Creative Commons Attribution License.