🩺 Medical Multimodal Large Language Model

HealthGPT-Pro

A high-performance multimodal large language model for medical understanding and analysis, supporting medical text, 2D medical images, and 3D medical volumes.

For research use only. HealthGPT-Pro should not be used as a substitute for professional clinical judgment, diagnosis, or treatment.

✨ Features

HealthGPT-Pro preserves broad instruction-following ability while extending Qwen3-VL to diverse medical modalities and tasks.

Multimodal Input Support

Processes medical text, 2D medical images, and 3D volumetric data in a unified model interface.

Efficient Training

Uses a two-stage training recipe: modality alignment on 3M samples, followed by supervised fine-tuning (SFT) on 10M samples.

Instruction Following

Retains a substantial proportion of general data to maintain general instruction-following ability.

Comprehensive Tasks

Trained on diverse medical and general tasks for strong text-based and vision-language performance.

SoTA Performance

HealthGPT-Pro-8B reaches 61.3 average on text benchmarks and 69.0 average on multimodal benchmarks.
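As a sketch of what a unified multimodal request to a model like this might look like, the snippet below assembles one chat-style user turn mixing text, a 2D image, and a 3D volume. Everything here is an illustrative assumption in the style of Qwen-VL chat templates (the field names `type`/`image`/`text`, and representing a 3D volume as a slice stack under a `video` entry); it is not a confirmed HealthGPT-Pro API.

```python
# Hypothetical request structure for a unified multimodal interface.
# Field names and the 3D-volume-as-slice-stack convention are
# assumptions modeled on Qwen-VL-style chat templates.

def build_message(text, image_path=None, volume_path=None):
    """Assemble one user turn mixing text, an optional 2D medical image,
    and an optional 3D volume (passed as a slice stack)."""
    content = []
    if image_path is not None:
        # 2D medical image (e.g. X-ray, fundus photo, dermoscopy)
        content.append({"type": "image", "image": image_path})
    if volume_path is not None:
        # 3D volume (e.g. CT, MRI) as an ordered slice stack -- assumption
        content.append({"type": "video", "video": volume_path})
    content.append({"type": "text", "text": text})
    return [{"role": "user", "content": content}]

# Example: a text question about a single chest X-ray
msg = build_message("Describe the main finding.", image_path="chest_xray.png")
```

In this sketch, the same `build_message` call covers all three input modalities, which mirrors the "unified model interface" described above.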

๐Ÿ–ผ๏ธ Modality Coverage

Computed Tomography, Digital Photography, Fundus Photography, Infrared Reflectance Imaging, Magnetic Resonance Imaging, Optical Coherence Tomography, Dermoscopy, Endoscopy, Microscopy, X-ray Imaging, Ultrasound Imaging, Histopathology, Colposcopy, Text

📊 Performance

Bold marks the best result in each benchmark; underlined values mark the second-best result.

๐Ÿ“ Medical Text Benchmarks

Average: HealthGPT-Pro-8B 61.3
| Model | MMLU-Med | MMLU-Pro-Med | MMedBench | MedBullets | MedMCQA | MedQA | MedXpertQA-Text | PubMedQA | SuperGPQA-Medical | Avg. |
|---|---|---|---|---|---|---|---|---|---|---|
| Qwen3-VL-4B | 74.3 | 50.7 | 60.5 | 46.4 | 56.0 | 60.5 | 12.6 | 75.6 | 29.6 | 51.8 |
| Qwen3-VL-8B | 79.8 | 57.4 | 65.9 | 51.3 | 61.1 | 65.9 | 12.8 | 76.2 | 30.2 | 55.6 |
| Lingshu-7B | 75.8 | 53.5 | 64.5 | 57.8 | 56.6 | 64.4 | 16.9 | 76.8 | 29.9 | 55.1 |
| HealthGPT-14B | 80.2 | <u>63.4</u> | 63.2 | 39.8 | 63.4 | 66.2 | 11.3 | 68.0 | 25.7 | 53.5 |
| HuatuoGPT-V-34B | 74.7 | 51.8 | 60.7 | 42.7 | 54.7 | 58.8 | 11.4 | 54.7 | 26.5 | 48.4 |
| Hulu-Med-4B | 78.6 | 58.6 | 66.7 | 59.4 | 64.8 | <u>71.9</u> | 16.8 | 77.6 | 29.5 | 58.2 |
| Hulu-Med-7B | 79.5 | 60.6 | **72.8** | **61.5** | <u>67.6</u> | **73.5** | **19.6** | 77.4 | 31.1 | <u>60.4</u> |
| HealthGPT-Pro-4B | <u>80.4</u> | 58.4 | <u>71.6</u> | 58.0 | 64.4 | 71.5 | 16.2 | <u>78.4</u> | <u>31.4</u> | 58.9 |
| HealthGPT-Pro-8B | **83.1** | **64.1** | 71.4 | <u>60.6</u> | **68.5** | 71.3 | <u>18.3</u> | **79.2** | **35.4** | **61.3** |

๐Ÿ–ผ๏ธ Medical Multimodal Benchmarks

Average: HealthGPT-Pro-8B 69.0
| Model | MMMU-Med | VQA-RAD | SLAKE | PathVQA | MedXpertQA-Multimodal | MedFrameQA | OmniMedVQA-Mini | PMC-VQA | M3D-MCQ | CT-RATE-MCQ | AMOS-MM-MCQ | Avg. |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Qwen3-VL-4B | 44.3 | 59.9 | 77.0 | 53.0 | 13.4 | 40.6 | 74.7 | 53.0 | 57.2 | 58.8 | 49.2 | 52.8 |
| Qwen3-VL-8B | 46.5 | 63.4 | 80.2 | 58.3 | 18.7 | 46.4 | 73.0 | 55.6 | 59.5 | 61.6 | 51.2 | 55.9 |
| Lingshu-7B | 47.3 | 66.7 | 81.9 | 61.0 | <u>25.5</u> | 52.6 | **82.4** | 57.2 | 64.1 | 68.3 | 62.7 | 60.9 |
| HealthGPT-14B | 45.5 | 62.6 | 64.2 | 56.0 | 24.1 | 45.3 | 70.2 | 56.4 | 55.2 | 57.3 | 46.5 | 53.0 |
| HuatuoGPT-V-34B | 50.1 | 60.3 | 68.3 | 47.7 | 21.5 | 49.6 | 69.7 | 56.6 | 50.1 | 54.9 | 48.7 | 52.5 |
| Hulu-Med-4B | 45.8 | 72.6 | 81.7 | 59.7 | 24.6 | 54.2 | 75.1 | 53.1 | 76.0 | 70.1 | 69.1 | 62.0 |
| Hulu-Med-7B | 50.5 | <u>77.2</u> | **85.8** | 64.2 | **28.3** | 57.4 | 77.7 | 57.3 | 80.4 | 76.2 | 70.5 | 66.0 |
| HealthGPT-Pro-4B | <u>52.0</u> | 76.6 | 83.9 | <u>66.7</u> | 20.8 | <u>61.4</u> | 78.2 | <u>60.0</u> | <u>81.0</u> | **86.2** | <u>71.1</u> | <u>67.1</u> |
| HealthGPT-Pro-8B | **54.7** | **78.4** | <u>85.0</u> | **70.7** | 25.3 | **63.6** | <u>80.2</u> | **61.1** | **81.6** | <u>86.0</u> | **72.2** | **69.0** |
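As a quick consistency check on the two tables above, the reported HealthGPT-Pro-8B averages are the unweighted means of the per-benchmark scores, rounded to one decimal:

```python
# Per-benchmark scores for HealthGPT-Pro-8B, copied from the tables above.
text_scores_8b = [83.1, 64.1, 71.4, 60.6, 68.5, 71.3, 18.3, 79.2, 35.4]
mm_scores_8b = [54.7, 78.4, 85.0, 70.7, 25.3, 63.6, 80.2, 61.1, 81.6, 86.0, 72.2]

def avg(scores):
    """Unweighted mean, rounded to one decimal place."""
    return round(sum(scores) / len(scores), 1)

print(avg(text_scores_8b))  # 61.3 (text benchmark average)
print(avg(mm_scores_8b))    # 69.0 (multimodal benchmark average)
```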

📚 Citation

If you find this model useful for your research, please cite:

BibTeX (HealthGPT):

```bibtex
@misc{lin2025healthgptmedicallargevisionlanguage,
  title={HealthGPT: A Medical Large Vision-Language Model for Unifying Comprehension and Generation via Heterogeneous Knowledge Adaptation},
  author={Tianwei Lin and Wenqiao Zhang and Sijing Li and Yuqian Yuan and Binhe Yu and Haoyuan Li and Wanggui He and Hao Jiang and Mengze Li and Xiaohui Song and Siliang Tang and Jun Xiao and Hui Lin and Yueting Zhuang and Beng Chin Ooi},
  year={2025},
  eprint={2502.09838},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2502.09838},
}
```