🩺 Medical Multimodal Large Language Model
A high-performance multimodal large language model for medical understanding and analysis, supporting medical text, 2D medical images, and 3D medical volumes.
For research use only. HealthGPT-Pro should not be used as a substitute for professional clinical judgment, diagnosis, or treatment.
HealthGPT-Pro preserves broad instruction-following ability while extending Qwen3-VL to diverse medical modalities and tasks.
- Processes medical text, 2D medical images, and 3D volumetric data in a unified model interface.
- Uses a two-stage training recipe with 3M alignment samples and 10M SFT samples.
- Retains a substantial proportion of general-domain data to maintain general instruction-following ability.
- Trained on diverse medical and general tasks for strong text-based and vision-language performance.
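The unified interface above can be exercised through a chat-style message format. The sketch below assembles a single user turn mixing text with 2D and 3D inputs; the `image`/`text` entry types follow the common Qwen-VL message convention, while the `volume` entry type for 3D data is an illustrative assumption, not a documented API.

```python
# Sketch of one multimodal chat turn mixing text, a 2D image, and a 3D volume.
# The "image"/"text" entry types follow the Qwen-VL message convention; the
# "volume" type for 3D data is an ASSUMPTION for illustration only.

def build_medical_message(question, image_path=None, volume_path=None):
    """Assemble one user turn with optional 2D and 3D attachments."""
    content = []
    if image_path is not None:
        # 2D medical image (e.g. X-ray, pathology slide)
        content.append({"type": "image", "image": image_path})
    if volume_path is not None:
        # 3D volume (e.g. CT/MRI NIfTI file) -- entry type is assumed
        content.append({"type": "volume", "volume": volume_path})
    content.append({"type": "text", "text": question})
    return [{"role": "user", "content": content}]

messages = build_medical_message(
    "Describe any abnormality in this scan.",
    image_path="chest_xray.png",
    volume_path="abdomen_ct.nii.gz",
)
```

The resulting `messages` list would then be handed to the model's chat template and processor in the usual way; the file names above are placeholders.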
HealthGPT-Pro-8B achieves an average score of 61.3 on text benchmarks and 69.0 on multimodal benchmarks.
Bold marks the best result in each benchmark; underlined values mark the second-best result.
Text benchmarks:

| Model | MMLU-Med | MMLU-Pro-Med | MMedBench | MedBullets | MedMCQA | MedQA | MedXpertQA-Text | PubMedQA | SuperGPQA-Medical | Avg. |
|---|---|---|---|---|---|---|---|---|---|---|
| Qwen3-VL-4B | 74.3 | 50.7 | 60.5 | 46.4 | 56.0 | 60.5 | 12.6 | 75.6 | 29.6 | 51.8 |
| Qwen3-VL-8B | 79.8 | 57.4 | 65.9 | 51.3 | 61.1 | 65.9 | 12.8 | 76.2 | 30.2 | 55.6 |
| Lingshu-7B | 75.8 | 53.5 | 64.5 | 57.8 | 56.6 | 64.4 | 16.9 | 76.8 | 29.9 | 55.1 |
| HealthGPT-14B | 80.2 | <u>63.4</u> | 63.2 | 39.8 | 63.4 | 66.2 | 11.3 | 68.0 | 25.7 | 53.5 |
| HuatuoGPT-V-34B | 74.7 | 51.8 | 60.7 | 42.7 | 54.7 | 58.8 | 11.4 | 54.7 | 26.5 | 48.4 |
| Hulu-Med-4B | 78.6 | 58.6 | 66.7 | 59.4 | 64.8 | <u>71.9</u> | 16.8 | 77.6 | 29.5 | 58.2 |
| Hulu-Med-7B | 79.5 | 60.6 | **72.8** | **61.5** | <u>67.6</u> | **73.5** | **19.6** | 77.4 | 31.1 | <u>60.4</u> |
| HealthGPT-Pro-4B | <u>80.4</u> | 58.4 | <u>71.6</u> | 58.0 | 64.4 | 71.5 | 16.2 | <u>78.4</u> | <u>31.4</u> | 58.9 |
| HealthGPT-Pro-8B | **83.1** | **64.1** | 71.4 | <u>60.6</u> | **68.5** | 71.3 | <u>18.3</u> | **79.2** | **35.4** | **61.3** |
Multimodal benchmarks:

| Model | MMMU-Med | VQA-RAD | SLAKE | PathVQA | MedXpertQA-Multimodal | MedFrameQA | OmniMedVQA-Mini | PMC-VQA | M3D-MCQ | CT-RATE-MCQ | AMOS-MM-MCQ | Avg. |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Qwen3-VL-4B | 44.3 | 59.9 | 77.0 | 53.0 | 13.4 | 40.6 | 74.7 | 53.0 | 57.2 | 58.8 | 49.2 | 52.8 |
| Qwen3-VL-8B | 46.5 | 63.4 | 80.2 | 58.3 | 18.7 | 46.4 | 73.0 | 55.6 | 59.5 | 61.6 | 51.2 | 55.9 |
| Lingshu-7B | 47.3 | 66.7 | 81.9 | 61.0 | <u>25.5</u> | 52.6 | **82.4** | 57.2 | 64.1 | 68.3 | 62.7 | 60.9 |
| HealthGPT-14B | 45.5 | 62.6 | 64.2 | 56.0 | 24.1 | 45.3 | 70.2 | 56.4 | 55.2 | 57.3 | 46.5 | 53.0 |
| HuatuoGPT-V-34B | 50.1 | 60.3 | 68.3 | 47.7 | 21.5 | 49.6 | 69.7 | 56.6 | 50.1 | 54.9 | 48.7 | 52.5 |
| Hulu-Med-4B | 45.8 | 72.6 | 81.7 | 59.7 | 24.6 | 54.2 | 75.1 | 53.1 | 76.0 | 70.1 | 69.1 | 62.0 |
| Hulu-Med-7B | 50.5 | <u>77.2</u> | **85.8** | 64.2 | **28.3** | 57.4 | 77.7 | 57.3 | 80.4 | 76.2 | 70.5 | 66.0 |
| HealthGPT-Pro-4B | <u>52.0</u> | 76.6 | 83.9 | <u>66.7</u> | 20.8 | <u>61.4</u> | 78.2 | <u>60.0</u> | <u>81.0</u> | **86.2** | <u>71.1</u> | <u>67.1</u> |
| HealthGPT-Pro-8B | **54.7** | **78.4** | <u>85.0</u> | **70.7** | 25.3 | **63.6** | <u>80.2</u> | **61.1** | **81.6** | <u>86.0</u> | **72.2** | **69.0** |
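The reported averages are plain means of the per-benchmark scores. As a quick sanity check, a minimal sketch recomputing the HealthGPT-Pro-8B rows from the two tables above:

```python
# Recompute the HealthGPT-Pro-8B averages from the per-benchmark scores above.
text_scores = [83.1, 64.1, 71.4, 60.6, 68.5, 71.3, 18.3, 79.2, 35.4]
multimodal_scores = [54.7, 78.4, 85.0, 70.7, 25.3, 63.6, 80.2, 61.1, 81.6, 86.0, 72.2]

text_avg = round(sum(text_scores) / len(text_scores), 1)                    # -> 61.3
multimodal_avg = round(sum(multimodal_scores) / len(multimodal_scores), 1)  # -> 69.0
```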
If you find this model useful for your research, please cite:
```bibtex
@misc{lin2025healthgptmedicallargevisionlanguage,
  title={HealthGPT: A Medical Large Vision-Language Model for Unifying Comprehension and Generation via Heterogeneous Knowledge Adaptation},
  author={Tianwei Lin and Wenqiao Zhang and Sijing Li and Yuqian Yuan and Binhe Yu and Haoyuan Li and Wanggui He and Hao Jiang and Mengze Li and Xiaohui Song and Siliang Tang and Jun Xiao and Hui Lin and Yueting Zhuang and Beng Chin Ooi},
  year={2025},
  eprint={2502.09838},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2502.09838},
}
```