models MedImageParse3D

github-actions[bot] edited this page Mar 1, 2025 · 1 revision

MedImageParse3D

Overview

Biomedical image analysis is fundamental for biomedical discovery in cell biology, pathology, radiology, and many other biomedical domains. 3D medical images such as CT and MRI play unique roles in clinical practice. MedImageParse 3D is a foundation model for image parsing that can jointly conduct segmentation, detection, and recognition for 3D medical images, including CT and MRI. Through joint learning, we can improve accuracy for individual tasks and enable novel applications such as segmenting relevant objects in an image through a text prompt, rather than requiring users to laboriously specify the bounding box for each object.

MedImageParse 3D was trained on a large dataset comprising triples of image, segmentation mask, and textual description. It takes a 3D medical image volume and a text prompt describing the target object type (e.g. pancreas in CT), and outputs the corresponding segmentation mask as a 3D volume with the same shape as the input image. MedImageParse 3D can also identify invalid user inputs that describe objects not present in the image. In addition, it can perform object detection, which aims to locate a specific object of interest, including objects with irregular shapes or of small size.

Traditional segmentation models perform segmentation alone: they require fully supervised masks during training and typically need either manual bounding boxes or automatic proposals at inference when multiple objects are present. Such a model does not inherently know which object to segment unless it was trained specifically for that class, and it cannot take a text query to switch targets. MedImageParse 3D can instead segment from a text prompt describing the object, without needing a user-drawn bounding box. This semantic prompt-based approach lets it parse the image and find relevant objects anywhere in the image.

In summary, MedImageParse 3D shows potential to be a building block for an all-in-one tool for biomedical image analysis by jointly solving segmentation, detection, and recognition. It is broadly applicable to different 3D image modalities through text prompting, which may pave a future path for efficient and accurate image-based biomedical discovery when built upon and integrated into an application.

Model Architecture

MedImageParse 3D is built upon BiomedParse with the BoltzFormer architecture, optimized for locating small objects in 3D images. Leveraging Boltzmann attention sampling mechanisms, it excels at identifying subtle patterns corresponding to biomedical terminologies, as well as extracting contextually relevant information from dense scientific texts. The model is pre-trained on vast 3D medical image datasets, allowing it to generalize across various biomedical domains with high accuracy.

Version: 1

Tags

Preview
Featured
task : image-segmentation
industry : health-and-life-sciences
displayName : MedImageParse3D
author : Microsoft
hiddenlayerscanned
SharedComputeCapacityEnabled
license : mit
languages : en
`evaluation :

We benchmarked MedImageParse 3D against task-specific nnU-Net models on the AMOS22 CT and MRI datasets. Note that we trained a single model to solve all tasks solely via text prompting, e.g. "gallbladder in abdomen MRI", while nnU-Net was trained as a separate expert model for each individual object in each modality. The comparison is therefore between one single model and 27 task-specific models (14 CT targets and 13 MRI targets).

CT

| Dice score (%) | aorta | bladder | duodenum | esophagus | gallbladder | left adrenal gland | left kidney | liver | pancreas | IVC | right adrenal gland | right kidney | spleen | stomach | Average |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| BiomedParse3D | 95.27 | 90.17 | 83.27 | 87.11 | 85.96 | 79.48 | 96.39 | 97.71 | 88.42 | 92.02 | 79.39 | 96.88 | 96.91 | 91.49 | 90.00 |
| nnU-Net | 95.20 | 87.52 | 80.72 | 87.31 | 83.06 | 78.06 | 95.39 | 96.09 | 86.57 | 90.38 | 78.24 | 93.19 | 96.91 | 89.79 | 88.35 |
| SegVol | 92.07 | 88.03 | 72.49 | 64.47 | 79.05 | 76.31 | 94.58 | 96.24 | 80.97 | 83.65 | 71.07 | 92.92 | 94.03 | 88.82 | 83.75 |

MRI

| Dice score (%) | aorta | duodenum | esophagus | gallbladder | left adrenal gland | left kidney | liver | pancreas | IVC | right adrenal gland | right kidney | spleen | stomach | Average |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| BiomedParse3D | 95.73 | 76.03 | 81.38 | 66.58 | 63.35 | 96.92 | 97.65 | 88.70 | 87.26 | 68.14 | 96.69 | 96.88 | 88.93 | 84.94 |
| nnU-Net | 95.64 | 66.78 | 73.62 | 66.32 | 57.15 | 95.82 | 97.25 | 79.29 | 90.66 | 53.29 | 85.48 | 96.66 | 88.80 | 80.52 |
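
For readers unfamiliar with the metric reported in the CT and MRI results, the following is a minimal sketch of how a Dice score between a predicted mask and a ground-truth mask is commonly computed. It is not the evaluation code used for the benchmark. The function name `dice_score`, the flattened-list input, and the convention that two empty masks score 100 are all assumptions made for illustration.

```python
from typing import Iterable


def dice_score(pred: Iterable[int], truth: Iterable[int]) -> float:
    """Dice coefficient (in %) between two flattened binary masks.

    Dice = 2 * |P ∩ T| / (|P| + |T|).
    Assumption: two empty masks count as a perfect match (100.0).
    """
    pred = list(pred)
    truth = list(truth)
    if len(pred) != len(truth):
        raise ValueError("masks must contain the same number of voxels")
    # Voxels that are foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    # Total foreground voxels across the two masks.
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    if total == 0:
        return 100.0
    return 100.0 * 2 * intersection / total


# Toy example: 3 predicted voxels, 3 ground-truth voxels, 2 overlapping.
pred = [1, 1, 1, 0, 0, 0, 0, 0]
truth = [0, 1, 1, 1, 0, 0, 0, 0]
print(round(dice_score(pred, truth), 2))  # 2*2 / (3+3) -> 66.67
```

In practice a 3D volume would usually be held in an array library and flattened before such a computation; plain lists are used here only to keep the sketch dependency-free.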

notes :

Intended Use

Primary Use Cases

  • Supported Data Input Format
  1. The model expects 3D NIfTI images by default.
  2. The model outputs pixel probabilities in the same shape as the input image. The probability threshold for the segmentation mask is 0.5.
  3. The model takes text prompts for segmentation and does not have a fixed number of targets to handle. However, to ensure quality performance, we recommend the following tasks based on evaluation results. We will extend the model's capability with more object types, including tumors and nodules.
  • CT: abdomen: adrenal gland, aorta, bladder, duodenum, esophagus, gallbladder, kidney, left adrenal gland, left kidney, liver, pancreas, postcava, right adrenal gland, right kidney, spleen, stomach
  • MRI: abdomen: aorta, esophagus, gallbladder, kidney, left kidney, liver, pancreas, postcava, right kidney, spleen, stomach
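
The input and output conventions listed above can be sketched in a few lines. This is an illustration only, not the model's API. The names `RECOMMENDED`, `build_prompt`, and `threshold_mask` are hypothetical. The prompt template "<target> in <site> <modality>" is inferred from the example prompt "gallbladder in abdomen MRI". Treating the 0.5 threshold as strictly greater-than is also an assumption.

```python
# Recommended targets copied from the list above; the dict layout is illustrative.
RECOMMENDED = {
    "CT": {
        "adrenal gland", "aorta", "bladder", "duodenum", "esophagus",
        "gallbladder", "kidney", "left adrenal gland", "left kidney",
        "liver", "pancreas", "postcava", "right adrenal gland",
        "right kidney", "spleen", "stomach",
    },
    "MRI": {
        "aorta", "esophagus", "gallbladder", "kidney", "left kidney",
        "liver", "pancreas", "postcava", "right kidney", "spleen", "stomach",
    },
}


def build_prompt(target: str, modality: str, site: str = "abdomen") -> str:
    """Build a text prompt, rejecting targets outside the recommended list.

    Assumption: prompts follow the pattern '<target> in <site> <modality>'.
    """
    if target not in RECOMMENDED.get(modality, set()):
        raise ValueError(f"{target!r} is not a recommended {modality} target")
    return f"{target} in {site} {modality}"


def threshold_mask(probs, threshold: float = 0.5):
    """Binarize a nested-list 3D probability volume, preserving its shape.

    Assumption: a voxel is foreground when its probability exceeds the threshold.
    """
    return [
        [[1 if p > threshold else 0 for p in row] for row in plane]
        for plane in probs
    ]


print(build_prompt("gallbladder", "MRI"))
print(threshold_mask([[[0.2, 0.7], [0.5, 0.9]]]))
```

A target that is missing from the recommended list, such as "bladder" for MRI, raises a `ValueError` in this sketch. The model itself accepts free-form prompts, so the check reflects the recommendation above rather than a hard limit.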

Out-of-Scope Use Cases

This model is intended and provided as-is for research and model development exploration. MedImageParse 3D is not designed or intended to be deployed in clinical settings as-is nor is it intended for use in the diagnosis or treatment of any health or medical condition, and the model’s performance for such purposes has not been established. You bear sole responsibility and liability for any use of MedImageParse 3D, including verification of outputs and incorporation into any product or service intended for a medical purpose or to inform clinical decision-making, compliance with applicable healthcare laws and regulations, and obtaining any necessary clearances or approvals.

Responsible AI Considerations

Microsoft believes Responsible AI is a shared responsibility and we have identified six principles and practices to help organizations address risks, innovate, and create value: fairness, reliability and safety, privacy and security, inclusiveness, transparency, and accountability. When downloaded or used in accordance with our terms of service, developers should work with their supporting model team to ensure this model meets requirements for the relevant use case and addresses unforeseen product misuse. 

While testing the model with images and/or text, ensure that the data is free of PHI and contains no patient information or information that can be traced to a patient's identity.

The model is not designed for the following use cases:

  • Use by clinicians to inform clinical decision-making, as a diagnostic tool or as a medical device - Although MedImageParse 3D is highly accurate in parsing biomedical data, it is not designed or intended to be deployed in clinical settings as-is, nor is it for use in the diagnosis, cure, mitigation, treatment, or prevention of disease or other conditions (including to support clinical decision-making), or as a substitute for professional medical advice, diagnosis, treatment, or clinical judgment of a healthcare professional.

  • Scenarios without consent for data - Any scenario that uses health data for a purpose for which consent was not obtained.  

  • Use outside of health scenarios - Any scenario that uses non-medical images and/or serves purposes outside of the healthcare domain.

Please see Microsoft's Responsible AI Principles and approach available at https://www.microsoft.com/en-us/ai/principles-and-approach/

Training Data

The training data include AMOS22-CT, AMOS22-MRI, MSD, and LIDC-IDRI.

License and where to send questions or comments about the model

Please cite our paper if you use the model for your research. For questions or comments, please contact: [email protected]

Citation

Zhao, T., Gu, Y., Yang, J. et al. A foundation model for joint segmentation, detection and recognition of biomedical objects across nine modalities. Nat Methods (2024). https://doi.org/10.1038/s41592-024-02499-w

inputModalities : image
outputModalities : text,image
keywords : Multimodal
inference_compute_allow_list : ['Standard_NC24ads_A100_v4', 'Standard_NC48ads_A100_v4', 'Standard_NC96ads_A100_v4', 'Standard_ND96asr_v4', 'Standard_ND96amsr_A100_v4', 'Standard_ND40rs_v2', 'Standard_NC40ads_H100_v5', 'Standard_NC80adis_H100_v5', 'Standard_ND96isr_H100_v5']
inference_supported_envs : ['hf']`

View in Studio: https://ml.azure.com/registries/azureml/models/MedImageParse3D/version/1

License: mit

Properties

inference-min-sku-spec: 24|1|220|64

inference-recommended-sku: Standard_NC24ads_A100_v4, Standard_NC48ads_A100_v4, Standard_NC96ads_A100_v4, Standard_ND96asr_v4, Standard_ND96amsr_A100_v4, Standard_ND40rs_v2, Standard_NC40ads_H100_v5, Standard_NC80adis_H100_v5, Standard_ND96isr_H100_v5

languages: en

SharedComputeCapacityEnabled: True
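
The `inference-min-sku-spec` value above is a pipe-delimited string. The sketch below shows one way to parse it into named fields. The field order (CPU cores, GPUs, memory in GB, storage in GB) is an assumption and is not stated on this page. The names `MinSkuSpec` and `parse_min_sku_spec` are hypothetical.

```python
from typing import NamedTuple


class MinSkuSpec(NamedTuple):
    # Field meanings are assumed, not documented on this page.
    cpu_cores: int
    gpus: int
    memory_gb: int
    storage_gb: int


def parse_min_sku_spec(spec: str) -> MinSkuSpec:
    """Split a 'a|b|c|d' spec string into four integer fields."""
    parts = spec.split("|")
    if len(parts) != 4:
        raise ValueError(f"expected 4 pipe-separated fields, got {len(parts)}")
    return MinSkuSpec(*(int(p) for p in parts))


spec = parse_min_sku_spec("24|1|220|64")
print(spec.cpu_cores, spec.gpus, spec.memory_gb, spec.storage_gb)
```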
