From 4c54204bb06d19390884dd632444434277dce86b Mon Sep 17 00:00:00 2001
From: Toby Roseman
Date: Tue, 7 Nov 2023 12:13:31 -0800
Subject: [PATCH 1/2] Compiled Models in Python

---
 README.md                                      | 6 +++---
 python_coreml_stable_diffusion/coreml_model.py | 4 ++--
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/README.md b/README.md
index ebe1d008..54d6ad05 100644
--- a/README.md
+++ b/README.md
@@ -429,7 +429,7 @@ This generally takes 15-20 minutes on an M1 MacBook Pro. Upon successful executi
 
 - `--refiner-version`: The refiner version name as published on the [Hugging Face Hub](https://huggingface.co/models?search=stable-diffusion). This is optional and if specified, this argument will convert and bundle the refiner unet alongside the model unet.
 
-- `--bundle-resources-for-swift-cli`: Compiles all 4 models and bundles them along with necessary resources for text tokenization into `/Resources` which should provided as input to the Swift package. This flag is not necessary for the diffusers-based Python pipeline.
+- `--bundle-resources-for-swift-cli`: Compiles all 4 models and bundles them along with necessary resources for text tokenization into `/Resources` which should provided as input to the Swift package. This flag is not necessary for the diffusers-based Python pipeline. However using these compiled models in Python will significantly speed up inference.
 
 - `--quantize-nbits`: Quantizes the weights of unet and text_encoder models down to 2, 4, 6 or 8 bits using a globally optimal k-means clustering algorithm. By default all models are weight-quantized to 16 bits even if this argument is not specified. Please refer to [this section](#compression-6-bits-and-higher for details and further guidance on weight compression.
 
@@ -455,11 +455,11 @@
 Run text-to-image generation using the example Python pipeline based on [diffusers](https://github.com/huggingface/diffusers):
 
 ```shell
-python -m python_coreml_stable_diffusion.pipeline --prompt "a photo of an astronaut riding a horse on mars" -i -o --compute-unit ALL --seed 93
+python -m python_coreml_stable_diffusion.pipeline --prompt "a photo of an astronaut riding a horse on mars" -i -o --compute-unit ALL --seed 93
 ```
 
 Please refer to the help menu for all available arguments: `python -m python_coreml_stable_diffusion.pipeline -h`. Some notable arguments:
-- `-i`: Should point to the `-o` directory from Step 4 of [Converting Models to Core ML](#converting-models-to-coreml) section from above. If you had specified `--bundle-resources-for-swift-cli` during conversion, then `-i` should point to the resulting `Resources` folder which holds the compiled `.mlmodelc` files. The compiled models load much faster after first use.
+- `-i`: Should point to the `-o` directory from Step 4 of [Converting Models to Core ML](#converting-models-to-coreml) section from above. If you specified `--bundle-resources-for-swift-cli` during conversion, then use the resulting `Resources` folder (which holds the compiled `.mlmodelc` files). The compiled models load much faster after first use.
 - `--model-version`: If you overrode the default model version while converting models to Core ML, you will need to specify the same model version here.
 - `--compute-unit`: Note that the most performant compute unit for this particular implementation may differ across different hardware. `CPU_AND_GPU` or `CPU_AND_NE` may be faster than `ALL`. Please refer to the [Performance Benchmark](#performance-benchmark) section for further guidance.
 - `--scheduler`: If you would like to experiment with different schedulers, you may specify it here. For available options, please see the help menu. You may also specify a custom number of inference steps by `--num-inference-steps` which defaults to 50.
diff --git a/python_coreml_stable_diffusion/coreml_model.py b/python_coreml_stable_diffusion/coreml_model.py
index 9413ea4b..23f399f8 100644
--- a/python_coreml_stable_diffusion/coreml_model.py
+++ b/python_coreml_stable_diffusion/coreml_model.py
@@ -159,8 +159,8 @@ def _load_mlpackage(submodule_name,
     logger.info(f"Loading {submodule_name} mlmodelc")
 
     # FixMe: Submodule names and compiled resources names differ. Can change if names match in the future.
-    submodule_names = ["text_encoder", "text_encoder_2", "unet", "vae_decoder"]
-    compiled_names = ['TextEncoder', 'TextEncoder2', 'Unet', 'VAEDecoder', 'VAEEncoder']
+    submodule_names = ["text_encoder", "text_encoder_2", "unet", "vae_decoder", "vae_encoder", "safety_checker"]
+    compiled_names = ['TextEncoder', 'TextEncoder2', 'Unet', 'VAEDecoder', 'VAEEncoder', 'SafetyChecker']
     name_map = dict(zip(submodule_names, compiled_names))
 
     cname = name_map[submodule_name] + '.mlmodelc'

From 46c94e8a86e8e1e2ba62dee77dd4a4bae05b145b Mon Sep 17 00:00:00 2001
From: Toby Roseman
Date: Tue, 7 Nov 2023 12:46:13 -0800
Subject: [PATCH 2/2] Link to Core ML documentation about why compiled models are faster

---
 README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index 54d6ad05..dfc01e6f 100644
--- a/README.md
+++ b/README.md
@@ -429,7 +429,7 @@ This generally takes 15-20 minutes on an M1 MacBook Pro. Upon successful executi
 
 - `--refiner-version`: The refiner version name as published on the [Hugging Face Hub](https://huggingface.co/models?search=stable-diffusion). This is optional and if specified, this argument will convert and bundle the refiner unet alongside the model unet.
 
-- `--bundle-resources-for-swift-cli`: Compiles all 4 models and bundles them along with necessary resources for text tokenization into `/Resources` which should provided as input to the Swift package. This flag is not necessary for the diffusers-based Python pipeline. However using these compiled models in Python will significantly speed up inference.
+- `--bundle-resources-for-swift-cli`: Compiles all 4 models and bundles them along with necessary resources for text tokenization into `/Resources` which should provided as input to the Swift package. This flag is not necessary for the diffusers-based Python pipeline. [However using these compiled models in Python will significantly speed up inference](https://apple.github.io/coremltools/docs-guides/source/model-prediction.html#why-use-a-compiled-model).
 
 - `--quantize-nbits`: Quantizes the weights of unet and text_encoder models down to 2, 4, 6 or 8 bits using a globally optimal k-means clustering algorithm. By default all models are weight-quantized to 16 bits even if this argument is not specified. Please refer to [this section](#compression-6-bits-and-higher for details and further guidance on weight compression.
 
@@ -459,7 +459,7 @@ python -m python_coreml_stable_diffusion.pipeline --prompt "a photo of an astron
 ```
 
 Please refer to the help menu for all available arguments: `python -m python_coreml_stable_diffusion.pipeline -h`. Some notable arguments:
-- `-i`: Should point to the `-o` directory from Step 4 of [Converting Models to Core ML](#converting-models-to-coreml) section from above. If you specified `--bundle-resources-for-swift-cli` during conversion, then use the resulting `Resources` folder (which holds the compiled `.mlmodelc` files). The compiled models load much faster after first use.
+- `-i`: Should point to the `-o` directory from Step 4 of [Converting Models to Core ML](#converting-models-to-coreml) section from above. If you specified `--bundle-resources-for-swift-cli` during conversion, then use the resulting `Resources` folder (which holds the compiled `.mlmodelc` files). [The compiled models load much faster after first use](https://apple.github.io/coremltools/docs-guides/source/model-prediction.html#why-use-a-compiled-model).
 - `--model-version`: If you overrode the default model version while converting models to Core ML, you will need to specify the same model version here.
 - `--compute-unit`: Note that the most performant compute unit for this particular implementation may differ across different hardware. `CPU_AND_GPU` or `CPU_AND_NE` may be faster than `ALL`. Please refer to the [Performance Benchmark](#performance-benchmark) section for further guidance.
 - `--scheduler`: If you would like to experiment with different schedulers, you may specify it here. For available options, please see the help menu. You may also specify a custom number of inference steps by `--num-inference-steps` which defaults to 50.
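
For reviewers, a minimal sketch (not part of this patch) of the mechanism the patch relies on: loading a compiled `.mlmodelc` with coremltools, per the "Why use a compiled model?" guide linked in the second commit. The paths below are placeholders for a local conversion output directory; `TextEncoder.mlmodelc` follows the compiled resource names used in `coreml_model.py` above.

```python
# Sketch only: paths are placeholders, not part of this patch.
import coremltools as ct

# Loading an .mlpackage: Core ML compiles the model as part of loading,
# so startup is slow every time the cached compile is unavailable.
mlpackage_model = ct.models.MLModel(
    "output-mlpackages/TextEncoder.mlpackage",
    compute_units=ct.ComputeUnit.CPU_AND_NE,
)

# Loading a compiled .mlmodelc (produced by --bundle-resources-for-swift-cli):
# the on-load compile step is skipped, so loads are much faster after first use.
compiled_model = ct.models.CompiledMLModel(
    "output-mlpackages/Resources/TextEncoder.mlmodelc",
    compute_units=ct.ComputeUnit.CPU_AND_NE,
)
```

This is only meant to illustrate why pointing `-i` at the `Resources` folder speeds things up; the pipeline resolves the individual `.mlmodelc` files via the `name_map` shown in the diff.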