From 4c54204bb06d19390884dd632444434277dce86b Mon Sep 17 00:00:00 2001
From: Toby Roseman
Date: Tue, 7 Nov 2023 12:13:31 -0800
Subject: [PATCH 1/2] Compiled Models in Python

---
 README.md                                      | 6 +++---
 python_coreml_stable_diffusion/coreml_model.py | 4 ++--
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/README.md b/README.md
index ebe1d008..54d6ad05 100644
--- a/README.md
+++ b/README.md
@@ -429,7 +429,7 @@ This generally takes 15-20 minutes on an M1 MacBook Pro. Upon successful executi
 
 - `--refiner-version`: The refiner version name as published on the [Hugging Face Hub](https://huggingface.co/models?search=stable-diffusion). This is optional and if specified, this argument will convert and bundle the refiner unet alongside the model unet.
 
-- `--bundle-resources-for-swift-cli`: Compiles all 4 models and bundles them along with necessary resources for text tokenization into `/Resources` which should provided as input to the Swift package. This flag is not necessary for the diffusers-based Python pipeline.
+- `--bundle-resources-for-swift-cli`: Compiles all 4 models and bundles them along with necessary resources for text tokenization into `/Resources` which should provided as input to the Swift package. This flag is not necessary for the diffusers-based Python pipeline. However using these compiled models in Python will significantly speed up inference.
 
 - `--quantize-nbits`: Quantizes the weights of unet and text_encoder models down to 2, 4, 6 or 8 bits using a globally optimal k-means clustering algorithm. By default all models are weight-quantized to 16 bits even if this argument is not specified. Please refer to [this section](#compression-6-bits-and-higher for details and further guidance on weight compression.
 
@@ -455,11 +455,11 @@
 Run text-to-image generation using the example Python pipeline based on [diffusers](https://github.com/huggingface/diffusers):
 
 ```shell
-python -m python_coreml_stable_diffusion.pipeline --prompt "a photo of an astronaut riding a horse on mars" -i -o --compute-unit ALL --seed 93
+python -m python_coreml_stable_diffusion.pipeline --prompt "a photo of an astronaut riding a horse on mars" -i -o --compute-unit ALL --seed 93
 ```
 
 Please refer to the help menu for all available arguments: `python -m python_coreml_stable_diffusion.pipeline -h`. Some notable arguments:
-- `-i`: Should point to the `-o` directory from Step 4 of [Converting Models to Core ML](#converting-models-to-coreml) section from above. If you had specified `--bundle-resources-for-swift-cli` during conversion, then `-i` should point to the resulting `Resources` folder which holds the compiled `.mlmodelc` files. The compiled models load much faster after first use.
+- `-i`: Should point to the `-o` directory from Step 4 of [Converting Models to Core ML](#converting-models-to-coreml) section from above. If you specified `--bundle-resources-for-swift-cli` during conversion, then use the resulting `Resources` folder (which holds the compiled `.mlmodelc` files). The compiled models load much faster after first use.
 - `--model-version`: If you overrode the default model version while converting models to Core ML, you will need to specify the same model version here.
 - `--compute-unit`: Note that the most performant compute unit for this particular implementation may differ across different hardware. `CPU_AND_GPU` or `CPU_AND_NE` may be faster than `ALL`. Please refer to the [Performance Benchmark](#performance-benchmark) section for further guidance.
 - `--scheduler`: If you would like to experiment with different schedulers, you may specify it here. For available options, please see the help menu. You may also specify a custom number of inference steps by `--num-inference-steps` which defaults to 50.
diff --git a/python_coreml_stable_diffusion/coreml_model.py b/python_coreml_stable_diffusion/coreml_model.py
index 9413ea4b..23f399f8 100644
--- a/python_coreml_stable_diffusion/coreml_model.py
+++ b/python_coreml_stable_diffusion/coreml_model.py
@@ -159,8 +159,8 @@ def _load_mlpackage(submodule_name,
     logger.info(f"Loading {submodule_name} mlmodelc")
 
     # FixMe: Submodule names and compiled resources names differ. Can change if names match in the future.
-    submodule_names = ["text_encoder", "text_encoder_2", "unet", "vae_decoder"]
-    compiled_names = ['TextEncoder', 'TextEncoder2', 'Unet', 'VAEDecoder', 'VAEEncoder']
+    submodule_names = ["text_encoder", "text_encoder_2", "unet", "vae_decoder", "vae_encoder", "safety_checker"]
+    compiled_names = ['TextEncoder', 'TextEncoder2', 'Unet', 'VAEDecoder', 'VAEEncoder', 'SafetyChecker']
     name_map = dict(zip(submodule_names, compiled_names))
 
     cname = name_map[submodule_name] + '.mlmodelc'

From 46c94e8a86e8e1e2ba62dee77dd4a4bae05b145b Mon Sep 17 00:00:00 2001
From: Toby Roseman
Date: Tue, 7 Nov 2023 12:46:13 -0800
Subject: [PATCH 2/2] Link to Core ML documentation about why compiled models are faster

---
 README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index 54d6ad05..dfc01e6f 100644
--- a/README.md
+++ b/README.md
@@ -429,7 +429,7 @@ This generally takes 15-20 minutes on an M1 MacBook Pro. Upon successful executi
 
 - `--refiner-version`: The refiner version name as published on the [Hugging Face Hub](https://huggingface.co/models?search=stable-diffusion). This is optional and if specified, this argument will convert and bundle the refiner unet alongside the model unet.
 
-- `--bundle-resources-for-swift-cli`: Compiles all 4 models and bundles them along with necessary resources for text tokenization into `/Resources` which should provided as input to the Swift package. This flag is not necessary for the diffusers-based Python pipeline. However using these compiled models in Python will significantly speed up inference.
+- `--bundle-resources-for-swift-cli`: Compiles all 4 models and bundles them along with necessary resources for text tokenization into `/Resources` which should provided as input to the Swift package. This flag is not necessary for the diffusers-based Python pipeline. [However using these compiled models in Python will significantly speed up inference](https://apple.github.io/coremltools/docs-guides/source/model-prediction.html#why-use-a-compiled-model).
 
 - `--quantize-nbits`: Quantizes the weights of unet and text_encoder models down to 2, 4, 6 or 8 bits using a globally optimal k-means clustering algorithm. By default all models are weight-quantized to 16 bits even if this argument is not specified. Please refer to [this section](#compression-6-bits-and-higher for details and further guidance on weight compression.
 
@@ -459,7 +459,7 @@ python -m python_coreml_stable_diffusion.pipeline --prompt "a photo of an astron
 ```
 
 Please refer to the help menu for all available arguments: `python -m python_coreml_stable_diffusion.pipeline -h`. Some notable arguments:
-- `-i`: Should point to the `-o` directory from Step 4 of [Converting Models to Core ML](#converting-models-to-coreml) section from above. If you specified `--bundle-resources-for-swift-cli` during conversion, then use the resulting `Resources` folder (which holds the compiled `.mlmodelc` files). The compiled models load much faster after first use.
+- `-i`: Should point to the `-o` directory from Step 4 of [Converting Models to Core ML](#converting-models-to-coreml) section from above. If you specified `--bundle-resources-for-swift-cli` during conversion, then use the resulting `Resources` folder (which holds the compiled `.mlmodelc` files). [The compiled models load much faster after first use](https://apple.github.io/coremltools/docs-guides/source/model-prediction.html#why-use-a-compiled-model).
 - `--model-version`: If you overrode the default model version while converting models to Core ML, you will need to specify the same model version here.
 - `--compute-unit`: Note that the most performant compute unit for this particular implementation may differ across different hardware. `CPU_AND_GPU` or `CPU_AND_NE` may be faster than `ALL`. Please refer to the [Performance Benchmark](#performance-benchmark) section for further guidance.
 - `--scheduler`: If you would like to experiment with different schedulers, you may specify it here. For available options, please see the help menu. You may also specify a custom number of inference steps by `--num-inference-steps` which defaults to 50.
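
For reviewers, a minimal sketch (not part of this patch) of the mechanism the patch relies on: loading a compiled `.mlmodelc` with coremltools, per the "Why use a compiled model?" guide linked in the second commit. The paths below are placeholders for a local conversion output directory; `TextEncoder.mlmodelc` follows the compiled resource names used in `coreml_model.py` above.

```python
# Sketch only: paths are placeholders, not part of this patch.
import coremltools as ct

# Loading an .mlpackage: Core ML compiles the model as part of loading,
# so startup is slow every time the cached compile is unavailable.
mlpackage_model = ct.models.MLModel(
    "output-mlpackages/TextEncoder.mlpackage",
    compute_units=ct.ComputeUnit.CPU_AND_NE,
)

# Loading a compiled .mlmodelc (produced by --bundle-resources-for-swift-cli):
# the on-load compile step is skipped, so loads are much faster after first use.
compiled_model = ct.models.CompiledMLModel(
    "output-mlpackages/Resources/TextEncoder.mlmodelc",
    compute_units=ct.ComputeUnit.CPU_AND_NE,
)
```

This is only meant to illustrate why pointing `-i` at the `Resources` folder speeds things up; the pipeline resolves the individual `.mlmodelc` files via the `name_map` shown in the diff.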