From 32c75c226cc39c1aae833b404a2b38c3daf1cafb Mon Sep 17 00:00:00 2001 From: Iden Kalemaj Date: Tue, 17 Dec 2024 08:25:17 -0800 Subject: [PATCH 01/10] Add LoRA to the BERT fine-tuning tutorial (#698) Summary: Pull Request resolved: https://github.com/pytorch/opacus/pull/698 Update the BERT fine-tuning tutorial to show how LoRA can be used with DP-SGD. Reviewed By: HuanyuZhang Differential Revision: D67281956 fbshipit-source-id: e7f099ba2e7e816de96cd61f4adff4ab84e9e7d5 --- tutorials/building_text_classifier.ipynb | 4468 ++++++++++++++++++---- 1 file changed, 3660 insertions(+), 808 deletions(-) diff --git a/tutorials/building_text_classifier.ipynb b/tutorials/building_text_classifier.ipynb index a8a3fa45..585d54d7 100644 --- a/tutorials/building_text_classifier.ipynb +++ b/tutorials/building_text_classifier.ipynb @@ -1,815 +1,3667 @@ { - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Building text classifier with Differential Privacy" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "In this tutorial, we will train a text classifier with Differential Privacy by taking a model pre-trained on public text data and fine-tuning it for a different task.\n", - "\n", - "When training a model with differential privacy, we almost always face a trade-off between model size and accuracy on the task. The exact details depend on the problem, but a rule of thumb is that the fewer parameters the model has, the easier it is to get good performance with DP.\n", - "\n", - "Most state-of-the-art NLP models are quite deep and large (e.g. [BERT-base](https://github.com/google-research/bert) has over 100M parameters), which makes the task of training text models on private datasets rather challenging.\n", - "\n", - "One way of addressing this problem is to divide the training process into two stages. First, we will pre-train the model on a public dataset, exposing the model to generic text data. 
Assuming that the generic text data is public, we will not be using differential privacy at this step. Then, we freeze most of the layers, leaving only a few upper layers to be trained on the private dataset using DP-SGD. This way we can get the best of both worlds - we have a deep and powerful text understanding model, while only training a small number of parameters with differentially private algorithm.\n", - "\n", - "In this tutorial, we will take the pre-trained [BERT-base](https://github.com/google-research/bert) model and fine-tune it to recognize textual entailment on the [SNLI](https://nlp.stanford.edu/projects/snli/) dataset.\n", - "\n", - "We also fine-tune it with Ghost Clipping DP-SGD, a memory-efficient implementation of DP-SGD, which enables the use of large batch sizes. " - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Dataset" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "First, we need to download the dataset (we'll use Stanford NLP mirror)" - ] - }, - { - "cell_type": "code", - "execution_count": 1, - "metadata": {}, - "outputs": [], - "source": [ - "STANFORD_SNLI_URL = \"https://nlp.stanford.edu/projects/snli/snli_1.0.zip\"\n", - "DATA_DIR = \"data\"" - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Downloading and extracting ...\n", - "Completed!\n" - ] - } - ], - "source": [ - "import zipfile\n", - "import urllib.request\n", - "import os\n", - "\n", - "import warnings\n", - "warnings.simplefilter(\"ignore\")\n", - "\n", - "def download_and_extract(dataset_url, data_dir):\n", - " print(\"Downloading and extracting ...\")\n", - " filename = \"snli.zip\"\n", - " urllib.request.urlretrieve(dataset_url, filename)\n", - " with zipfile.ZipFile(filename) as zip_ref:\n", - " zip_ref.extractall(data_dir)\n", - " os.remove(filename)\n", - " print(\"Completed!\")\n", - "\n", - 
"download_and_extract(STANFORD_SNLI_URL, DATA_DIR)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The dataset comes in two formats (`tsv` and `json`) and has already been split into train/dev/test. Let’s verify that’s the case." - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "['snli_1.0_dev.txt',\n", - " 'README.txt',\n", - " 'snli_1.0_dev.jsonl',\n", - " 'Icon\\r',\n", - " '.DS_Store',\n", - " 'snli_1.0_test.txt',\n", - " 'snli_1.0_train.jsonl',\n", - " 'snli_1.0_test.jsonl',\n", - " 'snli_1.0_train.txt']" - ] - }, - "execution_count": 3, - "metadata": {}, - "output_type": "execute_result" + "metadata": { + "kernelspec": { + "name": "python3", + "display_name": "python3", + "languaage": "python" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.6" + }, + "colab": { + "provenance": [], + "gpuType": "T4" + }, + "accelerator": "GPU", + "widgets": { + "application/vnd.jupyter.widget-state+json": { + "a6f080fa6f4b4de399af5d1d7850b960": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_47fec328e2464db3861b16e68e6cc65d", + "IPY_MODEL_3ec2b8f4e38d4b05a09c83e6925960a6", + "IPY_MODEL_6fee830ab9f545cea62081d8cf5b3240" + ], + "layout": "IPY_MODEL_cf024e76fe9b4766ac035f617391deb7" + } + }, + "47fec328e2464db3861b16e68e6cc65d": { + "model_module": "@jupyter-widgets/controls", + 
"model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_fc0b17bfb44c45dd9825c6f1719b61cc", + "placeholder": "​", + "style": "IPY_MODEL_2dac3a8089c34d0b9ce81653bde67603", + "value": "config.json: 100%" + } + }, + "3ec2b8f4e38d4b05a09c83e6925960a6": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_ffb2ab66b3ca4d9899658fe58c43acb1", + "max": 570, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_4b6aad944453432dbf957da380d059a1", + "value": 570 + } + }, + "6fee830ab9f545cea62081d8cf5b3240": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_cf3dae1b960440d7984e9ed287b54cee", + "placeholder": "​", + "style": "IPY_MODEL_36254c0a04c840f3bf4038096c736873", + "value": " 570/570 [00:00<00:00, 33.8kB/s]" + } + }, + 
"cf024e76fe9b4766ac035f617391deb7": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "fc0b17bfb44c45dd9825c6f1719b61cc": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": 
null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "2dac3a8089c34d0b9ce81653bde67603": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "ffb2ab66b3ca4d9899658fe58c43acb1": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + 
"object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "4b6aad944453432dbf957da380d059a1": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "cf3dae1b960440d7984e9ed287b54cee": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "36254c0a04c840f3bf4038096c736873": { + "model_module": 
"@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "e3ffd50ee822433fabd9c1ee4a39612e": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_f248160605a2450a8411e4f5d58a5cfa", + "IPY_MODEL_a7b6e7aa521647649bb4157b6504d4e8", + "IPY_MODEL_1c1e86bca0534caaa7ad435fd7e67bf2" + ], + "layout": "IPY_MODEL_22ca9e6c6c1f4bc6b0f3db1a09a5e562" + } + }, + "f248160605a2450a8411e4f5d58a5cfa": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_e9699698559c4860bdf6a312c492e7da", + "placeholder": "​", + "style": "IPY_MODEL_874e45fe39844927ad1fd10d4899a428", + "value": "tokenizer_config.json: 100%" + } + }, + "a7b6e7aa521647649bb4157b6504d4e8": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": 
"@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_2dca6b2477344b45b1c17f124e27ce72", + "max": 49, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_c7afefe6f907441b9e466605cb4f5c7f", + "value": 49 + } + }, + "1c1e86bca0534caaa7ad435fd7e67bf2": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_df3c4c12e06245b1a8f4a4a7d71a530c", + "placeholder": "​", + "style": "IPY_MODEL_e63eb3e5c06140249f6d8d4c04fe8693", + "value": " 49.0/49.0 [00:00<00:00, 2.50kB/s]" + } + }, + "22ca9e6c6c1f4bc6b0f3db1a09a5e562": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + 
"grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "e9699698559c4860bdf6a312c492e7da": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "874e45fe39844927ad1fd10d4899a428": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + 
"_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "2dca6b2477344b45b1c17f124e27ce72": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "c7afefe6f907441b9e466605cb4f5c7f": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "df3c4c12e06245b1a8f4a4a7d71a530c": { + "model_module": "@jupyter-widgets/base", + "model_name": 
"LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "e63eb3e5c06140249f6d8d4c04fe8693": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "d951c3592058414ab00cf754e9b70685": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + 
"_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_c3f9146e082346a3bed274efb2265376", + "IPY_MODEL_1602c2298e9443f78007fdbf101a0c2b", + "IPY_MODEL_d5dda55bd4de4f12bf3718fb386c5bf9" + ], + "layout": "IPY_MODEL_943e61866ed74be4b10ba383450cb4c3" + } + }, + "c3f9146e082346a3bed274efb2265376": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_0727d77eaf28466c93c2c6021661ac9a", + "placeholder": "​", + "style": "IPY_MODEL_3ab2e1a9ba694463ab5f3ec78ad0a8f4", + "value": "vocab.txt: 100%" + } + }, + "1602c2298e9443f78007fdbf101a0c2b": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_30b4db204fa644128198abf6d82664bf", + "max": 213450, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_0517fdac88784fe6b51ad1f989f99cb7", + "value": 213450 + } + }, + "d5dda55bd4de4f12bf3718fb386c5bf9": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": 
"HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_ce1c55bc51dc49a9b261f104f49d38d8", + "placeholder": "​", + "style": "IPY_MODEL_4b7e11bb32bc43c9ad0449bd39bc4d40", + "value": " 213k/213k [00:00<00:00, 613kB/s]" + } + }, + "943e61866ed74be4b10ba383450cb4c3": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "0727d77eaf28466c93c2c6021661ac9a": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + 
"_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "3ab2e1a9ba694463ab5f3ec78ad0a8f4": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "30b4db204fa644128198abf6d82664bf": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + 
"grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "0517fdac88784fe6b51ad1f989f99cb7": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "ce1c55bc51dc49a9b261f104f49d38d8": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": 
null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "4b7e11bb32bc43c9ad0449bd39bc4d40": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "3dbf36a5c0884579ab2f36c2e91c04fb": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_d71c49bd9f8c438898250c3874c06240", + "IPY_MODEL_1b70fa16b803466ea31649dcd644e3d7", + "IPY_MODEL_7a728bea623646c182c508e34b582fc9" + ], + "layout": "IPY_MODEL_8494caffe83a4743a11d2751b38c56bb" + } + }, + "d71c49bd9f8c438898250c3874c06240": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": 
"IPY_MODEL_1580380472df40e38cbde67659c5221d", + "placeholder": "​", + "style": "IPY_MODEL_349b262479b9418badd6a3acff386dd2", + "value": "tokenizer.json: 100%" + } + }, + "1b70fa16b803466ea31649dcd644e3d7": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_3370a9d70dd04d5195bc3f1f81b18728", + "max": 435797, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_ee235d5ffc5142c895175cbea5c94dfe", + "value": 435797 + } + }, + "7a728bea623646c182c508e34b582fc9": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_f6c5ef333a2b45f1bf196fdd58873688", + "placeholder": "​", + "style": "IPY_MODEL_e24ff9dae78241d9b5a6a7199c888e45", + "value": " 436k/436k [00:00<00:00, 1.24MB/s]" + } + }, + "8494caffe83a4743a11d2751b38c56bb": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + 
"align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "1580380472df40e38cbde67659c5221d": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": 
null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "349b262479b9418badd6a3acff386dd2": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "3370a9d70dd04d5195bc3f1f81b18728": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "ee235d5ffc5142c895175cbea5c94dfe": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": 
"@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "f6c5ef333a2b45f1bf196fdd58873688": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "e24ff9dae78241d9b5a6a7199c888e45": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + 
}, + "d4a768f261614ac69b3004fbf2323c89": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_81a7d1b27cd94916ac3c330aa2551cf0", + "IPY_MODEL_eac4d3f8e59a4d4c81178cc76600182f", + "IPY_MODEL_5bcca2ee852144c28bfd40de0978cadc" + ], + "layout": "IPY_MODEL_05fa1e761bc14c929b19abf0d8a93f5f" + } + }, + "81a7d1b27cd94916ac3c330aa2551cf0": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_32ba8daa0c6e4a9c9f62df588a198b1b", + "placeholder": "​", + "style": "IPY_MODEL_e1e93f905b494126a6a9e1e0a2f92022", + "value": "model.safetensors: 100%" + } + }, + "eac4d3f8e59a4d4c81178cc76600182f": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_9ca2cb3f116547d3bb062f4da11762d7", + "max": 435755784, + "min": 0, + "orientation": 
"horizontal", + "style": "IPY_MODEL_554184f1c9b44bd3a8116773572347eb", + "value": 435755784 + } + }, + "5bcca2ee852144c28bfd40de0978cadc": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_bb89b007667e4bf0b696ad84dfb2f91d", + "placeholder": "​", + "style": "IPY_MODEL_610f073056924398b9229978dff5ff4d", + "value": " 436M/436M [00:02<00:00, 177MB/s]" + } + }, + "05fa1e761bc14c929b19abf0d8a93f5f": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + 
"width": null + } + }, + "32ba8daa0c6e4a9c9f62df588a198b1b": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "e1e93f905b494126a6a9e1e0a2f92022": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "9ca2cb3f116547d3bb062f4da11762d7": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": 
"LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "554184f1c9b44bd3a8116773572347eb": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "bb89b007667e4bf0b696ad84dfb2f91d": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + 
"flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "610f073056924398b9229978dff5ff4d": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + } + } } - ], - "source": [ - "snli_folder = os.path.join(DATA_DIR, \"snli_1.0\")\n", - "os.listdir(snli_folder)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Let's now take a look inside. [SNLI dataset](https://nlp.stanford.edu/projects/snli/) provides ample syntactic metadata, but we'll only use raw input text. Therefore, the only fields we're interested in are **sentence1** (premise), **sentence2** (hypothesis), and **gold_label** (label chosen by the majority of annotators).\n", - "\n", - "The label defines the relation between premise and hypothesis: either *contradiction*, *neutral*, or *entailment*." - ] }, - { - "cell_type": "code", - "execution_count": 4, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
sentence1sentence2gold_label
0A person on a horse jumps over a broken down a...A person is training his horse for a competition.neutral
1A person on a horse jumps over a broken down a...A person is at a diner, ordering an omelette.contradiction
2A person on a horse jumps over a broken down a...A person is outdoors, on a horse.entailment
3Children smiling and waving at cameraThey are smiling at their parentsneutral
4Children smiling and waving at cameraThere are children presententailment
\n", - "
" + "nbformat": 4, + "nbformat_minor": 0, + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "originalKey": "c21b7ad1-cba1-43cd-b602-42294e10cc9a", + "outputsInitialized": false, + "isAgentGenerated": false, + "language": "markdown", + "showInput": false, + "id": "IccO-A2JpH_1" + }, + "source": [ + "# Building a text classifier with Differential Privacy" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "originalKey": "bfca9bac-8231-4ebe-a67b-eb070f4a5958", + "outputsInitialized": false, + "isAgentGenerated": false, + "language": "markdown", + "showInput": false, + "id": "PDqDnN-FpH_2" + }, + "source": [ + "In this tutorial, we will train a text classifier with Differential Privacy by taking a model pre-trained on public text data and fine-tuning it for a different task.\n", + "\n", + "When training a model with differential privacy, we almost always face a trade-off between model size and accuracy on the task. The exact details depend on the problem, but a rule of thumb is that the fewer parameters the model has, the easier it is to get good performance with DP.\n", + "\n", + "Most state-of-the-art NLP models are quite deep and large (e.g. [BERT-base](https://github.com/google-research/bert) has over 100M parameters), which makes the task of training text models on private datasets rather challenging.\n", + "\n", + "One way of addressing this problem is to divide the training process into two stages. First, we will pre-train the model on a public dataset, exposing the model to generic text data. Assuming that the generic text data is public, we will not be using differential privacy at this step. Then, we freeze most of the layers, leaving only a few upper layers to be trained on the private dataset using DP-SGD. 
This way we can get the best of both worlds - we have a deep and powerful text understanding model, while only training a small number of parameters with a differentially private algorithm.\n", + "\n", + "In this tutorial, we will take the pre-trained [BERT-base](https://github.com/google-research/bert) model and fine-tune it to recognize textual entailment on the [SNLI](https://nlp.stanford.edu/projects/snli/) dataset.\n", + "\n", + "We further demonstrate fine-tuning results with\n", + "\n", + "- Ghost Clipping DP-SGD, a memory-efficient implementation of DP-SGD, which enables the use of large batch sizes.\n", + "- LoRA (low-rank adaptation), a method for parameter-efficient fine-tuning which can be used in conjunction with DP-SGD to further reduce the number of trainable parameters." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "originalKey": "239e9b8e-09ba-4c61-8bee-d64dd51e73e3", + "outputsInitialized": false, + "isAgentGenerated": false, + "language": "markdown", + "id": "PA1qSy0ipH_2" + }, + "source": [ + "## Dataset" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "originalKey": "fdcba1fc-0d70-4ca0-b27f-724355d95e7d", + "outputsInitialized": false, + "isAgentGenerated": false, + "language": "markdown", + "id": "Fp-i3-N5pH_3" + }, + "source": [ + "First, we need to download the dataset (we'll use the Stanford NLP mirror)" + ] + }, + { + "cell_type": "code", + "metadata": { + "originalKey": "0b3afde6-52df-4226-acc9-68346e5d91cc", + "outputsInitialized": false, + "isAgentGenerated": false, + "language": "python", + "executionStartTime": 1734022773178, + "executionStopTime": 1734022773601, + "serverExecutionDuration": 2.28899018839, + "requestMsgId": "0b3afde6-52df-4226-acc9-68346e5d91cc", + "id": "dIAKXrvNpH_3" + }, + "source": [ + "STANFORD_SNLI_URL = \"https://nlp.stanford.edu/projects/snli/snli_1.0.zip\"\n", + "DATA_DIR = \"data\"" ], - "text/plain": [ - " sentence1 \\\n", - "0 A person on a horse jumps over a broken down a... 
\n", - "1 A person on a horse jumps over a broken down a... \n", - "2 A person on a horse jumps over a broken down a... \n", - "3 Children smiling and waving at camera \n", - "4 Children smiling and waving at camera \n", - "\n", - " sentence2 gold_label \n", - "0 A person is training his horse for a competition. neutral \n", - "1 A person is at a diner, ordering an omelette. contradiction \n", - "2 A person is outdoors, on a horse. entailment \n", - "3 They are smiling at their parents neutral \n", - "4 There are children present entailment " - ] - }, - "execution_count": 4, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "import pandas as pd\n", - "train_path = os.path.join(snli_folder, \"snli_1.0_train.txt\")\n", - "dev_path = os.path.join(snli_folder, \"snli_1.0_dev.txt\")\n", - "\n", - "df_train = pd.read_csv(train_path, sep='\\t')\n", - "df_test = pd.read_csv(dev_path, sep='\\t')\n", - "\n", - "df_train[['sentence1', 'sentence2', 'gold_label']][:5]" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Model" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "BERT (Bidirectional Encoder Representations from Transformers) is a state-of-the-art approach to various NLP tasks. It uses a Transformer architecture and relies heavily on the concept of pre-training. \n", - "\n", - "We'll use a pre-trained BERT-base model, provided in the huggingface [transformers](https://github.com/huggingface/transformers) repo.\n", - "It gives us a PyTorch implementation for the classic BERT architecture, as well as a tokenizer and weights, pre-trained on a public English corpus (Wikipedia).\n", - "\n", - "Please follow these [installation instructions](https://github.com/huggingface/transformers#installation) before proceeding." 
- ] - }, - { - "cell_type": "code", - "execution_count": 5, - "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "To use data.metrics please install scikit-learn. See https://scikit-learn.org/stable/index.html\n", - "100%|██████████| 433/433 [00:00<00:00, 455171.34B/s]\n", - "100%|██████████| 213450/213450 [00:00<00:00, 37577090.82B/s]\n", - "100%|██████████| 435779157/435779157 [00:11<00:00, 39433911.33B/s]\n" - ] - } - ], - "source": [ - "from transformers import BertConfig, BertTokenizer, BertForSequenceClassification\n", - "\n", - "model_name = \"bert-base-cased\"\n", - "config = BertConfig.from_pretrained(\n", - " model_name,\n", - " num_labels=3,\n", - ")\n", - "tokenizer = BertTokenizer.from_pretrained(\n", - " \"bert-base-cased\",\n", - " do_lower_case=False,\n", - ")\n", - "model = BertForSequenceClassification.from_pretrained(\n", - " \"bert-base-cased\",\n", - " config=config,\n", - ")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The model has the following structure. It uses a combination of word, positional and token *embeddings* to create a sequence representation, then passes the data through 12 *transformer encoders* and finally uses a *linear classifier* to produce the final label.\n", - "As the model is already pre-trained and we only plan to fine-tune a few upper layers, we want to freeze all layers, except for the last encoder and above (`BertPooler` and `Classifier`)." 
- ] - }, - { - "cell_type": "code", - "execution_count": 6, - "metadata": {}, - "outputs": [ - { - "data": { - "image/png": "", - "text/plain": [ - "" - ] - }, - "execution_count": 6, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "from IPython.display import Image\n", - "Image(filename='img/BERT.png')" - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Total parameters count: 108312579\n", - "Trainable parameters count: 7680771\n" - ] - } - ], - "source": [ - "trainable_layers = [model.bert.encoder.layer[-1], model.bert.pooler, model.classifier]\n", - "total_params = 0\n", - "trainable_params = 0\n", - "\n", - "for p in model.parameters():\n", - " p.requires_grad = False\n", - " total_params += p.numel()\n", - "\n", - "for layer in trainable_layers:\n", - " for p in layer.parameters():\n", - " p.requires_grad = True\n", - " trainable_params += p.numel()\n", - "\n", - "print(f\"Total parameters count: {total_params}\") # ~108M\n", - "print(f\"Trainable parameters count: {trainable_params}\") # ~7M" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Thus, by using a pre-trained model we reduce the number of trainable params from over 100 million to just above 7.5 million. This will help both performance and convergence with added noise." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Prepare the data" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Before we begin training, we need to preprocess the data and convert it to the format our model expects. 
\n", - "\n", - "(Note: it'll take 5-10 minutes to run on a laptop)" - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "metadata": {}, - "outputs": [], - "source": [ - "LABEL_LIST = ['contradiction', 'entailment', 'neutral']\n", - "MAX_SEQ_LENGHT = 128\n", - "\n", - "import torch\n", - "import torch.nn as nn\n", - "import transformers\n", - "from torch.utils.data import TensorDataset\n", - "from transformers.data.processors.utils import InputExample\n", - "from transformers.data.processors.glue import glue_convert_examples_to_features\n", - "\n", - "\n", - "def _create_examples(df, set_type):\n", - " \"\"\" Convert raw dataframe to a list of InputExample. Filter malformed examples\n", - " \"\"\"\n", - " examples = []\n", - " for index, row in df.iterrows():\n", - " if row['gold_label'] not in LABEL_LIST:\n", - " continue\n", - " if not isinstance(row['sentence1'], str) or not isinstance(row['sentence2'], str):\n", - " continue\n", - "\n", - " guid = f\"{index}-{set_type}\"\n", - " examples.append(\n", - " InputExample(guid=guid, text_a=row['sentence1'], text_b=row['sentence2'], label=row['gold_label']))\n", - " return examples\n", - "\n", - "def _df_to_features(df, set_type):\n", - " \"\"\" Pre-process text. 
This method will:\n", - " 1) tokenize inputs\n", - " 2) cut or pad each sequence to MAX_SEQ_LENGHT\n", - " 3) convert tokens into ids\n", - "\n", - " The output will contain:\n", - " `input_ids` - padded token ids sequence\n", - " `attention mask` - mask indicating padded tokens\n", - " `token_type_ids` - mask indicating the split between premise and hypothesis\n", - " `label` - label\n", - " \"\"\"\n", - " examples = _create_examples(df, set_type)\n", - "\n", - " #backward compatibility with older transformers versions\n", - " legacy_kwards = {}\n", - " from packaging import version\n", - " if version.parse(transformers.__version__) < version.parse(\"2.9.0\"):\n", - " legacy_kwards = {\n", - " \"pad_on_left\": False,\n", - " \"pad_token\": tokenizer.convert_tokens_to_ids([tokenizer.pad_token])[0],\n", - " \"pad_token_segment_id\": 0,\n", - " }\n", - "\n", - " return glue_convert_examples_to_features(\n", - " examples=examples,\n", - " tokenizer=tokenizer,\n", - " label_list=LABEL_LIST,\n", - " max_length=MAX_SEQ_LENGHT,\n", - " output_mode=\"classification\",\n", - " **legacy_kwards,\n", - " )\n", - "\n", - "def _features_to_dataset(features):\n", - " \"\"\" Convert features from `_df_to_features` into a single dataset\n", - " \"\"\"\n", - " all_input_ids = torch.tensor([f.input_ids for f in features], dtype=torch.long)\n", - " all_attention_mask = torch.tensor(\n", - " [f.attention_mask for f in features], dtype=torch.long\n", - " )\n", - " all_token_type_ids = torch.tensor(\n", - " [f.token_type_ids for f in features], dtype=torch.long\n", - " )\n", - " all_labels = torch.tensor([f.label for f in features], dtype=torch.long)\n", - " dataset = TensorDataset(\n", - " all_input_ids, all_attention_mask, all_token_type_ids, all_labels\n", - " )\n", - "\n", - " return dataset\n", - "\n", - "train_features = _df_to_features(df_train, \"train\")\n", - "test_features = _df_to_features(df_test, \"test\")\n", - "\n", - "train_dataset = 
_features_to_dataset(train_features)\n", - "test_dataset = _features_to_dataset(test_features)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Choosing batch size\n", - "\n", - "Let's talk about batch sizes for a bit.\n", - "\n", - "In addition to all the considerations you normally take into account when choosing batch size, training models with DP adds another one - privacy cost. \n", - "\n", - "Because of the threat model we assume and the way we add noise to the gradients, larger batch sizes (to a certain extent) generally help convergence. We add the same amount of noise to each gradient update (scaled to the norm of one sample in the batch) regardless of the batch size. What this means is that as the batch size increases, the relative amount of noise added decreases. while preserving the same epsilon guarantee. \n", - "\n", - "You should, however, keep in mind that increasing batch size has its price in terms of epsilon, which grows at `O(sqrt(batch_size))` as we train (therefore larger batches make it grow faster). The good strategy here is to experiment with multiple combinations of `batch_size` and `noise_multiplier` to find the one that provides the best possible quality at acceptable privacy guarantee.\n", - "\n", - "There's another side to this - memory. Opacus computes and stores *per sample* gradients, so for every normal gradient, Opacus will store `n=batch_size` per-sample gradients on each step, thus increasing the memory footprint by at least `O(batch_size)`. In reality, however, the peak memory requirement is `O(batch_size^2)` compared to a non-private model. This is because some intermediate steps in per sample gradient computation involve operations on two matrices, each with batch_size as one of the dimensions.\n", - "\n", - "The good news is, we can pick the most appropriate batch size, regardless of memory constraints. Opacus has built-in support for *virtual* batches. 
Using it we can separate physical steps (gradient computation) and logical steps (noise addition and parameter updates): use larger batches for training, while keeping memory footprint low. Below we will specify two constants:\n", - "\n", - "- `MAX_PHYSICAL_BATCH_SIZE` defines the maximum batch size we can afford from a memory standpoint, and only affects computation speed\n", - "- `BATCH_SIZE`, on the other hand, will affect only convergence and privacy guarantee.\n", - "\n" - ] - }, - { - "cell_type": "code", - "execution_count": 9, - "metadata": {}, - "outputs": [], - "source": [ - "BATCH_SIZE = 32\n", - "MAX_PHYSICAL_BATCH_SIZE = 8" - ] - }, - { - "cell_type": "code", - "execution_count": 10, - "metadata": {}, - "outputs": [], - "source": [ - "from torch.utils.data import DataLoader, RandomSampler, SequentialSampler\n", - "from opacus.utils.uniform_sampler import UniformWithReplacementSampler\n", - "\n", - "\n", - "train_dataloader = DataLoader(train_dataset, batch_size=BATCH_SIZE)\n", - "test_dataloader = DataLoader(test_dataset, sampler=SequentialSampler(test_dataset), batch_size=BATCH_SIZE)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Training" - ] - }, - { - "cell_type": "code", - "execution_count": 11, - "metadata": {}, - "outputs": [], - "source": [ - "# Move the model to appropriate device\n", - "device = torch.device(\"cuda\" if torch.cuda.is_available() else \"cpu\")\n", - "model = model.to(device)\n", - "\n", - "# Set the model to train mode (HuggingFace models load in eval mode)\n", - "model = model.train()\n", - "# Define optimizer\n", - "optimizer = torch.optim.AdamW(model.parameters(), lr=5e-4, eps=1e-8)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "First, we specify some training parameters ready to run the training loop for three epochs" - ] - }, - { - "cell_type": "code", - "execution_count": 12, - "metadata": {}, - "outputs": [], - "source": [ - "EPOCHS = 3\n", - "LOGGING_INTERVAL 
= 5000 # once every how many steps we run evaluation cycle and report metrics\n", - "EPSILON = 7.5\n", - "DELTA = 1 / len(train_dataloader) # Parameter for privacy accounting. Probability of not achieving privacy guarantees" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Let’s now define the evaluation cycle." - ] - }, - { - "cell_type": "code", - "execution_count": 13, - "metadata": {}, - "outputs": [], - "source": [ - "import numpy as np\n", - "from tqdm.notebook import tqdm\n", - "\n", - "def accuracy(preds, labels):\n", - " return (preds == labels).mean()\n", - "\n", - "# define evaluation cycle\n", - "def evaluate(model):\n", - " model.eval()\n", - "\n", - " loss_arr = []\n", - " accuracy_arr = []\n", - "\n", - " for batch in test_dataloader:\n", - " batch = tuple(t.to(device) for t in batch)\n", - "\n", - " with torch.no_grad():\n", - " inputs = {'input_ids': batch[0],\n", - " 'attention_mask': batch[1],\n", - " 'token_type_ids': batch[2],\n", - " 'labels': batch[3]}\n", - "\n", - " outputs = model(**inputs)\n", - " loss, logits = outputs[:2]\n", - "\n", - " preds = np.argmax(logits.detach().cpu().numpy(), axis=1)\n", - " labels = inputs['labels'].detach().cpu().numpy()\n", - "\n", - " loss_arr.append(loss.item())\n", - " accuracy_arr.append(accuracy(preds, labels))\n", - "\n", - " model.train()\n", - " return np.mean(loss_arr), np.mean(accuracy_arr)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Next, we will define and attach PrivacyEngine. There are two parameters you need to consider here:\n", - "\n", - "- `noise_multiplier`. It defines the trade-off between privacy and accuracy. Adding more noise will provide stronger privacy guarantees, but will also hurt model quality. In this run, the PrivacyEngine will determine this value based on the target values of `EPSILON`, `DELTA`, and `EPOCHS`. For the default settings, this will set `noise_multiplier` to about 0.4. \n", - "- `max_grad_norm`. 
Defines the maximum magnitude of L2 norms to which we clip per sample gradients. There is a bit of tug of war with this threshold: on the one hand, a low threshold means that we will clip many gradients, hurting convergence, so we might be tempted to raise it. However, recall that we add noise with `std=noise_multiplier * max_grad_norm` so we will pay for the increased threshold with more noise. In most cases you can rely on the model being quite resilient to clipping (after the first few iterations your model will tend to adjust so that its gradients stay below the clipping threshold), so you can often just keep the default value (`=1.0`) and focus on tuning `batch_size` and `noise_multiplier` instead. That being said, sometimes clipping hurts the model so it may be worth experimenting with different clipping thresholds, like we are doing in this tutorial.\n", - "\n", - "These two parameters define the scale of the noise we add to gradients: the noise will be sampled from a Gaussian distribution with `std=noise_multiplier * max_grad_norm`.\n" - ] - }, - { - "cell_type": "code", - "execution_count": 14, - "metadata": {}, - "outputs": [], - "source": [ - "from opacus import PrivacyEngine\n", - "\n", - "MAX_GRAD_NORM = 0.1\n", - "\n", - "privacy_engine = PrivacyEngine()\n", - "\n", - "model, optimizer, train_dataloader = privacy_engine.make_private_with_epsilon(\n", - " module=model,\n", - " optimizer=optimizer,\n", - " data_loader=train_dataloader,\n", - " target_delta=DELTA,\n", - " target_epsilon=EPSILON,\n", - " epochs=EPOCHS,\n", - " max_grad_norm=MAX_GRAD_NORM,\n", - ")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Now we can train the model." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from opacus.utils.batch_memory_manager import BatchMemoryManager\n", - "\n", - "for epoch in range(1, EPOCHS+1):\n", - " losses = []\n", - "\n", - " with BatchMemoryManager(\n", - " data_loader=train_dataloader,\n", - " max_physical_batch_size=MAX_PHYSICAL_BATCH_SIZE,\n", - " optimizer=optimizer\n", - " ) as memory_safe_data_loader:\n", - " for step, batch in enumerate(tqdm(memory_safe_data_loader)):\n", - " optimizer.zero_grad()\n", - "\n", - " batch = tuple(t.to(device) for t in batch)\n", - " inputs = {'input_ids': batch[0],\n", - " 'attention_mask': batch[1],\n", - " 'token_type_ids': batch[2],\n", - " 'labels': batch[3]}\n", - "\n", - " outputs = model(**inputs) # output = loss, logits, hidden_states, attentions\n", - "\n", - " loss = outputs[0]\n", - " loss.backward()\n", - " losses.append(loss.item())\n", - "\n", - " optimizer.step()\n", - "\n", - " if step > 0 and step % LOGGING_INTERVAL == 0:\n", - " train_loss = np.mean(losses)\n", - " eps = privacy_engine.get_epsilon(DELTA)\n", - "\n", - " eval_loss, eval_accuracy = evaluate(model)\n", - "\n", - " print(\n", - " f\"Epoch: {epoch} | \"\n", - " f\"Step: {step} | \"\n", - " f\"Train loss: {train_loss:.3f} | \"\n", - " f\"Eval loss: {eval_loss:.3f} | \"\n", - " f\"Eval accuracy: {eval_accuracy:.3f} | \"\n", - " f\"ɛ: {eps:.2f}\"\n", - " )" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "For the test accuracy, after training for three epochs you should expect something close to the results below.\n", - "\n", - "You can see that we can achieve quite strong privacy guarantee at epsilon=7.5 with a moderate accuracy cost of 11 percentage points compared to non-private model trained in a similar setting (upper layers only) and 16 points compared to best results we were able to achieve using the same architecture.\n", - "\n", - "*NB: When not specified, DP-SGD is trained with 
upper layers only*" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "| Model | Noise multiplier | Batch size | Accuracy | Epsilon |\n", - "| --- | --- | --- | --- | --- |\n", - "| no DP, train full model | N/A | 32 | 90.1% | N/A |\n", - "| no DP, train upper layers only | N/A | 32 | 85.4% | N/A |\n", - "| DP-SGD | 1.0 | 32 | 70.5% | 0.7 |\n", - "| **DP-SGD (this tutorial)** | **0.4** | **32** | **74.3%** | **7.5** |\n", - "| DP-SGD | 0.3 | 32 | 75.8% | 20.7 |\n", - "| DP-SGD | 0.1 | 32 | 78.3% | 2865 |\n", - "| DP-SGD | 0.4 | 8 | 67.3% | 5.9 |" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Ghost Clipping\n", - "\n", - "In this section, we show how to use Fast Gradient Clipping and Ghost Clipping DP-SGD. The training loop is nearly identical to the existing one in Opacus, which was based on the (non-private) PyTorch training loop. To use Fast Gradient Clipping, we need to pass grad_sample_mode = 'ghost' in the make_private function.\n", - "\n", - "\n", - "The other change is that privacy engine's make_private function takes the loss criterion as input too and sanitizes it. This allows us to repurpose loss.backward to do two backward passes, and a loss rescaling in between. 
The first backward computes per-sample gradient norms, where as the second backward on the rescaled loss computes the aggregard clipped gradient" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "device = torch.device(\"cuda:0\")\n", - "os.environ[\"CUDA_LAUNCH_BLOCKING\"] = \"1\"\n", - "\n", - "optimizer = torch.optim.AdamW(model.parameters(), lr=5e-4, eps=1e-8)\n", - "model = model.train()\n", - "\n", - "privacy_engine = PrivacyEngine()\n", - "criterion = nn.CrossEntropyLoss(reduction=\"mean\")\n", - "\n", - "model_gc, optimizer_gc, criterion_gc, train_dataloader = (\n", - " privacy_engine.make_private_with_epsilon(\n", - " module=model,\n", - " optimizer=optimizer,\n", - " data_loader=train_dataloader,\n", - " criterion=criterion,\n", - " target_delta=DELTA,\n", - " target_epsilon=EPSILON,\n", - " epochs=EPOCHS,\n", - " max_grad_norm=MAX_GRAD_NORM,\n", - " grad_sample_mode=\"ghost\",\n", - " )\n", - ")\n", - "\n", - "model_gc = model_gc.to(device)\n", - "model_gc = model_gc.train()\n", - "\n", - "for epoch in range(1, EPOCHS + 1):\n", - " losses = []\n", - " for step, batch in enumerate(tqdm(train_dataloader)):\n", - " optimizer_gc.zero_grad()\n", - " batch = tuple(t.to(device) for t in batch)\n", - " inputs = {\n", - " \"input_ids\": batch[0],\n", - " \"attention_mask\": batch[1],\n", - " \"token_type_ids\": batch[2],\n", - " \"labels\": batch[3],\n", - " }\n", - " outputs = model_gc(**inputs) # output = loss, logits, hidden_states, attentions\n", - " loss = criterion_gc(outputs[1], batch[3])\n", - " loss.backward()\n", - " optimizer_gc.step()\n", - " losses.append(loss.item())\n", - "\n", - " if step > 0 and step % LOGGING_INTERVAL == 0:\n", - " train_loss = np.mean(losses)\n", - " eval_loss, eval_accuracy = evaluate(model_gc)\n", - " eps = privacy_engine.get_epsilon(DELTA)\n", - " print(\n", - " f\"Epoch: {epoch} | \"\n", - " f\"Step: {step} | \"\n", - " f\"Train loss: {train_loss:.3f} | \"\n", 
- " f\"Eval loss: {eval_loss:.3f} | \"\n", - " f\"Eval accuracy: {eval_accuracy:.3f} | \"\n", - " f\"ɛ: {eps:.2f}\"\n", - " )" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Epoch: 1 | Step: 500 | Train loss: 1.209 | Eval loss: 1.409 | Eval accuracy: 0.443 | ɛ: 5.25\n", - "Epoch: 1 | Step: 1000 | Train loss: 1.273 | Eval loss: 1.496 | Eval accuracy: 0.481 | ɛ: 6.12\n", - "Epoch: 1 | Step: 1500 | Train loss: 1.316 | Eval loss: 1.514 | Eval accuracy: 0.537 | ɛ: 6.72" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3 (ipykernel)", - "language": "python", - "name": "python3" + "execution_count": null, + "outputs": [] }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.10.6" + { + "cell_type": "code", + "metadata": { + "originalKey": "02a529bf-d250-4e0b-b7cb-f047265eedaf", + "outputsInitialized": true, + "isAgentGenerated": false, + "language": "python", + "executionStartTime": 1734022774784, + "executionStopTime": 1734022777897, + "serverExecutionDuration": 2957.9766383395, + "requestMsgId": "02a529bf-d250-4e0b-b7cb-f047265eedaf", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "DgEx_iaepH_3", + "outputId": "648e2934-e9f2-476e-cf69-29faca0c4a82" + }, + "source": [ + "import zipfile\n", + "import urllib.request\n", + "import os\n", + "\n", + "import warnings\n", + "warnings.simplefilter(\"ignore\")\n", + "\n", + "def download_and_extract(dataset_url, data_dir):\n", + " print(\"Downloading and extracting ...\")\n", + " filename = \"snli_1.0.zip\"\n", + " urllib.request.urlretrieve(dataset_url, filename)\n", + " with zipfile.ZipFile(filename) as zip_ref:\n", + " zip_ref.extractall(data_dir)\n", + " os.remove(filename)\n", + " print(\"Completed!\")\n", + "\n", + "download_and_extract(STANFORD_SNLI_URL, 
DATA_DIR)" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Downloading and extracting ...\n", + "Completed!\n" + ] + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "originalKey": "a3734548-7eef-4b88-a1ba-6ef28a6dd954", + "outputsInitialized": false, + "isAgentGenerated": false, + "language": "markdown", + "id": "JcSMVhyKpH_4" + }, + "source": [ + "The dataset comes in two formats (`tsv` and `json`) and has already been split into train/dev/test. Let’s verify that’s the case." + ] + }, + { + "cell_type": "code", + "metadata": { + "originalKey": "4efc1661-0aae-4c89-b2ab-79c4bf4bf548", + "outputsInitialized": true, + "isAgentGenerated": false, + "language": "python", + "executionStartTime": 1734022779737, + "executionStopTime": 1734022779877, + "serverExecutionDuration": 9.7765219397843, + "requestMsgId": "4efc1661-0aae-4c89-b2ab-79c4bf4bf548", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "33zpUvz3pH_4", + "outputId": "9535d490-3e57-4f23-b348-8e6f6f2b9e9d" + }, + "source": [ + "snli_folder = os.path.join(DATA_DIR, \"snli_1.0\")\n", + "os.listdir(snli_folder)" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "['snli_1.0_test.jsonl',\n", + " 'snli_1.0_train.txt',\n", + " '.DS_Store',\n", + " 'snli_1.0_dev.jsonl',\n", + " 'Icon\\r',\n", + " 'README.txt',\n", + " 'snli_1.0_dev.txt',\n", + " 'snli_1.0_test.txt',\n", + " 'snli_1.0_train.jsonl']" + ] + }, + "metadata": {}, + "execution_count": 8 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "originalKey": "a975a807-7365-4504-a47d-067e70c0d8bf", + "outputsInitialized": false, + "isAgentGenerated": false, + "language": "markdown", + "id": "Qnv2-RWWpH_5" + }, + "source": [ + "Let's now take a look inside. [SNLI dataset](https://nlp.stanford.edu/projects/snli/) provides ample syntactic metadata, but we'll only use raw input text. 
Therefore, the only fields we're interested in are **sentence1** (premise), **sentence2** (hypothesis), and **gold_label** (label chosen by the majority of annotators).\n", + "\n", + "The label defines the relation between premise and hypothesis: either *contradiction*, *neutral*, or *entailment*." + ] + }, + { + "cell_type": "code", + "metadata": { + "originalKey": "82b717b7-ddc0-4096-a623-de39767ccc25", + "outputsInitialized": true, + "isAgentGenerated": false, + "language": "python", + "executionStartTime": 1734022781467, + "executionStopTime": 1734022786972, + "serverExecutionDuration": 5307.5669091195, + "requestMsgId": "82b717b7-ddc0-4096-a623-de39767ccc25", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 206 + }, + "id": "7jhnTgT3pH_5", + "outputId": "d5523231-85c0-4003-cebc-e1baf814e8d9" + }, + "source": [ + "import pandas as pd\n", + "train_path = os.path.join(snli_folder, \"snli_1.0_train.txt\")\n", + "dev_path = os.path.join(snli_folder, \"snli_1.0_dev.txt\")\n", + "\n", + "df_train = pd.read_csv(train_path, sep='\\t')\n", + "df_test = pd.read_csv(dev_path, sep='\\t')\n", + "\n", + "df_train[['sentence1', 'sentence2', 'gold_label']][:5]" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " sentence1 \\\n", + "0 A person on a horse jumps over a broken down a... \n", + "1 A person on a horse jumps over a broken down a... \n", + "2 A person on a horse jumps over a broken down a... \n", + "3 Children smiling and waving at camera \n", + "4 Children smiling and waving at camera \n", + "\n", + " sentence2 gold_label \n", + "0 A person is training his horse for a competition. neutral \n", + "1 A person is at a diner, ordering an omelette. contradiction \n", + "2 A person is outdoors, on a horse. entailment \n", + "3 They are smiling at their parents neutral \n", + "4 There are children present entailment " + ], + "text/html": [ + "\n", + "
\n" + ], + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "dataframe", + "summary": "{\n \"name\": \"df_train[['sentence1', 'sentence2', 'gold_label']][:5]\",\n \"rows\": 5,\n \"fields\": [\n {\n \"column\": \"sentence1\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 2,\n \"samples\": [\n \"Children smiling and waving at camera\",\n \"A person on a horse jumps over a broken down airplane.\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"sentence2\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 5,\n \"samples\": [\n \"A person is at a diner, ordering an omelette.\",\n \"There are children present\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"gold_label\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 3,\n \"samples\": [\n \"neutral\",\n \"contradiction\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}" + } + }, + "metadata": {}, + "execution_count": 9 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "originalKey": "c6251c2b-155c-44c9-a6da-18188598d6fa", + "outputsInitialized": false, + "isAgentGenerated": false, + "language": "markdown", + "id": "T2yWe4xMpH_6" + }, + "source": [ + "## Model" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "originalKey": "53ecea2a-3d50-4cbc-bfe5-e13292e1f27d", + "outputsInitialized": false, + "isAgentGenerated": false, + "language": "markdown", + "id": "LezVNRtypH_6" + }, + "source": [ + "BERT (Bidirectional Encoder Representations from Transformers) is a state-of-the-art approach to various NLP tasks. 
It uses a Transformer architecture and relies heavily on the concept of pre-training.\n", + "\n", + "We'll use a pre-trained BERT-base model, provided in the huggingface [transformers](https://github.com/huggingface/transformers) repo.\n", + "It gives us a PyTorch implementation for the classic BERT architecture, as well as a tokenizer and weights, pre-trained on a public English corpus (Wikipedia).\n", + "\n", + "Please follow these [installation instructions](https://github.com/huggingface/transformers#installation) before proceeding." + ] + }, + { + "cell_type": "code", + "metadata": { + "originalKey": "755dd641-c9ec-4681-9bd0-652255767547", + "showInput": true, + "customInput": null, + "language": "python", + "executionStartTime": 1734032316314, + "executionStopTime": 1734032317257, + "serverExecutionDuration": 818.86289687827, + "requestMsgId": "755dd641-c9ec-4681-9bd0-652255767547", + "outputsInitialized": true, + "isAgentGenerated": false, + "colab": { + "base_uri": "https://localhost:8080/", + "height": 232, + "referenced_widgets": [ + "a6f080fa6f4b4de399af5d1d7850b960", + "47fec328e2464db3861b16e68e6cc65d", + "3ec2b8f4e38d4b05a09c83e6925960a6", + "6fee830ab9f545cea62081d8cf5b3240", + "cf024e76fe9b4766ac035f617391deb7", + "fc0b17bfb44c45dd9825c6f1719b61cc", + "2dac3a8089c34d0b9ce81653bde67603", + "ffb2ab66b3ca4d9899658fe58c43acb1", + "4b6aad944453432dbf957da380d059a1", + "cf3dae1b960440d7984e9ed287b54cee", + "36254c0a04c840f3bf4038096c736873", + "e3ffd50ee822433fabd9c1ee4a39612e", + "f248160605a2450a8411e4f5d58a5cfa", + "a7b6e7aa521647649bb4157b6504d4e8", + "1c1e86bca0534caaa7ad435fd7e67bf2", + "22ca9e6c6c1f4bc6b0f3db1a09a5e562", + "e9699698559c4860bdf6a312c492e7da", + "874e45fe39844927ad1fd10d4899a428", + "2dca6b2477344b45b1c17f124e27ce72", + "c7afefe6f907441b9e466605cb4f5c7f", + "df3c4c12e06245b1a8f4a4a7d71a530c", + "e63eb3e5c06140249f6d8d4c04fe8693", + "d951c3592058414ab00cf754e9b70685", + "c3f9146e082346a3bed274efb2265376", + 
"1602c2298e9443f78007fdbf101a0c2b", + "d5dda55bd4de4f12bf3718fb386c5bf9", + "943e61866ed74be4b10ba383450cb4c3", + "0727d77eaf28466c93c2c6021661ac9a", + "3ab2e1a9ba694463ab5f3ec78ad0a8f4", + "30b4db204fa644128198abf6d82664bf", + "0517fdac88784fe6b51ad1f989f99cb7", + "ce1c55bc51dc49a9b261f104f49d38d8", + "4b7e11bb32bc43c9ad0449bd39bc4d40", + "3dbf36a5c0884579ab2f36c2e91c04fb", + "d71c49bd9f8c438898250c3874c06240", + "1b70fa16b803466ea31649dcd644e3d7", + "7a728bea623646c182c508e34b582fc9", + "8494caffe83a4743a11d2751b38c56bb", + "1580380472df40e38cbde67659c5221d", + "349b262479b9418badd6a3acff386dd2", + "3370a9d70dd04d5195bc3f1f81b18728", + "ee235d5ffc5142c895175cbea5c94dfe", + "f6c5ef333a2b45f1bf196fdd58873688", + "e24ff9dae78241d9b5a6a7199c888e45", + "d4a768f261614ac69b3004fbf2323c89", + "81a7d1b27cd94916ac3c330aa2551cf0", + "eac4d3f8e59a4d4c81178cc76600182f", + "5bcca2ee852144c28bfd40de0978cadc", + "05fa1e761bc14c929b19abf0d8a93f5f", + "32ba8daa0c6e4a9c9f62df588a198b1b", + "e1e93f905b494126a6a9e1e0a2f92022", + "9ca2cb3f116547d3bb062f4da11762d7", + "554184f1c9b44bd3a8116773572347eb", + "bb89b007667e4bf0b696ad84dfb2f91d", + "610f073056924398b9229978dff5ff4d" + ] + }, + "id": "bxwD3rYepH_6", + "outputId": "96ef0f4f-6ff3-435d-c2bc-8fed92ca0d05" + }, + "source": [ + "from transformers import BertConfig, BertTokenizer, BertForSequenceClassification\n", + "\n", + "model_name = \"bert-base-cased\"\n", + "config = BertConfig.from_pretrained(\n", + " model_name,\n", + " num_labels=3,\n", + ")\n", + "tokenizer = BertTokenizer.from_pretrained(\n", + " \"bert-base-cased\",\n", + " do_lower_case=False,\n", + ")\n", + "model = BertForSequenceClassification.from_pretrained(\n", + " \"bert-base-cased\",\n", + " config=config,\n", + ")" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "display_data", + "data": { + "text/plain": [ + "config.json: 0%| | 0.00/570 [00:00" + ] + }, + "metadata": {}, + "execution_count": 20 + } + ] + }, + { + "cell_type": "code", + 
"metadata": { + "originalKey": "6332ec74-d2e5-48a6-bf20-bee1ddf327c6", + "outputsInitialized": true, + "isAgentGenerated": false, + "language": "python", + "executionStartTime": 1734032318121, + "executionStopTime": 1734032318289, + "serverExecutionDuration": 8.0086700618267, + "requestMsgId": "6332ec74-d2e5-48a6-bf20-bee1ddf327c6", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "F8iA_l3xpH_7", + "outputId": "8d8dd938-f073-4f16-8a9c-dbeaee2bfc4a" + }, + "source": [ + "trainable_layers = [model.bert.encoder.layer[-1], model.bert.pooler, model.classifier]\n", + "total_params = 0\n", + "trainable_params = 0\n", + "\n", + "for p in model.parameters():\n", + " p.requires_grad = False\n", + " total_params += p.numel()\n", + "\n", + "for layer in trainable_layers:\n", + " for p in layer.parameters():\n", + " p.requires_grad = True\n", + " trainable_params += p.numel()\n", + "\n", + "print(f\"Total parameters count: {total_params:,}\") # ~108M\n", + "print(f\"Trainable parameters count: {trainable_params:,}\") # ~7M" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Total parameters count: 108,312,579\n", + "Trainable parameters count: 7,680,771\n" + ] + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "originalKey": "123236c7-ba61-47e0-89a8-204627f2d9f0", + "outputsInitialized": false, + "isAgentGenerated": false, + "language": "markdown", + "id": "r_KuCSszpH_7" + }, + "source": [ + "Thus, by using a pre-trained model we reduce the number of trainable params from over 100 million to just above 7.5 million. This will help both performance and convergence with added noise." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "originalKey": "b2909681-24bd-45ad-98d6-872d92fd237e", + "outputsInitialized": false, + "isAgentGenerated": false, + "language": "markdown", + "id": "P63ItKGTpH_7" + }, + "source": [ + "## Prepare the data" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "originalKey": "a7d059bf-0844-4457-b77b-6755b5bbb674", + "outputsInitialized": false, + "isAgentGenerated": false, + "language": "markdown", + "id": "49gvBz19pH_7" + }, + "source": [ + "Before we begin training, we need to preprocess the data and convert it to the format our model expects.\n", + "\n", + "(Note: it'll take 5-10 minutes to run on a laptop)" + ] + }, + { + "cell_type": "code", + "metadata": { + "originalKey": "42698e36-08f9-4a83-9a0a-d5f32c384758", + "showInput": true, + "customInput": null, + "language": "python", + "executionStartTime": 1734022792696, + "executionStopTime": 1734022819705, + "serverExecutionDuration": 1299.7262682766, + "requestMsgId": "42698e36-08f9-4a83-9a0a-d5f32c384758", + "outputsInitialized": false, + "isAgentGenerated": false, + "id": "5SPPcMkFpH_7" + }, + "source": [ + "import torch\n", + "import torch.nn as nn\n", + "import transformers\n", + "from torch.utils.data import TensorDataset\n", + "from transformers.data.processors.utils import InputExample\n", + "from transformers.data.processors.glue import glue_convert_examples_to_features" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "originalKey": "9871bf56-ba44-4a14-92dd-6626c58883ee", + "outputsInitialized": false, + "isAgentGenerated": false, + "language": "python", + "executionStartTime": 1734022793887, + "executionStopTime": 1734023174330, + "serverExecutionDuration": 354598.03974768, + "requestMsgId": "9871bf56-ba44-4a14-92dd-6626c58883ee", + "id": "RMPbfeMvpH_7" + }, + "source": [ + "LABEL_LIST = ['contradiction', 'entailment', 'neutral']\n", + "MAX_SEQ_LENGHT = 128\n", + "\n", + "\n", + "\n", + 
"\n", + "def _create_examples(df, set_type):\n", + " \"\"\" Convert raw dataframe to a list of InputExample. Filter malformed examples\n", + " \"\"\"\n", + " examples = []\n", + " for index, row in df.iterrows():\n", + " if row['gold_label'] not in LABEL_LIST:\n", + " continue\n", + " if not isinstance(row['sentence1'], str) or not isinstance(row['sentence2'], str):\n", + " continue\n", + "\n", + " guid = f\"{index}-{set_type}\"\n", + " examples.append(\n", + " InputExample(guid=guid, text_a=row['sentence1'], text_b=row['sentence2'], label=row['gold_label']))\n", + " return examples\n", + "\n", + "def _df_to_features(df, set_type):\n", + " \"\"\" Pre-process text. This method will:\n", + " 1) tokenize inputs\n", + " 2) cut or pad each sequence to MAX_SEQ_LENGHT\n", + " 3) convert tokens into ids\n", + "\n", + " The output will contain:\n", + " `input_ids` - padded token ids sequence\n", + " `attention mask` - mask indicating padded tokens\n", + " `token_type_ids` - mask indicating the split between premise and hypothesis\n", + " `label` - label\n", + " \"\"\"\n", + " examples = _create_examples(df, set_type)\n", + "\n", + " #backward compatibility with older transformers versions\n", + " legacy_kwards = {}\n", + " from packaging import version\n", + " if version.parse(transformers.__version__) < version.parse(\"2.9.0\"):\n", + " legacy_kwards = {\n", + " \"pad_on_left\": False,\n", + " \"pad_token\": tokenizer.convert_tokens_to_ids([tokenizer.pad_token])[0],\n", + " \"pad_token_segment_id\": 0,\n", + " }\n", + "\n", + " return glue_convert_examples_to_features(\n", + " examples=examples,\n", + " tokenizer=tokenizer,\n", + " label_list=LABEL_LIST,\n", + " max_length=MAX_SEQ_LENGHT,\n", + " output_mode=\"classification\",\n", + " **legacy_kwards,\n", + " )\n", + "\n", + "def _features_to_dataset(features):\n", + " \"\"\" Convert features from `_df_to_features` into a single dataset\n", + " \"\"\"\n", + " all_input_ids = torch.tensor([f.input_ids for f in features], 
dtype=torch.long)\n", + " all_attention_mask = torch.tensor(\n", + " [f.attention_mask for f in features], dtype=torch.long\n", + " )\n", + " all_token_type_ids = torch.tensor(\n", + " [f.token_type_ids for f in features], dtype=torch.long\n", + " )\n", + " all_labels = torch.tensor([f.label for f in features], dtype=torch.long)\n", + " dataset = TensorDataset(\n", + " all_input_ids, all_attention_mask, all_token_type_ids, all_labels\n", + " )\n", + "\n", + " return dataset\n", + "\n", + "train_features = _df_to_features(df_train, \"train\")\n", + "test_features = _df_to_features(df_test, \"test\")\n", + "\n", + "train_dataset = _features_to_dataset(train_features)\n", + "test_dataset = _features_to_dataset(test_features)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "originalKey": "06f80462-8e47-4f1a-8687-f5b891481ab5", + "outputsInitialized": false, + "isAgentGenerated": false, + "language": "markdown", + "id": "yU5kzHNhpH_7" + }, + "source": [ + "## Choosing batch size\n", + "\n", + "Let's talk about batch sizes for a bit.\n", + "\n", + "In addition to all the considerations you normally take into account when choosing batch size, training models with DP adds another one: privacy cost.\n", + "\n", + "Because of the threat model we assume and the way we add noise to the gradients, larger batch sizes (to a certain extent) generally help convergence. We add the same amount of noise to each gradient update (scaled to the norm of one sample in the batch) regardless of the batch size. This means that as the batch size increases, the relative amount of noise added decreases, while preserving the same epsilon guarantee.\n", + "\n", + "You should, however, keep in mind that increasing batch size has its price in terms of epsilon, which grows at `O(sqrt(batch_size))` as we train (therefore larger batches make it grow faster). 
A good strategy here is to experiment with multiple combinations of `batch_size` and `noise_multiplier` to find the one that provides the best possible quality at an acceptable privacy guarantee.\n", + "\n", + "There's another side to this: memory. Opacus computes and stores *per-sample* gradients, so for every normal gradient, Opacus will store `n=batch_size` per-sample gradients on each step, thus increasing the memory footprint by at least `O(batch_size)`. In reality, however, the peak memory requirement is `O(batch_size^2)` compared to a non-private model. This is because some intermediate steps in the per-sample gradient computation involve operations on two matrices, each with `batch_size` as one of the dimensions.\n", + "\n", + "The good news is that we can pick the most appropriate batch size regardless of memory constraints. Opacus has built-in support for *virtual* batches. Using them, we can separate physical steps (gradient computation) from logical steps (noise addition and parameter updates): we use larger batches for training while keeping the memory footprint low. 
Below we will specify two constants:\n", + "\n", + "- `MAX_PHYSICAL_BATCH_SIZE` defines the maximum batch size we can afford from a memory standpoint, and only affects computation speed\n", + "- `BATCH_SIZE`, on the other hand, will affect only convergence and privacy guarantee.\n", + "\n" + ] + }, + { + "cell_type": "code", + "metadata": { + "originalKey": "b06c0410-a2d1-407a-b543-199e23605ad5", + "outputsInitialized": false, + "isAgentGenerated": false, + "language": "python", + "executionStartTime": 1734032324563, + "executionStopTime": 1734032324694, + "serverExecutionDuration": 2.0098211243749, + "requestMsgId": "b06c0410-a2d1-407a-b543-199e23605ad5", + "id": "TefYWR8mpH_7" + }, + "source": [ + "BATCH_SIZE = 32\n", + "MAX_PHYSICAL_BATCH_SIZE = 8" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "originalKey": "07602a22-7950-426b-9d1a-32188f163cb8", + "outputsInitialized": false, + "isAgentGenerated": false, + "language": "python", + "executionStartTime": 1734032325733, + "executionStopTime": 1734032325869, + "serverExecutionDuration": 2.5164256803691, + "requestMsgId": "07602a22-7950-426b-9d1a-32188f163cb8", + "id": "CcH-whfRpH_7" + }, + "source": [ + "from torch.utils.data import DataLoader, RandomSampler, SequentialSampler\n", + "from opacus.utils.uniform_sampler import UniformWithReplacementSampler\n", + "\n", + "train_dataloader = DataLoader(train_dataset, batch_size=BATCH_SIZE)\n", + "test_dataloader = DataLoader(test_dataset, sampler=SequentialSampler(test_dataset), batch_size=BATCH_SIZE)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "originalKey": "16320607-1a02-4588-8b7d-3e9ab859c7e7", + "outputsInitialized": false, + "isAgentGenerated": false, + "language": "markdown", + "id": "9Atc-d7QpH_8" + }, + "source": [ + "## Training" + ] + }, + { + "cell_type": "code", + "metadata": { + "originalKey": "e37dd036-25c4-47fa-996e-aced5bd1856b", + 
"outputsInitialized": false, + "isAgentGenerated": false, + "language": "python", + "executionStartTime": 1734032332678, + "executionStopTime": 1734032332917, + "serverExecutionDuration": 123.88719897717, + "requestMsgId": "e37dd036-25c4-47fa-996e-aced5bd1856b", + "id": "Ibx6k7GspH_8" + }, + "source": [ + "# Move the model to appropriate device\n", + "device = torch.device(\"cuda\" if torch.cuda.is_available() else \"cpu\")\n", + "model = model.to(device)\n", + "\n", + "# Set the model to train mode (HuggingFace models load in eval mode)\n", + "model = model.train()\n", + "# Define optimizer\n", + "optimizer = torch.optim.AdamW(model.parameters(), lr=5e-4, eps=1e-8)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "originalKey": "4352fbca-a402-4f32-a150-243fbd6cf721", + "outputsInitialized": false, + "isAgentGenerated": false, + "language": "markdown", + "id": "0ghT3eN1pH_8" + }, + "source": [ + "First, we specify some training parameters ready to run the training loop for three epochs" + ] + }, + { + "cell_type": "code", + "metadata": { + "originalKey": "b2807795-97b2-4969-af72-2a434d161c58", + "outputsInitialized": false, + "isAgentGenerated": false, + "language": "python", + "executionStartTime": 1734032333878, + "executionStopTime": 1734032335582, + "serverExecutionDuration": 2.3859869688749, + "requestMsgId": "b2807795-97b2-4969-af72-2a434d161c58", + "id": "cNQ-Rb5LpH_8" + }, + "source": [ + "EPOCHS = 3\n", + "LOGGING_INTERVAL = 5000 # once every how many steps we run evaluation cycle and report metrics\n", + "EPSILON = 7.5\n", + "DELTA = 1 / len(train_dataloader) # Parameter for privacy accounting. 
Probability of not achieving privacy guarantees" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "originalKey": "5d20a64b-7e9a-4a02-a367-a01ec40d10d0", + "outputsInitialized": false, + "isAgentGenerated": false, + "language": "markdown", + "id": "VszTFifzpH_8" + }, + "source": [ + "Let’s now define the evaluation cycle." + ] + }, + { + "cell_type": "code", + "metadata": { + "originalKey": "04f41504-002f-4da3-89b2-71a306e3bb9f", + "outputsInitialized": false, + "isAgentGenerated": false, + "language": "python", + "executionStartTime": 1734032340042, + "executionStopTime": 1734032340181, + "serverExecutionDuration": 3.5456721670926, + "requestMsgId": "04f41504-002f-4da3-89b2-71a306e3bb9f", + "id": "ri93DMyTpH_8" + }, + "source": [ + "import numpy as np\n", + "from tqdm.notebook import tqdm\n", + "\n", + "def accuracy(preds, labels):\n", + " return (preds == labels).mean()\n", + "\n", + "# define evaluation cycle\n", + "def evaluate(model):\n", + " model.eval()\n", + "\n", + " loss_arr = []\n", + " accuracy_arr = []\n", + "\n", + " for batch in test_dataloader:\n", + " batch = tuple(t.to(device) for t in batch)\n", + "\n", + " with torch.no_grad():\n", + " inputs = {'input_ids': batch[0],\n", + " 'attention_mask': batch[1],\n", + " 'token_type_ids': batch[2],\n", + " 'labels': batch[3]}\n", + "\n", + " outputs = model(**inputs)\n", + " loss, logits = outputs[:2]\n", + "\n", + " preds = np.argmax(logits.detach().cpu().numpy(), axis=1)\n", + " labels = inputs['labels'].detach().cpu().numpy()\n", + "\n", + " loss_arr.append(loss.item())\n", + " accuracy_arr.append(accuracy(preds, labels))\n", + "\n", + " model.train()\n", + " return np.mean(loss_arr), np.mean(accuracy_arr)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "originalKey": "13b3b5e7-70d5-4ee3-951b-8db81148a974", + "outputsInitialized": false, + "isAgentGenerated": false, + "language": "markdown", + 
"id": "Rl8ncwc1pH_8" + }, + "source": [ + "Next, we will define and attach PrivacyEngine. There are two parameters you need to consider here:\n", + "\n", + "- `noise_multiplier`. It defines the trade-off between privacy and accuracy. Adding more noise will provide stronger privacy guarantees, but will also hurt model quality. In this run, the PrivacyEngine will determine this value based on the target values of `EPSILON`, `DELTA`, and `EPOCHS`. For the default settings, this will set `noise_multiplier` to about 0.4.\n", + "- `max_grad_norm`. Defines the maximum magnitude of L2 norms to which we clip per sample gradients. There is a bit of tug of war with this threshold: on the one hand, a low threshold means that we will clip many gradients, hurting convergence, so we might be tempted to raise it. However, recall that we add noise with `std=noise_multiplier * max_grad_norm` so we will pay for the increased threshold with more noise. In most cases you can rely on the model being quite resilient to clipping (after the first few iterations your model will tend to adjust so that its gradients stay below the clipping threshold), so you can often just keep the default value (`=1.0`) and focus on tuning `batch_size` and `noise_multiplier` instead. 
That being said, sometimes clipping hurts the model so it may be worth experimenting with different clipping thresholds, like we are doing in this tutorial.\n", + "\n", + "These two parameters define the scale of the noise we add to gradients: the noise will be sampled from a Gaussian distribution with `std=noise_multiplier * max_grad_norm`.\n" + ] + }, + { + "cell_type": "code", + "metadata": { + "originalKey": "12c0094d-2f1d-4cf9-aa41-5bb010b3ad97", + "outputsInitialized": false, + "isAgentGenerated": false, + "language": "python", + "executionStartTime": 1734032342308, + "executionStopTime": 1734032384336, + "serverExecutionDuration": 41907.956605777, + "requestMsgId": "12c0094d-2f1d-4cf9-aa41-5bb010b3ad97", + "id": "nai3pjOqpH_8" + }, + "source": [ + "from opacus import PrivacyEngine\n", + "\n", + "MAX_GRAD_NORM = 0.1\n", + "\n", + "privacy_engine = PrivacyEngine()\n", + "\n", + "model, optimizer, train_dataloader = privacy_engine.make_private_with_epsilon(\n", + " module=model,\n", + " optimizer=optimizer,\n", + " data_loader=train_dataloader,\n", + " target_delta=DELTA,\n", + " target_epsilon=EPSILON,\n", + " epochs=EPOCHS,\n", + " max_grad_norm=MAX_GRAD_NORM,\n", + ")" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "originalKey": "fff606f7-8ca5-4e01-abdd-aeab3e36f0cc", + "outputsInitialized": false, + "isAgentGenerated": false, + "language": "markdown", + "id": "JDHUkp59pH_8" + }, + "source": [ + "Now we can train the model." 
+ ] + }, + { + "cell_type": "code", + "source": [ + "from opacus.utils.batch_memory_manager import BatchMemoryManager\n", + "\n", + "for epoch in range(1, EPOCHS+1):\n", + " losses = []\n", + "\n", + " with BatchMemoryManager(\n", + " data_loader=train_dataloader,\n", + " max_physical_batch_size=MAX_PHYSICAL_BATCH_SIZE,\n", + " optimizer=optimizer\n", + " ) as memory_safe_data_loader:\n", + " for step, batch in enumerate(tqdm(memory_safe_data_loader)):\n", + " optimizer.zero_grad()\n", + "\n", + " batch = tuple(t.to(device) for t in batch)\n", + " inputs = {'input_ids': batch[0],\n", + " 'attention_mask': batch[1],\n", + " 'token_type_ids': batch[2],\n", + " 'labels': batch[3]}\n", + "\n", + " outputs = model(**inputs) # output = loss, logits, hidden_states, attentions\n", + "\n", + " loss = outputs[0]\n", + " loss.backward()\n", + " losses.append(loss.item())\n", + "\n", + " optimizer.step()\n", + "\n", + " if step > 0 and step % LOGGING_INTERVAL == 0:\n", + " train_loss = np.mean(losses)\n", + " eps = privacy_engine.get_epsilon(DELTA)\n", + "\n", + " eval_loss, eval_accuracy = evaluate(model)\n", + "\n", + " print(\n", + " f\"Epoch: {epoch} | \"\n", + " f\"Step: {step} | \"\n", + " f\"Train loss: {train_loss:.3f} | \"\n", + " f\"Eval loss: {eval_loss:.3f} | \"\n", + " f\"Eval accuracy: {eval_accuracy:.3f} | \"\n", + " f\"ɛ: {eps:.2f}\"\n", + " )" + ], + "metadata": { + "id": "KuwJNsdSscEu" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "originalKey": "4017f2a8-7ffc-4752-82d4-dac5b10e77bd", + "outputsInitialized": false, + "isAgentGenerated": false, + "language": "markdown", + "showInput": false, + "id": "3c6wH8tfpH_8" + }, + "source": [ + "For the test accuracy, after training for three epochs you should expect something close to the results below.\n", + "\n", + "You can see that we can achieve quite strong privacy guarantee at epsilon=7.5 with a moderate accuracy cost of 11 percentage points compared to 
a non-private model trained in a similar setting (upper layers only) and 16 points compared to the best results we were able to achieve using the same architecture.\n", + "\n", + "*NB: Unless otherwise specified, DP-SGD trains the upper layers only.*" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "originalKey": "a5a2c5d2-9e45-41be-9c31-846808588385", + "outputsInitialized": false, + "isAgentGenerated": false, + "language": "markdown", + "showInput": false, + "id": "ZE2OFPcMpH_9" + }, + "source": [ + "| Model | Noise multiplier | Batch size | Accuracy | Epsilon |\n", + "| --- | --- | --- | --- | --- |\n", + "| no DP, train full model | N/A | 32 | 90.1% | N/A |\n", + "| no DP, train upper layers only | N/A | 32 | 85.4% | N/A |\n", + "| DP-SGD | 1.0 | 32 | 70.5% | 0.7 |\n", + "| **DP-SGD (this tutorial)** | **0.4** | **32** | **74.3%** | **7.5** |\n", + "| DP-SGD | 0.3 | 32 | 75.8% | 20.7 |\n", + "| DP-SGD | 0.1 | 32 | 78.3% | 2865 |\n", + "| DP-SGD | 0.4 | 8 | 67.3% | 5.9 |" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "originalKey": "4cfbe9ac-18cd-487c-865b-3da5e13e07b8", + "outputsInitialized": false, + "isAgentGenerated": false, + "language": "markdown", + "id": "8wuqVkG4pH_9" + }, + "source": [ + "## Ghost Clipping\n", + "\n", + "In this section, we show how to use Fast Gradient Clipping and Ghost Clipping DP-SGD. The training loop is nearly identical to the existing one in Opacus, which was based on the (non-private) PyTorch training loop. To use Fast Gradient Clipping, we need to pass `grad_sample_mode='ghost'` to the `make_private` function.\n", + "\n", + "\n", + "The other change is that the privacy engine's `make_private` function also takes the loss criterion as input and sanitizes it. This allows us to repurpose `loss.backward` to do two backward passes, with a loss rescaling in between. 
The first backward pass computes per-sample gradient norms, whereas the second backward pass on the rescaled loss computes the aggregated clipped gradient." + ] + }, + { + "cell_type": "code", + "metadata": { + "originalKey": "8dec9e28-dbed-43ce-8b8b-172ced512e6d", + "showInput": true, + "customInput": null, + "language": "python", + "executionStartTime": 1733867931416, + "executionStopTime": 1733867931607, + "serverExecutionDuration": 2.2143041715026, + "requestMsgId": "8dec9e28-dbed-43ce-8b8b-172ced512e6d", + "outputsInitialized": false, + "isAgentGenerated": false, + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "cZWxTEFqpH_9", + "outputId": "6ad5c023-07ff-485f-9acd-a98acd05de8d" + }, + "source": [ + "# Let's load the model again and freeze layers as before\n", + "model = BertForSequenceClassification.from_pretrained(\n", + " \"bert-base-cased\",\n", + " config=config,\n", + ")\n", + "\n", + "trainable_layers = [model.bert.encoder.layer[-1], model.bert.pooler, model.classifier]\n", + "\n", + "for p in model.parameters():\n", + " p.requires_grad = False\n", + " total_params += p.numel()\n", + "\n", + "for layer in trainable_layers:\n", + " for p in layer.parameters():\n", + " p.requires_grad = True\n", + " trainable_params += p.numel()" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stderr", + "text": [ + "Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-cased and are newly initialized: ['classifier.bias', 'classifier.weight']\n", + "You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.\n" + ] + } + ] + }, + { + "cell_type": "code", + "source": [ + "device = torch.device(\"cuda:0\")\n", + "os.environ[\"CUDA_LAUNCH_BLOCKING\"] = \"1\"\n", + "\n", + "optimizer = torch.optim.AdamW(model.parameters(), lr=5e-4, eps=1e-8)\n", + "model = model.train()\n", + "\n", + "privacy_engine = PrivacyEngine()\n", + 
"criterion = nn.CrossEntropyLoss(reduction=\"mean\")\n", + "\n", + "model_gc, optimizer_gc, criterion_gc, train_dataloader = (\n", + " privacy_engine.make_private_with_epsilon(\n", + " module=model,\n", + " optimizer=optimizer,\n", + " data_loader=train_dataloader,\n", + " criterion=criterion,\n", + " target_delta=DELTA,\n", + " target_epsilon=EPSILON,\n", + " epochs=EPOCHS,\n", + " max_grad_norm=MAX_GRAD_NORM,\n", + " grad_sample_mode=\"ghost\",\n", + " )\n", + ")\n", + "\n", + "model_gc = model_gc.to(device)\n", + "model_gc = model_gc.train()\n", + "\n", + "for epoch in range(1, EPOCHS + 1):\n", + " losses = []\n", + " for step, batch in enumerate(tqdm(train_dataloader)):\n", + " optimizer_gc.zero_grad()\n", + " batch = tuple(t.to(device) for t in batch)\n", + " inputs = {\n", + " \"input_ids\": batch[0],\n", + " \"attention_mask\": batch[1],\n", + " \"token_type_ids\": batch[2],\n", + " \"labels\": batch[3],\n", + " }\n", + " outputs = model_gc(**inputs) # output = loss, logits, hidden_states, attentions\n", + " loss = criterion_gc(outputs[1], batch[3])\n", + " loss.backward()\n", + " optimizer_gc.step()\n", + " losses.append(loss.item())\n", + "\n", + " if step > 0 and step % LOGGING_INTERVAL == 0:\n", + " train_loss = np.mean(losses)\n", + " eval_loss, eval_accuracy = evaluate(model_gc)\n", + " eps = privacy_engine.get_epsilon(DELTA)\n", + " print(\n", + " f\"Epoch: {epoch} | \"\n", + " f\"Step: {step} | \"\n", + " f\"Train loss: {train_loss:.3f} | \"\n", + " f\"Eval loss: {eval_loss:.3f} | \"\n", + " f\"Eval accuracy: {eval_accuracy:.3f} | \"\n", + " f\"ɛ: {eps:.2f}\"\n", + " )" + ], + "metadata": { + "id": "lAgYg2IUuI3P" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "originalKey": "f3a32c29-aaf7-4c89-b736-5d558021202e", + "showInput": false, + "customInput": null, + "language": "markdown", + "outputsInitialized": false, + "isAgentGenerated": false, + "id": "H6DxmW2LpIAA" + }, + "source": [ + "\n", + 
"Epoch: 1 | Step: 5000 | Train loss: 1.559 | Eval loss: 1.508 | Eval accuracy: 0.683 | ɛ: 4.83\n", + "\n", + "Epoch: 1 | Step: 10000 | Train loss: 1.625 | Eval loss: 1.635 | Eval accuracy: 0.723 | ɛ: 5.46\n", + "\n", + "Epoch: 1 | Step: 15000 | Train loss: 1.655 | Eval loss: 1.649 | Eval accuracy: 0.735 | ɛ: 5.86\n", + "\n", + "Epoch: 2 | Step: 5000 | Train loss: 1.742 | Eval loss: 1.676 | Eval accuracy: 0.739 | ɛ: 6.29\n", + "\n", + "Epoch: 2 | Step: 10000 | Train loss: 1.746 | Eval loss: 1.681 | Eval accuracy: 0.743 | ɛ: 6.54\n", + "\n", + "Epoch: 2 | Step: 15000 | Train loss: 1.759 | Eval loss: 1.683 | Eval accuracy: 0.745 | ɛ: 6.76\n", + "\n", + "Epoch: 3 | Step: 5000 | Train loss: 1.784 | Eval loss: 1.769 | Eval accuracy: 0.745 | ɛ: 7.05\n", + "\n", + "Epoch: 3 | Step: 10000 | Train loss: 1.789 | Eval loss: 1.695 | Eval accuracy: 0.748 | ɛ: 7.24\n", + "\n", + "Epoch: 3 | Step: 15000 | Train loss: 1.792 | Eval loss: 1.714 | Eval accuracy: 0.749 | ɛ: 7.42" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "originalKey": "0db42c5e-e10d-4298-b11a-962100627c27", + "showInput": false, + "customInput": null, + "language": "markdown", + "outputsInitialized": false, + "isAgentGenerated": false, + "id": "P99rSXt9pIAA" + }, + "source": [ + "## Low-Rank Adaptation (LoRA) with DP-SGD\n", + "\n", + "\n", + "\n", + "In this section, we show that DP-SGD fine-tuning is compatible with LoRA and other parameter-efficient fine-tuning techinques (PEFT). LoRA can be set up with only a few lines and there are no conceptual changes in the privacy analysis. When full fine-tuning of large models is costly, PEFT methods speed up training by training only a small number of extra parameters, while maintaining on par accuracy with full fine-tuning. 
In the context of DP-SGD, PEFT methods have the potential to further improve accuracy, since training fewer parameters means less noise is injected into the computation.\n", + "\n", + "We will use the [`peft`](https://huggingface.co/docs/peft/main/en/index) library from HuggingFace, which is compatible with the `transformers` library." + ] + }, + { + "cell_type": "code", + "metadata": { + "originalKey": "a99e5687-82c3-4a60-b606-615c90bcda46", + "showInput": true, + "customInput": null, + "language": "python", + "executionStartTime": 1734035446835, + "executionStopTime": 1734035447482, + "serverExecutionDuration": 524.47218215093, + "requestMsgId": "a99e5687-82c3-4a60-b606-615c90bcda46", + "outputsInitialized": true, + "isAgentGenerated": false, + "id": "RUguJUh-pIAA", + "outputId": "ae21e4a8-0754-4d0d-e71a-0dbc9d69ef21" + }, + "source": [ + "# reset the model\n", + "model = BertForSequenceClassification.from_pretrained(\n", + " \"bert-base-cased\",\n", + " config=config,\n", + ")" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stderr", + "text": [ + "Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-cased and are newly initialized: ['classifier.bias', 'classifier.weight']\nYou should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.\n" + ] + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "originalKey": "3ed0fe44-7bd7-47f3-ac79-2039fe102fda", + "showInput": false, + "customInput": null, + "language": "markdown", + "outputsInitialized": false, + "isAgentGenerated": false, + "id": "kZ5hxMaypIAA" + }, + "source": [ + "Recall that the total number of model parameters is ~108M." 
+ ] + }, + { + "cell_type": "code", + "metadata": { + "originalKey": "ca5a2e9f-a10f-414a-a608-68c0e592801b", + "showInput": true, + "customInput": null, + "language": "python", + "executionStartTime": 1734035448142, + "executionStopTime": 1734035448269, + "serverExecutionDuration": 3.400239162147, + "requestMsgId": "ca5a2e9f-a10f-414a-a608-68c0e592801b", + "outputsInitialized": true, + "isAgentGenerated": false, + "id": "718Vz1cgpIAA", + "outputId": "ce2731bc-859a-49bb-e9e9-28c9e439aab1" + }, + "source": [ + "total_params = sum(p.numel() for p in model.parameters() if p.requires_grad)\n", + "print(f\"Total parameters count: {total_params:,}\") # ~108M" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Total parameters count: 108,312,579\n" + ] + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "originalKey": "d28ddd8b-a38a-489b-b0e9-64a4758a1cb2", + "showInput": false, + "customInput": null, + "language": "markdown", + "outputsInitialized": false, + "isAgentGenerated": false, + "id": "YLeZo1RapIAA" + }, + "source": [ + "After enabling LoRA, the total number of trainable parameters decreases 100-fold to ~1M.\n", + "\n", + "Note some key hyper-parameters when using LoRA such as the rank $r$ of the decomposition matrix. These parameters might need tuning. 
" + ] + }, + { + "cell_type": "code", + "metadata": { + "originalKey": "19206705-bf23-4979-bcfe-ae4c392091ef", + "showInput": true, + "customInput": null, + "language": "python", + "executionStartTime": 1734035450177, + "executionStopTime": 1734035451571, + "serverExecutionDuration": 1273.7480518408, + "requestMsgId": "19206705-bf23-4979-bcfe-ae4c392091ef", + "outputsInitialized": true, + "isAgentGenerated": false, + "id": "dNTL2qtApIAA", + "outputId": "ac3714b0-7dc5-472e-b604-2ac665e9fa8d" + }, + "source": [ + "from peft import get_peft_model, LoraConfig, TaskType\n", + "\n", + "lora_config = LoraConfig(\n", + " task_type=TaskType.SEQ_CLS, # our particular task is sequence classification\n", + " inference_mode=False, # Enable training mode\n", + " r=32, # Low-rank dimension\n", + " lora_alpha=32, # Alpha scaling factor\n", + " lora_dropout=0.05, # Dropout for LoRA layers\n", + ")\n", + "\n", + "model_with_lora = get_peft_model(model, lora_config)\n", + "trainable_params = sum(p.numel() for p in model_with_lora.parameters() if p.requires_grad)\n", + "print(f\"Total trainable parameters with LoRA: {trainable_params:,}\") # ~1M" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Total trainable parameters with LoRA: 1,181,955\n" + ] + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "originalKey": "4e46f5d6-decc-442a-a7a8-a710302718c1", + "showInput": false, + "customInput": null, + "language": "markdown", + "outputsInitialized": false, + "isAgentGenerated": false, + "id": "spK3IRN_pIAA" + }, + "source": [ + "Similar to before, we will freeze all but the last attention layer of the model. 
This further reduces the number of trainable parameters to ~100k, compared to ~7M without LoRA.\n" + ] + }, + { + "cell_type": "code", + "metadata": { + "originalKey": "5d457fd9-e3f0-4aa2-a601-7231fc579323", + "showInput": true, + "customInput": null, + "language": "python", + "executionStartTime": 1734035452652, + "executionStopTime": 1734035452895, + "serverExecutionDuration": 5.7214903645217, + "requestMsgId": "5d457fd9-e3f0-4aa2-a601-7231fc579323", + "outputsInitialized": true, + "isAgentGenerated": false, + "id": "MeB6CaOcpIAB", + "outputId": "1fdc4384-ce55-4380-9f7d-7c4ff844ca15" + }, + "source": [ + "attention_layers_to_freeze = model_with_lora.base_model.bert.encoder.layer[:-1]\n", + "\n", + "# Freeze the parameters in the first 11 attention layers\n", + "for param in attention_layers_to_freeze.parameters():\n", + " param.requires_grad = False\n", + "\n", + "\n", + "trainable_params = sum(p.numel() for p in model_with_lora.parameters() if p.requires_grad)\n", + "print(f\"Total trainable parameters with LoRA after freezing: {trainable_params:,}\") # ~100k" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Total trainable parameters with LoRA after freezing: 100,611\n" + ] + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "originalKey": "714ebb6a-7512-491e-946b-b802b6bbc1bc", + "showInput": false, + "customInput": null, + "language": "markdown", + "outputsInitialized": false, + "isAgentGenerated": false, + "id": "VWXYrtQ8pIAB" + }, + "source": [ + "Now that we have prepared the model with the LoRA setup, it is business as usual for training with DP-SGD. 
We use DP-SGD with Ghost Clipping.\n" + ] + }, + { + "cell_type": "code", + "metadata": { + "originalKey": "697d14ed-0b45-43ca-a700-c898ad30dfb8", + "showInput": true, + "customInput": null, + "language": "python", + "outputsInitialized": false, + "isAgentGenerated": false, + "executionStartTime": 1734035472216, + "executionStopTime": 1734035472334, + "serverExecutionDuration": 2.4106916971505, + "requestMsgId": "697d14ed-0b45-43ca-a700-c898ad30dfb8", + "id": "SdDoJGk2pIAB" + }, + "source": [ + "EPOCHS = 3\n", + "LOGGING_INTERVAL = 5000 # once every how many steps we run evaluation cycle and report metrics\n", + "DELTA = 1 / len(train_dataloader) # Parameter for privacy accounting. Probability of not achieving privacy guarantees\n", + "MAX_GRAD_NORM = 0.1" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "originalKey": "409821f0-9513-463b-9077-39d1bf2e5a64", + "showInput": true, + "customInput": null, + "language": "python", + "executionStartTime": 1734035480604, + "executionStopTime": 1734039134648, + "serverExecutionDuration": 3653747.3752331, + "requestMsgId": "409821f0-9513-463b-9077-39d1bf2e5a64", + "outputsInitialized": true, + "isAgentGenerated": false, + "customOutput": null, + "id": "rtSdOuV2pIAB" + }, + "source": [ + "device = torch.device(\"cuda:0\")\n", + "os.environ[\"CUDA_LAUNCH_BLOCKING\"] = \"1\"\n", + "\n", + "optimizer = torch.optim.AdamW(model.parameters(), lr=5e-4, eps=1e-8)\n", + "model = model.train()\n", + "\n", + "privacy_engine = PrivacyEngine()\n", + "criterion = nn.CrossEntropyLoss(reduction=\"mean\")\n", + "\n", + "model_lora, optimizer_lora, criterion_lora, train_dataloader = (\n", + " privacy_engine.make_private_with_epsilon(\n", + " module=model,\n", + " optimizer=optimizer,\n", + " data_loader=train_dataloader,\n", + " criterion=criterion,\n", + " target_delta=DELTA,\n", + " target_epsilon=EPSILON,\n", + " epochs=EPOCHS,\n", + " max_grad_norm=MAX_GRAD_NORM,\n", + " 
grad_sample_mode=\"ghost\",\n", + " )\n", + ")\n", + "\n", + "model_lora = model_lora.to(device)\n", + "model_lora = model_lora.train()\n", + "\n", + "for epoch in range(1, EPOCHS + 1):\n", + " losses = []\n", + " for step, batch in enumerate(tqdm(train_dataloader)):\n", + " optimizer_lora.zero_grad()\n", + " batch = tuple(t.to(device) for t in batch)\n", + " inputs = {\n", + " \"input_ids\": batch[0],\n", + " \"attention_mask\": batch[1],\n", + " \"token_type_ids\": batch[2],\n", + " \"labels\": batch[3],\n", + " }\n", + " outputs = model_lora(**inputs) # output = loss, logits, hidden_states, attentions\n", + " loss = criterion_lora(outputs[1], batch[3])\n", + " loss.backward()\n", + " optimizer_lora.step()\n", + " losses.append(loss.item())\n", + "\n", + " if step > 0 and step % LOGGING_INTERVAL == 0:\n", + " train_loss = np.mean(losses)\n", + " eval_loss, eval_accuracy = evaluate(model_lora)\n", + " eps = privacy_engine.get_epsilon(DELTA)\n", + " print(\n", + " f\"Epoch: {epoch} | \"\n", + " f\"Step: {step} | \"\n", + " f\"Train loss: {train_loss:.3f} | \"\n", + " f\"Eval loss: {eval_loss:.3f} | \"\n", + " f\"Eval accuracy: {eval_accuracy:.3f} | \"\n", + " f\"ɛ: {eps:.2f}\"\n", + " )" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "originalKey": "110a245f-fa68-4c59-8cb2-07a95c18a85e", + "showInput": false, + "customInput": null, + "language": "markdown", + "outputsInitialized": false, + "isAgentGenerated": false, + "id": "YDOhAShopIAB" + }, + "source": [ + "\n", + "Epoch: 1 | Step: 5000 | Train loss: 1.370 | Eval loss: 1.208 | Eval accuracy: 0.512 | ɛ: 4.83\n", + "\n", + "Epoch: 1 | Step: 10000 | Train loss: 1.551 | Eval loss: 1.584 | Eval accuracy: 0.621 | ɛ: 5.46\n", + "\n", + "Epoch: 1 | Step: 15000 | Train loss: 1.601 | Eval loss: 1.523 | Eval accuracy: 0.661 | ɛ: 5.86\n", + "\n", + "\n", + "Epoch: 2 | Step: 5000 | Train loss: 1.713 | Eval loss: 1.556 | Eval accuracy: 0.708 | ɛ: 6.29\n", + "\n", + 
"Epoch: 2 | Step: 10000 | Train loss: 1.720 | Eval loss: 1.573 | Eval accuracy: 0.712 | ɛ: 6.54\n", + "\n", + "Epoch: 2 | Step: 15000 | Train loss: 1.723 | Eval loss: 1.510 | Eval accuracy: 0.725 | ɛ: 6.76\n", + "\n", + "\n", + "Epoch: 3 | Step: 5000 | Train loss: 1.704 | Eval loss: 1.492 | Eval accuracy: 0.735 | ɛ: 7.05\n", + "\n", + "Epoch: 3 | Step: 10000 | Train loss: 1.705 | Eval loss: 1.535 | Eval accuracy: 0.734 | ɛ: 7.24\n", + "\n", + "Epoch: 3 | Step: 15000 | Train loss: 1.704 | Eval loss: 1.492 | Eval accuracy: 0.740 | ɛ: 7.42" + ] + }, + { + "cell_type": "markdown", + "source": [ + "We achieve on-par accuracy with ghost clipping (and vanilla DP-SGD), while training ~100x fewer parameters." + ], + "metadata": { + "id": "cXd0DWIUsnBH" + } + }, + { + "cell_type": "markdown", + "metadata": { + "originalKey": "1340f00d-22db-4623-b2ec-8fd5dd4088ae", + "showInput": false, + "customInput": null, + "language": "markdown", + "id": "HPAik4kTpIAC" + }, + "source": [ + "## Final notes\n", + "\n", + "Notice that there is a significant gap in model accuracy, of about 15 percentage points, between DP and non-DP training.\n", + "\n", + "This gap could be further improved by:\n", + "\n", + "- using larger batch sizes to overcome the effect of noise.\n", + "- hyper-parameter tuning of learning rate and clipping norm (in conjuction with larger batch size).\n", + "- training more attention layers (and exploring the noise-to-model-capacity tradeoff introduced by training more parameters). This strategy is a good fit with LoRA, where we can train more layers with fewer parameters.\n", + "\n", + "We invite you to play along and further improve the accuracy of DP-SGD training." 
+ ] } - }, - "nbformat": 4, - "nbformat_minor": 2 + ] } From 144bd2aee0b0cb3be2c9506b73da9d562cebf534 Mon Sep 17 00:00:00 2001 From: Huanyu Zhang Date: Thu, 19 Dec 2024 08:30:52 -0800 Subject: [PATCH 02/10] Delete CircleCI configs since GithubActions CI are now live (#701) Summary: Pull Request resolved: https://github.com/pytorch/opacus/pull/701 Reviewed By: iden-kalemaj Differential Revision: D67430600 fbshipit-source-id: 19c10c550a4a438b3125ce487d5a0f239e547367 --- .circleci/config.yml | 517 ------------------------------------ .circleci/flake8_config.ini | 119 --------- 2 files changed, 636 deletions(-) delete mode 100644 .circleci/config.yml delete mode 100644 .circleci/flake8_config.ini diff --git a/.circleci/config.yml b/.circleci/config.yml deleted file mode 100644 index 99f1f43f..00000000 --- a/.circleci/config.yml +++ /dev/null @@ -1,517 +0,0 @@ -version: 2.1 - -# ------------------------------------------------------------------------------------- -# Commands -# ------------------------------------------------------------------------------------- - -commands: - - py_3_9_setup: - description: "Install and switch to Python 3.9; also install pip and pytest." - steps: - - run: - name: "Setup Python v3.9 environment" - command: | - cd /opt/circleci/.pyenv && git pull && cd - - pyenv install -s 3.9.4 - pyenv global 3.9.4 - pyenv local 3.9.4 - pyenv versions - echo "In venv: $(pyenv local) - $(python -V), $(pip -V)" - sudo "$(which python)" -m pip install --upgrade pip - sudo "$(which python)" -m pip install pytest - sudo "$(which python)" -m pip install coverage - sudo "$(which python)" -m pip install coveralls - - run_nvidia_smi: - description: "Prints GPU capabilities from nvidia-smi" - steps: - - run: - name: "Run Nvidia-SMI" - command: | - nvidia-smi - - pip_dev_install: - description: "Install dependencies via pip, including extra deps. Also supports more options, such as building on top of PyTorch nightly." 
- parameters: - args: - type: string - default: "" - steps: - - run: - name: "Install dependencies via pip" - command: ./scripts/install_via_pip.sh << parameters.args >> - - lint_flake8: - description: "Lint with flake8" - steps: - - run: - name: "Lint with flake8" - command: flake8 --config ./.circleci/flake8_config.ini - - lint_black: - description: "Lint with black" - steps: - - run: - name: "Lint with black" - command: black --check --diff --color . - - isort: - description: "Check import order with isort" - steps: - - run: - name: "Check import order with isort" - command: isort -v -l 88 -o opacus --lines-after-imports 2 -m 3 --trailing-comma --check-only . - - unit_tests: - description: "Run unit tests" - steps: - - run: - name: "Unit tests & doctests" - no_output_timeout: 1h - command: | - mkdir unittest-reports - coverage run -m pytest --doctest-modules -p conftest --junitxml=unittest-reports/junit.xml opacus - coverage report -i -m - - - store_test_results: - path: unittest-reports - - store_artifacts: - path: unittest-reports - - command_unit_tests_multi_gpu: - description: "Run multi gpu unit tests" - steps: - - run: - name: "Unit test multi_gpu" - no_output_timeout: 1h - command: | - mkdir unittest-multigpu-reports - coverage run -m unittest opacus.tests.multigpu_gradcheck.GradientComputationTest.test_gradient_correct - coverage report -i -m - - coveralls_upload_parallel: - description: "upload coverage to coveralls" - steps: - - run: - name: "coveralls upload" - no_output_timeout: 5m - command: | - pip install coveralls --user - COVERALLS_PARALLEL=true COVERALLS_FLAG_NAME="${CIRCLE_JOB}" coveralls - - mnist_integration_test: - description: "Runs MNIST example end to end" - parameters: - device: - default: "cpu" - type: string - steps: - - run: - name: MNIST example - command: | - mkdir -p runs/mnist/data - mkdir -p runs/mnist/test-reports - echo "Using $(python -V) ($(which python))" - echo "Using $(pip -V) ($(which pip))" - python examples/mnist.py 
--lr 0.25 --sigma 0.7 -c 1.5 --batch-size 64 --epochs 1 --data-root runs/mnist/data --n-runs 1 --device <> - python -c "import torch; accuracy = torch.load('run_results_mnist_0.25_0.7_1.5_64_1.pt'); exit(0) if (accuracy[0]>0.78 and accuracy[0]<0.95) else exit(1)" - when: always - - store_test_results: - path: runs/mnist/test-reports - - store_artifacts: - path: runs/mnist/test-reports - - mnist_lightning_integration_test: - description: "Runs MNIST-Lightning example end to end" - parameters: - device: - default: "cpu" - type: string - steps: - - run: - name: MNIST-Lightning example - command: | - mkdir -p runs/mnist/data - mkdir -p runs/mnist/test-reports - echo "Using $(python -V) ($(which python))" - echo "Using $(pip -V) ($(which pip))" - python examples/mnist_lightning.py fit --trainer.accelerator <> --model.lr 0.25 --model.sigma 0.7 --model.max_per_sample_grad_norm 1.5 --model.sample_rate 0.004 --trainer.max_epochs 1 --data.data_dir runs/mnist/data --data.sample_rate 0.004 - python -c "import torch; exit(0)" - when: always - - store_test_results: - path: runs/mnist-lightning/test-reports - - store_artifacts: - path: runs/mnist-lightning/test-reports - - cifar10_integration_test: - description: "Runs CIFAR10 example end to end" - parameters: - device: - default: "cpu" - type: string - steps: - - run: - name: CIFAR10 example - command: | - mkdir -p runs/cifar10/data - mkdir -p runs/cifar10/logs - mkdir -p runs/cifar10/test-reports - echo "Using $(python -V) ($(which python))" - echo "Using $(pip -V) ($(which pip))" - pip install tensorboard - python examples/cifar10.py --lr 0.1 --sigma 1.5 -c 10 --batch-size 2000 --epochs 10 --data-root runs/cifar10/data --log-dir runs/cifar10/logs --device <> - python -c "import torch; model = torch.load('model_best.pth.tar'); exit(0) if (model['best_acc1']>0.4 and model['best_acc1']<0.49) else exit(1)" - python examples/cifar10.py --lr 0.1 --sigma 1.5 -c 10 --batch-size 2000 --epochs 10 --data-root runs/cifar10/data --log-dir 
runs/cifar10/logs --device <> --grad_sample_mode no_op - python -c "import torch; model = torch.load('model_best.pth.tar'); exit(0) if (model['best_acc1']>0.4 and model['best_acc1']<0.49) else exit(1)" - when: always - - store_test_results: - path: runs/cifar10/test-reports - - store_artifacts: - path: runs/cifar10/test-reports - - dcgan_integration_test: - description: "Runs dcgan example end to end" - parameters: - device: - default: "cpu" - type: string - steps: - - run: - name: dcgan example - command: | - mkdir -p runs/dcgan/data - mkdir -p runs/dcgan/test-reports - echo "Using $(python -V) ($(which python))" - echo "Using $(pip -V) ($(which pip))" - python examples/dcgan.py --lr 2e-4 --sigma 0.7 -c 1.5 --batch-size 32 --epochs 1 --data-root runs/dcgan/data --device <> - when: always - - store_test_results: - path: runs/dcgan/test-reports - - store_artifacts: - path: runs/dcgan/test-reports - - imdb_integration_test: - description: "Runs imdb example end to end" - parameters: - device: - default: "cpu" - type: string - steps: - - run: - name: imdb example - command: | - mkdir -p runs/imdb/data - mkdir -p runs/imdb/test-reports - echo "Using $(python -V) ($(which python))" - echo "Using $(pip -V) ($(which pip))" - pip install --user datasets transformers - python examples/imdb.py --lr 0.02 --sigma 1.0 -c 1.0 --batch-size 64 --max-sequence-length 256 --epochs 2 --data-root runs/imdb/data --device <> - python -c "import torch; accuracy = torch.load('run_results_imdb_classification.pt'); exit(0) if (accuracy>0.54 and accuracy<0.66) else exit(1)" - when: always - - store_test_results: - path: runs/imdb/test-reports - - store_artifacts: - path: runs/imdb/test-reports - - charlstm_integration_test: - description: "Runs charlstm example end to end" - parameters: - device: - default: "cpu" - type: string - steps: - - run: - name: charlstm example - command: | - mkdir -p runs/charlstm/data - wget https://download.pytorch.org/tutorial/data.zip -O 
runs/charlstm/data/data.zip - unzip runs/charlstm/data/data.zip -d runs/charlstm/data - rm runs/charlstm/data/data.zip - mkdir -p runs/charlstm/test-reports - echo "Using $(python -V) ($(which python))" - echo "Using $(pip -V) ($(which pip))" - pip install scikit-learn - python examples/char-lstm-classification.py --epochs=20 --learning-rate=2.0 --hidden-size=128 --delta=8e-5 --batch-size 400 --n-layers=1 --sigma=1.0 --max-per-sample-grad-norm=1.5 --data-root="runs/charlstm/data/data/names/" --device=<> --test-every 5 - python -c "import torch; accuracy = torch.load('run_results_chr_lstm_classification.pt'); exit(0) if (accuracy>0.60 and accuracy<0.80) else exit(1)" - when: always - - store_test_results: - path: runs/charlstm/test-reports - - store_artifacts: - path: runs/charlstm/test-reports - - benchmark_layers_integration_test: - description: "Runs benchmark end to end" - parameters: - device: - default: "cpu" - type: string - layers: - type: string - grad_sample_modes: - default: "baseline hooks" - type: string - report_column: - default: "hooks/baseline" - type: string - runtime_ratio_threshold: - type: string - memory_ratio_threshold: - type: string - steps: - - run: - name: benchmarks - command: | - mkdir -p benchmarks/results/raw - echo "Using $(python -V) ($(which python))" - echo "Using $(pip -V) ($(which pip))" - python benchmarks/run_benchmarks.py --batch_size 16 --layers <> --config_file ./benchmarks/config.json --root ./benchmarks/results/raw/ --cont - IFS=$' ';layers=(<>); rm -rf /tmp/report_layers; mkdir -p /tmp/report_layers; IFS=$'\n'; files=`( echo "${layers[*]}" ) | sed 's/.*/.\/benchmarks\/results\/raw\/&*/'` - cp -v ${files[@]} /tmp/report_layers - report_id=`IFS=$'-'; echo "${layers[*]}"` - python benchmarks/generate_report.py --path-to-results /tmp/report_layers --save-path benchmarks/results/report-${report_id}.csv --format csv - python benchmarks/generate_report.py --path-to-results /tmp/report_layers --save-path 
benchmarks/results/report-${report_id}.pkl --format pkl - - python benchmarks/check_threshold.py --report-path "./benchmarks/results/report-"$report_id".pkl" --metric runtime --threshold <> --column <> - when: always - - store_artifacts: - path: benchmarks/results/ -# ------------------------------------------------------------------------------------- -# Jobs -# ------------------------------------------------------------------------------------- - -jobs: - - lint_py39_torch_release: - docker: - - image: cimg/python:3.9 - steps: - - checkout - - pip_dev_install - - lint_flake8 - - lint_black - - isort - - unittest_py38_torch_release: - docker: - - image: cimg/python:3.8 - steps: - - checkout - - pip_dev_install - - unit_tests - - unittest_py39_torch_release: - docker: - - image: cimg/python:3.9 - steps: - - checkout - - pip_dev_install - - unit_tests - - unittest_py39_torch_nightly: - docker: - - image: cimg/python:3.9 - steps: - - checkout - - pip_dev_install: - args: "-n" - - unit_tests - - prv_accountant_values: - docker: - - image: cimg/python:3.9 - steps: - - checkout - - py_3_9_setup - - pip_dev_install - - run: - name: "Unit test prv accountant" - no_output_timeout: 1h - command: | - python -m unittest opacus.tests.prv_accountant - - integrationtest_py39_torch_release_cpu: - docker: - - image: cimg/python:3.9 - steps: - - checkout - - py_3_9_setup - - pip_dev_install - - mnist_integration_test: - device: "cpu" - - integrationtest_py39_torch_release_cuda: - machine: - resource_class: gpu.nvidia.small.multi - image: linux-cuda-12:default - steps: - - checkout - - py_3_9_setup - - pip_dev_install - - run_nvidia_smi - - mnist_integration_test: - device: "cuda" - - cifar10_integration_test: - device: "cuda" - - imdb_integration_test: - device: "cuda" - - charlstm_integration_test: - device: "cuda" - - dcgan_integration_test: - device: "cuda" - - micro_benchmarks_py39_torch_release_cuda: - machine: - resource_class: gpu.nvidia.small.multi - image: 
linux-cuda-12:default - steps: - - checkout - - py_3_9_setup - - pip_dev_install - - run_nvidia_smi - - benchmark_layers_integration_test: - device: "cuda" - layers: "groupnorm instancenorm layernorm" - grad_sample_modes: "baseline hooks" - runtime_ratio_threshold: "3.0" - memory_ratio_threshold: "1.6" - - benchmark_layers_integration_test: - device: "cuda" - layers: "linear" - grad_sample_modes: "baseline hooks" - runtime_ratio_threshold: "3.6" - memory_ratio_threshold: "13.0" - - benchmark_layers_integration_test: - device: "cuda" - layers: "mha dpmha" - report_column: "dp_baseline/baseline" - grad_sample_modes: "baseline hooks" - runtime_ratio_threshold: "3.0" - memory_ratio_threshold: "1.6" - - benchmark_layers_integration_test: - device: "cuda" - layers: "mha dpmha" - report_column: "dp_hooks/baseline" - grad_sample_modes: "baseline hooks" - runtime_ratio_threshold: "3.5" - memory_ratio_threshold: "2.0" - - benchmark_layers_integration_test: - device: "cuda" - layers: "gru dpgru" - report_column: "dp_baseline/baseline" - grad_sample_modes: "baseline hooks" - runtime_ratio_threshold: "55.2" - memory_ratio_threshold: "1.2" - - benchmark_layers_integration_test: - device: "cuda" - layers: "gru dpgru" - report_column: "dp_hooks/baseline" - grad_sample_modes: "baseline hooks" - runtime_ratio_threshold: "140" - memory_ratio_threshold: "1.6" - - benchmark_layers_integration_test: - device: "cuda" - layers: "lstm dplstm" - report_column: "dp_baseline/baseline" - grad_sample_modes: "baseline hooks" - runtime_ratio_threshold: "48.6" - memory_ratio_threshold: "1.2" - - benchmark_layers_integration_test: - device: "cuda" - layers: "lstm dplstm" - report_column: "dp_hooks/baseline" - grad_sample_modes: "baseline hooks" - runtime_ratio_threshold: "126.0" - memory_ratio_threshold: "1.8" - - benchmark_layers_integration_test: - device: "cuda" - layers: "rnn dprnn" - report_column: "dp_baseline/baseline" - grad_sample_modes: "baseline hooks" - runtime_ratio_threshold: "21.4" - 
memory_ratio_threshold: "1.2" - - benchmark_layers_integration_test: - device: "cuda" - layers: "rnn dprnn" - report_column: "dp_hooks/baseline" - grad_sample_modes: "baseline hooks" - runtime_ratio_threshold: "98.5" - memory_ratio_threshold: "1.2" - - benchmark_layers_integration_test: - device: "cuda" - layers: "embedding" - grad_sample_modes: "baseline hooks" - runtime_ratio_threshold: "8.0" - memory_ratio_threshold: "15.0" - - unittest_multi_gpu: - machine: - resource_class: gpu.nvidia.medium.multi - image: linux-cuda-12:default - steps: - - checkout - - py_3_9_setup - - pip_dev_install - - run_nvidia_smi - - command_unit_tests_multi_gpu - - finish_coveralls_parallel: - docker: - - image: cimg/python:3.9 - steps: - - run: - name: "finish coveralls parallel" - no_output_timeout: 5m - command: | - pip install coveralls --user - coveralls --finish - - -aliases: - - - &exclude_ghpages - branches: - ignore: - - gh-pages - -# ------------------------------------------------------------------------------------- -# Workflows -# ------------------------------------------------------------------------------------- - -workflows: - commit: - when: - not: - equal: [ scheduled_pipeline, << pipeline.trigger_source >> ] - jobs: - - lint_py39_torch_release: - filters: *exclude_ghpages - - unittest_py38_torch_release: - filters: *exclude_ghpages - - unittest_py39_torch_release: - filters: *exclude_ghpages - - unittest_py39_torch_nightly: - filters: *exclude_ghpages - - unittest_multi_gpu: - filters: *exclude_ghpages - - integrationtest_py39_torch_release_cpu: - filters: *exclude_ghpages - - integrationtest_py39_torch_release_cuda: - filters: *exclude_ghpages - - prv_accountant_values: - filters: *exclude_ghpages - - nightly: - when: - equal: [ scheduled_pipeline, << pipeline.trigger_source >> ] - jobs: - - unittest_py39_torch_nightly: - filters: *exclude_ghpages - - integrationtest_py39_torch_release_cpu: - filters: *exclude_ghpages - - integrationtest_py39_torch_release_cuda: - 
filters: *exclude_ghpages - - lint_py39_torch_release: - filters: *exclude_ghpages - - micro_benchmarks_py39_torch_release_cuda: - filters: *exclude_ghpages diff --git a/.circleci/flake8_config.ini b/.circleci/flake8_config.ini deleted file mode 100644 index 988ed7f3..00000000 --- a/.circleci/flake8_config.ini +++ /dev/null @@ -1,119 +0,0 @@ -[flake8] -select = B,C,E,F,P,W,B9 -max-line-length = 80 -# Main Explanation Docs: https://github.com/grantmcconnaughey/Flake8Rules -ignore = - # Black conflicts and overlaps. - # Found in https://github.com/psf/black/issues/429 - # B950: Line too long. (Use `arc lint`'s LINEWRAP instead) - B950, - # E111: Indentation is not a multiple of four. - E111, - # E115: Expected an indented block (comment). - E115, - # E117: Over-indented. - E117, - # E121: Continuation line under-indented for hanging indent. - E121, - # E122: Continuation line missing indentation or outdented. - E122, - # E123: Closing bracket does not match indentation of opening bracket's line. - E123, - # E124: Closing bracket does not match visual indentation. - E124, - # E125: Continuation line with same indent as next logical line. - E125, - # E126: Continuation line over-indented for hanging indent. - E126, - # E127: Continuation line over-indented for visual indent. - E127, - # E128: Continuation line under-indented for visual indent. - E128, - # E129: Visually indented line with same indent as next logical line. - E129, - # E201: Whitespace after '('. - E201, - # E202: Whitespace before ')'. - E202, - # E203: Whitespace before ':'. - E203, - # E221: Multiple spaces before operator. - E221, - # E222: Multiple spaces after operator. - E222, - # E225: Missing whitespace around operator. - E225, - # E226: Missing whitespace around arithmetic operator. - E226, - # E227: Missing whitespace around bitwise or shift operator. - E227, - # E231: Missing whitespace after ',', ';', or ':'. - E231, - # E241: Multiple spaces after ','. 
- E241, - # E251: Unexpected spaces around keyword / parameter equals. - E251, - # E261: At least two spaces before inline comment. - E261, - # E262: Inline comment should start with '# '. - E262, - # E265: Block comment should start with '# '. - E265, - # E271: Multiple spaces after keyword. - E271, - # E272: Multiple spaces before keyword. - E272, - # E301: Expected 1 blank line, found 0. - E301, - # E302: Expected 2 blank lines, found 0. - E302, - # E303: Too many blank lines (3). - E303, - # E305: Expected 2 blank lines after end of function or class. - E305, - # E306: Expected 1 blank line before a nested definition. - E306, - # E501: Line too long (82 > 79 characters). - E501, - # E502: The backslash is redundant between brackets. - E502, - # E701: Multiple statements on one line (colon). - E701, - # E702: Multiple statements on one line (semicolon). - E702, - # E703: Statement ends with a semicolon. - E703, - # E704: Multiple statements on one line (def). - E704, - # W291: Trailing whitespace. - W291, - # W292: No newline at end of file. - W292, - # W293: Blank line contains whitespace. - W293, - # W391: Blank line at end of file. - W391, - - # Too opinionated. - # E265: Block comment should start with '# '. - E265, - # E266: Too many leading '#' for block comment. - E266, - # E402: Module level import not at top of file. - E402, - # E722: Do not use bare except, specify exception instead. (Duplicate of B001) - E722, - # F811: Redefinition of unused name from line n. - F811, - # P207: (Duplicate of B003) - P207, - # P208: (Duplicate of C403) - P208, - # W503: Line break occurred before a binary operator. 
- W503 - -exclude = - .hg, - __pycache__, - -max-complexity = 12 From f86ddf4380cdbcb2fd37bb886b8da7fa7d20b613 Mon Sep 17 00:00:00 2001 From: Iden Kalemaj Date: Thu, 19 Dec 2024 11:23:42 -0800 Subject: [PATCH 03/10] Separate function for preparing criterion in PrivacyEngine (#703) Summary: Pull Request resolved: https://github.com/pytorch/opacus/pull/703 Having a separate function for preparing the criterion makes it easy to build custom extensions of PrivacyEngine for methods that require a different DPLoss class, e.g., adaptive clipping. Reviewed By: EnayatUllah Differential Revision: D67458234 fbshipit-source-id: 9fca64fcde7714708ac1cb9a35a991099606f449 --- opacus/privacy_engine.py | 29 +++++++++++++++++++++++++++-- 1 file changed, 27 insertions(+), 2 deletions(-) diff --git a/opacus/privacy_engine.py b/opacus/privacy_engine.py index 1af891c4..558c8f8e 100644 --- a/opacus/privacy_engine.py +++ b/opacus/privacy_engine.py @@ -212,6 +212,26 @@ def _prepare_model( loss_reduction=loss_reduction, ) + def _prepare_criterion( + self, + *, + module: GradSampleModule, + optimizer: DPOptimizer, + criterion=nn.CrossEntropyLoss(), + loss_reduction: str = "mean", + **kwargs, + ) -> DPLossFastGradientClipping: + """ + Args: + module: GradSampleModule used for training, + optimizer: DPOptimizer used for training, + criterion: Loss function used for training, + loss_reduction: "mean" or "sum", indicates if the loss reduction (for aggregating the gradients) + + Prepare the DP loss class, which packages the two backward passes for fast gradient clipping.
+ """ + return DPLossFastGradientClipping(module, optimizer, criterion, loss_reduction) + def is_compatible( self, *, @@ -403,9 +423,14 @@ def make_private( self.accountant.get_optimizer_hook_fn(sample_rate=sample_rate) ) if grad_sample_mode == "ghost": - criterion = DPLossFastGradientClipping( - module, optimizer, criterion, loss_reduction + criterion = self._prepare_criterion( + module=module, + optimizer=optimizer, + criterion=criterion, + loss_reduction=loss_reduction, + **kwargs, ) + return module, optimizer, criterion, data_loader return module, optimizer, data_loader From 2a2583013bcb84b2b191b84a30706ef50526b965 Mon Sep 17 00:00:00 2001 From: Iden Kalemaj Date: Fri, 20 Dec 2024 02:04:34 -0800 Subject: [PATCH 04/10] Add research folder (#700) Summary: Pull Request resolved: https://github.com/pytorch/opacus/pull/700 Add a research folder for open-source contributions of new methods enhancing DP-SGD. Reviewed By: HuanyuZhang Differential Revision: D67400804 fbshipit-source-id: 49e9dbd7914822c73538a723380ae7578962c1f7 --- research/README.md | 22 ++++++++++++++++++++++ 1 file changed, 22 insertions(+) create mode 100644 research/README.md diff --git a/research/README.md b/research/README.md new file mode 100644 index 00000000..3d16e32b --- /dev/null +++ b/research/README.md @@ -0,0 +1,22 @@ +# New Methods and Extensions of Opacus + +This directory contains novel methods built on top of Opacus that enhance DP-SGD. These contributions, made by the community, stem from research demonstrating potential improvements in differentially private model training. By consolidating these methods within the Opacus repository, we facilitate new research and provide a broader array of tools for DP-ML practitioners. + + +## Contributions +We warmly welcome and encourage contributions of new methods! To contribute, please follow these steps: + +1. Fork the repo and create your branch from `main`. +2. Place the new method in a separate subfolder within the `research` directory. 
+3. The new folder should include a `README.md` that explains the method at a high level, demonstrates usage (e.g., introducing new parameters to the `PrivacyEngine`), and cites relevant sources. The subfolder name should aptly represent the method. +4. If you have added code that should be tested, add unit tests. +5. If you have changed APIs, document the API change in the PR. Also update the documentation and make sure the documentation builds. +6. Ensure the test suite passes. +7. Make sure your code passes both `black` and `flake8` formatting checks. + +More detailed PR instructions can be found [here](https://github.com/pytorch/opacus/blob/main/CONTRIBUTING.md). + +Feel free to reach out with any questions about the process or to discuss whether your method is a good fit for the repository. + +## Notes +Please note that the code provided in this directory will not be maintained by the Opacus team, which may lead to compatibility issues with future changes. If you have any questions, please reach out to the PR contributor. From ed1bd0bad2d73780c384317f50496ea027e7c6f3 Mon Sep 17 00:00:00 2001 From: Iden Kalemaj Date: Fri, 20 Dec 2024 02:32:57 -0800 Subject: [PATCH 05/10] Remove **kwargs from optim_class initialization (#702) Summary: Pull Request resolved: https://github.com/pytorch/opacus/pull/702 None of the optimizer classes accept **kwargs or have special arguments so I am removing **kwargs from optim_class(). Otherwise, the current code throws an error when creating a custom PrivacyEngine that takes in additional arguments. 
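A minimal sketch of the failure mode described above, not the Opacus code itself. The names `PlainOptimizer`, `prepare_forwarding`, `prepare_dropping`, and `target_quantile` are hypothetical and chosen only for illustration; the assumption is simply that the optimizer constructor has a fixed signature without `**kwargs`.

```python
class PlainOptimizer:
    """Hypothetical optimizer wrapper; its constructor accepts no **kwargs."""

    def __init__(self, noise_multiplier: float, max_grad_norm: float):
        self.noise_multiplier = noise_multiplier
        self.max_grad_norm = max_grad_norm


def prepare_forwarding(noise_multiplier, max_grad_norm, **kwargs):
    # Old pattern: extra engine arguments are passed straight to the optimizer.
    return PlainOptimizer(noise_multiplier, max_grad_norm, **kwargs)


def prepare_dropping(noise_multiplier, max_grad_norm, **kwargs):
    # New pattern: extra engine arguments are accepted here but not forwarded.
    return PlainOptimizer(noise_multiplier, max_grad_norm)


def forwarding_fails(**extra) -> bool:
    # Reports whether forwarding the extra arguments raises a TypeError.
    try:
        prepare_forwarding(1.0, 1.0, **extra)
    except TypeError:
        return True
    return False
```

With no extra arguments both patterns behave the same; the difference only shows up once a custom engine passes an argument the optimizer does not know about.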
Reviewed By: EnayatUllah Differential Revision: D67456352 fbshipit-source-id: 8d2985a3f1b5b0af7004b249b6da13aee02debd7 --- opacus/privacy_engine.py | 1 - 1 file changed, 1 deletion(-) diff --git a/opacus/privacy_engine.py b/opacus/privacy_engine.py index 558c8f8e..bdddafe4 100644 --- a/opacus/privacy_engine.py +++ b/opacus/privacy_engine.py @@ -136,7 +136,6 @@ def _prepare_optimizer( loss_reduction=loss_reduction, generator=generator, secure_mode=self.secure_mode, - **kwargs, ) def _prepare_data_loader( From b3c390efb70319ea6b4f84b30df849e7960b6253 Mon Sep 17 00:00:00 2001 From: Huanyu Zhang Date: Fri, 20 Dec 2024 07:06:29 -0800 Subject: [PATCH 06/10] Padding on the research folder (D67400804) (#705) Summary: Pull Request resolved: https://github.com/pytorch/opacus/pull/705 Slightly change the language of the ``readme``. Reviewed By: iden-kalemaj Differential Revision: D67520069 fbshipit-source-id: ee8ac1b878daa78270ef8c145a339a917de899bb --- research/README.md | 6 +----- 1 file changed, 1 insertion(+), 5 deletions(-) diff --git a/research/README.md b/research/README.md index 3d16e32b..952b0d40 100644 --- a/research/README.md +++ b/research/README.md @@ -9,14 +9,10 @@ We warmly welcome and encourage contributions of new methods! To contribute, ple 1. Fork the repo and create your branch from `main`. 2. Place the new method in a separate subfolder within the `research` directory. 3. The new folder should include a `README.md` that explains the method at a high level, demonstrates usage (e.g., introducing new parameters to the `PrivacyEngine`), and cites relevant sources. The subfolder name should aptly represent the method. -4. If you have added code that should be tested, add unit tests. -5. If you have changed APIs, document the API change in the PR. Also update the documentation and make sure the documentation builds. -6. Ensure the test suite passes. -7. Make sure your code passes both `black` and `flake8` formatting checks. 
More detailed PR instructions can be found [here](https://github.com/pytorch/opacus/blob/main/CONTRIBUTING.md). Feel free to reach out with any questions about the process or to discuss whether your method is a good fit for the repository. ## Notes -Please note that the code provided in this directory will not be maintained by the Opacus team, which may lead to compatibility issues with future changes. If you have any questions, please reach out to the PR contributor. +Please note that the code provided in this directory will not be maintained by the Opacus team, which may lead to compatibility issues with future changes. If you have any questions, please reach out to the PR contributor directly. From 6b756a7c506c9d610487e5a6b5e66771df87e720 Mon Sep 17 00:00:00 2001 From: Iden Kalemaj Date: Fri, 20 Dec 2024 08:30:05 -0800 Subject: [PATCH 07/10] Add *kwargs to get_epsilon (#704) Summary: Pull Request resolved: https://github.com/pytorch/opacus/pull/704 Add **kwargs as argument to the get_epsilon() function in accountants. It allows for building custom PrivacyEngines that take in additional arguments. Reviewed By: EnayatUllah Differential Revision: D67514874 fbshipit-source-id: 6ebee60d9662ec90ae10b7dcbbb1a939b33f9bfa --- opacus/accountants/gdp.py | 2 +- opacus/accountants/prv.py | 7 ++++++- opacus/accountants/rdp.py | 5 ++++- 3 files changed, 11 insertions(+), 3 deletions(-) diff --git a/opacus/accountants/gdp.py b/opacus/accountants/gdp.py index c39dc765..8a147ae0 100644 --- a/opacus/accountants/gdp.py +++ b/opacus/accountants/gdp.py @@ -44,7 +44,7 @@ def step(self, *, noise_multiplier: float, sample_rate: float): else: self.history = [(noise_multiplier, sample_rate, 1)] - def get_epsilon(self, delta: float, poisson: bool = True) -> float: + def get_epsilon(self, delta: float, poisson: bool = True, **kwargs) -> float: """ Return privacy budget (epsilon) expended so far. 
diff --git a/opacus/accountants/prv.py b/opacus/accountants/prv.py index ee0e0d6d..b1d09ebf 100644 --- a/opacus/accountants/prv.py +++ b/opacus/accountants/prv.py @@ -81,7 +81,12 @@ def step(self, *, noise_multiplier: float, sample_rate: float): self.history.append((noise_multiplier, sample_rate, 1)) def get_epsilon( - self, delta: float, *, eps_error: float = 0.01, delta_error: float = None + self, + delta: float, + *, + eps_error: float = 0.01, + delta_error: float = None, + **kwargs, ) -> float: """ Return privacy budget (epsilon) expended so far. diff --git a/opacus/accountants/rdp.py b/opacus/accountants/rdp.py index a5b9cb03..1697e352 100644 --- a/opacus/accountants/rdp.py +++ b/opacus/accountants/rdp.py @@ -68,7 +68,10 @@ def get_privacy_spent( return float(eps), float(best_alpha) def get_epsilon( - self, delta: float, alphas: Optional[List[Union[float, int]]] = None + self, + delta: float, + alphas: Optional[List[Union[float, int]]] = None, + **kwargs, ): """ Return privacy budget (epsilon) expended so far. From 53b3c25432bb75dd92b22131d02fcdc39c8dbe5f Mon Sep 17 00:00:00 2001 From: Enayat Ullah Date: Tue, 7 Jan 2025 10:51:24 -0800 Subject: [PATCH 08/10] Fixing Ghost Clipping with Batch Memory Manager Summary: Ghost Clipping with Batch Memory Manager had an error that resulted in major accuracy loss. The issue was in the accumulate function: the statement `p.summed_grad += p.grad` did not work as expected because `p.grad` is modified in every iteration. The fix is to store a copy of the gradient and accumulate into it in place.
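The aliasing problem can be illustrated with a small sketch. It is not the Opacus implementation: `Param` is a hypothetical stand-in whose `grad` is a plain Python list rather than a tensor, and overwriting the list in place is only an assumed simulation of the gradient buffer being reused between micro-batches.

```python
import copy


class Param:
    """Hypothetical stand-in for a parameter; `grad` is a list, not a tensor."""

    def __init__(self):
        self.grad = None
        self.summed_grad = None


def accumulate_aliased(p):
    # Old pattern: the first call stores a reference to p.grad itself.
    if p.summed_grad is not None:
        for i, g in enumerate(p.grad):
            p.summed_grad[i] += g
    else:
        p.summed_grad = p.grad


def accumulate_copied(p):
    # Fixed pattern: the first call stores an independent copy.
    if p.summed_grad is not None:
        for i, g in enumerate(p.grad):
            p.summed_grad[i] += g
    else:
        p.summed_grad = copy.deepcopy(p.grad)


def run(accumulate):
    p = Param()
    p.grad = [1.0, 2.0]
    accumulate(p)
    # Simulate the next micro-batch overwriting the gradient buffer in place.
    p.grad[0], p.grad[1] = 10.0, 20.0
    accumulate(p)
    return p.summed_grad
```

In this sketch, `run(accumulate_copied)` yields the intended sum `[11.0, 22.0]`, while `run(accumulate_aliased)` yields `[20.0, 40.0]` because the accumulator and the gradient buffer are the same object.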
Reviewed By: HuanyuZhang Differential Revision: D67778159 fbshipit-source-id: b103cf95905c0b1feb9745249ec1669c95c11979 --- opacus/optimizers/optimizer_fast_gradient_clipping.py | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/opacus/optimizers/optimizer_fast_gradient_clipping.py b/opacus/optimizers/optimizer_fast_gradient_clipping.py index aa415e33..21489779 100644 --- a/opacus/optimizers/optimizer_fast_gradient_clipping.py +++ b/opacus/optimizers/optimizer_fast_gradient_clipping.py @@ -14,6 +14,7 @@ from __future__ import annotations +import copy import logging from typing import Callable, Optional @@ -112,9 +113,9 @@ def accumulate(self): """ for p in self.params: if p.summed_grad is not None: - p.summed_grad += p.grad + p.summed_grad.add_(p.grad.data) else: - p.summed_grad = p.grad + p.summed_grad = copy.deepcopy(p.grad.data) def zero_grad(self, set_to_none: bool = False): """ From 9b7854347e1ab172ceb8e1ac1ed4118cd3a96544 Mon Sep 17 00:00:00 2001 From: Enayat Ullah Date: Thu, 9 Jan 2025 08:30:20 -0800 Subject: [PATCH 09/10] Website and Github update (#677) Summary: Pull Request resolved: https://github.com/pytorch/opacus/pull/677 Two updates: 1. Github page: Added a line that the latest version supports fast gradient and ghost clipping. 2. Website: Removed the line about passing in custom alphas in the privacy accountant in the FAQs section of website. Reviewed By: HuanyuZhang Differential Revision: D63790553 fbshipit-source-id: 566914c9c92451f3b90804cab6c560f57ba597e2 --- README.md | 2 + docs/faq.md | 3 +- tutorials/building_text_classifier.ipynb | 1716 +--------------------- website/package.json | 26 +- 4 files changed, 19 insertions(+), 1728 deletions(-) diff --git a/README.md b/README.md index 5ef7651d..a671d35c 100644 --- a/README.md +++ b/README.md @@ -10,6 +10,7 @@ [Opacus](https://opacus.ai) is a library that enables training PyTorch models with differential privacy.
It supports training with minimal code changes required on the client, has little impact on training performance, and allows the client to online track the privacy budget expended at any given moment. + ## Target audience This code release is aimed at two target audiences: 1. ML practitioners will find this to be a gentle introduction to training a model with differential privacy as it requires minimal code changes. @@ -99,6 +100,7 @@ If you want to learn more about DP-SGD and related topics, check out our series - [PriCon 2020 Tutorial: Differentially Private Model Training with Opacus](https://www.youtube.com/watch?v=MWPwofiQMdE&list=PLUNOsx6Az_ZGKQd_p4StdZRFQkCBwnaY6&index=52) - [Differential Privacy on PyTorch | PyTorch Developer Day 2020](https://www.youtube.com/watch?v=l6fbl2CBnq0) - [Opacus v1.0 Highlights | PyTorch Developer Day 2021](https://www.youtube.com/watch?v=U1mszp8lzUI) +- [Enabling Fast Gradient Clipping and Ghost Clipping in Opacus](https://pytorch.org/blog/clipping-in-opacus/) ## FAQ diff --git a/docs/faq.md b/docs/faq.md index 1de387a8..ea12bebe 100644 --- a/docs/faq.md +++ b/docs/faq.md @@ -108,7 +108,8 @@ Opacus computes and stores *per-sample* gradients under the hood. What this mean Although we report expended privacy budget using the (epsilon, delta) language, internally, we track it using Rényi Differential Privacy (RDP) [[Mironov 2017](https://arxiv.org/abs/1702.07476), [Mironov et al. 2019](https://arxiv.org/abs/1908.10530)]. In short, (alpha, epsilon)-RDP bounds the [Rényi divergence](https://en.wikipedia.org/wiki/R%C3%A9nyi_entropy#R%C3%A9nyi_divergence) of order alpha between the distribution of the mechanism’s outputs on any two datasets that differ in a single element. An (alpha, epsilon)-RDP statement is a relaxation of epsilon-DP but retains many of its important properties that make RDP particularly well-suited for privacy analysis of DP-SGD. 
The `alphas` parameter instructs the privacy engine what RDP orders to use for tracking privacy expenditure. -When the privacy engine needs to bound the privacy loss of a training run using (epsilon, delta)-DP for a given delta, it searches for the optimal order from among `alphas`. There’s very little additional cost in expanding the list of orders. We suggest using a list `[1 + x / 10.0 for x in range(1, 100)] + list(range(12, 64))`. You can pass your own alphas by passing `alphas=custom_alphas` when calling `privacy_engine.make_private_with_epsilon`. +When the privacy engine needs to bound the privacy loss of a training run using (epsilon, delta)-DP for a given delta, it searches for the optimal order from among `alphas`. There’s very little additional cost in expanding the list of orders. We suggest using a list `[1 + x / 10.0 for x in range(1, 100)] + list(range(12, 64))`. + A call to `privacy_engine.get_epsilon(delta=delta)` returns a pair: an epsilon such that the training run satisfies (epsilon, delta)-DP and an optimal order alpha. An easy diagnostic to determine whether the list of `alphas` ought to be expanded is whether the returned value alpha is one of the two boundary values of `alphas`. 
diff --git a/tutorials/building_text_classifier.ipynb b/tutorials/building_text_classifier.ipynb index 585d54d7..b4eb0d58 100644 --- a/tutorials/building_text_classifier.ipynb +++ b/tutorials/building_text_classifier.ipynb @@ -21,1721 +21,7 @@ "provenance": [], "gpuType": "T4" }, - "accelerator": "GPU", - "widgets": { - "application/vnd.jupyter.widget-state+json": { - "a6f080fa6f4b4de399af5d1d7850b960": { - "model_module": "@jupyter-widgets/controls", - "model_name": "HBoxModel", - "model_module_version": "1.5.0", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "HBoxModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "HBoxView", - "box_style": "", - "children": [ - "IPY_MODEL_47fec328e2464db3861b16e68e6cc65d", - "IPY_MODEL_3ec2b8f4e38d4b05a09c83e6925960a6", - "IPY_MODEL_6fee830ab9f545cea62081d8cf5b3240" - ], - "layout": "IPY_MODEL_cf024e76fe9b4766ac035f617391deb7" - } - }, - "47fec328e2464db3861b16e68e6cc65d": { - "model_module": "@jupyter-widgets/controls", - "model_name": "HTMLModel", - "model_module_version": "1.5.0", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "HTMLModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "HTMLView", - "description": "", - "description_tooltip": null, - "layout": "IPY_MODEL_fc0b17bfb44c45dd9825c6f1719b61cc", - "placeholder": "​", - "style": "IPY_MODEL_2dac3a8089c34d0b9ce81653bde67603", - "value": "config.json: 100%" - } - }, - "3ec2b8f4e38d4b05a09c83e6925960a6": { - "model_module": "@jupyter-widgets/controls", - "model_name": "FloatProgressModel", - "model_module_version": "1.5.0", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - 
"_model_name": "FloatProgressModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "ProgressView", - "bar_style": "success", - "description": "", - "description_tooltip": null, - "layout": "IPY_MODEL_ffb2ab66b3ca4d9899658fe58c43acb1", - "max": 570, - "min": 0, - "orientation": "horizontal", - "style": "IPY_MODEL_4b6aad944453432dbf957da380d059a1", - "value": 570 - } - }, - "6fee830ab9f545cea62081d8cf5b3240": { - "model_module": "@jupyter-widgets/controls", - "model_name": "HTMLModel", - "model_module_version": "1.5.0", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "HTMLModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "HTMLView", - "description": "", - "description_tooltip": null, - "layout": "IPY_MODEL_cf3dae1b960440d7984e9ed287b54cee", - "placeholder": "​", - "style": "IPY_MODEL_36254c0a04c840f3bf4038096c736873", - "value": " 570/570 [00:00<00:00, 33.8kB/s]" - } - }, - "cf024e76fe9b4766ac035f617391deb7": { - "model_module": "@jupyter-widgets/base", - "model_name": "LayoutModel", - "model_module_version": "1.2.0", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - 
"justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "fc0b17bfb44c45dd9825c6f1719b61cc": { - "model_module": "@jupyter-widgets/base", - "model_name": "LayoutModel", - "model_module_version": "1.2.0", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "2dac3a8089c34d0b9ce81653bde67603": { - "model_module": "@jupyter-widgets/controls", - "model_name": "DescriptionStyleModel", - "model_module_version": "1.5.0", - "state": { - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "DescriptionStyleModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - 
"_view_module_version": "1.2.0", - "_view_name": "StyleView", - "description_width": "" - } - }, - "ffb2ab66b3ca4d9899658fe58c43acb1": { - "model_module": "@jupyter-widgets/base", - "model_name": "LayoutModel", - "model_module_version": "1.2.0", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "4b6aad944453432dbf957da380d059a1": { - "model_module": "@jupyter-widgets/controls", - "model_name": "ProgressStyleModel", - "model_module_version": "1.5.0", - "state": { - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "ProgressStyleModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "StyleView", - "bar_color": null, - "description_width": "" - } - }, - "cf3dae1b960440d7984e9ed287b54cee": { - "model_module": "@jupyter-widgets/base", - "model_name": "LayoutModel", - "model_module_version": "1.2.0", - "state": { - 
"_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "36254c0a04c840f3bf4038096c736873": { - "model_module": "@jupyter-widgets/controls", - "model_name": "DescriptionStyleModel", - "model_module_version": "1.5.0", - "state": { - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "DescriptionStyleModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "StyleView", - "description_width": "" - } - }, - "e3ffd50ee822433fabd9c1ee4a39612e": { - "model_module": "@jupyter-widgets/controls", - "model_name": "HBoxModel", - "model_module_version": "1.5.0", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "HBoxModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "HBoxView", - "box_style": "", 
- "children": [ - "IPY_MODEL_f248160605a2450a8411e4f5d58a5cfa", - "IPY_MODEL_a7b6e7aa521647649bb4157b6504d4e8", - "IPY_MODEL_1c1e86bca0534caaa7ad435fd7e67bf2" - ], - "layout": "IPY_MODEL_22ca9e6c6c1f4bc6b0f3db1a09a5e562" - } - }, - "f248160605a2450a8411e4f5d58a5cfa": { - "model_module": "@jupyter-widgets/controls", - "model_name": "HTMLModel", - "model_module_version": "1.5.0", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "HTMLModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "HTMLView", - "description": "", - "description_tooltip": null, - "layout": "IPY_MODEL_e9699698559c4860bdf6a312c492e7da", - "placeholder": "​", - "style": "IPY_MODEL_874e45fe39844927ad1fd10d4899a428", - "value": "tokenizer_config.json: 100%" - } - }, - "a7b6e7aa521647649bb4157b6504d4e8": { - "model_module": "@jupyter-widgets/controls", - "model_name": "FloatProgressModel", - "model_module_version": "1.5.0", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "FloatProgressModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "ProgressView", - "bar_style": "success", - "description": "", - "description_tooltip": null, - "layout": "IPY_MODEL_2dca6b2477344b45b1c17f124e27ce72", - "max": 49, - "min": 0, - "orientation": "horizontal", - "style": "IPY_MODEL_c7afefe6f907441b9e466605cb4f5c7f", - "value": 49 - } - }, - "1c1e86bca0534caaa7ad435fd7e67bf2": { - "model_module": "@jupyter-widgets/controls", - "model_name": "HTMLModel", - "model_module_version": "1.5.0", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "HTMLModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", 
- "_view_module_version": "1.5.0", - "_view_name": "HTMLView", - "description": "", - "description_tooltip": null, - "layout": "IPY_MODEL_df3c4c12e06245b1a8f4a4a7d71a530c", - "placeholder": "​", - "style": "IPY_MODEL_e63eb3e5c06140249f6d8d4c04fe8693", - "value": " 49.0/49.0 [00:00<00:00, 2.50kB/s]" - } - }, - "22ca9e6c6c1f4bc6b0f3db1a09a5e562": { - "model_module": "@jupyter-widgets/base", - "model_name": "LayoutModel", - "model_module_version": "1.2.0", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "e9699698559c4860bdf6a312c492e7da": { - "model_module": "@jupyter-widgets/base", - "model_name": "LayoutModel", - "model_module_version": "1.2.0", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": 
null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "874e45fe39844927ad1fd10d4899a428": { - "model_module": "@jupyter-widgets/controls", - "model_name": "DescriptionStyleModel", - "model_module_version": "1.5.0", - "state": { - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "DescriptionStyleModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "StyleView", - "description_width": "" - } - }, - "2dca6b2477344b45b1c17f124e27ce72": { - "model_module": "@jupyter-widgets/base", - "model_name": "LayoutModel", - "model_module_version": "1.2.0", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - 
"grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "c7afefe6f907441b9e466605cb4f5c7f": { - "model_module": "@jupyter-widgets/controls", - "model_name": "ProgressStyleModel", - "model_module_version": "1.5.0", - "state": { - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "ProgressStyleModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "StyleView", - "bar_color": null, - "description_width": "" - } - }, - "df3c4c12e06245b1a8f4a4a7d71a530c": { - "model_module": "@jupyter-widgets/base", - "model_name": "LayoutModel", - "model_module_version": "1.2.0", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - 
"min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "e63eb3e5c06140249f6d8d4c04fe8693": { - "model_module": "@jupyter-widgets/controls", - "model_name": "DescriptionStyleModel", - "model_module_version": "1.5.0", - "state": { - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "DescriptionStyleModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "StyleView", - "description_width": "" - } - }, - "d951c3592058414ab00cf754e9b70685": { - "model_module": "@jupyter-widgets/controls", - "model_name": "HBoxModel", - "model_module_version": "1.5.0", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "HBoxModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "HBoxView", - "box_style": "", - "children": [ - "IPY_MODEL_c3f9146e082346a3bed274efb2265376", - "IPY_MODEL_1602c2298e9443f78007fdbf101a0c2b", - "IPY_MODEL_d5dda55bd4de4f12bf3718fb386c5bf9" - ], - "layout": "IPY_MODEL_943e61866ed74be4b10ba383450cb4c3" - } - }, - "c3f9146e082346a3bed274efb2265376": { - "model_module": "@jupyter-widgets/controls", - "model_name": "HTMLModel", - "model_module_version": "1.5.0", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "HTMLModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "HTMLView", - "description": "", - "description_tooltip": null, - "layout": "IPY_MODEL_0727d77eaf28466c93c2c6021661ac9a", - "placeholder": "​", - "style": 
"IPY_MODEL_3ab2e1a9ba694463ab5f3ec78ad0a8f4", - "value": "vocab.txt: 100%" - } - }, - "1602c2298e9443f78007fdbf101a0c2b": { - "model_module": "@jupyter-widgets/controls", - "model_name": "FloatProgressModel", - "model_module_version": "1.5.0", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "FloatProgressModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "ProgressView", - "bar_style": "success", - "description": "", - "description_tooltip": null, - "layout": "IPY_MODEL_30b4db204fa644128198abf6d82664bf", - "max": 213450, - "min": 0, - "orientation": "horizontal", - "style": "IPY_MODEL_0517fdac88784fe6b51ad1f989f99cb7", - "value": 213450 - } - }, - "d5dda55bd4de4f12bf3718fb386c5bf9": { - "model_module": "@jupyter-widgets/controls", - "model_name": "HTMLModel", - "model_module_version": "1.5.0", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "HTMLModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "HTMLView", - "description": "", - "description_tooltip": null, - "layout": "IPY_MODEL_ce1c55bc51dc49a9b261f104f49d38d8", - "placeholder": "​", - "style": "IPY_MODEL_4b7e11bb32bc43c9ad0449bd39bc4d40", - "value": " 213k/213k [00:00<00:00, 613kB/s]" - } - }, - "943e61866ed74be4b10ba383450cb4c3": { - "model_module": "@jupyter-widgets/base", - "model_name": "LayoutModel", - "model_module_version": "1.2.0", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - 
"bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "0727d77eaf28466c93c2c6021661ac9a": { - "model_module": "@jupyter-widgets/base", - "model_name": "LayoutModel", - "model_module_version": "1.2.0", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - 
"3ab2e1a9ba694463ab5f3ec78ad0a8f4": { - "model_module": "@jupyter-widgets/controls", - "model_name": "DescriptionStyleModel", - "model_module_version": "1.5.0", - "state": { - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "DescriptionStyleModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "StyleView", - "description_width": "" - } - }, - "30b4db204fa644128198abf6d82664bf": { - "model_module": "@jupyter-widgets/base", - "model_name": "LayoutModel", - "model_module_version": "1.2.0", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "0517fdac88784fe6b51ad1f989f99cb7": { - "model_module": "@jupyter-widgets/controls", - "model_name": "ProgressStyleModel", - "model_module_version": "1.5.0", - "state": { - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": 
"ProgressStyleModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "StyleView", - "bar_color": null, - "description_width": "" - } - }, - "ce1c55bc51dc49a9b261f104f49d38d8": { - "model_module": "@jupyter-widgets/base", - "model_name": "LayoutModel", - "model_module_version": "1.2.0", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "4b7e11bb32bc43c9ad0449bd39bc4d40": { - "model_module": "@jupyter-widgets/controls", - "model_name": "DescriptionStyleModel", - "model_module_version": "1.5.0", - "state": { - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "DescriptionStyleModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "StyleView", - "description_width": "" - } - }, - "3dbf36a5c0884579ab2f36c2e91c04fb": { - "model_module": 
"@jupyter-widgets/controls", - "model_name": "HBoxModel", - "model_module_version": "1.5.0", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "HBoxModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "HBoxView", - "box_style": "", - "children": [ - "IPY_MODEL_d71c49bd9f8c438898250c3874c06240", - "IPY_MODEL_1b70fa16b803466ea31649dcd644e3d7", - "IPY_MODEL_7a728bea623646c182c508e34b582fc9" - ], - "layout": "IPY_MODEL_8494caffe83a4743a11d2751b38c56bb" - } - }, - "d71c49bd9f8c438898250c3874c06240": { - "model_module": "@jupyter-widgets/controls", - "model_name": "HTMLModel", - "model_module_version": "1.5.0", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "HTMLModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "HTMLView", - "description": "", - "description_tooltip": null, - "layout": "IPY_MODEL_1580380472df40e38cbde67659c5221d", - "placeholder": "​", - "style": "IPY_MODEL_349b262479b9418badd6a3acff386dd2", - "value": "tokenizer.json: 100%" - } - }, - "1b70fa16b803466ea31649dcd644e3d7": { - "model_module": "@jupyter-widgets/controls", - "model_name": "FloatProgressModel", - "model_module_version": "1.5.0", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "FloatProgressModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "ProgressView", - "bar_style": "success", - "description": "", - "description_tooltip": null, - "layout": "IPY_MODEL_3370a9d70dd04d5195bc3f1f81b18728", - "max": 435797, - "min": 0, - "orientation": "horizontal", - "style": "IPY_MODEL_ee235d5ffc5142c895175cbea5c94dfe", - 
"value": 435797 - } - }, - "7a728bea623646c182c508e34b582fc9": { - "model_module": "@jupyter-widgets/controls", - "model_name": "HTMLModel", - "model_module_version": "1.5.0", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "HTMLModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "HTMLView", - "description": "", - "description_tooltip": null, - "layout": "IPY_MODEL_f6c5ef333a2b45f1bf196fdd58873688", - "placeholder": "​", - "style": "IPY_MODEL_e24ff9dae78241d9b5a6a7199c888e45", - "value": " 436k/436k [00:00<00:00, 1.24MB/s]" - } - }, - "8494caffe83a4743a11d2751b38c56bb": { - "model_module": "@jupyter-widgets/base", - "model_name": "LayoutModel", - "model_module_version": "1.2.0", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "1580380472df40e38cbde67659c5221d": { - 
"model_module": "@jupyter-widgets/base", - "model_name": "LayoutModel", - "model_module_version": "1.2.0", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "349b262479b9418badd6a3acff386dd2": { - "model_module": "@jupyter-widgets/controls", - "model_name": "DescriptionStyleModel", - "model_module_version": "1.5.0", - "state": { - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "DescriptionStyleModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "StyleView", - "description_width": "" - } - }, - "3370a9d70dd04d5195bc3f1f81b18728": { - "model_module": "@jupyter-widgets/base", - "model_name": "LayoutModel", - "model_module_version": "1.2.0", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": 
"@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "ee235d5ffc5142c895175cbea5c94dfe": { - "model_module": "@jupyter-widgets/controls", - "model_name": "ProgressStyleModel", - "model_module_version": "1.5.0", - "state": { - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "ProgressStyleModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "StyleView", - "bar_color": null, - "description_width": "" - } - }, - "f6c5ef333a2b45f1bf196fdd58873688": { - "model_module": "@jupyter-widgets/base", - "model_name": "LayoutModel", - "model_module_version": "1.2.0", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, 
- "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "e24ff9dae78241d9b5a6a7199c888e45": { - "model_module": "@jupyter-widgets/controls", - "model_name": "DescriptionStyleModel", - "model_module_version": "1.5.0", - "state": { - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "DescriptionStyleModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "StyleView", - "description_width": "" - } - }, - "d4a768f261614ac69b3004fbf2323c89": { - "model_module": "@jupyter-widgets/controls", - "model_name": "HBoxModel", - "model_module_version": "1.5.0", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "HBoxModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "HBoxView", - "box_style": "", - "children": [ - "IPY_MODEL_81a7d1b27cd94916ac3c330aa2551cf0", - "IPY_MODEL_eac4d3f8e59a4d4c81178cc76600182f", - "IPY_MODEL_5bcca2ee852144c28bfd40de0978cadc" - ], - "layout": "IPY_MODEL_05fa1e761bc14c929b19abf0d8a93f5f" - } - }, - "81a7d1b27cd94916ac3c330aa2551cf0": { - "model_module": "@jupyter-widgets/controls", - "model_name": "HTMLModel", - "model_module_version": "1.5.0", - "state": { - "_dom_classes": [], - 
"_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "HTMLModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "HTMLView", - "description": "", - "description_tooltip": null, - "layout": "IPY_MODEL_32ba8daa0c6e4a9c9f62df588a198b1b", - "placeholder": "​", - "style": "IPY_MODEL_e1e93f905b494126a6a9e1e0a2f92022", - "value": "model.safetensors: 100%" - } - }, - "eac4d3f8e59a4d4c81178cc76600182f": { - "model_module": "@jupyter-widgets/controls", - "model_name": "FloatProgressModel", - "model_module_version": "1.5.0", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "FloatProgressModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "ProgressView", - "bar_style": "success", - "description": "", - "description_tooltip": null, - "layout": "IPY_MODEL_9ca2cb3f116547d3bb062f4da11762d7", - "max": 435755784, - "min": 0, - "orientation": "horizontal", - "style": "IPY_MODEL_554184f1c9b44bd3a8116773572347eb", - "value": 435755784 - } - }, - "5bcca2ee852144c28bfd40de0978cadc": { - "model_module": "@jupyter-widgets/controls", - "model_name": "HTMLModel", - "model_module_version": "1.5.0", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "HTMLModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "HTMLView", - "description": "", - "description_tooltip": null, - "layout": "IPY_MODEL_bb89b007667e4bf0b696ad84dfb2f91d", - "placeholder": "​", - "style": "IPY_MODEL_610f073056924398b9229978dff5ff4d", - "value": " 436M/436M [00:02<00:00, 177MB/s]" - } - }, - "05fa1e761bc14c929b19abf0d8a93f5f": { - "model_module": "@jupyter-widgets/base", - "model_name": 
"LayoutModel", - "model_module_version": "1.2.0", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "32ba8daa0c6e4a9c9f62df588a198b1b": { - "model_module": "@jupyter-widgets/base", - "model_name": "LayoutModel", - "model_module_version": "1.2.0", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - 
"justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "e1e93f905b494126a6a9e1e0a2f92022": { - "model_module": "@jupyter-widgets/controls", - "model_name": "DescriptionStyleModel", - "model_module_version": "1.5.0", - "state": { - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "DescriptionStyleModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "StyleView", - "description_width": "" - } - }, - "9ca2cb3f116547d3bb062f4da11762d7": { - "model_module": "@jupyter-widgets/base", - "model_name": "LayoutModel", - "model_module_version": "1.2.0", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - 
"overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "554184f1c9b44bd3a8116773572347eb": { - "model_module": "@jupyter-widgets/controls", - "model_name": "ProgressStyleModel", - "model_module_version": "1.5.0", - "state": { - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "ProgressStyleModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "StyleView", - "bar_color": null, - "description_width": "" - } - }, - "bb89b007667e4bf0b696ad84dfb2f91d": { - "model_module": "@jupyter-widgets/base", - "model_name": "LayoutModel", - "model_module_version": "1.2.0", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "610f073056924398b9229978dff5ff4d": { - "model_module": "@jupyter-widgets/controls", - "model_name": "DescriptionStyleModel", - "model_module_version": 
"1.5.0", - "state": { - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "DescriptionStyleModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "StyleView", - "description_width": "" - } - } - } - } + "accelerator": "GPU" }, "nbformat": 4, "nbformat_minor": 0, diff --git a/website/package.json b/website/package.json index 9dd4fb7d..b1b47913 100644 --- a/website/package.json +++ b/website/package.json @@ -9,21 +9,23 @@ "rename-version": "docusaurus-rename-version" }, "devDependencies": { - "docusaurus": "^1.9.0" + "docusaurus": "^1.14.7" }, "dependencies": { - "prismjs": "^1.23.0", - "bl": "^1.2.3" + "@babel/helper-compilation-targets": "^8.0.0-alpha.14", + "bl": "^5.0.0", + "browserslist": "^4.21.4", + "prismjs": "^1.29.0" }, "resolutions": { - "trim-newlines": "3.0.1", - "normalize-url": "4.5.1", - "highlight.js" : "10.5.0", - "react-dev-utils": "11.0.4", - "immer": "8.0.1", - "prismjs": "1.23.0", - "bl": "1.2.3", - "glob-parent": "5.1.2", - "browserslist": "4.16.5" + "trim-newlines": "^4.0.2", + "normalize-url": "^6.1.0", + "highlight.js": "^11.8.0", + "react-dev-utils": "^12.0.0", + "immer": "^10.0.0", + "prismjs": "^1.29.0", + "bl": "^5.0.0", + "glob-parent": "^6.0.2", + "browserslist": "^4.21.4" } } From 3934851902599ad915981055506fecf57cf298dd Mon Sep 17 00:00:00 2001 From: Enayat Ullah Date: Thu, 9 Jan 2025 18:04:17 -0800 Subject: [PATCH 10/10] Generate the status badge on Github using Github Actions (#712) Summary: Pull Request resolved: https://github.com/pytorch/opacus/pull/712 Since CircleCI is disabled, we now use tests from Github Actions to indicate its status on the Github main page. 
Reviewed By: iden-kalemaj Differential Revision: D67990187 fbshipit-source-id: 058268cad1cafd3abf319a60229c1ee24752f033 --- README.md | 71 +++++++++++++++++++++++++++++++++++++++---------------- 1 file changed, 50 insertions(+), 21 deletions(-) diff --git a/README.md b/README.md index a671d35c..c46147a6 100644 --- a/README.md +++ b/README.md @@ -2,32 +2,43 @@
-[![CircleCI](https://dl.circleci.com/status-badge/img/gh/pytorch/opacus/tree/main.svg?style=svg)](https://dl.circleci.com/status-badge/redirect/gh/pytorch/opacus/tree/main) +[![GitHub Actions](https://github.com/pytorch/opacus/actions/workflows/ci_cpu.yml/badge.svg)](https://github.com/pytorch/opacus/actions/workflows/ci_cpu.yml) [![Coverage Status](https://coveralls.io/repos/github/pytorch/opacus/badge.svg?branch=main)](https://coveralls.io/github/pytorch/opacus?branch=main) [![PRs Welcome](https://img.shields.io/badge/PRs-welcome-brightgreen.svg)](CONTRIBUTING.md) [![License](https://img.shields.io/badge/license-apache2-green.svg)](LICENSE) -[Opacus](https://opacus.ai) is a library that enables training PyTorch models with differential privacy. -It supports training with minimal code changes required on the client, has little impact on training performance, and allows the client to online track the privacy budget expended at any given moment. +[Opacus](https://opacus.ai) is a library that enables training PyTorch models +with differential privacy. It supports training with minimal code changes +required on the client, has little impact on training performance, and allows +the client to online track the privacy budget expended at any given moment. ## Target audience + This code release is aimed at two target audiences: -1. ML practitioners will find this to be a gentle introduction to training a model with differential privacy as it requires minimal code changes. -2. Differential Privacy researchers will find this easy to experiment and tinker with, allowing them to focus on what matters. +1. ML practitioners will find this to be a gentle introduction to training a + model with differential privacy as it requires minimal code changes. +2. Differential Privacy researchers will find this easy to experiment and tinker + with, allowing them to focus on what matters. 
## Installation + The latest release of Opacus can be installed via `pip`: + ```bash pip install opacus ``` + OR, alternatively, via `conda`: + ```bash conda install -c conda-forge opacus ``` -You can also install directly from the source for the latest features (along with its quirks and potentially occasional bugs): +You can also install directly from the source for the latest features (along +with its quirks and potentially occasional bugs): + ```bash git clone https://github.com/pytorch/opacus.git cd opacus @@ -35,7 +46,10 @@ pip install -e . ``` ## Getting started -To train your model with differential privacy, all you need to do is to instantiate a `PrivacyEngine` and pass your model, data_loader, and optimizer to the engine's `make_private()` method to obtain their private counterparts. + +To train your model with differential privacy, all you need to do is to +instantiate a `PrivacyEngine` and pass your model, data_loader, and optimizer to +the engine's `make_private()` method to obtain their private counterparts. ```python # define your components as usual @@ -55,21 +69,25 @@ model, optimizer, data_loader = privacy_engine.make_private( # Now it's business as usual ``` -The [MNIST example](https://github.com/pytorch/opacus/tree/main/examples/mnist.py) shows an end-to-end run using Opacus. The [examples](https://github.com/pytorch/opacus/tree/main/examples/) folder contains more such examples. +The +[MNIST example](https://github.com/pytorch/opacus/tree/main/examples/mnist.py) +shows an end-to-end run using Opacus. The +[examples](https://github.com/pytorch/opacus/tree/main/examples/) folder +contains more such examples. ### Migrating to 1.0 -Opacus 1.0 introduced many improvements to the library, but also some breaking changes. 
-If you've been using Opacus 0.x and want to update to the latest release, -please use this [Migration Guide](https://github.com/pytorch/opacus/blob/main/Migration_Guide.md) - +Opacus 1.0 introduced many improvements to the library, but also some breaking +changes. If you've been using Opacus 0.x and want to update to the latest +release, please use this +[Migration Guide](https://github.com/pytorch/opacus/blob/main/Migration_Guide.md) ## Learn more ### Interactive tutorials -We've built a series of IPython-based tutorials as a gentle introduction to training models -with privacy and using various Opacus features. +We've built a series of IPython-based tutorials as a gentle introduction to +training models with privacy and using various Opacus features. - [Building an Image Classifier with Differential Privacy](https://github.com/pytorch/opacus/blob/main/tutorials/building_image_classifier.ipynb) - [Training a differentially private LSTM model for name classification](https://github.com/pytorch/opacus/blob/main/tutorials/building_lstm_name_classifier.ipynb) @@ -79,9 +97,13 @@ with privacy and using various Opacus features. - [Opacus Guide: Module Validator and Fixer](https://github.com/pytorch/opacus/blob/main/tutorials/guide_to_module_validator.ipynb) ## Technical report and citation -The technical report introducing Opacus, presenting its design principles, mathematical foundations, and benchmarks can be found [here](https://arxiv.org/abs/2109.12298). + +The technical report introducing Opacus, presenting its design principles, +mathematical foundations, and benchmarks can be found +[here](https://arxiv.org/abs/2109.12298). 
Consider citing the report if you use Opacus in your papers, as follows: + ``` @article{opacus, title={Opacus: {U}ser-Friendly Differential Privacy Library in {PyTorch}}, @@ -93,7 +115,8 @@ Consider citing the report if you use Opacus in your papers, as follows: ### Blogposts and talks -If you want to learn more about DP-SGD and related topics, check out our series of blogposts and talks: +If you want to learn more about DP-SGD and related topics, check out our series +of blogposts and talks: - [Differential Privacy Series Part 1 | DP-SGD Algorithm Explained](https://medium.com/pytorch/differential-privacy-series-part-1-dp-sgd-algorithm-explained-12512c3959a3) - [Differential Privacy Series Part 2 | Efficient Per-Sample Gradient Computation in Opacus](https://medium.com/pytorch/differential-privacy-series-part-2-efficient-per-sample-gradient-computation-in-opacus-5bf4031d9e22) @@ -102,13 +125,19 @@ If you want to learn more about DP-SGD and related topics, check out our series - [Opacus v1.0 Highlights | PyTorch Developer Day 2021](https://www.youtube.com/watch?v=U1mszp8lzUI) - [Enabling Fast Gradient Clipping and Ghost Clipping in Opacus](https://pytorch.org/blog/clipping-in-opacus/) - ## FAQ -Check out the [FAQ](https://opacus.ai/docs/faq) page for answers to some of the most frequently asked questions about differential privacy and Opacus. + +Check out the [FAQ](https://opacus.ai/docs/faq) page for answers to some of the +most frequently asked questions about differential privacy and Opacus. ## Contributing -See the [CONTRIBUTING](https://github.com/pytorch/opacus/tree/main/CONTRIBUTING.md) file for how to help out. -Do also check out the README files inside the repo to learn how the code is organized. + +See the +[CONTRIBUTING](https://github.com/pytorch/opacus/tree/main/CONTRIBUTING.md) file +for how to help out. Do also check out the README files inside the repo to learn +how the code is organized. 
## License -This code is released under Apache 2.0, as found in the [LICENSE](https://github.com/pytorch/opacus/tree/main/LICENSE) file. + +This code is released under Apache 2.0, as found in the +[LICENSE](https://github.com/pytorch/opacus/tree/main/LICENSE) file.