docs: CLIP benchmark on zeroshot classification and retrieval tasks
ZiniuYu committed Sep 27, 2022
1 parent 2ba8a4f commit 7e379ad
Showing 2 changed files with 71 additions and 1 deletion.
70 changes: 70 additions & 0 deletions docs/user-guides/benchmark.md
@@ -0,0 +1,70 @@
# CLIP Benchmark

## Basic statistics

We report the disk usage and the peak RAM and VRAM usage of each model as deltas, i.e. the increase over the baseline, measured while running a series of text and image encoding tasks with `batch_size=8` on the PyTorch runtime, using a single NVIDIA TITAN RTX GPU (24 GB VRAM).

| Model | Disk Usage (MB) | Peak RAM Usage (GB) | Peak VRAM Usage (GB) |
|---------------------------------------|-----------------|---------------------|----------------------|
| RN50::openai | 244 | 2.99 | 1.36 |
| RN50::yfcc15m | 389 | 2.86 | 1.36 |
| RN50::cc12m | 389 | 2.84 | 1.36 |
| RN101::openai | 278 | 3.05 | 1.40 |
| RN101::yfcc15m | 457 | 2.88 | 1.40 |
| RN50x4::openai | 402 | 3.23 | 1.63 |
| RN50x16::openai | 631 | 3.63 | 2.02 |
| RN50x64::openai | 1291 | 4.08 | 2.98 |
| ViT-B-32::openai | 338 | 3.20 | 1.40 |
| ViT-B-32::laion400m_e31 | 577 | 2.93 | 1.40 |
| ViT-B-32::laion400m_e32 | 577 | 2.94 | 1.40 |
| ViT-B-32::laion2b_e16 | 577 | 2.93 | 1.40 |
| ViT-B-32::laion2B-s34B-b79K | 577 | 2.94 | 1.40 |
| ViT-B-16::openai | 335 | 3.20 | 1.44 |
| ViT-B-16::laion400m_e31 | 571 | 2.93 | 1.44 |
| ViT-B-16::laion400m_e32 | 571 | 2.94 | 1.44 |
| ViT-B-16-plus-240::laion400m_e31 | 795 | 3.03 | 1.59 |
| ViT-B-16-plus-240::laion400m_e32 | 795 | 3.03 | 1.59 |
| ViT-L-14::openai | 890 | 3.66 | 2.04 |
| ViT-L-14::laion400m_e31 | 1631 | 3.43 | 2.03 |
| ViT-L-14::laion400m_e32 | 1631 | 3.42 | 2.03 |
| ViT-L-14::laion2B-s32B-b82K | 1631 | 3.43 | 2.03 |
| ViT-L-14-336::openai | 891 | 3.74 | 2.23 |
| ViT-H-14::laion2B-s32B-b79K | 3762 | 4.45 | 3.26 |
| ViT-g-14::laion2B-s12B-b42K | 5214 | 5.16 | 4.00 |
| M-CLIP/LABSE-Vit-L-14 | 3609 | 4.30 | 4.70 |
| M-CLIP/XLM-Roberta-Large-Vit-B-32 | 4284 | 5.37 | 1.68 |
| M-CLIP/XLM-Roberta-Large-Vit-B-16Plus | 4293 | 4.30 | 4.13 |
| M-CLIP/XLM-Roberta-Large-Vit-L-14 | 4293 | 4.30 | 4.97 |
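
To illustrate what a peak usage "in delta" means, here is a minimal sketch that records the peak increase in memory while a task runs. It is not the script used to produce the table above: `peak_ram_delta_mb` is a hypothetical helper, and it relies on Python's `tracemalloc`, which only sees allocations made through the Python allocator. Measuring process RAM or GPU memory for a real model would need a different probe (for VRAM, PyTorch's `torch.cuda.max_memory_allocated()` is one candidate).

```python
import tracemalloc


def peak_ram_delta_mb(task):
    """Run task() and return the peak increase in traced memory, in MB.

    Hypothetical helper for illustration only: tracemalloc tracks memory
    allocated by the Python allocator, not native or GPU allocations.
    """
    tracemalloc.start()
    # Usage right after tracing starts serves as the baseline.
    baseline, _ = tracemalloc.get_traced_memory()
    try:
        task()
        # The second value is the peak reached since tracing started.
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return (peak - baseline) / 1e6


if __name__ == "__main__":
    # Stand-in workload: allocate a 10 MB buffer instead of encoding a batch.
    delta = peak_ram_delta_mb(lambda: bytearray(10_000_000))
    print(f"peak RAM delta: {delta:.2f} MB")
```

The same subtract-the-baseline pattern applies to disk usage: compare the size of the model cache directory before and after the download.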


````{dropdown} Zero-shot Retrieval: MS COCO Captions
| model_fullname | image_retrieval_recall@5 | text_retrieval_recall@5 |
|----------------------------------|--------------------------|-------------------------|
| RN50::openai | 0.5291883349 | 0.7282000184 |
| RN50::yfcc15m | 0.3610555828 | 0.5338000059 |
| RN50::cc12m | 0.4464214444 | 0.6065999866 |
| RN101::openai | 0.5550180078 | 0.7447999716 |
| RN101::yfcc15m | 0.3760095835 | 0.5490000248 |
| RN50x4::openai | 0.5814074278 | 0.7670000196 |
| RN50x16::openai | 0.6001599431 | 0.7868000269 |
| RN50x64::openai | 0.5992003083 | 0.8033999801 |
| ViT-B-32::openai | 0.5596161485 | 0.7491999865 |
| ViT-B-32::laion400m_e31 | 0.600039959 | 0.7630000114 |
| ViT-B-32::laion400m_e32 | 0.6000000238 | 0.7645999789 |
| ViT-B-32::laion2b_e16 | 0.6468212605 | 0.7950000167 |
| ViT-B-32::laion2b_s34b_b79k | 0.6540184021 | 0.7983999848 |
| ViT-B-16::openai | 0.5842063427 | 0.7671999931 |
| ViT-B-16::laion400m_e31 | 0.6368252635 | 0.7961999774 |
| ViT-B-16::laion400m_e32 | 0.6363854408 | 0.7964000106 |
| ViT-B-16-plus-240::laion400m_e31 | 0.6604158282 | 0.8090000153 |
| ViT-B-16-plus-240::laion400m_e32 | 0.6618952155 | 0.8108000159 |
| ViT-L-14::openai | 0.610355854 | 0.793200016 |
| ViT-L-14::laion400m_e31 | 0.679688096 | 0.82099998 |
| ViT-L-14::laion400m_e32 | 0.6801279783 | 0.8212000132 |
| ViT-L-14::laion2b_s32b_b82k | 0.7109556198 | 0.8399999738 |
| ViT-L-14-336::openai | 0.6162734628 | 0.8123999834 |
| ViT-H-14::laion2b_s32b_b79k | 0.7339064479 | 0.8605999947 |
| ViT-g-14::laion2b_s12b_b42k | 0.7235905528 | 0.853399992 |
````
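
For readers unfamiliar with the `recall@5` columns, the sketch below shows one common definition of recall@k for retrieval: the fraction of queries for which at least one relevant item appears among the k highest-scoring candidates. It is an illustration under that assumption, not the evaluation code behind the numbers above; the function name `recall_at_k` and the toy similarity matrix are made up for the example.

```python
def recall_at_k(scores, relevant, k=5):
    """Fraction of queries with at least one relevant item in the top k.

    scores:   one list of similarity scores per query (one score per candidate)
    relevant: one set of relevant candidate indices per query
    """
    hits = 0
    for row, positives in zip(scores, relevant):
        # Candidate indices sorted by descending similarity, truncated to k.
        top_k = sorted(range(len(row)), key=lambda j: row[j], reverse=True)[:k]
        if positives.intersection(top_k):
            hits += 1
    return hits / len(scores)


# Toy data: 3 queries, 3 candidates; query i is paired with candidate i.
scores = [
    [0.9, 0.1, 0.3],
    [0.2, 0.4, 0.8],
    [0.5, 0.6, 0.1],
]
relevant = [{0}, {1}, {2}]

for k in (1, 2, 3):
    print(k, recall_at_k(scores, relevant, k=k))
```

For image retrieval the queries are captions and the candidates are images; for text retrieval the roles are swapped, and an image may have several relevant captions, which is why `relevant` holds a set per query.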
2 changes: 1 addition & 1 deletion docs/user-guides/server.md
@@ -76,9 +76,9 @@ Please also note that **different models give different sizes of output dimensio
| RN50x16::openai |||| 768 | 631 | 3.63 | 2.02 |
| RN50x64::openai |||| 1024 | 1291 | 4.08 | 2.98 |
| ViT-B-32::openai |||| 512 | 338 | 3.20 | 1.40 |
| ViT-B-32::laion2b_e16 |||| 512 | 577 | 2.93 | 1.40 |
| ViT-B-32::laion400m_e31 |||| 512 | 577 | 2.93 | 1.40 |
| ViT-B-32::laion400m_e32 |||| 512 | 577 | 2.94 | 1.40 |
| ViT-B-32::laion2b_e16 |||| 512 | 577 | 2.93 | 1.40 |
| ViT-B-32::laion2B-s34B-b79K |||| 512 | 577 | 2.94 | 1.40 |
| ViT-B-16::openai |||| 512 | 335 | 3.20 | 1.44 |
| ViT-B-16::laion400m_e31 |||| 512 | 571 | 2.93 | 1.44 |