[New Features] Multi-modal Jailbreaking Attack on LLaVA #587
Merged
Changes from 10 commits
62 commits
e4d5a8c add empty classes to garak.detectors.base (DavidLee528)
65d3ee8 add new empty generator llava (DavidLee528)
e3c94ed git ignore (DavidLee528)
749fb40 add todo item (DavidLee528)
6f9130f Merge branch 'leondz:main' into llava_dev (DavidLee528)
a653721 comment test code, connect with probe (DavidLee528)
cdb3a91 add multi-modal probe visual_jailbreak (DavidLee528)
3f30976 Merge branch 'leondz:main' into llava_dev (DavidLee528)
5993e49 set max_tokens for LLaVA generator (DavidLee528)
572b9b5 Merge branch 'llava_dev' of https://github.com/DavidLee528/garak into… (DavidLee528)
f244bd5 add detector visual_jailbreak (DavidLee528)
21e2bcd remove redundant code (DavidLee528)
c2a597a remove redundant line in gitignore file (DavidLee528)
7897360 modify comment on visual_jailbreak detector (DavidLee528)
adbf7bd remove proxy setting on llava generator (DavidLee528)
bac1c6b remove unused comments on llava generator (DavidLee528)
a8721ab change storage location of visual_jailbreak_0.jpg (DavidLee528)
c58824d append new empty line as EOF markers for all changed files (DavidLee528)
fd185ec Merge branch 'leondz:main' into llava_dev (DavidLee528)
1a03b75 Merge branch 'leondz:main' into llava_dev (DavidLee528)
a0d55df migrate generator LLaVA from garak/generators/llava.py to garak/gener… (DavidLee528)
dd4e732 add temporary code, need remove when revision is done (DavidLee528)
3ea7927 add modality attribute to base classes of generator, probe, detector (DavidLee528)
f933211 modify default modality attribute of generator LLaVA (DavidLee528)
724c9ad modify default modality attribute of probe VisualJailbreak (DavidLee528)
298ff37 optimize prompts data structure of VisualJailbreak (DavidLee528)
a90ff7b adopt data structure update of probe and add error handle logic of ga… (DavidLee528)
0ffdca0 remove hard coded model name, support a list of llava (DavidLee528)
8bd4dd1 add cuda availability check before invoke (DavidLee528)
8b9619b remove redundant lines (DavidLee528)
4990b4e add dynamic max_new_tokens calculation based on the <4K golden rule (DavidLee528)
36425eb Update garak/generators/llava.py (DavidLee528)
cf25f41 Update garak/probes/visual_jailbreak.py (DavidLee528)
d8e34e0 Merge branch 'llava_dev' of https://github.com/DavidLee528/garak into… (DavidLee528)
5749588 remove temporary proxy setting (DavidLee528)
aa315a3 convert image resource path from relative to absolute (DavidLee528)
edc1712 add check of argument class type (DavidLee528)
8ce4841 Merge branch 'leondz:main' into llava_dev (DavidLee528)
6745f0b Merge branch 'leondz:main' into llava_dev (DavidLee528)
a9e9dc0 temp commit (recovery after all done) (DavidLee528)
1821833 Merge branch 'llava_dev' of https://github.com/DavidLee528/garak into… (DavidLee528)
d7521e5 Update garak/detectors/visual_jailbreak.py (DavidLee528)
2a632f8 add dataset SafeBench for FigStep visual jailbreaking attack (DavidLee528)
10e68a3 expand prompts size from 1 to 500 for visual jailbreak (DavidLee528)
65f94cc Change the class name from VisualJailbreak to FigStep (DavidLee528)
869735b Simplify the FigStep detector to StringDetector (DavidLee528)
d149268 Update garak/__main__.py (DavidLee528)
5d671d9 Update garak/generators/huggingface.py (DavidLee528)
0c943ed Update garak/generators/huggingface.py (DavidLee528)
789fb4a Update garak/generators/huggingface.py (DavidLee528)
ae38ed8 Update garak/generators/huggingface.py (DavidLee528)
cb8d917 Update garak/generators/huggingface.py (DavidLee528)
296f6ad remove unnecessary files in garak/resources/visual_jailbreak/SafeBench/ (DavidLee528)
39c57a2 add paper title, link, and reference of FigStep (DavidLee528)
de6c6ef add default probe class FigStep80 in garak/probes/visual_jailbreak.py (DavidLee528)
ce23255 add prompts number check for FigStep80 in /home/sda/tianhaoli/garak/g… (DavidLee528)
6784255 add prompts number check for FigStep in garak/probes/visual_jailbreak.py (DavidLee528)
d3569c6 Merge branch 'llava_dev' of https://github.com/DavidLee528/garak into… (DavidLee528)
bd16b64 rm figstep safebench data files (leondz)
b98ff1f safebench downloading instead of distr. with garak (leondz)
b7acff6 fix: move safebench_image_filenames from local function variable to c… (DavidLee528)
16a5972 fix: self.prompts filter in FigStepTiny (DavidLee528)
Original file line number | Diff line number | Diff line change |
---|---|---|
| | @@ -2,7 +2,6 @@ |
| | import sys |
| | sys.path.append("/home/sda/tianhaoli/garak") |
| | from garak import cli |
Original file line number | Diff line number | Diff line change |
---|---|---|
| | @@ -0,0 +1 @@ |
Binary file added: garak/resources/visual_jailbreak/SafeBench/query_ForbidQI_10_10_6.png (BIN, +45.9 KB)
Binary file added: garak/resources/visual_jailbreak/SafeBench/query_ForbidQI_10_11_6.png (BIN, +28.1 KB)
Binary file added: garak/resources/visual_jailbreak/SafeBench/query_ForbidQI_10_12_6.png (BIN, +35.4 KB)
Binary file added: garak/resources/visual_jailbreak/SafeBench/query_ForbidQI_10_13_6.png (BIN, +26.5 KB)
Binary file added: garak/resources/visual_jailbreak/SafeBench/query_ForbidQI_10_14_6.png (BIN, +35.2 KB)
Binary file added: garak/resources/visual_jailbreak/SafeBench/query_ForbidQI_10_15_6.png (BIN, +46.9 KB)
Binary file added: garak/resources/visual_jailbreak/SafeBench/query_ForbidQI_10_16_6.png (BIN, +28.3 KB)
Binary file added: garak/resources/visual_jailbreak/SafeBench/query_ForbidQI_10_17_6.png (BIN, +28.5 KB)
Binary file added: garak/resources/visual_jailbreak/SafeBench/query_ForbidQI_10_18_6.png (BIN, +40.3 KB)
Binary file added: garak/resources/visual_jailbreak/SafeBench/query_ForbidQI_10_19_6.png (BIN, +39.5 KB)
Binary file added: garak/resources/visual_jailbreak/SafeBench/query_ForbidQI_10_21_6.png (BIN, +52.7 KB)
Binary file added: garak/resources/visual_jailbreak/SafeBench/query_ForbidQI_10_22_6.png (BIN, +27.6 KB)
Binary file added: garak/resources/visual_jailbreak/SafeBench/query_ForbidQI_10_23_6.png (BIN, +36.1 KB)
Binary file added: garak/resources/visual_jailbreak/SafeBench/query_ForbidQI_10_24_6.png (BIN, +39.8 KB)
Binary file added: garak/resources/visual_jailbreak/SafeBench/query_ForbidQI_10_25_6.png (BIN, +45.9 KB)
Binary file added: garak/resources/visual_jailbreak/SafeBench/query_ForbidQI_10_26_6.png (BIN, +46.2 KB)
Binary file added: garak/resources/visual_jailbreak/SafeBench/query_ForbidQI_10_27_6.png (BIN, +33.9 KB)
Binary file added: garak/resources/visual_jailbreak/SafeBench/query_ForbidQI_10_28_6.png (BIN, +43.1 KB)
Binary file added: garak/resources/visual_jailbreak/SafeBench/query_ForbidQI_10_29_6.png (BIN, +26.7 KB)
Binary file added: garak/resources/visual_jailbreak/SafeBench/query_ForbidQI_10_30_6.png (BIN, +35.3 KB)
Binary file added: garak/resources/visual_jailbreak/SafeBench/query_ForbidQI_10_31_6.png (BIN, +43.8 KB)
Binary file added: garak/resources/visual_jailbreak/SafeBench/query_ForbidQI_10_32_6.png (BIN, +39.5 KB)
Binary file added: garak/resources/visual_jailbreak/SafeBench/query_ForbidQI_10_33_6.png (BIN, +36.2 KB)
Binary file added: garak/resources/visual_jailbreak/SafeBench/query_ForbidQI_10_34_6.png (BIN, +41.1 KB)
Binary file added: garak/resources/visual_jailbreak/SafeBench/query_ForbidQI_10_35_6.png (BIN, +30.2 KB)
Binary file added: garak/resources/visual_jailbreak/SafeBench/query_ForbidQI_10_36_6.png (BIN, +35.7 KB)
Binary file added: garak/resources/visual_jailbreak/SafeBench/query_ForbidQI_10_37_6.png (BIN, +30.4 KB)
Binary file added: garak/resources/visual_jailbreak/SafeBench/query_ForbidQI_10_38_6.png (BIN, +35.9 KB)
Binary file added: garak/resources/visual_jailbreak/SafeBench/query_ForbidQI_10_39_6.png (BIN, +41.9 KB)
Binary file added: garak/resources/visual_jailbreak/SafeBench/query_ForbidQI_10_40_6.png (BIN, +30.8 KB)
Binary file added: garak/resources/visual_jailbreak/SafeBench/query_ForbidQI_10_41_6.png (BIN, +36.5 KB)
Binary file added: garak/resources/visual_jailbreak/SafeBench/query_ForbidQI_10_42_6.png (BIN, +41.5 KB)
Binary file added: garak/resources/visual_jailbreak/SafeBench/query_ForbidQI_10_43_6.png (BIN, +43.5 KB)
Binary file added: garak/resources/visual_jailbreak/SafeBench/query_ForbidQI_10_44_6.png (BIN, +44.4 KB)
Binary file added: garak/resources/visual_jailbreak/SafeBench/query_ForbidQI_10_45_6.png (BIN, +37.9 KB)
Binary file added: garak/resources/visual_jailbreak/SafeBench/query_ForbidQI_10_46_6.png (BIN, +37.7 KB)
Binary file added: garak/resources/visual_jailbreak/SafeBench/query_ForbidQI_10_47_6.png (BIN, +35.9 KB)
Binary file added: garak/resources/visual_jailbreak/SafeBench/query_ForbidQI_10_48_6.png (BIN, +25.7 KB)
Binary file added: garak/resources/visual_jailbreak/SafeBench/query_ForbidQI_10_49_6.png (BIN, +47.3 KB)
Binary file added: garak/resources/visual_jailbreak/SafeBench/query_ForbidQI_10_50_6.png (BIN, +35.7 KB)
We need to work on this, but I'm happy for that to be tracked in a separate issue/PR
pinging llm-as-a-judge issue: #419
Yes! We are working on this now.