Standardize Output Dictionaries, Mock API Calls, Math (GSM8K, SVAMP, TabMWP) for Reflexion #186

Merged on Jun 29, 2024 (125 commits)

Commits (all by alckasoc)

Jun 23, 2024
c36d52d  reflexion cot/react math strat skeleton
5ac46bc  .
e83b76f  added gsm8k fewshot cot

Jun 24, 2024
a2403d2  .
d9c7b4e  react math fix
e9f72b7  react code strat fix
65cfaa9  adding eext tool innfo
61eb219  edit qa react test
240b752  pass test gen obs
b14656d  fix math
aba4a0c  fixed math
e67a904  auto linted
b0e8435  code fix
ee6791e  code test gen obs react
bf56ed5  .
21928d0  fix critic qa
9b51896  .
f6e2266  fix
8118bd4  auto linted
f11c340  fix
4f4e190  auto linted
a63d457  reflexioncot fix
cf18a73  .
b31cef1  .
11b4ead  fix
8617976  auto linted
3a5f5a2  fix
6c89fe8  fix
3d071fb  fix
2a446e6  react mocked
b065f71  .
01642ab  mock react test gen
1896495  mock critic
e452e4f  .
169c48d  ok
1b4add8  1
88ada46  auto linted
962853e  .
5f2007b  auto linted

Jun 25, 2024
b8022f7  fix
c12a044  critic math fix
b33b011  critic code fix
2722af0  .
dd51d62  .
95ad673  halt and reset
7569ed7  .
e6f1164  .
a43603d  reflexion cot exmaples
afd9bd8  time to test
a9ffd35  .
8ddb0fe  .
caed1f2  FX
0b52ced  em
eeb3c06  .
23e042f  FIRST TEST WORKS; gsm8k reactreflexion; tabmwp; svvamp left

Jun 26, 2024
23b493d  .
4d2e45f  WIP math reflexionreact
eea6557  fix

Jun 27, 2024
16ec73a  .
7715bf4  .
5c98148  fix
8ff34fa  fix
7bb55f4  .
4c6eb58  .
42b08fe  .
ab66b8d  les go
c12be90  .
4f37768  docs
3050dc3  auto lint
d15a6e7  ready for testing

Jun 28, 2024
987ea48  fix em
67c5581  .
a87a61e  add fewshots
220dd2c  max tokens to 5k
68a5cd9  2 fixes
b0cdf1c  ,
ea830eb  .
62983b7  done with gsm8kk; working on svamp
c1c61c7  SVAMP_FEWSHOT_EXAMPLES_COT
d065863  fewhots cot reflect

Jun 29, 2024
536960d  .
fa7d03c  tabmwp init
7a052c5  .
5e4b868  tabmwp cot
4a88be8  okk
7111877  .
12ddd37  .
5618a0e  tabmwp instructions
590b5c4  reflexion cot reflect
c86db7d  ok
cf6527b  IT RUNS!
44ee60e  clear outputs
8bbb1f0  auto linted
429a448  remove prints
2804490  docs
1e610b6  init
6307c12  1 done
6c898a3  2 done
1217fd8  lint
d3e8328  4 done
7c23e4d  .
088daff  .
d47922c  2 more done
c9009b2  .
4e6fb4a  1 more down
dba9e0b  anotha one!
9a42b66  .
6b53af1  anotha oneee
a71e135  anotha one
d07675a  ok
d2cda6b  ok anotha one
9f0cabc  2 more done
9b3c8de  1 more down
9a176a4  ok
62a78ba  yay 1 more
71d38bd  2/3
019a6f7  1 down
c761033  2 more down
e6de881  1
afe7452  al
1b4b599  2 more
d23628f  1 mmore down
f8bb8b0  all done LES GO
5d64b74  .
c38a6fc  del

agential/cog/agent/react.py (8 changes: 6 additions & 2 deletions)
@@ -96,13 +96,17 @@ def generate(
 )

 # Observe.
-obs = self.strategy.generate_observation(
+obs, external_tool_info = self.strategy.generate_observation(
     idx=idx, action_type=action_type, query=query
 )

 out.append(
     self.strategy.create_output_dict(
-        thought=thought, action_type=action_type, query=query, obs=obs
+        thought=thought,
+        action_type=action_type,
+        query=query,
+        obs=obs,
+        external_tool_info=external_tool_info,
     )
 )

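In this hunk, `generate_observation` now returns a pair `(obs, external_tool_info)`, and the second element is passed through to `create_output_dict`, so each step's output dict records what the external tool returned alongside the observation text. The sketch below illustrates that contract under stated assumptions: `ToyReActStrategy`, its injected `search` callable, the `"search_result"` key, and the output key names are all hypothetical, not agential's actual classes or schema. Injecting the tool as a callable is one simple way to mock API calls in tests, since a stub can stand in for the real client.

```python
from typing import Any, Callable, Dict, List, Tuple


class ToyReActStrategy:
    """Hypothetical strategy illustrating the (obs, external_tool_info) contract."""

    def __init__(self, search: Callable[[str], str]) -> None:
        # The tool is injected, so tests can pass a stub instead of a real API client.
        self.search = search

    def generate_observation(
        self, idx: int, action_type: str, query: str
    ) -> Tuple[str, Dict[str, Any]]:
        # Every action returns the same external_tool_info keys, even when unused.
        external_tool_info: Dict[str, Any] = {"search_result": ""}
        if action_type.lower() == "search":
            result = self.search(query)
            external_tool_info["search_result"] = result
            obs = result
        elif action_type.lower() == "finish":
            obs = query
        else:
            obs = f"Invalid action: {action_type}"
        return obs, external_tool_info

    def create_output_dict(
        self,
        thought: str,
        action_type: str,
        query: str,
        obs: str,
        external_tool_info: Dict[str, Any],
    ) -> Dict[str, Any]:
        # One fixed-key dict per step keeps outputs uniform across agents.
        return {
            "thought": thought,
            "action_type": action_type,
            "query": query,
            "observation": obs,
            "external_tool_info": external_tool_info,
        }


# Usage with a stubbed tool: no network call is made.
strategy = ToyReActStrategy(search=lambda q: f"Stub result for {q!r}")
out: List[Dict[str, Any]] = []
obs, external_tool_info = strategy.generate_observation(
    idx=0, action_type="Search", query="Reflexion"
)
out.append(
    strategy.create_output_dict(
        thought="I should look this up.",
        action_type="Search",
        query="Reflexion",
        obs=obs,
        external_tool_info=external_tool_info,
    )
)
```

Because the tool info always has the same keys, downstream code can read `out[i]["external_tool_info"]` without checking which action produced step `i`.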
3 changes: 2 additions & 1 deletion agential/cog/agent/reflexion.py
Original file line number Diff line number Diff line change
Expand Up @@ -265,7 +265,7 @@ def _generate_react(
)

# Observe.
is_correct, obs = self.strategy.generate_observation(
is_correct, obs, external_tool_info = self.strategy.generate_observation(
step_idx=step_idx,
action_type=action_type,
query=query,
Expand All @@ -278,6 +278,7 @@ def _generate_react(
action_type=action_type,
query=query,
obs=obs,
external_tool_info=external_tool_info,
is_correct=is_correct,
)
)
Expand Down
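The Reflexion variant returns a triple `(is_correct, obs, external_tool_info)`. For the math benchmarks this PR targets (GSM8K, SVAMP, TabMWP), one plausible shape for that step is to execute model-generated Python, report the execution status as tool info, and grade only when the agent finishes. The following is a minimal sketch under assumptions not confirmed by the diff: the `Calculate`/`Finish` action names, the convention that code stores its result in `answer`, the `execution_status`/`code_answer` keys, and numeric grading with a tolerance are all hypothetical.

```python
from typing import Any, Dict, Tuple


def execute_code(code: str) -> Tuple[Any, str]:
    """Run `code` and return whatever it binds to `answer` (a hypothetical convention)."""
    namespace: Dict[str, Any] = {}
    try:
        exec(code, namespace)  # Executes untrusted model output: sandbox this in real use.
        return namespace.get("answer"), "Done"
    except Exception as e:  # Surface any runtime error as the execution status.
        return None, f"Error: {type(e).__name__}: {e}"


def generate_observation(
    action_type: str, query: str, key: float
) -> Tuple[bool, str, Dict[str, Any]]:
    """Hypothetical math step returning (is_correct, obs, external_tool_info)."""
    external_tool_info: Dict[str, Any] = {"execution_status": "", "code_answer": None}
    is_correct = False
    action = action_type.lower()

    if action not in ("calculate", "finish"):
        obs = "Invalid Action. Valid Actions are Calculate[code] and Finish[code]."
        return is_correct, obs, external_tool_info

    answer, status = execute_code(query)
    external_tool_info["execution_status"] = status
    external_tool_info["code_answer"] = answer

    if action == "finish":
        # Only a Finish action is graded; compare numerically with a small tolerance.
        try:
            is_correct = abs(float(answer) - key) < 1e-6
        except (TypeError, ValueError):
            is_correct = False
        obs = "Answer is CORRECT" if is_correct else "Answer is INCORRECT"
    else:
        obs = f"Execution status: {status}; answer = {answer}"

    return is_correct, obs, external_tool_info


# Usage: a Finish step whose code computes 3 * 4 + 2.
is_correct, obs, external_tool_info = generate_observation(
    "Finish", "answer = 3 * 4 + 2", key=14
)
```

Returning `is_correct` from the same call lets the agent loop stop as soon as a Finish action is graded correct, without re-parsing `obs`.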
agential/cog/eval/reflexion.py (8 changes: 7 additions & 1 deletion)
@@ -3,7 +3,7 @@
 from agential.utils.parse import normalize_answer


-def EM(answer: str, key: str) -> bool:
+def EM(answer: str, key: str, normalize: bool = True) -> bool:
     """Compares two strings, `answer` and `key`, after normalizing them.

     The Exact Match grading 'metric' compares for an exact match between 2 strings
@@ -12,8 +12,14 @@ def EM(answer: str, key: str) -> bool:
     Args:
         answer (str): A string to be compared with `key`.
         key (str): A string to be compared with `answer`.
+        normalize (bool): If True, then normalize answer and key. Defaults to True.

     Returns:
         bool: True if the normalized `answer` and `key` match, else False.
     """
+    if answer is None:
+        return False
+
+    if not normalize:
+        return answer == key
     return normalize_answer(answer) == normalize_answer(key)
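The new `normalize` flag lets callers skip normalization and compare raw strings, and a `None` answer now counts as a miss instead of reaching the normalizer. The sketch below is self-contained, so it re-implements `normalize_answer` as SQuAD-style normalization (lowercase, strip punctuation and the articles "a", "an", "the", collapse whitespace); that is an assumption about what `agential.utils.parse.normalize_answer` does, not a copy of it. The `Optional[str]` annotation is also an addition, reflecting the `None` check.

```python
import re
import string
from typing import Optional

PUNCTUATION = set(string.punctuation)


def normalize_answer(s: str) -> str:
    """SQuAD-style normalization (assumed; agential's own helper may differ)."""
    s = s.lower()
    s = "".join(ch for ch in s if ch not in PUNCTUATION)  # Drop punctuation.
    s = re.sub(r"\b(a|an|the)\b", " ", s)  # Drop English articles.
    return " ".join(s.split())  # Collapse runs of whitespace.


def EM(answer: Optional[str], key: str, normalize: bool = True) -> bool:
    """Exact match with optional normalization, mirroring the signature in this PR."""
    if answer is None:
        return False
    if not normalize:
        return answer == key
    return normalize_answer(answer) == normalize_answer(key)
```

Under this assumed normalizer, stripping punctuation turns `"3.5"` into `"35"`, so `EM("3.5", "35")` is True while `EM("3.5", "35", normalize=False)` is False. That is one reason a math caller might disable normalization and compare raw strings.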