
Standardize CRITIC/Self-Refine Few-shots for Math, Standardize error types in Reflexion with CRITIC/SR, Code for Self-Refine (HumanEval, MBPP) #228

Merged
merged 24 commits into from
Jul 14, 2024

Conversation

alckasoc
Member

@alckasoc alckasoc commented Jul 13, 2024

🤔 Reasoning

Explain the purpose of this PR...

🚧 Changes

Describe the changes made...

✅ PR Checklist

  • Using this PR template?
  • Linked issue?
  • Added feature?
    • Added/updated docs?
    • Added/updated tests?

Summary by CodeRabbit

  • New Features

    • Added new instructions and examples for self-refinement tasks, including HUMANEVAL, MBPP, and improved instructions for AMBIGNQ.
    • Introduced tasks for checking close elements in a list and finding the first repeated character in a string.
  • Tests

    • Implemented unit tests for self-refinement strategies, covering initialization, code generation, critique generation, and more.
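
For reference, the two example tasks named in this summary (checking for close elements in a list, and finding the first repeated character in a string) can be sketched as below. This is a minimal illustration that assumes the tasks follow the usual HumanEval and MBPP problem statements; the function names and signatures are assumptions, not code copied from the notebook.

```python
from typing import List, Optional


def has_close_elements(numbers: List[float], threshold: float) -> bool:
    """Return True if any two elements differ by less than `threshold`."""
    # After sorting, the closest pair is always adjacent.
    ordered = sorted(numbers)
    return any(b - a < threshold for a, b in zip(ordered, ordered[1:]))


def first_repeated_char(text: str) -> Optional[str]:
    """Return the first character seen a second time, or None if none repeats."""
    seen = set()
    for ch in text:
        if ch in seen:
            return ch
        seen.add(ch)
    return None
```

Sorting keeps the close-elements check at O(n log n) instead of comparing every pair.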

@alckasoc alckasoc added the enhancement (New feature or request) and add-benchmark (Adding support for a benchmark) labels Jul 13, 2024
@alckasoc alckasoc added this to the Self-Refine milestone Jul 13, 2024
@alckasoc alckasoc self-assigned this Jul 13, 2024
Contributor

coderabbitai bot commented Jul 13, 2024

Walkthrough

The recent changes enhance the self-refinement tasks in the self_refine.ipynb notebook by adding new instructions and examples for several tasks (HUMANEVAL, MBPP, AMBIGNQ). Additionally, unit tests for self-refinement strategies are included in test_code.py, ensuring comprehensive coverage of initialization, code generation, critique generation, and more. Notably, a duplicate import declaration of SelfRefineCodeStrategy was added.

Changes

| File Path | Change Summary |
| --- | --- |
| notebooks/self_refine.ipynb | Introduced new instructions and examples for self-refinement tasks (HUMANEVAL, MBPP, AMBIGNQ). Added code for checking close elements in a list and finding the first repeated character in a string. |
| tests/cog/self_refine/strategies/test_code.py | Added unit tests for self-refinement strategies, covering initialization, code generation, critique generation, output dictionary creation, updating answers based on critique, halting conditions, resetting, and strategy instantiation. |
| agential/cog/self_refine/strategies/code.py | Added a duplicate declaration of SelfRefineCodeStrategy in the import statement. |

Poem

In notebooks, where code refines,
New tasks and tests now intertwine.
Strategies critiqued, answers reset,
A rabbit's joy in code well met.
Code and tests, a perfect blend,
🎉 To these changes, we commend! 🐇



codecov bot commented Jul 13, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

| Files | Coverage Δ |
| --- | --- |
| agential/cog/critic/prompts.py | 100.00% <ø> (ø) |
| agential/cog/reflexion/agent.py | 92.07% <ø> (ø) |
| agential/cog/reflexion/prompts.py | 100.00% <ø> (ø) |
| agential/cog/reflexion/strategies/code.py | 96.77% <ø> (ø) |
| agential/cog/reflexion/strategies/math.py | 96.02% <ø> (ø) |
| agential/cog/reflexion/strategies/qa.py | 97.67% <ø> (ø) |
| agential/cog/self_refine/factory.py | 97.05% <100.00%> (+0.08%) ⬆️ |
| agential/cog/self_refine/functional.py | 100.00% <100.00%> (ø) |
| agential/cog/self_refine/prompts.py | 100.00% <100.00%> (ø) |
| agential/cog/self_refine/strategies/code.py | 100.00% <100.00%> (ø) |
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 9bd4278 and 1996958.

Files selected for processing (7)
  • agential/cog/reflexion/agent.py (2 hunks)
  • agential/cog/reflexion/strategies/code.py (1 hunks)
  • agential/cog/reflexion/strategies/math.py (1 hunks)
  • agential/cog/reflexion/strategies/qa.py (1 hunks)
  • tests/cog/reflexion/strategies/test_code.py (1 hunks)
  • tests/cog/reflexion/strategies/test_math.py (1 hunks)
  • tests/cog/reflexion/strategies/test_qa.py (1 hunks)
Additional comments not posted (5)
tests/cog/reflexion/strategies/test_qa.py (1)

72-72: LGTM!

The change to set max_trials to 3 aligns with the updated default value in the strategy class.

tests/cog/reflexion/strategies/test_math.py (1)

74-74: LGTM! The change aligns with the updated class definition.

The max_trials parameter is now correctly set to 3 in the test, reflecting the changes made in the ReflexionCoTMathStrategy class.

agential/cog/reflexion/strategies/math.py (1)

95-103: LGTM!

The change to update max_trials from 1 to 3 is consistent and aligns with the provided context.

tests/cog/reflexion/strategies/test_code.py (1)

97-97: LGTM!

The change to update max_trials from 1 to 3 in the test function is consistent and aligns with the provided context.

agential/cog/reflexion/strategies/code.py (1)

95-103: LGTM!

The change to update max_trials from 1 to 3 is consistent and aligns with the provided context.

@@ -99,7 +99,7 @@ def generate(
additional_keys: Dict[str, str] = {},
reflect_additional_keys: Dict[str, str] = {},
fewshot_type: str = "",
- patience: int = 1,
+ patience: int = 3,

Update the docstring to reflect the new default value for patience.

The patience parameter's default value has been changed from 1 to 3, and this should be updated in the docstring to maintain consistency.

-            patience (int, optional): The patience for the agent. Defaults to 1.
+            patience (int, optional): The patience for the agent. Defaults to 3.

Also applies to: 123-123
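
One way to catch this kind of drift automatically is to compare the default stated in the docstring with the default in the signature. The sketch below is illustrative only: `generate` and `stale_generate` are hypothetical stand-ins rather than the agent's real methods, and the regular expression assumes Google-style `Args:` entries ending in "Defaults to X.".

```python
import inspect
import re
from typing import Callable, Dict, List


def generate(question: str, patience: int = 3) -> str:
    """Hypothetical stand-in whose docstring matches its signature.

    Args:
        question (str): The question to answer.
        patience (int, optional): The patience for the agent. Defaults to 3.
    """
    return question


def stale_generate(question: str, patience: int = 3) -> str:
    """Hypothetical stand-in whose docstring was not updated.

    Args:
        question (str): The question to answer.
        patience (int, optional): The patience for the agent. Defaults to 1.
    """
    return question


# Matches lines such as "name (type, optional): text. Defaults to VALUE."
_DEFAULT_RE = re.compile(r"^\s*(\w+) \(.*?\): .*Defaults to (.+?)\.$", re.MULTILINE)


def docstring_defaults(func: Callable) -> Dict[str, str]:
    """Map parameter name to the default value stated in the docstring."""
    return dict(_DEFAULT_RE.findall(inspect.getdoc(func) or ""))


def mismatched_defaults(func: Callable) -> List[str]:
    """Return parameters whose documented default differs from the signature."""
    params = inspect.signature(func).parameters
    return [
        name
        for name, stated in docstring_defaults(func).items()
        if name in params and repr(params[name].default) != stated
    ]
```

A check like `assert not mismatched_defaults(SomeClass.generate)` could then live in the test suite next to the existing strategy tests.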

@@ -60,15 +60,15 @@ class ReflexionCoTQAStrategy(ReflexionCoTBaseStrategy):
llm (BaseChatModel): The language model used for generating answers and critiques.
reflector (Optional[ReflexionCoTReflector]): The reflector used for generating reflections. Defaults to None.
max_reflections (int): The maximum number of reflections allowed. Defaults to 3.
- max_trials (int): The maximum number of trials allowed. Defaults to 1.
+ max_trials (int): The maximum number of trials allowed. Defaults to 3.

Update the docstring to reflect the new default value for max_trials.

The max_trials parameter's default value has been changed from 1 to 3, and this should be updated in the docstring to maintain consistency.

-        max_trials (int): The maximum number of trials allowed. Defaults to 1.
+        max_trials (int): The maximum number of trials allowed. Defaults to 3.

Also applies to: 71-71

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

Outside diff range, codebase verification and nitpick comments (6)
agential/cog/self_refine/strategies/code.py (2)

5-5: The import statement is not used in the file.

  • The import from langchain_core.language_models.chat_models import BaseChatModel is present but not used in agential/cog/self_refine/strategies/code.py.

Consider adding tests to cover the imports or remove the unused import if it is not necessary.
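
As an alternative to text search, unused imports can be checked by parsing the file. The following is a rough sketch using only the standard-library `ast` module; the helper name `unused_imports` is an assumption made for illustration, and the check misses names that are referenced only inside string annotations or `__all__`.

```python
import ast
from typing import Dict, List


def unused_imports(source: str) -> List[str]:
    """Return imported names that are never referenced as identifiers."""
    tree = ast.parse(source)
    imported: Dict[str, int] = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # `import a.b` binds the top-level name `a`.
                imported[alias.asname or alias.name.split(".")[0]] = node.lineno
        elif isinstance(node, ast.ImportFrom):
            for alias in node.names:
                imported[alias.asname or alias.name] = node.lineno
    # Attribute access such as `os.getcwd` still contains a Name node for `os`.
    used = {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}
    return sorted(name for name in imported if name not in used)


example = (
    "from typing import Any, Dict\n"
    "import os\n"
    "\n"
    "def f(x: Dict[str, str]) -> str:\n"
    "    return os.getcwd()\n"
)
print(unused_imports(example))
```

Running it on a real file would be `unused_imports(open(path).read())`; a linter is the more robust choice for regular use.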

Analysis chain

Add tests to cover the imports.

The import statement is not covered by tests.

Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify if the imported modules are used in the file.

# Test: Search for the usage of imported modules. Expect: All imported modules are used.
rg --type python -A 5 $'from langchain_core.language_models.chat_models import BaseChatModel'

Length of output: 125


Script:

#!/bin/bash
# Description: Verify if the imported module BaseChatModel is used in the file agential/cog/self_refine/strategies/code.py

# Test: Search for the usage of BaseChatModel in the file.
ast-grep --lang python --pattern 'BaseChatModel' agential/cog/self_refine/strategies/code.py

Length of output: 207

Tools
GitHub Check: codecov/patch

[warning] 5-5: agential/cog/self_refine/strategies/code.py#L5
Added line #L5 was not covered by tests


12-13: Add tests to cover the imports or remove unused imports.

The imported modules SelfRefineBaseStrategy and EM are not used in the file agential/cog/self_refine/strategies/code.py. Consider adding tests to cover these imports or removing them if they are unnecessary.

  • agential/cog/self_refine/strategies/code.py: Lines 12-13
Analysis chain

Add tests to cover the imports.

The import statement is not covered by tests.

Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify if the imported modules are used in the file.

# Test: Search for the usage of imported modules. Expect: All imported modules are used.
rg --type python -A 5 $'from agential.cog.self_refine.strategies.base import SelfRefineBaseStrategy\nfrom agential.eval.em import EM'

Length of output: 325


Script:

#!/bin/bash
# Description: Verify if the imported modules are used in the file.

# Test: Search for the usage of imported modules. Expect: All imported modules are used.
rg --type python --multiline -A 5 $'from agential.cog.self_refine.strategies.base import SelfRefineBaseStrategy\nfrom agential.eval.em import EM'

Length of output: 176


Script:

#!/bin/bash
# Description: Verify if the imported modules are used in the file `agential/cog/self_refine/strategies/code.py`.

# Test: Search for the usage of imported modules. Expect: All imported modules are used.
rg --multiline -A 5 $'from agential.cog.self_refine.strategies.base import SelfRefineBaseStrategy\nfrom agential.eval.em import EM' agential/cog/self_refine/strategies/code.py

Length of output: 284

Tools
GitHub Check: codecov/patch

[warning] 12-13: agential/cog/self_refine/strategies/code.py#L12-L13
Added lines #L12 - L13 were not covered by tests

agential/cog/self_refine/prompts.py (4)

2358-2358: Add a TODO comment for HUMANEVAL instructions.

The instruction set for HUMANEVAL is currently empty. Consider adding a TODO comment to indicate that content needs to be added.

+ SELF_REFINE_INSTRUCTION_HUMANEVAL = ""  # TODO: Add instructions for HUMANEVAL

2376-2376: Add a TODO comment for MBPP instructions.

The instruction set for MBPP is currently empty. Consider adding a TODO comment to indicate that content needs to be added.

+ SELF_REFINE_INSTRUCTION_MBPP = ""  # TODO: Add instructions for MBPP

2361-2370: Add TODO comments for HUMANEVAL few-shot examples and critique instructions.

The few-shot examples and critique instructions for HUMANEVAL are currently empty. Consider adding TODO comments to indicate that content needs to be added.

+ HUMANEVAL_CRITIQUE_FEWSHOT_EXAMPLES = ""  # TODO: Add few-shot examples for HUMANEVAL
+ SELF_REFINE_CRITIQUE_INSTRUCTION_HUMANEVAL = ""  # TODO: Add critique instructions for HUMANEVAL
+ HUMANEVAL_REFINE_FEWSHOT_EXAMPLES = ""  # TODO: Add refine few-shot examples for HUMANEVAL
+ SELF_REFINE_REFINE_INSTRUCTION_HUMANEVAL = ""  # TODO: Add refine instructions for HUMANEVAL

2379-2388: Add TODO comments for MBPP few-shot examples and critique instructions.

The few-shot examples and critique instructions for MBPP are currently empty. Consider adding TODO comments to indicate that content needs to be added.

+ MBPP_CRITIQUE_FEWSHOT_EXAMPLES = ""  # TODO: Add few-shot examples for MBPP
+ SELF_REFINE_CRITIQUE_INSTRUCTION_MBPP = ""  # TODO: Add critique instructions for MBPP
+ MBPP_REFINE_FEWSHOT_EXAMPLES = ""  # TODO: Add refine few-shot examples for MBPP
+ SELF_REFINE_REFINE_INSTRUCTION_MBPP = ""  # TODO: Add refine instructions for MBPP
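
Beyond TODO comments, a small test can flag prompt constants that are still empty. This is a sketch under stated assumptions: the `prompts` namespace below is a stand-in for the real prompts module, `SELF_REFINE_INSTRUCTION_GSM8K` and its value are placeholders, and `empty_prompt_names` is a hypothetical helper.

```python
from types import SimpleNamespace
from typing import Any, List


def empty_prompt_names(module: Any) -> List[str]:
    """Return upper-case string attributes that are empty or whitespace-only."""
    return sorted(
        name
        for name, value in vars(module).items()
        if name.isupper() and isinstance(value, str) and not value.strip()
    )


# Stand-in for the prompts module; the values here are placeholders.
prompts = SimpleNamespace(
    SELF_REFINE_INSTRUCTION_HUMANEVAL="",
    SELF_REFINE_INSTRUCTION_MBPP="",
    SELF_REFINE_INSTRUCTION_GSM8K="Question: {question}",
)

print(empty_prompt_names(prompts))
```

The same helper accepts an imported module, since `vars()` works on modules as well.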
Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 1996958 and 0615de3.

Files selected for processing (3)
  • agential/cog/self_refine/prompts.py (3 hunks)
  • agential/cog/self_refine/strategies/code.py (1 hunks)
  • notebooks/self_refine.ipynb (3 hunks)
Additional context used
GitHub Check: codecov/patch
agential/cog/self_refine/strategies/code.py

[warning] 3-3: agential/cog/self_refine/strategies/code.py#L3
Added line #L3 was not covered by tests


[warning] 5-5: agential/cog/self_refine/strategies/code.py#L5
Added line #L5 was not covered by tests


[warning] 7-7: agential/cog/self_refine/strategies/code.py#L7
Added line #L7 was not covered by tests


[warning] 12-13: agential/cog/self_refine/strategies/code.py#L12-L13
Added lines #L12 - L13 were not covered by tests

Additional comments not posted (15)
agential/cog/self_refine/strategies/code.py (4)

7-7: Add tests to cover the imports.

The import statement is not covered by tests.

Tools
GitHub Check: codecov/patch

[warning] 7-7: agential/cog/self_refine/strategies/code.py#L7
Added line #L7 was not covered by tests


Line range hint 21-35:
Add tests to cover the generate method.

The generate method is not covered by tests.


Line range hint 15-19:
Add tests to cover the __init__ method.

The __init__ method is not covered by tests.

Verification successful

Add tests to cover the __init__ method.

The __init__ method of the SelfRefineCodeStrategy class is not covered by tests. Ensure that tests are added to verify the initialization parameters and behavior.

Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify if the `__init__` method is tested.

# Test: Search for the usage of `SelfRefineCodeStrategy`. Expect: The `__init__` method is tested.
rg --type python -A 5 $'SelfRefineCodeStrategy'

Length of output: 77


Script:

#!/bin/bash
# Description: Verify if the `__init__` method is tested.

# Test: Search for the usage of `SelfRefineCodeStrategy`. Expect: The `__init__` method is tested.
rg --type py -A 5 $'SelfRefineCodeStrategy'

Length of output: 42


Script:

#!/bin/bash
# Description: Locate potential test files and search for references to `SelfRefineCodeStrategy`.

# Step 1: Locate potential test files.
test_files=$(fd --type f --extension py test)

# Step 2: Search within those files for references to `SelfRefineCodeStrategy`.
if [ -n "$test_files" ]; then
  rg --type py -A 5 $'SelfRefineCodeStrategy' $test_files
else
  echo "No test files found."
fi

Length of output: 3665


3-3: Add tests to cover the imports.

The import statement is not covered by tests.

Verification successful

The imports are used and necessary.

The imported modules BaseChatModel, _prompt_agent, _prompt_critique, _prompt_refine, SelfRefineBaseStrategy, and EM are utilized within the codebase, including in tests and functional files. Therefore, the import statement is valid and necessary.

Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify if the imported modules are used in the file.

# Test: Search for the usage of imported modules. Expect: All imported modules are used.
rg --type python -A 5 $'from typing import Any, Dict\nfrom langchain_core.language_models.chat_models import BaseChatModel\nfrom agential.cog.self_refine.functional import (\n    _prompt_agent,\n    _prompt_critique,\n    _prompt_refine,\n)\nfrom agential.cog.self_refine.strategies.base import SelfRefineBaseStrategy\nfrom agential.eval.em import EM'

Length of output: 536


Script:

#!/bin/bash
# Description: Verify if the imported modules are used in the file.

# Test: Search for the usage of imported modules. Expect: All imported modules are used.
rg --multiline --type python -A 5 $'from typing import Any, Dict\nfrom langchain_core.language_models.chat_models import BaseChatModel\nfrom agential.cog.self_refine.functional import (\n    _prompt_agent,\n    _prompt_critique,\n    _prompt_refine,\n)\nfrom agential.cog.self_refine.strategies.base import SelfRefineBaseStrategy\nfrom agential.eval.em import EM'

Length of output: 387


Script:

#!/bin/bash
# Description: Verify if the imported modules are used in the file.

# Test: Search for the usage of imported modules. Expect: All imported modules are used.
rg --multiline -A 5 $'from typing import Any, Dict\nfrom langchain_core.language_models.chat_models import BaseChatModel\nfrom agential.cog.self_refine.functional import (\n    _prompt_agent,\n    _prompt_critique,\n    _prompt_refine,\n)\nfrom agential.cog.self_refine.strategies.base import SelfRefineBaseStrategy\nfrom agential.eval.em import EM'

Length of output: 342


Script:

#!/bin/bash
# Description: Verify if the imported modules are used in the file.

# Test: Search for the usage of each imported module individually.

# Check usage of 'Any' and 'Dict' from 'typing'
rg --multiline -A 5 'Any\|Dict'

# Check usage of 'BaseChatModel' from 'langchain_core.language_models.chat_models'
rg --multiline -A 5 'BaseChatModel'

# Check usage of '_prompt_agent', '_prompt_critique', and '_prompt_refine' from 'agential.cog.self_refine.functional'
rg --multiline -A 5 '_prompt_agent\|_prompt_critique\|_prompt_refine'

# Check usage of 'SelfRefineBaseStrategy' from 'agential.cog.self_refine.strategies.base'
rg --multiline -A 5 'SelfRefineBaseStrategy'

# Check usage of 'EM' from 'agential.eval.em'
rg --multiline -A 5 'EM'

Length of output: 132746

Tools
GitHub Check: codecov/patch

[warning] 3-3: agential/cog/self_refine/strategies/code.py#L3
Added line #L3 was not covered by tests

notebooks/self_refine.ipynb (9)

98-98: Add tests to cover the SelfRefineAgent initialization.

The initialization of SelfRefineAgent is not covered by tests.


97-97: Add tests to cover the SelfRefineAgent initialization.

The initialization of SelfRefineAgent is not covered by tests.

Verification successful

Add tests to cover the SelfRefineAgent initialization.

The initialization of SelfRefineAgent is already covered by tests in tests/cog/self_refine/test_agent.py.

  • tests/cog/self_refine/test_agent.py contains multiple instances where SelfRefineAgent is initialized and tested.
Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify if the `SelfRefineAgent` initialization is tested.

# Test: Search for the usage of `SelfRefineAgent`. Expect: The initialization is tested.
rg --type python -A 5 $'SelfRefineAgent'

Length of output: 70


Script:

#!/bin/bash
# Description: Verify if the `SelfRefineAgent` initialization is tested.

# Test: Search for the usage of `SelfRefineAgent`. Expect: The initialization is tested.
rg --glob '*.py' -A 5 $'SelfRefineAgent'

Length of output: 3676


92-92: Add tests to cover the SelfRefineAgent initialization.

The initialization of SelfRefineAgent is not covered by tests.

Verification successful

Add tests to cover the SelfRefineAgent initialization.

The initialization of SelfRefineAgent is already covered by tests in tests/cog/self_refine/test_agent.py.

  • tests/cog/self_refine/test_agent.py: Multiple instances of SelfRefineAgent initialization and corresponding assertions are present.
Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify if the `SelfRefineAgent` initialization is tested.

# Test: Search for the usage of `SelfRefineAgent`. Expect: The initialization is tested.
rg --type python -A 5 $'SelfRefineAgent'

Length of output: 70


Script:

#!/bin/bash
# Description: Verify if the `SelfRefineAgent` initialization is tested.

# Test: Search for the usage of `SelfRefineAgent`. Expect: The initialization is tested.
rg -A 5 'SelfRefineAgent'

Length of output: 6219


96-96: Add tests to cover the SelfRefineAgent initialization.

The initialization of SelfRefineAgent is not covered by tests.


95-95: Add tests to cover the SelfRefineAgent initialization.

The initialization of SelfRefineAgent is not covered by tests.


91-91: Add tests to cover the SelfRefineAgent initialization.

The initialization of SelfRefineAgent is not covered by tests.


93-93: Add tests to cover the SelfRefineAgent initialization.

The initialization of SelfRefineAgent is not covered by tests.


94-94: Add tests to cover the SelfRefineAgent initialization.

The initialization of SelfRefineAgent is not covered by tests.

Verification successful

Add tests to cover the SelfRefineAgent initialization.

The initialization of SelfRefineAgent is already covered by tests in the tests/cog/self_refine/test_agent.py file. Here are the relevant instances:

  • tests/cog/self_refine/test_agent.py: agent = SelfRefineAgent(llm=FakeListChatModel(responses=[]), benchmark="gsm8k")
  • tests/cog/self_refine/test_agent.py: agent = SelfRefineAgent(llm=FakeListChatModel(responses=responses), benchmark="gsm8k")

These lines indicate that the initialization of SelfRefineAgent is indeed tested.

Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify if the `SelfRefineAgent` initialization is tested.

# Test: Search for the usage of `SelfRefineAgent`. Expect: The initialization is tested.
rg --type python -A 5 $'SelfRefineAgent'

Length of output: 70


Script:

#!/bin/bash
# Description: Verify if the `SelfRefineAgent` initialization is tested.

# Find all Python files and search for the usage of `SelfRefineAgent`.
fd -e py -x rg 'SelfRefineAgent' {}

Length of output: 414


Script:

#!/bin/bash
# Description: Verify if the `SelfRefineAgent` initialization is tested.

# Find all Python test files and search for the usage of `SelfRefineAgent`.
fd -e py -p 'test' -x rg 'SelfRefineAgent' {}

# Additionally, search for `SelfRefineAgent` within functions that follow test naming conventions.
rg -e 'def test_' -A 5 'SelfRefineAgent'

Length of output: 485


Script:

#!/bin/bash
# Description: Verify if the `SelfRefineAgent` initialization is tested.

# Find all Python test files in typical test directories and search for the usage of `SelfRefineAgent`.
fd -e py --search-path tests -x rg 'SelfRefineAgent' {}

# Additionally, search for `SelfRefineAgent` within functions that follow test naming conventions.
rg 'def test_' -A 5 'SelfRefineAgent'

Length of output: 494


Script:

#!/bin/bash
# Description: Verify if the `SelfRefineAgent` initialization is tested.

# Find all occurrences of `SelfRefineAgent` within the entire codebase and include more context lines.
rg 'SelfRefineAgent' -A 10

Length of output: 11979


Line range hint 1-1:
Add tests to cover the imports.

The import statements are not covered by tests.

agential/cog/self_refine/prompts.py (2)

1148-1150: Improved clarity in instructions.

The updated instruction for GSM8K now includes specific guidelines for writing Python code and storing the result in a variable named 'answer'. This enhances clarity.


1554-1556: Improved clarity in instructions.

The updated instruction for SVAMP now includes specific guidelines for writing Python code and storing the result in a variable named 'answer'. This enhances clarity.
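
To illustrate why a fixed variable name helps: once generated code always assigns to `answer`, the caller can run it and read the result by name. The sketch below is an assumption about how such a convention might be consumed, not the library's actual executor; `run_and_read_answer` and its status strings are hypothetical, and `exec` should never be used on untrusted code outside a sandbox.

```python
from typing import Any, Dict, Optional, Tuple


def run_and_read_answer(code: str) -> Tuple[Optional[Any], str]:
    """Execute `code` and return (value of `answer`, status message)."""
    scope: Dict[str, Any] = {}
    try:
        exec(code, scope)  # Unsafe for untrusted input; illustration only.
    except Exception as exc:
        return None, f"{type(exc).__name__}: {exc}"
    if "answer" not in scope:
        return None, "NameError: 'answer' was not defined"
    return scope["answer"], "Done"


generated = "eggs = 16\nanswer = (eggs - 3 - 4) * 2"
print(run_and_read_answer(generated))
```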

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 0615de3 and a4ebcdc.

Files selected for processing (3)
  • agential/cog/critic/prompts.py (4 hunks)
  • agential/cog/self_refine/prompts.py (7 hunks)
  • docs/docs/index.md (2 hunks)
Files skipped from review due to trivial changes (1)
  • agential/cog/critic/prompts.py
Files skipped from review as they are similar to previous changes (1)
  • agential/cog/self_refine/prompts.py
Additional comments not posted (1)
docs/docs/index.md (1)

111-111: LGTM! Ensure consistency and clarity.

The new section "CRITIC, Self-Refine" provides detailed error types for various datasets. The changes are clear and consistent with the rest of the document.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 10

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between a4ebcdc and ff48401.

Files selected for processing (2)
  • agential/cog/critic/prompts.py (5 hunks)
  • agential/cog/self_refine/prompts.py (28 hunks)
Files not summarized due to errors (1)
  • agential/cog/self_refine/prompts.py: Error: Message exceeds token limit
Files skipped from review as they are similar to previous changes (1)
  • agential/cog/critic/prompts.py
Additional comments not posted (2)
agential/cog/self_refine/prompts.py (2)

1148-1149: Approved: Improved instruction clarity.

The addition of the guideline to store the result in a variable named 'answer' enhances clarity and consistency.


1564-1565: Approved: Improved instruction clarity.

The addition of the guideline to store the result in a variable named 'answer' enhances clarity and consistency.

# ======================================================================== HUMANEVAL ======================================================================== #


SELF_REFINE_INSTRUCTION_HUMANEVAL = """"""

Add content to the instruction.

The SELF_REFINE_INSTRUCTION_HUMANEVAL is currently empty and needs to be populated with appropriate instructions.

SELF_REFINE_INSTRUCTION_HUMANEVAL = """"""


HUMANEVAL_CRITIQUE_FEWSHOT_EXAMPLES = """"""

Add content to the few-shot examples.

The HUMANEVAL_CRITIQUE_FEWSHOT_EXAMPLES is currently empty and needs to be populated with appropriate few-shot examples.

HUMANEVAL_CRITIQUE_FEWSHOT_EXAMPLES = """"""


SELF_REFINE_CRITIQUE_INSTRUCTION_HUMANEVAL = """"""

Add content to the critique instruction.

The SELF_REFINE_CRITIQUE_INSTRUCTION_HUMANEVAL is currently empty and needs to be populated with appropriate critique instructions.

SELF_REFINE_CRITIQUE_INSTRUCTION_HUMANEVAL = """"""


HUMANEVAL_REFINE_FEWSHOT_EXAMPLES = """"""

Add content to the few-shot examples.

The HUMANEVAL_REFINE_FEWSHOT_EXAMPLES is currently empty and needs to be populated with appropriate few-shot examples.

HUMANEVAL_REFINE_FEWSHOT_EXAMPLES = """"""


SELF_REFINE_REFINE_INSTRUCTION_HUMANEVAL = """"""

Add content to the refinement instruction.

The SELF_REFINE_REFINE_INSTRUCTION_HUMANEVAL is currently empty and needs to be populated with appropriate refinement instructions.

# ======================================================================== MBPP ======================================================================== #


SELF_REFINE_INSTRUCTION_MBPP = """"""

Add content to the instruction.

The SELF_REFINE_INSTRUCTION_MBPP is currently empty and needs to be populated with appropriate instructions.

SELF_REFINE_INSTRUCTION_MBPP = """"""


MBPP_CRITIQUE_FEWSHOT_EXAMPLES = """"""

Add content to the few-shot examples.

The MBPP_CRITIQUE_FEWSHOT_EXAMPLES is currently empty and needs to be populated with appropriate few-shot examples.

MBPP_CRITIQUE_FEWSHOT_EXAMPLES = """"""


SELF_REFINE_CRITIQUE_INSTRUCTION_MBPP = """"""

Add content to the critique instruction.

The SELF_REFINE_CRITIQUE_INSTRUCTION_MBPP is currently empty and needs to be populated with appropriate critique instructions.

SELF_REFINE_CRITIQUE_INSTRUCTION_MBPP = """"""


MBPP_REFINE_FEWSHOT_EXAMPLES = """"""

Add content to the few-shot examples.

The MBPP_REFINE_FEWSHOT_EXAMPLES is currently empty and needs to be populated with appropriate few-shot examples.

MBPP_REFINE_FEWSHOT_EXAMPLES = """"""


SELF_REFINE_REFINE_INSTRUCTION_MBPP = """"""

Add content to the refinement instruction.

The SELF_REFINE_REFINE_INSTRUCTION_MBPP is currently empty and needs to be populated with appropriate refinement instructions.
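
As a rough sketch of how empty placeholders like the ones flagged above could be caught automatically, the helper below scans a namespace for upper-case string constants that are blank. The function name `find_empty_prompts` and the sample dictionary are hypothetical and not part of this PR; in practice one might pass `vars()` of the prompts module instead.

```python
from typing import Any, Dict, List


def find_empty_prompts(namespace: Dict[str, Any]) -> List[str]:
    """Return the names of upper-case string constants whose value is blank."""
    return sorted(
        name
        for name, value in namespace.items()
        if name.isupper() and isinstance(value, str) and not value.strip()
    )


# Hypothetical stand-in for a prompts module's namespace.
sample = {
    "SELF_REFINE_INSTRUCTION_HUMANEVAL": "",
    "MBPP_REFINE_FEWSHOT_EXAMPLES": "",
    "SOME_POPULATED_PROMPT": "placeholder, non-empty text",
}

print(find_empty_prompts(sample))
# → ['MBPP_REFINE_FEWSHOT_EXAMPLES', 'SELF_REFINE_INSTRUCTION_HUMANEVAL']
```

A check like this could run inside a unit test so that a prompt constant left as `""""""` fails CI instead of relying on review comments.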

@alckasoc alckasoc changed the title Code for Self-Refine (HumanEval, MBPP) Standardize CRITIC/Self-Refine Few-shots for Math, Code for Self-Refine (HumanEval, MBPP) Jul 13, 2024
@alckasoc alckasoc linked an issue Jul 13, 2024 that may be closed by this pull request
@alckasoc alckasoc changed the title Standardize CRITIC/Self-Refine Few-shots for Math, Code for Self-Refine (HumanEval, MBPP) Standardize CRITIC/Self-Refine Few-shots for Math, Standardize error types in Reflexion with CRITIC/SR, Code for Self-Refine (HumanEval, MBPP) Jul 13, 2024
@alckasoc alckasoc linked an issue Jul 13, 2024 that may be closed by this pull request

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 6

Outside diff range, codebase verification and nitpick comments (1)
docs/docs/index.md (1)

127-139: Reword repetitive sentences for better readability.

Consider rewording the sentences to avoid repetitive beginnings and improve readability.

- | **HotpotQA**  | 1. Misinterpretation<br>2. Incorrect assumption<br>3. Misinterpretation<br>4. Misinterpretation<br>5. Misinterpretation | 1. Misled action<br>2. Misled action<br>3. Misread context<br>4. Wrong answer<br>5. Logical error |
+ | **HotpotQA**  | 1. Misinterpretation<br>2. Incorrect assumption<br>3. Misinterpretation<br>4. Misinterpretation<br>5. Misinterpretation | 1. Misled action<br>2. Misled action<br>3. Misread context<br>4. Wrong answer<br>5. Logical error |
+ | **FEVER**     | 1. Insufficient info<br>2. Misinterpretation<br>3. Insufficient info<br>4. Insufficient info<br>5. Misinterpretation | 1. Ignored context<br>2. Insufficient info<br>3. Insufficient info<br>4. Ignore context<br>5. Ignore context |
+ | **AmbigNQ**   | 1. Knowledge error<br>2. Knowledge error<br>3. Knowledge error<br>4. Misinterpret question<br>5. Knowledge error | 1. Incorrect assumption/Insufficient info<br>2. Insufficient info<br>3. Knowledge error<br>4. Incorrect answer format<br>5. Misread context |
+ | **TriviaQA**  | 1. Incorrect assumption<br>2. Incorrect assumption<br>3. Incorrect assumption<br>4. Misinterpretation<br>5. Incorrect assumption | 1. Ignore context<br>2. Ignore context<br>3. Ignore context<br>4. Ignore context<br>5. Ignore context |
+ | **GSM8K**     | 1. Logical error<br>2. Logical error<br>3. Misinterpret question<br>4. Logical error<br>5. Misinterpret question | 1. Logical error/Misinterpret question<br>2. Logical error/Misinterpret question<br>3. Logical error/Re-calculation error<br>4. Logical error/Re-calculation error<br>5. Logical error/Misinterpret question |
+ | **SVAMP**     | 1. Logical error<br>2. Logical error<br>3. Logical error<br>4. Logical error<br>5. Logical error | 1. Misinterpret question<br>2. Logical error<br>3. Logical error<br>4. Logical error<br>5. Logical error |
+ | **TabMWP**    | 1. Incorrect operator<br>2. Incorrect operator<br>3. Misinterpret question<br>4. Incorrect operator<br>5. Logical error | 1. Misinterpret question<br>2. Logical error<br>3. Logical error<br>4. Re-calculation error<br>5. Logical error |
+ | **HumanEval** | 1. Conceptual error<br>2. Logical error<br>3. Logical error<br>4. Logical error<br>5. Logical error | 1. Logical error<br>2. Logical error<br>3. Logical error<br>4. Logical error<br>5. Logical error |
+ | **MBPP**      | 1. Logical error<br>2. Logical error<br>3. Incorrect function usage<br>4. Logical error<br>5. Logical error | 1. Incorrect function implementation<br>2. Logical error<br>3. Incorrect function usage<br>4. Logical error<br>5. Logical error |
Tools
LanguageTool

[style] ~131-~131: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...pretation
4. Misinterpretation
5. Misinterpretation | 1. Misled action
2. Misled action<...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)


[style] ~133-~133: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...ledge error
2. Knowledge error
3. Knowledge error
4. Misinterpret question
5....

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)


[style] ~134-~134: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...mption
2. Incorrect assumption
3. Incorrect assumption
4. Misinterpretation
5...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)


[style] ~134-~134: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...nore context
2. Ignore context
3. Ignore context
4. Ignore context
5. Igno...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)


[style] ~134-~134: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...nore context
3. Ignore context
4. Ignore context
5. Ignore context | | **GSM8...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)


[style] ~134-~134: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...nore context
4. Ignore context
5. Ignore context | | GSM8K | 1. Logical ...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)


[style] ~135-~135: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...gical error/Misinterpret question
3. Logical error/Re-calculation error
4. Logica...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)


[style] ~135-~135: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...ogical error/Re-calculation error
4. Logical error/Re-calculation error
5. Logica...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)


[style] ~135-~135: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...ogical error/Re-calculation error
5. Logical error/Misinterpret question | | **SVAMP...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)


[style] ~136-~136: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ... SVAMP | 1. Logical error
2. Logical error
3. Logical error
4. Logical...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)


[style] ~136-~136: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...Logical error
2. Logical error
3. Logical error
4. Logical error
5. Logical...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)


[style] ~136-~136: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...Logical error
3. Logical error
4. Logical error
5. Logical error | 1. Misinter...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)


[style] ~136-~136: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...Logical error
4. Logical error
5. Logical error | 1. Misinterpret question
2. ...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)


[style] ~136-~136: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...Logical error
3. Logical error
4. Logical error
5. Logical error | | *TabMWP...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)


[style] ~136-~136: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...Logical error
4. Logical error
5. Logical error | | TabMWP | 1. Incorrect ...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)


[style] ~138-~138: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...Logical error
3. Logical error
4. Logical error
5. Logical error | 1. Logical ...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)


[style] ~138-~138: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...Logical error
4. Logical error
5. Logical error | 1. Logical error
2. Logical ...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)


[style] ~138-~138: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ... Logical error
5. Logical error | 1. Logical error
2. Logical error
3. Logical...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)


[style] ~138-~138: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ... Logical error | 1. Logical error
2. Logical error
3. Logical error
4. Logical...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)


[style] ~138-~138: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...Logical error
2. Logical error
3. Logical error
4. Logical error
5. Logical...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)


[style] ~138-~138: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...Logical error
3. Logical error
4. Logical error
5. Logical error | | MBPP ...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)


[style] ~138-~138: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...Logical error
4. Logical error
5. Logical error | | MBPP | 1. Logical er...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)


[style] ~139-~139: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ... MBPP | 1. Logical error
2. Logical error
3. Incorrect function usage<br...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between ff48401 and 6f3ae4d.

Files selected for processing (5)
  • agential/cog/reflexion/prompts.py (3 hunks)
  • agential/cog/self_refine/functional.py (3 hunks)
  • agential/cog/self_refine/prompts.py (28 hunks)
  • docs/docs/index.md (2 hunks)
  • notebooks/self_refine.ipynb (3 hunks)
Files not summarized due to errors (1)
  • agential/cog/self_refine/prompts.py: Error: Message exceeds token limit
Files skipped from review as they are similar to previous changes (1)
  • notebooks/self_refine.ipynb
Additional comments not posted (6)
docs/docs/index.md (1)

Line range hint 111-125:
LGTM!

The section on CRITIC and Self-Refine error types is clear and well-organized.

Tools
Markdownlint

109-109: null
Multiple headings with the same content

(MD024, no-duplicate-heading)

agential/cog/self_refine/prompts.py (2)

1148-1150: Improved Instructional Clarity.

The updated SELF_REFINE_INSTRUCTION_GSM8K now includes specific guidelines for writing Python code and storing the result in a variable named 'answer'. This enhances clarity and standardization.


1564-1566: Improved Instructional Clarity.

The updated SELF_REFINE_INSTRUCTION_SVAMP now includes specific guidelines for writing Python code and storing the result in a variable named 'answer'. This enhances clarity and standardization.
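
To illustrate why a fixed variable name helps, here is a minimal sketch of how code following the "store the result in `answer`" convention could be evaluated. The helper `run_generated_program` and the sample program are assumptions made for illustration; they are not taken from the repository's implementation.

```python
from typing import Any, Dict


def run_generated_program(source: str) -> Any:
    """Execute model-written code and return whatever it bound to `answer`.

    Returns None when the program never assigns `answer`.
    Only suitable for trusted or sandboxed code, since exec runs it as-is.
    """
    scope: Dict[str, Any] = {}
    exec(source, scope)
    return scope.get("answer")


# Hypothetical model output that follows the instruction.
generated = """
lollipops_initial = 20
lollipops_left = 12
answer = lollipops_initial - lollipops_left
"""

print(run_generated_program(generated))  # → 8
```

Because every generated program writes to the same name, the evaluation code needs no per-benchmark parsing of the model's output.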

agential/cog/reflexion/prompts.py (3)

110-110: Ensure the example is comprehensive and accurate.

The few-shot example for HOTPOTQA using Reflexion-React-Reflect strategy should be comprehensive and accurate to guide the model effectively.

Ensure that the example covers a variety of scenarios and provides clear reflections to improve the model's reasoning.


1716-1716: Ensure the example is comprehensive and accurate.

The few-shot example for HUMANEVAL using Reflexion-React-Reflect strategy should be comprehensive and accurate to guide the model effectively.

Ensure that the example covers a variety of scenarios and provides clear reflections to improve the model's reasoning.


1726-1726: Ensure the example is comprehensive and accurate.

The few-shot example for MBPP using Reflexion-React-Reflect strategy should be comprehensive and accurate to guide the model effectively.

Ensure that the example covers a variety of scenarios and provides clear reflections to improve the model's reasoning.

Comment on lines +152 to +154
print("<OUT CRITIQUE=======================================================>")
print(repr(out))
print("<OUT CRITIQUE=======================================================>")

Consider using logging instead of print statements.

While print statements help with debugging, using a logging framework would be more flexible and appropriate for production code.

-    print("<OUT CRITIQUE=======================================================>")
-    print(repr(out))
-    print("<OUT CRITIQUE=======================================================>")
+    import logging
+    logger = logging.getLogger(__name__)
+    logger.debug("<OUT CRITIQUE=======================================================>")
+    logger.debug(repr(out))
+    logger.debug("<OUT CRITIQUE=======================================================>")
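
A variation on this suggestion, sketched under the assumption that all the banner prints share one shape: create the logger once at module level and wrap the three calls in a small helper, so the import does not need repeating at each call site. The name `log_block` is hypothetical.

```python
import logging

# Module-level logger, created once rather than at each call site.
logger = logging.getLogger(__name__)


def log_block(label: str, payload: object) -> None:
    """Emit a delimited debug block, mirroring the print-based banners."""
    banner = f"<{label}{'=' * 55}>"
    logger.debug(banner)
    logger.debug("%r", payload)  # same repr() view the print calls gave
    logger.debug(banner)


if __name__ == "__main__":
    # Debug output stays hidden unless the application opts in.
    logging.basicConfig(level=logging.DEBUG)
    log_block("OUT CRITIQUE", "example model output")
```

With this shape, the debug banners disappear by default and can be re-enabled per module through the standard logging configuration.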

Comment on lines +73 to +75
print("<OUT AGENT=======================================================>")
print(repr(out))
print("<OUT AGENT=======================================================>")

Consider using logging instead of print statements.

While print statements help with debugging, using a logging framework would be more flexible and appropriate for production code.

-    print("<OUT AGENT=======================================================>")
-    print(repr(out))
-    print("<OUT AGENT=======================================================>")
+    import logging
+    logger = logging.getLogger(__name__)
+    logger.debug("<OUT AGENT=======================================================>")
+    logger.debug(repr(out))
+    logger.debug("<OUT AGENT=======================================================>")

Committable suggestion was skipped due to low confidence.

Comment on lines +223 to +225
print("<PROMPT REFINE=======================================================>")
print(prompt)
print("<PROMPT REFINE=======================================================>")

Consider using logging instead of print statements.

While print statements help with debugging, using a logging framework would be more flexible and appropriate for production code.

-    print("<PROMPT REFINE=======================================================>")
-    print(prompt)
-    print("<PROMPT REFINE=======================================================>")
+    import logging
+    logger = logging.getLogger(__name__)
+    logger.debug("<PROMPT REFINE=======================================================>")
+    logger.debug(prompt)
+    logger.debug("<PROMPT REFINE=======================================================>")

Comment on lines +142 to +144
print("<PROMPT CRITIQUE=======================================================>")
print(prompt)
print("<PROMPT CRITIQUE=======================================================>")

Consider using logging instead of print statements.

While print statements help with debugging, using a logging framework would be more flexible and appropriate for production code.

-    print("<PROMPT CRITIQUE=======================================================>")
-    print(prompt)
-    print("<PROMPT CRITIQUE=======================================================>")
+    import logging
+    logger = logging.getLogger(__name__)
+    logger.debug("<PROMPT CRITIQUE=======================================================>")
+    logger.debug(prompt)
+    logger.debug("<PROMPT CRITIQUE=======================================================>")

Comment on lines +63 to +65
print("<PROMPT AGENT=======================================================>")
print(prompt)
print("<PROMPT AGENT=======================================================>")

Consider using logging instead of print statements.

While print statements help with debugging, using a logging framework would be more flexible and appropriate for production code.

-    print("<PROMPT AGENT=======================================================>")
-    print(prompt)
-    print("<PROMPT AGENT=======================================================>")
+    import logging
+    logger = logging.getLogger(__name__)
+    logger.debug("<PROMPT AGENT=======================================================>")
+    logger.debug(prompt)
+    logger.debug("<PROMPT AGENT=======================================================>")

Comment on lines +233 to +235
print("<OUT REFINE=======================================================>")
print(repr(out))
print("<OUT REFINE=======================================================>")

Consider using logging instead of print statements.

While print statements help with debugging, using a logging framework would be more flexible and appropriate for production code.

-    print("<OUT REFINE=======================================================>")
-    print(repr(out))
-    print("<OUT REFINE=======================================================>")
+    import logging
+    logger = logging.getLogger(__name__)
+    logger.debug("<OUT REFINE=======================================================>")
+    logger.debug(repr(out))
+    logger.debug("<OUT REFINE=======================================================>")

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 6f3ae4d and 73efe03.

Files selected for processing (1)
  • agential/cog/self_refine/strategies/code.py (1 hunks)
Additional context used
GitHub Check: codecov/patch
agential/cog/self_refine/strategies/code.py

[warning] 3-3: agential/cog/self_refine/strategies/code.py#L3
Added line #L3 was not covered by tests


[warning] 5-5: agential/cog/self_refine/strategies/code.py#L5
Added line #L5 was not covered by tests


[warning] 7-7: agential/cog/self_refine/strategies/code.py#L7
Added line #L7 was not covered by tests


[warning] 12-13: agential/cog/self_refine/strategies/code.py#L12-L13
Added lines #L12 - L13 were not covered by tests


[warning] 16-16: agential/cog/self_refine/strategies/code.py#L16
Added line #L16 was not covered by tests


[warning] 25-25: agential/cog/self_refine/strategies/code.py#L25
Added line #L25 was not covered by tests


[warning] 27-31: agential/cog/self_refine/strategies/code.py#L27-L31
Added lines #L27 - L31 were not covered by tests


[warning] 33-33: agential/cog/self_refine/strategies/code.py#L33
Added line #L33 was not covered by tests


[warning] 53-53: agential/cog/self_refine/strategies/code.py#L53
Added line #L53 was not covered by tests


[warning] 60-60: agential/cog/self_refine/strategies/code.py#L60
Added line #L60 was not covered by tests


[warning] 62-62: agential/cog/self_refine/strategies/code.py#L62
Added line #L62 was not covered by tests


[warning] 64-64: agential/cog/self_refine/strategies/code.py#L64
Added line #L64 was not covered by tests


[warning] 87-87: agential/cog/self_refine/strategies/code.py#L87
Added line #L87 was not covered by tests


[warning] 96-99: agential/cog/self_refine/strategies/code.py#L96-L99
Added lines #L96 - L99 were not covered by tests


[warning] 101-101: agential/cog/self_refine/strategies/code.py#L101
Added line #L101 was not covered by tests


[warning] 103-103: agential/cog/self_refine/strategies/code.py#L103
Added line #L103 was not covered by tests


[warning] 105-105: agential/cog/self_refine/strategies/code.py#L105
Added line #L105 was not covered by tests


[warning] 115-115: agential/cog/self_refine/strategies/code.py#L115
Added line #L115 was not covered by tests


[warning] 117-117: agential/cog/self_refine/strategies/code.py#L117
Added line #L117 was not covered by tests


[warning] 139-139: agential/cog/self_refine/strategies/code.py#L139
Added line #L139 was not covered by tests


[warning] 148-148: agential/cog/self_refine/strategies/code.py#L148
Added line #L148 was not covered by tests


[warning] 150-150: agential/cog/self_refine/strategies/code.py#L150
Added line #L150 was not covered by tests


[warning] 152-152: agential/cog/self_refine/strategies/code.py#L152
Added line #L152 was not covered by tests


[warning] 160-160: agential/cog/self_refine/strategies/code.py#L160
Added line #L160 was not covered by tests


[warning] 162-162: agential/cog/self_refine/strategies/code.py#L162
Added line #L162 was not covered by tests


[warning] 170-172: agential/cog/self_refine/strategies/code.py#L170-L172
Added lines #L170 - L172 were not covered by tests


[warning] 175-175: agential/cog/self_refine/strategies/code.py#L175
Added line #L175 was not covered by tests


[warning] 178-178: agential/cog/self_refine/strategies/code.py#L178
Added line #L178 was not covered by tests


[warning] 181-181: agential/cog/self_refine/strategies/code.py#L181
Added line #L181 was not covered by tests


[warning] 184-184: agential/cog/self_refine/strategies/code.py#L184
Added line #L184 was not covered by tests

Additional comments not posted (9)
agential/cog/self_refine/strategies/code.py (9)

25-31: Constructor Initialization

The constructor properly initializes the SelfRefineCodeStrategy class, setting default values for patience, _prev_code_answer, patience_counter, and _halt. The attributes are well-documented.

Tools
GitHub Check: codecov/patch

[warning] 25-25: agential/cog/self_refine/strategies/code.py#L25
Added line #L25 was not covered by tests


[warning] 27-31: agential/cog/self_refine/strategies/code.py#L27-L31
Added lines #L27 - L31 were not covered by tests


33-62: Verify the format of the answer string

The generate method extracts Python code from the answer string by splitting on "```python" and "```". Ensure that the format of the answer string is consistent and that edge cases (no fenced block, several fenced blocks, a fence without a language tag) are handled.
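
A minimal sketch of the edge cases this comment points at, assuming the extraction is exactly the split chain shown in the PR's generate method. The helper name `extract_code_split` and the sample strings are illustrative and not part of the PR; `FENCE` is only a way to spell a triple backtick inside this example.

```python
FENCE = "`" * 3  # literal triple backtick, spelled indirectly for readability


def extract_code_split(answer: str) -> str:
    """Mirror of the split-based extraction quoted from generate (hypothetical helper)."""
    return answer.split(FENCE + "python")[-1].split(FENCE)[0].strip()


# One well-formed block: the code inside the fence is returned.
single = FENCE + "python\nprint(1)\n" + FENCE

# No fence at all: the whole answer is returned unchanged (apart from stripping).
bare = "print(1)"

# Two fenced blocks: only the last one is kept.
double = (
    FENCE + "python\na = 1\n" + FENCE
    + "\ntext\n"
    + FENCE + "python\nb = 2\n" + FENCE
)

# Fence without a language tag: the text before the first fence is returned,
# which is an empty string here, so the code is silently lost.
untagged = FENCE + "\nprint(1)\n" + FENCE

for name, text in [("single", single), ("bare", bare), ("double", double), ("untagged", untagged)]:
    print(name, repr(extract_code_split(text)))
```

The untagged case is the one most likely to surprise: a model reply that omits the `python` tag yields an empty answer rather than an error.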

Tools
GitHub Check: codecov/patch

[warning] 33-33: agential/cog/self_refine/strategies/code.py#L33
Added line #L33 was not covered by tests


[warning] 53-53: agential/cog/self_refine/strategies/code.py#L53
Added line #L53 was not covered by tests


[warning] 60-60: agential/cog/self_refine/strategies/code.py#L60
Added line #L60 was not covered by tests


[warning] 62-62: agential/cog/self_refine/strategies/code.py#L62
Added line #L62 was not covered by tests


64-103: Verify the usage of EM and halting logic

The generate_critique method uses EM (exact-match comparison) to check whether the answer is unchanged from the previous one and, if so, increments the patience_counter. Verify that EM is the appropriate method for this comparison and that the halting logic works as intended.
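
To make the halting behaviour easier to reason about, here is a standalone sketch of the counter logic as it appears in the PR's generate_critique. It assumes that EM with normalize=False amounts to plain string equality; `PatienceTracker` and its attribute names are hypothetical stand-ins, not classes from the PR.

```python
class PatienceTracker:
    """Standalone stand-in for the patience logic in generate_critique (hypothetical)."""

    def __init__(self, patience: int = 1) -> None:
        self.patience = patience
        self.prev_answer = ""
        self.counter = 0
        self.halt = False

    def observe(self, answer: str) -> bool:
        """Record one answer and report whether the halt flag is set."""
        answer = answer.strip()
        if answer == self.prev_answer:  # assumed equivalent to EM(..., normalize=False)
            self.counter += 1
            if self.counter == self.patience:
                self.halt = True
        else:
            # The counter is not reset here, matching the code under review.
            self.prev_answer = answer
        return self.halt


tracker = PatienceTracker(patience=2)
history = [tracker.observe(a) for a in ["x = 1", "x = 1", "x = 2", "x = 2"]]
print(history)

# An empty first answer equals the initial previous answer, so it counts as a repeat.
print(PatienceTracker(patience=1).observe(""))
```

In this run the flag is set on the fourth answer even though neither answer repeated twice in a row, because the count carries over when the answer changes. An empty first answer also counts as a repeat since it matches the initial empty string.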

Tools
GitHub Check: codecov/patch

[warning] 64-64: agential/cog/self_refine/strategies/code.py#L64
Added line #L64 was not covered by tests


[warning] 87-87: agential/cog/self_refine/strategies/code.py#L87
Added line #L87 was not covered by tests


[warning] 96-99: agential/cog/self_refine/strategies/code.py#L96-L99
Added lines #L96 - L99 were not covered by tests


[warning] 101-101: agential/cog/self_refine/strategies/code.py#L101
Added line #L101 was not covered by tests


[warning] 103-103: agential/cog/self_refine/strategies/code.py#L103
Added line #L103 was not covered by tests


105-115: LGTM!

The create_output_dict method is straightforward and correctly implemented.

Tools
GitHub Check: codecov/patch

[warning] 105-105: agential/cog/self_refine/strategies/code.py#L105
Added line #L105 was not covered by tests


[warning] 115-115: agential/cog/self_refine/strategies/code.py#L115
Added line #L115 was not covered by tests


117-150: Verify the format of the updated answer string

The update_answer_based_on_critique method extracts Python code from the updated answer string by splitting on "```python" and "```". Ensure that the format of the updated answer string is consistent and that edge cases are handled.

Tools
GitHub Check: codecov/patch

[warning] 117-117: agential/cog/self_refine/strategies/code.py#L117
Added line #L117 was not covered by tests


[warning] 139-139: agential/cog/self_refine/strategies/code.py#L139
Added line #L139 was not covered by tests


[warning] 148-148: agential/cog/self_refine/strategies/code.py#L148
Added line #L148 was not covered by tests


[warning] 150-150: agential/cog/self_refine/strategies/code.py#L150
Added line #L150 was not covered by tests


152-160: LGTM!

The halting_condition method is straightforward and correctly implemented.

Tools
GitHub Check: codecov/patch

[warning] 152-152: agential/cog/self_refine/strategies/code.py#L152
Added line #L152 was not covered by tests


[warning] 160-160: agential/cog/self_refine/strategies/code.py#L160
Added line #L160 was not covered by tests


162-172: LGTM!

The reset method is straightforward and correctly implemented.

Tools
GitHub Check: codecov/patch

[warning] 162-162: agential/cog/self_refine/strategies/code.py#L162
Added line #L162 was not covered by tests


[warning] 170-172: agential/cog/self_refine/strategies/code.py#L170-L172
Added lines #L170 - L172 were not covered by tests


175-178: LGTM!

The SelfRefineHEvalStrategy class inherits from SelfRefineCodeStrategy and does not add any new functionality.

Tools
GitHub Check: codecov/patch

[warning] 175-175: agential/cog/self_refine/strategies/code.py#L175
Added line #L175 was not covered by tests


[warning] 178-178: agential/cog/self_refine/strategies/code.py#L178
Added line #L178 was not covered by tests


181-184: LGTM!

The SelfRefineMBPPStrategy class inherits from SelfRefineCodeStrategy and does not add any new functionality.

Tools
GitHub Check: codecov/patch

[warning] 181-181: agential/cog/self_refine/strategies/code.py#L181
Added line #L181 was not covered by tests


[warning] 184-184: agential/cog/self_refine/strategies/code.py#L184
Added line #L184 was not covered by tests

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 73efe03 and 078fdcd.

Files selected for processing (1)
  • agential/cog/self_refine/factory.py (6 hunks)
Additional comments not posted (6)
agential/cog/self_refine/factory.py (6)

43-52: Imports for HUMANEVAL and MBPP constants look good.

The added import statements for HUMANEVAL and MBPP related constants are correct and necessary for the functionality.


66-69: Imports for HUMANEVAL and MBPP strategies look good.

The added import statements for HUMANEVAL and MBPP strategies are correct and necessary for the functionality.


80-81: Additions to SELF_REFINE_BENCHMARK_FEWSHOTS look good.

The HUMANEVAL and MBPP benchmarks were correctly added to the SELF_REFINE_BENCHMARK_FEWSHOTS dictionary.


121-128: Additions to SELF_REFINE_PROMPTS look good.

The HUMANEVAL and MBPP benchmarks were correctly added to the SELF_REFINE_PROMPTS dictionary with their respective prompts.


161-168: Additions to SELF_REFINE_FEWSHOTS look good.

The HUMANEVAL and MBPP benchmarks were correctly added to the SELF_REFINE_FEWSHOTS dictionary with their respective few-shot examples.


179-180: Additions to SELF_REFINE_STRATEGIES look good.

The HUMANEVAL and MBPP strategies were correctly added to the SELF_REFINE_STRATEGIES dictionary.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 12

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 078fdcd and f59b2be.

Files selected for processing (6)
  • agential/cog/self_refine/factory.py (6 hunks)
  • agential/cog/self_refine/prompts.py (28 hunks)
  • agential/cog/self_refine/strategies/code.py (1 hunks)
  • notebooks/critic.ipynb (1 hunks)
  • notebooks/reflexion.ipynb (1 hunks)
  • tests/cog/self_refine/strategies/test_code.py (1 hunks)
Files not summarized due to errors (1)
  • agential/cog/self_refine/prompts.py: Error: Message exceeds token limit
Files skipped from review due to trivial changes (2)
  • notebooks/critic.ipynb
  • notebooks/reflexion.ipynb
Files skipped from review as they are similar to previous changes (1)
  • agential/cog/self_refine/factory.py
Additional context used
GitHub Check: codecov/patch
agential/cog/self_refine/strategies/code.py

[warning] 27-31: agential/cog/self_refine/strategies/code.py#L27-L31
Added lines #L27 - L31 were not covered by tests


[warning] 53-53: agential/cog/self_refine/strategies/code.py#L53
Added line #L53 was not covered by tests


[warning] 60-60: agential/cog/self_refine/strategies/code.py#L60
Added line #L60 was not covered by tests


[warning] 62-62: agential/cog/self_refine/strategies/code.py#L62
Added line #L62 was not covered by tests


[warning] 87-87: agential/cog/self_refine/strategies/code.py#L87
Added line #L87 was not covered by tests


[warning] 96-99: agential/cog/self_refine/strategies/code.py#L96-L99
Added lines #L96 - L99 were not covered by tests


[warning] 101-101: agential/cog/self_refine/strategies/code.py#L101
Added line #L101 was not covered by tests


[warning] 103-103: agential/cog/self_refine/strategies/code.py#L103
Added line #L103 was not covered by tests


[warning] 115-115: agential/cog/self_refine/strategies/code.py#L115
Added line #L115 was not covered by tests


[warning] 139-139: agential/cog/self_refine/strategies/code.py#L139
Added line #L139 was not covered by tests


[warning] 148-148: agential/cog/self_refine/strategies/code.py#L148
Added line #L148 was not covered by tests


[warning] 150-150: agential/cog/self_refine/strategies/code.py#L150
Added line #L150 was not covered by tests


[warning] 160-160: agential/cog/self_refine/strategies/code.py#L160
Added line #L160 was not covered by tests


[warning] 170-172: agential/cog/self_refine/strategies/code.py#L170-L172
Added lines #L170 - L172 were not covered by tests

Additional comments not posted (10)
tests/cog/self_refine/strategies/test_code.py (1)

1-16: LGTM! Import statements are correct.

The import statements are necessary for the functionality of the tests and are correctly included.

agential/cog/self_refine/strategies/code.py (1)

105-116: LGTM! The method correctly creates the output dictionary.

The create_output_dict method is straightforward and correctly creates the output dictionary.

Tools
GitHub Check: codecov/patch

[warning] 115-115: agential/cog/self_refine/strategies/code.py#L115
Added line #L115 was not covered by tests

agential/cog/self_refine/prompts.py (8)

1148-1150: Approved: Improved clarity in instructions.

The updated instruction provides clearer guidance for writing Python code to solve the questions.


1161-1206: Approved: Correct identification of inefficiency in example code.

The critique correctly identifies the inefficiency in the example code and suggests a better solution.


1214-1222: Approved: Correct identification of variable naming error.

The critique correctly identifies the variable naming error in the example code and suggests a better solution.


1232-1245: Approved: Correct identification of logical error.

The critique correctly identifies the logical error in the example code and suggests a better solution.


1261-1289: Approved: Correct identification of logical error.

The critique correctly identifies the logical error in the example code and suggests a better solution.


1303-1318: Approved: Correct identification of logical error.

The critique correctly identifies the logical error in the example code and suggests a better solution.


1338-1383: Approved: Correct identification of inefficiency in example code.

The critique correctly identifies the inefficiency in the example code and suggests a better solution.


1399-1407: Approved: Correct identification of variable naming error.

The critique correctly identifies the variable naming error in the example code and suggests a better solution.

Comment on lines 19 to 21
def test_init() -> None:
    """Test SelfRefineCodeStrategy initialization."""

Implement the test for initialization.

The function test_init is defined but not implemented. It should be implemented to test the initialization of SelfRefineCodeStrategy.

def test_init() -> None:
    """Test SelfRefineCodeStrategy initialization."""
    # FakeListChatModel requires a `responses` list; it is unused here.
    llm = FakeListChatModel(responses=[])
    strategy = SelfRefineCodeStrategy(llm=llm, patience=2)
    assert strategy.llm == llm
    assert strategy.patience == 2
    assert strategy._prev_code_answer == ""
    assert strategy.patience_counter == 0
    assert strategy._halt is False

Comment on lines 22 to 24
def test_generate() -> None:
    """Tests SelfRefineCodeStrategy generate."""

Implement the test for generate method.

The function test_generate is defined but not implemented. It should be implemented to test the generate method of SelfRefineCodeStrategy.

def test_generate() -> None:
    """Tests SelfRefineCodeStrategy generate."""
    # The fake model must be given the response the assertion expects.
    llm = FakeListChatModel(responses=["4"])
    strategy = SelfRefineCodeStrategy(llm=llm)
    question = "What is 2 + 2?"
    examples = "Example: 1 + 1 = 2"
    prompt = "Solve the following math problem."
    additional_keys = {}
    answer = strategy.generate(question, examples, prompt, additional_keys)
    assert answer == "4"

Comment on lines 25 to 27
def test_generate_critique() -> None:
    """Tests SelfRefineCodeStrategy generate_critique."""

Implement the test for generate_critique method.

The function test_generate_critique is defined but not implemented. It should be implemented to test the generate_critique method of SelfRefineCodeStrategy.

def test_generate_critique() -> None:
    """Tests SelfRefineCodeStrategy generate_critique."""
    # The fake model must be given the critique the assertion expects.
    llm = FakeListChatModel(responses=["The answer is correct."])
    strategy = SelfRefineCodeStrategy(llm=llm)
    question = "What is 2 + 2?"
    examples = "Example: 1 + 1 = 2"
    answer = "4"
    prompt = "Critique the following answer."
    additional_keys = {}
    critique = strategy.generate_critique(question, examples, answer, prompt, additional_keys)
    assert critique == "The answer is correct."

Comment on lines 28 to 30
def test_create_output_dict() -> None:
    """Tests SelfRefineCodeStrategy create_output_dict."""

Implement the test for create_output_dict method.

The function test_create_output_dict is defined but not implemented. It should be implemented to test the create_output_dict method of SelfRefineCodeStrategy.

def test_create_output_dict() -> None:
    """Tests SelfRefineCodeStrategy create_output_dict."""
    # FakeListChatModel requires a `responses` list; it is unused here.
    llm = FakeListChatModel(responses=[])
    strategy = SelfRefineCodeStrategy(llm=llm)
    answer = "4"
    critique = "The answer is correct."
    output_dict = strategy.create_output_dict(answer, critique)
    assert output_dict == {"answer": answer, "critique": critique}

Comment on lines 31 to 33
def test_update_answer_based_on_critique() -> None:
    """Tests SelfRefineCodeStrategy update_answer_based_on_critique."""

Implement the test for update_answer_based_on_critique method.

The function test_update_answer_based_on_critique is defined but not implemented. It should be implemented to test the update_answer_based_on_critique method of SelfRefineCodeStrategy.

def test_update_answer_based_on_critique() -> None:
    """Tests SelfRefineCodeStrategy update_answer_based_on_critique."""
    # The fake model must be given the refined answer the assertion expects.
    llm = FakeListChatModel(responses=["4"])
    strategy = SelfRefineCodeStrategy(llm=llm)
    question = "What is 2 + 2?"
    examples = "Example: 1 + 1 = 2"
    answer = "4"
    critique = "The answer is correct."
    prompt = "Refine the following answer."
    additional_keys = {}
    new_answer = strategy.update_answer_based_on_critique(question, examples, answer, critique, prompt, additional_keys)
    assert new_answer == "4"

Comment on lines 40 to 41
def test_instantiate_strategies() -> None:
    """Test instantiate all Code strategies."""

Implement the test for instantiating strategies.

The function test_instantiate_strategies is defined but not implemented. It should be implemented to test the instantiation of all code strategies.

def test_instantiate_strategies() -> None:
    """Test instantiate all Code strategies."""
    # FakeListChatModel requires a `responses` list; it is unused here.
    llm = FakeListChatModel(responses=[])
    heval_strategy = SelfRefineHEvalStrategy(llm=llm)
    mbpp_strategy = SelfRefineMBPPStrategy(llm=llm)
    assert isinstance(heval_strategy, SelfRefineHEvalStrategy)
    assert isinstance(mbpp_strategy, SelfRefineMBPPStrategy)

Comment on lines +25 to +32
def __init__(self, llm: BaseChatModel, patience: int = 1) -> None:
    """Initialization."""
    super().__init__(llm)
    self.patience = patience
    self._prev_code_answer = ""
    self.patience_counter = 0
    self._halt = False

Add type check for llm argument.

The __init__ method correctly initializes the attributes. However, a type check for the llm argument should be added to ensure it is an instance of BaseChatModel.

def __init__(self, llm: BaseChatModel, patience: int = 1) -> None:
    """Initialization."""
    if not isinstance(llm, BaseChatModel):
        raise TypeError("llm must be an instance of BaseChatModel")
    super().__init__(llm)
    self.patience = patience
    self._prev_code_answer = ""
    self.patience_counter = 0
    self._halt = False
Tools
GitHub Check: codecov/patch

[warning] 27-31: agential/cog/self_refine/strategies/code.py#L27-L31
Added lines #L27 - L31 were not covered by tests

Comment on lines +33 to +62
def generate(
    self,
    question: str,
    examples: str,
    prompt: str,
    additional_keys: Dict[str, str],
    **kwargs: Dict[str, Any],
) -> str:
    """Generates an answer for the given question using the provided prompt and examples.

    Args:
        question (str): The math question to generate an answer for.
        examples (str): Few-shot examples to guide the language model.
        prompt (str): The prompt to generate an answer.
        additional_keys (Dict[str, str]): Additional keys for the prompt.
        **kwargs (Dict[str, Any]): Additional arguments.

    Returns:
        str: The generated answer.
    """
    answer = _prompt_agent(
        llm=self.llm,
        question=question,
        examples=examples,
        prompt=prompt,
        additional_keys=additional_keys,
    )
    answer = answer.split("```python")[-1].split("```")[0].strip()

    return answer

Improve answer extraction logic.

The generate method processes the answer to extract the Python code. The extraction logic can be improved for clarity and robustness.

def generate(
    self,
    question: str,
    examples: str,
    prompt: str,
    additional_keys: Dict[str, str],
    **kwargs: Dict[str, Any],
) -> str:
    """Generates an answer for the given question using the provided prompt and examples."""
    answer = _prompt_agent(
        llm=self.llm,
        question=question,
        examples=examples,
        prompt=prompt,
        additional_keys=additional_keys,
    )
    # Extract the Python code from the answer
    python_code = answer.partition("```python")[-1].partition("```")[0].strip()
    return python_code
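
Before adopting this suggestion, it may be worth noting that `str.partition` and `str.split` do not behave identically in this chain. The sketch below compares them on two inputs; the helper names and sample strings are hypothetical, and `FENCE` simply spells a triple backtick.

```python
FENCE = "`" * 3  # literal triple backtick


def via_split(answer: str) -> str:
    """Extraction as written in the PR (last tagged block, whole text if untagged)."""
    return answer.split(FENCE + "python")[-1].split(FENCE)[0].strip()


def via_partition(answer: str) -> str:
    """Extraction as written in the suggestion above."""
    return answer.partition(FENCE + "python")[-1].partition(FENCE)[0].strip()


# Reply with no fenced block: split keeps the text, partition drops it.
no_fence = "print(1)"

# Reply with two fenced blocks: split keeps the last, partition keeps the first.
two_blocks = (
    FENCE + "python\na = 1\n" + FENCE
    + "\nrevised:\n"
    + FENCE + "python\nb = 2\n" + FENCE
)

for text in (no_fence, two_blocks):
    print(repr(via_split(text)), repr(via_partition(text)))
```

Whether the first or the last block is the right one to keep depends on how the prompts ask the model to format its reply, so the swap is a behaviour change rather than a pure readability fix.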
Tools
GitHub Check: codecov/patch

[warning] 53-53: agential/cog/self_refine/strategies/code.py#L53
Added line #L53 was not covered by tests


[warning] 60-60: agential/cog/self_refine/strategies/code.py#L60
Added line #L60 was not covered by tests


[warning] 62-62: agential/cog/self_refine/strategies/code.py#L62
Added line #L62 was not covered by tests

Comment on lines +64 to +103
def generate_critique(
    self,
    question: str,
    examples: str,
    answer: str,
    prompt: str,
    additional_keys: Dict[str, str],
) -> str:
    """Generates a critique for the provided answer using the given prompt and examples.

    Stops early if patience is reached and answer remains the same.

    Args:
        question (str): The math question that was answered.
        examples (str): Few-shot examples to guide the language model in generating the critique.
        answer (str): The answer to be critiqued.
        prompt (str): The prompt to generate a critique.
        additional_keys (Dict[str, str]): Additional keys for the prompt.

    Returns:
        str: The generated critique. If the same incorrect answer is repeated for the number of
            interactions specified by patience, the halting condition is triggered.
    """
    critique = _prompt_critique(
        llm=self.llm,
        question=question,
        examples=examples,
        answer=answer,
        prompt=prompt,
        additional_keys=additional_keys,
    )

    if EM(answer.strip(), self._prev_code_answer, normalize=False):
        self.patience_counter += 1
        if self.patience_counter == self.patience:
            self._halt = True
    else:
        self._prev_code_answer = answer.strip()

    return critique

Improve halting condition logic.

The generate_critique method checks for halting conditions based on the patience attribute. The logic for checking the halting condition can be improved for clarity.

def generate_critique(
    self,
    question: str,
    examples: str,
    answer: str,
    prompt: str,
    additional_keys: Dict[str, str],
) -> str:
    """Generates a critique for the provided answer using the given prompt and examples."""
    critique = _prompt_critique(
        llm=self.llm,
        question=question,
        examples=examples,
        answer=answer,
        prompt=prompt,
        additional_keys=additional_keys,
    )
    # Check for halting condition
    if EM(answer.strip(), self._prev_code_answer, normalize=False):
        self.patience_counter += 1
        if self.patience_counter >= self.patience:
            self._halt = True
    else:
        self._prev_code_answer = answer.strip()
        self.patience_counter = 0  # Reset counter if the answer changes
    return critique
Tools
GitHub Check: codecov/patch

[warning] 87-87: agential/cog/self_refine/strategies/code.py#L87
Added line #L87 was not covered by tests


[warning] 96-99: agential/cog/self_refine/strategies/code.py#L96-L99
Added lines #L96 - L99 were not covered by tests


[warning] 101-101: agential/cog/self_refine/strategies/code.py#L101
Added line #L101 was not covered by tests


[warning] 103-103: agential/cog/self_refine/strategies/code.py#L103
Added line #L103 was not covered by tests

Comment on lines +117 to +150
def update_answer_based_on_critique(
    self,
    question: str,
    examples: str,
    answer: str,
    critique: str,
    prompt: str,
    additional_keys: Dict[str, str],
) -> str:
    """Updates the answer based on the given critique.

    Args:
        question: The question that was answered by the language model.
        examples: Few-shot examples to guide the language model.
        answer: The answer provided by the language model.
        critique: The critique of the answer.
        prompt: The prompt to be used for generating the updated answer.
        additional_keys: Additional context or parameters to include in the critique prompt.

    Returns:
        str: The updated answer.
    """
    new_answer = _prompt_refine(
        llm=self.llm,
        question=question,
        examples=examples,
        answer=answer,
        critique=critique,
        prompt=prompt,
        additional_keys=additional_keys,
    )
    new_answer = new_answer.split("```python")[-1].split("```")[0].strip()

    return new_answer

Improve new answer extraction logic.

The update_answer_based_on_critique method processes the new answer to extract the Python code. The extraction logic can be improved for clarity and robustness.

def update_answer_based_on_critique(
    self,
    question: str,
    examples: str,
    answer: str,
    critique: str,
    prompt: str,
    additional_keys: Dict[str, str],
) -> str:
    """Updates the answer based on the given critique."""
    new_answer = _prompt_refine(
        llm=self.llm,
        question=question,
        examples=examples,
        answer=answer,
        critique=critique,
        prompt=prompt,
        additional_keys=additional_keys,
    )
    # Extract the Python

Tools
GitHub Check: codecov/patch

[warning] 139-139: agential/cog/self_refine/strategies/code.py#L139
Added line #L139 was not covered by tests


[warning] 148-148: agential/cog/self_refine/strategies/code.py#L148
Added line #L148 was not covered by tests


[warning] 150-150: agential/cog/self_refine/strategies/code.py#L150
Added line #L150 was not covered by tests

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between f59b2be and a5ad9b0.

Files selected for processing (2)
  • notebooks/self_refine.ipynb (3 hunks)
  • tests/cog/self_refine/strategies/test_code.py (1 hunks)
Files skipped from review as they are similar to previous changes (2)
  • notebooks/self_refine.ipynb
  • tests/cog/self_refine/strategies/test_code.py

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between a5ad9b0 and fbe78df.

Files selected for processing (1)
  • tests/cog/self_refine/strategies/test_code.py (1 hunks)
Files skipped from review as they are similar to previous changes (1)
  • tests/cog/self_refine/strategies/test_code.py

@alckasoc alckasoc merged commit 4750c58 into main Jul 14, 2024
2 checks passed
@alckasoc alckasoc deleted the self_refine_code branch July 14, 2024 02:27
Labels
add-benchmark Adding support for a benchmark enhancement New feature or request
Projects
None yet
2 participants