
Self-Refine, Refactoring Self-Refine, Math for Self-Refine (GSM8K, SVAMP, TabMWP) #225

Merged: 30 commits from self_refine into main, Jul 12, 2024

Conversation

@alckasoc (Member) commented Jul 11, 2024

πŸ€” Reasoning

Explain the purpose of this PR...

🚧 Changes

Describe the changes made...

βœ… PR Checklist

  • Using this PR template?
  • Linked issue?
  • Added feature?
    • Added/updated docs?
    • Added/updated tests?

Summary by CodeRabbit

  • New Features
    • Introduced self-refinement prompts and few-shot examples for error identification and correction.
    • Added a factory class for creating instances of self-refine strategies, prompts, and examples based on benchmarks.
  • Tests
    • Added unit tests for Self-Refine functionality, covering initialization, reset, code generation, critique, and refinement strategies.

@alckasoc self-assigned this Jul 11, 2024

coderabbitai bot (Contributor) commented Jul 11, 2024

Warning

Rate limit exceeded

@alckasoc has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 15 minutes and 44 seconds before requesting another review.

How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

Commits

Files that changed from the base of the PR and between ea54d38 and ec993bd.

Walkthrough

This update introduces several new test functions and a factory class for self-refinement strategies in the context of mathematical problem-solving. It includes new files for tests and prompts, and the SelfRefineFactory class to facilitate strategy creation. No alterations were made to existing public or exported entities.

Changes

| File | Change Summary |
| --- | --- |
| tests/cog/self_refine/test_agent.py | Added unit tests for the SelfRefineAgent class, covering initialization, reset, and code generation. |
| tests/cog/self_refine/test_functional.py | Added tests for math problem self-refinement, including initialization, generation, critique, and more. |
| agential/cog/self_refine/factory.py | Introduced SelfRefineFactory class for creating self-refine strategies and prompts based on benchmarks. |
| agential/cog/self_refine/prompts.py | Added constants for self-refinement prompts and examples, providing instructions for error identification and correction. |
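
For orientation, here is a minimal sketch of how the new factory might be wired up. This is an illustration based on the walkthrough above; the `"gsm8k"` benchmark key, the `fewshot_type="pot"` argument, and the `llm` constructor keyword are assumptions to verify against the merged code.

```python
# Hypothetical usage sketch of SelfRefineFactory (identifiers assumed from
# this PR's walkthrough; verify against the merged code before relying on it).
from langchain_openai import ChatOpenAI

from agential.cog.self_refine.factory import SelfRefineFactory

llm = ChatOpenAI()  # any LangChain chat model should do here

benchmark = "gsm8k"
prompts = SelfRefineFactory.get_prompts(benchmark)  # instruction templates
fewshots = SelfRefineFactory.get_fewshots(benchmark, fewshot_type="pot")
strategy = SelfRefineFactory.get_strategy(benchmark, llm=llm)
```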

Poem

In code so crisp and functions neat,
Self-refinement brings a treat. 🌟
With tests and prompts all set to go,
Our agents learn and swiftly grow. πŸš€
A factory now stands so grand,
To guide the code with steady hand.
Here's to changes, smart and bright,
Refining paths with pure delight! 🐰✨


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai generate interesting stats about this repository and render them as a table.
    • @coderabbitai show all the console.log statements in this repository.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (invoked as PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve to resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Additionally, you can add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

codecov bot commented Jul 11, 2024

Codecov Report

Attention: Patch coverage is 92.41379% with 11 lines in your changes missing coverage. Please review.

| Files | Coverage Δ |
| --- | --- |
| agential/cog/self_refine/agent.py | 100.00% <100.00%> (ø) |
| agential/cog/self_refine/functional.py | 100.00% <100.00%> (ø) |
| agential/cog/self_refine/output.py | 100.00% <100.00%> (ø) |
| agential/cog/self_refine/prompts.py | 100.00% <100.00%> (ø) |
| agential/cog/self_refine/strategies/math.py | 100.00% <100.00%> (ø) |
| agential/cog/self_refine/factory.py | 65.62% <65.62%> (ø) |

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 7

Outside diff range, codebase verification and nitpick comments (7)
agential/cog/self_refine/prompts.py (7)

13-27: Typographical error in the critique example.

There's a minor typo in the comment explaining the error in the first example.

- # Warning: the variables should be named better as she got 5 packs of bacon, not 5 bacons.
+ # Warning: the variables should be named better as she got 5 packs of bacon, not 5 bacons.

29-53: Typographical error and clarity improvement in the critique example.

There's a minor typo and the explanation can be improved for clarity.

- # Let's check the other parts
+ # Let's check the other parts.

- # wrong! 6 *packs* of chicken, and each pack costs twice as much as a pack of bacon. So we need to calculate the cost of one pack of bacon first (total bags of bacon / total cost) and use that.
+ # Wrong! 6 *packs* of chicken, and each pack costs twice as much as a pack of bacon. So we need to calculate the cost of one pack of bacon first (total cost / total packs of bacon) and use that.

74-84: Typographical error and clarity improvement in the critique example.

There's a minor typo and the explanation can be improved for clarity.

- # wrong! The cost of a cup is not the same as the cost of a plate. The cost of a cup is $1200 less than the total cost of half a dozen plates sold at $6000 each. So we need to calculate the cost of a cup first (total cost of half a dozen plates sold at $6000 each - $1200) and use that.
+ # Wrong! The cost of a cup is not the same as the cost of a plate. The cost of a cup is $1200 less than the total cost of half a dozen plates sold at $6000 each. So we need to calculate the cost of a cup first (total cost of half a dozen plates sold at $6000 each - $1200) and use that.

100-125: Typographical error and clarity improvement in the critique example.

There's a minor typo and the explanation can be improved for clarity.

- # looks good
+ # Looks good.

- # looks good
+ # Looks good.

- # looks good
+ # Looks good.

- # looks good
+ # Looks good.

174-239: Typographical error in the critique example.

There's a minor typo in the comment explaining the error in the first example.

- # Let's check the other parts
+ # Let's check the other parts.

- # wrong! 6 *packs* of chicken, and each pack costs twice as much as a pack of bacon. So we need to calculate the cost of one pack of bacon first (total bags of bacon / total cost) and use that.
+ # Wrong! 6 *packs* of chicken, and each pack costs twice as much as a pack of bacon. So we need to calculate the cost of one pack of bacon first (total cost / total packs of bacon) and use that.

253-263: Typographical error and clarity improvement in the critique example.

There's a minor typo and the explanation can be improved for clarity.

- # wrong! The cost of a cup is not the same as the cost of a plate. The cost of a cup is $1200 less than the total cost of half a dozen plates sold at $6000 each. So we need to calculate the cost of a cup first (total cost of half a dozen plates sold at $6000 each - $1200) and use that.
+ # Wrong! The cost of a cup is not the same as the cost of a plate. The cost of a cup is $1200 less than the total cost of half a dozen plates sold at $6000 each. So we need to calculate the cost of a cup first (total cost of half a dozen plates sold at $6000 each - $1200) and use that.

290-315: Typographical error and clarity improvement in the critique example.

There's a minor typo and the explanation can be improved for clarity.

- # looks good
+ # Looks good.

- # looks good
+ # Looks good.

- # looks good
+ # Looks good.

- # looks good
+ # Looks good.
Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between d1bd586 and 7918de5.

Files selected for processing (7)
  • agential/cog/self_refine/__init__.py (1 hunks)
  • agential/cog/self_refine/agent.py (1 hunks)
  • agential/cog/self_refine/functional.py (1 hunks)
  • agential/cog/self_refine/prompts.py (1 hunks)
  • agential/cog/self_refine/strategies/__init__.py (1 hunks)
  • agential/cog/self_refine/strategies/base.py (1 hunks)
  • agential/cog/self_refine/strategies/math.py (1 hunks)
Files skipped from review due to trivial changes (2)
  • agential/cog/self_refine/__init__.py
  • agential/cog/self_refine/strategies/__init__.py
Additional context used
GitHub Check: codecov/patch
agential/cog/self_refine/agent.py

[warning] Added lines #L7, #L9, #L11-L12, #L15, #L29, #L36, #L38-L39, #L41, #L45, #L82-L83, #L85, #L88, #L90, #L92, #L100, #L102-L103, #L106, #L115, #L117, #L119 were not covered by tests.

agential/cog/self_refine/functional.py

[warning] Added lines #L3, #L5-L7, #L10, #L27, #L32, #L35, #L57, #L63, #L70-L71, #L74 were not covered by tests.

Additional comments not posted (17)
agential/cog/self_refine/strategies/base.py (4)

18-39: Abstract method generate_critique looks good.

The method is correctly defined as an abstract method with a comprehensive docstring.


41-52: Abstract method create_output_dict looks good.

The method is correctly defined as an abstract method with a comprehensive docstring.


54-77: Abstract method update_answer_based_on_critique looks good.

The method is correctly defined as an abstract method with a comprehensive docstring.


79-86: Abstract method halting_condition looks good.

The method is correctly defined as an abstract method with a comprehensive docstring.
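
Taken together, these four methods define the strategy contract. For reference, a rough reconstruction of the interface is sketched below; the signatures are inferred from how `SelfRefineAgent.generate` calls the strategy later in this PR and should be treated as assumptions, not a copy of the merged file.

```python
# Hypothetical reconstruction of SelfRefineBaseStrategy; signatures are
# inferred from the agent's calls into the strategy, not copied verbatim.
from abc import ABC, abstractmethod
from typing import Any, Dict


class SelfRefineBaseStrategy(ABC):
    """Contract implemented by concrete Self-Refine strategies."""

    @abstractmethod
    def generate_critique(
        self,
        question: str,
        examples: str,
        answer: str,
        prompt: str,
        additional_keys: Dict[str, str],
    ) -> str:
        """Critique the current answer."""

    @abstractmethod
    def create_output_dict(self, answer: str, critique: str) -> Dict[str, Any]:
        """Package one refinement step for the output trace."""

    @abstractmethod
    def update_answer_based_on_critique(
        self,
        question: str,
        examples: str,
        answer: str,
        critique: str,
        prompt: str,
        additional_keys: Dict[str, str],
    ) -> str:
        """Produce a refined answer from the critique."""

    @abstractmethod
    def halting_condition(self) -> bool:
        """Return True when refinement should stop early."""
```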

agential/cog/self_refine/agent.py (2)

29-43: Constructor __init__ looks good.

The method is correctly defined and the initialization logic is clear and concise.

Tools
GitHub Check: codecov/patch

[warning] Added lines #L29, #L36, #L38-L39, #L41 were not covered by tests.


117-119: Method reset looks good.

The method is correctly defined and calls the reset method of the strategy.

Tools
GitHub Check: codecov/patch

[warning] Added lines #L117, #L119 were not covered by tests.

agential/cog/self_refine/strategies/math.py (8)

24-31: Constructor __init__ looks good.

The method is correctly defined and the initialization logic is clear and concise.


32-61: Method generate looks good.

The method is correctly defined and the logic for generating the answer is clear and concise.


63-102: Method generate_critique looks good.

The method is correctly defined and the logic for generating the critique is clear and concise. The halting condition based on patience is a nice addition.


104-114: Method create_output_dict looks good.

The method is correctly defined and the logic for creating the output dictionary is clear and concise.


116-149: Method update_answer_based_on_critique looks good.

The method is correctly defined and the logic for updating the answer is clear and concise.


151-159: Method halting_condition looks good.

The method is correctly defined and the logic for checking the halting condition is clear and concise.


161-174: Method reset looks good.

The method is correctly defined and the logic for resetting the strategy is clear and concise.


177-180: Class SelfRefineGSM8KStrategy looks good.

The class is correctly defined as a subclass of SelfRefineMathStrategy.

agential/cog/self_refine/prompts.py (3)

6-10: LGTM! The constant is well-formed.

The SELF_REFINE_INSTRUCTION_GSM8K constant is consistent with the file's purpose.


163-171: LGTM! The constant is well-formed.

The SELF_REFINE_CRITIQUE_INSTRUCTION_GSM8K constant is consistent with the file's purpose.


379-391: LGTM! The constant is well-formed.

The SELF_REFINE_REFINE_INSTRUCTION_GSM8K constant is consistent with the file's purpose.

Comment on lines +10 to +32
def _build_agent_prompt(
    question: str,
    examples: str,
    prompt: str,
    additional_keys: Dict[str, str] = {},
) -> str:
    """Constructs a formatted prompt for the agent based on the question and provided fewshot examples.

    Parameters:
        question (str): The main question for which the agent is to generate an answer.
        examples (str): Pre-formatted few-shot examples that provide context for the question.
        prompt (str): The base template string into which all other components will be inserted.
        additional_keys (Dict[str, str]): Additional keys to format the prompt. Defaults to {}.

    Returns:
        str: The fully constructed and formatted prompt ready to be processed by the agent.
    """
    prompt = PromptTemplate.from_template(prompt).format(
        question=question,
        examples=examples,
        **additional_keys,
    )
    return prompt
coderabbitai bot (Contributor) commented:

Function _build_agent_prompt looks good, but add tests.

The function is correctly defined and the logic for constructing the prompt is clear and concise. However, it is not covered by tests.

Do you want me to generate the unit testing code or open a GitHub issue to track this task?

Tools
GitHub Check: codecov/patch

[warning] Added lines #L10, #L27, #L32 were not covered by tests.
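
Taking the bot up on its offer, a unit test for `_build_agent_prompt` could look like the following sketch; the import path follows this PR's layout, and the `{question}`/`{examples}` placeholders are assumptions about the template format:

```python
# Hypothetical pytest sketch for _build_agent_prompt; import path and
# template placeholders are assumptions based on this PR.
from agential.cog.self_refine.functional import _build_agent_prompt


def test_build_agent_prompt() -> None:
    """The formatted prompt should interpolate the question and examples."""
    result = _build_agent_prompt(
        question="What is 2 + 2?",
        examples="Q: 1 + 1?\nA: 2",
        prompt="{examples}\n\nQuestion: {question}",
    )
    assert "What is 2 + 2?" in result
    assert "Q: 1 + 1?" in result
```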

Comment on lines +35 to +71
def _prompt_agent(
    llm: BaseChatModel,
    question: str,
    examples: str,
    prompt: str,
    additional_keys: Dict[str, str] = {},
) -> str:
    """Generates a response from the LLM based on a given question with fewshot examples.

    This function creates a prompt using `_build_agent_prompt` and then gets the LLM's
    output.

    Args:
        llm (BaseChatModel): The language model to be prompted.
        question (str): The main question for which the agent is to generate an answer.
        examples (str): Pre-formatted few-shot examples that provide context for the question.
        prompt (str): The base template string into which all other components will be inserted.
        additional_keys (Dict[str, str]): Additional keys to format the prompt. Defaults to {}.

    Returns:
        str: The processed response from the language model.
    """
    prompt = _build_agent_prompt(
        question=question,
        examples=examples,
        prompt=prompt,
        additional_keys=additional_keys,
    )
    out = llm(
        [
            HumanMessage(
                content=prompt,
            )
        ]
    ).content
    assert isinstance(out, str)
    return out.strip()
coderabbitai bot (Contributor) commented:

Function _prompt_agent looks good, but add tests.

The function is correctly defined and the logic for generating the response is clear and concise. However, it is not covered by tests.

Do you want me to generate the unit testing code or open a GitHub issue to track this task?

Tools
GitHub Check: codecov/patch

[warning] Added lines #L35, #L57, #L63, #L70-L71 were not covered by tests.
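
Again following the bot's suggestion, `_prompt_agent` could be exercised with a canned LLM so no network call is needed. A sketch; using LangChain's `FakeListChatModel` here is an assumption about the project's test setup:

```python
# Hypothetical pytest sketch for _prompt_agent using LangChain's
# FakeListChatModel as a canned-response stand-in for a real LLM.
from langchain_community.chat_models.fake import FakeListChatModel

from agential.cog.self_refine.functional import _prompt_agent


def test_prompt_agent() -> None:
    """The helper should return the LLM's stripped response."""
    llm = FakeListChatModel(responses=["answer = 4"])
    out = _prompt_agent(
        llm=llm,
        question="What is 2 + 2?",
        examples="Q: 1 + 1?\nA: 2",
        prompt="{examples}\n\nQuestion: {question}",
    )
    assert out == "answer = 4"
```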

Comment on lines +74 to +103
def _build_critique_prompt(
    question: str,
    examples: str,
    answer: str,
    prompt: str,
    additional_keys: Dict[str, str] = {},
) -> str:
    """Builds a critique prompt.

    This function compiles a detailed critique prompt from the contextual examples,
    the question, and the answer under review.

    Parameters:
        question (str): The question to be answered by the language model.
        examples (str): Pre-formatted examples that provide context to the question.
        answer (str): The answer to the question.
        prompt (str): Prompt template string.
        additional_keys (Dict[str, str]): Additional keys to format the prompt. Defaults to {}.

    Returns:
        str: The fully constructed and formatted critique prompt.
    """
    prompt = PromptTemplate.from_template(prompt).format(
        question=question,
        examples=examples,
        answer=answer,
        **additional_keys,
    )
    return prompt
coderabbitai bot (Contributor) commented:

Function _build_critique_prompt looks good, but add tests.

The function is correctly defined and the logic for building the critique prompt is clear and concise. However, it is not covered by tests.

Do you want me to generate the unit testing code or open a GitHub issue to track this task?

Tools
GitHub Check: codecov/patch

[warning] Added line #L74 was not covered by tests.

Comment on lines +106 to +144
def _prompt_critique(
    llm: BaseChatModel,
    question: str,
    examples: str,
    answer: str,
    prompt: str,
    additional_keys: Dict[str, str] = {},
) -> str:
    """Requests critique from the language model based on a provided answer and contextual examples.

    A critique prompt is constructed using the provided examples and answer.

    Parameters:
        llm (BaseChatModel): The language model to prompt for critique.
        question (str): The question to be answered by the language model.
        examples (str): Contextual examples related to the answer.
        answer (str): The answer for which critique is being sought.
        prompt (str): Prompt template string.
        additional_keys (Dict[str, str]): Additional keys to format the prompt. Defaults to {}.

    Returns:
        str: The language model's critique, with no leading or trailing whitespace.
    """
    prompt = _build_critique_prompt(
        question=question,
        examples=examples,
        answer=answer,
        prompt=prompt,
        additional_keys=additional_keys,
    )
    out = llm(
        [
            HumanMessage(
                content=prompt,
            )
        ]
    ).content
    assert isinstance(out, str)
    return out.strip()
coderabbitai bot (Contributor) commented:

Function _prompt_critique looks good, but add tests.

The function is correctly defined and the logic for requesting the critique is clear and concise. However, it is not covered by tests.

Do you want me to generate the unit testing code or open a GitHub issue to track this task?

Comment on lines +147 to +175
def _build_refine_prompt(
    question: str,
    examples: str,
    answer: str,
    critique: str,
    prompt: str,
    additional_keys: Dict[str, str] = {},
) -> str:
    """Builds a refinement prompt.

    Parameters:
        question (str): The question to be answered by the language model.
        examples (str): Pre-formatted examples that provide context to the question.
        answer (str): The answer to be refined.
        critique (str): The critique on the answer.
        prompt (str): Prompt template string.
        additional_keys (Dict[str, str]): Additional keys to format the prompt. Defaults to {}.

    Returns:
        str: The fully constructed and formatted refinement prompt.
    """
    prompt = PromptTemplate.from_template(prompt).format(
        question=question,
        examples=examples,
        answer=answer,
        critique=critique,
        **additional_keys,
    )
    return prompt
coderabbitai bot (Contributor) commented:

Function _build_refine_prompt looks good, but add tests.

The function is correctly defined and the logic for building the refinement prompt is clear and concise. However, it is not covered by tests.

Do you want me to generate the unit testing code or open a GitHub issue to track this task?

Comment on lines 178 to 219
def _prompt_refine(
    llm: BaseChatModel,
    question: str,
    examples: str,
    answer: str,
    critique: str,
    prompt: str,
    additional_keys: Dict[str, str] = {},
) -> str:
    """Refines an answer based on critique from the language model.

    A refine prompt is constructed using the provided answer, examples, and critique.

    Parameters:
        llm (BaseChatModel): The language model to prompt for refinement.
        question (str): The question to be answered by the language model.
        examples (str): Contextual examples related to the answer.
        answer (str): The answer to be refined.
        critique (str): The critique on the answer.
        prompt (str): Prompt template string.
        additional_keys (Dict[str, str]): Additional keys to format the prompt. Defaults to {}.

    Returns:
        str: The language model's refined answer, with no leading or trailing whitespace.
    """
    prompt = _build_refine_prompt(
        question=question,
        examples=examples,
        answer=answer,
        critique=critique,
        prompt=prompt,
        additional_keys=additional_keys,
    )
    out = llm(
        [
            HumanMessage(
                content=prompt,
            )
        ]
    ).content
    assert isinstance(out, str)
    return out.strip()
coderabbitai bot (Contributor) commented:

Function _prompt_refine looks good, but add tests.

The function is correctly defined and the logic for refining the answer is clear and concise. However, it is not covered by tests.

Do you want me to generate the unit testing code or open a GitHub issue to track this task?
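
The critique and refine helpers could be covered in one test using the same canned-LLM approach; a sketch, with the same caveats about import paths and `FakeListChatModel` being assumptions:

```python
# Hypothetical pytest sketch covering _prompt_critique and _prompt_refine
# with canned LLM responses; import paths are assumptions from this PR.
from langchain_community.chat_models.fake import FakeListChatModel

from agential.cog.self_refine.functional import _prompt_critique, _prompt_refine


def test_prompt_critique_and_refine() -> None:
    """Both helpers should return the stripped canned response, in order."""
    llm = FakeListChatModel(responses=["# looks good", "answer = 4"])
    critique = _prompt_critique(
        llm=llm,
        question="What is 2 + 2?",
        examples="",
        answer="answer = 5",
        prompt="{examples}\n{question}\n{answer}",
    )
    assert critique == "# looks good"
    refined = _prompt_refine(
        llm=llm,
        question="What is 2 + 2?",
        examples="",
        answer="answer = 5",
        critique=critique,
        prompt="{examples}\n{question}\n{answer}\n{critique}",
    )
    assert refined == "answer = 4"
```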

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 5

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 7918de5 and 50320c2.

Files selected for processing (3)
  • agential/cog/self_refine/agent.py (1 hunks)
  • agential/cog/self_refine/factory.py (1 hunks)
  • notebooks/self_refine.ipynb (1 hunks)
Additional context used
GitHub Check: codecov/patch
agential/cog/self_refine/agent.py

[warning] Added lines #L7, #L9, #L11-L12, #L15, #L29, #L36, #L38-L39, #L41, #L45, #L82-L83, #L85, #L88, #L90, #L92, #L100, #L102-L103, #L106, #L115, #L117, #L119 were not covered by tests.

agential/cog/self_refine/factory.py

[warning] Added lines #L3, #L5-L8, #L15, #L17, #L29, #L61, #L76, #L88, #L91-L92, #L105-L106 were not covered by tests.

Additional comments not posted (2)
agential/cog/self_refine/agent.py (1)

117-119: Ensure consistency in state reset.

The reset method resets the agent's internal state by calling self.strategy.reset(). Ensure that all necessary state variables are correctly reset.

Tools
GitHub Check: codecov/patch

[warning] Added lines #L117, #L119 were not covered by tests.

notebooks/self_refine.ipynb (1)

9-63: Ensure the imports are necessary and used.

The notebook imports several modules and constants. Ensure that all the imported modules and constants are used in the notebook.

Comment on lines 45 to 115
def generate(
    self,
    question: str,
    examples: str,
    prompt: str,
    critique_examples: str,
    critique_prompt: str,
    refine_examples: str,
    refine_prompt: str,
    additional_keys: Dict[str, str] = {},
    critique_additional_keys: Dict[str, str] = {},
    refine_additional_keys: Dict[str, str] = {},
    max_interactions: int = 3,
    reset: bool = True,
) -> List[Dict[str, str]]:
    """Generates a refined solution for a given question through an iterative self-refinement process.

    The process includes generating initial solutions, soliciting critique, and refining the solution
    based on critique, repeated for a maximum number of attempts or until critique indicates satisfaction.

    Args:
        question (str): The question or problem to solve.
        examples (str): Precedent examples to guide initial solution generation.
        prompt (str): Instructional prompt for initial solution generation.
        critique_examples (str): Precedent examples to guide critique generation.
        critique_prompt (str): Instructional prompt for critique generation.
        refine_examples (str): Precedent examples to guide solution refinement.
        refine_prompt (str): Instructional prompt for refining the solution.
        additional_keys (Dict[str, str]): Additional keys to format the prompt. Defaults to {}.
        critique_additional_keys (Dict[str, str]): Additional keys to format the critique_prompt. Defaults to {}.
        refine_additional_keys (Dict[str, str]): Additional keys to format the refine_prompt. Defaults to {}.
        max_interactions (int): Maximum number of refinement iterations.
        reset (bool): Resets the agent's state. Defaults to True.

    Returns:
        List[Dict[str, str]]: One output dictionary per refinement step, each holding the answer and its critique.
    """
    if reset:
        self.reset()

    out = []

    # Initial answer generation.
    answer = self.strategy.generate(question, examples, prompt, additional_keys)

    for _ in range(max_interactions):
        # Generate critique.
        critique = self.strategy.generate_critique(
            question=question,
            examples=critique_examples,
            answer=answer,
            prompt=critique_prompt,
            additional_keys=critique_additional_keys,
        )

        out.append(self.strategy.create_output_dict(answer, critique))

        if self.strategy.halting_condition():
            break

        # Improve answer based on critique.
        answer = self.strategy.update_answer_based_on_critique(
            question=question,
            examples=refine_examples,
            answer=answer,
            critique=critique,
            prompt=refine_prompt,
            additional_keys=refine_additional_keys,
        )

    return out
coderabbitai bot (Contributor) commented:

Method generate looks good, but add tests.

The method is well-defined and includes a reset option. However, there are several lines not covered by tests.

Do you want me to generate the unit testing code or open a GitHub issue to track this task?

Tools
GitHub Check: codecov/patch

[warning] Added lines #L45, #L82-L83, #L85, #L88, #L90, #L92, #L100, #L102-L103, #L106, #L115 were not covered by tests.
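
One way to test `generate` without any LLM calls is to swap in a stub strategy, which keeps the loop logic itself under test. A sketch; the constructor signature (llm, benchmark) follows the review notes below, and replacing the agent's `.strategy` attribute is an assumption:

```python
# Hypothetical pytest sketch for SelfRefineAgent.generate; the constructor
# signature is taken from this PR's review, and injecting a stub strategy
# via agent.strategy is an assumption about the implementation.
from typing import Any, Dict

from langchain_community.chat_models.fake import FakeListChatModel

from agential.cog.self_refine.agent import SelfRefineAgent


class StubStrategy:
    """Canned strategy: one answer, one critique, halt immediately."""

    def generate(self, *args: Any, **kwargs: Any) -> str:
        return "answer = 4"

    def generate_critique(self, **kwargs: Any) -> str:
        return "# looks good"

    def create_output_dict(self, answer: str, critique: str) -> Dict[str, str]:
        return {"answer": answer, "critique": critique}

    def halting_condition(self) -> bool:
        return True  # stop after the first critique

    def reset(self) -> None:
        pass


def test_generate_halts_after_first_critique() -> None:
    agent = SelfRefineAgent(llm=FakeListChatModel(responses=["unused"]), benchmark="gsm8k")
    agent.strategy = StubStrategy()  # bypass the LLM-backed strategy
    out = agent.generate(
        question="What is 2 + 2?",
        examples="",
        prompt="",
        critique_examples="",
        critique_prompt="",
        refine_examples="",
        refine_prompt="",
    )
    assert out == [{"answer": "answer = 4", "critique": "# looks good"}]
```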

Comment on lines 91 to 119
@staticmethod
def get_fewshots(
    benchmark: str, fewshot_type: str, **kwargs: Any
) -> Dict[str, str]:
    """Retrieve few-shot examples based on the benchmark.

    Args:
        benchmark (str): The benchmark name.
        fewshot_type (str): The benchmark few-shot type.
        **kwargs (Any): Additional arguments.

    Returns:
        Dict[str, str]: A dictionary of few-shot examples.
    """
    if benchmark not in SELF_REFINE_FEWSHOTS:
        raise ValueError(
            f"Benchmark '{benchmark}' few-shots not found for Self-Refine."
        )

    if fewshot_type not in SELF_REFINE_BENCHMARK_FEWSHOTS[benchmark]:
        raise ValueError(
            f"Benchmark '{benchmark}' few-shot type not supported for Self-Refine."
        )

    benchmark_fewshots = BENCHMARK_FEWSHOTS[benchmark]

    return {
        "examples": benchmark_fewshots,
        **SELF_REFINE_FEWSHOTS[benchmark],
    }

coderabbitai bot (Contributor) commented:

Method get_fewshots looks good, but add tests.

The method retrieves few-shot examples based on the benchmark and fewshot type. Ensure that test coverage is added.

Do you want me to generate the unit testing code or open a GitHub issue to track this task?

Tools
GitHub Check: codecov/patch

[warning] Added lines #L91-L92, #L105-L106 were not covered by tests.
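
The error paths of `get_fewshots` are easy to pin down in tests. A sketch, assuming pytest and the import path from this PR, with `"gsm8k"` as a valid benchmark key:

```python
# Hypothetical pytest sketch for SelfRefineFactory.get_fewshots; error
# messages follow the code above, the import path is an assumption.
import pytest

from agential.cog.self_refine.factory import SelfRefineFactory


def test_get_fewshots_unknown_benchmark() -> None:
    with pytest.raises(ValueError, match="few-shots not found"):
        SelfRefineFactory.get_fewshots(benchmark="nonexistent", fewshot_type="pot")


def test_get_fewshots_unknown_type() -> None:
    with pytest.raises(ValueError, match="few-shot type not supported"):
        SelfRefineFactory.get_fewshots(benchmark="gsm8k", fewshot_type="bogus")
```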

Comment on lines 120 to 134
@staticmethod
def get_prompts(benchmark: str, **kwargs: Any) -> Dict[str, str]:
    """Retrieve the prompt instruction based on the benchmark.

    Args:
        benchmark (str): The benchmark name.
        **kwargs (Any): Additional arguments.

    Returns:
        Dict[str, str]: A dictionary of prompt instructions.
    """
    if benchmark not in SELF_REFINE_PROMPTS:
        raise ValueError(f"Benchmark '{benchmark}' prompt not found for Self-Refine.")

    return SELF_REFINE_PROMPTS[benchmark]
coderabbitai bot (Contributor) commented:

Method get_prompts looks good, but add tests.

The method retrieves prompt instructions based on the benchmark. Ensure that test coverage is added.

Do you want me to generate the unit testing code or open a GitHub issue to track this task?

Comment on lines 136 to 155
@staticmethod
def get_strategy(benchmark: str, **kwargs: Any) -> SelfRefineBaseStrategy:
    """Returns an instance of the appropriate Self-Refine strategy based on the provided benchmark.

    Args:
        benchmark (str): The benchmark name.
        **kwargs (Any): Additional keyword arguments to pass to
            the strategy's constructor.

    Returns:
        SelfRefineBaseStrategy: An instance of the appropriate Self-Refine strategy.
    """
    if benchmark not in SELF_REFINE_STRATEGIES:
        raise ValueError(f"Unsupported benchmark: {benchmark} for agent Self-Refine")

    strategy = SELF_REFINE_STRATEGIES[benchmark]
    if strategy is None:
        raise ValueError(f"No strategy defined for benchmark: {benchmark}")

    return strategy(**kwargs)
coderabbitai bot (Contributor) commented:

Method get_strategy looks good, but add tests.

The method returns an instance of the appropriate Self-Refine strategy based on the provided benchmark. Ensure that test coverage is added.

Do you want me to generate the unit testing code or open a GitHub issue to track this task?
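
Similarly for `get_strategy`, a sketch of the two interesting cases; the `SelfRefineGSM8KStrategy` class name appears earlier in this review, while the `llm` constructor keyword is an assumption:

```python
# Hypothetical pytest sketch for SelfRefineFactory.get_strategy; assumes
# a "gsm8k" entry exists and the strategy constructor accepts an llm kwarg.
import pytest
from langchain_community.chat_models.fake import FakeListChatModel

from agential.cog.self_refine.factory import SelfRefineFactory
from agential.cog.self_refine.strategies.math import SelfRefineGSM8KStrategy


def test_get_strategy_returns_gsm8k_strategy() -> None:
    llm = FakeListChatModel(responses=["ok"])
    strategy = SelfRefineFactory.get_strategy(benchmark="gsm8k", llm=llm)
    assert isinstance(strategy, SelfRefineGSM8KStrategy)


def test_get_strategy_unknown_benchmark() -> None:
    with pytest.raises(ValueError, match="Unsupported benchmark"):
        SelfRefineFactory.get_strategy(benchmark="nonexistent")
```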

Comment on lines 145 to 170
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"question = \"Janet's ducks lay 16 eggs per day. She eats three for breakfast every morning and bakes muffins for her friends every day with 4933828. She sells the remainder at the farmers' market daily for $2 per fresh duck egg. How much in dollars does she make every day at the farmers' market?\"\n",
"\n",
"agent = SelfRefineAgent(\n",
" llm=llm,\n",
" mode={\"math\": \"gsm8k\"},\n",
" patience=2\n",
")\n",
"\n",
"out = agent.generate(\n",
" question=question,\n",
" examples=GSM8K_FEWSHOT_EXAMPLES_POT,\n",
" prompt=SELF_REFINE_INSTRUCTION_GSM8K,\n",
" critique_examples=GSM8K_CRITIQUE_FEWSHOT_EXAMPLES,\n",
" critique_prompt=SELF_REFINE_CRITIQUE_INSTRUCTION_GSM8K,\n",
" refine_examples=GSM8K_REFINE_FEWSHOT_EXAMPLES,\n",
" refine_prompt=SELF_REFINE_REFINE_INSTRUCTION_GSM8K,\n",
" additional_keys={},\n",
" critique_additional_keys={},\n",
" refine_additional_keys={},\n",
" max_interactions=3,\n",
" reset=True\n",
")"
coderabbitai bot (Contributor) commented:

Add error handling for generate method call.

The generate method is called without any error handling. Consider adding error handling to ensure robustness.

try:
    out = agent.generate(
        question=question,
        examples=GSM8K_FEWSHOT_EXAMPLES_POT,
        prompt=SELF_REFINE_INSTRUCTION_GSM8K,
        critique_examples=GSM8K_CRITIQUE_FEWSHOT_EXAMPLES,
        critique_prompt=SELF_REFINE_CRITIQUE_INSTRUCTION_GSM8K,
        refine_examples=GSM8K_REFINE_FEWSHOT_EXAMPLES,
        refine_prompt=SELF_REFINE_REFINE_INSTRUCTION_GSM8K,
        additional_keys={},
        critique_additional_keys={},
        refine_additional_keys={},
        max_interactions=3,
        reset=True
    )
except Exception as e:
    print(f"Error generating output: {e}")

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

Outside diff range, codebase verification and nitpick comments (2)
notebooks/self_refine.ipynb (2)

9-11: Organize imports and ensure no unused imports.

The current import block is extensive and includes multiple modules. Ensure all imported modules are used within the notebook.

- from agential.cog.fewshots.hotpotqa import (
-     HOTPOTQA_FEWSHOT_EXAMPLES_COT,
-     HOTPOTQA_FEWSHOT_EXAMPLES_DIRECT,
-     HOTPOTQA_FEWSHOT_EXAMPLES_REACT,
- )
- from agential.cog.fewshots.fever import (
-     FEVER_FEWSHOT_EXAMPLES_COT,
-     FEVER_FEWSHOT_EXAMPLES_DIRECT,
-     FEVER_FEWSHOT_EXAMPLES_REACT,
- )
- from agential.cog.fewshots.triviaqa import (
-     TRIVIAQA_FEWSHOT_EXAMPLES_COT,
-     TRIVIAQA_FEWSHOT_EXAMPLES_DIRECT,
-     TRIVIAQA_FEWSHOT_EXAMPLES_REACT,
- )
- from agential.cog.fewshots.ambignq import (
-     AMBIGNQ_FEWSHOT_EXAMPLES_COT,
-     AMBIGNQ_FEWSHOT_EXAMPLES_DIRECT,
-     AMBIGNQ_FEWSHOT_EXAMPLES_REACT,
- )
- from agential.cog.fewshots.gsm8k import (
-     GSM8K_FEWSHOT_EXAMPLES_POT,
- )
- from agential.cog.fewshots.svamp import (
-     SVAMP_FEWSHOT_EXAMPLES_POT,
- )
- from agential.cog.fewshots.tabmwp import (
-     TABMWP_FEWSHOT_EXAMPLES_POT,
- )
- from agential.cog.fewshots.humaneval import (
-     HUMANEVAL_FEWSHOT_EXAMPLES_POT,
- )
- from agential.cog.fewshots.mbpp import (
-     MBPP_FEWSHOT_EXAMPLES_POT,
-     MBPP_FEWSHOT_EXAMPLES_REACT
- )
+ from agential.cog.fewshots.hotpotqa import *
+ from agential.cog.fewshots.fever import *
+ from agential.cog.fewshots.triviaqa import *
+ from agential.cog.fewshots.ambignq import *
+ from agential.cog.fewshots.gsm8k import *
+ from agential.cog.fewshots.svamp import *
+ from agential.cog.fewshots.tabmwp import *
+ from agential.cog.fewshots.humaneval import *
+ from agential.cog.fewshots.mbpp import *

Also applies to: 13-47, 49-61


244-263: Discrepancy in Initialization Parameters for SelfRefineAgent

The SelfRefineAgent class in agential/cog/self_refine/agent.py does not have a patience parameter in its __init__ method. Instead, it uses **strategy_kwargs for additional arguments. Please review and update the initialization code accordingly.

  • SelfRefineAgent class __init__ method parameters:
    • llm: BaseChatModel
    • benchmark: str
    • **strategy_kwargs: Dict[str, Any]
Analysis chain

Ensure the agent initialization and output generation are correct.

The initialization of the SelfRefineAgent and the generation of outputs look correct. However, verify the correctness of the parameters passed.

Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the initialization parameters for SelfRefineAgent are correct.

# Test: Search for the class definition and ensure the parameters match.
ast-grep --lang python --pattern $'class SelfRefineAgent {
  $$$
  def __init__(self, llm, benchmark, patience):
    $$$
}'

Length of output: 634


Script:

#!/bin/bash
# Description: Verify the initialization parameters for SelfRefineAgent are correct.

# Test: Search for the class definition and ensure the parameters match.
ast-grep --lang python --pattern 'class SelfRefineAgent:
  def __init__(self, llm, benchmark, patience):
    pass'

Length of output: 116


Script:

#!/bin/bash
# Description: Verify the initialization parameters for SelfRefineAgent are correct.

# Test: Search for the class definition and ensure the parameters match.
rg 'class SelfRefineAgent' -A 10

Length of output: 1071


Script:

#!/bin/bash
# Description: Verify the initialization parameters for SelfRefineAgent are correct.

# Test: Search for the class definition and ensure the parameters match.
rg 'class SelfRefineAgent' -A 20

Length of output: 1676

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 50320c2 and 2334c89.

Files selected for processing (14)
  • agential/cog/self_refine/__init__.py (1 hunks)
  • agential/cog/self_refine/agent.py (1 hunks)
  • agential/cog/self_refine/factory.py (1 hunks)
  • agential/cog/self_refine/functional.py (1 hunks)
  • agential/cog/self_refine/output.py (1 hunks)
  • agential/cog/self_refine/prompts.py (1 hunks)
  • agential/cog/self_refine/strategies/base.py (1 hunks)
  • agential/cog/self_refine/strategies/math.py (1 hunks)
  • notebooks/self_refine.ipynb (1 hunks)
  • tests/cog/self_refine/__init__.py (1 hunks)
  • tests/cog/self_refine/strategies/__init__.py (1 hunks)
  • tests/cog/self_refine/strategies/test_math.py (1 hunks)
  • tests/cog/self_refine/test_agent.py (1 hunks)
  • tests/cog/self_refine/test_functional.py (1 hunks)
Files not summarized due to errors (1)
  • notebooks/self_refine.ipynb: Error: Message exceeds token limit
Files skipped from review due to trivial changes (3)
  • agential/cog/self_refine/output.py
  • tests/cog/self_refine/__init__.py
  • tests/cog/self_refine/strategies/__init__.py
Files skipped from review as they are similar to previous changes (6)
  • agential/cog/self_refine/agent.py
  • agential/cog/self_refine/factory.py
  • agential/cog/self_refine/functional.py
  • agential/cog/self_refine/prompts.py
  • agential/cog/self_refine/strategies/base.py
  • agential/cog/self_refine/strategies/math.py
Additional comments not posted (21)
agential/cog/self_refine/__init__.py (1)

1-1: Module docstring added.

The addition of the module docstring provides a brief description of the module, which is a good practice for code documentation.

tests/cog/self_refine/test_agent.py (6)

18-23: Initialization test for SelfRefineAgent is correct.

The function correctly tests the initialization of the SelfRefineAgent with the expected parameters.


26-31: Reset functionality test for SelfRefineAgent is correct.

The function correctly tests if the reset method reverts the agent to its initial state.


34-82: Generate method test for SelfRefineAgent is comprehensive.

The function provides comprehensive test cases for the generate method, ensuring it produces the expected output.


Line range hint 45-87: Generate critique method test for SelfRefineMathStrategy is comprehensive.

The function provides comprehensive test cases for the generate_critique method, ensuring it produces the expected output.


Line range hint 90-96: Create output dict method test for SelfRefineMathStrategy is correct.

The function correctly tests if the create_output_dict method produces the expected output dictionary.


Line range hint 99-116: Update answer based on critique method test for SelfRefineMathStrategy is comprehensive.

The function provides comprehensive test cases for the update_answer_based_on_critique method, ensuring it updates the answer as expected.

tests/cog/self_refine/strategies/test_math.py (7)

19-27: Initialization test for SelfRefineMathStrategy is correct.

The function correctly tests the initialization of the SelfRefineMathStrategy with the expected parameters.


30-42: Generate method test for SelfRefineMathStrategy is comprehensive.

The function provides comprehensive test cases for the generate method, ensuring it produces the expected output.


45-87: Generate critique method test for SelfRefineMathStrategy is comprehensive.

The function provides comprehensive test cases for the generate_critique method, ensuring it produces the expected output.


90-96: Create output dict method test for SelfRefineMathStrategy is correct.

The function correctly tests if the create_output_dict method produces the expected output dictionary.


99-116: Update answer based on critique method test for SelfRefineMathStrategy is comprehensive.

The function provides comprehensive test cases for the update_answer_based_on_critique method, ensuring it updates the answer as expected.


119-129: Halting condition method test for SelfRefineMathStrategy is correct.

The function correctly tests if the halting_condition method works as expected.


132-143: Reset method test for SelfRefineMathStrategy is correct.

The function correctly tests if the reset method reverts the strategy to its initial state.

tests/cog/self_refine/test_functional.py (7)

19-27: Initialization test for SelfRefineMathStrategy is correct.

The function correctly tests the initialization of the SelfRefineMathStrategy with the expected parameters.


30-42: Generate method test for SelfRefineMathStrategy is comprehensive.

The function provides comprehensive test cases for the generate method, ensuring it produces the expected output.


45-87: Generate critique method test for SelfRefineMathStrategy is comprehensive.

The function provides comprehensive test cases for the generate_critique method, ensuring it produces the expected output.


90-96: Create output dict method test for SelfRefineMathStrategy is correct.

The function correctly tests if the create_output_dict method produces the expected output dictionary.


99-116: Update answer based on critique method test for SelfRefineMathStrategy is comprehensive.

The function provides comprehensive test cases for the update_answer_based_on_critique method, ensuring it updates the answer as expected.


119-129: Halting condition method test for SelfRefineMathStrategy is correct.

The function correctly tests if the halting_condition method works as expected.


132-143: Reset method test for SelfRefineMathStrategy is correct.

The function correctly tests if the reset method reverts the strategy to its initial state.

Comment on lines 145 to 199
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<PROMPT AGENT====================================================>\n",
"Question: Jason had 20 lollipops. He gave Denny some lollipops. Now Jason has 12 lollipops. How many lollipops did Jason give to Denny?\n",
"# Python code, return answer\n",
"jason_lollipops_initial = 20\n",
"jason_lollipops_after = 12\n",
"denny_lollipops = jason_lollipops_initial - jason_lollipops_after\n",
"answer = denny_lollipops\n",
"\n",
"---\n",
"\n",
"Question: There are 15 trees in the grove. Grove workers will plant trees in the grove today. After they are done, there will be 21 trees. How many trees did the grove workers plant today?\n",
"# Python code, return answer\n",
"trees_initial = 15\n",
"trees_after = 21\n",
"trees_added = trees_after - trees_initial\n",
"answer = trees_added\n",
"\n",
"---\n",
"\n",
"Question: Shawn has five toys. For Christmas, he got two toys each from his mom and dad. How many toys does he have now?\n",
"# Python code, return answer\n",
"toys_initial = 5\n",
"mom_toys = 2\n",
"dad_toys = 2\n",
"total_received = mom_toys + dad_toys\n",
"total_toys = toys_initial + total_received\n",
"answer = total_toys\n",
"\n",
"---\n",
"\n",
"Question: There were nine computers in the server room. Five more computers were installed each day, from monday to thursday. How many computers are now in the server room?\n",
"# Python code, return answer\n",
"computers_initial = 9\n",
"computers_per_day = 5\n",
"num_days = 4 # 4 days between monday and thursday\n",
"computers_added = computers_per_day * num_days\n",
"computers_total = computers_initial + computers_added\n",
"answer = computers_total\n",
"\n",
"---\n",
"\n",
"Question: Michael had 58 golf balls. On tuesday, he lost 23 golf balls. On wednesday, he lost 2 more. How many golf balls did he have at the end of wednesday?\n",
"# Python code, return answer\n",
"golf_balls_initial = 58\n",
"golf_balls_lost_tuesday = 23\n",
"golf_balls_lost_wednesday = 2\n",
"golf_balls_left = golf_balls_initial - golf_balls_lost_tuesday - golf_balls_lost_wednesday\n",
"answer = golf_balls_left\n",
coderabbitai bot (Contributor) commented:

Correct the logic errors in the example solutions.

The solutions provided for some examples contain logical errors. Ensure the code aligns with the problem statements.

-  question: Janet's ducks lay 16 eggs per day. She eats three for breakfast every morning and bakes muffins for her friends every day with 4933828. She sells the remainder at the farmers' market daily for $2 per fresh duck egg. How much in dollars does she make every day at the farmers' market?
-  # Python code, return answer
-  eggs_laid_per_day = 16
-  eggs_for_breakfast = 3
-  eggs_for_muffins = 4933828
-  eggs_remaining = eggs_laid_per_day - eggs_for_breakfast - eggs_for_muffins
-  money_per_egg = 2
-  money_made_per_day = eggs_remaining * money_per_egg
-  answer = money_made_per_day
+  # Corrected code
+  eggs_laid_per_day = 16
+  eggs_for_breakfast = 3
+  eggs_for_muffins = 0  # Correct value for muffins
+  eggs_remaining = eggs_laid_per_day - eggs_for_breakfast - eggs_for_muffins
+  money_per_egg = 2
+  money_made_per_day = eggs_remaining * money_per_egg
+  answer = money_made_per_day
Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<PROMPT AGENT====================================================>\n",
"Question: Jason had 20 lollipops. He gave Denny some lollipops. Now Jason has 12 lollipops. How many lollipops did Jason give to Denny?\n",
"# Python code, return answer\n",
"jason_lollipops_initial = 20\n",
"jason_lollipops_after = 12\n",
"denny_lollipops = jason_lollipops_initial - jason_lollipops_after\n",
"answer = denny_lollipops\n",
"\n",
"---\n",
"\n",
"Question: There are 15 trees in the grove. Grove workers will plant trees in the grove today. After they are done, there will be 21 trees. How many trees did the grove workers plant today?\n",
"# Python code, return answer\n",
"trees_initial = 15\n",
"trees_after = 21\n",
"trees_added = trees_after - trees_initial\n",
"answer = trees_added\n",
"\n",
"---\n",
"\n",
"Question: Shawn has five toys. For Christmas, he got two toys each from his mom and dad. How many toys does he have now?\n",
"# Python code, return answer\n",
"toys_initial = 5\n",
"mom_toys = 2\n",
"dad_toys = 2\n",
"total_received = mom_toys + dad_toys\n",
"total_toys = toys_initial + total_received\n",
"answer = total_toys\n",
"\n",
"---\n",
"\n",
"Question: There were nine computers in the server room. Five more computers were installed each day, from monday to thursday. How many computers are now in the server room?\n",
"# Python code, return answer\n",
"computers_initial = 9\n",
"computers_per_day = 5\n",
"num_days = 4 # 4 days between monday and thursday\n",
"computers_added = computers_per_day * num_days\n",
"computers_total = computers_initial + computers_added\n",
"answer = computers_total\n",
"\n",
"---\n",
"\n",
"Question: Michael had 58 golf balls. On tuesday, he lost 23 golf balls. On wednesday, he lost 2 more. How many golf balls did he have at the end of wednesday?\n",
"# Python code, return answer\n",
"golf_balls_initial = 58\n",
"golf_balls_lost_tuesday = 23\n",
"golf_balls_lost_wednesday = 2\n",
"golf_balls_left = golf_balls_initial - golf_balls_lost_tuesday - golf_balls_lost_wednesday\n",
"answer = golf_balls_left\n",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<PROMPT AGENT====================================================>\n",
"Question: Jason had 20 lollipops. He gave Denny some lollipops. Now Jason has 12 lollipops. How many lollipops did Jason give to Denny?\n",
"# Python code, return answer\n",
"jason_lollipops_initial = 20\n",
"jason_lollipops_after = 12\n",
"denny_lollipops = jason_lollipops_initial - jason_lollipops_after\n",
"answer = denny_lollipops\n",
"\n",
"---\n",
"\n",
"Question: There are 15 trees in the grove. Grove workers will plant trees in the grove today. After they are done, there will be 21 trees. How many trees did the grove workers plant today?\n",
"# Python code, return answer\n",
"trees_initial = 15\n",
"trees_after = 21\n",
"trees_added = trees_after - trees_initial\n",
"answer = trees_added\n",
"\n",
"---\n",
"\n",
"Question: Shawn has five toys. For Christmas, he got two toys each from his mom and dad. How many toys does he have now?\n",
"# Python code, return answer\n",
"toys_initial = 5\n",
"mom_toys = 2\n",
"dad_toys = 2\n",
"total_received = mom_toys + dad_toys\n",
"total_toys = toys_initial + total_received\n",
"answer = total_toys\n",
"\n",
"---\n",
"\n",
"Question: There were nine computers in the server room. Five more computers were installed each day, from monday to thursday. How many computers are now in the server room?\n",
"# Python code, return answer\n",
"computers_initial = 9\n",
"computers_per_day = 5\n",
"num_days = 4 # 4 days between monday and thursday\n",
"computers_added = computers_per_day * num_days\n",
"computers_total = computers_initial + computers_added\n",
"answer = computers_total\n",
"\n",
"---\n",
"\n",
"Question: Michael had 58 golf balls. On tuesday, he lost 23 golf balls. On wednesday, he lost 2 more. How many golf balls did he have at the end of wednesday?\n",
"# Python code, return answer\n",
"golf_balls_initial = 58\n",
"golf_balls_lost_tuesday = 23\n",
"golf_balls_lost_wednesday = 2\n",
"golf_balls_left = golf_balls_initial - golf_balls_lost_tuesday - golf_balls_lost_wednesday\n",
"answer = golf_balls_left\n",
"\n",
"---\n",
"\n",
"Question: Janet's ducks lay 16 eggs per day. She eats three for breakfast every morning and bakes muffins for her friends every day with 4933828. She sells the remainder at the farmers' market daily for $2 per fresh duck egg. How much in dollars does she make every day at the farmers' market?\n",
"# Python code, return answer\n",
"eggs_laid_per_day = 16\n",
"eggs_for_breakfast = 3\n",
"eggs_for_muffins = 0 # Correct value for muffins\n",
"eggs_remaining = eggs_laid_per_day - eggs_for_breakfast - eggs_for_muffins\n",
"money_per_egg = 2\n",
"money_made_per_day = eggs_remaining * money_per_egg\n",
"answer = money_made_per_day\n",

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 2334c89 and ef89e2a.

Files selected for processing (1)
  • agential/cog/self_refine/prompts.py (1 hunks)
Files skipped from review as they are similar to previous changes (1)
  • agential/cog/self_refine/prompts.py

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between ef89e2a and aeb9975.

Files selected for processing (1)
  • agential/cog/self_refine/prompts.py (1 hunks)
Files skipped from review as they are similar to previous changes (1)
  • agential/cog/self_refine/prompts.py

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between aeb9975 and 070e43b.

Files selected for processing (1)
  • agential/cog/self_refine/prompts.py (1 hunks)
Files skipped from review as they are similar to previous changes (1)
  • agential/cog/self_refine/prompts.py

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 070e43b and 419e2e3.

Files selected for processing (10)
  • agential/cog/self_refine/agent.py (1 hunks)
  • agential/cog/self_refine/factory.py (1 hunks)
  • agential/cog/self_refine/functional.py (1 hunks)
  • agential/cog/self_refine/output.py (1 hunks)
  • agential/cog/self_refine/prompts.py (1 hunks)
  • agential/cog/self_refine/strategies/math.py (1 hunks)
  • notebooks/self_refine.ipynb (1 hunks)
  • tests/cog/self_refine/strategies/test_math.py (1 hunks)
  • tests/cog/self_refine/test_agent.py (1 hunks)
  • tests/cog/self_refine/test_functional.py (1 hunks)
Files skipped from review as they are similar to previous changes (10)
  • agential/cog/self_refine/agent.py
  • agential/cog/self_refine/factory.py
  • agential/cog/self_refine/functional.py
  • agential/cog/self_refine/output.py
  • agential/cog/self_refine/prompts.py
  • agential/cog/self_refine/strategies/math.py
  • notebooks/self_refine.ipynb
  • tests/cog/self_refine/strategies/test_math.py
  • tests/cog/self_refine/test_agent.py
  • tests/cog/self_refine/test_functional.py

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 419e2e3 and ea54d38.

Files selected for processing (3)
  • agential/cog/self_refine/factory.py (1 hunks)
  • agential/cog/self_refine/prompts.py (1 hunks)
  • tests/cog/self_refine/test_agent.py (1 hunks)
Files skipped from review as they are similar to previous changes (3)
  • agential/cog/self_refine/factory.py
  • agential/cog/self_refine/prompts.py
  • tests/cog/self_refine/test_agent.py

@alckasoc merged commit be7526c into main Jul 12, 2024
2 checks passed
@alckasoc deleted the self_refine branch Jul 12, 2024 07:50