
Standardizing Critique Prompts for CRITIC Benchmarks #187

Merged · 61 commits · Jul 2, 2024

Conversation

@tedasdf (Collaborator) commented Jun 23, 2024

Issue

Topic: Standardizing Critique Prompts for CRITIC Benchmarks

For CRITIC, we need to craft critique prompts distinct from few-shot examples. This distinction is essential for each benchmark according to the criteria outlined in Figure 1:

  • The number of critique few-shot examples should equal the number of few-shot examples for each benchmark.
  • Each critique few-shot example within a benchmark must use the same question as its corresponding few-shot example.

The presence of a green checkmark in the "Matches Figure 1" column indicates both criteria are satisfied. The "Issues" column highlights discrepancies between the two sets of benchmark few-shot examples.
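Both Figure 1 criteria can be checked mechanically. A minimal sketch, assuming the prompts are plain strings with one `Question:` (or `Q:`) line per example; the actual prompt layout in `agential/cog/prompts` may differ:

```python
import re

def extract_questions(fewshot_text: str) -> list[str]:
    """Pull the Question:/Q: lines out of a few-shot prompt string.

    The line-based layout is an assumption; adjust the pattern to the
    repo's actual prompt format.
    """
    return re.findall(r"^(?:Question|Q):\s*(.+)$", fewshot_text, re.MULTILINE)

def matches_figure_1(fewshot: str, critique: str) -> bool:
    """Both criteria at once: equal counts and identical questions, in order."""
    return extract_questions(fewshot) == extract_questions(critique)
```

List equality covers both criteria, since identical question lists imply equal example counts.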

Approach to Generating Examples

I generate examples with the assistance of ChatGPT:

  • I first consult ChatGPT to brainstorm initial thoughts on the question.
  • Using LangChain or similar tools, I perform Wikipedia searches based on the benchmark.
  • Based on the results from these tools, I formulate the observation statement.
  • For mathematical benchmarks, I verify the calculations using a calculator.
  • For query benchmarks, I validate the queries using the wikiquery function.
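The steps above can be sketched as a small pipeline. Everything here is illustrative: `search_tool` stands in for a LangChain Wikipedia wrapper or similar, and the observation format is an assumption, not the repo's actual API:

```python
def generate_observation(question: str, search_tool) -> str:
    """Format a tool search result as the observation line of a critique example."""
    evidence = search_tool(question)
    return f"Observation: {evidence}"

def fake_search(query: str) -> str:
    """Stub standing in for a real Wikipedia search tool."""
    return f"Top Wikipedia result for: {query}"
```

In practice the stub would be replaced by the actual tool call, and the observation then checked by hand (or with a calculator for math benchmarks) before being written into the prompt.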

Proposed Changes

To address these discrepancies effectively:

For benchmarks where num few-shots > num critique few-shots:

  • Increase the number of critique examples.
  • Ensure all critique examples match the few-shot examples in terms of ordering and question usage.

For benchmarks where num few-shots == num critique few-shots AND there is no checkmark in the "Matches Figure 1" column (Table 1):

  • Replace existing critique examples with new ones.
  • Maintain consistent ordering with few-shot examples.

For benchmarks where num few-shots < num critique few-shots:

  • Reduce the number of critique examples.
  • Ensure all examples maintain consistent question usage and ordering.
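All three cases above reduce to one alignment rule: emit exactly one critique example per few-shot question, in the same order, reusing an existing critique when its question matches and writing a new one otherwise. A hedged sketch, where the data shapes (a question list and a question-to-critique mapping) are assumptions:

```python
def align_critique_examples(fewshot_qs, critique_by_q, make_critique):
    """Return one critique example per few-shot question, in few-shot order.

    critique_by_q maps question -> existing critique example; make_critique
    writes a new one when a question has no critique yet. Critique examples
    whose questions are not in the few-shot set are dropped.
    """
    return [critique_by_q.get(q) or make_critique(q) for q in fewshot_qs]
```

Too few critiques are padded with new ones, mismatched ones are replaced, and extras are dropped, covering all three cases in one pass.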

This PR aims to standardize critique prompts to align with the requirements outlined in Figure 1, ensuring coherence and accuracy in the CRITIC benchmarks.

Summary by CodeRabbit

  • New Features

    • Added new question-answer pairs and examples across various benchmarks, including AmbigNQ, HotpotQA, SVAMP, TABMWP, and TriviaQA.
    • Introduced new functions for specific tasks, such as checking bit positions and identifying long words in strings.
  • Bug Fixes

    • Corrected logical errors in the is_not_prime function.
  • Enhancements

    • Updated existing few-shot examples with more detailed information and refined responses.
    • Enhanced problem-solving examples with structured approaches and Python code snippets for better clarity.
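For reference, standard solutions to the MBPP tasks named in the summary look roughly like this; the exact implementations in mbpp.py may differ:

```python
import math

def is_not_prime(n: int) -> bool:
    """True when n is not prime (1 and smaller count as not prime)."""
    if n < 2:
        return True
    for i in range(2, math.isqrt(n) + 1):
        if n % i == 0:
            return True
    return False

def differ_at_one_bit_pos(a: int, b: int) -> bool:
    """True when a and b differ in exactly one bit position."""
    x = a ^ b
    return x != 0 and (x & (x - 1)) == 0

def find_char_long(s: str) -> list[str]:
    """Words in s that are at least 4 characters long (MBPP's usual spec)."""
    return [w for w in s.split() if len(w) >= 4]
```

The `x & (x - 1)` test checks that the XOR of the two numbers is a power of two, i.e. that exactly one bit differs.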

@coderabbitai bot (Contributor) commented Jun 23, 2024

Important

Review skipped

Review was skipped as selected files did not have any reviewable changes.

Files selected but had no reviewable changes (1)
  • agential/cog/prompts/agent/critic.py


Walkthrough

The modifications introduce enhancements across multiple benchmarking scripts related to problem-solving, Q&A examples, and Python code snippets. These changes focus on adding new examples, refining existing responses, fixing bugs, and introducing new functionality to improve the variety and accuracy of the examples. This enriches the datasets and ensures more robust and comprehensive training and testing capabilities.

Changes

Files/Groups and Change Summaries:
  • .../benchmark/ambignq.py: Added new QA pairs, updated existing answers with detailed info, and refined overall responses.
  • .../benchmark/hotpotqa.py: Added new few-shot examples in CoT, Direct, and React formats; modified existing CoT examples.
  • .../benchmark/mbpp.py: Corrected the is_not_prime function; added the functions differ_at_one_bit_pos(a, b) and find_char_long(s).
  • .../benchmark/svamp.py: Added new few-shot examples with Python code snippets for math problems; expanded existing examples.
  • .../benchmark/tabmwp.py: Introduced new few-shot examples with tables, included questions requiring Python code solutions, and replaced previous examples.
  • .../benchmark/triviaqa.py: Added new few-shot examples in CoT, Direct, and React styles, with questions on historical events and informative answers.

Poem

In bytes and whispers coding grows,
New questions bloom where logic flows,
From Hotpot's thoughts to Trivia's lore,
Each line refined to learn much more.
Bugs squashed, new functions rise,
Solutions gleam before our eyes.
🐰✨🌟



@codecov bot commented Jun 23, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Files Coverage Δ
agential/cog/agent/reflexion.py 90.24% <ø> (ø)
agential/cog/prompts/agent/critic.py 100.00% <ø> (ø)

@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between a6b6d41 and 3af88b3.

Files selected for processing (1)
  • agential/cog/prompts/agent/critic.py (1 hunks)
Additional comments not posted (1)
agential/cog/prompts/agent/critic.py (1)

113-137: Verify the accuracy and plausibility of the new constant's content.

The new constant AMBIGNQ_FEWSHOT_EXAMPLES_CRITIC_TESTING contains a detailed critique of a proposed answer about men's figure skating in the Olympics. It's crucial to ensure that the historical facts presented are accurate and that the logic of the critique aligns with the intended educational or informational purpose of the content.

Verification successful

Historical Accuracy of AMBIGNQ_FEWSHOT_EXAMPLES_CRITIC_TESTING Verified

The historical claims in the constant AMBIGNQ_FEWSHOT_EXAMPLES_CRITIC_TESTING have been verified:

  • Claim 1: Men's figure skating has been part of the Summer Olympics in 1908 and 1920. It is no longer a Summer Olympic sport.
  • Claim 2: Men's figure skating was part of the 1908 Summer Olympics and has been part of the Winter Olympics since 1924.

The provided information in the constant is accurate and aligns with historical records.

Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify the historical accuracy of the content in the new constant.

# Test: Search for historical references online. Expect: Accurate matches with the content.
echo "Please manually verify the historical accuracy of the content in AMBIGNQ_FEWSHOT_EXAMPLES_CRITIC_TESTING by consulting reliable historical sources or databases."

Length of output: 330



Script:

#!/bin/bash
# Description: Verify historical accuracy of figure skating in the Olympics.

# Verify claim 1: Men's figure skating has never been a summer Olympic sport.
echo "Verifying claim 1: Men's figure skating has never been a summer Olympic sport."
curl -s "https://en.wikipedia.org/wiki/Figure_skating_at_the_Olympic_Games" | grep -i "summer"

# Verify claim 2: Men's figure skating was part of the 1908 Summer Olympics and has been part of the Winter Olympics since 1924.
echo "Verifying claim 2: Men's figure skating was part of the 1908 Summer Olympics and has been part of the Winter Olympics since 1924."
curl -s "https://en.wikipedia.org/wiki/Figure_skating_at_the_Olympic_Games" | grep -i "1908"
curl -s "https://en.wikipedia.org/wiki/Figure_skating_at_the_Olympic_Games" | grep -i "1924"

Length of output: 39825

@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 3af88b3 and 3915514.

Files selected for processing (2)
  • agential/cog/prompts/agent/critic.py (2 hunks)
  • test.ipynb (1 hunks)
Files skipped from review due to trivial changes (1)
  • test.ipynb
Files skipped from review as they are similar to previous changes (1)
  • agential/cog/prompts/agent/critic.py

@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

Outside diff range and nitpick comments (2)
agential/cog/prompts/benchmark/ambignq.py (2)

Line range hint 8-41: Ensure Consistency in Capitalization and Punctuation

The questions and answers in the AMBIGNQ_FEWSHOT_EXAMPLES_COT constant should maintain consistent capitalization and punctuation. For instance, "washington" should be capitalized as "Washington", and each entry should end with proper punctuation for clarity and professionalism.

- Q: When is the date the last time washington won stanley cup?
+ Q: When is the date the last time Washington won Stanley Cup?

Line range hint 86-158: Improve Clarity and Completeness in Reactive Examples

The AMBIGNQ_FEWSHOT_EXAMPLES_REACT constant contains a detailed step-by-step search process. However, some entries are incomplete, such as missing observations or thoughts. It's crucial to complete these sections to provide a full context and enhance the educational value of the examples.

- Thought 2:
+ Thought 2: The evidence clearly supports the 2018 victory date.
Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 3915514 and 8013d7c.

Files selected for processing (2)
  • agential/cog/prompts/agent/critic.py (2 hunks)
  • agential/cog/prompts/benchmark/ambignq.py (6 hunks)
Files skipped from review as they are similar to previous changes (1)
  • agential/cog/prompts/agent/critic.py
Additional comments not posted (1)
agential/cog/prompts/benchmark/ambignq.py (1)

Line range hint 48-81: Check for Consistency and Clarity in Direct Examples

The AMBIGNQ_FEWSHOT_EXAMPLES_DIRECT constant contains concise examples. Ensure that all examples are clear and direct, maintaining a uniform format across all entries. This will enhance readability and ease of understanding.

@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 8013d7c and 2279887.

Files selected for processing (1)
  • agential/cog/prompts/benchmark/hotpotqa.py (1 hunks)
Files not reviewed due to errors (1)
  • agential/cog/prompts/benchmark/hotpotqa.py (no review received)

@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 2279887 and d44c589.

Files selected for processing (1)
  • agential/cog/prompts/benchmark/ambignq.py (6 hunks)
Files skipped from review as they are similar to previous changes (1)
  • agential/cog/prompts/benchmark/ambignq.py

@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between d44c589 and ff91ece.

Files selected for processing (6)
  • agential/cog/prompts/agent/critic.py (6 hunks)
  • agential/cog/prompts/benchmark/gsm8k.py (1 hunks)
  • agential/cog/prompts/benchmark/mbpp.py (2 hunks)
  • agential/cog/prompts/benchmark/svamp.py (1 hunks)
  • agential/cog/prompts/benchmark/tabmwp.py (1 hunks)
  • agential/cog/prompts/benchmark/triviaqa.py (1 hunks)
Files skipped from review as they are similar to previous changes (1)
  • agential/cog/prompts/agent/critic.py
Additional comments not posted (10)
agential/cog/prompts/benchmark/triviaqa.py (3)

3-15: Well-structured detailed few-shot examples for TriviaQA.

The examples provided are comprehensive and align well with the intent to enhance the dataset with more contextual information. Great job on ensuring that each answer is detailed and informative.


17-29: Concise few-shot examples for TriviaQA.

The concise format of these examples is well-executed, providing straightforward answers to the questions. This format is useful for quick reference and aligns with the PR's goal of expanding the dataset.


31-55: Interactive and engaging few-shot examples for TriviaQA.

The REACT format effectively demonstrates a thought process and actions, which can be highly beneficial for training AI models to understand and generate human-like reasoning. This addition is innovative and aligns well with the PR's objectives.

agential/cog/prompts/benchmark/tabmwp.py (1)

3-102: Comprehensive and educational few-shot examples involving tables and Python code.

These examples are well-crafted to demonstrate how to interpret table data and write Python code to solve specific problems. This approach not only aligns with the PR's objectives but also adds significant educational value to the dataset.

agential/cog/prompts/benchmark/svamp.py (1)

3-44: Clear and correctly implemented few-shot examples for solving word problems.

The Python code snippets are correctly implemented and solve the word problems as stated. This addition enhances the dataset by providing practical examples of how to apply Python programming to solve real-world problems.

agential/cog/prompts/benchmark/mbpp.py (1)

Line range hint 1-91: Well-implemented Python programming tasks with accurate solutions.

The tasks are clearly defined, and the solutions are correctly implemented, including the corrected is_not_prime function. These additions and corrections significantly enhance the dataset's educational value for Python programmers.

agential/cog/prompts/benchmark/gsm8k.py (4)

3-18: Well-structured and correct calculation for wages and taxes.

The code is clear, variables are well-named, and the logic accurately reflects the problem statement. Good job on this example!


21-36: Correct and well-documented algebraic solution.

The step-by-step comments enhance readability and understanding. The logic and calculations are accurate.


39-50: Accurate calculation of uneaten pizza pieces.

This example clearly demonstrates the use of arithmetic operations to solve the problem. Variables are descriptive and the logic is sound.


53-71: Effective use of mathematical functions for cost calculation.

This example is well-implemented with a practical application of math.ceil and decision-making to find the cheapest option. The calculations and logic are correct.
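The cheapest-option pattern that this comment describes can be sketched generically. The function and numbers below are illustrative, not the actual gsm8k.py example:

```python
import math

def cheapest_option(total_units: int, pack_size: int, pack_price: float,
                    unit_price: float) -> float:
    """Cost of the cheaper way to buy total_units: whole packs vs. singles.

    math.ceil rounds the pack count up, since partial packs cannot be bought.
    """
    packs_needed = math.ceil(total_units / pack_size)
    return min(packs_needed * pack_price, total_units * unit_price)
```

For example, buying 10 units when packs of 4 cost 3.0 and singles cost 1.0 requires 3 packs (9.0), which beats 10 singles (10.0).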

@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between ff91ece and 1b468c8.

Files selected for processing (1)
  • agential/cog/prompts/agent/critic.py (7 hunks)
Additional comments not posted (2)
agential/cog/prompts/agent/critic.py (2)

113-113: Ensure correct naming convention and usage of the constant.

The constant AMBIGNQ_FEWSHOT_EXAMPLES_CRITIC is well-named, following Python uppercase convention for constants. It appears to be used correctly in the context of providing examples for critique. Ensure it's referenced correctly wherever used in the codebase.


113-113: Review the content for accuracy and clarity.

The examples provided under AMBIGNQ_FEWSHOT_EXAMPLES_CRITIC are clear and seem logically consistent with the expected output. However, ensure that the factual information, especially historical or sports-related data, is accurate to avoid spreading misinformation.

@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 1b468c8 and 9ebddbc.

Files selected for processing (1)
  • agential/cog/prompts/agent/critic.py (7 hunks)
Files skipped from review as they are similar to previous changes (1)
  • agential/cog/prompts/agent/critic.py

@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 9ebddbc and 9b57be6.

Files selected for processing (1)
  • agential/cog/prompts/agent/critic.py (7 hunks)
Additional comments not posted (1)
agential/cog/prompts/agent/critic.py (1)

Line range hint 112-119: New constant and examples added for AMBIGNQ_FEWSHOT_EXAMPLES_CRITIC.

The newly added constant and examples for critiquing the plausibility and truthfulness of answers are well-integrated. The examples are detailed and cover different scenarios, which should enhance the model's ability to critique responses effectively.

  • Correctness: The examples provided follow a consistent format and logically address the critique points.
  • Maintainability: The structured format makes it easy to understand and maintain. Future additions or modifications can be made with minimal effort.
  • Performance: Since this is primarily static content for instructional purposes, performance concerns are minimal.

Overall, this addition should enrich the critique capabilities of the model with specific focus areas like plausibility and truthfulness.

@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 9b57be6 and 2a0b449.

Files selected for processing (1)
  • agential/cog/prompts/agent/critic.py (7 hunks)
Additional comments not posted (2)
agential/cog/prompts/agent/critic.py (2)

113-113: Ensure proper usage and documentation for the new constant AMBIGNQ_FEWSHOT_EXAMPLES_CRITIC.

The new constant AMBIGNQ_FEWSHOT_EXAMPLES_CRITIC has been added to hold examples specifically for critiquing the topic of men's figure skating as a summer Olympic sport. It's crucial to ensure that this constant is properly used throughout the application and is well-documented to inform other developers of its purpose and usage context.


113-113: Verify the accuracy and clarity of the new examples added.

The newly added examples under AMBIGNQ_FEWSHOT_EXAMPLES_CRITIC provide detailed critique scenarios. It is essential to verify that these examples are accurate, clear, and provide meaningful insights into the critique process. Additionally, check that these examples align with the intended educational or functional purpose of the module.

Verification successful

The newly added examples under AMBIGNQ_FEWSHOT_EXAMPLES_CRITIC are accurate and clear. They provide meaningful insights into the critique process by evaluating the plausibility and truthfulness of the proposed answers.

  • The example involving the question about men's figure skating as a summer Olympic sport is well-constructed and aligns with the module's educational purpose.
Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Check for consistency and clarity in the examples provided for critiquing men's figure skating as a summer Olympic sport.

# Test: Search for additional references or documentation related to the examples.
rg --type py 'men\'s figure skating as a summer Olympic sport' --context 5

Length of output: 93



Script:

#!/bin/bash
# Corrected script to search for references or documentation related to the examples.
rg --type py 'men\'s figure skating as a summer Olympic sport' --context 5

# Search for the definition of AMBIGNQ_FEWSHOT_EXAMPLES_CRITIC and display its contents.
rg 'AMBIGNQ_FEWSHOT_EXAMPLES_CRITIC' --context 10

Length of output: 93



Script:

#!/bin/bash
# Corrected script to search for references or documentation related to the examples.
rg --type py "men's figure skating as a summer Olympic sport" --context 5

# Search for the definition of AMBIGNQ_FEWSHOT_EXAMPLES_CRITIC and display its contents.
rg 'AMBIGNQ_FEWSHOT_EXAMPLES_CRITIC' --context 10

Length of output: 4086

@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 4d6708e and 7a60ddc.

Files selected for processing (2)
  • agential/cog/prompts/benchmark/mbpp.py (1 hunks)
  • agential/cog/prompts/benchmark/tabmwp.py (1 hunks)
Files skipped from review as they are similar to previous changes (2)
  • agential/cog/prompts/benchmark/mbpp.py
  • agential/cog/prompts/benchmark/tabmwp.py

@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 7a60ddc and 0b24502.

Files selected for processing (2)
  • agential/cog/prompts/benchmark/tabmwp.py (1 hunks)
  • agential/cog/prompts/benchmark/triviaqa.py (1 hunks)
Files skipped from review as they are similar to previous changes (2)
  • agential/cog/prompts/benchmark/tabmwp.py
  • agential/cog/prompts/benchmark/triviaqa.py

@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 3

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 0b24502 and 4825e75.

Files selected for processing (1)
  • agential/cog/prompts/agent/critic.py (9 hunks)
Files not summarized due to errors (1)
  • agential/cog/prompts/agent/critic.py: Error: Message exceeds token limit
Additional comments not posted (9)
agential/cog/prompts/agent/critic.py (9)

277-278: Typographical Error: Inconsistent quotation marks.

Ensure consistent use of quotation marks for the question and answer strings.

- Question: What is the elevation range for the area that the eastern sector of the Colorado orogeny extends into?
- Proposed Answer: Let's think step by step. The eastern sector of the Colorado orogeny extends into an area with an elevation range of 1,800 to 7,000 ft. So the answer is: 1,800 to 7,000 ft.
+ Question: "What is the elevation range for the area that the eastern sector of the Colorado orogeny extends into?"
+ Proposed Answer: "Let's think step by step. The eastern sector of the Colorado orogeny extends into an area with an elevation range of 1,800 to 7,000 ft. So the answer is: 1,800 to 7,000 ft."

307-308: Typographical Error: Inconsistent quotation marks.

Ensure consistent use of quotation marks for the question and answer strings.

- Question: Musician and satirist Allie Goertz wrote a song about the "The Simpsons" character Milhouse, who Matt Groening named after who?
- Proposed Answer: Let's think step by step. Allie Goertz wrote a song about Milhouse. Milhouse is a character from "The Simpsons" created by Matt Groening. Matt Groening named Milhouse after Richard Nixon. So the answer is: Richard Nixon.
+ Question: "Musician and satirist Allie Goertz wrote a song about the 'The Simpsons' character Milhouse, who Matt Groening named after who?"
+ Proposed Answer: "Let's think step by step. Allie Goertz wrote a song about Milhouse. Milhouse is a character from 'The Simpsons' created by Matt Groening. Matt Groening named Milhouse after Richard Nixon. So the answer is: Richard Nixon."

337-338: Typographical Error: Inconsistent quotation marks.

Ensure consistent use of quotation marks for the question and answer strings.

- Question: Which documentary is about Finnish rock groups, Adam Clayton Powell or The Saimaa Gesture?
- Proposed Answer: Let's think step by step. Adam Clayton Powell is a name associated with an American politician and civil rights leader. The Saimaa Gesture sounds more like a title related to Finnish culture. Therefore, The Saimaa Gesture is about Finnish rock groups. So the answer is: The Saimaa Gesture.
+ Question: "Which documentary is about Finnish rock groups, Adam Clayton Powell or The Saimaa Gesture?"
+ Proposed Answer: "Let's think step by step. Adam Clayton Powell is a name associated with an American politician and civil rights leader. The Saimaa Gesture sounds more like a title related to Finnish culture. Therefore, The Saimaa Gesture is about Finnish rock groups. So the answer is: The Saimaa Gesture."

367-368: Typographical Error: Inconsistent quotation marks.

Ensure consistent use of quotation marks for the question and answer strings.

- Question: Which documentary is about Finnish rock groups, Adam Clayton Powell or The Saimaa Gesture?
- Proposed Answer: Let's think step by step. Adam Clayton Powell is a name associated with an American politician and civil rights leader. The Saimaa Gesture sounds more like a title related to Finnish culture. Therefore, The Saimaa Gesture is about Finnish rock groups. So the answer is: The Saimaa Gesture.
+ Question: "Which documentary is about Finnish rock groups, Adam Clayton Powell or The Saimaa Gesture?"
+ Proposed Answer: "Let's think step by step. Adam Clayton Powell is a name associated with an American politician and civil rights leader. The Saimaa Gesture sounds more like a title related to Finnish culture. Therefore, The Saimaa Gesture is about Finnish rock groups. So the answer is: The Saimaa Gesture."

397-398: Typographical Error: Inconsistent quotation marks.

Ensure consistent use of quotation marks for the question and answer strings.

- Question: What profession does Nicholas Ray and Elia Kazan have in common?
- Proposed Answer: Let's think step by step. Nicholas Ray and Elia Kazan both worked as directors, screenwriters, and actors. So the answer is: director, screenwriter, actor.
+ Question: "What profession does Nicholas Ray and Elia Kazan have in common?"
+ Proposed Answer: "Let's think step by step. Nicholas Ray and Elia Kazan both worked as directors, screenwriters, and actors. So the answer is: director, screenwriter, actor."

425-426: Typographical Error: Inconsistent quotation marks.

Ensure consistent use of quotation marks for the question and answer strings.

- Question: Which magazine was started first, Arthur's Magazine or First for Women?
- Proposed Answer: Let's think step by step. Arthur's Magazine was first published in 1890. First for Women was first published in 1989. 1890 (Arthur's Magazine) < 1989 (First for Women), so First for Women was started first. So the answer is:First for Women.
+ Question: "Which magazine was started first, Arthur's Magazine or First for Women?"
+ Proposed Answer: "Let's think step by step. Arthur's Magazine was first published in 1890. First for Women was first published in 1989. 1890 (Arthur's Magazine) < 1989 (First for Women), so First for Women was started first. So the answer is: First for Women."

251-251: Typographical Error: Missing closing quotation mark.

The string should be enclosed in triple quotes for consistency.

- Claim: Nikolaj Coster-Waldau worked with the Fox Broadcasting Company.
+ Claim: Nikolaj Coster-Waldau worked with the Fox Broadcasting Company."""

Likely invalid or redundant comment.


889-889: Typographical Error: Missing closing quotation mark.

The string should be enclosed in triple quotes for consistency.

- Question: Marco and his dad went strawberry picking. Marco's dad's strawberries weighed 11 pounds. If together their strawberries weighed 30 pounds. How much did Marco's strawberries weigh?
+ Question: "Marco and his dad went strawberry picking. Marco's dad's strawberries weighed 11 pounds. If together their strawberries weighed 30 pounds. How much did Marco's strawberries weigh?"

Likely invalid or redundant comment.


950-950: Typographical Error: Missing closing quotation mark.

The string should be enclosed in triple quotes for consistency.

- Question: Each cow pasture has 10 sprinklers. How many sprinklers are in 6 cow pastures?
+ Question: "Each cow pasture has 10 sprinklers. How many sprinklers are in 6 cow pastures?"

Likely invalid or redundant comment.

@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 2

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 4825e75 and 5abbbc9.

Files selected for processing (1)
  • agential/cog/prompts/agent/critic.py (10 hunks)
Files not summarized due to errors (1)
  • agential/cog/prompts/agent/critic.py: Error: Message exceeds token limit
Additional comments not posted (1)
agential/cog/prompts/agent/critic.py (1)

277-277: Typographical Error: Inconsistent quotation marks.

The string should be enclosed in triple quotes for consistency with other examples.

- HOTPOTQA_FEWSHOT_EXAMPLES_CRITIC = """Question: What is the elevation range for the area that the eastern sector of the Colorado orogeny extends into?
+ HOTPOTQA_FEWSHOT_EXAMPLES_CRITIC = """Question: "What is the elevation range for the area that the eastern sector of the Colorado orogeny extends into?"

Likely invalid or redundant comment.

@alckasoc (Member) left a comment

LET'S FUCKING GO!!!!!!!!!!

@alckasoc alckasoc merged commit 601d318 into main Jul 2, 2024
4 checks passed
@alckasoc alckasoc deleted the tedasdf/adding_examples branch July 2, 2024 04:21