Add support for PDF file uploads as context for LLM queries #3638

Open
wants to merge 42 commits into
base: main

andrewwan0131

Why are these changes needed?

These changes enable users to upload PDF files as context for LLM queries.

Changes made

  1. Added PDF file handling capabilities:

    • Implemented PDF file upload support in the web interface
    • Added PDF text extraction functionality
    • Integrated extracted PDF content as context for LLM queries
  2. Modified relevant files:

    • Updated Gradio web server components to handle PDF uploads
    • Added PDF processing utilities
    • Enhanced chat protocol to include document context
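
The flow described above (extract text from the PDF, then pass it to the model as context) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in this PR: `CONTEXT_TEMPLATE`, `truncate_document`, `build_prompt`, and the `max_chars` limit are hypothetical names, and the per-page text is assumed to have already been produced by whatever PDF extraction library is used.

```python
from typing import List

# Hypothetical template; the PR's actual prompt format is not shown in this thread.
CONTEXT_TEMPLATE = (
    "Use the following document as context.\n"
    "[DOCUMENT START]\n{document}\n[DOCUMENT END]\n\n"
    "Question: {question}"
)


def truncate_document(pages: List[str], max_chars: int) -> str:
    """Join non-empty page texts and cut the result to at most max_chars characters."""
    joined = "\n\n".join(page.strip() for page in pages if page.strip())
    return joined[:max_chars]


def build_prompt(pages: List[str], question: str, max_chars: int = 8000) -> str:
    """Prepend extracted PDF text to the user's question (assumed design)."""
    document = truncate_document(pages, max_chars)
    if not document:
        # No extractable text (for example, a scanned PDF): send the bare question.
        return question
    return CONTEXT_TEMPLATE.format(document=document, question=question)
```

A call such as `build_prompt(page_texts, "Summarize section 2")` would then yield a single string to send as the user message. Truncating by character count is a simplification; a token-based limit would likely be a better fit for a real model context window.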

Checks

  • I've tested the PDF upload and context integration with various document types by running Chatbot Arena locally

Member

@infwinston infwinston left a comment

Thanks @andrewwan0131, I left some comments!

Member

@infwinston infwinston left a comment

Thanks, I left more comments.

@CodingWithTim CodingWithTim self-requested a review December 26, 2024 23:16
@CodingWithTim CodingWithTim self-assigned this Dec 26, 2024
@CodingWithTim
Collaborator

CodingWithTim commented Dec 30, 2024

@andrewwan0131 @PranavB-11 I resolved the old comments because they are no longer relevant. We can start commenting on the new code, as it is quite different from before. The pdfchat is now operational; I will test it extensively and improve it next.

Next steps:

  1. Fix some existing UI issues that are bothering me at the moment.
  2. Integrate our language detection code into parse_pdf.
  3. Add a PDF moderator.

Member

@infwinston infwinston left a comment

Thanks @CodingWithTim! I left some quick comments.

@CodingWithTim
Collaborator

71 files changed?? 😭😭

@PranavB-11
Collaborator

ohhh it was the formatting commit, it added a billion spaces to every file

This reverts commit 0955a76.
Collaborator

@CodingWithTim CodingWithTim left a comment

@andrewwan0131 @PranavB-11 @yixin-huang1 Great work, guys! I only fixed a few small bugs and cleaned up the logic. Everything now works!

Member

Why do we need to remove this?

Author

This was accidentally created when I pushed the Black formatting commit, so we reverted the changes.

6 participants