refactor!: update AzureOCRDocumentConverter to not use the dataframe field for tabular Documents #8885

Merged
sjrl merged 8 commits into main from remove-df-doc-azure-ocr
Mar 3, 2025

Conversation

sjrl

@sjrl sjrl commented Feb 20, 2025

Related Issues

Proposed Changes:

Stops saving table Documents as dataframes. Table content is now saved as a CSV string instead.
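For readers who want a feel for the change, here is a minimal sketch of storing a table as a CSV string and parsing it back on the consumer side. It is an illustration only, not the code in `components/converters/azure.py`: the helper names `table_to_csv` and `csv_to_rows` and the sample rows are hypothetical, and it uses only the standard-library `csv` module.

```python
import csv
import io


def table_to_csv(rows: list[list[str]]) -> str:
    """Serialize table rows into a CSV string (hypothetical helper)."""
    buffer = io.StringIO()
    # QUOTE_MINIMAL (the default) quotes only cells that contain
    # the delimiter, a quote character, or a line break.
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


def csv_to_rows(content: str) -> list[list[str]]:
    """Parse a CSV string back into rows (hypothetical helper)."""
    return list(csv.reader(io.StringIO(content)))


# Sample table: the last row has a cell containing a comma.
rows = [["Name", "Value"], ["alpha", "1"], ["beta, gamma", "2"]]
csv_content = table_to_csv(rows)   # what a table Document's content could hold
restored = csv_to_rows(csv_content)
```

Downstream code that still needs a dataframe can rebuild one from the string, for example with `pandas.read_csv(io.StringIO(csv_content))`.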

How did you test it?

Updated the tests to check for CSV-formatted tables.

Notes for the reviewer

Checklist

  • I have read the contributors guidelines and the code of conduct
  • I have updated the related issue with new insights and changes
  • I added unit tests and updated the docstrings
  • I've used one of the conventional commit types for my PR title: fix:, feat:, build:, chore:, ci:, docs:, style:, refactor:, perf:, test: and added ! in case the PR includes breaking changes.
  • I documented my code
  • I ran pre-commit hooks and fixed any issue

@github-actions github-actions bot added the type:documentation Improvements on the docs label Feb 20, 2025
@coveralls

coveralls commented Feb 20, 2025

Pull Request Test Coverage Report for Build 13625493255

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • 20 unchanged lines in 1 file lost coverage.
  • Overall coverage decreased (-0.01%) to 89.971%

| Files with Coverage Reduction | New Missed Lines | % |
|---|---|---|
| components/converters/azure.py | 20 | 89.2% |

| Totals | Coverage Status |
|---|---|
| Change from base Build 13589137191: | -0.01% |
| Covered Lines: | 9563 |
| Relevant Lines: | 10629 |

💛 - Coveralls

@sjrl sjrl marked this pull request as ready for review February 24, 2025 08:07
@sjrl sjrl requested review from a team as code owners February 24, 2025 08:07
@sjrl sjrl requested review from dfokina and anakin87 and removed request for a team February 24, 2025 08:07

@anakin87 anakin87 left a comment

Looks good!

nit: we can change the title of this PR to something like "refactor!: make AzureOCRDocumentConverter not use the dataframe field for tabular Documents"

@sjrl sjrl changed the title from "chore: deprecate dataframe in AzureOCRDocumentConverter" to "refactor!: update AzureOCRDocumentConverter to not use the dataframe field for tabular Documents" Mar 3, 2025
@sjrl sjrl merged commit 52a0282 into main Mar 3, 2025
19 checks passed
@sjrl sjrl deleted the remove-df-doc-azure-ocr branch March 3, 2025 12:45
Labels
topic:tests type:documentation Improvements on the docs
Development

Successfully merging this pull request may close these issues.

Refactor AzureOCRDocumentConverter to not produce Documents with the dataframe field
3 participants