feat: Enhance error handling in Azure document embedder #8941

mdrazak2001 · 2025-02-26T17:52:08Z

Related Issues

fixes #8869 (AzureOpenAIDocumentEmbedder fails entire run when one document throws error)

Proposed Changes:

Add error handling in _embed_batch to continue processing remaining documents
Log failed embeddings with batch range information
Match error handling behavior with OpenAIDocumentEmbedder
Add unit tests for graceful error handling

How did you test it?

Added unit tests for graceful error handling in the AzureOpenAIDocumentEmbedder class
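
A test of this kind might look roughly like the sketch below. It is not the test code from this PR: `embed_all` is a hypothetical, stripped-down stand-in for the batch loop, and the mocked client only mimics the `embeddings.create` call shape. The idea is to make one batch raise and then check that the remaining batches are still embedded and that the failure is logged.

```python
import logging
import unittest
from unittest.mock import Mock

logger = logging.getLogger("embedder_sketch")


def embed_all(client, batches):
    """Hypothetical helper: embed each batch, skipping batches whose call raises."""
    embeddings = []
    for batch in batches:
        try:
            response = client.embeddings.create(input=batch)
        except Exception:
            # Log the failure with a traceback and move on to the next batch.
            logger.exception("Failed embedding of batch %s", batch)
            continue
        embeddings.extend(response)
    return embeddings


class TestGracefulErrorHandling(unittest.TestCase):
    def test_failed_batch_is_skipped_and_logged(self):
        client = Mock()
        # First call succeeds, second raises, third succeeds.
        client.embeddings.create.side_effect = [[[0.1]], RuntimeError("boom"), [[0.3]]]

        with self.assertLogs("embedder_sketch", level="ERROR") as captured:
            result = embed_all(client, [["a"], ["b"], ["c"]])

        # The failing batch is dropped, the others are kept in order.
        self.assertEqual(result, [[0.1], [0.3]])
        # Exactly one error was logged, and every batch was attempted.
        self.assertEqual(len(captured.records), 1)
        self.assertEqual(client.embeddings.create.call_count, 3)
```

Passing a list to `side_effect` makes the mock return (or raise) one item per call, which keeps the "one bad batch in the middle" scenario explicit.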

Notes for the reviewer

Checklist

I have read the contributors guidelines and the code of conduct
I have updated the related issue with new insights and changes
I added unit tests and updated the docstrings
I've used one of the conventional commit types for my PR title: fix:, feat:, build:, chore:, ci:, docs:, style:, refactor:, perf:, test: and added ! in case the PR includes breaking changes.
I documented my code
I ran pre-commit hooks and fixed any issue

coveralls · 2025-03-03T08:48:13Z

Pull Request Test Coverage Report for Build 13550304131

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

For more information on this, see Tracking coverage changes with pull request builds.
To avoid this issue with future PRs, see these Recommended CI Configurations.
For a quick fix, rebase this PR at GitHub. Your next report should be accurate.

Details

0 of 0 changed or added relevant lines in 0 files are covered.
18 unchanged lines in 1 file lost coverage.
Overall coverage increased (+0.07%) to 90.047%

Files with Coverage Reduction	New Missed Lines	%
components/embedders/azure_document_embedder.py	18	69.74%

Totals
Change from base Build 13542952701:	0.07%
Covered Lines:	9581
Relevant Lines:	10640

💛 - Coveralls

anakin87

Thanks for the contribution!

I found a small opportunity for improving this PR.

anakin87 · 2025-03-03T10:57:41Z

haystack/components/embedders/azure_document_embedder.py

+            try:
+                if self.dimensions is not None:
+                    response = self._client.embeddings.create(
+                        model=self.azure_deployment, dimensions=self.dimensions, input=batch
+                    )
+                else:
+                    response = self._client.embeddings.create(model=self.azure_deployment, input=batch)
+
+                # Append embeddings to the list
+                all_embeddings.extend(el.embedding for el in response.data)
+
+                # Update the meta information only once if it's empty
+                if not meta["model"]:
+                    meta["model"] = response.model
+                    meta["usage"] = dict(response.usage)
+                else:
+                    # Update the usage tokens
+                    meta["usage"]["prompt_tokens"] += response.usage.prompt_tokens
+                    meta["usage"]["total_tokens"] += response.usage.total_tokens
+
+            except Exception as e:
+                # Log the error but continue processing
+                batch_range = f"{i} - {i + batch_size}"
+                logger.exception(f"Failed embedding of documents in range: {batch_range} caused by {e}")
+                continue


Could you please align this implementation with that of the OpenAIDocumentEmbedder?

I think that it is better for a few reasons:

1. groups args for the embedding creation API call

2. uses the more specific APIError instead of Exception

3. logs the IDs of the Documents for which the embedding generation failed
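
As a rough illustration of the three suggestions above, the batch loop could take a shape like the following. This is a minimal sketch, not the actual OpenAIDocumentEmbedder code: `embed_batches` is a hypothetical free function, the local `APIError` class is a stand-in for the error type exported by the `openai` package so the example runs without it, and the fake client only imitates the `embeddings.create` call.

```python
import logging
from types import SimpleNamespace
from typing import Any, Dict, List, Optional

logger = logging.getLogger(__name__)


class APIError(Exception):
    """Stand-in for the openai package's APIError (assumption for this sketch)."""


def embed_batches(
    client: Any,
    model: str,
    texts_to_embed: Dict[str, str],
    batch_size: int,
    dimensions: Optional[int] = None,
) -> List[List[float]]:
    """Embed texts keyed by document ID, skipping batches that fail."""
    all_embeddings: List[List[float]] = []
    items = list(texts_to_embed.items())

    for start in range(0, len(items), batch_size):
        batch = items[start : start + batch_size]

        # Build the call arguments once instead of branching on `dimensions`.
        args: Dict[str, Any] = {"model": model, "input": [text for _, text in batch]}
        if dimensions is not None:
            args["dimensions"] = dimensions

        try:
            response = client.embeddings.create(**args)
        except APIError as exc:
            # Catch only API failures and report which documents were affected.
            ids = ", ".join(doc_id for doc_id, _ in batch)
            logger.exception("Failed embedding of documents %s caused by %s", ids, exc)
            continue

        all_embeddings.extend(el.embedding for el in response.data)

    return all_embeddings


def _fake_create(**kwargs: Any) -> SimpleNamespace:
    """Fake API call: fails for any batch containing the text 'bad'."""
    if "bad" in kwargs["input"]:
        raise APIError("simulated failure")
    return SimpleNamespace(data=[SimpleNamespace(embedding=[0.0, 1.0]) for _ in kwargs["input"]])


fake_client = SimpleNamespace(embeddings=SimpleNamespace(create=_fake_create))
result = embed_batches(fake_client, "my-deployment", {"a": "good", "b": "bad", "c": "fine"}, batch_size=1)
```

With a batch size of 1, the failing document "b" is logged by ID and skipped, while "a" and "c" are still embedded.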

mdrazak2001 · 2025-03-03 (edited)

Thanks for the review, @anakin87.
For points 1 and 2, I can update the implementation to:

Group args for the embedding creation API call

Use the more specific APIError instead of Exception

For point 3 (logging document IDs), I notice this would require changing the signature of _prepare_texts_to_embed from:

def _prepare_texts_to_embed(self, documents: List[Document]) -> List[str]

to:

def _prepare_texts_to_embed(self, documents: List[Document]) -> Dict[str, str]

Would you be okay with this signature change to align it with OpenAIDocumentEmbedder's implementation? This would help improve error logging by identifying which specific documents failed during embedding.
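
To make the proposed return type concrete, here is a minimal sketch of a helper that maps each document ID to the text that would be sent for embedding. The `Document` dataclass is a hypothetical stand-in for Haystack's class, and the `meta_fields_to_embed` and `separator` parameters are assumptions added for illustration, not taken from this PR.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Sequence


@dataclass
class Document:
    """Hypothetical stand-in for haystack's Document (assumption for this sketch)."""

    id: str
    content: Optional[str] = None
    meta: Dict[str, Any] = field(default_factory=dict)


def prepare_texts_to_embed(
    documents: List[Document],
    meta_fields_to_embed: Sequence[str] = (),
    separator: str = "\n",
) -> Dict[str, str]:
    """Return a mapping of document ID -> text to embed."""
    texts_to_embed: Dict[str, str] = {}
    for doc in documents:
        # Prepend the selected metadata values that are present and not None.
        meta_values = [
            str(doc.meta[key]) for key in meta_fields_to_embed if doc.meta.get(key) is not None
        ]
        texts_to_embed[doc.id] = separator.join(meta_values + [doc.content or ""])
    return texts_to_embed


docs = [
    Document(id="d1", content="hello", meta={"title": "T"}),
    Document(id="d2", content=None),
]
prepared = prepare_texts_to_embed(docs, meta_fields_to_embed=["title"])
```

Keying the texts by ID means the caller can slice the mapping into batches and still name the affected documents when a batch fails.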

anakin87 · 2025-03-03

I'm totally OK with changing the signature of _prepare_texts_to_embed.
It's an internal method (_something), so changing its signature and behavior is not considered a breaking change.

releasenotes/notes/add-azure-embedder-exception-handler-c10ea46fb536de3b.yaml


feat: Enhance error handling in Azure document embedder

9ef17ca

mdrazak2001 requested a review from a team as a code owner February 26, 2025 17:52

mdrazak2001 requested review from anakin87 and removed request for a team February 26, 2025 17:52

github-actions bot added the topic:tests and type:documentation labels Feb 26, 2025

add release notes

c425cc7

mdrazak2001 requested a review from a team as a code owner February 26, 2025 17:58

mdrazak2001 requested review from dfokina and removed request for a team February 26, 2025 17:58

anakin87 requested changes Mar 3, 2025


address review comments

3f5e9c2

mdrazak2001 requested a review from anakin87 March 3, 2025 18:42

Update releasenotes/notes/add-azure-embedder-exception-handler-c10ea4…

a5e7a01

…6fb536de3b.yaml Co-authored-by: Stefano Fiorucci <[email protected]>
