
Inserting a multi-part document results in duplicate nodes and edges #485

Closed
jin38324 opened this issue Dec 18, 2024 · 3 comments

@jin38324
Contributor

I want to convert a long document into a LightRAG knowledge base.
If I insert the entire document in one go, a single error makes the whole run fail. To limit that risk, I split the document into multiple parts and call insert on each part separately.
In the resulting data, I found many duplicated nodes and edges.

After some investigation, I traced the cause to the code below.
LightRAG merges nodes and edges with the same name during insert, but this only works within a single execution of _process_single_content. If you insert multiple documents separately, as I did, the results are not merged.

_merge_nodes_then_upsert(k, v, knowledge_graph_inst, global_config)
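To make the expected behaviour concrete, here is a minimal sketch of merge-by-name across separate insert calls. It is not LightRAG's implementation: `Node`, `InMemoryGraph` and `insert_batch` are hypothetical names, and only nodes are shown. Conceptually, merging has two steps: group records with the same name within one call, then combine each group with whatever the storage already holds under that name. The second step is what should make separate insert calls converge on a single node, and it only works if the storage lookup actually finds the existing node.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    descriptions: list[str] = field(default_factory=list)
    source_ids: set[str] = field(default_factory=set)


class InMemoryGraph:
    """Hypothetical graph store whose nodes are keyed by entity name."""

    def __init__(self) -> None:
        self.nodes: dict[str, Node] = {}

    def get_node(self, name: str) -> Node | None:
        return self.nodes.get(name)

    def upsert_node(self, node: Node) -> None:
        self.nodes[node.name] = node


def insert_batch(graph: InMemoryGraph, extracted: list[tuple[str, str, str]]) -> None:
    """Merge (name, description, source_id) records into the graph."""
    # Step 1: group records that share a name within this single call.
    groups: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for name, description, source_id in extracted:
        groups[name].append((description, source_id))

    # Step 2: merge each group with the node already stored under that name.
    # If get_node() misses an existing node, a duplicate would be created here.
    for name, records in groups.items():
        existing = graph.get_node(name)
        node = existing if existing is not None else Node(name)
        for description, source_id in records:
            if description not in node.descriptions:
                node.descriptions.append(description)
            node.source_ids.add(source_id)
        graph.upsert_node(node)


# Two separate insert calls that extract the same entity from different parts.
graph = InMemoryGraph()
insert_batch(graph, [("LightRAG", "A RAG framework", "part-1")])
insert_batch(graph, [("LightRAG", "Builds a knowledge graph", "part-2")])
print(len(graph.nodes))  # 1
```

Because the lookup is keyed by name only, the second call finds the node from the first call and extends it rather than creating a second node.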

@xandery-geek

xandery-geek commented Jan 8, 2025

@jin38324 Hi, have you solved this issue?
I want to process multiple documents individually and merge the duplicated nodes or edges into one knowledge graph.

@jin38324
Contributor Author

I dug deeper into the code and found that the problem did not come from _merge_nodes_then_upsert.
It was caused by the knowledge-graph (kg) storage backend: its lookup for existing objects used overly strict constraints, so it failed to find objects that already existed and performed an insert instead of an update.

So if you use the default storage, nodes and edges should be merged correctly.
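The failure mode described above can be reproduced with a toy store. The issue does not say which backend was involved or exactly which constraints it applied, so this sketch assumes the strict lookup required every stored property to match, whereas the relaxed lookup matches on the entity name only. `StoredNode`, `GraphStore` and `count_rows` are hypothetical names, not LightRAG APIs.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class StoredNode:
    name: str
    entity_type: str
    description: str


class GraphStore:
    """Hypothetical backend; `strict=True` mimics an over-constrained lookup."""

    def __init__(self, strict: bool) -> None:
        self.strict = strict
        self.rows: list[StoredNode] = []

    def find(self, candidate: StoredNode) -> StoredNode | None:
        for row in self.rows:
            if self.strict:
                # Assumed strict match: all fields must be equal, so a node whose
                # description grew between inserts is never found.
                if row == candidate:
                    return row
            elif row.name == candidate.name:
                # Relaxed match: the entity name alone identifies the node.
                return row
        return None

    def upsert(self, candidate: StoredNode) -> None:
        existing = self.find(candidate)
        if existing is None:
            # Lookup missed: a new row is inserted (the duplicate in this issue).
            self.rows.append(candidate)
        else:
            # Lookup hit: the existing row is updated in place.
            existing.entity_type = candidate.entity_type
            existing.description = candidate.description


def count_rows(strict: bool) -> int:
    store = GraphStore(strict=strict)
    # The same entity inserted from two document parts; on the second insert
    # the merged description is longer than the stored one.
    store.upsert(StoredNode("LightRAG", "SOFTWARE", "A RAG framework"))
    store.upsert(StoredNode("LightRAG", "SOFTWARE", "A RAG framework; builds a knowledge graph"))
    return len(store.rows)


print(count_rows(strict=True))   # 2 (duplicate node)
print(count_rows(strict=False))  # 1
```

The merge logic is identical in both cases; only the lookup differs, which matches the diagnosis that the duplicates came from the storage layer rather than from _merge_nodes_then_upsert.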

@jin38324
Contributor Author

The problem did not come from _merge_nodes_then_upsert.
It was caused by the knowledge-graph (kg) storage backend: its lookup for existing objects used overly strict constraints, so it failed to find objects that already existed and performed an insert instead of an update.
