Inserting a multi-part document results in duplicate nodes and edges #485
Comments
@jin38324 Hi, have you solved this issue?
I dug deeper into the code and found that the problem was not from So, if you use the default storage, there should be no problem.
The problem was not from _merge_nodes_then_upsert.
I want to convert a long document into a LightRAG knowledge base.
If I insert the entire document in one call, a single error makes the whole run fail. To limit that risk, I split the document into multiple parts and call insert on each part separately.
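The part-by-part workflow described above can be sketched roughly as follows. This is only an illustration: `split_into_parts` and `insert_parts` are hypothetical helpers, not LightRAG functions, and `insert_fn` stands in for whatever insert call is being used (for example `rag.insert` on an already configured instance).

```python
from typing import Callable, Iterable


def split_into_parts(text: str, max_chars: int) -> list[str]:
    """Group paragraphs into parts of at most roughly max_chars characters."""
    parts: list[str] = []
    current = ""
    for paragraph in text.split("\n\n"):
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if current and len(candidate) > max_chars:
            # The current part is full: close it and start a new one.
            parts.append(current)
            current = paragraph
        else:
            current = candidate
    if current:
        parts.append(current)
    return parts


def insert_parts(
    insert_fn: Callable[[str], None], parts: Iterable[str]
) -> list[tuple[int, Exception]]:
    """Insert each part separately and collect failures instead of aborting."""
    failures: list[tuple[int, Exception]] = []
    for index, part in enumerate(parts):
        try:
            insert_fn(part)
        except Exception as exc:  # one bad part should not stop the rest
            failures.append((index, exc))
    return failures
```

With this shape, only the parts listed in the returned failures need to be retried, rather than the whole document.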
The resulting data contains many duplicate nodes and edges.
After investigating, I believe the cause is in the code referenced below.
LightRAG merges nodes and edges with the same name when executing insert, but only within a single execution of _process_single_content. If you insert multiple documents as I did, the results of separate calls are not merged with each other.
LightRAG/lightrag/operate.py, line 391 in 874f3b3
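To make the suspected failure mode concrete, here is a toy model of it. This is not LightRAG's code: `merge_batch`, `AppendOnlyStore`, `KeyedStore`, and `insert` are all hypothetical names. The sketch assumes that same-name merging happens per insert call, and it shows that whether duplicates appear then depends on whether the storage backend also merges by name.

```python
from collections import defaultdict


def merge_batch(extracted):
    """Merge entities that share a name, but only within one batch."""
    merged = defaultdict(list)
    for name, description in extracted:
        merged[name].append(description)
    return {name: "; ".join(descs) for name, descs in merged.items()}


class AppendOnlyStore:
    """Hypothetical backend that writes a new row on every upsert."""

    def __init__(self):
        self.rows = []

    def upsert(self, name, description):
        self.rows.append((name, description))


class KeyedStore:
    """Hypothetical backend keyed by name that merges with stored data."""

    def __init__(self):
        self.rows = {}

    def upsert(self, name, description):
        existing = self.rows.get(name)
        self.rows[name] = f"{existing}; {description}" if existing else description


def insert(store, extracted):
    """One insert call: merge within the batch, then write to the store."""
    for name, description in merge_batch(extracted).items():
        store.upsert(name, description)


# Two parts of the same document, inserted in separate calls.
part_1 = [("Alice", "engineer"), ("Alice", "works at Acme")]
part_2 = [("Alice", "lives in Paris")]

append_store, keyed_store = AppendOnlyStore(), KeyedStore()
for part in (part_1, part_2):
    insert(append_store, part)
    insert(keyed_store, part)

print(len(append_store.rows))  # two rows named "Alice": a duplicate node
print(len(keyed_store.rows))   # one row: merged across calls
```

In this model the per-call merge is identical for both stores; the duplicates come only from the backend that does not look up existing entries by name.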