First of all, thank you for this great project!
I have searched the issues but I couldn't find any issue related to this.
Description
In my Flutter application, I have an ObjectBox entity defined as:
@Entity()
class DocumentSection {
  @Id()
  int id = 0;

  final document = ToOne<Document>();

  String content;

  @Property(type: PropertyType.int)
  int pageNumber;

  @HnswIndex(
    dimensions: 500,
    distanceType: VectorDistanceType.cosine,
  )
  @Property(type: PropertyType.floatVector)
  List<double>? embedding;

  @Property(type: PropertyType.int)
  int originalId = 0;

  DocumentSection({
    this.content = '',
    this.embedding,
    this.pageNumber = 0,
  });
}
It's for a semantic search use case. The ObjectBox DB has 109,000 entries for DocumentSection (and therefore 109,000 vectors).
Vector search is remarkably fast with that number of vectors (for example, nearestNeighborsF32 returns the 20 nearest embeddings in less than 1 second), but deleting entries is very slow:
It takes about 264 seconds (4.4 minutes) to delete 22,085 entries (out of 109,000 total entries).
Could the reason for this be related to the management of the HNSW vector index during the removeMany operation?
This is the code I'm using to delete entries:
void _deleteDocument(Document document) {
  try {
    debugPrint('Starting deletion of document: ${document.filename} (ID: ${document.id})');
    final startTime = DateTime.now();
    widget.store.runInTransaction(TxMode.write, () {
      debugPrint('Starting transaction...');

      // Query sections
      debugPrint('Querying sections...');
      final queryStart = DateTime.now();
      final query = widget.sectionBox
          .query(DocumentSection_.document.equals(document.id))
          .build();
      final sectionCount = query.count();
      final queryDuration = DateTime.now().difference(queryStart);
      debugPrint('Found $sectionCount sections to delete (query took ${queryDuration.inMilliseconds}ms)');

      // Get IDs
      debugPrint('Getting section IDs...');
      final getIdsStart = DateTime.now();
      final ids = query.findIds();
      final getIdsDuration = DateTime.now().difference(getIdsStart);
      debugPrint('Got ${ids.length} section IDs (took ${getIdsDuration.inMilliseconds}ms)');
      query.close();

      // Delete sections using removeMany
      debugPrint('Starting batch section deletion...');
      final deleteStart = DateTime.now();
      final removedCount = widget.sectionBox.removeMany(ids);
      final deleteDuration = DateTime.now().difference(deleteStart);
      debugPrint('Sections deleted: $removedCount (took ${deleteDuration.inMilliseconds}ms)');

      // Delete document
      debugPrint('Deleting document...');
      final docDeleteStart = DateTime.now();
      widget.documentBox.remove(document.id);
      final docDeleteDuration = DateTime.now().difference(docDeleteStart);
      debugPrint('Document deleted (took ${docDeleteDuration.inMilliseconds}ms)');
    });
    final totalDuration = DateTime.now().difference(startTime);
    debugPrint('Total deletion process took ${totalDuration.inMilliseconds}ms');
    ScaffoldMessenger.of(context).showSnackBar(
      SnackBar(content: Text('Deleted ${document.filename}')),
    );

    // Refresh the data
    _loadData();
  } catch (e) {
    debugPrint('Error deleting document: $e');
    ScaffoldMessenger.of(context).showSnackBar(
      SnackBar(content: Text('Error deleting document: $e')),
    );
  }
}
The above code produces these logs:
flutter: Starting deletion of document: Test.pdf (ID: 23)
flutter: Starting transaction...
flutter: Querying sections...
flutter: Found 22085 sections to delete (query took 16ms)
flutter: Getting section IDs...
flutter: Got 22085 section IDs (took 2ms)
flutter: Starting batch section deletion...
flutter: Sections deleted: 22085 (took 264141ms)
flutter: Deleting document...
flutter: Document deleted (took 0ms)
flutter: Total deletion process took 264192ms
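To put the logged numbers in perspective, here is a small sketch that derives the per-entry cost and the share of the total time spent in removeMany. The function names are made up for illustration; only the three constants come from the logs above.

```dart
// Hypothetical helpers; the inputs below are taken from the logs above.

/// Average milliseconds spent per removed entry.
double msPerEntry(int elapsedMs, int entries) => elapsedMs / entries;

/// Removed entries per second.
double entriesPerSecond(int elapsedMs, int entries) =>
    entries / (elapsedMs / 1000);

/// Percentage of the total run time spent in one step.
double sharePercent(int stepMs, int totalMs) => stepMs / totalMs * 100;

void main() {
  const removeManyMs = 264141; // "Sections deleted: 22085 (took 264141ms)"
  const totalMs = 264192; // "Total deletion process took 264192ms"
  const entries = 22085;

  print('${msPerEntry(removeManyMs, entries).toStringAsFixed(2)} ms per entry'); // 11.96 ms per entry
  print('${entriesPerSecond(removeManyMs, entries).toStringAsFixed(2)} entries/s'); // 83.61 entries/s
  print('${sharePercent(removeManyMs, totalMs).toStringAsFixed(2)} % of total'); // 99.98 % of total
}
```

So each removal costs roughly 12 ms on average, and removeMany accounts for practically the whole run.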
Specifically, this line appears to be the bottleneck:
final removedCount = widget.sectionBox.removeMany(ids);
Could the slowdown be related to the HNSW index maintenance during deletion, as all other operations (querying, getting IDs) are very fast?
Is there a known solution for this issue?
Environment:
ObjectBox version: 4.1.0
Flutter: 3.29.0
Platform tested on: Linux (Ubuntu 24.04.1 LTS)
> Could the slowdown be related to the HNSW index maintenance during deletion
Yes, I'm pretty sure the HNSW index update is the bottleneck. To my knowledge no efficient algorithm exists for HNSW bulk updates yet.
I've seen delete markers used instead of actually removing entries and reorganizing the index. But this is not perfect either and may just delay the problem to later stages.
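To make the delete-marker idea concrete, here is a minimal, generic sketch in plain Dart. It is not ObjectBox API and not how ObjectBox works internally: the TombstoneSet class and its methods are hypothetical, and the "search results" are a hard-coded list. The idea is to record deleted IDs, over-fetch from the index, and drop the marked IDs from the ranked results.

```dart
/// Hypothetical helper: tracks IDs that are logically deleted but still
/// physically present in the vector index.
class TombstoneSet {
  final Set<int> _deleted = <int>{};

  /// Marks the given IDs as deleted without touching the index.
  void markDeleted(Iterable<int> ids) => _deleted.addAll(ids);

  bool isDeleted(int id) => _deleted.contains(id);

  int get length => _deleted.length;

  /// Drops tombstoned IDs from ranked results, preserving order and
  /// returning at most [limit] live IDs.
  List<int> filterLive(List<int> rankedIds, int limit) =>
      rankedIds.where((id) => !_deleted.contains(id)).take(limit).toList();
}

void main() {
  final tombstones = TombstoneSet()..markDeleted([2, 4]);

  // Pretend the index returned these IDs, nearest first. More results are
  // fetched than needed because some of them may be tombstoned.
  final ranked = [4, 1, 2, 3, 5];

  print(tombstones.filterLive(ranked, 3)); // [1, 3, 5]
}
```

The drawback mentioned above shows up here too: the marked vectors still occupy the index, so searches have to over-fetch and the real cleanup is only postponed.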
What will (likely) happen in your use case after a bulk delete? Will new documents be added? It may help to understand the dynamics and usage patterns to think about solutions....
> What will (likely) happen in your use case after a bulk delete? Will new documents be added? It may help to understand the dynamics and usage patterns to think about solutions....
Yes, after a bulk delete, it's quite possible that new documents will be added, but not necessarily.
The application allows users to add and delete documents at will, which leads to extensive changes in the database. Adding a single document can create several DocumentSection entries, as many as 22,000 if the document is big, as in this issue's case.
Also, as the database grows in size (number of total DocumentSection entries), deleting even smaller documents (with fewer than 5,000 DocumentSection entries) becomes slow.
Interestingly, while the insertion of 22,000 DocumentSection entries completes in seconds, the deletion of the same 22,000 DocumentSection entries can take as long as four minutes.
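One possible mitigation on the application side, sketched below in plain Dart, is to remove the IDs in smaller batches and yield to the event loop between batches, so that a long deletion does not block everything in one go. This is only an assumption-laden sketch: chunk, removeInBatches, and the removeBatch callback are hypothetical names, and the callback stands in for something like widget.sectionBox.removeMany(batch). It is not expected to reduce the total index maintenance work, and the deletion is no longer a single atomic transaction.

```dart
/// Splits [ids] into consecutive batches of at most [batchSize] items.
List<List<int>> chunk(List<int> ids, int batchSize) {
  if (batchSize <= 0) {
    throw ArgumentError.value(batchSize, 'batchSize', 'must be positive');
  }
  final batches = <List<int>>[];
  for (var start = 0; start < ids.length; start += batchSize) {
    final end =
        start + batchSize < ids.length ? start + batchSize : ids.length;
    batches.add(ids.sublist(start, end));
  }
  return batches;
}

/// Removes [ids] batch by batch and returns the total removed count.
/// [removeBatch] is a hypothetical callback that removes one batch and
/// returns how many entries it removed.
Future<int> removeInBatches(
  List<int> ids,
  int batchSize,
  int Function(List<int> batch) removeBatch,
) async {
  var removed = 0;
  for (final batch in chunk(ids, batchSize)) {
    removed += removeBatch(batch);
    // Yield to the event loop so other work can run between batches.
    await Future<void>.delayed(Duration.zero);
  }
  return removed;
}

Future<void> main() async {
  final ids = List<int>.generate(22085, (i) => i + 1);

  // Stand-in for the real removal: just reports the batch size.
  final removed = await removeInBatches(ids, 1000, (batch) => batch.length);

  print(removed); // 22085
}
```

Timing each batch the same way as in the logs above would also show whether the per-entry cost stays constant or grows as the deletion proceeds.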