Very slow deletion performance of removeMany with HNSW vector index #710

Open
Ohrest88 opened this issue Feb 20, 2025 · 2 comments
Labels
enhancement New feature or request

Comments

Ohrest88 commented Feb 20, 2025

First of all, thank you for this great project!
I have searched the issues but I couldn't find any issue related to this.

Description

In my Flutter application, I have an ObjectBox entity defined as:

@Entity()
class DocumentSection {
  @Id()
  int id = 0;

  final document = ToOne<Document>();
  String content;
  
  @Property(type: PropertyType.int)
  int pageNumber;
  
  @HnswIndex(
    dimensions: 500,
    distanceType: VectorDistanceType.cosine
  )
  @Property(type: PropertyType.floatVector)
  List<double>? embedding;

  @Property(type: PropertyType.int)
  int originalId = 0;

  DocumentSection({
    this.content = '',
    this.embedding,
    this.pageNumber = 0,
  });
}

It's for a semantic search use case. The ObjectBox database holds 109,000 DocumentSection entries (and therefore 109,000 vectors).

Vector search is remarkably fast at this size (for example, nearestNeighborsF32 returns the 20 nearest embeddings in less than 1 second), but deleting entries is very slow: removing 22,085 entries (out of 109,000 total) takes about 264 seconds (4.4 minutes).
Could this be caused by maintenance of the HNSW vector index during the removeMany operation?

This is the code I'm using to delete entries:

  void _deleteDocument(Document document) {
    try {
      debugPrint('Starting deletion of document: ${document.filename} (ID: ${document.id})');
      final startTime = DateTime.now();

      widget.store.runInTransaction(TxMode.write, () {
        debugPrint('Starting transaction...');
        
        // Query sections
        debugPrint('Querying sections...');
        final queryStart = DateTime.now();
        final query = widget.sectionBox
            .query(DocumentSection_.document.equals(document.id))
            .build();
            
        final sectionCount = query.count();
        final queryDuration = DateTime.now().difference(queryStart);
        debugPrint('Found $sectionCount sections to delete (query took ${queryDuration.inMilliseconds}ms)');

        // Get IDs
        debugPrint('Getting section IDs...');
        final getIdsStart = DateTime.now();
        final ids = query.findIds();
        final getIdsDuration = DateTime.now().difference(getIdsStart);
        debugPrint('Got ${ids.length} section IDs (took ${getIdsDuration.inMilliseconds}ms)');
        
        query.close();

        // Delete sections using removeMany
        debugPrint('Starting batch section deletion...');
        final deleteStart = DateTime.now();
        final removedCount = widget.sectionBox.removeMany(ids);
        final deleteDuration = DateTime.now().difference(deleteStart);
        debugPrint('Sections deleted: $removedCount (took ${deleteDuration.inMilliseconds}ms)');
        
        // Delete document
        debugPrint('Deleting document...');
        final docDeleteStart = DateTime.now();
        widget.documentBox.remove(document.id);
        final docDeleteDuration = DateTime.now().difference(docDeleteStart);
        debugPrint('Document deleted (took ${docDeleteDuration.inMilliseconds}ms)');
      });
      
      final totalDuration = DateTime.now().difference(startTime);
      debugPrint('Total deletion process took ${totalDuration.inMilliseconds}ms');

      ScaffoldMessenger.of(context).showSnackBar(
        SnackBar(content: Text('Deleted ${document.filename}')),
      );
      
      // Refresh the data
      _loadData();
    } catch (e) {
      debugPrint('Error deleting document: $e');
      ScaffoldMessenger.of(context).showSnackBar(
        SnackBar(content: Text('Error deleting document: $e')),
      );
    }
  }

The above code produces these logs:

flutter: Starting deletion of document: Test.pdf (ID: 23)
flutter: Starting transaction...
flutter: Querying sections...
flutter: Found 22085 sections to delete (query took 16ms)
flutter: Getting section IDs...
flutter: Got 22085 section IDs (took 2ms)
flutter: Starting batch section deletion...
flutter: Sections deleted: 22085 (took 264141ms)
flutter: Deleting document...
flutter: Document deleted (took 0ms)
flutter: Total deletion process took 264192ms
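
For scale, the logged figures work out to roughly 12 ms per removed entry, or about 84 entries per second. A quick check in Dart, using only the two numbers from the log above (the function names are just for illustration):

```dart
/// Average cost of one removal, in milliseconds.
double msPerEntry(int totalMs, int entryCount) => totalMs / entryCount;

/// Average removal throughput, in entries per second.
double entriesPerSecond(int totalMs, int entryCount) =>
    entryCount / (totalMs / 1000);

void main() {
  // Figures taken from the removeMany log line above.
  const removedEntries = 22085;
  const removeManyMs = 264141;

  final perEntry = msPerEntry(removeManyMs, removedEntries);
  final throughput = entriesPerSecond(removeManyMs, removedEntries);

  print('ms per entry: ${perEntry.toStringAsFixed(2)}');
  print('entries per second: ${throughput.toStringAsFixed(1)}');
}
```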

Specifically, this line appears to be the bottleneck:

final removedCount = widget.sectionBox.removeMany(ids);

Could the slowdown be related to the HNSW index maintenance during deletion, as all other operations (querying, getting IDs) are very fast?
Is there a known solution for this issue?

Environment:

ObjectBox version: 4.1.0
Flutter: 3.29.0
Platform tested on: Linux (Ubuntu 24.04.1 LTS)

Ohrest88 added the enhancement label Feb 20, 2025
@greenrobot
Member

> Could the slowdown be related to the HNSW index maintenance during deletion

Yes, I'm pretty sure the HNSW index update is the bottleneck. To my knowledge, no efficient algorithm exists for HNSW bulk updates yet.

I've seen delete markers used instead of actually removing entries and reorganizing the index. But this is not perfect either and may just delay the problem to later stages.
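
To make the delete-marker idea concrete, here is a minimal in-memory sketch in plain Dart. It is not ObjectBox's API and says nothing about how ObjectBox stores its index; `TombstoneIndex` and all of its members are hypothetical names. Marking is cheap because nothing is reorganized, reads filter out marked entries, and the physical removal is deferred to a later purge, which is exactly where the cost would reappear.

```dart
/// Hypothetical in-memory stand-in for a vector store; NOT ObjectBox's API.
class TombstoneIndex {
  final Map<int, List<double>> _vectors = {};
  final Set<int> _tombstones = {};

  void put(int id, List<double> vector) {
    _vectors[id] = vector;
    _tombstones.remove(id);
  }

  /// Only records a marker for each known ID; the entry itself stays in place.
  void markDeleted(Iterable<int> ids) {
    _tombstones.addAll(ids.where((id) => _vectors.containsKey(id)));
  }

  /// IDs visible to readers; search results would be filtered the same way.
  List<int> liveIds() =>
      _vectors.keys.where((id) => !_tombstones.contains(id)).toList();

  /// Physically removes up to [maxCount] marked entries and returns how many.
  int purge(int maxCount) {
    final batch = _tombstones.take(maxCount).toList();
    for (final id in batch) {
      _vectors.remove(id);
      _tombstones.remove(id);
    }
    return batch.length;
  }

  /// Number of marked entries still waiting for physical removal.
  int get pendingPurges => _tombstones.length;
}

void main() {
  final index = TombstoneIndex();
  for (var id = 1; id <= 5; id++) {
    index.put(id, [id.toDouble()]);
  }
  index.markDeleted([2, 4]);
  print(index.liveIds()); // [1, 3, 5]
  print(index.purge(1)); // 1
  print(index.pendingPurges); // 1
}
```

In a real store, the purge step would still have to update the vector index, so this moves the expensive work to a later point rather than removing it.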

What will (likely) happen in your use case after a bulk delete? Will new documents be added? It may help to understand the dynamics and usage patterns to think about solutions....

@Ohrest88
Author

> What will (likely) happen in your use case after a bulk delete? Will new documents be added? It may help to understand the dynamics and usage patterns to think about solutions....

Yes, after a bulk delete, it's quite possible that new documents will be added, but not necessarily.
The application allows users to add and delete documents at will, which leads to extensive changes in the database. Adding a single document can create several DocumentSection entries, as many as 22,000 if the document is big, as in this issue's case.

Also, as the database grows (in total number of DocumentSection entries), deleting even smaller documents (with fewer than 5,000 DocumentSection entries) becomes slow.

Interestingly, while inserting 22,000 DocumentSection entries completes in seconds, deleting the same 22,000 entries can take as long as four minutes.
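
One mitigation that might be worth trying on the application side, purely as a sketch, is to split the ID list into smaller batches and pass each batch to its own removeMany call. The `chunked` helper and the batch size of 1000 below are assumptions for illustration. This would not reduce the total index work; it only bounds how long any single call runs, and it gives up the atomicity of the single write transaction used above.

```dart
/// Splits [ids] into consecutive batches of at most [batchSize] elements.
/// Hypothetical helper, not part of ObjectBox.
Iterable<List<int>> chunked(List<int> ids, int batchSize) sync* {
  if (batchSize <= 0) {
    throw ArgumentError.value(batchSize, 'batchSize', 'must be positive');
  }
  for (var start = 0; start < ids.length; start += batchSize) {
    final end =
        start + batchSize < ids.length ? start + batchSize : ids.length;
    yield ids.sublist(start, end);
  }
}

void main() {
  // Stand-in for the result of query.findIds() in the code above.
  final ids = List<int>.generate(22085, (i) => i + 1);

  final batches = chunked(ids, 1000).toList();
  print(batches.length); // 23
  print(batches.last.length); // 85

  // In the app, each batch would go to its own call, for example:
  //   widget.sectionBox.removeMany(batch);
}
```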
