SearchScale/vectorsearch-benchmarks

Lucene-CuVS Benchmarks

Prerequisites

Before running

Build libcuvs libraries and CuVS Java API

(For now, comment out the cuvsRMMPoolMemoryResourceEnable call in both the CAGRA and brute-force build-index methods in the C wrapper before building.)

git clone git@github.com:rapidsai/cuvs.git \
&& cd cuvs \
&& git checkout branch-25.02 \
&& ./build.sh libcuvs java

Build Lucene-CuVS

git clone git@github.com:SearchScale/lucene.git \
&& cd lucene \
&& git checkout cuvs-integration-main \
&& ./gradlew compileJava mavenToLocal

Download the Wikipedia dataset (5M vectors x 2048 dimensions), the queries (100 x 2048 dimensions), and the ground truth (100 x 64 top-k)

wget https://accounts.searchscale.com/datasets/wikipedia/ground_truth_100x64.csv \
&& wget https://accounts.searchscale.com/datasets/wikipedia/queries_100.csv.mapdb \
&& wget https://accounts.searchscale.com/datasets/wikipedia/wiki_dump_5Mx2048D.csv.gz.mapdb
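Because the dataset download is large, it can help to confirm that all three files arrived before starting a benchmark. The following is a minimal sketch, not part of the repository: the function name check_datasets and its optional directory argument are hypothetical, and it only checks that each file exists and is non-empty, not that its contents are intact.

```shell
# Hypothetical helper: report which of the three downloaded files are
# missing or empty. The file names are taken from the wget commands above;
# the directory argument is an assumption and defaults to the current directory.
check_datasets() {
  local dir="${1:-.}"
  local missing=0
  local f
  for f in ground_truth_100x64.csv queries_100.csv.mapdb wiki_dump_5Mx2048D.csv.gz.mapdb; do
    if [ -s "$dir/$f" ]; then
      echo "ok       $f"
    else
      echo "missing  $f"
      missing=1
    fi
  done
  # Exit status is 0 only when every file is present and non-empty.
  return "$missing"
}
```

For example, check_datasets . && echo "all dataset files present" prints a status line per file and the final message only when nothing is missing.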

Running Manually

Steps:

  • Add your benchmark job configuration to the jobs.json file
  • Run ./benchmarks.sh jobs.json
  • If saveResultsOnDisk is set to true in jobs.json, the benchmark results are written to the results folder. Each successful benchmark run creates two files: ${benchmark_id}__benchmark_results_${timestamp}.json and ${benchmark_id}__neighbors_${timestamp}.csv
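After several runs the results folder holds multiple files per benchmark ID, so a small helper for picking out the newest pair can be handy. The sketch below is not part of the repository: the function name latest_results is hypothetical, and it assumes that the results folder is relative to the working directory and that the ${timestamp} part of the file names sorts lexicographically in chronological order.

```shell
# Hypothetical helper: print the newest results JSON and neighbors CSV
# for one benchmark ID, based on the file name pattern described above.
# Assumption: timestamps in the file names sort lexicographically by time.
latest_results() {
  local benchmark_id="$1"
  local dir="${2:-results}"
  local pair kind ext latest
  for pair in benchmark_results:json neighbors:csv; do
    kind="${pair%%:*}"   # file name infix, e.g. benchmark_results
    ext="${pair#*:}"     # file extension, e.g. json
    latest=$(ls -1 "$dir/${benchmark_id}__${kind}_"*".${ext}" 2>/dev/null | sort | tail -n 1)
    if [ -n "$latest" ]; then
      echo "$latest"
    fi
  done
  return 0
}
```

Calling latest_results my_benchmark prints up to two paths, the JSON file first, and prints nothing when no files match that ID (my_benchmark is a placeholder ID).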
