A fast and efficient search engine written in C++ over Wikipedia dump data, optimized for quick and accurate information retrieval.
Wikipedia dump (90 GB): http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2

Performance on the 90 GB Wikipedia XML dump:
- Index size (primary + secondary): 9.12 GB
- Metadata size: 863 MB
- Indexing time: 3 hr 30 min (average)
- Search time: 0.34 sec (average over 100 searches)
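Those sub-second lookups are consistent with an inverted-index design, where each term maps to a precomputed posting list, so a query touches only a few short lists instead of scanning documents. Below is a minimal in-memory sketch of that idea; the `Posting` struct, its fields, and the `lookup` helper are illustrative assumptions, not the project's actual on-disk format.

```cpp
#include <cstdint>
#include <iostream>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical posting: a document id plus the term's frequency in it.
struct Posting {
    uint32_t docId;
    uint32_t termFreq;
};

// Hypothetical in-memory inverted index: term -> posting list.
using InvertedIndex = std::unordered_map<std::string, std::vector<Posting>>;

// Look a term up; the cost depends on the posting list's length,
// not the corpus size, which is why lookups stay fast at 90 GB scale.
const std::vector<Posting>* lookup(const InvertedIndex& index,
                                   const std::string& term) {
    auto it = index.find(term);
    return it == index.end() ? nullptr : &it->second;
}

int main() {
    InvertedIndex index;
    index["wiki"] = {{1, 3}, {7, 1}};  // toy data
    if (const auto* postings = lookup(index, "wiki")) {
        for (const auto& p : *postings)
            std::cout << "doc " << p.docId << " tf " << p.termFreq << '\n';
    }
}
```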
- Interactive user search for fast information retrieval
- Web-based interface
- Direct links to the corresponding Wikipedia pages
- Stemming for improved search accuracy (see the sketch below)
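Stemming folds inflected forms ("searching", "searches") onto a common root so one query matches all variants. The repo's stemmer lives in `src/utils`; the snippet below is only a minimal suffix-stripping sketch of the idea, not the actual implementation.

```cpp
#include <array>
#include <iostream>
#include <string>
#include <string_view>

// Minimal suffix-stripping stemmer (illustrative only; real stemmers
// such as Porter's apply ordered rules with measure conditions).
std::string stem(std::string word) {
    static constexpr std::array<std::string_view, 4> suffixes =
        {"ing", "ed", "es", "s"};
    for (std::string_view suf : suffixes) {
        if (word.size() > suf.size() + 2 &&
            word.compare(word.size() - suf.size(), suf.size(), suf) == 0) {
            word.erase(word.size() - suf.size());
            break;  // strip at most one suffix
        }
    }
    return word;
}

int main() {
    for (const char* w : {"searching", "searched", "searches", "search"})
        std::cout << w << " -> " << stem(w) << '\n';
}
```

Whatever stemmer is used, it must be applied identically at index time and at query time, otherwise stemmed index terms would never match raw query terms.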
To install and run this project, follow these steps:

- Clone the repository:

  ```sh
  git clone https://github.com/UtkarshAhuja2003/WikiSearch.git
  cd WikiSearch
  ```

- Create a build directory:

  ```sh
  mkdir build
  cd build
  ```

- Generate the build files with CMake:

  ```sh
  cmake ..
  ```

- Build the project:

  ```sh
  make
  ```

- Run the application:

  ```sh
  ./WikiSearch
  ```
```
├── .github        # GitHub Actions workflow
├── build          # Build files for the project
├── client         # Web frontend
├── dependencies
├── res            # Posting list, metadata, WikiDump
└── src            # Source code
    ├── index      # Parse Wikipedia data
    ├── search
    └── utils      # File management, stemming and classifiers
```
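The `src/index` component parses the Wikipedia XML dump, and at 90 GB that only works as a streaming pass, never loading the file whole. Below is a minimal line-oriented sketch that extracts page titles from a decompressed dump, assuming the standard `<title>...</title>` markup of MediaWiki export files; the project's real parser handles full page bodies and is more involved.

```cpp
#include <fstream>
#include <iostream>
#include <string>

// Streaming scan of a MediaWiki XML dump: read line by line and print
// <title> elements, holding at most one line in memory at a time.
int main(int argc, char** argv) {
    if (argc < 2) {
        std::cerr << "usage: " << argv[0] << " <dump.xml>\n";
        return 1;
    }
    std::ifstream dump(argv[1]);
    std::string line;
    while (std::getline(dump, line)) {
        auto open = line.find("<title>");
        if (open == std::string::npos) continue;
        auto close = line.find("</title>", open);
        if (close == std::string::npos) continue;
        open += 7;  // skip past "<title>"
        std::cout << line.substr(open, close - open) << '\n';
    }
}
```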