MysterHawk/kdg-dai6-reccomandation-system

Recommendation system for English books (Goodreads dataset)

Type used: Content-based RS

Team recommender systems 10

Antonio Gagliarducci and Lukas Nackmayr

Instructions

Use Python 3.12.

  1. First run data_preparation.ipynb to clean and process the dataset. Note that this takes around 20 minutes because the dataset is large.
  2. Then run recommender_system.ipynb. It contains the recommender system explained in class, plus some extra recommender systems with graphs.
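
As a rough illustration of what "content-based" means here, the sketch below ranks books by the similarity of their descriptions using TF-IDF weights and cosine similarity. It is not taken from recommender_system.ipynb: the feature choice, the function names (`tokenize`, `tfidf_vectors`, `cosine`, `recommend`) and the toy `BOOKS` data are all assumptions made for the example.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase and keep purely alphabetic tokens.
    return [w for w in text.lower().split() if w.isalpha()]


def tfidf_vectors(docs):
    # Build one sparse TF-IDF vector (dict of term -> weight) per document.
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed inverse document frequency.
        vectors.append(
            {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        )
    return vectors


def cosine(a, b):
    # Cosine similarity between two sparse vectors; 0.0 if either is empty.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(books, title, k=1):
    # Return the k titles whose descriptions are most similar to `title`.
    titles = list(books)
    vectors = dict(zip(titles, tfidf_vectors([books[t] for t in titles])))
    query = vectors[title]
    scored = [(cosine(query, vectors[t]), t) for t in titles if t != title]
    scored.sort(reverse=True)
    return [t for _, t in scored[:k]]


# Hypothetical toy data, not rows from the Goodreads dataset.
BOOKS = {
    "Space Odyssey": "astronauts travel through space toward a distant planet",
    "Mars Colony": "settlers build a colony on a distant planet in space",
    "Baking Basics": "simple bread and cake recipes for home bakers",
}

top_pick = recommend(BOOKS, "Space Odyssey")
print(top_pick)
```

On the real dataset a library vectorizer would likely be more practical than this hand-rolled version; the point is only that recommendations come from item content rather than from other users' ratings.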

Datasets:

  • GoodReads_100k_books.csv: the original dataset
  • goodreads_with_languages.csv: the original dataset plus the language detected for each book (processed by utils/lang_detect.py)
  • cleaned_data.csv: the dataset after being cleaned and processed by data_preparation.ipynb

Dataset source:

Kaggle.com

Utils folder

This folder contains helper scripts for preparing the dataset:

  • lang_detect.py looks at the title and description of every row to detect the language used, processes the original dataset, and saves the result in goodreads_with_languages.csv.
    • It also prints some statistics about the languages found (note: you don't have to run it separately; everything is called from the Jupyter notebook):
Total unique language categories: 37

Language Detection Breakdown:
Total books: 100000
Books with content: 99713
Missing content: 1
Too short content: 272
Language detection failures: 14
Unexpected errors: 0
  • nan.py prints why the lang_detect script produced certain results for some books.
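
The breakdown above suggests that each row ends up in one of a few buckets: missing content, content too short, a detection failure, or a detected language. A minimal sketch of that bucketing is shown below. It is not the code from utils/lang_detect.py: the `MIN_CHARS` threshold, the `classify_row` helper and the `toy_detect` stand-in detector are hypothetical, and a real language-detection library would replace `toy_detect`.

```python
from collections import Counter

MIN_CHARS = 20  # hypothetical threshold for "too short" content


def classify_row(title, description, detect):
    # Combine title and description, ignoring missing (non-string) values.
    parts = [p.strip() for p in (title, description) if isinstance(p, str)]
    text = " ".join(p for p in parts if p)
    if not text:
        return "missing"
    if len(text) < MIN_CHARS:
        return "too_short"
    try:
        return detect(text)
    except ValueError:
        # The detector could not decide on a language for this text.
        return "detection_failed"


def toy_detect(text):
    # Stand-in detector for illustration only; not a real language model.
    if not any(ch.isalpha() for ch in text):
        raise ValueError("no alphabetic characters")
    words = set(text.lower().split())
    return "en" if words & {"the", "a", "of", "and"} else "other"


# Hypothetical rows as (title, description) pairs.
ROWS = [
    ("The Hobbit", "A tale of a hobbit and a dragon"),
    (None, None),
    ("Ubik", ""),
    ("1984", "1234567890 1234567890 12345"),
]

breakdown = Counter(classify_row(t, d, toy_detect) for t, d in ROWS)
for label, count in sorted(breakdown.items()):
    print(f"{label}: {count}")
```

Passing the detector in as an argument keeps the bucketing logic testable without installing a detection library.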

License

MIT
