Add BPR benchmark results #944

Merged: 7 commits, Oct 1, 2019
1 change: 1 addition & 0 deletions README.md
@@ -75,6 +75,7 @@ We provide a [benchmark notebook](benchmarks/movielens.ipynb) to illustrate how
| Algo | MAP | nDCG@k | Precision@k | Recall@k | RMSE | MAE | R2 | Explained Variance |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| [SVD](notebooks/02_model/surprise_svd_deep_dive.ipynb) | 0.012873 | 0.095930 | 0.091198 | 0.032783 | 0.938681 | 0.742690 | 0.291967 | 0.291971 |
| [SAR](notebooks/00_quick_start/sar_movielens.ipynb) | 0.113028 | 0.388321 | 0.333828 | 0.183179 | N/A | N/A | N/A | N/A |
| [NCF](notebooks/02_model/ncf_deep_dive.ipynb) | 0.107720 | 0.396118 | 0.347296 | 0.180775 | N/A | N/A | N/A | N/A |
| [BPR](notebooks/02_model/cornac_bpr_deep_dive.ipynb) | 0.105365 | 0.389948 | 0.349841 | 0.181807 | N/A | N/A | N/A | N/A |
| [FastAI](notebooks/00_quick_start/fastai_movielens.ipynb) | 0.025503 | 0.147866 | 0.130329 | 0.053824 | 0.943084 | 0.744337 | 0.285308 | 0.287671 |
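
As a reference for the table above, here is a minimal sketch of how the four ranking metrics can be computed with reco_utils' Python evaluators. The `k=10` setting and the `userID`/`itemID`/`rating`/`prediction` column names are assumptions taken from the benchmark description; the actual notebook may differ in details.

```python
# Hedged sketch: compute the ranking metrics reported in the table, assuming
# reco_utils' default column names and k=10. `test` holds the held-out
# ratings, `top_k` the recommendations produced by a model.
from reco_utils.evaluation.python_evaluation import (
    map_at_k,
    ndcg_at_k,
    precision_at_k,
    recall_at_k,
)


def ranking_metrics(test, top_k, k=10):
    cols = dict(
        col_user="userID",
        col_item="itemID",
        col_rating="rating",
        col_prediction="prediction",
    )
    return {
        "MAP": map_at_k(test, top_k, k=k, **cols),
        "nDCG@k": ndcg_at_k(test, top_k, k=k, **cols),
        "Precision@k": precision_at_k(test, top_k, k=k, **cols),
        "Recall@k": recall_at_k(test, top_k, k=k, **cols),
    }
```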

## Contributing
2 changes: 1 addition & 1 deletion benchmarks/README.md
@@ -13,6 +13,6 @@ The machine we used to perform the benchmarks is a Standard NC6s_v2 [Azure DSVM]
* MovieLens 10M: 10 million ratings from 72000 users on 10000 movies.
* MovieLens 20M: 20 million ratings from 138000 users on 27000 movies.

The MovieLens benchmark can be seen at [movielens.ipynb](movielens.ipynb). In this notebook, the MovieLens dataset is split into training/test sets using a stratified splitting method that takes 75% of each user's ratings as training data and the remaining 25% as test data. For ranking metrics we use `k=10` (top 10 recommended items). The algorithms used in this benchmark are [ALS](../notebooks/00_quick_start/als_movielens.ipynb), [SVD](../notebooks/02_model/surprise_svd_deep_dive.ipynb), [SAR](../notebooks/00_quick_start/sar_movielens.ipynb), [NCF](../notebooks/00_quick_start/ncf_movielens.ipynb) and [FastAI](../notebooks/00_quick_start/fastai_movielens.ipynb).
The MovieLens benchmark can be seen at [movielens.ipynb](movielens.ipynb). In this notebook, the MovieLens dataset is split into training/test sets using a stratified splitting method that takes 75% of each user's ratings as training data and the remaining 25% as test data. For ranking metrics we use `k=10` (top 10 recommended items). The algorithms used in this benchmark are [ALS](../notebooks/00_quick_start/als_movielens.ipynb), [SVD](../notebooks/02_model/surprise_svd_deep_dive.ipynb), [SAR](../notebooks/00_quick_start/sar_movielens.ipynb), [NCF](../notebooks/00_quick_start/ncf_movielens.ipynb), [BPR](../notebooks/02_model/cornac_bpr_deep_dive.ipynb) and [FastAI](../notebooks/00_quick_start/fastai_movielens.ipynb).
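
A minimal sketch of the split setup described above, assuming reco_utils' MovieLens loader and `python_stratified_split`. The 100k dataset size and the seed are illustrative assumptions, not necessarily the notebook's exact configuration.

```python
# Hedged sketch of the stratified 75/25 split described above.
from reco_utils.dataset import movielens
from reco_utils.dataset.python_splitters import python_stratified_split

data = movielens.load_pandas_df(size="100k")  # userID, itemID, rating, timestamp
train, test = python_stratified_split(
    data,
    ratio=0.75,         # 75% of each user's ratings go to training
    col_user="userID",
    col_item="itemID",
    seed=42,            # illustrative seed, not the benchmark's setting
)
```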


32 changes: 31 additions & 1 deletion benchmarks/benchmark_utils.py
@@ -8,6 +8,7 @@
from pyspark.sql.types import StringType, FloatType, IntegerType, LongType
from fastai.collab import collab_learner, CollabDataBunch
import surprise
import cornac

from reco_utils.common.constants import (
    COL_DICT,
@@ -28,7 +29,9 @@
    compute_rating_predictions,
    compute_ranking_predictions,
)
from reco_utils.recommender.fastai.fastai_utils import cartesian_product, score
from reco_utils.recommender.fastai.fastai_utils import (cartesian_product, score,
                                                        hide_fastai_progress_bar)
from reco_utils.recommender.cornac.cornac_utils import predict_ranking
from reco_utils.evaluation.spark_evaluation import (
    SparkRatingEvaluation,
    SparkRankingEvaluation,
@@ -261,6 +264,33 @@ def recommend_k_ncf(model, test, train):
    return topk_scores, t


def prepare_training_bpr(train):
    # Convert the training dataframe to a cornac Dataset of (user, item, rating)
    # triples; the timestamp column is not needed for BPR.
    return cornac.data.Dataset.from_uir(
        train.drop(DEFAULT_TIMESTAMP_COL, axis=1).itertuples(index=False),
        seed=SEED,
    )


def train_bpr(params, data):
    # Fit a cornac BPR model and time the training step.
    model = cornac.models.BPR(**params)
    with Timer() as t:
        model.fit(data)
    return model, t


def recommend_k_bpr(model, test, train):
    # Score all user-item pairs not seen in training and time the step.
    # `test` is unused here but kept for a consistent interface with the
    # other recommend_k_* helpers.
    with Timer() as t:
        topk_scores = predict_ranking(
            model,
            train,
            usercol=DEFAULT_USER_COL,
            itemcol=DEFAULT_ITEM_COL,
            predcol=DEFAULT_PREDICTION_COL,
            remove_seen=True,
        )
    return topk_scores, t


def train_sar(params, data):
    model = SARSingleNode(**params)
    model.set_index(data)
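
Taken together, the new helpers can be wired up as follows. This is a minimal sketch assuming `train`/`test` dataframes from the stratified split shown earlier; the BPR hyperparameter values are illustrative assumptions, not the benchmark's tuned settings.

```python
# Hedged sketch: end-to-end use of the BPR helpers added in this PR.
bpr_params = {
    "k": 200,              # number of latent factors (illustrative)
    "max_iter": 100,       # training epochs (illustrative)
    "learning_rate": 0.01,
    "lambda_reg": 1e-3,
    "seed": SEED,
}

bpr_data = prepare_training_bpr(train)
model, train_time = train_bpr(bpr_params, bpr_data)
top_k, rec_time = recommend_k_bpr(model, test, train)

# reco_utils' Timer exposes the elapsed seconds via .interval
print("Trained in {:.1f}s, scored in {:.1f}s".format(train_time.interval, rec_time.interval))
```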