Description
Collaborative Filtering Recommender System for Restaurants
In this assignment, I develop a collaborative filtering (CF) recommender system for Yelp users. When users rate the restaurants they have visited in the Yelp app, those ratings are recorded in their accounts. The recommender system recommends new restaurants to a user by comparing this user’s similarity to other users and predicting the rating the user would give to a restaurant they have not yet rated.
The dataset comes from my CSE6242 final project; the original data were crawled from the Yelp API (https://www.yelp.com/developers/documentation/v3/get_started).
The dataset contains all of the user reviews and ratings for 97 restaurants in Atlanta. Here is an example from ‘user_reviews.csv’:
Figure 1. Screenshot of 5 example rows from the dataset.
To fit the collaborative filtering algorithms, I need only 3 columns from the dataset, ‘restaurant_name’, ‘user_id’ and ‘rating’:
Figure 2. Screenshot of 10 example rows of the dataset used for CF model fitting.
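A minimal sketch of this column selection, using only Python's standard library. The three column names come from the dataset described above; the extra ‘review_text’ column, the sample rows, and the helper name load_ratings are made up for illustration and are not taken from my notebook.

```python
import csv
import io

# Hypothetical sample standing in for 'user_reviews.csv'. Only the column
# names restaurant_name, user_id and rating come from the real dataset.
SAMPLE_CSV = """restaurant_name,user_id,rating,review_text
Cafe A,u1,5,Great brunch
Cafe A,u2,3,Okay
Diner B,u1,4,Solid burgers
"""


def load_ratings(handle):
    """Return a list of (user_id, restaurant_name, rating) triples."""
    rows = []
    for record in csv.DictReader(handle):
        # Keep the three columns a CF model needs; drop everything else.
        rows.append(
            (record["user_id"], record["restaurant_name"], float(record["rating"]))
        )
    return rows


ratings = load_ratings(io.StringIO(SAMPLE_CSV))
```

With a real file, the same function could be called as load_ratings(open('user_reviews.csv', newline='')), assuming the file has a header row with these column names.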
As the assignment requires, I generated 3 subsets of different sizes from this dataset, named ‘data1’, ‘data2’ and ‘data3’ respectively in the titles of my scripts:
Figure 3. Sizes of the 3 sub-datasets (data1, data2, data3).
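One way such subsets could be drawn is by random sampling without replacement. This is only a sketch under assumptions: the sampling method, the fixed seed, the helper name make_subset, and the toy ratings list are illustrative, and the sizes are the approximate ones (about 1000, 10000 and 30000 samples), not the exact counts in Figure 3.

```python
import random


def make_subset(ratings, size, seed=0):
    """Draw `size` ratings without replacement (all of them if fewer exist)."""
    if size >= len(ratings):
        return list(ratings)
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    return rng.sample(ratings, size)


# Toy stand-in for the full list of (user_id, restaurant_name, rating) triples.
all_ratings = [(f"u{i % 500}", f"r{i % 97}", float(1 + i % 5)) for i in range(40000)]

# Approximate sizes only; these are assumptions for illustration.
data1 = make_subset(all_ratings, 1000)
data2 = make_subset(all_ratings, 10000)
data3 = make_subset(all_ratings, 30000)
```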
My scripts run in Python in a Jupyter notebook.
I used two popular CF algorithms to build the models. One is the SVD algorithm in Surprise
(https://surprise.readthedocs.io/en/stable/matrix_factorization.html#surprise.prediction_algorithms.matrix_factorization.SVD);
the other is the LightFM ‘Learning to Rank – WARP’ model (https://making.lyst.com/lightfm/docs/lightfm.html).
I installed the surprise and lightfm libraries using “pip install” and imported their modules in the Jupyter notebook.
My code references are:
https://surprise.readthedocs.io/en/stable/FAQ.html
https://www.kaggle.com/malikasif123/amazon-reviews-recommendation-system
https://www.kaggle.com/podsyp/anime-recommendations-with-surprise
https://making.lyst.com/lightfm/docs/examples/dataset.html
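To illustrate the idea behind the SVD model, here is a minimal plain-Python sketch of biased matrix factorization trained by stochastic gradient descent, which is the kind of model the Surprise SVD algorithm is based on. It is not the library's code and not my notebook's code: the function name fit_biased_mf, the hyperparameter values, and the toy ratings are illustrative assumptions, and clipping predictions to 1-5 assumes Yelp's star scale.

```python
import random


def fit_biased_mf(ratings, n_factors=5, n_epochs=30, lr=0.01, reg=0.02, seed=0):
    """Fit r_hat = mu + b_u + b_i + q_i . p_u on (user, item, rating) triples.

    Returns a predict(user, item) function. All defaults are illustrative.
    """
    rng = random.Random(seed)
    mu = sum(r for _, _, r in ratings) / len(ratings)  # global mean rating
    bu, bi, p, q = {}, {}, {}, {}
    for u, i, _ in ratings:
        bu.setdefault(u, 0.0)
        bi.setdefault(i, 0.0)
        p.setdefault(u, [rng.gauss(0, 0.1) for _ in range(n_factors)])
        q.setdefault(i, [rng.gauss(0, 0.1) for _ in range(n_factors)])

    for _ in range(n_epochs):
        for u, i, r in ratings:
            pred = mu + bu[u] + bi[i] + sum(a * b for a, b in zip(p[u], q[i]))
            err = r - pred
            # Gradient steps with L2 regularization on biases and factors.
            bu[u] += lr * (err - reg * bu[u])
            bi[i] += lr * (err - reg * bi[i])
            for f in range(n_factors):
                puf, qif = p[u][f], q[i][f]
                p[u][f] += lr * (err * qif - reg * puf)
                q[i][f] += lr * (err * puf - reg * qif)

    def predict(u, i):
        # Unknown users or items fall back to the mean plus any known bias.
        est = mu + bu.get(u, 0.0) + bi.get(i, 0.0)
        if u in p and i in q:
            est += sum(a * b for a, b in zip(p[u], q[i]))
        return min(5.0, max(1.0, est))  # clip to the assumed 1-5 scale

    return predict


# Toy ratings (made up) in (user_id, restaurant_name, rating) form.
toy = [("u1", "A", 5.0), ("u1", "B", 4.0), ("u2", "A", 2.0),
       ("u2", "C", 1.0), ("u3", "B", 3.0)]
predict = fit_biased_mf(toy)
estimate = predict("u3", "A")  # rating estimate for a pair not in the data
```

The LightFM WARP model differs in that it optimizes the ranking of items rather than the rating error, so this sketch only covers the SVD side.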
The running time and accuracy of the two models on the different dataset sizes are shown below:
Figure 4(a-1). Running time and accuracy (RMSE) values after 10 queries of data1, using surprise SVD.
Figure 4(a-2). Running time and accuracy (AUC scores) after 10 queries of data1, using LightFM model.
Figure 4(b-1). Running time and accuracy (RMSE) values after 10 queries of data2, using surprise SVD.
Figure 4(b-2). Running time and accuracy (AUC scores) after 10 queries of data2, using LightFM model.
Figure 4(c-1). Running time and accuracy (RMSE) values after 10 queries of data3, using surprise SVD.
Figure 4(c-2). Running time and accuracy (AUC scores) after 10 queries of data3, using LightFM model.
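Running times like those in Figure 4 can be collected with a small timing loop. The sketch below assumes that each "query" is one complete run repeated 10 times; the helper name time_runs and the placeholder workload are hypothetical and stand in for a real fit-and-evaluate pass.

```python
import time


def time_runs(fn, n_runs=10):
    """Call fn() n_runs times and return a list of (seconds, result) pairs."""
    out = []
    for _ in range(n_runs):
        start = time.perf_counter()
        result = fn()
        out.append((time.perf_counter() - start, result))
    return out


# Placeholder workload; in practice this would fit a model and return its
# accuracy score (RMSE or AUC) for one run.
runs = time_runs(lambda: sum(x * x for x in range(10000)))
mean_seconds = sum(seconds for seconds, _ in runs) / len(runs)
```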
Testing accuracy analysis of models after fitting:
For the Surprise SVD algorithm, accuracy is measured by the RMSE value (lower is better), while for the LightFM model it is measured by the AUC score (higher is better). From the accuracy results of 10 queries for each dataset, I conclude that testing accuracy increases as the size of the data increases: Accuracy(data1, about 1000 samples) < Accuracy(data2, about 10000 samples) < Accuracy(data3, about 30000 samples).
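Because the two metrics point in opposite directions, it helps to see how each is computed. This is a plain-Python sketch of the standard definitions, not the libraries' implementations: the function names rmse and auc and the sample numbers are illustrative. AUC is written here as the probability that a positive item is scored above a negative one, with ties counted as half.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between true and predicted ratings."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def auc(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    wins = 0.0
    for ps in pos_scores:
        for ns in neg_scores:
            if ps > ns:
                wins += 1.0
            elif ps == ns:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Made-up numbers: every prediction is off by exactly one star.
example_rmse = rmse([1.0, 2.0], [2.0, 3.0])
# Made-up scores: one positive ranks above the negative, one below.
example_auc = auc([0.9, 0.3], [0.5])
```

An RMSE of 0 and an AUC of 1 are the best possible values, so "accuracy increases" means RMSE falling for Surprise and AUC rising for LightFM.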
Finally, after model fitting and testing, the recommender system recommends restaurants to users by predicting the ratings for all (user, item) pairs. The prediction results for the 3 datasets by the Surprise algorithm were exported to ‘rate_df_1’, ‘rate_df_2’ and ‘rate_df_3’, respectively. Prediction examples from the Surprise and LightFM models are shown below:
Figure 5. Screenshots of prediction examples provided by restaurant recommendation systems.
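Turning predicted ratings into recommendations amounts to ranking the restaurants a user has not rated yet. A minimal sketch, assuming a predict(user, item) function is available from a fitted model; the helper name recommend, the score table, and the restaurant names are hypothetical.

```python
def recommend(predict, user, all_items, seen_items, n=3):
    """Return the n unseen items with the highest predicted rating."""
    candidates = [item for item in all_items if item not in seen_items]
    ranked = sorted(candidates, key=lambda item: predict(user, item), reverse=True)
    return ranked[:n]


# Made-up predicted ratings standing in for a fitted model's output.
toy_scores = {("u1", "A"): 4.5, ("u1", "B"): 3.0, ("u1", "C"): 4.8, ("u1", "D"): 2.0}


def toy_predict(user, item):
    return toy_scores[(user, item)]


# u1 has already rated restaurant A, so it is excluded from the ranking.
top2 = recommend(toy_predict, "u1", ["A", "B", "C", "D"], seen_items={"A"}, n=2)
```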