CS6220 – Solved

Collaborative Filtering Recommender System for Restaurants

In this assignment, I develop a collaborative filtering-based recommender system for Yelp users. When users rate restaurants they have visited in the Yelp app, their accounts keep a record of those ratings. The recommender system can recommend new restaurants to a user by measuring that user’s similarity to other users and predicting the rating the user would give to a restaurant not yet visited.
The dataset comes from my final project for CSE6242; the original data were crawled from the Yelp API (https://www.yelp.com/developers/documentation/v3/get_started).
The dataset contains all of the user reviews/ratings for 97 restaurants in Atlanta. Here is an example from ‘user_reviews.csv’:

Figure 1. Screenshot of 5 examples of database.

To fit the collaborative filtering algorithms, I only need three columns of the dataset: ‘restaurant_name’, ‘user_id’ and ‘rating’:

Figure 2. Screenshot of 10 examples of dataset for CF model fitting.
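
The column selection described above can be sketched with Python's standard csv module. This is only an illustration, not the code from the notebook: the sample rows and the extra columns (‘review_date’, ‘text’) are invented placeholders, and the helper name load_ratings is hypothetical.

```python
import csv
import io

# Hypothetical stand-in for 'user_reviews.csv'. Only the three column names
# restaurant_name, user_id and rating come from the write-up; the other
# columns and all row values are invented for illustration.
SAMPLE_CSV = """restaurant_name,user_id,rating,review_date,text
Cafe A,u1,5,2019-01-02,Great brunch
Cafe A,u2,3,2019-02-10,Okay
Diner B,u1,4,2019-03-05,Solid burgers
"""


def load_ratings(handle):
    """Return (user_id, restaurant_name, rating) triples from an open CSV file."""
    reader = csv.DictReader(handle)
    return [
        (row["user_id"], row["restaurant_name"], float(row["rating"]))
        for row in reader
    ]


ratings = load_ratings(io.StringIO(SAMPLE_CSV))
print(ratings)
```

With a real file, the same function could be called on open('user_reviews.csv', newline='') instead of the in-memory sample.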

As the assignment requires, I generated three subsets of different sizes from this dataset, named ‘data1’, ‘data2’ and ‘data3’ in the titles of my scripts:

Figure 3. Sizes of 3 sub datasets (data1, data2, data3).
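
How the three subsets were drawn is not stated here, so the following is one possible approach rather than the method actually used: random sampling without replacement with a fixed seed. The sizes 1000, 10000 and 30000 are the approximate figures quoted in the accuracy analysis; the synthetic ratings table and the helper name make_subsets are assumptions.

```python
import random


def make_subsets(ratings, sizes, seed=0):
    """Draw one random subset per requested size, without replacement."""
    rng = random.Random(seed)  # fixed seed so the subsets are reproducible
    return [rng.sample(ratings, min(size, len(ratings))) for size in sizes]


# Synthetic stand-in for the full (user_id, restaurant_name, rating) table.
full = [(f"u{i % 500}", f"r{i % 97}", float(1 + i % 5)) for i in range(40000)]

data1, data2, data3 = make_subsets(full, sizes=(1000, 10000, 30000))
print(len(data1), len(data2), len(data3))
```

Each subset is drawn independently from the full table, so the smaller subsets are not necessarily contained in the larger ones.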

My scripts run in Python in a Jupyter notebook.
I used two popular CF algorithms to build the models. One is the SVD algorithm in Surprise
(https://surprise.readthedocs.io/en/stable/matrix_factorization.html#surprise.prediction_algorithms.matrix_factorization.SVD),
the other is the LightFM ‘Learning to Rank – WARP’ model (https://making.lyst.com/lightfm/docs/lightfm.html).
I installed the surprise and lightfm libraries with “pip install” and imported the modules in the Jupyter notebook.
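
To give a feel for what an SVD-style model does, here is a minimal pure-Python sketch of biased matrix factorization trained by stochastic gradient descent, where a rating is estimated as the global mean plus a user bias, an item bias and the dot product of user and item factor vectors. It is a simplified stand-in, not the Surprise implementation and not the notebook code; the function names, the hyperparameter values and the toy ratings are all assumptions chosen for illustration.

```python
import math
import random


def train_mf(ratings, n_factors=5, n_epochs=30, lr=0.01, reg=0.02, seed=0):
    """Fit r_hat = mu + b_u + b_i + q_i . p_u by SGD on (user, item, rating) triples."""
    rng = random.Random(seed)
    mu = sum(r for _, _, r in ratings) / len(ratings)  # global mean rating
    bu, bi, p, q = {}, {}, {}, {}
    for u, i, _ in ratings:
        if u not in p:
            bu[u] = 0.0
            p[u] = [rng.gauss(0, 0.1) for _ in range(n_factors)]
        if i not in q:
            bi[i] = 0.0
            q[i] = [rng.gauss(0, 0.1) for _ in range(n_factors)]
    for _ in range(n_epochs):
        for u, i, r in ratings:
            pred = mu + bu[u] + bi[i] + sum(a * b for a, b in zip(p[u], q[i]))
            err = r - pred
            # Gradient steps with L2 regularization on biases and factors.
            bu[u] += lr * (err - reg * bu[u])
            bi[i] += lr * (err - reg * bi[i])
            for f in range(n_factors):
                puf, qif = p[u][f], q[i][f]
                p[u][f] += lr * (err * qif - reg * puf)
                q[i][f] += lr * (err * puf - reg * qif)
    return mu, bu, bi, p, q


def predict(model, u, i):
    """Estimate a rating; unknown users or items fall back to the known terms."""
    mu, bu, bi, p, q = model
    est = mu + bu.get(u, 0.0) + bi.get(i, 0.0)
    if u in p and i in q:
        est += sum(a * b for a, b in zip(p[u], q[i]))
    return min(5.0, max(1.0, est))  # clip to the 1-5 star scale


# Invented toy ratings: everyone prefers "good" over "bad".
toy = [("u1", "good", 5.0), ("u1", "bad", 1.0), ("u2", "good", 5.0),
       ("u2", "bad", 2.0), ("u3", "good", 4.0), ("u3", "bad", 1.0)]
model = train_mf(toy)
train_rmse = math.sqrt(sum((r - predict(model, u, i)) ** 2 for u, i, r in toy) / len(toy))
print(round(train_rmse, 3))
```

The training error printed at the end is measured on the same ratings the model was fitted to, so it says nothing about generalization; a held-out test split is needed for that.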

My code references are:
https://surprise.readthedocs.io/en/stable/FAQ.html
https://www.kaggle.com/malikasif123/amazon-reviews-recommendation-system
https://www.kaggle.com/podsyp/anime-recommendations-with-surprise
https://making.lyst.com/lightfm/docs/examples/dataset.html

The running time and accuracy of the two models on the different dataset sizes are shown below:

Figure 4(a-1). Running time and accuracy (RMSE) values after 10 queries of data1, using surprise SVD.

Figure 4(a-2). Running time and accuracy (AUC scores) after 10 queries of data1, using LightFM model.

Figure 4(b-1). Running time and accuracy (RMSE) values after 10 queries of data2, using surprise SVD.

Figure 4(b-2). Running time and accuracy (AUC scores) after 10 queries of data2, using LightFM model.

Figure 4(c-1). Running time and accuracy (RMSE) values after 10 queries of data3, using surprise SVD.

Figure 4(c-2). Running time and accuracy (AUC scores) after 10 queries of data3, using LightFM model.

Testing accuracy analysis of the models after fitting:
The Surprise SVD algorithm reports accuracy as an RMSE value (lower is better), while the LightFM model uses an AUC score (higher is better). From the results of 10 queries on each dataset, I conclude that testing accuracy increases as the size of the data increases (Accuracy(data1, about 1000 samples) < Accuracy(data2, about 10000 samples) < Accuracy(data3, about 30000 samples)).
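
Because the two metrics measure different things, it may help to see how each is computed. The sketch below uses made-up numbers and hypothetical helper names: RMSE compares predicted and actual ratings, while AUC is the probability that a randomly chosen positive item is scored above a randomly chosen negative one. The two values are not on the same scale and cannot be compared directly.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between actual and predicted ratings."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def auc(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos_scores
        for n in neg_scores
    )
    return wins / (len(pos_scores) * len(neg_scores))


# Invented example values.
print(rmse([5, 3, 4], [4, 3, 2]))        # errors of 1, 0 and 2 stars
print(auc([0.9, 0.4], [0.5, 0.1]))       # 3 of 4 pairs ranked correctly
```

An RMSE that falls and an AUC that rises both indicate improvement, which is the sense in which accuracy grows with dataset size in the analysis above.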

Finally, after model fitting and testing, the recommender system recommends restaurants to users by predicting the ratings for all (user, item) pairs. The prediction results for the 3 datasets from the Surprise algorithm were exported to ‘rate_df_1’, ‘rate_df_2’ and ‘rate_df_3’, respectively. Prediction examples from the Surprise and LightFM models are shown below:

Figure 5. Screenshots of prediction examples provided by restaurant recommendation systems.
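
Turning predicted ratings into recommendations can be sketched as follows: score every restaurant the user has not rated yet and keep the highest-scoring ones. The score table, restaurant names and function names here are hypothetical; any fitted model's prediction function could take the place of lookup.

```python
def recommend(predict_fn, user, all_items, rated_items, n=3):
    """Return the n unrated items with the highest predicted rating for a user."""
    candidates = [item for item in all_items if item not in rated_items]
    ranked = sorted(candidates, key=lambda item: predict_fn(user, item), reverse=True)
    return ranked[:n]


# Invented predicted ratings standing in for a fitted model.
scores = {
    ("u1", "Cafe A"): 4.6,
    ("u1", "Diner B"): 3.1,
    ("u1", "Grill C"): 4.9,
    ("u1", "Bar D"): 2.2,
}


def lookup(user, item):
    """Hypothetical prediction function backed by the table above."""
    return scores[(user, item)]


top = recommend(
    lookup,
    "u1",
    all_items=["Cafe A", "Diner B", "Grill C", "Bar D"],
    rated_items={"Grill C"},  # already visited, so excluded from recommendations
    n=2,
)
print(top)
```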
