Description
CENG 352
Database Management Systems
Project 1
1 Introduction
In this project you are supposed to write many SQL queries on a relational database which will be created by using a dataset. The dataset is taken from Yelp Open Dataset which is available for access to users for personal and non-commercial use. You can reach it from here. Also, you can find the documentation on the same link if you are interested in the dataset. Yelp Dataset contains too much information for this task, so we have done some cleaning to reduce the size of the dataset and converted json files to csv files. You can reach the data which will be used on this project from here. This project has three parts:
• Create the database using the given ’csv’ files.
• Write proper SQL queries for certain problems.
• Create triggers and views.
Note that all tasks should be completed using PostgreSQL.
2 Database Schema
Business(business id, business name, address, state, is open, stars)
Users(user id, user name, review count, yelping since, useful, funny, cool, fans, average stars)
Friend(user id1, user id2)
Yelp is a system in which you can find any kind of service such as restaurants, dentists or vet. You can see businesses in your area, their stars (measure for appreciation given by other users) and also you can vote on such businesses that you’ve visited. The main purpose of Yelp is create an environment for users to share information among themselves.
2.1 Foreign Key Constraints
Friend’s user id1 and user id2 references Users’ user id.
Review’s user id references Users’ user id.
Review’s business id references Business’s business id.
Tip’s business id references Business’s business id. Tip’s user id references Users’ user id.
2.2 Explanation of Attributes
Business • business id: Primary key of the business table that stores id’s of each business.
• business name: Name of that business.
• address: Address of the business.
• state: State code that the business is located at.
• is open: Boolean value, that indicates whether the business still working or closed completely.
• stars: Float value, average stars given to that business by users.
Users • user id: Primary key of the Users table that stores id’s of each user.
• user name: Name of the user.
• review count: Indicates how many reviews have been made by that user.
• useful: Count of this useful votes that sent by this user.
• funny: Count of this funny votes that sent by this user.
• cool: Count of this cool votes that sent by this user.
• fans: Number of fans of that user.
• average stars: Float value, average stars given by that user.
Friend • user id1: First user’s id.
• user id2: Friend of the first user.
Review • review id: Primary key of the Review table that stores id’s of each review.
• user id: User that gave the review.
• business id: Business that is reviewed by the user.
• stars: Given stars to the business by the user.
• useful: Count of useful tags given by other users to this review.
• funny: Count of funny tags given by other users to this review.
• cool: Count of cool tags given by other users to this review.
Tip • tip id: The id of the tip. It is NOT provided in the .csv file, it must be an auto incremented value.
• business id: Business that is given a tip.
• user id: User that gave the tip about the business.
• compliment count: Compliments that the tip is received.
• tip text: String, text of the tip.
2.3 General Information About Relations
• Reviews are supposed to be detailed reports created by a user for a business. However, since Review texts are too large in terms of bytes we’ve ignored them.
• Tips are shorter reviews like ’this restaurant’s pizza is really good.’. They are similar to the reviews but less detailed.
3 Tasks
3.1 Task 1 – Creating the Database – 15 pts
Using the given ’csv’ files and considering the database schema above create a database using PostgreSQL. For this task you should create a file named ’task1.sql’ that contains SQL statements that you’ve used to create the database.
3.2 Task 2 – Advanced SQL Queries – 70 pts
For this task you should be able to write SQL queries for given problems. Please order your queries (from query1 to query14) and create a file named ’task2.sql’. Each question is 5 pts.
1. Find the users whose review count is higher than its fans and reviewed at least a business which has more than 3.5 stars. List user id, user name, difference between review count and fans (Ordered by difference between review count and fans, user id DESC) (1504630 rows)
3. Find the top 20 users by their friend count (more friends are better ). List user names and count of friends (Ordered by friend count, user name DESC) (20 rows)
4. Find the users who have given lower stars to a business than its current stars. List distinct users’ user name (not distinc user name distinct user), average stars and yelping since (Ordered by average stars, yelping since DESC) (1139971 rows)
6. Find the users who have lower average stars than all of his/her friends’ average stars(Consider the average stars of Users table.). List user name, yelping since and average stars of such users.(Ordered by average stars, yelping since DESC) (105952 rows)
7. The average stars of businesses in a state gives us Average of the State. Find top 10 state by highest average stars. List state code and average stars (Ordered by average stars DESC) (10 rows).
9. List the names of the user’s who only reviewed businesses who have stars more than 3.5. (Ordered by user name ASC) (832155 rows)
10. Popular businesses are those with more than 1000 reviews. For each such business find the business name and average stars for each year. List only those with average stars greater than 3 (in ascending order of years, business name) (5601 rows).
11. Find the users who have got more useful votes than cool votes (Note that Users table contains votes sent by the users, now we need to find votes s/he got by reviews). List user name, useful, cool and difference of useful and cool. (Order by difference, user name DESC) (1003635 rows)
12. List pairs of friends and business names if both friends reviewed the same business with the same stars. List the business id, friends as user id 1 user id 2 and stars (Multiple pair reviews should also returned). You need to return all businesses for each (u1,u2) pair (There might be more than one review that two friends gave to a business. The stars were same before and even both changed their stars they meet in the same point again). However, if (u1,u2) is listed (u2,u1) should not be listed for the same business (Ordered by business id, stars DESC). (77157/79393 or 126629 rows) Example to Clarification:
user id1user id2business idstars user1user2business15 user1user2business14
This is a valid return.
13. Cross tabulations Write a single SQL query that computes the statistics present in a cross tabulation over stars and state on Business table. The aggregate reported in the crosstab should simply count businesses, but restrict your attention to the open businesses. You only need to list stars, states and count. (147 rows)
14. Window functions List the top 3 users (user id, review count and fans) and their rank (i.e. 1, 2, or 3) when ranked by review count for each grouping formed by number of fans. Restrict your attention to those with a number of fans between 50 and 60, inclusive. (33 rows)
3.2.1 Task 2 Specifications
Your submission file format (for task2) should be as follows:
/* Question 1 */ SELECT… /* Question 2 */ SELECT…
…
/* Question 14 */ SELECT…
You don’t have to write your queries on single lines. You are allowed (and encouraged) to write them in multiple lines for better readability. You should not write anything for the unsolved questions. You should have lines for only the solved questions.
3.3 Task 3 – Triggers and Views – 15 pts
Triggers. It is often useful to have the DBMS perform some actions automatically in response to operations on the database. Write necessary triggers to achieve the following functionalities:
2. Our system does not want Users who write reviews with 0 stars. If there is a review with 0 stars that review should not be inserted into the Review table. Further, delete all reviews and tips of that user.
Views. It is inefficient to calculate the number of reviews (review count) of a business for each related query. Create a view called ’BusinessCount’ that contains the columns business id, business name, and review count. The review count should be the total number of reviews made for that business.
4 Submission
Reviews
There are no reviews yet.