Description
Nested relational and semi-structured databases
(Draft)
Dirk Van Gucht
For this assignment, you will need the material covered in the lectures
• Lecture 13: Object-relational databases and queries
• Lecture 14: Nested relational and semi-structured databases
1 Formulating Query in Object-Relational SQL
For the problems in the section, you will need to use the polymorphically defined functions and predicates that are defined in the document
SetOperationsAndPredicates.sql Functions
set union(A,B) A ∪ B
set intersection(A,B) A ∩ B
set difference(A,B) A − B
add element(x,A) {x} ∪ A
remove element(x,A) A − {x}
make singleton(x) {x}
choose element(A) choose some element from A
bag union(A,B) the bag union of A and B
bag to set(A)
Predicates coerce the bag A to the corresponding set
is in(x,A) x ∈ A
is not in(x,A) x 6∈ A
is empty(A) A = ∅
is not emptyset(A) A 6= ∅
subset(A,B) A ⊆ B
superset(A,B) A ⊇ B
equal(A,B) A = B
overlap(A,B) A ∩ B 6= ∅
disjoint(A,B) A ∩ B = ∅
We now turn to the problems in this section. You will need use the data provided for the Person, Company, companyLocation, worksFor, jobSkill, personSkill, and Knows relations. But before turning to the problems, we will introduce various object-relational views defined over these relations:
• The view companyHasEmployees(cname,employees) which associates with each company, identified by a cname, the set of pids of persons who work for that company.
create or replace view companyHasEmployees as
select cname, array(select pid from worksfor w
where w.cname = c.cname order by 1) as employees
from company c order by 1;
• The view cityHasCompanies(city,companies) which associates with each city the set of cnames of companies that are located in that city.
create or replace view cityHasCompanies as
select city, array_agg(cname order by 1) as companies from companyLocation group by city order by 1;
• The view companyHasLocations(cname,locations) which associates with each company, identified by a cname, the set of cities in which that company is located.
create or replace view companyHasLocations as select cname, array(select city from companyLocation cc
where c.cname = cc.cname order by 1) as locations from company c order by 1;
• The view knowsPersons(pid,persons) which associates with each person, identified by a pid, the set of pids of persons he or she knows.
create or replace view knowsPersons as select p.pid, array(select k.pid2 from knows k where k.pid1 = p.pid order by pid2) as persons
from person p order by 1;
create or replace view isKnownByPersons as
select distinct p.pid, array(select k.pid1
from knows k where k.pid2 = p.pid) as persons
from person p order by 1;
• The view personHasSkills(pid,skills) which associates with each person, identified by a pid, his or her set of job skills.
create or replace view personHasSkills as
select distinct p.pid, array(select s.skill from personSkill s where s.pid = p.pid order by 1) as skills
from person p order by 1;
• The view skillOfPersons(skills,persons) which associates with each job skill the set of pids of persons who have that job skill.
create or replace view skillOfPersons as select js.skill, array(select ps.pid from personSkill ps where ps.skill = js.skill order by pid) as persons
from jobSkill js order by skill;
In the problems in this section, you are asked to formulate queries in objectrelational SQL. You should use the set operations and set predicates defined in the document SetOperationsAndPredicates.sql, the relations
Person
Company Skill worksFor
and the views
companyHasEmployees cityHasCompanies companyHasLocations knowsPersons isKnownByPersons personHasSkills skillOfPersons
However, you are not permitted to use the Knows, companyLocation, and personSkill relations in the object-relation SQL formulation of the queries. Observe that you actually don’t need these relations since they are encapsulated in these views.
Before listing the queries that you are asked to formulate, we present some examples of queries that are formulated in object-relational SQL using the assumptions stated in the previous paragraph. Your solutions need to be in the style of these examples. The goals is to maximize the utilization of the functions and predicates defined in document SetOperationsAndPredicates.sql.
Example 1 Consider the query “Find the pid of each person who knows a person who has a salary greater than 55000.”
select distinct pk.pid from knowsPersons pk, worksfor w where is in(w.pid, pk.persons) and w.salary > 55000 order by 1;
Note that the following formulation for this query is not allowed since it uses the relation Knows which is not permitted.
select distinct k.pid1 from knows k, worksfor w where k.pid2 = w.pid and w.salary > 55000;
Example 2 Consider the query “Find the pid and name of each person p who (1) has both the AI and Programming and (2) knows at least 5 persons, and report the number of persons who know p.”
select p.pid, p.pname, (select cardinality(kp.persons)
from isKnownByPersons kp where kp.pid = p.pid) as ct_knownByPersons
from person p where p.pid in (select ps.pid from personHasSkills ps
where subset(’”AI”, “Programming”’, ps.skills)) and
cardinality((select kp.persons from knowsPersons kp where kp.pid = p.pid)) >= 5;
Example 3 Consider the query “Find the pid and name of each person along with the set of his of her skills that are not among the skills of persons who work for ‘Netflix’”.
select p.pid, p.pname, set difference((select ps.skills from personHasSkills ps where ps.pid = p.pid), array(select unnest(ps.skills) from personHasSkills ps where is in(ps.pid, (select employees from companyHasEmployees where cname = ’Netflix’))))
from person p;
1. Formulate the following queries in object-relational SQL.
(a) Find the cname and headquarter of each company that employs at least two persons who each have both the AI and the Programming job skills.
(b) Find each skill that is not a job skill of any person who works for Yahoo or for Netflix.
(c) Find the set of companies that employ at least 3 persons who each know at least five persons. (So this query returns only one object, i.e., the set of companies specified in the query.)
(d) Find the pid and name of each person p along with the set of pids of persons who (1) know p and (2) who have the AI skill but not the Programming skill.
(e) Find each pair (s1,s2) of different skills s1 and s2 such that the number of employees who have skill s1 and who make strictly more than 55000 is strictly less than the number of employees who have skill s2 and who make at most 55000.
(f) (Practice Problem: not-graded).
Find each (c,p) pair where c is the cname of a company and p is the pid of a person who works for that company and who is known by all other persons who work for that company.
(g) (Practice Problem: not-graded).
Find the pid and name of each person who has all the skills of the combined set of job skills of the highest paid persons who work for Yahoo.
2. Find the following set of sets
{S | S ⊆ Skill ∧ |S| ≤ 3}.
I.e., this is the set consisting of each set of job skills whose size (cardinality) is at most 3.
3. (Practice Problem: not-graded).
Reconsider Problem 2. Let
S = {S | S ⊆ Skill ∧ |S| ≤ 3}.
Find the following set of sets
{X | X ⊆ S ∧ |X| ≤ 2}.
4. Let t be a number called a threshold. We say that a (unordered) pair of different person pids {p1,p2} co-occur with frequency at least t if there are at least t skills that are skills of both the person with pid p1 and the person with pid p2.
Write a function coOccur(t integer) that returns the (unordered) pairs {p1,p2} of person pid that co-occur with frequency at least t.
Test your coOccur function for t in the range [0,3].
5. Let A and B be sets such that A ∪ B 6= ∅. The Jaccard index J(A,B) is defined as the quantity
.
The Jaccard index is a frequently used measure to determine the similarity between two sets. Note that if A∩B = ∅ then J(A,B) = 0, and if A = B then J(A,B) = 1.
Let t be a number called a threshold. We assume that t is a float in the range [0,1].
Write a function JaccardSimilar(t float) that returns the set of unordered pairs {s1,s2} of different skills such that the set of persons who have skill s1 and the set of persons who have skill s2 have a Jaccard index of at least t.
Test your function JaccardSimilar for the following values for t: 0, 0.25, 0.5, 0.75, and 1.
2 Nested Relations and Semi-structured databases
Consider the lecture on Nested relational and semi-structured databases. In that lecture we considered the studentGrades nested relation and the jstudentGrades semi-structured database and we constructed these using a PostgreSQL query starting from the Enroll relation.
6. Write a PostgreSQL view courseGrades that creates the nested relation of type
(cno,gradeInfo{(grade,students{(sid)})})
This view should compute for each course, the grade information of the students enrolled in this course. In particular, for each course and for each grade, this relation stores in a set the sids students who obtained that grade in that course.
Test your view.
7. Starting from the courseGrades view in Problem 6 solve the following queries:
(a) Find each pair (c,S) where c is the cno of a course and S is the set of sids of students who received an ‘A’ or a ‘B’ in course c. The type of your answer relation should be (cno : text,Students : {(sid : text)}).
(b) Find each (s,C) pairs where s is the sid of a students and C is the set of cnos of courses in which the student received an ‘A’. The type of your answer relation should be (sid : text,Courses : {(cno : text)}).
(c) (Practice Problem: not-graded).
Find each cno c where c is a course in which all students received the same grade.
8. Write a PostgreSQL view jcourseGrades that creates a semi-structured database which stores jsonb objects whose structure conforms with the structure of tuples as described for the courseGrades in Problem 6.
Test your view.
9. Starting from the jcourseGrades view in Problem 8 solve the following queries. Note that the output of each of these queries is a nested relation.
(a) Find each pair (c,s) where c is the cno of a course and s is the sid of a student who did not received an ‘A’ in course c. The type of your answer relation should be (cno:text, sid:text).
(b) Find each pair ({c1,c2},S) where c1 and c2 are the course numbers of two different courses and S is the set of sids of students who received a ’B’ in both courses c1 and c2. The type of your answer relation should be (coursePair : {(cno : text)},Students : {(sid : text))}.
Reviews
There are no reviews yet.