CS 300 (Solution)

$ 29.99
Category:

Description

DATA STRUCTURES
Homework 1
PLEASE NOTE:
• SOLUTIONS HAVE TO BE YOUR OWN.
• NO COLLABORATION OR COOPERATION AMONG STUDENTS IS PERMITTED.
• 10% PENALTY WILL BE INCURRED FOR EACH DAY OF OVERTIME. SUBMISSIONS THAT ARE LATE MORE THAN 3 DAYS WILL NOT GET ANY CREDITS.
• SUBMISSIONS WILL BE MADE TO THE SUCOURSE SERVER. NO OTHER METHOD OF SUBMISSION WILL BE ACCEPTED.
In this homework, we are going to learn about how search engines such as Google do their searches really fast. These search engines search hundreds of millions web pages to see if they have the words that you have typed, and they can do this for thousands of users at a given time. In order to do these searches really fast, search engines such as Google do a lot of what we call preprocessing of the pages they search; that is, they transform the contents of a web page (which for the purposes of this homework, we will assume to consist of only strings) into a structure that can be searched very fast. Here is the basic idea:
• Let us assume we have millions of documents, D1,D2,…D19234678,…, each consisting of hundreds or thousands of words. We identify each document by its number, e.g., 1, 2, …19234678, ….
• Since almost all the time these documents contain material in a human language, many words (such as the, or geliyor) most likely appear in many of the documents in a given language. Some words such as the or bir appear in perhaps all documents in a given language, while words such as Llanfairpwllgwyngyllgogerychwyrndrobwllllantysiliogogogoch or esneyemeyense appear in relatively few documents.
• For each unique word, we construct a linked list of document numbers representing the set of documents that word appears in. We ignore details such as how many times a word appears in that document or where in the document it appears in, etc.

Figure 1: Conceptual layout of the sample database
For instance if the word Sabanci appears in documents D2,D4,D6, we associate the list (2, 4, 6) with the string Sabanci. So, preprocessing involves finding ALL unique words AND creating a list of ALL documents numbers for the documents that contain that word somewhere.
Here is how we expect you to attack this problem:
• When you are given a query – a series of words – you search the database to locate the list nodes containing the words, and then retrieve their associated document lists and then intersect them.
Your task for this homework involves the following:
computer 21 Sabanci 4
Sabanci 6
Obviously any permutation of the lines above constitutes a valid database file for the data above. If some line is mistakenly repeated in the file you should ignore that line.
• You will then read query strings from the standard input. To simplify matters, each query will consist of a positive number and that many strings. For instance the query
If the number you read is 0, then the program should terminate without reading any further strings.
• The output for a query should consists of a line containing the query strings in the order given in the input, followed the number of documents found and then the matching document numbers in increasing order. There should be exactly 1 space between the strings and numbers in the output. For instance, for the query above the output will be
indicating that there is only one document – document 6 – that has all the three strings.
If however you have a query like
1 microelectronics your output will
microelectronics 0
Indicating that there are 0 documents matching.
• name the folder containing your source files as XXXX-NameLastname where XXXX is your student number. For instance if your student number is 5432 and your name is Ali Mehmetoˇglu, the folder will be named 5432-AliMehmetoglu. Make sure you do NOT use any Turkish characters in the folder name. You should remove any folders containing executables (Debug or Release), since they take up too much space.
• Compress your folder to a compressed file named
5432-AliMehmetoglu.zip. After you compress, please make sure it uncompresses properly.
• You then submit this compressed file in accordance with the deadlines above.
If the file you submitted is not labeled properly as above, if it does not uncompress, if the source code does not compile and if it does not work for the input we will use, as specified, we regretfully will have to give you 0 credit for the homework. So please make sure all these steps work properly.

Reviews

There are no reviews yet.

Be the first to review “CS 300 (Solution)”

Your email address will not be published. Required fields are marked *