Description
FileDownloader
Your program must be a console application (no graphical user interface (GUI) is required) and should be named FileDownloader (i.e., the name of the class that includes the main method should be FileDownloader). Your program should run with the
[java/python] FileDownloader[.java/.py] <index_file> [<lower_endpoint>-<upper_endpoint>]
command, where <index_file> and <lower_endpoint>-<upper_endpoint> are the command-line arguments. The details of the command-line arguments are as follows:
• <index_file>: [Required] The URL of the index that includes a list of text file URLs.
• <lower_endpoint>-<upper_endpoint>: [Optional] If this argument is not given, each file listed in the index is downloaded in full if it is found on the server. Otherwise, only the bytes between <lower_endpoint> and <upper_endpoint>, inclusive, are downloaded.
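The argument handling described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name parse_args and the choice to raise ValueError on malformed input are hypothetical and are not required by the assignment.

```python
def parse_args(argv):
    """Return (index_url, byte_range) from the arguments after the program name.

    byte_range is None when no range is given, otherwise (lower, upper).
    parse_args is a hypothetical helper name, not part of the assignment.
    """
    if len(argv) not in (1, 2):
        raise ValueError(
            "usage: FileDownloader <index_file> [<lower_endpoint>-<upper_endpoint>]"
        )
    index_url = argv[0]
    if len(argv) == 1:
        return index_url, None

    # The two endpoints are separated by a single hyphen, e.g. "0-999".
    lower_text, hyphen, upper_text = argv[1].partition("-")
    if not hyphen or not lower_text.isdecimal() or not upper_text.isdecimal():
        raise ValueError("range must look like <lower_endpoint>-<upper_endpoint>")

    lower, upper = int(lower_text), int(upper_text)
    if lower > upper:
        raise ValueError("lower endpoint must not be greater than upper endpoint")
    return index_url, (lower, upper)
```

A program would typically call it as parse_args(sys.argv[1:]) and print the error message before exiting if ValueError is raised.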
When a user enters the command above, your program will send an HTTP GET request to the server in order to download the index file with URL <index_file>. If the index file is not found, the response is a message other than 200 OK. In this case, your program will print an error message to the command-line and exit. If the index file is found, the response is a 200 OK message. When this is the case, your program will print the number of file URLs in the index file and send an HTTP HEAD request for each file URL in the index file.
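One possible way to fetch the index file over a plain TCP socket is sketched below in Python. All function names here are hypothetical. The sketch assumes URLs are given without a scheme (as in the examples of this assignment), that the server listens on port 80, and that the response is not sent with chunked transfer encoding.

```python
import socket


def split_url(url):
    """Split 'host/path' (no scheme, as in the examples) into (host, path)."""
    host, _, path = url.partition("/")
    return host, "/" + path


def build_request(method, url, extra_headers=()):
    """Build the bytes of an HTTP/1.1 request for the given method and URL."""
    host, path = split_url(url)
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}", "Connection: close"]
    lines.extend(extra_headers)
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def parse_response(raw):
    """Return (status_code, headers, body) from the raw bytes of one response."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    status_code = int(status_line.split()[1])
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_code, headers, body


def send_request(url, request, port=80):
    """Send the request and read until the server closes the connection."""
    host, _ = split_url(url)
    chunks = []
    with socket.create_connection((host, port)) as sock:
        sock.sendall(request)
        while True:
            chunk = sock.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
    return b"".join(chunks)


def list_file_urls(index_body):
    """Return the non-empty lines of the index body, one file URL per line."""
    text = index_body.decode("utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]
```

With these helpers, the index could be fetched with parse_response(send_request(url, build_request("GET", url))). A status code other than 200 would lead to the error message and exit, and len(list_file_urls(body)) gives the number of file URLs to print.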
Requested file is not found: If the requested file is not found on the server, the response is a message other than 200 OK. In this case, your program will print a message to the command-line indicating that the file is not found. Then, an HTTP HEAD request is sent for the next file.
Requested file is found: If the requested file is found on the server, the response is a 200 OK message which includes the size of the file in bytes in the header. When this is the case, there are three possibilities:
1. If the user does not give a range as a command-line argument, your program should send an HTTP GET message to obtain the content of the whole file.
2. If a range is given as a command-line argument and the size of the file is not greater than <lower_endpoint>, the range is not satisfiable (bytes are numbered from 0, so the last byte of a file of size N is byte N-1). The file will not be downloaded and your program will print a message to the command-line indicating that the file is not requested. Then, an HTTP HEAD request is sent for the next file.
3. If a range is given as a command-line argument and the size of the file is greater than <lower_endpoint>, the range is satisfiable. Then, your program should send an HTTP GET message with the range <lower_endpoint>-<upper_endpoint> and obtain a part of the file content from the HTTP 206 Partial Content response.
If your program successfully obtains the file or a part of the file, it saves the content under the directory in which your program runs. The name of the saved file should be the same as the downloaded file and a message indicating that the file is successfully downloaded is printed to the command-line. Then an HTTP HEAD request is sent for the next file.
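The two pieces of HTTP detail used in the steps above, reading the file size from the HEAD response and asking for a byte range in the GET request, might look like the following Python sketch. The helper names content_length and build_get are hypothetical. The Content-Length and Range header fields themselves are standard HTTP (RFC 7233 defines the bytes=first-last form).

```python
def content_length(header_text):
    """Return the Content-Length value of a response header block, or None.

    header_text is the decoded text up to the blank line that ends the
    headers; its first line is the status line and is skipped.
    """
    for line in header_text.split("\r\n")[1:]:
        name, _, value = line.partition(":")
        if name.strip().lower() == "content-length":
            return int(value.strip())
    return None


def build_get(host, path, byte_range=None):
    """Build a GET request, adding a Range header when byte_range is given.

    byte_range is an inclusive (lower, upper) pair of byte offsets.
    """
    lines = [f"GET {path} HTTP/1.1", f"Host: {host}", "Connection: close"]
    if byte_range is not None:
        lower, upper = byte_range
        lines.append(f"Range: bytes={lower}-{upper}")
    return "\r\n".join(lines) + "\r\n\r\n"
```

For a ranged request the program would then expect a 206 Partial Content status, while a request without the Range header is answered with 200 OK and the whole file.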
In the report, provide a brief explanation as to how the GET requests and corresponding responses operate.
Assumptions and hints
• Please refer to RFC 2616 for details of the HTTP messages in general and RFC 7233 for details of range requests.
• You will assume that <lower_endpoint> and <upper_endpoint> are both non-negative integers and <lower_endpoint> is not greater than <upper_endpoint>. Note that there should be a hyphen ‘-’ character between the endpoints.
• You will assume that each line of the index file includes one file URL.
• You will assume that the name of each file in the index is unique.
• Your program will not save the index file to the local folder.
• Your program should print a message to the command-line to inform the user about the status of the files.
• The downloaded file should be saved under the directory containing the source file FileDownloader and the name of the file should be the same as the name of the downloaded file.
www.cs.bilkent.edu.tr/~cs421/fall21/project1/index1.txt
www.cs.bilkent.edu.tr/~cs421/fall21/project1/index2.txt
• Please contact your assistant if you have any doubt about the assignment.
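Since file names in the index are assumed to be unique, the local file name can be taken from the last segment of the URL. The following Python sketch shows one way to do this. The names local_name and save_content are hypothetical, and the target directory is passed in explicitly so that the caller decides where files go.

```python
import os


def local_name(url):
    """Return the last path segment of the URL, used as the local file name."""
    return url.rstrip("/").rsplit("/", 1)[-1]


def save_content(url, content, directory):
    """Write the downloaded bytes into directory and return the file path."""
    path = os.path.join(directory, local_name(url))
    with open(path, "wb") as handle:
        handle.write(content)
    return path
```

To save next to the source file, a program could pass os.path.dirname(os.path.abspath(__file__)) as the directory.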
Example
Let www.foo.com/abc/index.txt be the URL of the index file to be downloaded whose content is given as
www.cs.bilkent.edu.tr/file.txt
www.cs.bilkent.edu.tr/folder2/temp.txt
wordpress.org/plugins/about/readme.txt
humanstxt.org/humans.txt
www.cs.bilkent.edu.tr/cs421/deneme.txt
where the first file does not exist on the server and the sizes of the other files are 6000, 4567, 1587, and 9000 bytes, respectively.
Example run 1. Let your program start with the FileDownloader www.foo.com/abc/index.txt command. Then all files except the first one in the index file are downloaded.
Command-line:
URL of the index file: www.foo.com/abc/index.txt
No range is given
Index file is downloaded
There are 5 files in the index
1. www.cs.bilkent.edu.tr/file.txt is not found
2. www.cs.bilkent.edu.tr/folder2/temp.txt (size = 6000) is downloaded
3. wordpress.org/plugins/about/readme.txt (size = 4567) is downloaded
4. humanstxt.org/humans.txt (size = 1587) is downloaded
5. www.cs.bilkent.edu.tr/cs421/deneme.txt (size = 9000) is downloaded
Example run 2. Let your program start with the FileDownloader www.foo.com/abc/index.txt 0-999 command.
Command-line:
URL of the index file: www.foo.com/abc/index.txt
Lower endpoint = 0
Upper endpoint = 999
Index file is downloaded
There are 5 files in the index
1. www.cs.bilkent.edu.tr/file.txt is not found
2. www.cs.bilkent.edu.tr/folder2/temp.txt (range = 0-999) is downloaded
3. wordpress.org/plugins/about/readme.txt (range = 0-999) is downloaded
4. humanstxt.org/humans.txt (range = 0-999) is downloaded
5. www.cs.bilkent.edu.tr/cs421/deneme.txt (range = 0-999) is downloaded
Example run 3. Let your program start with the FileDownloader www.foo.com/abc/index.txt 1587-6999 command.
Command-line:
URL of the index file: www.foo.com/abc/index.txt
Lower endpoint = 1587
Upper endpoint = 6999
Index file is downloaded
There are 5 files in the index
1. www.cs.bilkent.edu.tr/file.txt is not found
2. www.cs.bilkent.edu.tr/folder2/temp.txt (range = 1587-5999) is downloaded
3. wordpress.org/plugins/about/readme.txt (range = 1587-4566) is downloaded
4. humanstxt.org/humans.txt (size = 1587) is not downloaded
5. www.cs.bilkent.edu.tr/cs421/deneme.txt (range = 1587-6999) is downloaded
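The ranges printed in the example runs can be reproduced with a small amount of arithmetic, sketched below in Python. The function names are hypothetical. The sketch follows Example run 3, where a file of size 1587 is not downloaded for lower endpoint 1587 (byte offsets start at 0), and where the printed upper endpoint is capped at the last byte of the file.

```python
def effective_range(size, lower, upper):
    """Return the inclusive (first, last) bytes to expect, or None.

    None means the range is not satisfiable: the file has no byte at
    offset lower because valid offsets are 0 .. size-1.
    """
    if lower >= size:
        return None
    return lower, min(upper, size - 1)


def status_line(number, url, size, byte_range):
    """Format one status line in the style of the example runs.

    size is None when the HEAD request did not return 200 OK.
    byte_range is None when no range was given on the command line.
    """
    if size is None:
        return f"{number}. {url} is not found"
    if byte_range is None:
        return f"{number}. {url} (size = {size}) is downloaded"
    span = effective_range(size, *byte_range)
    if span is None:
        return f"{number}. {url} (size = {size}) is not downloaded"
    return f"{number}. {url} (range = {span[0]}-{span[1]}) is downloaded"
```

For instance, status_line(2, "www.cs.bilkent.edu.tr/folder2/temp.txt", 6000, (1587, 6999)) yields the second line of Example run 3, with the range capped at 5999.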
Submission rules
You need to apply all of the following rules in your submission. You will lose points if you do not obey the submission rules below or your program does not run as described in the assignment above.
• The assignment must be submitted to Moodle. Any other methods (Email/Disk/CD/DVD/Cloud Drive) of submission will not be accepted.
• Zip all of the downloaded files, your report in PDF format, and your source code for submission. The submission should only include a single ZIP file. Any other compression is not accepted.
• The name of the zip file must be AliVelioglu20141222 if your name and ID are Ali Velioglu and 20141222, respectively. If you are submitting an assignment done by two students, the file name should include the names and IDs of both group members like AliVelioglu20141222AyseFatmaoglu20255666 if group members are Ali Velioglu and Ayse Fatmaoglu with IDs 20141222 and 20255666, respectively.
• For group submissions, ONLY ONE MEMBER must make the submission. The other member must NOT make a submission.
• All the files must be in the root of the zip file; directory structures are not allowed. Please note that this also disallows organizing your code into Java packages. The archive should not contain any file other than the source code(s) with .java or .py extension.
• The archive should not contain:
– Any class files or other executables,
– Any third party library archives (i.e. jar files),
– Any text files,