Lecture 6 CDN & DASH
HHQ. ZHANG
Dept. Computer Science and Engineering
Technology
Part A.
CDN & Web Cache
Content delivery network
CDN
• A CDN is designed to cache content on nodes close to end users (at the network edge) to improve their experience.
• What are the scenarios of CDN?
– High-traffic websites, such as online video, games, images, audio, social networks, e-commerce, download sites, etc.
• A CDN is best suited to sites with a substantial volume of static resource requests (html, js, css, jpg, gif, etc.).
How does CDN work?
• A CDN server is actually a reverse proxy cache server.
CDN Example
A static resource cached on an Aliyun CDN node
Web proxies
• Web proxies are intermediaries between web clients and web servers that fulfill transactions on clients’ behalf.
• A client sends a request to the proxy, which forwards the request to the server. When the proxy receives a response from the server, it forwards the response back to the client.
– Proxies act like servers to web clients
– Proxies act like clients to web servers
[Figure: requests flow from the web browser through the web proxy to the web server; responses flow back the same way]
Web proxies
• There can be multiple proxy servers between a browser and the origin web server (which produces HTTP responses)
• Proxy servers are transparent to end-users.
Functions of Proxies
• Some possible functions:
– Content filtering – block access to inappropriate content
– Security firewall – block malicious software such as viruses
– Web caching – improve HTTP performance
– Two more examples follow: forward proxy and reverse proxy
Forward proxy
• A forward proxy acts on behalf of a client (or another forward proxy) to access web servers on the Internet
– Reduces outgoing bandwidth usage and the number of concurrent TCP connections
[Figure: users accessing the Internet through a forward proxy]
More about forward proxy
• A forward proxy often hides the clients' IP addresses
– The TCP connection is between the proxy and the server, not between the client and the server
• Some proxies add an X-Forwarded-For request header that reveals the client's IP address
– Experiment*: http://whatismyipaddress.com/
*Notice: If you are behind a NAT, the result will not match your computer's IP address; this does not mean you are behind an HTTP proxy.
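As an illustration of the X-Forwarded-For behaviour described above, here is a minimal Python sketch of how a forwarding proxy might append the address of the peer it received a request from. The function name add_x_forwarded_for is hypothetical, headers are modelled as a plain dict with case-sensitive keys, and real proxies differ in details.

```python
from typing import Dict


def add_x_forwarded_for(headers: Dict[str, str], client_ip: str) -> Dict[str, str]:
    """Return a copy of the request headers with client_ip appended to X-Forwarded-For.

    Hypothetical helper: header names are matched case-sensitively for simplicity.
    """
    forwarded = dict(headers)  # do not mutate the caller's headers
    existing = forwarded.get("X-Forwarded-For")
    # Each proxy appends the address it received the request from, so the
    # left-most entry is (claimed to be) the original client.
    forwarded["X-Forwarded-For"] = f"{existing}, {client_ip}" if existing else client_ip
    return forwarded


if __name__ == "__main__":
    h = add_x_forwarded_for({"Host": "example.com"}, "203.0.113.7")  # first proxy adds the client
    h = add_x_forwarded_for(h, "198.51.100.2")  # second proxy adds the first proxy's address
    print(h["X-Forwarded-For"])  # 203.0.113.7, 198.51.100.2
```

Because a client can send its own X-Forwarded-For header, a server should only trust the entries added by proxies it controls.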
Reverse proxy
• A reverse proxy is a proxy server that retrieves resources on behalf of a client from one or more servers
– Usually, a reverse proxy connects only to the web servers of one web site
– Popular software: Nginx, lighttpd
• To clients, the reverse proxy appears to be the company's web server
– Clients cannot connect directly to the ‘real’ web servers behind it; the reverse proxy uses the IP address of the web site.
[Figure: a reverse proxy in front of the Company Z network]
Functions of reverse proxy
• Reduce workload of origin servers
– Caching
– Load balancing
– Serving static resources (e.g. images)
– HTTP Compression and encryption
• Protect against common web-based attacks
Reverse Proxy for load balancing
• A reverse proxy usually sits in front of a server farm
– Each server in the farm duplicates databases, programs and static resources (e.g. images)
– The reverse proxy dispatches each request to one of the servers in the farm; requests from the same user go to the same server.
– Each server maintains its own client sessions.
[Figure: reverse proxy in front of a server farm]
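Session affinity ("sticky sessions") can be sketched by hashing a client identifier to pick a server, so the same user keeps landing on the server that holds their session. This is a hypothetical Python sketch: pick_server, and the idea of using the client IP or a session cookie as client_id, are assumptions. It is similar in spirit to, but not the same algorithm as, Nginx's ip_hash.

```python
import hashlib
from typing import List


def pick_server(client_id: str, servers: List[str]) -> str:
    """Map a client to one server in the farm, always the same one for the same id.

    Hypothetical sketch: client_id might be the client IP or a session cookie value.
    """
    if not servers:
        raise ValueError("server farm is empty")
    # A stable hash (unlike Python's built-in hash(), which is salted per process)
    # keeps the mapping identical across proxy restarts.
    digest = hashlib.sha256(client_id.encode("utf-8")).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]


if __name__ == "__main__":
    farm = ["10.0.0.11", "10.0.0.12", "10.0.0.13"]
    for user in ["alice", "bob", "alice"]:
        print(user, "->", pick_server(user, farm))  # alice gets the same server twice
```

One trade-off: with plain modulo hashing, adding or removing a server remaps most clients (and loses their sessions); consistent hashing or a shared session store are common ways around this.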
Caching is everywhere (Examples)
Tommy is accessing a web site through Chrome on his MacBook.
– Web server: It may cache rendering results (for dynamic pages).
– CDN server: It will cache static content.
– Web browser: It will cache web resources*.
– Operating system: It will likely cache the browser's cache files in memory.
– Hybrid drive: It will cache recently accessed HDD blocks on the SSD.
Web caching, cache hit / cache miss
• Some resources are retrieved frequently when users browse the web
– e.g. images, popular web pages, JavaScript libraries
• A web cache saves such resources (in memory or in the file system) and uses them to satisfy future requests from clients.
– Cache hit: the requested resource is available in the cache
• return the cached copy as the response
– Cache miss: the requested resource is not available
• forward the request to the web server
• save the response in the cache
• return the response to the client
[Figure: Client, Web cache, Server; GET a.htm is answered from the cache, GET x.htm is forwarded to the server]
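The hit/miss logic above fits in a few lines. This Python sketch (the class WebCache and the fetch_from_origin callback are hypothetical names) keeps responses in a dict and deliberately ignores expiration and validation, which the following slides address.

```python
from typing import Callable, Dict


class WebCache:
    """Minimal in-memory web cache: no expiration, no validation."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        self._fetch = fetch_from_origin
        self._store: Dict[str, bytes] = {}
        self.hits = 0
        self.misses = 0

    def get(self, url: str) -> bytes:
        if url in self._store:       # cache hit: return the cached copy
            self.hits += 1
            return self._store[url]
        self.misses += 1             # cache miss: forward the request to the server,
        body = self._fetch(url)
        self._store[url] = body      # save the response in the cache,
        return body                  # and return it to the client


if __name__ == "__main__":
    def origin(url: str) -> bytes:
        print("origin fetch:", url)
        return b"<html>" + url.encode() + b"</html>"

    cache = WebCache(origin)
    cache.get("/x.htm")  # miss: fetched from the origin
    cache.get("/x.htm")  # hit: served from memory
    print(cache.hits, cache.misses)  # 1 1
```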
Varieties of web caching
• Web requests and responses travel through several machines between a client and a server, so web caching is done in several places:
– Web browser (e.g. Firefox about:cache)
– Web proxies:
• forward proxy (cache server)
• reverse proxy
– Web server
[Figure: Browser → Forward proxy → Reverse proxy → Web server]
Web browser cache
• Built-in caching of browsers. It saves cached copies in memory and disk on the client machine.
• Can cache private resources (responses with Cache-Control: private)
Experiment:
*Both chrome://cache and chrome://view-http-cache have been removed since Chrome 66.
*Firefox: about:cache
Web cache consistency
• If a cache server returns an outdated copy, clients will have an inconsistent view of the web site. For example:
1. Cache server A saves a copy of a resource from the origin server at 8:00
2. The resource is modified at the origin server at 9:00
3. Cache server B retrieves the resource and saves a copy at 10:00
Clients served by A now see the old version, while clients served by B see the new one.
1) Ensuring consistency by validation
• Validation: a cache server inquires the origin server whether the cached copy is ‘the same’ as the resource in the origin server
– But validation for every request would be too expensive.
[Figure: Browser, Cache server, Origin server; the cache server checks whether its cached copy matches the origin server's copy]
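Validation is commonly done with a conditional GET: the cache sends the stored copy's ETag in If-None-Match, and the origin answers 304 Not Modified if the copy is still current, or 200 with the new body otherwise. The Python sketch below simulates both sides; deriving the ETag from a SHA-256 hash of the body is an assumption (servers may generate ETags however they like), and weak ETags (W/...) are not handled.

```python
import hashlib
from typing import Optional, Tuple


def make_etag(body: bytes) -> str:
    # Assumed scheme: a strong ETag derived from the content hash.
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def origin_handle(current_body: bytes, if_none_match: Optional[str]) -> Tuple[int, bytes, str]:
    """Origin side of a conditional GET: 304 if the client's ETag still matches."""
    etag = make_etag(current_body)
    if if_none_match is not None:
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        if "*" in candidates or etag in candidates:
            return 304, b"", etag  # Not Modified: the cached copy is still valid
    return 200, current_body, etag  # full response with the current content


def revalidate(cached_body: bytes, cached_etag: str, origin_body: bytes) -> Tuple[bytes, str]:
    """Cache side: ask the origin whether the cached copy is still the same."""
    status, body, etag = origin_handle(origin_body, cached_etag)
    return (cached_body, cached_etag) if status == 304 else (body, etag)


if __name__ == "__main__":
    v1 = b"<html>v1</html>"
    _, _, tag = origin_handle(v1, None)               # first fetch: 200
    print(origin_handle(v1, tag)[0])                  # 304: unchanged
    print(origin_handle(b"<html>v2</html>", tag)[0])  # 200: modified at the origin
```

A 304 response carries no body, which is why validation is cheaper than a full fetch even though it still costs a round trip.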
2) Fresh cached copy
[Figure: Client, Cache server and Origin server exchanging GET c.htm and the response c]
• A cached copy is fresh until it expires
– The resource is not likely to change before it expires
• The cache server considers it safe to satisfy a client’s request with a fresh cached copy.
• If a cached copy has expired, it becomes stale.
• The resource at the origin server has likely changed.
– The cache server cannot satisfy a client’s request with a stale cached copy without validating it first.
Example: the cached copy of c.htm expires at 14:00.
– At 13:30, the cached copy is still fresh, so the cache server can return it to the client (GET c.htm).
– At 15:00, the cached copy has expired, so the cache server cannot use it immediately.
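The 13:30 / 15:00 example reduces to a single comparison against the expiration time. A minimal Python sketch; the function names and the calendar date are illustrative assumptions, only the times come from the example.

```python
from datetime import datetime


def is_fresh(expires_at: datetime, now: datetime) -> bool:
    """A cached copy is fresh before its expiration time and stale from then on."""
    return now < expires_at


def handle_request(expires_at: datetime, now: datetime) -> str:
    if is_fresh(expires_at, now):
        return "return cached copy"       # safe to reuse without contacting the origin
    return "validate with origin server"  # stale: cannot be used immediately


if __name__ == "__main__":
    expires = datetime(2024, 1, 1, 14, 0)  # illustrative date; only the times matter
    print(handle_request(expires, datetime(2024, 1, 1, 13, 30)))  # return cached copy
    print(handle_request(expires, datetime(2024, 1, 1, 15, 0)))   # validate with origin server
```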
Operation of Cache Server
Cache support in HTTP
• “The goal of caching in HTTP/1.1 is to eliminate the need to send requests in many cases, and to eliminate the need to send full responses in many other cases.” RFC 2616
• Cache-related headers in HTTP
– HTTP/1.1: Cache-Control, ETag, If-None-Match, Vary
Controlling expiration / freshness
• An origin server should indicate whether its resources can be cached. (cacheability)
• If a resource can be cached, the origin server should indicate the expiration time of a response.
– If none is provided, the cache server estimates one using heuristics.
[Figure: Browser, Cache servers, Origin server]
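HTTP does not mandate one heuristic, but RFC 7234 (section 4.2.2) suggests that, when a response has a Last-Modified header, a cache may use a fraction of the time since the last modification, with 10% as a typical value. A Python sketch of that rule; heuristic_freshness and the sample header values are assumptions.

```python
from datetime import timedelta
from email.utils import parsedate_to_datetime


def heuristic_freshness(date_header: str, last_modified_header: str,
                        fraction: float = 0.1) -> timedelta:
    """Estimate a freshness lifetime as a fraction of (Date - Last-Modified).

    The 10% default follows the "typical" value mentioned in RFC 7234, 4.2.2.
    """
    response_time = parsedate_to_datetime(date_header)
    last_modified = parsedate_to_datetime(last_modified_header)
    unchanged_for = response_time - last_modified
    # A resource that has not changed for a long time is assumed to stay
    # unchanged a little longer; never return a negative lifetime.
    return max(unchanged_for * fraction, timedelta(0))


if __name__ == "__main__":
    lifetime = heuristic_freshness("Wed, 18 Feb 2009 01:46:57 GMT",
                                   "Tue, 17 Feb 2009 05:14:05 GMT")
    print(lifetime)  # 2:03:17.200000 (10% of 20:32:52)
```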
Cacheability of resources
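• A resource is cacheable if a cache server can save a copy and later use it to satisfy a client’s request
• An origin server defines the cacheability in the Cache-Control response header
– Cache-Control: no-store (neither a private cache nor a shared cache may store the response)
– Cache-Control: private (only a private cache, e.g. the browser cache, may store the response; a shared cache, e.g. a cache server, may not)
– Cache-Control: public (both private and shared caches may store the response)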
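The three Cache-Control values above can be checked mechanically. This simplified Python sketch (parse_cache_control and may_store are hypothetical names) decides whether a private or a shared cache may store a response; it ignores other conditions real caches apply, such as status codes, Authorization headers and the field-name argument form of private.

```python
from typing import Dict, Optional


def parse_cache_control(value: str) -> Dict[str, Optional[str]]:
    """Split a Cache-Control value into {directive: argument or None}."""
    directives: Dict[str, Optional[str]] = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') if sep else None
    return directives


def may_store(cache_control: str, shared: bool) -> bool:
    """Can a cache store this response? shared=True for a cache server / CDN."""
    d = parse_cache_control(cache_control)
    if "no-store" in d:
        return False  # no cache, private or shared, may keep a copy
    if shared and "private" in d:
        return False  # only the user's own (browser) cache may keep it
    return True


if __name__ == "__main__":
    for cc in ["no-store", "private, max-age=600", "public, max-age=3600"]:
        print(f"{cc!r}: browser={may_store(cc, shared=False)}, "
              f"cache server={may_store(cc, shared=True)}")
```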
Expiration related headers
• An origin server uses these headers to set the freshness time of a resource.
– HTTP/1.1 headers (such as Cache-Control) are not understood by HTTP/1.0 caches.
Header (response) | Meaning
Expires: <time> (HTTP/1.0) | Time at which this response expires
Cache-Control: max-age=n (HTTP/1.1) | The response remains fresh for n seconds
Cache-Control: must-revalidate (HTTP/1.1) | The cache server must strictly observe the expiration time set by other headers; once the copy is stale, it must be validated before reuse
Cache-Control: no-cache (HTTP/1.1) | The cached response cannot be used without validation
Cache-Control: max-age
• A server can also indicate the maximum time (in sec) that a response remains fresh using max-age.
• Similar to Expires, but max-age is relative to the time of the response rather than an absolute date.
This response expires 1 hour later, i.e. at 18 Feb 2009 02:46:57 GMT.
HTTP/1.1 200 OK
Cache-Control: max-age=3600
Last-Modified: Tue, 17 Feb 2009 05:14:05 GMT …
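The expiration time in this example is the response time plus max-age. The Date header is elided in the example above, so this Python sketch assumes Date: Wed, 18 Feb 2009 01:46:57 GMT (one hour before the stated expiry). It also ignores the Age header and network delays that a full RFC 7234 age calculation would include; expiration_time is a hypothetical name.

```python
import re
from datetime import datetime, timedelta
from email.utils import parsedate_to_datetime
from typing import Optional


def expiration_time(date_header: str, cache_control: str) -> Optional[datetime]:
    """Expiration = response time (Date header) + max-age, or None without max-age."""
    match = re.search(r"(?:^|,)\s*max-age\s*=\s*(\d+)", cache_control, re.IGNORECASE)
    if match is None:
        return None  # no max-age: fall back to Expires or a heuristic
    response_time = parsedate_to_datetime(date_header)
    return response_time + timedelta(seconds=int(match.group(1)))


if __name__ == "__main__":
    # Assumed Date header; the slide's example elides it.
    print(expiration_time("Wed, 18 Feb 2009 01:46:57 GMT", "max-age=3600"))
    # 2009-02-18 02:46:57+00:00
```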
Cache-Control: must-revalidate
HTTP/1.1 200 OK
Cache-Control: max-age=1800, must-revalidate …
• An origin server uses Cache-Control: must-revalidate to force a cache server to validate a stale cached copy before using it. If the copy cannot be validated successfully (e.g. the origin server is unreachable), the cache server must return a 504 (Gateway Timeout) error.
Validation before each reuse
• An origin server can force a cache server to validate a cached copy for each request using Cache-Control: no-cache. This disallows using the cached copy without validation.
– no-cache does not prohibit the cache server from storing the response.
HTTP/1.1 200 OK
Cache-Control: no-cache
Last-Modified: Tue, 17 Feb 2009 05:14:05 GMT …
Example
Cache-Control header | When to validate?
Cache-Control: no-cache | Must validate before reusing
Cache-Control: max-age=60 | No need to validate if reused within 1 min of retrieval; SHOULD validate if expired
Cache-Control: max-age=60, must-revalidate | No need to validate if reused within 1 min of retrieval; MUST validate if expired
Cache-Control: max-age=0, must-revalidate | What does this mean?
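The table can be expressed as a small decision function. This Python sketch (validation_rule is a hypothetical name) returns "no", "SHOULD" or "MUST"; it treats a missing max-age as 0 instead of applying heuristics, which is a simplification. Feeding it the last row is one way to check your answer to the question.

```python
def validation_rule(cache_control: str, age_seconds: int) -> str:
    """Does a cache need to validate a stored copy of this age before reuse?

    Simplified: a missing max-age is treated as 0 (no heuristic freshness).
    """
    directives = {}
    for part in cache_control.split(","):
        name, _, arg = part.strip().partition("=")
        directives[name.strip().lower()] = arg.strip()
    if "no-cache" in directives:
        return "MUST"  # validate before every reuse
    max_age = int(directives.get("max-age") or 0)
    if age_seconds < max_age:
        return "no"    # still fresh: reuse without validation
    return "MUST" if "must-revalidate" in directives else "SHOULD"


if __name__ == "__main__":
    for cc in ["no-cache", "max-age=60", "max-age=60, must-revalidate"]:
        print(cc, "| age 30 s:", validation_rule(cc, 30),
              "| age 90 s:", validation_rule(cc, 90))
```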
Exercise
• Check the cacheability of the web resources served by https://github.com/
• What kinds of resources should have a small / large max-age?
Part B.
DASH
Dynamic Adaptive Streaming over HTTP
Part A.2 DASH
mpd & m4s (MPD: Media Presentation Description, the XML manifest; m4s: the media segment files)
http://www-itec.uni-klu.ac.at/ftp/datasets/DASHDataset2014/
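An MPD file is XML, so its list of available bit rates can be read with a standard XML parser. The Python sketch below lists each Representation's id and bandwidth attribute, using the MPEG-DASH MPD namespace urn:mpeg:dash:schema:mpd:2011. The embedded MPD is a hypothetical, trimmed example (no SegmentTemplate or segment URLs, so it is not playable); real MPDs from the dataset above contain many more elements.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

MPD_NS = {"mpd": "urn:mpeg:dash:schema:mpd:2011"}

# Hypothetical, trimmed MPD: real ones also describe segments (e.g. SegmentTemplate).
SAMPLE_MPD = """<MPD xmlns="urn:mpeg:dash:schema:mpd:2011" type="static"
     mediaPresentationDuration="PT10S" minBufferTime="PT2S"
     profiles="urn:mpeg:dash:profile:isoff-live:2011">
  <Period>
    <AdaptationSet mimeType="video/mp4">
      <Representation id="240p" bandwidth="400000" width="426" height="240"/>
      <Representation id="720p" bandwidth="2500000" width="1280" height="720"/>
      <Representation id="480p" bandwidth="1000000" width="854" height="480"/>
    </AdaptationSet>
  </Period>
</MPD>"""


def list_representations(mpd_xml: str) -> List[Tuple[str, int]]:
    """Return (id, bandwidth in bit/s) for every Representation, lowest rate first."""
    root = ET.fromstring(mpd_xml)
    reps = [
        (rep.get("id", ""), int(rep.get("bandwidth", "0")))
        for rep in root.iterfind(".//mpd:Representation", MPD_NS)
    ]
    return sorted(reps, key=lambda r: r[1])


if __name__ == "__main__":
    for rep_id, bps in list_representations(SAMPLE_MPD):
        print(f"{rep_id}: {bps / 1e6:.1f} Mbit/s")
```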
Practice
1. Open Chrome.
2. Open Chrome's developer tools.
3. Visit the URL: https://allen8101070.github.io/ITMAN_DASHjs/index.html
4. Observe what happens in the ‘Network’ view of the developer tools.
Testing result
Lab 6
• Please finish the lab according to this file
– Submit the report for lab 6.
– Submit your source code as a zip file (6.3.zip).
• Comments are a MUST.
Lab 6.1 Finding a CDN user
• Use curl to GET a resource from a web site that uses a CDN to speed up access and balance the traffic load
– How can you tell that this web site uses a CDN?
– Use nslookup/dig on your computer to find the IP address of this web site
– Ask a friend in another province to do the same (use nslookup/dig to find the IP address of the same web site)
– Record the result in your report
Lab 6.2 Loading a DASH resource
• Use dash.js to load a DASH resource
• Open the ‘Network’ view in the developer tools of your browser (such as Chrome) and observe:
– Are there any ‘mpd’ files? What is the file name, and what is the MIME type description of ‘mpd’?
– Are there any ‘m4s’ files? What bit rates do they correspond to? Does the rate change as network conditions (especially the bandwidth) change?
• Reference:
– An HTML page embedding dash.js, which may be helpful for loading an ‘mpd’ file
• https://allen8101070.github.io/ITMAN_DASHjs/index.html
– A dataset of dash resources
• http://www-itec.uni-klu.ac.at/ftp/datasets/DASHDataset2014/
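To see why the ‘m4s’ rate may change with bandwidth, here is a deliberately simple throughput-based rule: estimate bandwidth from the last segment download and pick the highest bit rate that fits within a safety margin. This is a hypothetical Python sketch (the names and the 0.8 safety factor are assumptions), not the algorithm dash.js actually uses; real players combine several rules, including buffer-based ones.

```python
from typing import List


def throughput_bps(segment_bytes: int, download_seconds: float) -> float:
    """Bandwidth estimate from one segment download, in bit/s."""
    return segment_bytes * 8 / download_seconds


def choose_bitrate(bitrates_bps: List[int], throughput: float, safety: float = 0.8) -> int:
    """Highest available bit rate within safety * throughput; lowest rate if none fits."""
    budget = throughput * safety  # leave headroom for bandwidth fluctuations
    fitting = [b for b in bitrates_bps if b <= budget]
    return max(fitting) if fitting else min(bitrates_bps)


if __name__ == "__main__":
    ladder = [400_000, 1_000_000, 2_500_000]  # e.g. bandwidths listed in an MPD
    for seg_bytes, secs in [(500_000, 1.0), (500_000, 2.5), (100_000, 2.0)]:
        est = throughput_bps(seg_bytes, secs)
        print(f"{est / 1e6:.2f} Mbit/s -> request {choose_bitrate(ladder, est)} bit/s segments")
```

The safety factor trades quality for stability: a value close to 1 picks higher bit rates but risks stalls when bandwidth drops.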
Lab 6.3
• Use multiple threads and TCP sockets to rewrite the HTTP server from lab assignment 3.3:
– Based on Assignment 3.3, implement the following features:
– Range header support
• With this feature implemented, users can pause and resume file downloads from the server.
– Session cookie support:
• Remember the last folder the user visited, and respond with 302 Found if the user accesses the root directory.
Example:
Request: GET http://localhost:8080
Response: 302 Found, Location: http://localhost:8080/lastdir
Reference: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Location
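For the Range header feature, one possible approach is to parse a single bytes=start-end range and answer 206 Partial Content with a Content-Range header. The Python sketch below (parse_range and partial_headers are hypothetical names) returns None when the header is absent, malformed or a multi-range request, so the server can fall back to a normal 200 response, and raises ValueError for an unsatisfiable range, which would map to 416 Range Not Satisfiable. It is a starting point, not a complete implementation of RFC 7233.

```python
import re
from typing import Dict, Optional, Tuple

_RANGE_RE = re.compile(r"\s*bytes\s*=\s*(\d*)\s*-\s*(\d*)\s*")


def parse_range(header: Optional[str], file_size: int) -> Optional[Tuple[int, int]]:
    """Parse a single 'bytes=start-end' range into inclusive (start, end) offsets.

    Returns None when there is no usable single range (send the whole file, 200).
    Raises ValueError when the range cannot be satisfied (send 416).
    """
    if not header:
        return None
    m = _RANGE_RE.fullmatch(header)
    if m is None or m.group(1) == m.group(2) == "":
        return None  # malformed or multi-range header: ignore it
    start_s, end_s = m.groups()
    if start_s == "":  # "bytes=-N": the last N bytes
        suffix = int(end_s)
        if suffix == 0 or file_size == 0:
            raise ValueError("unsatisfiable range")
        return max(file_size - suffix, 0), file_size - 1
    start = int(start_s)
    end = int(end_s) if end_s else file_size - 1  # "bytes=N-": to the end of the file
    if end_s and end < start:
        return None  # syntactically invalid range: ignore it
    if start >= file_size:
        raise ValueError("unsatisfiable range")
    return start, min(end, file_size - 1)


def partial_headers(start: int, end: int, file_size: int) -> Dict[str, str]:
    """Headers for a 206 Partial Content response carrying bytes start..end."""
    return {
        "Content-Range": f"bytes {start}-{end}/{file_size}",
        "Content-Length": str(end - start + 1),
    }


if __name__ == "__main__":
    size = 1000
    for h in [None, "bytes=0-499", "bytes=500-", "bytes=-200"]:
        r = parse_range(h, size)
        print(h, "->", "200 full file" if r is None else (206, partial_headers(*r, size)))
```

Resuming a paused download is then a matter of the client sending Range: bytes=N- where N is the number of bytes it already has.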