COMP249 Week 4

Steve Cassidy

Web Servers

  • A program to answer HTTP requests

    • Listen on a server port (80 by default)

    • Accept GET/HEAD/POST requests

    • Map resource name (URL) to a local resource

    • Retrieve local resource and send it back to client

Embedded web servers

Source, see also Wikipedia

Mapping Resource Names

http://online.mq.edu.au/pub/COMP249/lectureschedule.html
  • Resource name: /pub/COMP249/lectureschedule.html

  • Mapped to a local file system:

    /home/httpd/html/pub/COMP249/lectureschedule.html
    C:\Web\httpd\html\pub\COMP249\lectureschedule.html
    
  • Base directory on server is the server root
  • Often have more than one server root per server
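
The mapping above can be sketched in a few lines of Python. This is only an illustration: map_resource is a hypothetical helper, the server root is the one shown on the slide, and real servers apply many more rules.

```python
import posixpath
from urllib.parse import unquote

SERVER_ROOT = "/home/httpd/html"  # server root taken from the slide

def map_resource(resource_name, root=SERVER_ROOT):
    """Map a resource name such as /pub/COMP249/x.html to a path under root."""
    # Decode %xx escapes, then collapse "." and ".." segments
    path = posixpath.normpath(unquote(resource_name))
    # Drop empty and parent-directory segments so the result stays under root
    parts = [p for p in path.split("/") if p and p != ".."]
    return posixpath.join(root, *parts)

print(map_resource("/pub/COMP249/lectureschedule.html"))
```

Dropping ".." segments is a deliberate choice: without it a request could name files outside the server root.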

Mapping Resource Names

http://online.mq.edu.au/pub/COMP249/
  • Resource name: /pub/COMP249/

  • Server must look for a default name in the given directory: index.html, index.htm, etc.

  • The list of default names depends on the server configuration
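
A minimal sketch of the default-name lookup, assuming the list of names is index.html then index.htm. The function name resolve_directory and the order of the list are assumptions; a real server reads them from its configuration.

```python
import os

DEFAULT_NAMES = ["index.html", "index.htm"]  # assumed configuration

def resolve_directory(local_dir, default_names=DEFAULT_NAMES):
    """Return the first default file that exists in local_dir, or None."""
    for name in default_names:
        candidate = os.path.join(local_dir, name)
        if os.path.isfile(candidate):
            return candidate
    # No default file: a server might send a directory listing or an error
    return None
```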

Mapping Resource Names

http://www.ics.mq.edu.au/~cassidy/
  • Resource name: /~cassidy/

  • Refers to the personal directory of a user

  • Look in the user's home directory for a given subdirectory: html (in ICS), public_html (also common).

  • Permissions:

    • Server runs as an untrusted user

    • Needs to be able to read and perhaps execute files in your html directory.
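
The /~user/ convention could be handled along these lines. The helper map_user_resource, the /home base directory and the public_html default are assumptions made for the example; the actual layout depends on the server.

```python
import posixpath

USER_SUBDIR = "public_html"  # "html" in ICS according to the slide

def map_user_resource(resource_name, home_base="/home", subdir=USER_SUBDIR):
    """Map /~user/rest to home_base/user/subdir/rest, or None if not a user URL."""
    if not resource_name.startswith("/~"):
        return None
    user, _, rest = resource_name[2:].partition("/")
    if not user:
        return None
    # Ignore empty, "." and ".." segments so the path stays inside subdir
    parts = [p for p in rest.split("/") if p and p not in (".", "..")]
    return posixpath.join(home_base, user, subdir, *parts)

print(map_user_resource("/~cassidy/"))
```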

Generating Resources

http://www.smh.com.au/articles/2005/03/13/1110649055094.html

http://slashdot.org/article.pl?sid=05/03/13/1853233&
   tid=133&tid=186&tid=159
  • Server is free to find a resource any way it chooses

  • This includes finding it in a database or running a program to generate it.

  • In the SMH case the stories are likely to be stored in a database and served as needed, other content is added on the fly.

  • The Slashdot URL refers to a Perl script which will be run to generate the content. The remaining text is GET encoded form variables.
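
To see the GET encoded form variables in the Slashdot URL, the standard urllib.parse module can split it apart. This is a sketch of what a script receiving the request might do, not how Slashdot itself works.

```python
from urllib.parse import urlsplit, parse_qs

url = ("http://slashdot.org/article.pl?sid=05/03/13/1853233&"
       "tid=133&tid=186&tid=159")

parts = urlsplit(url)
variables = parse_qs(parts.query)   # repeated names collect into a list

print(parts.path)        # the script to run
print(variables["sid"])
print(variables["tid"])
```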

Complicated URLs

http://ad.doubleclick.net/click;h=v2|30d0|0|0|%2a|l
;7516609;0-0;0;8856706;3454-728|90;4719404|4737300|1;
;%3fhttp://www.sun.com/emrkt/sunfirev20z/

http://ad.au.doubleclick.net/click%3Bh=v5|33ae|3|0|%2a
|h%3B27111491%3B0-0%3B0%3B12619400%3B1-468|60%3B14797496
|14815392|1%3B%3B%7Esscs%3D%3fhttp://www.energy.com.au/onit

Note that these are folded onto multiple lines for display purposes. Note the use of escape codes like %3B to include characters in the URL that aren't otherwise allowed or that have a special meaning there.
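
The escape codes can be decoded with urllib.parse.unquote and produced with quote. The fragment below is cut from the second URL above and is used only as an illustration.

```python
from urllib.parse import quote, unquote

fragment = "click%3Bh=v5|33ae|3|0|%2a|h%3B27111491"
decoded = unquote(fragment)   # %3B becomes ";" and %2a becomes "*"
print(decoded)

# Going the other way: escape a character that has a special meaning in a URL
print(quote(";"))
```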

MIME Types

  • Problem: how does a client know what kind of data it's getting?
    1. Look at the file extension on the URL
    2. Look at the contents of the returned data
    3. Rely on the server to tell it.
  • Answer: Rely on the server:
    • Content-Type HTTP header
    • Eg. Content-Type: text/html

MIME Types

Some MIME types:

text/html, image/jpeg, audio/mpeg, application/xml, application/xhtml+xml, text/plain, application/cybercash, video/mp4, text/x-vcard, text/css, multipart/digest, chemical/x-genbank, video/quicktime, application/pdf

The HTTP Protocol

  • Requires: a connection between client and server
  • Stateless: no login process, each request is independent
  • Simple format: request header, blank line, possible payload
  • Symmetrical: allows data to be sent and received
  • Very easy to implement, yet scales very well

Example HTTP Request

GET /~cassidy/ HTTP/1.1
Host: www.ics.mq.edu.au
User-Agent: Mozilla/5.0 (X11; U; Linux i686 (x86_64); en-US; rv:1.7.12)
      Gecko/20050922 Firefox/1.0.7 (Ubuntu package 1.0.7)
Accept: text/xml,application/xml,application/xhtml+xml,
      text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Cookie: UserTrack=63B08C38-1234-0000-0000-00000000000000; 
	    

Note lines folded for display.

What does each of these headers mean? Which are required? Many are defined in the HTTP standard but others can be defined via the HTTP extension framework.
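
The request format is simple enough to take apart by hand. A rough sketch, assuming the whole header block has already been read into one string; parse_request is a hypothetical helper and ignores many details of the standard.

```python
def parse_request(raw):
    """Split a raw HTTP request into (method, target, version, headers)."""
    # The header block ends at the first blank line
    head, _, _body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    method, target, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        if not line:
            continue
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them in lower case
        headers[name.strip().lower()] = value.strip()
    return method, target, version, headers

raw = ("GET /~cassidy/ HTTP/1.1\r\n"
       "Host: www.ics.mq.edu.au\r\n"
       "Connection: keep-alive\r\n"
       "\r\n")
print(parse_request(raw))
```

In HTTP/1.1 the Host header is the one header every request must carry.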

Example HTTP Response

HTTP/1.x 200 OK
Date: Mon, 20 Mar 2006 05:33:32 GMT
Server: Apache/2.0
Accept-Ranges: bytes
Content-Length: 4111
Keep-Alive: timeout=15, max=499
Connection: Keep-Alive
Content-Type: text/html
Content-Language: en
	    

Example HTTP POST Request

POST /~steve/form.html HTTP/1.1
Host: localhost
User-Agent: Mozilla/5.0 (X11; U; Linux i686 (x86_64); en-US; rv:1.7.12)
      Gecko/20050922 Firefox/1.0.7 (Ubuntu package 1.0.7)
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,
      text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Referer: http://localhost/~steve/form.html
Content-Type: application/x-www-form-urlencoded
Content-Length: 106

name=Steve+Cassidy&interests=This+is+a+field+with%0D%0Aquite+a+bit+
    of+text%0D%0Athat+has+linebreaks.%0D%0A
	    

Note lines folded for display.

This is a POST request; note how the form data is encoded in the request body.
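
The body encoding can be reproduced with urllib.parse.urlencode. The field values below are reconstructed from the request above; the variable names are only for the example.

```python
from urllib.parse import urlencode, parse_qs

form = {
    "name": "Steve Cassidy",
    "interests": "This is a field with\r\nquite a bit of text\r\n"
                 "that has linebreaks.\r\n",
}

body = urlencode(form)   # spaces become "+", CR and LF become %0D and %0A
print(body)
print(len(body))         # compare with the Content-Length header

decoded = parse_qs(body) # what a server-side script would recover
```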

Example HTTP GET Request

GET /~steve/form.html?name=Steve+Cassidy&interests=This+is+a+field+
     with%0D%0Aquite+a+bit+of+text%0D%0Athat+has+linebreaks.%0D%0A HTTP/1.1
Host: localhost
User-Agent: Mozilla/5.0 (X11; U; Linux i686 (x86_64); en-US; rv:1.7.12)
     Gecko/20050922 Firefox/1.0.7 (Ubuntu package 1.0.7)
Accept: text/xml,application/xml,application/xhtml+xml,
     text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Referer: http://localhost/~steve/form.html
If-Modified-Since: Mon, 20 Mar 2006 06:22:29 GMT
If-None-Match: "4f42a9-fd-40f672edb1340"
	    

Note lines folded for display.

This is the same form submitted via a GET request; here the data is encoded in the request URL. Note also the If-Modified-Since header in this request, sent because my browser has just asked for the same resource.
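
A server can use If-Modified-Since to avoid resending a resource the browser already holds, answering 304 Not Modified instead. A sketch of the date comparison only; not_modified is a hypothetical helper and real servers also check If-None-Match.

```python
from email.utils import parsedate_to_datetime

def not_modified(if_modified_since, last_modified):
    """True if the resource has not changed since the client's cached copy."""
    cached = parsedate_to_datetime(if_modified_since)
    changed = parsedate_to_datetime(last_modified)
    return changed <= cached

header_value = "Mon, 20 Mar 2006 06:22:29 GMT"   # from the request above
status = 304 if not_modified(header_value, header_value) else 200
print(status)
```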

HTTP Redirect

GET /~steve/ HTTP/1.1
Host: www.shlrc.mq.edu.au

HTTP/1.x 301 Moved Permanently
Date: Mon, 20 Mar 2006 06:32:36 GMT
Server: Apache/2.0.46 (Red Hat)
Location: http://www.ics.mq.edu.au/~cassidy/
Content-Length: 242
Connection: close
Content-Type: text/html; charset=iso-8859-1
	    

Alternatively

<meta http-equiv="refresh" 
      content="0; URL=http://my.new.site.com/">
	    

The HTTP redirect is a server response that indicates a resource has moved to a new location. An alternative is to include the above meta tag in the page header to force a redirect from the current page; the number before the URL is the delay in seconds.
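
A redirect response like the one above can be built as a plain string. This is a sketch; redirect_response is a hypothetical helper and the HTML body is just a courtesy for clients that do not follow the Location header.

```python
def redirect_response(location):
    """Build a 301 response that points the client at location."""
    body = ('<html><body><p>Moved to <a href="%s">%s</a></p></body></html>'
            % (location, location))
    lines = [
        "HTTP/1.1 301 Moved Permanently",
        "Location: " + location,
        "Content-Type: text/html; charset=iso-8859-1",
        "Content-Length: " + str(len(body)),
        "Connection: close",
        "",            # blank line separates headers from the body
        body,
    ]
    return "\r\n".join(lines)

print(redirect_response("http://www.ics.mq.edu.au/~cassidy/"))
```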

HTTP Verbs

  • GET - get a resource (idempotent)
  • POST - send some data to a resource
  • HEAD - get headers for a resource
  • PUT - create a new resource
  • DELETE - delete a resource
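
The verbs can be illustrated with a toy in-memory store. Everything here (the resources dictionary, the handle function, the choice of status codes and the append behaviour of POST) is an assumption made for the example, not a description of any real server.

```python
resources = {}   # toy store: path -> content

def handle(method, path, body=None):
    """Return (status, body) for a request against the toy store."""
    if method in ("GET", "HEAD"):
        if path not in resources:
            return 404, None
        # HEAD answers like GET but leaves out the body
        return 200, (resources[path] if method == "GET" else None)
    if method == "PUT":
        created = path not in resources
        resources[path] = body
        return (201 if created else 200), None
    if method == "POST":
        # In this sketch POST appends data to an existing resource
        if path not in resources:
            return 404, None
        resources[path] += body
        return 200, resources[path]
    if method == "DELETE":
        if path not in resources:
            return 404, None
        del resources[path]
        return 204, None
    return 405, None   # method not allowed

handle("PUT", "/note", "hello")
print(handle("GET", "/note"))
print(handle("GET", "/note"))   # repeating a GET changes nothing
```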

Getting on the Web

  • Dedicated hosting (colo)
  • VM hosting
  • Shared hosting
  • Application hosting
  • New players:

Web Servers

Apache

Server Logs

  • Web servers receive information in request headers

  • This can be logged for later analysis

  • See the Platypus logs

  • Tools can analyse the logs to generate reports eg: analog, Google Analytics
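
A very small log analysis, assuming the server writes the Common Log Format. The sample lines are hypothetical, made up from values in the earlier examples, and the helper names are assumptions.

```python
import re
from collections import Counter

# Common Log Format: host ident user [time] "request" status bytes
LOG_LINE = re.compile(r'(\S+) (\S+) (\S+) \[([^\]]+)\] "([^"]*)" (\d{3}) (\S+)')

def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    m = LOG_LINE.match(line)
    if m is None:
        return None
    host, _ident, _user, time, request, status, size = m.groups()
    return {"host": host, "time": time, "request": request,
            "status": int(status), "size": size}

def hits_per_resource(lines):
    """Count requests per resource name."""
    counts = Counter()
    for line in lines:
        entry = parse_line(line)
        if entry:
            parts = entry["request"].split()
            if len(parts) >= 2:
                counts[parts[1]] += 1
    return counts

sample = [  # hypothetical log lines
    '127.0.0.1 - - [20/Mar/2006:05:33:32 +0000] "GET /~cassidy/ HTTP/1.1" 200 4111',
    '127.0.0.1 - - [20/Mar/2006:06:22:29 +0000] "GET /~steve/form.html HTTP/1.1" 200 253',
    '127.0.0.1 - - [20/Mar/2006:06:32:36 +0000] "GET /~cassidy/ HTTP/1.1" 200 4111',
]
print(hits_per_resource(sample).most_common())
```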

Write Your Own Webserver

import html
import socket

HOST = ''                 # Empty string: listen on all local interfaces
PORT = 50004              # Arbitrary non-privileged port
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.bind((HOST, PORT))
s.listen(1)

# Wait for a single client and read its request (Python 3: bytes to text)
conn, addr = s.accept()
data = conn.recv(4096).decode('iso-8859-1')
words = data.split()

...continued

if len(words) > 0 and words[0] == "GET":
    # Echo the request back inside an HTML page, escaping < > &
    page = """<html>
<head><title>Hello</title></head>
<body><p>Your request was:</p>
<pre>""" + html.escape(data) + """</pre>
</body>
</html>
"""

    header = """HTTP/1.0 200 OK
Content-Length: """ + str(len(page)) + """
Content-Type: text/html

"""
else:
    # Anything other than GET is answered with an error
    header = "HTTP/1.0 404 Not Found\n\n"
    page = ""

print(header + page)
conn.sendall((header + page).encode('iso-8859-1'))
conn.close()
Download the full script

Even Better...use Python Modules

# Python 3: these classes live in http.server (BaseHTTPServer and
# CGIHTTPServer in Python 2). CGIHTTPRequestHandler is deprecated in
# recent Python 3 releases.
from http.server import HTTPServer, CGIHTTPRequestHandler

server_address = ('', 8000)
handler = CGIHTTPRequestHandler
handler.cgi_directories = ['/cgi-bin']
httpd = HTTPServer(server_address, handler)

print("Starting server. Connect to http://localhost:8000/")

httpd.serve_forever()