A program to answer HTTP requests
Listen on a server port (80 by default)
Accept GET/HEAD/POST requests
Map resource name (URL) to a local resource
Retrieve local resource and send it back to client
http://online.mq.edu.au/pub/COMP249/lectureschedule.html
Resource name: /pub/COMP249/lectureschedule.html
Mapped to a local file system:
/home/httpd/html/pub/COMP249/lectureschedule.html
C:\Web\httpd\html\pub\COMP249\lectureschedule.html
http://online.mq.edu.au/pub/COMP249/
Resource name: /pub/COMP249/
Server must look for a default name in the given directory: index.html, index.htm, etc.
Settings are dependent on server configuration
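The mapping described above can be sketched in a few lines of Python 3. This is only an illustration, not how any particular server does it: the function name map_resource and the injected exists callable are made up for the example, the document root is the Unix path shown earlier, and the list of default names stands in for whatever the server configuration specifies.

```python
import posixpath
from urllib.parse import unquote

DOCUMENT_ROOT = "/home/httpd/html"           # document root from the example
DEFAULT_NAMES = ["index.html", "index.htm"]  # assumed; set by server config

def map_resource(resource, exists):
    """Map a resource name to a local path, or None if nothing matches.

    `exists` is a callable such as os.path.exists; passing it in lets the
    mapping be tried without touching a real file system.
    """
    # Decode %xx escapes and collapse "." and ".." segments so the
    # result cannot climb out of the document root.
    clean = posixpath.normpath("/" + unquote(resource).lstrip("/"))
    local = DOCUMENT_ROOT + ("" if clean == "/" else clean)
    if resource.endswith("/"):
        # Directory request: try each default name in turn.
        for name in DEFAULT_NAMES:
            candidate = local + "/" + name
            if exists(candidate):
                return candidate
        return None
    return local if exists(local) else None
```

With os.path.exists passed as the second argument, map_resource("/pub/COMP249/") would return the index.html path in that directory if the file is present.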
http://www.ics.mq.edu.au/~cassidy/
Resource name: /~cassidy/
Refers to the personal directory of a user
Look in the user's home directory for a given subdirectory: html (in ICS), public_html (also common).
Permissions:
Server runs as an untrusted user
Needs to be able to read and perhaps execute files in your html directory.
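As a rough sketch of the user directory rule, the Python 3 function below rewrites a /~user/ resource name into a path under that user's home directory. The /home prefix, the function name map_user_resource and the choice of public_html are assumptions for illustration, and the sketch does no checking of ".." segments or file permissions.

```python
import re

HOME_ROOT = "/home"       # assumed location of home directories
USER_DIR = "public_html"  # "html" on the ICS server described above

# /~user followed by an optional /rest/of/path
USER_RESOURCE = re.compile(r"^/~([A-Za-z0-9_][A-Za-z0-9_.-]*)(/.*)?$")

def map_user_resource(resource):
    """Map /~user/rest to HOME_ROOT/user/USER_DIR/rest, else None."""
    match = USER_RESOURCE.match(resource)
    if match is None:
        return None  # not a personal-directory request
    user = match.group(1)
    rest = match.group(2) or "/"
    return HOME_ROOT + "/" + user + "/" + USER_DIR + rest

print(map_user_resource("/~cassidy/"))
print(map_user_resource("/pub/COMP249/"))
```

The server process would still need read permission on the resulting file, which is why the html directory has to be readable by the server's user.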
http://www.smh.com.au/articles/2005/03/13/1110649055094.html
http://slashdot.org/article.pl?sid=05/03/13/1853233&tid=133&tid=186&tid=159
Server is free to find a resource any way it chooses
This includes finding it in a database or running a program to generate it.
In the SMH case the stories are likely to be stored in a database and served as needed, with other content added on the fly.
The Slashdot URL refers to a Perl script, which is run to generate the content. The text after the ? is a set of form variables, encoded as they would be for a GET form submission.
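To see how the Slashdot URL divides into a script name and form variables, the Python 3 standard library can split it apart. This is just a sketch of the decoding step; what the Perl script actually does with the variables is not something these notes can tell us.

```python
from urllib.parse import urlsplit, parse_qs

url = ("http://slashdot.org/article.pl"
       "?sid=05/03/13/1853233&tid=133&tid=186&tid=159")

parts = urlsplit(url)
script = parts.path                # the program the server runs
variables = parse_qs(parts.query)  # repeated names collect into lists

print(script)
print(variables)
```

Note that tid appears three times in the URL, so its value comes back as a list of three strings.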
http://ad.doubleclick.net/click;h=v2|30d0|0|0|%2a|l
;7516609;0-0;0;8856706;3454-728|90;4719404|4737300|1;
;%3fhttp://www.sun.com/emrkt/sunfirev20z/

http://ad.au.doubleclick.net/click%3Bh=v5|33ae|3|0|%2a
|h%3B27111491%3B0-0%3B0%3B12619400%3B1-468|60%3B14797496
|14815392|1%3B%3B%7Esscs%3D%3fhttp://www.energy.com.au/onit
Note that these are folded onto multiple lines for display purposes. Note the use of escape codes such as %3B (a semicolon) to include characters that are not otherwise allowed in a URL.
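The escape codes can be decoded and produced with the Python 3 standard library. The fragment below is taken from the start of the second URL; the rest is a small illustration rather than anything specific to DoubleClick.

```python
from urllib.parse import quote, unquote

escaped = "click%3Bh=v5|33ae|3|0|%2a|h%3B27111491"
decoded = unquote(escaped)  # %3B becomes ";" and %2a becomes "*"
print(decoded)

# Going the other way: with no characters marked safe, each of
# these punctuation characters is replaced by its %XX escape.
encoded = quote(";=?", safe="")
print(encoded)
```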
Some MIME types:
text/html, image/jpeg, audio/mpeg, application/xml, application/xhtml+xml, text/plain, application/cybercash, video/mp4, text/x-vcard, text/css, multipart/digest, chemical/x-genbank, video/quicktime, application/pdf
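A server usually picks the MIME type for a file from its extension. As a minimal sketch in Python 3, the standard mimetypes module does this lookup; the helper name content_type_for and the fallback to application/octet-stream are choices made for this example.

```python
import mimetypes

def content_type_for(filename):
    """Guess a Content-Type value from the file extension."""
    guessed, _encoding = mimetypes.guess_type(filename)
    # Fall back to a generic binary type when the extension is unknown.
    return guessed or "application/octet-stream"

for name in ["lectureschedule.html", "notes.txt", "paper.pdf",
             "data.unknownext"]:
    print(name, "->", content_type_for(name))
```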
GET /~cassidy/ HTTP/1.1
Host: www.ics.mq.edu.au
User-Agent: Mozilla/5.0 (X11; U; Linux i686 (x86_64); en-US; rv:1.7.12)
Gecko/20050922 Firefox/1.0.7 (Ubuntu package 1.0.7)
Accept: text/xml,application/xml,application/xhtml+xml,
text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Cookie: UserTrack=63B08C38-1234-0000-0000-00000000000000;
Note lines folded for display.
What does each of these headers mean? Which are required? Many are defined in the HTTP standard but others can be defined via the HTTP extension framework.
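One way to explore the headers is to split a raw request into its parts. The Python 3 sketch below uses a shortened copy of the request above with explicit CRLF line endings; parse_request is a name made up for this example and it ignores details such as repeated or folded header lines.

```python
def parse_request(raw):
    """Split a raw HTTP request into (method, target, version, headers)."""
    head = raw.split("\r\n\r\n", 1)[0]   # everything before the blank line
    lines = head.split("\r\n")
    method, target, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them in lower case.
        headers[name.strip().lower()] = value.strip()
    return method, target, version, headers

raw = ("GET /~cassidy/ HTTP/1.1\r\n"
       "Host: www.ics.mq.edu.au\r\n"
       "Accept-Language: en-us,en;q=0.5\r\n"
       "Connection: keep-alive\r\n"
       "\r\n")

method, target, version, headers = parse_request(raw)
# HTTP/1.1 requires a Host header on every request.
has_host = "host" in headers
print(method, target, version, has_host)
```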
HTTP/1.x 200 OK
Date: Mon, 20 Mar 2006 05:33:32 GMT
Server: Apache/2.0
Accept-Ranges: bytes
Content-Length: 4111
Keep-Alive: timeout=15, max=499
Connection: Keep-Alive
Content-Type: text/html
Content-Language: en
POST /~steve/form.html HTTP/1.1
Host: localhost
User-Agent: Mozilla/5.0 (X11; U; Linux i686 (x86_64); en-US; rv:1.7.12)
Gecko/20050922 Firefox/1.0.7 (Ubuntu package 1.0.7)
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,
text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Referer: http://localhost/~steve/form.html
Content-Type: application/x-www-form-urlencoded
Content-Length: 106
name=Steve+Cassidy&interests=This+is+a+field+with%0D%0Aquite+a+bit+
of+text%0D%0Athat+has+linebreaks.%0D%0A
Note lines folded for display.
This is a POST request. Note how the data is encoded in the request body.
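The body encoding can be reproduced with the Python 3 standard library. The field values below are what the form in this example would have contained, reconstructed from the encoded body; the sketch shows the encoding step only, not what a browser does internally.

```python
from urllib.parse import urlencode, parse_qs

fields = {
    "name": "Steve Cassidy",
    "interests": "This is a field with\r\nquite a bit of text\r\n"
                 "that has linebreaks.\r\n",
}

# Spaces become "+", and CR and LF become %0D and %0A.
body = urlencode(fields)
print(body)
print(len(body))  # this is the value sent in Content-Length

# A server reverses the process to recover the form variables.
decoded = parse_qs(body)
```

The length of the encoded body matches the Content-Length header in the request above.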
GET /~steve/form.html?name=Steve+Cassidy&interests=This+is+a+field+
with%0D%0Aquite+a+bit+of+text%0D%0Athat+has+linebreaks.%0D%0A HTTP/1.1
Host: localhost
User-Agent: Mozilla/5.0 (X11; U; Linux i686 (x86_64); en-US; rv:1.7.12)
Gecko/20050922 Firefox/1.0.7 (Ubuntu package 1.0.7)
Accept: text/xml,application/xml,application/xhtml+xml,
text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Referer: http://localhost/~steve/form.html
If-Modified-Since: Mon, 20 Mar 2006 06:22:29 GMT
If-None-Match: "4f42a9-fd-40f672edb1340"
Note lines folded for display.
This is the same form submitted via a GET request. Here the data is encoded in the request URL. Note also the If-Modified-Since header in this request, sent because my browser has just asked for the same resource and can reuse its copy if nothing has changed.
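A server can use If-Modified-Since to avoid resending a page the browser already holds. The Python 3 sketch below shows only the date comparison; conditional_status is a name invented for this example, the Last-Modified values are hypothetical, and a real server would also consider If-None-Match.

```python
from email.utils import parsedate_to_datetime

def conditional_status(if_modified_since, last_modified):
    """Return 304 if the cached copy is still current, otherwise 200."""
    cached = parsedate_to_datetime(if_modified_since)
    current = parsedate_to_datetime(last_modified)
    return 304 if current <= cached else 200

since = "Mon, 20 Mar 2006 06:22:29 GMT"  # from the request above

# Hypothetical modification times for the resource on the server:
print(conditional_status(since, "Mon, 20 Mar 2006 06:22:29 GMT"))
print(conditional_status(since, "Tue, 21 Mar 2006 09:00:00 GMT"))
```

A 304 Not Modified response carries no body, so the browser displays its cached copy.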
GET /~steve/ HTTP/1.1
Host: www.shlrc.mq.edu.au

HTTP/1.x 301 Moved Permanently
Date: Mon, 20 Mar 2006 06:32:36 GMT
Server: Apache/2.0.46 (Red Hat)
Location: http://www.ics.mq.edu.au/~cassidy/
Content-Length: 242
Connection: close
Content-Type: text/html; charset=iso-8859-1
Alternatively:
<meta http-equiv="refresh"
content="0; URL=http://my.new.site.com/">
The HTTP redirect is a server response that can be used to indicate that a resource has moved to a new location. An alternative is to include the above meta tag in the head of a page to force a redirect from the current page.
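To make the server side of a redirect concrete, here is a minimal Python 3 sketch that builds a 301 response by hand. The function name redirect_response and the short HTML body are made up for the example; a real server such as Apache generates its own body and extra headers.

```python
def redirect_response(location):
    """Build a minimal 301 response pointing the client at `location`."""
    body = '<p>Moved to <a href="%s">%s</a></p>\n' % (location, location)
    payload = body.encode("iso-8859-1")
    head = ("HTTP/1.1 301 Moved Permanently\r\n"
            "Location: %s\r\n"
            "Content-Type: text/html; charset=iso-8859-1\r\n"
            "Content-Length: %d\r\n"
            "Connection: close\r\n"
            "\r\n") % (location, len(payload))
    return head.encode("iso-8859-1") + payload

response = redirect_response("http://www.ics.mq.edu.au/~cassidy/")
print(response.decode("iso-8859-1"))
```

The browser reads the Location header and issues a new request for that URL; the body is only shown by clients that do not follow redirects.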
Web servers receive information in request headers
This can be logged for later analysis
See the Platypus logs
Tools can analyse the logs to generate reports, e.g. analog, Google Analytics
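A very small log analysis can be sketched in Python 3. This assumes the server writes the Common Log Format used by Apache, which may not match the logs referred to here, and the sample entries are made up purely for illustration.

```python
import re
from collections import Counter

# Common Log Format: host ident user [time] "request" status bytes
LOG_LINE = re.compile(
    r'^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+)[^"]*" (\d{3}) (\S+)$')

def count_requests(lines):
    """Count successful (2xx) requests per resource name."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match and match.group(5).startswith("2"):
            hits[match.group(4)] += 1
    return hits

sample = [  # made-up entries, not real log data
    '192.0.2.1 - - [20/Mar/2006:16:33:32 +1100] "GET /~cassidy/ HTTP/1.1" 200 4111',
    '192.0.2.7 - - [20/Mar/2006:16:34:02 +1100] "GET /~cassidy/ HTTP/1.1" 200 4111',
    '192.0.2.7 - - [20/Mar/2006:16:34:05 +1100] "GET /missing.html HTTP/1.1" 404 242',
]

hits = count_requests(sample)
print(hits.most_common(1))
```

Tools like analog do the same kind of counting at a much larger scale and add breakdowns by host, time and referrer.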
import socket

HOST = ''     # Symbolic name meaning the local host
PORT = 50004  # Arbitrary non-privileged port
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.bind((HOST, PORT))
s.listen(1)

# Wait for one client to connect, then read its request
conn, addr = s.accept()
data = conn.recv(4096).decode('iso-8859-1')
words = data.split()

if len(words) > 0 and words[0] == "GET":
    # Echo the request back inside a small HTML page
    page = """<html>
<head><title>Hello</title></head>
<body><p>Your request was:</p>
<pre>""" + data + """</pre>
</body>
</html>
"""
    header = """HTTP/1.0 200 OK
Content-Length: """ + str(len(page)) + """
Content-Type: text/html

"""
else:
    header = "HTTP/1.0 404 Not Found\n\n"
    page = ""

print(header + page)
conn.sendall((header + page).encode('iso-8859-1'))
conn.close()
# Python 3 version: BaseHTTPServer and CGIHTTPServer were merged into http.server.
# Note that CGIHTTPRequestHandler is deprecated in recent Python 3 releases.
from http.server import HTTPServer, CGIHTTPRequestHandler

server_address = ('', 8000)
handler = CGIHTTPRequestHandler
handler.cgi_directories = ['/cgi-bin']
httpd = HTTPServer(server_address, handler)

print("Starting server. Connect to http://localhost:8000/")

httpd.serve_forever()