Friday, February 13, 2015

HTTP Fundamentals

Chapter 1 : HTTP Resources


- http://food.com/recipes/sandwich
  • URL scheme: ("http" in the above address). The part before "://"; it defines how to access a particular resource. Other schemes include https, ftp and mailto (e.g. mailto:abhijeet.nagre@gmail.com, which has no "//"). Everything after the scheme is specific to that scheme.
  • Host: ("food.com" in the above address). DNS maps the host name to an IP address.
  • URL Path: ("/recipes/sandwich" in the above address).

- http://food.com:80/recipes?type=breakfast
  • Port: ("80" in the above address). 80 is the default for HTTP and 443 is the default for HTTPS. This is the port on which the server is listening.
  • Query String: Preceded by "?" ("type=breakfast" in the above address). It is up to the host application to interpret this. Mostly used to pass multiple name-value pairs separated by &.
  • Fragment: Preceded by #. Not processed by the server; handled by the client only. Identifies a particular element in the HTML which the client should focus on.
- Having keywords in the URL is good for search engine optimization.
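
The URL parts listed above can be pulled apart with Python's standard urllib.parse module. This is only a sketch: the describe_url helper and the extra "&page=2#top" in the sample address are assumptions added for illustration.

```python
from urllib.parse import urlsplit, parse_qs

def describe_url(url):
    """Split a URL into the parts named in the notes above."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,              # None when the URL omits the port
        "path": parts.path,
        "query": parse_qs(parts.query),  # name -> list of values
        "fragment": parts.fragment,      # never sent to the server
    }

# Hypothetical address: the query and fragment were extended for the demo.
info = describe_url("http://food.com:80/recipes?type=breakfast&page=2#top")
for name, value in info.items():
    print(name, "=", value)
```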

- URL Safe characters: A URL can contain only safe characters.
The following safe characters can appear in a URL:
            * Upper case and Lower case letters (a-z AND A-Z)
            * Numbers (0-9)
            * $_-.+*'(),

- URL Encoding: If a URL contains an unsafe character, it can be percent (%) encoded (also known as URL Encoding), i.e. the character is replaced with "%" followed by its two-digit hexadecimal ASCII code. So ! can be sent as "%21", since 21 is its ASCII code in hexadecimal (33 in decimal).
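
As a quick check of the rule above, Python's urllib.parse.quote performs percent encoding. The sample string is an assumption chosen for the demo, and safe="" asks quote to also encode "/", which it would otherwise leave alone.

```python
from urllib.parse import quote, unquote

raw = "ham & cheese!"            # hypothetical value to put in a URL
encoded = quote(raw, safe="")    # space -> %20, & -> %26, ! -> %21
print(encoded)
print(unquote(encoded))          # decoding restores the original text

# The code after "%" is the character's ASCII value written in hexadecimal.
print(format(ord("!"), "02X"))
```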

- Content Type of Resource: The server tells the client the type of the content. Thus the client knows whether the resource is an image, a video, text or something else. Content types are specified as per the MIME standard (e.g. text/html, image/jpeg).

- Content Negotiation: A resource identified by a single URL can have multiple representations, e.g. in different languages (the same recipe in French, English etc.) or in different formats such as HTML, MS Word or PDF. When a client makes a request, it can specify the media types it accepts (via the Accept header). A piece of code written in JavaScript can ask for a JSON representation of a resource, whereas a piece of code written in C# can ask for an XML representation of the same resource at the same URL.
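
A rough sketch of how a server might choose a representation from the client's Accept header. The pick_representation function is hypothetical and handles only exact media types, "*/*" and the q (quality) parameter; real servers follow the fuller matching rules of the HTTP specification.

```python
def pick_representation(accept_header, available):
    """Return the best media type from `available`, or None if nothing matches."""
    preferences = []
    for item in accept_header.split(","):
        fields = item.strip().split(";")
        media_type = fields[0].strip()
        quality = 1.0                      # default when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.strip().partition("=")
            if name == "q":
                quality = float(value)
        preferences.append((quality, media_type))
    # Highest q first; sorted() is stable, so ties keep the header order.
    for quality, media_type in sorted(preferences, key=lambda p: -p[0]):
        if quality <= 0:
            continue
        if media_type in available:
            return media_type
        if media_type == "*/*":
            return available[0]            # client accepts anything
    return None

formats = ["application/xml", "application/json"]
print(pick_representation("application/json", formats))
print(pick_representation("application/json;q=0.5, application/xml;q=0.9", formats))
```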


Chapter 2 : HTTP Messages


- HTTP messages come in pairs (Request message and Response message). Information in the message is all in readable text.


- There are tools which show the HTTP requests and responses going to and from your computer. Fiddler is one such tool. Most browsers also provide such a view.

- First line of the HTTP message (both Request and Response) is always explicit about its intent.
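
To illustrate, the first line of a request names the method, the target and the HTTP version, while the first line of a response names the version, the status code and a reason phrase. The parse_start_line helper below is a hypothetical sketch, not a full HTTP parser.

```python
def parse_start_line(line):
    """Parse the first line of an HTTP request or response message."""
    parts = line.strip().split(" ", 2)
    if parts[0].startswith("HTTP/"):
        # Response: version, status code, optional reason phrase.
        reason = parts[2] if len(parts) > 2 else ""
        return {"kind": "response", "version": parts[0],
                "status": int(parts[1]), "reason": reason}
    # Request: method, target, version.
    method, target, version = parts
    return {"kind": "request", "method": method,
            "target": target, "version": version}

print(parse_start_line("GET /recipes/sandwich HTTP/1.1"))
print(parse_start_line("HTTP/1.1 404 Not Found"))
```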

- HTTP Methods
 Method   Description
 GET      Retrieve a Resource
 POST     Update a Resource
 PUT      Store a Resource
 DELETE   Delete a Resource
 HEAD     Retrieve the headers for a Resource

- Even though the HTTP specification defines the above methods, GET and POST are the most commonly used; the others are used far less often.

- If you are writing an HTTP web service, you might want to use the HTTP PUT and DELETE methods. Be careful, as a few server-side technologies and pieces of hardware do not support these methods.

- POST method is used when browser needs to send some information to Server.

- There is a part of the HTTP specification which describes Safe and Unsafe methods.
- Safe methods let you read a resource from the server. They don't modify resources on the server. GET and HEAD fall in this category. A GET operation should never have a side effect on the web server.
- Unsafe methods are the ones which let you change resources on the web server. PUT, POST and DELETE fall in this category.
- If an HTML page has a form and the contents of the form have already been POSTed, the browser will warn the user before refreshing that page, because a refresh would repeat the unsafe request.

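- Post-Redirect-Get Pattern (aka PRG): When a POST request has been processed by the server, the server responds with a redirect to another page, which the browser then fetches with an HTTP GET. Refreshing that page repeats only the safe GET, not the POST.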
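
A minimal sketch of the Post-Redirect-Get idea without a real web server. The handle function, the "/orders" paths and the use of status 303 (See Other) are assumptions made for the illustration; many sites use 302 for the same purpose.

```python
def handle(method, path, orders):
    """Return (status, headers, body) for a tiny hypothetical order form."""
    if method == "POST" and path == "/orders":
        orders.append("sandwich")          # the unsafe step happens here
        return 303, {"Location": "/orders/thanks"}, ""
    if method == "GET" and path == "/orders/thanks":
        body = f"Orders so far: {len(orders)}"
        return 200, {"Content-Type": "text/plain"}, body
    return 404, {}, ""

orders = []
status, headers, _ = handle("POST", "/orders", orders)

# The browser follows the redirect with a GET; refreshing repeats only this GET.
for _ in range(3):
    page = handle("GET", headers["Location"], orders)

print(status, headers["Location"])
print(page[2])
```

However often the result page is refreshed, the order is recorded only once, because the refresh never resubmits the POST.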

- FORM GET vs FORM POST
A Form is generally used with POST method. But it can be used with GET method as well. For e.g.
     
<form action="results.cshtml">
    <input name="q" placeholder="search" type="search" />
    <input type="submit" />
</form>

Because the form above has no method attribute, it defaults to GET. When it is submitted, a GET message is sent to the server at URL path results.cshtml, and the value of the search input is appended to the URL as a query string with name q (the input box's name). In contrast, when the contents of a form are sent to the server using a POST message, the values entered in the input boxes are sent in the message body and not as a query string.
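
The difference can be sketched with urllib.parse.urlencode, which produces the same name=value encoding a browser uses for a default form. The search text "grilled cheese" is an assumed example value.

```python
from urllib.parse import urlencode

fields = {"q": "grilled cheese"}     # hypothetical text typed into the search box

# Form GET: the encoded fields become the query string of the URL.
get_url = "results.cshtml?" + urlencode(fields)

# Form POST: the URL stays clean and the same encoded fields travel in the body.
post_url = "results.cshtml"
post_body = urlencode(fields)

print("GET ", get_url)
print("POST", post_url, "body:", post_body)
```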

- HTTP Status Code Categories


Range   Category
100-199 Informational
200-299 Successful
300-399 Redirection
400-499 Client Error
500-599 Server Error
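
The table above maps directly onto the first digit of the status code. A small sketch, with status_category as a hypothetical helper name:

```python
def status_category(code):
    """Map an HTTP status code to its category from the table above."""
    categories = {
        1: "Informational",
        2: "Successful",
        3: "Redirection",
        4: "Client Error",
        5: "Server Error",
    }
    if not 100 <= code <= 599:
        raise ValueError(f"not an HTTP status code: {code}")
    return categories[code // 100]

for code in (200, 302, 404, 500):
    print(code, status_category(code))
```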

- Fiddler (fiddler2.com)



Chapter 3 : HTTP Connections


- Browsers implement the HTTP protocol, i.e. a browser acts as the HTTP initiating agent and sends HTTP messages using (mostly) TCP.


- TCP: Does Flow control i.e. ensures that sender doesn't send messages too fast for the receiver to process them.

- Wireshark: Can be used for deeper analysis than Fiddler allows. Fiddler shows the HTTP messages exchanged between the browser and the web server. Wireshark also shows the TCP handshakes and the messages transferred at the lower layers involved in the HTTP message transfer, i.e. TCP, IP and the Data Link layer.

- If a web server doesn't allow persistent connections, then it must include the header "Connection: close" in the response. Shared hosts would generally do this.



Chapter 4 : HTTP Architecture


- URL doesn't mention which HTTP method (GET,POST etc..) is to be used.


- All the information required to complete HTTP transaction is contained in HTTP messages.

- HTTP Proxy: 

Proxy server can
            * Act as Access Control device. e.g. Filter all traffic going to Facebook.com.
            * Strip confidential data out of HTTP messages.
            * Create Audit Trail on Traffic          

- Forward Proxy: is closer to client than to Web Server.
            * Forward Proxy requires some configuration in client software or Browser.
            * Forward Proxy provides service to some limited set of users. e.g employees of company or users of ISP.

- Reverse Proxy: is closer to Server than to client
            * Completely transparent to client.
            * All the requests coming to the Web Server come through the Reverse Proxy.
            * Proxy server can reduce load on the Web Server by providing services like Compression, HTTP message logging etc.

- Services provided by Web Proxy Server
            * Load balancing. Some Proxy Servers can look at how much CPU and Memory a server is using and distribute load based on that.
            * SSL acceleration: Encrypt and decrypt HTTP messages on behalf of the Web Server
            * Security: Filter out dangerous HTTP Messages.
            * Caching Proxies: Cache HTTP Response Messages

- Fiddler works by installing itself as Proxy on the machine. Thus it can intercept all HTTP traffic.


- HTTP Headers for Caching 
            * Cache-Control:
            * Expires: From HTTP 1.0; superseded by Cache-Control: max-age in HTTP 1.1 but still used for backward compatibility
            * Pragma: From HTTP 1.0; superseded by Cache-Control in HTTP 1.1 but still used for backward compatibility

- Values for HTTP Cache-Control Header

            * Public: A response for everyone
            * Private: A response for specific user.
            * no-cache: Don't reuse a cached copy without first revalidating it with the server
            * no-store: Don't store the response at all (i.e. you never saw this response)

- Caching
            * Public Cache: Shared among multiple users. Generally 
                  resides on a Proxy Server.
            * Private Cache: Web browsers cache HTTP messages marked
                  as Private on the user's disk. 
                  Internet Explorer stores its cache at "Windows\Temporary Internet Files".
                  Chrome's cache files can be found at "chrome://cache/".
            * An HTTP GET message is a safe message, so it can be considered for caching.
                  PUT, POST and DELETE are unsafe messages, so they are not considered for caching.
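
The rules above can be sketched as a small decision function. This is a deliberate simplification: may_store is a hypothetical helper that looks only at the method and at the private and no-store directives, whereas real caches apply many more rules.

```python
def may_store(method, cache_control, shared_cache):
    """Decide whether a cache may keep a response (simplified rules from the notes)."""
    directives = {
        part.strip().lower().split("=")[0]
        for part in cache_control.split(",")
        if part.strip()
    }
    if method != "GET":
        return False        # unsafe methods are not considered for caching
    if "no-store" in directives:
        return False        # must not be stored anywhere
    if shared_cache and "private" in directives:
        return False        # private responses stay out of proxy caches
    return True

print(may_store("GET", "public, max-age=60", shared_cache=True))
print(may_store("GET", "private", shared_cache=True))
print(may_store("GET", "private", shared_cache=False))
```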

- ASP.Net Cache Control Headers
            * Response.Cache.SetCacheability(..)
            * Response.Cache.SetExpires(..)

- The server sends a "Last-Modified" header with a response. On a later request the client sends that timestamp back in an "If-Modified-Since" header to ask whether the resource has changed since then. If the resource has not changed, the server sends an HTTP response with status code 304 (Not Modified), which means the client can use its cached copy.
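
A sketch of the server side of this exchange: the server stamps responses with Last-Modified, the client echoes that timestamp in If-Modified-Since, and the server answers 304 when nothing has changed. The conditional_get function and the sample timestamps are assumptions for illustration.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime

def conditional_get(last_modified, if_modified_since=None):
    """Return (status, headers) for a GET on a resource changed at `last_modified`."""
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}
    if if_modified_since is not None:
        client_copy_time = parsedate_to_datetime(if_modified_since)
        if last_modified <= client_copy_time:
            return 304, headers      # Not Modified: client may use its cached copy
    return 200, headers              # send the full resource

# Hypothetical resource, last changed at this UTC time.
changed = datetime(2015, 2, 13, 10, 0, 0, tzinfo=timezone.utc)

first_status, first_headers = conditional_get(changed)
second_status, _ = conditional_get(changed, first_headers["Last-Modified"])
print(first_status, first_headers["Last-Modified"])
print(second_status)
```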



Chapter 5 : HTTP Security

- Some load balancers support Sticky Sessions, i.e. HTTP requests belonging to a session are sent to the same server.

- HTTP itself is stateless; state management is layered on top of it using Cookies.

- Cookies: The server sends state information to the browser using the Set-Cookie header. Subsequent requests made by the browser contain this cookie in the Cookie header.

- Session Cookie vs Persistent Cookie: Session cookie is discarded when Browser is closed whereas Persistent cookies are not discarded when Browser is closed. Persistent Cookie needs to have an Expires value.
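
The two kinds of cookie can be sketched with Python's standard http.cookies module. The cookie names, values and expiry date are made up for the example.

```python
from http.cookies import SimpleCookie

cookie = SimpleCookie()
cookie["session_id"] = "abc123"      # session cookie: no Expires, gone when the browser closes
cookie["theme"] = "dark"
cookie["theme"]["expires"] = "Fri, 13 Feb 2026 10:00:00 GMT"   # persistent cookie

# What the server would put in its response headers.
for morsel in cookie.values():
    print("Set-Cookie:", morsel.OutputString())

# What the server reads back from the browser's next request.
returned = SimpleCookie("session_id=abc123; theme=dark")
print(returned["session_id"].value, returned["theme"].value)
```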


- HTTP follows a Challenge Response format for authentication. When a client asks for a secure resource, the server returns a 401 Unauthorized response whose WWW-Authenticate header says which authentication protocol is to be used. The client then asks the user for credentials (username and password) and sends another request to the server with the credentials in the Authorization header. All subsequent requests carry the Authorization header, which contains the credentials.

- HTTP doesn't dictate how credentials are validated by Server.

- HTTP specification mentions two authentication protocols i.e. Basic and Digest.

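- Basic Authentication: Sends the username and password to the server as a Base64 encoded string (via the Authorization header). Base64 is an encoding, not encryption, so anyone who sniffs the message can decode the credentials. Thus this is very unsafe over plain HTTP.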
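
A short sketch of why Basic Authentication offers no secrecy: the header is only Base64, which anyone can reverse. The basic_auth_header helper is hypothetical; the sample credentials are the well-known example from the HTTP authentication RFC.

```python
import base64

def basic_auth_header(username, password):
    """Build the value of the Authorization header for Basic Authentication."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")

header = basic_auth_header("Aladdin", "open sesame")
print("Authorization:", header)

# A sniffer needs no key to recover the credentials.
recovered = base64.b64decode(header.split(" ", 1)[1]).decode("utf-8")
print(recovered)
```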

- Digest Authentication: This is similar to Basic Authentication except that the client doesn't send the plain text username and password to the server. The client applies an MD5 hash to the username, password and a server-supplied value (nonce) and sends the result to the server. Thus a sniffer cannot read the password directly from the message.

- Forms Authentication: The application has complete control over how authentication is managed. When a client requests a secure resource, the server redirects it to a login page using an HTTP 302 temporary redirect. The login page lets the user enter credentials, which are POSTed to the server. The response will also set a cookie indicating that the user is authenticated.

- In Forms Authentication credentials are sent in plain text, so it is necessary to use HTTPS (Secure HTTP).
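
A sketch of the hash a client sends in Digest Authentication, using the original scheme without the optional qop extension. The function names and all sample values are assumptions for illustration; the realm and nonce would come from the server's 401 challenge.

```python
import hashlib

def md5_hex(text):
    return hashlib.md5(text.encode("utf-8")).hexdigest()

def digest_response(username, realm, password, method, uri, nonce):
    """Compute the digest 'response' value (no qop) sent instead of the password."""
    ha1 = md5_hex(f"{username}:{realm}:{password}")   # who the user is
    ha2 = md5_hex(f"{method}:{uri}")                  # what is being requested
    return md5_hex(f"{ha1}:{nonce}:{ha2}")            # tied to the server's nonce

# Hypothetical challenge values and credentials.
first = digest_response("alice", "recipes", "s3cret", "GET", "/recipes", "nonce-1")
second = digest_response("alice", "recipes", "s3cret", "GET", "/recipes", "nonce-2")
print(first)
print(second)
```

Because the nonce changes between challenges, the same password yields a different value each time, which limits the usefulness of a captured response.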

- Secure HTTP (HTTPS) is HTTP sent over SSL or its successor TLS (Transport Layer Security). HTTP messages are encrypted before they are sent. It uses the https scheme in the URL instead of the regular http scheme. The default port for the http scheme is 80 and the default port for https is 443.


- HTTPS adds a layer in between the Application Layer and the Transport Layer. HTTPS requires the server to have a cryptographic certificate. This certificate is sent to the client during setup of the HTTPS connection. The certificate includes the server's host name. Certificates are issued by providers like Verisign and use public and private keys. Administrators have to purchase certificates from certificate providers and install them on the server.

- HTTPS encrypts everything in the HTTP message (URL path, cookies, headers, body); only the host name remains visible. This avoids session hijacking, as no eavesdropper can read the session cookie. The client can use the certificate to validate (authenticate) the host. HTTPS does not authenticate the client, so some authentication mechanism is still required to authenticate the client. HTTPS makes the client authentication protocol secure, as it encrypts the username, password and authentication cookies in the HTTP messages. Clients can also authenticate using a client-side certificate, but this is rarely used.

- HTTPS Downsides: 
           * Performance: Encryption costs CPU time. Large sites use specialized hardware called SSL accelerators.
           * Performance: Connection setup takes longer, as additional handshakes 
                  are required.
           * Responses cannot be stored in a Public Cache; however, clients can keep them in their 
                 private cache.

- OpenID: A standard for decentralized authentication. Users do not have to create separate passwords for various web sites. Also, every web site doesn't have to manage authentication itself; it can delegate authentication management to an identity provider. The identity provider stores and validates the identity.



Follow up 
Architectural Styles and the Design of Network-based Software Architectures
- Sticky Sessions
