
How Security May Impact Web App Performance: A Trek into HTTP Caching

In this post I will go over some interesting HTTP topics: HTTP caching headers, SSL and Fiddler, and how each of them can impact your web application's performance (yes, just running Fiddler on your machine can affect your application's caching behavior!).
The main purposes of this post are to help other developers who have encountered the issues covered here, and to raise awareness of the relationship between security and HTTP caching.
For .NET stack developers: don't be disappointed when I mention Spring MVC, it is just my current REST provider (long live IIS!); the essence of the post should be interesting regardless.
NOTE:
When I mention the 'browser', I am referring by default to Chrome version 51.0.2704.84.

Basic HTTP Caching Headers Rules

I won't fully cover basic HTTP caching topics like Etag/Last-Modified, no-store, no-cache etc., but before starting let's have a very quick reminder of the basic HTTP caching concepts:
  • no-cache - the cached copy must be revalidated against the server before every use
  • max-age - the cached copy should be revalidated after the validity period expires (the client may apply heuristics in order to enforce some optimized behavior)
  • must-revalidate - the cached copy must be revalidated once the validity period expires; it may never be served stale

These descriptions lead to the following logical conclusions:
  • no-cache = must-revalidate + max-age=0
  • combining no-cache with max-age=0 is redundant and doesn't make any sense

NOTE:
'no-cache' is (or may be) interpreted by some browsers as an instruction not to cache any resource locally, i.e. practically identical to 'no-store'.
'must-revalidate' with 'max-age=0' is a good alternative for enforcing the requested behavior across all browsers.
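The directive semantics above can be sketched as a tiny decision function. This is a minimal illustration, not any browser's real cache logic, and the function name and parsing are my own invention:

```python
# Illustrative only: decide whether a cached copy of a given age must be
# revalidated against the server before use, based on Cache-Control.

def must_revalidate_before_use(cache_control: str, age_seconds: int) -> bool:
    """Return True if the cached copy must be revalidated before use."""
    directives = {}
    for part in cache_control.split(","):
        name, _, value = part.strip().partition("=")
        directives[name.lower()] = value
    if "no-cache" in directives:           # always revalidate first
        return True
    max_age = int(directives.get("max-age", 0))
    return age_seconds >= max_age          # stale once max-age elapses

# 'no-cache' behaves like 'max-age=0, must-revalidate':
assert must_revalidate_before_use("no-cache", 0) is True
assert must_revalidate_before_use("max-age=0, must-revalidate", 0) is True
# A copy still inside its max-age window needs no round-trip:
assert must_revalidate_before_use("max-age=600", 30) is False
```

Note how the first two assertions demonstrate the equivalence stated above: both headers force revalidation immediately.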

Etag & Last-Modified and Cache-Control

Suppose you want to set your HTTP caching headers in a way that causes the browser to re-validate your static resources (e.g. images, js, html files etc.) against the server, to make sure you always get the latest version of each file.
There are several techniques to accomplish this, but the simplest is to instruct the server to calculate the file's last modified date (or some other validation string) and send the result as the Etag/Last-Modified HTTP headers, so that the browser re-sends these values to the server on each further request for validation purposes.
If the resource has not been altered since the server last served it, the server responds with a 304 (Not Modified) status, instructing the browser to use the version in its local cache; otherwise a 200 status with the fresh resource is served to the client. (At this stage I am ignoring the performance cost of the redundant round-trips this solution may cause.)
When using this popular strategy we must be very cautious about the additional headers, seemingly unrelated at first sight, that we do or do not send to the browser together with Etag/Last-Modified.
For example, sending no instruction at all about how to handle the local cache, in other words omitting the Cache-Control header entirely, can cause some unexpected browser behavior.
In this case the browser actually behaves differently based on the specific way the user triggers the page request. There are several options:
  1. Pressing F5 - In this case the browser executes a round-trip to the server, including the Etag/Last-Modified values previously received from the server, as expected.

  2. Pressing CTRL+F5 - In this case the browser executes a clean round-trip to the server, excluding the Etag/Last-Modified values previously received, causing the server to serve a brand new resource version, as expected.

  3. Clicking links on the page, or clicking the address bar and pressing ENTER - In this case the browser won't execute any HTTP request; instead, the locally cached content is used.

    This behavior, although it does not contradict the W3 specification (which allows browser heuristics in this case), is strange: the normal way users browse within a web application is by clicking the links and menus the application exposes, which means resource re-validation is completely skipped, probably not what we expect. The relevant part of the specification reads:

    "If none of Expires, Cache-Control: max-age, or Cache-Control: s-maxage (see section 14.9.3) appears in the response, and the response does not include other restrictions on caching, the cache MAY compute a freshness lifetime using a heuristic."

  4. Clicking the Back button - I will cover the Back button later on.
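The server-side validation flow described above can be sketched in a few lines. This is a hedged illustration of the mechanism, not Spring's actual implementation; the helper names and the MD5 choice are mine:

```python
import hashlib

# Illustrative sketch: derive an ETag from the resource bytes and answer a
# conditional request with 304 when the client's If-None-Match still matches.

def compute_etag(body: bytes) -> str:
    return '"%s"' % hashlib.md5(body).hexdigest()

def respond(body: bytes, if_none_match=None):
    """Return (status, body, etag) for a request for this resource."""
    etag = compute_etag(body)
    if if_none_match == etag:
        return 304, b"", etag          # client's cached copy is still valid
    return 200, body, etag             # serve the fresh body with a new ETag

resource = b"console.log('hello');"
status, _, etag = respond(resource, None)    # first, unconditional request
assert status == 200
status, body, _ = respond(resource, etag)    # revalidation round-trip
assert status == 304 and body == b""
```

The 304 branch is what saves bandwidth: only headers cross the wire, and the browser reuses its local copy.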

Another implication to pay attention to is the combination of the Etag/Last-Modified headers with the 'no-store' directive of the Cache-Control header.
'no-store' instructs the browser never to cache the resource for any reason. In this ambiguous scenario (Etag/Last-Modified is meaningless without a previously cached local copy) the browser behaves cautiously and respects 'no-store' over Etag/Last-Modified; as a result the Etag/Last-Modified headers are ignored and a brand new request is executed by the browser.
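The client-side resolution of that ambiguity can be sketched as follows; the function is a hypothetical illustration of the rule, not any browser's real internals:

```python
# Illustrative only: with 'no-store' the response was never cached, so there
# is no stored validator to send back and the next request is unconditional.

def build_request_headers(prev_cache_control: str, prev_etag: str) -> dict:
    if "no-store" in prev_cache_control:
        return {}                        # nothing was cached: plain GET
    return {"If-None-Match": prev_etag}  # conditional GET for revalidation

assert build_request_headers("no-store", '"2740050219"') == {}
assert build_request_headers("no-cache", '"2740050219"') == {
    "If-None-Match": '"2740050219"'}
```

In other words, 'no-store' wins: the validators the server carefully computed are simply thrown away.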

Spring MVC - Secured and Non Secured Resources

When using Spring MVC as a REST provider, we need to make sure our Spring Security configuration doesn't influence the HTTP caching headers sent by the server.
Resources flagged as NON-SECURE in the security configuration file, e.g. <http pattern="/styles/**" security="none"/> (the client can access those resources without performing any authentication), are served by default with Etag/Last-Modified headers but without any caching (Cache-Control) instruction, causing the behavior mentioned above, which differs based on the exact way the user triggered the request.
On the other hand, resources not flagged as NON-SECURE, meaning SECURED resources, are delivered by Spring by default with Etag/Last-Modified headers in conjunction with the 'no-store' directive. As explained above, the browser will ignore the Etag/Last-Modified headers and prevent any local caching.
Spring's strategy of adding the 'no-store' directive on secured resources is understandable from a security point of view, trying to prevent sensitive data leaks, but then we would expect the Etag/Last-Modified headers to be omitted, as there is no point asking the server whether to use the locally cached version of a resource the browser was never allowed to save.
Anyway, both options seem to miss our goal (remember, we want to trigger an Etag/Last-Modified request and re-validate our resources against the server).
In order to solve this we need to programmatically set the HTTP headers to meet our needs. I won't cover the technical way to do that via Spring Filters (the Internet is flooded with samples), but my final result was the following: Cache-Control 'no-cache' (a shorthand for 'max-age=0, must-revalidate') plus calculated Etag/Last-Modified values.
Now the browser is instructed to invalidate its cache immediately and is forced to re-validate against the server BEFORE using its locally cached version.
The important thing to remember when using Etag/Last-Modified is to omit 'no-store' (which explicitly prevents any client-side caching) and to add explicit Cache-Control instructions for the browser, in order to prevent unexpected behaviors.
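To make the target header set concrete, here is a small sketch of the headers the filter should emit. The header names on the left are real HTTP headers; the helper function itself is hypothetical and the hashing/date choices are mine, not Spring's:

```python
import hashlib
from email.utils import formatdate

# Illustrative only: the header combination described above, computed from
# the resource bytes and its last-modified timestamp.

def caching_headers(body: bytes, mtime: float) -> dict:
    return {
        "Cache-Control": "no-cache",  # i.e. max-age=0, must-revalidate
        "ETag": '"%s"' % hashlib.sha1(body).hexdigest(),
        "Last-Modified": formatdate(mtime, usegmt=True),
        # crucially, NO 'no-store': that would discard the local copy
        # and make the two validators above pointless
    }

headers = caching_headers(b"body { margin: 0 }", 1131278400.0)
assert headers["Cache-Control"] == "no-cache"
assert "no-store" not in headers["Cache-Control"]
```

Whatever framework you use, the end state on the wire should look like this: validators present, 'no-cache' present, 'no-store' absent.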

HTTP vs HTTPS

The solution above worked fine for me over a non-secured HTTP connection to my development server, but when I tried it over a secured SSL connection I found that the browser behavior was not consistent. Sometimes the Etag/Last-Modified headers were sent to the server as expected, but sometimes the browser omitted those headers, causing a brand new request to the server and a fresh download of the resource. This is not the behavior we expected.

Self-Signed Certificate

While digging for an explanation of this issue I encountered this and this Chromium discussion, which state that when the browser encounters a certificate exception it will never save any resource in its cache. That means that using a self-signed certificate prevents saving any resource in the browser cache, because self-signed certificates, by definition, cause at least an Untrusted exception (unless we explicitly move the certificate into the Trusted section).
I must say that I like this way of thinking and the way the Chromium guys try to protect our resources when the server is not fully trusted; what I don't like is their inconsistency.
For example, the behavior explained above when clicking links on the page or clicking the address bar and pressing ENTER, which leads to local resource caching, was spotted even when using a self-signed HTTPS certificate. Let's see another example:

First fresh request:
GET /mysource-ss.js HTTP/1.1
Host: 1.1.1.1:8083
Connection: keep-alive
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.84 Safari/537.36
Accept: */*
Referer: https://1.1.1.1:8083/some-page.html
Accept-Encoding: gzip, deflate, sdch, br
Accept-Language: en-US,en;q=0.8,he;q=0.6

First server response:
HTTP/1.1 200 OK
Cache-Control: max-age=315360000
Last-Modified: Sun, 06 Nov 2005 12:00:00 GMT
ETag: 2740050219
Expires: Fri, 12 Jun 2026 18:20:37 GMT
Content-Encoding: gzip
Content-Type: text/javascript;charset=UTF-8
Transfer-Encoding: chunked
Date: Sun, 12 Jun 2016 18:20:37 GMT

Second Request:
The resource should have been retrieved from the local browser's cache, but instead a brand new request is executed by the browser.

GET /mysource-ss.js HTTP/1.1
Host: 1.1.1.1:8083
Connection: keep-alive
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.84 Safari/537.36
Accept: */*
Referer: https://1.1.1.1:8083/some-page.html
Accept-Encoding: gzip, deflate, sdch, br
Accept-Language: en-US,en;q=0.8,he;q=0.6

But sometimes, after several refreshes, the browser does serve the resource directly from its cache. You'll have to take my word on this one!
After deploying the certificate into the Trusted store, the browser consistently obeyed the max-age instruction by serving the resource from its local cache.
Apparently we need to amend the Chromium statement: when the browser encounters a certificate exception, the browser may not save any resource in its cache, depending on some heuristics.
Another interesting behavior I noticed is that even though, as explained, resources are sometimes cached locally despite the formal statement, when using an invalid certificate the browser never caches large files, dramatically increasing the amount of data re-downloaded per page and, as a result, degrading the entire application's performance.
I am sure the Chrome developers had good reasons to handle communication over invalid certificates, such as self-signed ones, the way they do, but I was not able to figure out their heuristics, and at the end of the day I could not design my HTTP caching mechanism to consistently meet our sample's needs when using a self-signed certificate.

Never go Back

The following W3 specification dictates the Back button behavior:
User agents often have history mechanisms, such as "Back" buttons and history lists, which can be used to redisplay an entity retrieved earlier in a session.
History mechanisms and caches are different. In particular history mechanisms SHOULD NOT try to show a semantically transparent view of the current state of a resource. Rather, a history mechanism is meant to show exactly what the user saw at the time when the resource was retrieved.
By default, an expiration time does not apply to history mechanisms. If the entity is still in storage, a history mechanism SHOULD display it even if the entity has expired, unless the user has specifically configured the agent to refresh expired history documents.
In simple words, when the user clicks the Back button the browser should serve its locally cached version even if the resource's expiration date has already passed, unless the resource cannot be found in storage.
As you probably guessed, the browser consistently respects this behavior when working over HTTP or with a valid HTTPS certificate (and only for js files; see the 'Not all resources were born equally' section below); otherwise, for example when using a self-signed SSL certificate, the browser again behaves inconsistently, retrieving some files from the local cache and others from the server.
As stated, the specification emphasizes that this is how the browser SHOULD behave, meaning the browser may behave differently.
I tried to make an educated guess about this browser behavior, without any good result.

Not all resources were born equally

Using plain HTTP or a valid SSL certificate, clicking the Back button should cause the browser to fetch the resource from its local cache, as described above. But this holds only for JS files; for images of all kinds (png, gif, jpg) the browser executes an HTTP round-trip to the server and, if previously supplied by the server, the Etag/Last-Modified values are added to the request. That means that although the browser keeps a locally cached version of the images, it performs a round-trip to the server, violating the W3 specification:
History mechanisms and caches are different. In particular history mechanisms SHOULD NOT try to show a semantically transparent view of the current state of a resource. Rather, a history mechanism is meant to show exactly what the user saw at the time when the resource was retrieved.

Fiddler on the Field Trail

As you probably know, when Fiddler is set to sniff our HTTP communication, the browser is forced to use Fiddler as a proxy server.
In addition, when allowing Fiddler to sniff SSL messages, Fiddler sends the browser its own trusted certificate, which from the browser's point of view is a legitimate trusted certificate.
Following the discussion above, this impacts browser behavior and can fool you: the browser saves resources locally and sends Etag/Last-Modified to the server as expected, but when a real user (without Fiddler) uses the web application, the browser will again change its behavior as described.

Summary - Let's get things together

  • We covered an HTTP headers misconfiguration related to Spring for secured and non-secured files, in which both scenarios didn't make much sense. When using Etag/Last-Modified we must omit the 'no-store' directive; combining them is a contradiction. Not adding Cache-Control at all while adding Etag/Last-Modified makes the browser behave differently depending on the way we refresh our resources, as described in the W3 specification.

  • The Google developers' statement that the browser will never, under any circumstances, save resources in the local cache when using an invalid certificate such as a self-signed one doesn't match my tests' results. Although we saw impacts on Chrome's caching behavior when using an invalid certificate, the observed results were not consistent. My personal recommendation is not to rely on local browser caching when using an invalid certificate.

  • Using Fiddler causes your browser to communicate with a proxy that presents a trusted certificate; as a result the browser's behavior changes to the regular, consistent behavior, similar to using non-secured HTTP communication. Obviously, explicitly adding your self-signed certificate to the Trusted section of your certificate store will produce the same consistent results. I suggest using other tools for HTTP sniffing when you plan to deploy your web application with a self-signed certificate, or manually removing Fiddler's certificate from the Trusted section.

  • Take into consideration that these esoteric (but important) behaviors may change between different browsers and versions, but the main point to take from this post is to be aware of the relationship between security and caching, where the former can impact the latter.
