Time to Euthanize “Pragma: no-cache”

Modern Cache Directives

As part of every website security vulnerability assessment, the security researcher checks that proper caching directives are implemented. In situations where the most extreme “never cache this data” policy is required, the gold-standard HTTP headers recommended by infosec professionals everywhere are:

Cache-Control: no-cache, no-store, max-age=0, must-revalidate
Pragma: no-cache
Expires: Thu, 01 Jan 1970 00:00:00 GMT

This advice is problematic, and in this blog post I will argue that we need to modify it. Infosec professionals must stop recommending the Pragma header because it is obsolete and dangerous.

Pragmas in General

Definition: pragma – a directive outside of the specification that informs a device to do something special if the device recognizes the directive. TLDR; a pragma is a non-standard extension to whatever.

Before the web, pragmas were most commonly used in programming as a means to inform the compiler to do something ¹. In C, one might add #pragma once to tell the compiler to #include a certain file only once during compilation. But, as pragmas are not standardized, if the compiler did not understand “once”, then the code would probably not compile (rabbit hole into include guards). Another is #pragma optimization_level {value} used by the GCC and Intel compilers. For Intel, the valid values are 0,1,2,3; but the valid values for GCC are 0,1,2,3,reset. If the Intel compiler sees “reset” for this pragma, what does it do: ignore the pragma completely, raise a warning, or abort with an error? In addition, what do the 0-3 integer values mean? Intel manufactures compilers that are highly optimized for their own CPUs whereas GCC’s compilers are optimized to be used across many different architectures, from SPARC to ARM. As such, level 3 on the Intel will produce much more optimized code than level 3 on GCC². To make matters even more confusing, GCC level 3 might produce more optimized code on a common CPU versus an obscure or legacy CPU.

As you can see, the same pragma might produce very different results.

So, why do we use this undefined non-standardized thing for HTTP caching? Well, at one time, there was no alternative.

Web Caching History

HTTP/0.9 was first documented (officially) in 1991 with the Pragma header as part of the specification in this historical document. Only one pragma was defined at the time, “no-cache”. Here is the official definition:

Pragma directives should be understood by servers to which they are relevant, e.g. a proxy server; currently only one pragma is defined:

no-cache – When present the proxy should not return a document from the cache even though it has not expired, but it should always request the document from the actual server.

Pragmas should be passed through by proxies even though they might have significance to the proxy itself. This is necessary in cases when the request has to go through many proxies, and the pragma should affect all of them.

Wait a sec; did you see any reference to “browser” or “client” or “user-agent” in that definition? There was discussion of servers and proxy servers, but nothing about the end user. Does this mean that this non-standard “no-cache” extension was implemented in a non-standard manner by browsers for caching? Is Pragma: no-cache officially defined as a header for a proxy server to consume and not the browser? Good question.

As standards committees take time to weigh discussions and commentary before they publish their work, the final HTTP/1.0 specification was published in May 1996. However, discussion on features for HTTP/1.1 was already in progress and the first draft of HTTP/1.1 was published 11 November 1995, before the official release of HTTP/1.0. Even before the official release of HTTP/1.0, a large number of browsers already supported many HTTP/1.1 features, including, you guessed it, Cache-Control.

So, the State of the Art™ browser in 1996 already understood that sometimes it must not store a response when told “no-store”. The Arena browser completely implemented HTTP/1.1, including Cache-Control, in 1996. By 1999, Lynx users were clamoring for a means to ignore “Cache-Control” in order to force-enable caching and make modem communications faster.

Not every option for Cache-Control was supported in every browser, but there was enough compliance that your data was safely controlled during your run of Netscape Navigator Gold and even the awesome piece of machinery known as Internet Explorer 2.0.

The final HTTP/1.1 specification, including Cache-Control, was published in June 1999.

HTTP/1.0 Today

But, we still publish security advice telling people to add Pragma: no-cache so that older browsers will have a fallback mechanism. Really? Someone is going to access your React-enabled, jQuery-driven, responsive-designed, Bootstrap-themed, CSS-optimized website using IE 1.0 and you need to protect their data. Really?

Another fun fact: HTTP/1.0 does not support the Host header. In other words, when you connect to a server running multiple websites, how does the server know which website you want? (Remember, DNS only resolves a name to an IP address.) As we have run out of IPv4 addresses and IPv6 isn’t ready for prime time, nearly every web server on the Internet hosts multiple sites. So, if you have an end user coming to your site with a truly HTTP/1.0-compliant browser, there is no way they can even connect to your site. Go ahead, try it:

[hacker@evil-kitten ~]$ telnet veggiespam.com 80
Trying 69.163.152.247...
Connected to veggiespam.com.
Escape character is '^]'.
GET / HTTP/1.0

HTTP/1.1 200 OK
Date: Fri, 28 Jul 2017 18:36:56 GMT
Server: Apache
Last-Modified: Sat, 01 Nov 2014 04:18:40 GMT
ETag: "304-506c4687e0800"
Accept-Ranges: bytes
Content-Length: 772
Connection: close
Content-Type: text/html

<!doctype html>
<html>
<head>
    <title>Site not found &middot; DreamHost</title>
    <meta http-equiv="cache-control" content="no-cache" />
    <meta name="description" content="The owner of this domain has not yet uploaded their website." />
    <link rel="stylesheet" href="https://securendn.a.ssl.fastly.net/newpanel/css/singlepage.css" />
</head>
<body>
    <div class="page page-missing">
        <h1>Site Not Found</h1>

        <p>Well, this is awkward. The site you're looking for is not here.</p>
        <p><small>Is this your site? <a href="http://wiki.dreamhost.com/Site_not_found">Get more info</a> or <a href="https://panel.dreamhost.com/index.cgi?tree=support.msg">contact support</a>.</small></p>

        <a href="http://www.dreamhost.com/" class="logo">DreamHost</a>
    </div>
</body>
</html>
Connection closed by foreign host.

Heck, try with HTTP/1.1 and no Host header:

[hacker@evil-kitten ~]$ telnet veggiespam.com 80
Trying 69.163.152.247...
Connected to veggiespam.com.
Escape character is '^]'.
GET / HTTP/1.1

HTTP/1.1 400 Bad Request
Date: Fri, 28 Jul 2017 18:39:56 GMT
Server: Apache
Content-Length: 226
Connection: close
Content-Type: text/html; charset=iso-8859-1

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>400 Bad Request</title>
</head><body>
<h1>Bad Request</h1>
<p>Your browser sent a request that this server could not understand.<br />
</p>
</body></html>
Connection closed by foreign host.

So, no user running a true HTTP/1.0-compliant browser will be able to reach your content. We don’t need to support HTTP/1.0.
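
For anyone who would rather script the experiment than type into telnet, here is a minimal Python sketch of the two kinds of request line involved. The helper name build_request is hypothetical and the code only assembles the bytes; sending them over a socket is left out, and server responses will vary by host.

```python
def build_request(path, version, host=None):
    """Assemble a bare GET request; `host` is optional, as in pure HTTP/1.0."""
    lines = [f"GET {path} HTTP/{version}"]
    if host is not None:
        # Without this line, a server hosting many sites cannot tell them apart.
        lines.append(f"Host: {host}")
    # A blank line terminates the header section.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


# Pure HTTP/1.0: nothing tells the server which site is wanted.
print(build_request("/", "1.0"))
# HTTP/1.1 with the mandatory Host header.
print(build_request("/", "1.1", host="veggiespam.com"))
```

The first request is what the telnet sessions above typed by hand; only the second one carries enough information for a shared server to pick the right site.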

But wget still uses HTTP/1.0, right? Well, not quite: it supports many extensions from HTTP/1.1, including the Host header (think about it: the Host header must be present or wget would fail to work at all). But the purpose of wget, curl, and their ilk is to download files to disk, so the cache point is fairly moot (plus they don’t cache anyway). Also, wget’s --no-cache option has nothing to do with client-side caching. It sends “Pragma: no-cache” to the server in order to force the server to give out fresh data. You programmed your server to support this non-standard extension, right?

Bots and Crawlers

Many bots and crawlers reportedly identify themselves as HTTP/1.0. But they run into the same issue as above: they need to send the Host header, so they are not pure HTTP/1.0. Also, consider the purpose of a crawler: to cache and index your data on their servers. They will ignore your Pragma. They will even ignore your Cache-Control.

And think about Google. Do you really believe they stopped developing their crawler in 1996 after bolting on the Host header support to HTTP/1.0? No, it sends HTTP/1.0 in the header just in case it connects to a really old server or a quirky non-standard server.

Proxies

Web proxies are used mainly for three purposes:

Prevent direct access to outside machines from the inside.
Track web usage.
Decrease network traffic by caching data.

In the first case, the proxy simply wants to be a middle man for the data in order to protect the network. It doesn’t care about the caching directives; it just passes them along as-is. In the second case, caching is irrelevant; the system is only interested in the destination and in whether some sites should be rejected. In the final case, the proxy caches in whatever manner the proxy server operator dictates, which may include ignoring the directives returned by the web server so that the proxy or browser performs additional caching. Here, no matter what Pragma or Cache-Control your server sends, the proxy might decide to override and modify your directives when it forwards the response to the browser.

The other argument foisted upon us to use Pragma is that not all proxy servers support HTTP/1.1. True, but as with the bots & crawlers, they support many HTTP/1.1 features, but still send the HTTP/1.0 version indicator in the headers. Looking at the very popular Squid Cache, we see that version 1.2.alpha4, released in September or October 1997, added the initial support for Cache-Control; scroll to the bottom here to see it ³. As you scroll up in the changelog, various other Cache-Control options get added over time, like max-age, no-store, etc. If you search for “Pragma” in this changelog file, there are only bug-fixes for “Pragma: no-cache” where this header was causing confusion in Squid. The header makes things worse.

Squid’s current HTTP/1.1 implementation is about 90% complete as of version 3.2, released July 2010, according to their status page. It supports most of the caching features, certainly far more than just “Pragma”.

Reading further into the changelog, Squid 2.2 from April 1999 indicates that Cache-Control: max-age takes precedence over Expires – meaning you don’t even need the Expires header either! Maybe a topic for another blog post.
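
To make that precedence rule concrete, here is a minimal Python sketch of how a cache might compute a response's freshness lifetime. The function name freshness_lifetime is hypothetical, the header names are assumed to be spelled exactly as shown, and this is a simplified illustration rather than Squid's actual logic.

```python
from email.utils import parsedate_to_datetime


def freshness_lifetime(headers):
    """Return the freshness lifetime in seconds, or None if unspecified."""
    cache_control = headers.get("Cache-Control", "")
    for directive in cache_control.split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age" and value.isdigit():
            # max-age wins; Expires is never consulted.
            return int(value)
    if "Expires" in headers and "Date" in headers:
        expires = parsedate_to_datetime(headers["Expires"])
        date = parsedate_to_datetime(headers["Date"])
        # An Expires value in the past means "already stale".
        return max(0, int((expires - date).total_seconds()))
    return None


response = {
    "Date": "Fri, 28 Jul 2017 18:36:56 GMT",
    "Cache-Control": "max-age=3600",
    "Expires": "Thu, 01 Jan 1970 00:00:00 GMT",
}
print(freshness_lifetime(response))
```

With both headers present, the 1970 Expires value has no effect on the result; it only matters when max-age is absent.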

Encrypted Transport

Since the data is confidential enough that it must never be cached, you’re probably requiring that it be sent over SSL or even TLS? Well, that isn’t possible in this theoretical IE 1.0 that you need to support; it doesn’t support https at all⁴! Those users still on IE 2.0 can only use SSL v2, something that has not been compiled into your webserver for at least a decade. Connections coming from IE 3.0 to IE 6.0 can use SSL v3, which is completely disabled (and perhaps not even supported at all) by any modern webserver. You could tell your users how to edit a bunch of registry settings to enable TLS v1.0 on IE 6.0 for Windows 2000, but you’d need to convince Verisign to sign your certificate with SHA-1 (hint: they won’t) as IE 3.0 to 6.0 do not support SHA-256 signatures. Plus, you’d need to enable RC4 to avoid POODLE and BEAST but still be susceptible to FREAK and Logjam attacks. And enabling RC4 would put every other user of your website at risk, but whatever.

The latest NIST Special Publication 800-52rev1 on TLS says that “…servers shall not support TLS 1.0, SSL 2.0, or SSL 3.0” in §3.1 and indicates that migration away from TLS v1.0 must begin by 01 January 2015 (page vi).⁵

Websites that are required to be PCI compliant (e.g., anything that takes a credit card) must not use SSL v3.0 at all and must not use TLS v1.0 after 30 June 2018. If you need HIPAA compliance, you need to follow NIST 800-52 which has been superseded by revision 1 above – thus TLS v1.0 is not allowed.

Basically, if your site does any type of commerce, medical, or government services, it is forbidden from using TLS v1.0. To connect to your site, users need a minimum of Chrome 21, Firefox 27, Safari 7, or IE 11 (via end-user-configuration hacks, the minimum becomes Firefox 23 or IE 7 atop Windows 7 or 2008Svr).

And somehow, you’re worried about backwards compatibility with Pragma.

You’re Still Doing It Wrong

So, after this history and current state of software, I’ve shown that you do not need to specify Pragma: no-cache at all. But as security practitioners, we keep recommending it. So, the developers follow our advice and use the Pragma, including in situations where they want private caching, such as with:

Pragma: private

See, an older browser doesn’t support “Cache-Control: private”, so it will obviously support using “Pragma: private” instead. And my favorite observation from a recent (July 2017) penetration test:

Pragma: no-store, max-age=3600

Which these legacy browsers of course will magically support. The pragma definition from 1992 referenced earlier says to use semi-colons to separate optional fields, but this really old browser will know what you mean.

When we security professionals keep recommending that servers send Pragma “just in case for these older systems”, the developers begin to assume the word “pragma” is synonymous with “cache-control”. We are making the problem worse. What happens when the developers make the next leap and think, “Why not save a few lines of code and only use Pragma, since it is supported by both old and new browsers?” That will lead us to:

Pragma: no-cache, no-store, max-age=0, must-revalidate
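
A check for this kind of misuse is easy to script during an assessment. The following Python sketch is one hedged way to do it: the function name misused_pragma_tokens is hypothetical, and it assumes that “no-cache” is the only Pragma value with a defined meaning, as in the definition quoted earlier.

```python
def misused_pragma_tokens(headers):
    """Return every Pragma token that is not the one defined value, no-cache."""
    bad = []
    for name, value in headers.items():
        if name.lower() != "pragma":
            continue
        # Split on commas and semicolons, since both separators show up.
        for token in value.replace(";", ",").split(","):
            token = token.strip()
            if token and token.lower() != "no-cache":
                bad.append(token)
    return bad


print(misused_pragma_tokens({"Pragma": "private"}))
print(misused_pragma_tokens({"Pragma": "no-store, max-age=3600"}))
print(misused_pragma_tokens({"Pragma": "no-cache"}))
```

Anything the function returns is a Cache-Control directive that wandered into the wrong header and that no browser is obliged to honor there.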

It’s a Waste

With our security reports demanding that the developers add this header, we are creating unnecessary work. It isn’t as simple as adding a single line of code. Many times, different components have different header generation expectations. For example, the main website, the REST services, the UI framework services, the download file function, the export data as file function, the directly download PDF from disk component, etc. will all require different modifications to very different codebases, as observed in a recent website assessment. Some of these components did caching correctly, except for the Pragma header; so the application team did the extra work, all because the official infosec boilerplate advice is to add this stupid header.

The developers are wasting hours per web application implementing something of zero value; it is time to stop.

But It’s Only an After Dinner Mint.

So, after all of these arguments above, you are still thinking that it’s only a few bytes for “Pragma: no-cache”, why not do it “just in case”?

[hacker@evil-kitten ~]$ echo -e -n "Pragma: no-cache\r\n" | wc 
       1       2      18

So, 18 bytes for nearly every single web request performed. This includes many calls with dynamic JavaScript, to load portions of a page. It often means rotating images and 1 pixel tracking images. It might even include JavaScript and CSS files if they change often.

So, what does a company with unlimited money do? Google can afford it and doesn’t care about the extra bytes. Well, no. Look at the Google home page and how each JavaScript variable is a single letter long. See how there are no extraneous line feeds or even spaces in the HTML or JavaScript code. Google’s engineers have optimized the shit out of the page for less bandwidth and faster loading. They care about each and every byte.

Most Google pages do not use Pragma as Google wants their page to be cached. But, you can find Pragma in Google’s Doubleclick and on their /signin & /ServiceLogin page. So even Google uses Pragma.

For a thought exercise, let’s assume they always send “Pragma: no-cache”. In a fresh browser, go to www.google.com and slowly type “cache-control” into the search box. At this point, you have sent 24 different round-trip web requests (I’ve seen different numbers of requests, but 24 was most common).

With that in mind, recall that Google processes at least 2 trillion searches per year. Let’s make a logical jump and say that each search is like my search above: it takes 24 requests, which means 24 instances of “Pragma: no-cache”. At volume, it costs $0.01 to transfer 1 GB of data (Google probably pays less, but not that much less). So,

2,000,000,000,000 searches/year × 24 requests/search × 18 bytes/request × (1000×1000×1000 B/GB)^-1 × 0.01 USD/GB

⇒ 2,000,000,000,000 × 24 × 18 × (1000×1000×1000)^-1 × 0.01

⇒ $8,640 / year.
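
As a sanity check, the same arithmetic can be run in a few lines of Python. Every figure below is one of the assumptions from the estimate above (search volume, requests per search, price per GB), and the price is kept in whole cents to avoid floating-point rounding.

```python
# Assumed figures, all taken from the estimate above.
SEARCHES_PER_YEAR = 2_000_000_000_000
REQUESTS_PER_SEARCH = 24
HEADER_BYTES = len(b"Pragma: no-cache\r\n")  # header line plus CRLF
BYTES_PER_GB = 1000 ** 3
CENTS_PER_GB = 1  # $0.01 per GB


def yearly_cost_usd():
    """Yearly transfer cost, in dollars, of sending the header on every request."""
    total_bytes = SEARCHES_PER_YEAR * REQUESTS_PER_SEARCH * HEADER_BYTES
    total_cents = total_bytes * CENTS_PER_GB // BYTES_PER_GB
    return total_cents / 100


print(f"{HEADER_BYTES} bytes per request, ${yearly_cost_usd():,.0f} per year")
```

This prints 18 bytes per request and $8,640 per year, matching the hand calculation.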

Okay, maybe data transmission doesn’t cost a lot. And I also won’t bother computing a disk-space cost argument with consideration taken for the extra code and logging with full HTTP header. (If my math is wrong, someone please correct me.)

But a developer spending 30 minutes finding all locations where Pragmas need to be added, running the full compile/unit-test/code-check-in cycle, followed by the QA team, the push to production, the clean up of the open change request, the close out of the security team’s finding in the GRC system, and then getting procurement to hire that outside infosec firm to retest all the findings might end up being a thousand dollars’ worth of time for all personnel involved.

Multiply that by every website in the company. Now we’re talking real money, even for Google.

End of Times

I will end this pain; I will no longer advise clients to add the Pragma: no-cache header. I will delete this from whatever boilerplate I may be required to use. Peer reviewers or other infosec firms may override me, but I will save the developer from these wasted efforts.

It’s dead. It’s time to bury Pragma: no-cache.

If you have feedback, send it to me as a DM on Twitter or email me directly.

¹ See history of Pragmas in programming languages on Wikipedia.
² Yes, I am generalizing here.
³ This is the first instance I found, but it could have appeared in Squid earlier.
⁴ The Rosetta of TLS support in Browsers
⁵ The NIST document does allow TLS v1.0 a few niche exceptions for some public facing sites, but that is uncommon and discouraged.

Thanks to Alexei Kojenov @kojenov for helping to proofread this.