Re: Servlet caching strategies?
On Wed, 22 Sep 2010, markspace wrote:
On 9/22/2010 6:11 AM, Tom Anderson wrote:
On Mon, 20 Sep 2010, Arne Vajhøj wrote:
There is no need to pregenerate static content.
Agreed. To repeat what Arne said with slightly different emphasis, i
think pages should be cached - i think it's vastly preferable to
cacheing data further back, because it shortcuts page generation - and i
think the place to do it is in a reverse proxy sitting in front of your
app servers.
So I agree that the author I linked to, Jason Hunter, seems a little goofy in
this regard. I've never heard of pregenerating websites being a best
practice, which is why I asked here about it. The idea of using separate
(and pre-written, and pre-debugged) cache software to do the caching seems
much, much better.
However, Mr Hunter raises some good points. One I think is that it's
impossible for a cache to function if it can't determine the age of a
page (or other resource). And he admonishes the coder to implement
getLastModified() on the servlet, which will automatically add date
information to the page (or other generated content). This seems
fundamental in enabling a cache to work, because the spec for caching
seems to imply that without some sort of cache control in the HTTP
header, data cannot be cached.
So am I correct in assuming that the cache software needs some sort of
control? Either Last-Modified, ETag, or some sort of explicit cache
control.
Yes, absolutely. IIRC, the minimum you need is a last-modified header and
a lack of a no-store cache-control header. With that, clients will still
have to make GET requests for the page, but they can be conditional; if
your server supports conditional GETs (i have no idea if there's any
support for this in web frameworks; it's straightforward with a plain
servlet), you can avoid rendering pages. I believe a reverse proxy can use
that to serve cached pages to users making non-conditional GETs (eg who
have never seen the page before): it passes on a conditional GET to the
app server, using the last-modified on its cached copy of the page, and if
it gets a 304 Not Modified response, it serves the cached page.
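
To make the conditional GET decision concrete, here is a minimal sketch in
plain Java, with no servlet API involved. The class and method names
(ConditionalGet, formatHttpDate, isNotModified) are made up for
illustration. It assumes the client sends If-Modified-Since in the
standard HTTP date format, and treats a missing or unparseable header as
an ordinary unconditional request. HttpServlet does a similar comparison
for you when you override getLastModified().

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Locale;

// Hypothetical helper, not part of any framework.
class ConditionalGet {
    // HTTP dates are always in GMT and have one-second resolution.
    private static final DateTimeFormatter HTTP_DATE =
        DateTimeFormatter.ofPattern("EEE, dd MMM yyyy HH:mm:ss 'GMT'", Locale.US)
                         .withZone(ZoneOffset.UTC);

    // Value to send in the Last-Modified response header.
    static String formatHttpDate(long epochMillis) {
        return HTTP_DATE.format(Instant.ofEpochMilli(epochMillis));
    }

    // True if the copy the client (or proxy) holds is still current,
    // i.e. the server may answer 304 Not Modified and skip rendering.
    static boolean isNotModified(String ifModifiedSince, long lastModifiedMillis) {
        if (ifModifiedSince == null) {
            return false; // unconditional GET: render the page
        }
        try {
            long sinceSeconds = LocalDateTime.parse(ifModifiedSince, HTTP_DATE)
                                             .toEpochSecond(ZoneOffset.UTC);
            // Compare at one-second resolution, since the header cannot
            // carry milliseconds.
            return lastModifiedMillis / 1000 <= sinceSeconds;
        } catch (DateTimeParseException e) {
            return false; // unparseable header: fall back to a full response
        }
    }
}
```

In a servlet, the result would decide between sending a 304 with no body
and rendering the page with a fresh Last-Modified header.
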
You then have a plethora of options for doing things better or
differently. Off the top of my head:
* If you can't or don't want to issue last-modified headers, you can use
etags instead.
* You can send an expires or cache-control max-age header, which lets
browser and proxy caches reuse cached pages without revalidating them with
the app server.
* You can send all sorts of other things in cache-control headers to tune
cache behaviour; i don't think any of these are colossally important,
though.
* You can use the 'far-future expires' method, in which you serve all
content with expiry dates in the far future, so it can be cached forever,
and make sure that if you change it, you change the URL it's reached
through, so that the cached old version won't be used. For example, when
you change your logo, logo.v1.png becomes logo.v2.png.
* If you have a highly configurable reverse proxy like Varnish, you can do
some trickery to offload work to it while retaining control. The thing is
to send a freshness-enabling header, like cache-control s-maxage, with a
value which guarantees freshness far into the future, which will let the
proxy cache the response without needing to revalidate it, but strip this
off before sending it out into the web. You thus ensure that the response
is cached, but only locally. That doesn't save you any bandwidth, but it
does save you server load. You keep control over the cacheing, because you
can always purge the proxy's cache if things change.
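
As a rough illustration of the 'far-future expires' option from the list
above, the sketch below builds a versioned file name and a long-lived
cache-control value. The names (FarFutureExpires, versionedName,
farFutureCacheControl) and the one-year lifetime are assumptions made for
the example, not anything prescribed by the spec or by a particular
framework.

```java
// Hypothetical helper, not part of any framework.
class FarFutureExpires {
    // One year in seconds; an assumed stand-in for "far future".
    static final long ONE_YEAR_SECONDS = 365L * 24 * 60 * 60;

    // Inserts a version tag before the extension: logo.png -> logo.v2.png.
    // Changing the version changes the URL, so stale cached copies are
    // simply never requested again.
    static String versionedName(String fileName, int version) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0) {
            return fileName + ".v" + version; // no extension: append the tag
        }
        return fileName.substring(0, dot) + ".v" + version + fileName.substring(dot);
    }

    // Cache-Control value that lets browsers and proxies reuse the
    // response for a year without revalidating.
    static String farFutureCacheControl() {
        return "public, max-age=" + ONE_YEAR_SECONDS;
    }
}
```

The page templates would then link to versionedName("logo.png", 2) rather
than a fixed name, and the server would send the header value on every
versioned resource.
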
And what cache control do you use when planning out a site?
How do you mean?
http://perl.apache.org/docs/tutorials/apps/scale_etoys/etoys.html
Thanks, I'll look into this when I get a chance.
I'll add a couple more. Basic info that it sounds like you already
know:
http://www.mnot.net/cache_docs/
Yet more interesting Danish cache science (although
not directly useful in this discussion):
http://iwaw.europarchive.org/04/Clausen.pdf
tom
--
An unreliable programming language generating unreliable programs
constitutes a far greater risk to our environment and to our society than
unsafe cars, toxic pesticides, or accidents at nuclear power stations. --
C. A. R. Hoare