Ian Bicking: the old part of his blog

CMS and static publishing

(Looking for XHTML Range (Semantic My Ass)?)

I've been working on a content management system for use with our clients. The system actually preceded me, but we undertook a complete rewrite for this next version. So we came in with certain architectural decisions already decided, and static publishing was one of them (i.e., the CMS publishes to files which Apache serves independently from the CMS).

While this always seemed like a pretty reasonable technique, I'm not sure if it would have been the first thing to occur to me. A more dynamic system (like Plone) seems a lot easier and more general. At least, easier if you do the Right Thing -- the naive approach is easy with a static CMS, but the naive approach fell far short of what we needed, which is why a rewrite was required.

But now that we are getting close to going live with the CMS, I'm very happy with static publishing. A big part is that it takes the pressure off of going live. I can be sure before going live that the public website is correct. The actual CMS may explode in flames, but the site will be fine. Going live with a web application is always a stressful process, and anything that reduces the stress of that is a great benefit. As time goes on, static publishing is also a big stress reduction for the system administrator, since a simple Apache configuration is a lot more reliable under different loads and configurations than any dynamic site will be.

Another benefit is that it's really easy to migrate existing sites into the CMS -- the pages only need to be as structured as you want them to be, and you can import the entire site verbatim if you really want, applying structure and template later. Anything that allows incremental processes is a big +1 in my book. Maybe that makes it +2.

This has also freed us up to do lots of tweaking of the system and UI. Unit tests have been very successful at giving us a solid foundation, but the user interface is very difficult to test, especially as it gets constantly tweaked. Inevitably there are bugs. If the CMS interface had to be as stable as the public site (which has to be extremely stable), then I'd be very nervous about the changes we are making late in the game. But I'm not worried, because the application remains Stable Enough, and it lets us make significant improvements instead of going into a long, deep pre-release freeze.

Created 12 Dec '03
Modified 25 Jan '05

Comments:

You can get much the same benefits of static page generation by sticking a good caching proxy in front of the CMS.

Set a simple spider that visits every page and you'll have a cached copy of the whole site that leaves the dynamic layer free.

It's a technique that's worked well for me in the past.
# Adrian Howard

Doesn't work that well when you have large parts of the traffic in session based pages with personalized data. I usually tend to do a twofold system: part of it dynamic and part of it statically rendered. High-load parts get served with statically rendered pages that are rerendered on data triggers or time triggers. It's a bit more work to build but a very nice solution if your data is mostly read and much less changed. Gives fully dynamic user experience with fully static server experience.
# Georg Bauer

Adrian: once you create a dynamic site that is actually cache-friendly (or spider-friendly), it's not just the same benefits, it's pretty much the same thing as a static site, isn't it? Well, I guess a truly static site is event-based (you actively rerender pages), where a cached site isn't (the proxy figures out that pages have updated over time). The differences become subtle. In the spidering version, you've really just replaced a push (file write) with a triggered pull (from the spider). Either way, how you actually move the files around -- cache, pull, push, file write, FTP upload, SCP upload, and so on -- should hopefully be somewhat abstracted from the CMS, since requirements vary considerably. Upon further thought, I think the distinguishing aspect is render-on-demand, vs. render-on-event.

Georg: yes, static is limited. The issue we're dealing with a lot is the workflow of publishing a site, and pieces that don't fit into the workflow usually wouldn't be put in the CMS. Though the more I work with the system, the more I see places where workflow applies. You could add or edit an item in your store, preview, have the copy reviewed, etc., then publish it to your site. But obviously a store item is much richer than your typical HTML file -- or at least has different set of metadata. And it has to interact with entirely non-static items, like a shopping cart, or inventory control. Some things don't make sense as event-based, or the events occur with a granularity that doesn't fit your model. When writing the CMS, I expect data to be updated on the order of a few items a day, or maybe the occassional big batch. An online forum wouldn't fit into this model at all. Anyway, it's my first foray into workflow, so I'm still thinking about the possibilities, and what it means for all the different parties involved.

Also, one of the benefits of publishing into a system like Apache is that it's a good basis for heterogeneous sites, so I'm inclined to simply call upon external applications in these cases.
# Ian Bicking

Yep, Apache is a great base for mixed applications. That's what I usually do: Apache in the front, parts built on mod_perl or cgi, parts built on Zope (via mod_proxy and mod_rewrite behind the Apache) and parts with event based rendering that result in static HTML stuff.

Ok, with high traffic systems I usually have a bit more than just one Zope behind the Apache. Actually it's more like two Apache machines with load balancer in front, 4 Zope machines in the back, connected via a Zeo server, a PostgreSQL machine (with a spare one for failover) in the back. Several servers around all that to connect it and hold it together (for example two machines running load balancers to match up between two Apaches and 4 Zopes). But it's a system with 2500 hits per minute in it's prime time, 1000 hits alone to the Zope application parts. Each Zope page with 3-4 SQL statements. :-)

Mixing static and dynamic parts is essential with this system.

Oh, and even a forum can be built with static rendered output: a forum is much more read than written! So just render out index pages when someone writes a comment. Build dynamic parts with Javascript and Dynamic HTML (for example to build outline views of the forum). That way you get a very fast forum for readers - the rendering of new index pages can happen in the background. Or if you use Apache: the cleanup phase of a request is the right place for rendering stuff. In that phase the client is already satisfied, so you don't block anything (beside the process slot - but with high performance systems you already will have patched your system to unusual high values for running processes).

The main problem with really high performance systems that make use of event based rendering is how to build a really fast replicating filesystem. If you create a comment, the comment stuff should go to the background to some rendering machine, be rendered there into static pages that are pushed back to the front machines. That needs to be performant, if you do that wrong, your system will be blocked by shuffling files around :-)
# Georg Bauer

Ian: Yup, you're right it is pretty much the same thing - which was my point ;-) However, I'm surprised how many people fail to see it that way.

I was talking to a guy I met on a previous project a couple of weeks ago who was having scaling problems with their Java based CMS. Proxying the (almost completely static) content hadn't occurred to him. Most odd since he was a technically literate bloke.

Somehow the attitude that "we could change this at any time" overrides the reality of things only changing once every three months. One Apache mod_proxy later and suddenly they don't have to purchase another Sun E350. Odd, but I've seen the same pattern several times.
# Adrian Howard