Archive for the ‘Internet’ Category

The Trouble Getting A 69

As someone who uses the bus a lot, I've been using the Real Time Information system since it came out. I now use it pretty much every day to plan my trip.

Usually I find the data to be pretty accurate, with one constant exception: the 69 that passes my house in the mornings, which is consistently late.

A few weeks ago I decided to gather some data and find out just how late it was. There is still no API for accessing the real time information, so I wrote a Python script to scrape the data from the website, and set up a cron job on my Raspberry Pi to run the script between 8:00 and 8:45.
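Something along these lines does the job. This is only a sketch: the URL, stop number, and the page markup the regex expects are all hypothetical, since the real site has no documented structure.

```python
# Sketch of the logging script. There is no official API, so the URL and the
# page markup assumed below are hypothetical and would need adjusting to the
# real site's HTML.
import csv
import datetime
import re
import urllib.request

STOP_URL = "http://example.com/realtime?stop=1234"  # hypothetical stop URL

def parse_due_times(html, route="69"):
    """Extract 'due in N min' entries for a route from the page HTML.

    Assumes each arrival is rendered like '69 ... 12 min'; the pattern is a
    guess at the markup, not the site's actual structure.
    """
    pattern = re.compile(re.escape(route) + r"\D+(\d+)\s*min")
    return [int(m) for m in pattern.findall(html)]

def log_arrivals(path="bus69.csv"):
    """Append one timestamped row per predicted arrival to a CSV file."""
    html = urllib.request.urlopen(STOP_URL).read().decode("utf-8")
    now = datetime.datetime.now().isoformat()
    with open(path, "a", newline="") as f:
        writer = csv.writer(f)
        for minutes in parse_due_times(html):
            writer.writerow([now, minutes])

# crontab entry: every two minutes from 08:00 to 08:44, weekday mornings:
# 0-45/2 8 * * 1-5  cd /home/pi && python3 -c 'import bus69; bus69.log_arrivals()'
```

Each run appends the current predictions, so the last row logged for a given morning shows when the bus actually turned up.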

That was three weeks ago, so I figured there was enough data gathered that it was worth putting it in a spreadsheet for further analysis. Here’s what I’ve learned:

  • At 8am the bus is due between 8:14 and 8:17
  • On most days it actually arrived between 8:30 and 8:35
  • The first three days seem to have had extra delays, arriving between 8:40 and 8:45
  • On one of the 15 days the bus didn’t run

The full data is available here.

The problem with modern content management systems

Something is rotten in the state of Denmark. Seriously.

Basically, I believe that modern content management systems (Drupal, Joomla, and to some degree WordPress) have got it all horribly wrong. Only the fact that processing power is fast and cheap has allowed them to succeed in spite of this.

All of the above systems generate pages dynamically, every time a page needs to be displayed. This works for a blog like this that no one reads, but what about systems that do have readers? Thousands of readers? Thousands of readers per hour? The only solution is to start generating caches, put things behind reverse proxies, and then try to manage all of that. This is not efficient.

Rather than generating each block every time it is viewed, why not generate it each time the contents change? This is difficult to do with the above-mentioned systems, but would it be any more difficult to implement in a system built this way from scratch? I think not. I'm not proposing we abandon PHP, WYSIWYG editors or databases, just that we reverse the pattern: generate the HTML only when it changes, rather than every time it's viewed, or at an arbitrary cache refresh interval.
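The inverted pattern fits in a few lines. A minimal sketch, with hypothetical names and template, of rendering to a flat file when content is saved rather than when it is viewed:

```python
# Minimal sketch of the inverted pattern: render to a flat file when the
# content is *saved*, not when it is viewed. The function name, template and
# paths are hypothetical.
import pathlib

WEBROOT = pathlib.Path("webroot")
PAGE_TEMPLATE = "<html><head><title>{title}</title></head><body>{body}</body></html>"

def save_post(slug, title, body):
    """Called from the editor's save hook, i.e. only when contents change.

    The web server then serves webroot/<slug>.html as a plain static file;
    no script, database query or cache lookup happens per request.
    """
    WEBROOT.mkdir(exist_ok=True)
    html = PAGE_TEMPLATE.format(title=title, body=body)
    (WEBROOT / (slug + ".html")).write_text(html, encoding="utf-8")
    return html
```

Shared elements (header, sidebar, footer) needn't be baked into every file either; they can be spliced in at serve time with SSIs.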

On this basis, here are the features that a content management system designed from the ground up for large-volume sites should have:

  • HTML generated when it needs to change, as described above, written straight to flat files. There's no need for an application-level cache; operating systems are designed to cache file reads. There's no need to compromise on the reuse of elements or the look and feel either: generated elements and template elements can easily be combined with SSIs.
  • CSS/JS combination and compression that's actually well designed. Drupal tries to do this, but it fails. Firstly, it has no JS compression, because that feature didn't work; more fundamentally, the system is designed so that the hashed file name stays the same when the file changes. What the hell use is this? How can I attach an Expires header when the name doesn't change with the file version?
  • Image compression. PHP scripts and databases eat RAM and CPU; images eat bandwidth. Like memory, bandwidth is expensive. Any CMS designed for large-scale usage should handle this by default, with options that support image compression and other optimisations.
  • CDN support. Even if you don't actually have a CDN budget, you may want to host your static content on a cookie-free domain. This isn't a new idea; it's a well-discussed way to improve performance and reduce bandwidth usage. Does Drupal support this without accepting a compromise solution or hacking the core with duct-tape solutions? No. There's no technical reason not to do it: the code isn't difficult, the thinking is just wrong.
  • Content goes in the database; settings go in the settings file. There are two reasons for this. The first is security: if a bug is found in $CMS and the settings file can only be read by the executing user, then the damage is somewhat more limited. The more important issue, though, is release management: actually being able to use your version control system. Once you combine the content and the configuration in a database ball of mud, it becomes almost impossible to make site changes in development and push them automatically into the live environment, because the changes depend on settings stored in the database. It's just easier to do it manually than to get a trustworthy system to do 'stuff' to your production database.
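The hashed-filename complaint above is easy to get right. A sketch, with hypothetical paths, assuming the combine/minify steps have already run: the output name embeds a hash of the contents, so a far-future Expires header is safe because a changed file always gets a new URL.

```python
# Sketch of content-hashed asset names: the filename changes whenever the
# file's contents change. Paths are hypothetical, and combining/minifying is
# assumed to happen before this step.
import hashlib
import pathlib

def publish_asset(src, outdir="static"):
    """Copy an asset into the web root under a name derived from its contents."""
    src = pathlib.Path(src)
    data = src.read_bytes()
    digest = hashlib.sha1(data).hexdigest()[:10]
    name = src.stem + "." + digest + src.suffix  # e.g. style.3f2b9c1a7d.css
    out = pathlib.Path(outdir)
    out.mkdir(exist_ok=True)
    (out / name).write_bytes(data)
    return name  # templates reference this name; a new version is a new URL
```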

Other suggestions on a postcard please 🙂

My First WordPress Theme

I finally decided to write my own theme from scratch for ornat’s new blog.


As you can see it's fairly purple and black. I think it's nice and simple, and the fading on the edges looks great. I'll get a tar of it up soon.