Raking Jekyll

I’ve never really touched rake before, but since switching to Jekyll I’m finding that it’s becoming an essential part of my workflow. In the limited area of blogging, at least.

rake is a version of make in which you define all your targets in Ruby. Because practically anything would be an improvement over Makefile syntax, this is pretty easy to work with. I’m not a huge fan of shell scripting at the best of times, so mixing it in with something else is… not desirable. I still find Ruby less intuitive than Python, but that’s my prejudices talking.

To elaborate… what does posting a new entry look like for me?

  1. rake server to start up an automatically-rebuilding local webserver copy of my blog
  2. rake post[raking-jekyll] to make a new post with the YAML front matter boilerplate
  3. Actually edit the newly created post in an editor
  4. rake deploy to rsync the local copy to my hosting over ssh

Any part of my routine which looks like it might be scriptable has been replaced with a rake target. For example, the post target:

  1. Copies a template file
  2. Names it according to the current date and provided title
  3. Adds an expanded version of the current date into its YAML front matter so sorting will work correctly if I post multiple times a day

Since I rarely know the current date without having to look it up, that certainly saves me some effort.

Here’s my Rakefile, if you want to use anything from it. It’s probably not properly idiomatic Ruby, but it does at least work.

XSS is fun!

Pretending innocence, I ask why all these high profile websites have their homepages covered in spinning images?

Okay, obviously enough, I’m messing with them. But how can I do that?

The answer is cross site scripting (“XSS”).

XSS is surprisingly common, and nigh-universally is caused by poorly escaped user inputs. Even user inputs which, as in this case, they obviously don’t think of as user inputs. It happens when content is injected into a page, which results in the loading of arbitrary JavaScript onto that page.

As such, I own your interaction with those sites. If I was malicious I could be harvesting your cookies from them, redirecting you to phishing sites, recording everything you type, or just snooping on everything you view. As an example of why someone might want to do this… in the case of these particular sites, stealing your cookies (document.cookie) would let me post comments as you. I could thus spam those sites using legitimate accounts that I don’t have to go through the hassle of creating myself.

I’m not doing this, because that wouldn’t be nice. All I’m doing is reversing links and spinning images, because I think that’s cute.

In this case, all these sites have screwed up by including a little bit of HTML from an ad network (EyeWonder) on their site. This HTML accepts an arbitrary URL as a parameter, and loads it in a <script> tag. This is quite a common way for ad networks to ruin your day, often in the name of “frame busting”.

If you’re wondering who might be vulnerable to this exact hole from this exact ad network, Google can help you with that. Hint: it’s a lot of sites. I just grabbed the first three big names to demonstrate with.

Here’s the offending HTML:

This would actually be pretty easy to fix, note. A little bit of checking of the input, to restrict it to scripts hosted only on known-trusted domains would be enough to make exploiting it almost impossible. (I say “almost” because someone sufficiently resourceful might find one of these “trusted” domains isn’t as secure as they hoped and slip a script onto it. But it at least raises the bar.)

If you’re curious what I’m doing to make these pages spin, check out this gist which includes the spinner script. Essentially it’s just making an iframe which shows the root of the domain, and then manipulates the contents of that iframe, which it’s allowed to do because the script is running on the same domain.

In short: never trust user input. Also, don’t trust your ad networks to know/care about security.

This post brought to you by my coworker Paul Banks pointing out the existence of this fun little hole on CNN. I then added the spinning myself, because it looks nice and spectacular.

Jekyll

I’ve just redone my website using Jekyll. It is now completely static. No PHP, no database, nothing like that.

Why did I do this?

  • It’s quite soothing knowing that all my content is version controlled.
  • I am now nigh-immune to traffic spikes. I was using caching with WordPress before, so it had never been an issue even when I was on the HN frontpage, but there’s some peace of mind in it.
  • WordPress had a history of security bugs which wasn’t comforting. Since nothing on this new site is executable I feel pretty secure now.
  • My site is now ridiculously flexible. Jekyll forces almost no structure on you, leaving you free to change things around as you please.

I’m happy with the end result, but the process of getting there was not without pain.

The initial difficulty came from Jekyll’s documentation being somewhat lacking. I found myself somewhat confused about minor details like “how does a layout work?”. After I’d cribbed that together by examining other sites posted with Jekyll, I discovered that the template data docs were inaccurate / misleading, implying the presence of a post variable which failed to exist. It turned out to be something that’s merged into page if you’re viewing a post.

I don’t completely blame Jekyll for this being opaque. Jekyll uses Liquid for its templating language, which claims to be aimed at designers… and I feel it would benefit from some sort of debugging mode that dumps the current scope for examination.

I resorted to reading Jekyll’s source, which cleared up a number of things. However, I view it as a bad sign that I felt I had to do this. Not that a command-line driven static website generator is ever likely to be a mainstream product, but still, it’s the principle of the thing.

Pagination worked, but was completely lacking in configuration. Since part of my goal was to have my URLs remain the same as they were in WordPress, I had to change this. I did so with a horrible monkey-patching hack of a plugin. Specifically, I made a copy of the pagination module from Jekyll’s core into my _plugins directory and selectively edited it to change the pagination urls.

In the process I noticed a bug in the core code, and submitted a pull request to fix it. So horrible monkey patching might at least pay off this time.

Also utterly broken was the related posts feature. No matter what, it always seems to think the most recent posts are the most related to anything. It’s possible that running with --lsi would have helped with this, via complex semantic analysis, but that takes forever and I’ve seen others complain that it doesn’t really help. So there’s more monkey patching going on via Lawrence Woodman’s related posts plugin, which I took and edited so it worked based on tags instead of categories.

One thing I haven’t fixed, which I’d like to, is making the automatic regeneration of your site during development / writing a lot smarter. Right now it notices a file has changed and so it regenerates every single bit of content on your site. This does mean that the live generated site always has recent/related posts up to date everywhere… but it’d be nice to have some sort of --quick option that ignored that stuff in favor of a faster development cycle.

Because of the utter staticness, I naturally cannot have my own comment system in use any more. So I’ve switched to Disqus, which adds commenting to the site via JavaScript. It feels sort of weird to be outsourcing a component of my user experience like this… but they seem to be trustable. Widely used, and their monetization plan is fairly transparent.

If you’re interested you can see the repo for my website on github. It contains, in its default / post templates, markup that’s compatible with any WordPress theme that’s based on Toolbox, which might be of use to some.

Like I said, I’m happy with how it turned out. I wouldn’t recommend this at all for a non-technical person, but if you want to dig in and get your hands dirty then Jekyll is quite workable.

To replace PHP you need

(Expanding slightly on my response to this HN thread.)

First: to be on all shared hosting everywhere. I.e. you need to be really easy to install, and preferably not involve long-running processes that shared hosts might choke on.

Second: to be beginner friendly. No requirement of understanding MVC, or running commands in a shell (hi RoR!). Pure instant gratification. Someone’s first step into using PHP is likely going to be “I want the current date in the footer of my page”, or “I want a random image on my homepage”, or something like that. Anything like that you can handle by taking your existing page and dropping a tiny snippet in where you want the change to happen. is a potent thing to someone who has never programmed before.

Note: For point 2 many of the things serious programmers hate about PHP are actually advantages. All the functions in one big namespace? That’s great! A newbie doesn’t have to try to understand .

It’s easy to replace PHP for serious developers. We like advanced features, and care about a sane default library. We’re willing to use complex tools to get a payoff.

It’s hard to replace PHP for non-programmers who just want to tweak their static page in notepad so it has one cool new feature, or install a blogging package on their cheapo shared hosting.

To sum up: if you don’t address both of these points then you haven’t killed PHP. You’re competing with Python or Ruby or whatever. PHP will carry right on ignoring you, because you’re not addressing its fundamental use case.