Raking Jekyll

I’ve never really touched rake before, but since switching to Jekyll I’m finding that it’s becoming an essential part of my workflow. In the limited area of blogging, at least.

rake is a version of make in which you define all your targets in Ruby. Because practically anything would be an improvement over Makefile syntax, this is pretty easy to work with. I’m not a huge fan of shell scripting at the best of times, so mixing it in with something else is… not desirable. I still find Ruby less intuitive than Python, but that’s my prejudices talking.

To elaborate… what does posting a new entry look like for me?

  1. rake server to start up an automatically-rebuilding local webserver copy of my blog
  2. rake post[raking-jekyll] to make a new post with the YAML front matter boilerplate
  3. Actually edit the newly created post in an editor
  4. rake deploy to rsync the local copy to my hosting over ssh

Any part of my routine which looks like it might be scriptable has been replaced with a rake target. For example, the post target:

  1. Copies a template file
  2. Names it according to the current date and provided title
  3. Adds an expanded version of the current date into its YAML front matter so sorting will work correctly if I post multiple times a day

Since I rarely know the current date without having to look it up, that certainly saves me some effort.

Here’s my Rakefile, if you want to use anything from it. It’s probably not properly idiomatic Ruby, but it does at least work.

XSS is fun!

Pretending innocence, I ask why all these high profile websites have their homepages covered in spinning images?

Okay, obviously enough, I’m messing with them. But how can I do that?

The answer is cross site scripting (“XSS”).

XSS is surprisingly common, and nigh-universally is caused by poorly escaped user inputs. Even user inputs which, as in this case, they obviously don’t think of as user inputs. It happens when content is injected into a page, which results in the loading of arbitrary JavaScript onto that page.

As such, I own your interaction with those sites. If I was malicious I could be harvesting your cookies from them, redirecting you to phishing sites, recording everything you type, or just snooping on everything you view. As an example of why someone might want to do this… in the case of these particular sites, stealing your cookies (document.cookie) would let me post comments as you. I could thus spam those sites using legitimate accounts that I don’t have to go through the hassle of creating myself.

I’m not doing this, because that wouldn’t be nice. All I’m doing is reversing links and spinning images, because I think that’s cute.

In this case, all these sites have screwed up by including a little bit of HTML from an ad network (EyeWonder) on their site. This HTML accepts an arbitrary URL as a parameter, and loads it in a <script> tag. This is quite a common way for ad networks to ruin your day, often in the name of “frame busting”.

If you’re wondering who might be vulnerable to this exact hole from this exact ad network, Google can help you with that. Hint: it’s a lot of sites. I just grabbed the first three big names to demonstrate with.

Here’s the offending HTML:

This would actually be pretty easy to fix, note. A little bit of checking of the input, to restrict it to scripts hosted only on known-trusted domains would be enough to make exploiting it almost impossible. (I say “almost” because someone sufficiently resourceful might find one of these “trusted” domains isn’t as secure as they hoped and slip a script onto it. But it at least raises the bar.)

If you’re curious what I’m doing to make these pages spin, check out this gist which includes the spinner script. Essentially it’s just making an iframe which shows the root of the domain, and then manipulates the contents of that iframe, which it’s allowed to do because the script is running on the same domain.

In short: never trust user input. Also, don’t trust your ad networks to know/care about security.

This post brought to you by my coworker Paul Banks pointing out the existence of this fun little hole on CNN. I then added the spinning myself, because it looks nice and spectacular.

Jekyll

I’ve just redone my website using Jekyll. It is now completely static. No PHP, no database, nothing like that.

Why did I do this?

  • It’s quite soothing knowing that all my content is version controlled.
  • I am now nigh-immune to traffic spikes. I was using caching with WordPress before, so it had never been an issue even when I was on the HN frontpage, but there’s some peace of mind in it.
  • WordPress had a history of security bugs which wasn’t comforting. Since nothing on this new site is executable I feel pretty secure now.
  • My site is now ridiculously flexible. Jekyll forces almost no structure on you, leaving you free to change things around as you please.

I’m happy with the end result, but the process of getting there was not without pain.

The initial difficulty came from Jekyll’s documentation being somewhat lacking. I found myself somewhat confused about minor details like “how does a layout work?”. After I’d cribbed that together by examining other sites posted with Jekyll, I discovered that the template data docs were inaccurate / misleading, implying the presence of a post variable which failed to exist. It turned out to be something that’s merged into page if you’re viewing a post.

I don’t completely blame Jekyll for this being opaque. Jekyll uses Liquid for its templating language, which claims to be aimed at designers… and I feel it would benefit from some sort of debugging mode that dumps the current scope for examination.

I resorted to reading Jekyll’s source, which cleared up a number of things. However, I view it as a bad sign that I felt I had to do this. Not that a command-line driven static website generator is ever likely to be a mainstream product, but still, it’s the principle of the thing.

Pagination worked, but was completely lacking in configuration. Since part of my goal was to have my URLs remain the same as they were in WordPress, I had to change this. I did so with a horrible monkey-patching hack of a plugin. Specifically, I made a copy of the pagination module from Jekyll’s core into my _plugins directory and selectively edited it to change the pagination urls.

In the process I noticed a bug in the core code, and submitted a pull request to fix it. So horrible monkey patching might at least pay off this time.

Also utterly broken was the related posts feature. No matter what, it always seems to think the most recent posts are the most related to anything. It’s possible that running with --lsi would have helped with this, via complex semantic analysis, but that takes forever and I’ve seen others complain that it doesn’t really help. So there’s more monkey patching going on via Lawrence Woodman’s related posts plugin, which I took and edited so it worked based on tags instead of categories.

One thing I haven’t fixed, which I’d like to, is making the automatic regeneration of your site during development / writing a lot smarter. Right now it notices a file has changed and so it regenerates every single bit of content on your site. This does mean that the live generated site always has recent/related posts up to date everywhere… but it’d be nice to have some sort of --quick option that ignored that stuff in favor of a faster development cycle.

Because of the utter staticness, I naturally cannot have my own comment system in use any more. So I’ve switched to Disqus, which adds commenting to the site via JavaScript. It feels sort of weird to be outsourcing a component of my user experience like this… but they seem to be trustable. Widely used, and their monetization plan is fairly transparent.

If you’re interested you can see the repo for my website on github. It contains, in its default / post templates, markup that’s compatible with any WordPress theme that’s based on Toolbox, which might be of use to some.

Like I said, I’m happy with how it turned out. I wouldn’t recommend this at all for a non-technical person, but if you want to dig in and get your hands dirty then Jekyll is quite workable.

To replace PHP you need

(Expanding slightly on my response to this HN thread.)

First: to be on all shared hosting everywhere. I.e. you need to be really easy to install, and preferably not involve long-running processes that shared hosts might choke on.

Second: to be beginner friendly. No requirement of understanding MVC, or running commands in a shell (hi RoR!). Pure instant gratification. Someone’s first step into using PHP is likely going to be “I want the current date in the footer of my page”, or “I want a random image on my homepage”, or something like that. Anything like that you can handle by taking your existing page and dropping a tiny snippet in where you want the change to happen. is a potent thing to someone who has never programmed before.

Note: For point 2 many of the things serious programmers hate about PHP are actually advantages. All the functions in one big namespace? That’s great! A newbie doesn’t have to try to understand .

It’s easy to replace PHP for serious developers. We like advanced features, and care about a sane default library. We’re willing to use complex tools to get a payoff.

It’s hard to replace PHP for non-programmers who just want to tweak their static page in notepad so it has one cool new feature, or install a blogging package on their cheapo shared hosting.

To sum up: if you don’t address both of these points then you haven’t killed PHP. You’re competing with Python or Ruby or whatever. PHP will carry right on ignoring you, because you’re not addressing its fundamental use case.

Why not just use an IDE if you want IDE features?

After I posted about my Sublime Text 2 git plugin I got one response which I thought was worth responding to.

That looks helpful, but I often wonder why not just use an IDE if you want IDE features.

Obviously I have a bias here, but I’ll try to be fair to IDEs…

An IDE is an editor that does a lot of things, many of them well. If there’s something you want to do it’ll almost certainly let you do it, but if you’re not happy with some basic element of how it works then you’re stuck having to find a new IDE. (Yes, I know, many IDEs have plugins available, but I’ve never had that much luck with them.)

IDEs also tend to be built with a workflow in mind. If you conform to that workflow then they’ll be good to you, but you want to deviate from it you may have to fight with your tools.

A lightweight-but-extensible editor (e.g. Sublime, TextMate, vi, and so on) tends to focus on having a really good editing experience. So you start with good editing, and then you pick and choose the “IDE features” that you want to mix in. If part of the editor doesn’t work how you want you might have to find a new plugin for it, but since it’s not a massive and complicated system it’s likely to be easier to find that plugin.

Neither is necessarily better, but they do tend to appeal to different types of developer. Web developers, needing to work with a number of different file types, and not generally having complicated build system requirements, gravitate towards the lightweight editors.

UPDATE: To be clear, I’m not saying either is better. It’s a matter of personal choice and situation. As someone who mostly does web development in dynamic languages, I like using a fairly lightweight editing environment. If I wrote in Java I’m sure I’d be singing the praises of IntelliJ/Eclipse/whatever, because I understand that Java is almost impossible to write well without an IDE.

Sublime Text 2 git plugin

I wrote a git plugin for Sublime Text 2.

I’d decided to try Sublime out for work to see how it compared to TextMate… and thus some degree of git integration was required. Given that it’s been out since January, I was surprised that there wasn’t already a solid git plugin.

I did find this one, admittedly, but I decided that I didn’t like how it fit in with Sublime. It’s built around menus and keybinds, whereas I felt that setting everything up as commands in the palette and hooking as much stuff as I could into the fuzzy search was the way to go.

Working on the plugin was a good exercise in getting me used to Sublime. I’m fairly sold on it as a result. It’s philosophically somewhat similar to TextMate, but with some of TextMate’s rough edges smoothed out.

(Short rant: if the recently announced TextMate2 alpha doesn’t get rid of the single-character undo buffer… I don’t know what I’ll do. It’s certainly the biggest single complaint I have about TextMate nowadays.)

A shadow is upon us

Other people contributing to my projects is often the cause of my improving them. This is because people tend to contribute something that works for the case they care about, without necessarily testing how it combines with the rest of the product. There’s nothing wrong with this. They did some work and wanted to give it back; that’s how open source should work.

A case in point here is how shadows just got added to maphilight. A pull request was submitted for a commit that added shadow support for rectangles in canvas only. I accepted the request because, hey, that’s a nice improvement, and it seemed to work. But I wasn’t really happy with rectangles-only.

So, I started fiddling with it. I had, for whatever reason, never touched shadows in canvas before. In fact, it’s been quite a while since I did the research into canvas that was involved in writing maphilight in the first place.

Looking at the commit I see that the shadows have been implemented with a combination of clipping regions and redrawing the shape with some shadow options on the path.

// [Regular shape drawing happens here]

// Clip regions
context.beginPath();
context.rect(0, 0, canvas.width, canvas.height);
context.rect(200, 200, 100, 100);
context.closePath();
context.clip();

// Shadow drawing
context.beginPath();
context.rect(200, 200, 100, 100);
context.closePath();
context.shadowColor = "rgba(0,0,0,1)";
context.fill();
context.stroke();

Now, I’m confused by the clipping being done, since I’ve never seen it work quite like that before. It’s drawing a rectangle around the whole canvas, then another around the rectangle we’re shadowing, and telling it to clip. So I do a bit of testing, and I find that this isn’t doing what I think the submitter meant it to.

I think they wanted it to set up a clip region only outside the shape, so that the shadows they drew wouldn’t appear inside it. However, in practice it seems that it’s just adding the two rectangles together and leaving us with a clip region the size of the whole canvas. It’s possible that this did work in another browser, but not in Chrome where I was testing…

Since subtractive clipping obviously wasn’t the answer, I looked into globalCompositeOperation to clean up after the fill. It turned out that destination-out was the operation I needed to empty my shape. Also, because the shadow-drawing had been added after the regular shape, I had to move it to be before that, otherwise the shadow was being drawn on top of the stroke and cleaning it up would wipe out the fill.

// So we now have...

// Shadow drawing
context.beginPath();
context.rect(200, 200, 100, 100);
context.closePath();
context.shadowColor = "rgba(0,0,0,1)";
context.fill();
context.stroke();

context.save();
context.globalCompositeOperation = "destination-out";
context.beginPath();
context.rect(200, 200, 100, 100);
context.closePath();
context.fillStyle = "rgba(0,0,0,1);";
context.fill();
context.restore();

// [Regular shape drawing happens here]

Okay! Now we have outline shadows.

But, another issue with this method: it’d fill and stroke on the shape, regardless of the settings you were using. If you had no fill / stroke it’d use the default (flat black) settings, which are ugly. Also, it harmed your opacity settings — the stroke and fill were being done twice. So when I added a shadow to a strokeless mostly-transparent rectangle I noticed that it gained a thin black outline, and was darker than it should have been.

I messed around a little bit with trying to erase the fill or stroke, but eventually decided that this was more hassle than it was worth. What wound up being the simplest option was drawing the shape massively off the edge of the canvas, and using shadow offsets to cast the shadow into the right spot.

// Just the shadow part this time
context.beginPath();
context.rect(200 + 9001, 200 + 9001, 100, 100);
context.closePath();
context.shadowOffsetX = -9001;
context.shadowOffsetY = -9001;
context.shadowColor = "rgba(0,0,0,1)";
context.fill();
context.stroke();

Now we had a shadow that didn’t involve drawing anything stroking or filling onto the canvas near our existing shape. At this point I had completely rewritten the code I’d merged in. About the only thing remaining was the option names Raven24 had chosen.

I went ahead and added some options for whether the shadow was cast inside or outside the shape, since I could see reasons for both, and added some overrides for whether it’d be casting from the fill or the stroke, since the varying possible configurations made it difficult to reliably guess which would look better.

I still didn’t add it to non-canvas. Largely because now that IE has finally given in and implemented canvas I view that as being a dead branch. Needs to keep working, and any major changes have to be ported over… but minor display differences are somewhat acceptable. Also, I don’t have access to IE right now, since I’m away from home. If I ever have reason to look into shadows in VML I’m sure I’ll add it in then, for the heck of it.

You can see all this in action on the demo page if you’re interested.

And that’s how community involvement improves things. 😀

Hiring developers

It turns out to be quite difficult to hire good developers. I’m involved in the hiring process at deviantART, and it has opened my eyes to just how unqualified the majority of applicants to these jobs are.

To give an example of this, I estimate that we hire perhaps 0.2% of the people who apply for a job with us. I haven’t gone back and tallied this up, so all figures in this example are ballpark at best… but I’ve tried to err on the generous side.

First, we collect resumes. We advertise on all sorts of job sites, we post things to Hacker News, and generally we try to get the word out. Filtering the resumes eliminates about 95% of them, depending on where we’ve been advertising. This is simultaneously the step where unqualified applicants are most expected, and the step that’s most likely to be rejecting perfectly qualified people because of some random quirk of their resume.

Then we ask promising applicants to do a simple exercise. This one, in fact. It really is fairly simple… not FizzBuzz, but still trivial. This gets reviewed by the entire hiring team. Passing it is somewhat subjective; there’s a list of things we look for, along with an ineffable “code style” component. This weeds out about 90% of submissions.

Next comes a phone interview with those who passed the test. Again, with the entire hiring team on the line. Since you’ve made it this far we’re pretty positive about you, and strongly suspect that you can code competently, but we’ll still try to dig into your past experiences and quiz you on the reasons for things you did on our exercise. Ideally you’ll have left a small bug in the exercise that we can get you to debug while we’re on the call. It’s not uncommon for us to be underwhelmed here, and we tend to turn down maybe 60% of candidates on this step.

Finally comes a three month trial period. Which we take seriously. I’d worked places with a trial period before, but it was always just a matter of not showing up to work drunk and you’d make it. We evaluate performance, work style, and general team fit… and we seem to lose perhaps 20% of hires here.

Adding all that up, P(we hire you) would seem to be 0.0016. Or 0.16% if you prefer.

The step that seems the most telling is the exercise, because it goes out to people who are generally already making a living as a developer. And, like I said, it’s not difficult. So we get to see how many professional developers apparently can’t perform their job function. For that matter, a hefty chunk of the people who fail that step just plain didn’t complete the listed requirements.

Despite all that, feel free to apply! 😀

UPDATE 2011-10-01: Some people have mentioned that PHP is keeping them from applying. Can’t blame them there; it’s not cool and modern, and it has some ugly warts. But we’re an old website (11 years now), and back in the day it’s what was picked. Rewriting into something cool and modern is technically possible, but would also be a genuinely ridiculous amount of work.

UPDATE 2011-10-02: It was pointed out to me that I may not have conveyed my point correctly. I mean to say “hiring good people is hard”, not “we are too awesome for you”. The numbers are intended to indicate our low success rate more than our exclusivity.

Random recommendation: Virgin Mobile

After resisting smartphones for many years, I got one a few months ago, just before I moved. It turns out that they’re pretty nifty. Who knew, right?

I got an LG Optimus V on Virgin Mobile. One of the big things that had been stopping me from getting one before was that I couldn’t quite bring myself to pay as much as you have to for a phone plan with data. Virgin Mobile gets around this objection by offering phone + unlimited data for $25/month… which is frankly insanely cheap compared to everyone else that I looked at.

It runs Android, and is pretty easy to root and flash to a newer version, which I have done. I held off on it for ages, but then my wife did it and I was impressed enough by CyanogenMOD to want to do it myself.

It’s not a fancy phone; it has a smallish screen, and not a terribly powerful processor. But sufficient for my needs.

So, consider this recommended. If you’re cheap like me. 🙂