Rails, Twitter and the 800lb Gorilla in the Room…

When you first start using Rails, you quickly find there’s an 800lb gorilla in the room that no-one really wants to acknowledge. Yes, the ’s’ word: Scaling. Oh, they’re quick to tell you that it’s an uninteresting problem (at least for a programmer): if you do Rails right and you’ve gone shared-nothing, then you just throw more hardware at the problem. If you have a real bottleneck in your code, you just rewrite the performance-critical parts of it in C. Oh, and YARV will be here soon (for certain values of soon), so Ruby will be faster!

And suddenly, that gorilla shrinks; or at least it seems to.

Until someone goes and pushes the boundary way beyond those who’ve come before. Suddenly, the gorilla has grabbed your girl, climbed to the top of the Empire State and is angrily swatting light aircraft left, right and centre.

Scaling suddenly gets interesting again.

Nay-sayers and trolls cloud the issue
Those who, for whatever reason, want to see Rails fail have grabbed ahold of the issue and are casting it as a reason to abandon Rails: “Behold, the savage beast that cannot be tamed!” Often, they’ve got their own agenda. Whether it’s a desire to see their pet platform succeed or to see DHH kicked to the dirt, these folks don’t really care about the scaling issue at all. They latch onto these issues not to help improve the platform; in the worst case, they want to destroy it.

So forget them.

We’re here because we want to solve this without changing platforms.

It doesn’t matter to most of us anyway
The truth is, 99% of Rails projects will *never* hit a performance wall that cannot be solved using existing wisdom. So the issue is a non-issue and certainly not a reason to ditch Rails.

If, perchance, you’re part of the other 1%, you should be thrilled to have this problem: it means your software has been successful! Fact is, you don’t accidentally create a site/service which generates 11,000+ requests/sec. You explicitly set out to do so. And presumably, you chose Rails for a reason. Why shouldn’t we extend the concept of “no premature optimisation” to language and platform selection? If Twitter had chosen an “enterprise” class platform and tried to architect for 11k reqs/sec, chances are they would still be building…

Obvious had a reason to choose Rails; they aren’t naive, they’ve been around the block with Rails before.

Generally, it’s much better to throw something out there NOW, and let the real problems come out in the wash, rather than sit in a dark room, worrying about what might happen and giving yourself a case of analysis paralysis.

Escalation Sucks
This issue has been blown up by a combination of strong opinions, incomplete information and folks from other camps pouring fuel on the fire. It’s time this “conflict” was de-escalated and the combative nature of the discussion toned down.

The situation itself can be summed up pretty easily:

  • We’re in uncharted waters in terms of this sort of throughput, although others have suggestions and maybe even solutions.
  • The problem didn’t just suddenly spring up on the Twitter guys; clearly they’ve been working on it for a while. Implying that they’re now crying to core because Rails is broken for them is disingenuous. (In fact, the whole “they’ve forsaken that opportunity for an arms-crossed alternative” thing smells like something of a strawman argument. I didn’t get that from the interview, so unless that’s come out of some backchannel, it seems a peculiar accusation to make.)
  • Solutions that work for Twitter may not work for other Rails apps. Once you exhaust all of the conventional scaling wisdom, you start to move into application-specific optimisations. Are we there yet? I guess we’ll wait and see.

When the storm is over
This is a learning opportunity for everyone, but those who stand to learn the most are the core team and the Twitter guys. I hope each can continue to be open to the other’s position. We need to remember that, at the end of the day, Twitter are under no obligation to release patches, and the core team are under no obligation to accept them. I hope it doesn’t come to that, though, as the community would be the worse for it.

2 Responses to “Rails, Twitter and the 800lb Gorilla in the Room…”

  1. RubyPanther Says:

    The metaphor you’re using is supposed to be “elephant in the room”; an “800lb gorilla” has a totally different meaning.

  2. Bruce Clemson Says:

    I think you’re confusing some points. Being in that 1% doesn’t mean you’re “successful”; it means you’re popular, or maybe you just have heavy load in your particular problem domain. (I work on a portal that enterprises use. It’s not uncommon for us to get a few thousand SSL logins over a fairly short period of time, 10 to 15 minutes, and that’s not always an easy problem to have with a single-box software product that usually runs on off-the-shelf hardware or some old box they had lying around. If we don’t solve it, they don’t pay.) If you’re in that 1%, you’re also probably in a situation where your success depends on solving that problem; if you don’t, you will most likely not be “successful.”

    As for the other 99%, that’s true: scale is usually not an issue. But if there is any single truth about software and its production, it’s that applications almost always outlive their expected life, and it’s never a good time to rewrite something. The way it works for those 99% (and probably a lot of the applications in the 1%) is that they work fine and do a job, the author moves on to other problems and maybe entirely different applications, and then, once people start relying upon the program and really using it, the scale problems show up. Almost universally, fixing the scale problems will be a larger effort than the initial construction, and that’s time not spent adding new features. If you’ve got customers, it might be time you simply do not have to really think through all of the problems; they want a solution and they want it fast.

    At the end of it, it’s just risk. Will you have that problem? Most likely not, but having it doesn’t always equate to being a “good thing.”

    If Twitter starts going down once every 15 minutes and you routinely start seeing error pages, how long do you think they have to fix that before something else takes their community away? Seriously?
