I’ve seen several instances where people have demonstrated the ease with which encrypted cookies can replace sessions within PHP. Michael Nitschinger wrote a piece recently demonstrating the switch with Lithium, while CodeIgniter does this by default (optionally encrypting). The problem is that while replacing sessions with cookies works, it introduces a few risks not present with native session support, and these risks tend to be under-documented.
Encryption is often viewed as a panacea for security problems: you sprinkle a little encryption dust around, and your problems dissolve. Unfortunately, while properly implemented encryption solves the problem of other people reading your encrypted data quite well, it doesn’t do much else. It doesn’t (on its own) tell you if other people have duplicated your data, manipulated the data in question, or handed you an old copy of properly encrypted data.
Consider an attacker who manages to sit in between Amazon and one of their warehouses. Amazon (in this theoretical example) encrypts all order instructions well, then transmits them to the warehouse. The warehouse decrypts the instruction, fills the order, and ships. If the attacker wanted free stuff they could place an order during a low traffic period, and wait for Amazon to send an encrypted message and make a copy. A few minutes later they could re-send that encrypted message to the warehouse, never having even tried to read it. With luck, they’ll soon receive two of whatever they just ordered! This is an instance of the replay attack, and that vulnerability is the one this post will examine in detail.
Exploiting cookie based sessions
Executing a replay attack against a cookie based session is easy: legitimately obtain some desired state on the system in question, and make a copy of your cookies. Should something go wrong, simply restore your cookies to the previous state. To demonstrate this, I’ve created a sample gambling application. You can get the source from github, or Play Now! (thanks to Orchestra.io, it's a free account so give it a moment)
In the game you start with $1000, then have the ability to make arbitrary wagers. Make a few bets to get a feel for the system, then make a copy of your cookie values. Bet again. If you lose money, simply restore your earlier cookies to get that money back. That’s it. This attack works because the system is willing to accept any successfully decrypted message as a valid, and current, session.
This attack doesn’t exist under traditional session storage. The user’s cookie doesn’t change from one request to the next as their money goes up and down. If a system is regenerating the session ID when the user changes privilege levels, the old session is deleted and no longer available when invoked properly: session_regenerate_id(TRUE);
A system that uses cookie based session storage alongside a CAPTCHA to defeat bots is doomed to fail: the attacker could solve the CAPTCHA once, then re-use that same cookie a million times. Encryption makes it difficult for an attacker to read the message, but additional systems must be bolted on to defend against this sort of attack. In the stateless world of HTTP and the web, this is exceedingly difficult without some server-side datastore.
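To make the "bolted on" part concrete, here is a minimal sketch of one possible defence, under some loud assumptions: the cookie is signed with an HMAC rather than encrypted (to keep the example short), the function names and the key are hypothetical, and the $serverStore array stands in for whatever server-side datastore you have available. Each cookie carries a version counter, and only the most recently issued version is accepted.

```php
<?php
// Hypothetical sketch: a signed cookie that carries a version counter.
// $serverStore stands in for a server-side datastore (database, memcached, ...).

const COOKIE_KEY = 'replace-with-a-long-random-secret'; // assumption: known only to the server

function issueCookie(array &$serverStore, string $userId, array $state): string
{
    // Bump the counter so every previously issued cookie becomes stale.
    $version = ($serverStore[$userId] ?? 0) + 1;
    $serverStore[$userId] = $version;

    $payload = base64_encode(json_encode(['uid' => $userId, 'v' => $version, 'state' => $state]));
    $mac = hash_hmac('sha256', $payload, COOKIE_KEY);

    return $payload . '.' . $mac;
}

function readCookie(array $serverStore, string $cookie): ?array
{
    $parts = explode('.', $cookie, 2);
    if (count($parts) !== 2) {
        return null;
    }
    [$payload, $mac] = $parts;

    // Reject anything that was modified after it was issued.
    if (!hash_equals(hash_hmac('sha256', $payload, COOKIE_KEY), $mac)) {
        return null;
    }

    $data = json_decode(base64_decode($payload), true);
    if (!is_array($data) || !isset($data['uid'], $data['v'], $data['state'])) {
        return null;
    }

    // Reject valid-but-old cookies: this is the replay check.
    if (($serverStore[$data['uid']] ?? null) !== $data['v']) {
        return null;
    }

    return $data['state'];
}

// Usage: the cookie saved before a losing bet no longer restores the old balance.
$store  = [];
$before = issueCookie($store, 'alice', ['balance' => 1000]);
$after  = issueCookie($store, 'alice', ['balance' => 400]);
$current  = readCookie($store, $after);  // ['balance' => 400]
$replayed = readCookie($store, $before); // null
```

Note that the counter lives on the server, which is exactly the point made above: once you need a server-side lookup on every request anyway, much of the appeal of cookie based sessions is gone.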
While this was clearly a trivial example, the problem is not. If you’re looking at using cookie based sessions it’s one of the things you need to be aware of.
If you’re interested in learning more about cryptography, including a great rundown of this problem and many others, I’d highly recommend Practical Cryptography.
Take a look at this picture I took in the mall, in particular the Reitmans, Stokes, and Carleton Cards stores (left to right accordingly):
The stores look very different, mostly due to the colour of light they've used. In the Carleton Cards, there's a yellow light, adding a bit of warmth to a store built around relationships and expressing feelings. Stokes (a kitchen store) uses a very clean, white light, appropriate for clean kitchens, and pushing that meme. Reitmans (a clothing store) uses a rose coloured light which improves the look of skin for most people (especially us pasty Caucasians in the middle of winter).
Just another way stores are marketing to you that's easy to miss.
Here’s the first email I found with the word “if” in it:
We wanted to let you know that your order has shipped. If you ordered multiple items, you may receive separate shipments with no additional shipping charges.
My problem is that the email contains an “if”. There’s no reason for it. The computer that sent the email either does have access, or should have access, to my order. It knows that I only ordered a single product, so there’s no point in wasting my time with any other information.
My bank does something even more confusing:
A fee of $1.50 will be charged in the currency of the account for each cheque viewed. The fee will be debited from your account by the next business day. You may view a cheque as many times as you wish during your current EasyWeb session.
For personal chequing accounts the View Cheque service is free for customers who have Paperless or Online Only record keeping.
I have no idea what kind of account type I have. I push “Checking” to get money out of the ATM, but that doesn’t seem to be what it’s asking for. I opened this account when I was 16 years old, at a branch and bank that no longer exists. Do I have “Paperless” or “Online Only” record keeping? I’m not sure. Not knowing may cost me $1.50, but the computer running the bank most certainly knows what options I have on my account. That’s how it will know to charge me if I’ve got the wrong kind of account. Yet, I’m faced with the question.
Seeing these issues reminds me of an article I read back in 2004: Ten most persistent design bugs, in particular “let you save me some time.”
In both these cases the developer decided to let me save him or her some time. They didn’t complicate their page or email template with an if structure, and instead presented one to me. Every single person who interacts with that page, or gets that email, is now forced to determine which case applies to them, all in an effort to save some work on the part of one developer/designer some time ago.
It’s a stupid waste, and I wish it would stop.
I wish it was easy for me to persistently and privately tag an individual on the Internet with a keyword, then hopefully a short justification. When reading an interesting blog post or tweet you could tag them as interesting or understands cryptography or the like. Then the next time you ran into a post, comment, tweet, conference schedule, whatever from that individual that tag would be visible.
It could in theory live almost entirely in the browser, but I think it would work best with some sort of centralized backend to facilitate cross device consistency.
It doesn’t need to be perfect, nor does it need to pierce any sort of attempt at anonymity. Think about gravatar, a great service that plugs your picture into blog comments through your email address. It’s neither perfect nor completely pervasive but it’s still helpful.
To be clear this isn’t something I want to add onto Facebook or Twitter. I’m not friends with these people, I don’t need to hear when they’re drunk, dating, or farming pixels. Nor do I want to read every inane comment or checkin they push to Twitter. I’d simply like a little icon to appear the next time I encounter them on any medium letting me know they’re brilliant, or a racist bigot so I can react appropriately…
Apart from the (hopefully) apparent utility, I also think this helps bring us back something we’ve lost. One hundred years ago if someone said something interesting to you they were standing in front of you (or at least within earshot) and you’d likely have an easy time remembering that individual to credit them later, or simply pay extra attention the next time that individual attempted to share a remark. By the same token if someone said something idiotic and offensive you’d have an easy time remembering them, and simply walk to the other side of the bar when they mount their soap box. We’re exposed to opinions from nearly faceless individuals at a terrifying rate these days, and along the way we’ve lost the ability to appropriately credit the things we’ve seen. This wish would make serious progress towards fixing that.
First, the love story:
I ran into hooks rather simultaneously with two very different frameworks: Code Igniter and Lithium. In both cases I was using a rather nifty hook to handle ensuring that users were properly authenticated and authorized before accessing a page. I think we can all agree having to add some code to every single method is foolhardy: if (!isset($_SESSION['user_level']) OR ($_SESSION['user_level'] != 'admin')) { header('Location: ' . APP_ROOT_URI); exit; }
Hooks provide a great way to solve the issue. Within Lithium you can leverage the filters mechanism to handle authentication quite easily, in fact it's an example case in their tutorials. The beautiful thing about it in my mind is the simplicity of things at the controller level: I simply list the publicly accessible methods in a property (public $publicActions = array('login');), and everything else is assumed to only be accessible to logged in users. Fantastic! If I mess up when adding a new method, it defaults to closed, which is exactly the result I'm looking for.
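The "defaults to closed" behaviour is the important part, so here is a framework-free sketch of the decision the filter has to make. The mayAccess() function is a hypothetical helper for illustration, not part of Lithium's API; only the $publicActions idea comes from the setup described above.

```php
<?php
// Hypothetical helper, not part of Lithium: decides whether a request may
// reach an action, given the controller's list of public actions.
function mayAccess(array $publicActions, string $action, bool $loggedIn): bool
{
    if ($loggedIn) {
        return true;
    }

    // Anything not explicitly listed stays closed to anonymous visitors.
    return in_array($action, $publicActions, true);
}

$publicActions = array('login');

$loginAllowed  = mayAccess($publicActions, 'login', false);  // true: explicitly public
$deleteAllowed = mayAccess($publicActions, 'delete', false); // false: a new, unlisted method stays closed
```

Forgetting to list a new method costs you a confused anonymous user at worst, rather than an exposed action.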
Over with Code Igniter things are really quite similar. The post_controller_constructor hook can be used to invoke a specific class and controller. That class can return true, redirect a user, whatever as appropriate. Since it's invoked after the controller is instantiated, similar configuration options can be made available.
The honeymoon
I ripped redundant, error prone, easy to forget, and fundamentally stupid checks out of all of the controllers where I'd added them. These new systems were much easier to maintain, required a lot less code, and I didn't need to add 10 lines of unrelated bloat into each controller. Life was grand. Days where I wrote negative lines of code felt glorious.
The big fight
One day, while messing around, I accidentally turned off the hook configuration within Code Igniter (actually I clobbered a file, and restored the wrong one). Then, things came crashing down in a horrible cacophony of... actually they didn't. Everything kept working: that was the problem. The entirety of my security system was turned off because one file was wrong, and things kept working. Sure, specific calls that referenced the current user's username broke, but there was more than enough left vulnerable for me to get a big chill.
Counselling
Revisiting the hooks system, I was shocked by the tremendous lack of depth in my defense: one mis-configured file and security was off. Even worse, nothing broke. There was no evidence that the security was disabled unless you went probing, which is horrible. The only thing worse than a safe that won't lock, is one that looks like it's locked but pulls open to a slight touch.
Conciliation
The easy thing to reach for with both Lithium, and Code Igniter is __construct(). A single, unified location to ensure that the authorization tests have been executed. Unfortunately, in both cases __construct() is called long before the authorization hooks are run. More specialized solutions are required.
With Lithium
The __invoke() method is invoked after the authorization filter, so it's a great candidate for double checking:
    public function __invoke($request, $dispatchParams, array $options = array())
    {
        // The authorization filter defines AUTH_CHECKED; refuse to dispatch without it.
        if (!defined('AUTH_CHECKED')) {
            throw new DispatchException('Authorization filter not run.');
        }
        return parent::__invoke($request, $dispatchParams, $options);
    }
With Code Igniter
The _remap() function is called (when available) to allow you to remap incoming requests to a different method. Since it's invoked universally, the check can go there:
    public function _remap($method)
    {
        // Only dispatch when the authorization hook has run and defined AUTH_CHECKED.
        if (defined('AUTH_CHECKED')) {
            $this->$method();
        } else {
            exit('ACL Configuration Error');
        }
    }
Both of these cases provided me with the depth I was looking for. I'm no longer entirely dependent on one configuration option or file for my security to function. Should it fail, I've got a secondary check in place; this example of defence in depth allows me to be comfortable with the hooks security system once more.
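Both checks rely on the hook leaving evidence that it ran. A stripped-down sketch of the two halves, with no framework involved, looks something like this. The authHook() and dispatch() functions are hypothetical stand-ins for the framework's hook and controller dispatch; only the AUTH_CHECKED constant comes from the snippets above.

```php
<?php
// Hypothetical stand-in for the authorization hook or filter.
function authHook(bool $userIsAuthorized): void
{
    if (!$userIsAuthorized) {
        throw new RuntimeException('Not authorized.');
    }

    // Leave a marker proving the check actually ran for this request.
    if (!defined('AUTH_CHECKED')) {
        define('AUTH_CHECKED', true);
    }
}

// Hypothetical stand-in for __invoke() / _remap(): refuses to run any
// action unless the marker is present.
function dispatch(callable $action)
{
    if (!defined('AUTH_CHECKED')) {
        throw new LogicException('Authorization hook not run.');
    }

    return $action();
}

// With the hook mis-configured (never called), the failure is loud.
try {
    dispatch(function () { return 'secret page'; });
    $failedLoudly = false;
} catch (LogicException $e) {
    $failedLoudly = true;
}

// With the hook in place, requests go through as normal.
authHook(true);
$page = dispatch(function () { return 'secret page'; });
```

The marker only proves that the hook ran, not that it made the right decision, so this is a second layer rather than a replacement for the hook itself.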
Final thoughts
Through researching this, and exploring several code bases to which I have access, I've noticed two distinct strategies for managing method level access. In Lithium, each class indicates which methods should be accessible to which security groups using a public property; by contrast, in the other strategy, public methods were explicitly laid out inside the single authorization function. The former has the advantage of allowing you to quickly and easily manage security while you add functions to a given controller; the latter solves the issue of your authorization rules being scattered across each and every controller by centralizing them in one easy to locate file.
Updates!
Some suggestions have cropped up:
- Unit Testing While I'll agree that great, properly implemented unit testing would have caught this, it still doesn't leave me feeling comfortable. First, I've seen a lot of unit testing code, and much of it wouldn't have caught this, either because the code tested the authentication and authorization code on its own, not combined with the actual controllers, or because it used its own configuration file during testing, missing the fact that something had been removed from production. Second, I'm not sure that simply exposing the issue with unit testing would make me comfortable. I definitely would have caught it faster, possibly even as soon as I'd committed the code, but I'd still like something running alongside the code constantly to make sure the checks are in place.
- Auth Specific Class As suggested by Chris Morrell (in this series of tweets) I could be using a dedicated authorization class. While this does solve the configuration problem by calling it explicitly, I'm losing granularity or making a call with every action. I'll lose granularity if I decide to put one initial call in my constructor (or other early globally called method) since it affects the entire class. Ending up making a call with every action was exactly what I was trying to avoid in the first place. That said, the use of getIdentity() vs requireIdentity() is quite slick, I like how simultaneously explicit and transparent a normal action becomes.
"This doesn't sound hard"
All I want is a DVD ripper that can do what a CD ripper did over a decade ago. I'd like to insert a DVD into my computer, have it look up the meta data on some sort of modern CDDB equivalent, then rip the disk with appropriate meta data for iTunes. It will need to understand the difference between Television and Movies, as well as how to rip the right audio track, and what chapters are associated with what episode. This really doesn't seem that hard. Yet, I don't have one. Everything I've seen requires me to figure out those associations, or handles meta data poorly.
RipIt: A functional ripping tool, however it doesn't seem to understand TV discs. I put in Battlestar Galactica Season 1, Disc 2. It ripped it into a single file that turned out to be something like the middle of episode 3 to the end of episode 4.
Handbrake: Another functional ripping tool, but it requires me to figure out which chapters are associated with which episodes on a disc. I don't want to worry about this: put a disc in, wait, take the disc out, repeat.
I'm on a mac, though if something truly perfect came out for Linux or Windows I'd rip from a VM.
I'm updating my theme at present, this may take a moment.
I'm still not happy, but it's at least legible now. I'm going to try blogging again as a constant kick in my own ass to fix it further.
This is a rant, I admit this up front.
Since I moved to Quebec five years ago, I've spun up two businesses: a sole proprietorship and a corporation. I've always felt like I had to prove something to the province, as opposed to them helping me succeed. Most recently with the corporation I've run into a slew of problems:
- Quebec corporations are issued an NEQ (Quebec Enterprise Number (numéro d'entreprise du Québec)) when started. This number is used when dealing with Government agencies, and it's also a requirement for dealing with several heavily regulated industries like Banking. Despite registering WonderProxy Inc. a month ago I've yet to receive one due to some "upgrades" they're putting in place. Not being able to open a bank account is a serious detriment to any business operation. Lacking a bank account has prevented WonderProxy from verifying its PayPal status, partners from investing in the business, and GoDaddy from issuing an EV Certificate.
- When obtaining tax registrations for WonderProxy I picked up a booklet on taxes to ensure that WonderProxy could charge taxes correctly within Canada (note to American readers: even online we charge taxes in Canada when the purchaser is also in the country). This booklet contained incorrect information for Canada's three most populous provinces (British Columbia, Quebec, and Ontario). If I had charged the taxes as indicated the business would have been liable for the difference, and possibly faced fines.
- The search engine that allows interested parties to research businesses has been failing silently and intermittently for at least a month. When searching for "WonderProxy" you'll occasionally receive the correct results, and occasionally a "No Results" message. When the search engine breaks they simply return zero results, rather than an error, leaving some groups to conclude that WonderProxy Inc. doesn't exist.
- The tax office allows businesses to log in and pay taxes, correct forms, etc. This website was down today from 2:30 -> 3:30, at least that's what it said initially. At 3:45 that message was updated to indicate it would be down until 4:30, then around 4:50 it updated to indicate that the site was simply down. I'm working to resolve some other tax issues, so being able to check on the state of things might have been handy.
I'm beginning to regret opening a business in this province. I've also spun up two businesses in Ontario and found the entire process much easier. I'd like to point out that none of the issues I've presented here have anything to do with language (though the French error messages on English pages do get a bit grating in time).
Wow, this is my fourth registered business. Also: my last in Quebec.
Luckily, owning a business with VPNs made solving the problem for myself quite easy.
Then some pesky friends came along and insisted I offer this as a service. So now we do: WonderProxy - VPN. I'm targeting the service at people, like me, who use their laptop on the go and want to protect their communications. When you "VPN Up!" all of the communications between your computer and our server are encrypted, there's no need for individual applications to support anything, the process is effectively invisible to them. Your communications are decrypted at our server, and sent out onto the Internet at large. This does mean that, in theory, we could be eavesdropping on your communications, but we're not. I'd like to think that I'm a bit more trustworthy than the hipster with skinny jeans sitting in the corner smirking when you open your laptop, plus our very expensive lawyers wrote a privacy policy.
My partner Will put a lot of hours into configuring the software just right. It works with the native VPN clients in Windows XP, Windows 7 and Windows Vista, as well as the built-in client on Mac OS X. He's a Linux user, but configuring there happens to be a bit of a pain (we can share instructions, but these are developer/sysadmin level instructions, not for the casual end user). It also works when you've got a public IP, or when you're behind a NAT. If you're in a hotel that sells "VPN Internet" as an up-sell you probably won't need it (I never have).
I've had perfect success using the VPN in hotels and on friends' WiFi, great success using it in independent coffee shops (or ones that band together on some lightly branded WiFi), and mixed success using it in big brand shops.
So, protect your communications and sign up: WonderProxy - VPN
p.s. We're currently offering a single VPN endpoint on that plan, and it's remarkably well connected in Fremont, California. If you're in Europe and don't particularly feel like waiting for your packets to cross the pond twice, drop me a line after you sign up and I can transfer your access to our London server. I've got a limited number of slots I can open there.
To be honest, as much as I’m looking forward to speaking, I’m looking forward to Andrei’s talk on what happened with Unicode and PHP 6 more than I’ve looked forward to a talk in a long time. Despite being saddened by the decision, and affected by the lack of great Unicode support, I’ve never fully understood what happened. I’m looking forward to changing that.
Most of why I was looking forward to going before the speaking line up had been announced is simple: PHP needs this conference. User group conferences don’t seem as prevalent as they used to, and the ones that have survived seem to have either shrunk and become hyper-local, or grown to mimic the larger commercial conferences. There’s just too much value in the connections you make, and the conversations you have in a reasonably sized community conference to let them fade away. Thanks to Ben, Lisa, and Nick for making sure this doesn’t happen.
(I might be remiss if I didn't mention the cost, only $300USD, even in snow-pesos (CAD$) that's not a lot of money)
My route to Wellington led me through Los Angeles with a possibly torturous 11 hour layover. My good friend gwoo was kind enough to rescue me from the purgatory that is the airport departures lounge. I would have been thankful for a decent lunch (any day that omits an airport Chili’s is a good day) but he had grander plans in mind: lunch on Venice Beach, watching the world walk, skate and cycle by. Following lunch we sailed up the coast. I’ve enjoyed boating before, and I’ve always enjoyed walking through nature. Leaning on the mast, looking out onto the ocean with only the sound of the water rushing by was glorious. I hadn't been feeling particularly stressed as of late or anything, we just had a great release at work, but the release of just standing there on the boat was awe inspiring. Just sailing around we saw the seals on the green buoy, and some dolphins just playing around. It was a great few hours, and the physical work of raising the sails, tacking, etc. just added to the experience.
Sails suitably stowed we headed back to Venice Beach to catch the sunset at the skate park. The feeling of community at the park was great: people skating, people taking pictures, lots of people just watching. There was a real sense of community that I hadn't perhaps expected. Skating has never really been my thing, but the park seemed fantastic, and a fantastic idea. There was a possibly deliberate lack of lights there, so there's a natural curfew for the park. An interesting idea.
Finally, after yet another arduous flight, at roughly 13 hours I think my longest ever, I made it to New Zealand. Luckily I managed to sleep a bit on the plane so I was only slightly delirious when I cleared Biosecurity. One last flight from Auckland to Wellington, and another entertaining Air New Zealand security video (the videos are funny, better than Westjet's improvised comedy I think). At long last, I've made it to my hotel, and a fantastic view. I've only been here a few hours, but I'm really starting to like this city. It feels like a real city, but it's small at the same time. Cuba Street is fantastic, lots of interesting stores. The street is busy and vibrant without being crowded (though the weather isn't great), a stark contrast from popular streets in New York or Montreal. I like it here, and I can't wait for Webstock.
You’re indubitably familiar with various PHP scripts that accept an image and a bounding box as parameters, then resize the image to fit within the bounding box and return the output. phpThumb seems to be a popular option.
These scripts tend to be incredibly helpful for sites where the designers change layouts with any frequency, and a large pool of existing images would need to be resized for every modification. I’ve implemented various thumbnailers in the past for just this reason. Generally some measure of caching is implemented so that the first request against a given image, with a given size is stored for future use.
The vulnerability most of these scripts present is the ability of an attacker to manipulate the width and height parameters to force the server to generate an incredibly large number of images. Servers cache based on the requested output size, so it’s easy to step through 1 pixel increments to occupy the server’s time, and fill up the cache. A single 1024x768 image can be requested 786,432 times and require resizing each time (1x1, 1x2, 1x3, 1x4, etc.). Your webserver has a finite number of workers available to service incoming requests. When all the workers are occupied new requests are either ignored, or forced to wait for a worker to become ready. A sufficient number of concurrent requests can be an effective Denial of Service attack against the server.
Most of the popular scripts are also willing to return an image larger than the original, generally by surrounding it in a black or white bounding box, rather than actually upscaling the image. While this compresses remarkably well, if your source image is 120KB it takes only 8,738 cached requests to fill a gigabyte of disk space (assuming perfect compression on the bounding box). A full hard disk on a single volume machine is disastrous. While not quite as bad on a multi-volume configured system, it still prevents any further images from being cached, scripts also tend to handle failed saves poorly.
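For anyone who wants to check the arithmetic behind those two figures, here it is spelled out. The variable names are mine, and the disk figure assumes 1 GB = 1024 x 1024 KB along with the "perfect compression" assumption above, so each cached file costs roughly the 120KB of the source image.

```php
<?php
// Distinct output sizes for a 1024x768 source, stepping 1 pixel at a time.
$width  = 1024;
$height = 768;
$distinctSizes = $width * $height; // 786432

// Whole cached copies of a 120KB image that fit in one gigabyte.
$sourceKb   = 120;
$gigabyteKb = 1024 * 1024;
$cachedCopiesPerGb = intdiv($gigabyteKb, $sourceKb); // 8738

echo $distinctSizes, " distinct sizes\n";
echo $cachedCopiesPerGb, " cached copies per gigabyte\n";
```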
Attackers now have two Denial of Service attacks to exploit simultaneously: occupying all your workers with resizing tasks, and filling your hard disk. Sad Times.
The solution is to prevent users from requesting arbitrary versions of your image, by either hard-coding valid sizes into your thumbnailing script (e.g. an array of valid sizes used as a white list), or implementing a hashing scheme to prevent attackers from making changes. Several of the popular thumbnailers offer one or both of those options, however I haven’t seen either implemented on any of the production sites I found while researching this post.
If you decide to list valid resize options, and would rather your design team not hate you: allow arbitrary sizes in development, but watermark the image if it’s not on the whitelist of sizes. It’s a constant reminder that it’s not yet production ready, but won’t slow them down while they’re experimenting.
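A rough sketch of both options follows. Everything here is hypothetical: the sizes, the key, and the function names are placeholders rather than the API of phpThumb or any other thumbnailer, and the "hashing scheme" is assumed to be an HMAC over the image name and dimensions that your templates compute when they build the image URL.

```php
<?php
const ALLOWED_SIZES = [[100, 100], [320, 240], [640, 480]]; // hypothetical whitelist
const THUMB_KEY = 'replace-with-a-long-random-secret';      // assumption: known only to the server

// Option 1: only resize to sizes the design actually uses.
function isWhitelisted(int $w, int $h): bool
{
    return in_array([$w, $h], ALLOWED_SIZES, true);
}

// Option 2: sign the parameters when generating the URL...
function signThumb(string $image, int $w, int $h): string
{
    return hash_hmac('sha256', "$image|$w|$h", THUMB_KEY);
}

// ...and refuse any request whose signature doesn't match its parameters.
function isSignedRequest(string $image, int $w, int $h, string $signature): bool
{
    return hash_equals(signThumb($image, $w, $h), $signature);
}

$sig = signThumb('cat.jpg', 320, 240);

$okSize      = isWhitelisted(320, 240);                     // true
$badSize     = isWhitelisted(321, 240);                     // false: not a size the site uses
$okRequest   = isSignedRequest('cat.jpg', 320, 240, $sig);  // true
$tamperedReq = isSignedRequest('cat.jpg', 321, 240, $sig);  // false: width was changed
```

The whitelist is simpler, while the signature lets designers add sizes without touching the thumbnailer, since any size the templates sign is valid.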
(commentary on the state of Quebec Health care at the end)
I had a sore throat starting last week towards the end of my trip to Seattle, no big deal. It persisted and got worse each day, and by Tuesday I was convinced it was strep, as there was a lot of swelling back there. By Wednesday soup was about as much as I could eat, and even that was hard, so I decided to see a doctor on Thursday. I prepared a brief document explaining my problems as speaking was difficult (I even had Google Translate take a whack at French). I went to the walk-in-clinic first thing Thursday morning to discover that they didn’t actually accept walk-ins until 2:00 that afternoon. Allison rescued me and took me to another (open) clinic where everyone spoke English. My fever was high enough that they fast-tracked me. The doctor who saw me took one good look, concluded it wasn’t strep, and decided I was hospital-bound.
The swelling in my throat had started symmetrically, but the right side had swollen enough that the uvula was seriously off center, and had actually pushed my jaw off to the side. On a 0-10 scale (0 = No pain, 10 = Most extreme) I was at about a 4 for much of the week, a 5 by Thursday morning, and a solid 7 by the time I got to the hospital.
I went through triage, handed over my admission form from the walk-in-clinic doctor, and waited. The secondary examination nurse gave me the option between a suppository tylenol, or oral. I opted for the latter, but it was a tough battle by then, even with a very small pill. She then took some blood, and I promptly fainted. Wheeled out of the room I saw the E.R. doc who was incredulous I wasn’t yet on antibiotics (silly clinic doctor!), she then proceeded to not give me antibiotics. I was also visited by the Short Stay unit (an E.R. step down unit) as I would be admitted and would spend the night. There was universal appreciation of just how much my throat was swollen (“wow! that’s huge”). Then I got morphine and a saline drip, and finally some antibiotics. They put me on a gurney for this, which was probably prudent: everything is awesome when you’re on morphine. The ENT doctor arrived to lance and drain my throat (as gross as it sounds). Despite both topical sprayed antiseptic, and injected stuff, this was a full 10 on the pain scale, plus I couldn’t breathe. Not my finest hour. Allison did get to watch it happen though, apparently it was pretty interesting; I was busy trying to scream. After spending a few minutes to recover, and some more IV pain killers, I was wheeled up to the short stay unit. They hooked me up with IV antibiotics, provided me with a vacuum tube to drain any saliva (still couldn’t swallow) and I promptly fell asleep.
My now seriously abused mouth and throat had two distinct states: either it was open and incredibly dry, or it was closed and full. Since my sinuses were congested my mouth hung open when I slept. So I’d wake up every hour with a painfully dry mouth and some new swelling in my palate. I’d chew some ice chips (getting to use my jaw for something was glorious), and swish some water. Closer to morning I was mostly able to swallow water (though with great difficulty).
I spent the day in the short stay unit on a clear liquids diet, not very interesting (or filling). My ability to swallow and then to speak improved as the day went on. Towards the evening a new ENT doctor came by to examine my throat. He did a quick test with a needle and found more pus, so he drained it once more with Allison conveniently holding the flash light. The swelling reduced considerably and immediately. He wrote me a prescription for serious antibiotics and pain killers and sent me home. Having now had a full night of sleep (the first in many nights), I’m feeling pretty good.
If you're interested in what my unshaven, not-recently brushed teeth, but recently lanced throat looks like (gross): mouth pic. You can sort of tell that the swelling has pushed the uvula off to the side in that picture, during the peak of it, it would have been much further off.
A comment on the state of health care in Quebec.
It’s pretty good (or at least it was for me). The time from when I decided to actively seek medical attention to a trained specialist applying an intervention to resolve the issue was less than 12 hours. That time includes having gone to a closed clinic, waiting for a ride, driving outside of the city, then back to the downtown ER, etc. A more accurate number would probably be nine hours. Sitting in a waiting room feeling like crap is a pretty miserable place to be, but really, it’s fine. In both the walk-in-clinic, and the ER I saw new patients arrive, and be seen before me: this makes sense. They were in worse shape medically than I was. I also saw myself seen ahead of some people and again I think this makes sense: I had a decent fever and my airway was at risk (at which point things tend to get serious rather quickly). In the ER I was seen in a room by a doctor, then moved back to the waiting room until I was needed again. This was kind of annoying, since the waiting room is miserable and full of sick people: in an ideal world they’d have enough rooms to deal with all the people they’re seeing at the moment, but that would be a LOT more rooms, and it wasn’t the end of the world. Having us all in a single room also made it easier to monitor us: this way someone who arrived alone couldn’t fall unconscious un-noticed. So it may even be sensible this way. Would it have been nice for things to have gone “faster”? Sure. Am I happy with the level and timeliness of care? Absolutely.
The most annoying part for me, was watching the relatives of the ill either walking around glaring at everyone who had received a bed (while their loved-one hadn’t) or asking everyone they could find how much longer it would be. I would honestly estimate that forward-facing medical staff spend between 5%-10% of their time politely telling people that they have no idea, or can “look into it to make sure nothing has been missed”. The five well relatives to a single ill patient ratio some groups were presenting didn’t make the crowded level of the ER waiting room any better either, especially since it was apparently “a very busy day”.
The tool received some great press last week, with SwissMiss tweeting about us and Lifehacker picking it up. Accordingly, things broke. This was rather unfortunate, as I'd sort of planned for traffic spikes and this sort of thing shouldn't have happened. Two key things went wrong:
The pecl_http extension segfaulted on certain requests. supervisord promptly restarted the worker, which then picked up a similar job from the queue and segfaulted again. This happened enough times that supervisord gave up on the worker and left it shut down. Gearman detected that the job was never completed and re-queued it, ready to crash another worker when it came up. The issue was ultimately caused by incomplete HTTP response headers lacking the reason phrase. A number of systems seem to omit that message, and its absence crashed the worker.
Ilia was able to patch pecl_http, and we've updated to the more recent release to obtain that fix.
Gearman re-submitted crashed jobs forever. This ensured that even a very small number of crash-inducing requests would eventually kill every worker. This is silly for a few reasons. First, it allowed this to happen. Second, these requests are time-sensitive: there's no point in trying something thirty times, as by then the web client has given up on the request itself and the data will never be used.
We've re-configured the Gearman system to only retry a job once. This should allow random issues to be retried, but prevent pervasive problems from crashing the system. If you're running Gearman I'd strongly suggest supplying some sort of maximum retry value using --job-retries=N.
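To illustrate why a retry cap matters, here is a minimal in-memory sketch in Python. It is not Gearman's API or our worker code: `run_queue`, `fetch`, and the `"bad-headers"` job are hypothetical names made up for this example, and a raised exception stands in for a worker segfault.

```python
from collections import deque


def run_queue(jobs, handler, max_retries):
    """Process jobs in order; a failing job is re-queued at most max_retries times."""
    queue = deque((job, 0) for job in jobs)  # (job, retries used so far)
    completed, dropped, crashes = [], [], 0
    while queue:
        job, retries = queue.popleft()
        try:
            handler(job)
            completed.append(job)
        except Exception:
            crashes += 1  # stands in for a worker crash and restart
            if retries < max_retries:
                queue.append((job, retries + 1))  # give it another chance
            else:
                dropped.append(job)  # give up so it cannot crash workers forever
    return completed, dropped, crashes


def fetch(job):
    # Hypothetical handler: "bad-headers" plays the role of the poison request.
    if job == "bad-headers":
        raise RuntimeError("worker crashed")


completed, dropped, crashes = run_queue(["a", "bad-headers", "b"], fetch, max_retries=1)
print(completed, dropped, crashes)
```

With the cap in place the poison job costs two crashes and is then dropped, while the healthy jobs still complete. Without a cap, the same loop would never finish.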
We've come through the outage and fixed both elements of the problem (though fixing either of them would have prevented an identical issue from causing a problem). We're also looking at better ways of monitoring this to be informed of problems sooner.
Our apologies for the outage.
There's a popular turn of phrase, "pave the cow paths", which was introduced to me by my friend Chris Shiflett in one of his talks. The essence (as I understand it) of paving the cow paths is that it's easier to positively encourage users to act the way they already want to than to have them change their behaviour.
Twitter has some great examples of paving the cow paths. Look at @replies: not a feature they built into the service, just something that developed through use. Later, developers included features within the product to better support this. Hashtags followed the same route; possibly co-tags will come later.
In each of these cases the developers had the opportunity to observe how their users behaved (creating their own paths), then worked to encourage and support their behaviour (paving them).
I think there's a lot to learn here for people working on a new project or product. You can't hope to guess all the ways users will want to interact with your products. Those unexpected use cases may turn out (in the long run) to be a major part of your application. You can, however, release your core feature with some flexibility in mind, then watch.
I've written before about the hard choice of getting something right, or getting something up, and I argued then for just getting it up. I think this is another great reason to follow that route. In between releasing the product and determining where the paths lie, you'll have some time to round some corners and fix some bugs. Don't worry.
Finding the Paths.
Twitter had it easy: watch the stream and see what users are doing. Your project is different, and your paths will be too. Here are some ideas:
- Log routes not just hits.
- Common paths
- Wasted steps (do users always go home -> friends -> news? Then include News as a top level link, or drop them there in the first place)
- Give them somewhere to chat
- Leave it open
- Divide and Conquer
Your access log will contain a record of all the pages your users visit. Turn this raw data into information by tracking individual users as they navigate your site. Your webserver should allow you to modify the logging format to include enough unique information to track users (session id?). While IPs are not unique, they may be sufficiently unique for this purpose.
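As a rough sketch of turning hits into routes, the Python below groups log entries by session and counts the page sequences that recur. The log format is an assumption made for illustration (one `session_id path` pair per line, already in time order), and `routes_by_session` and `common_paths` are hypothetical helper names, not part of any webserver or library.

```python
from collections import Counter, defaultdict


def routes_by_session(log_lines):
    """Group page paths by session id, preserving the order of visits."""
    routes = defaultdict(list)
    for line in log_lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip lines that do not match the assumed format
        session_id, path = parts
        routes[session_id].append(path)
    return dict(routes)


def common_paths(routes, length=3):
    """Count every run of `length` consecutive pages across all sessions."""
    counts = Counter()
    for pages in routes.values():
        for i in range(len(pages) - length + 1):
            counts[tuple(pages[i:i + length])] += 1
    return counts


# Assumed sample data: "session_id path", interleaved as it would be in a real log.
log = [
    "s1 /home", "s2 /home", "s1 /friends", "s2 /friends",
    "s1 /news", "s3 /home", "s2 /news", "s3 /settings",
]
routes = routes_by_session(log)
print(common_paths(routes).most_common(1))
```

In this sample, two of the three sessions walk home -> friends -> news, so that sequence comes out on top.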
Things to look out for are the common paths and wasted steps listed above.
If they'll use it, user forums can be a great resource. Some of your more passionate users will start arguing for features with each other, saving you time and presenting ideas.
Many sites still try to lock you down when you register: a service devoid of APIs, with restrictive comment fields and closed data. If your audience is at all techy, they'll try to pave the paths themselves while they're using the service, whether with Greasemonkey scripts, bookmarklets, or full API implementations. These tools are invaluable maps to where the paths are being laid. If you've provided users with a place to chat (forums), you've provided a natural place for the developers of these tools to congregate and share. Help them! Then make them redundant by improving your app. Keeping them in the loop throughout the feature development process (hey! we love what you've done here, so much in fact we'd like our app to do it for everyone. Here's a preview, what do you think?) is a great way to solicit early feedback and avoid developer backlash.
In Malcolm Gladwell's article The Ketchup Conundrum (also appearing in What the Dog Saw) we learned that there is no one perfect spaghetti sauce. Different people want different things: chunky vs smooth, thick vs thin, etc. Your service may be no different. Again, consider Twitter. There are users who tweet each inane portion of their lives to a few close friends, and others who tweet carefully and selectively. Since different people will want to use it differently, stop trying to find one trend within the whole. Instead look for segments in your user base, and discover what their needs are. Where needs prove to be mutually exclusive, provide the ability to customize the experience.
