Andrew Quarton developed a nifty little visualization called GeoPing, built using the Where’s it UP API. Go take a look, then come back.

Our technology stack for the API includes supervisor to run workers, and gearman to manage our job queue. We’re normally running 25 workers to manage the queue. Work tends to come in chunks, and that number of workers has been able to keep the queue minimal or at zero.

Since it’s such a nifty tool, it made the front page of Hacker News today, which led to a few problems on our end. Each person hitting the GeoPing tool launched a rather high number of jobs, enough to fill all the current workers. When many people started hitting the tool in rapid succession, the queue built and built. At one point Gearman reported 13,000 jobs in the queue.

Noticing this, I quickly changed the number of desired workers in supervisor from 25 to 100, then used /etc/init.d/supervisord restart to apply the changes. That didn’t seem to affect the queue, so I tried 250 workers, used restart to apply the changes once more, and watched. Then I noticed something: the restart option wasn’t launching the extra workers I wanted. Running /etc/init.d/supervisord stop, then start, did, and the queue finally started to recover. I kept an eye on the queue with a quick and dirty shell command from Stack Overflow:

(echo status ; sleep 0.1) | netcat 127.0.0.1 4730 -w 1

From our side, I think a few things went wrong:

  • We didn’t have tooling in place to warn us when the queue exceeded reasonable limits
  • We hadn’t documented the proper way to increase workers (stop/start, not restart)
  • Our graphing system seems to have a hard-coded max value, hiding valuable data

Having either of those first two items in place would have allowed us to respond to the issue much more quickly.
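
For the first item, a rough sketch of a queue warning might look something like the following. It is not what we run; it only assumes that gearmand's status output is one tab-separated line per function (name, total jobs, running jobs, available workers) ending with a lone ".". The check_queue name and the limit of 1000 are made up for illustration.

```shell
#!/bin/sh
# Sketch only: warn when any Gearman function's queue passes a limit.
# check_queue is a hypothetical helper name; the limit is an arbitrary choice.
#
# Reads gearmand "status" output on stdin. Each line is expected to be:
#   function<TAB>total<TAB>running<TAB>available_workers
# and a line containing only "." ends the list.
# Prints one warning per function over the limit and returns 1 if any
# were found, 0 otherwise.
check_queue() {
    awk -F '\t' -v limit="$1" '
        $0 == "." { exit }
        NF >= 4 && $2 + 0 > limit + 0 {
            printf "WARNING: %s has %d jobs in queue (%d running, %d workers)\n", $1, $2, $3, $4
            found = 1
        }
        END { exit found ? 1 : 0 }
    '
}

# Example use, assuming gearmand listens on 127.0.0.1:4730:
#   (echo status ; sleep 0.1) | netcat 127.0.0.1 4730 -w 1 | check_queue 1000
```

Run from cron, the non-zero exit status and the warning text could feed whatever alerting is already in place.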

We're working on them :)



Hi, I’m Paul Reinheimer, a developer working on the web.

I co-founded WonderProxy, which provides access to over 200 proxies around the world to enable testing of GeoIP-sensitive applications. We've since expanded to offer more granular tooling through Where's it Up.

My hobbies are cycling, photography, travel, and engaging Allison Moore in intelligent discourse. I frequently write about PHP and other related technologies.
