We've made significant updates to the infrastructure supporting Where's it Up this year. Many of these changes were necessitated by rapid growth, from a few thousand tests per day to several million. Frankly, without the first two, I'm not sure we could have remained stable much past a million tests per day.

MongoDB Changes

I blogged about our switch to MongoDB for Where’s it Up a while back, and we’ve been pretty happy with it. When designing our schema, I embraced the “schema free world” and stored all of the results inside a single document. This is very effective when the whole document is created at once, but it can be problematic when the document will require frequent updates. Our document process was something like this:

  • User asks Where's it Up something like: test google.com using HTTP, Trace, and DNS, from Toronto, Cairo, London, and Berlin.
  • A bare bones Mongo document is created identifying the user, the URI, tests, and cities.
  • One gearman job is submitted for each City-Test pair against the given URI. In this instance, that’s 12 separate jobs.
  • Gearman workers pick up the work, perform it, and update the document created earlier using $set (sketched just after this list).
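
For concreteness, a rough sketch of that original update, assuming the legacy MongoClient driver; the database, collection, and field names are illustrative, and $jobId / $traceOutput stand in for values the worker already has:

    <?php
    // Hedged sketch of the old write path: every worker $set-s its output into
    // the one shared document, growing it well past its original allocation.
    $mongo   = new MongoClient();
    $results = $mongo->wheresitup->results;

    $results->update(
        array('_id' => $jobId),
        array('$set' => array('results.Toronto.trace' => $traceOutput))
    );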

This is very efficient for reads, but that doesn't match our normal usage: users submit work, then poll for responses until the work is complete. Once all the data is available, they stop. We've optimized for reads in a write-heavy system.

The situation for writes is far from optimal. When Mongo creates a document under exact fit allocation, it considers the size of the data being provided and applies a padding factor. The padding factor maxes out at 1.99x. Because our original document is very small, it's essentially guaranteed to grow by more than 2x, probably with the first Traceroute result. As our workers finish, each attempting to add additional data to the document, it will need to be moved, and moved again. MongoDB stores records contiguously on disk, so every time it needs to grow the document beyond its allocation it must read it off disk and drop it somewhere else. Clearly a waste of I/O operations.

It’s likely that Power of 2 sized allocations would better suit our storage needs, but they wouldn’t erase the need for moving entirely, just reduce the number of times it’s done.

Solution:

Normalize, slightly. Our new structure stores the document framework in the results collection, while each individual work result from the gearman worker is stored in results_details as its own document. This is much worse for us on read: we need to pull in the parent document, then the child documents. On write, though, we're spared the horrible moves we were facing previously. Again, the usage we're actually seeing is: submit some work, poll until done, read the complete dataset, never read again. So this is much better overall.
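
A sketch of the new write path under the same assumptions (legacy driver, illustrative names): the parent framework document in results never grows, and each worker inserts its own small document into results_details instead.

    <?php
    // Hedged sketch: the worker now writes a brand-new child document rather
    // than $set-ing into the parent, so nothing ever needs to be moved on disk.
    $mongo   = new MongoClient();
    $details = $mongo->wheresitup->results_details;

    $details->insert(array(
        'parent' => $jobId,        // _id of the framework document in results
        'city'   => 'Toronto',
        'test'   => 'trace',
        'output' => $traceOutput,
    ));

    // Reads now pull the parent from results, then its children:
    $children = $details->find(array('parent' => $jobId));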

It took a little work to make our front end handle both versions, but this change has been very smooth. We continue to enjoy how MongoDB handles replica sets and automatic failover.

Worker Changes

We’ve recently acquired a new client which has drastically increased the number of tasks we need to complete throughout the day: checking a series of URLs using several tests, from every location we have. This forced us to re-examine how we’re managing our gearman workers to improve performance. Our old system:

  • Supervisor runs a series of workers, no more than ~204. (Supervisor's use of Python's select() limits it to 1024 file descriptors, which allows for ~204 workers.)
  • Each worker, written in PHP, connects to Gearman stating it's capable of completing all of our work types.
  • When work arrives, the worker runs a shell command that executes the task on a remote server over a persistent SSH tunnel, waits for the results, then shoves them into MongoDB.

This gave us a few problems:

  • Fewer workers than we’d like, no ability to expand
  • High memory overhead for each worker
  • The PHP process spends ~99.9% of its time waiting, either for a new job to come in, or for the shell command it executed to complete.
  • High load, with one PHP process per executing job (i.e., per job actually doing work)

We examined a series of options to replace this; writing a job manager as a threaded Java application was seriously considered. It was eventually shot down due to the complexity of maintaining another set of packages, and the limited number of employees who could help maintain it. Brian L Moon's Gearman Manager was another option, but it left us running a lot of PHP we weren't using. We could strip down PHP to make it smaller, but that wouldn't solve all our problems.

Minimizing the size of PHP is pretty easy. Your average PHP install probably includes many extensions you’re not using on every request, doubly so if you’re looking at what’s required by your gearman workers. Look at ./configure --disable-all as a start.

Solution:

Our solution involves some lesser-used functions in PHP: proc_open, stream_select, and occasionally posix_kill.

We set Gearman to work in a non-blocking fashion, which allows us to poll for available work without blocking until it becomes available.
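
A minimal sketch of that setup with PHP's gearman extension; the registered function names and handlers are hypothetical:

    <?php
    // Hedged sketch: a GearmanWorker in non-blocking mode. work() returns
    // immediately, and GEARMAN_IO_WAIT / GEARMAN_NO_JOBS just mean "nothing yet".
    $worker = new GearmanWorker();
    $worker->addOptions(GEARMAN_WORKER_NON_BLOCKING);
    $worker->addServer('127.0.0.1', 4730);
    $worker->addFunction('wiu_trace', 'startTraceJob');   // hypothetical handlers
    $worker->addFunction('wiu_http', 'startHttpJob');

    @$worker->work();   // poll once rather than blocking for a job
    $ok = in_array($worker->returnCode(), array(GEARMAN_SUCCESS, GEARMAN_IO_WAIT, GEARMAN_NO_JOBS));
    if (!$ok) {
        // a real error, not just "no work right now"
    }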

Our main loop is pretty basic:

  1. Check to see if new work is available, and if so start it up.
  2. Poll currently executing work to see if there’s data available, if so record it.
  3. Check for defunct processes that have been executing for too long, kill those.
  4. Kill the worker if it's been running too long.

Check for available work
Available work is fired off using proc_open() (our work tends to look something like this: sudo -u lilypad /etc/wheresitup/shell/whereisitup-trace.sh seattle wonderproxy.com). We save the read pipe for later access, and set it to non-blocking. We only allow each non-blocking worker to execute 40 jobs concurrently, though we plan to increase the limit after more testing.
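
Roughly, the dispatch looks like this (a sketch; the command line is the example above, and the $running bookkeeping array is an assumption reused in the later sketches):

    <?php
    // Hedged sketch: fire off a job and keep its read pipe around, non-blocking.
    $cmd  = 'sudo -u lilypad /etc/wheresitup/shell/whereisitup-trace.sh seattle wonderproxy.com';
    $spec = array(
        1 => array('pipe', 'w'),   // stdout: the read pipe we poll later
    );
    $process = proc_open($cmd, $spec, $pipes);
    stream_set_blocking($pipes[1], false);

    // Track everything we need to poll and reap the job later.
    $running[] = array(
        'process'  => $process,
        'stdout'   => $pipes[1],
        'output'   => '',
        'lastRead' => time(),
    );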

Poll currently executing work for new data
Check for available data using stream_select() with a very low poll time. Some of our tests, like dig or host, tend to return their data in one large chunk; others, like traceroute, return data slowly over time. Longer-running processes (traceroute seems to average 30 seconds) have their data accumulated over time and appended together. If the read pipe is done (checked with feof()), we close it off and close the process with proc_close().
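
A sketch of that polling pass, using the $running array from the dispatch sketch; recordResult() is a hypothetical stand-in for the MongoDB write:

    <?php
    // Hedged sketch: poll all running jobs with a very short timeout, append
    // whatever data has arrived, and reap processes whose output is finished.
    foreach ($running as $key => &$job) {
        $read  = array($job['stdout']);
        $write = $except = null;
        if (stream_select($read, $write, $except, 0, 20000) > 0) {   // 20ms poll
            $job['output']  .= stream_get_contents($job['stdout']);
            $job['lastRead'] = time();
        }
        if (feof($job['stdout'])) {
            fclose($job['stdout']);
            proc_close($job['process']);
            recordResult($job['output']);    // hypothetical: write to MongoDB
            unset($running[$key]);
        }
    }
    unset($job);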

Check for defunct processes
We track the last successful read from each process; if it was more than 30 seconds ago, we close all the pipes and recursively kill the process tree.
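
A sketch of the kill itself, assuming pgrep is available to walk the process tree:

    <?php
    // Hedged sketch: recursively kill a process and its children with posix_kill.
    // Signal 9 is SIGKILL; pgrep -P lists the direct children of a PID.
    function killTree($pid)
    {
        $children = trim((string) shell_exec('pgrep -P ' . (int) $pid));
        if ($children !== '') {
            foreach (explode("\n", $children) as $child) {
                killTree((int) $child);
            }
        }
        posix_kill($pid, 9);
    }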

Kill the worker
Finally, the system keeps track of how much work it’s done. Once it hits a preset limit, it stops executing new jobs. Once its current work is done, it dies. Supervisor automatically starts a new worker.

Site Changes

Our original site wasn’t actually hugely problematic. We had used a large framework, which often left me feeling frustrated. I’ve lost days fighting with it, on everything from better managing 404s to handling other authentication methods. It was the former that lead to me rage-quitting the framework and installing Bullet. That said, we’re not afraid to pull in a library to solve specific problems, we’ve been happily using Zend_Mail for a long time.

Solution:

We've re-written our site to use the Bullet framework. Apart from appreciating micro frameworks in general, we find them particularly apt for delivering APIs. We've had great success writing tests under Bullet, and being able to navigate the code in minutes rather than hours is an everlasting bonus. This leaves the code base heavily weighted toward business logic, rather than a monolithic framework.


Yesterday either the city or Toronto Hydro went through and replaced the streetlights in my neighbourhood. This seemed good, keep the lights working. Then night fell, and while walking to bed I wondered why there was a car with xenon headlights on my front lawn pointing its headlights at my bathroom. That was the only logical reason I could think of for my bathroom to be so bright. It wasn't a car. It was the new streetlights. My entire street now looks like a sports field.



Here is Palmerston, a nice street with pretty street lights (much prettier than ours):




Here is Markham, with the sports field effect:




I took those two pictures on the same camera, with the same settings. I exported them both using the same white balance value. Full resolution options are posted at smugmug. I have emailed Toronto 311, who has forwarded me on to Toronto Hydro. In the meantime I'm going to need upgraded blinds, and to start wearing my sunglasses at night.


We recently expanded the number of disks in the RAID array on the main server handling Where's it Up requests. Rebuilding that array took roughly 28 hours, followed by background indexing which took another 16 hours.

During the rebuild, the RAID controller was doing its best to monopolize all the I/O operations. This left the various systems hosted on that server in a very constrained I/O state: iowait crested over 50% for many of them, while load breached 260 on a four-core VM. Fun times.

To help reduce the strain we shut down all unneeded virtual machines and demoted the local Mongo instance to secondary. Our goal was to reduce the write load on the constrained machine. This broke the experience for our users on wheresitup.com.

We've got the PHP driver configured with a read preference of MongoClient::RP_NEAREST. This normally isn't a problem: we're okay with some slightly stale results, since they'll be updated in a moment. Problems can occur if the nearest member of the replica set doesn't have a record at all when the user asks for it. This doesn't occur during normal operations, as there's a delay between the user making the request and being redirected to the results page that requires the record.
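
For reference, that read preference is just a connection option in the legacy driver; the hosts and replica set name below are placeholders:

    <?php
    // Hedged sketch: connect to the replica set with RP_NEAREST, so reads can be
    // served by whichever member answers fastest, including secondaries.
    $mongo = new MongoClient(
        'mongodb://mongo1.example.com,mongo2.example.com/?replicaSet=wiu',
        array('readPreference' => MongoClient::RP_NEAREST)
    );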

Last night, with the local Mongo instance so backed up with I/O operations, it was taking seconds, not milliseconds, for the record to show up.

We shut that member of the replica set off completely, and everything was awesome. Well, apart from the 260 load.


Back in the fall of 2010 I was kicked out of my apartment for a few hours by my fantastic cleaning lady, and I wandered the streets of Montréal on Cloud 9. Not because of the cleaning mind you, but because I’d kissed Allison for the first time the night before, while watching a movie on my couch (Zombieland). Great night. My wanderings took me to the flagship Hudson’s Bay store in Montreal, and inevitably to the electronics department. I decided to check out the 3D televisions.

The first problem I ran into with the 3D TVs was the glasses: they didn't work. I tried on the first pair (which had an ugly, heavy security cable attaching them to a podium); no dice. I tried on the second pair; no luck there either. Looking at the glasses in more detail, I found a power button! Pushed it, tried them on, nothing. Repeated the process on the other pair, still nothing. There I was, in a big electronics department with a range of 3D TVs in front of me, and the glasses didn't work.

If I were going to describe the target market for 3D televisions in 2010, I might have included a picture of myself: male, 30, into technology, owns a variety of entertainment products and consoles, has decent disposable income. As far as I could tell I represented the exact person they were hoping would be an early adopter.

While wandering off I finally encountered a salesperson, and I mentioned that the glasses for the TVs didn't work. He told me they were fine, and motioned for me to follow him. It turned out you had to push and hold the power button for a few seconds to turn them on. As I put them on, the salesperson walked away, and I got to enjoy a demo video in 3D.

Well, sort of. First the glasses had to sync with the television; then I was all set, with a great demo video directly in front of me. Of course, I wasn't in a room with a single television: there were several along the wall to the right and left of the central one, and since my glasses had synced with the one directly in front of me (not the others), the other televisions had essentially been turned into strobe lights, incessantly blinking at me. When I turned my head towards the blinking lights the glasses re-synced with a different television, a disorienting procedure that allowed me to view it properly but turned the one directly in front of me into a strobe light.

So, after requiring aid to put on a pair of glasses that were practically chained down, I was being forced to view very expensive televisions adjacent to a series of strobing pictures with an absentee salesman.

Despite all of the issues, this was really cool! This was 3D, for my living room! No red-blue glasses either; this was the real thing! Maybe I could get one, then invite Allison over again for a 3D MOVIE! Clearly owning impressive technology was the way to any woman's heart. While those thoughts were racing through my mind I caught my own reflection in a mirror: the glasses were not pretty. If you picture a pair of 3D glasses in your head right now, you're probably imagining the modern polarized set you get at the movie theatre: designed to fit anyone, over prescription glasses, rather ugly so people don't steal them. Those glasses were years away back in 2010; these were active shutter glasses. Rather than just two panes of polarized plastic, each lens was a small LCD panel capable of going from transparent to opaque and back many times a second. They looked a little like this, but in black:

As I put the glasses back on the presentation pedestal and rubbed my sore nose I realized: There was absolutely no way I could try to kiss a girl for the first time wearing a pair of those. I left the TVs behind, I think I picked up some apples I could slice and flambé to serve over ice cream instead.

The kissability test: when considering a product, could you imagine kissing someone you care about for the first time while using it?


Stuart McLean is a fantastic storyteller. I've enjoyed his books immensely, but most of all, I've enjoyed listening to him tell me his stories on the radio, either directly through CBC Radio 2 or through the podcast made of the show. This past Christmas I was gifted tickets to see and hear him live here in Toronto; it was a wonderful time.

When I listen to him on the radio, as his calming voice meanders through the lives in his stories, I often picture him. In my mind, he's sitting in a library in an overstuffed leather chair, a tweed coat laid over the arm, a side table with a lamp and a glass of water beside him… perhaps a large-breed dog dozing at his feet. As each page comes to an end he leisurely grasps the corner, slides his fingers under the page, and gently turns it, just as an avid reader moving through a treasured novel would. This is the man I picture when I hear that measured voice regaling me with tales from the mundane to the fantastic.

This could not be further from the truth.

The man I saw at the Sony Centre for the Performing Arts has nothing in common with the man I pictured but the voice. As his story began, as he lapsed into those measured tones, his feet never stopped moving. He danced around the microphone like a boxer, stepping closer to make a point, jumping to the side in excitement, waving his arms in exclamation, always ready to strike us with another adverb. When he reached the bottom of each page, he'd frantically reach forward and throw it back, as eager as a child on Christmas morning. It's easy to fall under the spell of a great storyteller, to stop seeing and only listen, but his fantastically animated demeanour shook that away, and spiced the story in ways I couldn't have imagined over years of radio listening.

Listen to his podcast for a while, then go see him live; he's an utter delight.


I've had a few Valentine's Days over the years. I've spent far too much money, I've planned in exacting detail, I've left things until the last minute, and I've spent a fair few alone. This year, I wanted something special.

Just after Christmas Allison was kind enough to give me a hand-knit sweater she'd been working on for over a year. It's fantastic. Around the same time she commented that she was jealous of my neck warmer. An idea struck! I'd knit her a neck warmer!

Small problem: I’ve never knit a thing in my life.

I've never let not knowing how to do something stop me before, and this didn't seem like the time to start. I headed up to Ewe Knit, where Caroline was able to administer a private lesson. First I had to use a swift to turn my skein of yarn into a ball. They appear to sell yarn in a useless format to necessitate the purchase of swifts, a good gig if you can get it.


Once I had my nice ball of yarn I "cast on", a process I promptly forgot how to accomplish. It mostly consisted of looping things around one of my knitting needles the prescribed number of times; that number was 17 according to the pattern. Once I'd cast on, the regular knitting started: each row involved an arcane process where I attached a new loop to the loop on the previous row. The first few rows were quite terrifying, but eventually I slipped into a rhythm, and was quite happy with my progress by the time I'd made it to the picture shown below.


Just a few rows later I made a terrifying discovery: I’d invented a new form of knitting. Rather than knitting a boring rectangle, I was knitting a trapezoid, and there was a hole in it. My 17 stitch pattern was now more like 27. There was nothing for it but to pull it out and basically start over.


Several hours, and many episodes of The Office later, I’d slipped into a great rhythm, and developed a mild compulsion to count after every row to ensure I had 15 stitches. The neck warmer was looking great! Just another season of The Office, and some serious “help” from the cat, and I’d be finishing up.


I headed back to Ewe Knit for instructions on casting off, where I tied off the loops I'd been hooking into with each row. Then I sewed the ends together and wove in my loose ends. A neck warmer was born!


This felt like a success before I’d even wrapped it. I’d spent a lot of time working on something she’d value, I’d learned more about one of her hobbies, and gained new appreciation for the sweater she’d knit for me. She just needed to unwrap it.



She loved it.


WonderProxy will be announcing availability of a new server in Uganda any day now. We’re very excited. When we first launched WonderProxy the concept of having a server anywhere in Africa seemed far-fetched. Uganda is shaping up to be our fifth.

Our provider asked us to pay them by wire transfer, so I dutifully walked to the bank, stood in line, then paid $40CAD in fees and a horrible exchange rate (to USD) to send them money. Not a great day, so I grabbed a burrito on the way home. A few days later we were informed that some intermediary bank had skimmed $20USD off our wire transfer, so our payment was $20USD short. Swell.

In order to send them $20USD, I’d need to go back to the bank, stand in line, hope I got a teller who knew how to do wire transfers (the first guy didn’t), buy $20USD for the provider, $20USD for the intermediate bank, and pay $40CAD for the privilege. $80 to send them $20. Super.

Luckily XE came in to save the day again. Using their convenient online interface I was able to transfer $40USD for only $63CAD, including my wire fee. I paid a much better exchange rate, lower wire fees, and didn’t have to put pants on. The only downside was a lack of burrito. Bummer.

If you’re dealing with multiple currencies and multiple countries, and these days it’s incredibly likely that you are, I’d highly recommend XE.


At WonderProxy we've been helping people test their GeoIP-sensitive applications since launch. It's why we launched. Perhaps ironically, it's never been a technology we've used on our own website. With our upcoming redesign, that's changing.

Using GeoIP:

  1. How to use it

    Deciding how you'll use GeoIP information will affect how much you end up spending for a GeoIP database, how often you'll renew, and what safeguards you'll need to put in place. Country-level granularity is relatively easy to come by; city level within the US and Canada tends to be much more expensive.

    Our integration goal is to support a nice slogan, for us that's "You're in .., Your customers aren't. WonderProxy: Test Globally". We opted for region level as opposed to city or country level. We felt "Ontario" or "Texas" was more impressive than "Canada" or "United States", but we were also wary of the lower accuracy at the city level (telling someone they're in Brooklyn when they're really in Manhattan wouldn't inspire confidence).

  2. Acquire a database

    There are several options available. This step was easy: we bought ours from MaxMind. We feel relatively familiar with the GeoIP data provider marketplace, and MaxMind has been both quite accurate and responsive to updates throughout WonderProxy's existence. IP2Location is another provider with downloads available.

    MaxMind also provides API access to its data. We've been leveraging this for a long time in our monitoring systems (we check all our servers to ensure they're geo-locating correctly), but those are all batched processes. Waiting for a remote API to return during page load, particularly for a landing page, is folly. IP2Location also offers an API, as does InfoSniper. APIs work really well in batched processes, or anything detached from page loads.

  3. Rebuild Apache & PHP

    Our initial build only required the Apache module, which provides additional superglobals in PHP: I can grab <?=$_SERVER['GEOIP_REGION_NAME']; ?> to get someone's location, it's really easy. We later installed the PHP module (using the same database) to support arbitrary IP lookups within our administration systems. We also encoded the MaxMind ISO 3166 data into our application to convert country codes to names.

    If you're taking the API approach life should be easy; there are plenty of code examples for every major API provider. Using an API also gives you the ability to choose different levels of granularity on the fly: full data some of the time, minimal data most of the time to save on credits.

  4. Handle edge cases

    Not every IP will have a result, so it's important to catch these and handle them correctly. We've simply decided to test the variable and replace it with "Your Office" when the lookup fails (see the sketch after this list).

    On the API front it's worth spending a few minutes to make a request fail on purpose and ensure your code handles it well. I've had a few important daily reports fail because the API we were using was unavailable; frankly it's embarrassing.
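
The fallback itself is essentially a one-liner; a sketch, assuming the Apache module has been configured to export GEOIP_REGION_NAME:

    <?php
    // Hedged sketch: use the region name from the Apache GeoIP module when we
    // have one, and fall back to a friendly default when the lookup fails.
    $region = !empty($_SERVER['GEOIP_REGION_NAME'])
        ? $_SERVER['GEOIP_REGION_NAME']
        : 'Your Office';

    echo "You're in {$region}. Your customers aren't. WonderProxy: Test Globally";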

We've been really happy with how easy the integration has been. I've already added several new integration points throughout our administrative system (providing lookups on banned users, the IP associated with transactions, etc.). For us the integration is really about supporting the slogan and looking nice, but there are plenty of practical uses, like estimating shipping charges, localizing prices, and adjusting content.


I've been having a one-man ticket derby this weekend; the goal of a Ticket Derby (as I'm defining it) is to close as many tickets as possible. I've closed 15 so far, aiming for quantity of tickets, not quality of tickets. I've resolved sorting issues, made one-line fixes to change the From address on some email notifications, changed the name servers on old domains, etc. Lots of easy stuff. But it's closed and out of the way!

I decided to do this because I was finding our ticket system a bit overwhelming: pages of tickets all awaiting my attention. I had over 60 assigned tickets a week ago. Now I’m down to 39. As a small shop, and without a project manager (dedicated or otherwise), I was doing my best to prioritize tickets based on criteria like: customer impact, revenue generation, time saved, etc. Tickets that fared well in those categories tended to be large affairs, requiring a decent amount of effort. This left me with an intimidating, seemingly endless wall-of-work. Adding Date Opened to the view just made it depressing. The derby seemed like a great way to clear out the work and make the wall less intimidating.

I’m finding my open & assigned ticket screen manageable now. If your team has been working on big issues for a while, why not give them a few days to plow through some easy stuff? I awarded prizes for my derby, giving out chocolate in a few categories: oldest ticket closed, most tickets closed, most humorous commit message.

Now, if you’ll excuse me, I’ve got a bellyache.

For a long time at WonderProxy we neglected internal systems, instead directing our efforts to things used by our customers. We’ve built new products, launched redesigns, then a few more products, all the while maintaining user accounts by directly interacting with the database (including a few update queries lacking a WHERE clause).

This was a huge mistake.

As I worked on the redesign for WonderProxy (Original vs Redesign) I added a few basic admin features almost by accident, and all our lives got remarkably easier. I added a few more, and things got easier still. Tasks that used to be a chore (like setting up a free trial) almost became fun. Researching account history is just a few easy clicks, with nice graphs using nvd3, and pretty data tables. Editing accounts in place, with code that understands 30GB = 32212254720 bytes.

Saying things like "creating trial accounts is fun" may sound like gross exaggeration, but it's not. I'm pretty happy with the code I've got there, which may add to it. The form supports pasting in an address like Paul Reinheimer <paul@preinheimer.com>, parsing it into its component parts, and generating an available username. Then for expiry, I leverage PHP's strtotime() function, so I can enter something simple like "+2 weeks" or "next thursday" and have it parsed and work properly.
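
A sketch of the expiry handling; the form field name is hypothetical:

    <?php
    // Hedged sketch: strtotime() turns friendly input like "+2 weeks" or
    // "next thursday" into a timestamp; false means it couldn't be parsed.
    $expiry = strtotime($_POST['expiry']);
    if ($expiry === false) {
        // reject the input and re-display the form
    }
    $expiryDate = date('Y-m-d', $expiry);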

The speed at which we're both willing and able to resolve requests has greatly increased. Trial accounts (which convert with great regularity) are easy to do in a minute rather than ten, so we're more likely to do them when they roll in, rather than waiting until we're on the server for another reason. When it comes to customizing accounts, getting an accurate history and being able to quickly modify accounts has helped everyone. I'm lacking a basis for comparison, but our revenue has also been climbing nicely since the change; having a dashboard to find clients exceeding their plan limits has certainly helped.

If you’re looking at the next big thing to improve for your team, I’d strongly suggest taking a harder look at your internal tools.


Since we launched WonderProxy we've had lots of bills to pay, often to ourselves or contractors. WonderProxy is a Canadian company with a Canadian bank account, so when we need to pay someone in the US we've had a few options: cheque, wire transfer, or PayPal.

  • Cheque

    These are easy: we open a filing cabinet, pull out a cheque, sign it, and put it in the mail. Then the contractor waits a week or so for the cheque to arrive, deposits it at the bank, receives a horrible exchange rate, and waits two or more weeks for it to clear.

  • Wire Transfer

    This requires me to go to the branch in person, wait in a line, and pay a lot in fees; then the money shows up in a day or three, possibly with additional fees deducted along the way.

  • PayPal

    These are easy: open up our PayPal account, make a transfer, log out again. The recipient ends up paying around 3.5% in fees, receives a mediocre-at-best exchange rate, then waits longer for the money to appear in their account.

So, everything sucks, bad rates, fees, and possibly involving me going to a bank in person.

Eventually we got frustrated with the money we were effectively losing to crappy exchange rates, and took a further look. We came across XE's currency trading services. It took a fair amount of effort to sign up (various forms to be scanned and sent in), but it's been fantastic. I log on to execute a trade, enter that I would like to buy USD with $1000CAD, and get a spot rate on the transfer. I choose to execute the trade, with the USD funds being deposited in the recipient's US account. I pay XE through my bank's online bill payment service, then about a week later the US funds arrive in the recipient's account.

I get to do it all from my desk, we get a fantastic exchange rate, and the transfer service makes its money on the spread, so there are no extra fees.

If you’re paying across borders, I would heartily recommend investigating XE to see if they can meet your needs.


When I finished high school I spent about 8 months working as a cable-internet installer before heading off to college. It paid well, and I was basically a glorified network card delivery boy. I’d drive to your house, install a network card in your computer (this was like 1999-2000), reboot it, and you’d be on the internet. Someone else was in charge of actually drilling holes and running the cable to your computer. Fun times. Well, unless you had NT4, then I needed to goof around with IRQ addresses for like half an hour and miss my lunch :-).

I turned down tips when they came up, heck, I also turned down drinks. Though the occasional mother would rephrase her question as “water or juice” and force the pimply dehydrated teenager to drink something. I’d take the water.

Our role was pretty clearly defined: we install the NIC, we give them a five-minute "tutorial" on how to use the Internet, we leave. We don't do anything else on their computer, and under no circumstances do we ever use the CD that came with the install package; it bricks computers.

One day I did the install for an older gentleman from the Middle East. After I connected him to the web he tried to load up a webpage, some news site from his home country. It wouldn't render: he was missing the Microsoft font packs. Now, our role was drilled into us pretty hard; if we did anything else and it went wrong there was liability on our company, and a serious amount of flak was about to come in our general direction. It could also create skewed expectations: "Hey! The guy who did Sally's internet installed a free virus scanner, and upgraded her Windows, why won't you do that?". Lots of problems.

But I installed those font packs.

Now, the gentleman didn't speak a lot of English, so I had no idea how long he'd been in Canada, or how much news he'd been getting from home. But the emotion on his face when that page loaded… I understood what I'd really done. He handed me a crisp $20 bill; I tried to hand it back, but I'd already lost him to that screen. There might have been a tear on his face, I don't remember, but it wouldn't have been out of place.

That was a great day, and the $20 had nothing to do with it.

We recently ran into an issue where several of our ajax requests were taking an inordinate amount of time to execute. Digging a little deeper into the queries being executed, I was expecting return times on the order of 200ms, not the several seconds I was seeing. Installing XHGui only furthered my confusion: session_start() was the culprit, with incredibly high run times.

I started exploring how we were storing sessions: the default file store. My first thought was that perhaps there were so many files in the session directory that directory access was slowed (easily fixed with some options on session.save_path). But we only had a few active sessions and fewer than 100 files. No dice.

After some head pounding, I realized the issue: PHP locks the session file for the duration of script execution. The first ajax request took the expected amount of time, but locked the user’s session file for the duration of the request. The second ajax request was blocked on session_start() until the first call had finished. The third ajax request blocked on the first two, and was left with the incredibly high run length that drew me in in the first place.

You can try it out yourself like this: open a basic HTML file that includes jQuery, and add the following:

    $(document).ready(function() {
        $.ajax({ type: "GET", url: "/destination.php?q=1", async: true }).done(function(msg) { });
        $.ajax({ type: "GET", url: "/destination.php?q=2" }).done(function(msg) { });
        $.ajax({ type: "GET", url: "/destination.php?q=3" }).done(function(msg) { });
    });

Then create destination.php, which is quite simple:

    <?php
    session_start();
    sleep(1);

In your browser’s developer tools you should see three requests fired off almost simultaneously. The first will take about a second to execute, the second will take two seconds, and the third three.

I couldn’t simply replace our session datastore (as attractive as that is) with something like APC or memcache, as we’d lose all our user data if the server restarted. Implementing a system that uses memory but falls back on a persistent store would take more time than I had. I also couldn’t rip out session usage here: it was needed for authentication, and in some cases helped craft the response.

Luckily PHP has a solution: session_write_close(). This writes any changes you've made to the session file, then closes it and releases the lock. In my case, I wasn't ever writing new data to the file, so I could start the session, then call session_write_close() immediately. This released the lock, and allowed the other ajax requests to actually execute asynchronously. You can try it out yourself by modifying destination.php to call session_write_close() immediately after session_start().
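
The modified destination.php from the earlier test looks like this:

    <?php
    session_start();
    session_write_close(); // the lock is released immediately; reads still work
    sleep(1);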

If you’re writing your own code, closing out sessions early should be pretty easy. If you’re dealing with a framework, you may want to hack in some code to close early on ajax calls.


Authentication has been an interesting problem at WonderProxy: we currently have 101 active public servers, and hundreds of active users who each have access to a particular subset of those servers. Every user has the ability to add new servers to their account at will, and expects newly-added servers to work quickly. On our end, when a user’s account expires, their credentials need to be removed promptly.

Centralized Auth

When we started, we created a centralized authentication scheme: each proxy instance called an authentication URL when a user attempted to connect, and successful responses were cached for a time. This was easy to write, and allowed us to maintain a single canonical copy of our authentication: the database.

It did however give us two big problems:

  • Massive single point of failure
  • High latency for distant locations

The single point of failure was a looming problem that thankfully only raised its head twice. Our central server sat in a rather nice data centre, with a top-end provider, but it was still a huge risk anytime work was being done on the server or its software. As the network grew, this clearly needed to change.

It was actually the latency issue that prompted us to move to a new solution. Users of our Tokyo proxy reported problems where requests were taking too long to execute, or simply timing out. We eventually isolated the cause as being timeouts on authentication, exacerbated by some packet loss on the ocean crossing.

Distributed Auth

Our solution involved creating two new columns in our servers table: users_actual, and users_desired. These integers represent the actual version of the authentication file on that server, and the desired version. When a user adds a server to their account, we update that server’s row, setting users_desired = users_actual + 1. When a contract expires, we update the servers that contract had access to in a similar manner.
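
That bump is a one-line query; a sketch, assuming PDO, with the table and column names described above ($pdo and $serverId are stand-ins):

    <?php
    // Hedged sketch: flag a server as needing a fresh authentication file.
    $update = $pdo->prepare('UPDATE servers SET users_desired = users_actual + 1 WHERE id = :id');
    $update->execute(array(':id' => $serverId));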

In addition, we have a cron job running every minute, identifying servers where users_desired > users_actual. The cron job finds users with access to the server in question, pushes a new authentication file to those servers, and updates their users_actual to match users_desired when the operation returns. This is managed within a MySQL transaction to avoid race conditions.
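
A simplified sketch of that cron job, again assuming PDO; pushAuthFile() is a hypothetical stand-in for building and pushing the file itself:

    <?php
    // Hedged sketch of the per-minute cron: find stale servers, push a fresh
    // auth file, then record which version is now live, inside a transaction.
    $stale = $pdo->query(
        'SELECT id, users_desired FROM servers WHERE users_desired > users_actual'
    )->fetchAll(PDO::FETCH_ASSOC);

    foreach ($stale as $server) {
        $pdo->beginTransaction();
        try {
            pushAuthFile($server['id']);               // hypothetical: build + push the file
            $update = $pdo->prepare(
                'UPDATE servers SET users_actual = :version WHERE id = :id'
            );
            $update->execute(array(
                ':version' => $server['users_desired'],
                ':id'      => $server['id'],
            ));
            $pdo->commit();
        } catch (Exception $e) {
            $pdo->rollBack();                          // server down? try again next minute
        }
    }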

On the administration side, we have a button on each contract management page that allows us to update the users_desired for all servers accessible to that contract’s users. This extra push is rarely used, but helpful in some weird cases.

By managing things with auth versions (rather than simply pushing credentials to all the servers as needed), we’re able to handle servers being down during the push. When you’re managing a network with 70 different suppliers, they can’t all be winners, so this happens with some frequency. If we simply pushed on demand we’d need a secondary mechanism to handle re-pushing later to the recovering server. By using auth versions, we have one mechanism that handles authentication. By setting users_desired = users_actual + 1, we also avoid updating a server repeatedly after it comes online because multiple updates were queued.

Conclusion

This distributed mechanism has worked quite well since rollout, and it’s becoming easier to manage with more granular options now available in our administration tools. I haven’t felt even remotely tempted to change this, which I feel is often a great sign of stability for a system. During our recent migration of properties to a single co-located server here in Toronto, having distributed auth was a great relief; even if things went poorly, our paying customers would still be able to access the network.


We recently migrated Where's it Up to our fancy new hardware; it took a bit more effort than planned (some pains surrounding our use of MongoDB) but I'm incredibly happy with how things have ended up. As mentioned earlier, we've purchased our own hardware and racked it with Peer 1 here in Toronto. We've installed a hypervisor, and are running different VMs for critical services: MySQL, Mongo, Web Production, Web Development, etc.

Our websites sit under /var/www, so Where's it Up resides at /var/www/wheresitup.com/. Under that directory we have /noweb/apache/, which contains the Apache configuration files for both wheresitup.com and dev.wheresitup.com. The entire /var/www/wheresitup.com directory tree resides nicely in our version control system. We hand off key configuration options to our websites through Apache's SetEnv, things like SetEnv mysql_host dev.mysql; these Apache configuration options represent the only difference between the two code bases.
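
On the PHP side those SetEnv values simply show up in $_SERVER (and via getenv()), so the handoff amounts to a sketch like:

    <?php
    // Hedged sketch: Apache's "SetEnv mysql_host dev.mysql" arrives here as an
    // environment variable, so the same code runs unchanged on dev and production.
    $mysqlHost = $_SERVER['mysql_host'];   // "dev.mysql" on the dev VM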

In the past I've written or maintained code that implied the state (Dev/Production/Stage) based on the Host, directory, or other factors. I much prefer grabbing an explicit constant. It feels cleaner: I don't have to read up on which variables could have been manipulated by an attacker, and I can ask the exact question I want answered, "is this dev?", rather than "is the URL the one that means this is dev?".

This allows us to match our Development and Production virtual machines very closely; the only difference between the two is which Apache configuration file is symlinked under /etc/apache2/conf/sites-enabled. Clearly WebDev links to the dev.wheresitup.com file, and WebProd links to wheresitup.com. We actually cloned one machine to produce the other.

Keeping the configuration files so close also makes a lot of sense to me. If I'm adding a new constant on Dev, the immediate presence of Prod reminds me that I'll need to add it there as well. Storing the entire site (PHP code, supporting Apache configuration, etc.) in one place makes it easy to avoid forgetting anything (which is easy when it's a different file on a different server). The only exception is SSL certificates: we currently host a number of our projects with GitHub, and much as we trust them, we're not willing to hand those to anyone else.


Hi, I’m Paul Reinheimer, a developer working on the web.

I wrote a book titled Professional Web APIs with PHP back in 2006, and am currently working in Biomedical Informatics for a major public health company.

I'm working on a project to help developers called WonderProxy, which has proxies all over the world. Working on GeoIP development? Now you can finally test properly! We've also released Global Ping Statistics for expected ping times between cities, as well as a Load Testing Tool to measure your site's ability to handle load. Our most recent site-checking tool is Where's it Up?, which checks your site's availability globally, returning HTTP, DNS, and Traceroute details.

My hobbies are cycling, photography, travel, and engaging Allison Moore in intelligent discourse. I frequently write about PHP and other related technologies.

I co-founded WonderNetwork.

