We recently expanded the number of disks for the RAID on the main server handling Where’s it Up requests. Rebuilding that array took roughly 28 hours, followed by background indexing, which took another 16 hours.

During the rebuild, the RAID controller was doing its best to monopolize all the I/O operations. This left the various systems hosted on that server in a very constrained I/O state: iowait crested over 50% for many of them, while load breached 260 on a four-core VM. Fun times.

To help reduce the strain we shut down all un-needed virtual machines, and demoted the local Mongo instance to secondary. Our goal here was to reduce the write load on the constrained machine. This broke the experience for our users on wheresitup.com.

We’ve got the PHP driver configured with a read preference of MongoClient::RP_NEAREST. This normally isn’t a problem: we’re okay with some slightly stale results, since they’ll be updated in a moment. Problems can occur if the nearest member of the replica set doesn’t have a record at all when the user asks. This doesn’t occur during normal operations, as there’s a delay between the user making the request and being redirected to the results page that requires the record.

Last night, with the local Mongo instance so backed up with I/O operations, it was taking seconds, not milliseconds, for the record to show up.

We shut that member of the replica set off completely, and everything was awesome. Well, apart from the 260 load.

Back in the fall of 2010 I was kicked out of my apartment for a few hours by my fantastic cleaning lady, and I wandered the streets of Montréal on Cloud 9. Not because of the cleaning mind you, but because I’d kissed Allison for the first time the night before, while watching a movie on my couch (Zombieland). Great night. My wanderings took me to the flagship Hudson’s Bay store in Montreal, and inevitably to the electronics department. I decided to check out the 3D televisions.

The first problem I ran into with the 3D TVs was the glasses, they didn’t work. I tried on the first pair (which had an ugly, heavy, security cable attaching them to a podium), no dice. I tried on the second pair, no luck there either. Looking at the glasses in more detail, I found a power button! Pushed, tried them on, nothing. Repeated the process on the other pair, still nothing. There I was, big electronics department, with a range of 3D TVs in front of me, and the glasses didn’t work.

If I was going to describe the target market for 3D televisions in 2010, I might have included a picture of myself. Male, 30, into technology, owns a variety of different entertainment products and consoles, decent disposable income. As far as I could tell I represented the exact person they were hoping would be the early adopters on these.

While wandering off I finally encountered a sales person, I mentioned that the glasses for the TVs didn’t work. He told me they were fine, and motioned for me to follow him. It turned out that you had to push and hold the power button down for a few seconds in order to turn them on. As I put them on the sales person walked away, and I got to enjoy a demo video in 3D.

Well, sort of. First the glasses had to sync with the television, then I was all set, great demo video on directly in front of me. Of course, I wasn’t in a room with a single television: there were several along the wall to the right and left of the central television, and since my glasses had sync’d with the one directly in front of me (not the others), the other televisions had essentially been turned into strobe lights. Incessantly blinking at me. When I turned my head towards the blinking lights the glasses re-sync’d with a different television, a disorienting procedure that allowed me to view it properly, but turned the one directly in front of me into a strobe light.

So, after requiring aid to put on a pair of glasses that were practically chained down, I was being forced to view very expensive televisions adjacent to a series of strobing pictures with an absentee salesman.

Despite all of the issues, this was really cool! This was 3D, for my living room! No red-blue glasses either, this was the real thing! Maybe I could get one, then invite Allison over again for a 3D MOVIE! Clearly owning impressive technology was the way to any woman’s heart. While those thoughts were racing through my mind I caught my own reflection in a mirror: the glasses were not pretty. If you picture a pair of 3D glasses in your head right now, you’re probably imagining the modern polarized set you get at the movie theatre. Designed to fit anyone, over prescription glasses, rather ugly so people don’t steal them. Those glasses were years away back in 2010, and these were active shutter glasses. Rather than just two panes of polarized plastic, each lens was a small LCD panel capable of going from transparent to opaque and back many times a second. They looked a little like this, but in black:

As I put the glasses back on the presentation pedestal and rubbed my sore nose I realized: There was absolutely no way I could try to kiss a girl for the first time wearing a pair of those. I left the TVs behind, I think I picked up some apples I could slice and flambé to serve over ice cream instead.

The kissability test: When considering a product, could you imagine kissing someone you care about for the first time while using it.

Stuart McLean is a fantastic storyteller. I’ve enjoyed his books immensely, but most of all, I’ve enjoyed listening to him tell me his stories on the radio, either directly through CBC Radio 2, or through the podcast made of the show. This past Christmas, I was gifted tickets to see and hear him live here in Toronto, a wonderful time.

When I listen to him on the radio, as his calming voice meanders through the lives in his stories, I often picture him. In my mind, he’s sitting in a library on an overstuffed leather chair, with a tweed coat laid over the arm, a side table with a lamp and glass of water beside him… perhaps a large breed dog dozing at his feet. As each page comes to an end he leisurely grasps the corner, sliding his fingers under the page, gently turning it, just like an avid reader moving through a treasured novel. This is the man I picture when I hear the measured voice regaling me with tales from the mundane to the fantastic.

This could not be further from the truth.

The man I saw at the Sony Centre for the Performing Arts has nothing in common with the man I pictured but the voice. As his story began, as he lapsed into those measured tones, his feet never stopped moving. He danced around the microphone like a boxer, stepping closer to make a point, jumping to the side in excitement, waving his arms in exclamation, always ready to strike us with another adverb. When he reached the bottom of each page, he’d frantically reach forward and throw it back, as eager as a child on Christmas morning. It’s easy to fall under the spell of a great storyteller, to stop seeing and only listen, but his fantastically animated demeanour shook that away, and spiced the story in ways I couldn’t have imagined over years of radio listening.

Listen to his podcast for a while, then go see him live, he’s an utter delight.

I’ve had a few Valentine’s Days over the years. I’ve spent far too much money, I’ve planned in exacting detail, I’ve left things until the last minute, and I’ve spent a fair few alone. This year, I wanted something special.

Just after Christmas Allison was kind enough to give me a hand-knit sweater she’d been working on for over a year. It’s fantastic. Around the same time she’d commented that she was jealous of my neck warmer. An idea struck! I’d knit her a neck warmer!

Small problem: I’ve never knit a thing in my life.

I’ve never let not knowing how to do something stop me before, and this didn’t seem like the time to start. I headed up to Ewe Knit, where Caroline was able to administer a private lesson. First I had to use a swift to turn my skein of yarn into a ball. They appear to sell yarn in a useless format to necessitate the purchase of swifts, a good gig if you can get it.

Once I had my nice ball of yarn I “casted on”, a process I promptly forgot how to accomplish. The process mostly consisted of looping things around one of my knitting needles the prescribed number of times. That number was 17 according to the pattern. Once I’d cast on, the regular knitting started: each row involved an arcane process where I attached a new loop to the loop on the previous row. The first few rows were quite terrifying, but eventually I slipped into a rhythm, and was quite happy with my progress by the time I’d made it to the picture shown below.

Just a few rows later I made a terrifying discovery: I’d invented a new form of knitting. Rather than knitting a boring rectangle, I was knitting a trapezoid, and there was a hole in it. My 17 stitch pattern was now more like 27. There was nothing for it but to pull it out and basically start over.

Several hours, and many episodes of The Office later, I’d slipped into a great rhythm, and developed a mild compulsion to count after every row to ensure I had 17 stitches. The neck warmer was looking great! Just another season of The Office, and some serious “help” from the cat, and I’d be finishing up.

I headed back to Ewe Knit for instructions on casting off, where I tied off the loops I’d been hooking into with each row. Then I sewed the ends together, and wove in my loose ends. A neck warmer was born!

This felt like a success before I’d even wrapped it. I’d spent a lot of time working on something she’d value, I’d learned more about one of her hobbies, and gained new appreciation for the sweater she’d knit for me. She just needed to unwrap it.

She loved it.

WonderProxy will be announcing availability of a new server in Uganda any day now. We’re very excited. When we first launched WonderProxy the concept of having a server anywhere in Africa seemed far-fetched. Uganda is shaping up to be our fifth.

Our provider asked us to pay them by wire transfer, so I dutifully walked to the bank, stood in line, then paid $40CAD in fees & a horrible exchange rate (to USD) to send them money. Not a great day, so I grabbed a burrito on the way home. A few days later we were informed that some intermediary bank had skimmed $20USD off our wire transfer, so our payment was $20USD short. Swell.

In order to send them $20USD, I’d need to go back to the bank, stand in line, hope I got a teller who knew how to do wire transfers (the first guy didn’t), buy $20USD for the provider, $20USD for the intermediate bank, and pay $40CAD for the privilege. $80 to send them $20. Super.

Luckily XE came in to save the day again. Using their convenient online interface I was able to transfer $40USD for only $63CAD, including my wire fee. I paid a much better exchange rate, lower wire fees, and didn’t have to put pants on. The only downside was a lack of burrito. Bummer.

If you’re dealing with multiple currencies and multiple countries, and these days it’s incredibly likely that you are, I’d highly recommend XE.

At WonderProxy we’ve been helping people test their GeoIP sensitive applications since launch. It’s why we launched. Perhaps ironically it’s never been a technology we’ve used on our own website. With our upcoming re-design that’s changing.

Using GeoIP:

  1. How to use it

    Deciding how you’ll use GeoIP information will affect how much you end up spending for a GeoIP database, how often you’ll renew, and what safeguards you’ll need to put in place. Country-level granularity is relatively easy to come by; city-level data within the US and Canada, however, tends to be much more expensive.

    Our integration goal is to support a nice slogan; for us that’s “You’re in .., Your customers aren’t. WonderProxy: Test Globally”. We opted for region level as opposed to city or country level. We felt like “Ontario” or “Texas” was more impressive than “Canada” and “United States”, but were also wary of the lower accuracy of city-level data (telling someone they’re in Brooklyn when they’re really in Manhattan wouldn’t inspire confidence).

  2. Acquire a database

    There are several options available. This step was easy: we bought ours from MaxMind. We feel relatively familiar with the GeoIP data provider marketplace, and MaxMind has seemed both quite accurate and responsive to updates throughout WonderProxy’s existence. IP2Location is another provider with downloads available.

    MaxMind also provides API access to its data. We’ve been leveraging this for a long time in our monitoring systems (we check all our servers to ensure they’re geo-locating correctly), but they’re all batched processes. Waiting for a remote API to return during page load, in particular for a landing page, is folly. IP2Location also offers an API, as does InfoSniper. APIs work really well in batched processes, or anything somehow detached from page loads.

  3. Rebuild Apache & PHP

    Our initial build only required the Apache module, which provides additional values in PHP’s $_SERVER superglobal. I can grab <?=$_SERVER['GEOIP_REGION_NAME']; ?> to get someone’s location; it’s really easy. We later installed the PHP module (using the same database) to support arbitrary IP lookup within our administration systems. We also encoded the MaxMind ISO 3166 data into our application to convert country codes to names.

    If you’re taking the API approach life should be easy: there are plenty of code examples for every major API provider. If you’re using an API you also have the ability to choose different levels of granularity on the fly: full data some of the time, minimal data most of the time to save on credits.

  4. Handle edge cases

    Not every IP will have a result, so it’s important to catch these and handle them correctly. We’ve simply decided to test the variable and replace it with “Your Office” when the lookup fails.

    On the API front it’s worth spending a few minutes to make a request fail on purpose and ensure your code handles it well. I’ve had a few important daily reports fail because the API we were using was unavailable; frankly it’s embarrassing.
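The fallback for failed lookups can be sketched in a few lines. This is only an illustration, not our production code: the sloganRegion() helper name is made up, and it assumes the Apache module leaves GEOIP_REGION_NAME unset or empty when it has no answer.

```php
<?php
// Returns the region to show in the slogan, or a fallback when the lookup
// produced nothing. $server would normally be $_SERVER.
function sloganRegion(array $server, string $fallback = 'Your Office'): string
{
    $region = trim((string) ($server['GEOIP_REGION_NAME'] ?? ''));
    return $region !== '' ? $region : $fallback;
}

// Escape before output: the value originates outside our own code.
$region = htmlspecialchars(sloganRegion($_SERVER), ENT_QUOTES, 'UTF-8');
echo "You're in {$region}. Your customers aren't.\n";
```

Passing the array in (rather than reading $_SERVER inside the function) makes it easy to exercise the failure case on purpose.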

We've been really happy with how easy the integration has been. I've already added several new integration points throughout our administrative system (providing lookups on banned users, the IP associated with transactions, etc.). For us the integration is really supporting the slogan and looking nice, but there's plenty of practical uses like estimating shipping charges, localizing prices, and adjusting content.

I’ve been having a one-man ticket derby this weekend. The goal of a Ticket Derby (as I’m defining it) is to close as many tickets as possible. I’ve closed 15 so far, aiming for quantity of tickets, not quality of tickets. I’ve resolved sorting issues, made one-line fixes to change the From address on some email notifications, changed the name servers on old domains, etc. Lots of easy stuff. But it’s closed and out of the way!

I decided to do this because I was finding our ticket system a bit overwhelming: pages of tickets all awaiting my attention. I had over 60 assigned tickets a week ago. Now I’m down to 39. As a small shop, and without a project manager (dedicated or otherwise), I was doing my best to prioritize tickets based on criteria like: customer impact, revenue generation, time saved, etc. Tickets that fared well in those categories tended to be large affairs, requiring a decent amount of effort. This left me with an intimidating, seemingly endless wall-of-work. Adding Date Opened to the view just made it depressing. The derby seemed like a great way to clear out the work and make the wall less intimidating.

I’m finding my open & assigned ticket screen manageable now. If your team has been working on big issues for a while, why not give them a few days to plow through some easy stuff? I awarded prizes for my derby, giving out chocolate in a few categories: oldest ticket closed, most tickets closed, most humorous commit message.

Now, if you’ll excuse me, I’ve got a bellyache.

For a long time at WonderProxy we neglected internal systems, instead directing our efforts to things used by our customers. We’ve built new products, launched redesigns, then a few more products, all the while maintaining user accounts by directly interacting with the database (including a few update queries lacking a WHERE clause).

This was a huge mistake

As I worked on the redesign for WonderProxy (Original vs Redesign) I added a few basic admin features almost by accident, and all our lives got remarkably easier. I added a few more, and things got easier still. Tasks that used to be a chore (like setting up a free trial) almost became fun. Researching account history is just a few easy clicks, with nice graphs using nvd3, and pretty data tables. Editing accounts in place, with code that understands 30GB = 32212254720 bytes.
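That kind of size conversion is easy to sketch. The snippet below is a minimal illustration rather than our actual admin code: the sizeToBytes() name is made up, and it assumes binary units (1GB = 1024 × 1024 × 1024 bytes), which is what 30GB = 32212254720 implies.

```php
<?php
// Convert a human-entered size such as "30GB" or "1.5 MB" into bytes,
// treating every unit step as a factor of 1024.
function sizeToBytes(string $size): int
{
    $powers = ['B' => 0, 'KB' => 1, 'MB' => 2, 'GB' => 3, 'TB' => 4];

    if (!preg_match('/^\s*(\d+(?:\.\d+)?)\s*([KMGT]?B)\s*$/i', $size, $m)) {
        throw new InvalidArgumentException("Unrecognised size: {$size}");
    }

    $multiplier = 1024 ** $powers[strtoupper($m[2])];
    return (int) round((float) $m[1] * $multiplier);
}

echo sizeToBytes('30GB'), "\n";
```

Rejecting anything the pattern does not recognise, instead of guessing, seems the safer choice for a form that edits plan limits.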

Saying things like “creating trial accounts is fun” may sound like gross exaggeration, but it’s not. I’m pretty happy with the code I’ve got there, which may add to it. The form supports pasting in an address like Paul Reinheimer <paul@preinheimer.com>, parsing it out to its component parts, and generating an available username. Then, for expiry, I leverage PHP’s strtotime() function so I can enter something simple like “+2 weeks” or “next thursday” and have that parsed and work properly.

The speed at which we’re both willing and able to resolve requests has greatly increased. Trial accounts (which convert with great regularity) are easy to do in a minute, rather than 10, so we’re more likely to do them when they roll in, rather than waiting until we’re on the server for another reason. When it comes to customizing accounts, getting an accurate history and being able to quickly modify accounts has helped everyone. I’m lacking a basis for comparison, but our revenue has also been climbing nicely since the change; having a dashboard to find clients exceeding their plan limits has certainly helped.
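Here is a rough sketch of how a form like that might work. The function names, the username rule (local part of the email, with a number appended if it’s taken), and the sample data are all assumptions for illustration; only strtotime() is taken from the description above.

```php
<?php
// Split "Name <email>" into its parts; returns null if the input doesn't fit.
function parseAddress(string $input): ?array
{
    if (!preg_match('/^\s*(.*?)\s*<([^<>\s]+@[^<>\s]+)>\s*$/', $input, $m)) {
        return null;
    }
    return ['name' => trim($m[1], " \t\"'"), 'email' => $m[2]];
}

// Hypothetical rule: start from the local part of the email and append a
// counter until the name is not in the list of taken usernames.
function suggestUsername(string $email, array $taken): string
{
    $base = strtolower(preg_replace('/[^a-z0-9]/i', '', strstr($email, '@', true)));
    $candidate = $base;
    for ($i = 2; in_array($candidate, $taken, true); $i++) {
        $candidate = $base . $i;
    }
    return $candidate;
}

// Turn a phrase like "+2 weeks" into a timestamp relative to $now.
function expiryTimestamp(string $phrase, int $now): int
{
    $timestamp = strtotime($phrase, $now);
    if ($timestamp === false) {
        throw new InvalidArgumentException("Could not parse expiry: {$phrase}");
    }
    return $timestamp;
}

$contact = parseAddress('Paul Reinheimer <paul@preinheimer.com>');
$username = suggestUsername($contact['email'], ['paul']);
$expires = expiryTimestamp('+2 weeks', time());
echo $contact['name'], ' / ', $username, ' / ', date('Y-m-d', $expires), "\n";
```

Checking strtotime() for false matters: it returns false for phrases it cannot parse, and silently storing that as an expiry date would be unpleasant.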

If you’re looking at the next big thing to improve for your team, I’d strongly suggest taking a harder look at your internal tools.

Since we launched WonderProxy we’ve had lots of bills to pay, often to ourselves or contractors. WonderProxy is a Canadian company with a Canadian bank account, so when we need to pay someone in the US we’ve had a few options: cheque, wire transfer, PayPal.

  • Cheque

    These are easy: we open a filing cabinet, pull out a cheque, sign it and put it in the mail. Then the contractor waits a week or so for the cheque to arrive, deposits it in the bank, receives a horrible exchange rate, and waits two or more weeks for it to clear.

  • Wire Transfer

    This requires me to go to the branch in person, wait in a line, and pay a lot in fees. Then the money shows up in a day or three, possibly with additional fees being deducted along the way.

  • PayPal

    These are easy: open up our PayPal account, make a transfer, log out again. The recipient ends up paying like 3.5% in fees, receives a mediocre-at-best exchange rate, then waits longer for the money to appear in their account.

So, everything sucks: bad rates, fees, and possibly me going to a bank in person.

Eventually we got frustrated with the money we were effectively losing to crappy exchange rates, and looked further. We came across XE's currency trading services. It took a fair amount of effort to sign up (various forms to be scanned and sent in), but it’s been fantastic. I log on to execute a trade, enter that I would like to buy USD with $1000CAD, and get a spot rate on the transfer. I choose to execute the trade, with the USD funds being deposited in the recipient’s US account. I pay XE through my bank’s online bill payment services, then about a week later the US funds arrive in the recipient’s account.

I get to do it all from my desk, we get a fantastic exchange rate, and the transfer service makes its money on the spread, so there are no extra fees.

If you’re paying across borders, I would heartily recommend investigating XE to see if they can meet your needs.

When I finished high school I spent about 8 months working as a cable-internet installer before heading off to college. It paid well, and I was basically a glorified network card delivery boy. I’d drive to your house, install a network card in your computer (this was like 1999-2000), reboot it, and you’d be on the internet. Someone else was in charge of actually drilling holes and running the cable to your computer. Fun times. Well, unless you had NT4, then I needed to goof around with IRQ addresses for like half an hour and miss my lunch :-).

I turned down tips when they came up, heck, I also turned down drinks. Though the occasional mother would rephrase her question as “water or juice” and force the pimply dehydrated teenager to drink something. I’d take the water.

Our role was pretty clearly defined, we install the NIC, we give them a 5 minute “tutorial” on how to use the Internet, we leave. We don’t do anything else on their computer, and under no circumstances do we ever use the CD that came with the install package, it bricks computers.

One day I did the install for an older gentleman from the Middle East. After I connected him to the web he tried to load up a webpage, some news site from his home country. It wouldn’t render. He was missing the Microsoft font packs. Now our role was drilled into us pretty hard: if we did anything else and it went wrong there was liability on our company, and a serious amount of flak was about to come in our general direction. It could also create skewed expectations: “Hey! The guy who did Sally’s internet installed a free virus scanner, and upgraded her Windows, why won’t you do that?”. Lots of problems.

But I installed those font packs.

Now, the gentleman didn’t speak a lot of English, so I had no idea how long he’d been in Canada, or how much news he’d been getting from home. But the emotion on his face when that page loaded… I understood what I’d really done. He handed me a crisp $20 bill. I tried to hand it back, but I’d already lost him into that screen. There might have been a tear on his face; I don’t remember, but it wouldn’t have been out of place.

That was a great day, and the $20 had nothing to do with it.

We recently ran into an issue where several of our ajax requests were taking an inordinate amount of time to execute. Digging a little deeper into the queries being executed, I was expecting return times in the order of 200ms, not the several seconds I was seeing. Installing XHGui only furthered my confusion: session_start() was the culprit with incredibly high run times.

I started exploring how we were storing sessions: the default file store. My first thought was that perhaps there were so many files in the session directory that directory access was slowed (easily fixed with some options on session.save_path). But we only had a few active sessions and fewer than 100 files. No dice.

After some head pounding, I realized the issue: PHP locks the session file for the duration of script execution. The first ajax request took the expected amount of time, but locked the user’s session file for the duration of the request. The second ajax request was blocked on session_start() until the first call had finished. The third ajax request blocked on the first two, and was left with the incredibly high run length that drew me in in the first place.

You can try it out yourself like this: open a basic HTML file that includes jQuery, and add the following:

    $(document).ready(function() {
      // Fire three requests back to back; all are asynchronous.
      $.ajax({ type: "GET", url: "/destination.php?q=1", async: true }).done(function(msg) { });
      $.ajax({ type: "GET", url: "/destination.php?q=2" }).done(function(msg) { });
      $.ajax({ type: "GET", url: "/destination.php?q=3" }).done(function(msg) { });
    });

Then create destination.php, which is quite simple:

    <?php
    session_start(); // takes the lock on the session file
    sleep(1);        // stand-in for a second of real work

In your browser’s developer tools you should see three requests fired off almost simultaneously. The first will take about a second to execute, the second will take two seconds, and the third three.

I couldn’t simply replace our session datastore (as attractive as that is) with something like APC or memcache, as we’d lose all our user data if the server restarted. Implementing a system that uses memory but falls back on a persistent store would take more time than I had. I also couldn’t rip out session usage here: it was needed for authentication, and in some cases helped craft the response.

Luckily PHP has a solution: session_write_close();. This writes any changes you’ve made to the session file, then closes it and releases the lock. In my case, I wasn’t ever writing new data to the file, so I could start the session, then call session_write_close() immediately. This released the lock, and allowed the other ajax requests to actually execute asynchronously. You can try it out yourself by modifying destination.php to call session_write_close() immediately after session_start().
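A modified destination.php might look something like the sketch below. It assumes the request only reads from the session; the loadSessionSnapshot() helper name is made up for illustration.

```php
<?php
// destination.php, reworked so the session lock is held only briefly.
// Assumption: this request never needs to write to the session.
function loadSessionSnapshot(): array
{
    session_start();          // takes the lock on the session file
    $snapshot = $_SESSION;    // copy whatever the request needs to read
    session_write_close();    // write out and release the lock right away
    return $snapshot;
}

$session = loadSessionSnapshot();

sleep(1); // stand-in for the slow work; other requests are no longer blocked

echo count($session), " session values available\n";
```

With this in place the three requests from the test page should each come back in about a second, rather than one, two and three seconds. Anything assigned to $_SESSION after session_write_close() will not be saved, so this only suits requests that are done writing.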

If you’re writing your own code, closing out sessions early should be pretty easy. If you’re dealing with a framework, you may want to hack in some code to close early on ajax calls.

Authentication has been an interesting problem at WonderProxy: we currently have 101 active public servers, and hundreds of active users who each have access to a particular subset of those servers. Every user has the ability to add new servers to their account at will, and expects newly-added servers to work quickly. On our end, when a user’s account expires, their credentials need to be removed promptly.

Centralized Auth

When we started, we created a centralized authentication scheme: each proxy instance called an authentication URL when a user attempted to connect, and successful responses were cached for a time. This was easy to write, and allowed us to maintain a single canonical copy of our authentication: the database.

It did however give us two big problems:

  • Massive single point of failure
  • High latency for distant locations

The single point of failure was a looming problem that thankfully only raised its head twice. Our central server sat in a rather nice data centre, with a top-end provider, but it was still a huge risk anytime work was being done on the server or its software. As the network grew, this clearly needed to change.

It was actually the latency issue that prompted us to move to a new solution. Users of our Tokyo proxy reported problems where requests were taking too long to execute, or simply timing out. We eventually isolated the cause as being timeouts on authentication, exacerbated by some packet loss on the ocean crossing.

Distributed Auth

Our solution involved creating two new columns in our servers table: users_actual, and users_desired. These integers represent the actual version of the authentication file on that server, and the desired version. When a user adds a server to their account, we update that server’s row, setting users_desired = users_actual + 1. When a contract expires, we update the servers that contract had access to in a similar manner.

In addition, we have a cron job running every minute, identifying servers where users_desired > users_actual. The cron job finds users with access to the server in question, pushes a new authentication file to those servers, and updates their users_actual to match users_desired when the operation returns. This is managed within a MySQL transaction to avoid race conditions.

On the administration side, we have a button on each contract management page that allows us to update the users_desired for all servers accessible to that contract’s users. This extra push is rarely used, but helpful in some weird cases.

By managing things with auth versions (rather than simply pushing credentials to all the servers as needed), we’re able to handle servers being down during the push. When you’re managing a network with 70 different suppliers, they can’t all be winners, so this happens with some frequency. If we simply pushed on demand we’d need a secondary mechanism to handle re-pushing later to the recovering server. By using auth versions, we have one mechanism that handles authentication. By setting users_desired = users_actual + 1, we also avoid updating a server repeatedly after it comes online because multiple updates were queued.
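The version bookkeeping can be sketched without a database. In the snippet below plain arrays stand in for rows of the servers table, and the function names and the $push callable (which would write the auth file to a server and report success) are assumptions for illustration; the real job runs against MySQL inside a transaction.

```php
<?php
// Mark a server as needing a fresh auth file. Calling this repeatedly while
// a push is still pending leaves users_desired at users_actual + 1.
function requestAuthPush(array &$server): void
{
    $server['users_desired'] = $server['users_actual'] + 1;
}

// One pass of the cron job: push to every server that is behind, and record
// the new version only when the push succeeded. Returns the number updated.
function syncAuth(array &$servers, callable $push): int
{
    $updated = 0;
    foreach ($servers as &$server) {
        if ($server['users_desired'] <= $server['users_actual']) {
            continue; // already current
        }
        $target = $server['users_desired'];
        if ($push($server['name'])) {
            $server['users_actual'] = $target;
            $updated++;
        }
        // On failure nothing changes, so the next run simply tries again.
    }
    unset($server);
    return $updated;
}

$servers = [['name' => 'tokyo', 'users_actual' => 4, 'users_desired' => 4]];
requestAuthPush($servers[0]);
requestAuthPush($servers[0]); // a second change queued while still pending

$failedRun = syncAuth($servers, fn (string $name): bool => false); // server down
$okRun     = syncAuth($servers, fn (string $name): bool => true);  // server back
$idleRun   = syncAuth($servers, fn (string $name): bool => true);  // nothing to do

echo "{$failedRun} {$okRun} {$idleRun}\n";
```

The server that was down gets exactly one push once it recovers, no matter how many changes piled up in the meantime, because each push sends the complete current file rather than a delta.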


This distributed mechanism has worked quite well since rollout, and it’s becoming easier to manage with more granular options now available in our administration tools. I haven’t felt even remotely tempted to change this, which I feel is often a great sign of stability for a system. During our recent migration of properties to a single co-located server here in Toronto, having distributed auth was a great relief; even if things went poorly, our paying customers would still be able to access the network.

We recently migrated Where’s it Up to our fancy new hardware. It took a bit more effort than planned (some pains surrounding our use of MongoDB), but I’m incredibly happy with how things have ended up. As mentioned earlier, we’ve purchased our own hardware, and have racked it with Peer 1 here in Toronto. We’ve installed a hypervisor, and are running different VMs for critical services: MySQL, Mongo, Web Production, Web Development, etc.

Our websites sit under /var/www, so Where’s it Up resides at /var/www/wheresitup.com/. Under that directory we have /noweb/apache/, which contains the Apache configuration files for both wheresitup.com and dev.wheresitup.com. The entire /var/www/wheresitup.com directory tree resides nicely in our version control system. We hand off key configuration options to our websites through Apache’s SetEnv, with things like SetEnv mysql_host dev.mysql. These Apache configuration options represent the only difference between the two code bases.

I’ve written or maintained code that implied the state (Dev/Production/Stage) based on the Host, directory, or other factors in the past. I much prefer grabbing an explicit constant. It feels cleaner, I don’t have to read up on which variables could have been manipulated by an attacker, and I can ask the exact question I want answered: “is this dev?”, rather than “is the URL the one that means this is dev?”.
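On the PHP side, reading those values might look like the sketch below. With mod_php, variables set via SetEnv show up in $_SERVER; the siteSetting() helper and the site_env variable name are made up for illustration (mysql_host is the one mentioned above), and a literal array stands in for $_SERVER so the snippet runs from the command line.

```php
<?php
// Fetch a value that the Apache config is expected to provide via SetEnv.
// Failing loudly beats silently falling back to the wrong database.
function siteSetting(string $name, array $server): string
{
    if (!isset($server[$name]) || $server[$name] === '') {
        throw new RuntimeException("Missing SetEnv value: {$name}");
    }
    return (string) $server[$name];
}

// In a real request this would be $_SERVER.
$env = ['mysql_host' => 'dev.mysql', 'site_env' => 'dev'];

$mysqlHost = siteSetting('mysql_host', $env);
$isDev     = siteSetting('site_env', $env) === 'dev'; // the explicit question

echo $mysqlHost, ' ', $isDev ? 'dev' : 'production', "\n";
```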

This allows us to match our Development and Production virtual machines very closely: the only difference between the two is which Apache configuration file is symlinked under /etc/apache2/conf/sites-enabled. WebDev links to the dev.wheresitup.com file, and WebProd links to wheresitup.com. We actually cloned one machine to produce the other.

Keeping the configuration files so close also makes a lot of sense to me. If I’m adding a new constant on Dev, the immediate presence of Prod reminds me that I’ll need to add it there as well. Storing the entire site (PHP code, supporting Apache configuration, etc.) all in one place makes it easy to avoid forgetting anything (which is easy when it's a different file on a different server). The only exception is SSL certificates. We currently host a number of our projects with GitHub, and trust them as we might, we’re not willing to hand those to anyone else.

Buying physical hardware was a new step for WonderProxy. It’s hard to say that we rushed into it, this being our third year, but it sort of feels that way. We’re operating around 100 servers around the world right now, but all of them are either virtual servers or dedicated machines we’re renting from providers. Having a UPS guy drop off a rather large box one day was a big change.

Everything has worked out well, but there were a few steps that could have gone more smoothly. This post is half note to myself on what to do better next time, and half for you.

Buying Hardware

  • As far as I can tell the Dell & HP websites have been largely designed to be horrible, in hopes of routing you to a sales person. I tried to fight this, but it was pointless. Phone someone at one of those companies and save yourself several hours.
  • Watch for extras on the quote. Your sales person will likely work in your specifications, then insert the most expensive options around them: things like 24/7 hardware replacement with a 4hr SLA, fancy cable management systems, etc.
  • Your data centre will have power requirements. Your phone rep may be able to help you there, Dell’s UPS website is also capable of turning your server specifications into amperage.
  • Remote management cards are helpful, but you’ll either need to set up pass-through on the NIC (if your card supports it) or have multiple drops to reach it.
  • Check each component for compatibility with your operating system if it doesn’t ship installed. We’re using Debian, and had a mild panic attack before we found drivers for our RAID controller in Debian testing.
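The amperage conversion mentioned in the power bullet is simple arithmetic: amps are watts divided by volts. Here is a minimal sketch in Python, assuming a 120 V circuit and an 8 amp allowance. The function names and wattage figures are made up for illustration, and your data centre may use a different voltage or apply its own safety margin.

```python
def amps_drawn(watts, volts=120.0):
    """Convert a power draw in watts to amps at the given voltage."""
    return watts / volts


def fits_budget(server_watts, budget_amps=8.0, volts=120.0):
    """True if the combined draw of all servers stays within the amp budget.

    server_watts is a list of per-server draws in watts (hypothetical figures;
    use the numbers from your vendor's power calculator).
    """
    total_amps = sum(amps_drawn(w, volts) for w in server_watts)
    return total_amps <= budget_amps
```

For example, fits_budget([600, 300]) works out to 7.5 amps at 120 V, which squeezes under an 8 amp limit, while adding much more hardware would not.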


  • Your hosting provider will sell you bandwidth as a 95th percentile. That means they’ll sample how much bandwidth you’re using at regular intervals (say every 15 minutes), sort those results biggest to smallest, discard the top 5%, then charge you for the next-highest sample. Unless you’re buying a lot of bandwidth you’ll probably end up paying more here than you would on a dedicated box by the GB.
  • Hosting space comes in either U increments (1U, 2U, 4U, etc) or rack portions (full rack, half rack, quarter rack, eighth rack (octal)). If you’re buying directly from a provider you’re likely going to need to over-buy if you’re only racking one server.
  • Providers also care about power usage; they will likely give you a limit, something like 8 amps. You’ll need to spec your server out appropriately.
  • The number of network cables and power ports inside your unit will also matter, there’s no point in having a redundant power supply if you’re only going to be able to plug one in.
  • You will need to plan your move-in date. Your provider may need a lot of paperwork signed and then a few more days before this happens. Talk to your sales rep about dates, and SLA for setting up new space. It may be as long as a week between getting your paperwork in order and being able to move in.
  • Find out how your server will be mounted; there appear to be both round and square holes. As we learned when we were four, you need to match the right peg to the right hole. If you’re renting a very small fraction of space (like an octal) you may not have any mounting brackets at all, instead just letting things rest on the sheet metal between clients.
  • Your sales guy may not manage things once you’ve signed up; you may be handed off to the network team. Try to keep track of who knows what. If you’re racking your server and have a problem, your sales guy can’t help (and probably doesn’t answer the phone after hours).
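The 95th percentile billing described in the first bullet of this list can be sketched in a few lines of Python. This is only an illustration of the sort-and-discard procedure as described there; the function name is hypothetical, and real providers differ on sampling interval and rounding.

```python
def billable_rate(samples_mbps, percentile=95):
    """Return the sample you would be charged for under percentile billing.

    samples_mbps: bandwidth readings taken at regular intervals over the
    billing period (units are whatever your provider measures in).
    """
    if not samples_mbps:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_mbps, reverse=True)  # biggest to smallest
    # How many of the largest samples get thrown away (top 5% by default).
    discard = len(ordered) * (100 - percentile) // 100
    # The next sample after the discarded ones is the billed rate.
    return ordered[discard]
```

With 15-minute samples over a 30-day month there are 2,880 readings, so the 144 largest (36 hours' worth of bursts) are ignored and you pay for reading number 145.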

Visiting the Data Centre

  • You’ll need ID, your network guy should be able to describe the requirements
  • Depending on how many servers you’re bringing in, you may be able to use the front door, or the loading dock.
  • Ours had a nice man-trap on the way in, first door needed to close before the second would open
  • It will be loud in the server room, ear plugs would be prudent
  • A flash light might help see things, there’s decent lighting but you’ll likely have stuff on top and below you
  • There should be a monitor, keyboard, and mouse on a trolley somewhere for configuring things
  • There may not be wifi, or even 3G inside
  • Pre-configuring your IP details would be prudent, there will not be DHCP

That in hand, hopefully your server buying and racking experience will go smoothly.

Andrew Quarton developed a nifty little visualization built using the Where’s it Up API called GeoPing. Go take a look then come back.

Our technology stack for the API includes supervisor to run workers, and gearman to manage our job queue. We’re normally running 25 workers to manage the queue. Work tends to come in chunks, and that number of workers has been able to keep the queue minimal or at zero.

Since it’s such a nifty tool, it made the front page of Hacker News today, which led to a few problems on our end. The number of jobs launched for each person hitting the GeoPing tool was rather high, enough to fill all the current workers. When many people started hitting the GeoPing tool in rapid succession the queue built and built. At one point Gearman reported 13,000 jobs in the queue.

Noticing this I quickly changed the number of desired workers in supervisor from 25 to 100, then used /etc/init.d/supervisord restart to apply the changes. That didn’t seem to affect the queue, so I tried 250 workers, used restart to apply the changes once more, and watched. Then I noticed something: the restart option wasn’t launching the extra workers I wanted. Running /etc/init.d/supervisord stop, then start did. Then the queue finally started to recover. I kept an eye on the queue with a quick and dirty shell command from Stack Overflow.

(echo status ; sleep 0.1) | netcat 127.0.0.1 4730 -w 1

From our side, I think a few things went wrong:

  • We didn’t have tooling in place to warn us when the queue breached reasonable limits
  • We hadn’t documented the proper way to increase workers (stop/start not restart)
  • Our graphing system seems to have a hard coded max value, hiding valuable data

Having either of those first two items in place would have allowed us to respond to the issue much more quickly.

We're working on them :)
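A rough sketch of the kind of queue warning described in the first bullet, in Python. It sends the same status command as the shell one-liner and relies on Gearman's admin protocol replying with one tab-separated line per function (name, total jobs, running jobs, available workers) followed by a line containing a single dot. The host, the limit of 500, and the function names are assumptions for illustration, not our production setup.

```python
import socket


def parse_status(text):
    """Parse gearmand 'status' output into {function: (total, running, workers)}."""
    queues = {}
    for line in text.splitlines():
        line = line.rstrip("\r\n")
        if line == ".":  # a lone dot terminates the listing
            break
        if not line:
            continue
        name, total, running, workers = line.split("\t")
        queues[name] = (int(total), int(running), int(workers))
    return queues


def fetch_status(host="127.0.0.1", port=4730, timeout=2.0):
    """Ask a gearmand admin port for its status listing (host is an assumption)."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"status\n")
        data = b""
        # Read until the terminating ".\n" line arrives or the server hangs up.
        while not (data == b".\n" or data.endswith(b"\n.\n")):
            chunk = sock.recv(4096)
            if not chunk:
                break
            data += chunk
    return data.decode()


def over_limit(queues, limit):
    """Return {function: total jobs} for every queue deeper than limit."""
    return {name: stats[0] for name, stats in queues.items() if stats[0] > limit}
```

Running over_limit(parse_status(fetch_status()), limit=500) from cron every minute, and emailing whatever comes back non-empty, would have flagged the backlog long before it reached 13,000 jobs.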

Hi, I’m Paul Reinheimer, a developer working on the web.

I wrote a book titled Professional Web APIs with PHP back in 2006, and am currently working in Biomedical Informatics for a major public health company.

I’m working on a project to help developers called WonderProxy which has proxies all over the world. Working on GeoIP development? Now you can finally test properly! We've also released Global Ping Statistics for expected ping times between cities, as well as a Load Testing Tool to measure your site's ability to handle load. Our most recent site checking tool is Where's it Up? which checks your site's availability globally, returning HTTP, DNS, and Traceroute details.

My hobbies are cycling, photography, travel, and engaging Allison Moore in intelligent discourse. I frequently write about PHP and other related technologies.

I co-founded WonderNetwork.