One of the common complaints with Where’s it Up has been that the results are only available for a few minutes: a link you get now will only work for a little while before becoming useless. While this fits the design goal of telling you if something is up right now, it’s less useful in real life. As an example, I run a work URL through the system, see some anomalous results, and send the link to a co-worker. That co-worker happens to be in the middle of rocking out to a killer drum solo on Pandora, so he doesn’t get to it for a minute. By the time he does, the link is broken and he calls me bad words.
As an added bonus, our site doesn’t handle non-existent records well.

The reason this happens is that Where’s it Up stores results in Memcache. Memcache is super fast (which is incredibly helpful when you get a huge traffic spike), but is also a temporary data store. While I could have extended the timeout on data (and hoped it lasted that long under load), I still wouldn’t have been in a position to say you could email that link to someone, and it would work tomorrow, or next week when you do a post-mortem on the issue.
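For context, the old storage path was essentially the classic cache pattern: write the result with an expiry, and hope the reader shows up before it’s gone. Here’s a rough sketch, with an illustrative key name, ID, and TTL rather than our actual values:

    // Sketch of the old approach: results live in Memcache with a TTL,
    // so the shared link dies as soon as the entry expires.
    $requestId = 'abc123'; // illustrative; the ID embedded in the shared link
    $results   = array('milan' => array('http' => 200)); // illustrative

    $memcache = new Memcache();
    $memcache->connect('localhost', 11211);

    // Store for 600 seconds; after that, the link is dead.
    $memcache->set('whereisitup:' . $requestId, json_encode($results), 0, 600);

    // A visitor following the link later:
    $stored = $memcache->get('whereisitup:' . $requestId);
    if ($stored === false) {
        // Expired, or evicted under memory pressure: broken link.
    }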

To solve the issue I decided to start storing the results in MongoDB.

Despite this being my first production use of a NoSQL solution, I felt like NoSQL was a good choice. The data we’re generating isn’t without form, but the form varies wildly. There could be 1-50 steps in each traceroute, and the packets at each step could have taken different routes. HTTP requests can have 0-5 redirects. The dig command can yield wildly varying information. While it certainly would have been possible for me to normalize all of this across several tables, the query to obtain results would have been hairy, and probably pretty slow. I could have simply treated a relational database as a key-value store and shoved everything into a TEXT field, but then I’d never be able to query on anything other than the key. If, for example, I later wanted to find out how many traceroutes we’ve run through some network, I’d be left with a full-text search. Generating average connection times for HTTP requests would have been even hairier. Going with NoSQL lets me shove the data into a document in the format my application is already using, and query that data later without losing its structure.

I’ve gone with MongoDB over other options simply because I have a network of friends and colleagues who have enough experience with it to answer my questions.

It took a few iterations to convert the system over from Memcache to MongoDB. At first I was just falling back on MongoDB when Memcache failed; that added a lot of complexity to the codebase, and didn’t feel like it was adding much. Next I got the system working using the same storage scheme as the original Memcache implementation: one master document outlining the details of the request, and a series of child documents providing results for a given portion (e.g. a traceroute from Milan to example.com). Finally, at the behest of my MongoDB expert friends, I moved to a single cohesive document that contains all the information from the request.
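Here’s a hypothetical sketch of the shape, with invented field names rather than the actual schema, using the legacy PHP Mongo driver:

    // Hypothetical document shape; field names are invented for
    // illustration, not the real Where's it Up schema.
    $mongo      = new Mongo(); // legacy PHP driver of the era
    $collection = $mongo->selectDB('whereisitup')->results;

    $document = array(
        'url'       => 'http://example.com/',
        'requested' => new MongoDate(),
        'checks'    => array(
            'milan' => array(
                'http'       => array('status' => 200, 'redirects' => array(), 'time' => 0.41),
                'traceroute' => array(/* 1-50 hops, each with its packets */),
                'dig'        => array(/* whatever dig returned */),
            ),
            'toronto' => array(/* same shape, different results */),
        ),
    );

    $collection->insert($document);

One document per request means one query fetches everything, and the nested structure stays queryable later (e.g. by city, or by HTTP status).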

I’ve got a few more cleanup items on my list for this project, but I’m really happy with this progress. I also feel like this conversion will make it much easier for us to offer API access to this data (another frequently requested feature).


I tweeted a few nights ago about how non-productive my evening felt. It was mildly disheartening, really. I’d gone to Starbucks, bought my tea & cookie, and sat down for a few hours to try to get some real work done. Unfortunately, since I’d been handling “business” things rather than solving the computer problems I enjoy, I didn’t feel like I’d really gotten anything accomplished.

Then I stopped to write down what I’d done:

  • Determined usage information for a client and sent them an invoice.
  • Looked up our costs for various cities, and generated a quote for a prospective client.
  • Compared our usage against the bandwidth we’ve purchased for a given location, and noted that we should scale back our purchase.
  • Reached out to a client to let them know I’d heard from their procurement department, and that their purchase was moving along.
  • Got the ball rolling on moving more domains off GoDaddy.
  • Researched alternate EV certificate providers.
  • Double checked our bandwidth costs for a provider as we’d recently changed providers.
  • Replied to a new contact I made at GTA PHP.
Once I wrote that list down, I felt much better about how I’d spent my evening.

So that’s my tip: if you’re feeling unproductive because you spent a day, afternoon, evening, whatever, working outside your comfort zone, write down what you did. For bonus points, write the list in advance so you get the pleasure of striking items off. Be as granular as you need to be in order to make the list at least 5 items long.


I had a small spot of panic earlier this week when a customer reported that their account page on WonderProxy was timing out. The panic intensified when I was readily able to replicate the issue: the script was timing out, and completely failing to display any account information.

I’ve worked on many poorly performing pages in the past, often with great success: that’s not the source of the panic. My real concern was that there was no reason for the page to take that long to generate. We’d switched over to generating the account information automatically every hour (after our billing updates) precisely to ensure those pages would build fast. Unfortunately, that’s a large chunk of code I didn’t want to try to debug quickly.

Fortunately the problem was much more benign: a small infinite loop!
We’ve got a small block of code that determines when a user’s billing period starts and ends. It accepts a start date (which could be some number of months back), calculates an end date, and keeps adding a month until *now* falls between the start and end dates.

    $start = strtotime($startDate);
    $end   = strtotime("+1 month", $start);
    $now   = time();

    while (!($start < $now && $end > $now)) {
        $start = strtotime("+1 month", $start);
        $end   = strtotime("+1 month", $start);
    }

When the account in question was configured (tragically, by me) the start date was set in the future. So my loop obligingly kept looping forever.

A quick database tweak later, the user’s account was fixed, and I had a few sanity checks to add.
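The first sanity check is the obvious one: refuse to walk forward from a start date that’s already in the future. A sketch, not the exact code I shipped:

    // Sanity check: a start date in the future means the loop above
    // can never terminate, so fail loudly instead.
    $start = strtotime($startDate);
    $now   = time();

    if ($start === false || $start > $now) {
        throw new Exception("Invalid billing start date: $startDate");
    }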


I just fixed an odd bug on the WonderProxy admin pages. Occasionally the administrative dashboard would refuse to load, displaying the all too common message that the database was out of connections. This was quite odd, as there are generally almost no users on our site, so there’s no reason for MySQL to run out of connections. I had much larger issues to work on, and the problem was almost always temporary, so I didn’t really worry about it.

Last night, while improving our admin dashboard, I tweaked a few things and started wondering about performance. Curious why the page was loading slowly, I ran show processlist; from the MySQL console, and was astonished to see well over a hundred connections.

A few minutes of digging revealed that while the administrative dashboard was creating instances of the Contract object to display their traffic details, each Contract instance was also opening its own connection to the database! As the number of active contracts we manage has increased over time, so has the number of connections to the database. The issue was intermittent because the number of connections we allow MySQL still exceeds the number of contracts WonderProxy has; however, hourly cron jobs handling billing, maintenance, etc. could create enough extra connections to push it over the edge.

I’ve corrected my code to properly share the database connection; the page now loads faster, and doesn’t crash if something else is happening on the server. I don’t often make such fundamental mistakes (or so I’d like to think), so I thought I would share.
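The fix amounts to the standard shared-connection pattern: every Contract asks one holder for the handle instead of constructing its own. A sketch with invented class and method names, not the actual WonderProxy code:

    // One lazily created connection, shared by everyone who asks.
    class Database
    {
        private static $connection = null;

        public static function get()
        {
            if (self::$connection === null) {
                // Credentials are placeholders.
                self::$connection = new mysqli('localhost', 'user', 'pass', 'wonderproxy');
            }
            return self::$connection;
        }
    }

    class Contract
    {
        private $db;

        public function __construct()
        {
            // Before: $this->db = new mysqli(...); -- one connection per Contract.
            $this->db = Database::get();
        }
    }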


Hi, I’m Paul Reinheimer, a developer working on the web.

I co-founded WonderProxy, which provides access to over 200 proxies around the world to enable testing of geoip-sensitive applications. We’ve since expanded to offer more granular tooling through Where’s it Up.

My hobbies are cycling, photography, travel, and engaging Allison Moore in intelligent discourse. I frequently write about PHP and other related technologies.
