We recently ran into an issue where several of our ajax requests were taking an inordinate amount of time to execute. Digging a little deeper into the queries being executed, I was expecting return times on the order of 200ms, not the several seconds I was seeing. Installing XHGui only furthered my confusion: session_start() was the culprit, with incredibly high run times.

I started exploring how we were storing sessions, the default file store. My first thought was that perhaps there were so many files in the session directory that directory access was slowed (easily fixed with some options on session.save_path). But we only had a few active sessions and fewer than 100 files. No dice.

After some head pounding, I realized the issue: PHP locks the session file for the duration of script execution. The first ajax request took the expected amount of time, but locked the user’s session file for the duration of the request. The second ajax request was blocked on session_start() until the first call had finished. The third ajax request blocked on the first two, and was left with the incredibly high run time that drew me in in the first place.

You can try it out yourself. Create a basic HTML file that includes jQuery, and add the following:

    $(document).ready(function() {
        $.ajax({
            type: "GET",
            url: "/destination.php?q=1",
            async: true
        }).done(function( msg ) { });
        $.ajax({
            type: "GET",
            url: "/destination.php?q=2"
        }).done(function( msg ) { });
        $.ajax({
            type: "GET",
            url: "/destination.php?q=3"
        }).done(function( msg ) { });
    });

Then create destination.php, which is quite simple:

    <?php
    session_start();
    sleep(1);

In your browser’s developer tools you should see three requests fired off almost simultaneously. The first will take about a second to execute, the second will take two seconds, and the third three.

I couldn’t simply replace our session datastore (as attractive as that is) with something like APC or memcache, as we’d lose all our user data if the server restarted. Implementing a system that uses memory but falls back on a persistent store would take more time than I had. I also couldn’t rip out session usage here: it was needed for authentication, and in some cases helped craft the response.

Luckily, PHP has a solution: session_write_close(). This writes any changes you’ve made to the session file, then closes it and releases the lock. In my case, I wasn’t ever writing new data to the file, so I could start the session, then call session_write_close() immediately. This released the lock, and allowed the other ajax requests to actually execute asynchronously. You can try it out yourself by modifying destination.php to call session_write_close() immediately after session_start().
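For illustration, the modified destination.php would look something like this (comments are mine):

    <?php
    session_start();          // read the session data and populate $_SESSION
    session_write_close();    // write the session and release the lock right away
    sleep(1);                 // simulate the real work; other requests are no longer blocked

With that change, the three requests above each complete in roughly one second, instead of one, two, and three.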

If you’re writing your own code, closing out sessions early should be pretty easy. If you’re dealing with a framework, you may want to hack in some code to close early on ajax calls.


Authentication has been an interesting problem at WonderProxy: we currently have 101 active public servers, and hundreds of active users who each have access to a particular subset of those servers. Every user has the ability to add new servers to their account at will, and expects newly-added servers to work quickly. On our end, when a user’s account expires, their credentials need to be removed promptly.

Centralized Auth

When we started, we created a centralized authentication scheme: each proxy instance called an authentication URL when a user attempted to connect, and successful responses were cached for a time. This was easy to write, and allowed us to maintain a single canonical copy of our authentication: the database.
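Conceptually, each proxy’s check looked something like the sketch below. The proxies themselves aren’t PHP, and the URL, parameter names, and cache TTL here are invented for illustration; only the shape (call the central URL, cache successful responses) reflects how the system actually worked.

    <?php
    // Conceptual sketch only: the endpoint, parameters, and TTL are made up.
    function isAuthorized($username, $password) {
        $key = 'auth:' . sha1($username . ':' . $password);

        // Successful responses were cached for a time
        if (apcu_exists($key)) {
            return true;
        }

        // Ask the central authentication URL
        $response = file_get_contents(
            'https://central.example.com/auth?user=' . urlencode($username)
            . '&pass=' . urlencode($password)
        );

        if ($response === 'OK') {
            apcu_store($key, true, 300); // cache the success (TTL is illustrative)
            return true;
        }
        return false;
    }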

It did however give us two big problems:

  • Massive single point of failure
  • High latency for distant locations

The single point of failure was a looming problem that thankfully only reared its head twice. Our central server sat in a rather nice data centre, with a top-end provider, but it was still a huge risk any time work was being done on the server or its software. As the network grew, this clearly needed to change.

It was actually the latency issue that prompted us to move to a new solution. Users of our Tokyo proxy reported problems where requests were taking too long to execute, or simply timing out. We eventually isolated the cause as being timeouts on authentication, exacerbated by some packet loss on the ocean crossing.

Distributed Auth

Our solution involved creating two new columns in our servers table: users_actual, and users_desired. These integers represent the actual version of the authentication file on that server, and the desired version. When a user adds a server to their account, we update that server’s row, setting users_desired = users_actual + 1. When a contract expires, we update the servers that contract had access to in a similar manner.
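In code, the bump is a one-line update against the servers table. This is a sketch: the two version columns come from the description above, while the id column, the PDO connection in $db, and $serverId are assumptions for illustration.

    <?php
    // Sketch: bump the desired auth version for one server when a user adds it.
    // $db (a PDO connection), the id column, and $serverId are assumed.
    $stmt = $db->prepare(
        'UPDATE servers SET users_desired = users_actual + 1 WHERE id = :server_id'
    );
    $stmt->execute(['server_id' => $serverId]);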

In addition, we have a cron job running every minute, identifying servers where users_desired > users_actual. The cron job finds users with access to the server in question, pushes a new authentication file to those servers, and updates their users_actual to match users_desired when the operation returns. This is managed within a MySQL transaction to avoid race conditions.
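A minimal sketch of that cron job, assuming a PDO connection and a hypothetical pushAuthFile() helper that builds the credentials file for a server (from the users with access to it) and uploads it:

    <?php
    // Sketch only: connection details and pushAuthFile() are assumptions.
    $db = new PDO('mysql:host=localhost;dbname=wonderproxy', 'user', 'pass');
    $db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

    // Find servers whose auth file is out of date
    $stale = $db->query(
        'SELECT id, users_desired FROM servers WHERE users_desired > users_actual'
    )->fetchAll(PDO::FETCH_ASSOC);

    foreach ($stale as $server) {
        $db->beginTransaction();
        try {
            // Generate and push the new authentication file to this server
            pushAuthFile($server['id']);

            // Record the version we just pushed; kept inside a transaction
            // so we don't race with another bump of users_desired.
            $update = $db->prepare(
                'UPDATE servers SET users_actual = :version WHERE id = :id'
            );
            $update->execute([
                'version' => $server['users_desired'],
                'id'      => $server['id'],
            ]);

            $db->commit();
        } catch (Exception $e) {
            // Leave users_actual untouched; the next run will retry the push
            $db->rollBack();
        }
    }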

On the administration side, we have a button on each contract management page that allows us to update the users_desired for all servers accessible to that contract’s users. This extra push is rarely used, but helpful in some weird cases.

By managing things with auth versions (rather than simply pushing credentials to all the servers as needed), we’re able to handle servers being down during the push. When you’re managing a network with 70 different suppliers, they can’t all be winners, so this happens with some frequency. If we simply pushed on demand we’d need a secondary mechanism to handle re-pushing later to the recovering server. By using auth versions, we have one mechanism that handles authentication. By setting users_desired = users_actual + 1, we also avoid updating a server repeatedly after it comes online because multiple updates were queued.

Conclusion

This distributed mechanism has worked quite well since rollout, and it’s becoming easier to manage with more granular options now available in our administration tools. I haven’t felt even remotely tempted to change this, which I feel is often a great sign of stability for a system. During our recent migration of properties to a single co-located server here in Toronto, having distributed auth was a great relief; even if things went poorly, our paying customers would still be able to access the network.


Hi, I’m Paul Reinheimer, a developer working on the web.

I co-founded WonderProxy, which provides access to over 200 proxies around the world to enable testing of geoip-sensitive applications. We've since expanded to offer more granular tooling through Where's it Up.

My hobbies are cycling, photography, travel, and engaging Allison Moore in intelligent discourse. I frequently write about PHP and other related technologies.
