We recently added disks to the RAID array on the main server handling Where’s it Up requests. Rebuilding the array took roughly 28 hours, and the background indexing that followed took another 16.
During the rebuild, the RAID controller did its best to monopolize all I/O. This left the systems hosted on that server severely I/O constrained: iowait crested over 50% for many of them, and load breached 260 on a four-core VM. Fun times.
To reduce the strain, we shut down every virtual machine we didn’t need and demoted the local Mongo instance to secondary. Our goal was to reduce the write load on the constrained machine. Instead, this broke the experience for our users on wheresitup.com.
We’ve got the PHP driver configured with a read preference of MongoClient::RP_NEAREST. This normally isn’t a problem: we’re okay with slightly stale results, since they’ll be updated in a moment. Problems occur when the nearest member of the replica set doesn’t have the record at all when the user asks for it. This doesn’t happen during normal operation, because there’s a delay between the user making the request and being redirected to the results page that needs the record.
Last night, with the local Mongo instance so backed up with I/O operations, it was taking seconds rather than milliseconds for the record to show up there, so the results page could load before the record existed locally.
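To make the race concrete, here is a minimal sketch, written in Python rather than PHP and using in-memory stand-ins instead of a real driver. Everything in it is hypothetical: `FakeMember`, `find_with_primary_fallback`, the three second replication delay, and the `job-1` record are made up for illustration. The fallback to the primary is just one possible mitigation, not what we actually did.

```python
class FakeMember:
    """In-memory stand-in for one replica set member (hypothetical, not a driver API)."""

    def __init__(self, replication_delay=0.0):
        # Seconds before a stored document becomes readable on this member.
        self.replication_delay = replication_delay
        self._docs = {}  # doc_id -> (time the document becomes visible, document)

    def store(self, doc_id, doc, now):
        self._docs[doc_id] = (now + self.replication_delay, doc)

    def find_one(self, doc_id, now):
        entry = self._docs.get(doc_id)
        if entry is None or now < entry[0]:
            return None  # not written yet, or not replicated yet
        return entry[1]


def find_with_primary_fallback(doc_id, nearest, primary, now):
    """Read from the nearest member; if the record is missing, ask the primary."""
    doc = nearest.find_one(doc_id, now)
    if doc is None:
        doc = primary.find_one(doc_id, now)
    return doc


# Assumed setup: the primary sees writes immediately, while the local
# (nearest) member lags by seconds because of the I/O backlog.
primary = FakeMember(replication_delay=0.0)
lagging_local = FakeMember(replication_delay=3.0)

# The user submits a request at t=0; the record reaches both members eventually.
for member in (primary, lagging_local):
    member.store("job-1", {"status": "queued"}, now=0.0)

# The results page loads half a second later.
stale_read = lagging_local.find_one("job-1", now=0.5)
safe_read = find_with_primary_fallback("job-1", lagging_local, primary, now=0.5)
```

In this simulation `stale_read` is empty, which is the missing record our users hit, while `safe_read` finds the record on the primary. The trade-off is an extra round trip to a possibly distant primary whenever the nearest member comes up empty.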
We shut that member of the replica set off completely, and everything was awesome. Well, apart from the 260 load.