<?paul

Yanking Hard Drives

Sunday, April 7, 2013

I spent part of my day at our data centre, yanking a drive out of our newly racked server and watching to see what would happen. The answer was: not much. The system kept merrily computing along, but it also failed to warn us that it had lost a drive. Some configuration tweaks later, I repeated the process, and my mailbox quickly filled with alerts. Success.

I did this for two reasons. First, I wanted assurance that our RAID controller had been properly configured, and that things would continue operating normally if we lost a drive. Second, I wanted to ensure that we'd be notified if we did lose a drive: redundancy is worth little if nobody acts to fix the problem that caused it to be lost. Today confirmed the first and revealed problems with the second. Huge Success.

If you're going to perform the same task, a few pieces of advice:

Do it before there's critical production software running on the server, at least the first time.
Your RAID controller will need to rebuild the array once the pulled drive goes back in. This will take a while, and you have reduced redundancy (or none at all) while it's happening. If you need to repeat the test, pull the same drive each time to avoid real problems: pulling a different drive before the rebuild finishes could remove the only good copy of your data.
Work to ensure that problem notifications leave the affected system as quickly as possible. If the only copy of an alert sits on the box that's having issues, you may lose the alert along with the box.
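
As a rough illustration of that last point, here is a minimal sketch of a check that detects a degraded array and pushes the alert off the box straight away. It rests on assumptions that are not from the post: it reads Linux software RAID status from /proc/mdstat (a hardware RAID controller like the one described here would need its vendor's own tool instead), and the relay host mail.example.com and both email addresses are placeholders.

```python
import re
import smtplib
import socket
from email.message import EmailMessage

# Matches the member-status field in /proc/mdstat, e.g. "[UU]" or "[U_]".
# An underscore marks a missing or failed member.
STATUS_RE = re.compile(r"\[([U_]+)\]")


def degraded_arrays(mdstat_text):
    """Return the names of md arrays with at least one missing member."""
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        # An array stanza starts with a line like "md0 : active raid1 ...".
        if re.match(r"^md\w*\s*:", line):
            current = line.split()[0]
            continue
        match = STATUS_RE.search(line)
        if current and match and "_" in match.group(1):
            degraded.append(current)
            current = None  # report each array once
    return degraded


def build_alert(arrays, sender, recipient):
    """Build an email naming this host and its degraded arrays."""
    msg = EmailMessage()
    msg["Subject"] = f"RAID degraded on {socket.gethostname()}: {', '.join(arrays)}"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("Degraded arrays: " + ", ".join(arrays))
    return msg


def check_and_alert(mdstat_path="/proc/mdstat",
                    relay="mail.example.com",        # placeholder off-box relay
                    sender="raid@example.com",       # placeholder address
                    recipient="ops@example.com"):    # placeholder address
    """Read array status and, if degraded, hand the alert to a remote relay."""
    with open(mdstat_path) as fh:
        arrays = degraded_arrays(fh.read())
    if arrays:
        # Deliver directly to another machine rather than a local spool,
        # so the alert does not live only on the failing box.
        with smtplib.SMTP(relay, timeout=10) as smtp:
            smtp.send_message(build_alert(arrays, sender, recipient))
    return arrays
```

Run from cron every few minutes, something like this gets the warning onto another machine shortly after a drive drops out. Pulling a drive and waiting for the email is still the only way to know it works.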


Hi, I’m Paul Reinheimer, a developer working on the web.
