
ICH10R - RAID 5 - Problem replacing failed drive

idata
Employee

Thank you very much for taking the time to read about my problem!


I have six identical SATA drives connected to the six Intel ICH10R controller ports on my Asus P5Q Premium motherboard. (I posted this question to the Asus support forums as well, but I think the Intel experts here are more likely to be familiar with the workings of the Intel Matrix Storage system.)

The drive connected to the first port is a standalone, non-RAID drive that contains the OS (Vista x64).

The remaining five drives are members of a RAID 5 array. One of these drives recently failed. The Intel Matrix Storage application reported the RAID array as degraded, and reported SMART errors on the failed drive.

I have tried resetting the errors, setting the failed drive back to Normal, and rebuilding the RAID 5 array onto it twice now, but the rebuild failed at around 67% completion on both attempts, which leads me to believe that the drive is genuinely bad.
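
For anyone reading later: a quick way to double-check a suspect drive outside the Matrix Storage UI is smartmontools' smartctl. Here's a minimal Python sketch of the idea (my own illustration, not anything from Intel's tools; it assumes smartctl is installed and on your PATH, and "/dev/sda" is just a placeholder for the suspect drive):

    # Hedged sketch: query smartctl's overall-health self-assessment.
    # Assumes smartmontools is installed; "/dev/sda" is a placeholder device.
    import subprocess

    def smart_health(device: str) -> str:
        result = subprocess.run(
            ["smartctl", "-H", device],   # -H: overall-health self-assessment
            capture_output=True, text=True,
        )
        # smartctl prints a line like:
        # "SMART overall-health self-assessment test result: PASSED"
        for line in result.stdout.splitlines():
            if "overall-health" in line or "Health Status" in line:
                return line.strip()
        return "health line not found; inspect smartctl output directly"

    print(smart_health("/dev/sda"))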


I purchased seven identical drives for this computer so that I'd have a "cold spare" that I could swap in if one of the RAID 5 drives ever failed.

So, I tried doing just that -- replacing the failed RAID drive with the brand-new spare -- but both the Intel Matrix Storage Manager option ROM (the Ctrl+I boot utility) and the Windows application report the array as FAILED. There is no option to right-click the replacement drive and start a rebuild of the array onto it.

In fact, when I disconnect the failed drive entirely (and leave that port disconnected), the array is listed as FAILED, when I believe it should just be listed as Degraded.

Page 29 of this Intel PDF has instructions on how to replace a failed RAID 5 member drive and rebuild the array onto it: http://download.intel.com/support/chipsets/imsm/sb/reference_content_intelmatrixstorageconsole.pdf I've tried those steps, and (unfortunately) they do not work for me.

Thank you again for helping me out with this! I'd be happy to provide any additional information you may need. Please just ask!

-Eric

idata
Employee

Here is some additional information, including a screenshot:

  • If I re-connect the failed drive, the array switches back from Failed to Degraded and attempts to rebuild onto the failed drive, as shown in this screenshot: http://www.olaim.com/P5Q-RAIDError/Failed_Drive_Connected-Rebuilding.png Of course, the rebuild will eventually fail. So at least the data's not lost (phew!), but I still need to get the array to rebuild onto the replacement drive to restore redundancy.

Once again, thank you so much to the kind, knowledgeable folks who take the time to help out people like me who've run into a wall!

idata
Employee
idata
Employee

Hi, Peter! Thanks for your reply.

I am running version 8.9 of the Matrix Storage Manager, so I'm downloading the "Rapid Storage Technology" package you recommended now.

It sounds like "Rapid Storage Technology 9.6" is the new name and upgraded version of the old "Matrix Storage Manager". Am I correct?

Thanks again! Really appreciate the help,

-Eric

idata
Employee

Update:

I upgraded from the old Matrix Storage Manager to the new Rapid Storage Technology software/driver. The new version's UI sure looks nice, but unfortunately it hasn't solved my problem.

When I swap out the failed drive for a brand-new, blank, identical-model replacement drive, the Rapid Storage Technology program reports the RAID array as Failed, not Degraded. If the array were just Degraded, the documentation says the software would give me the option to rebuild the array onto the new drive.

From what I understand, when a single drive in a RAID 5 array fails or is removed, that only removes the redundancy from the array; it should not make the entire RAID volume unavailable and flagged as Failed. Yet that's exactly what's happening to me.
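
To illustrate why I believe that: RAID 5 stores XOR parity, so the contents of any one missing member can be reconstructed from the remaining members. Here's a toy Python sketch of the idea (my own illustration, nothing to do with Intel's actual implementation):

    from functools import reduce

    # Toy RAID 5 stripe: four data chunks plus one XOR parity chunk.
    data = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
    xor = lambda a, b: bytes(x ^ y for x, y in zip(a, b))
    parity = reduce(xor, data)

    # Lose any ONE member (say data[2]): XOR the survivors with the
    # parity chunk to reconstruct the missing chunk.
    survivors = data[:2] + data[3:]
    rebuilt = reduce(xor, survivors + [parity])
    assert rebuilt == data[2]  # one lost drive is recoverable: Degraded

    # Lose TWO members and the stripe cannot be reconstructed: Failed.

So losing one drive should leave the volume readable (Degraded); only a second loss should make it Failed.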

I'd be very thankful for any further advice!

idata
Employee

Are you able to click Verify under Advanced for your RAID volume in the Manager?


I can only think that the RAID boot/metadata info is somehow corrupt on part of a disk other than the drive you need to replace.


The only other thing you can do, apart from backing up, is to run SpinRite on the drives outside of the RAID and hope that fixes it.

http://www.grc.com/sr/spinrite.htm

idata
Employee

Thankfully, this issue is resolved for me. Here's a summary of my situation and the resolution, in case it might benefit someone else:

I had a RAID 5 array in which one drive failed. I tried marking the drive as Normal and initiating a rebuild onto it, but the rebuild kept failing at the same percentage of completion, which I took as pretty solid proof that the drive was genuinely bad.

So I swapped out the bad drive for a brand-new one. However, when I did that, the RAID array was listed as Failed (no chance of recovery), rather than Degraded (one drive bad, no data loss, but no redundancy until rebuild is performed). That was why I started this thread.

First of all, I upgraded from the older and (according to many reports) buggy Matrix Storage Manager application/driver to the new Rapid Storage Technology app/driver. This step alone didn't resolve the problem, but I feel that it was a good move regardless.

After several frustrating days of experimentation, it turned out that this problem occurred because the system tried to rebuild the array onto the bad drive every time I rebooted with it connected. I kept shutting the system down before the rebuild completed, since I knew it could never complete onto a bad drive and figured I might as well replace the drive and rebuild onto a new, good one. However, it turns out that, even though the other drives in the array were intact, you can't replace a drive while the system considers a rebuild onto it to be in progress.

I had to re-connect the bad drive and let the system take almost three days (it's a large array) to reach the point where the rebuild failed. Once the rebuild had failed, I could swap out the bad drive and finally see the array listed as Degraded rather than Failed. That's the state you want, because when the array is Degraded you can instruct it to rebuild onto the replacement drive. I did exactly that, and (after a couple of days) everything's now working perfectly. No data loss, redundancy restored, all's right with the world =).

Moral of the story: if a drive in a RAID 5 array on an Intel controller fails, know that either rebooting or flagging the drive as Normal is likely to trigger an attempt to rebuild the array onto that failed drive. If that happens, you need to wait (possibly a long time) for that rebuild to fail. When it fails, SHUT DOWN, disconnect the failed drive, and connect the new one before rebooting; otherwise you'll have to wait for yet another rebuild attempt onto the bad drive to fail.
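
If it helps to see the lesson spelled out, here's a toy Python model of the state transitions as I experienced them (my own sketch of the observed behavior, definitely not Intel's actual firmware logic):

    # Toy model of the behavior I observed; NOT Intel's real logic.
    # members: dict of port -> True if the disk is present and healthy.
    def array_status(members, rebuild_target=None, rebuild_done=False):
        missing = [port for port, ok in members.items() if not ok]
        if not missing:
            return "Normal"
        # What bit me: while a rebuild onto a member is pending or was
        # interrupted, removing that member shows the volume as Failed.
        if rebuild_target in missing and not rebuild_done:
            return "Failed"
        return "Degraded" if len(missing) == 1 else "Failed"

    disks = {1: True, 2: True, 3: True, 4: True, 5: False}
    # Pulling the bad drive mid-rebuild: reported Failed.
    print(array_status(disks, rebuild_target=5))
    # After letting the rebuild run to its failure, then pulling the drive:
    print(array_status(disks, rebuild_target=5, rebuild_done=True))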

Much thanks to PeterUK for taking the time to help! I appreciate it!
