Wednesday, March 25, 2009

Ubuntu - software Raid5: recovering from a failing drive

- Check integrity of array:
hdparm -i -v /dev/md0

- Unmount the raid:
root@gamma:~# umount /mnt/r5/
umount: /mnt/r5: device is busy
umount: /mnt/r5: device is busy

- Find out the process that's keeping the drive busy:
root@gamma:~# fuser -m /dev/md0
/dev/md0: 5670c

- Look up the process:
root@gamma:~# ps auxw | grep 5670
root 5670 0.4 0.0 8792 3200 ? S 19:25 0:34 /usr/sbin/smbd -D
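
The fuser-then-ps lookup above can be sketched as a small helper. This is only a sketch: `fuser_pids` is a hypothetical name, not a standard tool, and it assumes the fuser output looks like the line shown above (device name, colon, then PIDs with access-type letters such as `c` appended).

```shell
# Hypothetical helper: turn fuser output such as "/dev/md0: 5670c" into bare PIDs.
fuser_pids() {
    # Drop the "device:" prefix, split on whitespace, strip the access-type letters.
    sed 's/^[^:]*:[[:space:]]*//' | tr -s ' \t' '\n' | sed 's/[^0-9]//g' | grep -v '^$'
}

# Sample line taken from the transcript above; prints the bare PID.
echo "/dev/md0: 5670c" | fuser_pids
```

The resulting PID can then be passed to `ps -o pid,user,args -p 5670`, which avoids the grep also matching unrelated lines that happen to contain the same digits.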

- Ah, it's Samba, stop it:
root@gamma:~# /etc/init.d/samba stop
* Stopping Samba daemons...

- Try again to unmount:
root@gamma:~# umount /mnt/r5/
root@gamma:~#


- Take a detailed look at the raid array:
mdadm --query --detail /dev/md0


...

Number Major Minor RaidDevice State
0 0 0 - removed
1 8 16 1 active sync /dev/sdb
2 8 32 2 active sync /dev/sdc
3 0 0 - removed

4 8 0 - faulty /dev/sda
5 8 48 - spare /dev/sdd


Found a faulty drive: /dev/sda
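
Rather than reading the table by eye, the faulty member can be picked out with a little awk. A minimal sketch, assuming the device table has the layout shown above (state words followed by the device path in the last column); `faulty_devices` is a made-up helper name.

```shell
# Hypothetical helper: print the device path of every row marked "faulty"
# in the device table of `mdadm --query --detail` output read from stdin.
faulty_devices() {
    awk '/faulty/ && $NF ~ /^\/dev\// { print $NF }'
}

# Sample rows copied from the output above; prints the faulty device path.
faulty_devices <<'EOF'
1 8 16 1 active sync /dev/sdb
2 8 32 2 active sync /dev/sdc
4 8 0 - faulty /dev/sda
5 8 48 - spare /dev/sdd
EOF
```

On a live system the same function could be fed with `mdadm --query --detail /dev/md0 | faulty_devices`.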


- Tell the array which drive is faulty:
root@gamma:~# mdadm -f /dev/md0 /dev/sdd
mdadm: set /dev/sdd faulty in /dev/md0

- Hot-remove the faulty drive:
root@gamma:~# mdadm --remove /dev/md0 /dev/sdd
mdadm: hot removed /dev/sdd

- Walk over to the server and physically remove the drive.
If you don't know which one is the right drive,
remove one at a time, and then run
mdadm --query --detail /dev/md0
and see which drive is no longer listed.

- insert a new (and good) hard drive. Add it:
root@gamma:~# mdadm --add /dev/md0 /dev/sdd
mdadm: hot added /dev/sdd

- Watch the recovery:
watch cat /proc/mdstat

Every 2.0s: cat /proc/mdstat Tue Mar 24 22:45:58 2009

Personalities : [raid5]
md0 : active raid5 sdd[4] sda[0] sdc[2] sdb[1]
1465159488 blocks level 5, 64k chunk, algorithm 2 [4/3] [UUU_]
[>....................] recovery = 0.0% (20224/488386496) finish=19245.9min speed=421K/sec

unused devices: <none>

It seems that this will take 19,000 minutes, which is 13 days. Ugh.
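
The 13-day figure can be double-checked from the numbers on the recovery line. This is a rough sketch, assuming the block counts in /proc/mdstat are 1 KiB blocks and the speed is in KiB per second; `rebuild_days` is a hypothetical helper, and the estimate only holds if the speed stays constant.

```shell
# Hypothetical helper: estimate the remaining rebuild time in days.
# Arguments: blocks finished, total blocks, speed in K/sec
# (all three taken from the recovery line in /proc/mdstat).
rebuild_days() {
    LC_ALL=C awk -v finished="$1" -v total="$2" -v speed="$3" \
        'BEGIN { printf "%.1f\n", (total - finished) / speed / 86400 }'
}

# Numbers from the output above: (20224/488386496) at speed=421K/sec
rebuild_days 20224 488386496 421   # prints 13.4
```

That matches mdadm's own finish estimate: 19245.9 minutes divided by 1440 minutes per day is also about 13.4 days.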
