RAID 5 corruption 2 disk failure

Thecus N3200 – 3 Disk RAID 5 recovery

In Blog, Server/RAID Array Recovery Case Study, Successful recovery by Administrator

RAID-5 Recovery after a two disk failure

Recently CDR received a 3-disk Thecus RAID array. It came from a home user who was using the NAS to share his photos, documents, videos and other data around the home network.


Not using the correct combination of disks in the RAID can lead to corruption in the filesystem and files

We thought it would be worthwhile to report this case, as it is an example of the most typical RAID 5 failure that we receive.

The NAS had two HDDs which were marked in the Thecus web-interface as having failed with bad sectors / degraded media. The owner of the device assumed that both HDDs had failed at the same time and that this was an unfortunate coincidence.

Multiple disk failure

When inspecting a RAID array with two failed HDDs, we usually find that one of the disks failed some days, weeks or even months before the second. This means that the RAID-5 has been operating with one disk missing over this period without the owner noticing. Home users often keep the NAS device in a location which is not easily observable, and have not set up the NAS email notification to send a report when a disk fails. The owner then only becomes aware of a problem when a second disk fails and the data becomes inaccessible.

Identifying the active disks

It is necessary to take a full clone of each hard disk drive in the RAID set using hardware disk imaging equipment such as DeepSpar Disk Imager or PC3000. Once this is complete, it is possible to examine the disk images and compare the contents of each disk.

With three hard disk drives it might be possible to achieve a recovery by trial and error. However, in RAID arrays with a larger number of disks, it is necessary to analyse the raw data to establish the correct RAID parameters and which HDD is out of date. An XOR analysis across all of the disks in the RAID array can establish whether there is inconsistency in the RAID; however, it cannot establish which disk (or how many disks) holds the inconsistent data.

Above is an image showing corruption in a JPEG image when all three of the disks from the N3200 RAID-5 were used in the rebuild. Both the image and the file system are corrupted.
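The XOR analysis mentioned above can be sketched in a few lines of Python. In a consistent RAID-5, the XOR of the blocks at the same offset on every member is zero, wherever the parity block happens to sit. This is only an illustrative sketch and not the tooling used in this case: the function names and `BLOCK_SIZE` are hypothetical, and it assumes each image starts at the beginning of the member's RAID data area.

```python
from functools import reduce
from typing import BinaryIO, List, Sequence

# Assumed comparison granularity; it does not need to match the array's real chunk size.
BLOCK_SIZE = 64 * 1024


def xor_blocks(blocks: Sequence[bytes]) -> bytes:
    """XOR a sequence of equal-length byte strings together."""
    length = len(blocks[0])
    value = reduce(lambda acc, b: acc ^ int.from_bytes(b, "little"), blocks, 0)
    return value.to_bytes(length, "little")


def find_inconsistent_blocks(images: Sequence[BinaryIO],
                             block_size: int = BLOCK_SIZE) -> List[int]:
    """Return the indexes of blocks whose XOR across all members is non-zero."""
    inconsistent = []
    index = 0
    while True:
        blocks = [img.read(block_size) for img in images]
        shortest = min(len(b) for b in blocks)
        if shortest == 0:
            break  # reached the end of the shortest image
        blocks = [b[:shortest] for b in blocks]
        if any(xor_blocks(blocks)):
            inconsistent.append(index)
        index += 1
    return inconsistent
```

A non-empty result shows that the members disagree, but not which one is stale: with three disks, a block that fails the check fails it no matter which disk is treated as the odd one out.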

By identifying the disk which failed some weeks before the second, it was possible to make a full recovery of the data.
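As a rough illustration of why excluding the out-of-date disk matters, the sketch below rebuilds the logical data of a RAID-5 with one member left out, regenerating the missing chunks by XOR. It rests on several assumptions that are not taken from this case: a left-symmetric parity rotation (a common default for Linux software RAID), a caller-supplied chunk size, and member images held in memory and starting at the RAID data area. The function names are hypothetical.

```python
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def rebuild_degraded(members: List[Optional[bytes]], chunk_size: int) -> bytes:
    """Reassemble logical data; at most one member may be None (excluded)."""
    n = len(members)
    missing = [i for i, m in enumerate(members) if m is None]
    if len(missing) > 1:
        raise ValueError("RAID-5 can tolerate only one missing member")
    rows = min(len(m) for m in members if m is not None) // chunk_size

    out = bytearray()
    for row in range(rows):
        start = row * chunk_size
        chunks = [m[start:start + chunk_size] if m is not None else None
                  for m in members]
        if missing:
            # The excluded member's chunk is the XOR of all the others.
            rebuilt = bytes(chunk_size)
            for chunk in chunks:
                if chunk is not None:
                    rebuilt = xor_bytes(rebuilt, chunk)
            chunks[missing[0]] = rebuilt
        # Assumed left-symmetric layout: parity moves back one disk per row,
        # and data chunks start on the disk after the parity disk.
        parity_disk = n - 1 - (row % n)
        for k in range(n - 1):
            out += chunks[(parity_disk + 1 + k) % n]
    return bytes(out)
```

With three disks, a trial-and-error approach would call `rebuild_degraded` once per candidate, passing `None` in place of a different member each time, and then check which result yields a clean file system and intact files.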