Case study: Typical RAID-5 data recovery after multiple disk failure

In Blog, Server Data Recovery, Server/RAID Array Recovery Case Study, Successful recovery by Administrator

Methods to recover data from a RAID-5 Array

RAID (Redundant Array of Inexpensive Disks) is used to provide increased storage reliability and less downtime. The two most common RAID implementations that CDR (Manchester Data Recovery Services) receives for recovery are RAID 1 (mirror) and RAID 5 (distributed parity). Put most simply, in a RAID 1 or RAID 5 there is spare capacity within the storage array to allow for a single disk failure. If you are unfamiliar with the way in which data is distributed across a RAID 1 or RAID 5 array then please visit our dedicated RAID and Server Data Recovery website.

RAID 5 data loss

Typically we find that the main cause of data loss in a RAID 5 array is multiple disk failure. In our experience, it is uncommon for disks (HDD or SSD) to fail simultaneously. Usually there has been a single disk failure some days, weeks, or months prior to the second disk failure. Unfortunately, during this period the system administrator has either not noticed the first failed disk or neglected to replace it. A second disk failure then occurs, resulting in failure of the RAID. Frequently the system administrator is unaware of the order in which the disks failed.

Establishing the order of failure

Wherever possible it is necessary to create a full disk image of each disk in the RAID set. Not only does this ensure that the data is stored safely and minimise the chance of an additional disk failure, but it allows for the efficient comparison of data across the disk set. This work is conducted first, as per our standard single-disk recovery methods. Once disk images have been created it is necessary to compare the data structures across the images. If we have access to all the disks in the RAID set then it is possible to run an XOR evaluation to test whether data is consistent across the RAID array. This will give some indication of whether one disk failed some time before a second.
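As an illustration of the XOR evaluation described above, here is a minimal Python sketch. It assumes the disk images are already loaded as byte strings, that the array data starts at the same offset on every disk, and that all members are present. The function names (xor_blocks, inconsistent_sectors) and the 512-byte default sector size are our own choices for this example, not part of any particular tool. Because the XOR of all members of a RAID 5 stripe row is zero regardless of which disk holds the parity, the check does not need to know the stripe size or parity rotation.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def inconsistent_sectors(images, sector_size=512):
    """Return the indices of sectors whose XOR across all images is non-zero.

    In a healthy RAID 5 the data and parity of each row XOR to zero,
    so any non-zero result marks a row where the disks disagree.
    """
    length = min(len(image) for image in images)
    bad = []
    for index, offset in enumerate(range(0, length - sector_size + 1, sector_size)):
        sectors = [image[offset:offset + sector_size] for image in images]
        if any(xor_blocks(sectors)):
            bad.append(index)
    return bad


# Tiny demonstration with 4-byte "sectors" (hypothetical data).
d0 = b"ABCDEFGH"
d1 = b"12345678"
parity = xor_blocks([d0, d1])
stale_d1 = d1[:4] + bytes(4)  # second sector was never written
print(inconsistent_sectors([d0, stale_d1, parity], sector_size=4))  # prints [1]
```

A long run of inconsistent sectors suggests that one member dropped out of the array before the others, but the check alone does not identify which member it was.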

Understanding RAID 5 and XOR

RAID 5 uses a bitwise “exclusive OR” (XOR) function to compute the parity values from the array data. The XOR function satisfies two important conditions:

  1. If X xor Y = Z, then X = Z xor Y, and also Y = Z xor X.
  2. If X and Y occupy the same number of bits, Z also occupies that number of bits.

X   Y   Z = X xor Y
0   0   0
1   0   1
0   1   1
1   1   0

These properties of the XOR function allow any one missing value to be calculated, given all the others.
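The two properties can be checked with a short Python sketch. The helper names (xor_bytes, rebuild_missing) and the sample values are hypothetical and chosen only for illustration; a real rebuild would apply the same operation to every stripe row of the array.

```python
def xor_bytes(a, b):
    """Bitwise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def rebuild_missing(surviving):
    """XOR all surviving blocks of one stripe row to recover the missing block."""
    result = bytes(len(surviving[0]))  # start from all zeros
    for block in surviving:
        result = xor_bytes(result, block)
    return result


x = b"RAID"
y = b"five"
z = xor_bytes(x, y)  # parity: Z = X xor Y

assert xor_bytes(z, y) == x          # X = Z xor Y
assert xor_bytes(z, x) == y          # Y = Z xor X
assert len(z) == len(x) == len(y)    # Z occupies the same number of bits
assert rebuild_missing([x, z]) == y  # lost block recovered from the rest
```

The same reasoning extends to any number of disks: the XOR of all surviving blocks in a row yields the one block that is missing.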

Hexadecimal Editor Inspection

XOR tests will show whether there is good consistency across the disk set. They will not tell us which disk is inconsistent; this must be established by the technician working on the RAID array. Below is an example of a 4-disk RAID 5 opened in WinHex. In this case, it is very clear that one disk had failed some time ago and had not had data written to it. There were large areas of that disk containing only “00” bytes, whereas the 3 other disks held data that was consistent with each other.

[Figure: RAID 5 data recovery sectors viewed in WinHex]

By examining the raw data in a hexadecimal editor it becomes immediately apparent that HDD3 is not fully integrated into the RAID array.
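The kind of visual check described above can be roughly approximated in code. The following Python sketch measures how much of each disk image consists of all-zero sectors. It is an assumption-laden aid rather than our actual procedure: the function name zero_sector_ratio, the 512-byte sector size, and the sample images are hypothetical, and a high ratio is only a hint that still needs confirming in the hex editor, since healthy disks can also contain large unused areas.

```python
def zero_sector_ratio(image, sector_size=512):
    """Return the fraction of whole sectors in the image that contain only 0x00."""
    total = len(image) // sector_size
    if total == 0:
        return 0.0
    zero = bytes(sector_size)
    zeros = sum(
        1
        for i in range(total)
        if image[i * sector_size:(i + 1) * sector_size] == zero
    )
    return zeros / total


# Hypothetical four-sector images: HDD3 stopped receiving writes early on.
written = b"\x01" * 512
images = {
    "HDD1": written * 4,
    "HDD2": written * 4,
    "HDD3": written + bytes(512 * 3),
    "HDD4": written * 4,
}
for name, image in images.items():
    print(name, zero_sector_ratio(image))
```

A disk whose ratio is far higher than that of its neighbours is a candidate for the member that dropped out first.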

More disks: an increased complication

In the example above we have shown a clear case of failure as a result of one disk not being integrated within the RAID set. In this job, the failed disk was straightforward to identify, and there were only 4 disks in the RAID array. CDR regularly receives RAID arrays which have more than 10 disks. As the number of disks in the array increases, so does the number of variables, and the work becomes increasingly challenging as a result. Moreover, the order of disk failure can be much more challenging to establish if the first disk failed only hours before the second. Nevertheless, if data was being written to the array during that period then it is necessary to establish this order to ensure a fully successful recovery.

Additional complications also arise when working with disk sets which are formatted with a non-standard number of bytes per sector (usually 520 or 528). Working on enterprise HDDs formatted with 520 or more bytes per sector will be discussed in a separate post.