Understanding Different RAID Levels

Advantages and disadvantages of various RAID levels

Today’s consumers have become used to having access to every service online instantly and expect it to function without interruption no matter what.

As a business owner, you have many features to consider when choosing the right system and infrastructure for your critical online applications. One of them is whether to enable RAID on your server and, more importantly, which RAID level fits your technical needs. Below we go through the pros and cons of each RAID level and suggest which type to choose for your setup.

RAID, short for redundant array of independent (originally inexpensive) disks, is a disk subsystem that stores your data across multiple disks to increase performance, provide fault tolerance, or both, depending on the level.

There are two ways of implementing RAID: software RAID and hardware RAID.

Hardware RAID is managed directly by a dedicated hardware controller to which the disks are connected. The RAID calculations are handled by an on-board processor, which takes the load off the host CPU. However, the performance of today’s CPUs has increased so much that this advantage has become more or less obsolete. Hardware controllers do provide an extra failsafe element with their BBU (Battery Backup Unit), which protects data still held in the controller’s write cache in case of an unexpected power loss to the server.

Software RAID is part of the OS and is the easiest and most cost-effective implementation. It does not require an additional (often costly) piece of hardware with proprietary firmware.

Here is a list of the most used RAID levels:

RAID 0 (Disk striping)

Minimum number of disks: 2

Advantages

  • RAID 0 offers great performance, both in read and write operations. There is no overhead caused by parity controls.
  • All storage capacity is used; there is no overhead.
  • The technology is easy to implement.

Disadvantages

  • RAID 0 is not fault-tolerant. If one drive fails, all data in the RAID 0 array are lost. It should not be used for mission-critical systems.

Ideal Use: RAID 0 is ideal for non-critical storage of data that have to be read/written at a high speed, such as on an image retouching or video editing station.

If you want to use RAID 0 purely to combine the storage capacity of two drives in a single volume, consider mounting one drive in the folder path of the other drive instead. This is supported in Linux, OS X and Windows, and has the advantage that a single drive failure has no impact on the data of the second disk or SSD.

 

RAID 0 splits data across two or more disks, allowing higher data throughput. An individual file is read from multiple disks, giving it access to the speed and capacity of all of them. This RAID level is often referred to as striping and has the benefit of increased performance. However, it does not provide any redundancy or fault tolerance, as it does not duplicate data or store any parity information (more on parity later). All disks appear as a single volume, so when one of them fails, it breaks the array and results in data loss. RAID 0 is usually implemented for caching live streams and other files where speed is important and reliability is secondary.
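As a rough illustration of striping, the sketch below maps a logical block number to a disk and a position on that disk. The function name and the three-disk array are assumptions made for this example; real implementations stripe in larger chunks whose size is configurable.

```python
def stripe_location(block: int, n_disks: int) -> tuple[int, int]:
    """Map a logical block number to (disk index, block offset on that disk)."""
    if n_disks < 2:
        raise ValueError("RAID 0 needs at least two disks")
    # Consecutive blocks go to consecutive disks, wrapping around.
    return block % n_disks, block // n_disks


# Eight consecutive logical blocks on a hypothetical three-disk array.
layout = [stripe_location(b, 3) for b in range(8)]
for block, (disk, offset) in enumerate(layout):
    print(f"block {block} -> disk {disk}, offset {offset}")
```

Because neighbouring blocks sit on different disks, a large sequential read keeps all disks busy at once, which is where the speed gain comes from.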

RAID 1 (Disk Mirroring)

Minimum number of disks: 2

Advantages

  • RAID 1 offers excellent read speed and a write speed that is comparable to that of a single drive.
  • In case a drive fails, data do not have to be rebuilt; they just have to be copied to the replacement drive.
  • RAID 1 is a very simple technology.

Disadvantages

  • The main disadvantage is that the effective storage capacity is only half of the total drive capacity because all data get written twice.
  • Software RAID 1 solutions do not always allow a hot swap of a failed drive. That means the failed drive can only be replaced after powering down the computer it is attached to. For servers that are used simultaneously by many people, this may not be acceptable. Such systems typically use hardware controllers that do support hot swapping.

Ideal use: RAID 1 is ideal for mission-critical storage, for instance accounting systems. It is also suitable for small servers in which only two data drives will be used.

 

RAID 1 writes identical data to two or more drives and can read from any of them. This process is often called data mirroring and its primary function is to provide redundancy. If any of the disks in the array fails, the system can still access data from the remaining disk(s). Once you replace the faulty disk with a new one, the data is copied to it from the functioning disk(s) to rebuild the array. RAID 1 is the easiest way to create failover storage.
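The mirroring behaviour described above can be sketched with a toy model in which each disk is a Python dictionary. The class and method names are made up for this illustration and do not correspond to any real RAID tool.

```python
class Mirror:
    """Toy RAID 1: every write goes to all working member disks."""

    def __init__(self, n_disks: int = 2):
        # Each disk is a dict of block number -> data; None marks a failed disk.
        self.disks = [{} for _ in range(n_disks)]

    def write(self, block: int, data: bytes) -> None:
        for disk in self.disks:
            if disk is not None:
                disk[block] = data

    def read(self, block: int) -> bytes:
        for disk in self.disks:
            if disk is not None:
                return disk[block]
        raise IOError("all mirrors have failed")

    def fail(self, index: int) -> None:
        self.disks[index] = None

    def replace(self, index: int) -> None:
        # Rebuilding is a plain copy from any surviving disk.
        source = next(d for d in self.disks if d is not None)
        self.disks[index] = dict(source)


array = Mirror()
array.write(0, b"ledger")
array.fail(0)                 # first disk dies
survivor_copy = array.read(0) # still served by the second disk
array.replace(0)              # new disk receives a full copy
```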

RAID 1E

Minimum number of disks: 3

RAID 1E is a mirror that can be built over an odd number of disks. With RAID 1E you still get 50% overhead because each data block is stored in two mirror copies. Unlike RAID 1, RAID 1E uses the striping technique, which gives you an increase in read speed even for degraded configurations.
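To make the idea of a striped mirror over an odd number of disks more concrete, here is a sketch of one possible layout, in which the two copies of each block occupy consecutive slots across the disks. This is an assumption for illustration only; actual RAID 1E layouts differ between controllers, and the function name is hypothetical.

```python
def raid1e_copies(block: int, n_disks: int) -> list[tuple[int, int]]:
    """Return (disk, row) for both copies of a block in an interleaved layout."""
    if n_disks < 3:
        raise ValueError("this sketch assumes at least three disks")
    copies = []
    for copy in range(2):
        slot = 2 * block + copy          # copies take consecutive slots
        copies.append((slot % n_disks, slot // n_disks))
    return copies


for block in range(3):
    print(block, raid1e_copies(block, 3))
```

Since the two slots of a block are always neighbours, the copies end up on different disks, so any single disk can fail without losing a block.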

With RAID 1, you would normally use only two drives, or at most three (a 3-way mirror), because keeping more than three copies of the same data is really costly in terms of disk space. RAID 1E allows you to stick to the mirror configuration while having more than two disks in the set.

Use RAID 1E when you need reliable storage that survives a single disk failure and is built over an odd number of disks.

RAID 2

RAID 2 consists of bit-level striping with dedicated Hamming-code parity. All disk spindle rotation is synchronized and data is striped such that each sequential bit is on a different drive. Hamming-code parity is calculated across corresponding bits and stored on at least one parity drive. This level is of historical significance only; although it was used on some early machines, as of 2014 it is not used by any commercially available system.

RAID 3

RAID 3 consists of byte-level striping with dedicated parity. All disk spindle rotation is synchronized and data is striped such that each sequential byte is on a different drive. Parity is calculated across corresponding bytes and stored on a dedicated parity drive. Although implementations exist, RAID 3 is not commonly used in practice.

RAID 4

RAID 4 consists of block-level striping with dedicated parity. This level was previously used by NetApp, but has now been largely replaced by a proprietary implementation of RAID 4 with two parity disks, called RAID-DP. The main advantage of RAID 4 over RAID 2 and 3 is I/O parallelism: in RAID 2 and 3, a single read I/O operation requires reading the whole group of data drives, while in RAID 4 one I/O read operation does not have to spread across all data drives. As a result, more I/O operations can be executed in parallel, improving the performance of small transfers.

RAID 5 (Striping with parity)

Minimum number of disks: 3

Advantages

  • Read data transactions are very fast while write data transactions are somewhat slower (due to the parity that has to be calculated).
  • If a drive fails, you still have access to all data, even while the failed drive is being replaced and the storage controller rebuilds the data on the new drive.

Disadvantages

  • Drive failures have an effect on throughput, although this is still acceptable.
  • This is complex technology. If one of the disks in an array using 4TB disks fails and is replaced, restoring the data (the rebuild time) may take a day or longer, depending on the load on the array and the speed of the controller. If another disk goes bad during that time, data are lost forever.
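The rebuild time mentioned above can be estimated with simple arithmetic: the capacity of the replaced disk divided by the sustained rebuild rate. The rates used below are assumptions picked for illustration; the real rate depends on the controller, the drives and the load on the array.

```python
def rebuild_hours(disk_tb: float, rebuild_mb_per_s: float) -> float:
    """Hours needed to rewrite one whole replacement disk at a sustained rate."""
    disk_mb = disk_tb * 1_000_000          # decimal units: 1 TB = 1,000,000 MB
    return disk_mb / rebuild_mb_per_s / 3600


# Hypothetical rebuild rates for a 4 TB disk: a busy array versus an idle one.
for rate in (50, 150):
    print(f"{rate} MB/s -> {rebuild_hours(4, rate):.1f} hours")
```

At an assumed 50 MB/s a 4 TB disk takes roughly 22 hours to rebuild, which is in line with the "day or longer" figure for an array that also has to keep serving requests.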

Ideal use: RAID 5 is a good all-round system that combines efficient storage with excellent security and decent performance. It is ideal for file and application servers that have a limited number of data drives.

 

RAID 5 stripes data blocks across multiple disks like RAID 0; however, it also stores parity information (redundant data calculated from the data blocks of each stripe), which is used to recover the data in case of disk failure. This level offers both speed (data is accessed from multiple disks) and redundancy, as parity data is distributed across all of the disks. If any one of the disks in the array fails, its data is recreated from the remaining data and parity blocks. Parity takes up the equivalent of one disk's capacity: one-third of the total in a three-disk array, and proportionally less as disks are added.
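Parity in RAID 5 is commonly computed as the bitwise XOR of the data blocks in a stripe. The following minimal sketch shows how a lost block can be recreated from the survivors; the block contents and the helper name are invented for the example, and the rotation of parity across disks is left out.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


data = [b"AAAA", b"BBBB", b"CCCC"]   # data blocks of one stripe
parity = xor_blocks(data)            # parity block stored on a fourth disk

# The disk holding data[1] fails: XOR the surviving blocks with the parity.
recovered = xor_blocks([data[0], data[2], parity])
print(recovered)
```

The same calculation explains the write penalty: every change to a data block also requires the parity block of that stripe to be updated.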

RAID 6 (Striping with double parity)

Minimum number of disks: 4

Advantages

  • Like with RAID 5, read data transactions are very fast.
  • If two drives fail, you still have access to all data, even while the failed drives are being replaced. So RAID 6 is more secure than RAID 5.

Disadvantages

  • Write data transactions are slower than RAID 5 due to the additional parity data that have to be calculated. In one report I read, the write performance was 20% lower.
  • Drive failures have an effect on throughput, although this is still acceptable.
  • This is complex technology. Rebuilding an array in which one drive failed can take a long time.

Ideal use: RAID 6 is a good all-round system that combines efficient storage with excellent security and decent performance. It is preferable over RAID 5 in file and application servers that use many large drives for data storage.

 

RAID 6 is like RAID 5, but two independent sets of parity data are written for each stripe. That means it requires at least 4 drives and can withstand 2 drives dying simultaneously. The chances that two drives break down at exactly the same moment are of course very small. However, if a drive in a RAID 5 system dies and is replaced by a new drive, it takes hours or even more than a day to rebuild the swapped drive. If another drive dies during that time, you lose all of your data. With RAID 6, the RAID array will even survive that second failure.

RAID 7

RAID 7 is a trademarked RAID level owned by the now defunct Storage Computer Corp. It is a non-standard RAID level that requires proprietary hardware. RAID 7 is based on RAID 3 and RAID 4 but adds caching. The commercial implementation incorporates proprietary disk array hardware with a real-time embedded operating system to control disk drive access and data flow to host interfaces.

RAID 10 (Striping + Mirroring)

Minimum number of disks: 4

Advantages

  • If something goes wrong with one of the disks in a RAID 10 configuration, the rebuild time is very fast since all that is needed is copying all the data from the surviving mirror to a new drive. This can take as little as 30 minutes for 1 TB drives.

Disadvantages

  • Half of the storage capacity goes to mirroring, so compared to large RAID 5 or RAID 6 arrays, this is an expensive way to have redundancy.
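The capacity cost can be compared with a small calculation. The sketch below counts usable space in whole disks, assuming all disks are the same size; the function name is made up for this example.

```python
def usable_disks(level: str, n_disks: int) -> int:
    """Usable capacity, counted in whole disks, for n equal-size disks."""
    if level == "RAID 5" and n_disks >= 3:
        return n_disks - 1            # one disk's worth of parity
    if level == "RAID 6" and n_disks >= 4:
        return n_disks - 2            # two disks' worth of parity
    if level == "RAID 10" and n_disks >= 4 and n_disks % 2 == 0:
        return n_disks // 2           # every block is stored twice
    raise ValueError(f"unsupported combination: {level} with {n_disks} disks")


for level in ("RAID 5", "RAID 6", "RAID 10"):
    print(level, usable_disks(level, 8))
```

With eight disks this gives 7, 6 and 4 disks of usable space respectively, and the gap widens as the array grows.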

Ideal use: Highly utilized database servers and other servers performing a lot of write operations.

RAID 10 combines the mirroring of RAID 1 with the striping of RAID 0. In other words, it combines the redundancy of RAID 1 with the increased performance of RAID 0. It is best suited to environments where both high performance and security are required.

RAID 50 (RAID 5 arrays combined into a RAID 0)

Minimum number of disks: 6

RAID 50 consists of several RAID 5 arrays combined into a RAID 0. As discussed above, a RAID 0 needs at least two members, while each RAID 5 needs at least three disks, so a RAID 50 needs at least six disks.

With a RAID 50, read speed can increase by up to (N-1)*K times compared with a single disk, where N is the number of disks in each RAID 5 group and K is the number of RAID 5 groups forming the RAID 0.
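The (N-1)*K factor is easy to evaluate for a given layout. The sketch below treats it as an idealized upper bound, which is an assumption; real gains depend on the workload and the controller. The function name is hypothetical.

```python
def raid50_read_factor(disks_per_group: int, groups: int) -> int:
    """Idealized read speed-up (N-1)*K relative to a single disk."""
    if disks_per_group < 3:
        raise ValueError("each RAID 5 group needs at least three disks")
    if groups < 2:
        raise ValueError("the RAID 0 layer needs at least two groups")
    return (disks_per_group - 1) * groups


print(raid50_read_factor(3, 2))   # smallest RAID 50: six disks
print(raid50_read_factor(5, 3))   # fifteen disks in three groups
```

The smallest configuration of six disks yields a factor of 4, while fifteen disks in three groups of five yield a factor of 12.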

As far as fault tolerance goes, RAID 50, like a regular RAID 5, is guaranteed to survive a single disk failure. However, more disks may fail without data loss, provided the failed disks are in different RAID 5 groups.

In practice, configurations like RAID 50 are typically used with a large number of disks, with several disks reserved as hot spares.

RAID 60 (RAID 6 arrays combined into a RAID 0)

Minimum number of disks: 8

RAID 60 is a nested RAID method in which the constituent drives are organized into several identical RAID 6 logical drive sets (parity groups). The smallest possible RAID 60 configuration has eight drives organized into two parity groups of four drives each.

For any given number of hard drives, data loss is least likely to occur when the drives are arranged into the configuration that has the largest possible number of parity groups. For example, five parity groups of four drives are more secure than four parity groups of five drives. However, less data can be stored on the array with the larger number of parity groups.

The number of physical drives must be exactly divisible by the number of parity groups. Therefore, the number of parity groups that you can specify is restricted by the number of physical drives. The maximum number of parity groups possible for a particular number of physical drives is the total number of drives divided by the minimum number of drives necessary for that RAID level (three for RAID 50, four for RAID 60).
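The divisibility rule can be turned into a short helper that lists the parity-group counts allowed for a given number of drives. The function name is an assumption for this example, and it assumes at least two groups, since a single group would be a plain RAID 5 or RAID 6.

```python
def valid_group_counts(n_drives: int, min_per_group: int) -> list[int]:
    """Parity-group counts that divide the drives evenly and keep every
    group at or above the minimum size (3 for RAID 50, 4 for RAID 60)."""
    max_groups = n_drives // min_per_group
    return [g for g in range(2, max_groups + 1) if n_drives % g == 0]


print(valid_group_counts(20, 4))   # RAID 60 with 20 drives
print(valid_group_counts(12, 3))   # RAID 50 with 12 drives
```

For 20 drives in a RAID 60, the options are 2, 4 or 5 parity groups; 3 groups is ruled out because 20 is not divisible by 3.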

A minimum of 8 drives is required.

All data is lost if a third drive in a parity group fails before one of the other failed drives in the parity group has finished rebuilding. A greater percentage of array capacity is used to store redundant or parity data than with non-nested RAID methods.

This method has the following benefits:

  • Higher performance than for RAID 6, especially during writes.
  • Better fault tolerance than RAID 0, 5, 50, or 6.
  • Up to 2n physical drives can fail (where n is the number of parity groups) without loss of data, as long as no more than two failed drives are in the same parity group.
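The failure rule in the last point can be checked mechanically. In this sketch, drives are numbered from 0 and assigned to parity groups in consecutive runs; that numbering scheme and the function name are assumptions for illustration.

```python
from collections import Counter


def raid60_survives(failed: set[int], drives_per_group: int) -> bool:
    """True if no parity group contains more than two failed drives."""
    per_group = Counter(drive // drives_per_group for drive in failed)
    return all(count <= 2 for count in per_group.values())


# Eight drives in two groups of four: drives 0-3 and drives 4-7.
print(raid60_survives({0, 1, 4, 5}, 4))   # two failures in each group
print(raid60_survives({0, 1, 2}, 4))      # three failures in one group
```

Four failures spread evenly over the two groups are survivable, while three failures inside a single group are not.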

RAID 0+1 (Nested Hybrid RAID)

RAID 0+1 creates two stripes and mirrors them. If a single drive fails, one of the stripes has failed, and at that point the array is effectively running as RAID 0 with no redundancy. A rebuild carries significantly higher risk than with RAID 1+0, as all the data from all the drives in the remaining stripe has to be read rather than just from one drive, increasing the chance of an unrecoverable read error (URE) and significantly extending the rebuild window.

RAID 1+0 (Nested Hybrid RAID)

RAID 1+0 creates a striped set from a series of mirrored drives. The array can sustain multiple drive losses so long as no mirror loses all its drives.
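The difference in fault tolerance between RAID 0+1 and RAID 1+0 can be shown with two small predicates. The four-drive grouping and the function names below are assumptions chosen for this sketch.

```python
def raid10_survives(failed: set[int], mirrors: list[set[int]]) -> bool:
    """RAID 1+0 survives while every mirror keeps at least one working drive."""
    return all(mirror - failed for mirror in mirrors)


def raid01_survives(failed: set[int], stripes: list[set[int]]) -> bool:
    """RAID 0+1 survives while at least one stripe has no failed drive."""
    return any(not (stripe & failed) for stripe in stripes)


# Four drives: {0, 1} and {2, 3} are the mirrors (1+0) or the stripes (0+1).
groups = [{0, 1}, {2, 3}]
for failed in ({0}, {0, 2}, {0, 1}):
    print(failed, raid10_survives(failed, groups), raid01_survives(failed, groups))
```

Losing drives 0 and 2 leaves the RAID 1+0 array running but takes down both stripes of the RAID 0+1 array, whereas losing drives 0 and 1 does the opposite. With more than four drives, RAID 1+0 tolerates a larger share of the possible two-drive failures.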

JBOD RAID N+N (Nested Hybrid RAID)

JBOD RAID N+N: With JBOD (just a bunch of disks), it is possible to concatenate disks, but also volumes such as RAID sets. With larger drive capacities, write and rebuilding time may increase dramatically (especially, as described above, with RAID 5 and RAID 6). By splitting larger RAID sets into smaller subsets and concatenating them with JBOD, write and rebuilding time may be reduced. If a hardware RAID controller is not capable of nesting JBOD with RAID, then JBOD can be achieved with software RAID in combination with RAID set volumes offered by the hardware RAID controller. There is another advantage in the form of disaster recovery: if a small RAID subset fails, the data on the other RAID subsets is not lost, which reduces restore time.
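Concatenation differs from striping in that each member is filled completely before the next one is used. A minimal sketch of the address mapping, with member sizes given in blocks and a hypothetical function name, could look like this:

```python
from bisect import bisect_right
from itertools import accumulate


def locate(block: int, member_sizes: list[int]) -> tuple[int, int]:
    """Map a logical block to (member index, block offset within that member)."""
    ends = list(accumulate(member_sizes))   # cumulative end of each member
    if not 0 <= block < ends[-1]:
        raise IndexError("block outside the concatenated volume")
    member = bisect_right(ends, block)      # first member that ends after block
    start = ends[member] - member_sizes[member]
    return member, block - start


# Three RAID subsets of different sizes joined into one volume.
sizes = [100, 50, 200]
for block in (0, 99, 100, 349):
    print(block, locate(block, sizes))
```

Because every logical block lives on exactly one member, the failure of one subset only affects the address range that maps to it.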

RAID Levels Comparison Chart

RAID is no substitute for back-up!

All RAID levels except RAID 0 offer protection from a single drive failure. A RAID 6 system even survives 2 disks dying simultaneously. For complete security, you still need to back up the data from a RAID system.

That back-up will come in handy if all drives fail simultaneously because of a power spike. It is a safeguard when the storage system gets stolen. Back-ups can be kept off-site at a different location. This can come in handy if a natural disaster or fire destroys your workplace.

The most important reason to back up multiple generations of data is user error. If someone accidentally deletes some important data and this goes unnoticed for several hours, days or weeks, a good set of back-ups ensures you can still retrieve those files.