RAID in 2018
Still Not Quite Obsolete
I’ve talked to a lot of professionals in the IT industry, and some, surprisingly, don’t even know what RAID is! Others think it is unnecessary, while some still think RAID is a replacement for backups (a misconception admins and hardware techs have been warning about for decades now). First, I’ll give a quick introduction to what RAID is, what it isn’t, and its applications in the real world.
RAID stands for Redundant Array of Independent Disks. I think the term is a little bit unnecessary in today’s world, but let’s break it down.
First of all, we are talking about an array of connected, separate disk drives. These could be 2.5″ or 3.5″, SAS or SATA, mechanical or SSD; as far as the RAID implementation and the OS are concerned, they are all essentially the same to the computer they are connected to.
There are 5 commonly used levels or versions of RAID, as follows:
- RAID 0 AKA striping (two or more drives required). This takes two or more identical drives and combines their performance and capacities to make what appears to be a single drive. Performance with RAID 0 is excellent, but the disadvantage is that the failure of any single disk results in data loss and the array going offline. There is no recovery except from backups. I never recommend RAID 0.
- RAID 1 AKA mirroring (two drives required). It is called mirroring because both drives contain an identical copy of the data. Performance is enhanced on reads because data can be read up to twice as fast by reading from the 2 separate drives simultaneously. There is a performance penalty on writes, since the data must be written to both drives (however, this is usually not an issue for most servers, since the majority are read-intensive on average).
- RAID 5 (3+ drives required). RAID 5 has in the past been one of the most common RAID levels, as it provides enhanced performance and some redundancy, but it is prone to faults, failures and slow rebuild times. Rather than using a dedicated parity drive, it spreads one drive’s worth of parity across all the drives in the array. It can withstand a single drive failure but NOT 2. Read performance is good, but the parity calculations slow down writes unless a hardware RAID card is used.
- RAID 6 (4+ drives required). Similar to RAID 5, but two drives’ worth of capacity is used for parity, so it can survive 2 drives failing and is more fault tolerant. Rebuilds take even longer on RAID 6 than on RAID 5. Read performance is good, but the parity calculations slow down writes.
- RAID 10 AKA 1+0 (requires 4 or more drives). It combines two or more RAID 1 arrays, striped together as a RAID 0. It delivers excellent performance and is fault tolerant (one drive in each RAID 1 could die without any ill effect aside from some performance reduction). Rebuild times are similar to RAID 1 and are much faster than RAID 5 or 6.
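To put rough numbers on the list above, here is a minimal Python sketch that works out usable capacity and the number of drive failures each level is guaranteed to survive. The function name `raid_summary` and the 4 TB drive size are my own picks for illustration, and it assumes identical drives and two-way mirrors for RAID 10.

```python
def raid_summary(level, drives, drive_tb):
    """Return (usable_tb, guaranteed_failures) for an array of identical drives."""
    minimum = {0: 2, 1: 2, 5: 3, 6: 4, 10: 4}
    if level not in minimum:
        raise ValueError(f"unsupported RAID level: {level}")
    if drives < minimum[level]:
        raise ValueError(f"RAID {level} needs at least {minimum[level]} drives")
    if level == 10 and drives % 2:
        raise ValueError("RAID 10 (two-way mirrors) needs an even number of drives")

    if level == 0:
        return drives * drive_tb, 0          # no redundancy at all
    if level == 1:
        return drive_tb, drives - 1          # every drive holds a full copy
    if level == 5:
        return (drives - 1) * drive_tb, 1    # one drive's worth of parity
    if level == 6:
        return (drives - 2) * drive_tb, 2    # two drives' worth of parity
    # RAID 10: only one failure is guaranteed safe, because a second
    # failure could hit the other half of the same mirror.
    return (drives // 2) * drive_tb, 1


for level, drives in [(0, 2), (1, 2), (5, 3), (6, 4), (10, 4)]:
    usable, failures = raid_summary(level, drives, 4)
    print(f"RAID {level:>2} with {drives} x 4 TB: "
          f"{usable} TB usable, guaranteed to survive {failures} failure(s)")
```

Note that the second number is the worst-case guarantee only; RAID 10 can often survive more failures than that, depending on which drives die.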
Rather than overcomplicating the issue, I will try to give a practical take on what RAID means in 2018. Some have said RAID is obsolete, but usually they are referring to the nearly impossible resync or rebuild times on large multi-terabyte RAID 5/6 arrays. I would agree there, as I’ve never liked RAID 5 or 6, and whether you like them or not, they are very impractical to use.
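If you are wondering why parity rebuilds are so slow, a toy example helps. The sketch below shows single XOR parity on one stripe of a hypothetical 4-drive RAID 5; the block contents and the helper name `xor_blocks` are made up for illustration, and real implementations rotate the parity position from stripe to stripe (RAID 6 also adds a second, differently calculated parity block that is not shown here).

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe on a hypothetical 4-drive RAID 5: three data blocks plus parity.
data = [b"disk", b"raid", b"2018"]
parity = xor_blocks(data)

# The drive holding data[1] dies. Rebuilding its block means reading the
# matching block from EVERY surviving drive and XOR-ing them together.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)

print(rebuilt)  # b'raid'
```

A real rebuild repeats this for every stripe on the array, so all of the surviving drives have to be read from end to end. A RAID 1 resync, by comparison, only has to copy one drive.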
So what is the best way to go?
RAID 1: If you only have 2 drives, then I think RAID 1 is an excellent trade-off. It is quick and easy to resync/rebuild, a single drive can die and you will still not have any data loss, and when both are active you have a performance boost in reads.
RAID 10: If you have 4 drives, you gain extra performance in a RAID 10 configuration, with the fault tolerance that a single drive in each RAID 1 could die without data loss.
The main disadvantage is that with RAID 1 and RAID 10 you are essentially losing 50% of your storage space, but since drives are relatively cheap I think it is a worthy trade-off.
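The RAID 10 trade-off described above can be checked by brute force. This is just a sketch under my own assumptions: a 4-drive array where drives 0 and 1 form one mirror and drives 2 and 3 form the other, with a hypothetical helper `raid10_survives`.

```python
from itertools import combinations


def raid10_survives(failed, mirrors):
    """True if no mirror pair has lost both of its drives."""
    return all(not set(pair) <= set(failed) for pair in mirrors)


# Hypothetical 4-drive RAID 10: drives 0+1 form one mirror, 2+3 the other.
mirrors = [(0, 1), (2, 3)]
drives = [d for pair in mirrors for d in pair]

# Try every possible pair of simultaneous drive failures.
two_drive_failures = list(combinations(drives, 2))
survivable = [f for f in two_drive_failures if raid10_survives(f, mirrors)]

print(f"{len(survivable)} of {len(two_drive_failures)} two-drive failures are survivable")
print("usable fraction of raw capacity:", len(mirrors) / len(drives))
```

So any single failure is safe, and a second failure only kills the array if it lands on the partner of the drive that already died, all at the cost of half the raw capacity.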
There are some people who insist that “drives are more reliable today” and “you don’t need RAID anymore”, but I hardly find this true. I’d actually argue that SSDs may be less reliable or less predictable than mechanical hard drives. One thing we can all agree on is that the most likely component to fail in a server is a disk, and that’s not likely to change any time soon, as much as we would like to believe flash-based storage is more reliable. To anyone who thinks running on a single drive (even with backups) is fine, I’d ask: aren’t the performance benefit and redundancy worth running RAID? I’m sure most datacenter techs and server admins would agree that it is much better to hot-swap a disk than to deal with downtime and restoring from backups, right?
Now for the warnings. RAID “protection” is NOT a replacement for backups, even if no drive ever dies. To see why, you have to understand how misleading the term RAID “protection”, which some in the industry use, really is. It is true in the sense that you are protected from data loss if a single drive fails (or possibly 2 in some RAID levels). However, this doesn’t take into account natural disasters, theft, or accidental or willful deletion or destruction of data.
I’d say that, as it stands in 2018 and beyond, everyone should be using at least RAID 1 or RAID 10 if possible, in nearly every use case. There are a few possible exceptions to this rule, but they are rare, and even then you should aim for as much redundancy as possible.
In conclusion, use RAID 1 if you can, and preferably RAID 10. If you don’t know how to use RAID, learn it and use it anyway.
Cheers!
A.Yasir