RAID systems are categorised by RAID levels. Levels 0 to 5 are known as the
Berkeley RAID levels, after the University of California, Berkeley research
that defined them. Other levels have been invented by various people since.
Here is a summary of the RAID levels in use.
- RAID 0 - stripes data across two (or more) disks. This gives
better read and write bandwidth for bulk transfers, and under heavy load
(multiple threads) it also gives better random access and small file
access (there is a short sketch of the striping layout after this list).
- RAID 1 - mirrors data across two disks. This gives better read
performance (roughly double the throughput under heavy load, since each read
can be served by either disk) and means that a failure of a single disk will
not result in data loss.
This can be implemented with more than two disks for either better read
performance under really heavy load, or for greater paranoia. I have never seen
or heard of such a configuration being used, but I'm sure it's out there
somewhere.
Some implementations of RAID 1, such as IBM's AIX, may read from both
disks for every access (which prevents any read performance increase). I wonder
what they would do if the two disks returned different data...
- RAID 2 and RAID 3 - I have not heard of these being
implemented. They seem only to appear in computer science textbooks.
- RAID 4 - this involves having two or more data disks over which the
data is striped in a similar fashion to RAID 0, plus a single
parity disk. Usually each block on the parity disk contains the XOR of the
data in the same block on each of the data disks. The parity disk could use
other parity algorithms - a checksum would work just as well - but I will
refer to XOR throughout this document (a sketch of the parity calculation
appears after this list).
If a disk dies then its contents can be reconstructed from the XOR of the
other disks. Reading data is the same as reading from RAID 0 when all disks
are functional. If one of the data disks is broken then the RAID system will
read the same block from all the other data disks and the parity disk and
return the XOR of that data (which will be the same as the data that had been
written to the lost disk).
Writing data requires updating the parity disk too. To do this we have to read
the original data from the block that is to be written, and the parity data
for that block. The new parity block will be the XOR of the old parity block,
the old data block, and the new data block (see the second sketch after this
list). So each small write costs two reads and two writes, and every write in
the array must touch the one parity disk, which makes write performance poor
(worse than a single non-RAID disk).
- RAID 5 - the same as RAID 4 but the parity is spread across
all the disks, so no single disk is a bottleneck for parity writes. This
makes it significantly faster than RAID 4 with no down-side, so RAID 4 is
almost never used. If a RAID 5 array is used for heavy writes then I expect
the performance to be less than 2/N times the performance of a single
disk (where N is the number of data disks). The write performance of a
3 disk array (2 data disks) in my tests is less than the performance of a
single disk; I haven't had an opportunity to test other array sizes. A sketch
of the rotating parity layout appears after this list.
This is the highest RAID level defined by the Berkeley RAID levels.
- RAID 0+1 / RAID 1+0 / RAID 10 - this means combining mirroring and
striping: RAID 0+1 runs RAID 1 over RAID 0 stripe sets, while RAID 1+0
(also written RAID 10) stripes over mirrored pairs. Either way this gives
mirroring for reliability and read bandwidth, and striping for write
bandwidth, capacity, and read bandwidth under heavy load (the last sketch
after this list shows the layering).
Mylex refers to this as RAID 6.
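
Here is a minimal sketch of the RAID 0 striping layout in Python.
stripe_location is just an illustrative name of mine, and real
implementations stripe in configurable chunk sizes rather than single blocks:

```python
# Sketch of RAID 0 striping: logical chunk i lives on disk i % num_disks
# at offset i // num_disks, so consecutive chunks alternate between disks.
def stripe_location(logical_chunk, num_disks):
    disk = logical_chunk % num_disks
    offset = logical_chunk // num_disks
    return disk, offset

# A bulk transfer of consecutive chunks keeps every disk busy at once,
# which is where the bandwidth gain comes from.
for chunk in range(6):
    print("chunk", chunk, "->", stripe_location(chunk, num_disks=2))
```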
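
The RAID 4 parity scheme can be sketched in a few lines too; xor_blocks is an
illustrative helper, and the tiny two-byte blocks are just for demonstration:

```python
from functools import reduce

def xor_blocks(blocks):
    # XOR equal-sized blocks together, byte by byte.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# The same block from each of three data disks, and its parity block.
data_disks = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = xor_blocks(data_disks)

# Pretend disk 1 died: XORing the surviving data disks with the parity
# block gives back exactly what was on the lost disk.
assert xor_blocks([data_disks[0], data_disks[2], parity]) == data_disks[1]
```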
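
Continuing that example, the write path's read-modify-write shortcut looks
like this (update_parity is again just an illustrative name):

```python
def update_parity(old_parity, old_data, new_data):
    # New parity = old parity XOR old data XOR new data: the old data
    # cancels itself out of the old parity and the new data replaces it.
    return bytes(p ^ o ^ n for p, o, n in zip(old_parity, old_data, new_data))

new_block = b"\xaa\xbb"
new_parity = update_parity(parity, data_disks[1], new_block)

# The shortcut (two reads, two writes) agrees with recomputing the
# parity from every data disk.
assert new_parity == xor_blocks([data_disks[0], new_block, data_disks[2]])
```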
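
The only difference in RAID 5 is where the parity block for each stripe
lives. One common rotation, sketched here as an illustration rather than any
vendor's exact layout, is:

```python
def parity_disk(stripe, total_disks):
    # Rotate the parity block to a different disk on each stripe so that
    # no single disk carries all the parity writes.
    return (total_disks - 1 - stripe) % total_disks

for stripe in range(4):
    p = parity_disk(stripe, total_disks=4)
    print("stripe", stripe, ["P" if d == p else "D" for d in range(4)])
```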
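
Finally, the RAID 1+0 layering, reusing stripe_location from the first
sketch; raid10_location is an illustrative helper showing the
stripe-over-mirrors ordering:

```python
def raid10_location(logical_chunk, num_pairs):
    # The RAID 0 layer chooses a mirrored pair, then the RAID 1 layer
    # writes to both disks of that pair.
    pair, offset = stripe_location(logical_chunk, num_pairs)
    disks = (2 * pair, 2 * pair + 1)
    return disks, offset

# Chunk 0 goes to disks 0 and 1, chunk 1 to disks 2 and 3, and so on.
print(raid10_location(0, num_pairs=2))
print(raid10_location(1, num_pairs=2))
```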
Copyright © 2001 Russell Coker, may be distributed freely.