Understanding RAID
Posted On April 2, 2012 by Geeta Priya filed under Enterprise
RAID is a technology that provides increased storage functions and reliability through redundancy. Author gives more information on RAID.
RAID (Redundant Array of Independent Disks) is a disk subsystem used to increase performance, provide fault tolerance, or both. RAID uses two or more ordinary hard disks and a RAID disk controller. RAID has also been implemented in software only.
In the late 1980s, RAID stood for ‘redundant array of inexpensive disks’, in contrast to the large, expensive disks of the time. As hard disks became cheaper, the RAID Advisory Board changed ‘inexpensive’ to ‘independent’. RAID greatly influences your storage architecture and the amount of storage you need. This article gives you the low-down on RAID basics and helps you determine which RAID is right for you.
Data Striping
Fundamental to RAID is "striping", a method of concatenating multiple drives into one logical storage unit. Striping involves partitioning each drive's storage space into stripes which may be as small as one sector (512 bytes) or as large as several megabytes. These stripes are then interleaved round-robin, so that the combined space is composed alternately of stripes from each drive. In effect, the storage space of the drives is shuffled like a deck of cards. The type of application environment, I/O or data intensive, determines whether large or small stripes should be used.
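The round-robin interleaving described above can be sketched in a few lines of Python. This is only an illustration: the function name `locate_block` and its parameters are hypothetical, and real controllers add details (metadata areas, parity rotation) that are left out here.

```python
def locate_block(logical_block: int, num_drives: int, stripe_blocks: int) -> tuple[int, int]:
    """Map a logical block number to (drive index, block offset on that drive).

    Assumes plain striping with no parity: stripes of `stripe_blocks` blocks
    are dealt out to the drives in round-robin order.
    """
    stripe_index = logical_block // stripe_blocks       # which stripe, counted across the array
    offset_in_stripe = logical_block % stripe_blocks    # position inside that stripe
    drive = stripe_index % num_drives                    # round-robin choice of drive
    stripe_on_drive = stripe_index // num_drives         # stripes already placed on that drive
    return drive, stripe_on_drive * stripe_blocks + offset_in_stripe


if __name__ == "__main__":
    # Three drives, two blocks per stripe: print where the first eight blocks land.
    for block in range(8):
        print(block, locate_block(block, num_drives=3, stripe_blocks=2))
```

With three drives and two-block stripes, logical blocks 0 and 1 land on the first drive, 2 and 3 on the second, 4 and 5 on the third, and 6 and 7 wrap around to the first drive again.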
Most multi-user operating systems today, like NT, Unix and Netware, support overlapped disk I/O operations across multiple drives. However, in order to maximize throughput for the disk subsystem, the I/O load must be balanced across all the drives so that each drive can be kept busy as much as possible. In a multiple drive system without striping, the disk I/O load is never perfectly balanced. Some drives will contain data files which are frequently accessed and some drives will only rarely be accessed. In I/O intensive environments, performance is optimized by striping the drives in the array with stripes large enough so that each record potentially falls entirely within one stripe. This ensures that the data and I/O will be evenly distributed across the array, allowing each drive to work on a different I/O operation, and thus maximize the number of simultaneous I/O operations which can be performed by the array.
In data intensive environments and single-user systems which access large records, small stripes (typically one 512-byte sector in length) can be used so that each record will span across all the drives in the array, each drive storing part of the data from the record. This causes long record accesses to be performed faster, since the data transfer occurs in parallel on multiple drives. Unfortunately, small stripes rule out multiple overlapped I/O operations, since each I/O will typically involve all drives. However, operating systems like DOS, which do not allow overlapped disk I/O, will not be negatively impacted. Applications such as on-demand video/audio, medical imaging and data acquisition, which utilize long record accesses, will achieve optimum performance with small stripe arrays.
A potential drawback to using small stripes is that synchronized spindle drives are required in order to keep performance from being degraded when short records are accessed. Without synchronized spindles, each drive in the array will be at different random rotational positions. Since an I/O cannot be completed until every drive has accessed its part of the record, the drive which takes the longest will determine when the I/O completes. The more drives in the array, the more the average access time for the array approaches the worst case single-drive access time. Synchronized spindles assure that every drive in the array reaches its data at the same time. The access time of the array will thus be equal to the average access time of a single drive rather than approaching the worst case access time.
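To put rough numbers on this effect, here is a small sketch under a simplifying assumption that is not from the article: each unsynchronized drive's rotational delay is independent and uniformly distributed between zero and one full revolution, and seek time is ignored. Under that model, the expected wait for the slowest of n drives is n/(n+1) of a revolution. The function names are hypothetical.

```python
def rotation_time_ms(rpm: float) -> float:
    """Time for one full platter revolution, in milliseconds."""
    return 60_000.0 / rpm


def expected_wait_unsynchronized_ms(num_drives: int, rotation_ms: float) -> float:
    """Expected rotational wait when the I/O must wait for the slowest drive.

    Assumes independent, uniformly distributed rotational positions, so the
    expected maximum of n delays is n / (n + 1) of one revolution.
    """
    return rotation_ms * num_drives / (num_drives + 1)


def expected_wait_synchronized_ms(rotation_ms: float) -> float:
    """With synchronized spindles the array behaves like one drive: half a revolution on average."""
    return rotation_ms / 2


if __name__ == "__main__":
    revolution = rotation_time_ms(7200)
    for drives in (1, 2, 4, 8):
        print(drives, round(expected_wait_unsynchronized_ms(drives, revolution), 2))
```

In this model a single drive waits half a revolution on average, while an eight-drive unsynchronized array waits about 89% of a revolution, close to the worst case.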
Knowing RAID Basics
RAID subsystems come in all sizes, from desktop units to floor-standing models. Stand-alone units might include large amounts of cache as well as redundant power supplies. RAID was at first used only with servers, but desktop PCs are increasingly being retrofitted with a RAID controller and extra IDE (Integrated Drive Electronics) or SCSI disks. Newer motherboards often have RAID controllers built in.
RAID improves performance through disk striping, which interleaves bytes or groups of bytes across multiple drives, so more than one disk is reading and writing at the same time.
Climbing RAID Levels
Different levels of RAID require dramatically different hardware and software solutions. Here are the main levels of RAID so you can figure out what you do and don’t need:
RAID 0 – Speed (widely used): RAID level 0 is disk striping only, which interleaves data across multiple disks for performance. Used a lot for gaming, RAID 0 has zero safeguards against failure.
RAID 1 – Fault Tolerance (widely used): RAID 1 uses disk mirroring, which provides 100% duplication of data. It’s the most reliable, but doubles storage cost. RAID 1 is widely used in business apps.
RAID 2 – Speed: Instead of single bytes or groups of bytes (blocks), bits are interleaved (striped) across many disks. The Connection Machine (the series of 1980s supercomputers used at MIT) used this technique, but it is rarely used because 39 disks are required!
RAID 3 – Speed and Fault Tolerance: Data is striped across three or more drives. RAID 3 is used to achieve the highest data transfer rate, because all drives operate in parallel. It uses byte-level striping, and the parity bits are stored on a separate, dedicated drive.
RAID 4 – Speed and Fault Tolerance: Similar to RAID 3, but RAID 4 uses block-level striping. Not often used.
RAID 5 – Speed and Fault Tolerance (widely used): Data is striped across three or more drives for performance, and parity bits are used for fault tolerance. In each stripe, the parity computed from the data blocks is stored on another drive; the parity blocks rotate across all the drives, so they are interspersed with user data rather than held on one dedicated drive. RAID 5 is widely used in servers.
RAID 6 – Speed and Fault Tolerance: RAID 6 has the highest reliability because it can recover from a failure of two disks, but isn’t widely used. It’s similar to RAID 5, but performs two different parity computations or the same computation on overlapping subsets of the data.
RAID 10, RAID 100 – Speed and Fault Tolerance: RAID 10 is RAID 1 + 0. The drives are striped for performance (RAID 0), and all striped drives are duplicated (RAID 1) for fault tolerance. RAID 100 is RAID 10 + 0. It adds a layer of striping on top of two or more RAID 10 configurations for even more speed.
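The storage cost of each level can be compared with a quick calculation. The sketch below assumes equal-sized drives and the usual textbook layouts (one drive's worth of parity for RAID 3, 4 and 5, two for RAID 6, and two-way mirroring for RAID 1 and RAID 10). The function name `usable_capacity` is hypothetical.

```python
def usable_capacity(level: str, num_drives: int, drive_gb: float) -> float:
    """Usable capacity in GB for an array of equal-sized drives (simplified model)."""
    if level == "0":
        return num_drives * drive_gb             # striping only, no redundancy
    if level == "1":
        if num_drives < 2:
            raise ValueError("RAID 1 needs at least two drives")
        return drive_gb                          # every drive holds the same copy
    if level in ("3", "4", "5"):
        if num_drives < 3:
            raise ValueError("parity RAID needs at least three drives")
        return (num_drives - 1) * drive_gb       # one drive's worth of parity
    if level == "6":
        if num_drives < 4:
            raise ValueError("RAID 6 needs at least four drives")
        return (num_drives - 2) * drive_gb       # two drives' worth of parity
    if level == "10":
        if num_drives < 4 or num_drives % 2:
            raise ValueError("RAID 10 needs an even number of drives, at least four")
        return (num_drives // 2) * drive_gb      # half the drives are mirror copies
    raise ValueError(f"unsupported RAID level: {level}")


if __name__ == "__main__":
    for level in ("0", "5", "6", "10"):
        print(level, usable_capacity(level, num_drives=4, drive_gb=1000))
```

For four 1000 GB drives this gives 4000 GB for RAID 0, 3000 GB for RAID 5, and 2000 GB for both RAID 6 and RAID 10.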
Fault tolerance is achieved by mirroring or parity. Mirroring is 100% duplication of the data on two drives (RAID 1). Parity is used to work out the data in two drives and store the results on a third (RAID 3 or 5). After a faulty drive is replaced, the RAID controller automatically rebuilds the lost data from the other two. RAID systems might have a spare drive (called a hot spare) ready and waiting to be the replacement for a broken one.
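The rebuild step works because XOR is its own inverse: XOR-ing the parity with the surviving data gives back the missing data. A minimal sketch, with hypothetical helper names and tiny byte strings standing in for disk blocks:

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving_blocks):
    """Recover the one missing block from all surviving blocks (data and parity)."""
    return xor_blocks(surviving_blocks)


if __name__ == "__main__":
    drive_a = b"RAID"
    drive_b = b"demo"
    parity = xor_blocks([drive_a, drive_b])    # stored on the third drive

    # Pretend drive_b failed and rebuild it from the other two.
    print(rebuild_missing([drive_a, parity]))
```

The same function handles more than two data drives, since the parity is simply the XOR of every data block in the stripe.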
Software versus hardware
Software-based solutions utilise custom firmware running on general purpose processors for their RAID algorithms and exclusive OR (XOR) parity calculations (see Table-1).
Table-1: Truth Table
Input       Output
#1   #2     OR   XOR
0    0      0    0
0    1      1    1
1    0      1    1
1    1      1    0
RAID parity calculations are a very compute-intensive process and software-based solutions suffer a larger performance penalty compared to hardware-assist solutions – especially during peak workloads and when the system is in degraded mode rebuilding a failed drive. For RAID 6 implementations, the performance impact is even more severe due to a second parity calculation and more complex parity algorithms.
Hardware-assist solutions utilise chip-level XOR engines to generate the compute-intensive RAID parity calculations. These solutions are able to handle large numbers of parity calculations easily with less of a performance penalty. Hardware-assist implementations are widely viewed as the more efficient and thus superior solution.
Dual parity versus P+Q
Dual parity (DP) implementations combine a conventional ‘horizontal’ parity stripe with a second ‘vertical’ parity stripe. However, the intricate algorithms that create the vertical parity stripes, and the complex error and exception handling requirements, result in a RAID 6 implementation that is noticeably slower than P+Q RAID 6. (In P+Q, the P parity is calculated exactly as it would be for RAID 5, while Q is calculated using an error-correcting code. The Q is then striped across all the disks within the RAID group.)
P+Q implementations are similar to RAID 5 and simply have two parity values within each data stripe. They are typically implemented with hardware-assist, which lets them achieve higher performance than dual parity implementations, including slightly higher IOPS performance, 10–30% more sequential write performance and faster drive rebuilds.
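The article does not say which error-correcting code is used for Q. As an illustration only, the sketch below assumes one common choice: a Reed-Solomon style Q computed in the Galois field GF(2^8) with the polynomial 0x11D and generator 2. All function names are hypothetical. It computes P and Q for a stripe and then rebuilds a lost data block from Q alone, which is what makes recovery possible when a data drive and the P drive fail together.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8), reducing by the polynomial 0x11D (an assumption)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return result


def gf_pow(a: int, n: int) -> int:
    """Raise a to the n-th power in GF(2^8) by repeated multiplication."""
    result = 1
    for _ in range(n):
        result = gf_mul(result, a)
    return result


def compute_pq(data):
    """Return (P, Q) for a stripe: P is plain XOR, Q weights block i by 2**i in GF(2^8)."""
    p = bytearray(len(data[0]))
    q = bytearray(len(data[0]))
    for index, block in enumerate(data):
        coeff = gf_pow(2, index)
        for i, byte in enumerate(block):
            p[i] ^= byte
            q[i] ^= gf_mul(coeff, byte)
    return bytes(p), bytes(q)


def recover_from_q(data, missing: int, q: bytes) -> bytes:
    """Rebuild data[missing] from Q and the surviving data blocks, without using P."""
    partial = bytearray(q)
    for index, block in enumerate(data):
        if index == missing:
            continue                      # the lost block is never read
        coeff = gf_pow(2, index)
        for i, byte in enumerate(block):
            partial[i] ^= gf_mul(coeff, byte)
    # a**254 is the multiplicative inverse of a in GF(2^8), since a**255 == 1.
    inverse = gf_pow(gf_pow(2, missing), 254)
    return bytes(gf_mul(inverse, byte) for byte in partial)


if __name__ == "__main__":
    stripe = [b"abc", b"def", b"ghi"]
    p, q = compute_pq(stripe)
    damaged = [stripe[0], None, stripe[2]]    # data drive 1 and the P drive are gone
    print(recover_from_q(damaged, 1, q))
```

The extra Galois-field multiplications are what make Q more expensive to compute than P, which is consistent with the article's point that RAID 6 benefits from hardware assistance.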
RAID 6 with P+Q has minimal performance impact compared to RAID 5, typically less than 3%.
If you want the quickest RAID 6 rebuild times, use the P+Q implementation. Dual parity RAID 6 suffers from the slowdowns described above, whereas P+Q RAID 6 stays within 3% of RAID 5 performance.
Deciding Which RAID You Need
You need to think about a number of factors when choosing the appropriate RAID level for a specific application. Cost is one of them (obviously!), but right from the start, the level of data protection and performance are the two main drivers. The application’s read/write ratio will actually dictate which RAID level is best.
RAID 5 is better suited for mostly read-oriented applications. Because it has to write parity information for each block of data it writes to the RAID set, it’s not well suited for write intensive apps like some transactional databases. Remember that RAID 5 alone doesn’t provide multiple-drive failure protection.
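The write cost comes from the parity update. For a small write, a RAID 5 array typically reads the old data and old parity, then writes the new data and new parity, which is four disk operations for one logical write. A minimal sketch of that parity update, with hypothetical names and tiny byte strings standing in for blocks:

```python
def updated_parity(old_data: bytes, old_parity: bytes, new_data: bytes) -> bytes:
    """New parity after overwriting one data block: old_parity XOR old_data XOR new_data."""
    return bytes(p ^ o ^ n for p, o, n in zip(old_parity, old_data, new_data))


# Disk operations needed for one small logical write (commonly cited figures).
WRITE_COST = {
    "RAID 0": 1,    # write data
    "RAID 1": 2,    # write data to both mirrors
    "RAID 5": 4,    # read data, read parity, write data, write parity
    "RAID 6": 6,    # as RAID 5, plus read and write of the second parity
}


if __name__ == "__main__":
    block_a = b"\x01\x02"
    block_b = b"\x10\x20"
    parity = bytes(x ^ y for x, y in zip(block_a, block_b))

    new_block_a = b"\xff\x00"
    print(updated_parity(block_a, parity, new_block_a).hex())
```

The updated parity equals what a full recomputation over the new stripe would give, but it is obtained without reading the other data drives.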
The MS Exchange Information Store is mostly read-oriented, so it’s typically better off on a RAID 5. This is the most cost-efficient way to provide data protection while maintaining read performance. But MS Exchange also uses logs that can get quite busy. Because these logs are mostly write-oriented, they perform better on a RAID 0+1 array. You have to evaluate the read/write ratio for your database app to determine the best type of RAID.
Of course, if your apps aren’t very busy, the difference in performance between RAID 0+1 and RAID 5 becomes way less noticeable and cost can become the driving factor. Consider a combination of both RAID technologies if your disk array allows it.
Always make decisions on disk technology and RAID levels in the order of performance, availability and capacity, never on capacity first. Start with nirvana and work backwards.
For more information on RAID, you can email the author at gplr93@gmail.com




