Terabyte IDE RAID-5 Disk Arrays
2003 Conference for Computing in High Energy and Nuclear Physics, La Jolla, California, March 24-28, 2003
We opted to use ext3 for two reasons: 1) at the time
there were stability problems with ReiserFS and NFS
(these have since been resolved with kernel 2.4.7), and
2) it is an extension of the standard ext2fs (it was
originally developed for the 2.2 kernel) and, if synced
properly, can be mounted as ext2. Ext3 is the only
one that allows direct upgrading from ext2, which is
why it has been the default for Red Hat since 7.2.
NFS is a very flexible system that allows one to
manage files on several computers inside a network as
if they were on the local hard disk. There is no need
to know what file system the files are stored under,
or where they are physically located, in order to
access them. We therefore use NFS to connect these
disk arrays to computers that cannot run Linux 2.4.
We have successfully used NFS to mount disk arrays
on the following types of computers: a DECstation
5000/150 running Ultrix 4.3A, a Sun UltraSparc 10
running Solaris 7, a Macintosh G3 running MacOS X,
and various Linux boxes with both the 2.2 and 2.4
kernels.
As an example, in Spring 2002 we built a pair of one
Terabyte Linux RAID-5 arrays, as described in section
3.1, to store CMS Monte Carlo data at CERN. They
were mounted using NFS via gigabit ethernet and
remotely served the random background data to the
CMS Monte Carlo computers as if it were local. While
this is not as efficient as serving the data directly, it
is clearly a viable technique. We are also currently
using two NFS-mounted RAID-5 boxes, one
at SLAC and one at the University of Mississippi, to
run analysis software with the BABAR KANGA and
CMS CMSIM/ORCA code.
We have performed a few simple speed tests. The
first was "hdparm -tT /dev/xxx". This test simply
reads a 64 MB chunk of data and measures the speed.
On a single drive we saw read/write speeds of about
30 MB/s. The whole array saw an increase to 95
MB/s. When we tried writing a text file using a simple
FORTRAN program (we wrote "All work and no play
make Jack a dull boy" 10^8 times), the speed was about
95 MB/s. While mounted via NFS over 100 Mb/s
ethernet the speed was 2.12 MB/s, limited by both the
ethernet speed and the NFS communication overhead.
In the past, we have been able to get much higher
fractions of the rated ethernet bandwidth by using the
lower level TCP/IP socket protocol in place of the
higher level NFS protocol. TCP/IP sockets are more
cumbersome to program, but are much faster.
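The write test and the bandwidth comparison above can be sketched in C as follows. This is only an illustration, not the FORTRAN program used for the measurements; the function names (write_lines, megabytes_per_second, link_fraction) are hypothetical, and timing the call is left to the caller.

```c
#include <stdio.h>

/* Write `count` copies of `line`, each followed by a newline, to `path`.
 * Returns the number of bytes written, or -1 on error. */
long long write_lines(const char *path, const char *line, long long count)
{
    FILE *fp = fopen(path, "w");
    if (fp == NULL)
        return -1;

    long long bytes = 0;
    for (long long i = 0; i < count; i++) {
        int n = fprintf(fp, "%s\n", line);
        if (n < 0) {
            fclose(fp);
            return -1;
        }
        bytes += n;
    }
    if (fclose(fp) != 0)
        return -1;
    return bytes;
}

/* Convert a byte count and an elapsed wall-clock time in seconds
 * to MB/s, taking 1 MB = 10^6 bytes. */
double megabytes_per_second(long long bytes, double seconds)
{
    return (double)bytes / 1.0e6 / seconds;
}

/* Fraction of a link rated in megabits per second that a measured
 * MB/s figure represents (8 bits per byte, protocol overhead ignored). */
double link_fraction(double mb_per_s, double link_mbit_per_s)
{
    return mb_per_s * 8.0 / link_mbit_per_s;
}
```

With the figures quoted above, link_fraction(2.12, 100.0) comes to about 0.17, that is, the NFS transfer used roughly 17% of the rated 100 Mb/s.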
We also tested what actually happens when a disk
fails by turning the power off to one disk in our
RAID-5 array. One could continue to read and write files,
but in a "degraded" mode, that is, without the parity
safety net. When a blank disk was added to replace
the failed disk, one could again continue to read and
write files, at reduced disk access speed, while the
system rebuilt the missing disk as a
background job. This speed reduction in disk access
was due to the fact that the parity regeneration is a
major disk access in its own right. For more details,
see reference .
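The rebuild described above relies on RAID-5 parity being the bytewise XOR of the data blocks in a stripe, so any single missing block is the XOR of all the surviving blocks. A minimal sketch of that idea follows; the function name xor_blocks is hypothetical, and the actual on-disk layout of the Linux software RAID driver (rotating parity, chunk sizes) is not modeled.

```c
#include <stddef.h>
#include <string.h>

/* XOR `nblocks` equal-sized blocks together into `out`.
 * Given the data blocks of a stripe, the result is the parity block.
 * Given the surviving data blocks plus the parity block, the result
 * is the content of the one missing block. */
void xor_blocks(const unsigned char *const *blocks, size_t nblocks,
                size_t block_size, unsigned char *out)
{
    memset(out, 0, block_size);
    for (size_t b = 0; b < nblocks; b++)
        for (size_t i = 0; i < block_size; i++)
            out[i] ^= blocks[b][i];
}
```

Rebuilding a replacement disk means running this over every stripe, which requires reading all the surviving disks in full. That is consistent with the reduced access speed observed during the rebuild.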
The performance of Linux IDE software drivers is
improving. The latest standards include support
for command overlap, READ/WRITE direct memory
access QUEUED commands, scatter/gather data
transfers without intervention of the CPU, and
elevator seeks. Command overlap is a protocol that allows
devices that require extended command time to
perform a bus release so that commands may be executed
by the other device on the bus. Command queuing
allows the host to issue concurrent commands to the
same device. Elevator seeks minimize disk head
movement by optimizing the order of I/O commands. The
Hitachi/IBM 180GXP disk supports elevator seeks
under the new ATA6 standard.
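To illustrate what optimizing the order of I/O commands means, the following sketch reorders a list of pending block addresses into one upward sweep from the current head position followed by one downward sweep. It is a textbook-style simplification with hypothetical names (elevator_order, head), not the algorithm implemented in any particular drive or driver.

```c
#include <stdlib.h>

/* qsort comparator: ascending order of block address. */
static int cmp_ascending(const void *a, const void *b)
{
    long x = *(const long *)a;
    long y = *(const long *)b;
    return (x > y) - (x < y);
}

/* Reverse the elements of a[lo..hi-1] in place. */
static void reverse(long *a, size_t lo, size_t hi)
{
    while (lo + 1 < hi) {
        long t = a[lo];
        a[lo] = a[hi - 1];
        a[hi - 1] = t;
        lo++;
        hi--;
    }
}

/* Reorder req[0..n-1] in place: requests at or above `head` come first
 * in ascending order (upward sweep), then the remaining requests in
 * descending order (downward sweep). */
void elevator_order(long *req, size_t n, long head)
{
    qsort(req, n, sizeof req[0], cmp_ascending);

    size_t below = 0;               /* number of requests below the head */
    while (below < n && req[below] < head)
        below++;

    /* Sorted layout is [below ascending][above ascending].
     * Reversing everything, then the first n - below elements,
     * gives [above ascending][below descending]. */
    reverse(req, 0, n);
    reverse(req, 0, n - below);
}
```

For example, with the head at block 53 and requests 98, 183, 37, 122, 14, 124, 65, 67, the service order becomes 65, 67, 98, 122, 124, 183, 37, 14, so the head changes direction only once.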
We did encounter a few problems. We had to modify
"MAKEDEV" to allow for more than eight IDE
devices, that is, to allow for disks beyond "/dev/hdg".
For version 2.x one would have to actually modify the
script; however, for version 3.x we just had to modify
the file "/etc/makedev.d/ide". This should no longer
be a problem with newer releases of Linux.
Another problem was the 2 GB file size limit. Older
operating system and compiler libraries used a signed
32-bit "long-integer" for addressing files; therefore, they
could not normally address files larger than 2 GB
(2^31 bytes). There are patches to the Linux 2.4 kernel and
glibc, but there are still some problems with NFS, and
not all applications use these patches.
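The 2 GB figure follows directly from the width of a signed offset: one bit is the sign, so the largest representable byte offset is 2^(bits-1) - 1. A small sketch of that arithmetic, with the hypothetical helper name max_signed_offset:

```c
#include <stdint.h>

/* Largest byte offset representable in a signed integer of `bits` bits,
 * i.e. 2^(bits-1) - 1. Valid for 2 <= bits <= 64. */
int64_t max_signed_offset(int bits)
{
    return (int64_t)((UINT64_C(1) << (bits - 1)) - 1);
}
```

For 32 bits this gives 2147483647, one byte short of 2^31, while a 64-bit offset raises the addressing limit to 2^63 - 1 bytes.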
We have found that the current underlying file systems
(ext2, ext3, reiserfs) do not have a 2 GB file
size limit. The limit for ext2/ext3 is in the petabytes.
The 2.4 kernel series supports large files (64-bit
offsets). Current versions of GNU libc support large
files. However, by default the 32-bit offset interface
is used. To use 64-bit offsets, C/C++ code must be
recompiled with the following as the first line:
#define _FILE_OFFSET_BITS 64
or the code must use the *64 functions (e.g. open
becomes open64) if they exist. This functionality
is not included in GNU FORTRAN (g77); however, it
should be possible to write a simple wrapper C program
to replace the OPEN statement (perhaps called
open64). We have succeeded in writing files larger
than 2 GB using a simple C program with "#define
_FILE_OFFSET_BITS 64" as the first line. This
works over NFS version 3 but not version 2.
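A minimal sketch of such a C program is shown below. It is not the authors' test program: the function names (offset_bits, write_past_2gb) are hypothetical, and it assumes a POSIX system with glibc, where the feature-test macro is spelled _FILE_OFFSET_BITS. Rather than writing 2 GB of data, it seeks to the 2^31-byte boundary and writes a single byte there, which on file systems that support holes produces a sparse file.

```c
#define _FILE_OFFSET_BITS 64    /* must come before every #include */

#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Width of off_t in bits; 64 when the large-file interface is in use. */
int offset_bits(void)
{
    return (int)(sizeof(off_t) * 8);
}

/* Create `path`, seek to offset 2^31, and write one byte there, so the
 * file ends up 2^31 + 1 bytes long. Returns the resulting file size,
 * or -1 on error. */
off_t write_past_2gb(const char *path)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    off_t target = (off_t)1 << 31;      /* first offset a 32-bit off_t cannot hold */
    off_t size = -1;
    if (lseek(fd, target, SEEK_SET) == target && write(fd, "x", 1) == 1)
        size = lseek(fd, 0, SEEK_CUR);  /* current position equals the file size */

    if (close(fd) != 0)
        size = -1;
    return size;
}
```

If the macro is omitted on a 32-bit system, off_t is only 32 bits wide and the shift above would overflow, which is exactly the limit described in the text.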
While RAID-5 can recover from a hardware failure,
there is no protection against accidental deletion
of files. To address this problem we suggest a simple
script to replace the "rm" command. Rather than
deleting files it would move them to a "/raid/Trash"
or better yet a "/raid/.Trash" directory on the RAID-5
disk array (similar to the "Trash can" in the Macintosh
OS). The system administrator could later purge
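The text proposes a script; the same idea is sketched here as a C function so that it can share a language with the other examples. The name move_to_trash is hypothetical. It relies on rename(), which only moves a directory entry and therefore works only when the trash directory is on the same file system as the file, as it would be for a directory on the RAID-5 array itself. This sketch silently replaces any file of the same name already in the trash, which a real replacement for "rm" would need to handle.

```c
#include <stdio.h>
#include <string.h>

/* Move `path` into the directory `trash_dir` instead of deleting it.
 * Returns 0 on success, -1 on failure. */
int move_to_trash(const char *path, const char *trash_dir)
{
    /* Use the last path component as the file name inside the trash. */
    const char *slash = strrchr(path, '/');
    const char *name = slash ? slash + 1 : path;
    char dest[4096];

    if (*name == '\0')
        return -1;              /* path is empty or ends in '/' */

    int n = snprintf(dest, sizeof dest, "%s/%s", trash_dir, name);
    if (n < 0 || (size_t)n >= sizeof dest)
        return -1;              /* destination path too long */

    return rename(path, dest) == 0 ? 0 : -1;
}
```

A call such as move_to_trash("/raid/data/run1.dat", "/raid/.Trash") would leave the file recoverable; both paths here are made up for illustration.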
David A. Sanders et al. Terabyte IDE RAID-5 Disk Arrays, article, September 30, 2003; Batavia, Illinois. (digital.library.unt.edu/ark:/67531/metadc737292/m1/4/: accessed December 13, 2018), University of North Texas Libraries, Digital Library, digital.library.unt.edu; crediting UNT Libraries Government Documents Department.