Grid Data Access on Widely Distributed Worker Nodes Using Scalla and SRM Page: 4 of 11
This article is part of the collection entitled: Office of Scientific & Technical Information Technical Reports and was provided to Digital Library by the UNT Libraries Government Documents Department.
disruption of the entire system. Last but not least, it should have Grid support, meaning the
ability to connect to other instances located in different parts of the world and, more precisely,
the capability to share and exchange data with other storage solutions. One solution that
partially meets all of these requirements and is well known in HENP computing is
Scalla (Structured Cluster Architecture for Low Latency Access), also known as Xrootd. The purpose
of this paper is not to explain the architecture of the Scalla package, but rather to
focus on the features this tool lacks and to present a solution that provides them.
4. Enabling MSS access in large deployment context
At STAR, analysis is usually performed with the help of software called SUMS (STAR Unified
Meta Scheduler). Each user describes their intent in a dedicated language, specifying for example
the input to the task, the location in which to store its output, and the program to be used for
processing. The input can be specified as a list of files or as a meta-data query (energy, collision,
etc.) to the STAR File Catalog, which resolves the query into particular physical data-sets.
According to the user's specification, SUMS orders the data-sets, splits them into individual
jobs and submits them to the preferred batch system queue. As one can imagine, job run-time
must be limited to control resource sharing in such a multi-user shared environment.
The usual practice is to impose a hard limit on wall-clock time. Simply put, a job is killed
when it runs for too long. This leads to the initial problem, which arises when a user accesses
a file from the tape system within a job. Delays are to be expected, and when access is not
efficient enough, the job is sooner or later killed.
From our observation and usage at STAR, the average time to restore one file from the tape
system was about 24 minutes. By simple arithmetic, when a user requests 1000 files, the
total is 400 hours. It is practically impossible to set a hard limit that high.
What about jobs requesting more than 1000 files? Further evidence of the very
slow performance of dynamic disk population is the plot showing the performance of the
DataCarousel system, which is used to manage efficiently a unique resource that must be shared
among many users requesting files from or writing files to the tape system. For the STAR/RHIC
experiment, the tape system is the High Performance Storage System (HPSS).
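The estimate above can be reproduced with a short sketch. It assumes, as the text implicitly does, that files are restored strictly one after another at the observed average of 24 minutes each. The function names and the 24-hour limit in the usage line are hypothetical and serve only as illustration.

```python
def serial_restore_hours(n_files, minutes_per_file=24.0):
    """Hours needed to restore n_files one after another from tape."""
    return n_files * minutes_per_file / 60.0


def max_files_within_limit(limit_hours, minutes_per_file=24.0):
    """Largest number of serially restored files that fits in a hard time limit."""
    return int(limit_hours * 60 // minutes_per_file)


# 1000 files at 24 minutes each, as quoted in the text.
print(serial_restore_hours(1000))      # 400.0

# Hypothetical 24-hour hard limit (not a value taken from the paper).
print(max_files_within_limit(24))      # 60
```

Under these assumptions, a job could afford to wait for only a few dozen tape restores before reaching a day-long limit, even if it did no processing at all.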
Plot 6(a) shows the performance in MB per second as a function of time. Evidently, the
average performance is about 10 MB/s across 9 tape drives. The theoretical limit for one tape drive
depends on the drive technology in use (in STAR: 9940b, ~30 MB/s; LTO-3, 70-80 MB/s).
Most STAR files (around 98%) are located on 9940b tapes, for which 9 tape drives correspond
to a theoretical data throughput of 270 MB/s. All these observations call for a study of the key
parameters for optimizing and tuning the performance of the tape system.
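To make the gap between observed and theoretical throughput explicit, the following minimal sketch recomputes the aggregate figure and the resulting utilization. It uses only the numbers quoted above (9 drives, about 30 MB/s per 9940b drive, about 10 MB/s observed); the function names are illustrative assumptions.

```python
def aggregate_throughput(n_drives, per_drive_mb_s):
    """Theoretical aggregate throughput in MB/s if all drives stream at full rate."""
    return n_drives * per_drive_mb_s


def utilization(observed_mb_s, theoretical_mb_s):
    """Fraction of the theoretical throughput that is actually achieved."""
    return observed_mb_s / theoretical_mb_s


theoretical = aggregate_throughput(9, 30.0)
print(theoretical)                               # 270.0
print(f"{utilization(10.0, theoretical):.1%}")   # 3.7%
```

In other words, the observed average corresponds to less than 4% of what the nine 9940b drives could deliver in theory.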
5. Key parameters of tape system performance
Several studies have examined how the performance of the tape system can be increased
and which key parameters influence its efficiency. From those studies, several
factors emerge that directly affect the performance of a single tape drive, along with parameters
that implicitly influence the overall aggregate performance of the system:
(i) measured maximum theoretical performance value per one drive (9940b 25-30 MB/s, LTO-3
(ii) the coupling to the disk storage cache and the performance of that cache
(iii) the access pattern defined by the user application and how well it lends itself to
efficient management of multiple tape drives
(iv) the length of uninterrupted streaming of a file from/to tape and the per-file overhead of
tape seeking, both of which are affected by the file size
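Item (iv) can be illustrated with a deliberately simple model, which is an assumption of this sketch and not a formula from the studies cited: each file costs a fixed overhead (mount, seek) plus its size divided by the streaming rate. The function name, the 60-second overhead and the file sizes are hypothetical values chosen only to show the trend.

```python
def effective_throughput(file_size_mb, stream_rate_mb_s, overhead_s):
    """Effective MB/s for one file under a fixed-overhead-plus-streaming model."""
    transfer_s = file_size_mb / stream_rate_mb_s
    return file_size_mb / (transfer_s + overhead_s)


# Hypothetical inputs: 30 MB/s streaming rate, 60 s of overhead per file.
for size_mb in (100, 1000, 10000):
    rate = effective_throughput(size_mb, 30.0, 60.0)
    print(f"{size_mb:>6} MB file: {rate:.1f} MB/s effective")
```

In this model, small files are dominated by the per-file overhead, while large files approach the streaming rate of the drive, which is why file size appears among the key parameters.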
Jakl, Pavel; /Prague, Inst. Phys.; Lauret, Jerome; /Brookhaven; Hanushevsky, Andrew; /SLAC et al. Grid Data Access on Widely Distributed Worker Nodes Using Scalla and SRM, article, November 10, 2011; United States. (digital.library.unt.edu/ark:/67531/metadc846972/m1/4/: accessed November 21, 2018), University of North Texas Libraries, Digital Library, digital.library.unt.edu; crediting UNT Libraries Government Documents Department.