The D0 online monitoring and automatic DAQ recovery
Computing in High Energy and Nuclear Physics, La Jolla Ca, March 24-28, 2003
the other end of a low-end DSL line. The daqAI auto
recovery program, described in Section 3, is one such
Fermilab is a National Lab, and, as such, all computer
systems critical for the operation of the accelerator and the
taking of data must be protected by a firewall. The MS is no
exception, and thus there is no way to directly contact the
MS from outside the firewall. Early on it was recognized
that this made the system less useful for remote debugging if
displays could not connect. We have received permission to
open a single port to a specific machine across the firewall.
This second machine receives MS requests and relays them
to the MS, and relays the answers back. The relay contains
no intelligence, but does do careful buffer length checks,
illegal character checks, etc. The relay system is a Windows
XP system. All clients must be inside the firewall.
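The checks performed by the relay can be illustrated with a minimal sketch. It is not the actual relay code: the size limit, the set of allowed characters, and the names validate_message, relay and forward are all assumptions made for this example.

```python
import string

# Hypothetical limit; the paper does not state the real buffer size.
MAX_MESSAGE_BYTES = 4096

# Assumed whitelist: printable ASCII plus common whitespace.
ALLOWED_BYTES = frozenset(
    (string.ascii_letters + string.digits + string.punctuation + " \t\r\n")
    .encode("ascii")
)


def validate_message(payload: bytes, max_len: int = MAX_MESSAGE_BYTES) -> bool:
    """Return True only if the payload passes the length and character checks."""
    if not payload or len(payload) > max_len:
        return False
    return all(byte in ALLOWED_BYTES for byte in payload)


def relay(payload: bytes, forward) -> bytes:
    """Check a request, pass it on unchanged, and check the reply.

    `forward` is a hypothetical callable that sends the request to the
    monitor server and returns its answer as bytes.
    """
    if not validate_message(payload):
        raise ValueError("request rejected by relay checks")
    reply = forward(payload)
    if not validate_message(reply):
        raise ValueError("reply rejected by relay checks")
    return reply
```

The relay in this sketch never interprets the message. It only decides whether to pass it on, which matches the description of a relay that contains no intelligence.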
2.4. Monitor Displays and Clients
This section contains a brief description of a number of
the monitor displays and clients we have running in
2.4.1. Monitor Clients
The L3DAQ's readout crates contain a Single Board
Computer (SBC) that runs the VME readout. The system
supplies monitor information on the readout state of each
crate, CPU usage statistics, and data transmission failures.
Furnishing these statistics to the monitor system requires
the SBC to traverse fairly complex data structures in the
program. We have had to use a fast mutex to protect these
structures against modification by the main SBC program while
the monitor data is being collected. Performance of the SBC is not
noticeably affected by the locking because the caching
feature in the MS reduces the monitor requests to about two
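The interplay of locking on the SBC and caching in the MS can be sketched as follows. This is an illustration under stated assumptions, not the L3DAQ code: the class names CrateStats and CachingProxy, the statistic fields, and the 0.5 second cache lifetime are all invented for the example.

```python
import threading
import time


class CrateStats:
    """Stands in for the SBC's internal statistics (hypothetical fields)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._events = 0
        self._failures = 0

    def record(self, failed=False):
        # Called by the main readout program for every event.
        with self._lock:
            self._events += 1
            if failed:
                self._failures += 1

    def snapshot(self):
        # Called when monitor data is collected; the lock guarantees
        # a consistent view of both counters.
        with self._lock:
            return {"events": self._events, "failures": self._failures}


class CachingProxy:
    """Serves a cached reply until it is older than max_age_s seconds."""

    def __init__(self, fetch, max_age_s=0.5, clock=time.monotonic):
        self._fetch = fetch
        self._max_age_s = max_age_s  # arbitrary value chosen for the sketch
        self._clock = clock
        self._lock = threading.Lock()
        self._cached = None
        self._stamp = None

    def get(self):
        with self._lock:
            now = self._clock()
            if self._stamp is None or now - self._stamp >= self._max_age_s:
                self._cached = self._fetch()
                self._stamp = now
            return self._cached
```

However many displays call get(), the SBC-side snapshot() runs at most once per cache lifetime, so the lock is taken by the monitor path only rarely.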
The Level 3 farm nodes are another component for which
CPU is a valuable resource. Currently information on
incomplete events and CPU usage is generated. There are
plans to convert trigger pass statistics and physics
performance from another monitor system to the one
described in this paper.
The D0 trigger framework (TFW), a non-L3DAQ system,
also generates extensive information. This includes all the
scalars for the Level 1 and Level 2 triggers and configuration
There are also a number of monitor repeaters. For
example, we have one system that monitors a web page
generated by the accelerator division and scrapes the CDF
and D0 luminosity, anti-proton stack size, and even the
2.4.2. Monitor Displays
The principal shift monitor displays for the L3DAQ are
written in Java. The designs are based upon the principles
outlined in Tufte's books on the display of graphical
information. The main L3DAQ display, uMon, contains
a relatively large amount of densely packed information
arranged for interpretation by both experts and non-experts.
In general we find that though non-expert shifters require
about a week to familiarize themselves with the display, they
can diagnose a large range of L3DAQ and other subsystem
problems with just a glance. Figure 4 shows a portion of the
uMon display. A similar display for the L3 CPU farm also
exists. The displays were carefully prototyped with simple
paint programs (PowerPoint, xfig) and handed around to a
small group of experts and non-experts before programming
began. The displays' designs and usability
benefited from this process. This set of displays runs on both
Linux and Windows.
Figure 4: A small portion of the uMon shifter-monitor
display. Each large box represents a single readout
crate. The % shows the incomplete event rate for the crate
and below it is the status of the L3DAQ route and event
queues (on the left, in the white area). The yellow area
shows the status of every connection the SBC maintains
(there are three farm nodes down). The white area on the
right is a rate plot; one small downtime is visible as an
inverse white spike.
The L3DAQ also has an expert display based on the
freeware version of Qt. This display has a fairly simple
main window from which further dialog boxes can be
opened. The drill-down approach has worked well for getting
progressively more detailed information. The display alters
its monitor data requests to suit the information it needs to
show. Thus it can request detailed, expensive-to-generate
information for one or two particular monitor system clients.
This display also runs on both Linux and Windows.
We also have written a small Windows systray monitor.
This puts a small 32x32 pixel icon in the Windows taskbar
that displays the system's health continuously. It has only a
rate meter and two green/red circles that indicate general
system health. Moving the mouse pointer over the icon will
display a small popup with further information. This small
display was inspired by Quiet Computing principles and has
proved a useful way for experts to watch L3DAQ
while doing other work.
The systray monitor is often run on a portable, which isn't
always connected to the internet. It is more convenient to use
an HTTP-based interface for this monitor tool. There is a web
site that acts as a front-end for the monitor server. The web
site, called l3mq, also allows developers debugging the
system to issue monitor queries without having to write
code. It is also possible to store a query and reissue it by
accessing a single URL. Finally, the web site collects
statistics from the MS about which items have been
requested and maintains a database. The web site user can
then add documentation. In the future this will automatically
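The stored-query and usage-statistics ideas behind l3mq can be sketched in a few lines. This is a simplified illustration and not the l3mq implementation: the QueryStore class, the /query/ URL path and the item names are hypothetical, and the sketch counts requested items in the front-end itself, whereas the real site collects these statistics from the MS.

```python
from collections import Counter


class QueryStore:
    """Keeps named monitor queries and counts how often each item is requested."""

    def __init__(self, run_query):
        # run_query is a hypothetical callable that sends a list of item
        # names to the monitor server and returns its reply.
        self._run_query = run_query
        self._saved = {}
        self.item_counts = Counter()

    def save(self, name, items):
        """Store a query and return the (hypothetical) URL path that reissues it."""
        self._saved[name] = tuple(items)
        return f"/query/{name}"

    def reissue(self, name):
        """Run a stored query again and record which items were requested."""
        items = self._saved[name]
        self.item_counts.update(items)
        return self._run_query(items)
```

A per-item count of this kind is the sort of record to which documentation could later be attached.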
A. Haas et al. The D0 online monitoring and automatic DAQ recovery, article, April 6, 2004; Batavia, Illinois. (digital.library.unt.edu/ark:/67531/metadc779188/m1/4/: accessed December 13, 2018), University of North Texas Libraries, Digital Library, digital.library.unt.edu; crediting UNT Libraries Government Documents Department.