Description: Today's system monitoring tools are capable of detectingsystem failures such as host failures, OS errors, and network partitionsin near-real time. Unfortunately, the same cannot yet be said of theend-to-end distributed softwarestack. Any given action, for example,reliably transferring a directory of files, can involve a wide range ofcomplex and interrelated actions across multiple pieces of software:checking user certificates and permissions, getting details for allfiles, performing third-party transfers, understanding re-try policydecisions, etc. We present an infrastructure for troubleshooting complexmiddleware, a general purpose technique for configurable logsummarization, and an anomaly detection technique that works in near-realtime on running Grid middleware. We present results gathered using thisinfrastructure from instrumented Grid middleware and applications runningon the Emulab testbed. From these results, we analyze the effectivenessof several algorithms at accurately detecting a variety of performanceanomalies.
Date: August 1, 2007
Creator: Gunter, Dan; Tierney, Brian L.; Brown, Aaron; Swany, Martin; Bresnahan, John & Schopf, Jennifer M.
Item Type: Refine your search to only Article
Partner: UNT Libraries Government Documents Department