The 2005 IEEE International Conference on Cluster Computing
Burlington Marriott, Burlington, MA, USA
Towards Highly Available, Scalable, and Secure HPC Clusters with HA-OSCAR
The tutorial is intended for scientists and engineers interested in learning the state of the art in building highly available clusters for high performance and enterprise computing using Linux and Open Source tools and software.
Duration
A half day.
Presenters
Dr. Chokchai (Box) Leangsuksun
Center for Entrepreneurship and Information Technology
Phone: 1.318.257.3291
Fax: 1.318.257.4922
Email: box@latech.edu
Ibrahim Haddad
Open Source Development Labs
Phone: 1.503.906.1914
Fax: 1.503.626.2436
Email: ibrahim@osdl.org
Dr. Stephen L. Scott
Computer Science and Mathematics Division
Oak Ridge National Laboratory
One Bethel Valley Road
P.O. Box , MS-6016
Phone: 1.865.574.3144
Fax: 1.865.576.5491
Email: scottsl@ornl.gov
Abstract
March 2004 was a major milestone for the HA-OSCAR Working Group: it marked
the announcement of the first public release of the HA-OSCAR software package.
HA-OSCAR is an Open Source project that aims to combine high availability with
high-performance computing. HA-OSCAR enhances a Beowulf cluster for
mission-critical applications with several high availability mechanisms:
component redundancy to eliminate single points of failure, self-healing, and
failure detection and recovery, including automatic failover and fail-back.
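To make the detection, failover, and fail-back cycle concrete, the following is a minimal Python sketch of a heartbeat-driven primary/standby head-node pair. It is not HA-OSCAR's implementation: the class name `HeadNodePair`, the `max_missed` threshold, and the immediate fail-back policy are hypothetical choices made only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class HeadNodePair:
    """Hypothetical primary/standby head-node pair (illustration only).

    The primary is declared failed after `max_missed` consecutive missed
    heartbeats, at which point the standby takes over (failover). As soon
    as the primary answers a heartbeat again, control returns to it
    (fail-back).
    """
    max_missed: int = 3          # hypothetical failure-detection threshold
    missed: int = 0              # consecutive heartbeats the primary has missed
    active: str = "primary"      # head node currently serving the cluster
    log: list = field(default_factory=list)  # failover/fail-back events

    def heartbeat(self, primary_alive: bool) -> str:
        """Process one heartbeat result and return the active head node."""
        if primary_alive:
            self.missed = 0
            if self.active == "standby":
                # Fail-back: the recovered primary resumes control.
                self.active = "primary"
                self.log.append("fail-back")
        else:
            self.missed += 1
            if self.active == "primary" and self.missed >= self.max_missed:
                # Failover: the standby takes over the cluster services.
                self.active = "standby"
                self.log.append("failover")
        return self.active


if __name__ == "__main__":
    pair = HeadNodePair(max_missed=3)
    trace = [True, False, False, False, False, True]
    print([pair.heartbeat(ok) for ok in trace])
    # ['primary', 'primary', 'primary', 'standby', 'standby', 'primary']
    print(pair.log)  # ['failover', 'fail-back']
```

A production monitor would usually guard against flapping, for example by requiring several consecutive good heartbeats before failing back; the single-heartbeat fail-back here is kept only for brevity.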
The first release (version 1.0) adds new high availability capabilities to
Linux Beowulf clusters based on the OSCAR 3.0 release from the Open Cluster
Group. In this release of HA-OSCAR, we provide a graphical installation wizard
and a web-based administration tool, which together allow intuitive creation
and configuration of a multi-head Beowulf cluster. In addition, we have included
a default set of monitoring services to ensure that critical services, hardware
components, and important cluster resources remain available at the control
node. HA-OSCAR also supports custom services, which can be configured and added
via the Webmin-based HA-OSCAR administration tool.
This tutorial will address in detail the design and implementation issues
involved in building HA Linux Beowulf clusters using Linux and Open Source
software as the base technology. The main focus of the tutorial is HA-OSCAR: we
will present its architecture, review the new features of the latest release,
discuss how we implemented the HA and security features, and present our
experiments in modeling and testing performance and availability on real systems.
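As background for the availability-modeling discussion, a common first-order estimate (an assumption here, not necessarily the model used in the tutorial's experiments) treats each head node as alternating between up and down periods with mean time to failure MTTF and mean time to repair MTTR, and assumes the two head nodes fail independently with instantaneous, always-successful failover. The numbers in the second half are purely illustrative, not measured results.

```latex
A_{\mathrm{node}} = \frac{\mathrm{MTTF}}{\mathrm{MTTF} + \mathrm{MTTR}},
\qquad
A_{\mathrm{dual}} = 1 - \left(1 - A_{\mathrm{node}}\right)^{2}

% Illustrative values only:
A_{\mathrm{node}} = 0.99 \;\Rightarrow\; A_{\mathrm{dual}} = 1 - (0.01)^{2} = 0.9999

% Expected downtime over one year (8760 h):
D_{\mathrm{node}} = (1 - 0.99) \times 8760\,\mathrm{h} = 87.6\,\mathrm{h},
\qquad
D_{\mathrm{dual}} = (1 - 0.9999) \times 8760\,\mathrm{h} \approx 0.88\,\mathrm{h} \approx 53\,\mathrm{min}
```

Because it ignores common-mode failures and failover time, this estimate is an optimistic bound; measured and modeled results on real systems are what show how close a redundant head-node design comes to it.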
The HA-OSCAR project’s primary goal is to improve the existing Beowulf
architecture and cluster management systems by adding high availability and
scalability capabilities for Linux clusters. HA-OSCAR introduces several
enhancements and new features to OSCAR, mainly in the areas of availability,
scalability, and security. The new features in the initial release are head
node redundancy and self-recovery from hardware, service, and application
outages. HA-OSCAR has been tested with several OSCAR distributions and should
work with OSCAR 2.3, 2.3.1, and 3.0 based on Red Hat 9.0, and with OSCAR 4.0
based on Fedora Core 2. The first version (1.0) was released in March 2004.
Detailed Tutorial Description
Introduction (20%)
· Introduction to HA clustering
· Various levels of HA
· Linux: the commodity component of the cluster stack
· Software and hardware system architecture
· Challenges in designing and prototyping HA/HPC clusters
OSCAR (20%)
· Introduction
· Cluster computing overview
· OSCAR - "The Beginning" - overview / strategy
· OSCAR components (functional areas)
  o Core, Admin/Config, HPC Services
  o Core components: SIS, C3, Switcher, ODA, OPD
· "The
  o R
· OSCAR Wizard (v3.0)
HA-OSCAR (50%)
· HA-OSCAR overview
· HA-OSCAR architecture and components
· HA-OSCAR comparison with Beowulf architecture
· HA features
· Multi-head builder and self-configuration
· Monitoring
  o Service monitoring
  o Hardware monitoring
  o Resource monitoring
· Self-healing and recovery mechanism
· Test environment
· Installation steps
· Experiments
· Availability modeling, analysis, and uptime improvement study between Beowulf and HA-OSCAR
· Test results
· Applications and feasibility studies
· Grid-enabled HA cluster
· HA-OSCAR and Distributed Security Infrastructure integration
Demonstration (with 4 laptops running the latest research release of HA-OSCAR)
Conclusion (10%)
· HA-OSCAR roadmap
· Advanced research
· Questions and answers
Presenter Biographies
Dr. Chokchai Leangsuksun is an Associate Professor of Computer Science,
Ibrahim Haddad is a member of the OSDL Engineering Department, serving as
Strategic Program Manager for the Carrier Grade Linux Initiative. Prior to
joining OSDL, Ibrahim was a Senior Researcher at the "Research and Innovation"
Unit, Ericsson Research Corporate Unit, in Montreal, Canada, where he worked on
the server system architecture for 3G wireless IP networks and promoted the use
of Linux in telecommunications.
Ibrahim is a Contributing Editor to Linux Journal and LinuxWorld Magazine. In
addition, he contributes regularly to the O’Reilly Network, Sys Admin Magazine,
and Linux User & Developer magazine. He has delivered a number of presentations
and tutorials at local universities, IEEE and ACM conferences, Open Source
forums, and international conferences.
Ibrahim contributed to two of Richard Petersen's books, "Red Hat Linux Pocket
Administrator" and "Red Hat: The Complete Reference (DVD Edition)", both
published by McGraw-Hill/Osborne. He received his Bachelor's and Master's
degrees in Computer Science from the
The following is the list of tutorials previously presented by Ibrahim Haddad:
• “Design and Implementation of HA Linux Clusters”, IEEE Cluster Conference 2001
• “Design and Implementation of Benchmarking Environments”, ACM SIGMETRICS 2002
• “Supporting IPv6 on Linux Servers”,
• “Supporting IPv6 on Linux Clusters”, IEEE Cluster Conference 2002
• “Networking Protocols for UMTS and 3G Services”, ACM Multimedia 2002
• “IPv6: The New Internet Protocol - All You Wanted to Know”, Real World Linux 2003
• “IPv6: The New IP Protocol”, Internetworking 2003
• “Carrier Grade Linux Platforms: Characteristics and Development Efforts”, EuroPar 2003
• “Carrier Grade Linux”, Real World Linux 2004
• “Wireless Carrier Grade Platforms: Characteristics and Ongoing Development Efforts”, International Conference on E-Business and Telecommunication Networks 2004
• “HA-OSCAR: Building Highly Available Linux Clusters”, IEEE Cluster 2004
Dr. Stephen L. Scott is a senior research scientist in the Network and Cluster Computing Group
of the Computer Science and Mathematics Division of Oak Ridge National
Laboratory (ORNL) –