Talks

Developing Autonomic Distributed Scientific Applications: A Case Study From History Matching Using Ensemble Kalman-Filters

Shantenu Jha and Yaakoub El Khamra (CCT / LSU)

SLIDES: talk1

The development of simple, effective distributed applications that can utilize multiple distributed resources remains challenging. Not surprisingly, it is therefore difficult to implement advanced application characteristics, such as autonomic behavior, for distributed applications. Nevertheless, there exists a large class of applications that could benefit immensely from support for autonomic properties and behavior. For example, many applications have irregular and highly variable resource requirements, which are very difficult to predict in advance. Because of these irregular execution characteristics, static resource mapping techniques such as workflows are ineffective; in general, the resource utilization problem can be addressed more efficiently using autonomic approaches. This paper discusses the design and development of a prototype framework that can support many of the requirements of autonomic applications that need to use computational Grids. We provide an initial description of the features and architecture of the Lazarus framework, developed using SAGA, integrate it with an Ensemble Kalman Filter application, and demonstrate the advantages of the framework: improved performance and lower development cost. As a proof of concept, we deploy Lazarus on several different machines on the TeraGrid and show the effective utilization of several heterogeneous resources and the distinct performance enhancements that autonomics provides. Careful analysis provides insight into the primary reasons underlying the performance improvements, namely late binding and an optimal choice of the configuration of resources selected.

Grid Management Support by means of Collaborative Learning Agents

Wico Mulder and Ceriel Jacobs (VU University Amsterdam)

SLIDES: talk2

The complex and dynamic settings of grid environments lead to challenges in their operational maintenance. The growth of these environments in terms of size and usage requires more sophisticated supporting systems. Contemporary tools lack the ability to relate events and draw inferences from them. Communication across organizational domains and interoperability between existing monitoring tools are open to improvement. In this paper we present an information system, based on collaborative agents, that supports system administrators in monitoring the grid. While observing log files, the agents learn patterns about job traffic in their own local domain of the grid and share information to provide global or multi-domain overviews. The agents represent their knowledge in the form of deterministic finite automata (DFA). We discuss our collaborative learning mechanism and show the results of our experiments with data from two grid sites. Our system generated job-traffic overviews that gave new insights into the performance of the grid environment.

Service Mediation and Negotiation Bootstrapping as First Achievements Towards Self-adaptable Grid and Cloud Services   

Ivona Brandic, Dejan Music and Schahram Dustdar (Vienna University of Technology).

SLIDES: talk3

Novel computing paradigms such as Grid and Cloud Computing are steadily gaining importance. In the case of Cloud Computing, users pay for the usage of computing power provided as a service. Beforehand, they can negotiate specific functional and non-functional requirements relevant to the application execution. However, providing computing power as a service poses several research challenges. On the one hand, dynamic, versatile, and adaptable services are required that can cope with system failures and environmental changes. On the other hand, human interaction with the system should be minimized. In this paper we present the first results in establishing adaptable, versatile, and dynamic services, considering negotiation bootstrapping and service mediation, achieved in the context of the Foundations of Self-Governing ICT Infrastructures (FoSII) project. We discuss novel meta-negotiation and SLA-mapping solutions for Grid/Cloud services that bridge the gap between current QoS models and Grid/Cloud middleware and represent important prerequisites for the establishment of autonomic Grid/Cloud services. We present document models for the specification of meta-negotiations and SLA-mappings. Thereafter, we discuss a sample architecture for the management of meta-negotiations and SLA-mappings.

Responsive Elastic Computing

Julien Perez, Cécile Germain-Renaud, Balazs Kegl and Charles Loomis (CNRS and Université Paris-Sud)

SLIDES: talk4

Two production models are candidates for e-science computing: grids enable hardware and software sharing; clouds propose dynamic resource provisioning (elastic computing). Organized sharing is a fundamental requirement for large scientific collaborations; responsiveness (the ability to allocate resources without notice) is a fundamental requirement for seamless integration of large-scale computing resources into everyday use. This paper focuses on a model-free resource provisioning strategy supporting both scenarios. The provisioning problem is modeled as a continuous action-state space, multi-objective reinforcement learning problem, under realistic hypotheses; the high-level goals of users, administrators, and shareholders are captured through simple utility functions. We propose an implementation of this reinforcement learning framework, including an approximation of the value function through an Echo State Network, and we validate it on a real dataset.

Investigating Autonomic Behaviours in Grid-Based Computational Science Applications

Shantenu Jha (CCT / LSU), Manish Parashar (Rutgers University) and Omer Rana (Cardiff University).

SLIDES: talk5

Emerging Grid infrastructures present unprecedented opportunities for computational science and engineering, with the potential for fundamental insights into complex phenomena. However, they also present unprecedented challenges in terms of their scale, heterogeneity, dynamism and overall complexity that must be addressed before this potential can be realized. Autonomic computing concepts have been effectively used to address similar challenges in enterprise systems and applications; in this paper, we explore the role of autonomic computing in Grid-based computational science applications. Specifically, we use three representative computational applications to motivate Autonomic Computational Science (ACS). Using these applications, we develop a conceptual framework for ACS, consisting of mechanisms, strategies and objectives, and demonstrate how these concepts can be used to express autonomic behaviors. Finally, we explore a research agenda towards realizing ACS behaviors and developing self-* patterns for large-scale autonomic computational science applications.

Managing democratic grids: architecture and lessons learnt

Leandro Navarro (Universitat Politècnica de Catalunya).

SLIDES: talk6

This paper presents the results and lessons learnt in the EU project Grid4All (2006-2009) on the construction of a democratic grid infrastructure, where self-* properties are an essential element. These grids have challenging and novel requirements: usability, content sharing and collaboration, security, scalability and availability via decentralization, brokerage of resources, and autonomic management of the dynamics of organizations and applications. These are considered essential factors for next generation Internet computing and generic enablers for cloud computing.

We present the architecture of the Grid4All infrastructure, the self-* mechanisms incorporated to enable these grids built on conventional computers and networks contributed by the participants, and the lessons learnt during the construction and evaluation.

The Grid Observatory

Charles Loomis (CNRS and Université Paris-Sud)

SLIDES: talk7

Scientific communities worldwide have set up massive grids that manage several tens of thousands of CPUs and several petabytes of storage space. The control and maintenance of these complex systems remain a serious operational challenge. Application developers need synthetic characterizations of the grid activity and the grid applications for predicting and optimizing application performance. Grid models are required for dimensioning, capacity planning, and middleware design. The goal of the Grid Observatory project (GO) is to contribute to an experimental theory of large grid systems by integrating the collection of data on the behavior of the EGEE grid and its users with the development of models and of an ontology for the domain knowledge. Autonomic computing is highly relevant to grid systems, especially at a time when production grids have to move to sustainable infrastructures, with implications for the volume and structure of the manpower dedicated to day-to-day operations. The specific Autonomic Computing goals of the GO are Self-Optimization, Self-Healing, and to some extent Self-Configuration.

Automated deployments of complex software systems in cloud environments

Marc-Elian Begin (SixSq)

SLIDES: talk8

Complex software systems, like the gLite grid middleware, are inherently hard to deploy, configure, and manage. However, cloud platforms, via their APIs for dynamic management of remote resources, promise to ease the deployment of such complex systems. Amazon Web Services (AWS), an early entrant in the “Infrastructure as a Service” (IaaS) field, provides a complete, mature cloud service via a particularly simple API. To determine whether cloud technologies can indeed simplify the deployment of complex software systems, the StratusLab collaboration started an investigation into the feasibility of deploying a grid resource center (based on gLite) on AWS. To capture the deployment information and to automate the deployment itself, SlipStream(TM), a product of SixSq Sàrl, is used. In this talk, we share our latest findings from this ongoing experiment, describe possibilities for convergence between grid and cloud technologies, and highlight opportunities in deployment automation to simplify the handling of complex software systems.

Characterization of a Computational Grid as a Complex System   

Lovro Ilijasic and Lorenza Saitta (Università del Piemonte Orientale)

SLIDES: talk9

Using grid log data from more than 28 million jobs collected over 20 months, we apply statistical and data mining methods to examine the relations, primarily between users, computing elements and jobs, in the network. The results of this large-scale analysis are used to build probabilistic models of the system behaviour. Bayesian Networks are constructed from historical data and are proven able to accurately predict the abortion and length of newly arriving jobs.
