24 April 2008

Reducing Wasted Resources to Help Achieve Green Data Centers

Last week we presented a research paper at the 4th Workshop on High-Performance, Power-Aware Computing (HPPAC 2008), held in Miami, FL, USA, in conjunction with the 22nd Annual International Parallel & Distributed Processing Symposium (IPDPS 2008).

In this paper we introduce a new approach to data center consolidation that allows a significant reduction in the number of active nodes required to process a heterogeneous workload without degrading the offered service level. The article demonstrates that consolidation of dynamic workloads does not end with virtualization. If energy efficiency is pursued, the workloads can be consolidated even further using two techniques, memory compression and request discrimination, which were studied and validated separately in previous work and are now combined. We evaluate the approach using a representative workload scenario composed of numerical applications and a real workload obtained from a top national travel website. Our results indicate that a significant improvement can be achieved, using 20% fewer servers to do the same work. We believe that this serves as an illustrative example of a new way of management: tailoring the resources to meet high-level energy efficiency goals.

torres_hp-pac_2008

Green Computing


[MASTER COURSE CANS] What Cloud Computing Means: everyone in class seems to have a different definition!

Dear students, as we discussed in class, we should consider “cloud computing” in the course (EEDC) because we are definitely at an inflection point today, whereby distributed computing execution environments are turning towards cloud computing environments. However, during the first class it was difficult for all of us to reach a consensus on the meaning of the term. Probably the only consensus was that it is clearly a buzzword. You know that for me a basic definition is “tot està a l’altre costat” (Catalan, “everything is on the network”): the cloud simply refers to the move from local applications to services on the Internet or web. However, this definition is too broad and simple, since under it the cloud ends up looking like a collection of Web-based applications, a revival of the thin-client and mainframe model. Currently, the web browser plays a very important role in the provision of online services; however, Kevin noticed that some companies have new products (Adobe AIR, for example) that could change this somewhat. We will see!

Let me synthesize some of the good ideas that appeared in class and try to relate them to the basic topics of the course, in order to advance the definition and our understanding of the term:

  • Students coming from a Grid background consider that Cloud Computing = Grid computing if the workload is sent to an IT infrastructure that consists of dispatching masters and working slave nodes. This is transparent to the client, who only sees that the workload has been dispatched to the cloud/grid and that results are returned to it.
  • However, some of you consider Cloud computing = Software-as-a-Service, where all the applications are somewhere on the web (this is the case of Google Apps, which we discussed in class).
  • Finally, the majority of you agreed that Cloud computing = Platform-as-a-Service, where an external provider maintains the IT infrastructure and the client buys time or resources on this infrastructure (this is the case of Amazon EC2, which we studied in class). This point is more consistent with the view that considers the cloud to be like utility computing, a grid that charges metered rates for processing/access time/resources.

Finally, some of you referred to the different roles that actors play in this field, which is orthogonal to what we were discussing in the previous paragraph. I found an interesting post at http://elasticvapor.com/ (which I am reproducing literally here) that in my opinion clarifies this other way of seeing the cloud:

  • Enablers - These are companies that enable the underlying infrastructures or the basic building blocks. These companies are typically focused on data center automation and or server virtualization (VMware/EMC,Citrix,BladeLogic, RedHat, Intel, Sun, IBM, Enomalism, etc) These enables can range from the chip level such as Intel VT, to the hypervisor such as xen or vmware to the orchestration such as 3tera or our Enomalism elastic computing platform.
  • Providers - (Amazon web services, Rackspace, Google, Microsoft). The ones with the budgets and know how to build out global computing environments costing millions or even billions of dollars. Cloud providers typically offer their infrastructure or platform. Frequently these “As a Service” offerings are billed & consumed on a utility basis.
  • Consumers - On the other side of the spectrum I see the “consumers”, companies that build or improve their web applications on top of existing clouds of computing capacity without the need to invest in data centers or any physical infrastructure. Often these two groups can be one in the same such as Amazon (SQS,SDB,etc), Google (Apps) and Salesforce (Force). But they can also be new startups that provide tools & services that sit on top of the cloud (Cloud management). Cloud consumers can be a fairly broad group including just about any application that is provided via a web based service like a webmail, blog, social network, etc. Cloud computing from the consumer point of view is becoming the only way you build, host and deploy a scalable web applications these days.

Next session we will discuss the next post: From Web 0.0 to Web 3.0: Cloud computing has been around since the beginning of the Internet?

CANS Master Course
Cloud Computing


Utility-based Placement of Dynamic Web Applications with Fairness Goals

A new paper, written in cooperation with a research team from IBM Research at the Watson Lab in NY, has been accepted at one of the important conferences in the area: the IEEE/IFIP Network Operations and Management Symposium (NOMS 2008). The paper is related to energy-efficient computing and studies the problem of dynamic resource allocation to clustered Web applications.

Utility-based Placement of Dynamic Web Applications with Fairness Goals
David Carrera, Malgorzata Steinder, Ian Whalley, Jordi Torres, and Eduard Ayguadé

Abstract. We study the problem of dynamic resource allocation to clustered Web applications. We extend application server middleware with the ability to automatically decide the size of application clusters and their placement on physical machines. Unlike existing solutions, which focus on maximizing resource utilization and may unfairly treat some applications, the approach introduced in this paper considers the satisfaction of each application with a particular resource allocation and attempts to at least equally satisfy all applications. We model satisfaction using utility functions, mapping CPU resource allocation to the performance of an application relative to its objective. The demonstrated online placement technique aims at equalizing the utility value across all applications while also satisfying operational constraints, preventing the over-allocation of memory, and minimizing the number of placement changes. We have implemented our technique in a leading commercial middleware product. Using this real-life testbed and a simulation we demonstrate the benefit of the utility-driven technique as compared to other state-of-the-art techniques.
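To make the idea of equalizing utility more concrete, here is a minimal sketch in Python. It is not the placement algorithm from the paper: it ignores memory constraints, physical machines and placement changes, and only shows how a single pool of CPU could be split so that every application ends up at the same utility level. The function names, the linear utility model (`utility = cpu / demand - 1`) and the example demands are all assumptions made for illustration.

```python
from typing import Callable, Dict, Tuple


def equalize_utility(
    cpu_needed: Dict[str, Callable[[float], float]],
    capacity: float,
    lo: float = -1.0,
    hi: float = 1.0,
    tol: float = 1e-6,
) -> Tuple[float, Dict[str, float]]:
    """Return the highest common utility level that fits in `capacity`,
    plus the CPU each application receives at that level.

    cpu_needed[app](u) is a hypothetical inverse utility function: the CPU
    the application needs to reach utility u. It must be non-decreasing in u,
    and the allocation at the lowest level `lo` is assumed to fit.
    """
    def total(u: float) -> float:
        return sum(f(u) for f in cpu_needed.values())

    if total(hi) <= capacity:
        lo = hi  # enough CPU to give every application the maximum utility
    else:
        # Bisection, keeping total(lo) <= capacity < total(hi).
        while hi - lo > tol:
            mid = (lo + hi) / 2.0
            if total(mid) <= capacity:
                lo = mid
            else:
                hi = mid
    return lo, {app: f(lo) for app, f in cpu_needed.items()}


def linear_inverse(demand: float) -> Callable[[float], float]:
    # Assumed toy model: utility = cpu / demand - 1, so cpu = demand * (1 + u).
    return lambda u: demand * (1.0 + u)


# Made-up CPU demands (in cores) needed to exactly meet each objective.
demands = {"shop": 2.0, "search": 1.0, "batch": 3.0}
level, allocation = equalize_utility(
    {app: linear_inverse(d) for app, d in demands.items()}, capacity=4.5
)
```

In this toy setting the three applications together would need 6 cores to meet their objectives but only 4.5 are available, so all of them end up equally below their goal instead of one application being starved while the others are fully satisfied.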

Autonomic Computing
Cloud Computing


Master CANS: Execution Environments for Distributed Computing (EEDC)

34330 Execution Environments for Distributed Computing (EEDC)

Spring 2008

Master in Computer Architecture, Networks and Systems (CANS)

Computer Architecture Department (AC)

Technical University of Catalonia (UPC)

ECTS credits: 6

OFFICIAL COURSE DESCRIPTION:

Teaching guide (“Guia docente”, in Spanish): http://docencia.ac.upc.edu/master//es/course.24.html

OBJECTIVES:

The objective of the course is to offer an overview of development and research in execution environments for parallel and distributed systems. The course places special emphasis on the environments required by new eBusiness and grid applications. We will study the internal organization and the resource management required to guarantee functionality and quality of service.

CONTENTS:
Distributed object technologies and protocols
Internal organization of middleware
Grid middleware
Middleware for eBusiness applications
Middleware Performance and Analysis
New Middleware Challenges: Autonomic Computing

METHODOLOGY: This is an advanced course; the basic concepts are assumed to be known from earlier courses, and the focus is mainly on introducing the student to more advanced concepts.

Lectures will introduce the topics, which will then be worked on through recent articles published in conferences or specialized journals and through proposals from projects under development. Students must read and discuss these articles.
Students will carry out an individual piece of work and present a research topic related to the course. The student will have to develop the critical ability to evaluate the suitability of the solutions presented or, where appropriate, to propose possible ideas.

Of the 150 hours of work, the distribution will be:

· 60 hours (4 hours per week, 15 weeks) of in-class time. These 60 hours will be divided among lectures, assignments, case studies, discussion of research articles, and presentation of work.

· 90 hours of out-of-class work: reading research articles, completing assignments, preparing presentations, and studying.

EVALUATION:

Evaluation will be based on

(a) participation in the discussions,

(b) submission of the assigned work, and

(c) a final project, done individually or in groups.

RESOURCES:

Articles presented at conferences and in journals. Virtual campus or blog.
Documents from standardization bodies, and manuals.

INSTRUCTOR

Jordi Torres

Phone: +34 93 401 7223

Email : torres@ac.upc.edu

Office : Campus Nord, Modul C6. Room 217.

Lecture and meeting times: Monday and Friday, 3:00pm to 5:00pm.

Classroom: A6203

Office hours: By appointment.

COURSE CONTENT:

Nowadays, business and scientific organisations have a large amount of critical workflow processes that depend upon a set of heterogeneous applications. This set of applications can range from transactional applications, with databases, to non interactive applications such as those that perform scientific CPU-intensive computing, document indexing or intensive I/O. To efficiently execute all these kinds of applications, a new execution environment (or middleware) is needed to manage the available resources and simplify the development and integration of the different types of applications and services. These middleware components are crucial to new systems and architectures because of their direct impact on the quality of the service offered by the application.

This course provides a glimpse at the wide scope of this area and introduces past and current research in the area focusing on conceptual and practical aspects.

The course also has the objective of introducing the student to research. Throughout the course we will consider how to write a scientific paper and the process for getting it published.

The course will start with a brief review of the basic concepts required to take this course. The section entitled “0. Introduction and Overview of Middleware” emphasizes the motivation behind the research in this area. The section entitled “1. Distributed object technologies and protocols” will review some of the basic concepts about the core technologies: Internet protocols, web servers, the technologies that make up Web Services, and so on. In “2. Internal organization of middleware”, we introduce the layered view of the middleware and review the main components and current models. Some of the research challenges in this topic are related to the complexity involved in tuning the components of a middleware. In this section we will pay special attention to a new research focus which was started recently with the rebirth of virtualization technology. The next section, “3. Grid middleware”, refers to CPU-intensive applications. Their typical execution platforms are hosted on supercomputing facilities that are composed of high performance machines (supercomputers). The research in middleware technologies for these kinds of environments covers areas such as job scheduling, resource management and the dynamic management of application parallelism. Transactional application management is also a challenging task and will be considered in section “4. Middleware for eBusiness applications”. The management of this execution environment is a difficult task that covers areas such as distributed resource management, quality of service issues, resource utilization, application isolation and security. The middleware layer is crucial because it can have an immediate impact on the quality of service the customer receives. In section “5. Middleware Performance and Analysis” we will have a look at the performance of middleware components. The middleware’s requirements have risen quickly, as has its complexity. To be able to manage these complex middleware systems it is necessary to make use of self-managing techniques, also known as autonomic computing. In section “6. New Middleware Challenges: Autonomic Computing”, we will introduce the recent advances in this new research area.

During the course we will discuss the amazing new wave of Internet applications and the new requirements for middleware as a result of these types of applications.

Besides the general view given by this brief review of the basic concepts, we will study some important issues in more detail in the format of a case study.

Based on the previous concepts there will be paper readings assigned for some of the classes. This is an important part of the course and it is required that each student read these before coming to class. We will have two main reading parts in the course, one for general papers and one for more in depth research in the area. Each paper is discussed in class. Discussions will be led by one or more students and may include a brief presentation of the paper.

All students taking the course are required to complete a research project (or “state-of-the-field” review) and submit a paper describing the results of this project. The project is intended to provide the student with an opportunity to gain experience with research in a topic related to the content of the course. Moreover we will introduce the student to the research information resources.

Finally we will learn how a research conference is organized. The students will be involved in the organization committees of the Second Workshop on Execution Environments for Distributed Computing (EEDC 2008). All the research projects will be presented (and published) at this conference after a standard review process.

TENTATIVE COURSE ORGANIZATION (timeline: February, March, April, May, June):

A. Course overview
B. Basic concepts
C. Basic papers review/pres.
D. Case study
E. Writing an article
F. Review/pres research papers
G. EEDC’08 workshop

GRADING POLICY:

· Research Readings/Presentations: 35%

    · Basic papers: 15%

    · Research papers: 20%

· Research Project: 35%

    · Paper content and writing: 25%

    · Presentation of the paper: 10%

· Participation: 30%

    · Conference organization participation: 10%

    · Class participation in paper discussions: 20%

TENTATIVE SCHEDULE:

To be announced next class

Basic papers: Reading/Presentation/Class discussion

We will study a set of papers to build an initial background in this topic. We will start with an article that reviews the research advances in middleware for distributed systems [1]. Furthermore, an introduction to web servers [2] is required to establish a baseline of knowledge among all the students. In [3] we outline the differences between grid computing and P2P, CORBA and cluster computing. In paper [4] we will analyze the different parameters that can influence the performance of Grid/eBusiness middleware. We glance at current trends with article [5], where the author examines a current commercial platform as an integrated grid platform for SOA and infrastructure virtualization environments.

Each student is required to read these 5 papers. You will also have to submit (by midnight on the deadline day) a report/presentation about the readings covering the main contributions of the papers. Additionally, each paper will be presented in class (with ppt or equivalent) by one group and discussed by the students (the choice will be random).

Key dates: To be announced

How to read a paper?

S. Keshav. ACM SIGCOMM Computer Communication Review, Volume 37, Number 3, July 2007, p. 83.

http://www.sigcomm.org/ccr/drupal/files/p83-keshavA.pdf

Research Paper: Reading/Presentation/Class discussion:

Key dates: To be announced

List of papers: To be announced

EEDC’08 Workshop

Second International Workshop on Execution Environments for Distributed Computing (EEDC 2008), June 2008.

More information soon.

Previous edition: EEDC’07

Call For Papers:

http://autonomic.ac.upc.edu/JordiTorres/wp-content/uploads/2007/05/cfp_eedc_new.pdf

Proceedings: First Workshop on Execution Environments for Distributed Computing. 68 pages. Lulu Ed. 2007. ISBN: 978-1-84799-878-1 http://books.google.com/books?id=jes6CecbM1EC


Case study:

More information soon.

Writing an article:

More information soon.

____________________


System Performance Evaluation (CARS-FIB): Performance Analysis Part

Dear CARS students, welcome aboard. In this course I will teach the Performance Analysis part, which is divided into two sections (sections 2 and 5 of the course). The tentative schedule is as follows:

PART 1 : EVALUATION AND MEASUREMENT TECHNIQUES

1. INTRODUCTION TO PERFORMANCE CONCEPTS

1.1. Basic Concepts of Performance Measurement

LAB 1: System utilities for performance monitoring

2. PERFORMANCE ANALYSIS BASICS

2.1. Queuing concepts

2.2. Basic concepts of Operational Analysis

LAB 2: Quantitative performance analysis of system components

2.3. The usefulness of Operational Laws

LAB 3: Predicting the impact of changing system components

3. PERFORMANCE MEASUREMENT TECHNIQUES

3.1. Program Instrumentation Tools

3.2. Program Characterization and Data Selection

3.3. Other Measurement Techniques: Simulation and VM

LAB 4: Building tools to evaluate application performance

LAB 5: Program characterization based on parallelism degree and memory accesses

PART 2: MODELLING AND CHARACTERIZATION

4. PERFORMANCE IN DISTRIBUTED COMPUTING SYSTEMS

4.1. Introduction to distributed computing

4.2. System-centric performance management

4.3. User-centric performance management

LAB 6: System-centric performance management

LAB 7: User-centric performance management

5. A GLANCE AT MODELLING AND PREDICTING TODAY'S SYSTEMS

5.1. Brief review of concepts applied to today's systems

5.2. Bottlenecks and scalability by examples

LAB 8: Optimizing resource usage while fulfilling an SLA

5.3. Advanced application of modelling: real-time performance prediction

6. BENCHMARKING AND WORKLOAD CHARACTERIZATION

6.1. Benchmarking

6.2. Workload characterization

LAB 9: HTTP server log analysis

LAB 10: Performance competition

For the documentation of this part (sections 2 and 5) I will be using this web page, which allows the information to be kept better organized. Basic information for this class (including homework assignments, some materials presented in class, and additional support material) and the current course outline are available from this page, and are also published as:

J. Torres. “A glance at SYSTEM PERFORMANCE ANALYSIS”. Ed. Lulu Enterprises, London. ISBN 978-1-4092-2916-2. 2008.

This course material is composed of the set of slides presented in class and the laboratory task descriptions which are in the performance analysis part of the System Performance Evaluation (CARS) graduate course at the Barcelona School of Informatics (FIB).

The broad objective of these notes is to study the theory through examples and hands-on practice in the performance analysis of a system. They give a quick review of the mathematical techniques that can be used for performance analysis of today's systems, concentrating on techniques that have been widely used. It is impossible to provide an in-depth treatment of the vast areas covered in this part of the course, though, because we only have about 20 hours available.
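As a flavour of the operational analysis listed in section 2 of the outline, the following minimal Python sketch applies two standard operational laws, the Utilization Law (U = X · S) and Little's Law (N = X · R), to a set of measurements. The function names and all the numbers are made up for illustration; they are not taken from the course labs.

```python
def operational_metrics(period_s: float, completions: int, busy_s: float):
    """Derive basic operational quantities from raw measurements.

    period_s    -- length of the observation period T (seconds)
    completions -- number of requests completed C during the period
    busy_s      -- time B the resource was busy during the period (seconds)
    """
    throughput = completions / period_s    # X = C / T
    service_time = busy_s / completions    # S = B / C
    utilization = busy_s / period_s        # U = B / T, which equals X * S
    return throughput, service_time, utilization


def jobs_in_system(throughput: float, response_time_s: float) -> float:
    """Little's Law: average number of requests in the system, N = X * R."""
    return throughput * response_time_s


# Hypothetical measurement: 1200 requests completed in 60 s, disk busy 36 s.
X, S, U = operational_metrics(period_s=60.0, completions=1200, busy_s=36.0)

# Hypothetical average response time of 75 ms.
N = jobs_in_system(X, response_time_s=0.075)
```

With these invented measurements the resource handles 20 requests per second and is 60% utilized, which is the kind of quick check that helps decide whether a component is close to becoming a bottleneck.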

To achieve the course objectives of this part, the student must have good class attendance and participation, and complete the tasks during the laboratory periods as well as the assigned homework. Before coming to the laboratory sessions, students are required to read the case study carefully, think about it, understand it fully, and attempt to mark the relevant data.

Many people have contributed to the work included in this document. First of all, part of this material is extracted from other courses. We want to especially thank professors Virgilio A. F. Almeida (Federal University of Minas Gerais, Brazil) and Xavier Mulero (Universitat Politècnica de Valencia, Spain) for their contribution to the beginning of this set of slides, which comes from a course in 2005. The first version of the slides for this course mainly used the slides that professor Almeida taught in our PhD program in 2003 and the slides that professor Mulero used in a course on performance in 2004. We also used some slides obtained from or inspired by several courses devoted to these topics at other universities around the world. It is impossible to enumerate all the authors; however, we are especially thankful to every one of them. Furthermore, I would like to thank my research team at UPC and BSC for their hard work on many interesting projects in this field, which gave us some of the input for this material.

Finally I ask the students to keep in mind that none of the professors are native English speakers. Please be kind if you find some mistakes that may escape our revisions.

I hope you enjoy this course.

HANDOUTS: http://www.frombarcelona.org/JordiTorresHomePage/2007/08/17/cars-handouts/

Official web page: http://www.fib.upc.edu/en/infoAca/estudis/assignatures/CARS.html

____________________


Second Workshop on Execution Environments for Distributed Computing

Nowadays, business and scientific organizations have a large amount of critical workflow processes that depend upon a set of heterogeneous applications. This set of applications can range from transactional applications, with databases, to non-interactive applications such as those that perform scientific CPU-intensive computing, document indexing or intensive I/O. In order to efficiently execute all these kinds of applications, a new execution environment is required to manage the available resources and simplify the development and integration of different types of applications and services. These middleware components are of crucial importance for new systems and architectures because of their impact on the quality of the service offered by the applications. The EEDC’08 workshop provides an open forum for researchers, practitioners, and users to present discussion issues, directions, and results that will shape the future of execution environments for Distributed Systems, as well as the thoughts from the students in the CANS Masters program at UPC.

This year’s special focus is on the hot topics of next-generation data centres and cloud/utility computing. Today we are definitely at an inflection point in distributed computing: cloud computing. For this reason we have centred this workshop on cloud computing.

MORE INFORMATION SOON.

Cloud Computing
