Lecture 02: Distributed Information Systems

Copyright Notice

Some material in this lecture is adapted from the teaching materials of the book “Web Services: Concepts, Architectures and Applications” and is thus Copyright © 2003 Gustavo Alonso, ETH Zürich and/or Copyright © 2004 Springer-Verlag Berlin Heidelberg.

Distributed Information Systems

Web services are a form of distributed information system. Many of the problems that Web services try to solve, as well as the design constraints encountered along the way, can be understood by considering how distributed information systems evolved in the past.
— Introduction to Chapter 1 of our Textbook

Information System Design
- Layers and Tiers
- Bottom Up Design
- Top Down Design
Information System Architecture
- One Tier
- Two Tier (client/server)
- Three Tier (middleware)
- N-tier Architectures
- Clusters and Tier Distribution
Communication Styles
- Blocking Interactions (Synchronous)
- Non-Blocking Interactions (Asynchronous)

Overview: Information System Design

Layers and Tiers
Bottom Up Design
Top Down Design

Layers and Tiers

	A client is any user or program that wants to perform an operation on the system. Clients interact with the system through a presentation layer.
	The application logic determines what the system actually does. It takes care of enforcing the business rules and establishing the business process. The application logic can take many forms: programs, constraints, workflows, etc.
	The resource manager deals with the organization (storage, indexing, and retrieval) of the data necessary to support the application logic. This is typically a database but it can also be a text retrieval system or any other data management system providing querying capabilities and persistence.
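
The separation between the three layers can be sketched in a few lines of Python. This is only an illustration under assumed names: the classes `Presentation`, `ApplicationLogic`, and `ResourceManager`, the in-memory account table, and the withdrawal rule are all hypothetical and do not come from the textbook.

```python
# Hypothetical sketch of the three layers; all names and data are invented.

class ResourceManager:
    """Resource management layer: stores and retrieves the data."""

    def __init__(self):
        # An in-memory dict stands in for a database.
        self._balances = {"alice": 100, "bob": 20}

    def read(self, account):
        return self._balances[account]

    def write(self, account, amount):
        self._balances[account] = amount


class ApplicationLogic:
    """Application logic layer: enforces the business rules."""

    def __init__(self, resources):
        self._resources = resources

    def withdraw(self, account, amount):
        balance = self._resources.read(account)
        if amount <= 0:
            raise ValueError("amount must be positive")
        if amount > balance:
            raise ValueError("insufficient funds")
        self._resources.write(account, balance - amount)
        return balance - amount


class Presentation:
    """Presentation layer: the only part the client interacts with."""

    def __init__(self, logic):
        self._logic = logic

    def handle_withdraw(self, account, amount):
        try:
            remaining = self._logic.withdraw(account, amount)
            return f"OK: {account} now has {remaining}"
        except ValueError as err:
            return f"ERROR: {err}"


ui = Presentation(ApplicationLogic(ResourceManager()))
print(ui.handle_withdraw("alice", 30))  # OK: alice now has 70
print(ui.handle_withdraw("bob", 50))    # ERROR: insufficient funds
```

Each layer only talks to the layer directly below it, so the in-memory table could be swapped for a real database without touching the presentation code.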

Boxes and Arrows

	Each box represents a part of the system. Each arrow represents a connection between two parts of the system.
	Adding boxes makes the system modular: this provides opportunities for adding distribution and parallelism. It also supports encapsulation, component–based design, reuse, etc. Adding arrows, on the other hand, adds connections that need to be maintained; more coordination is necessary. The system becomes more complex to monitor and manage.
	The more boxes, the greater the number of context switches and intermediate steps to go through before one gets to data. Performance suffers considerably. System designers try to balance the flexibility of modular design with the performance demands of real applications.

There is no problem in system design that cannot be solved by adding a level of indirection. There is no performance problem that cannot be solved by removing a level of indirection.

Top Down Design

	The functionality of a system is divided among several modules. Modules are typically not stand-alone components; their functionality depends on modules located in a lower layer.
	Hardware is typically homogeneous and the system is designed to be distributed from the beginning.

The Process of Top Down Design

Bottom Up Design, Part 1

In a bottom up design, many of the basic components already exist. These are stand-alone systems which need to be integrated into a new system.
The components do not necessarily cease to work as stand-alone components. Often old applications continue running at the same time as new applications.
This approach is used widely because legacy systems exist and typically cannot be easily replaced.
Much of the work and products in this area are related to middleware, the intermediate layer used to provide a common interface, bridge heterogeneity, and cope with distribution.

Bottom Up Design, Part 2

The Process of Bottom Up Design

Overview: Information System Architecture

One Tier
Two Tier (client/server)
Three Tier (middleware)
N-tier Architectures
Clusters and Tier Distribution

One Tier: Monolithic

	The presentation layer, application logic and resource manager are built as a monolithic entity.
	Users/programs access the system through “dumb” terminals, whose display is controlled by the information system.
	This was the typical architecture of mainframes, offering several advantages:
- no forced context switches in the control flow
- everything is centralized; managing and controlling resources is easier
- the design can be highly optimized by blurring the separation between layers

Two Tier: Client/Server

As computers became more powerful, it was possible to move the presentation layer to the client. This has several advantages:

Clients are independent of each other: one can have several presentation layers depending on what each client needs to do.
One can take advantage of the computing power at the client machine to have more sophisticated presentation layers while also saving computer resources on the server.
It introduces the concept of an API (Application Program Interface): an interface for invoking the system from the outside.
The resource manager only sees one client: the application logic. This greatly helps with performance since there are no client connections/sessions to maintain.

Two Tier: Server API

Client/server systems introduced the notion of service (the client invokes a service implemented by the server)
Client/server systems also introduced the notion of service interface (how the client can invoke a given service)
Taken together, the interfaces to all the services provided by a server define the server's API
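
The relationship between services, service interfaces, and the server's API can be sketched as follows. This is a minimal illustration rather than how any particular client/server product works: the `Server` class, the `service` decorator, and the two example services are hypothetical names, and the call signature stands in for the service interface.

```python
import inspect


class Server:
    """Hypothetical server that exposes a set of named services."""

    def __init__(self):
        self._services = {}

    def service(self, func):
        """Register a function as a service offered by this server."""
        self._services[func.__name__] = func
        return func

    def api(self):
        """The server's API: the interface of every registered service."""
        return {
            name: str(inspect.signature(func))
            for name, func in sorted(self._services.items())
        }

    def invoke(self, name, *args):
        """Entry point a client uses to invoke a service by name."""
        return self._services[name](*args)


server = Server()


@server.service
def get_price(item_id):
    # Invented price list, in whole currency units.
    return {"book": 25, "pen": 2}[item_id]


@server.service
def place_order(item_id, quantity):
    return get_price(item_id) * quantity


print(server.api())
# {'get_price': '(item_id)', 'place_order': '(item_id, quantity)'}
print(server.invoke("place_order", "pen", 3))  # 6
```

The client needs to know only the names and signatures returned by `api()`, not how the services are implemented.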

Two Tier: Advantages/Disadvantages

Advantages
- can off-load work from server to clients
- work within a server takes place within one scope (similar to 1 tier systems)
- server design is still tightly coupled and can be optimized by ignoring presentation issues
- relatively easy to manage from a software engineering point of view
Disadvantages
- A single server can only manage a limited number of clients
- Clients are “tied” to the system since there is no standard presentation layer. If one wants to connect to two systems, then the client needs two presentation layers
  - Other Problems
    - the underlying systems don’t know about each other
    - there is no common business logic
    - the client is the point of integration (increasingly fat clients)
    - The responsibility of dealing with heterogeneity is shifted to the client
    - The client becomes responsible for knowing where things are, how to get to them, and how to ensure consistency
- There is no failure or load encapsulation. If a server fails, no clients can work. Similarly, the load created by a client will directly affect other clients since they are all competing for the same resources.

Three Tier: Middleware

In a 3 tier system, the three layers are fully separated; they are also typically distributed
Middleware introduces an additional layer of business logic encompassing all underlying systems
By doing this, a middleware system:
- simplifies the design of clients by reducing the number of interfaces they need to know
- provides transparent access to the underlying systems
- acts as a platform for inter-system functionality and high level application logic
- takes care of locating resources, accessing them, and gathering results

Middleware systems also enable the integration of systems built using other architectures
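
As a rough sketch of this idea, the middleware below sits between a client and two underlying systems that expose different interfaces. Everything here is assumed for illustration: `InventorySystem`, `PricingSystem`, `Middleware`, and the product data are hypothetical and do not model a specific middleware platform.

```python
class InventorySystem:
    """Hypothetical underlying system with a method-per-query interface."""

    def stock_level(self, sku):
        return {"A1": 4, "B2": 0}.get(sku, 0)


class PricingSystem:
    """Hypothetical second system with a request/response dict interface."""

    def lookup(self, request):
        prices_in_cents = {"A1": 1999, "B2": 500}
        sku = request["sku"]
        return {"sku": sku, "cents": prices_in_cents[sku]}


class Middleware:
    """Single interface for clients; hides where the data lives."""

    def __init__(self, inventory, pricing):
        self._inventory = inventory
        self._pricing = pricing

    def product_info(self, sku):
        # Locate the resources, access each system in its own
        # style, and gather the results into one answer.
        in_stock = self._inventory.stock_level(sku) > 0
        price_cents = self._pricing.lookup({"sku": sku})["cents"]
        return {"sku": sku, "in_stock": in_stock, "price_cents": price_cents}


middleware = Middleware(InventorySystem(), PricingSystem())
print(middleware.product_info("A1"))
# {'sku': 'A1', 'in_stock': True, 'price_cents': 1999}
```

The client calls one method on one interface. In a two tier setting, the same client would have to know both underlying interfaces and combine the results itself.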

N-Tier: Web Integration

N-tier architectures result from connecting several 3-tier systems to each other and/or by adding an additional layer to allow clients to access the system via the Web
The Web layer was initially external to the information system (a true additional layer); today, it is being incorporated into a presentation layer that resides on the server side (part of the middleware infrastructure in a three tier system, or part of the server directly in a two tier system)
The addition of the Web layer led to the notion of “application servers”, a term used to refer to middleware platforms supporting Web access

N-Tier Systems in the “Real World”

Overview: Communication Styles

Blocking Interactions (Synchronous)
Non-Blocking Interactions (Asynchronous)

Blocking Interactions

traditionally, information systems use blocking calls (client waits while server processes a request)
synchronous interaction requires both parties to be “on-line”
advantage: simple to understand and implement
disadvantages: connection overhead, higher probability of failures, failures hard to manage
one solution: transactions
another solution: non-blocking interactions
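
A blocking interaction can be illustrated with an ordinary function call. In this sketch the function `server_process` and its artificial delay are hypothetical stand-ins for a remote server; the point is only that the client does nothing else until the reply arrives.

```python
import time


def server_process(request, delay=0.05):
    """Hypothetical server: the sleep stands in for processing time."""
    time.sleep(delay)
    return request.upper()


def client():
    start = time.monotonic()
    reply = server_process("hello")  # the client is blocked on this line
    waited = time.monotonic() - start
    return reply, waited


reply, waited = client()
print(reply)  # HELLO
print(f"client was blocked for about {waited:.2f} s")
```

If the server were unavailable, the call could not complete at all, which is why synchronous interaction requires both parties to be on-line.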

Non-Blocking Interactions

with non-blocking interactions, a call to the server returns immediately
client can continue to run and occasionally check with server to see if a response is ready
typically implemented via message queues
disadvantage: adds complexity to client architecture
advantages: more modular, more distribution modes (multicast, replication, message coalescing, etc.), more natural way to implement complex interactions between heterogeneous systems
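
The message-queue approach can be sketched with two in-process queues and a worker thread. This is an assumption-laden illustration: the names `requests`, `replies`, `server`, and `run_client`, as well as the delays, are invented here, and real message-oriented middleware would place the queues between separate machines.

```python
import queue
import threading
import time

requests = queue.Queue()  # client -> server messages
replies = queue.Queue()   # server -> client messages


def server():
    """Hypothetical server loop: take a request, process it, post a reply."""
    while True:
        msg = requests.get()
        if msg is None:       # sentinel value: shut down
            break
        time.sleep(0.05)      # stands in for processing time
        replies.put(msg.upper())


def run_client():
    worker = threading.Thread(target=server)
    worker.start()

    requests.put("hello")     # returns immediately; the client is not blocked
    polls = 0
    reply = None
    while reply is None:
        try:
            reply = replies.get_nowait()  # check whether a response is ready
        except queue.Empty:
            polls += 1        # the client is free to do other work here
            time.sleep(0.01)

    requests.put(None)        # ask the server to stop
    worker.join()
    return reply, polls


reply, polls = run_client()
print(reply)  # HELLO
print(f"client checked {polls} time(s) before the reply was ready")
```

The extra queue handling and polling loop are the added client complexity mentioned above. In exchange, the client and server no longer need to be ready at the same moment.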

Next Week

Middleware and CORBA (Chapter 2)