Morgan Kaufmann, 1996. — 873.
Distributed algorithms are algorithms designed to run on hardware consisting of many interconnected processors. Pieces of a distributed algorithm run concurrently and independently, each with only a limited amount of information. The algorithms are supposed to work correctly, even if the individual processors and communication channels operate at different speeds and even if some of the components fail.
Distributed algorithms arise in a wide range of applications, including telecommunications, distributed information processing, scientific computing, and real-time process control. For example, today's telephone systems, airline reservation systems, banking systems, global information systems, weather prediction systems, and aircraft and nuclear power plant control systems all depend critically on distributed algorithms. Obviously, it is important that the algorithms run correctly and efficiently. However, because the settings in which they run are so complicated, the design of such algorithms can be an extremely difficult task.
This book contains a comprehensive introduction to the field of distributed algorithms—a collection of the most significant algorithms and impossibility results, all presented in a simple automata-theoretic setting. Mathematical proofs are given (or at least sketched) for virtually all of the results. Algorithms are analyzed according to precisely defined complexity measures. Altogether, this material provides an excellent foundation for a deep understanding of distributed algorithms.
This book has been written with several audiences in mind. First, it is organized as a textbook for a first-year graduate computer science course, especially for students interested in computer systems, theory, or both. It can also be used as a text for a short course for designers of distributed systems. Finally, it is intended as a reference manual for designers, students, researchers, and anyone else interested in the field.
The book contains algorithms for many typical problems, including problems of consensus, communication, resource allocation, and synchronization, in several different system settings. The algorithms and results are organized according to basic assumptions about the distributed setting. The first level of organization is according to the timing model—synchronous, asynchronous, or partially synchronous—and the second level is according to the interprocess communication mechanism—shared memory or message passing. Several chapters are devoted to each type of system model; the first chapter in each group presents a formal model for that type of system, while the rest of the chapters contain the algorithms and impossibility results. Throughout, the presentation is rigorous, yet it is firmly grounded in intuition.
Because this field is so large and active, this book does not attempt to cover everything. The results that are included have been selected because they are the most fundamental. These are not always the optimal results, in terms of the complexity measures; they are generally those that are simple and that illustrate important general methods of design or reasoning.
This book will make you familiar with many of the most important problems, algorithms, and impossibility results in the area of distributed computing. You will be able to recognize the problems when they arise in practical settings, apply algorithms like the ones contained here to solve them, and invoke the impossibility results to argue that the problems are not solvable. The book will also give you a good feeling for the various system models and their capabilities, so that you can design new algorithms yourself (or even prove new impossibility results). Finally, this book should convince you that it is feasible to reason carefully about distributed algorithms and systems: to model them formally, give precise specifications for their required behavior, prove rigorously that they satisfy their specifications, identify appropriate complexity measures, and analyze them according to these measures.
Part I Synchronous Network AlgorithmsModelling I: Synchronous Network Model
Leader Election in a Synchronous Ring
Algorithms in General Synchronous Networks
Distributed Consensus with Link Failures
Distributed Consensus with Process Failures
More Consensus Problems
Part II Asynchronous AlgorithmsModelling II: Asynchronous System Model
Part IIA Asynchronous Shared Memory AlgorithmsModelling III: Asynchronous Shared Memory Model
Mutual Exclusion
Resource Allocation
Consensus
Atomic Objects
Part IIB Asynchronous Network AlgorithmsModelling IV: Asynchronous Network Model
Basic Asynchronous Network Algorithms
Synchronizers
Shared Memory versus Networks
Logical Time
Global Snapshots and Stable Properties
Network Resource Allocation
Asynchronous Networks with Process Failures
Data Link Protocols
Part III Partially Synchronous AlgorithmsPartially Synchronous System Models
Mutual Exclusion with Partial Synchrony
Consensus with Partial Synchrony