Multicomputer Architectures

MIMD machines fall into two groups. The first is the multiprocessor with all its variants; the second evolved from machines in which physical memory is distributed among the processors to support large processor counts. Because a processor in such a machine cannot directly access the memory of another node, these MIMD machines have also been called no-remote-memory-access (NORMA) machines. Typically, memory as well as I/O is distributed among the processors to reduce memory-access latency and to yield cost-effective higher bandwidth. Each processor-memory-I/O module thus forms a unit, called a node, which is essentially a separate, autonomous, stand-alone computer. A machine built from such nodes is therefore rightly called a multicomputer, or a loosely-coupled distributed computing system. The nodes have network interfaces that connect them to one another through an interconnection network. They communicate with one another, and reach non-local memories (memories local to some other node), by explicitly passing messages across this network. As a result, the inter-node message delay is relatively large and the data rate comparatively low. Peripherals can also be attached using some other form of sharing. The basic structure of a distributed-memory multicomputer is shown in Figure 10.21.
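The idea that a node reaches non-local memory only through explicit messages can be illustrated with a small simulation. This is a minimal sketch, not a model of any real machine: the `Node` and `Network` classes, the `READ`/`REPLY` message kinds, and the single-threaded delivery loop are all assumptions made for illustration.

```python
from collections import deque


class Network:
    """Hypothetical interconnection network: queues messages between nodes."""

    def __init__(self):
        self.nodes = {}
        self.in_flight = deque()
        self.messages_sent = 0

    def attach(self, node):
        self.nodes[node.node_id] = node

    def send(self, src, dst, message):
        self.in_flight.append((src, dst, message))
        self.messages_sent += 1

    def deliver_all(self):
        # Deliver queued messages in order until the network is idle.
        while self.in_flight:
            src, dst, message = self.in_flight.popleft()
            self.nodes[dst].handle(src, message)


class Node:
    """One processor-memory module; its memory is private to the node."""

    def __init__(self, node_id, network):
        self.node_id = node_id
        self.memory = {}      # local memory, invisible to other nodes
        self.replies = {}     # values returned by remote nodes
        self.network = network
        network.attach(self)

    def read(self, owner_id, address):
        # A local read touches only local memory and sends no message.
        if owner_id == self.node_id:
            return self.memory[address]
        # A non-local read becomes a request/reply message pair.
        self.network.send(self.node_id, owner_id, ("READ", address))
        self.network.deliver_all()
        return self.replies.pop(address)

    def handle(self, src, message):
        kind, address, *rest = message
        if kind == "READ":
            value = self.memory[address]
            self.network.send(self.node_id, src, ("REPLY", address, value))
        elif kind == "REPLY":
            self.replies[address] = rest[0]
```

In this sketch a remote read costs two messages while a local read costs none, which is the source of the larger inter-node delay described above.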

Multicomputers built from such interconnected, self-sufficient nodes offer cost-effective high throughput for applications in which multiple independent jobs execute simultaneously with little or no communication with one another. Multicomputers of this type provide coarse-grained, system-level parallelism. The approach can be extended so that each node contains a small number of processors (2-8) instead of a single one. These processors may be interconnected locally by a small bus or by some other interconnection technology, which is often less scalable than the global interconnection network. While placing multiple processors in a node together with a private memory and a local network interface may be more cost-efficient, it is not fundamental to how these machines work.

Many variations of this basic scheme are possible, including a private cache memory for the processor on each node to relieve two primary points of contention: the distributed memory and the shared communication network itself. The major architectural differences between a distributed-memory multiprocessor (such as the NUMA model) and the various types of multicomputers lie in how global communication is carried out and in the logical architecture of the distributed memory itself. Usually, high-bandwidth communication networks are used in multicomputers to extract a high degree of resource sharing, within the confines of cost, complexity, inter-system communications, and similar constraints. In summary, the parallelism obtained in a multicomputer is more scalable than that in a distributed shared-memory multiprocessor (NUMA).

FIGURE 10.21

Basic architecture of a distributed-memory multicomputer.

A brief description of this topic is given on the website: http://

Design Considerations

Many different issues are entwined with multicomputer design. Different approaches to solving them give rise to different categories of multicomputer, each best suited to a particular application environment. Some of the most important fundamental issues are:

  • A message-passing mechanism is required for processor-to-processor communication in a multicomputer organisation. It may take two forms: a synchronous mode, which can be thought of as a remote procedure call (RPC), and an asynchronous mode. The message-passing mechanisms used in different machines are fairly diverse. RPC is a software system that encapsulates the entire message-passing mechanism, including the details of sending and receiving messages. Another way to pass messages is to use hardware routers, as found in many modern multicomputers; a message between any two participating nodes then travels through a series of routers and channels. Nodes of different types can also be mixed in this arrangement, which gives rise to a category known as heterogeneous multicomputers. Internode communication in a heterogeneous multicomputer is realized by introducing compatible data representations and message-passing protocols.

  • The type of network and the nature of the interconnections and communications to be used between the nodes of a multicomputer. Various static networks (already discussed) with different topologies, namely ring, tree, mesh, torus, hypercube (nCube), etc., can be used to construct different types of multicomputers. Diverse communication patterns such as point-to-point, broadcast, multicast, and various permutations can be exploited in these multicomputers to meet performance targets. Other similar issues must also be settled, and the different approaches taken to solve them in turn give rise to different categories of multicomputer for different application environments.
  • Last but not least is the issue of how to build the fastest multicomputers (multiprocessors): whether to use many small processors or a smaller number of faster processors to attain the expected level of parallel processing. After prolonged debate, the way forward was taken to depend mainly on expected advances in our ability to program parallel machines, on continued progress in microprocessor (both RISC and CISC) performance, and on appreciable advances in the design of parallel architectures. The processor-performance goal was a computer capable of sustaining at least one teraFLOPS (one million MFLOPS), expected to be reached by 1995. In 1994, a 1904-node Paragon machine containing 3808 small processors had already achieved 140,000 MFLOPS (0.14 TFLOPS).
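To make the topology discussion concrete, the following sketch shows how node addresses in a hypercube determine both the neighbours of a node and a route between two nodes. The routing rule used here (correct the differing address bits from lowest to highest, often called dimension-order or e-cube routing) is a standard technique chosen for illustration; the text above does not say which routing scheme any particular machine uses, and the function names are hypothetical.

```python
def hypercube_neighbors(node, dim):
    """Neighbours of `node` in a dim-dimensional hypercube.

    Two nodes are linked exactly when their binary addresses differ
    in one bit, so each node has `dim` neighbours.
    """
    return [node ^ (1 << bit) for bit in range(dim)]


def ecube_route(src, dst, dim):
    """Dimension-order route from src to dst, as a list of node addresses.

    Bits in which the current address differs from the destination are
    corrected one at a time, lowest dimension first.
    """
    path = [src]
    current = src
    for bit in range(dim):
        mask = 1 << bit
        if (current ^ dst) & mask:
            current ^= mask
            path.append(current)
    return path


def hop_count(src, dst):
    """Number of links crossed: the Hamming distance of the two addresses."""
    return bin(src ^ dst).count("1")
```

For example, in a 3-dimensional hypercube `ecube_route(0b110, 0b001, 3)` visits nodes 6, 7, 5, 1, crossing three links because the two addresses differ in all three bits.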

More details of this topic are described on the website:

Multicomputer Generations

Multicomputers have gone through a series of innovative developmental stages that can be described in terms of several generations, ultimately targeting massively parallel processing (MPP). The first generation (1983-1987) was built with multiple microprocessors using popular static interconnection networks (hypercube-based) and software-controlled message-passing mechanisms. It offered an attractive cost/performance ratio that eventually posed a strong challenge to the most popular, versatile mainframe machines of those days from multinational giants such as IBM, NCR, HP, DEC, and others.

The second generation (1988-1992) of multicomputers used faster microprocessors, including RISC processors, with an advanced interconnection network, mesh topology, and hardware-supported routing schemes (wormhole routing) replacing the slower software-controlled message-passing technique. As a result, the communication latency came down to less than 5 μs from 6000 μs for global communication and to 5 μs from 2000 μs for local communication, and the overall speed-up was around ten times on average over its predecessors. This generation offered medium-grain parallelism, implemented at the level of tasks, procedures, and subroutines, to balance computational granularity against the existing communication latency for the sake of proper synchronization.
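The balance between grain size and communication latency can be sketched with a rough calculation. The model below is an assumption introduced for illustration, not taken from the text: each grain of computation is followed by a fixed number of messages, and nothing overlaps. The latencies quoted above are treated as microsecond values, and the 1000-microsecond grain is a hypothetical figure.

```python
def compute_fraction(grain_us, latency_us, messages_per_grain=1):
    """Fraction of elapsed time spent computing.

    Assumes (for illustration only) that every grain of `grain_us`
    microseconds of work is followed by `messages_per_grain` messages,
    each costing `latency_us`, with no overlap of the two.
    """
    communication_us = messages_per_grain * latency_us
    return grain_us / (grain_us + communication_us)


GRAIN_US = 1000  # hypothetical medium-sized grain of work

for label, latency_us in (("6000 us latency", 6000), ("5 us latency", 5)):
    fraction = compute_fraction(GRAIN_US, latency_us)
    print(f"{label}: {fraction:.3f} of the time is useful computation")
```

Under these assumptions the same grain wastes most of its time waiting at the older latency and almost none at the newer one, which is why lower latency permits finer-grained parallelism.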

The third generation (1993-1997) is a fine-grained multicomputer with a greater number of more powerful processors, faster routing channels, and a moderately sized private memory for each processor, ultimately targeting massive parallelism by exploiting data parallelism as well as instruction-level parallelism. Adequate support from the language and from the run-time software environment created by the OS has been provided, along with optimizing compilers that can automatically detect parallelism and translate source code into a parallel form that the run-time system recognizes during execution. The trade-off here is simply between the level of parallelism required and the cost of realizing it.

Today's multicomputers, while offering massive parallelism (MPP machines), have been further enhanced by combining the private virtual address spaces distributed over the nodes into a globally shared virtual memory. To extract even finer-grained parallelism with the minimum possible latency in processor-memory handshaking, various levels of caches have been included, so that page-oriented message passing is mostly replaced by cache-block-level mechanisms.

A brief description of this topic is given on the website: http://

Different Models of Multicomputer Systems

Many variations of the basic scheme are used to build various models of multicomputer systems. These models can be broadly classified into several categories, of which some common ones are:

  • 1. Minicomputer model (peer-to-peer): This organisation usually consists of a few stand-alone minicomputers (possibly even large supercomputers) interconnected by a suitable communication network. Each may have several interactive terminals to accommodate multiple users, who can work concurrently on many unrelated problems and, if required, access remote data and resources that are collectively stored and managed on other machines. The need to place and retrieve individual objects, and to maintain replicas across many computers in order to distribute the load and provide resilience against individual machine faults or communication-link failures, renders this architecture more complex than the other popular, relatively simple architectures.
  • 2. Workstation model: A workstation is simply a stand-alone, autonomous small computer (usually a PC) equipped with its own disk (a diskful workstation) and other peripherals, providing a highly user-friendly interface to a single user. Several such workstations scattered over a wide area, when interconnected by a communication network, form a multicomputer normally called the workstation model. Here each machine, apart from its own use, may provide services to the others on request, and processes running on separate machines can exchange data with one another through the network.
  • 3. Workstation-server model: This model, also commonly known as the client-server model, consists mostly of diskless workstations (a few may be diskful) and a few minicomputers used essentially as servers. With the availability of relatively low-cost, high-speed networks, diskless workstations have been found more convenient in a network environment than diskful ones, since it is easier to maintain and enhance hardware and system software on a few large disks at the servers' (minicomputers') end than on many small disks attached to many diskful workstations geographically scattered over a large area, as in the ordinary workstation model already discussed. Normal computation required by a user's processes is performed at the user's home workstation. A client process (belonging to a user on any workstation) can, however, issue a request to a server process (running on a minicomputer or a specialized machine) for some service; the server executes the request and sends a corresponding reply back to the client process. This model therefore does not require the user's processes to migrate to the target server machine to get the work done. Several variations on the above models can be derived by considering the following factors:
    • deployment of multiple servers and caches to increase performance and resilience;
    • use of low-cost computers with limited hardware resources that are simple to manage yet fulfill users' needs;
    • use of mobile code and mobile agents;
    • the ability to add or remove mobile devices conveniently, as and when required.

The client-server model can be realized in a variety of hardware and software environments with characteristics distinct from other types of distributed computing systems, and has become increasingly important, mainly as an effective general-purpose means of sharing information and resources. Moreover, there is no sharp distinction between a client and a server process: both can sometimes run on the same computer, and some processes are both clients and servers. That is, a server process may sometimes use the services of another server (as in a three-tier architecture), thereby appearing as a client to the latter.
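The request-reply exchange at the heart of the client-server model can be sketched with two threads standing in for the client and server processes and in-memory queues standing in for the network. The names `server_loop` and `call`, the tuple layout of a request, and the `"ok"`/`"error"` status tags are assumptions made for illustration only.

```python
import queue
import threading


def server_loop(requests, services):
    """Server process: take a request, execute it, send back a reply."""
    while True:
        item = requests.get()
        if item is None:          # shutdown signal used by this sketch
            break
        name, args, reply_box = item
        handler = services.get(name)
        if handler is None:
            reply_box.put(("error", f"unknown service {name!r}"))
        else:
            reply_box.put(("ok", handler(*args)))


def call(requests, name, *args):
    """Client side: send a request and block until the reply arrives.

    The client's own code never moves to the server; only the request
    and the reply cross the (simulated) network.
    """
    reply_box = queue.Queue(maxsize=1)
    requests.put((name, args, reply_box))
    return reply_box.get(timeout=5)


if __name__ == "__main__":
    requests = queue.Queue()
    services = {"add": lambda a, b: a + b}
    server = threading.Thread(target=server_loop, args=(requests, services))
    server.start()
    print(call(requests, "add", 2, 3))
    requests.put(None)
    server.join()
```

Because `call` blocks until the reply arrives, this sketch also corresponds to the synchronous, RPC-like style of message passing; an asynchronous client would continue working and collect the reply later.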

  • 4. Processor-pool model: The workstation-server model fits the most common environment, in which many users, each with a relatively small workload requiring fairly uniform computing power and shared resources, can be accommodated simultaneously by allocating a processor to each user. Certain applications, however, occasionally demand a massive amount of computing power for a relatively short time. The processor-pool model is the right choice in such situations: all the processors are pooled together to be shared by the users as and when needed. The pool can be built with a large number of microcomputers and minicomputers (small mainframes) interconnected over a communication network. Each processor in the pool has a sufficient amount of memory of its own and is capable of independently executing any program of the distributed computing system.

Usually, in the processor-pool model, no terminal is directly attached to any processor; instead, all the terminals are attached to the network via an interface, a special device (a terminal controller, which may also be built into the communication network). As a result, the user can access the entire system from any of the terminals, which are usually small diskless workstations or graphic terminals such as X terminals, and never logs onto a particular machine as a home machine. When a job is submitted, it is received by a special server (commonly called the run server, a part of the operating system) that manages, schedules, and allocates one or an appropriate number of processors from the pool to different users, depending on the prevailing environment or on demand. More than one processor can be allocated to a single job so that it runs in parallel, if the nature of the job supports it (i.e. if the job can be decomposed into many mutually independent segments, one processor can be assigned to each segment, and all the segments can then run in parallel on different processors). The processors are deallocated from the job when it completes and are returned to the pool for other users.
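The allocate-and-return cycle performed by the run server can be sketched as simple bookkeeping over a set of free processors. This is an illustrative sketch only: the `RunServer` class, its method names, and the policy of refusing a job when too few processors are free are assumptions, not a description of Amoeba or any other real system.

```python
class RunServer:
    """Hypothetical run server managing a pool of identical processors."""

    def __init__(self, processor_count):
        self.free = list(range(processor_count))   # idle processor ids
        self.assigned = {}                         # job id -> processor ids

    def submit(self, job_id, segments=1):
        """Allocate one processor per independent segment of the job.

        Returns the allocated processor ids, or None when the pool
        cannot satisfy the request right now (the caller may retry).
        """
        if job_id in self.assigned:
            raise ValueError(f"job {job_id!r} is already running")
        if segments > len(self.free):
            return None
        allocated = [self.free.pop() for _ in range(segments)]
        self.assigned[job_id] = allocated
        return allocated

    def complete(self, job_id):
        """Return the job's processors to the pool for other users."""
        self.free.extend(self.assigned.pop(job_id))
```

A job split into three segments on a four-processor pool would leave one processor free; a second job asking for two would be refused until the first completes and its processors return to the pool.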

By design, the processor-pool model provides much better utilization of the total available processing power of a distributed computing system than any model that supports the home-machine concept. This is because the entire processing power of the system is available to any logged-on user if needed, whereas in other models, including the most important one, the workstation-server model, several workstations may at times lie idle and yet cannot be used for other jobs running at that time. Despite its many advantages, one major shortcoming of this model is the relatively slow interconnection network between the processors where the jobs execute and the terminals through which the users interact with the system. That is why this model is usually considered unsuitable for environments in which high-performance interactive applications typically run. A distributed computing system based on the processor-pool model has been implemented in a well-known system, Amoeba [Mullender et al., 1990].

Multicomputers are best suited to general-purpose multi-user applications in which many users work on many unrelated problems, occasionally cooperating in ways that involve sharing resources. Such machines usually yield cost-effective higher bandwidth, since most of the accesses made by each processor are to its local memory, which reduces latency and thereby increases system performance. The nodes are equipped with the required interfaces so that they can always be connected with one another through a communication network.

In contrast to a tightly-coupled multiprocessor system, the individual computers forming a multicomputer system can be located far from one another, thereby covering a wider geographical area. Moreover, in tightly-coupled systems the number of processors that can be employed effectively and efficiently is usually limited by the bandwidth of the shared memory, which restricts scalability. Multicomputer systems, with their loosely-coupled architecture, are more freely expandable and can in theory contain any number of interconnected computers. On the whole, multiprocessors tend to be more tightly coupled than multicomputers because they can exchange data at almost memory speed, but some of today's fiber-optic-based multicomputers have also been found to work at close to memory speed.

More detail on each of these models, with their respective figures, is given on the website.

Multitiered Architecture: Three–tier Client–Server Architecture

The conventional client-server architecture comprises two levels, or tiers: a client tier and a server tier. A three-tier architecture is also increasingly common. In this architecture, the application software is usually distributed among three types of machines: a user machine, a middle-tier server, and a backend server. The user machine is typically a thin client (a PC or workstation). The middle-tier machines are separate servers that contain programs forming part of the processing level, and essentially serve as gateways between the thin user clients and a variety of backend servers. Although a middle-tier machine is a server to the thin client, it sometimes needs to act as a client to the backend servers. The middle-tier machines can also convert protocols and map one type of database query to another. In addition, a middle-tier machine can integrate or merge results from different data sources. In effect, the middle-tier machines work as an interface between the desktop applications and the background legacy applications (or evolving corporate-wide applications), mediating between these two worlds. The number of tiers can be extended beyond three, depending on the design principles governing what type of distribution is needed to meet the execution requirements. In effect, a multitier architecture processes an application by dividing it into a user interface, processing components, and a data level in a distributed manner, equivalent to organizing it as a client-server model. This type of distribution is often referred to as vertical distribution.
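The dual role of the middle tier, as a server to the thin client and as a client to the backend servers, can be sketched in a few lines. Everything named here is hypothetical: the `BackendServer` and `MiddleTier` classes, the `customer` and `item` fields, and the choice to merge results by tagging each row with its source are illustrative assumptions, with plain method calls standing in for network requests.

```python
class BackendServer:
    """Data-level tier: answers queries over its own rows."""

    def __init__(self, name, rows):
        self.name = name
        self.rows = rows

    def query(self, customer):
        return [row for row in self.rows if row["customer"] == customer]


class MiddleTier:
    """Processing-level tier: serves clients by calling the backends."""

    def __init__(self, backends):
        self.backends = backends

    def handle_request(self, customer):
        # Acting as a client: query every backend server in turn.
        merged = []
        for backend in self.backends:
            for row in backend.query(customer):
                # Merge results from different data sources, noting origin.
                merged.append({**row, "source": backend.name})
        # Acting as a server: return one combined reply to the thin client.
        return merged


def thin_client(middle_tier, customer):
    """User-interface tier: only formats what the middle tier returns."""
    rows = middle_tier.handle_request(customer)
    return [f"{row['source']}: {row['item']}" for row in rows]
```

The thin client never contacts a backend directly; adding a further tier would mean placing another such intermediary between the middle tier and the data servers.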

A typical use of a three-tier architecture is in transaction processing, in which a separate process called the transaction monitor, located on the middle-tier server, coordinates all transactions across possibly different data servers. Here the thin client (workstation) is equipped with a local disk of its own that contains part of the data, and it runs most of the applications, but all operations on files or database entries go to the respective servers. The interaction between the client and the middle-tier server, as well as that between the middle-tier server and the backend server, thus follows the client-server model: the middle-tier server acts as a server to the client and as a client to the backend server. Many examples of a distributed computing system based on the workstation-server model can be cited; an early one is the V-System [Cheriton, 1988].
