The performance of the traditional computer, as proposed by Von Neumann, with a single CPU that issues sequential requests to memory, one at a time, and executes scalar data, has progressed steadily with the continuous introduction of faster hardware technologies and techniques, innovative processor designs, and numerous novel ideas in system software design and optimization. Owing to the availability of ever more sophisticated hardware resources, computer architecture has advanced over the last few decades through changes that were mostly evolutionary rather than revolutionary. The notable ones are the pipelined architecture (instruction-level parallelism) and then the superscalar architecture (machine parallelism), which uses several pipelines with multiple functional units within one chip. These two approaches have been refined continuously, in gradual steps, and pushed to their practical limits in the ongoing evolution of computer architecture. Meanwhile, the ever-increasing speed of processor and memory technologies, coupled with increased capability and progressive reduction in size, was gradually approaching physical limits. The revolution in VLSI technology, however, continued, and ultimately made it feasible to place multiple processor cores within a single powerful microprocessor chip, historically known as a multicore, as well as to build larger-capacity RAM at reasonable cost. In addition, significant improvements have been made in the other resources involved in computer architecture. As a result, many different forms of advanced computer architecture, departing from the traditional line of approach, have been experimented with; some have already been implemented, while others are still undergoing further development, and together they mark an abrupt change in the conventional concept of computer architecture.
In spite of these achievements, computing demand was observed to remain ahead of the available computer facilities. Moreover, some real-life applications had by this time evolved that required system support well beyond the capabilities of the fastest contemporary machines. Such applications arise in disciplines including aerodynamics (aircraft control, ballistic missile control), seismology (seismic data processing), meteorology (weather forecasting), fluid-flow analysis, computer-aided design (CAD), nuclear and atomic physics, and many similar areas. With relentless progress in more advanced VLSI technologies, the cost of computer hardware fell sharply. Consequently, computer designers made all-out attempts to build machines consisting of some number of control units, ALUs, and memory modules that can operate in parallel, without substantially increasing cost, and the results eventually proved spectacular. This idea, when implemented, resolved many of the pending issues by exploiting a new concept known as processor-level parallelism. The use of multiple CPUs to achieve very high throughput, fault tolerance, and reliability opened up a new horizon, popularly known as parallel organisation. Moreover, while the failure of the CPU is almost always fatal to a single-CPU computer, a parallel system can be designed to continue functioning even in the event of failure of one of its CPUs, perhaps at a reduced performance level.
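The idea of processor-level parallelism described above can be sketched in ordinary Python. The names here (`partial_sum`, `parallel_sum`) are invented for the illustration, and the process pool merely stands in for the multiple CPUs of a parallel organisation; it is a minimal sketch, not a statement about any particular machine.

```python
# A minimal sketch of processor-level parallelism: one computation is
# broken into small tasks, each assigned to a separate worker process,
# and the partial results are combined at the end.
from multiprocessing import Pool

def partial_sum(chunk):
    # Each worker independently sums its own slice of the data.
    return sum(chunk)

def parallel_sum(data, n_workers=4):
    # Break the input into roughly one chunk per worker.
    size = max(1, len(data) // n_workers)
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with Pool(n_workers) as pool:
        # The workers run concurrently; the result equals sum(data).
        return sum(pool.map(partial_sum, chunks))

if __name__ == "__main__":
    print(parallel_sum(list(range(1000))))  # same value as sum(range(1000))
```

Note that even this toy example exposes the coordination burden the text mentions: the programmer must decide how to split the work, distribute it, and gather the results.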
Numerous questions now arise as to the design parameters:
i. How many complete CPUs or processing elements should be present, and what will be their characteristics and size?
ii. Should the number of control units be one or more than one?
iii. How should the memory modules be constructed; what will their characteristics and size be? Above all, how will they be organised?
iv. Last but not least, how should all these resources be interconnected?
Some designs envisaged a machine built with a relatively small number of powerful CPUs interconnected by an efficient, high-bandwidth communication network. Others used a large number of ALUs that were strongly connected and worked in unison, monitored by a single control unit. Another possibility that has been explored is to build a machine from many small stand-alone systems or workstations connected by a local area network. Numerous intermediate architectures have also been experimented with. Each design was found to have its own strengths and drawbacks once implementation complexity and cost were taken into account, apart from considerations of the intended usage environment. Last but not least, when a system is built with many processors so that various computations can proceed in parallel, it is tedious to break an application down into small tasks and then assign each one to an individual processor for parallel execution. Scheduling tasks on multiple processors, controlling the flow of execution, and coordinating the entire run demand the services of additional sophisticated hardware and related software techniques.
Classification of Computer Architectures: Flynn's Proposal
Computer architecture, however, has progressed through many different forms by this time, and finally has been able to include multiple CPUs in its design, which consists of a set of n > 1 processors (CPUs) P1, P2, P3, ..., Pn and m > 0 shared, distributed, or shared-as-well-as-distributed main memory units M1, M2, ..., Mm, interconnected using various forms of design approaches. In addition, different types of emerging I/O processors, associated with numerous categories of I/O devices, are also involved in the core architecture. The net result was that a new form of computer architecture, known as parallel computers, eventually evolved. Using this architecture, a more sophisticated style of application processing known as parallel processing came into use, which is usually defined in terms of both multiple processors (processing elements) and multiple streams of instructions.
Flynn's Classification: The world of computer architecture was by then flooded with many different proposals having numerous forms of architectural design with multiple CPUs (or processing elements) and multiple memory modules, ranging from relatively simple design approaches to the most sophisticated and versatile ones involving complex design methodologies and various amounts of basic hardware resources (CPU, main memory, and I/O). But whatever the form of the architecture, the fundamental approach in any of its forms is that a processor, after fetching instructions and operands from memory M (main memory or cache), executes the instructions on the related operands (data), and finally places the results in M. The instructions being executed form an instruction stream that flows from M to the processor. The data involved in the execution form another stream, called the data stream, that flows to and from the processor before and after the execution. Numerous schemes have been tried to classify the various forms of existing computer architecture, and the only scheme that is well accepted is that of Michael J. Flynn, who proposed a broad classification (rather a crude approximation) based on the number of simultaneous instruction and data streams handled by the processor during the execution of a program.
Considering all the constraints and bottlenecks that the system architecture usually encounters while executing parallel processing, let Im and Dm be the minimum number of active instruction and data streams that are being operated by the processor at its maximum capacity, exercising its full degree of parallelism. These two streams, Im and Dm, are to some extent independent, and thus four combinations of them can exist, which classify the world of computers into the four broad categories proposed by Flynn based on the values of Im and Dm associated with their CPUs.
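Since the four categories follow mechanically from whether each multiplicity is one or greater than one, the mapping can be written as a tiny function. This is only a sketch of the taxonomy's logic; the function name `flynn_class` is invented for the illustration.

```python
# Map the stream multiplicities (Im, Dm) to Flynn's category name.
def flynn_class(i_m, d_m):
    instr = "S" if i_m == 1 else "M"   # single or multiple instruction streams
    data = "S" if d_m == 1 else "M"    # single or multiple data streams
    return instr + "I" + data + "D"

print(flynn_class(1, 1))  # SISD: the classical Von Neumann machine
print(flynn_class(1, 8))  # SIMD: one instruction, many data elements
print(flynn_class(4, 1))  # MISD: many instructions, one data stream
print(flynn_class(4, 8))  # MIMD: many of both
```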
i. Single Instruction stream, Single Data stream (SISD): Im = Dm = 1. The classical sequential Von Neumann computer, with a single processor capable of executing only scalar arithmetic operations using a single instruction stream and one data stream at a time, falls into this category. All such machines, whether pipelined, superscalar, or any combination of these two architectures, also belong to this category.
ii. Single Instruction stream, Multiple Data stream (SIMD): Im = 1, Dm > 1. Here a single instruction is executed at a time, and hence a single control unit is required. Multiple data sets are operated on concurrently by this single instruction, and hence multiple ALUs are needed. This category is again divided into two subgroups: the first comprises the numeric supercomputers, and the second the parallel-type machines that operate on vectors, performing the same operation (single instruction) on each vector element (multiple data). Early parallel computers, such as the array processors ILLIAC IV and ICL DAP (Distributed Array Processor), had a single master control unit that broadcast each instruction to many independent ALUs, each holding its own data for simultaneous execution.
iii. Multiple Instruction stream, Single Data stream (MISD): Im > 1, Dm = 1. Multiple instructions operate on the same single piece of data. A single data stream passes through a pipeline and is processed by different (micro) instruction streams in different segments of the pipeline. CRAY-1, CYBER-205, and similar pipeline-processing computers are of this type. Fault-tolerant computers also fall into this category, where several CPUs process the same data using different programs in order to detect and eliminate faulty results.
iv. Multiple Instruction stream, Multiple Data stream (MIMD): Im > 1, Dm > 1. Multiple instruction streams execute simultaneously on multiple data streams. This requires multiple independent CPUs, operating as parts of a larger system, to execute several programs simultaneously or different parts of a single program concurrently. All multiprocessors, including most parallel processors, belong to this category. MIMD is again subdivided into two categories: MIMDs that use shared primary memory are called multiprocessors; those that do not are known as multicomputers, private-memory computers, or disjoint-memory computers.
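The two parallel categories above can be contrasted in miniature. In the sketch below (all names invented for the illustration), `simd_add` applies one operation in lockstep across many data elements, as an array processor's ALUs would under a single control unit, while `run_mimd` uses a thread pool as a stand-in for independent processors running different instruction streams on different data at the same time.

```python
# SIMD in miniature: one instruction (ADD) broadcast across every
# element of the data vectors, as if each (a, b) pair had its own ALU.
from concurrent.futures import ThreadPoolExecutor

def simd_add(vec_a, vec_b):
    return [a + b for a, b in zip(vec_a, vec_b)]

# MIMD in miniature: different instruction streams (sum vs. max) run
# concurrently on different data streams.
def run_mimd():
    with ThreadPoolExecutor(max_workers=2) as ex:
        f1 = ex.submit(sum, [1, 2, 3])   # first program, first data set
        f2 = ex.submit(max, [7, 4, 9])   # second program, second data set
        return f1.result(), f2.result()

print(simd_add([1, 2, 3, 4], [10, 20, 30, 40]))  # [11, 22, 33, 44]
print(run_mimd())  # (6, 9)
```

Real SIMD hardware performs the lockstep operation in a single machine instruction, of course; the list comprehension only models the one-instruction, many-data pattern.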
Figure 10.1 illustrates a framework of the different categories of machines. Flynn's taxonomy ends here, projecting a somewhat subjective viewpoint, which is essentially a behavioural one. This classification, although it schematically distinguishes computers by category through the distinction between instruction (control) and data, is still considered a coarse model (a rough estimate), since some machines nowadays are often found to be hybrids of these categories. In the CM-5 machine, for example, this hybrid approach is observed: a universal architecture has been realized combining the useful features and advantages of both SIMD and MIMD machines. Nevertheless, the classification is useful in that it provides a framework for the design space. However, it fails to provide anything that could indicate a computer's structure.

Figure 10.1 Flynn's classification of computers: (a) SISD uniprocessor framework, (b) SIMD framework (multiple processing elements), (c) MISD framework (multiple control units), and (d) MIMD framework (with shared memory).
Flynn's classification can be extended to split each category further into subgroups. In the following section, we divide each category in the domain of parallel computers (SIMD and MIMD) into finer sub-classes based on their architectural differences; this is depicted in Figure 10.2. Of the four machine models, most parallel computers built in the past assumed the MIMD model for general-purpose applications, while the SIMD and MISD models are particularly suitable for special-purpose computations. For this reason, MIMD is considered the most popular model, SIMD the next, and MISD the least popular one applied in machines used in commercial environments.