# Arithmetic Pipeline

The arithmetic/logic unit (ALU), a part of the CPU, carries out fixed-point and floating-point arithmetic as well as logical operations, all with finite precision. Fixed-point work is handled by an integer unit (IU); floating-point work is handled by a built-in floating-point unit (FU) or by a separate coprocessor that speeds up floating-point operations. Advanced RISC microprocessors, however, usually provide the hardware for all arithmetic operations, in both fixed-point and floating-point formats, on the same processor chip. Such arithmetic units can be pipelined to maximise performance.

Arithmetic units that perform scalar operations accept one pair of operands at a time and are called scalar arithmetic units. When pipelined, these units are usually controlled by software loops. Vector arithmetic units, on the other hand, accept a set of vector operands at a time and, when pipelined, operate under direct hardware or firmware control. Vector hardware pipelines are often implemented as an additional option on an existing scalar processor unit or as a separate stand-alone processor attached to the main unit and driven by a control processor. Both scalar and vector pipelined processors are used extensively in large mainframes and in supercomputers.

An arithmetic pipeline is commonly used to implement complex arithmetic functions such as floating-point addition, multiplication, and division. These operations can be implemented with some form of hardware that carries out the basic add and/or shift operations. A pipelined multiply unit is essentially an array multiplier with special adders designed to generate the partial products in a way that reduces the carry propagation time. Floating-point operations can be decomposed into consecutive subfunctions and corresponding suboperations.

## Adder Pipeline Design

The floating-point adder pipeline takes as input two normalised floating-point binary numbers of the form $M_1 \times 2^{E_1}$ and $M_2 \times 2^{E_2}$, where $M_1$ and $M_2$ represent the mantissas and $E_1$ and $E_2$ are the exponents. Floating-point addition and subtraction can be performed by a pipeline of four main stages (segments), with one suboperation per stage: comparison of the exponents, alignment of the mantissas, addition or subtraction of the mantissas, and normalisation of the result. Latches are placed between these stages to store the intermediate results.

A brief description of the function executed by each stage, with an accompanying figure, is provided on the website: http://routledge.com/9780367255732.
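The four stages named above can be sketched in software, one function per stage, with the tuple passed between functions standing in for a latch. This is only an illustrative model, not the hardware design from the figure: the integer mantissa representation (value = mantissa × 2^exponent), the 8-bit mantissa width `MANT_BITS`, truncation during alignment, and all function names are assumptions made for the example.

```python
MANT_BITS = 8  # assumed mantissa width; value = mantissa * 2**exponent


def compare_exponents(a, b):
    """Stage 1: order the operands so the first has the larger exponent."""
    (m1, e1), (m2, e2) = a, b
    if e1 < e2:
        (m1, e1), (m2, e2) = (m2, e2), (m1, e1)
    return m1, m2, e1, e1 - e2  # larger-exponent mantissa first, plus shift count


def align_mantissas(m_big, m_small, exp, shift):
    """Stage 2: shift the smaller operand's mantissa right (truncating)."""
    sign = -1 if m_small < 0 else 1
    return m_big, sign * (abs(m_small) >> shift), exp


def add_mantissas(m_big, m_small, exp):
    """Stage 3: add the aligned mantissas (subtraction is a negative mantissa)."""
    return m_big + m_small, exp


def normalise(m, exp):
    """Stage 4: bring the magnitude back into [2**(MANT_BITS-1), 2**MANT_BITS)."""
    if m == 0:
        return 0, 0
    sign = -1 if m < 0 else 1
    mag = abs(m)
    while mag >= (1 << MANT_BITS):       # mantissa overflow: shift right
        mag >>= 1
        exp += 1
    while mag < (1 << (MANT_BITS - 1)):  # leading zeros: shift left
        mag <<= 1
        exp -= 1
    return sign * mag, exp


def fp_add(a, b):
    """Run one operand pair through the four stages in order."""
    latch1 = compare_exponents(a, b)
    latch2 = align_mantissas(*latch1)
    latch3 = add_mantissas(*latch2)
    return normalise(*latch3)
```

For example, `fp_add((192, 0), (128, -2))` adds 192 and 32 and returns `(224, 0)`. In a real pipeline the four stages would work on four different operand pairs in the same clock cycle; the sequential calls here only show the data flow.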

## Multiplication Pipeline Design

To speed up the multiplication operation, the pipelined architecture can use a carry propagation adder/carry lookahead adder (CPA/CLA), which adds partial products to generate the result (see Chapter 7). The pipelined architecture can alternatively use a carry save adder (CSA) to add n-bit numbers such as X, Y, and Z, each expressed in the form $X = (x_{n-1}, x_{n-2}, \ldots, x_1, x_0)$, and produce one bitwise sum output number, denoted as $S' = (0, S_{n-1}, S_{n-2}, \ldots, S_1, S_0)$ where $S_i = x_i \oplus y_i \oplus z_i$, and a carry output $C = (C_n, C_{n-1}, \ldots, C_1, 0)$.

The leading bit of the bitwise sum $S'$ is always 0, and the tail bit of the carry vector $C$ is always 0. The CSA performs the bitwise operation ($\oplus$) simultaneously on all columns of digits to produce the output number $S'$; the carries are not allowed to propagate and are instead saved in the carry vector $C$. The result is finally obtained by adding the output number ($S'$) and the carry vector ($C$) using a CPA/CLA.
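The carry-save step described above can be modelled on Python integers as a minimal sketch. The sum rule is the bitwise XOR given in the text; the carry rule used here is the standard full-adder majority function, which the text does not spell out and is therefore an assumption of this example, as is the function name `carry_save_add`.

```python
def carry_save_add(x, y, z):
    """Reduce three non-negative integers to a (sum, carry) pair.

    No carry ripples between bit positions: every column is handled
    independently, exactly as if all columns were computed at once.
    """
    bitwise_sum = x ^ y ^ z                   # S_i = x_i XOR y_i XOR z_i
    carry = ((x & y) | (y & z) | (z & x)) << 1  # carry out of column i lands in
                                                # column i+1, so the tail bit is 0
    return bitwise_sum, carry


# Only the final addition of the two vectors needs a carry-propagating adder.
s, c = carry_save_add(9, 5, 3)
total = s + c  # plays the role of the CPA/CLA
```

With the operands 9, 5, and 3, the bitwise sum is 15 and the carry vector is 2, and their ordinary sum is 17.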

Carry-save multiplication is well suited to pipelined implementation. The CPA and CSAs can be used together to implement the pipeline stages of a fixed-point multiplication unit. Figure 8.19 depicts a pipelined architecture for multiplying two unsigned 4-bit numbers using CSA and CPA/CLA. The first stage generates the partial products $P_1$, $P_2$, $P_3$, and $P_4$. The right-hand side of Figure 8.19 shows how one of these partial products is generated, and the others can be generated in the same way. The second and third stages add the partial products through a Wallace tree of CSAs, and the final stage is a CPA that adds the last two numbers to produce the final product P.
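The four stages of this 4-bit multiplier can be traced with a short sketch. Figure 8.19 is not reproduced here, so the grouping of partial products in the two CSA levels (three in the first, then the two outputs plus the remaining partial product) is an assumption, as are the function names; the list `p[0]` to `p[3]` stands for $P_1$ to $P_4$.

```python
def carry_save_add(x, y, z):
    """Three-input carry-save adder: returns (bitwise sum, carry vector)."""
    return x ^ y ^ z, ((x & y) | (y & z) | (z & x)) << 1


def multiply_4bit(x, y):
    """Multiply two unsigned 4-bit numbers stage by stage."""
    if not (0 <= x < 16 and 0 <= y < 16):
        raise ValueError("operands must be unsigned 4-bit values")

    # Stage 1: one partial product per multiplier bit, shifted into place.
    p = [(x << j) if (y >> j) & 1 else 0 for j in range(4)]

    # Stage 2: first CSA level reduces three partial products to two numbers.
    s1, c1 = carry_save_add(p[0], p[1], p[2])

    # Stage 3: second CSA level folds in the remaining partial product.
    s2, c2 = carry_save_add(s1, c1, p[3])

    # Stage 4: a carry-propagating add of the last two numbers gives P.
    return s2 + c2
```

As a quick check, `multiply_4bit(13, 11)` returns 143. Carries propagate only in the last stage, which is why that stage tends to set the clock period.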

For 8-bit operands, four such stages of CSA are required. Each level of CSA can be realised with two-gate-level logic. Synchronising the operation of the CSAs and the CPA is one of the dominant factors in determining the number of pipeline stages as well as the clock period to be used. The Motorola 68040, a member of the 68000 series of one-chip 32-bit microprocessors introduced in 1990, implements the carry-save multiplication method discussed above.
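The stage counts quoted for 4-bit and 8-bit operands can be reproduced with a small calculation. The sketch assumes a Wallace-style reduction in which every level groups the current operands in threes, each CSA turns its three inputs into two outputs, and any leftover operands pass through unchanged; `csa_levels` is a hypothetical helper name.

```python
def csa_levels(operands):
    """Count the CSA levels needed to reduce `operands` numbers to two."""
    levels = 0
    while operands > 2:
        operands -= operands // 3  # each full group of three shrinks by one
        levels += 1
    return levels
```

Under these assumptions, four partial products need two CSA levels (4 → 3 → 2) and eight need four (8 → 6 → 4 → 3 → 2), which agrees with the figures given in the text.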
