Arithmetic Pipeline
The arithmetic/logic unit (ALU), a part of the CPU, carries out all types of fixed-point and floating-point arithmetic and logical operations with finite precision. It does so using an integer unit (IU) and an inbuilt floating-point unit (FU), or a separate coprocessor that speeds up floating-point operations. Advanced RISC microprocessors, however, usually provide the hardware for all types of arithmetic operations, in both fixed-point and floating-point formats, on the same processor chip. Such arithmetic units can be pipelined to maximise performance.

Arithmetic units that perform scalar operations accept one pair of operands at a time and are called scalar arithmetic units. When pipelined, these units are usually controlled by software loops. Vector arithmetic units, on the other hand, accept a set of vector operands at a time; when pipelined, they are built as hardware pipelines under direct hardware or firmware control. Vector hardware pipelines are often implemented either as an add-on option to an existing scalar processor or as a separate stand-alone processor attached to the main unit and driven by a control processor. Both scalar and vector pipelined processors are used extensively in large mainframes and in supercomputers.

An arithmetic pipeline is commonly used to implement complex arithmetic functions such as floating-point addition, multiplication and division. These operations can be implemented with hardware that carries out the basic add and/or shift operations. A pipelined multiply unit is essentially an array multiplier with special adders that accumulate the partial products in a way that reduces carry propagation time. Floating-point operations can be decomposed into a sequence of subfunctions, each carried out by a corresponding suboperation.

Adder Pipeline Design
The floating-point adder pipeline takes as input two normalised floating-point binary numbers, which are:
$$A = M_1 \times 2^{E_1}, \qquad B = M_2 \times 2^{E_2}$$

where $M_1$ and $M_2$ represent the mantissas and $E_1$ and $E_2$ are the exponents. Floating-point addition and subtraction can be performed by a pipeline consisting of four main stages (segments), with one suboperation per stage: comparison of the exponents, alignment of the mantissas (the mantissa of the number with the smaller exponent is shifted right by the difference of the exponents), addition/subtraction of the mantissas, and normalisation of the result. Latches are placed between these stages to store the intermediate results. A brief description of the function executed by each stage, with an appropriate figure, is provided on the website http://routledge.com/9780367255732.

Multiplication Pipeline Design
To speed up multiplication, the pipelined architecture requires a carry propagation adder/carry lookahead adder (CPA/CLA), which adds the partial products to generate the result (see Chapter 7). Alternatively, the pipelined architecture can use a carry save adder (CSA) to add several n-bit numbers such as X, Y, Z, ..., each expressed as $X = (x_{n-1}, x_{n-2}, \ldots, x_1, x_0)$. The CSA produces one bitwise sum output number, denoted as
$$S' = (0, S_{n-1}, S_{n-2}, \ldots, S_1, S_0)$$

where $S_i = x_i \oplus y_i \oplus z_i$, and a carry output $C = (C_n, C_{n-1}, \ldots, C_1, 0)$ whose bits are $C_{i+1} = x_i y_i + y_i z_i + z_i x_i$ (the carry out of column $i$). The leading bit of the bitwise sum $S'$ is always 0, and the tail bit of the carry vector $C$ is always 0. The CSA performs the bitwise operations ($\oplus$) simultaneously on all columns of digits to produce the output number $S'$; the carries are not allowed to propagate but are instead saved in the carry vector $C$, so that $X + Y + Z = S' + C$. The final result is obtained by adding the output number $S'$ and the carry vector $C$ using a CPA/CLA.

Carry-save multiplication is well suited to pipelined implementation. The CPA and CSAs can be used together to implement the pipeline stages of a fixed-point multiplication unit. Figure 8.19 depicts a pipelined architecture for multiplying two unsigned 4-bit numbers using CSAs and a CPA/CLA. The first stage generates the partial products $P_1$, $P_2$, $P_3$ and $P_4$. The right-hand side of Figure 8.19 shows how $P_1$ is generated; the other partial products are generated in the same way. The second and third stages add the partial products through a Wallace tree of CSAs, and the final stage is a CPA, which adds the last two numbers to produce the final product $P$. For 8-bit operands, four such CSA stages are required. Each CSA level can be realised with two-level gate logic. Synchronising the operation of the CSAs and the CPA is one of the dominant factors in determining both the number of pipeline stages and the clock period. The Motorola 68040, a member of the 68000 series of single-chip 32-bit microprocessors introduced in 1990, implements the carry-save multiplication method discussed above.
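The four-stage floating-point adder pipeline described above (compare exponents, align mantissas, add/subtract, normalise) can be sketched in Python as a toy model. The format and all names here are assumptions for illustration, not the book's design: a number is a pair (m, e) with value m × 2^e, the mantissa m is a signed integer normalised so that 2^(W-1) ≤ |m| < 2^W, W = 8 is an arbitrary width, and bits shifted out during alignment are simply dropped (no guard, round or sticky bits). The list `latches` stands in for the latches between stages, so in each clock all four stages work on different operand pairs.

```python
from typing import List, Optional, Tuple

W = 8  # assumed mantissa width in bits (arbitrary choice for this toy model)

Num = Tuple[int, int]  # (m, e): value = m * 2**e, with 2**(W-1) <= |m| < 2**W unless m == 0


def stage1_compare(a: Num, b: Num) -> Tuple[int, int, int, int]:
    """Stage 1: compare exponents and put the operand with the larger exponent first."""
    (ma, ea), (mb, eb) = a, b
    if ea < eb:
        (ma, ea), (mb, eb) = (mb, eb), (ma, ea)
    return ma, mb, ea, ea - eb  # larger exponent and difference d >= 0


def stage2_align(s: Tuple[int, int, int, int]) -> Tuple[int, int, int]:
    """Stage 2: shift the smaller operand's mantissa right by d; shifted-out bits are dropped."""
    ma, mb, e, d = s
    mb_aligned = (abs(mb) >> d) * (1 if mb >= 0 else -1)  # truncate the magnitude toward zero
    return ma, mb_aligned, e


def stage3_add(s: Tuple[int, int, int]) -> Tuple[int, int]:
    """Stage 3: add the aligned signed mantissas (a negative mantissa gives subtraction)."""
    ma, mb, e = s
    return ma + mb, e


def stage4_normalise(s: Tuple[int, int]) -> Num:
    """Stage 4: shift the mantissa back into [2**(W-1), 2**W) and adjust the exponent."""
    m, e = s
    if m == 0:
        return 0, 0
    sign, mag = (1 if m > 0 else -1), abs(m)
    while mag >= 1 << W:        # carry out of the top bit: shift right
        mag >>= 1
        e += 1
    while mag < 1 << (W - 1):   # leading zeros after cancellation: shift left
        mag <<= 1
        e -= 1
    return sign * mag, e


def run_pipeline(pairs: List[Tuple[Num, Num]]) -> Tuple[List[Num], int]:
    """Clock operand pairs through the four stages; latches[i] holds stage i+1's output."""
    latches: List[Optional[tuple]] = [None, None, None, None]
    pending = list(pairs)
    results: List[Num] = []
    clocks = 0
    while pending or any(x is not None for x in latches[:3]):
        clocks += 1
        # Every stage reads the old latch contents, so all four stages work in parallel.
        latches = [
            stage1_compare(*pending.pop(0)) if pending else None,
            stage2_align(latches[0]) if latches[0] is not None else None,
            stage3_add(latches[1]) if latches[1] is not None else None,
            stage4_normalise(latches[2]) if latches[2] is not None else None,
        ]
        if latches[3] is not None:
            results.append(latches[3])
    return results, clocks


def value(n: Num) -> float:
    """Decode (m, e) to its numeric value m * 2**e."""
    m, e = n
    return m * 2.0 ** e
```

For example, the four pairs (128, 0)+(128, 0), (192, 0)+(128, -2), (192, 0)+(-128, 0) and (128, -1)+(128, 1) come out as (128, 1), (224, 0), (128, -1) and (160, 1), that is 256, 224, 64 and 320, after 7 clocks. With four stages, n operand pairs take n + 3 clocks rather than the 4n a non-pipelined unit of the same stage delays would need.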
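The carry-save scheme and the four-stage pipelined multiplier for unsigned 4-bit numbers described above can be modelled at the bit level as follows. This is a sketch under assumptions: all function names are hypothetical, Python integers stand in for bit vectors, the CPA is modelled as a ripple-carry adder (a CLA would produce the same sum with less delay), and the numbering of the partial products ($P_{j+1}$ from multiplier bit $b_j$) is chosen for illustration. `csa` computes $S_i = x_i \oplus y_i \oplus z_i$ and $C_{i+1} = x_i y_i + y_i z_i + z_i x_i$, so that $x + y + z = S' + C$. Stage 1 forms the partial products, stages 2 and 3 form a two-level Wallace tree of CSAs, and stage 4 is the CPA. `csa_levels` counts the CSA levels a Wallace tree needs, reproducing the two levels used for four partial products and the four levels stated for 8-bit operands.

```python
from typing import List, Optional, Tuple

N = 4  # operand width in bits, matching the unsigned 4-bit example


def csa(x: int, y: int, z: int) -> Tuple[int, int]:
    """Carry-save adder: a full adder in every column; carries are saved, not propagated."""
    s = x ^ y ^ z                             # S_i = x_i XOR y_i XOR z_i
    c = ((x & y) | (y & z) | (z & x)) << 1    # C_{i+1} = majority(x_i, y_i, z_i); tail bit is 0
    return s, c


def cpa(a: int, b: int, width: int) -> int:
    """Carry-propagate adder, modelled as a ripple-carry adder over `width` bits."""
    result, carry = 0, 0
    for i in range(width):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        result |= (ai ^ bi ^ carry) << i
        carry = (ai & bi) | (ai & carry) | (bi & carry)  # carry ripples into column i+1
    return result | (carry << width)


def stage1_partial_products(a: int, b: int) -> Tuple[int, ...]:
    """Stage 1: P_1..P_4, where P_{j+1} is A AND b_j, shifted left j places."""
    return tuple((a if (b >> j) & 1 else 0) << j for j in range(N))


def stage2_csa(p: Tuple[int, ...]) -> Tuple[int, int, int]:
    """Stage 2: the first CSA level reduces P_1, P_2, P_3 to (S, C); P_4 passes through."""
    p1, p2, p3, p4 = p
    s, c = csa(p1, p2, p3)
    return s, c, p4


def stage3_csa(t: Tuple[int, int, int]) -> Tuple[int, int]:
    """Stage 3: the second CSA level reduces (S, C, P_4) to the last two numbers."""
    return csa(*t)


def stage4_cpa(t: Tuple[int, int]) -> int:
    """Stage 4: a CPA adds the last two numbers; the product fits in 2N bits."""
    s, c = t
    return cpa(s, c, 2 * N)


def run_multiplier(pairs: List[Tuple[int, int]]) -> Tuple[List[int], int]:
    """Clock operand pairs through the four stages; latches[i] holds stage i+1's output."""
    later_stages = [stage2_csa, stage3_csa, stage4_cpa]
    latches: List[Optional[object]] = [None, None, None, None]
    pending, products, clocks = list(pairs), [], 0
    while pending or any(x is not None for x in latches[:3]):
        clocks += 1
        # Each stage reads the previous clock's latch contents, so all stages run in parallel.
        new = [stage1_partial_products(*pending.pop(0)) if pending else None]
        new += [f(x) if x is not None else None for f, x in zip(later_stages, latches[:3])]
        latches = new
        if latches[3] is not None:
            products.append(latches[3])
    return products, clocks


def csa_levels(k: int) -> int:
    """Count the CSA levels a Wallace tree needs to reduce k operands to two."""
    levels = 0
    while k > 2:
        k = 2 * (k // 3) + k % 3  # each CSA turns 3 operands into 2; leftovers pass through
        levels += 1
    return levels
```

For example, `run_multiplier([(13, 11), (15, 15)])` returns `([143, 225], 5)`: the second product leaves the pipeline one clock after the first. Keeping every intermediate result in carry-save form until stage 4 is what confines the slow carry propagation to a single stage.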