Home Computer Science



FixedPoint ArithmeticTable of Contents:
Four basic arithmetic operations for fixedpoint as well as for floatingpoint numbers are addition, subtraction, multiplication, and division. The arithmetic algorithms and the related logic circuits needed to implement these arithmetic operations are the main focus of this section and onwards. It has been shown that two nbit signed numbers expressed in 2's complement representation can be added/subtracted using nbit binary addition/subtraction, FIGURE 7.4 Representative block diagram of ALU with inputs and outputs. treating the sign bit the same as the other bits. In other words, a logic circuit that is designed to add/subtract unsigned binary numbers can also be used equally well to add/subtract signed numbers in 2's complement form. The time needed by an ALU to perform an arithmetic operation often influences the performance of the processor. Addition/subtraction in this regard takes relatively lesser time than multiplication and division which require more complex circuitry than either addition or subtraction operation. That is why, some modern techniques are used in today's advanced computers that can perform all arithmetic operations at a comparatively higher speed. Addition and SubtractionAddition and subtraction for fixedpoint numbers are the most fundamental operations that are found in the instruction set of every computer since inception. The elementary scheme to accomplish addition and subtraction is shown in the form of a basic circuit in Figure 7.5, which consists of a binary adder as central element that executes addition/ subtraction operation with two unsigned integers sent from two registers X and Y, and then produces the result with an overflow indication. The result after computation may be stored in one of these registers or in a third register (depending on the design). The overflow, if happened, is usually stored in a 1bit flag (for 0 = no overflow, 1 = overflow). For subtraction, the subtrahend, i.e. here the content of register Y, is passed through a 2's complementer so that the 2's complement of its content can be presented to the adder. Appropriate signals from the control unit are here required to ensure whether the contents of register Y will need to go through the complementer or not, which again depends on whether the operation is subtraction or addition, decoded by the control unit in advance. To implement all these, the required hardware can, however, be designed in many different ways that involve various tradeoffs between operating speed and related hardware cost. FIGURE 7.5 Schematic block diagram with hardware components for addition and subtraction. Basic AddersIn general, the addition of two nbit numbers X and Y is performed by subdividing the numbers into stages X, and Y, of length n, where n > «,> 1. X, and Y, are added separately, and the resulting partial sums are then combined to form the overall sum. The formation of this sum, however, involves assimilation of carry bits generated by the partial additions. However, a 1bit adder (bitbybit addition), also called a full adder, can be directly implemented in various ways that can be modified even to work as a full subtracter, since subtraction of a number (Y) from another number (X) can be defined as:
Thus, the subtraction can also be done using a full adder. Parallel adder: Circuits that can add all bits of two nbit numbers at a time, together with an external carryin signal c_{in}, in one clock cycle are called parallel adders or simply «bit adders. The simplest such adder can be formed by connecting n numbers of 1bit full adder giving rise to a cascade configuration of nbit full adder. As each 1bit adder stage may supply a carrybit to the stage on its left, all such carry signals or ripples must then propagate through this (cascade) adder from right to left, giving the name of this configuration an nbit ripplecarry adder or carrypropagation adder (CPA). In the worst case, a carry signal can ripple through all « stages of the adder. The input carry signal c_{in} is normally set to 0 for addition. The maximum propagation delay of an «bit ripplecarry adder is nd (d is the delay of a fulladder stage), which essentially determines its operating speed in synchronous circuit design. This adder is, however, relatively expensive in terms of hardware cost than a serial adder, and the cost simply increases linearly with «, the word size of the numbers to be added. SubtractersThe «bit adders can operate addition correctly on both unsigned and positive numbers such as X and Y, since the 0 sign bit of a positive number has the same effect as a leading 0 in an unsigned number. To add negative numbers (they have las the sign bit in 2's complement representation), like adding X to Y, is equivalent to subtracting X from Y. Similarly, subtracting X from Y is equivalent to adding X to Y. So the ability to add negative numbers implies the ability to perform subtraction. A brief detail of different types of adders and subtracters with their respective figures is given in the website: http://routledge.com/9780367255732. HighSpeed AdderClassical adders suffer from several major drawbacks: one of these is that they cause too much delay in developing their outputs, which are further increased while implementing overflow detection. To minimize all these inherent shortcomings, one approach is to use an enhanced logicgate network structure (larger than the usual) that ultimately must speed up the generation of carry signals. In other words, this approach must provide some means to compute the input carry beforehand, which can be arrived at any stage i directly, similar to carrylike signals obtained from all the preceding stages i  1, i  2,..., 0, rather than waiting for usual production of actual carries that ripple slowly from stage to stage. Adders that use this principle are called carrylookahead adders. CarryLookahead Adder (CLA)The basic principle lying behind an «bit CLA is that it is formed from n stages, each of which is basically a full adder, but modified by replacing its carry output line c, at stage i by two auxiliary signals called g, (generate) and p, (propagate), respectively. Let z, be the sum, and c,be the carryout of two 1bit numbers x,and y,at stage i. Then, by suitable mathematical calculation, it can be shown that the generate function g, and the propagate function p, at stage i can be defined only in terms of x, and y, at the stage i without depending on the value of the carry of its just previous stage c, _Moreover, all g_{i} and p, functions for all i's in an яbit adder can be formed independently and in parallel in one logicgate delay after the X and Y vectors are applied to the inputs of the adder. Similarly, the output function z, at the stage i can also be defined in terms of x, and y, at that stage i. Although a substantial improvement is thus achieved by this design in reducing critical gate delays, the number of gates, however, grows almost in proportion to n^{2} as n increases. In contrast, the number of gates in a twolevel adder of the sumofproducts type grows exponentially with n, while the number of gates in a ripplecarry adder grows linearly with n. Moreover, the complexity of the carrygeneration logic embedded in the CLA, including its gate count, its maximum fanin, and its maximum fanout, increases steadily with the increase in n. Moreover, it is found that the last AND gate and the OR gate require a fanin of i + 2 in generating c,. For c_{3} in the 4bit adder, a fanin of 5 is thus required. This is not only almost about the limit of practical gates, but consideration of cost in practice is also a matter of concern that often limits n in a single CLA module not to exceed four or so. More details of CLA with mathematical calculations and corresponding figures are given in the website: http://routledge.com/9780367255732. Adder ExpansionCLA, no doubt, is relatively enriched with several distinct advantages compared to all of its contemporary counterparts. But the inherent shortcomings as already explained encompassed with its embedded carrylookahead logic often limit n in the addition/subtraction operation of иbit integers to keep и bound within four or so. That is why, this adder design cannot be directly extended to longer operand (increased value of «) sizes. One possible solution in this regard may be to improvise a method by combining ripplecarry propagation and carrylookahead approaches together to let carry signals be handled in a reasonably efficient way. This approach, however, can be easily exploited to design larger adders of this kind needed to execute add instructions with longer operand sizes, say up to 64 bits. If we replace « number of 1bit (full) adder stages in the «bit ripple carry design with и number of Lbit CLAs, we obtain an nkbit adder. For example, four 4bit adders, each one of a 4bit carrylookahead circuit, can be then connected in ripple carry design to realize a relatively larger 16bit adder. Similar approach can be employed with eight 4bit CLAs to be connected to form a 32bit adder. The detail of this approach with related figure is given in the website: http://routledge. com/9780367255732. CarrySave Adder (CSA)The inherent drawbacks of CLA, or of any derived adder based on this principle, often limit и in the addition/subtraction operation of иbit integer that keeps « bound within a small range. In practice, arithmetic operation often requires longer operand (increased value of n) and, at the same time, the addition of several operands at a time to realize the final result (as happens with multiplication operation that requires the addition of several partial products as summands to arrive at the final result). Moreover, simultaneous addition of multiple operands is almost a regular feature in any computation, apart from the event of parallel processing. Hence, more powerful fast adders are required for negotiating such situation that can add many operands at a time instead of only two. A technique thus devised by which it is possible to accomplish highspeed multioperand addition is known as carrysave addition, and the respective adder is called carrysave adder (CSA) that eventually speeds up the addition process. To illustrate the strength and effectiveness of such an adder, consider the following example with decimal integers as shown in Figure 7.6. In this example, four decimal numbers are taken for addition. Firstly, all the digits in the unit place (first column) are added that produce a sum of 9 and a carry digit of 1 written in the second line shown by arrow (<—) shifted left by one position. Similarly, all the digits in the tens place are added that produce a sum of 0 and a carry digit of 2 which is written in a similar way like the addition of digits in the unit place. Likewise, all the digits in the hundreds place are added that produce a sum of 8 and a carry digit of 1 which is written in a similar way as mentioned above. These summations (columnwise) can be then executed in parallel to produce a sum vector of 809 and a carry vector of 121, since no propagation of carry is required from the addition of digits in units place to the addition process of digits in tens place, and similarly from the tens place to the hundreds place, and so on. When the addition of all digits present in the operands is over, thereby producing a sum vector and a carry vector, the sum vector and the shifted carry vector are then simply added in the conventional way using either CPA or CLA that produces the final result. Assume that the CSA here takes three binary numbers I, J, and K, and produces two outputs, namely the sum vector S and the carry vector C. The sum vector S and the carry vector C, however, can be obtained using the following expressions:
and
Here, all logical operations are performed bitwise. FIGURE 7.6 An illustration of carrysave addition technique with decimal numbers. FIGURE 7.7 A representative scheme of a twostage CSA. The final arithmetic sum of three inputs, i.e. Z = I + J + K, is obtained by adding the two generated outputs S and C, i.e. Z = I + J + K = C + S, using a CPA or CL A. In general, an иbit CSA consists of n disjoint full adders. As shown in Figure 7.7, its input is three nbit numbers to be added, while the output consists of the и sum bits forming a word S, and the и carry bits forming a word C. Unlike the adders discussed so far, there is no carry propagation within the individual adders. The outputs S and C can be fed into another иbit CSA where, as shown in Figure 7.7, they can be added to a fourth иbit number L. It is to be noted that the carry connections are shifted to the left to correspond to normal carry propagation. In this way, m numbers can be added using a treelike network of CSAs to produce a result of the form (S, C). However, to obtain the final sum, S and C must be added by a conventional adder (CPA or CLA) with usual carry propagation. Carrysave addition mechanism appears to be highly conducive and well suited to pipelined implementation. Multistage CSA circuit can now be built up in the form of a Wallace tree (see Chapter 8) to handle these groups (all groups in one stage) as just explained, and can be then implemented as a sequence of substages within the multiplier circuit built in the execute stage in a pipelined architecture. The type of operands used can also, however, be modified easily to handle floatingpoint numbers. The input significands (imantissa) are then processed in a fixedpoint multiplier pipeline. The exponents are combined by a separate fixedpoint adder, and a normalization circuit can be then included at the appropriate position in the execute stage within the pipeline. 
<<  CONTENTS  >> 

Related topics 