


Floating-Point Arithmetic
Let X and Y be two floating-point numbers, expressed as (X_S, X_E) and (Y_S, Y_E), respectively, so that the numerical value of X is X_S × B^{X_E} and that of Y is Y_S × B^{Y_E}. The discussion rests on the following assumptions:

• X_S is an n_S-bit two's complement or sign-magnitude binary fraction;
• X_E is an n_E-bit integer in excess-2^{n_E−1} code, implying an exponent bias of 2^{n_E−1}; ^{[1]}

TABLE 7.1 Rules for Basic Operations Involving Floating-Point Operands
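Table 7.1 itself is not reproduced in this extract. As a hedged reconstruction from the definitions above (not the book's own table), and assuming X_E ≤ Y_E with X_E and Y_E taken as true, unbiased exponent values, the four basic operations are commonly written as:

```latex
\begin{aligned}
X + Y &= \left(X_S \times B^{X_E - Y_E} + Y_S\right) \times B^{Y_E} \\
X - Y &= \left(X_S \times B^{X_E - Y_E} - Y_S\right) \times B^{Y_E} \\
X \times Y &= \left(X_S \times Y_S\right) \times B^{X_E + Y_E} \\
X / Y &= \left(X_S / Y_S\right) \times B^{X_E - Y_E}
\end{aligned}
```

For instance, with B = 2, X = 0.75 × 2^{3} = 6 and Y = 0.5 × 2^{4} = 8, the first rule gives X + Y = (0.75 × 2^{−1} + 0.5) × 2^{4} = 0.875 × 16 = 14.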
Basic operations: General methods for floating-point addition, subtraction, multiplication, and division are given in Table 7.1. For addition and subtraction, both operands must have the same exponent value, which may require shifting the radix point of one operand to achieve alignment. Multiplication and division are relatively simple, because the significands (mantissas) and exponents can be processed independently.

Floating-point operations normally produce a representable result, but they may also give rise to one of the following conditions:

i. Significand underflow occurs while aligning significands, when digits flow off the right end of the significand. Some form of rounding is then required, as explained later.
ii. Significand overflow occurs when the addition of two significands of the same sign results in a carry out of the most significant bit. This can be fixed by realignment, as explained later.
iii. Exponent underflow occurs when a negative exponent becomes less than the minimum possible exponent value in the prescribed format (e.g. −145 is less than −127). The number is then too small to be represented and may be treated as 0.
iv. Exponent overflow occurs when a positive exponent exceeds the maximum possible exponent value defined in the prescribed format. Some systems designate this result as +∞ or −∞.

Addition and Subtraction

Floating-point addition and subtraction are relatively complex, since the exponents of the two input operands must be made equal before the corresponding significands can be added or subtracted. In the floating-point format already described, the two operands must first be placed in registers within the ALU before the operation is executed.
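The align-then-add procedure can be sketched in Python as follows. This is only an illustrative sketch: the name `fp_add`, the radix B = 2, the normalization range 1/2 ≤ |significand| < 1, and the use of exact `Fraction` significands (so that no digits are lost during alignment) are assumptions made here, not part of the text. A hardware unit works with fixed-width significands and must round.

```python
from fractions import Fraction

B = 2  # radix; assumed binary for this sketch


def fp_add(x, y):
    """Add two (significand, exponent) pairs, where value = significand * B**exponent."""
    (xs, xe), (ys, ye) = x, y

    # Align: shift right the significand of the operand with the smaller exponent.
    if xe < ye:
        xs, xe = xs / B ** (ye - xe), ye
    else:
        ys, ye = ys / B ** (xe - ye), xe

    s, e = xs + ys, xe  # both exponents are now equal
    if s == 0:
        return (Fraction(0), 0)

    # Significand overflow: realign to the right and increase the exponent.
    while abs(s) >= 1:
        s, e = s / B, e + 1

    # Normalize: shift left until the leading fraction digit is non-zero.
    while abs(s) < Fraction(1, B):
        s, e = s * B, e - 1

    return (s, e)
```

For example, `fp_add((Fraction(3, 4), 3), (Fraction(1, 2), 1))` adds 6 and 1 and returns `(Fraction(7, 8), 3)`, i.e. 0.875 × 2^{3} = 7. To subtract, negate the second significand before calling the function.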
The floating-point format includes an implicit bit in the significand, which must be made explicit when the operation is executed. The procedures for addition and subtraction are given in Table 7.1. During addition/subtraction, if the signs of the two numbers are the same, significand overflow is possible, and correcting it may in turn cause exponent overflow. In that case the condition is reported, the operation may be halted, and further corrective action is required. After addition/subtraction, the result may need to be normalized, which may cause exponent underflow; again, suitable action must be taken to resolve the situation. A typical flowchart for performing addition/subtraction incorporating all the activities mentioned in Table 1, along with a solved example, is given on the website: http://routledge.com/9780367255732.

Implementation: Floating-Point Unit

A floating-point arithmetic unit can be built by connecting two loosely coupled fixed-point arithmetic circuits, one used as an exponent unit and the other as a significand (mantissa) unit. Since the significand unit must perform all four basic arithmetic operations on the significands, a conventional fixed-point arithmetic circuit (described earlier) can be used for this purpose. The exponent unit is implemented by a simpler circuit, capable only of adding, subtracting, and comparing the exponents of the input operands. Exponents can be compared with a comparator or by subtracting one from the other. On this basis, a floating-point unit can be built along the lines of the schematic shown in Figure 7.18. The exponents of the input operands are loaded into registers E1 and E2, which are connected to an adder that computes E1 + E2.
The comparison of exponents required for addition and subtraction is made by computing E1 − E2 (i.e. E1 + (−E2), which is essentially an addition) and placing the result in a counter E. The larger exponent is then determined from the sign of E. The shifting of one of the significands (mantissas) required before the significands are added or subtracted is controlled by E: the magnitude of E is sequentially decremented to zero, and after each decrement the corresponding significand in the significand unit is shifted by one digit position. Once the significands have been aligned (i.e. the exponents of the two input numbers X and Y have been equalized, which is the case when E becomes 0), they are processed in the usual manner for the arithmetic operation required. The exponent of the result is also computed and placed in E.

FIGURE 7.18 Schematic block diagram of a floating-point arithmetic unit.

All computers have fixed-point as well as floating-point arithmetic instructions, so it is, in principle, desirable to have a single unit within the ALU execute both types of instruction. However, since fast and inexpensive electronic technology is now readily available, most computer systems incorporate separate units: one dedicated to fixed-point integer operations (FXU) and another to floating-point arithmetic operations (FPU). Separating these two units within the ALU allows fixed-point and floating-point instructions to execute in parallel.

Multiplication and Division

Multiplication and division are somewhat simpler than addition and subtraction, in that no alignment of significands (equalization of exponents) is needed. As usual, the input operands here are represented in two's complement form.
In multiplication, if either operand is 0, the result is immediately declared to be 0. The next step is to add the exponents. If the exponents are stored in biased form, their sum contains twice the bias value, so the bias must be subtracted from the sum. The result may give rise to exponent overflow or underflow, which must be reported and the process terminated. If the exponent of the product lies within the specified range, the next step is to multiply the significands of the input operands, taking their signs into account, as is done for integer multiplication (described earlier). The product will be twice the length of the multiplier or multiplicand, whichever is longer; the extra bits may be lost when the result is rounded. After the product is obtained, the result is normalized and rounded, if required. Normalization may sometimes lead to exponent underflow, in which case appropriate action must be taken.

Division is performed along almost the same lines as multiplication. Here too, the test for 0 is carried out first. If the divisor is 0, an error is declared or the result is set to infinity, depending on the particular implementation. If the dividend is 0, the final result is 0. The next step is to subtract the divisor exponent from the dividend exponent. This subtraction removes the bias, which must be added back in, and this addition may result in exponent overflow. Tests for exponent underflow and overflow are therefore made, and the condition is reported if either occurs. The next action is to divide the significands of the two input operands. Finally, the result goes through the usual process of normalization and rounding, if needed.
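The bias handling in both procedures can be sketched in Python as follows. Everything named here is an assumption for illustration: an 8-bit exponent field (so the bias is 2^{7} = 128), radix 2, normalized significands with 1/2 ≤ |s| < 1 held as exact `Fraction` values, and the hypothetical helpers `check_exponent`, `normalize`, `fp_mul`, `fp_div`, and `value`. The exponent is tested before the significands are processed, following the order given in the text.

```python
from fractions import Fraction

N_E = 8                  # assumed exponent width in bits
BIAS = 2 ** (N_E - 1)    # excess-2^(n_E - 1) code
E_MAX = 2 ** N_E - 1     # largest biased exponent that fits in N_E bits


class ExponentOverflow(ArithmeticError):
    pass


class ExponentUnderflow(ArithmeticError):
    pass


def check_exponent(e):
    """Return the biased exponent e if it fits in N_E bits, else raise."""
    if e > E_MAX:
        raise ExponentOverflow(e)
    if e < 0:
        raise ExponentUnderflow(e)
    return e


def normalize(s, e):
    """Bring a non-zero significand into 1/2 <= |s| < 1, adjusting the exponent."""
    while abs(s) >= 1:
        s, e = s / 2, e + 1
    while abs(s) < Fraction(1, 2):
        s, e = s * 2, e - 1
    return s, check_exponent(e)


def fp_mul(x, y):
    (xs, xe), (ys, ye) = x, y
    if xs == 0 or ys == 0:
        return Fraction(0), 0
    # xe + ye carries the bias twice, so subtract it once.
    e = check_exponent(xe + ye - BIAS)
    return normalize(xs * ys, e)


def fp_div(x, y):
    (xs, xe), (ys, ye) = x, y
    if ys == 0:
        raise ZeroDivisionError("floating-point division by zero")
    if xs == 0:
        return Fraction(0), 0
    # xe - ye cancels the bias, so add it back.
    e = check_exponent(xe - ye + BIAS)
    return normalize(xs / ys, e)


def value(x):
    """Numerical value of a (significand, biased exponent) pair."""
    s, e = x
    return s * Fraction(2) ** (e - BIAS)
```

With `x = (Fraction(3, 4), BIAS + 3)` (the value 6) and `y = (Fraction(1, 2), BIAS + 2)` (the value 2), `value(fp_mul(x, y))` is 12 and `value(fp_div(x, y))` is 3. This sketch raises an error on a zero divisor; setting the result to infinity instead is the other option the text mentions.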
Two typical flowcharts for performing multiplication and division, respectively, incorporating all the activities described in Table 1, are shown on the website: http://routledge.com/9780367255732.

Implementation: Floating-Point Multiplication

A multiplier circuit can be implemented using a multistage CSA circuit (described earlier). This circuit is popularly known as a Wallace tree, after its inventor (Wallace 1964). The inputs to the adder tree are n terms of the form M_i = x_i Y 2^{i}. Here, M_i represents the multiplicand Y multiplied by the i-th multiplier bit, weighted by the appropriate power of 2. Suppose M_i is 2n bits long and that a full double-length product is required; then the desired product P is Σ_{i=0}^{n−1} M_i. This sum is computed by the CSA tree that produces a
A brief account of this topic, with an accompanying figure, is given on the website: http://routledge.com/9780367255732.
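The reduction performed by such an adder tree can be illustrated in Python. The sketch below assumes that CSA stands for carry-save adder and that the operands are unsigned n-bit integers. The function names are hypothetical, and the grouping of terms into threes is one simple choice, not the exact wiring of a Wallace tree.

```python
def carry_save_add(a, b, c):
    """Reduce three words to two, so that a + b + c == sum_word + carry_word."""
    sum_word = a ^ b ^ c                              # bitwise sum, carries ignored
    carry_word = ((a & b) | (a & c) | (b & c)) << 1   # carries, moved up one position
    return sum_word, carry_word


def csa_tree_multiply(x, y, n):
    """Multiply unsigned x (n bits) by y by reducing the partial products with CSAs."""
    # Partial products M_i = x_i * y * 2**i, one per multiplier bit.
    terms = [((x >> i) & 1) * (y << i) for i in range(n)]

    # Each level turns every full group of three terms into two.
    while len(terms) > 2:
        reduced = []
        for j in range(0, len(terms) - 2, 3):
            reduced.extend(carry_save_add(*terms[j:j + 3]))
        # Terms that did not fill a group of three pass through unchanged.
        reduced.extend(terms[len(terms) - len(terms) % 3:])
        terms = reduced

    # A final carry-propagate addition combines the remaining words.
    return sum(terms)
```

For example, `csa_tree_multiply(13, 11, 4)` reduces the four partial products 11, 0, 44, and 88 to two words and returns 143. No carry propagates inside `carry_save_add`, which is why the levels of the tree are fast; only the last addition needs a carry-propagate adder.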
