A running user program sometimes requires the assistance of a service program (an operating system program) as a result of an externally or internally generated request. This request is issued in the form of an interrupt, which is essentially an unusual event.
When an interrupt (essentially a system call, i.e. a call to the operating system) is issued, the execution of the currently running program is halted, and the control is transferred to the beginning of the corresponding interrupt service program (OS program) as requested to resolve the situation (to service the interrupt). Program control once again returns to the original user program after the completion of service program execution.
A service program consists of system control instructions, which are generally called privileged instructions. Typically, these instructions are supplied by the operating system and are reserved for its exclusive use. The CPU executes them in the same manner as user instructions, except that the processor switches from user mode to privileged mode (supervisor mode) when an instruction of this type is executed.
System Call versus Subroutine Call
The system (supervisor) call to an interrupt service routine (ISR) is, in principle, quite similar to a subroutine call, except for three distinct differences:
i. The interrupt is usually initiated (except for software interrupt) by an internal or external signal rather than from the execution of an instruction;
ii. The address of the interrupt service program is determined by the hardware at the time of such a call, rather than from the address field that happens to be associated with an ordinary (subroutine) call instruction;
iii. An ISR, before starting its own execution, usually stores a lot of information with regard to the CPU state rather than storing only the PC and the contents of the other registers.
The hardware procedure for servicing an interrupt is very similar to the procedure used in handling subroutine call instructions. Here too, the stack approach is used for storing the necessary information before jumping to the ISR. The additional information to be saved is the PSW/PCB, which keeps track of the CPU status at that moment. Hence, when the last instruction in the service program, return from interrupt, is executed, the stack is popped to retrieve the old PSW/PCB as well as the return address. This PSW/PCB is then transferred to the System Register (SR), and the return address to the PC. Thus, the CPU state is restored once again to user mode, and the original program execution can resume as if nothing had happened.
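The save-and-restore sequence described above can be sketched as a small simulation. This is only an illustrative model, not any real processor's mechanism: the `CPU` class, the `VECTOR_TABLE` addresses, and the use of a single mode string to stand in for the whole PSW are all assumptions made for the example.

```python
from dataclasses import dataclass, field

USER, SUPERVISOR = "user", "supervisor"

# Hypothetical vector table: interrupt number -> ISR entry address.
# It models the hardware choosing the target address, rather than an
# address field in a call instruction.
VECTOR_TABLE = {0: 0x1000, 1: 0x2000}


@dataclass
class CPU:
    pc: int = 0
    psw: str = USER  # stands in for the full PSW / status register
    stack: list = field(default_factory=list)

    def interrupt(self, number: int) -> None:
        # Push the return address and the current PSW before the jump.
        self.stack.append(self.pc)
        self.stack.append(self.psw)
        # Switch to supervisor mode and enter the ISR.
        self.psw = SUPERVISOR
        self.pc = VECTOR_TABLE[number]

    def return_from_interrupt(self) -> None:
        # Pop in reverse order: old PSW first, then the return address.
        self.psw = self.stack.pop()
        self.pc = self.stack.pop()


cpu = CPU(pc=0x0420)
cpu.interrupt(1)
in_isr = (cpu.pc, cpu.psw)        # state while the ISR runs
cpu.return_from_interrupt()
resumed = (cpu.pc, cpu.psw)       # state after returning to the user program
```

After `return_from_interrupt`, the PC and mode match their values before the interrupt, which is what allows the user program to continue unaware that it was suspended.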
Different machines have numerous types of system control operations, depending on their organisation, especially the CPU organisation. The amount of information to be saved, and the way the control operation is handled, are critical issues that are decided primarily at the time of CPU design and subsequently in its implementation. In addition, many machines provide other types of machine instructions such as HALT, WAIT, NOOP, etc. The extent to which the various types of machine operations are included in the instruction set is a trade-off and one of the major criteria in CPU design, which in turn depends on the target the CPU is intended to attain.
Other Operations: IA-32 Instruction Set
So far, only a small set of common operations, and the corresponding instructions available in a generic instruction set to realize them, has been described. However, the instruction set of any particular processor may contain other useful operations and related instructions, backed by suitable hardware, to attain specific functional objectives. Here, we describe a few more important instructions that are available in the instruction set of the IA-32 architecture in particular.
Memory handling: The basic IA-32 instruction set contains some specialized instructions that support memory segmentation. These are privileged instructions that can only be executed in supervisor mode at the OS level. They load and read the local and global segment tables (called descriptor tables), and check and modify the privilege level of a segment.
MMX (Multimedia Extension) Operation
The MMX technology is used for multimedia applications that deal with arrays of a large number of pixels. A pixel (picture element) is the smallest element of a digital (video) image that can be assigned an individual dot in a dot-matrix representation of a picture. Each colour component of a pixel (red, green, and blue) is an 8-bit integer data item, and the brightness is also an 8-bit integer data item. The colour of a pixel may thus be represented by a 24-bit quantity. Multimedia applications involve the manipulation of individual pixels, operations such as matrix multiplication and matrix convolution, and operations on multiple pixels simultaneously. The same characteristics also apply to sampled audio signals and speech processing, where a sequence of signed digital numbers represents the samples of a continuous analog audio signal taken at periodic intervals. All these features together call for a SIMD kind of CPU architecture to process a multimedia application.
The IA-32 MMX technology adds 57 new instructions, known as enhanced MMX instructions, to the instruction set of x86 processors for performing multimedia tasks. MMX instructions operate on large arrays of small packed audio and video data types, typically of 8 or 16 bits (see Section 3.8). Typical analog audio signal samples are 16-bit units. Since the MMX registers are all 64 bits long, a single register can hold eight 8-bit or four 16-bit values (pixels, for instance), and an MMX instruction refers to this packed group as a single operand. The packed values are loaded together and then operated on in parallel by a single instruction, which amounts to a SIMD operation and is ideally suited to the integer pipeline of the Pentium architecture. These fast parallel operations can yield an approximate speed-up of two to eight times over comparable algorithms that do not use the MMX instructions (Atkins, 96). Each instruction typically takes a single clock cycle to execute. The operands of these instructions can be in memory or in the eight floating-point registers (see Section 3.2.1), referred to as MM0 through MM7. With the later SIMD extensions to the IA-32 architecture, Intel further expanded this capability to include double quadword (128-bit) operands and floating-point operations. The additional new data types as included within the existing one are as follows:
In the ARM (Acorn RISC Machines) RISC processor, apart from the different types of ADD instruction, there is a set of parallel addition and subtraction instructions in which portions of two operands are operated on in parallel. For example, the ADD16 instruction adds the top half-words of two registers to form the top half-word of the result, and adds the bottom half-words of the same two registers to form the bottom half-word of the result. This type of instruction is used extensively in multimedia applications of the kind that MMX targets.
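The half-word behaviour of ADD16 described above can be emulated in a few lines. This is a minimal sketch that assumes plain wraparound (modulo 2^16) arithmetic in each lane and ignores any status flags; the function name `add16` is chosen for the example and is not the ARM encoding or mnemonic.

```python
MASK16 = 0xFFFF


def add16(a: int, b: int) -> int:
    """Add two 32-bit values as two independent 16-bit lanes."""
    # Bottom half-words: a carry out of bit 15 is discarded,
    # so it never reaches the top half-word.
    low = ((a & MASK16) + (b & MASK16)) & MASK16
    # Top half-words are added separately in the same way.
    high = (((a >> 16) & MASK16) + ((b >> 16) & MASK16)) & MASK16
    return (high << 16) | low


a, b = 0x0001FFFF, 0x00010001
parallel = add16(a, b)              # lanes: 1 + 1 = 2, 0xFFFF + 1 wraps to 0
ordinary = (a + b) & 0xFFFFFFFF     # a normal 32-bit ADD, for comparison
```

Here `parallel` is `0x00020000`, whereas the ordinary 32-bit addition gives `0x00030000` because the carry from the bottom half-word propagates into the top one. Keeping the lanes independent is what makes the instruction a two-element SIMD operation.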
A representative (not exhaustive) list of the different types of instructions in the IA-32 MMX instruction set is given on the website: http://routledge.com/9780367255732 (Table 3.1).
Saturation arithmetic, with an example showing its impact, is described on the website: http://routledge.com/9780367255732.
A real-life example using MMX instructions is given on the website: http://routledge.com/9780367255732.
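As a rough illustration of packed operation and saturation arithmetic (not a reproduction of the website material), the sketch below treats a 64-bit integer as eight unsigned 8-bit lanes and adds two such values lane by lane, once with wraparound and once with saturation. The helper names and the sample pixel values are assumptions made for the example; real MMX code would use the processor's packed-add instructions rather than a Python loop.

```python
def unpack_bytes(reg: int) -> list:
    """Split a 64-bit value into eight unsigned 8-bit lanes (lane 0 = lowest byte)."""
    return [(reg >> (8 * i)) & 0xFF for i in range(8)]


def pack_bytes(lanes: list) -> int:
    """Pack eight 8-bit lanes into one 64-bit value."""
    reg = 0
    for i, value in enumerate(lanes):
        reg |= (value & 0xFF) << (8 * i)
    return reg


def packed_add_wrap(a: int, b: int) -> int:
    # Wraparound: each lane keeps only the low 8 bits of its sum.
    return pack_bytes([(x + y) & 0xFF
                       for x, y in zip(unpack_bytes(a), unpack_bytes(b))])


def packed_add_saturate(a: int, b: int) -> int:
    # Unsigned saturation: each lane is clamped at 255 instead of wrapping.
    return pack_bytes([min(x + y, 0xFF)
                       for x, y in zip(unpack_bytes(a), unpack_bytes(b))])


pixels = pack_bytes([10, 100, 200, 250, 0, 128, 255, 30])
brighten = pack_bytes([60] * 8)   # add 60 to every pixel at once

wrapped = unpack_bytes(packed_add_wrap(pixels, brighten))
saturated = unpack_bytes(packed_add_saturate(pixels, brighten))
```

With wraparound, the bright pixels 200, 250, and 255 become 4, 54, and 59, so they turn dark; with saturation they all clamp to 255. That difference is why saturating addition is usually preferred when brightening an image.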
Streaming SIMD Extension (SSE)
Since MMX instructions use the eight floating-point registers, it is not possible to execute an MMX instruction and a floating-point instruction simultaneously. But many multimedia operations, such as those in video processing, require simultaneous operations involving floating-point numbers, and MMX instructions are then at a clear disadvantage. Moreover, since MMX instructions are executed using only the floating-point registers, a large number of processor clock cycles are consumed in switching context from executing MMX instructions to executing floating-point operations, and vice versa. This also includes a few cycles spent executing additional instructions to initialize these registers for subsequent use. To overcome these shortcomings, the MMX instruction set needed to be extended to include floating-point instructions. This extended instruction set, called streaming SIMD extensions (SSE), was introduced in the Pentium III and was further enhanced for use in the Pentium 4.
The SSE instructions are SIMD instructions for single-precision floating-point numbers that operate on four 32-bit floating-point values concurrently. To execute these instructions, a set of eight new 128-bit registers, named xmm0 through xmm7, has been defined specifically for SSE, and each of them can hold four 32-bit single-precision floating-point numbers. Since separate registers have been allocated, it is now possible to execute both fixed-point MMX instructions and floating-point operations simultaneously without wasting useful cycles. SSE can also execute non-SIMD floating-point and SIMD floating-point instructions simultaneously. With the memory streaming instructions, data is prefetched into a specified level of the cache hierarchy, and prefetching such data into the L2 cache of the Pentium III and Pentium 4 is an effective way of improving the performance of the memory system as a whole.
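The four-lane layout described above can be pictured with a short emulation. This is only a sketch: a 16-byte `bytes` object stands in for a 128-bit register, the lanes are processed one at a time rather than in parallel, and the function names `pack_ps` and `add_ps` are chosen for the example rather than taken from the instruction set.

```python
import struct

LANES = "<4f"  # four little-endian 32-bit single-precision floats = 128 bits


def pack_ps(values) -> bytes:
    """Pack four floats into a 16-byte value, one per 32-bit lane."""
    return struct.pack(LANES, *values)


def add_ps(a: bytes, b: bytes) -> bytes:
    """Lane-wise addition of two packed single-precision values."""
    lanes_a = struct.unpack(LANES, a)
    lanes_b = struct.unpack(LANES, b)
    # Repacking stores each sum as a single-precision value in its own lane.
    return struct.pack(LANES, *(x + y for x, y in zip(lanes_a, lanes_b)))


xmm0 = pack_ps([1.0, 2.0, 3.0, 4.0])
xmm1 = pack_ps([0.5, 0.25, -3.0, 10.0])
total = struct.unpack(LANES, add_ps(xmm0, xmm1))
```

Each packed value occupies exactly 16 bytes, matching the 128-bit register width, and `total` holds the four lane sums `(1.5, 2.25, 0.0, 14.0)`. On real hardware a single SSE instruction would produce all four sums at once.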
The SSE instruction set was further enhanced in view of the added capabilities and architectural changes incorporated in the Pentium 4. The Pentium 4 NetBurst microarchitecture introduces Internet SSE2 instructions, which extend the existing SSE with 144 new instructions. This new instruction set increases the accuracy of 128-bit SIMD floating-point operations by supporting double precision, supports new packed data formats (new data types), and speeds up 128-bit SIMD integer operations. With SSE2, Intel extended the SIMD capabilities that MMX and SSE together already delivered.
Intel introduced the next version, SSE3, in 2004 with the next-generation Pentium 4 built on a 90-nm process, the Prescott. The SSE3 instruction set adds 13 SIMD instructions to SSE2, covering five different types of operation: floating-point-to-integer conversion, complex arithmetic operations, video encoding, SIMD floating-point operations using the array-of-structures format, and thread synchronization. These additional instructions are aimed mainly at enhancing 3D graphics, video, and multimedia applications, and some of them are useful for improving thread synchronization. In summary, they considerably increase the processor's ability to handle many important computations quickly, including the faster parallel floating-point computations required for 3D graphics, multimedia, and gaming.