
Interleaved Memory Organisation

Background

Main memory is centrally located: it is both the destination of input and the source of output, and it satisfies CPU requests while simultaneously serving as the I/O interface. Performance measures of main memory emphasize both latency and bandwidth. Amdahl's law warns us what happens if we concentrate on only one parameter to speed up computation while downplaying or ignoring the others, so latency and bandwidth improvements should be pursued together, with equal importance. Memory latency is traditionally the concern of the cache, and is minimized by various types of cache organisation. Memory bandwidth, on the other hand, is critical to the processor and to I/O, and the memory design goal in this regard is to broaden the effective memory bandwidth so that more memory words can be accessed per unit time. The ultimate aim is to match the memory bandwidth with the processor bandwidth and with the bandwidth of the bus to which the memory is attached. Some bandwidth improvement can also be obtained from caches by increasing the cache block size, but that may not be a cost-effective approach.

Innovative organisations of main memory are hence needed. Memory interleaving is one such organisation, aimed more at improving memory bandwidth than at reducing latency, at almost no extra cost. Increasing the width of memory is one way to improve bandwidth, but a further benefit of having multiple memory modules in a memory system is the potential parallelism they offer, which pipeline processors and vector processors demand for their optimal performance. These processors often require simultaneous access to memory from two or more of their sources (stages). An instruction pipeline may access memory to fetch a new instruction and, at the same time, to get an operand from a different segment (stage) of the processor for another instruction already executing in the pipeline. An arithmetic pipeline similarly requires two or more operands at the same time before entering its execution stage. Such simultaneous accesses would require two memory buses; this can be avoided if the memory is divided into a number of modules connected to a common memory address bus and data buses.

Memory Interleaving

Here, the main memory is constructed from multiple modules. Memory chips can be organised in banks to read or write multiple words at a time rather than a single word. These memory modules are connected to a system bus or a switching network, to which other resources such as processors or I/O devices are also connected in order to communicate with the memory modules. A memory array is thus formed from these memory modules, with each module having its own address register and data register (like MAR and MBR).

Each bank is often one word wide so that the width of the bus and the cache need not change, and each bank returns one word per cycle. Sending different addresses to several banks simultaneously permits them to operate at the same time, so that multiple words can be accessed in parallel or, at least, in a pipelined fashion.

Consider a memory system formed with memory interleaving having m = 2^X memory modules, each module containing w = 2^Y words. The total capacity of the memory system is m x w = 2^(X+Y) words, and these words are assigned the usual linear addresses. Linear addresses can be assigned in different ways, giving rise to different memory organisations, both for random access and for block access. Block access at consecutive addresses is often needed, for fetching a sequence of instructions or for accessing a linearly ordered set of data. The size of such a block may correspond to the size of one cache block, or to that of several cache blocks (cache lines). The memory organisation should therefore also consider block access of contiguous words at design time.
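As a quick check of these sizes, here is a minimal sketch in Python; the particular values of X and Y are assumptions chosen purely for illustration:

    X, Y = 2, 3                    # assumed small values for illustration
    m, w = 1 << X, 1 << Y          # m = 2^X modules, w = 2^Y words each
    capacity = m * w               # 2^(X+Y) words in total
    assert capacity == 1 << (X + Y)
    print(m, w, capacity)          # 4 modules x 8 words = 32 words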

The number of modules (banks) present in a memory system is called the interleaving factor or degree of interleaving. Word-oriented interleaving optimizes sequential memory access and is thus well matched to servicing cache read misses, since the words of a cache block are read sequentially. Write-back caches, which write the words of a block back sequentially, likewise draw more efficiency from word-interleaved memory.

Types of Interleaving

The low-order X bits of the memory address are used to identify the target memory module (bank), while the high-order Y bits give the word address (displacement or offset) of the target location within each module. The same word address can be applied to all memory modules simultaneously. Such an arrangement of modules is called low-order interleaving; Figure 4.39 illustrates the scheme. In this arrangement, contiguous memory locations are spread across the m modules horizontally, so low-order interleaving facilitates block access in a pipelined fashion.

FIGURE 4.39

Low-order m-way interleaving word-address scheme with m = 2^X modules, each module having w = 2^Y words.
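As a minimal sketch, the following Python fragment performs this address split; the values of X and Y are assumed purely for illustration:

    X, Y = 2, 3                        # 4 modules of 8 words (assumed sizes)
    m = 1 << X

    def decode_low_order(addr: int) -> tuple[int, int]:
        module = addr & (m - 1)        # low-order X bits: module number
        offset = addr >> X             # high-order Y bits: word within module
        return module, offset

    # Consecutive addresses land in consecutive modules, so a block of
    # contiguous words can be fetched from all modules in a pipelined way.
    for a in range(6):
        print(a, decode_low_order(a))  # modules rotate 0, 1, 2, 3, 0, 1, ...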

However, fault isolation cannot be carried out in this low-order organisation: when a failure is detected in one module, the remaining modules cannot be used, so a single module failure may paralyse the entire memory bank. That is why this type of organisation is not fault tolerant.

The high-order X bits of the memory address are used as the module address and the low-order Y bits as the target word address within each module. Contiguous memory locations are hence assigned to the same module. Such an arrangement of modules is called high-order interleaving; Figure 4.40 shows this arrangement. Only one word can be accessed from each module in each memory cycle, so block access of contiguous locations cannot be obtained with high-order interleaving.

FIGURE 4.40

High-order m-way interleaving word-address scheme with m = 2^X modules, each module having w = 2^Y words.

On the other hand, since sequential addresses are assigned within each module in this organisation, it is easier to handle a module failure. When a module failure is detected in a memory bank of m memory modules, the faulty module can be isolated, and the remaining modules can still be used by opening another window in the address space. Fault tolerance is thus a salient feature of this organisation, although it carries the distinct disadvantages noted above.
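The corresponding split for high-order interleaving, with the same assumed module sizes as the previous sketch, keeps consecutive addresses inside one module:

    X, Y = 2, 3                        # 4 modules of 8 words (assumed sizes)

    def decode_high_order(addr: int) -> tuple[int, int]:
        module = addr >> Y             # high-order X bits: module number
        offset = addr & ((1 << Y) - 1) # low-order Y bits: word within module
        return module, offset

    # A faulty module now corresponds to one contiguous address window,
    # which can be fenced off while the other modules keep serving theirs.
    for a in (0, 1, 7, 8, 9):
        print(a, decode_high_order(a))  # addresses 0-7 all sit in module 0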

Interleaving in Motorola 68040

The Motorola 68000 series was introduced in the late 1970s. It was one of the first chip families to use 32-bit (4-byte) addresses, giving access, in principle, to a memory of 2^32 different locations (words). The memory organisation of the Motorola 68040 processor, in particular, served to reduce the overall DRAM access time. The 68040 performs burst accesses that read or write 16 bytes of data, as 4 adjacent long words, between its caches and memory in a single bus transaction. An interleaved memory configuration can speed up 68040 burst accesses by as much as 30% (the actual speed-up depends on the DRAM access time and the system clock speed). The four long words of a burst access are spread across two physical modules of DRAM; the individual accesses to the two modules can be overlapped to reduce the overall access time and to hide part or all of the memory access delay. This is illustrated in Figure 4.41.

FIGURE 4.41

Interleaved burst access timing.
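The benefit of the overlap can be seen with a toy timing model. All cycle counts and the simple bank/bus bookkeeping below are assumptions chosen for illustration, not figures from 68040 documentation:

    ACCESS = 4     # assumed bank access time per long word, in clocks
    TRANSFER = 1   # assumed clocks to move one long word over the bus
    WORDS = 4      # a 68040 burst is 4 long words (16 bytes)

    def burst_time(banks: int) -> int:
        """Completion time of one burst with words striped across banks."""
        bank_free = [0] * banks    # when each bank can start a new access
        bus_free = 0               # when the data bus is next available
        for word in range(WORDS):
            b = word % banks                 # interleaving picks the bank
            ready = bank_free[b] + ACCESS    # data available from the bank
            start = max(ready, bus_free)     # wait for the bus if it is busy
            bus_free = start + TRANSFER
            bank_free[b] = ready             # bank may begin its next access
        return bus_free

    print("one bank:", burst_time(1))    # 17 clocks under these assumptions
    print("two banks:", burst_time(2))   # 10 clocks: bank accesses overlap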

Conclusion

Some aspects of memory organisation favour low-order interleaving, while others argue strongly for high-order interleaving. High-order and low-order interleaving can, however, be combined to yield many different hybrid interleaved memory organisations. These hybrids normally offer better bandwidth, and do so even in the case of a module failure. One such representative organisation is shown in Figure 4.42, using four-way low-order interleaving to make this hybrid organisation clear: here, low-order interleaving is organised within each of two memory banks.

The advantage of this arrangement is that, in case of a module failure, the two-bank four-way design of Figure 4.42 still offers a reduced bandwidth of four words per memory cycle, since only the bank containing the faulty module becomes invalid. Pure low-order interleaving, in the same situation, would put the entire memory bank out of use.

FIGURE 4.42

Four-way interleaving within each memory bank; 1 bit addresses one of the 2 banks, 2 bits address one of the 4 modules within each bank, and 3 bits address any of the 8 words in a module.
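A decode sketch for this hybrid scheme, matching the 6-bit address of Figure 4.42 (1 bank bit, 2 module bits, 3 word bits). Placing the bank bit at the high-order end and the module bits at the low-order end is an assumption consistent with low-order interleaving inside each bank:

    def decode_hybrid(addr: int) -> tuple[int, int, int]:
        bank = (addr >> 5) & 0b1       # high bit selects one of 2 banks
        module = addr & 0b11           # low 2 bits: one of 4 modules
        word = (addr >> 2) & 0b111     # middle 3 bits: one of 8 words
        return bank, module, word

    # Consecutive addresses within a bank rotate across its four modules,
    # so a failed module disables only its own bank; the other bank still
    # delivers four words per memory cycle.
    for a in range(8):
        print(a, decode_hybrid(a))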

In an interleaved memory organisation, the trade-offs should consider the degree of interleaving needed to obtain maximum memory bandwidth, fault tolerance, and the independence of each memory bank, so that even in the worst case of a module failure some useful bandwidth can still be extracted from the design.

Associative Memory Organisation

Background

As technology constantly progresses, the CPU-main memory speed disparity continuously increases, creating a severe bottleneck at the CPU's end. Including cache memory in the memory hierarchy primarily addresses this latency problem, and main memory organised in an interleaved fashion mostly addresses the bandwidth problem. But supplying data to the CPU first requires locating the particular target item within a collection of data, and doing so as fast as possible. Searching for a target item before it can be delivered is thus an inherent process in user-oriented as well as system-oriented applications. The traditional search procedure chooses a sequence of addresses, reads the memory content at each address, and compares the information read with the item being sought, until either a match occurs or the sequence of addresses is exhausted with no match. Under this scheme, the total time required to access the desired data depends on the number of memory accesses, which in turn depends on the location of the target item, the organisation of the list of items, and the efficiency of the search algorithm employed. Many techniques have been proposed in this regard, and a few have been devised to optimize this style of searching within its inherent limits, but they have been observed to achieve only marginal improvement at best.

Conventional approaches that identify data by address (location) have therefore been set aside, and an innovative hardware-based mechanism has been devised to accomplish a fast search for the desired item. The item under search is here identified for access by its content rather than by an address. A memory unit in which any stored item can be accessed directly by using the contents of the item in question is called an Associative Memory, Content Addressable Memory (CAM), or Parallel Search Memory. The entire memory of this type is accessed simultaneously and in parallel on the basis of data content, rather than by specific address as usually happens with RAM. When data is stored in this memory, no address is linked with it.

Associative memory is a small hardware device, usually inside the MMU or within the CPU chip, containing a small number of entries, rarely more than 32. Owing to its particular form of organisation, this memory is uniquely suited to performing parallel searches by data association. Searches can be done on an entire word or on a specific field within a word; the field chosen to address the memory is called the key. Items stored in associative memory can be viewed as having the format KEY, DATA, where KEY is the address (a subfield of the record) and DATA is the information (the contents of the record) to be accessed. Applications where the search time is critical and must be very short are the ideal situations in which to use associative memory.

Associative memory is generally used to hold data that are heavily used, in the form of a table, during the execution of a process. Sometimes it holds the small fraction of page table entries that are frequently demanded, thereby accelerating the mapping of virtual addresses to physical addresses without going through the entire page table kept in RAM; it is then called a Translation Lookaside Buffer (TLB). The MIPS R2000, a RISC machine, took this use of associative memory close to its limit: the CPU chip contains a 64-entry associative memory, each entry of which is 64 bits wide and holds a virtual page number along with other related information. When the CPU generates a virtual address during execution, the entries of this memory are used for faster address translation. Note that the addressing of this associative memory is performed with the contents of one of the fields (here, the virtual page number) in each row of the table.
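The flavour of such a content-addressed lookup can be sketched as follows. This is a minimal software analogue: the 64-entry size follows the text, but the page size, the eviction policy, and the method names are illustrative assumptions, not the real R2000 entry format:

    from typing import Optional

    class TLB:
        """Illustrative 64-entry content-addressable translation buffer."""
        def __init__(self, entries: int = 64):
            self.entries = entries
            self.table = {}                 # virtual page number -> frame

        def insert(self, vpn: int, pfn: int) -> None:
            if len(self.table) >= self.entries:
                self.table.pop(next(iter(self.table)))  # naive FIFO eviction
            self.table[vpn] = pfn

        def translate(self, vaddr: int, page_bits: int = 12) -> Optional[int]:
            vpn = vaddr >> page_bits
            offset = vaddr & ((1 << page_bits) - 1)
            pfn = self.table.get(vpn)       # hardware compares all entries
            return None if pfn is None else (pfn << page_bits) | offset

    tlb = TLB()
    tlb.insert(vpn=0x12345, pfn=0xABCDE)
    print(hex(tlb.translate(0x12345678)))  # 0xabcde678 on a hit
    print(tlb.translate(0x99999000))       # None: a miss, walk the page table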

The basic difference between associative memory and RAM is that associative memory is content-addressable, allowing parallel access to multiple memory words, whereas RAM must be accessed by specifying word addresses. The inherent parallelism of associative memory has had a great impact on the architecture of associative processors, a special class of SIMD array processors equipped with associative memories.

An associative memory is more expensive than a RAM of the same size, because each cell must have storage capability as well as logic circuits for matching its content against the supplied argument. Additional circuitry, such as the select circuit, is also included in the hardware to provide other services, as will be discussed later.

The major advantage of associative memory over RAM is its capability for parallel search and comparison operations, which are needed in many important applications, such as table look-up, storage and retrieval of rapidly changing information in databases, radar-signal tracking, image processing, and real-time artificial intelligence computation.

The major disadvantage of associative memory is its much higher hardware cost. At present, associative memory is far more expensive than RAM, even though both are built with integrated circuits. With the rapid advance of VLSI technology, however, the price gap between these two types of memory is gradually narrowing.

Implementation

Word-Organised Associative Memory

A word-organised associative memory consists of a memory array of m words, each of n bits, together with the related logic for these m words. In this organisation, several registers are employed to carry out different responsibilities. Figure 4.43 shows the structure of a simple associative memory of this type. The functions of the different registers are as follows:

Input register (I): The input register I holds the input, that is, the data to be written into the associative memory or the data to be searched for. At any instant it holds one word of data, i.e. a string of n bits; consequently, the length of the input register is n bits.

Mask register (M): Each unit of stored information is a fixed-length word (record), and any subfield of the word may be chosen as the key. The mask register provides a mask for choosing a particular field or key (i.e. the key to be searched) within the input register's word. The maximum length of this register is n bits, because it may have to cover a portion of the word or all of its bits. For example, consider an inventory file containing various items, where each item is a record in the file. Each such record contains several fields, such as product code, type, product name, description, and so on. Any field can be chosen as a key for searching for an item in this file. Assume here that the "product code" field is used as the key. This desired key is specified by the mask register, whose contents identify the bit positions (not necessarily adjacent) in the record that define the key field.

FIGURE 4.43

Block structure of a simple associative memory.

To illustrate the searching mechanism of associative memory, suppose the mask register contains 1s in its leftmost four bits and 0s everywhere else, suppose the leftmost four bits of the input register hold the pattern 1011, and let the file contain three records (words) to be searched. The four 1s in the mask register signify that the leftmost four bits of the input register are to be used as the key. The key field obtained from the input register is thus 1011, and this string of bits is compared against all the records present in the file. Only the record (or records) containing 1011 in its leftmost four bits (i.e. at the positions marked in the mask register) is selected as a match, irrespective of the contents of its other fields. If word 3 alone holds 1011 in those positions, a match is found for word 3 only, and not for the others.

The current key is compared simultaneously with all stored words; those that match the key emit a match signal, which enters a select circuit. This circuit contains a select register S of m bits, one for each memory word. When the data in the I register matches a stored word in the key field specified by the M register, the corresponding bit in the select register S is set. The select circuit then enables the data field of each matched word to be accessed (the back arrow from the select circuit to S).
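As a small illustration, the following sketch mimics the masked comparison and the setting of the select register S. The bit patterns are assumptions consistent with the example above (8-bit words with the key 1011 in the leftmost four bits), not values taken from the text's figures:

    def associative_search(I: int, M: int, words: list[int]) -> list[int]:
        """Select register S: S[i] = 1 when word i matches the masked key."""
        return [1 if (w ^ I) & M == 0 else 0 for w in words]

    I = 0b10110101           # input register; leftmost 4 bits hold key 1011
    M = 0b11110000           # mask register: compare only the leftmost 4 bits
    memory = [0b01011100,    # word 1 - key field 0101: no match
              0b11100011,    # word 2 - key field 1110: no match
              0b10111010]    # word 3 - key field 1011: match
    print(associative_search(I, M, memory))   # -> [0, 0, 1]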

If several entries have the same key (i.e. more than one match), the select circuit determines which data field is to be read out; it may, for example, read out all matched entries in some predetermined order. Since all words in the memory (the storage cell array) must compare their keys simultaneously, each word must have its own match circuit. The match and select circuits make associative memories much more complex and expensive than conventional memories (RAM). The arrangement of the memory array and the four external registers linked with the associative memory system is depicted in Figure 4.44.

FIGURE 4.44

A representative block diagram of an associative memory of size m x n.

In practice, most associative memories have the capability of word-parallel operation: all words in the associative memory array are involved in the parallel search operation. This differs radically from the word-serial operations encountered in RAMs.

Based on how bit slices are involved in the operation, there are two main associative memory organisations:

(i) Bit-parallel organisation, (ii) Bit-serial organisation.

i. Bit-parallel organisation: Under this scheme, the comparison process is performed in a parallel-by-word and parallel-by-bit fashion. All bit slices that are not masked off by the masking pattern take part in the comparison; essentially, the entire array of cells is involved in a search operation. The Parallel Element Processing Ensemble (PEPE) of Burroughs Corporation employed associative memory with bit-parallel organisation.

ii. Bit-serial organisation: This organisation operates on one bit slice at a time across all the words. The particular bit slice is selected by extra logic and control circuitry, and the bit-cell values read out are used in subsequent bit-slice operations. The associative processor STARAN (Goodyear Aerospace) uses a bit-serial memory organisation.

The bit-serial organisation requires less hardware but is slower. The bit-parallel organisation, while requiring additional word-match detection logic, is faster.
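The difference in control flow can be sketched as follows. This is a pure-software illustration of the bit-serial idea (one bit slice processed per step across all words), not a model of STARAN's actual hardware; it reuses the assumed 8-bit patterns from the earlier search example:

    def bit_serial_search(key: int, mask: int,
                          words: list[int], n: int) -> list[int]:
        """Match candidates, refined one bit slice at a time (bit-serial)."""
        candidates = [1] * len(words)       # every word starts as a match
        for bit in reversed(range(n)):      # process one bit slice per step
            if not (mask >> bit) & 1:
                continue                    # masked-off slice: skip it
            k = (key >> bit) & 1
            for i, w in enumerate(words):
                if candidates[i] and ((w >> bit) & 1) != k:
                    candidates[i] = 0       # word drops out on a mismatch
        return candidates

    memory = [0b01011100, 0b11100011, 0b10111010]
    print(bit_serial_search(0b10110101, 0b11110000, memory, n=8))  # [0, 0, 1]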

The logic circuit for a 1-bit associative memory cell is given, with a figure, on the website: http://routledge.com/9780367255732.

The bit-serial and bit-parallel organisations are shown with figures on the website: http://routledge.com/9780367255732.

 