
Advanced DRAM Organisation

While processor speed has been increasing rapidly with relentless technological advancement, memory speed has increased only modestly. As a result, the continuously widening processor-main memory speed disparity has become one of the most critical system bottlenecks. The basic building block of main memory still remains the DRAM chip, as it has been for a long time, and there has been no remarkable change in the architecture of the DRAM since the early 1970s. In spite of its several notable advantages, the traditional DRAM chip is critically constrained both by its internal structure and by its existing interface to the processor-main memory bus. Most of the shortcomings of DRAM main memory, including its limited speed, have been somewhat compensated for by inserting one or more levels of high-speed SRAM cache between the DRAM main memory and the processor. Although this approach improves system performance to a great extent, the SRAM used as cache is much costlier than DRAM, and cache usage has its own critical limitations: expanding cache size and increasing cache levels beyond a certain point start to yield diminishing returns.

Thus, it once again became essential to shift the focus to the actual problem itself, rather than avoiding or negotiating it indirectly, in the quest for definite and effective remedies. Consequently, a number of improvements to the basic DRAM architecture have been explored in recent years. Not all of them have been successfully implemented. The schemes that are currently most popular and commonly in use are SDRAM, DDR SDRAM and its variants, and RDRAM. The CDRAM also cannot be set aside, and is considered equally important.

SDRAMs (Synchronous DRAMs)

SDRAM is one of the most popular and widely used modified forms of DRAM. Unlike the traditional DRAM, which is asynchronous, SDRAM operation is directly synchronized to an external clock signal, eliminating the wait states associated with traditional DRAM. As a result, it can run at the highest possible speed of the available processor-memory bus. The cell array in SDRAM is, however, organised like that of the traditional asynchronous DRAM discussed in the previous section. Since this modified DRAM uses synchronous access, it is called synchronous DRAM, or SDRAM. The underlying principle is simple: the DRAM moves data in and out under the control of the system clock. When the processor or another master issues a request together with the related information required by the DRAM, the request is latched by the DRAM, which responds only after completing its own operations, consuming a predefined number of clock cycles. Meanwhile, the master remains free to do other work while the SDRAM processes the request.
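The latch-and-respond behaviour described above can be sketched in a few lines of Python. This is a toy model, not a real device: the class name, the flat cell list, and the 3-cycle latency are illustrative assumptions, standing in for the predefined number of clock cycles an SDRAM consumes before its data is valid.

```python
# Sketch (not a real device model): an SDRAM latches a read command on a
# clock edge and returns data a fixed number of cycles later, leaving the
# bus master free in between. All timing values are illustrative.

class SimpleSDRAM:
    def __init__(self, cells, latency=3):
        self.cells = cells          # flat list standing in for the cell array
        self.latency = latency      # predefined number of clock cycles
        self.pending = None         # (ready_cycle, address) of latched request

    def issue_read(self, cycle, address):
        # The request is latched; data will be valid `latency` cycles later.
        self.pending = (cycle + self.latency, address)

    def tick(self, cycle):
        # Returns data on the cycle it becomes valid, else None.
        if self.pending and cycle == self.pending[0]:
            _, addr = self.pending
            self.pending = None
            return self.cells[addr]
        return None

mem = SimpleSDRAM(cells=[10, 20, 30, 40], latency=3)
mem.issue_read(cycle=0, address=2)
results = [mem.tick(c) for c in range(5)]  # the master could do other work here
print(results)  # → [None, None, None, 30, None]
```

The master polls (or is signalled) only at the cycle where the data is guaranteed valid; everything in between is free bus time.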

Figure 4.7 illustrates a schematic block design of an SDRAM. The address and data connections are buffered by means of two individual registers. The notable feature is that the output of each sense amplifier is connected to a latch. The SDRAM employs a

FIGURE 4.7

Schematic block diagram of a synchronized DRAM (SDRAM).

burst mode (already discussed in the previous section) to eliminate the address setup time and the row and column line precharge time after the first access. In burst mode, a series of data bits can be clocked out rapidly after the first bit has been accessed. This mode is effective when all the bits to be accessed are in sequence and lie in the same row of the array as the initial access. That is why, during a read operation, the contents of all the cells in the selected row of the SDRAM (as per the supplied address) are loaded into the latches (buffer). The data in the latches corresponding to the selected columns (as per the supplied address) are then transferred into the data output register, as shown in Figure 4.7. During this operation, even if an access is made for refreshing purposes, the refresh acts only on the contents of the cells, not on the data held in the latches.
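The burst-read idea can be illustrated with a short sketch: one array access copies the whole selected row into the latches, after which consecutive columns are clocked out of the latch without touching the array again. The array contents and sizes below are made up for the example.

```python
# Illustrative sketch of a burst-mode read: the entire selected row is first
# copied into the sense-amplifier latches, then consecutive columns are
# clocked out of the latch without re-addressing the cell array.

def burst_read(array, row, start_col, burst_length):
    row_latch = list(array[row])      # one access loads the entire row
    out = []
    for i in range(burst_length):     # subsequent bits come from the latch
        out.append(row_latch[(start_col + i) % len(row_latch)])
    return out

array = [
    [0, 1, 1, 0],
    [1, 0, 0, 1],
]
print(burst_read(array, row=1, start_col=1, burst_length=3))  # → [0, 0, 1]
```

Because `row_latch` is a copy, a refresh rewriting the cells of `array` mid-burst would not disturb the data already latched, mirroring the behaviour described in the text.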

One of the salient features of the SDRAM is its built-in refresh circuitry. Part of this circuitry is a refresh counter that provides the addresses of the rows selected for refreshing. A typical SDRAM refreshes each of its rows at least once every 64 ms. Another key feature that differentiates SDRAM from traditional DRAM is the mode register and associated control logic, shown in Figure 4.7. This module provides a mechanism for customizing the SDRAM to suit specific system needs, and it realizes the various modes of operation that an SDRAM can perform. A particular mode is selected by writing control information into the mode register. For example, burst operations are set up by specifying the burst length in the mode register, that is, the number of separate units of data synchronously fed onto the bus; the block transfer then uses the same approach as the fast page mode feature. This register also allows the programmer to adjust the latency between the receipt of a read request and the beginning of the data transfer. In addition, SDRAM has a multiple-bank internal architecture that improves on-chip parallelism wherever there is scope for it.
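The idea of programming the mode register can be made concrete with a small sketch. Real JEDEC SDRAMs do pack the burst length and CAS latency into bit fields of a mode register written at initialisation, but the specific field layout below is an invented assumption for illustration, not the actual JEDEC encoding.

```python
# Hypothetical mode-register encoding (field layout is an assumption, not
# the real JEDEC bit assignment): burst length and CAS latency are packed
# into one control word written once at initialisation.

BURST_LENGTH_CODES = {1: 0b000, 2: 0b001, 4: 0b010, 8: 0b011}

def encode_mode_register(burst_length, cas_latency):
    if burst_length not in BURST_LENGTH_CODES:
        raise ValueError("unsupported burst length")
    if cas_latency not in (2, 3):
        raise ValueError("unsupported CAS latency")
    # bits 2..0: burst length code; bits 6..4: CAS latency
    return (cas_latency << 4) | BURST_LENGTH_CODES[burst_length]

reg = encode_mode_register(burst_length=8, cas_latency=3)
print(bin(reg))  # → 0b110011
```

Writing this word once configures every subsequent burst, which is why the same SDRAM chip can serve systems with quite different access patterns.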

The SDRAM is most efficient, and perhaps performs best, when large blocks of data need to be transferred serially. Block transfers are prevalent in general-purpose computer applications, in which main memory transfers are primarily to and from processor caches. Block transfers are also common in applications such as multimedia with high-quality video displays, word processing, and other similar applications. Commercial SDRAMs conform to the PC100 and PC133 bus specifications defined by Intel to meet the requirements of contemporary processors; these memory chips are accordingly used in motherboards with system bus speeds of 100 or 133 MHz.

A brief description of SDRAM, with a figure, is given on the following web site: http://routledge.com/9780367255732.

DDR SDRAM

One of the limitations of standard SDRAM is that it performs all its actions once per bus clock cycle, on the rising edge of the clock signal. This drawback has been alleviated by a newer version of SDRAM, referred to as double data rate SDRAM (DDR SDRAM), which transfers data on both edges of the clock, i.e. twice per clock cycle: once on the rising edge of the clock pulse and once on the falling edge. DDR SDRAM accesses the cell array in the same way that traditional SDRAM does, and the latency of these devices is the same as that of standard SDRAMs. But since they transfer data on both edges of the clock, their bandwidth is essentially doubled for long burst transfers. To sustain data at such a high rate, the cell array is organised in two banks, each of which can be accessed separately. Consecutive words of a given block are stored in different banks. Such interleaving of words allows simultaneous access to two consecutive words located in the two different banks, which can then be transferred on successive edges of the clock. The concept of memory interleaving is discussed in more detail in Section 4.8.2.
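The two-bank interleaving described above can be sketched as follows. This is purely illustrative (no real timing is modelled): consecutive words of a block are placed in alternate banks, so the two banks can be accessed in parallel and their words driven out on the rising and falling clock edges.

```python
# Sketch of two-bank word interleaving: word i lives in bank (i mod 2), so
# two consecutive words can be fetched in parallel and transferred on
# successive clock edges. Illustrative only; no timing is modelled.

def store_interleaved(words, n_banks=2):
    banks = [[] for _ in range(n_banks)]
    for i, w in enumerate(words):
        banks[i % n_banks].append(w)    # consecutive words go to alternate banks
    return banks

def burst_out(banks, n_words):
    # One word per clock *edge*: banks alternate, two words per full cycle.
    out = []
    for i in range(n_words):
        bank, index = i % len(banks), i // len(banks)
        edge = "rising" if i % 2 == 0 else "falling"
        out.append((edge, banks[bank][index]))
    return out

banks = store_interleaved([100, 101, 102, 103])
print(burst_out(banks, 4))
# → [('rising', 100), ('falling', 101), ('rising', 102), ('falling', 103)]
```

Each bank is read only once per full clock cycle, yet the bus carries a word on every edge, which is exactly why the burst bandwidth doubles without the cell arrays themselves getting any faster.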

Continuous innovation, with no indication of any let-up in the quest for further improved performance, has kept DDR technology steadily progressing. As the technology has matured, two generations of improvement to DDR have evolved. The first is DDR2, which increases the data-transfer rate by raising the operational frequency of the RAM chip and by enlarging the prefetch buffer from 2 bits to 4 bits per chip. The prefetch buffer is essentially a memory cache located on the RAM chip that enables the chip to preposition bits to be placed on the data bus as rapidly as possible. The next is DDR3, introduced in 2007, which brought further notable improvements over DDR2, including a higher clock rate and an increase in the prefetch buffer size to 8 bits. Table 4.1 shows a rough comparison of the DDR generations with respect to some of their basic characteristics.

A brief description of the DDR read operation, with a timing diagram, is given on the following web site: http://routledge.com/9780367255732.

TABLE 4.1

Comparison of Basic Characteristics of DDR Generations

Specification                        Standard DDR (DDR1)    DDR2                  DDR3
Voltage levels                       2.5 V                  1.8 V                 1.5 V
Prefetch buffer (bits)               2                      4                     8
Data transfer clock rate (MHz)       200-600                400-1,066             800-1,600
Front-side bus data rates (Mbps)     200, 266, 333, 400     400, 533, 667, 800    800, 1,066, 1,333, 1,600
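The relationship between the bus clock and the peak transfer rate in the table above can be checked with simple arithmetic: for a 64-bit (8-byte) memory module, the peak rate is the bus clock times two edges times eight bytes. The module labels below (e.g. DDR3-1600) follow common naming conventions and are used only to tag the rows.

```python
# Back-of-the-envelope check: peak transfer rate of a 64-bit DDR module
# = bus clock (MHz) x 2 edges per cycle x 8 bytes per transfer.

def peak_rate_mb_per_s(bus_clock_mhz, bytes_per_transfer=8, edges=2):
    # Returns megabytes per second for a double-data-rate bus.
    return bus_clock_mhz * edges * bytes_per_transfer

for name, clock_mhz in [("DDR1-400", 200), ("DDR2-800", 400), ("DDR3-1600", 800)]:
    print(name, peak_rate_mb_per_s(clock_mhz), "MB/s")
# → DDR1-400 3200 MB/s
#   DDR2-800 6400 MB/s
#   DDR3-1600 12800 MB/s
```

These figures correspond to the familiar PC3200, PC2-6400, and PC3-12800 module ratings, respectively.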

Rambus DRAM (RDRAM)

As already mentioned, all DRAMs, including the different generations of DDR chips, use similar organisations for their cell arrays and access them in a similar way. As a result, their latencies tend to be almost the same when the same components and fabrication technology are employed. Latency and bandwidth are often regarded as the primary characteristics of these DRAMs that largely determine their overall performance. However, the effective bandwidth obtained in a computer system (involving data transfers between the memory and the processor) is not determined solely by the speed of the memory; it also depends on the transfer capability, chiefly the speed, of the memory bus. Memory chips are thus usually designed to meet the speed requirements of popular buses. The problem is that the bus speed can be raised only up to a certain limit and cannot be increased at will to improve the data-transfer rate. The only way to appreciably increase the data-transfer rate on a speed-limited bus is to widen the bus by providing more data lines. From a design point of view, however, a very wide bus is not only expensive but also takes up a lot of space on the motherboard, which is difficult to organise. An alternative approach that negotiates this conflict is to use a comparatively narrow but considerably faster bus. This approach was pursued rigorously by Rambus Inc. to develop a proprietary design methodology known as Rambus.

One of the key features of Rambus technology is the fast signalling method used to transfer information between chips. Instead of signals with the usual voltage levels of 0 or Vsupply to represent the logic values, the signals here consist of much smaller voltage swings around a reference voltage, Vref. This reference voltage is about 2 V, and the two logic values are represented by 0.3 V swings above and below Vref. This type of signalling is generally known as differential signalling. Small voltage swings make it possible to have short transition times, which in turn allow a higher transmission speed. Special techniques are therefore employed in the design of the communication links (buses) that implement differential signalling, and special circuit interfaces are designed to deal with these differential signals. Taken together, these requirements place several constraints on making the bus wide. Rambus thus provides a complete specification for the design of such communication links, called the Rambus channel. The earlier designs of Rambus allowed for a clock frequency of 400 MHz, and since data is transferred on both edges of each clock signal (PGT and NGT), the effective data-transfer rate attained is 800 Mbps per line. This design specified a channel providing 9 data lines and a number of control and power supply lines. Eight of the data lines were intended for transferring a byte (8 bits) of data; the ninth data line can be used for other purposes, such as parity checking. Subsequent enhancements specified additional channels. A two-channel Rambus, also known as Direct RDRAM, has 18 data lines (16 actual data and two parity) intended to transfer two bytes of data at a time, resulting in a signal rate of 800 Mbps on each data line. There are no separate address lines.
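The channel figures quoted above can be verified with straightforward arithmetic: a 400 MHz clock with transfers on both edges gives 800 Mbps per data line, and 16 data lines (the two parity lines excluded) carry two bytes per edge.

```python
# Rough arithmetic for the Direct RDRAM channel described in the text.

clock_mhz = 400
signal_rate_mbps = clock_mhz * 2        # both edges → 800 Mbps per line
data_lines = 16                         # 18 lines minus 2 parity lines
channel_bw_mb = signal_rate_mbps * data_lines / 8   # bits → bytes

print(signal_rate_mbps, "Mbps per line")   # → 800 Mbps per line
print(channel_bw_mb, "MB/s per channel")   # → 1600.0 MB/s, i.e. 1.6 GB/s
```

This is where the 1.6 GB/s channel rating quoted for RDRAM comes from: a narrow bus compensating with a very high per-line signalling rate.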

Rambus requires specially designed memory chips. These chips use cell-array organisations similar to those found in standard DRAM technology, with multiple banks of cell arrays interleaved to access more than one word at a time. The circuitry needed by the cell arrays to interface to the Rambus channel is included on the chip. Such memory chips are known as Rambus DRAMs (RDRAMs) [CRISP, 1997]. RDRAM chips are vertical packages, with all pins on one side. The chip exchanges data with the processor over 28 wires, no more than 12 cm long. The bus can address up to 320 RDRAM chips and is rated at 1.6 GB/s. This chip was subsequently recognized, and later adopted by Intel for its Pentium and Itanium processors, and it soon became the main competitor to SDRAM.

The special RDRAM bus delivers address and control information using an asynchronous block-oriented protocol. After an initial 480 ns access time, the bus achieves its 1.6 GB/s data rate. What makes this speed possible is the bus itself, which defines impedances, clocking, and signals very precisely. Rather than being controlled by the explicit RAS, CAS, R/W, and CE signals used in traditional DRAM, an RDRAM receives a memory request over the high-speed bus. Communication between the master (the processor or some other device) and the RDRAM modules, which serve as slaves, is carried out in terms of packets transmitted on the data lines. There are three types of packets: request, acknowledge, and data. A request packet issued by the master indicates the type of operation to be performed. The operation types include memory reads and writes, as well as reading and writing of the various control registers present in the RDRAM chips. The request packet also contains the address of the desired memory location and includes an 8-bit count that specifies the number of bytes to be transferred. When the master issues a request packet, the addressed slave (RDRAM) responds by returning a positive acknowledgement packet if it can satisfy the request immediately. Otherwise, the slave indicates that it is "busy" by returning a negative acknowledgement packet, in which case the master tries again. If the number of bits in a request packet exceeds the number of data lines, several clock cycles are needed to transmit the entire packet. The use of a narrow communication link is, however, compensated for by the very high transmission rate available.
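The request/acknowledge exchange described above can be modelled as a toy master-slave loop. The packet fields and the retry-on-busy behaviour follow the prose; the dictionary formats and the 30% busy probability are invented for illustration and do not match the real Rambus packet layout.

```python
# Toy model of the RDRAM packet protocol: the master sends a request packet
# (operation, address, byte count); the slave replies with a positive or
# negative acknowledgement; on a negative ack the master retries.
import random

class RdramSlave:
    def __init__(self, memory):
        self.memory = memory

    def handle(self, request):
        if random.random() < 0.3:                 # slave happens to be busy
            return {"type": "nack"}               # negative acknowledgement
        addr, count = request["addr"], request["count"]
        return {"type": "ack", "data": self.memory[addr:addr + count]}

def master_read(slave, addr, count, max_tries=10):
    request = {"type": "read", "addr": addr, "count": count}  # byte count
    for _ in range(max_tries):
        reply = slave.handle(request)
        if reply["type"] == "ack":                # positive acknowledgement
            return reply["data"]
        # negative acknowledgement: slave was busy, so the master tries again
    raise TimeoutError("slave stayed busy")

random.seed(1)                                    # make the example repeatable
slave = RdramSlave(memory=list(range(64)))
print(master_read(slave, addr=8, count=4))  # → [8, 9, 10, 11]
```

With this seed the first attempt happens to draw a "busy" reply, so the printed result is actually delivered on the second request, exercising the retry path.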

Figure 4.8 illustrates a schematic representation of the RDRAM system, consisting of a controller and a number of RDRAM modules connected via a common bus. The controller is located at one end of the configuration, and the far end of the bus carries a parallel termination of the bus lines. As already mentioned, the bus includes 18 data lines cycling at twice the clock rate, resulting in a signal rate of 800 Mbps on each data line. A separate set of 8 lines (Rambus channel, RC) is used for address and control signals. There is also a clock signal that starts at the end away from the controller, propagates to the controller end, and then loops back. An RDRAM module sends data to the controller synchronously with the clock as it travels toward the controller, and the controller sends data to the RDRAMs synchronously with the clock signal travelling in the opposite direction. The remaining bus lines include a reference voltage, ground, and power supply lines.

RDRAM chips can be assembled into larger modules, similar to SIMMs and DIMMs (see the next section on RAM module organisation). One such module, called a RIMM, can hold up to 16 RDRAMs. Rambus technology gradually matured over time and came to compete directly with DDR SDRAM technology. Each has its own merits as well as certain drawbacks. Still, one notable factor is that while DDR SDRAM technology is an open standard, RDRAM is a proprietary design of Rambus Inc.

FIGURE 4.8

Rambus DRAM (RDRAM) structure.

for which the manufacturers fabricating RDRAM chips have to pay a royalty, an extra overhead added to the cost of the chips. However adequate the performance may be, the last word often belongs to the price of the component, which eventually influences the final decision regarding its use.

Cache DRAM (CDRAM)

In recent years, a number of attempts have been made to further enhance the basic DRAM architecture. Some of these have been explored, and a few have been implemented and are presently in the market. The CDRAM is one such enhancement that has received considerable attention: it is basically a combination of SRAM and DRAM on one single chip. A brief description of CDRAM is given here for an overall understanding. Cache DRAM (CDRAM), developed and introduced by Mitsubishi Corporation, Japan, is essentially the integration of a small SRAM cache (16 Kb) onto a generic DRAM chip. The SRAM on the CDRAM can be used in several different ways. First, being faster in operation than the DRAM portion, it can serve as a true cache, consisting of 64-bit lines; this cache mode of the CDRAM is effective for ordinary random accesses to memory. The SRAM in CDRAM can also be used in another way: as a buffer to support serial access to a block of data. For example, to refresh a bit-mapped screen, the CDRAM can prefetch the data from the DRAM into the SRAM buffer. Subsequent accesses to the chip are then satisfied entirely from the SRAM portion, making these memory accesses much faster.
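The buffered serial-access idea can be sketched as follows. This is an illustrative model only: the line size and the relative "costs" of SRAM and DRAM accesses are invented assumptions, chosen simply to make the benefit of the prefetch visible.

```python
# Sketch of the CDRAM idea: a small fast SRAM buffer sits in front of a
# larger DRAM array; a block is prefetched once, and later accesses that
# hit the buffer avoid the slow DRAM. Costs (in cycles) are illustrative.

class CacheDRAM:
    DRAM_COST, SRAM_COST = 10, 1        # assumed access costs, for illustration

    def __init__(self, dram, line_size=8):
        self.dram = dram
        self.line_size = line_size
        self.buffer = {}                # address -> value: the on-chip SRAM
        self.cycles = 0

    def prefetch(self, start):
        # Load one line from DRAM into the SRAM buffer (e.g. for serial access).
        self.cycles += self.DRAM_COST
        for a in range(start, start + self.line_size):
            self.buffer[a] = self.dram[a]

    def read(self, addr):
        if addr in self.buffer:         # served entirely from the SRAM portion
            self.cycles += self.SRAM_COST
            return self.buffer[addr]
        self.cycles += self.DRAM_COST   # miss: go to the DRAM array
        return self.dram[addr]

mem = CacheDRAM(dram=list(range(100)))
mem.prefetch(16)
values = [mem.read(a) for a in range(16, 24)]   # all hits in the SRAM buffer
print(mem.cycles)  # → 18  (10 for the prefetch + 8 cheap SRAM hits)
```

Reading the same eight words straight from the DRAM would have cost 80 cycles under these assumed figures; the single prefetch amortises the slow array access across the whole serial run, which is precisely the screen-refresh use case described above.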

Many schemes extending the traditional DRAM architecture have been implemented to overcome the inherent constraints of its internal architecture and the limitations of its interface to the processor-memory bus. As a result, different categories of DRAM have been fabricated, but the schemes that currently dominate the market, and are worthy of mention, are essentially SDRAM, DDR SDRAM, and RDRAM. The performance of these products is summarily compared in Table 4.2, taking into account mainly those basic parameters that have a direct impact on performance.

TABLE 4.2

Performance Comparison of Some Main DRAM Alternatives

Memory    Clock Frequency (MHz)    Access Time (ns)    Transfer Rate (GB/s)    Pin Count
SDRAM     166                      18                  1.3                     168
DDR       200                      12.5                3.2                     184
RDRAM     600                      12                  4.8                     162

 