Home Computer Science
Cache–Main Memory Hierarchy: Its Performance
Introduction of cache into the memory hierarchy, exploiting the inherent tendency of the principle of locality in executing programs, certainly improves the performance of the memory system response, and thereby the performance of the computer system as a whole. This performance improvement can however, be estimated roughly as follows:
Let, Rhit = Hit ratio (probability of hit).
Rmiss = Miss ratio (probability of miss) at the cache level, and particularly, holds for any level.
^hit = Time to complete the memory reference activity, when hit occurs at any given level in the memory hierarchy.
Tmiss=Time to complete the memory reference activity, when miss occurs at any level, and have to go down to the next level in the memory hierarchy.
Thus, the average time required to complete any memory reference irrespective of hit or miss (or taking into account the probability of both hit and miss) would be:
However, Rhit should be always 100% for the lowest level (bottom-most level) in the memory hierarchy.
A number of problems with solutions related to this topic are given in the following web site: http://routledge.com/9780367255732.
Cache is controlled by the Memory Management Unit (MMU) and also by cache controller, and is transparent to the operating system and also to the programmer. Cache stores a set of frequently referenced main memory contents М(Д) into cache blocks (cache pages) where A-, is the memory block address. Each cache block is a sub-block of some main-memory page. The contents of the entire cache array are thus copies of a set of small non-contiguous main memory blocks attached with addresses. When an address is sent from the CPU at the beginning of a read or write memory access cycle, the relevant part of address (or the address itself) is compared by the cache with all its currently stored addresses. If there is a match, i.e. a cache hit, the cache then accesses the word M(A) corresponding to the address A just matched. It completes the appropriate read or write operation within the memory cycle. If the address A is not matched with any of the stored addresses in cache, a cache miss occurs. Usually, the cache then initiates a sequence of one or more read cycles to copy the required memory block P(A) into the cache containing the desired item M(A). If necessary, the cache controller first saves in main memory the existing contents of any cache page frame P'(A) it selects by any standard replacement algorithm to make a room for P(A), if the cache is found to be full. The size of the cache page is often designed in such a way that an entire page can be fetched in one main-memory cycle. The actual cache-main memory transfer procedures vary from cache to cache, and read/write operations are generally handled in a different way.
Cache Design Issues: Different Elements
Cache design and cache organisation involve many of the issues as have already been found in other memory module organisation. Although there are various types of cache implementations, a few basic design elements serve to classify and differentiate cache architectures. The primary elements which have a strong influence in cache design are:
a. Cache size
b. Cache block size
c. Mapping functions Fully associative direct
Set associative Sector associative
d. Cache initialization
e. Write policy Write through Write back Write once
f. Replacement Algorithm Least-Recently Used (LRU)
First-In First-Out (FIFO)
Out of all of the aforementioned issues, the two most important issues that are specifically peculiar to cache design are:
i. The way in which main-memory information is transferred (mapped) into cache memory.
ii. When a cache-write operation will modify (change) the contents of a cache, how the corresponding main memory location will be updated.
The size of the cache at the time of its design is a compromise between two different opposite approaches:(i) the size should be small enough so that cost-wise it should be very close to that of main memory alone. Moreover, the small size permits it to be accommodated within a limited available chip and board area;(ii) the size should be large enough so that most of the memory references could be available in the cache, and hence, the overall average access time would be as close to that of the cache alone; the ultimate motive of using a cache. On the contrary, the larger the size, the larger will be the number of gates involved in addressing the cache which will eventually make the cache slightly slower in operation than the small ones, even when built with the same IC technology, and placed in the same chip, and circuit board.
With the continuous advancement in electronic technology, the logic density has enormously increased. Consequently, it has created a massive impact on the overall cache design including the size of caches which has been found to have radically changed over the past decade. Moreover, a new concept has been emerging in the usage of multiple caches, and that too at different levels, providing a comparatively larger size of cache in totality which has become a de facto current standard norm. In addition, it has become possible to have a cache on the same chip as the processor; the on-chip cache, and that too again at different levels within the CPU chip. All these aspects, and their implications in actual implementations have been separately discussed in later sections. Summarily, it can be said since the cache operation and its performance are very sensitive to the nature of the applications, it is not only difficult, but nearly impossible to conclude what the optimum size of cache should be.
The choice of the block size of cache is one of the deciding factors in cache performance. Larger block size could store not only the desired word when it is retrieved and placed into the cache, but some number of adjacent words also. As a result, the hit ratio will then automatically increase because of the principle of locality (spatial locality). But, larger blocks reduce the number of blocks that can fit into a cache of fixed size. As a result, a small number of blocks results in frequent replacement of data shortly after it is fetched, increasing unnecessarily the overhead of cache operation. On the other hand, while a smaller block size increases the number of blocks available in fixed-size cache, the hit ratio cannot be appreciably improved. Smaller block size can only store a few units of information which is not enough to meet all the near-future references that could arrive according to the principle of locality (spatial locality). As the block size increases from small to larger sizes, the hit ratio will at first instance increase, and as the block becomes even larger, the hit ratio will then begin to decrease because each additional word newly fetched is farther from the requested word, and therefore less likely to be needed in the near future. In fact, the relationship between cache size, cache block size, and hit ratio is very complex to derive, and no such convincing optimum value has been found yet. It is more or less depending on the characteristics of a particular application. However, a reasonable approach close to optimum is a size of about 4-8 addressable units (bytes, or words).