Pure Harvard machines have separate pathways with separate address spaces. "Split cache" modified Harvard machines have such separate access paths for CPU caches or other tightly coupled memories, but a unified address space covers the rest of the memory hierarchy.
32. Thus, a program will achieve greater performance if it uses memory while it is cached in the upper levels of the memory hierarchy, and avoids bringing other data into those upper levels that would displace data needed again shortly.
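The benefit of reuse in the upper levels can be made concrete with a toy direct-mapped cache model; this is a sketch for illustration only (the set count, line size, and access traces below are invented, not taken from the text):

```python
# Toy direct-mapped cache: 8 sets, 16-byte lines (assumed sizes for the demo).
LINE = 16
SETS = 8

def run_trace(addresses):
    """Replay byte addresses against the toy cache; return (hits, misses)."""
    cache = [None] * SETS              # one tag per set
    hits = misses = 0
    for addr in addresses:
        index = (addr // LINE) % SETS  # which set the line maps to
        tag = addr // LINE // SETS     # which line currently occupies it
        if cache[index] == tag:
            hits += 1
        else:
            misses += 1
            cache[index] = tag         # evict whatever was there before
    return hits, misses

# Sequential access reuses each cached line for 16 consecutive bytes.
seq = list(range(256))
# Strided access jumps a full cache's worth each time: every access lands in
# the same set with a different tag, so each one displaces the previous line.
strided = [(i * SETS * LINE) % 4096 for i in range(256)]
```

Running `run_trace(seq)` gives 240 hits and 16 misses, while `run_trace(strided)` misses on all 256 accesses, even though both traces touch 256 bytes.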
33. Though initially proposed by Jouppi to improve the cache performance of a direct-mapped Level 1 cache, modern microprocessors with multi-level cache hierarchies employ the Level 3 or Level 4 cache to act as a victim cache for the cache lying above it in the memory hierarchy.
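A minimal sketch of the victim-cache idea, in Jouppi's original setting of a direct-mapped cache: evicted lines drop into a small fully associative buffer, so a later conflict miss on the same line can be serviced from the buffer instead of the next level down. The class name, sizes, and return labels here are invented for the demo:

```python
from collections import OrderedDict

class DirectMappedWithVictim:
    """Toy model: a direct-mapped cache backed by a small fully associative
    victim cache that holds recently evicted lines (LRU replacement)."""

    def __init__(self, sets=8, victim_entries=4):
        self.sets = sets
        self.main = [None] * sets        # tag per set
        self.victim = OrderedDict()      # (index, tag) keys, LRU order
        self.victim_entries = victim_entries

    def access(self, line_no):
        index, tag = line_no % self.sets, line_no // self.sets
        if self.main[index] == tag:
            return "hit"
        if (index, tag) in self.victim:  # conflict victim still buffered:
            del self.victim[(index, tag)]
            self._evict_to_victim(index) # swap the lines
            self.main[index] = tag
            return "victim-hit"
        self._evict_to_victim(index)     # true miss: fetch from below
        self.main[index] = tag
        return "miss"

    def _evict_to_victim(self, index):
        if self.main[index] is not None:
            if len(self.victim) >= self.victim_entries:
                self.victim.popitem(last=False)  # drop the LRU victim entry
            self.victim[(index, self.main[index])] = True

cache = DirectMappedWithVictim()
# Lines 0 and 8 conflict in set 0; without the victim cache every access
# after the first two would be a miss.
outcomes = [cache.access(n) for n in (0, 8, 0, 8, 0)]
```

After the two cold misses, each conflicting access is caught by the victim cache rather than going down the hierarchy.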
34. This is most significant if, prior to expansion, the working set of the program (or a hot section of code) fit in one level of the memory hierarchy (e.g., the L1 cache), but after expansion it no longer fits, resulting in frequent cache misses at that level.
35. Finally, at the other end of the memory hierarchy, the CPU register file itself can be considered the smallest, fastest cache in the system, with the special characteristic that it is scheduled in software, typically by a compiler, as it allocates registers to hold values retrieved from main memory, for example in loop nest optimization.
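The register-file-as-cache idea can be mimicked with a scalar-replacement sketch: keep a running value in a local variable instead of re-reading and re-writing a memory location on every iteration, which is roughly what a compiler does when it allocates a register for a loop-carried value. This is only an analogy (Python locals are not hardware registers), and both function names are invented for the demo:

```python
def dot_rereads(a, b):
    # Naive form: result[0] is re-read and re-written every iteration,
    # analogous to going back to memory for the running sum each time.
    result = [0.0]
    for i in range(len(a)):
        result[0] = result[0] + a[i] * b[i]
    return result[0]

def dot_scalar_replaced(a, b):
    # Scalar replacement: the running sum lives in the local `acc`, the
    # software analogue of a compiler keeping the value in a register.
    acc = 0.0
    for i in range(len(a)):
        acc += a[i] * b[i]
    return acc
```

Both compute the same dot product; the transformation changes only where the intermediate value lives.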
36. The term is also used for misses between other levels of the memory hierarchy, not just paging (memory to disk): when a small set of faster storage, intended to speed up access to a larger set of slower storage, is accessed in a way that cancels out any benefit from the faster storage.
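The classic instance of such thrashing is an LRU-managed level whose working set is just one entry larger than its capacity: cycling through the items makes LRU always evict the item that will be needed next. A small simulation (the function name, capacity, and traces are chosen for the demo):

```python
from collections import OrderedDict

def lru_hit_rate(capacity, trace):
    """Hit rate of an LRU cache with `capacity` entries over an access trace."""
    cache = OrderedDict()
    hits = 0
    for key in trace:
        if key in cache:
            hits += 1
            cache.move_to_end(key)       # mark as most recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict least recently used
            cache[key] = True
    return hits / len(trace)

# Working set of 4 items in a 4-entry cache: after warm-up, every access hits.
fits = [0, 1, 2, 3] * 100
# One item too many, accessed cyclically: LRU always evicts the item needed
# next, so nothing ever hits -- the faster level contributes no benefit.
thrashes = [0, 1, 2, 3, 4] * 100
```

`lru_hit_rate(4, fits)` is 99% (only the four cold misses), while `lru_hit_rate(4, thrashes)` is 0%.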
37. The significance of this is that if the working set size is larger than the available memory in a virtual memory system, the memory manager must refer to the next level in the memory hierarchy (usually hard disk) to perform a swap operation, swapping some memory contents from RAM to hard disk so that the program can continue working on the problem.
38. A common optimization is to put the unsorted elements of the buckets back in the original array "first", then run insertion sort over the complete array; because insertion sort's runtime is based on how far each element is from its final position, the number of comparisons remains relatively small, and the memory hierarchy is better exploited by storing the list contiguously in memory.
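The optimization above can be sketched as follows, assuming inputs are floats in [0, 1) so a value maps to a bucket by scaling (the function name and bucket count are chosen for the demo):

```python
def bucket_sort(values, num_buckets=10):
    """Bucket sort that finishes with one insertion sort pass, as described
    above, instead of sorting each bucket separately."""
    buckets = [[] for _ in range(num_buckets)]
    for v in values:
        buckets[min(int(v * num_buckets), num_buckets - 1)].append(v)
    # Put the still-unsorted bucket contents back contiguously "first"...
    arr = [v for b in buckets for v in b]
    # ...then run insertion sort over the whole array. Every element is
    # already near its final position, so the inner loop exits quickly.
    for i in range(1, len(arr)):
        v, j = arr[i], i - 1
        while j >= 0 and arr[j] > v:
            arr[j + 1] = arr[j]
            j -= 1
        arr[j + 1] = v
    return arr
```

Because the final pass walks one contiguous array rather than many separate bucket lists, accesses are sequential and cache-friendly, which is the memory-hierarchy benefit the passage describes.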