Talk:CPU cache

AMD Centric Article

I came here looking into why the Intel Core 2 Duo chips, with 64 KB L1 cache per core and 4 MB shared L2 cache, work so well compared to an AMD K8 with 1 MB L1 cache. However, there is very little Intel-based information here. Any chance the knowledgeable can make this article less AMD-centric? --Mgillespie 10:43, 7 August 2006 (UTC)[reply]

The AMD K8 has 128 kB L1 cache and 1 MB L2 cache (maximum). -- Darklock (talk) 02:36, 16 March 2008 (UTC)[reply]
There is a reference to Intel Pro; this article is not AMD-centric, and it is not desirable to make it more Intel-centric. Other pages are dedicated to specific architecture implementations.
However, the page needs more architecture references to illustrate the concepts. It would also need to deal better with real-time and embedded constraints. Market1G (talk) 17:39, 6 April 2010 (UTC)[reply]

Latency

>> Latency: The virtual address is available from the MMU some time, perhaps
>> a few cycles, after the physical address is available from the address generator.

Isn't this a mistake? The MMU translates into *physical* addresses. Therefore the *physical* address is available from the MMU some time, perhaps a few cycles, after the *virtual* address is presented to it.

-- agl

That was a mistake, and it was fixed a while ago. 67.164.0.182 05:22, 4 July 2006 (UTC)[reply]
>> Historically, the first hardware cache used in a computer 
>> system did not cache the contents of main memory but rather 
>> translations between virtual memory addresses and physical 
>> addresses. This cache is known by the awkward acronym 
>> Translation Lookaside Buffer (TLB).

This needs some clarification, as early computer systems did not have virtual memory, though they had instruction caches.

--Stephan Leclercq 08:50, 22 Jul 2004 (UTC)

Yep. There's a whole history to write here, of which I only know a little. I know that early Crays had essentially a one-line cache. I have read that there were two IBM 360 projects developed simultaneously. One was the famous "Stretch", and the other was a simpler machine which had a cache. The simple one was somehow better.

I have read that TLBs predated data caches, but I have not yet tracked down an authoritative source. Perhaps I should remove that comment until I do.

Iain McClatchie 20:54, 22 Jul 2004 (UTC)

I know that the CDC Cyber (designed by Seymour Cray) had an 8-word instruction cache that contained the last 8 words executed and was cleared at every jump instruction that did not hit in the cache. Looks like nothing, but the cache sped up tight loops by a factor of 6-10...

Hope it helps ... --Stephan Leclercq 22:41, 22 Jul 2004 (UTC)

I would enjoy a history section. One tidbit I enjoyed is that the MC68010 CPU (which I believe found its widest use in the original LaserWriter printer) had an instruction cache big enough for exactly 2 instructions, which was just enough for a big memory-move loop of one MOVE and one DBRA instruction. Tempshill 04:39, 7 Jan 2005 (UTC)

I can say with a fair amount of confidence that TLBs predated caches (unless you call a TLB a cache, of course, which I don't). The first commercial computer with a cache (more or less as we think of it today) was the System/360 Model 85, announced in 1968 and delivered the following year. The 360 Model 67 had no cache, but did have an 8-entry TLB; it was delivered in May of 1966. I believe that at least 2 earlier machines also had TLBs: the Multics hardware, and the Atlas.

The CDC 6600 (1964, predating all the CDC Cyber machines) had an 8-word instruction stack, which could be used to contain a 7 word loop, which might have as many as 27 instructions. The words had to be from consecutive memory locations. The 7600 (1968) had a 12 word stack whose contents did not have to be from consecutive locations.

Lastly, the Stretch vs. System/360 story recounted above doesn't ring true, at least as told. Stretch was delivered to customers before System/360 was much more than a gleam in anyone's eye. Capek 07:20, 10 Jan 2005 (UTC)

In fact it would be worth adding read/prefetch and write-back buffers, which are a kind of one-entry cache most often associated with proper caches —Preceding unsigned comment added by Market1G (talkcontribs) 17:43, 6 April 2010 (UTC)[reply]

Inclusion property

We need two expressions for inclusive cache hierarchies because implementations do not necessarily enforce the inclusion property. IIRC x86 implementations generally do not. When the contents of L1 are not guaranteed to be backed by L2, L2 snoop misses do not imply L1 misses, even though the hierarchy is generally labeled as inclusive. Guaranteeing inclusion, however, may have adverse effects on associativity: backing two n-way L1 caches with a direct-mapped L2 cache (Alpha EV6?) significantly restricts the effective L1 associativity.

Why is that? i.e. Why does it significantly restrict L1 associativity? Isn't that only if the L2 is small? —Preceding unsigned comment added by 71.198.7.54 (talk) 07:18, 27 February 2010 (UTC)[reply]

A.kaiser 09:31, 25 Sep 2004 (UTC)

The K7 and K8 L2/L1 designs obviously are not inclusive, but rather exclusive. My current understanding is that the P3 and P4 designs are inclusive, so that bus snoops check only the L2 tag. Can you point to any evidence to the contrary?

The adverse effect on associativity from the inclusion guarantee is an excellent point and should be added to the page somewhere.

Iain McClatchie 07:01, 26 Sep 2004 (UTC)

Intel Optimization guide on P-M and both P4s: "Levels in the cache hierarchy are not inclusive. The fact that a line is in level i does not imply that it is also in level i+1."

Since the P-M shares much of its microarchitecture with the P3, I expect the P3 to be similar.

A.kaiser 12:34, 26 Sep 2004 (UTC)

That's good evidence. I'll go think about what that means and how to talk about it. Unless you'd like to hack the article, in which case, please go ahead. I might get to it in a week or so if you don't.

It does seem like the right taxonomy isn't just exclusive vs. inclusive, with inclusive split into "really inclusive" and "not actually inclusive". I think I'm seeing three completely different categories: inclusive, exclusive, and "serial". I'm making up that last name, because I don't know what it's formally called in the literature.

Iain McClatchie 19:51, 27 Sep 2004 (UTC)

It looks like the article still doesn't explain this well. In my experience, you have at least three different kinds of inclusivity possibilities:

  1. inclusive: when a higher-level cache (L1) allocates, you allocate also. When you evict, you also back-invalidate the higher-level cache. No special requirements for when the higher level evicts, or when you allocate of your own volition.
  2. exclusive: when a higher-level cache allocates, you evict (possibly allocating in its spot what the higher level evicted). When you allocate, you back-invalidate the higher-level cache. No special requirements on when the higher level evicts (although you could choose to allocate), or when you evict of your own volition.
  3. pseudo-inclusive (I made that up a few years ago, but don't think it's gotten very widespread use anywhere): this is what is done on the L2 caches that I am most familiar with (Freescale/Motorola): when a higher-level cache allocates, you allocate as well. All other actions do not have strict requirements (L1 evict, L2 alloc, L2 evict). In particular, when you evict, you do not back-invalidate (this is the main difference from true inclusive). You start out as inclusive, but don't maintain inclusivity via back-invalidates. This allows you to make better use of your L2 cache (especially if it has a smaller associativity than your L1s, or has very different set size), at the expense of requiring all snoops to go to the L2 and the L1. It also means that on a non-dirty L1 eviction, you don't need to explicitly cast out to the L2 -- it likely still has a copy from when you allocated.

There are likely many possibilities between full inclusive and full exclusive, but most of Freescale's L2 caches have been what I'm calling pseudo-inclusive.
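A minimal sketch of the allocate/evict obligations under the three policies, in C with invented stub names (purely illustrative; real cache controllers differ widely):

    #include <stdio.h>

    typedef enum { INCLUSIVE, EXCLUSIVE, PSEUDO_INCLUSIVE } policy_t;

    /* Stub actions standing in for real cache-controller operations. */
    static void l2_allocate(unsigned long a)        { printf("L2 allocate   %#lx\n", a); }
    static void l2_invalidate(unsigned long a)      { printf("L2 invalidate %#lx\n", a); }
    static void l1_back_invalidate(unsigned long a) { printf("L1 back-inval %#lx\n", a); }

    /* Obligation when the L1 allocates line `addr`. */
    void on_l1_allocate(policy_t p, unsigned long addr) {
        if (p == INCLUSIVE || p == PSEUDO_INCLUSIVE)
            l2_allocate(addr);        /* the L2 takes a copy too */
        else /* EXCLUSIVE */
            l2_invalidate(addr);      /* a line lives in exactly one level */
    }

    /* Obligation when the L2 evicts line `addr`. */
    void on_l2_evict(policy_t p, unsigned long addr) {
        if (p == INCLUSIVE)
            l1_back_invalidate(addr); /* maintain the inclusion guarantee */
        /* EXCLUSIVE, PSEUDO_INCLUSIVE: no obligation. Under the
           pseudo-inclusive policy the L1 may now hold lines the L2
           lacks, so snoops must probe both levels. */
    }

    int main(void) {
        on_l1_allocate(PSEUDO_INCLUSIVE, 0x1000); /* allocate in both levels */
        on_l2_evict(PSEUDO_INCLUSIVE, 0x1000);    /* no back-invalidate */
        on_l2_evict(INCLUSIVE, 0x2000);           /* back-invalidate */
        return 0;
    }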

Actually, now that I think about it, the MPC7400 was a true victim L2: when the L1 evicts, allocate. That's it. The MPC7450/e600 and e500 are pseudo-inclusive: when the L1 allocates, you allocate. That's it.

Now I'm wondering if we could merge the concept of victim with inclusion, and show that victim caches and inclusion properties are special cases of the more general allocation/eviction policy as it concerns two or more levels of caching. That would be more of a sweeping change....

I can try to write up something about this a little more carefully than the above, if y'all think it's a Good Idea. I'd definitely want feedback before I drastically alter the page.

BGrayson 14:50, 27 April 2007 (UTC)[reply]

K8 Caching diagram

The diagram of the K8 cache hierarchy is misleading. While the TLBs are caches, they are not filled from "normal" memory as the icache and dcache are, but are filled by the OS from page tables. Dyl 07:40, Dec 24, 2004 (UTC)

I was under the impression the P5, P6, P4, K7, and K8 all have hardware page table walkers. Is this not correct?

Also, can you be more specific about how the diagrams are misleading? The icache and dcache cache main memory, and the TLBs cache the page tables (which are in main memory). If the TLBs are not filled by a hardware table walker, then I agree there should be some distinction made between the hardware and software fill paths on the diagram. Iain McClatchie 09:09, 25 Dec 2004 (UTC)

I believe all x86 implementations have hardware page table walkers. It's generally a per-instruction-set type of thing rather than per-implementation, since it has software consequences. --CTho 01:24, 23 December 2005 (UTC)[reply]

Can you elaborate? I don't understand. --Patrik Hägglund 07:33, 21 September 2006 (UTC)[reply]
If you don't provide hardware tablewalk, then you need some kind of special instructions in your instruction set that enable management of the TLB. However, the two are not mutually exclusive: the Power Architecture (and I'm sure many others) provides TLB management instructions for those that use software tablewalk (SWTW), even though many implementations have supported hardware tablewalk (HWTW) as well. Some OS people like HWTW because it's fast, but others don't like it because it constrains how they manage virtual memory -- they can't provide their own preferred TLB replacement algorithms, or have page table entries with extra information, because those details are constrained by the HWTW algorithm. BGrayson 15:49, 27 April 2007 (UTC)[reply]
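A minimal sketch of the software-tablewalk flow described above, with invented names and a stubbed page-table lookup (only the shape of the flow is intended to be accurate):

    #include <stdio.h>
    #include <stdint.h>

    typedef struct { uint64_t paddr; int flags; int valid; } pte_t;

    /* Stand-in for an OS walk of its own page-table format; with SWTW
       the OS is free to choose any format and replacement policy. */
    static pte_t *os_walk_page_table(uint64_t vaddr) {
        static pte_t fake = { 0x40000, 7, 1 };
        return (vaddr < 0x100000) ? &fake : NULL;
    }

    /* Stand-in for a TLB-management instruction (Power provides these). */
    static void tlb_write_entry(uint64_t vaddr, const pte_t *p) {
        printf("TLB insert: v=%#llx -> p=%#llx\n",
               (unsigned long long)vaddr, (unsigned long long)p->paddr);
    }

    /* On a TLB miss the hardware traps here instead of walking tables itself. */
    void tlb_miss_handler(uint64_t vaddr) {
        pte_t *pte = os_walk_page_table(vaddr);
        if (pte && pte->valid)
            tlb_write_entry(vaddr, pte);
        else
            printf("page fault at %#llx\n", (unsigned long long)vaddr);
    }

    int main(void) { tlb_miss_handler(0x1234); return 0; }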

I think that the K8 diagram and its text were a very informative example. Thanks! However, I want to know more about, for example, how the load-store unit is connected to the caches. In AMD's "BIOS and Kernel Developer's Guide", section 10.2.1.2, Miss Address Buffers (MABs) and the Page Directory Cache (PDC) are mentioned. How do they fit into the picture?

How are the L1 and L2 caches indexed and tagged? Reading the text about address translation, I assume that the L1 caches use virtual indexing and physical tagging with vhints, and the L2 cache uses physical indexing and physical tagging. Is that correct? --Patrik Hägglund 07:33, 21 September 2006 (UTC)[reply]

In general, not all L1 and L2 caches are virtually-indexed -- this article seems too slanted in that direction. For example, Freescale and IBM PowerPC chips have always used physically indexed and physically tagged L1 caches, without sacrificing L1 cache latency. BGrayson
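One relevant constraint, illustrated with assumed parameters: if the index and offset bits fit within the page offset, a virtually indexed cache behaves as if physically indexed, because those low address bits are untranslated. A small check:

    #include <stdio.h>

    int main(void) {
        unsigned page_size = 4096;             /* assumed 4 KB pages */
        unsigned size = 32 * 1024, ways = 8;   /* assumed 32 KB, 8-way L1 */
        unsigned bytes_per_way = size / ways;  /* span covered by index+offset bits */
        printf("index+offset span %u B, page offset %u B: %s\n",
               bytes_per_way, page_size,
               bytes_per_way <= page_size
                 ? "virtual and physical index coincide"
                 : "virtual aliasing possible; needs vhints or extra checks");
        return 0;
    }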

Stalls, rewording

The article makes no mention of stalls, which are what happen to program execution when a cache miss occurs. That is where the penalty is ultimately felt, because the program executes more slowly.

Fixed

Also the article needs rewording. I'm a software developer, and still had a great deal of difficulty trying to follow along with the article. I wouldn't think it would be very useful to a lay-person in this state. It contains lots of good information; the sentences are just hard to follow.

Very disappointing to hear. If you can say anything about where you were having trouble following along, it might help me fix the article. Iain McClatchie 01:28, 7 Jan 2005 (UTC)

Better organization?

Just an opinion, but this article might benefit from a reorganization along these lines:

  • Intro
  • Why cache is necessary/important
  • How cache works
  • Design - I think this is the most important reorganization, since the concept of cache design is somewhat convoluted in this article: it should clearly state 1. why a design choice/option exists, 2. what problem this design choice solves, and 3. how this design choice solves the problem
    • How researchers analyze cache performance
    • Areas on which current research is focused, with regard to cache design
  • Implementation (how design concepts are implemented on CPUs) - stuff like address translation should go here, imo
  • History (discusses how cache design has evolved along with CPU development)--Confuzion 02:02, 7 Jan 2005 (UTC)

Organize this AND dumb the wording down please! Not everyone knows all the terminology of processors.


It strikes me that this page is a victim of its own success. Not all of this is specific to the CPU cache, and much text duplicates the Cache page. I understand the need for completeness and narrative, but this page could usefully feed improvements back into the Cache page while retaining its own narrative and becoming more specific to its subject. Material like address translation would then work well here, as that is quite specific to CPU caches. Note also that address translation has a disambiguation page that does not reference address translation in the context of CPU caches. 64.157.7.133 22:16, 7 May 2007 (UTC)[reply]

Working sets

The phrase "working set" doesn't appear in this article at all, which I think is a fairly major omission. I must sleep now, or else I'd add it right now. A quick Google search shows that most people consider a "working set" to refer to memory pages, but my understanding is that the concept also applies to cache lines. --Doradus 05:26, Jan 7, 2005 (UTC)

I'd like to stay away from adding "working sets" into this article.

Working sets are generally associated with the use of main memory by processes in a multiprocessing virtual memory system. The set size matters because the operating system can allocate more memory to one process and less to another. There is some similarity to the hit-rate-versus-size curves that characterize caches, but folks have found hit rate, rather than working set, to be the more useful concept for hardware caches with fixed sizes. Iain McClatchie 06:13, 4 July 2006 (UTC)[reply]

indexing vs tagging

Virtually indexed and/or tagged caches.

What is the difference between indexing and tagging? 145.97.222.38 14:32, 7 Jan 2005 (UTC)

Incomprehensible

This part is incomprehensible (to me):

Implementation
Because cache reads are the most common operation that take more than a single cycle, the recurrence from a load instruction to an instruction dependent on that load tends to be the most critical path in well-designed processors, so that data on this path wastes the least amount of time waiting for the clock. As a result, the level-1 cache is the most latency sensitive block on the chip.

--145.97.222.38 14:54, 7 Jan 2005 (UTC)

I think the point being made is that most ALU operations complete in one clock cycle and are probably the most common instructions. Beyond ALU operations and JMPs, reading/writing memory is probably the next most common/useful operation. When doing a bunch of memory reads, the CPU will probably fill the cache with what's in RAM at those locations, so most of your memory reads are going to be from cache. So the cache reads are extremely common, but take multiple clock cycles because they may need to transparently read from RAM and fill into cache, and from there it may still take multiple clock cycles just to read data from cache. The more clocks it takes to read from cache, the slower you are going to operate on your data. Since a majority of memory reads are actually from cache, the time required to read from cache has more impact than how quickly cache can fill from RAM. Also, if it takes an extra clock cycle to read from cache, any typical operation on data will require an additional clock cycle. Rmcii 02:37, 5 May 2006 (UTC)[reply]
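A small illustration of the load-to-use recurrence being described: in a pointer chase, each load produces the address for the next load, so every extra cycle of L1 load latency adds a cycle per element.

    /* Each iteration's load feeds the next address: the loop can go no
       faster than one element per L1 load-to-use latency. */
    struct node { struct node *next; };

    int list_length(const struct node *p) {
        int n = 0;
        while (p) {
            p = p->next;   /* dependent load on the critical path */
            n++;
        }
        return n;
    }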
I wrote this paragraph, and I've just cut out most of it. I was attempting to convey too much insight, and the troubles were many: data caches are often NOT the critical path, due to all sorts of practical difficulties; understanding this paragraph required some understanding of synchronous systems; and finally, it was only really necessary to motivate the following description of the implementation. So I just said that folks try hard to make caches go fast, and left it at that. Iain McClatchie 06:06, 4 July 2006 (UTC)[reply]

This section also confuses me:

The diagram to the right shows two memories. Each location in each memory has a datum (a cache line), which in different designs ranges in size from 8 to 512 bytes. The size of the cache line is usually larger than the size of the usual access, which ranges from 1 to 16 bytes.

"Each location" has a "datum" == "cache line" == "between 8 and 512 bytes in size"? And between the CPU, the cache, the main memory, and all the kinds of things they contain, what exactly does a "usual access" mean? --Piet Delport 11:22, 11 April 2006 (UTC)[reply]

I think the point being made is that most caches store more than the typical datum size. In x86, memory is byte-addressable, and there is support for registers of up to 8 bytes (qwords), expanded by MMX/SSE, so you're not likely to find a cache line on the order of 8 bytes. I think DDR/DDR2 supports streaming an entire row to the memory controller. If you're going to stream a row, you need a cache large enough to hold it, or else there's no benefit from its use. Rmcii 02:37, 5 May 2006 (UTC)[reply]
The "usual access" is the usual access from a CPU instruction. On a 32-bit CPU, this is usually a 32-bit access, but sometimes it's 64 or 128 bits. It is very unusual for a cache request to be larger than that. CPUs may have various bus widths throughout the design which have little relationship to the size of these accesses. I've updated the article, please let me know if it's more understandable. Iain McClatchie 06:06, 4 July 2006 (UTC)[reply]

By the way, I'd just like to point out to anyone frustrated by the article that these two feedback comments were quite valuable to me. Once resolved, I think they will have helped improve the clarity of the article, and I appreciate that. Iain McClatchie 06:09, 4 July 2006 (UTC)[reply]

Request for references

Hi, I am working to encourage implementation of the goals of the Wikipedia:Verifiability policy. Part of that is to make sure articles cite their sources. This is particularly important for featured articles, since they are a prominent part of Wikipedia. The Fact and Reference Check Project has more information. Thank you, and please leave me a message when a few references have been added to the article. - Taxman 19:31, Apr 22, 2005 (UTC)

Is it ok to use references like http://portal.acm.org/citation.cfm?id=224437 which require accounts to access them? --CTho 01:29, 23 December 2005 (UTC)[reply]

Yes, it *is* OK to use inline references that require paid accounts to access. The WP:EL#Sites_requiring_registration guideline clearly states "A site that requires registration or a subscription should not be linked unless the web site itself is the topic of the article or is being used as an inline reference."
If you got information from it, WP:SAYWHEREYOUGOTIT -- it doesn't matter if other people can get it for free or if it requires a paid account to access it. --68.0.124.33 (talk) 02:05, 26 April 2009 (UTC)[reply]
I found the following articles which may benefit the authors of this article:
Whetham, Benjamin (5/9/00). "Theories about modern CPU cache". Overclockers.com. Retrieved 31 May 2007 from http://www.overclockers.com/articles139/
The Computer Language Co. Inc. (1999). "Cache". Techweb.com. Retrieved 31 May 2007 from http://www.techweb.com/encyclopedia/imageFriendly.jhtml?term=cache
Smith, Alan Jay (August 1987). "Design of CPU cache memories". Retrieved 31 May 2007 from http://digitalassets.lib.berkeley.edu/techreports/ucb/text/CSD-87-357.pdf
Jupitermedia (16/09/04). "Cache". HardwareCentral. Retrieved 31 May 2007 from http://systems.webopedia.com/TERM/c/cache.html
PantherProducts (2006). "Central processing unit cache memory". Retrieved 31 May 2007 from http://www.pantherproducts.co.uk/Articles/CPU/CPU%20Cache.shtml
Slowbro 03:54, 31 May 2007 (UTC)[reply]

does address translation really belong here?

It seems to me that much of this section should be moved to the virtual memory article (or removed, if it is redundant) --CTho 01:34, 23 December 2005 (UTC)[reply]

Perhaps the design section of this article is not filled out enough. Address translation fundamentally affects cache design: virtual vs physical tagging/indexing, virtual hints, and virtual aliasing can only be explained in the context of address translation.
As a separate issue, address translation is performed by TLBs. Many common implementations of TLBs are, in a broad but useful sense, caches of the page tables in memory. I think this is a useful similarity to present. Iain McClatchie 05:36, 4 July 2006 (UTC)[reply]

Clarification for increasing of associativity vs increasing cache size

This sentence is not clear: "The rule of thumb is that doubling the associativity has about the same effect on hit rate as doubling the cache size, from 1-way (direct mapped) to 4-way." Is the associativity doubling from 1-way to 4-way? Isn't that quadrupling? Does the same apply when doubling from 4-way to 8-way? Besides clarification, I think this deserves further explanation, perhaps with examples - e.g. cache sizes and associativities for the Athlon vs. the P6, P4, and Core, etc.

Attempted a fix. Please let me know if it's better/understandable now. Iain McClatchie 05:32, 4 July 2006 (UTC)[reply]
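For anyone who wants to explore the rule of thumb, here is a toy LRU cache simulator (entirely illustrative: the numbers depend on the synthetic access pattern, and real workloads are what the rule of thumb summarizes). It compares a direct-mapped cache of size S, a 2-way cache of size S, and a direct-mapped cache of size 2S:

    #include <stdio.h>
    #include <stdlib.h>

    #define LINE 64

    static int hits, accesses;

    typedef struct { unsigned long *tags; int *age; } set_t;

    static void access_cache(set_t *sets, int nsets, int ways, unsigned long addr) {
        unsigned long line = addr / LINE;
        set_t *s = &sets[line % nsets];
        unsigned long tag = line / nsets;
        int victim = 0;
        accesses++;
        for (int w = 0; w < ways; w++) {
            if (s->age[w] >= 0 && s->tags[w] == tag) {  /* hit: refresh recency */
                hits++; s->age[w] = accesses; return;
            }
            if (s->age[w] < s->age[victim]) victim = w; /* track LRU (or empty) way */
        }
        s->tags[victim] = tag;                          /* miss: fill the LRU way */
        s->age[victim] = accesses;
    }

    static double run(int nsets, int ways) {
        set_t *sets = calloc(nsets, sizeof *sets);
        for (int i = 0; i < nsets; i++) {
            sets[i].tags = calloc(ways, sizeof(unsigned long));
            sets[i].age  = malloc(ways * sizeof(int));
            for (int w = 0; w < ways; w++) sets[i].age[w] = -1;
        }
        hits = accesses = 0;
        srand(1);
        for (int i = 0; i < 200000; i++)    /* skewed toy pattern, nothing more */
            access_cache(sets, nsets, ways,
                         (unsigned long)(rand() % 4096) * (rand() % 64) * LINE);
        double r = (double)hits / accesses;
        for (int i = 0; i < nsets; i++) { free(sets[i].tags); free(sets[i].age); }
        free(sets);
        return r;
    }

    int main(void) {
        printf("1-way, size S : hit rate %.3f\n", run(256, 1));
        printf("2-way, size S : hit rate %.3f\n", run(128, 2));
        printf("1-way, size 2S: hit rate %.3f\n", run(512, 1));
        return 0;
    }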

Trace cache history

A discussion of "first proposed" should take into account Alex Peleg and Uri Weiser, "Dynamic flow instruction cache memory organized around trace segments independent of virtual address line," US Patent 5,381,533 (filed in 1994, continuing an application filed in 1992; granted to Intel in 1995).--M.smotherman 13:42, 23 June 2006 (UTC)[reply]

Merger

The articles on the L1, L2, and L3 caches of a CPU were apparently merged into this one. I would like sections covering each, or at least a section that explains them.

Pronunciation

Could we have a guide to pronunciation? I have heard it pronounced "cash", "catch", and "cashay" before; which one is right? —Preceding unsigned comment added by 81.179.78.222 (talk) 20:35, 8 September 2007 (UTC)[reply]

The main article (Cache) contains a transcription of this word, so I don't think it is necessary to include it in this article. If you are really interested, you can read and listen to the pronunciation of this word on en.wiktionary.org, i.e. here. Dan Kruchinin 03:13, 21 October 2007 (UTC)[reply]

Recent edit

This edit http://en.wikipedia.org/w/index.php?title=CPU_cache&curid=849181&diff=169828857&oldid=169769446 seems generally good, but I don't like "the more economically viable solution has been found: ", because von Neumann's original paper proposed a hierarchy of memories. The solution was "found" before the first machine was even built. --CTho 13:26, 7 November 2007 (UTC)[reply]

Yes, my bad, please WP:SOFIXIT next time. --Kubanczyk 15:05, 7 November 2007 (UTC)[reply]

History. 1970 vs 1980

In the history section I pointed out that the performance gap between processor and memory has been growing since 1980, but in this edit the year was changed to 1970. When I wrote about it I used "Computer architecture: a quantitative approach" (ISBN 1-558-60596-7) by John L. Hennessy as my source. On page 289 he says a bit about cache history; there he writes that 1980 was the starting point of the growing processor-memory performance gap. The same information can also be found here, in "The Processor-Memory performance gap" section. Dan Kruchinin 03:45, 8 November 2007 (UTC)[reply]

Yes, my bad, please WP:SOFIXIT and provide those refs in the normal way :)) --Kubanczyk 08:09, 8 November 2007 (UTC)[reply]

Image:Cache,associative-read.png

I find it confusing that the same word (index) denotes both tags in the Tag SRAM and words in the Data SRAM. Index often denotes the part of address used for selecting the whole cache line (Addr[10:6]), which is not the same as the part used for addressing the Data SRAM (Addr[10:2]) as shown in the image.

Usually people draw the index field connected to a decoder which selects the line. The relevant portion of the line is finally extracted by an additional decoder, which is addressed by the offset field of the address.

The detached organization in the image is also fine, but the words "index" in each line seem redundant and confusing.

Perhaps you could label the Data SRAM entries as word 0, word 1, etc., and the Tag SRAM entries as tag 0, tag 1, etc.?
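For what it's worth, here is the field decomposition the image's bit ranges suggest, assuming a 32-bit address and the implied geometry (2 KB of data, 64-byte lines, 4-byte words, so 32 lines of 16 words):

    #include <stdio.h>
    #include <stdint.h>

    int main(void) {
        uint32_t addr = 0x1234ABCD;
        uint32_t word_offset = (addr >> 2) & 0xF;   /* Addr[5:2]: word within the line */
        uint32_t index       = (addr >> 6) & 0x1F;  /* Addr[10:6]: selects one of 32 lines */
        uint32_t tag         = addr >> 11;          /* Addr[31:11]: stored in the Tag SRAM */
        uint32_t sram_word   = (addr >> 2) & 0x1FF; /* Addr[10:2]: flat Data SRAM word address,
                                                       i.e. index concatenated with word offset */
        printf("tag=%#x index=%u word=%u sram_word=%u\n",
               tag, index, word_offset, sram_word);
        return 0;
    }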

Victim cache section seems wrong

There were some questionable claims in the victim cache section. I've fixed some of them, but it needs some additional work.

I'll try and read up some papers to see and add some information, but that will likely take some time. In the meantime if there are some experts who know this well enough, please contribute.

Pramod 10:28, 27 December 2008 (UTC) —Preceding unsigned comment added by Pramod.s (talkcontribs)



Note: it has been observed that a faulty L2 cache will prevent Windows XP systems from booting unless the cache is manually disabled in the BIOS. Doing so, however, will severely reduce overall system performance.--89.147.67.118 (talk) 14:19, 12 July 2009 (UTC)[reply]

low power cache

As far as I know, the people designing caches generally ignored the amount of energy consumed by the cache until fairly recently. And so it is understandable that, until recently, this article has said nothing about low power cache.

I think this article should say something about current research in CPU caches. In particular, I think this article should say something about research on low power caches.

I attempted to add a couple of sentences about research on low power caches, but they were deleted a minute later. --68.0.124.33 (talk) 06:21, 13 December 2009 (UTC)[reply]

I'm reverting that delete. I hope this doesn't ignite a huge edit war. Feel free to replace my text with a better description of current research in CPU caches. --68.0.124.33 (talk) 03:06, 30 December 2009 (UTC)[reply]

In fact, it would be worth a special page on optimizing CPU power consumption! On the one hand, fast caches consume a lot of power, but on the other hand, the memory hierarchy and cache efficiency drastically reduce power for a given CPU throughput. Market1G (talk) 20:03, 6 April 2010 (UTC)[reply]

ways and sets

In the example describing ways and sets, the number of ways and sets is the same. This might lead one to believe that ways and sets are the same things, which I think is wrong. Something needs to be done to clarify the difference between a way and a set. —Preceding unsigned comment added by Skysong263 (talkcontribs) 02:34, 2 January 2010 (UTC)[reply]
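A small worked example may help, with assumed parameters: the number of sets and the number of ways are independent dimensions, and a set is the group of ways an address can map to.

    #include <stdio.h>

    int main(void) {
        unsigned size = 32 * 1024, line = 64, ways = 4;  /* assumed 32 KB, 4-way */
        unsigned nsets = size / (line * ways);           /* 32768 / 256 = 128 sets */
        unsigned long addr = 0xDEADBEEF;
        unsigned set = (addr / line) % nsets;            /* which set this address maps to */
        printf("%u sets x %u ways; %#lx may live in any of the %u ways of set %u\n",
               nsets, ways, addr, ways, set);
        return 0;
    }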

Is way prediction same as pseudo associativity?

Is way prediction same as pseudo associativity? --132.68.40.87 (talk) 09:16, 5 February 2010 (UTC)[reply]

Technical sections

I am having a hard time understanding the Structure and Associativity sections.

Structure is overly detailed. I'm skeptical of its general application to cache architectures. I'm tempted to delete the section.

Associativity launches into an explanation of how associativity works without first explaining what associativity is. I'm not familiar enough with the concept to write an introductory paragraph myself. --Kvng (talk) 19:25, 29 March 2010 (UTC)[reply]

You are right that this needs a better explanation and introduction, but please don't delete it.
Also, the link for reference [2] is broken, as is the last paragraph of the Associativity section (missing text).

Market1G (talk) 19:50, 6 April 2010 (UTC)[reply]

The section on Associativity was mangled in this edit. — Aluvus t/c 00:28, 7 April 2010 (UTC)[reply]

Details of operation to be clarified

>> If data are written to the cache, they must at some point be written to main memory as well. 
>> The timing of this write is controlled by what is known as the write policy. 
>> In a write-through cache, every write to the cache causes a write to main memory. 
>> Alternatively, in a write-back or copy-back cache, writes are not immediately mirrored to the main memory. 
>> Instead, the cache tracks which locations have been written over (these locations are marked dirty). 
>> The data in these locations are written back to the main memory when that data is evicted from the cache. 
>> For this reason, a miss in a write-back cache may sometimes require two memory accesses to service: 
>> one to first write the dirty location to memory and then another to read the new location from memory.
1. The last bit about two memory accesses, a write followed by a read, is hard to understand; neither the reasons nor the implications are clear. Should this be clarified or deleted? If clarified, it should rather be moved to a specific section dealing with details such as prefetch, bypass, and write buffers... (see the sketch below)
2. To me this section reads more like an overview than 'Details of operation'; can the title be changed?

Market1G (talk) 19:07, 6 April 2010 (UTC)[reply]
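For reference on question 1, a minimal sketch of the path in question (a hypothetical one-line write-back cache; names invented): servicing a read miss whose victim line is dirty takes one memory write followed by one memory read.

    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>

    #define LINE 64

    struct line { uint64_t base; int valid, dirty; uint8_t data[LINE]; };

    static uint8_t memory[1 << 20];   /* stand-in main memory */
    static struct line cache;         /* a single line, for clarity */

    static void read_miss(uint64_t addr) {
        uint64_t base = addr & ~(uint64_t)(LINE - 1);
        if (cache.valid && cache.dirty) {             /* access #1: write back victim */
            memcpy(&memory[cache.base], cache.data, LINE);
            printf("write-back of dirty line %#llx\n", (unsigned long long)cache.base);
        }
        memcpy(cache.data, &memory[base], LINE);      /* access #2: fetch new line */
        cache.base = base; cache.valid = 1; cache.dirty = 0;
        printf("fill of line %#llx\n", (unsigned long long)base);
    }

    int main(void) {
        cache.valid = 1; cache.dirty = 1; cache.base = 0x100;  /* a dirty victim... */
        read_miss(0x2040);        /* ...so this miss costs a write, then a read */
        return 0;
    }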