Talk:Direct memory access
|This is the talk page for discussing improvements to the Direct memory access article.|
|WikiProject Computing / Hardware||(Rated C-class, Mid-importance)|
|This article is the subject of an educational assignment at Department of Electronics and Telecommunication, College of Engineering, Pune, India supported by Wikipedia Ambassadors through the India Education Program during the 2011 Q3 term. Further details are available on the course page.|
- 1 PCI Express Section
- 2 IO Accelerator in Xeon
- 3 "Principle"
- 4 "[...] and skillfully created applications can outperform cache."
- 5 "[...]slower than copying normal blocks of memory since access to I/O devices over a peripheral bus is generally slower than normal system RAM."
- 6 Dubious "counterexamples"
- 7 Strcpy example
- 8 a bad idea?
- 9 32-bit address bus
- 10 DMA Channels
- 11 History Section?
- 12 What memory types can DMA work on?
- 13 Rewrite needed
- 14 Diagram needed
- 15 Scatter/gather and Vectored I/O the same?
- 16 PCI-part: Modern design: South+North-Bridge?
PCI Express Section
The PCIe section doesn't make much sense. It contains a single sentence, "PCI Express uses DMA. The DMA engine appears as another function on the upstream post with a TYPE 0 configuration header." First of all, obviously "port" is meant, not "post". But PCI Express doesn't use DMA and devices don't need to implement DMA to comply with PCI Express; PCI Express is a protocol which can be used for DMA. DMA engines aren't PCIe functions. Depending on architecture there may be a one-to-one, many-to-one, or one-to-many correspondence between DMA engines and PCIe channels of a device. For example, in an SR-IOV device, there could be many virtual Functions sharing one (or a few) engine(s).
IO Accelerator in Xeon
"in CPU utilization with receiving workloads, and no improvement when transmitting data." The source cited seems to indicate the improvements are more complex than simple CPU utilization measurements indicate. In particular, this seems relevant: "This data shows that I/OAT really benefits from larger application buffer sizes. There is a CPU spike at 2K, although also increased throughput." Which seems to indicate that I/OAT is enabling greater throughput and CPU utilization with buffers <2K. —Preceding unsigned comment added by 184.108.40.206 (talk) 18:14, 30 March 2009 (UTC)
"Principle"
This section was either lifted directly from http://www.avsmedia.com/OnlineHelp/DVDCopy/Appendix/dma.aspx, or vice versa. —Preceding unsigned comment added by 220.127.116.11 (talk) 11:31, 11 January 2008 (UTC)
- This section existed on Wikipedia verbatim back in 2006 (and probably much earlier as well). According to web.archive.org, that page appeared in May 2007, and their copyright also states 2007. -- intgr [talk] 22:56, 11 January 2008 (UTC)
"[...] and skillfully created applications can outperform cache."
I've removed this sentence because it makes no sense to me.
- "DMA transfers are essential to high performance embedded algorithms and skillfully created applications can outperform cache."
How can an application outperform cache? Did you mean an application (implementation) of DMA? If so, perhaps the term "implementation" should be used, because "application" certainly brings to mind the concept of a software application. Even then, can DMA outperform cache? Aren't we comparing apples to oranges, at least unless the context is made clearer?
LjL 21:59, 28 Apr 2005 (UTC)
Question: Is UDMA related to DMA?
Yes. UDMA is an advanced DMA for hard disks and CD/DVD drives.
- I would rather phrase it like this: UDMA is the name of the capability of non-ancient ATA chipsets to use DMA to transfer data directly to/from system memory. Saying that "UDMA is an advanced DMA" makes it sound as if UDMA is some new, special DMA technique, which it is not. ATA chipsets use normal PCI bus-mastering to do DMA, just like any other PCI component. It's just that ATA chipsets lacking UDMA don't do DMA at all, and have to be bit-banged by the CPU. --Dolda2000 23:01, 31 August 2007 (UTC)
"[...]slower than copying normal blocks of memory since access to I/O devices over a peripheral bus is generally slower than normal system RAM."
Can someone explain this to me? How is it that the process of accessing I/O devices can be slower than normal system ram? Did the author mean slower than accessing RAM? Exactly what is doing the copying, and from where to where? And which process is slower? Dudboi 12:21, 5 November 2006 (UTC)
- It's slower since the CPU would be occupied chunking out bits and bytes whenever it's directly communicating to a device. When using DMA, the CPU will just send out a DMA command and the device will be able to act on its own. Also, this way the CPU can request data transfers between two devices without fetching the data into its own cache. -- intgr 16:09, 5 November 2006 (UTC)
- Oh, ok I think I get it now, thanks for clearing that up. I think the sentence needs to be reworded though. It's not very clear, especially since it compares the speed of a process (accessing the devices - verb) with that of a hardware component (RAM - noun) —Dudboi 23:48, 5 November 2006 (UTC)
- Just for future reference, I thought I'd make it a bit clearer (hopefully ;). Say that an IDE controller has just fetched a sector from disk. The sector then resides in the IDE controller's buffer, and needs to be copied to system memory. Without DMA, the CPU would have to ask the IDE controller for each individual word (32 bits on PCI) and then copy it to system memory (during the transactions from the IDE controller to the CPU, the CPU would be the bus master). Using DMA, however, the IDE controller would request PCI bus ownership, transfer the sector to system memory on its own, and signal the processor when it's done. It is worth noting that with PCI, the DMA-less process need not necessarily be slower in terms of bandwidth, since neither the transfer of one word from the IDE controller to the CPU (with the CPU as bus master) nor the transfer from the IDE controller to RAM (with the IDE controller as bus master) would necessarily have to take more than one PCI clock cycle (15 or 30 ns depending on bus speed). However, using DMA, the CPU is free to do whatever it wishes in the meantime, rather than having to idle while waiting for PCI transactions to complete, increasing parallelism greatly. It is worth noting, of course, that ATA PIO is a lot slower than ATA UDMA, and while I know too little about ATA to speak with authority, I suspect it to be because the ATA PIO protocol simply does not allow sequential reads to fetch successive bytes from the buffer, but uses some other PCI protocol. It could also be because of some ATA protocol peculiarity between the controller and the disk itself of which I know nothing. --Dolda2000 23:15, 31 August 2007 (UTC)
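For anyone who finds the IDE example above easier to follow as code, here is a rough toy model of the two paths. Everything in it is an assumption made for illustration: `IdeController`, `pio_copy` and `dma_copy` are hypothetical names, and "CPU transactions" is only a count of the steps the CPU takes part in, not a timing measurement.

```python
class IdeController:
    """Toy model (hypothetical) of a controller holding one sector in its buffer."""

    WORD_BYTES = 4  # one 32-bit word, as on PCI

    def __init__(self, sector: bytes):
        self.buffer = sector

    def read_word(self, index: int) -> bytes:
        """PIO path: hand the CPU a single word from the buffer."""
        start = index * self.WORD_BYTES
        return self.buffer[start:start + self.WORD_BYTES]

    def dma_to(self, memory: bytearray, offset: int, on_done) -> None:
        """DMA path: the controller copies the whole sector to memory itself."""
        memory[offset:offset + len(self.buffer)] = self.buffer
        on_done()  # stands in for the completion interrupt


def pio_copy(ctrl: IdeController, memory: bytearray, offset: int) -> int:
    """Copy the sector with the CPU in the loop; return CPU transactions used."""
    cpu_transactions = 0
    words = len(ctrl.buffer) // ctrl.WORD_BYTES
    for i in range(words):
        word = ctrl.read_word(i)                       # CPU reads from the controller
        start = offset + i * ctrl.WORD_BYTES
        memory[start:start + ctrl.WORD_BYTES] = word   # CPU writes to system memory
        cpu_transactions += 2
    return cpu_transactions


def dma_copy(ctrl: IdeController, memory: bytearray, offset: int) -> int:
    """Start a DMA transfer and wait for completion; return CPU transactions used."""
    completed = []
    cpu_transactions = 1                               # one command to start the transfer
    ctrl.dma_to(memory, offset, on_done=lambda: completed.append(True))
    cpu_transactions += 1                              # servicing the completion interrupt
    assert completed, "transfer should have signalled completion"
    return cpu_transactions


sector = bytes(range(256)) * 2          # a 512-byte sector
ram_pio = bytearray(1024)
ram_dma = bytearray(1024)
ctrl = IdeController(sector)
pio_cost = pio_copy(ctrl, ram_pio, 0)
dma_cost = dma_copy(ctrl, ram_dma, 0)
```

Both paths leave the same bytes in memory; the difference the model tries to show is only how many steps involve the CPU, which is the parallelism point made above.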
Dubious "counterexamples"
Note that I'm not very knowledgeable about low-level hardware interaction, but I found two out of the three bullets in the "counterexamples" section dubious:
- PC architectures after the ISA lost it ability to use DMA for memory defragmentation or initialization.
Not entirely true – at least AGP/PCIe graphics cards these days come with an IOMMU (GART). Not sure about other kinds of devices.
- DMA commanding in a PC is so expensive and dumb that it is not used for bit blit.
Is it really "so expensive on a PC" or does it just have more overhead considering today's processing power? I don't know about 2D hardware, but mapping textures to surfaces is very trivial in 3D hardware, these days.
- The ATA hard disk interface moved from programmed input/output (PIO) to direct memory access (DMA) only later in history.
This bullet says "later in history" – but later than what? -- intgr 16:09, 5 November 2006 (UTC)
Strcpy example
I don't think the strcpy example was a particularly good one about DMA engines, and I don't think mention of DMA engines deserves a place in the lead section, either. strcpy is a particularly problematic function because:
- The length of the string is not known in advance - it's terminated by a null byte. I would be willing to bet that DMA engines do not generally have any logic to search for a terminator, as they are designed for bulk transfers.
- Strings are typically very short and likely to be in the CPU cache anyway whenever a copy is issued.
- The overhead of simply copying bytes is much less compared to (1) making a syscall; (2) sending an I/O request, halting the requesting process, context switching to another process; (3) handling an interrupt and re-scheduling the process.
I can see, however, that DMA engines can be beneficial when copying large buffers, or when building, for example, network packets within the kernel. And indeed, such copies are not currently offloaded since today's computers lack such a device. Thus I've created a new section, 'DMA engines', for this. It's still a stub, though; Intel's I/OAT certainly deserves a mention. -- intgr 17:12, 12 November 2006 (UTC)
- I see you added the blurb about I/OAT. I thought I would mention I changed that because I/OAT (code name Crystal Beach) is not implemented in the processors but rather in the chipsets. Since there is already I/O DMA via PCI bus-mastering, I/OAT (as you probably know) is designed for memory-to-memory DMA and as such is best implemented in the memory controller (which is usually in the MCH/north bridge on Intel chipsets that implement I/OAT). It is very nice to have the memory controller create a device that commands can be sent to copy memory blocks about. 18.104.22.168 (talk) 03:27, 25 April 2008 (UTC)
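The overhead argument in this thread (short strings lose, large buffers may win) can be put as a back-of-the-envelope cost model. This is only a sketch: the function names are hypothetical and the default numbers are made up for illustration, not measurements of any real CPU, chipset or I/OAT implementation. Costs are in integer picoseconds to keep the arithmetic exact.

```python
def cpu_copy_cost_ps(n_bytes: int, cpu_ps_per_byte: int = 250) -> int:
    """Cost of the CPU copying n_bytes itself (illustrative per-byte cost)."""
    return n_bytes * cpu_ps_per_byte


def offload_cost_ps(n_bytes: int,
                    fixed_overhead_ps: int = 5_000_000,
                    engine_ps_per_byte: int = 50) -> int:
    """Cost of offloading: a fixed setup/interrupt overhead plus a per-byte cost.

    The fixed overhead stands in for the syscall, I/O request, context switch
    and interrupt handling mentioned above; its value here is an assumption.
    """
    return fixed_overhead_ps + n_bytes * engine_ps_per_byte


def break_even_bytes(fixed_overhead_ps: int = 5_000_000,
                     cpu_ps_per_byte: int = 250,
                     engine_ps_per_byte: int = 50):
    """Smallest copy size at which offloading is no slower than the CPU copy.

    Returns None if the engine is not faster per byte, in which case
    offloading never pays off under this model.
    """
    saving_per_byte = cpu_ps_per_byte - engine_ps_per_byte
    if saving_per_byte <= 0:
        return None
    return -(-fixed_overhead_ps // saving_per_byte)  # ceiling division


short_string = 16           # a typical short strcpy
large_buffer = 1_000_000    # a bulk copy
```

With these made-up defaults the break-even point is 25000 bytes, so a 16-byte string is far cheaper to copy on the CPU while a megabyte-sized buffer favours the engine. Real numbers will differ, but the shape of the trade-off is the one described above.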
a bad idea?
This seems like a bad idea to me. How would the CPU know when the memory is being written - what if the device is in the middle of updating a couple of KB of data in memory and the CPU reads off the whole range and gets half the new values and half the old values? Why not just have a dedicated component on the CPU for data throughput that shares the CPU clock and makes the appropriate information available to memory protection systems? --frothT C 17:20, 28 November 2006 (UTC)
- The device will send an interrupt when it's done with a DMA request, and only after that will the CPU attempt to read that data or do anything with it. I can't see what exactly you have in mind with the I/O component. If its only job was to mediate between devices and the memory while guaranteeing memory protection, DMA through an IOMMU would essentially achieve the same, except that each bus can have its own IOMMU operating at the native clock rate of the bus, and the data would not even have to congest the CPU or its bus at all (except when explicitly read by the CPU). I think DMA is a brilliant idea. :) -- intgr 18:24, 28 November 2006 (UTC)
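The "wait for the interrupt before reading" rule from the reply above can be sketched with a thread standing in for the device and a `threading.Event` standing in for the completion interrupt. This is a software analogy under those assumptions, not how a driver is actually written; `device_dma_write` and `read_after_completion` are hypothetical names.

```python
import threading


def device_dma_write(buffer: bytearray, payload: bytes, done: threading.Event) -> None:
    """Stand-in for the device: fill the buffer byte by byte, then signal completion."""
    for i, value in enumerate(payload):
        buffer[i] = value
    done.set()  # plays the role of the completion interrupt


def read_after_completion(payload: bytes) -> bytes:
    """Start a simulated transfer and read the buffer only after it has completed."""
    buffer = bytearray(len(payload))
    done = threading.Event()
    device = threading.Thread(target=device_dma_write, args=(buffer, payload, done))
    device.start()
    # The "CPU" is free to do unrelated work here, but must not read the buffer yet.
    done.wait()
    snapshot = bytes(buffer)  # safe: the device has signalled that it is finished
    device.join()
    return snapshot


new_values = b"\xAA" * 4096
result = read_after_completion(new_values)
```

Because the read happens only after the event is set, the snapshot can never contain the half-old, half-new mix that the original question worries about.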
32-bit address bus
"A modern x86 CPU may use more than 4 GiB of memory, utilizing PAE, a 64-bit addressing mode. In such case, a device using DMA with 32-bit address bus is unable to address the memory above 4 GiB line."
I don't understand what the phrase "32-bit address bus" means in the context above. Is this referring to the device's own addressing ability, or some bus external to the device, or something else? -- AzzAz (talk) 20:27, 28 February 2008 (UTC)
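One possible reading of the quoted sentence, and it is only an assumption here, is that "32-bit address bus" means the device itself can emit only 32-bit addresses. Under that reading the constraint is simple arithmetic, sketched below with a hypothetical helper `dma_reachable`:

```python
GIB = 1 << 30


def dma_reachable(phys_addr: int, length: int, address_bits: int = 32) -> bool:
    """Return True if the whole buffer lies below the device's addressing limit.

    address_bits is the number of address bits the device can drive
    (an assumption about what "32-bit address bus" means in the article).
    """
    if length <= 0:
        raise ValueError("length must be positive")
    limit = 1 << address_bits          # first address the device cannot express
    return phys_addr >= 0 and phys_addr + length <= limit


below_4g = dma_reachable(3 * GIB, 4096)                      # entirely under 4 GiB
above_4g = dma_reachable(5 * GIB, 4096)                      # entirely over 4 GiB
straddling = dma_reachable(4 * GIB - 2048, 4096)             # crosses the 4 GiB line
wide_device = dma_reachable(5 * GIB, 4096, address_bits=64)  # device with 64-bit addressing
```

A buffer that fails this check cannot be targeted directly by such a device; operating systems commonly work around that by copying through a buffer placed below the limit (a "bounce buffer").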
DMA Channels
So DMA channels are ISA-specific, right? In other words they would not be applicable to PCI, since any PCI device can bus-master? Also, what is the "Direct Memory Access Controller" shown with Channel 4 in msinfo32.exe on Windows? -- AzzAz (talk) 20:27, 28 February 2008 (UTC)
- No, DMA is a generic concept. PCI bus-mastering is a type of DMA. It is true that the PC architecture (but by no means all computers) has ISA DMA controllers. It is also true that with the advent of PCI bus-mastering there is another type of DMA that is standard in the PC architecture now. 22.214.171.124 (talk) 03:21, 25 April 2008 (UTC)
What memory types can DMA work on?
Rewrite needed
DMA controllers don't do checksum calculations; if they were that smart, they would be I/O processors, not simple DMA controllers. Need to talk about bus mastering and cycle stealing; in the PCish world, you can either use the DMA controller on board or become bus master and write to memory directly. (I can kind of see how this would have worked in the old days, but have no idea how modern designs do this.) --Wtshymanski (talk) 02:38, 23 August 2011 (UTC)
Diagram needed
We have a diagram to explain cache coherency but not one that explains how DMA works. This could be two panels: the first panel shows the CPU doing reads/writes to an I/O device, with data passing through a CPU register to/from memory. The second panel shows the CPU doing something else and the DMA controller doing the transfers. Rainy day project for me if I can't find one on Commons. --Wtshymanski (talk) 13:58, 30 March 2012 (UTC)
Scatter/gather and Vectored I/O the same?
I added a wikilink from scatter/gather to vectored I/O. Are these two terms indeed referring to the same thing? — Preceding unsigned comment added by Jimw338 (talk • contribs) 16:03, 8 August 2012 (UTC)
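As a way of comparing the two terms, here is a small sketch of what a scatter/gather descriptor list does, modelled on a flat `bytearray`. It assumes descriptors are simple (address, length) pairs; `gather` and `scatter` are hypothetical names and this is not any particular controller's descriptor format. Vectored I/O calls such as POSIX `readv`/`writev` take a similar list of buffers at the system-call level, which may be why the terms are often used together.

```python
def gather(memory: bytearray, descriptors) -> bytes:
    """Collect several non-contiguous regions of memory into one contiguous stream."""
    out = bytearray()
    for addr, length in descriptors:
        out += memory[addr:addr + length]
    return bytes(out)


def scatter(memory: bytearray, descriptors, data: bytes) -> int:
    """Spread one contiguous stream across several regions; return bytes placed."""
    pos = 0
    for addr, length in descriptors:
        chunk = data[pos:pos + length]
        memory[addr:addr + len(chunk)] = chunk
        pos += len(chunk)
    return pos


# Three 4-byte regions separated by filler bytes.
source = bytearray(b"....HEAD....BODY....TAIL")
regions = [(4, 4), (12, 4), (20, 4)]
stream = gather(source, regions)

target = bytearray(24)
placed = scatter(target, regions, stream)
```

In both directions a single request covers several separate regions, which is the property the two terms share; whether the article should treat them as synonyms is a separate question.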
PCI-part: Modern design: South+North-Bridge?
The article is talking about modern architecture and refers there to the North and South Bridges. In modern architectures only one "Hub" is left. — Preceding unsigned comment added by 126.96.36.199 (talk) 22:20, 22 October 2014 (UTC)