Detailed interpretation of AHB and DMA



Posted Date: 2024-01-19

AHB (Advanced High Performance Bus)

As deep-submicron process technology matures, integrated circuit chips keep growing in scale. Digital IC design has evolved from timing-driven methods to methods based on IP core reuse, which are now widely used in SoC design. In SoC (System on Chip) design based on IP core reuse, the on-chip bus is the most critical design issue, and many on-chip bus standards have emerged in the industry. Among them, the AMBA on-chip bus from ARM has been favored by IP developers and SoC system integrators alike and has become a de facto industry-standard on-chip interconnect. The AMBA specification mainly comprises the AHB (Advanced High Performance Bus) system bus and the APB (Advanced Peripheral Bus) peripheral bus.


AHB stands for Advanced High Performance Bus. Like USB (Universal Serial Bus), it is a bus interface.
AHB is mainly used to connect high-performance modules such as the CPU, DMA controllers, and DSPs. As the on-chip system bus of an SoC, it has the following features: single-clock-edge operation; a non-tristate implementation; support for burst transfers; support for split transfers; support for multiple bus masters; a configurable bus width of 32 to 128 bits; and byte, halfword, and word transfers. An AHB system consists of three parts: master modules, slave modules, and the bus infrastructure. All transfers on the AHB bus are initiated by a master module, and the slave modules respond. The infrastructure comprises an arbiter, a master-to-slave multiplexer, a slave-to-master multiplexer, a decoder, a dummy slave, and a dummy master. For the IP reuse problem in SoC design, the traditional approach is to wrap a functional module's non-standard interface into a standard AHB master/slave interface; this article instead proposes a register-bus standard interface for an ARM-based SoC platform, which makes the overall system structure clearer and improves the system's generality and the portability of its functional modules.
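The burst transfers mentioned above come in incrementing (INCRx) and wrapping (WRAPx) flavors: an incrementing burst advances the address by the transfer size on every beat, while a wrapping burst wraps the address back at a boundary of (beat count × transfer size). A minimal sketch of that address sequencing, with illustrative function and parameter names (not taken from any real HDL):

```python
def ahb_burst_addresses(start, size_bytes, beats, wrap=False):
    """Return the address of each beat in an AHB-style burst.

    start      -- address of the first beat (aligned to size_bytes)
    size_bytes -- bytes per beat (1 = byte, 2 = halfword, 4 = word)
    beats      -- number of beats (4, 8, or 16 for fixed-length bursts)
    wrap       -- True models WRAPx bursts, False models INCRx
    """
    boundary = size_bytes * beats          # wrap boundary for WRAPx
    base = (start // boundary) * boundary  # start of the aligned block
    addrs = []
    addr = start
    for _ in range(beats):
        addrs.append(addr)
        addr += size_bytes
        if wrap and addr >= base + boundary:
            addr = base                    # wrap back to the block start
    return addrs
```

For example, a 4-beat wrapping word burst starting at 0x38 wraps at a 16-byte boundary and visits 0x38, 0x3C, 0x30, 0x34, whereas the incrementing variant simply continues to 0x40 and 0x44.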


The AMBA 2.0 specification consists of four parts: AHB, ASB, APB, and a test methodology. AHB interconnection uses a traditional shared bus with master and slave modules, and it separates the interface from the interconnect function, which matters greatly for interconnecting modules on the chip. AMBA is therefore not just a bus but an interconnect system with interface modules.


APB is mainly used to connect low-bandwidth peripherals such as UARTs and IEEE 1284 parallel ports. Unlike AHB, its bus architecture does not support multiple masters; the only master on the APB is the APB bridge. Its features include: two-clock-cycle transfers; no wait states or response signals; and simple control logic with only four control signals.
1) The system initializes to the IDLE state: no transfer is in progress and no slave module is selected.
2) When a transfer is pending, PSELx is asserted with PENABLE=0 and the system enters the SETUP state, where it stays for exactly one cycle; on the next rising edge of PCLK it moves to the ENABLE state.
3) In the ENABLE state, PADDR, PSEL, and PWRITE keep the values they had in SETUP, and PENABLE is set to 1. The transfer remains in ENABLE for one cycle only and completes after passing through SETUP and ENABLE. Afterwards, the system returns to IDLE if no transfer is pending, or goes back to SETUP if back-to-back transfers continue.
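The three-state sequence above can be modeled as a tiny cycle-by-cycle simulation. State and signal names follow the APB convention; the queue-driven driver function is purely illustrative:

```python
IDLE, SETUP, ENABLE = "IDLE", "SETUP", "ENABLE"

def apb_trace(transfers):
    """Return one (state, PSEL, PENABLE) tuple per PCLK cycle while a
    list of pending transfers drains. Each transfer costs exactly two
    cycles: one SETUP cycle followed by one ENABLE cycle."""
    pending = list(transfers)
    state = IDLE
    trace = []
    while pending or state != IDLE:
        if state == IDLE:
            if pending:
                state = SETUP            # a transfer is requested
        elif state == SETUP:
            state = ENABLE               # SETUP lasts one cycle only
        elif state == ENABLE:
            pending.pop(0)               # transfer completes here
            state = SETUP if pending else IDLE
        trace.append((state, int(state != IDLE), int(state == ENABLE)))
    return trace
```

Running `apb_trace(["read", "write"])` yields SETUP, ENABLE, SETUP, ENABLE, then a return to IDLE: two back-to-back transfers take four active cycles, with PENABLE high only in ENABLE cycles.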


Most modules on the bus (including processors) have a single role: master or slave. A master issues read and write operations to slaves, e.g. a CPU or DSP; a slave accepts commands and responds, e.g. on-chip RAM or the AHB/APB bridge. Some modules can play both roles: a DMA controller, for instance, is a slave while it is being programmed but becomes a master when it moves data for the system. If there are multiple masters on the bus, an arbiter is needed to decide how the masters gain access to the bus. Although arbitration is part of the AMBA bus specification, the specific algorithm is left to the RTL design engineer; the two most common are fixed priority and round-robin. An AHB bus supports up to 16 masters and any number of slaves; with more than 16 masters, an additional layer of hierarchy is required (see ARM's Multi-layer AHB specification). The APB bridge is both the only master on the APB bus and a slave on the AHB system bus. It latches the address, data, and control signals coming from the AHB, performs secondary decoding to generate the select signals for the APB peripherals, and thereby converts the AHB protocol to the APB protocol.
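The two arbitration schemes named above can be sketched as pure functions. Each takes the set of master indices currently requesting the bus and returns the index that is granted; the interface is illustrative, since a real AHB arbiter is clocked RTL:

```python
def fixed_priority(requests):
    """Grant the requesting master with the lowest index
    (lower index = higher priority). A low-index master that
    requests continuously can starve the others."""
    return min(requests) if requests else None

def round_robin(requests, last_granted, num_masters=16):
    """Grant the first requester found when scanning circularly
    from the master after last_granted, so every requester is
    eventually served and none can starve the rest."""
    if not requests:
        return None
    for offset in range(1, num_masters + 1):
        candidate = (last_granted + offset) % num_masters
        if candidate in requests:
            return candidate
    return None
```

With masters 1, 3, and 7 requesting, fixed priority always picks 1, while round-robin picks 3 if master 1 held the bus last, rotating the grant among the contenders.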


Direct Memory Access (DMA) is a capability provided by some computer bus architectures that allows data to be sent directly from an attached device (such as a disk drive) to the computer's main memory.
Typically a portion of memory is designated for direct memory access. Under the ISA bus standard, up to 16 megabytes of memory can be addressed for DMA. The EISA and Micro Channel architectures allow access to the full address space (assuming 32-bit addressing). PCI accomplishes direct memory access through bus mastering. The alternative to DMA is programmed input/output (PIO), in which all data transferred between devices passes through the processor. A newer protocol for the ATA/IDE interface is Ultra DMA, which provides burst data transfer rates of up to 33 megabytes per second; hard drives with Ultra DMA/33 also support PIO modes 1, 3, and 4 and multiword DMA mode 2 (16.6 megabytes per second).

Data transfers between peripherals and memory, and between memory regions, are usually handled in one of three ways: program interrupt, program polling, or DMA control. Both the interrupt and polling approaches require the CPU to issue input/output (I/O) instructions and then wait for the I/O device to complete the operation, during which time the CPU is tied up. DMA transfers between memory and I/O devices need no CPU involvement in moving the data: a DMA controller (DMAC) directly carries out the high-speed transfer between peripherals and memory or between memory regions.

A complete DMA transfer comprises four phases: DMA request, DMA response, DMA transfer, and DMA end. The principle of DMA transmission is shown in Figure 1, where the I/O device is the source and memory is the destination. The basic sequence is as follows:

1. The CPU initializes the bus controller, specifies the working memory region, and reads the DMAC's registers to learn its transfer status.
2. The I/O device sends a DMA request (DREQ) to the DMAC; on receiving it, the DMAC asserts a bus hold signal (HOLD) to the CPU.
3. The CPU asserts the hold acknowledgment (HLDA) once the current bus cycle finishes.
4. Having been granted the bus, the DMAC sends a DMA acknowledgment (DACK) to the I/O device, permitting it to perform the DMA transfer.
5. During the transfer, the DMAC first reads data from the source address into an internal buffer and then writes it to the destination address, completing the move over the bus.
6. When the transfer finishes, the DMAC signals completion to the CPU and releases the bus, returning bus control to the CPU.

A DMA transfer needs only one DMA cycle, equivalent to one bus read/write cycle, so it meets the needs of high-speed transfer of peripheral data.
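The handshake and data movement above can be condensed into a toy model. The signal names (DREQ, HOLD, HLDA, DACK) follow the text; the Python objects standing in for hardware are purely illustrative:

```python
def dma_transfer(io_device_data, memory, dest, log):
    """Simulate one DMA transfer from an I/O device buffer into memory,
    logging the handshake steps in order. The CPU never touches the data."""
    log.append("DREQ")            # 1. I/O device requests DMA service
    log.append("HOLD")            # 2. DMAC asks the CPU for the bus
    log.append("HLDA")            # 3. CPU grants the bus after the current cycle
    log.append("DACK")            # 4. DMAC acknowledges the device
    for i, word in enumerate(io_device_data):
        memory[dest + i] = word   # 5. DMAC moves data source -> destination
    log.append("EOT")             # 6. end of transfer; bus released to the CPU
    return memory
```

After the call, the device's words appear at the destination offset in memory, and the log records the request/grant/acknowledge/end sequence exactly as enumerated above.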

DMA is an important feature of all modern computers. It allows hardware devices of different speeds to communicate without imposing a heavy interrupt load on the CPU; otherwise the CPU would have to copy each fragment of data from the source into its registers and write it back out to the new location, remaining unavailable for other work in the meantime.

DMA transfers are often used to copy a block of memory from one device to another. The CPU initiates the transfer, but the transfer itself is executed and completed by the DMA controller. A typical example is moving a block from external memory into faster memory inside the chip; operations like this do not stall the processor, which can be rescheduled for other work. DMA transfers are therefore important for high-performance embedded algorithms and networking.

For example, a personal computer's ISA DMA controller has 8 DMA channels, 7 of which are available for use. Each channel has a 16-bit address register and a 16-bit count register. To initiate a transfer, the device driver sets the channel's address and count registers along with the direction of the transfer (read or write), then instructs the DMA hardware to begin. When the transfer completes, the device notifies the CPU with an interrupt.

Scatter-gather DMA allows data to be transferred to multiple memory areas in a single DMA transaction; it is equivalent to chaining together multiple simple DMA requests, again relieving the CPU of repeated I/O interrupts and copy work.

DRQ stands for DMA request and DACK for DMA acknowledge. These signals can usually be seen on the hardware schematics of DMA-capable computer systems, representing the signal lines between the DMA controller and the devices it serves.
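Scatter-gather, as described above, amounts to the controller walking a chain of descriptors so the CPU programs one transaction and takes one completion interrupt. A sketch under the assumption that each descriptor carries a source offset, a length, and a destination offset (the layout is illustrative, not any specific controller's format):

```python
def scatter_gather_copy(src, dst, descriptors):
    """Process a descriptor chain in one 'transaction': each descriptor
    is a (src_off, length, dst_off) tuple, and the whole chain is
    completed before any completion signal would be raised."""
    for src_off, length, dst_off in descriptors:
        # copy one segment; a real DMAC would do this without CPU help
        dst[dst_off:dst_off + length] = src[src_off:src_off + length]
    return dst
```

With two descriptors, two discontiguous source segments land at two discontiguous destinations in a single call, which is exactly the pattern that would otherwise take two separate simple DMA requests.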

DMA can cause cache coherency problems. Imagine a CPU with a cache in front of external memory, while DMA accesses the external memory directly. When the CPU writes to an address, the new value may sit in the cache without yet being written back to external memory; if a DMA transfer reads that memory before the write-back, it sees stale data. Conversely, if an external device writes a new value into external memory, the CPU may keep reading the old, not-yet-updated value from its cache. These problems can be solved in two ways:

1. Cache-coherent systems: handled in hardware. When an external device writes to memory, a signal notifies the cache controller that the value at that address has become stale and the cached copy must be invalidated or updated.
2. Non-coherent systems: handled in software. The operating system must flush (write back) the relevant cache lines before an outgoing DMA transfer begins, and invalidate them before reading memory written by an incoming transfer. This software approach adds overhead to every DMA operation.
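The non-coherent case can be made concrete with a toy write-back cache over a shared memory, where DMA writes memory behind the cache's back and the software fix is the invalidate step. All names and structures here are illustrative:

```python
class CachedCPU:
    """A CPU whose reads go through a simple cache (addr -> value)."""
    def __init__(self, memory):
        self.memory = memory
        self.cache = {}

    def read(self, addr):
        if addr not in self.cache:           # miss: fill the line from memory
            self.cache[addr] = self.memory[addr]
        return self.cache[addr]              # hit: memory is NOT consulted

    def invalidate(self, addr):
        self.cache.pop(addr, None)           # the software coherency step

def dma_write(memory, addr, value):
    memory[addr] = value                     # DMA bypasses the cache entirely
```

Read an address once so it is cached, let "DMA" update memory, and the next CPU read still returns the stale cached value; only after `invalidate` does the CPU see the fresh data, which is precisely why non-coherent systems must invalidate before reading DMA'd buffers.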

Beyond device I/O, DMA can also offload expensive memory operations, such as large copies or scatter-gather operations, from the CPU to a dedicated DMA engine. Intel's high-end servers include such an engine, called I/O Acceleration Technology (I/OAT).

In computing, remote direct memory access (RDMA) is a direct-memory-access technique that transfers data straight from the memory of one computer into that of another without involving either operating system. This permits high-throughput, low-latency network communication and is especially useful in massively parallel computer clusters.

RDMA supports zero-copy networking by letting the network adapter transfer data directly to or from application memory, eliminating the copy between application memory and operating-system buffers. Such transfers require no work by the CPU, no CPU cache involvement, and no context switches, and they proceed in parallel with other system activity. When an application issues an RDMA read or write request, the data is delivered directly to the network, reducing latency and enabling fast message transfer.

This strategy has drawbacks, however, chiefly that the target node receives no notification that a request has completed (one-way communication).
