Talking about the evolution of memory chips
Semiconductor Engineering recently sat down with industry experts to discuss the path forward for memory in increasingly heterogeneous systems. Below are excerpts from that conversation.
As we grapple with AI/ML and power demands, what configurations need to be rethought? Will we see a shift away from von Neumann architectures?
Steven Woo, distinguished inventor at Rambus: In terms of system architecture, the industry is split. The traditional applications that dominate today, running on x86-based servers in the cloud, are not going away. Software that has been built and evolved over decades depends on this architecture to perform well. AI/ML, in contrast, is a new category. People have rethought the architecture and built very domain-specific processors. What we see is that about two-thirds of the energy is spent moving data between the processor and the HBM device, while only about one-third is spent actually accessing the bits in the DRAM core. Data movement is now much more challenging and much more expensive. We are not going to get rid of memory. We need it because data sets keep getting bigger. So the question is, 'What's the right path forward?' There's a lot of talk about stacking. If we put the memory directly on top of the processor, it does two things. First, bandwidth today is limited by the shoreline, or perimeter, of the chip. That's where the I/O goes. But if you stack the memory directly on the processor, you can use the entire area of the chip for distributed interconnect, and you can get more bandwidth from the memory itself, which can feed data directly to the processor. The links become much shorter, and power efficiency may improve by 5X to 6X. Second, the amount of bandwidth you can get also goes up by several integer multiples, because much more of the area is available for interconnect to the memory. Doing both at the same time provides more bandwidth and makes the system more energy efficient. The industry evolves to meet whatever the needs are, and this is definitely one way we see memory systems starting to evolve in the future, becoming more energy efficient and providing more bandwidth.
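To make the arithmetic behind these figures concrete, here is a minimal Python sketch. It assumes the 5X to 6X gain applies only to the data-movement share of the energy (the roughly two-thirds mentioned above) and that the DRAM core access energy is unchanged. That split is an assumption for illustration, not something stated in the conversation. The second function is an equally simplified geometric picture of shoreline versus area connections; the die size and bump pitch are hypothetical.

```python
def stacked_energy_ratio(move_fraction: float, link_gain: float) -> float:
    """Total energy after stacking, relative to a baseline of 1.0.

    move_fraction: share of baseline energy spent moving data (about 2/3 in the text).
    link_gain: assumed efficiency gain on the data-movement share only (5x to 6x in the text).
    """
    core_fraction = 1.0 - move_fraction  # DRAM core access energy, assumed unchanged
    return core_fraction + move_fraction / link_gain


def connection_counts(side_um: int, pitch_um: int) -> tuple[int, int]:
    """Connections along the die edge vs. across the die area.

    Simplified model: one row of connections along each edge (the "shoreline")
    versus a full grid over the die. Real designs use several edge rows.
    """
    per_side = side_um // pitch_um
    return 4 * per_side, per_side * per_side


for gain in (5.0, 6.0):
    ratio = stacked_energy_ratio(2.0 / 3.0, gain)
    print(f"link gain {gain:.0f}x: total energy {ratio:.2f} of baseline, "
          f"overall improvement {1.0 / ratio:.2f}x")

# Hypothetical 20 mm die with a 50 um connection pitch.
edge, area = connection_counts(side_um=20_000, pitch_um=50)
print(f"edge connections: {edge}, area connections: {area}")
```

Under these assumptions the overall energy saving is closer to 2X than 5X, because the DRAM core share does not shrink. The geometric sketch shows why area interconnect scales faster than perimeter interconnect as the die grows.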
Frank Ferro, director, product management group at Cadence: When I started working on HBM back in 2016, some of the more advanced customers were already asking whether it could be stacked. They had been looking at how to stack DRAM on top for a while, because there are clear advantages. From a physical-layer perspective, the PHY becomes basically negligible, which saves a lot of power and improves efficiency. But now you have a 100W processor with memory sitting on top of it. Memory cannot take the heat. It is probably the weakest link in the thermal chain, which creates another challenge. There are benefits, but they still have to figure out how to deal with the thermals. There is now more incentive to push this type of architecture forward, because it does save you overall in terms of performance and power, and it will improve your compute efficiency. But there are physical design challenges that still need to be addressed. As Steve said, we're seeing all kinds of architectures emerging. I completely agree that GPU/CPU architectures are not going away. They will still dominate. Meanwhile, every company on the planet is trying to build a better mousetrap for its AI. We see a combination of on-chip SRAM and high-bandwidth memory. LPDDR has been getting a lot of attention lately for use in the data center. We even see GDDR being used in some AI inference applications, along with all the older memory systems, where people are now trying to squeeze as much DDR5 as possible into one footprint. I've seen every architecture you can think of, whether it's DDR, HBM, GDDR, or something else. It depends on your processor core, what your overall value-add is, and how you can break through with your particular architecture. The memory system comes along with that, so you can sculpt your CPU and your memory architecture depending on what's available.
Jongsin Yun, memory technologist at Siemens EDA: Another issue is non-volatility. For example, if power is switched off and on between runs of an IoT-based AI, we need a lot of power cycling, and all the information from AI training has to be reloaded again and again. If we had some kind of solution where we could store those weights on the chip, so that we didn't have to keep moving the same weights back and forth, it would save a lot of power, especially for IoT-based AI. That would be another way to help meet these power demands.
Frank Schirrmeister, vice president of solutions and business development at Arteris: What I find interesting from a NoC perspective is that you have to optimize all of those paths, from the processor across the NoC, through the controller to the memory interface, possibly through UCIe from one chiplet to another, and then to the memory in that chiplet. This is not to say that the von Neumann architecture is dead. But there are now many variations, depending on the workload you want to compute. They need to be considered in the context of memory, and memory is only one aspect. Where do you get the data from, and how is it arranged in the DRAM? We are looking at all of these things, such as performance analysis of the memory, and then optimizing the system architecture on top of that. It has inspired many innovations in new architectures that I never would have thought of when I studied von Neumann in college. At the other end, you have things like grids. There are more architectures to consider now, and that is driven by memory bandwidth, compute capability, and so on, which are not growing at the same rate.
Randy White, program manager for memory solutions at Keysight Technologies: There is a trend toward disaggregated or distributed computing, which means architects need more tools at their disposal. The memory hierarchy has expanded. It now includes memory semantics, as well as CXL and different hybrid memories that combine flash and DRAM. A parallel application to the data center is automotive. Cars have always had this kind of sensor computing with ECUs (electronic control units). I'm fascinated by how it has evolved to resemble the data center. Fast forward, and today we have distributed compute nodes called domain controllers. It's the same thing. The problem being solved is different in that power may not be as big an issue, because the computers are not at that scale, but latency is definitely a big issue in cars. ADAS requires ultra-high bandwidth, and you need to make different tradeoffs. Then you have more mechanical sensors, but the constraints are similar to those in the data center. You have cold storage that doesn't require low latency, and then you have other high-bandwidth applications. It's interesting to see how much the tools and options for architects have changed. The industry has responded well, and we all offer a variety of solutions to meet the needs of the market.
How did memory design tools evolve?
Schirrmeister: When I started working on my first chips in the '90s, the most commonly used system tool was Excel. Ever since, I've been hoping it would break at some point for the things we do at the system level, such as memory and bandwidth analysis. That had a huge impact on my team. At the time, this was very advanced stuff. But to Randy's point, complex systems now need to be simulated at a level of fidelity that wasn't possible before without the compute. As an example, assuming a fixed latency for DRAM accesses can lead to poor architectural decisions and possibly an incorrectly designed data transport architecture on the chip. The flip side is also true. If you always assume the worst case, you will over-design your architecture. It's a fascinating environment in which tools perform DRAM and performance analysis and provide appropriate models for the controller, so architects can simulate all of this. Excel can no longer capture certain dynamic effects, because you need to simulate them, especially once you throw in a die-to-die interface with PHY characteristics and then link-layer characteristics, such as checking whether everything arrived correctly and possibly re-sending the data. Failing to run these simulations will result in suboptimal architectures.
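The point about fixed-latency assumptions can be illustrated with a toy model. The Python sketch below models a single DRAM bank with an open-page policy and compares the average access latency of a sequential address stream against a scattered one. The timing values, row size, and access patterns are all hypothetical, chosen only to show that one fixed number cannot represent both cases. It is not a substitute for the controller models and simulation tools described above.

```python
import random

# Hypothetical timing parameters in clock cycles (illustrative, not from a datasheet).
ROW_HIT_CYCLES = 15             # column access only, row already open
ROW_MISS_CYCLES = 15 + 15 + 15  # precharge + activate + column access
ROW_SIZE_BYTES = 2048           # assumed row (page) size


def average_latency(addresses):
    """Average access latency for one bank that keeps the last row open."""
    open_row = None
    total_cycles = 0
    for addr in addresses:
        row = addr // ROW_SIZE_BYTES
        total_cycles += ROW_HIT_CYCLES if row == open_row else ROW_MISS_CYCLES
        open_row = row
    return total_cycles / len(addresses)


rng = random.Random(0)  # fixed seed so the run is repeatable
sequential = [i * 64 for i in range(10_000)]                        # streaming 64-byte accesses
scattered = [rng.randrange(0, 1 << 30, 64) for _ in range(10_000)]  # random accesses over 1 GiB

print(f"sequential stream: {average_latency(sequential):.1f} cycles on average")
print(f"scattered stream:  {average_latency(scattered):.1f} cycles on average")
```

An architecture sized around the sequential figure would be under-provisioned for the scattered workload, while one sized around the scattered figure would be over-designed for streaming traffic. That is the tradeoff described in the passage above.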
Ferro: Our first step in most evaluations is to give customers a memory testbench so they can start looking at the efficiency of the DRAM. Going from something as simple as running a local tool for DRAM simulation to running a full simulation is a huge step forward. We are seeing more and more customers ask for this kind of simulation. In any evaluation, making sure your DRAM efficiency is above 90% is a very important first step.
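As a rough illustration of the efficiency figure, the sketch below treats DRAM efficiency as achieved bandwidth divided by theoretical peak bandwidth. That definition, the per-pin data rate, and the measured bandwidth are assumptions for illustration. Only the 1,024-bit interface width comes from the conversation itself.

```python
def peak_bandwidth_gb_per_s(bus_width_bits: int, data_rate_gt_per_s: float) -> float:
    """Theoretical peak bandwidth in GB/s: bits per transfer times transfers per second, over 8."""
    return bus_width_bits * data_rate_gt_per_s / 8.0


def dram_efficiency(achieved_gb_per_s: float, peak_gb_per_s: float) -> float:
    """Fraction of the theoretical peak that the workload actually achieves."""
    return achieved_gb_per_s / peak_gb_per_s


# Hypothetical example: 1,024-bit interface at an assumed 6.4 GT/s per pin.
peak = peak_bandwidth_gb_per_s(bus_width_bits=1024, data_rate_gt_per_s=6.4)
achieved = 750.0  # GB/s, a made-up measurement from a simulation run
efficiency = dram_efficiency(achieved, peak)

print(f"peak: {peak:.1f} GB/s, efficiency: {efficiency:.1%}")
print("meets 90% target" if efficiency >= 0.90 else "below 90% target")
```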
Woo: Part of the reason you're seeing the rise of full-system simulation tools is that DRAMs are becoming more complex. For some complex workloads, it is now difficult to even get in the ballpark using simple tools like Excel. If you look at DRAM datasheets from the '90s, they were about 40 pages long. Now they are several hundred pages long. That illustrates how complex the devices have become in order to deliver high bandwidth. On top of that, memory is a major driver of system cost, and its bandwidth and latency are tied to processor performance. It's also a big driver of power, so you now need to simulate at a much more detailed level. In terms of tool flows, system architects understand that memory is a huge driver. So the tools need to be more sophisticated, and they need to interface well with other tools so that system architects get the best global view of what is going on, especially how memory affects the system.
Yun: As we enter the age of AI, many multi-core systems are in use, but we don't always know which data goes where. Processing is also more parallel across the chip, and the size of the memory is much larger. If we use ChatGPT-type AI, the data handling for the model requires about 350MB of data, which is a huge amount just for the weights, and the actual input/output is much larger. The increase in the amount of data required means there are many probabilistic effects that we have never seen before. Testing for all the errors associated with such a large amount of memory is challenging. ECC is everywhere, even in SRAM, which traditionally did not use ECC but where it is now common in the largest systems. Testing all of this is very challenging and needs to be supported by EDA solutions that can cover all of these different conditions.
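For readers unfamiliar with how ECC corrects a bit error, the following is a minimal Python sketch of a Hamming(7,4) code, which protects 4 data bits with 3 parity bits and corrects any single flipped bit. It is a textbook toy chosen for brevity. Production memory ECC typically uses wider codes, and nothing here reflects any specific product discussed in the conversation.

```python
def hamming74_encode(data_bits):
    """Encode [d1, d2, d3, d4] into the 7-bit codeword [p1, p2, d1, p3, d2, d3, d4]."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4  # parity over codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over codeword positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # parity over codeword positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(codeword):
    """Correct at most one flipped bit and return the 4 data bits."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3  # 1-based position of the bad bit, 0 if none
    if syndrome:
        c[syndrome - 1] ^= 1         # flip the faulty bit back
    return [c[2], c[4], c[5], c[6]]


word = [1, 0, 1, 1]
stored = hamming74_encode(word)
stored[4] ^= 1  # simulate a single-bit upset in memory
print("recovered correctly:", hamming74_decode(stored) == word)
```

Every extra check bit and correction path adds conditions that have to be exercised in test, which is part of why verification effort grows with memory size.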
What challenges do engineering teams face in their daily work?
White: On any given day, you can find me in the lab. I roll up my sleeves and get my hands dirty, probing wires, soldering, and so on. I think a lot about post-silicon validation. We've talked about early simulation and on-die tools such as BiST. At the end of the day, before we ship, we want to do some form of system validation or device-level testing. We've discussed how to overcome the memory wall. We co-locate memory, use HBM, and so on. If we look at the evolution of packaging technology, we started with leaded packages. They were not very good for signal integrity. Decades later, we moved to packages such as ball grid arrays (BGAs), which are optimized for signal integrity. But we lost access to the pins, which means you can't probe them to test. So we came up with a concept called a device interposer, a BGA interposer, which allowed us to sandwich in a special fixture and route the signals out. We could then connect them to test equipment. Fast forward to today, and now we have HBM and chiplets. How do I sandwich my fixture in on a silicon interposer? We can't, and that's where we struggle. This is a challenge that keeps me up at night. How do we perform failure analysis in the field when the OEM or system customer can't get 90% efficiency? There are more errors in the links, they don't initialize properly, and the training doesn't work properly. Is it a system integrity issue?
Schirrmeister: Wouldn't you rather do this from home with a virtual interface than walk to the lab? Isn't the answer to build more analysis into the chip? With chiplets, we integrate everything even further. Getting your soldering iron in there isn't really an option, so there needs to be a way to do on-chip analysis. We have the same problem with the NoC. People look at the NoC, you send the data, and then it's gone. We need to put analytics in there so people can debug, and extend that to the manufacturing level, so you can work from home and do everything based on on-chip analytics.
Ferro: Especially in the case of high-bandwidth memory, you can't physically get in there. When we license the PHY, we also have a product that goes with it so you can monitor every one of those 1,024 bits. You can read and write the DRAM from that tool, so you don't have to physically get in there yourself. I do like the idea of an interposer. During testing we do bring some pins out of the interposer, which is not possible in the system. Getting into these 3D systems is really a challenge. Even from a design tool flow perspective, it seems like most companies have their own flows for 2.5D. We are starting to build 2.5D systems in a more standardized way, from signal integrity and power through the entire flow.
White: As things evolve, I hope we can still maintain the same level of accuracy. I'm on the UCIe form factor and compliance team. I'm working on how to characterize a known good die, a golden die. Ultimately this will take more time, but we'll find a happy medium between the performance and accuracy we need for testing and the flexibility that is built in.
Schirrmeister: If I look at chiplets and their adoption in a more open production environment, testing is one of the bigger challenges in getting it to work properly. If I were a large company and controlled all aspects of it, I could constrain things appropriately so that testing and so on became feasible. If I take the UCIe tagline, that UCIe is only one letter away from PCIe, and I imagine that assembling UCIe-based chiplets will, from a manufacturing perspective, become like plugging into the PCI slots on today's PCs, then the testing aspect becomes really challenging. We need to find solutions. There is a lot of work to be done.
Review Editor: Huang Fei