Lauro Rizzatti, Nasr Ullah (Samsung), Bruce Cheng (Starblaze), and Robert Kaye (ARM) discuss emulation for data storage chips, system IP, and embedded software
Source: EE Times
By Wednesday afternoon, the 54th Design Automation Conference program was starting to wind down and attendees were scattering for parts unknown. It was a different scene at the Mentor booth on the exhibit floor as a panel titled “The Explosion of Emulation Use Models in Diverse Segments of the Industry” got under way.
I moderated a discussion about hardware emulation, a topic that was mostly absent from the DAC program. My guests were Nasr Ullah, senior director of performance-power architecture at Samsung, Bruce Cheng, senior staff engineer at Starblaze, the leader in solid state drives (SSDs), and Robert (Rob) Kaye, senior technical specialist with ARM. We discussed emulation for data storage chips, system IP, and embedded software.
What follows is an edited transcript of our talk.
Rizzatti: I have been involved with emulation for longer than I can remember and today we invited three panelists from different segments of the industry to talk about emulation. Nasr, Bruce and Rob, please describe the design environment in your company, the type of designs, the complexity of the designs, and a bit about design verification and the challenges you are running into every day.
Ullah: In Samsung Austin R&D Center (SARC), we build the microprocessors and the critical system IP for the next generation of the Galaxy phone series. We design the microprocessors, the memory controllers, and the interconnect network.
Our overall design evolves in three phases. First, we start with the architectural development, the micro architecture from the concept stage. That is followed by the RTL design and design implementation via synthesis, and all the backend stuff. Finally, when the silicon comes back from the foundry, we check that the silicon complies with our specifications.
Those three phases require us to be able to design something, to make sure it’s implemented correctly, and to validate that the silicon meets our needs. To be able to tie together all that is a major verification and emulation challenge, and that’s what we have to deal with at every stage.
Cheng: In Starblaze, we design SSD controller chips. In an SSD controller design, the firmware defines most of the functionality of the SSD controller, and the performance of the overall SSD controller results from the correct interaction between the firmware and hardware. The key problem in designing an SSD controller is to fine tune the best firmware on the best hardware.
This creates a fundamental problem in that we must develop firmware and hardware together before tape-out. It is imperative to establish a strong and deep collaboration between the firmware development team and the hardware development team.
Our verification environment includes the universal verification methodology (UVM) and hardware description language (HDL) simulation for block-level verification and for some basic system-level verifications. But to make sure that before tape-out the firmware works correctly on the system hardware, we use emulation.
Kaye: I’m part of ARM’s development solutions group that designs the software tools we use in the development process. As you probably know, ARM develops a wide range of IP, central processing unit (CPU), graphics processing unit (GPU), video, interconnect, memory controllers, etc. We also develop reference platforms and system guidance platforms that we provide to our partners as a reference when they are designing their systems on chips (SoCs).
My involvement with emulation is really from the software development perspective primarily, not from the verification environment perspective. We use emulation as a standalone solution to accelerate software validation. We also use emulation in hybrid environments where we link abstract models to the emulators and partition the design up, so that we can focus the emulation capacity onto specific kinds of tests.
Rizzatti: Emulation has been around for more than 30 years. Initially, it was used in what is called ICE, in-circuit emulation mode. Over the past 10 years, the virtual deployment mode, where the test environment is made up of a software model driving the chip inside the emulator, became popular. The virtual mode opened the door to several usage modes. In your line of work, which mode do you use and what are the benefits?
Ullah: In the development of our microprocessors at Samsung, we use emulation in three main areas. The first area is performance and that includes five sub areas that I will describe in a moment. The second area is power and the third area is workload characterization.
Regarding performance, my team does all the performance projections. Before we design the chip, we predict what it should do. We use emulation or a combination of emulation and simulation in hybrid mode to verify our projections and ensure they are correct.
Second, one of the big challenges in any design is to ensure that we can architect things right and that we can figure out all the things that have to happen. When things get implemented, they change. We use emulation extensively to confirm that our implementation is meeting the requirements we established when we defined the architecture.
The third subarea has to do with the tight design cycle, typical of smartphone designs since a new phone comes out every year. The problem is that when we have late features that cannot be implemented through the entire process, we have to figure out what late features we can implement. Emulation helps us to quickly verify some late features and decide whether we can proceed with them.
The fourth subarea is this: These chips are designed with a lot of configuration bits that control several aspects of the design functionality. Only by running full-blown applications under Android can we see the true impact of the multitude of configuration bits on the design.
Last but not least, we verify the quality of service. With a smartphone, the user may want to perform multiple concurrent tasks. For instance, he may want to make a phone call and at the same time check a text message and maybe even browse the web. When he is engaged in these multiple tasks, he doesn’t want the phone to die. The quality of service is important. Typically, the key thing in quality of service is the CPU and that gets the highest priority. In a phone device, the user wants some communications to keep on going. To be able to see this quality of service, we must run multiple applications and monitor their interaction. We use emulation to see if we can achieve this and get good quality of service.
Regarding power, in a phone there are three power issues. First, we have to have power constraints. Second, we want to prevent the phone from becoming too hot when we are running an application and avoid burning our pockets. And, third, the battery can only generate a finite amount of current. Hence, we don’t want to consume too much current. However, the microprocessors in today’s smartphones are powerful, and they can draw more current than the battery can possibly generate.
To check current, we run full applications and look for power peaks to make sure we are not exceeding the current provided by the battery. When the maximum current that the battery can provide is exceeded, the phone can potentially shut down. We design over-current protection, and we also look at the total wattage being used so that we don’t exceed the overall envelope.
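The peak-current check Ullah describes can be pictured as a scan over a sampled current trace. The sketch below is only an illustration of that idea, not Samsung's flow: the function name `find_overcurrent_windows`, the sample values, and the 3.0 A limit are all hypothetical.

```python
from typing import List, Tuple


def find_overcurrent_windows(
    samples_amps: List[float], limit_amps: float
) -> List[Tuple[int, int]]:
    """Return (start, end) sample-index ranges, end exclusive, where the
    current stays above the limit."""
    windows: List[Tuple[int, int]] = []
    start = None
    for i, amps in enumerate(samples_amps):
        if amps > limit_amps and start is None:
            start = i                      # a peak begins here
        elif amps <= limit_amps and start is not None:
            windows.append((start, i))     # the peak ended before sample i
            start = None
    if start is not None:                  # trace ended while still over the limit
        windows.append((start, len(samples_amps)))
    return windows


# Hypothetical trace (amps) captured while an application runs.
trace = [1.2, 2.8, 3.4, 3.1, 2.0, 3.6, 1.5]
BATTERY_LIMIT_AMPS = 3.0  # assumed battery limit, for illustration only
windows = find_overcurrent_windows(trace, BATTERY_LIMIT_AMPS)
print(windows)
```

Each reported window marks a stretch of the run where the design would be asking for more current than the assumed battery limit allows.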
The last area involving the use of emulation is workload. When compute and graphics intensive applications such as games are running, we want to know what segments take up the most instructions per clock (IPC).
By using an emulator, we can identify those segments.
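As a rough illustration of this kind of workload characterization, the sketch below ranks code segments by IPC from per-segment instruction and cycle counts. The segment names, the counts, and the helper `rank_segments_by_ipc` are invented for the example; the panel does not describe how the counts are actually collected.

```python
from typing import Dict, List, Tuple


def rank_segments_by_ipc(
    counters: Dict[str, Tuple[int, int]]
) -> List[Tuple[str, float]]:
    """counters maps segment name -> (instructions retired, clock cycles).
    Returns (segment, IPC) pairs sorted from highest to lowest IPC."""
    ipc = [
        (name, instructions / cycles)
        for name, (instructions, cycles) in counters.items()
        if cycles > 0  # skip segments that never ran
    ]
    return sorted(ipc, key=lambda item: item[1], reverse=True)


# Hypothetical counts for three segments of a game workload.
sample = {
    "physics": (9_000, 3_000),
    "render_setup": (4_000, 4_000),
    "asset_load": (1_000, 5_000),
}
ranking = rank_segments_by_ipc(sample)
for name, value in ranking:
    print(f"{name}: IPC = {value:.2f}")
```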
Cheng: In Starblaze, similarly to Samsung, we use emulation to run real application firmware on the SSD controller design mapped inside the emulator. Emulation allows us to fine tune the firmware on the hardware and achieve the best results. As with Samsung, emulation is used for functional verification, for power and performance analysis, to verify the latency of the design, and to assure quality of service.
We do not use emulation in ICE mode. For the SSD controller, ICE mode is not quite applicable.
An SSD controller has two major interfaces, PCI Express (PCIe) and NAND. The PCIe interface connects the controller to a host PC. The host runs at a fast speed, maybe gigahertz (GHz), but the emulation runs significantly slower, maybe one megahertz (MHz). Consequently, we need to implement hacks on the PCIe controller to try to match the speed of the host. Ultimately, we end up changing the register transfer level (RTL) design to migrate from the application specific integrated circuit (ASIC) version of the RTL code to the emulation version of the RTL code. This is not acceptable since any RTL code change introduces a risk of failure in the final system verification.
The other interface is NAND, which is affected by something called aging. NAND is not like DDR. When we use NAND and erase and program data multiple times, NAND blocks wear out over time. If we use ICE mode and plug a daughter board populated with a real NAND array onto the emulator, every NAND is okay the first day. After a while, the NAND wears out and NAND functionality changes when we continue to use the daughter board.
Instead, we use 100% virtual peripherals and the VirtuaLAB toolkit from Mentor. The PCIe part is virtual. We connect a simulator to the emulator through the direct programming interface (DPI). We replace a physical PCIe PHY in the ASIC version with a virtual PCIe PHY in the emulator, and they are 100% pin compatible.
For the NAND interface, we use 100% synthesizable NAND models. We use fast memory models running on the host, and they allow us to perform fast simulation.
The advantage of this pure virtual solution is that we can change configurations easily. For instance, we can have one or two PCIe ports and make a dual-port configuration. For NAND, we can use one-channel or two-channel NAND, and the setup can be modified dynamically by changing some parameters.
It’s easy to change the system performance. In ICE mode, we would need a daughter board for the NAND that may run at 400 MHz and we would have to slow that down via a speed bridge to communicate with the emulator running at 1 MHz.
In virtual mode, we deal with soft models and can run them at any frequency. If the NAND runs at 400 MHz, we can also run the system at 400 MHz.
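To put the ICE-mode speed gap in numbers, the arithmetic below uses the approximate figures Cheng quotes: a host around 1 GHz, a NAND daughter board at 400 MHz, and an emulator around 1 MHz. The helper name `slowdown_factor` is hypothetical and the frequencies are round illustrative values, not measurements.

```python
def slowdown_factor(real_hz: int, emulator_hz: int) -> float:
    """How many times slower the emulated design runs than the real device."""
    if emulator_hz <= 0:
        raise ValueError("emulator frequency must be positive")
    return real_hz / emulator_hz


EMULATOR_HZ = 1_000_000      # about 1 MHz, as quoted in the discussion
HOST_HZ = 1_000_000_000      # "maybe gigahertz" host, taken here as 1 GHz
NAND_HZ = 400_000_000        # 400 MHz NAND daughter board

host_gap = slowdown_factor(HOST_HZ, EMULATOR_HZ)
nand_gap = slowdown_factor(NAND_HZ, EMULATOR_HZ)
print(f"host vs. emulator: {host_gap:.0f}x")
print(f"NAND vs. emulator: {nand_gap:.0f}x")
```

A gap of this size is what a speed bridge has to absorb in ICE mode. With soft models, both sides of the interface advance on the same emulated clock, so there is no rate mismatch to bridge.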
A virtual solution assures us that the ASIC version and the emulation version of the SSD controller are the same. Before tapeout, we run the final version of the firmware in emulation and when the ASIC returns from the foundry, we run the same firmware on the ASIC version.
The 100% virtual solution is good for this kind of application.
Kaye: A lot of the points that Nasr and Bruce have raised are also applicable to ARM particularly within the IP validation teams and IP development teams. I wish to focus on a couple of areas that are a bit different and these would be in the software bring-up.
The software team does its initial development on a virtual prototype of the systems they designed. Then they move key parts of that prototype onto the emulator and run the software at a fully timed level within that environment. They will do the initial debugging, get all the basic bugs out on an abstract model and then move on to a more detailed model for the final proof of the software stack before it’s delivered.
The other area is around software-driven validation. They split the environment between the virtual and emulation worlds, and run C programs on the simulated model of the CPU to debug the drivers and develop validation flows for the blocks and test on the emulator. The key here is knowing how to partition the two environments and how to make sure that the interfaces and synchronization are efficient enough to take advantage of the higher-performance models. That means not dragging down the performance of those models by having them wait for the emulator, and being able to tune how often the synchronization happens. It’s important to synchronize often enough to make sure everything is working correctly together but not so often that it hurts the performance of the simulation. Software developers tend to get impatient when things run at less than a few hundred megahertz or a couple of gigahertz. Performance is key.
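The synchronization trade-off Kaye describes can be shown with a toy cost model: the fast virtual model advances at its own rate, and every synchronization with the emulator adds a fixed wall-clock cost. This is a simplified sketch under assumed numbers, not ARM's methodology; `effective_rate_hz`, the 100 MHz model rate, and the 1 ms cost per synchronization are all hypothetical.

```python
def effective_rate_hz(
    total_cycles: int,
    model_rate_hz: float,
    sync_interval_cycles: int,
    sync_cost_s: float,
) -> float:
    """Effective simulated cycles per wall-clock second when the virtual
    model stops to synchronize every `sync_interval_cycles` cycles."""
    # Ceiling division: a final partial interval still needs one sync.
    num_syncs = -(-total_cycles // sync_interval_cycles)
    run_time_s = total_cycles / model_rate_hz
    wall_time_s = run_time_s + num_syncs * sync_cost_s
    return total_cycles / wall_time_s


CYCLES = 100_000_000      # length of the hypothetical run
MODEL_RATE_HZ = 100e6     # assumed speed of the abstract CPU model
SYNC_COST_S = 1e-3        # assumed cost of one synchronization

for interval in (1_000, 100_000, 1_000_000):
    rate = effective_rate_hz(CYCLES, MODEL_RATE_HZ, interval, SYNC_COST_S)
    print(f"sync every {interval:>9,} cycles -> {rate / 1e6:6.2f} MHz effective")
```

In this model, synchronizing too often lets the per-sync cost dominate the run time. Stretching the interval recovers speed, but the two sides then exchange state less frequently, which is the correctness side of the trade-off.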
Rizzatti: Could you describe a verification challenge you ran into, how emulation came to rescue, and how emulation was able to solve the challenge?
Ullah: I will give you one example of something we could not have done without emulation or without a hybrid simulation/emulation mode.
Any piece of code running on a microprocessor has a lot of branches. In microprocessors, branch prediction is a big thing and these branch predictions get complex due to history tables, looking at different types of branches, and all that. They require some skill to train which branches are coming as the code is running. We also have prefetchers, which prefetch data that will be used later, and we have to train those prefetchers to work in the program. These things have large counters, some of them are 64-bit counters, others can be 128-bit counters.
I will describe one problem we found that we would not have located had it not been for the emulator and Android and Android applications running on it.
We got our design into emulation and, after several days of running Android and Android apps, we found that the performance started to slowly slide down. We thought it must be the emulator because we rebooted it and everything went back to normal. We ignored it for a few months until we noticed that this was a consistent slide. After doing a bit of debug on the emulator, we found that as these 64-bit counters were counting up, we had a small bug in the code where some of the counters did not reset. Instead, they reached the top and stayed there. This happened during several days of running and we would have never caught that with simulation or with anything else but emulation.
What makes this even more interesting is that we found this in emulation but we first saw this problem in a hybrid mode and noticed that something strange was happening. This is a unique type of problem we couldn’t find any other way because it requires running for a long time. We would see it in silicon but by that time, it is too late.
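The class of bug described here, a counter that sticks at its maximum instead of resetting, is easy to model. The sketch below uses a 4-bit counter so the effect shows up in a few ticks; the real counters were 64 bits wide, and the `Counter` class and its reset policy are assumptions made for illustration, not the actual design.

```python
class Counter:
    """Up-counter that either resets after reaching its maximum (intended
    behavior in this sketch) or sticks there (the buggy behavior)."""

    def __init__(self, width_bits: int, resets_at_top: bool) -> None:
        self.max_value = (1 << width_bits) - 1
        self.resets_at_top = resets_at_top
        self.value = 0

    def tick(self) -> None:
        if self.value < self.max_value:
            self.value += 1
        elif self.resets_at_top:
            self.value = 0
        # Buggy variant: at max_value with no reset, the value never changes.


def run(counter: Counter, ticks: int) -> int:
    for _ in range(ticks):
        counter.tick()
    return counter.value


# A short run cannot tell the two variants apart.
print(run(Counter(4, True), 10), run(Counter(4, False), 10))
# A long run exposes the stuck counter.
print(run(Counter(4, True), 40), run(Counter(4, False), 40))
```

The short run is the point of the example: both variants behave identically until the counter first reaches its maximum, so only a run long enough to get there can reveal the difference.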
Cheng: In Starblaze, we ran an application in the ASIC for five days and it crashed. It was a creepy situation because we didn’t know what happened. Running for five days in the ASIC is a long time. To replicate the problem in the emulator, we had to modify the application code and reproduce it in the ASIC within a few minutes of run-time. Then we ran the code in the emulator.
The emulation and ASIC environments are exactly the same. This allows us to use the same firmware on the ASIC and on the emulator. If we could make the ASIC runtime shorter than a few minutes, we could reproduce the problem in the emulator and then use emulation to trace and debug the problem.
Before tapeout, the firmware engineers can use field programmable gate array (FPGA) prototypes, but there are two problems with that.
The first problem is that we have to scale down the frequencies. We needed to completely change the PCIe controller and use another version of the NAND. This is okay for the initial development of the firmware, but when we approached tapeout, we needed to use the exact ASIC version of the firmware. For that, we can only use emulation because emulation is using the virtual solution. We use emulation for final validations and performance of the SSD system before tapeout.
The second issue with an FPGA prototype is that when we find a bug, it’s hard to debug it because there is little visibility inside the FPGA. Sometimes, we have to trace a signal that requires recompilation of the entire design and it may take several hours, 10 hours or longer. In emulation, debug is convenient because we can dump every signal in proximity of the bug.
After tapeout, we use emulation to reproduce firmware bugs or hardware bugs in the real chip because it runs fast and it is the same as the ASIC version.
Kaye: The area I will focus on is running GPU benchmarks.
The hybrid environment allows us to take the stable compute subsystem, including the CPU and intracontroller, and move those into the virtual world and leave the GPU on the emulator. This gives us the benefit of running the CPU-centric software quickly and getting to the point where we are able to run GPU benchmarks much sooner. It allows us to run more cycles through the benchmarks in the same amount of time than if we ran the entire system on the emulator, where much of the emulation time would be spent simulating bits we already knew about. Now we can find issues with the GPUs by running the benchmarks for multiple cycles. By being able to run the benchmark to completion, we found cases we were able to solve and remedy before getting to tapeout of the GPU.
Rizzatti: If you could give a recommendation to the emulation users or even vendors, what would that be?
Ullah: Start emulation early, as early as possible. Traditionally, verification engineers use emulation late in the verification cycle, when all the RTL code has been verified. Don’t do that. Verification engineers need to start early because they can find many more issues and that can help implement a better design.
Cheng: My first recommendation to the emulation vendors is to make the emulators run faster. Compared to the ASIC version of the design, the emulator is not fast enough. Faster execution speed would help us perform more validation.
My second recommendation is to improve debug or add more debug tools. For example, in the VirtuaLAB PCIe solution, a virtual PCIe analyzer helps us debug because we can see all internal transactions at the top level. We wish there were more debug tools like the PCIe analyzer, such as a NAND or CPU analyzer. More assistance with debug would be helpful for us.
Kaye: My recommendation would be to be able to do context switching between the virtual world and the emulator and run to a point of interest either in the simulator or in the emulator. Then, swap to the other environment by capturing the state and transferring the run control to the other. That would improve the hybrid model by not just having both parts running at the same time but get to a point of interest, take everything in simulation and move to full emulation from there. It would give us the benefit of being able to run quickly to a certain point of interest, capture state and move on. Also, the reverse should be possible. After running some cycles in emulation, be able to switch back to the hybrid world and keep that sort of interaction going. That would be my recommendation.
Rizzatti: Can you predict how emulation will evolve in the future? Will emulation still be here in five years and will it be the same or any different?
Ullah: Emulation absolutely needs to be here five years from today but it will be different. And, I will add to what Rob said about the ability to switch between environments.
We now have incredible power in emulation. The hybrid mode supported by Mentor has a mechanism for tying in a number of different worlds. Before, it was just RTL code and emulation. Now, users can run high-level simulation and mix and match it with emulation. As Rob mentioned, being able to connect the two together and move seamlessly from one world to another, is powerful because users can perform much more testing. Users can do a lot more work than they have ever been able to do before, and do it in a fast mode with less accuracy, or switch over to accurate mode at lower speeds. This capability is extremely powerful. We did not have this capability for the last 30 years and this is where I see emulation evolving — the ability to switch back and forth between great design details and the highest abstraction level.
Cheng: In Starblaze, our needs are somewhat different than what Nasr and Rob described. We are not trying to switch from emulation to simulation at the system level. Our customers use a behavioral model of the chip functionality to run their firmware. We call that behavioral model an SSD simulator, and it is compatible with our ASIC version. The SSD simulator is much faster than the emulator because it is really an abstract model. It would be helpful to be able to link the SSD simulator to the emulator and switch the software contents between the two. This is one trend we see.
In general, I see a trend in verification with the move to higher levels of abstraction that started three decades ago with the adoption of Verilog, and progressed with SystemVerilog and UVM. Today, we can link Verilog with C models and generate algorithmic stimuli. More to the point of this panel, the speed of emulation supports system-level stimulus. For example, emulation allows for applying real traffic via ICE or for executing software — or firmware as in the case of an SSD — on the OS in a CPU mapped inside the emulator or via QEMU in hybrid mode.
All are important in the context of SoC verification, and we will see more abstraction at the system level in the future.
Kaye: Looking at it from the software perspective, I see the future use of emulation in the continuous integration of software with hardware and software regression testing.
Over the last few years, embedded software expanded exponentially. For example, Android exploded in the number of instructions by 10X over a period of four years, from two-billion instructions to about 20-billion instructions. The emulator is a great compute engine for running lots and lots of batch tests and for continuously testing the software both in terms of functionality and performance. I see much more use of emulation in software development.
Traditionally, emulation has been a resource that software developers have often been pushed away from by hardware teams. But as capacity grows and the availability of platforms grows, I see much more use of emulation within the software development community.
As Nasr, Bruce and Rob described so well, emulation is a versatile verification tool. It has various use models today, and each panelist predicted that emulation will continue to be an important part of the verification flow.
Dr. Lauro Rizzatti is a verification consultant and industry expert on hardware emulation (www.rizzatti.com). Previously, Dr. Rizzatti held positions in management, product marketing, technical marketing, and engineering. He can be reached at firstname.lastname@example.org.