Maximizing hardware emulation’s value for networking designs

Source: EDN Network

Lauro Rizzatti - October 24, 2018

There are challenges unique to designing ASICs for networking applications. One is that bandwidth and latency performance tests for these devices require significantly more simulation cycles than other types of ICs do. Of course, the extended simulation slows the entire design process. To address these and other issues, Cisco engineers have adopted the practice of combining simulation with emulation to both improve and accelerate the verification process.

In the past, Cisco would pursue a unique verification regime for each new IC. To save effort and time, the company has worked with its tool vendor, Mentor Graphics, to standardize a methodology that can be applied to multiple designs.

“Migrating towards off the shelf emulation offering has been beneficial, especially the bring up of large chips and systems,” said Afzal Malik, a verification manager in the Core ASIC Group at Cisco Systems. Malik is involved in the emulation of the application-specific integrated circuit (ASIC) family for enterprise and campus switching networks, the Catalyst 9000 Series, one of Cisco’s most successful product lines.

Malik’s group uses emulation to target hard-to-find, deep-cycle bugs. Without emulation, these types of bugs end up getting implemented in silicon, where fixing them is exceedingly expensive. The group’s goals are to detect all bugs via emulation and to hit 100 percent coverage before tape-out by combining formal verification, simulation, and hardware emulation. Meeting those goals will lead to improved time-to-market.

The challenges only start with bandwidth and latency performance tests that make for such long simulation runtimes. Multi-chip interactions in complex systems are difficult to test in simulation due to excessive build and runtimes. Design verification groups spend time developing drivers and monitors as interfaces evolve constantly. Some networking protocols, such as PTP 1588 and Link Pause and Priority Flow Control (PFC), are simulation intensive, and runtimes are long to get to a steady state.

Networking ASICs are typically large designs, which means more time is required for code coverage and functional coverage. Furthermore, verification of the latest networking standards requires enhancing testbench components. A final challenge is hardware/software co-verification, which involves running actual software on hardware before tape-out.

Simulation is irreplaceable for block-level verification and basic integration testing, Malik noted, but as design sizes increase, simulation performance deteriorates, especially on systems using multiple ASICs. Simulation alone is not enough to conquer these challenges.

There are some terms associated with the verification process that ought to be defined here. Back-door initialization and front-door initialization refer to how content gets loaded or extracted from the memory. A front-door flow means the design itself is used to move data in and out of the memory. A back-door flow is a way for a testbench or software to move data in and out of the memory without using the design. Test engineers often want to pre-load memory content, or extract data from memory at the end of a test run or in the middle, and back-door access is often used for this type of memory loading or extraction.
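The distinction can be illustrated with a toy model. This is only a sketch: the `Memory` and `Design` classes, the `bus_write` method, and the four-cycle write latency are all hypothetical and do not come from Cisco’s or Mentor’s environments. It shows that both flows leave the memory in the same state, but only the front-door flow spends cycles in the design.

```python
class Memory:
    """Toy word-addressed memory (hypothetical, for illustration only)."""

    def __init__(self, size):
        self.words = [0] * size


class Design:
    """Toy design under test that owns the only 'real' write port."""

    WRITE_LATENCY = 4  # assumed cycles per bus write; not a measured figure

    def __init__(self, memory):
        self.memory = memory
        self.cycles = 0

    def bus_write(self, addr, data):
        # Front door: the access goes through the design, so it costs cycles.
        self.cycles += self.WRITE_LATENCY
        self.memory.words[addr] = data


def front_door_load(design, image):
    """Load the image one word at a time through the design's write port."""
    for addr, data in enumerate(image):
        design.bus_write(addr, data)


def back_door_load(memory, image):
    """Testbench writes memory directly; no design cycles elapse."""
    memory.words[:len(image)] = image


image = [0xA5, 0x5A, 0xFF, 0x00]

front = Design(Memory(8))
front_door_load(front, image)

back = Design(Memory(8))
back_door_load(back.memory, image)

print(front.memory.words == back.memory.words)  # same final contents
print(front.cycles, back.cycles)                # cycles spent in the design
```

In a real flow the front-door cost is what makes large initializations slow in simulation, which is why back-door loading is preferred whenever the test does not need to exercise the configuration path itself.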

Malik reports that his group uses back-door initialization in simulation for more than 90 percent of its testing. Simulation is not the ideal solution for front-door verification. Front-door initialization becomes a requirement when software configures the ASIC and runs production software.

To cope with the challenges, Cisco’s design verification engineers introduced the Veloce2 Emulator from Mentor Graphics for this ASIC design. Emulation runs thousands of times faster than simulation, and its runtime performance does not degrade as design size increases.

The Veloce-based emulation environment provides full debug visibility, unlike an FPGA system. Compile and run steps are similar to those of simulation, so it is simple to use. For instance, a large variety of verification components, especially scoreboards, checkers, and functional cover points, can be reused in emulation, Malik pointed out.

The group uses simulation for design bring up. Malik affirmed that even multi-unit level verification in simulation is a good start to get the first few packets at the chip level.

In addition to helping find deep-cycle bugs that are otherwise time-consuming to detect, emulation helps with executing real software, running performance tests on a chip, and system-level verification. Emulation is also handy for line-rate testing, flow control, and internet mix (IMIX) tests. Pause testing, datapath testing, and load balancing are efficiently performed in emulation.

Cisco’s ASIC verification environment is used by both software and hardware teams. The virtual PCIe interface (left) is typically for the software team to boot its OS or kernel on a control plane that communicates with a switching ASIC through a standard PCIe interface. From the software team’s perspective, it is operating as if it were working with the actual ASIC, while the design itself is in the emulator. Source: Cisco

For functional verification, Cisco did a couple of things. It designed a testbench for front-door initialization. It took all its C++/SystemC test checkers and simulation checks, even real-time checks, and ported them over to the emulator. It also uses the Ethernet Packet Generator Monitor (EPGM) from Mentor as IP for generating Ethernet packets or different varieties of packets.

The steps Cisco goes through to bring up a design include:

  • Choose a model from the model library that matches the specs of the memory selected for tape-out.
  • Synthesize TCAM and SRAM models to the memory models supported by Veloce.
  • Make minimal clocking and PLL changes.
  • Identify parts of the design that will not be emulated, for example, design-for-test (DFT) logic. Some can be tied off, leading the compiler to remove them while compiling the design for the emulator.

Testbench issues include:

  • Creating a Veloce-friendly transactor to configure the ASICs
  • Deploying the EPGM for sending and analyzing Ethernet packets
  • Creating end of simulation checks in SystemC and C++
  • Synthesizing functional coverage for the emulator

Main features for design debug involve:

  • EPGM analysis window
  • Trigger transactor to capture the waveform
  • Other custom trigger waveform generation
  • Hardware-implemented assertions and monitors (when these critical assertions fire, they can automatically generate waveforms for debug)
  • Full waveform uploads

Cisco worked with Mentor for several years on the EPGM, a virtual solution for networking ASICs. It supports multicore models and scales performance. It has a Tcl-based interface for writing complex test cases fairly quickly and prebuilt triggers to capture waveforms. Mutable port group, a recent addition, is a super port mode that allows a single build to support multiple port modes, as opposed to doing a separate build for each possible configuration of the chip.

On the debug analysis side, Cisco gets per-stream statistics, such as bandwidth, latency, and total frames, and all errors (out-of-sequence, CRC, and preamble errors) are captured and reported by EPGM. In addition, the group implemented custom checkers and rate monitors within the ASIC.
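To make the idea of per-stream statistics concrete, here is a minimal sketch of how such counters could be accumulated from a list of received frames. It is not EPGM’s interface: the `Frame` record, its field names, and the `per_stream_stats` function are hypothetical, and the frame data is made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    """Hypothetical record of one received frame."""
    stream: int   # stream identifier
    seq: int      # per-stream sequence number stamped by the generator
    tx_ns: int    # transmit timestamp in nanoseconds
    rx_ns: int    # receive timestamp in nanoseconds
    length: int   # frame length in bytes
    crc_ok: bool  # result of the CRC check at the receiver


def per_stream_stats(frames):
    """Accumulate counters per stream; frames are assumed to be in arrival order."""
    stats = {}
    for f in frames:
        s = stats.setdefault(f.stream, {
            "frames": 0, "bytes": 0, "latency_ns": 0,
            "out_of_sequence": 0, "crc_errors": 0, "max_seq": -1,
        })
        s["frames"] += 1
        s["bytes"] += f.length
        s["latency_ns"] += f.rx_ns - f.tx_ns
        if f.seq < s["max_seq"]:
            # A frame arrived after one with a higher sequence number.
            s["out_of_sequence"] += 1
        s["max_seq"] = max(s["max_seq"], f.seq)
        if not f.crc_ok:
            s["crc_errors"] += 1
    for s in stats.values():
        s["avg_latency_ns"] = s["latency_ns"] / s["frames"]
    return stats


# Made-up traffic: stream 0 has one reordered frame, stream 1 one CRC error.
frames = [
    Frame(0, 0, 0, 100, 64, True),
    Frame(0, 1, 10, 120, 64, True),
    Frame(1, 0, 20, 70, 128, True),
    Frame(0, 3, 30, 150, 64, True),
    Frame(0, 2, 25, 160, 64, True),
    Frame(1, 1, 40, 100, 128, False),
]

stats = per_stream_stats(frames)
for stream, s in sorted(stats.items()):
    print(stream, s["frames"], s["bytes"], s["avg_latency_ns"],
          s["out_of_sequence"], s["crc_errors"])
```

Bandwidth per stream would follow by dividing the byte count by the observation window, which is left out here to keep the sketch short.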

Results in terms of speedup may vary somewhat, depending on the size of the ASIC and the application. Front-door initialization via simulation took about 6,000 minutes. In emulation, the group got it down to 30 minutes for the tens of thousands of front-door writes carried out on these complex ASICs. With a new optimized flow using inbound streaming from Mentor, Malik’s group got front-door initialization down to less than five minutes. Using simulation alone, the process typically took days.

In terms of runtime performance for a given configuration, Cisco can process 40 packets per minute in simulation, and in emulation it can process more than 600,000 packets per minute. That’s a 15,000× speedup over simulation!
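The quoted figures can be checked with simple arithmetic. The short sketch below uses only the numbers reported above; the variable names are illustrative, and the five-minute figure is treated as an upper bound because the article says “less than five minutes.”

```python
# Runtime throughput reported for a given configuration.
sim_packets_per_min = 40
emu_packets_per_min = 600_000  # reported as "more than", so this is a floor
runtime_speedup = emu_packets_per_min / sim_packets_per_min

# Front-door initialization times, in minutes.
sim_init_min = 6_000
emu_init_min = 30
emu_streaming_init_min = 5  # upper bound ("less than five minutes")

init_speedup = sim_init_min / emu_init_min
streaming_speedup_floor = sim_init_min / emu_streaming_init_min
sim_init_days = sim_init_min / (60 * 24)

print(f"runtime speedup: at least {runtime_speedup:,.0f}x")
print(f"front-door init speedup: {init_speedup:,.0f}x")
print(f"with inbound streaming: more than {streaming_speedup_floor:,.0f}x")
print(f"simulation init time: about {sim_init_days:.1f} days")
```

The last line confirms that 6,000 minutes is a little over four days, consistent with the statement that the process took days in simulation.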

Malik noted that the approach described is not proprietary. “It is our implementation of the capabilities provided by Mentor’s Strato solution,” he said.

Now that Malik and his group are experienced emulation users, they plan to use it for other verification tasks. Pre-silicon software development, multi-chip system verification, silicon readiness, and pre-silicon power analysis are several areas of interest. For instance, during pre-silicon software development, they can boot the control plane OS and run applications on the actual ASIC design before tape-out. That capability is especially beneficial for the diagnostic and system software teams.

Software development is an area Malik’s group wants to invest in as well to justify the effort that it’s putting into emulation. It was important for Malik and the group to develop and validate diagnostic software before tape-out. New verification features using actual system software needed to be validated in hardware before tape-out as well. The diagnostic, kernel and application software teams now can start debugging and come up to speed quickly on the emulation platform.

Multi-chip verification is still another area. Cisco’s systems are complex: modular systems have supervisor cards and line cards with multiple ASICs talking to each other. These are scalable systems, and trying to verify them in simulation is a challenge.

Silicon bring-up and readiness is yet another possible application. When the chip comes back, the group runs tests and ASIC qualification for silicon validation. Emulation will offer a head start when silicon is back in the lab. Pre-silicon power analysis is an area that Mentor supports and that Cisco, a Veloce user, is actively investigating.

Malik is already looking ahead to a Cisco verification flow that is unified for regression and coverage analysis. Such a flow requires some changes to the standard functional coverage flow, because coverage needs to be synthesized and mapped inside the design. Power analysis is being actively investigated, as are advanced trend and analytics capabilities and how to incorporate them into the flow. Of course, incremental performance improvements are also being made to the flow.

To summarize, Malik maintained emulation has helped the Cisco design verification group reach the high level of confidence it needs to tape out its ASICs. Getting software ready for silicon bring-up was a great benefit. Emulation typically helps left-shift the time-to-market.

Emulation is a great complement to the overall verification strategy, concluded Malik. Fast bring up, mature compile and full visibility are key. While there are great technologies being developed, there’s nothing like emulation to provide full visibility and full debug.

Dr. Lauro Rizzatti is a verification consultant and industry expert on hardware emulation.