Embedded UVM Testbenches for SoC FPGA Accelerators
The Death of Moore's Law has limited microprocessor frequency to around 4 gigahertz since 2005. As a result, System Architects are looking for alternative ways to meet the ever-increasing demand for real-time data processing in Embedded Systems as well as in Data Center servers.
Two key technologies are driving the transformation in the world of computing -- Multicore Processors and FPGA-based Hardware Accelerators. While Multicore Processors help boost data-crunching throughput, Hardware Accelerators are key to low-latency systems.
In the Embedded Systems space, an SoC FPGA provides a typical platform for Hardware Accelerators. Its Hard Processor System (HPS) generally consists of a multicore embedded processor. A programmable FPGA Logic Fabric sits right next to the processor cores, making it possible to map critical parts of software onto the FPGA. This proximity also makes it possible to have very high bandwidth and low latency communication channels between the processor and the FPGA.
How does this impact the Hardware Verification domain? Traditionally, hardware and software are designed and developed in isolation. In an ASIC hardware design scenario, an IP team codes the RTL and verifies it using SystemVerilog-powered UVM testbenches before handing off the IP to the SoC integration team. An essential part of the IP -> SoC handoff is the UVM test suite that must be run at the SoC or subsystem level to make sure that the IP integration is seamless.
A similar workflow is followed by software teams. A software IP (or library) developer passes on a set of tests to the application development team to make sure that the SW IP works without glitches in the application/system development environment.
FPGA-based Hardware Accelerator technology requires a complete re-look at how verification is done today, and how it needs to evolve. Since a hardware accelerator integrates tightly with the processor, a lot more hardware-software coverification is required.
In traditional SoC designs, most of the interaction between software and hardware is limited to hardware configuration registers. With hardware accelerators, this changes completely. Since programmable logic (FPGA) resources now sit right next to the processor, the bottlenecks in software algorithms are identified and implemented directly on the FPGA. This makes the data flow in a hardware-accelerated system intertwined between hardware and software. So while in an ASIC/SoC setting a hardware IP is driven by another hardware module, in the case of hardware accelerators the data is often driven by some software algorithm.
In the case of a traditional ASIC/SoC, a Hardware IP designer delivers an IP to an SoC integrator. A Hardware Accelerator IP, on the other hand, is delivered to a software system integrator. In most cases, the software integrator would have no idea how to run the SystemVerilog testcases. Even if they do manage to run the simulations, they would only be able to run the tests for the hardware IP in isolation -- just repeating the simulation that the IP designer has already run. A SystemVerilog UVM testbench provides no easy path to test the integration of the IP in a software environment.
This is where Embedded UVM comes to the rescue. Embedded UVM testbenches can be run directly on embedded Linux running on the HPS unit of an SoC FPGA, while the hardware accelerator logic is mapped onto the FPGA Logic Fabric of the same SoC FPGA. Embedded UVM testbenches thus enable UVM-based coverification of Hardware Accelerators at the point of deployment. The same constrained random tests can also be run on a Verilog/VHDL simulator, thus providing a debug path for failing testcases.
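The idea of running one constrained random test against two targets can be sketched in a few lines. The following is only an illustration in Python, not Embedded UVM code: the `Backend` class, the 16-bit saturating adder it stands in for, and the `saturate` switch used to inject a bug are all hypothetical. A real backend would talk either to an RTL simulator or to the accelerator on the FPGA fabric.

```python
import random


def reference_model(a, b):
    """Golden software model of a hypothetical accelerator: 16-bit saturating add."""
    return min(a + b, 0xFFFF)


class Backend:
    """Hypothetical DUT connection (stand-in for a simulator or the FPGA fabric)."""

    def __init__(self, name, saturate=True):
        self.name = name
        self.saturate = saturate  # False injects a wrap-around bug for illustration

    def run(self, a, b):
        total = a + b
        return min(total, 0xFFFF) if self.saturate else total & 0xFFFF


def random_transaction(rng):
    """Constrained random stimulus: about a quarter of operands sit at the maximum."""
    def operand():
        return 0xFFFF if rng.random() < 0.25 else rng.randint(0, 0xFFFF)
    return operand(), operand()


def run_test(backend, seed, count=1000):
    """Drive the backend with seeded stimulus and collect scoreboard mismatches."""
    rng = random.Random(seed)  # the seed fully determines the stimulus
    mismatches = []
    for _ in range(count):
        a, b = random_transaction(rng)
        got = backend.run(a, b)
        expected = reference_model(a, b)
        if got != expected:
            mismatches.append((a, b, got, expected))
    return mismatches


if __name__ == "__main__":
    on_hardware = run_test(Backend("fpga", saturate=False), seed=7)
    in_simulation = run_test(Backend("sim", saturate=False), seed=7)
    print(len(on_hardware), "mismatches on hardware")
    print("reproduced in simulation:", on_hardware == in_simulation)
```

Because the stimulus depends only on the seed, a failure observed on the deployed hardware can be replayed with the same seed against the simulator, where waveforms and internal signals are available for debugging.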
If you want to explore, you can follow the steps for the Cyclone V DE10-Nano kit, the Xilinx Zynq Parallella, or the Zybo Z7 kit. Another tutorial shows how to run the same test with the open-source Icarus Verilog compiler.
Since the hardware design is mapped to the logic fabric of the FPGA, an Embedded UVM testbench runs much faster than a traditional UVM-based simulation of the RTL design. You can experience the speedup provided by Accelerated UVM yourself on the reference design covered in the above set of tutorials.