ANVIL - ANother Verilog Interaction Layer

ANVIL is a mechanism for applying some Specification-Driven Development (SDD) ideas to the design and verification of Verilog RTL.

“Specification-Driven Development (SDD), which combines the best features of the agile Test-Driven Development (TDD) methodology with the best features of the plan-driven approach of quality-first Design-by-Contract (DbC).”

Verilog HDL is well-suited for describing the bit-level, cycle-accurate behavior of a particular implementation, i.e., synthesizable designs. However, Verilog lacks the semantic (and syntactic) capability to express much beyond bits. There is no notion of objects, interfaces, complex data structures, operator overloading, etc.; such capabilities are the foundation of object-oriented languages such as C++ and Java.

SystemVerilog does fold many of these notions into a superset of the Verilog language. It is not clear, however, that this one-size-fits-all solution (a combined verification and hardware language) really works in practice. There never seems to be one best language for all needs, so perhaps it is best to have several languages, each satisfying specific needs.

In order to use SDD ideas efficiently, we need much higher levels of abstraction than bits. In fact, the SDD paper touts the power of Eiffel, since it has DbC built in; but since C++ seems more familiar to hardware types, we'll stick with that one first. (Note: a Ruby-VPI package is available, which provides a Ruby interface to VPI.)

Thus, ANVIL was created to move the verification effort to a pure C++ environment with minimal perturbation to existing Verilog RTL. In fact, using ANVIL, one simply instantiates a top-level RTL module (the device under test: dut), adds a rudimentary collar of regs and wires and a simple set of ANVIL task and function calls to essentially connect the dut, running in a Verilog (VPI-compliant) simulator, with a C++ testbench.

The ANVIL architecture is shown on the right. The current implementation uses shared memory between server (the testbench: tb.cxx) and client (the dut: verilog simulator running rtl.v) processes running on a single host.

The icon is a semaphore to control exclusive access (of shared memory) to either server or client.

Example

The Verilog RTL for a n-bit, synchronous reset, loadable up-counter is given:

module cntrn
    #(parameter N = 8)
    (input clk, input rst, input load,
     input [N-1:0] d, output reg [N-1:0] q, output isZero);

    assign isZero = !|q;

    always @(posedge clk)
    begin
        if (rst)
            q <= 0;
        else if (load)
            q <= d;
        else
            q <= q + 1;
    end
endmodule
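
Since the testbench will live in C++, it can help to have a software reference model of the counter to compare against. The following is only a minimal sketch: the class name CounterModel and its methods are hypothetical and not part of the ANVIL package. It mirrors the priority of the RTL (reset over load over increment) and wraps modulo 2^N. Unlike the RTL, whose q is X until reset, the model simply starts at 0.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical software model of cntrn (not part of ANVIL).
// One call to posedge() corresponds to one rising clock edge.
class CounterModel
{
public:
    explicit CounterModel(unsigned nbits)
        : mMask((nbits >= 32) ? 0xFFFFFFFFu : ((1u << nbits) - 1u)), mQ(0)
    {
        assert(nbits >= 1 && nbits <= 32);
    }

    // Same priority as the RTL: rst, then load, then increment.
    void posedge(bool rst, bool load, std::uint32_t d)
    {
        if (rst)
            mQ = 0;
        else if (load)
            mQ = d & mMask;
        else
            mQ = (mQ + 1u) & mMask;   // wraps at 2^N
    }

    std::uint32_t q() const { return mQ; }
    bool isZero() const { return 0 == mQ; }   // same as !|q in the RTL

private:
    std::uint32_t mMask;   // N low bits set
    std::uint32_t mQ;      // current count
};
```

For example, an 8-bit model loaded with 254 reads 255 after one more clock edge and wraps to 0 (with isZero true) on the edge after that.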

Next, a module wrapper (module test1), which instantiates the counter (as dut) and connects its inputs to regs and its outputs to wires of corresponding names, is shown below:

module test1
    #(parameter N = 8, TH = 5)();
    parameter time T = 2*TH;
    reg clk;
    reg rst, load;
    reg  [N-1:0] d;
    wire         isZero;
    wire [N-1:0] q;

    cntrn #(N) dut(clk,rst,load,d,q,isZero);

    initial
        clk = #0 0; //need #0 to get onto queue
    always @(clk)
        clk <= #TH ~clk;

    // Used to synchronize back/forth <-> testbench.
    event ev1, ev2;

    initial
    begin: bInitialize
        $anvilYieldSetup(`PORT, "test1", "clk");
        #0 ->ev1; //need #0 to get event onto queue
    end

    always @(ev1)
    begin: bToTestbench
        $anvilYield;
        ->ev2;
    end

    always @(ev2)
    begin: bRunDut
        time t;
        t = $anvilGetNextTime;
        if (0 == t)
            $finish;
        #t ->ev1;
    end
endmodule

The following describes the named blocks in the above (test1) module:

bInitialize

Upon (simulator) initialization, invoke the task anvilYieldSetup with (at least) 2 arguments:

port id to share with testbench process. In this example it would be passed in as a (simulator) command line +define+PORT=... argument. (more on this port stuff later...)
module name which instantiates device-under-test (dut).
Optional list of reg names which should not be driven by external (C++) testbench. Since the system clock is modeled here, the external testbench should not model it.

After the anvilYieldSetup task completes, the simulator passes control to the named block bToTestbench.

bToTestbench

When this block is activated, the anvilYield task is called which:

copies current values of the wires (connected to dut) into a shared memory area (shared between the Verilog process and the external (C++) testbench process).
copies the current simulation time into the shared memory area.
releases the semaphore to allow the external (C++) testbench process to assume control.

External (C++) testbench:

reads current wire values (from shared memory)
checks against expected values, etc.
schedules next (new) values onto regs (driving inputs of dut)
specifies how long to run next time step (on Verilog side)
releases semaphore so Verilog-side assumes control.

Upon return from testbench, the next (new) values are scheduled and anvilYield returns which then passes control to the bRunDut block.

bRunDut

When this block is activated, the anvilGetNextTime function returns the next time step duration (scheduled from testbench-side). The dut (Verilog side) then advances time by this amount. (By convention, a 0 advance is used to cease simulation.)
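
The turn-taking described by the three blocks above can be sketched in a single process. This is only an illustration under stated assumptions: the SharedArea struct, its field names, and the functions dutYield, tbTurn and runHandshake are all hypothetical, an owner flag stands in for the real semaphore, and a trivial 8-bit counter stands in for the simulator. It is not ANVIL's actual shared memory layout or API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the shared memory area (layout is an assumption).
enum class Owner { Dut, Testbench };

struct SharedArea
{
    Owner         owner;     // plays the role of the semaphore
    std::uint64_t simTime;   // written by the dut side on yield
    std::uint32_t q;         // dut output sampled on yield
    bool          rst;       // dut input scheduled by the testbench
    std::uint64_t nextStep;  // how far to advance; 0 means finish
};

// dut side (bToTestbench): publish outputs and time, then hand over control.
void dutYield(SharedArea &shm, std::uint64_t now, std::uint32_t q)
{
    assert(Owner::Dut == shm.owner);
    shm.q = q;
    shm.simTime = now;
    shm.owner = Owner::Testbench;
}

// testbench side: read outputs, schedule inputs and the next step,
// then hand control back to the dut side.
std::uint32_t tbTurn(SharedArea &shm, bool rst, std::uint64_t step)
{
    assert(Owner::Testbench == shm.owner);
    const std::uint32_t seen = shm.q;
    shm.rst = rst;
    shm.nextStep = step;
    shm.owner = Owner::Dut;
    return seen;
}

// Ping-pong for the given number of cycles; returns the final simulation time.
std::uint64_t runHandshake(unsigned cycles, std::uint64_t period)
{
    SharedArea shm = {Owner::Dut, 0, 0, true, 0};
    std::uint64_t now = 0;
    std::uint32_t q = 0;
    unsigned done = 0;
    for (;;)
    {
        dutYield(shm, now, q);                            // bToTestbench
        tbTurn(shm, false, (done < cycles) ? period : 0); // external testbench
        if (0 == shm.nextStep)                            // bRunDut: 0 means finish
            break;
        now += shm.nextStep;                              // advance time (#t)
        q = shm.rst ? 0 : ((q + 1u) & 0xFFu);             // toy dut behavior
        ++done;
    }
    return now;
}
```

In the real system the two sides are separate processes, so the owner flag would be a semaphore and each side would block until it is released.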

The following is an example of the testbench (C++) side which is ultimately running the simulation:

#include <iostream>
#include <cstdlib>
#include "xyzzy/socket.hxx"
#include "anvil/tbshm.hxx"

using namespace std;
using namespace xyzzy;
using namespace anvil;
using namespace anvil::tb;

int
main(int argc, char *argv[])
{
    static const unsigned T = 10;  //same period as dut/Verilog

    const int cMaxCount = 1 << 8;  //N==8 bit counter
    int port = 3000;               //default port id
    if (1 < argc)
        port = atoi(argv[1]);

    // The mgr is the manager; an interface which provides 
    // methods to interact with shared memory and semaphores.
    TRcIManager mgr;
    try  //use try/catch since socket/network can fail
    {
        mgr = new TShmManager(port);
    }
    catch (const TException &ex)
    {
        ex.print();
        return(EXIT_FAILURE);
    }
    // Until the dut has valid outputs, we will ignore
    // X (unknown) values returned from dut.
    mgr->setExceptionOnX(false);

    cout << "Info: Connected to client on port: " << port << endl;

    // Define a macro (DCL) to instance connections to dut by name.
    // Then, connect to corresponding IOs of the dut.
    // A declaration of the form:
    //    TInput foo("foo");
    // creates a local (mutable) variable which can be assigned
    // and read.  The current value is assigned to the dut
    // when the assignAndRunDut method is called, subsequently.
    // A declaration of the form:
    //    TOutput bar("bar");
    // creates a local (immutable) variable which can be read.
    // The current value is the dut (output) value at the
    // simulation time when control last transferred from dut
    // to this testbench; i.e., after waitOnDut method returns.
#define DCL(_x) _x(#_x)
    TInput  DCL(rst);
    TInput  DCL(load);
    TInput  DCL(d);
    TOutput DCL(isZero);
    TOutput DCL(q);
#undef DCL

    // Done with (tb) initialization; so assign values
    // immediately.  The second argument 0 is required,
    // but ignored.
    mgr->assignAndRunDut(0, 0); 

    mgr->waitOnDut(); // wait for dut to hit $anvilYield

    // Assign some values to inputs
    rst = true;
    load = false;
    d = 8;
    // Schedule input values at #0 then return to dut/verilog and
    // advance by #T+1.
    mgr->assignAndRunDut(0, T+1);

    mgr->waitOnDut();  // wait for dut to transfer control back
    rst = false;       // de-assert reset value
    mgr->assignAndRunDut(0, T); 
    mgr->setExceptionOnX(true);  // notify on dut X output values

    int iter = 1000;
    if (2 < argc)
        iter = atoi(argv[2]);

    bool synced = false;
    TInt32 expect = 0;
    unsigned checkCnt = 0;
    // Loop for iter clock cycles; and start checking outputs
    // after 0 is detected; i.e., a known sync point.
    for (int i = 0; i < iter; i++)
    {
        mgr->waitOnDut();
        if (!synced && 0 == q)
            synced = true;
        cout << "i=" << i
             << ", q=" << q
             << ", isZero=" << isZero << endl;
        if (synced)
        {
            ASSERT_TRUE(expect == q);
            expect = (expect + 1) % cMaxCount;
            checkCnt++;
        }
        mgr->runDut(T);
    }

    mgr->waitOnDut();
    mgr->runDut(0);  // signal dut we're all done!

    cout << "Info: check count=" << checkCnt << endl;
    return(EXIT_SUCCESS);
}
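
The DCL macro in the testbench above relies on the preprocessor's stringizing operator (#), so that each variable is declared once and automatically bound to the dut signal of the same name. The sketch below shows just that mechanism. NamedSignal and declaredName are hypothetical stand-ins for illustration and say nothing about how TInput and TOutput are actually implemented.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for TInput/TOutput: it only records the name
// it was constructed with.
struct NamedSignal
{
    explicit NamedSignal(const char *name) : mName(name), mValue(0) {}
    const std::string &name() const { return mName; }

    std::string mName;
    int         mValue;
};

// Same macro as in the testbench: _x(#_x) turns the identifier into
// both the variable name and the string passed to the constructor.
#define DCL(_x) _x(#_x)

inline std::string declaredName()
{
    NamedSignal DCL(load);   // expands to: NamedSignal load("load");
    return load.name();
}

#undef DCL
```

Keeping the variable name and the signal name in one place avoids the two drifting apart when a port is renamed.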

The example above is contained in the download and can be run as:

Download and install cver – a GPL'd Verilog simulator.
Download and untar the ANVIL package.
cd to the directory where ANVIL (top directory anvil) is located and change the variable PLI_INCL in the anvil.mk file to point to the directory where cver pli_incs directory is located:

> cd /your_install_root/anvil
> your_editor anvil.mk

# Change appropriately for your cver install
PLI_INCL := /.../gplcver-2.11a.src/pli_incs

Next, locate certain g++-related files, then update common/common.mk accordingly.

> find /usr/lib/gcc -name crtbeginS.o

On my 64-bit system this returns:

/usr/lib/gcc/x86_64-redhat-linux/4.1.1/crtbeginS.o
/usr/lib/gcc/x86_64-redhat-linux/4.1.1/32/crtbeginS.o

On a 32-bit system the location may be different.

Edit common.mk to update the value of LIB_CRT with the correct value(s). If you have a 32-bit system, just update both values (in the ifeq/else) with the same path, or remove the ifeq clause altogether:

> cd /your_install_root/common
> your_editor common.mk

ifeq (${shell uname -p},x86_64)
LIB_CRT := /usr/lib/gcc/x86_64-redhat-linux/4.1.1/32/crtbeginS.o
else
LIB_CRT := /usr/lib/gcc/i386-redhat-linux/4.1.1/crtbeginS.o
endif

All this revision/setup (above) is a one-time deal (unless, of course, you change OSes); and it will be re-used by the examples and subsequent efforts.

Now, on to the examples...
cd to the test1 directory under the anvil release and make a 32-bit version:

> cd tests/test1
> make ndebug32

You should see 2 primary files created:

> ls -1 */32/ndebug/*exe ../../*/32/ndebug/*so

../../linux_x86_64/32/ndebug/anvil.so
linux_x86_64/32/ndebug/test.exe

NOTE: on my system:

> uname -si

Linux x86_64

so the subdirectory name on your system may differ from linux_x86_64.

The anvil.so will be loaded into the (cver) simulator at runtime, and test.exe is the standalone testbench executable.

Next, we'll start the simulation. First, the testbench starts and opens a server socket listening on (default) port 3000. Next, the client (Verilog, cver) starts, initializes the shared memory area, connects to the server (through the same port) and passes information about the shared memory id. The server gathers information from the shared memory area, closes the socket and passes control back to the dut/client process. From then on, the semaphore and VPI mechanism handle the back-and-forth between testbench and dut.

For this example:

> pwd
/.../tests/test1

> (./linux_x86_64/32/ndebug/test.exe 3001 |& tee server.log & ) ;\
sleep 5 ; make -f Makefile.ver ndebug PORT=3001 >& client.log

The above command starts the server, listening on port 3001; waits 5 sec; then starts the client, connecting on the same port 3001; and, runs the simulation.

You should see:

Info: Listening on port: 3001
... sleep here for 5 seconds until client connects ... 
Info: Connected to client on port: 3001 
i=0, q=1, isZero=0
i=1, q=2, isZero=0
i=2, q=3, isZero=0
...
i=998, q=231, isZero=0
i=999, q=232, isZero=0
Info: check count=745
>
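
The check count of 745 follows from the testbench loop: checking starts only once q is first seen at 0, and every later iteration is checked. The sketch below replays that bookkeeping without a simulator. It assumes q equals (i + 1) modulo 2^N, which is what the log above shows (q=1 at i=0); RunSummary and summarize are hypothetical names introduced only for this illustration.

```cpp
#include <cassert>

struct RunSummary
{
    int      lastQ;      // q seen on the final iteration
    unsigned checkCnt;   // number of iterations that were checked
};

// Replays the sync/check bookkeeping of the testbench loop,
// assuming q == (i + 1) mod 2^nbits as seen in the log.
RunSummary summarize(int iter, unsigned nbits)
{
    const int maxCount = 1 << nbits;
    RunSummary s = {0, 0};
    bool synced = false;
    for (int i = 0; i < iter; i++)
    {
        const int q = (i + 1) % maxCount;
        if (!synced && 0 == q)
            synced = true;      // first time q wraps to 0
        if (synced)
            s.checkCnt++;       // every iteration from the sync point on
        s.lastQ = q;
    }
    return s;
}
```

With iter = 1000 and N = 8, q first reaches 0 at i = 255, so iterations 255 through 999 are checked: 1000 - 255 = 745, and the last q is 1000 mod 256 = 232, matching the log.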

The client.log will show:

cver +loadvpi=../../linux_x86_64/32/ndebug/anvil.so:vpi_compat_bootstrap \
    +verbose /home/karl/projects/local/anvil/tests/test1/test.v +define+PORT=3001
GPLCVER_2.11a of 07/05/05 (Linux-elf).
Copyright (c) 1991-2005 Pragmatic C Software Corp.
  All Rights reserved.  Licensed under the GNU General Public License (GPL).
  See the 'COPYING' file for details.  NO WARRANTY provided.
Today is Thu Mar 15 12:11:23 2007.
  Verbose mode is on. 
  Invoked by: "cver".
  +loadvpi= dynamic library ../../linux_x86_64/32/ndebug/anvil.so 
  loaded with bootstrap routine(s) :vpi_compat_bootstrap
  `define of PORT value 3001 added from command option.
  P1364 2001 config map library not specified - using -y/-v libraries.
  Begin Translation:
Compiling source file "/home/karl/projects/local/anvil/tests/test1/test.v"
  Begin pass 2:
  Approximately 394973 bytes storage allocated (excluding udps).
Highest level modules:
test1
  Verbose mode statistics:
  71 source lines read (includes -v/-y library files).
  Design contains 2 module types.

  Variable storage in bytes: 23 for scalars, 48 for non scalars.
  Begin load/optimize:
  Approximately 383232 bytes storage allocated (excluding udps).
  Begin simulation:
  Approximately 460384 bytes storage allocated (excluding udps).


Warning: $anvilYieldSetup: "clk": skipping!
Info: Connect to port 3001 to send "/vuVpiKey16678,392"
Info: Wait for tb to release semaphore ...  Got it! 
Halted at location **/home/karl/projects/local/anvil/tests/test1/test.v(60) 
     time 10021 from call to $finish.
3011 simulation events and 3035 declarative immediate assigns processed.
18055 behavioral statements executed (6023 procedural suspends).
  Times (in sec.):  Translate 0.0, load/optimize 0.1, simulation 2.0.

From the same test1 directory, make the debug anvil.so file and re-run to see more details about the dut interaction with the testbench.

> make debug32
> (./linux_x86_64/32/ndebug/test.exe 3001 |& tee server.log & ) ;\
sleep 5 ; make -f Makefile.ver debug PORT=3001 >& client.log

The client.log contains debug (DBG) messages. These emanate from the ../../src/anvil/anvil.cxx file. This file is the source of the anvil.so (the VPI routines) which the Verilog simulator loads/uses on the client side.

You should view the anvil.cxx file and search for the DBG string for more details.

Example 2

This example tests a floating point unit from opencores.org.

cd to tests/test2 and look at tfpu.v and test.cxx which are the dut and testbench, respectively. The RTL design files for the dut (instance of fpu) are under test2/fpu/verilog.

Note that the tfpu.v file is simply an instance of fpu, a collar of regs and wires, and the same code as in the previous example to coordinate with the testbench.

Next, build the testbench (and the same anvil.so, if out-of-date):

> cd tests/test2
> make ndebug32

You should see 2 primary files created:

> ls -1 */32/ndebug/*exe ../../*/32/ndebug/*so

../../linux_x86_64/32/ndebug/anvil.so
linux_x86_64/32/ndebug/test.exe

NOTE: on my system:

> uname -si

Linux x86_64

so the subdirectory name on your system may differ from linux_x86_64.

As in the previous example, invoke the server (testbench), then the client (dut running in the Verilog simulator, cver). Here, we do not specify a port, so it defaults to 3000:

> (./linux_x86_64/32/ndebug/test.exe |& tee server.log & ) ;\
sleep 5 ; make -f Makefile.ver ndebug >& client.log

Here, the output is a little more interesting: it shows the progress of 256 ^ 2 iterations of successive floating point adds. Again, the stimulus and response (functional intent) are expressed in the C++ testbench test.cxx.

Invoking test.exe -help shows that random operations and data can be applied for some specified number of iterations:

> ./linux_x86_64/32/ndebug/test.exe -help

Usage: ./linux_x86_64/32/ndebug/test.exe fpuOp? iter? port?

fpuOp: 0,1,2,3 for add, sub, mul or div, respectively.
       Or, -1 for random selection of add, sub, mul or div.

iter: 0 for all 2^32 permutations of BOTH a and b operands.
      n>0 for n permutations of BOTH a and b operands.

port: listen to port for dut/verilog client connection.

Thus, we can re-run 256 ^ 2 iterations with random operations; and we will get more details about the actual Verilog operands by enabling a monitor (see tfpu.v for details on how MONITOR is used).

> (./linux_x86_64/32/ndebug/test.exe -1 |& tee server.log & ) ;\
sleep 5 ; make -f Makefile.ver ndebug MONITOR=1 >& client.log

See the client.log for details.

To see that the division-by-zero behavior of the fpu output (out) does not quite jibe with that of the x86 processor (at least on my AMD Turion-64, though perhaps this is due to the rounding mode?), run:

> (./linux_x86_64/32/ndebug/test.exe 3 10 |& tee server.log & ) ;\
sleep 5 ; make -f Makefile.ver ndebug MONITOR=1 >& client.log

Info: Wait for connection on port: 3000
Info: Connected to client on port: 3000
FAIL: 0/0 = nan(-4194304): ia=0, ib=0, out=-4194303(nan), [inf,snan,qnan,ine,ovr,uvr,zero,divbz]=00100000
1% 2% 3% 4% 5% 6% 7% 8% 9% 10% ...
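
To get a feel for how far apart the two values in the FAIL line really are, it may help to decode them. The sketch below assumes the two integers (-4194304 and -4194303) are the raw IEEE 754 single-precision bit patterns printed as signed 32-bit values. That is an interpretation of the log, not something the log states. Float32Fields, decode, isNaN and isQuietNaN are hypothetical helpers written for this illustration, and the quiet-NaN test uses the common convention that the most significant fraction bit marks a quiet NaN.

```cpp
#include <cassert>
#include <cstdint>

struct Float32Fields
{
    unsigned      sign;       // 1 bit
    unsigned      exponent;   // 8 bits
    std::uint32_t fraction;   // 23 bits
};

// Split a raw single-precision bit pattern (given as a signed 32-bit
// integer, as in the log) into its sign, exponent and fraction fields.
Float32Fields decode(std::int32_t raw)
{
    const std::uint32_t bits = static_cast<std::uint32_t>(raw);
    Float32Fields f;
    f.sign = bits >> 31;
    f.exponent = (bits >> 23) & 0xFFu;
    f.fraction = bits & 0x7FFFFFu;
    return f;
}

// NaN: exponent all ones and a non-zero fraction.
bool isNaN(const Float32Fields &f)
{
    return 0xFFu == f.exponent && 0 != f.fraction;
}

// Quiet NaN (by the common convention): top fraction bit set.
bool isQuietNaN(const Float32Fields &f)
{
    return isNaN(f) && 0 != (f.fraction & 0x400000u);
}
```

Under that assumption, -4194304 is 0xFFC00000 and -4194303 is 0xFFC00001: both decode as negative quiet NaNs, and they differ only in the least significant fraction bit, so the mismatch is in the NaN payload rather than in the kind of result.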