Kent Palmkvist

http://www.isy.liu.se/edu/kurs/TSEA44

Based on slides by Andreas Ehliar



TSEA44: Computer hardware - a system on a chip

2019-11-04

#### What is the course about?

- How to build a complete embedded computer using an FPGA and a few other components. Why?
  - Only one chip
  - The computer can easily be tailored to your needs.
    - Special instructions
    - Accelerators
    - DMA transfer
  - The computer can be simulated
  - A logic analyzer can be added in the FPGA
    - Add performance counters
  - It's fun!




# Prerequisites (expected knowledge!)

- Digital logic design. You will design both a data path and a control unit for an accelerator.
- Binary arithmetic. Signed/unsigned numbers.
- VHDL or Verilog. SystemVerilog (SV) is the language used in the course.
- Computer Architecture. It is extremely important to understand how a CPU executes code. You will also design part of a DMA-controller. Bus cycles are central.
- ASM and C programming. Most of the programming is done in C, with a few cases of inline asm.


### Course organisation

- Lab 0: learn enough Verilog, 4 hours
  - Individual work and demonstration
- Lab course: 4 mini projects
  - 1-3 people per lab group
- Lectures: 8 x 2 hours
- Examination 6 credits:
  - 3 written reports/group
  - Oral individual questions








#### Lab tasks and examination

- Lab 0 (individual work and demonstration)
  - Build a UART in SystemVerilog
  - Demonstration
  - Deadline 14 November
  - Not allowed to join any group before lab 0 is complete
- Lab 1 (in groups of 2 or 3 students)
  - Interface to the Wishbone bus
  - Demonstration (individual questions)
  - Written report


#### Lab tasks and examination, cont.

- Lab 2+3
  - Design a JPEG accelerator + DMA
  - Demonstration (with individual questions)
  - Written report
- Lab 4
  - Custom instruction
  - Demonstration (with individual questions)
  - Written report


#### Written report requirements

- A readable short report typically consisting of
  - Introduction
  - Design, where you explain with text and diagrams how your design works
  - Results, that you have measured
  - Conclusions
  - Appendix: All Verilog and C code with comments!


# Competition - fastest JPEG compression

- An unaccelerated JPEG compression (using jpegfiles) takes roughly 13.0 Mcycles, which at 25 MHz is about 0.5 s per frame, i.e. roughly 2 FPS (frames per second)
- Our record: ~100 000 cycles (everything in hardware)
- Goal: highest frame rate. Exception: above 25 FPS, the smallest implementation wins
- Deadline: 20/12 2019



wunderb.jpg 320 x 240


#### The hardware, lab0

- ZedBoard
  - Programming connection at the top of the board
  - Serial port at the bottom of the board
  - Only the PL (programmable logic) part of the chip is used










#### Processor core: OpenRISC 1200

- Initially developed within the OpenCores initiative
- Later split off into a new website, openrisc.io
- Complete RISC processor including synthesizable code, instruction set simulator, etc.


### (System)Verilog

- The course uses SystemVerilog
- SystemVerilog is easy to learn if you know VHDL/C
- Our soft computer (80% downloaded from OpenCores) is written in Verilog
- It is possible to use both languages in a design
- You need to understand parts of the computer


#### (System)Verilog vs VHDL

An edge-triggered D flip-flop.

SystemVerilog (C-like syntax):

```
module dff(
  input clk, d,
  output reg q);

  // q takes the value of d on every rising clock edge
  always_ff @(posedge clk)
    q <= d;
endmodule
```

VHDL (Ada-like syntax):

```
library ieee;
use ieee.std_logic_1164.all;

entity dff is
  port (clk, d : in std_logic;
        q : out std_logic);
end dff;

architecture firsttry of dff is
begin
  process (clk) begin
    if rising_edge(clk) then
      q <= d;
    end if;
  end process;
end firsttry;
```











```
Booting
UClinux/OR32
Flat model support (C) 1998,1999 Kenneth Albanowski, D. Jeff Dionne
Calibrating delay loop. ok - 2.08 BogoMIPS
Memory available: 53080k/62325k RAM, 0k/0k ROM (667892k Kernel data, 2182k code)
Swansea University Computer Society NET3.035 for Linux 2.0
NET3: Unix domain sockets 0.13 for Linux NET3.035.
Swansea University Computer Society TCP/IP for NET3.034
UClinux version 2.0.38.1pre3 (clles@kotte) (gcc version 3.2.3) #180 Sat Sep 11 09:01:55 CEST 2004
Serial driver version 4.13pl with no serial options enabled
ttyS00 at 0x00000000000 (irq = 2) is a 16550A
Ramdisk driver initialized : 16 ramdisks of 2048K size
Blkmem copyright 1998,1999 D. Jeff Dionne
Blkmem of disk images:
loop: registered device at major 7
eth6: Open Ethernet Core Version 1.0
RAMDISK: Romfs filesystem found at block 0
RAMDISK: Nomfs filesyste
```



#### Lecture info

1. Course intro, FPGA
2. Verilog (lab0)
3. A soft CPU
4. A soft computer (lab1)
5. HW acceleration (lab2)
6. FPGAs
7. Test benches, SV
8. Custom instructions (lab4)




# How we built our first FPGA computer

5. Write top file ("wire wrap in emacs")

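Size: 35 kB in Verilog, 13 kB in SV (Verilog does not have struct)

```
module myfirstcomputer(clk, rst, rx, tx);
  input  clk, rst, rx;
  output tx;

  // Wishbone interfaces: masters (Mx) and slaves (Sx)
  wishbone Mx[0:1], Sx[0:1];

  or1200cpu cpu0(.iwb(Mx[0]), ...);  // CPU, instruction bus on Mx[0]
  wb_conbus wb0(clk, rst, Mx, Sx);   // bus interconnect
  romram    rom0(Sx[1]);             // ROM/RAM on slave port 1
  uart      uart0(Sx[0], ...);       // UART on slave port 0
endmodule
```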


# How we built our first FPGA computer

6. Download the cross compiler
7. Write a small monitor and place it in ROM
8. ModelSim. Does it boot? Anything on tx?
9. Test with the simulator or32-uclinux-sim
10. Synthesize, which takes about 10 minutes (originally 40 minutes). With synthesis this slow, simulation is quite important in this course.























#### Embedded 18 kb Block RAM

- Up to 3 Mb on-chip block RAM
- High internal buffering bandwidth
- Clocked write and read
- 18 Kbit block RAM
- Parity bit locations (parity in/out buses)
- Data width up to 36 bits
- 3 WRITE modes
- Output latches Set/Reset
- True Dual-Port RAM
- Independent clock (async.) & control


# True Dual-Port™ configurations

- Configurations available on each port:
- Independent port A and B configuration.
  - Support for data width conversion including parity bits (same memory array!)




#### How to use Block RAM: just instantiate the template

Two templates: `RAMB16_S9_S36` (port A 2048x8, port B 512x32) and `RAMB16_S36_S36` (2-port, 512x32 (+4)). Port connections, shown here for `RAMB16_S36_S36`:

```
RAMB16_S36_S36 inmem (
  // port A
  .CLKA(wb.clk), .SSRA(wb.rst),
  .ADDRA(bram_addr),
  .DIA(bram_data), .DIPA(4'h0),
  .ENA(bram_ce), .WEA(bram_we),
  .DOA(doa), .DOPA(),
  // port B
  .CLKB(wb.clk), .SSRB(wb.rst),
  .ADDRB({3'h0,rdc}),
  .DIB(32'h0), .DIPB(4'h0),
  .ENB(1'b1), .WEB(1'b0),
  .DOB(dob), .DOPB());
```


#### Distributed RAM

- Virtex-II LUT can implement
  - 16 x 1-bit synchronous RAM
  - Synchronous write
  - Asynchronous read
    - D flip-flop in the same slice can register the output
- Allows fast embedded RAM of any width
  - Only limited by the number of slices in each device
  - Example: RAM 16 x 48-bit fits in 48 LUTs







#### Counter

```
module dec(
  input clk, rst,
  output reg u);

  reg [3:0] q;

  // Decade counter: counts 0..9, asynchronous reset
  always_ff @(posedge clk or posedge rst)
    if (rst)
      q <= 4'h0;
    else if (q == 9)
      q <= 4'h0;
    else
      q <= q + 1;

  // Registered output: u is high in the cycle after q was 9
  always_ff @(posedge clk)
    if (q == 9)
      u <= 1'b1;
    else
      u <= 1'b0;
endmodule
```








#### Hints for lab work

- Remember to think hardware!
  - Draw block diagrams (required!)
  - Each block should be simple to translate to Verilog
    - Counters
    - Registers
    - Boolean expressions, arithmetic operations
    - State machines
  - Use testbenches and simulate to verify behavior
  - Finally test on hardware



