Kent Palmkvist



TSTE12 Design of Digital Systems, Lecture 8

2024-09-19

# Agenda

- Practical issues
- Hardware description
  - FPGA
- HDL based design



2024-09-19

### TSTE12 Deadlines Y,D,ED

- Final version of design sketch and project plan this week
  - Show implementation ideas, show sequence of implementation and task partitioning between group members
- Weekly meetings should start
  - Internal weekly meeting with transcript sent to supervisor
- Lab 2 deadline tuesday next week (24/9)



TSTE12 Design of Digital Systems, Lecture 8

2024-09-19

1

### TSTE12 Deadlines MELE, erasmus

- Final requirement specification this week
- First version of design specification and projekt plan tuesday next week (24/9)
- Wednesday 25 September 21.00: Lab 2 soft deadline
  - Lab 2 results will be checked after project completed



2024-09-19

### Handin (homework), Individual!

- 1st handin deadline Monday 23 September 23:30
- Use only plan text editor (emacs, vi, modelsim or similar) for code entry.
- Solve tasks INDIVIDUALLY
- Submit answers using Lisam assignment function
  - 4 different submissions for code, one for each code task
  - 1 submission for all theory question answers
- Use a special terminal window when working with handins

module load TSTE12; TSTE12handin



TSTE12 Design of Digital Systems, Lecture 8

2024-09-19

6

### Control units

- · Hard wired
  - Moore (output only dependent on state)
  - Mealy (output dependent on state and input)
  - Fast
  - Custom designed
- Microcoded
  - Cheap
  - Standardized (easy to reuse)



2024-09-19

### Microcoded control unit

- General structure
  - AG = Adress generator
  - MAR = Memory Adress Register
  - MIR = Memory Instruction register





TSTE12 Design of Digital Systems, Lecture 8

2024-09-19

9 8

### Microcoded control unit

- Advantages
  - Easy to create a generic design
  - Only ROM contents needs to be replaced
  - Easy to change existing design
  - Short design time (low design cost)
  - May use compiler to create ROM contents
- Drawbacks
  - Slower in many cases (ROM must be read)
    - · Only Moore type of controllers
  - Small controllers are more expensive due to extra register and ROM
  - Must be designed for worst case regarding required features



2024-09-19

9

### Microcoded control unit, example

- Controller for an extremly small RISC processor
  - 4 register (PC, R, MDR, MIR)
  - 1 subtraction unit
  - Some multiplexers and busses
  - Use the same add unit both for instruction operation and PC update
  - Cost: 9 clock cycles per instruction
- Only one instruction: subtract with branch on negative result
  - 3 byte instruction
    - · 1st operand address, 2nd operand address, branch address



LINKÖPING UNIVERSITY

TSTE12 Design of Digital Systems, Lecture 8

2024-09-19

### Controller for structure

- One instruction execution
  - 0. PCout, Zin, MARin, READ, ZEND
  - 1. MDRout, MARin, READ
  - 2. MDRout, Rin
  - 3. Pcout, Cin, PCin, MARin, READ
  - 4. MDRout, MARin, READ
  - 5. MDRout, COMP, Cin, Nin, MDRin, WRITE
  - 6. PCout, Cin, PCin, MARin, READ
  - 7. PCout, Cin, PCin, NNEND
  - 8. MDRout, Pcin
- 2 loops, 0-7 or 0-8





2024-09-19 11

# Control with two jumps, microcoded

- All control steps described in a ROM table
- Easy to understand
- Easy to redesign

```
ROM: process(C)
 type SQ_ARRAY is array(0 to 8,0 to 8) of BIT;
 constant MEM : SQ_ARRAY:=
('1',
        ('0',
                                          '0'), --6
 ('0',
         101, 101, 101,
 ('1',
                                          '0'));--8
                      '1',
                            '0', '0', '0',
begin
 MDR OUT <=
             MEM(INTVAL(C),0) after ROM_DEL;
 MAR_IN <=
             MEM(INTVAL(C),1) after ROM DEL;
 N_IN
             MEM(INTVAL(C),2) after ROM_DEL;
         <= MEM(INTVAL(C),3) after ROM_DEL;
 R_IN
 PC IN
         <= MEM(INTVAL(C),4) after ROM_DEL;
<= MEM(INTVAL(C),5) after ROM_DEL;</pre>
 ZEND
            MEM(INTVAL(C),6) after ROM_DEL;
 WRITE
         <= MEM(INTVAL(C),7) after ROM_DEL;
 NNEND
             MEM(INTVAL(C),8) after ROM_DEL;
end process ROM;
```



TSTE12 Design of Digital Systems, Lecture 8

2024-09-19

# URISC controller, Mealy

- Inclear sequence
- Hard to modify
- Faster

```
-- Hard Wired Control Unit
 --First Stage Decoding
STO <= not C(2) and not C(1) and not C(0) after AND_DEL;
ST1 <= not C(2) and not C(1) and C(0) after AND_DEL;
ST2 <= not C(2) and C(1) and not C(0) after AND_DEL;
ST2 <= not C(2) and C(1) and not C(0) after AND_DEL;
ST3 <= not C(2) and C(1) and C(0) after AND_DEL;
ST4 <= C(2) and not C(1) and not C(0) after AND_DEL;
ST5 <= C(2) and not C(1) and C(0) after AND_DEL;
ST6 <= C(2) and C(1) and not C(0) after AND_DEL;
ST7 <= C(2) and C(1) and C(0) after AND_DEL;
--Second Stage Decoding
ST07 <= ST0 or ST7 after OR DEL;
ST25 <= ST2 or ST5 after OR_DEL;
ST36 <= ST3 or ST6 after OR DEL:
ST78 <= ST7 or C(3) after OR_DEL;
    -Control Signals
PC_OUT <= (ST07 or ST36) and not C(3)
after (OR_DEL + AND_DEL);
C_IN <= ST36 or ST57 after OR_DEL;
PC_IN <= ST36 or ST78 after OR_DEL;
MAR_IN <= not(ST25 or ST78) after (OR_DEL + INV_DEL);
MDR_OUT <=not PC_OUT after INV_DEL;
READ <= MAR_IN; COMP <= ST5; N_IN <= ST5; MDR_IN <= ST5;
WRITE <= ST5; R_IN <= ST2; ZIN <= ST0; ZEND <=ST0;
```



2024-09-19 13

### More on microcoded controllers

- Lecture 11 will cover more details on microcoded controller structures
  - Introduces also lab 3
- · Lab 3 includes an example of a microcoded controller structure
  - Controller used to control a user interface and a datapath
  - Y and D program students have seen this approach in computer technology courses
    - · Used there for creating machine instruction implementations



TSTE12 Design of Digital Systems, Lecture 8

2024-09-19 14

### Gate level simulation

- All designs will eventually reach the gate level
- Need accuracy to allow check of timing requirements
  - Setup time on flip-flops
  - Clock signals
  - Races, hazards
  - Glitch example (inverter + and with rising edge input)
- Models must be efficient
  - Large number of gates
  - Slow simulation due to accuracy
    - · Still much faster than spice simulation



2024-09-19 15

### How accurate can a gate model be?

• Example: 2 input OR-gate

Entity OR2 IS
Port (I1, I2 : in bit; O : out bit);
END OR2;
Architecture DELTA\_DEL of OR2 IS
BEGIN
O <= I1 OR I2;
END DELTA\_DEL;

Architecture FIXED\_DEL OF OR2 IS BEGIN

O <= I1 OR I2 after 3 ns; END FIXED\_DEL; ENTITY OR2G IS
Generic (DEL: TIME)M
Port (I1, I2: in bit; O: out bit);
END OR2G;
Architecture GNR\_DEL of OR2G IS
BEGIN
O <= I1 OR I2 after DEL;
END GNR\_DEL;



TSTE12 Design of Digital Systems, Lecture 8

2024-09-19 16

# Model accuracy

- Models are better and better, but not good enough
  - Multiple timing models required
  - typical delay, max, min
- · Want single model, only changing one constant
  - Timing CONTROL
  - Set one constant to define type of timing (min, max, typical)



2024-09-19 17

### Code example

```
use work.TIMING_CONTROL.all;
entity OR2_TV is
generic(TMIN,TMAX,TTYP: TIME);
port(I1,I2: in BIT; O: out BIT);
end OR2_TV;

architecture VAR_T of OR2_TV is
begin
O <= I1 or I2 after T_CHOICE(TIMING_SEL,
TMIN,TMAX,TTYP);
end VAR_T;
```

```
package TIMING_CONTROL is
 type TIMING is (MIN, MAX, TYP, DELTA);
 constant TIMING SEL: TIMING := TYP
function T_CHOICE(TIMING_SEL: TIMING;
            TMIN, TMAX, TTYP: TIME)
  return TIME;
end TIMING CONTROL;
package body TIMING_CONTROL is
function T CHOICE(TIMING SEL: TIMING;
             TMIN, TMAX, TTYP: TIME)
  return TIME is
 begin
  case TIMING SEL is
   when DELTA => return 0 ns;
   when TYP => return TTYP;
   when MAX => return TMAX;
   when MIN => return TMIN;
  end case:
end T CHOICE;
end TIMING_CONTROL;
```

LINKÖPING UNIVERSITY

TSTE12 Design of Digital Systems, Lecture 8

2024-09-19 18

# Additional timing details

- Timing is asymmetric
  - Different rise and fall times
  - Needs modeling

```
entity OR2GV is
 generic(TPLH,TPHL: TIME);
 port(I1,I2: in BIT; O: out BIT);
end OR2GV;
architecture VAR DEL of OR2GV is
begin
 process(I1,I2)
  variable OR_NEW,OR_OLD:BIT;
  OR NEW := I1 or I2;
  if OR_NEW = '1' and OR_OLD = '0' then
   O <= OR NEW after TPLH;
  elsif OR_NEW = '0' and OR_OLD = '1' then
   O <= OR_NEW after TPHL;
  end if;
  OR_OLD := OR_NEW;
 end process;
end VAR DEL:
```



2024-09-19 19

# Load dependency

- Every attached gate input slows the output speed
  - Large fan-out
  - Load is gate dependent
    - · Number of transistor gates connected
    - · Size of transistors on input gate
- Each connection corresponds to a small delay
  - Model each individual input wire delay
  - Gate delay included in output wire delay
- · Not good enough still
  - Delay depends on edge slope, temperature, etc.



TSTE12 Design of Digital Systems, Lecture 8

2024-09-19 20

# Common model used in synopsis library compiler

- $D_{TOTAL} = D_I + D_S + D_T + D_C$
- $D_I$  = Intrinsic delay inherent in gate and independent of where/how it is used
- $D_S$  = Slope delay caused by ramp time of the input signal
- $D_T$  Transition delay caused by loading of the output pin (approx  $R_{driver}$  ( $C_{wire}$ + $C_{pin}$ ))
- $D_{\mbox{\tiny C}}$  Connect media delay to an input pin (wire delay).



2024-09-19 21

### Different max and min

- Wire delay (D<sub>C</sub>) more complicated
  - Worst case
  - Best case
  - Balanced
- · Technology library
  - Large amount of information
  - Usually described as tables
  - Sometimes described as polynomial coefficients







TSTE12 Design of Digital Systems, Lecture 8

2024-09-19 22

### **Back annotation**

- The process of abstraction
  - adding more details to a high level model by analyzing a lower abstraction level model
  - Example: Layout information used to generate timing information in a gate netlist
- Standardized way: SDF
  - Add timing info from layout to gate level
  - Useful for general timing requirements and properties)
  - Delays module path, device, interconnect, and port



2024-09-19 23

### SDF file format

- Timing checks: setup, hold, recovery, removal, skew, width, period, and no change
- Timing constraints: path, skew, period, sum, and diff
- Each trippel defines min, typical, and max delay
  - One for positive edge
  - One for negative edge

```
(CELL
(CELLTYPE "DFF")
(INSTANCE top/b/c)
(DELAY
(ABSOLUTE
(IOPATH (posedge clk) q (2:3:4) (5:6:7))
(PORT clr (2:3:4) (5:6:7))
)
(TIMINGCHECK
(SETUPHOLD d (posedge clk) (3:4:5) (-1:-1:-1))
(WIDTH clk (4.4:7.5:11.3))
)
```

LINKÖPING UNIVERSITY

TSTE12 Design of Digital Systems, Lecture 8

2024-09-19 24

# SDF File format, cont.

- Design/instance-specific or type/library-specific data
- Timing environment:
- intended operating timing environment
- Scaling, environmental, and technology parameters
- Incremental delay builds on the previous models timing by adding/subtracting timing information
- Absolute replaces timing information



2024-09-19 25

# Gate models of increasing complexity

- Creating accurate library models is time consuming
- Delay, timechecks etc. can be done in many different ways
- A standard has evolved that defines what parameters to use
  - Simplifies back annotation
  - Allows for accelerated models (hard-coded)



TSTE12 Design of Digital Systems, Lecture 8

end block:

2024-09-19 26

### VITAL models of gates

 Three parts: Input delay, Functional and Path delay

```
library IEEE;
use IEEE.VITAL_Primitives.all;
library LIBVUOF;
use LIBVUOF.VTABLES.all;
architecture VITAL of ONAND is
   attribute VITAL_LEVEL1 of VITAL : architecture is TRUE;
SIGNAL A_ipd : STD_ULOGIC := 'X';
SIGNAL B_ipd : STD_ULOGIC := 'X';
SIGNAL C_ipd : STD_ULOGIC := 'X';
SIGNAL D_ipd : STD_ULOGIC := 'X';
begin
-- INPUT PATH DELAYS
WireDelay : block
begin
   VitalWireDelay (A_ipd, A, tipd_A);
   VitalWireDelay (B_ipd, B, tipd_B);
   VitalWireDelay (C_ipd, C, tipd_C);
   VitalWireDelay (D_ipd, D, tipd_D);
```

```
BEHAVIOR SECTION
VITALBehavior : process (A_ipd, B_ipd, C_ipd, D_ipd)
 -- functionality results
  VARIABLE Results: STD_LOGIC_VECTOR(1 to 1):=(others => 'X');
 ALIAS Y_zd : STD_LOGIC is Results(1);
-- output glitch detection variables
  VARIABLE Y_GlitchData : VitalGlitchDataType;
    Functionality Section
  Y_zd := (NOT ((D_ipd) AND ((B_ipd) OR (A_ipd) OR C_ipd))));
     Path Delay Section
 VitalPathDelay01 (
    OutSignal => Y,
    GlitchData => Y_GlitchData,
    OutSignalName => "Y",
    OutTemp => Y_zd,
    Paths => (0 => (A_ipd'last_event, tpd_A_Y, TRUE),
              1 => (B_ipd'last_event, tpd_B_Y, TRUE),
2 => (C_ipd'last_event, tpd_C_Y, TRUE),
                3 => (D_ipd'last_event, tpd_D_Y, TRUE)),
    Mode => OnDetect,
    Xon => Xon,
    MsgOn => MsgOn,
    MsgSeverity => WARNING);
end process;
```



2024-09-19 27

### Detection of timing errors

- Input path delay: Transport delay dependent on previous value and wire delay
- Functional part. Boolean expression or lookup tables for fast simulation
- · Path delay: output delay, glitch handling
- Models often includes error detection
  - Short spikes, short setup/hold timing etc.
  - Unacceptable values (Z or X)
  - Unacceptable input combinations (both set and reset active on SR flipflop)



TSTE12 Design of Digital Systems, Lecture 8

2024-09-19 28

### Hardware overview

- Detailed description of Hardware
- Standard parts
  - TTL (SSI, MSI LSI)
  - Memories, microprocessors, I/O
- ASIC (Application Specific Integrated Circuit)
  - Integrated circuit that has been produced for a specific application and (often) produced in small numbers
  - Memories and microprocessors are general application devices



2024-09-19 29

# ASIC technologies

- May use different technologies for ASIC: PLD, Gate array, FPGA, Standard cell, custom. ASIC is however limited to Standard cell and gate array. Custom design is also used.
- CMOS switch. Power consumption:  $P \sim CV^2f$ 
  - Use low power supply, reduce clock, reduce area
- Transistor channel length (old measure of chip manufacturing process) shorter than 0.01 um (so called 4 nm used today, e.g. TSMC N4P process in Ryzen 9000 series CPUs, 3 nm in Apple A18 Pro in Iphone 16)



TSTE12 Design of Digital Systems, Lecture 8

2024-09-19 30

# Gate array, mask programmable

- Predefined pattern of transistors
  - Add interconnect metal for each design
  - Fast manufacture (weeks)
  - No transistor sizing
- Example shows inverter design
- Combined with library of existing cells
  - Basic gates, flipflops etc.





2024-09-19 31

### Standard cell

- Transistor placement and metal layers unique for each design, needs to be manufactured
- Limited number of layout cell types (Cell library)
- Cells already characterized
- Slow manufacture (month)





TSTE12 Design of Digital Systems, Lecture 8

2024-09-19

# Full custom layout

- Full custom
  - Individual placement and scaling of transistors
  - Full control of wires and connections
  - Maximum control, maximum effort
- Complete freedom to place and route transistors
  - Not limited to existing logic style/library
  - Slow manufacture (months)
  - Higher performance than standard cells
- Requires more testing (simulation)



2024-09-19

### Programmable devices: PLD

- Input variables forming AND-OR arrays, flipflops at the outputs.
  - Crosspoint can connect to create products
- Inputs and outputs on chip edge
- Only small circuit designs





TSTE12 Design of Digital Systems, Lecture 8

2024-09-19

3/1

# Programmable devices: CPLD

- Combine PLD structure with additional onchip interconnect
  - Support larger designs



Figure 1: XC9500XL Architecture



### 2024-09-19 35

### FPGA structure

- Field Programmable Gate Array
- Cells in an array, special I/O blocks around the edges.
- Between cells (CLBs or LEs) are routing wires located (interconnect)





TSTE12 Design of Digital Systems, Lecture 8

2024-09-19

# FPGA building blocks

- CLB/Logic Element
  - Different name in different manufacturers designs
  - In many cases are they based on lookup tables (i.e., no simple gates, instead more advanced functions) => less need for routing channels (that are expensive). Lookup table can be viewed as a small RAM or a MUX with fixed inputs.
  - Trade off between big lookup tables and utilization. Optimal around 4-6 bits address.
  - Often a flip-flop included in the CLB/LE



2024-09-19 37

# CLB Example: Xilinx (AMD) Spartan II

- Choose positive or negative clock edge
- May combine lookup tables
- CLB may be rearranged into a memory or shift register





TSTE12 Design of Digital Systems, Lecture 8

2024-09-19

# Xilinx Spartan II I/O logic

- Support multiple I/O standards
  - 3.3V, 1.8V, 1.2V etc.
  - Differential
- Flipflops located close to pin
  - Reduce delays due to routing signal to pin
- Different drive strength





2024-09-19 39

# Other FPGA example: Altera (intel) Cyclone

- · Same basic structure
  - Logic Array
  - Separate I/O blocks
  - Dedicated memory
  - PLL for internal clock generation





TSTE12 Design of Digital Systems, Lecture 8

2024-09-19

40

# Altera (Intel) Cyclone

- Logic element structure
  - More detailed description
  - One Lookup table + one flipflop





TSTE12 Design of Digital Systems, Lecture 8 2024-09-19 41 Alternative FPGA structure: Actel (Microsemi) Smaller functions Separate register from combination logic LINKÖPING UNIVERSITY

TSTE12 Design of Digital Systems, Lecture 8

2024-09-19

# Programming of FPGA

- Two types: reprogrammable or one-time programmable
- · Control a CMOS-switch using a RAM/EPROM/EEPROM-cell. The CMOS switch is slow (compared to the alternative)
- The alternative is fuse/Antifuse (burn together two wires by using high voltage)





2024-09-19 43

### Important aspects

- Speed of the switched (impedance, capacitans). Many switches in series ruins the performance
- Reprogrammable? Needs any design changes to be done?
- Volatile designs? What happens at power failure? How is the design put into the chip? How long delay from power on to working design?
- Area of the switches? Needs many switches?



TSTE12 Design of Digital Systems, Lecture 8

2024-09-19 44

# Technology comparison table

| • | Most common Volatile |                  | SRAM  | Antituse Eprom Eeprom/Flash |       |         |
|---|----------------------|------------------|-------|-----------------------------|-------|---------|
|   | MOSt Common          | Volatile         | yes   | no                          | no    | no      |
|   | - SRAM               | Re-programmable  | yes   | no                          | yes   | yes     |
|   | - EEPROM/FLASH       | Chip area        | large | small                       | small | average |
|   |                      | R (routing nets) | large | small                       | large | large   |
|   | Yiliny/Altora        | С                | large | small                       | large | large   |

- Xilinx/Altera
  - SRAM/EEPROM
- Actel
  - Antifuse
  - Do also create classic FPGA (SRAM/EEPROM based)
- SRAM based FPGA usually support automatic configuration from serial flash memory at power-on



2024-09-19 45

### How to configure the FPGA

- Non-volatile technology
  - FLASH, EEPROM, PROM, etc.
- External programmer
  - Software on PC to program device
- External ROM/FLASH
  - Standard FLASH
  - Serial FLASH
- Embedded microcontroller
  - Boot application configures FPGA
  - Not possible if flash needed for CPU operation



TSTE12 Design of Digital Systems, Lecture 8

2024-09-19 46

# FPGA configuration, cont.

- Large volumes may use non-programmable devices based on FPGA
  - Resynthesize: may give different behavior
  - Strip FPGA: Remove configuration logic



2024-09-19 47

### FPGA hardware options

- · Multipliers
- · DSP blocks
  - Multiply-ackumulate
  - Common operation in DSP
  - High precision (> 20 bits)
- Optimized I/O support
  - Differential signaling
  - Low swing/current steering



TSTE12 Design of Digital Systems, Lecture 8

2024-09-19 48

### FPGA hardware options, cont.

- · Clock circuits
  - Phase locked loops (PLL), Delay locked loops (DLL)
  - Clock tree distribution
- Serializer/deserializer
  - Support modern PC bus standards such as PCI Express
  - Dedicated block to send/recieve high speed (> Gbit/s) serial data
  - Reduce number of I/O pins



2024-09-19 49

### FPGA hardware options, cont.

- · A/D and D/A converters
- · Memory units
- CPU
  - e.g., physical powerpc or ARM core inside FPGA
  - Usually combined with external memory interfaces and CPU-based I/O support (e.g. wired ethernet, SD-card reader etc.)
- Alternative to dedicated CPU hardware: soft cpu
  - VHDL design of a processor
  - Allows for modification of processor structure



TSTE12 Design of Digital Systems, Lecture 8

2024-09-19 50

### ASIC vs FPGA

- ASIC have a large NRE cost
  - Non-Recurring Engineering cost, price of 1st unit
- FPGA have large per unit cost
- Selection of technology depend on
  - Performance requirements
  - Number of units
  - Time to market



2024-09-19 51

# HDL based design

- · Structured design using HDL
- FSM descriptions



TSTE12 Design of Digital Systems, Lecture 8

2024-09-19 52

# **Example: Combination logic**

- Binary comparison. Compare two two-bit words
  - GE: Greater or equal
  - LE: Less or equal
  - E: Equal
  - G: Greater
  - L: Less



entity COM is generic (D:time); port (N1, N0, M1, M0: in BIT; GE, LE, E, G, L: out BIT); end COM;



2024-09-19

### Descriptions

- ROM
  - Table lookup



```
use work.TRUTH4x5.all;
architecture TABLE of COM is
begin
process (N1,N0,M1,M0)
variable INDEX: INTEGER;
variable WOUT: WORD;
begin
INDEX := INTVAL (N1&N0&M1&M0);
WOUT := TRUTH (INDEX);
GE <= WOUT(4) after D;
LE <= WOUT(2) after D;
G <= WOUT(1) after D;
L <= WOUT(1) after D;
L <= WOUT(0) after D;
end process;
end TABLE;
```

```
package TRUTH4x5 is
 constant NUM_OUTPUTS: INTEGER:=5;
 constant NUM_INPUTS: INTEGER:=4;
 constant NUM_ROWS: INTEGER:= 2 ** NUM_INPUTS;
 type WORD is array(NUM_OUTPUTS-1 downto 0) of BIT;
 type ADDR is array(NUM_INPUTS-1 downto 0) of BIT;
 type MEM is array (0 to NUM_ROWS-1) of WORD;
function INTVAL(VAL:ADDR) return INTEGER;
end TRUTH4x5:
package body TRUTH4x5 is function INTVAL(VAL: ADDR) return INTEGER is
  variable SUM: INTEGER:=0;
  for N in VAL'LOW to VAL'HIGH loop
   if VAL(N) = '1' then
    SUM := SUM + (2 ** N);
   end if;
  end loop;
  return SUM;
 end INTVAL;
end TRUTH4x5;
```



TSTE12 Design of Digital Systems, Lecture 8

2024-09-19

# Descriptions, CASE statement

One multiplexer for each output



```
architecture MUX of COM is
begin
 process(N1,N0,M1,M0)
   case N1&N0&M1&M0 is
    when "0000" => GE <= '1' after D; LE <= '1' after D;
      E <= '1' after D; G <= '0' after D; L <= '0' after D;
    when "0001" => GE <= '0' after D; LE <= '1' after D;
      E \le 0' after D; G \le 0' after D; L \le 1' after D;
    when "0010" => GE <= '0' after D; LE <= '1' after D;
      E <= '0' after D; G <= '0' after D; L <= '1' after D;
    when "0011" => GE <= '0' after D; LE <= '1' after D;
      E <= '0' after D; G <= '0' after D; L <= '1' after D;
    when "0100" => GE <= '1' after D; LE <= '0' after D; E <= '0' after D; G <= '1' after D; L <= '0' after D; when "0101" => GE <= '1' after D; LE <= '1' after D;
      E <= '1' after D; G <= '0' after D; L <= '0' after D;
    when "0110" => GE <= '0' after D; LE <= '1' after D;
      E \le 0' after D; G \le 0' after D; L \le 1' after D;
    when "0111" \Rightarrow GE <= '0' after D; LE <= '1' after D; end process; E <= '0' after D; G <= '0' after D; L <= '1' after D; end MUX;
```

when "1000" => GE <= '1' after D; LE <= '0' after D; E <= '0' after D; G <= '1' after D; L <= '0' after D; when "1001" => GE <= '1' after D; LE <= '0' after D; when "1001" => GE <= '1' after D; LE <= '0' after D; G <= '1' after D; LE <= '0' after D; when "1010" => GE <= '1' after D; LE <= '1' after D; LE <= '1' after D; G <= '0' after D; LE <= '1' after D; E <= '1' after D; G <= '0' after D; LE <= '1' after D; when "1011" => GE <= '0' after D; LE <= '1' after D; G <= '0' after D; LE <= '1' after D; LE <= '0' after D; G <= '1' after D; LE <= '0' after D; E <= '0' after D; G <= '1' after D; LE <= '0' after D; G <= '1' after D; LE <= '0' after D; Men "1101" => GE <= '1' after D; LE <= '0' after D; G <= '1' after D; LE <= '0' after D; G <= '1' after D; LE <= '0' after D; G <= '1' after D; LE <= '0' after D; G <= '1' after D; LE <= '0' after D; G <= '1' after D; LE <= '0' after D; G <= '1' after D; LE <= '0' after D; G <= '1' after D; LE <= '1' after D; LE <= '1' after D; G <= '1' after D; LE <= '1' after D; G <= '1' after D; LE <= '1' after D; E <= '0' after D; E <= '1' after D; E <= '0' after D; E <= '1' after D; E <= '0' after D; E <= '1' after D; E <= '0' after D; E <= '1' after D; E <= '0' after D; E <= '1' after D; E <= '0' after D; E <= '1' after D; E <= '0' after D; E <= '1' after D;

LINKÖPING UNIVERSITY

2024-09-19 55

### Descriptions, improved CASE

- Remove one variable in the selection of the case statement
- Use the removed variable as output value or its inverse
- More variables can be removed
  - Increase logic in front of multiplexer

begin process (N1, N0, M1, M0) beain case N1&N0&M1 is when "000"  $\Rightarrow$  GE  $\Rightarrow$  not M0 after D; LE  $\Rightarrow$  1' after D; E <= not M0 after D; G <= '0' after D; L <= M0 after D; when "001" => GE <= '0' after D; LE <= '1' after D; E <= '0' after D; G <= '0' after D; L <= '1' after D; when "010" => GE <= '1' after D; LE <= M0 after D;  $E \le M0$  after D;  $G \le not M0$  after D;  $L \le 0'$  after D; when "011"  $\Rightarrow$  GE  $\iff$  '0' after D; LE  $\iff$  '1' after D; E <= '0' after D; G <= '0' after D; L <= '1' after D; when "100"  $\Rightarrow$  GE  $\iff$  '1' after D; LE  $\iff$  '0' after D;  $E \le 0'$  after D;  $G \le 1'$  after D;  $L \le 0'$  after D; when "101"  $\Rightarrow$  GE  $\Rightarrow$  not M0 after D; LE  $\Rightarrow$  '1' after D;  $E \le not M0 after D; G \le 0' after D; L \le M0 after D;$ when "110" => GE <= '1' after D; LE <= '0' after D; E <= '0' after D; G <= '1' after D; L <= '0' after D; when "111" => GE <= '1' after D; LE <= M0 after D;  $E \le M0$  after D;  $G \le N0$  after D;  $L \le 0'$  after D; end process; end MUX3;

LU LINKÖPING UNIVERSITY

TSTE12 Design of Digital Systems, Lecture 8

2024-09-19 56

# Hardware, improved Case statement

- One mux plus inverter
- Every output have its own multiplexer (same as for nonimproved case statement)





2024-09-19 57

### **Partitioning**

- Rewrite expressions, sharing common subexpression
  - E= GE AND LE
  - G = GE AND NOT LE
  - L = LE AND NOT GE
- That is, two expressions followed by simple generation of E, G, and L
- · Designer makes logic synthesis instead of tool
  - Synthesis tool may still modify description



TSTE12 Design of Digital Systems, Lecture 8

2024-09-19 58

# Two-level logic

- · Many different choices
- Can be described as structure

```
architecture POSDF of COM is
signal Z1,Z0: BIT;
begin
Z1 <= (not N0 or M1 or M0) and (not N1 or M1) and
(not N1 or not N0 or M0);
Z0 <= (N1 or N0 or not M0) and (N1 or not M1) and
(N0 or not M1 or not M0);
LE <= Z1 after D;
GE <= Z0 after D;
E <= Z1 and Z0 after D;
G <= Z0 and not Z1 after D;
L <= Z1 and not Z0 after D;
end POSDF;
```



2024-09-19

### Structural description

architecture TWO \_LEVEL\_OR\_AND of COM is signal Z10,Z11,Z12,Z00,Z01,Z02: BIT; signal NOBAR,N1BAR,M0BAR,M1BAR: BIT; signal X0,Z1,Z0NOT,Z1NOT: BIT; component NOT2G generic (D: TIME); port (I: in BIT; O: out BIT); end component; for all: NOT2G use entity NOT2(BEHAVIOR); component AND2G generic (D: TIME); port (I1, I2: in BIT; O: out BIT); end component; for all: AND2G use entity AND2(BEHAVIOR); component AND3G generic (D: TIME); port (I1, I2: in BIT; O: out BIT); end component; for all: AND3G use entity AND3(BEHAVIOR); component AND3G generic (D: TIME); port(I1,I2:I3: in BIT; O: out BIT); end component OR2G generic (D: TIME); port(I1,I2: in BIT; O: out BIT); end component; for all: OR2G use entity OR2(BEHAVIOR); component OR3G generic (D: TIME); port (I1,I2: II: BIT; O: out BIT); end component; for all: OR3G use entity OR3(BEHAVIOR); component WIREG port (II: IBIT; O: out BIT); end component; for all: OR3G use entity OR3(BEHAVIOR); component WIREG port (II: IBIT; O: out BIT);

begin C1: NOT2G generic map (2 ns) port map (N0, N0BAR); C2: NOT2G generic map (2 ns) port map (N1, N1BAR); C3: NOT2G generic map (2 ns) port map (M0, M0BAR); C4: NOT2G generic map (2 ns) port map (M1, M1BAR); C5: OR3G generic map (2 ns) port map (N0BAR, M1, M0, Z10); C6: OR2G generic map (2 ns) port map (N1BAR, M1, Z11); C7: OR3G generic map (2 ns) port map (N1BAR, N0BAR, M0, Z12); C8: AND3G generic map (2 ns) port map (Z10, Z11, Z12, Z1); C9: OR3G generic map (2 ns) port map (N1, N0, M0BAR, Z00);

C10:OR2G
generic map (2 ns)
port map (N1, M1BAR, Z01);
C11:OR3G
generic map (2 ns)
port map (N0, M1BAR, M0BAR, Z02);
C12:AND3G
generic map (2 ns)
port map (200, Z01, Z02, Z0);
C13:NOT2G
generic map (2 ns)
port map (Z1, Z1NOT);
C14:NOT2G
generic map (2 ns)
port map (Z0, Z0NOT);
C15:AND2G
generic map (2 ns)
port map (Z0, Z0NOT);
C15:AND2G
generic map (2 ns)
port map (Z0, Z1, E);
C16:AND2G
generic map (2 ns)
port map (Z0, Z1NOT, G);
C17:AND2G
generic map (2 ns)
port map (Z0, Z1NOT, G);
C17:AND2G
generic map (2 ns)
port map (Z1, Z0NOT, L);
C18:WIREG
port map (Z0, GE);
C19: WIREG
port map (Z1, LE);
end TWO\_LEVEL\_OR\_AND;



port (t: in BIT; O: out BIT); end component; for all: WIREG use entity WIRE(BEHAVIOR);

TSTE12 Design of Digital Systems, Lecture 8

2024-09-19

# Finite state machines (FSM)

- Example: serial/parallel converter
  - A indicates start of data
  - Output Z only during one clock cycle



entity STOP is
port (R, A, D, CLK: in BIT;
Z: out BIT\_VECTOR(3 downto 0);
DONE: out BIT);
end STOP:





2024-09-19 61

### FSM design, cont.

- First: Select type of state machine (Moore, Mealy)
  - Moore machine have stable output after a few gate delays
  - Moore machine can not produce output dependent on current input values
  - Moore machine may require more states than Mealy machines
  - Mealy machine may sometimes be required due to direct respons from FSM on input signal change



TSTE12 Design of Digital Systems, Lecture 8

2024-09-19 62

### FSM Design, cont.

- Second: Create a state diagram. Good start is reset-state S0.
  - S1: First data on D, Done=0, Z unspecified
  - S2: Second data on D, Done =0, Z unspecified
  - S3: Third data on D, Done = 0, Z unspecified
  - S4: Fourth data on D, Done = 0, Z unspecified
  - S5: Output on Z, Done= 1
  - In S5 can A also be 1 (indicating new data)
    - · Next clock cycle must take care data, i.e., use S1 without passing through S0



2024-09-19 63

# FSM state diagram

 Some tools can translate state diagram automatically to VHDL (e.g., HDL Designer)





TSTE12 Design of Digital Systems, Lecture 8

2024-09-19 64

# Alternate desription

- Transition list
  - Textual description of the FSM
  - Useful for large state diagrams
  - Graphs become hard to understand when number of states increase
  - Possible to cope with complexity by use of hierarchy

| Curren<br>State | t Transition<br>Expression | Next<br>State | Data<br>Transfers | Output                      |  |  |
|-----------------|----------------------------|---------------|-------------------|-----------------------------|--|--|
| S0              | R+A                        | S0            | None              | DONE=0, Z unspecified       |  |  |
| S0              | R & A                      | S1            |                   | ·                           |  |  |
| S1              | R                          | S2            | Store bit 1       | DONE=0, Z unspecified       |  |  |
| S1              | R                          | S0            |                   | ·                           |  |  |
| :               | :                          | :             | :                 | :                           |  |  |
| S5              | R&A                        | S1            | None              | DONE=1, Z=parallel data out |  |  |
| S5              | R + A                      | S0            |                   |                             |  |  |



2024-09-19

### FSM description in VHDL

```
architecture FSM_RTL of STOP is
type STATE_TYPE is (S0, S1, S2, S3, S4, S5);
signal STATE: STATE_TYPE;
signal SHIFT_REG: BIT_VECTOR (3 downto 0);
begin

STATE: process (CLK)
begin
if CLK='1' then
case STATE is
when S0 =>
-- Data Section
-- Control Section
if R='1' or A='0' then
STATE <= S0;
elsif R='0' and A='1' then
STATE <= S1;
end if;
when S1 =>
-- Data Section
SHIFT_REG <= D & SHIFT_REG(3 downto 1);
-- Control Section
if R='0' then
STATE <= S2;
elsif R='1' then
STATE <= S2;
elsif R='1' then
STATE <= S0;
end if;
```

```
-- Data Section
SHIFT REG <= D & SHIFT REG(3 downto 1);
  -- Control Section
   if R='0' then
     STATE <= S3
    elsif R='1' ther
     STATE <= S0;
end if;
when S3 =>
-- Data Section
 -- Shift in the third bit
   SHIFT_REG <= D & SHIFT_REG(3 downto 1);
- Control Section
  if R='0' then
  STATE <= S4;
elsif R='1' then
    STATE <= S0;
end if;
when S4 =>
  -- Data Section
-- Shift in the fourth bit
  SHIFT REG <= D & SHIFT REG(3 downto 1);
    Control Section
  if R='0' then
   STATE <= S5
elsif R='1' then
    STATE <= S0:
```

```
when S5 =>
- Data Section
- Control Section
if R='0' and A='1' then
STATE <= S1;
elsif R='1' or A='0' then
STATE <= S0;
end if;
end case;
end if;
end process STATE;

OUTPUT: process (STATE)
begin
case STATE is
when S0 to S4 =>
DONE <= '0';
when S5 =>
DONE <= '1';
Z <= SHIET_REG;
end process OUTPUT;
end FSM_RTL;
```

LINKÖPING UNIVERSITY

TSTE12 Design of Digital Systems, Lecture 8

2024-09-19

# State machine partitioning

- State machines partitioned into multiple processes
  - Updating (clocked), i.e., the state register
  - Next state calculation
  - Output calculation
- · May find different combinations of these
  - Single process
  - Two processes (nextstate + output, state update)
  - Three processes (nextstate, output, state update)
- Multiple processes to avoid creating Mealy instead of Moore machine



2024-09-19 67

### State assignment

- States are not coded in VHDL
  - Use enumeration
    - · Allows synthesis tools do a better work
  - Powerful computer algorithms usually find better state assignment
  - Possible to control state minimisation and assignment in synthesis tool
    - E.g. one-hot encoding may be more suitable in same cases



TSTE12 Design of Digital Systems, Lecture 8

2024-09-19 68

# Alternative description: table based

- Small statemachine, one input X and one output Z
- Code the state table as an array with nextstate and output

```
entity TWO_CONSECUTIVE is
 port(CLK,R,X: in BIT; Z: out BIT);
end TWO_CONSECUTIVE;
architecture FSM of TWO CONSECUTIVE is
 type STATE is (S0.S1.S2)
                                                                                                 begin
 signal FSM_STATE: STATE := S0;
 type TRANSITION is record
   OUTPUT: BIT;
  NEXT_STATE: STATE;
                                                                                                   end if:
 end record:
 type TRANSITION MATRIX is array(STATE,BIT) of TRANSITION;
 constant STATE_TRANS: TRANSITION_MATRIX :=
   \begin{array}{l} (S0 \Rightarrow ('0' \Rightarrow ('0',S1), \ 1' \Rightarrow ('0',S2)), \\ S1 \Rightarrow ('0' \Rightarrow ('1',S1), \ '1' \Rightarrow ('0',S2)), \\ S2 \Rightarrow ('0' \Rightarrow ('0',S1), \ '1' \Rightarrow ('1',S2))); \end{array}
                                                                                                   end if;
                                                                                                 end process;
                                                                                                end FSM:
```

begin
process(R,X,CLK,FSM\_STATE)
begin
if R = '0' then -- Reset
FSM\_STATE <= SO;
elsif CLK'EVENT and CLK ='1' then -- Clock event
FSM\_STATE <= STATE\_TRANS(FSM\_STATE,X).NEXT\_STATE;
end if;
if FSM\_STATE'EVENT or X'EVENT then -- Output Function
Z <= STATE\_TRANS(FSM\_STATE,X).OUTPUT;
end if;
end process;



