## ASIC 2006-03-18

- 1. a) Clustered look-ahead pipelining may result in an unstable filter. In addition, the pole-zero cancellation may become inexact under finite-wordlength conditions.
  - b) Two's complement: W/2; CSD: W/3
  - c) Power $\propto V_{DD}^{2}$; gate delay $\propto 1/V_{DD}$
  - d) Non-recursive parts can be pipelined (or interleaved)
  - e) Form two positive two's complement numbers a and b, where a has a one for all +1-digits and b has a one for all -1-digits. Subtract b from a using a two's complement subtraction.
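The a − b conversion in 1 e) can be sketched in Python as follows. This is an illustrative sketch, not part of the original solution: the helper name `csd_to_twos_complement` is hypothetical, the CSD digits are assumed to be given MSB first as integers in {−1, 0, +1}, and the result is returned as a bit string of a chosen word length (the position of the binary point is left to the reader).

```python
def csd_to_twos_complement(csd, width):
    """Convert CSD digits (MSB first, each -1, 0 or +1) to a two's-complement
    bit string of `width` bits using the a - b method (hypothetical helper)."""
    n = len(csd)
    # a has a one in every position that holds a +1 digit
    a = sum(1 << (n - 1 - i) for i, d in enumerate(csd) if d == 1)
    # b has a one in every position that holds a -1 digit
    b = sum(1 << (n - 1 - i) for i, d in enumerate(csd) if d == -1)
    diff = a - b  # ordinary two's-complement subtraction a - b
    if not -(1 << (width - 1)) <= diff < (1 << (width - 1)):
        raise ValueError("result does not fit in the requested word length")
    # Mask to `width` bits to obtain the two's-complement bit pattern
    return format(diff & ((1 << width) - 1), f"0{width}b")


# CSD digits -1, 0, 0, 1 (i.e. -0.875 with three fractional digits)
print(csd_to_twos_complement([-1, 0, 0, 1], 4))  # 1001, read as 1.001
# CSD digits 1, 0, 0, -1 (i.e. 0.875); one extra sign bit is needed
print(csd_to_twos_complement([1, 0, 0, -1], 5))  # 00111, read as 00.111
```

A word length one bit longer than the number of CSD digits is the safe choice in general, since an n-digit CSD number can exceed the range of an n-bit two's-complement number.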
2. Sort the processes by increasing start time and, for equal start times, by decreasing lifetime:



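Traverse the sorted list from the start and assign to the current PE every process that does not overlap the processes already assigned to it; then repeat with the remaining processes for the next PE: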



- PE 1: f, c
- PE 2: a, i, h
- PE 3: g, b
- PE 4: j
- PE 5: e
- PE 6: d
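The allocation procedure in 2 (a left-edge style assignment) can be sketched as below. The lifetimes in the usage example are hypothetical, since the exam's lifetime chart is not reproduced here; lifetimes are treated as half-open intervals [start, end), and wrap-around of periodic lifetimes is ignored in this sketch.

```python
def left_edge(lifetimes):
    """Allocate processes to PEs, left-edge style.

    lifetimes: dict name -> (start, end), half-open interval [start, end).
    Returns a list with one list of process names per PE.
    """
    # Increasing start time; for equal start times, decreasing lifetime
    remaining = sorted(
        lifetimes,
        key=lambda p: (lifetimes[p][0], lifetimes[p][0] - lifetimes[p][1]),
    )
    pes = []
    while remaining:
        assigned, skipped, free_at = [], [], None
        for p in remaining:
            start, end = lifetimes[p]
            if free_at is None or start >= free_at:  # PE is idle at `start`
                assigned.append(p)
                free_at = end
            else:
                skipped.append(p)  # left for a later PE
        pes.append(assigned)
        remaining = skipped
    return pes


# Hypothetical lifetimes, not the ones from the exam
example = {"p1": (0, 3), "p2": (1, 2), "p3": (3, 5), "p4": (2, 6)}
print(left_edge(example))  # [['p1', 'p3'], ['p2', 'p4']]
```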

3. a)
$$T_{\min} = \max \left\{ \frac{T_{mult, a} + T_{mult, b} + 2T_{add}}{2}, \frac{T_{mult, a} + T_{mult, b} + T_{mult, d} + 2T_{add}}{2} \right\} = \max\{3, 4\} = 4 \text{ time units}$$

b) $T_{cp} = T_{mult, b} + T_{mult, d} + T_{add} = 5$ time units
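The loop bounds in 3 a) and the critical path in 3 b) can be checked numerically. This sketch assumes a latency of 2 time units for every multiplier and 1 time unit for every adder; these values are not stated in the solution but reproduce $\max\{3, 4\} = 4$ and $T_{cp} = 5$. Each loop is assumed to contain two delay elements, as the denominators in the expression for $T_{\min}$ indicate.

```python
from fractions import Fraction

# Assumed latencies in time units (hypothetical): multipliers 2, adders 1
LATENCY = {"mult_a": 2, "mult_b": 2, "mult_d": 2, "add": 1}


def loop_bound(ops, delays):
    """Loop bound: total latency of the operations in the loop / number of delays."""
    return Fraction(sum(LATENCY[op] for op in ops), delays)


# The two loops in the expression for T_min, each with two delay elements
loops = [
    (["mult_a", "mult_b", "add", "add"], 2),
    (["mult_a", "mult_b", "mult_d", "add", "add"], 2),
]
bounds = [loop_bound(ops, d) for ops, d in loops]
t_min = max(bounds)  # iteration period bound
t_cp = LATENCY["mult_b"] + LATENCY["mult_d"] + LATENCY["add"]  # critical path
print([int(b) for b in bounds], int(t_min), t_cp)  # [3, 4] 4 5
```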

c) Precedence graph:



d) Initial computation graph:



Reschedule:



e) Execution time is one time unit for both types.

Multipliers: $\left\lceil \frac{4 \cdot 1}{4} \right\rceil = 1$ multiplier

Adders: $\left\lceil \frac{3 \cdot 1}{4} \right\rceil = 1$ adder
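The resource count in 3 e) is the ceiling of (number of operations × execution time) / $T_{\min}$. A minimal sketch, where the helper name `min_pes` is hypothetical and the numbers (4 multiplications, 3 additions, execution time 1, $T_{\min} = 4$) are taken from the solution:

```python
def min_pes(num_ops, exec_time, iteration_period):
    """Lower bound on the number of PEs of one type: ceil(num_ops * exec_time / T)."""
    # Integer ceiling division avoids floating-point rounding
    return -(-num_ops * exec_time // iteration_period)


multipliers = min_pes(4, 1, 4)
adders = min_pes(3, 1, 4)
print(multipliers, adders)  # 1 1
```

With a shorter iteration period, e.g. a hypothetical T = 2, the same formula would give `min_pes(4, 1, 2) == 2` multipliers.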

4. a) $-0.875 = \overline{1}.001_{\text{CSD}}$.
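The CSD representation in 4 a) can be obtained with the standard non-adjacent-form recoding. The sketch below works on the value scaled to an integer (here $-0.875 \cdot 2^3 = -7$, i.e., three fractional digits); the function name `to_csd` is hypothetical.

```python
def to_csd(n):
    """CSD (non-adjacent form) digits of the integer n, MSB first."""
    digits = []
    while n != 0:
        if n % 2:
            # Odd: pick +1 or -1 so that the next remaining digit becomes 0
            d = 2 - (n % 4)  # n % 4 == 1 -> +1, n % 4 == 3 -> -1
            n -= d
        else:
            d = 0
        digits.append(d)
        n //= 2
    return digits[::-1] or [0]


print(to_csd(-7))  # [-1, 0, 0, 1], i.e. -0.875 = (-1).001 in CSD
print(to_csd(7))   # [1, 0, 0, -1], i.e. 0.875 = 1.00(-1) in CSD
```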



- b) Latency: 3 + 1 + 1 + 1 = 6 clock cycles. Execution time: 17 + 6 = 23 clock cycles, assuming that all flip-flops are set/reset at the same time. If the flip-flops are set/reset at different times, the execution time is identical to that of the longest PE, i.e., 17 + 4 = 21 clock cycles (which can be reduced to 20 clock cycles if the final flip-flop of the multiplication is not reset).
- c) Yes, in four ways (one is enough for full score): The 1 stored in the set flip-flop can be distributed among the remaining flip-flops with lower weights (1 = 1/2 + 1/4 + 1/8 + 1/8), and hence the subtraction can be replaced with an inverter.



The negation can be performed as a subtraction after the shifts (d flip-flops).



By swapping the two operands of the subtraction at the input, the sign of the coefficient is changed, so the coefficient becomes $0.875 = 1.00\overline{1}_{\text{CSD}}$, which has a simpler, more straightforward realization. Compare this with the lab exercises.



The two additions at the output can be changed to subtractions, and, hence, the sign of the coefficient is changed.



d) Additions:  $10^7 \cdot (2 \cdot 7 + 1) = 150$  Madd/s

(Because the problem statement was ambiguous, $10^7 \cdot (3 \cdot 7 + 1) = 220$ Madd/s is also accepted.)

Multiplications:  $10^7 \cdot 1 \cdot 7 = 70$  Mmult/s

5. a)



b) Latency for computing $v_1(n + 1)$: 6 clock cycles. Latency for computing $v_2(n + 1)$: 9 clock cycles. Latency for computing $y(n)$: 4 clock cycles.



The 21 delays introduced after the $y(n)$ vector multiplier are not required. Two's complement is selected for the table entries:

| $v_1(n)$ | $v_2(n)$ | x(n) | Value                       | Two's complement |
|----------|----------|------|-----------------------------|------------------|
| 0        | 0        | 0    | 0                           | 00.000000        |
| 0        | 0        | 1    | 31/32 = 62/64               | 00.111110        |
| 0        | 1        | 0    | -9/16 = -36/64              | 11.011100        |
| 0        | 1        | 1    | -9/16 + 31/32 = 26/64       | 00.011010        |
| 1        | 0        | 0    | 7/64                        | 00.000111        |
| 1        | 0        | 1    | 7/64 + 31/32 = 69/64        | 01.000101        |
| 1        | 1        | 0    | 7/64 - 9/16 = -29/64        | 11.100011        |
| 1        | 1        | 1    | 7/64 - 9/16 + 31/32 = 33/64 | 00.100001        |
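Each table entry is the sum of the coefficients selected by the bits $v_1(n)$, $v_2(n)$, and $x(n)$. A short script can recompute the entries as a cross-check. It assumes the weights 7/64, −9/16, and 31/32 read from the table and an 8-bit format with two integer bits (including the sign bit) and six fractional bits; the helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product

COEFFS = (Fraction(7, 64), Fraction(-9, 16), Fraction(31, 32))  # weights of v1(n), v2(n), x(n)
FRAC_BITS = 6
WIDTH = 8  # 2 integer bits (incl. sign) + 6 fractional bits


def twos_complement(value, frac_bits=FRAC_BITS, width=WIDTH):
    """Format an exact fraction as a two's-complement string such as '11.011100'."""
    scaled = value * (1 << frac_bits)
    assert scaled.denominator == 1, "value not representable with frac_bits"
    n = int(scaled)
    assert -(1 << (width - 1)) <= n < (1 << (width - 1)), "overflow"
    bits = format(n & ((1 << width) - 1), f"0{width}b")
    int_bits = width - frac_bits
    return bits[:int_bits] + "." + bits[int_bits:]


def da_table():
    """One row per bit combination (v1, v2, x), in the same order as the table."""
    rows = []
    for bits in product((0, 1), repeat=3):
        value = sum((c for b, c in zip(bits, COEFFS) if b), Fraction(0))
        rows.append((bits, value, twos_complement(value)))
    return rows


for bits, value, code in da_table():
    print(bits, value, code)
```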

c) Two's complement is selected.