## Written examination TSTE87

## 2016-06-02

- 1. (a) State (exactly) two reasons to schedule for more than one sample period. (2 p)
  - (b) A DSP algorithm is either iterative or block processing. Describe the difference and give an example of each. (2 p)
  - (c) What are the two reasons to perform scaling of signal levels? (2 p)
  - (d) Which two methods can be used to control the timing of a circuits? That is, when things happen. (2 p)
  - (e) How are the iteration period bound,  $T_{\infty}$ , minimal sample period,  $T_{\min}$ , number of operations, and number of delay elements respectively affected by unfolding an algorithm a factor N? (2 p)
- 2. The digital filter algorithm in Fig. 1 shall be implemented on a shared-memory architecture with homogeneous non-preemptive processing elements. There are two types of processing elements: multipliers and adders. The latency of a multiplier is two clock cycles and the latency of an adder is one clock cycles. Both types of processing elements have an execution time of one clock cycles.



Figure 1: Digital filter for use in Problem 2.

- (a) Determine the minimal sample period,  $T_{\min}$ . (2 p)
- (b) Determine the critical path,  $T_{cp}$ . (2 p)
- (c) Determine the precedence graph. (6 p)
- (d) Schedule the algorithm for  $T_s = 4$  clock cycles and a reasonably efficient resource utilization. Make sure that you indicate all the connections in the schedule. (8 p)
- 3. The following expression shall be realized using shifts, additions and subtractions. Draw a realization using as few additions and subtractions as possible.

$$y = 23x_1 - 3x_2 + 12x_3$$

Note that  $23 = 3 \times 8 - 1$ . (4 p)

- 4. The expression in Problem 3 is to be used for implementing an FIR filter using distributed arithmetic, with  $x_i = x(n-i+1)$ .
  - (a) Draw the structure of the FIR filter using building blocks such as ROMs, shift-accumulators and shift-registers. (4 p)
  - (b) Determine the ROM contents in a named and stated binary representation. (4 p)

5. Consider the memory variable life time diagram in Fig. 2, consisting of variables a to h. This should be realized using single port memories, i.e., memories where a single variable can be either read or written in each time slot.



Figure 2: Life time diagram for Problem 5.

- (a) Divide the variables between a minimal number of memories. (4 p)
- (b) For each memory, assign the variables to a minimal number of memory cells. (4 p)
- 6. Consider the seventh-order LWDF filter in Fig. 3 which consists of symmetric twoport adaptors as in Fig. 4. There are three possible shared-memory approaches to implement this filter:
  - 1. Using homogeneous and non-preemptive symmetric two-port adaptor PEs
  - 2. Using homogeneous and non-preemptive add/subtract PEs and homogeneous and non-preemptive multiplier PEs
  - 3. Using homogeneous and non-preemptive add/subtract PEs and homogeneous and non-preemptive multiplier with pre-adder PEs, where the latter consists of a multiplier with a pre-adder, i.e., one of the multiplier inputs has an adder/subtracter

Both adder and multiplier PEs are implemented using combinational logic followed by a pipeline register. Hence, both the latency and the execution time are one clock cycle. The remaining PEs are constructed from such adder and multiplier PEs.

- (a) Draw the symmetric two-port adaptor PE using add/subtracters, multipliers and pipeline registers (one after each operation). Include shimming delays under the assumption that the adaptor coefficient is provided at the same time as the inputs.

  (2 p)
- (b) What is the latency and execution time of the symmetric two-port adaptor as constructed above? (2 p)
- (c) Draw the multiplier with pre-adder PE using add/subtracters, multipliers and pipeline registers (one after each operation). Include shimming delays under the assumption that the adaptor coefficient is provided at the same time as the inputs.

  (2 p)



Figure 3: Seventh-order LWDF filter for use in 6.



Figure 4: Detail of a symmetric two-port adaptor in Fig. 3.

(d) What is the latency and execution time of the multiplier with pre-adder PE as constructed above? (2 p)

(4 p)

(4 p)

- (e) Assuming that the clock frequency is 100 MHz. What is the highest sample rate that the filter can run at for the different cases, assuming no limit on the number of resources? (Hint: consider  $T_{\min}$ )
- (f) Assuming a sample rate of 10 MHz, how many PEs are required for the different cases? Assume that the final addition and scaling is realized using a two-port adaptor operation for the first case and an add/subtract operation for the other two cases (and the scaling is solved elsewhere).
- (g) As the two-port adaptor and multiplier with pre-adder PEs are constructed from add/subtracters and multipliers, determine the total number of add/subtracters and multipliers used when implementing all PEs. as determined above. (2 p)
- (h) Again, assuming a sample rate of 10 MHz, how many read and write operations are required to/from the memories, respectively? The input and output signal of the filter does not need to be stored, but the cofficients should be considered. (4 p)