## Exercise 1

The two parallel interconnects in Figure 1 are separately driven by one buffer each. The parasitic capacitances of the interconnects can be modeled as in Figure 2.


Figure 1. Two parallel interconnects driven by buffers.


Figure 2. Model of the two interconnects.
a) Determine the dissipated energy due to $C_{a}, C_{b}$, and $C_{a b}$ for the six scenarios shown in Figure 3.
1)

b

2)


3)

4)



5)

6)



Figure 3. Six scenarios.
b) Why do the energy dissipation in scenario 4 and 5 differ?
c) Why do the energy dissipation in scenario 3 and 6 differ?
d) The two buffers are explicit shown in Figure 4. Determine where the energy is dissipated for the time slots $t_{1}<t<t_{2}$ and $t_{2}<t<t_{3}$, respectively.


Figure 4. Two interconnects driven by buffers.
e) Determine the amount of energy which is taken from the power supply in the scenarios above (Figure 3). Why do these results differ from the results in a)?

## Exercise 2

a) Encode the sequence $<0,1,2,3,4,5,6,7,0>$ with both Gray and Binary code. Compare the number of transitions.
b) Determine the number of transitions in a general counting sequence. For a large counter, what is the ratio between the two different encodings?

## Exercise 3

A specific control unit cycles through the states shown in Figure 5. By simulation over a large number of different input signals the probability for change of state has been estimated. The probabilities for the different branches is also shown in Figure 5.
a) What is the average number of logic transitions/cycle using the state coding of the FSM in Fig. 5? (one cycle = "A complete cycle from state 000 and back to state 000 ")
b) Suggest another state coding with fewer transitions/cycle.


Figure 5. State transition diagram with transition probability.

## Exercise 4

In Figure 6 the behavior of a state machine is shown. Before coding the states a statistical measurement of the two input signals, x 2 and x 1 is performed. The following results where found: The input value of 00 is found in $10 \%$ of the cases, 01 in $40 \%, 10$ in $30 \%$ and 11 in $20 \%$. How should the states be coded to minimize switching nodes in the resulting circuit? Assume that the input signals are independent of the current state.


Figure 6. State machine.

## Exercise 5

A circuit realizing a simple finite state machine is shown in Figure 7. The input signals $a$, $b$, and $c$ are random and have equal probability of being low or high. They are further glitch free and make their transitions simultaneously.


Figure 7. Simple state machine.
a) Determine the transition activity in the output node $f$. Also, determine the probability of $f$ being high. Neglect glitches and motivate your assumptions.
b) In what nodes are there a risk of glitches? Motivate your answer.
c) Redesign the circuit for a reduced power consumption. Identify the nodes that may have glitches in your redesigned circuit. Motivate your decisions.

## Exercise 6

a) What is bus invert encoding and how can it be used to lower the power consumption of a bus?
b) Encode the data sequence $0010,1101,0001,0000,1100,0101,1011$ with bus invert encoding. Let the bus invert signal be " 0 " for the first data.

## Exercise 7

a) In a logic function the input signal $a$ has a lower transition activity than the other input signals ( $b, c, d$, and $e$ ). Rewrite the logic function $f$ with the use of Shannon expansion. Let $a$ be the load-enable signal.

$$
f=\bar{a} \bar{b} c+\bar{a} b \bar{c}+a c \bar{d} e+a c \bar{e}
$$

b) Sketch the architecture of the circuit, where you show explicit which circuit the input and output signals are connected to. (Hint: Use registers with a load-enable input)

## Exercise 8

a) What is the essence of precomputational logic?
b) Consider an $n$-bit comparator. How can precomputational logic be used to reduce power consumption?

## Exercise 9

In a circuit the complement of each signal is required. A model of a bus consisting of $2(N+1)$ bit lines is shown in Figure $8(\mathrm{a})(N \geq 1)$. The transition activity of $x_{j}$ ( $0 \leq j \leq N)$ is $\alpha$. The bit lines $x_{j}$ and $x_{k}$ are independent when $j \neq k$. The probability of $x_{j}$ being equal to one is one half (the signal may be correlated). The clock frequency is $f$. Assume that all signals have the same rise and fall time and that all transitions starts simultaneously.


Figure 8. (a) Bus consisting of 2(N+1) bit lines, (b) two complementary bit lines, and (c) two uncorrelated bit lines.
a) Calculate the power consumption for the complementary bit lines shown in Fig. 8 (b).
b) Calculate the power consumption due to the parasitic capacitance $C_{2}$ for the two uncorrelated bit lines in Figure 8 (c).
c) Use the results from a) and b) to calculate the power consumption for the bus in Figure 8 (a).
d) The signal order is changed to: $x_{0}, x_{1}, \overline{x_{0}}, \overline{x_{1}}, x_{2}, x_{3}, \overline{x_{2}}, \overline{x_{3}}, \ldots, x_{N-1}, x_{N}, \overline{x_{N-1}}, \overline{x_{N}}$. Calculate the power consumption.
e) Calculate the power consumption in c) and d) when $C_{1}=50 \mathrm{fF}, C_{2}=100 \mathrm{fF}$, $N=15, f=1000 \mathrm{MHz}, \alpha=0.25$ and $V_{D D}=1.2 \mathrm{~V}$. Compare the results.

## Solution 1

a)

1) $\frac{\left(C_{a}+C_{a b}\right) V_{D D}{ }^{2}}{2}$
2) $\frac{\left(C_{b}+C_{a b}\right) V_{D D}{ }^{2}}{2}$
3) $\frac{\left(C_{a}+C_{b}+4 C_{a b}\right) V_{D D}{ }^{2}}{2}$
4) $\frac{\left(C_{a}+C_{b}\right) V_{D D}{ }^{2}}{2}$
5) $\frac{\left(C_{a}+C_{b}+2 C_{a b}\right) V_{D D}{ }^{2}}{2}$
6) $\frac{\left(C_{a}+C_{b}+2 C_{a b}\right) V_{D D}{ }^{2}}{2}$
c) The voltage over the capacitance $C_{a b}$ is never changed in scenario 4 .
c) In case $3, C_{a b}$ is charged from $V_{D D}$ to $-V_{D D}$ where all charges $\left(2 C_{a b} V_{D D}\right)$ are taken from the power supply voltage. In case $6, C_{a b}$ is first charged from $V_{D D}$ to 0 V where no charges are taken from the power supply voltage. Then, $C_{a b}$ is charged from 0 V to $-V_{D D}$ where the charges $\left(C_{a b} V_{D D}\right)$ are taken from the power supply voltage.
d) For $t_{1}<t<t_{2}$, the energy is dissipated in $M_{1}$ and $M_{4}$.


Figure 9. Time slot $t_{1}<t<t_{2}$.
For $t_{2}<t<t_{3}$, the energy is dissipated in $M_{1}$ and $M_{3}$.


Figure 10. Time slot $t_{2}<t<t_{3}$.
e)

1) $\left(C_{a}+C_{a b}\right) V_{D D}{ }^{2}$
2) $C_{a b} V_{D D}{ }^{2}$
3) $\left(C_{b}+2 C_{a b}\right) V_{D D}{ }^{2}$
4) $\left(C_{a}+C_{b}\right) V_{D D}{ }^{2}$
5) $\left(C_{a}+C_{b}+C_{a b}\right) V_{D D}{ }^{2}$
6) $\left(C_{b}+C_{a b}\right) V_{D D}{ }^{2}$

In a, the energy which is dissipated as heat is considered.
In e, the energy which is taken from the power supply is considered.
(These two values of energy may differ since energy may be stored in the circuit.)

## Solution 2

a)

| Binary | No. transitions <br> (from previous <br> state to current) | Gray | No. transitions |
| :---: | :---: | :---: | :---: |
| 000 | 3 | 000 | 1 |
| 001 | 1 | 001 | 1 |
| 010 | 2 | 011 | 1 |
| 011 | 1 | 010 | 1 |
| 100 | 3 | 110 | 1 |
| 101 | 1 | 111 | 1 |
| 110 | 2 | 101 | 1 |
| 111 | 1 | 100 | 1 |

Table 1. Binary code and Gray code respectively.
When counting from a complete cycle from 0 to 0 , binary code makes 14 transitions and Gray code makes only 8 transitions.
b) Consider a counter of magnitude $2^{n}$.

Gray code: $2^{n}$ transitions.
Binary code: $2^{n} \sum_{j=0}^{n-1} 2^{-j}=2^{n+1}-2$ transitions.
Binary code/Gray code: $\frac{2^{n+1}-2}{2^{n}} \approx 2.0$ for $n$ larger than 5 .

## Solution 3

a) Using the original state coding the average number of transitions/cycle $T_{\text {avg }}$ can be found as $T_{\text {avg }}=0.15 \times 4+0.25 \times 6+0.4 \times 6+0.2 \times 4=5.3$ transitions/cycle.
b) Another coding is shown on next page.


Figure 11. An FSM with another state coding.
The new value is given by $T_{\text {avg }}=0.15 \times 4+0.25 \times 4+0.4 \times 4+0.2 \times 4=4$

## Solution 4

To minimize switching power the most likely switching between states must be determined. The probability for different signal changes in all stages give the probability graph shown in Figure 12. The probability that the circuit is in a certain state can be computed by setting up the following relations.

$$
\begin{aligned}
& p(S 0)=0.0 p(S 0)+0.3 p(S 1)+0.0 p(S 2)+0.0 p(S 3) \\
& p(S 1)=0.5 p(S 0)+0.3 p(S 1)+0.4 p(S 2)+0.5 p(S 3) \\
& p(S 2)=0.3 p(S 0)+0.4 p(S 1)+0.0 p(S 2)+0.0 p(S 3) \\
& p(S 3)=0.2 p(S 0)+0.0 p(S 1)+0.6 p(S 2)+0.5 p(S 3) \\
& p(S 0)+p(S 1)+p(S 2)+p(S 3)=1
\end{aligned}
$$



Figure 12. Probability graph.

Solving (exclude for example $p(S 1)$ ) yields:

$$
\begin{aligned}
& p(S 0)=0.120 \\
& p(S 1)=0.400 \\
& p(S 2)=0.196 \\
& p(S 3)=0.283
\end{aligned}
$$

Now the probability of a switching between two states can be determined

$$
\begin{aligned}
& p(S 0-S 1)=0.5 p(S 0)+0.3 p(S 1)=0.180 \\
& p(S 0-S 2)=0.3 \mathrm{p}(S 0)=0.036 \\
& p(S 0-S 3)=0.2 \mathrm{p}(S 0)=0.024 \\
& p(S 1-S 2)=0.4 \mathrm{p}(S 1)+0.4 \mathrm{p}(S 2)=0.239 \\
& p(S 1-S 3)=0.5 \mathrm{p}(S 3)=0.142 \\
& p(S 2-S 3)=0.6 \mathrm{p}(S 2)=0.118
\end{aligned}
$$

The most common switching is between S1 and S2 and thus this change of state should consume a minimum of power i.e. as few changing bits as possible. The second most common switching is between S 0 and S 1 , and should therefore also be coded so that the haming distance is small in between.

Examples of codings
$\mathrm{S} 0=10, \mathrm{~S} 1=00, \mathrm{~S} 2=01, \mathrm{~S} 3=11$ or
$S 0=100, S 1=000, S 2=001, S 3=010$ (requires one more memory element, less activity in the bits representing the state, but the load for the clock signal is higher)

## Solution 5

a) The circuit realizes the state machine

where the probability of a state output is obtained from the following relations
$\left\{\begin{array}{l}p(0)+p(1)=1 \\ p(1)=p_{a b} p(0)+p_{a c}-p(1) \\ p_{a b}=p_{a c}^{-}=1 / 4\end{array} \Rightarrow\left\{\begin{array}{l}p(0)=3 / 4 \\ p(1)=1 / 4\end{array}\right.\right.$
The transition activity becomes $\alpha=p(0) p(1)+p(1) p(0)=3 / 8$ and the probability of an $f=1$ is $p(1)=1 / 4$.
b) Consider the following labeling $i_{j}$ of the internal nodes


Node $i_{1}$ should be glitch free since the inputs to the preceeding NAND gate are glitch free. Node $i_{2}$ may on the other hand have glitches, since the inputs to the preceeding NAND gate have different delays. The inverters may propagate but do not add glitches, hence nodes $i_{0}$ and $i_{3}$ should be glitch free while there is a risk of glitches in node $i_{4}$. The multiplexer both have differently delayed inputs and may propagate node $i_{4}$ to its output, making glitches possible at the output $i_{5}$. Even if $i_{3}$ and $i_{4}$ are without glitches, there is still a risk of glitches in node $i_{5}$. For example, consider the case where $f=0, i_{3}=1, i_{4}=0$. In the next state, $f$ will change value from 0 to 1 . If $i_{4}$ change value to 1 in the same clock cycle, there is a risk of a glitch due to that the $i_{4}$ can be propagated to $i_{5}$ before the value of $i_{4}$ is settled. The circuit output $f$ is on the other hand synchronized with the clock by a D flip-flop, and should hence be glitch free.
c) The inversion of $i_{1}$ and $i_{2}$ could be shared by postponing the inversion until after the multiplexer operation and, if the timing of $f$ allows, until after the D flip-flop operation. In the original circuit there are three inverters; one with transition activity $1 / 2$, one with transition activity $3 / 8$, and one with transition activity $3 / 8+$ transition activity due to glitches. In the redesigned circuit there are two inverters; one with transition activity $1 / 2$, and one with transition activity $3 / 8$. Hence, the redesigned circuit will have one glitchy node less. The redesigned circuit is shown in the figure below.


In the redesigned circuit, the nodes $i_{2}^{\prime}$ and $i_{3}^{\prime}$ may have glitches.

## Solution 6

a) See corresponding paper.
b) Original: $0010,1101,0001,0000,1100,0101,1011$

With bus-invert encoding: 0 0010, 1 0010, 1 1110, 1 1111, 10011,1 1010, 01011

## Solution 7

a) $f=\bar{a}(\bar{b} c+b \bar{c})+a(c \bar{d} e+c \bar{e})$

We define two new functions $f_{\bar{a}}=\bar{b} c+b \bar{c}, f_{a}=c d e+c e$.
The function $f$ can now be written as $f=\bar{a} f_{\bar{a}}+a f_{a}$.
b) The architecture of the new circuit is shown below.


Architecture of the new circuit

## Solution 8

a) Make a smaller computation first where the result is used to decide if further computation is necessary and which resources that should be used.
b) For example, check the two most significant bits of the input signals. Only if they do not differ, the other $n-2$ bits need to be considered.

## Solution 9

a) $P_{a}=\alpha f V_{D D}{ }^{2}\left(C_{1}+2 C_{2}\right)$
b) Let $P(a, b)$ denote the probability for state $(a, b)$.

Let $P((a, b) \rightarrow(c, d))$ denote the probability for state $(c, d)$ in the next clock cycle given that the state is $(a, b)$ in the current clock cycle.
$P(0,0)=1 / 4$
$P((0,0) \rightarrow(0,0))=(1-\alpha)^{2}$
$E=0$
$P((0,0) \rightarrow(0,1))=(1-\alpha) \alpha$
$E=\frac{C_{2} V_{D D}{ }^{2}}{2}$
$P((0,0) \rightarrow(1,0))=\alpha(1-\alpha)$
$E=\frac{C_{2} V_{D D}{ }^{2}}{2}$
$P((0,0) \rightarrow(1,1))=\alpha^{2}$
$E=0$

$$
\begin{aligned}
& P(0,1)=1 / 4 \\
& \text { Dissipated energy } \\
& P((0,1) \rightarrow(0,0))=\alpha(1-\alpha) \quad E=\frac{C_{2} V_{D D}{ }^{2}}{2} \\
& P((0,1) \rightarrow(0,1))=(1-\alpha)^{2} \quad E=0 \\
& P((0,1) \rightarrow(1,0))=\alpha^{2} \quad E=2 C_{2} V_{D D}{ }^{2} \\
& P((0,1) \rightarrow(1,1))=\alpha(1-\alpha) \quad E=\frac{C_{2} V_{D D}{ }^{2}}{2} \\
& P((a, b) \rightarrow(c, d))=P((a, b) \rightarrow(c, d)) \\
& P_{b}=f\left\{\frac{1}{4}\left(2 \alpha(1-\alpha) \frac{C_{2} V_{D D}^{2}}{2}\right)+\frac{1}{4}\left(2 \alpha(1-\alpha) \frac{C_{2} V_{D D}^{2}}{2}+2 \alpha^{2} C_{2} V_{D D}^{2}\right)+\right. \\
& \left.\frac{1}{4}\left(2 \alpha(1-\alpha) \frac{C_{2} V_{D D}^{2}}{2}\right)+\frac{1}{4}\left(2 \alpha(1-\alpha) \frac{C_{2} V_{D D}^{2}}{2}+2 \alpha^{2} C_{2} V_{D D}^{2}\right)\right\} \\
& P_{b}=\left(\alpha(1-\alpha)+\alpha^{2}\right) f C_{2} V_{D D}{ }^{2}=\alpha f C_{2} V_{D D}{ }^{2} \\
& \text { c) } P_{c}=(N+1) P_{a}+N P_{b} \\
& P_{c}=\left\{(N+1)\left(C_{1}+2 C_{2}\right)+N C_{2}\right\} \alpha f V_{D D}{ }^{2}=\left\{(N+1) C_{1}+(3 N+2) C_{2}\right\} \alpha f V_{D D}{ }^{2} \\
& \text { d) } P_{d}=\left\{(N+1) C_{1}+(2 N+1) C_{2}\right\} \alpha f V_{D D}{ }^{2} \\
& \text { e) } P_{c}=\left\{(N+1) C_{1}+(3 N+2) C_{2}\right\} \alpha f V_{D D}{ }^{2} \\
& P_{c}=\left\{16 \cdot 50 \cdot 10^{-15}+47 \cdot 100 \cdot 10^{-15}\right\} 0.25 \cdot 10^{9} \cdot 1.2^{2} \approx 1.980 \mathrm{~mW} \\
& P_{d}=\left\{(N+1) C_{1}+(2 N+1) C_{2}\right\} \alpha f V_{D D}^{2} \\
& P_{d}=\left\{16 \cdot 50 \cdot 10^{-15}+31 \cdot 100 \cdot 10^{-15}\right\} 0.25 \cdot 10^{9} \cdot 1.2^{2} \approx 1.404 \mathrm{~mW}
\end{aligned}
$$

With the given parameters $29 \%$ power is saved in d) compared to c).
Power may be saved if the complements of the signals are internally created in the receiver. Another way to reduce the power consumption is to increase the distance between the interconnects, which results in a decreased $C_{2}$ and a slightly increased $C_{1}$. In a certain $0.35 \mu \mathrm{~m}$ CMOS-process a bus consisting of 16 uncorrelated bit lines, a power reduction of $36 \%$ could be achieved when the distance between the interconnects were increased from $0.7 \mu \mathrm{~m}$ to $1.4 \mu \mathrm{~m}$. The cost is the additional area, but on the other hand, the propagation delay of the bus decreases (why?).

