Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit

CSEE 4868

Homework 3

2022

1 Models of Computation and Latency-Insensitive Design (100 points)

A correct answer without adequate explanation or derivation will have points deducted.

Problem 1. RTL Design (30 points).

Consider the three block diagrams of Fig. 1 illustrating three circuits, called respectively A, B and C.

The three circuits process three primary input sequences of data items Ui, Vi and Wi by applying functions /a, /b and fc, respectively, to obtain three corresponding primary output sequences: Xi = /a(U), = /b(Vi), and Zi = fc(Wi). The functionality of these circuits is the result of composing multiple combinational logic blocks which are represented by the gray circles, where the number within each circle denotes the delay of the critical path of its logic. Assume that the internal state variable of each circuit (i.e. the content of the flip-flop on the feedback) has been initialized with valid data for the computation and focus on the steady-state behavior of the circuit after the first few cycles since its initialization. Furthermore, assume the following:

• The circuits are operating at the fastest possible target clock frequency, after considering 2ns of clock skew and 3ns of additional margin (to account for clock jitter, wire delays, setup/hold time, etc.).

Effective throughput denotes the rate at which new values for the primary output sequences are computed, assuming a continuous stream of new values at the corresponding primary input sequences.

Latency (measured in clock cycles) and effective latency (measured in ns) denote the time needed to compute a new primary output value from a corresponding primary input value.

• A task for each circuit is defined as the computation of a primary output sequence of N data items given a corresponding primary input sequences of N data items. The task latency (measured in clock cycles) and effective task latency (measured in ns) denote the time needed to complete the task counting from the clock cycle when the first data item is sampled at the primary input to the clock cycle when the last data item can be sampled at the primary output.

Do the following:

a. Complete Table 1 based on your analysis of the three block diagrams.

b. Complete Table 2 assuming first that the three circuits perform a task on sequence of length N = 10 and then again for N = 100 and N = 1000.

Circuit

Delay of Critical Path

(ns)

Target Clock Frequency (Mhz)

Latency (clock cycles)

Effective Latency (ns)

Effective Throughput (Mhz)

A

B

C

Table 1: Table for Problem 1(a).


Circuit

Task Latency

Effective Task Latency

V

N=10

5ER W

N=100

S )

N=1000

N=10

3丿

N=100

N=1000

A

B

C

Table 2: Table for Problem 1(b).

Problem 2. Tagged Signal Model (20 points).

Hint: Recall that in the Tagged Signal Model a behavior is a particular tuple of signals that satisfied a process, a process is a set of behaviors, and a system is obtained through the composition of processes, which is a new process defined as the intersection of their behaviors. Since processes can be defined over different sets of signals, to form this composition it is typically necessary to extend the set of signals over which each process is defined to contain all the signals of all processes.

Consider the set SN of all the N-tuple of signals whose events are members of V x T, where the set of values V denotes the set of all possible 16-bit words, which will be expressed in hexadecimal notation, and the set of tags T denotes the set of natural numbers. Let Pi, P» and P3 be three processes, each consisting of a set of three behaviors.

Process Pi has the following behaviors:

Process P2 has the following behaviors:

Process P3 has the following behaviors:

3 = (s6, s7, s8)with

s6 = {(0x02,1), (0x04,5),

S7 = {(0x03, 1), (0x02,5), s8 = {(0x06,2), (0x08,6),

p3 = (s6‘, s7 , s8) with

s6 = {(0x02,3), (0x00,6), s7 = {(0x01,3), (0x00,6), s8 = {(0x02,4), (0x00,7),

Let the connection (or channel) C(i,j) be the process that constrains two signals to be identical, i.e. formally C(i,j) c

SN : (s1, ...,SN) e C (i,j) Si = Sj, with i,j e [1,N ].

1. List all behaviors of the system Q that is obtained by the following composition:

Q = Pi x P2 x P3 n C(1,8) n C(2,4) n C(5,8)

2. Assume that Pi, P2, and P3 are functional processes. Analyze the behavior of each process separately: from your analysis, present a hypothesis on the direction of each signal (input or output) from the process viewpoint and on the function that the process may compute.

Problem 3. Latency-Insensitive Design (50 points).

Consider the strict design of Fig. 2(top).

a. Table 3 reports the traces of the input signals for a fragment of its RTL behavior spanning eleven timestamps n e [1,11]. Complete Table 3 for the internal and output signals by filling the blanks.

b. Consider now the latency-insensitive implementation of Fig 2(bottom) where both sequential have been encapsulated with a single shell, having 3 input channels and one output channel. As for the example done in class, each LID channel has one void signal and one stop signal together with the data signal.

Assume that the three by-passable input queues of the shell have infinite size.

Table 4 reports the traces of the input signals for a fragment of its RTL behavior spanning eleven timestamps n e [1,11]. The three variables (d, v, s) are abbreviations for the data, void and stop signals of each channel.

Complete Table 4 by filling the blanks.

c. With the same assumptions as Problem 3(b), complete Table 5 by filling the blanks.

d. Consider again the latency-insensitive implementation of Fig 2(bottom) but now assume that the three bypassable input queues of the shell have each finite size equal to two slots. Complete Table 6 by filling the blanks.

e. With the same assumptions as Problem 3(d), complete Table 7 by filling the blanks.

Figure 2: Strict design (top) and LID implementation (bottom) for Problem 3.


n

Input

Internal

Output

a

x

y

w

m

z

1

2

2

0

2

2

2

1

5

2

0

3

3

3

4

4

1

4

2

5

8

1

7

6

4

3

6

7

9

2

9

8

7

1

5

9

4

2

8

10

11

Table 3: Table for Problem 3(a).