ECE745 – Assignment 3


Develop a "mini" architectural synthesis tool that takes a simple dataflow description as its input and generates Verilog (or SystemVerilog) RTL code as its output. This challenge is broken into several steps (described on the next page); this page contains the common problem statement for all four steps.


An example file format (dataflow.input) is:

bitwidth b

resources a,m

latency l

input a,b,c,d

output e,f,g,h

var i,j,k

i=a+b

...

...


First, the constraints are declared. At the top of the file, the keyword "bitwidth" is followed by a constant "b", which is the bitwidth of the entire circuit. This is followed by the statement "resources a,m", where "a" and "m" are the numbers of single-cycle ALUs and single-cycle multipliers, and by an optional statement "latency l" giving the latency in clock cycles. If the "latency l" statement is present in the input file, the number of resources is ignored, because the objective in this case is to minimize the resources under the given latency. If the "latency l" statement is missing, the latency should be minimized under the given resource constraints.


For the input/output/variable declarations, we use the keywords "input", "output" and "var" respectively, followed by the signal names separated by commas. All names must start with a letter and must be alphanumeric. All the operands represent signed numbers of bitwidth "b".


When describing the dataflow, it can be assumed that the right-hand side of an assignment contains a single operation with exactly two operands. The four operators to be supported are: addition (+), subtraction (-), multiplication (*), and arithmetic right shift (>>>).


To simplify the problem, all the outputs and intermediate variables are driven exactly once. If any of the above assumptions is violated, you can give an error message. No control structures of any type (e.g., if/case/for) need to be supported.
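The file format and the checks described above could be handled along the following lines. This is only a minimal sketch in Python, not a required design: the function name parse_dataflow, the dictionary layout, and the error messages are illustrative assumptions, and the handout does not prescribe an implementation language.

```python
import re

NAME = r"[A-Za-z][A-Za-z0-9]*"  # names start with a letter and are alphanumeric
# dest = operand1 <op> operand2, where operand2 may be a bracketed constant
ASSIGN_RE = re.compile(rf"^({NAME})=({NAME})(>>>|[+\-*])({NAME}|\(-?\d+\))$")

def parse_dataflow(text):
    """Parse a dataflow description into a dictionary (hypothetical layout)."""
    spec = {"bitwidth": None, "resources": None, "latency": None,
            "input": [], "output": [], "var": [], "ops": []}
    driven = set()
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, as in the example listing
        keyword, _, rest = line.partition(" ")
        if keyword == "bitwidth":
            spec["bitwidth"] = int(rest)
        elif keyword == "resources":
            alus, mults = (int(x) for x in rest.split(","))
            spec["resources"] = (alus, mults)
        elif keyword == "latency":
            spec["latency"] = int(rest)
        elif keyword in ("input", "output", "var"):
            spec[keyword] = [name.strip() for name in rest.split(",")]
        else:
            match = ASSIGN_RE.match(line)
            if match is None:
                raise ValueError(f"cannot parse statement: {line!r}")
            dest = match.group(1)
            if dest not in spec["output"] and dest not in spec["var"]:
                raise ValueError(f"{dest!r} is not a declared output or var")
            if dest in driven:
                raise ValueError(f"{dest!r} is driven more than once")
            driven.add(dest)
            spec["ops"].append(match.groups())
    undriven = set(spec["output"] + spec["var"]) - driven
    if undriven:
        raise ValueError(f"never driven: {sorted(undriven)}")
    return spec
```

Each entry of spec["ops"] is a tuple (destination, first operand, operator, second operand), with a constant kept in its bracketed form so that a later stage can convert it.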


For the controlling logic, it is acceptable to implement the finite state machine (FSM) as a shift register of depth "l" (latency) where each shift register bit represents a state in your FSM. This is known as one-hot encoding. Note also, in terms of arithmetic operations, there is no need to be concerned with detecting overflows (it is assumed that the lower “b” bits of the result are kept).
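As an illustration of the two points above, the following Python sketch models a one-hot shift-register FSM of depth "l" and the truncation of a result to its lower "b" bits. The helper names wrap_signed and one_hot_states are assumptions made for this example; the generated RTL would express the same behaviour in Verilog.

```python
def wrap_signed(value, b):
    """Keep the lower b bits of value and read them as a signed number."""
    value &= (1 << b) - 1                 # discard everything above bit b-1
    if value & (1 << (b - 1)):            # sign bit set: negative number
        return value - (1 << b)
    return value

def one_hot_states(latency):
    """Yield the shift-register contents for each of the `latency` cycles."""
    state = [1] + [0] * (latency - 1)     # the single '1' marks the active state
    for _ in range(latency):
        yield tuple(state)
        state = [0] + state[:-1]          # shift the '1' one position along

# Example: with b = 8, 127 + 1 wraps around to -128, and a latency of 3
# gives the states (1,0,0), (0,1,0), (0,0,1).
```

With this encoding, bit k of the shift register can directly enable the operations scheduled in time-step k, so no state decoder is needed.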


As stated on the first page, this challenging task is broken into four different steps:

i) Given the dataflow (either resource or latency constrained), implement a schedule using the force-directed scheduling algorithm when using a latency constraint or the list scheduling algorithm when using a resource constraint, and report the results in a readable format of your choice.

ii) Given the scheduled dataflow (output from step i), implement the mapping (or binding), both for functional units and registers, using an algorithm of your choice. As in step i, the output of this algorithm should be in a readable format.

iii) Given the scheduled/mapped dataflow from step ii, improve the area/timing of the circuit by trying to optimize the steering logic through an iterative approach, i.e., re-schedule operations in different time-steps, or re-map variables to registers and/or re-map operations to functional resources.

iv) Given the output from step iii, generate the register-transfer level (RTL) code for synthesis (your code must compile with the Quartus tool from Altera/Intel); you should also automatically generate a "self-checking" testbench. This testbench will be used together with the source RTL code to verify that the synthesized circuit operates correctly. Use ModelSim for verification.
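For the resource-constrained case of step i, a list scheduler could look roughly like the Python sketch below. It is one possible formulation under stated assumptions, not a reference solution: the priority function (longest path to a sink), the choice to execute the shift on the ALU, and the name list_schedule are all assumptions of this example. Operations are tuples (destination, first operand, operator, second operand).

```python
def list_schedule(ops, num_alu, num_mul):
    """Return {destination: cycle}, cycles starting at 1, for single-cycle units."""
    producers = {dest for dest, _, _, _ in ops}
    # An operand is a predecessor only if another operation produces it.
    preds = {d: [s for s in (s1, s2) if s in producers] for d, s1, _, s2 in ops}
    succs = {d: [] for d in producers}
    for d, ps in preds.items():
        for p in ps:
            succs[p].append(d)

    prio = {}
    def depth(d):
        # Priority = length of the longest path from d to any sink.
        if d not in prio:
            prio[d] = 1 + max((depth(s) for s in succs[d]), default=0)
        return prio[d]
    for d in producers:
        depth(d)

    # Assumption: '*' runs on a multiplier, everything else on an ALU.
    unit = {d: "mul" if op == "*" else "alu" for d, _, op, _ in ops}
    limit = {"alu": num_alu, "mul": num_mul}
    for d in producers:
        if limit[unit[d]] < 1:
            raise ValueError(f"no {unit[d]} available for {d!r}")

    schedule, cycle = {}, 0
    while len(schedule) < len(ops):
        cycle += 1
        free = dict(limit)
        # Ready: all predecessors finished in an earlier cycle.
        ready = [d for d in producers if d not in schedule
                 and all(schedule.get(p, cycle) < cycle for p in preds[d])]
        for d in sorted(ready, key=lambda d: (-prio[d], d)):
            if free[unit[d]] > 0:
                schedule[d] = cycle
                free[unit[d]] -= 1
    return schedule
```

For example, the operations i=a*b, j=c*d, k=i+j, e=k-(1) with one ALU and one multiplier are placed in cycles 1, 2, 3 and 4 respectively; with two multipliers, i and j share cycle 1 and the total latency drops to 3.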


The program should run as follows:

./hls <test_file.input> <mode>

<test_file.input> - input file with the dataflow graph (it must have extension .input)

<mode> - mode = 0: run only steps i, ii and iv

       - mode = 1: run all four steps, with step iii targeting area reduction

       - mode = 2: run all four steps, with step iii targeting clock period reduction


The automatically generated output files should be stored in the "output" sub-folder. There should be at least one register-transfer level (RTL) Verilog file, preferably only one testbench file, and a .do file to be used by ModelSim for self-checking purposes.


Extra remarks:

a. The input file must not have any extra blank lines between statements. Spaces are permitted only between the keywords and the variable names. These restrictions should simplify parsing the input file.

b. When doing an arithmetic right shift, it is assumed that the user ensures that the number of bits to shift is smaller than the bitwidth.

c. Constants have brackets around them and are always used as the SECOND OPERAND, e.g., e=d+(7). If the constant is negative, a minus sign appears just before the magnitude of the constant, e.g., e=d+(-7); in this case you can take the 2's complement of the number prior to storing it in your data structure.
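Remarks b and c can be illustrated with a short Python sketch that converts a bracketed constant to its b-bit 2's complement pattern and performs an arithmetic right shift on such a pattern. The function names parse_constant and asr, and the range check on the constant, are assumptions of this example and are not required by the handout.

```python
def parse_constant(token, b):
    """Convert '(7)' or '(-7)' to the b-bit 2's complement bit pattern."""
    if not (token.startswith("(") and token.endswith(")")):
        raise ValueError(f"constant must be bracketed: {token!r}")
    value = int(token[1:-1])
    # Assumption: reject constants that do not fit in b signed bits.
    if not -(1 << (b - 1)) <= value < (1 << (b - 1)):
        raise ValueError(f"{value} does not fit in {b} signed bits")
    return value & ((1 << b) - 1)         # masking yields the 2's complement

def asr(pattern, shift, b):
    """Arithmetic right shift of a b-bit pattern; the sign bit is replicated."""
    if not 0 <= shift < b:
        raise ValueError("shift amount must be smaller than the bitwidth")
    signed = pattern - (1 << b) if pattern >> (b - 1) else pattern
    return (signed >> shift) & ((1 << b) - 1)

# Example with b = 8: '(-7)' is stored as 0xF9, and shifting that pattern
# right by one position gives 0xFC, i.e. -4.
```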