
Optimizing my first VHDL project: high algorithmic complexity

Altera_Forum

Hi, 

Recently I resumed work on an old project. The project is an implementation of an algorithm that generates polynomials over the GF(3) field, i.e. polynomials with coefficients 0, 1, 2 (so to represent one GF(3) vector I used 2 std_logic_vectors with values 00, 01, 10).

The algorithm gives good results etc (that was the point of it) but now I have to expand it. 

At the moment the project synthesizes for an input polynomial of size 5, but I need it to synthesize for a size of about 1000. This is where the problem starts: due to my inexperience with VHDL I probably coded this project very inefficiently (I'm guessing 2^n complexity). I made a small table with different values of n:

n    synthesis time    % of logic resources used on DE-0 nano
5    1:30              15%
6    2:30              32%
7    9:00              62%

So in order to optimize my project I need some help. I attached the .vhd file. This is just the logic part, so I have no pins connected etc. If anyone wants, I can send the whole project with a Nios II processor etc.

Here are my questions about the logic file: 

1. Was it a good idea to create my own gf3 types, or would it be better to just use 2 std_logic_vectors to represent each vector? Does this have an influence on the complexity?

2. Should I use signals or variables? Do they differ in complexity? If I understand correctly, when using signals I would have to make a new state every time a signal changes (the algorithm is mainly sequential).

3. Should I even use functions? I tried a lot of things in different projects. Even a single loop containing a function (which itself uses a loop, for example the rolfun function in the attached file) already made the design unsynthesizable for polynomials of size 1000.

4. From what I know, loops are a big problem (I don't need the loops to finish in one clock cycle). My question is how to change these loops into state machines effectively. Does this mean that I should put all of the algorithm in the main part of the "seq" process and not in a procedure?

5. If I move the algorithm to the "seq" process, how do I effectively "divide" it? From what I understand, the "division" of the algorithm into parts would just be different states (so this is just manually dividing parts of the algorithm into separate clock cycles)?

 

Thanks for all the help, if something is still unclear in what I need help with please ask. 

 

P.S. I had this topic open before: http://www.alteraforum.com/forum/showthread.php?t=39208

It confirmed that it's possible to compile the logic in ModelSim, which uses sequential processing instead of parallel, but I don't know how to change the logic to fit this description.

 

Fred
Altera_Forum

1) The compiler should handle that just fine, makes no difference in logic complexity. 

2) Signals and variables don't have a direct difference in complexity. They do have different behaviors and you should use whichever best describes the behavior you need. 

3) Functions can be OK or not, depending on the logic they implement. 

4) Loops can be a big problem... 

5) I'm slightly amazed Quartus even compiled that... 
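
To illustrate point 2, here is a minimal sketch of how the same assignment behaves with a signal and with a variable inside a clocked process. The entity and all names in it (sig_vs_var, d, q_sig, q_var, s, v) are made up for this illustration and are not taken from the attached file.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity for illustration only; names are assumptions.
entity sig_vs_var is
  port (
    clk   : in  std_logic;
    d     : in  unsigned(7 downto 0);
    q_sig : out unsigned(7 downto 0);
    q_var : out unsigned(7 downto 0)
  );
end entity;

architecture rtl of sig_vs_var is
  signal s : unsigned(7 downto 0) := (others => '0');
begin
  process(clk)
    variable v : unsigned(7 downto 0);
  begin
    if rising_edge(clk) then
      -- Signal: the new value becomes visible only after the process
      -- suspends, so q_sig is loaded with the OLD value of s.
      -- In hardware this is two registers in series.
      s     <= d + 1;
      q_sig <= s;

      -- Variable: the new value is visible immediately, so q_var is
      -- loaded with the value just computed. In hardware this is the
      -- same adder followed by a single register.
      v     := d + 1;
      q_var <= v;
    end if;
  end process;
end architecture;
```

In simulation q_var follows d + 1 one clock later, while q_sig follows it two clocks later. The adder itself is the same in both cases, which is why neither choice is cheaper by itself.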

 

I think you need to go back to basics and find a tutorial for digital logic and synthesisable VHDL. In terms of synthesis, your project is so complex I can't even begin to tell you where to fix it. 

 

You do need to be able to think what your code will become in terms of hardware. 

For example, the following translates into a combinational multiplier block (a*b), whose result is fed to a bank of rising-edge flip-flops with asynchronous reset.

    process(reset, clk)
    begin
        if reset = '1' then
            y <= (others => '0');
        elsif rising_edge(clk) then
            y <= a*b;
        end if;
    end process;

 

The following actually translates to N copies of the above.

    process(reset, clk)
    begin
        if reset = '1' then
            for i in 0 to N-1 loop
                y(i) <= (others => '0');
            end loop;
        elsif rising_edge(clk) then
            -- the loop is unrolled: one multiplier per index i
            for i in 0 to N-1 loop
                y(i) <= a(i) * b(i);
            end loop;
        end if;
    end process;

 

And this translates into a chain of N-1 multiplier blocks feeding into a register bank. 

The chain will also have N-1 times the delay, so this will not be able to work at a high frequency. 

 

    process(reset, clk)
        variable t : some_type;
    begin
        if reset = '1' then
            y <= (others => '0');
        elsif rising_edge(clk) then
            t := a(0);
            -- unrolled into N-1 multipliers chained one after another
            for i in 1 to N-1 loop
                t := t * a(i);
            end loop;
            y <= t;
        end if;
    end process;
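
If the loop does not have to finish in one clock cycle, one way to restructure the last example is to replace the for loop with a counter and a small state machine, so that a single multiplier is reused once per clock. The following is only a sketch under assumptions: the package serial_pkg, the constants N and W, the type vec_array, the start/done handshake and the state names are all invented for illustration, and the product is truncated to W bits just to keep the types simple.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical package: N, W and vec_array are assumptions.
package serial_pkg is
  constant N : positive := 8;  -- number of operands, must be >= 2
  constant W : positive := 8;  -- operand width in bits
  type vec_array is array (0 to N-1) of unsigned(W-1 downto 0);
end package;

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
use work.serial_pkg.all;

entity serial_product is
  port (
    clk   : in  std_logic;
    reset : in  std_logic;
    start : in  std_logic;               -- pulse to begin a computation
    a     : in  vec_array;               -- must stay stable while running
    y     : out unsigned(W-1 downto 0);
    done  : out std_logic                -- high for one clock when y is valid
  );
end entity;

architecture rtl of serial_product is
  type state_t is (IDLE, RUN, FINISH);
  signal state : state_t := IDLE;
  signal i     : natural range 0 to N-1 := 0;
  signal t     : unsigned(W-1 downto 0) := (others => '0');
begin
  process(reset, clk)
  begin
    if reset = '1' then
      state <= IDLE;
      i     <= 0;
      t     <= (others => '0');
      y     <= (others => '0');
      done  <= '0';
    elsif rising_edge(clk) then
      done <= '0';
      case state is
        when IDLE =>
          if start = '1' then
            t     <= a(0);   -- same as "t := a(0)" before the loop
            i     <= 1;
            state <= RUN;
          end if;
        when RUN =>
          -- one loop iteration per clock: a single multiplier is reused
          t <= resize(t * a(i), W);
          if i = N-1 then
            state <= FINISH;
          else
            i <= i + 1;
          end if;
        when FINISH =>
          y     <= t;
          done  <= '1';
          state <= IDLE;
      end case;
    end if;
  end process;
end architecture;
```

This version takes roughly N clock cycles instead of one, but it needs one multiplier and one multiplexer rather than a chain of N-1 multipliers, and the longest combinational path is a single multiplication. For N around 1000 the operands would more likely be read from a memory one per clock than passed in as an array port.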