Intel® Quartus® Prime Software

usage of arithmetic megafunctions

Altera_Forum
Honored Contributor II
1,254 Views

Hi,

To evaluate a formula such as 5x^2+3x+2, what is the best way to use the floating-point megafunctions? For example, do I have to use them in a finite state machine structure? If so, how? Could you please give me some advice?

Regards,

Bedri
7 Replies
Altera_Forum

Use them however you want. The FP megafunctions are simple pipelined blocks, so they accept one data value and produce one result per clock cycle.

Altera_Forum

Since I'm a beginner, I don't know the best way of using the FP MFs. After a calculation is completed by an MF, I need a ready signal to start the next MF calculation. What is the most suitable method? Is my code below suitable? Could you please point out what is wrong with it?

 

process (reset,clock,t)
  variable clk_u:integer:=0;
  variable clk_end:integer:=0;
  variable state,state_next:integer:=0;
  variable rdy:std_logic:='0';
begin
  if reset='1' then
    aclr<='1';
    clk_en_mul<='0';
    rdy:='0';
    state:=0;
  else
    if rdy='0' and rising_edge(clock) then
      case state is
        when 0=>
          aclr<='0';
          clk_en_mul<='0';

          state_next:=1;
          clk_end:=2;
        when 1=>
          dataa_mul<=x"3A4DE32E";
          datab_mul<=x"40000000";

          state_next:=2;
          clk_end:=2;
        when 2=>
          clk_en_mul<='1';

          state_next:=3;
          clk_end:=11;
        when 3=>
          clk_en_mul<='0';

          state_next:=4;
          clk_end:=1;
        when 4=>
          clk_en_mul<='0';
          result<=result_mul;
          rdy:='1';

          state_next:=5;
          clk_end:=2;
        when others=>
          null;
      end case;

      clk_u:=clk_u+1;
      if clk_u>clk_end then
        state:=state_next;
        clk_u:=0;
      end if;
    end if;
  end if;
  ready<=rdy;
end process;

Regards,

Bedri
Altera_Forum

I don't think you quite understand the idea of a pipelined module.

You can input a value, and N clock cycles later you get your result. But because the module is pipelined, you can input a new value on every single clock cycle, and each result comes out N clocks after its input. There is no need to wait for one result before you input the next value. This is a pipeline.
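To make the timing concrete, here is a small behavioral model in Python (a software sketch, not VHDL and not the megafunction itself). The class name `PipelinedOp` and the latency of 5 are assumptions made for illustration; the real latency depends on how the megafunction is configured.

```python
from collections import deque


class PipelinedOp:
    """Behavioral model of a fully pipelined block with a fixed latency."""

    def __init__(self, func, latency):
        self.func = func
        # One slot per pipeline stage; None marks a slot with no valid data.
        self.stages = deque([None] * latency, maxlen=latency)

    def clock(self, operands=None):
        """Advance one clock: accept new operands, return the oldest result."""
        out = self.stages[0]  # value leaving the pipeline on this clock
        # Appending to a full deque drops the oldest slot, shifting the pipe.
        self.stages.append(None if operands is None else self.func(*operands))
        return out


LATENCY = 5  # assumed value, for illustration only
mult = PipelinedOp(lambda a, b: a * b, LATENCY)

inputs = [(1.0, 2.0), (3.0, 4.0), (5.0, 6.0)]
outputs = []
for cycle in range(len(inputs) + LATENCY):
    # A new operand pair goes in on every clock, without waiting for results.
    operands = inputs[cycle] if cycle < len(inputs) else None
    outputs.append(mult.clock(operands))

print(outputs)
```

The three products come out on consecutive clocks after the initial latency, so three multiplications take 3 + 5 clocks in total rather than 3 x 5.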
Altera_Forum

Let's assume for a moment that you want to solve the general calculation ax^2+bx+c. You need to break it down into the individual calculations required. Looking at that formula, you would need to do the following, in order:

(1)
Compute x^2.
Multiply the result by a.
(2)
Multiply x by b.
(3)
Add (1) to (2).
Add the result to c.

Now the key thing here is that bx and ax^2 do not rely on each other, so you can calculate them in parallel to speed things up. Once those two are calculated, the additions can be done in either order. But if we look at (2), you will see it needs one less step than (1), so you could move the +c in there to reduce the calculation time. You end up with something like this:

https://www.alteraforum.com/forum/attachment.php?attachmentid=9454

For the above to work, the multiply and add modules must have the same pipeline latency. If you think about it, if half of the calculation is ready one clock cycle earlier than the other, the next stage will combine values that belong to different inputs and give the wrong result (if you are using pipelining properly!). So create one module containing a floating-point addition and a second containing a floating-point multiplication. I believe the addition will be quicker, so you may need to add your own pipelining to the addition block to align it. You will also need to add pipelining to the b[] and c[] inputs to delay them so that they arrive at the same time as the first blocks complete.

If constructed correctly, you can then keep feeding new values for all the variables on every clock cycle, and you will receive each result after a delay of 3x the pipeline latency of one of the modules.
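The three-stage arrangement can be checked with a small behavioral model in Python (a software sketch, not VHDL). The attached diagram is not reproduced here, so the schedule is an assumption: stage 1 computes x*x and b*x, stage 2 computes a*(x*x) and (b*x)+c, stage 3 adds the two. With this particular schedule it is a and c that pass through delay lines; a different schedule would delay different operands. The names `Pipe` and `Quadratic` and the latency of 4 are hypothetical.

```python
from collections import deque


class Pipe:
    """Fixed-latency pipeline stage; None marks a slot with no valid data."""

    def __init__(self, func, latency):
        self.func = func
        self.slots = deque([None] * latency, maxlen=latency)

    def clock(self, *operands):
        out = self.slots[0]
        valid = all(v is not None for v in operands)
        self.slots.append(self.func(*operands) if valid else None)
        return out


class Quadratic:
    """a*x^2 + b*x + c built from equal-latency multiply and add stages."""

    def __init__(self, latency):
        mul = lambda p, q: p * q
        add = lambda p, q: p + q
        same = lambda v: v
        self.sq, self.bx = Pipe(mul, latency), Pipe(mul, latency)    # stage 1
        self.a_dly = Pipe(same, latency)   # delay line keeping a aligned with x^2
        self.c_dly = Pipe(same, latency)   # delay line keeping c aligned with b*x
        self.ax2, self.bx_c = Pipe(mul, latency), Pipe(add, latency)  # stage 2
        self.total = Pipe(add, latency)                               # stage 3

    def clock(self, a=None, b=None, c=None, x=None):
        x2, bx = self.sq.clock(x, x), self.bx.clock(b, x)
        a_d, c_d = self.a_dly.clock(a), self.c_dly.clock(c)
        ax2, bx_c = self.ax2.clock(a_d, x2), self.bx_c.clock(bx, c_d)
        return self.total.clock(ax2, bx_c)


LATENCY = 4  # assumed per-module latency, for illustration only
dp = Quadratic(LATENCY)
xs = [0.0, 1.0, 2.0, 3.0]
outs = []
for cycle in range(len(xs) + 3 * LATENCY):
    if cycle < len(xs):
        outs.append(dp.clock(a=5.0, b=3.0, c=2.0, x=xs[cycle]))
    else:
        outs.append(dp.clock())  # nothing new to feed; just drain the pipeline

print(outs)
```

The first result appears 3 x LATENCY clocks after the first input, and one result follows on each clock after that. Removing either delay line in the model pairs a and c with the wrong x, which is the misalignment described above.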
Altera_Forum

Thank you for your replies.

I have to implement an algorithm with a large number of calculations and matrix operations. The implementation exceeds the available LEs and embedded multipliers if I use a separate megafunction for each arithmetic operation, so I tried to implement the algorithm using only a few MFs. However, different equations need different numbers of multipliers. For example, four consecutive equations need two multipliers (mult1, mult2), while the fifth equation needs four (mult1, mult2, mult3, mult4). In this case the compiler returns an error such as an unacceptable time delay for mult3 and mult4. Because of such errors, I have used a state machine structure to avoid exceeding the limits on embedded multipliers and logic elements.

Now I'm trying to learn the best or most suitable approach for such an implementation. Since I'm a beginner, I have been looking for example code, but I still couldn't find any. I would be very happy if you could give me advice based on the details above.

Regards,

Bedri
Altera_Forum

What is this "unacceptable" time delay you are talking about? Is that a failed timing path based on your SDC file? If that's the case, it's usually easiest to increase the number of pipeline stages.

Altera_Forum

Thank you for your reply. I can't remember the exact error message, but it was about the time delay of carrying a signal from the output of one MF to the input of another MF. Since then I have optimized my VHDL code to process some calculations in parallel, and my design is working fine now. However, I want to improve my VHDL programming technique. Since I'm a beginner, I could not explain my question clearly, so I will try to ask it in a different way:

Suppose you have to perform a large number of calculations, and FPGA resources such as logic elements and embedded multipliers are limited. You therefore have to reuse a small number of altfp_mult instances for all the multiplications, a small number of altfp_add_sub instances for all the additions, and so on, in a sequential structure. How would you code this in VHDL? What is the best or most suitable method for this type of implementation? Could you please give me example code or recommend a tutorial?

Regards,

Bedri
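One possible approach to the question above, following the earlier point that a pipelined block accepts a new input on every clock, is to time-multiplex a single multiplier: a sequencer issues one pending multiplication per clock and carries a tag alongside it so each result can be routed when it emerges. The Python below is only a behavioral sketch of that idea (not VHDL). The function name `run_shared_multiplier`, the tags, and the latency of 5 are hypothetical, and it assumes the issued multiplications are independent of each other.

```python
from collections import deque


def run_shared_multiplier(jobs, latency):
    """Issue (tag, a, b) jobs into one pipelined multiplier, one per clock.

    Returns the tagged products and the number of clocks used.
    Assumes no job depends on the result of a job still in flight.
    """
    # Each slot holds (tag, product) for an operation in flight, or None.
    # The product is computed at issue time here purely for modelling
    # convenience; in hardware it would form gradually across the stages.
    pipe = deque([None] * latency, maxlen=latency)
    pending = deque(jobs)
    results = {}
    cycles = 0
    while pending or any(slot is not None for slot in pipe):
        done = pipe[0]  # operation leaving the pipeline on this clock
        if done is not None:
            tag, product = done
            results[tag] = product  # the tag says where this result belongs
        if pending:
            tag, a, b = pending.popleft()
            pipe.append((tag, a * b))  # issue a new job on this clock
        else:
            pipe.append(None)  # nothing left to issue; keep draining
        cycles += 1
    return results, cycles


jobs = [("m1", 2.0, 3.0), ("m2", 4.0, 5.0), ("m3", 1.5, 2.0), ("m4", 0.5, 8.0)]
results, cycles = run_shared_multiplier(jobs, latency=5)
print(results, cycles)
```

In this model, four multiplications through one multiplier take 4 + 5 clocks, whereas waiting for each result before starting the next would take at least 4 x 5. In hardware the tag would travel through a shift register of the same depth as the multiplier, and operations that depend on earlier results would have to be scheduled after those results emerge.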