Hello,
I'm using a Cyclone V SoC FPGA. My design currently has 8 multipliers, which I wrote in VHDL (inferred) rather than instantiating them. The multiplier inputs are 12 and 16 bits wide. According to this document: https://www.altera.com/content/dam/altera-www/global/en_us/pdfs/literature/wp/wp-01159-arriav-cyclonev-dsp.pdf I expected the tool to pack 2 multipliers into a single DSP block, so that 8 multipliers would consume only 4 DSP blocks. Unfortunately, the compilation report shows that 8 DSP blocks are consumed (one per multiplier). I tried switching synthesis to area-driven optimization, but nothing changed. Any idea what could cause this behavior?
Can you show the VHDL code? Have you tried instantiating the multipliers from the IP Catalog instead of using code inference?
--- Quote Start ---
Have you tried instantiating the multipliers from the IP Catalog instead of using code inference?
--- Quote End ---

No. I prefer pure HDL because I want to parameterize the multiplier with generics at compile time. The code:
library ieee ;
use ieee.std_logic_1164.all ;
use ieee.numeric_std.all ;

entity multiplier is
  generic
  (
    LOCATION_FIRST_RESULT_BIT : natural ;
    WIDTH_A : positive ;
    WIDTH_B : positive ;
    WIDTH_RESULT : positive
  ) ;
  port
  (
    IN_A : in std_logic_vector ( WIDTH_A - 1 downto 0 ) ;
    IN_B : in std_logic_vector ( WIDTH_B - 1 downto 0 ) ;
    OUT_RESULT : out std_logic_vector ( WIDTH_RESULT - 1 downto 0 )
  ) ;
end entity multiplier ;

architecture rtl_multiplier of multiplier is
  -- Full-width signed product of the two operands.
  signal signed_multiplier_result : signed ( WIDTH_B + WIDTH_A - 1 downto 0 ) ;
begin
  signed_multiplier_result <= signed ( IN_B ) * signed ( IN_A ) ;
  -- Output a WIDTH_RESULT-bit window of the product starting at bit LOCATION_FIRST_RESULT_BIT
  -- (requires WIDTH_RESULT + LOCATION_FIRST_RESULT_BIT <= WIDTH_A + WIDTH_B).
  OUT_RESULT <= std_logic_vector ( signed_multiplier_result ( WIDTH_RESULT + LOCATION_FIRST_RESULT_BIT - 1 downto LOCATION_FIRST_RESULT_BIT ) ) ;
end architecture rtl_multiplier ;
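To check the DSP block count outside the full design, here is a minimal sketch of a standalone block with eight independent 12 x 16 signed multipliers, inferred the same way as the `multiplier` architecture above (a plain `signed * signed`). The entity name `mult_bank`, the flat-bus port packing and the full 28-bit result (equivalent to `LOCATION_FIRST_RESULT_BIT = 0`) are assumptions for illustration, not part of the original design.

```vhdl
library ieee ;
use ieee.std_logic_1164.all ;
use ieee.numeric_std.all ;

-- Hypothetical reproduction case: N_MULT independent inferred signed multipliers.
-- Lane i uses IN_A/IN_B/OUT_RESULT bits (i+1)*W-1 downto i*W of each flat bus.
entity mult_bank is
  generic
  (
    N_MULT  : positive := 8 ;
    WIDTH_A : positive := 12 ;
    WIDTH_B : positive := 16
  ) ;
  port
  (
    IN_A       : in  std_logic_vector ( N_MULT * WIDTH_A - 1 downto 0 ) ;
    IN_B       : in  std_logic_vector ( N_MULT * WIDTH_B - 1 downto 0 ) ;
    OUT_RESULT : out std_logic_vector ( N_MULT * ( WIDTH_A + WIDTH_B ) - 1 downto 0 )
  ) ;
end entity mult_bank ;

architecture rtl of mult_bank is
  -- Full product width of one lane (signed * signed in numeric_std).
  constant WIDTH_R : positive := WIDTH_A + WIDTH_B ;
begin
  gen_lane : for i in 0 to N_MULT - 1 generate
    -- One combinational multiplier per lane, no pipeline registers.
    OUT_RESULT ( ( i + 1 ) * WIDTH_R - 1 downto i * WIDTH_R ) <=
      std_logic_vector (
        signed ( IN_A ( ( i + 1 ) * WIDTH_A - 1 downto i * WIDTH_A ) ) *
        signed ( IN_B ( ( i + 1 ) * WIDTH_B - 1 downto i * WIDTH_B ) ) ) ;
  end generate gen_lane ;
end architecture rtl ;
```

Compiled as the top level, these buses total 448 bits (96 + 128 + 224), so they would probably need virtual pin assignments in Quartus (or a registered wrapper) to fit the device's I/O. The DSP block count in the compilation report can then be compared with the 8 blocks seen in the full design.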
According to my observations, Quartus uses all available DSP blocks before it starts packing multipliers. See the discussion of the same topic at Edaboard:
http://www.edaboard.com/showthread.php?t=368754
I managed to fill all 25 DSP blocks of a Cyclone V A2 device with this test:
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

entity test1 is
  generic(
    n : integer := 50;  -- number of multipliers
    w : integer := 18   -- operand width
  );
  port(
    clk : in STD_LOGIC;
    sel : in integer range 0 to n-1;
    ax  : in signed(w-1 downto 0);
    bx  : in signed(w-1 downto 0);
    cx  : out SIGNED(2*w-1 downto 0)
  );
end test1;

architecture rtl of test1 is
  type ar18 is array(0 to n-1) of signed(w-1 downto 0);
  type ar36 is array(0 to n-1) of signed(2*w-1 downto 0);
  signal ar : ar18;
  signal br : ar18;
  signal cr : ar36;
begin
  process (clk)
  begin
    if rising_edge(clk) then
      for i in 0 to n-1 loop
        -- n independent registered multipliers
        cr(i) <= ar(i)*br(i);
        -- only the selected lane loads new operands and drives the output
        if i = sel then
          ar(i) <= ax;
          br(i) <= bx;
          cx <= cr(i);
        end if;
      end loop;
    end if;
  end process;
end rtl;
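The arithmetic behind this observation, as a quick check: the test infers n = 50 multipliers with 18-bit operands, and they can only fit in the 25 DSP blocks of the A2 device if two share each block. The original poster's report shows 8 blocks in use, so that device has at least 8 DSP blocks; under the heuristic observed here (an empirical observation from this thread, not documented Quartus behavior), 8 multipliers never exhaust the blocks, so each gets a block of its own.

$$
\frac{n}{N_{\mathrm{DSP}}} = \frac{50}{25} = 2 \ \text{multipliers per block}
\qquad\text{vs.}\qquad
\frac{8}{N_{\mathrm{DSP}}} \le 1 \quad (N_{\mathrm{DSP}} \ge 8) \;\Rightarrow\; \text{8 blocks, no packing}
$$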