Intel® Quartus® Prime Software

Compilation Dependent Behavior

Altera_Forum
Honored Contributor II
1,307 Views

Hey Everyone,

I have been using the Quartus II software for a while to create programmable logic designs, but recently I have begun running into some issues that I have been unable to resolve.

The primary issue is that the behavior of a design depends on the compilation. I realize that timing can vary between compilations, resulting in a change in the functionality of the design.

My problems come primarily from writing to an external SRAM (reading works fine). Some compilations work perfectly, whereas changing a constant's value in VHDL and recompiling causes writes to fail intermittently.

I have run through the timing advisor wizards and have started using the TimeQuest Timing Analyzer, but I am still having issues.

If there is a particular setting or mode of operation that can lead me to resolving this issue, any information would be greatly appreciated.

Thank you.
11 Replies
Altera_Forum

A few things to note:
1 - Any change to the design will produce a different compilation result. You can partition the design and lock down the partitions so they don't change between compilations if you'd like.
2 - I would first analyze your SRAM interface. Does it, by design, provide proper timing for all the data, address, and control signals?
3 - If the SRAM interface design is correct, proper timing constraints should take care of your problem. It's likely that you just haven't got the timing constraints quite right for your design.

If you are willing to post your design, we could take a look at it.

Jake
Altera_Forum

Thank you for your reply,

1- I assume you are referring to LogicLock region assignments. I have not quite been able to use this feature properly, but the design I am working on actually uses four copies of my memory controller, each of which is contained within a LogicLock region.

2- I looked at the operation in SignalTap, and all the timing appears to be correct. I also tried increasing the hold time, slowing the memory clock, etc. in order to get the design to work.

3- Are you referring to the Timing Analyzer's ability to set path constraints?

The VHDL source is quite large, but I will give it a shot. Right now the code is set up so that the memory can be written and read from separate sources. The SRAM being used is the CY7C1470V25. I plan on making the controller much more complex once I get this simplified controller working properly!

Thank you again.

LIBRARY WORK;
USE WORK.cy7c147.all;
LIBRARY IEEE;
USE IEEE.STD_LOGIC_1164.ALL;
USE IEEE.NUMERIC_STD.ALL;

ENTITY mem_controller_cy7c147 IS
    GENERIC (
        address_width : INTEGER := 12;
        data_width : INTEGER := 32
    );
    PORT (
        --Write Control Ports.
        data_in : IN STD_LOGIC_VECTOR(data_width-1 DOWNTO 0);
        write_address : IN STD_LOGIC_VECTOR(address_width-1 DOWNTO 0);
        we : IN STD_LOGIC;
        --Read Control Ports.
        read_address : IN STD_LOGIC_VECTOR(address_width-1 DOWNTO 0);
        --Reset
        rst : IN STD_LOGIC;
        --Memory Clock
        mem_clk : IN STD_LOGIC;
        --Data Output
        data_out : OUT STD_LOGIC_VECTOR(data_width-1 DOWNTO 0);
        data_ready : OUT STD_LOGIC;
        --External Memory Signals
        ext_data : INOUT STD_LOGIC_VECTOR(chip_data_width-1 DOWNTO 0);
        ext_address : OUT STD_LOGIC_VECTOR(chip_address_width-1 DOWNTO 0);
        ext_we : OUT STD_LOGIC;
        ext_oe : OUT STD_LOGIC
    );
END mem_controller_cy7c147;

ARCHITECTURE behavioral OF mem_controller_cy7c147 IS
    TYPE sm_ext_mem_controller IS (idle, wr_init, wr_latch, wr_end, rd_init, rd_wait0, rd_wait1, rd_end);
    SIGNAL state : sm_ext_mem_controller := idle;
    SIGNAL buff_read_data : STD_LOGIC_VECTOR(data_width-1 DOWNTO 0);
    SIGNAL buff_write_data : STD_LOGIC_VECTOR(data_width-1 DOWNTO 0);
    --External Buffers
    SIGNAL buff_ext_data : STD_LOGIC_VECTOR(chip_data_width-1 DOWNTO 0);
    SIGNAL buff_ext_address : STD_LOGIC_VECTOR(chip_address_width-1 DOWNTO 0);
    SIGNAL buff_ext_we_L : STD_LOGIC;
    SIGNAL buff_ext_oe_L : STD_LOGIC;
    --Read control signals
    SIGNAL modified : STD_LOGIC := '0'; --Signals that the memory has been modified and should be read.
    SIGNAL current_address : STD_LOGIC_VECTOR(address_width-1 DOWNTO 0); --Current address's data latched into the buff_read_data register.
BEGIN
    buff_ext_address(chip_address_width-1 DOWNTO address_width) <= (OTHERS=>'0'); --Map the unused address lines to 0.

    PROCESS(rst,mem_clk)
    BEGIN
        IF(rst='1')THEN
            state <= idle;
        ELSIF(RISING_EDGE(mem_clk)) THEN
            CASE state IS
                WHEN idle=> --Clock domain transfer registers.
                    buff_write_data <= data_in;
                    buff_ext_data <= (OTHERS=>'Z'); --The data bus should be tri-stated in the IDLE state since we are reading.
                    buff_ext_oe_L <= '0'; --Ouput enable should be enabled in the IDLE state.
                    buff_ext_we_L <= '1'; --Disable write in the IDLE State.
                    IF(we='1')THEN
                        buff_ext_address(address_width-1 DOWNTO 0) <= write_address; --Put the write address on the bus.
                        buff_ext_oe_L <= '1'; --Disable the chips output enable.
                        modified <= '1'; --We need to signal the controller to read next chance it gets.
                        state <= wr_init;
                    ELSIF(modified = '1' OR current_address/=read_address)THEN
                        buff_ext_address(address_width-1 DOWNTO 0) <= read_address; --Put the read address on the bus.
                        modified <= '0';
                        state <= rd_init;
                    END IF;
                --Write State Machine
                WHEN wr_init=>
                    buff_ext_we_L <= '0';--Enable the Write Enable.
                    state <= wr_latch;
                WHEN wr_latch=> --Memory gives one cycle to put data out.
                    buff_ext_we_L <= '1'; --Disable the Write Enable.
                    buff_ext_data(chip_data_width-1 DOWNTO data_width) <= (OTHERS=>'0');
                    buff_ext_data(data_width-1 DOWNTO 0) <= buff_write_data;
                    state <= wr_end;
                WHEN wr_end=> --Data latched into memory.
                    IF(we='0')THEN
                        buff_ext_data <= (OTHERS=>'Z'); --The data bus should be tri-stated in the IDLE state since we are reading.
                        buff_ext_oe_L <= '0'; --Ouput enable should be enabled in the IDLE state.
                        state <= idle;
                    END IF;
                --Read State Machine
                WHEN rd_init => --First address out here.
                    state <= rd_wait0;
                WHEN rd_wait0 => --Pause for a cycle (A read takes 3 cycles to get out of memory.)
                    state <= rd_wait1;
                WHEN rd_wait1 =>
                    state <= rd_end;
                WHEN rd_end =>
                    buff_read_data <= ext_data(data_width-1 DOWNTO 0);
                    current_address <= buff_ext_address(address_width-1 DOWNTO 0);
                    state <= idle;
            END CASE;
        END IF;
    END PROCESS;

    --This process block controls the data ready signal.
    PROCESS(current_address, read_address)
    BEGIN
        data_ready <= '1';
        IF(current_address /= read_address) THEN
            data_ready <= '0';
        END IF;
    END PROCESS;

    data_out <= buff_read_data;
    ext_address <= buff_ext_address;
    ext_data <= buff_ext_data;
    ext_oe <= buff_ext_oe_L;
    ext_we <= buff_ext_we_L;
END behavioral;
Altera_Forum

Sounds like a timing optimisation problem. Certain builds work perfectly, so it is unlikely to be a functional error. The most likely culprit is the bidirectional data bus to the memory. Optimising this is a bit of a headache if you want to follow the official rules of TimeQuest and delay measurements.

I suggest trial and error. Make sure all the registers are fast I/O. Keep the Tco of the various builds under control (it may vary wildly if the registers are not fast I/O).

A good test is to use a PLL to rotate the data clock in order to find the best window.
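
For reference, here is a minimal sketch of one way to request a fast I/O register from within the HDL. It assumes your Quartus version honours the `useioff` synthesis attribute (the Fast Input Register / Fast Output Register assignments in the Assignment Editor are the other route). The entity `fast_io_example` and its ports are hypothetical names made up for illustration; they are not part of the posted controller.

```vhdl
LIBRARY IEEE;
USE IEEE.STD_LOGIC_1164.ALL;

-- Hypothetical example entity; names are illustrative only.
ENTITY fast_io_example IS
    PORT (
        mem_clk : IN  STD_LOGIC;
        we_n    : IN  STD_LOGIC;  -- internal, active-low write enable
        ext_we  : OUT STD_LOGIC   -- pin that goes to the SRAM
    );
    -- Assumption: the synthesis tool recognises "useioff" and will try to
    -- pack the register driving this port into the I/O element.
    ATTRIBUTE useioff : BOOLEAN;
    ATTRIBUTE useioff OF ext_we : SIGNAL IS TRUE;
END fast_io_example;

ARCHITECTURE rtl OF fast_io_example IS
BEGIN
    PROCESS(mem_clk)
    BEGIN
        IF RISING_EDGE(mem_clk) THEN
            -- The register must drive the pin directly (no logic in between),
            -- otherwise it cannot be placed in the I/O element.
            ext_we <= we_n;
        END IF;
    END PROCESS;
END rtl;
```

Whether the register actually landed in the I/O element is something to confirm in the fitter report rather than assume.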
Altera_Forum

I have been investigating the issue more closely in SignalTap, trying to find the main source of the problem, and I have noticed a few weird issues.

Just some more information:
The write portion of the memory controller is controlled by a PCI interface operating at 33 MHz. This clock is taken from the PCI bus.

The read portion of the memory controller is controlled by a VHDL microprocessor that I created, operating at 500 MHz. This clock is derived from a 50 MHz clock.

The memory controller/SRAM operates at 200 MHz, also derived from the 50 MHz clock.

The device is a Stratix IV FPGA.

I have noticed some very odd "failed paths" listed in the timing analyzer between logic that should be isolated, i.e. a failed path between different copies of the memory controller, which is very odd to me.

Since reads appear to be 100% solid, I am confused about how the intermittent failure could come from incorrect timing to the SRAM. I have a register between the PCI bus output and the memory controller that latches the data in at 200 MHz. The data from the PCI bus is held for a very long time, so I don't see any clock domain transfer issues.

In SignalTap, I noticed that the value in the clock-domain transfer register and the value in the buff_write_data register I use to buffer it sometimes don't match! This doesn't make sense to me, since it should be a simple register-to-register transfer in the same clock domain (the difference being that one has an enable controlled by the memory controller). I understand that this could be a timing issue, but I find it hard to believe that the fitter would place two registers farther apart than the technology's register-to-register transfer delay allows... but I guess I am still new to the timing constraint aspect of Quartus. (Sometimes the output of the microprocessor is correct despite a failure being shown in SignalTap, which also raises the possibility that the logic placed on the Stratix chip to sample signals has a timing issue of its own, masking the real cause of the problem.)

I am trying to figure out how to use the Timing Wizard better, since I have been using the Classic Wizard for most of my programmable logic experience.

I have also noticed that setting design partitions and/or LogicLock regions currently breaks the design completely, so I must be doing something very wrong with this feature.

At this point, I have a few questions:

- Should I add more register stages to help the router meet the minimum timing requirements? (Reads are the only thing that needs to be fast at this point.) Is this likely to help?

- How do I set path delays? Is there any way for the timing analyzer to do this automatically or easily, or do I have to set every path by hand? I saw the free online session provided by Altera, and I know there is support for wildcards (asterisks), but this still seems like a very hefty task.

- How do I add derive_clock_uncertainty to my SDC file, or where do I find out how to calculate this parameter? The timing analyzer complains a lot about this.

- I currently have the placement effort setting at 50, and it doesn't seem to be helping, so I am wondering if I should just go for LogicLock regions and set this back to something more reasonable.

- Is there any difference between the POF and SOF files? Sometimes the SOF file works, and the POF file doesn't! :eek: (This makes no sense to me.)

I need to figure out what's going on so that I can actually continue implementing the difficult part of the design! The intermittent behavior is the most nerve-racking thing about logic design.

I greatly appreciate all your help with these issues. Once I gain more experience in this matter, I hope to be able to provide feedback to others.

Thank you again.
Altera_Forum

 

--- Quote Start ---

Sounds like a timing optimisation problem. Certain builds work perfectly, so it is unlikely to be a functional error. The most likely culprit is the bidirectional data bus to the memory. Optimising this is a bit of a headache if you want to follow the official rules of TimeQuest and delay measurements.

I suggest trial and error. Make sure all the registers are fast I/O. Keep the Tco of the various builds under control (it may vary wildly if the registers are not fast I/O).

A good test is to use a PLL to rotate the data clock in order to find the best window.

--- Quote End ---

I was wondering if you could explain the idea of having a PLL rotate the data clock in order to find the best window. Does this mean changing the phase offset of the data clock to see which offset allows the launch and latch edges to line up?

I have had some issues with fast I/O registers in the past and tend to stay away from them, but is it typical to assign output registers as fast output or fast input? The settings seem to be exclusive (although I guess I could assign both fast input and fast output to a register).

Edit:
I have been working on this issue for quite a while, and it seems I am optimizing in the wrong direction! With every timing constraint I add, the design seems to fail even more, but I guess some result is better than none.

Can enabling register duplication under the physical synthesis options change the design behavior? It appears that a number of the failed paths come from the duplicated registers.
Altera_Forum

Hi,

Using the PLL depends heavily on Tco being under control. I wonder what your Tco is on each of the 32 data outputs to the SRAM.

You will need to know the Tco/Tsu/Th of the FPGA (configurable) and of the SRAM.
I am assuming the same clock is used for read and write. Remember that the data direction is opposite between read and write.

You can assume that the data and clock board delays are equal but finite.

So, to optimise the window:
The valid timing window for read is the section of the clock period that excludes the segment (Tsu + Th of the FPGA); aim for its midpoint. Board delay is irrelevant.

The valid timing window for write is the section of the clock period that excludes the segment (Tsu + Th of the SRAM), again aiming for its midpoint, but as seen at the FPGA, taking into account the board delay and the relative direction of clock and data.

The final optimum point is the average of the two cases.
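
Written out as arithmetic, the rule of thumb above looks roughly like this. It is a sketch only: the symbols W and φ are notation introduced here for illustration, the 5 ns period follows from the 200 MHz memory clock quoted earlier in the thread, and the Tsu/Th values themselves must come from the FPGA timing report and the SRAM datasheet.

```latex
% Requires amsmath. T_clk: memory clock period.
% T_su, T_h: setup and hold time of the receiving device
% (the FPGA for reads, the SRAM for writes).
\begin{align*}
  T_{clk}    &= \frac{1}{200\ \text{MHz}} = 5\ \text{ns} \\
  W_{read}   &= T_{clk} - \left(T_{su}^{FPGA} + T_{h}^{FPGA}\right) \\
  W_{write}  &= T_{clk} - \left(T_{su}^{SRAM} + T_{h}^{SRAM}\right) \\
  \phi_{opt} &\approx \tfrac{1}{2}\left(\phi_{read} + \phi_{write}\right)
\end{align*}
```

Here φ_read and φ_write stand for the PLL phase settings that centre the sampling edge in the read and write windows respectively.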

 

The question is how to force the optimum point:
For read or write, you control the relation of data to clock by rotating the clock.

For read, you control the Tsu/Th of the FPGA plus the clock phase from the PLL.
For write, you control the Tco of the FPGA and force it to the required value through the same PLL (a compromise between the two). Check that the Tco values are not wandering around.

Finally, you said 500 MHz. Do you really mean it? Don't you have fmax timing problems?
SignalTap may not help in your case, so don't read too much into it. In fact, you had better remove it.
A SOF/POF difference is not possible. I believe your design is unstable.
Altera_Forum

I just had an epiphany that I hope could be the issue:

The PCI bus interface is responsible for providing three signals in a 33 MHz clock domain that is sourced by the PCI bus.

The memory controller, running at roughly 200 MHz, looks for WE going high and immediately latches in the data to store to memory.

It's probable that the propagation delay (tpd) to the memory controller differs between the WE register and the data/address registers of the PCI interface, i.e. if WE went high, the memory controller, running more than twice as fast as the PCI bus, could grab the values on the data/address lines before they were updated. This would probably cause some of the issues I am encountering.

I am testing this theory at the moment, and I hope to have a reason to dance around the room :o.
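
A minimal sketch of one common way to guard against this kind of race, assuming the PCI side holds data and address stable for the whole time WE is high (as described earlier in the thread). WE is passed through a two-stage synchronizer in the mem_clk domain, and data/address are captured only on the synchronized rising edge, roughly two to three mem_clk periods after WE changed. The entity `we_sync` and all of its port names are hypothetical; this is not the logic that was actually added to the design.

```vhdl
LIBRARY IEEE;
USE IEEE.STD_LOGIC_1164.ALL;

-- Hypothetical helper block, not part of the posted controller.
ENTITY we_sync IS
    GENERIC (
        address_width : INTEGER := 12;
        data_width    : INTEGER := 32
    );
    PORT (
        mem_clk      : IN  STD_LOGIC;  -- destination (fast) clock
        rst          : IN  STD_LOGIC;
        -- Signals launched in the PCI clock domain
        pci_we       : IN  STD_LOGIC;
        pci_data     : IN  STD_LOGIC_VECTOR(data_width-1 DOWNTO 0);
        pci_address  : IN  STD_LOGIC_VECTOR(address_width-1 DOWNTO 0);
        -- Signals that are synchronous to mem_clk
        we_pulse     : OUT STD_LOGIC;  -- one mem_clk cycle per write request
        data_sync    : OUT STD_LOGIC_VECTOR(data_width-1 DOWNTO 0);
        address_sync : OUT STD_LOGIC_VECTOR(address_width-1 DOWNTO 0)
    );
END we_sync;

ARCHITECTURE rtl OF we_sync IS
    SIGNAL we_meta : STD_LOGIC := '0';  -- first stage, may go metastable
    SIGNAL we_s1   : STD_LOGIC := '0';  -- second stage, treated as clean
    SIGNAL we_s2   : STD_LOGIC := '0';  -- delayed copy for edge detection
BEGIN
    PROCESS(rst, mem_clk)
    BEGIN
        IF rst = '1' THEN
            we_meta      <= '0';
            we_s1        <= '0';
            we_s2        <= '0';
            we_pulse     <= '0';
            data_sync    <= (OTHERS => '0');
            address_sync <= (OTHERS => '0');
        ELSIF RISING_EDGE(mem_clk) THEN
            -- Two-stage synchronizer on the control signal only.
            we_meta <= pci_we;
            we_s1   <= we_meta;
            we_s2   <= we_s1;

            we_pulse <= '0';
            -- Rising edge of the synchronized WE: by now the data and
            -- address lines have had time to settle (assumption: the PCI
            -- side keeps them stable while WE is high).
            IF we_s1 = '1' AND we_s2 = '0' THEN
                data_sync    <= pci_data;
                address_sync <= pci_address;
                we_pulse     <= '1';
            END IF;
        END IF;
    END PROCESS;
END rtl;
```

One possible hookup would be to drive the controller's `we`, `data_in` and `write_address` ports from `we_pulse`, `data_sync` and `address_sync`. The paths from the PCI domain into these registers would still need to be described to the timing analyzer, which is outside what this sketch shows.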
Altera_Forum

 

--- Quote Start ---

Hi,
....

Finally, you said 500 MHz. Do you really mean it? Don't you have fmax timing problems?
SignalTap may not help in your case, so don't read too much into it. In fact, you had better remove it.
A SOF/POF difference is not possible. I believe your design is unstable.

--- Quote End ---

Yes, I do have some very simple logic running at 500 MHz. There are some fmax warnings for that logic, but believe it or not, that logic is running just fine.

The SOF/POF difference does seem to exist. I can't believe it either. I have tested this many times: programming the SOF for the design works fine, no matter how many times I try it, whereas the POF fails the first time. I understand how FPGAs work (for the most part), and cannot see a plausible reason for this.

Thank you very much for your information. I will give your recommendations a shot.
Altera_Forum

 

Hi,

Just one remark on your fmax warnings: did you run both a worst-case and a best-case analysis? The speed difference between the slow and fast cases can be huge. That means that when you use the design in production you may run into problems, because the timing of your FPGAs could vary a lot.

Kind regards

GPK
Altera_Forum

Yes, I did run the analysis, and the estimated difference was huge, but unfortunately I don't have much of a choice for this application since the hardware has already been determined. I am at the mercy of the requirements and what I have been given to work with.

As described in one of my other posts, I added some additional logic between the PCI bus and the memory controllers, and oddly enough the failure rate dropped dramatically. From what I can tell, two of the memory controllers now work all of the time, and the other two still fail occasionally but work most of the time (i.e. about 1 in 100 writes seems to fail, which is still not acceptable, but certainly an improvement).

I was wondering if there is a straightforward tutorial on making path assignments in the Timing Wizard?

I have a feeling that the failures are not due to an output-to-memory issue, as I initially thought, but more of an internal timing issue.

Thank you all again.
Altera_Forum

I wanted to thank everyone for their help and let everyone know that I have solved the problem!

For anyone with similar problems, I suggest the following steps:
1) Run the Timing Advisor from the Quartus Advisor menu.
2) Verify that your logic is correct.
3) If you encounter compilation-dependent behavior, try using LogicLock regions.

A good approach to using LogicLock regions seems to be as follows:
A. Assign a LogicLock region to the top-level "device" you are having issues with.
B. Check the Chip Planner to see where the fitter placed the "floating" region. Floating means that the region can be relocated between compilations, which could be causing the compilation-dependent behavior.
C. If your "device" has external pins, try moving the region in the Chip Planner closer to the pin bank you mapped it to. It will tell you it needs to convert the region to a "fixed" region in order to do this. Once the region is fixed, you will notice that either your design magically works, or the failure is the same between compilations! (A constant failure is better than an intermittent one!!)
D. Try to optimize the placement of the logic, or add additional register-to-register stages if there are pins in different banks. This gives the signals time to propagate to their destination pins.

To solve my problem, I put each controller in its own LogicLock region and ran a memtest program that I wrote. The memtest program compares the data read against the data written, and it found specific bits that were failing. I then found the FPGA pin each failing bit was mapped to and moved the LogicLock region closer to that pin. And magically, everything works!

So now I am running stress tests on the memory to make sure it is rock solid, while I get ready to add in the rest of my design!

Thanks again. 

:D:D:D