Define internal signal as a base clock?

Altera_Forum · ‎02-20-2008

I have a problem performing timing analysis on a design. I'm using the Classic timing analyser.

Suppose I have a device which has many (low speed) inputs on separate physical pins, and I need to be able to select any one of them to use as the clock for some downstream logic. The selection between them is made by an external CPU writing a register over a separate, high speed, synchronous interface.

The code might look like this:

PROCESS (everything...)
BEGIN
    -- CPU write to the clock mux select register
    IF clk_cpu'event AND clk_cpu = '1' THEN
        IF addr = clk_mux_addr THEN
            clk_mux_setting <= data;
        END IF;
    END IF;

    -- Combinational clock mux, outside the clocked IF
    clk_logic <= clk_array (clk_mux_setting);
END PROCESS;

All the clocks in clk_array are asynchronous to each other and to the CPU. I know I'll get glitches in clk_logic when the CPU changes settings and this is OK.

Further on I then have:

PROCESS (clk_logic, clk_cpu)
BEGIN
    IF clk_logic'event AND clk_logic = '1' THEN
        result <= (stuff);
    END IF;

    IF clk_cpu'event AND clk_cpu = '1' THEN
        result_meta <= result;
        result_cpu <= result_meta;
    END IF;
END PROCESS;

I.e., I'm double sampling the result to get it back into the clk_cpu domain, and have no expectation at all of any relationship between clk_cpu and clk_logic.

However...

When running the timing analysis, Quartus picks up on the fact that clk_logic does depend on clk_cpu, because it depends on clk_mux_setting. Moreover, this is quite a long path, so the fitter is wasting effort and my Fmax is greatly reduced. What I see in the report is a setup time violation between result and result_meta.

This is a false path, of course, and I could just specify cut_timing_path between these two registers. But, the real code is considerably more complex, and it would mean that the timing constraints are closely tied in with the workings of the process.

So, what I really want to do is insist that clk_logic be treated as a base clock, i.e. as though it came in from an external pin, with its own Fmax and no implied relationship to any other clock.

Is this possible without actually routing it out to a physical pin and back in again through another pin?

Altera_Forum · ‎02-20-2008

You can route the signal to an output pin and back in through another pin, but not internally.

You have to route your signal to an output, then connect that output to an input (preferably a dedicated fast input or a global clock input) with an external wire.

It's a very annoying problem that I run into in almost every design with Quartus.

Sometimes I've had success putting a global buffer on the output of the mux and then assigning it as a clock (after first cutting the timing path to the source of the mux).

The problem is that sometimes it works and sometimes it doesn't, and I can't understand why.

Altera_Forum · ‎02-20-2008

See my posts in the thread at http://www.alteraforum.com/forum/showthread.php?t=754 for things you need to be aware of if you are going to drive a clock with logic resources. The things I say in that thread about divided-down clocks also apply to clocks driven by a multiplexer.

Some device families have clock control blocks, which are dedicated silicon configured with the altclkctrl megafunction to perform functions like clock gating and clock multiplexing. For a clock multiplexer implemented in logic resources instead of in a clock control block, make sure that each LUT or ALUT has at most one input toggling at a given time. There is more about that below. If your mux has a LUT with more than one clock input toggling, the output can glitch even while the select input is constantly selecting only one of those clock inputs. You can use "keep" synthesis attributes, instantiate LCELL primitives, or use WYSIWYG primitives to control how the muxing logic is broken down into individual LUTs or ALUTs.

The Classic Timing Analyzer is limited in its support for clock muxes. It would be best to use TimeQuest.

I doubt that your process with two clocks in the sensitivity list will synthesize correctly. Each process should have a single clock. When using metastability registers to cross between clock domains, the registers for each domain should be in their own process.
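As a rough sketch of the one-clock-per-process advice, the example from the first post could be split as below. The entity name, the port list, and the `stuff` input (standing in for the elided "(stuff)" expression) are assumptions made only so the fragment is self-contained.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity cdc_example is
    port (
        clk_logic  : in  std_logic;
        clk_cpu    : in  std_logic;
        stuff      : in  std_logic;  -- hypothetical stand-in for "(stuff)"
        result_cpu : out std_logic
    );
end entity cdc_example;

architecture rtl of cdc_example is
    signal result      : std_logic;
    signal result_meta : std_logic;
begin
    -- clk_logic domain: the only process clocked by clk_logic
    logic_domain : process (clk_logic)
    begin
        if rising_edge(clk_logic) then
            result <= stuff;
        end if;
    end process logic_domain;

    -- clk_cpu domain: two-stage synchronizer back into the CPU domain
    cpu_domain : process (clk_cpu)
    begin
        if rising_edge(clk_cpu) then
            result_meta <= result;
            result_cpu  <= result_meta;
        end if;
    end process cpu_domain;
end architecture rtl;
```

The behaviour is intended to match the original two-clock process; only the partitioning changes, so each register has exactly one clock that the synthesis tool has to infer.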

Here is what someone else wrote about having at most one input to the mux toggle at once:

--- Quote Start ---

Make sure each clock is gated prior to the mux to prevent glitches on the output. So:

clkA ----|

AND --- LCELL

cntrlA ---|

clkB ----|

AND----LCELL

cntrlB ---|

The outputs of the LCELLs will feed the mux, and only the active clock will be toggling when it hits the mux as it will be the only enabled clock. The mapper will collapse the “extra” LCELL into the AND, but will not collapse the AND functionality into the MUX which is what you want.

--- Quote End ---
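The gating scheme in the quote might be written in VHDL roughly as follows. This is only a sketch: it uses "keep" synthesis attributes instead of instantiated LCELL primitives, it assumes one-hot enables (cntrl_a and cntrl_b never active together), and it combines the gated clocks with an OR in place of a mux. The entity and signal names are made up for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity gated_clk_mux is
    port (
        clk_a   : in  std_logic;
        clk_b   : in  std_logic;
        cntrl_a : in  std_logic;  -- assumed one-hot with cntrl_b
        cntrl_b : in  std_logic;
        clk_out : out std_logic
    );
end entity gated_clk_mux;

architecture rtl of gated_clk_mux is
    signal gated_a : std_logic;
    signal gated_b : std_logic;

    -- Ask synthesis to keep each gated clock as its own node, so the
    -- AND gates are not merged into the combining logic.
    attribute keep : boolean;
    attribute keep of gated_a : signal is true;
    attribute keep of gated_b : signal is true;
begin
    -- Each LUT here sees one clock and one static enable
    gated_a <= clk_a and cntrl_a;
    gated_b <= clk_b and cntrl_b;

    -- With one-hot enables, at most one input of this stage toggles
    clk_out <= gated_a or gated_b;
end architecture rtl;
```

This does nothing about glitches at the moment the enables change; it only addresses the case where the selection is held constant. Whether the fitter really keeps the gates in separate LUTs should be checked in the post-fit netlist.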

Altera_Forum · ‎02-20-2008

Thanks for the tips guys, much appreciated :) These are the sorts of things that are hard to pick up when you have to teach yourself VHDL!

I'm reasonably happy that my derived clock, and the mechanism I'm using for passing signals between clock domains, is robust - it has been shipping in volume for several years now. The code fragment above is just an example to illustrate a point, of course - the real code is much more complicated.

If you're certain that TimeQuest would be able to better understand what I'm trying to achieve, then maybe the time has come (if you'll pardon the expression) for me to try it out. I just don't want to spend ages learning a new tool only to be confronted with a new error message that means the same thing as the old one!

Altera_Forum · ‎02-20-2008

I understand you, Andy; it's the same for me :)

TimeQuest seems very nice IMO, but you have to specify a lot of things to make it work well, and at work you usually don't have that much time, especially when modifying an old design.

IMO it's better to start using TimeQuest on a new project, so that you can properly understand everything you have to specify.

Altera_Forum · ‎02-20-2008

Hello,

I think it's also possible with the Classic Timing Analyzer to cut the "false path" from the clk_cpu domain to the clk_logic domain. I achieved this with Cut Timing Path assignments, as suggested in the Quartus Handbook chapter on the Classic Timing Analyzer, under timing exceptions.

I first tried -from clk_mux_setting* -to *, but that didn't work. For some reason, the Timing Analyzer needs a separate assignment for each bit of clk_mux_setting[]. See below how the assignment was entered in the Assignment Editor. There are also other possibilities, e.g. cutting clk_cpu to result, but this would probably require a lot of signals to be handled explicitly.

This assignment effectively declares clk_mux_setting a static value that is unrelated to clk_cpu, accepting that you "get glitches in clk_logic when the CPU changes settings", as already stated.

Regarding the multiplexer issue raised by Brad, I think, this shouldn't cause any trouble as long as clk_mux_setting is constant, although the multiplexer is decomposed to multiple cascaded LUT levels. My argument is, that for a given combination of clk_mux_setting the logic is such, that the multiplexer output doesn't depend on any other clock input than the selected one. But I'm not absolutely sure of this conclusion, so if you can give an example suggesting this might be different, I would think anew.

Not as important, but I wouldn't expect the mixed process clocks to have any effect in synthesis. But they should be clearly separated for code clarity.

Regards,

Frank

Altera_Forum · ‎02-20-2008

Be careful with "Cut Timing Path" for clocks in the Classic Timing Analyzer. There was a case in the past where the behavior was changed. Neither the old nor the new behavior would be obvious if you didn't already know about it. I think it might have had to do with the same signal being handled as both a clock and a data signal, but I don't remember the details. It also seems like the case I knew about involved wildcards.

--- Quote Start ---

Regarding the multiplexer issue raised by Brad, I think, this shouldn't cause any trouble as long as clk_mux_setting is constant, although the multiplexer is decomposed to multiple cascaded LUT levels. My argument is, that for a given combination of clk_mux_setting the logic is such, that the multiplexer output doesn't depend on any other clock input than the selected one. But I'm not absolutely sure of this conclusion, so if you can give an example suggesting this might be different, I would think anew.

--- Quote End ---

A LUT output will not glitch for a single input toggling. If, for example, you use a LUT to gate a clock with clock_out = clock_in AND enable, then clock_out will not glitch while enable is inactive even though clock_in continues to toggle.

A LUT output might glitch if more than one input is toggling. Even if the second toggling input switches between two locations in the look-up-table RAM that have the same value for the output (making the input a logical don't-care), the output can glitch as the toggling input switches between those two locations. This means that even a 2:1 mux implemented in a single LUT can have glitches on the output while the mux select is held constant. This isn't a problem for a mux in a synchronous data path (the glitches settle out in time if there is positive slack for clock setup), but it is a risk for clocks and asynchronous signals (like an asynchronous reset). I do not know whether this has changed for the most recent device families, but it is the case for at least most device families before the newest ones.

This potential for glitches, besides the cautions in the other thread to which I referred AndyC_772, is yet another reason to be very careful when you have logic in a clock path. A lack of hardware failures in the field to date does not mean that there will not be a problem in the future. The odds of failure might be very low, but designs that require high reliability must be careful about the things I posted in the other thread. The already fielded units might have sufficiently covered the voltage and temperature contributions to the PVT variation. The process portion of variation could be different though every time product is shipped using devices from a new production lot.

Altera_Forum · ‎02-20-2008

Hello,

Regarding the multiplexer-in-LUT question, I have searched for related statements in Altera documents. So far I have found only one statement, which says that no glitches should be expected.

--- Quote Start ---

problem

Will the output of a 4-to-1 multiplexer implemented in an Altera FPGA glitch while the select lines are stable?

solution

No. If the select lines are stable while the other three inputs are changing, the silicon is designed such that the LUT output will not glitch. However, glitches may occur when the select lines are changing. http://www.altera.com/support/kdb/solutions/rd04202001_2747.html

--- Quote End ---

In contrast, for logic functions other than the one used in a multiplexer, a change of one input can cause an unexpected glitch due to differences in delay:

--- Quote Start ---

problem

Can a single FLEX look-up table (LUT) cause a glitch with one input switching?

solution

If a function fits into one LUT, there will not be a glitch on the output when any single input toggles. For example, a 2-to-1 multiplexer can be represented as: q = (a & sel) # (b & !sel). If a and b are both 1 and sel switches from 1 to 0, differences in the delay between the two gates could result in a 0 glitch on the output if the design were really implemented in gates. However, the FLEX LUT is designed so that this function will not result in a glitch on the output. A function composed of multiple LUTs may cause glitches on a combinatorial output because the delays between the LUTs may be different. To avoid glitches when using multiple LUTs, register the output of the combinatorial circuit. http://www.altera.com/support/kdb/solutions/789.html

--- Quote End ---

In my view, there is a clear structural difference between the two cases discussed here, which explains why glitches may occur in the latter case but not in the former. Unless other results are known, I see this as confirmation of my previous assumption regarding muxes in LUTs. Provided the above restriction is observed, chained muxes would also be safe, except for the additional clock delay, which is, however, under the Timing Analyzer's observation.

On the other hand, I agree that if LUT operation were similar to that of an asynchronous RAM, glitches should be expected. So I hope the quoted Altera statement is fully correct.

Regarding "Cut Timing Path"

--- Quote Start ---

There was a case in the past where the behavior was changed

--- Quote End ---

I have no reason to doubt this, but I'd be interested to know the details.

Regards,

Frank

Altera_Forum · ‎02-20-2008

I wonder whether the information at http://www.altera.com/support/kdb/solutions/rd04202001_2747.html is correct (I used the feedback form on that page to ask about that). In 2004 (before that on-line solution was last updated), an authoritative source at Altera said, "The chips themselves guarantee that when a single input to a look-up table changes, the output will not glitch." This statement applied to all 4-input-LUT-based device families up through at least the original Stratix plus MAX II.

The person who made that statement did not determine whether the same thing applied to ALUT families (Stratix II, Stratix III). Maybe http://www.altera.com/support/kdb/solutions/rd04202001_2747.html was supposed to say it was for ALUTs (not 4-input LUTs). That on-line solution is about a 4-to-1 multiplexer. For that size mux to fit in a single look-up table, it has to be in an ALUT.

Altera_Forum · ‎02-21-2008

--- Quote Start ---

I think it's also possible with Classic Timing Analyzer to cut the "false path" from clk_cpu to clk_logic domain. I achieved this through cut timing path assignments as suggested in Quartus Handbook, chapter Classic Timing Analyzer, timing exceptions.

I first tried -from clk_mux_setting* -to *, but that didn't work. For some reason, Timing Analyzer needs the assignment for each bit of clk_mux_setting[] separately. See below how the assignment was entered in assignment editor. There are also other possibilities, e.g. cut clk_cpu to result, but this would probably require a lot of signals to be handled explicitly.

--- Quote End ---

I like this idea a lot. Many of the registers in the design are effectively 'set and forget', so cutting the timing path from these registers to everything else makes perfect logical sense.

So...

This morning I tried putting all the clock control registers into an Assignment Group, and set up an assignment:

From: regs_that_control_clocks To: * Cut Timing Path On Enabled

...and it didn't work :( My design's Fmax remains low, and if I use 'Advanced List Paths' to show me details of the slowest path on the chip, it goes via one of my mux control registers.

Then I remembered an issue I've seen before, which IIRC came in with Quartus 6.0 - the same time, I think, as when 'Time Groups' got renamed to 'Assignment Groups'.

Quite simply, assignments using Assignment Groups get ignored.

To test this theory, I tried setting up a whole bunch of separate assignments, each of the form:

From: clk_mux[1] To: * Cut Timing Path On Enabled

...and now it works fine :D

The downside is, of course, I end up with lots more assignments, which will be hard to manage.

Has anyone else come across this problem with Assignment Groups?

Altera_Forum · ‎02-21-2008

Followup:

With all the 'cut timing path' assignments, I no longer get setup time errors, and the Fmax for my design is reasonable.

But...

I still get Hold time violations for a lot of the registers which are clocked by clk_logic, which is odd because I've forced that signal to use one of the global clock nets. Skew between registers really shouldn't be a problem - and looking at the deeper analysis of the paths that are failing, it actually isn't.

What seems to be happening is this:

In the real code, as well as a multiplexer to select between input clocks, I also have a programmable divider associated with each clock input. So, there are multiple paths possible between each clock pin and clk_logic, depending on how the divider is programmed.

The timing analyser is - erroneously, I think - considering the shortest possible path through that divider and the longest possible, and coming to the conclusion that there's a skew problem. It doesn't seem to notice that, whatever the path through the divider, the source and destination registers are in fact driven off the exact same global signal.

I could just ignore the hold time violation errors, but there's a lot of them and it's always possible that there's a real problem hiding amongst them. The Fitter will also be misguided.

Driving my derived clocks out to external pins and back in again is starting to feel like an increasingly attractive option, but it seems a really ugly hack. If it pushes me over into a bigger package then I'll be upset!

Any ideas please guys?

Altera_Forum · ‎02-21-2008

--- Quote Start ---

The timing analyser is - erroneously, I think - considering the shortest possible path through that divider and the longest possible, and coming to the conclusion that there's a skew problem. It doesn't seem to notice that, whatever the path through the divider, the source and destination registers are in fact driven off the exact same global signal.

--- Quote End ---

The timing analyzer is probably doing the correct analysis. If there is more than one possible path for the clock (for example, more than one possible selection for a clock mux), the timing analyzer has to assume that one path is in effect at the time of the launching edge and another path is in effect at the time of the latching edge. The shortest and longest paths will be used in whichever combination is appropriate for clock setup and for clock hold. Most designs care about the timing only when the same clock path is selected for both launching and latching edges, but you have to tell the timing analyzer that this is the case.

This is one of the areas where TimeQuest has an advantage over the Classic Timing Analyzer. TimeQuest makes the same conservative assumption as the Classic Timing Analyzer about there possibly being different clock paths selected at the times of the launching and latching edges, but the SDC commands used by TimeQuest give you lots of control to tell the analyzer how the design actually works.

Altera_Forum · ‎02-21-2008

I'm sure you're right; if the divider were indeed reprogrammed between launch and latch edges then the paths it's using would indeed be correct. I'm 99% sure that I have 'Cut Timing Path' assignments on the register that controls the divider too, though.

I did have a play with TimeQuest yesterday, but what put me off was the need to edit the SDC file by hand; whilst I've no doubt that many designers are familiar with setting constraints in this way from other tools, I'm not. (I don't think I've read a manual in 10 years... seems a shame to have to start now!!)

Altera_Forum · ‎02-21-2008

--- Quote Start ---

I did have a play with TimeQuest yesterday, but what put me off was the need to edit the SDC file by hand...

--- Quote End ---

Once you have your initial SDC file, you do need to edit it to make changes. (Don't make the mistake some people do of making changes in the TimeQuest GUI and thinking write_sdc will then update their original SDC file with the changes. That's not the correct use of write_sdc.)

You still have GUI assistance for SDC changes. If you use the text editor, use "Edit --> Insert Constraint" to get to dialog boxes that will create the SDC commands and place them in the file. If you are editing a constraint that is already in the file, hover the mouse over the SDC command name to get a tooltip with a list of arguments. If you create constraints in the TimeQuest GUI, then copy them from the Console or History tab of the Console window to your SDC file after you are satisfied with them.

Altera_Forum · ‎02-21-2008

Could it be an option to resynchronize each divided clock to its respective input clock, so that there is only one clock path from the input? The clock divider control path should be cut anyway.
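One possible reading of this suggestion, sketched under assumptions: the divider counts edges of its input clock and the divided clock is taken from a single register clocked by that input, so the path from the clock pin to the divided clock is the same whatever the divider setting. The entity name, the 8-bit width, and the `half_period` port are hypothetical and not taken from the real design.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity sync_divider is
    port (
        clk_in      : in  std_logic;
        -- hypothetical control value: input cycles per half period, minus one
        half_period : in  unsigned(7 downto 0);
        clk_div     : out std_logic
    );
end entity sync_divider;

architecture rtl of sync_divider is
    signal count   : unsigned(7 downto 0) := (others => '0');
    signal div_reg : std_logic := '0';
begin
    process (clk_in)
    begin
        if rising_edge(clk_in) then
            if count >= half_period then
                -- terminal count: restart and toggle the output register
                count   <= (others => '0');
                div_reg <= not div_reg;
            else
                count <= count + 1;
            end if;
        end if;
    end process;

    -- The divided clock always comes from the same register output
    clk_div <= div_reg;
end architecture rtl;
```

In this sketch the output period is 2 x (half_period + 1) input cycles, so the smallest ratio is divide-by-2; a divide-by-1 bypass would reintroduce a second clock path. The `half_period` register would still need its timing path cut, as noted above.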

Altera_Forum · ‎04-16-2008

I filed a service request asking that both on-line solutions found by Frank be checked for correctness (see post# 8 in this thread from FvM). Neither one is available now. I suspect they were removed because of my service request. I think the one for a 4-to-1 multiplexer was incorrect, and the one for FLEX devices needed to be updated for all relevant device families.

Altera_Forum · ‎04-16-2008

I guess maybe an update is in order...

I did eventually get to the bottom of things. There's no need to use TimeQuest - and in fact, I doubt it would solve the problem.

In my design, I have a bank of clocks, each of which can be sourced from numerous places under the control of a set of registers. One of these registers was getting optimised out - in name, at least - during synthesis, so the 'cut timing path' assignment associated with it was getting ignored.

So, the problem - and it's one that's caused me no end of grief in simulation too - is that during synthesis, not all the net names specified in the design are preserved. For example, if I have two components, joined together at the top level, then a signal will have at least three aliases: its name at the source (i.e. the original assignment, in one component), another name at the top level, and another where it's an input to the second component.

So, assignments made to the top level signal name, or inputs to the second component, get lost and ignored. Worse still is that the name a signal does get in the netlist isn't necessarily one from my own source files at all, but from an inferred Altera library - so there's no way to even know what a net will end up called until after synthesis is complete and the result can be analysed.

Altera_Forum · ‎04-16-2008

Your complaints are common. The general guidance is to use register names instead of combinational node names where possible in assignments. In HDL synthesis attributes, use "preserve" for registers and "keep" for combinational nodes that you want to be sure are available for assignments or simulation. Use wildcards as well: they often allow you to use names taken from the RTL (unless it is a node inside a megafunction or other IP block), with the wildcards representing the portions of the names that are created by Quartus. Be careful not to create false matches with the wildcards. The node name used by Quartus will always be the name associated with the driver of the node unless you are using the get_pins collection in TimeQuest (and maybe for the Quartus native simulator--don't remember about that).
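For illustration, the "preserve" and "keep" attributes could be applied in VHDL roughly as follows. The entity, the ports, and the `sel_zero_comb` node are hypothetical; only the clk_mux_setting name comes from this thread.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity clk_ctrl_regs is
    port (
        clk_cpu     : in  std_logic;
        wr_en       : in  std_logic;
        data        : in  std_logic_vector(1 downto 0);
        sel_out     : out std_logic_vector(1 downto 0);
        sel_is_zero : out std_logic
    );
end entity clk_ctrl_regs;

architecture rtl of clk_ctrl_regs is
    signal clk_mux_setting : std_logic_vector(1 downto 0);
    signal sel_zero_comb   : std_logic;

    -- "preserve" for a register that timing assignments refer to by name
    attribute preserve : boolean;
    attribute preserve of clk_mux_setting : signal is true;

    -- "keep" for a combinational node that should stay visible
    attribute keep : boolean;
    attribute keep of sel_zero_comb : signal is true;
begin
    process (clk_cpu)
    begin
        if rising_edge(clk_cpu) then
            if wr_en = '1' then
                clk_mux_setting <= data;
            end if;
        end if;
    end process;

    sel_zero_comb <= '1' when clk_mux_setting = "00" else '0';

    sel_out     <= clk_mux_setting;
    sel_is_zero <= sel_zero_comb;
end architecture rtl;
```

With the register preserved, a per-bit Cut Timing Path assignment such as the clk_mux[1] one shown earlier in the thread has a stable name to match. It is still worth confirming the final name in the Node Finder after synthesis.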

In the Node Finder, I am usually able to find the names I want and use them successfully in assignments if I use the "Design Entry (all names)" filter. I sometimes use a filter for just pins or just registers for convenience. Once in a great while it is necessary to use a particular filter like "Post-synthesis" or "Post-Compilation" to get the name needed by an assignment, or I have to be careful because a register and a combinational node have very similar names.