Tco Constraints

Altera_Forum · 10-15-2008

Hi,

I am working on an old Flex10k design which has quite a few timing problems that I am in the process of fixing.

At the moment I have quite a few Tco warnings due to constraints put in place by the original designer. For all but one of the warnings I can explain why there is an issue: it seems the constraints were set too aggressively, and given the way the code is written (in some cases with a lot of combinational logic), Quartus cannot meet the requirement.

However, one warning remains which I cannot explain, and I am hoping somebody can give me a clue. In the Assignment Editor the constraint has the From field left blank (does this mean the same as *?), the To field contains an internal reset signal which is activated on a clock edge, and the time requirement is 14 ns.

Quartus then reports an error 'from' this signal 'to' an output pin with a slack of -1.1 ns.

I have a feeling that the constraint is not being correctly applied, but then again Quartus does not ignore the constraint or produce a warning.

What I understand a Tco constraint does is that it specifies the maximum acceptable time for a signal at the input to a register on a clock edge to appear at the output pin. Is this what it actually specifies?

Therefore, if someone specifies an internal signal in the 'To' field of the constraint (as opposed to an output pin), I don't understand why an error is not produced by Quartus and why this causes the warning mentioned above, which doesn't 'seem' to relate directly to the constraint (as the output in question was not constrained).

I'd be grateful if someone could give me some ideas as to why I see this.

Also, as I mentioned at the start of the post, it seems that some of the Tco constraints have been set too aggressively. Are there any guidelines for placing Tco constraints that I should be aware of if I want to verify or change existing constraints?

Many thanks for your help

Altera_Forum · 10-15-2008

--- Quote Start ---

What I understand a Tco constraint does is that it specifies the maximum acceptable time for a signal at the input to a register on a clock edge to appear at the output pin. Is this what it actually specifies?

--- Quote End ---

Tco is the time from a clock edge to the output of the register actually changing. In your case the delay between a clock edge and your reset signal being activated. I would expect your clock signal to be in the "from" field.

--- Quote Start ---

I don't understand why an error is not produced by Quartus and why this causes the warning mentioned above

--- Quote End ---

Quartus sometimes just ignores invalid constraints and assignments. Is there a section in the report or console where it says something like "found x invalid timing requirements" - if you find this and expand it you may find this assignment in there.

According to the Quartus help on tco "This time always represents an external pin-to-pin delay" so I think you're probably safe to remove the assignment in question - Quartus should handle the internal timing requirements based on the clock requirement.

Is there a tpd requirement associated with your warning message? Your man might have wanted the reset to propagate to the outputs in 14 ns.

--- Quote Start ---

Are there any guidelines for placing Tco constraints that I should be aware of if I want to verify or change existing constraints?

--- Quote End ---

I think it's just a case of looking at your design and working out the relationship between the signals on the board.

Altera_Forum · 10-15-2008

You didn't mention what type of constraint it was (or I missed it, or it's implied to be a Tco). Anyway, one thing about how the Assignment Editor works: there are multi-point assignments, which have a node in both the From and To columns, and single-point assignments, which are always listed in the To column, regardless of whether they are from or to anything. Sometimes this doesn't make sense, though. For example, if doing a Tsu constraint, you could do:

From -> pin_name

To -> internal_registers

So that all paths from pin_name to the internal_registers it feeds have this Tsu requirement. Or you could do a single-point assignment of:

To -> pin_name

In this case it's a single-point assignment, so the concept of from/to is ignored, and Quartus just knows you're putting a Tsu constraint on the pin pin_name.

Just pointing out that you have a single point assignment.

(And to get more confusing, different timing constraints work differently for single-point assignments. For example, cuts propagate forward, i.e. you're cutting the timing on all paths that the node fans out to, while for a single-point Multicycle assignment you're applying the multicycle to all paths leading into that node. This is all another reason why the Classic Timing Analyzer was abandoned for TimeQuest...)
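To make the direction concrete, here is a toy Python model, not a description of how Quartus works internally: combinational logic is collapsed away, and a hypothetical netlist maps each register to the registers it directly feeds, so every edge is one register-to-register timing path. Under that assumption, a single-point cut on a node covers the paths leaving it, while a single-point multicycle covers the paths arriving at it.

```python
# Toy model (assumption): each register maps to the registers it directly feeds.
# Combinational logic is collapsed, so every edge is one register-to-register path.
Netlist = dict[str, list[str]]


def single_point_cut(netlist: Netlist, node: str) -> set[tuple[str, str]]:
    """A cut propagates forward: every path whose source is `node`."""
    return {(node, dst) for dst in netlist.get(node, [])}


def single_point_multicycle(netlist: Netlist, node: str) -> set[tuple[str, str]]:
    """A multicycle applies backward: every path whose destination is `node`."""
    return {(src, node) for src, dsts in netlist.items() if node in dsts}


# Hypothetical register names, for illustration only.
netlist = {
    "reg_a": ["reg_b", "reg_c"],
    "reg_e": ["reg_b"],
    "reg_b": ["reg_d"],
}
print(sorted(single_point_cut(netlist, "reg_b")))         # [('reg_b', 'reg_d')]
print(sorted(single_point_multicycle(netlist, "reg_b")))  # [('reg_a', 'reg_b'), ('reg_e', 'reg_b')]
```

The same node name selects a completely different set of paths depending on which constraint type is attached to it, which is why single-point assignments are easy to misread.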

Altera_Forum · 10-15-2008

Thank you for both replies.

Forgive me, but I'm still a little confused about the Tco constraint, as I always thought that it was the time from a clock edge until the signal appears at an output pin, as this was the description given in the Assignment Editor in Quartus. It wouldn't be the first time that I have misinterpreted it though. Also Quartus says that the time value specified in a Tco constraint "always represents an external pin to pin delay". What exactly does Altera mean by this: a register input to output pin or an input pin to output pin?

Is it possible to place a Tco between 2 signals and then the Tco could mean the time from the clk edge until the output appears at the register output?

The constraint in question is a single-point Tco constraint, it seems. This constraint has an internal reset signal in the 'To' field, so Quartus, if I understand correctly, will attempt to meet the Tco time specified from this signal to each output pin that it can affect. Am I correct on this?

So now the warning that I see in the timing report makes sense (I initially thought it seemed unrelated, but the original constraint did not appear in the ignored constraints list), Quartus seems to be saying that from this signal to one of the outputs that the reset can affect does not meet the timing constraint.

I have had a look at the code and can see nothing obvious that I can do to improve the timing, so I suppose I will just relax this constraint and see how this works out in the simulation later.

Many thanks.

Altera_Forum · 10-15-2008

Tco should be from pin to pin. If it started at some point inside the chip, then in general it would not be very useful. On one compile the delay to that arbitrary point could be 1ns, on the next compile it could be 10ns, and the time your data comes out would be all over the place, yet Quartus would report the same Tco since it ignored the variance up to that point.
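A rough sketch of why the measurement has to start at the pin: a pin-to-pin Tco adds the clock's delay from its input pin to the register, the register's own clock-to-output, and the routing from the register to the output pin. The delay values below are made up for illustration and are not Flex10K figures; `tco_from_register` is a hypothetical "start inside the chip" measurement used only for comparison.

```python
def pin_to_pin_tco(clk_pin_to_reg_ns: float, micro_tco_ns: float, reg_to_pin_ns: float) -> float:
    """Clock input pin -> register clock, through the register, out to the output pin."""
    return clk_pin_to_reg_ns + micro_tco_ns + reg_to_pin_ns


def tco_from_register(micro_tco_ns: float, reg_to_pin_ns: float) -> float:
    """Hypothetical measurement starting inside the chip: ignores the clock's routing delay."""
    return micro_tco_ns + reg_to_pin_ns


# Two hypothetical compiles where only the internal clock routing changed (values in ns).
compiles = {"compile_1": 2.0, "compile_2": 5.0}
for name, clk_delay in compiles.items():
    print(name,
          "pin-to-pin:", pin_to_pin_tco(clk_delay, 1.0, 6.0),
          "from register:", tco_from_register(1.0, 6.0))
```

The register-based figure stays at 7.0 ns for both compiles while the time the data actually leaves the chip moves from 9.0 ns to 12.0 ns, which is the variance described above.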

That being said, I'm not sure exactly what's being analyzed in your case. Right-click on it in the Tco report and do a List Path. You should see a message added to the window below, which can be expanded to get more detail. This should tell you everything that it's analyzing, in much more detail. I can't tell you what the original designer's intent was, which is the difficult problem you need to tackle.

Also, I think a Tco can be an asynchronous reset, i.e. if a reset comes in, goes through the register and to the output pin, much like a clock would.

Altera_Forum · 10-15-2008

--- Quote Start ---

Also Quartus says that the time value specified in a Tco constraint "always represents an external pin to pin delay". What exactly does Altera mean by this: a register input to output pin or an input pin to output pin?

--- Quote End ---

As I understood this, you have a clock somewhere in your FPGA which ultimately comes from an input pin.

--- Quote Start ---

The constraint in question is a single-point Tco constraint, it seems. This constraint has an internal reset signal in the 'To' field, so Quartus, if I understand correctly, will attempt to meet the Tco time specified from this signal to each output pin that it can affect. Am I correct on this?

--- Quote End ---

I'm not quite sure what your particular "From * To reset" constraint would be doing to be honest - I would have thought that it would get ignored because reset is not an output pin. I would be surprised and worried if Quartus was transforming that constraint into "From reset To everything_else_that_depends_on_reset". I'm not saying that's not what's going on, but if it is what's going on then it's very weird and illogical.

from Rysc:

--- Quote Start ---

Also, I think a Tco can be an asynchronous reset, i.e. if a reset comes in, goes through the register and to the output pin, much like a clock would.

--- Quote End ---

I had understood that Tco was strictly for a clock and to achieve this for a reset, one would have to use tpd - again I wouldn't swear blind to this and Rysc may well be right.

Basically I can see no reason why you would actually want to specify the tco for an internal signal - why would you want a particular register's output to change within x ns of the clock edge - you don't care how fast it changes as long as it's fast enough to meet the setup times of downstream logic and registers to meet the clock requirements. If it meets the clock requirements then why would you want to make it faster - you're just giving Quartus a harder job to do.
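Put as arithmetic, the only requirement a register-to-register path really has is the setup check: the source register's clock-to-output plus the logic delay plus the destination register's setup time must fit in the clock period, adjusted for clock skew. A minimal sketch with hypothetical numbers; the skew sign convention (destination clock arrival minus source clock arrival) is an assumption of this sketch.

```python
def setup_slack_ns(period_ns: float, micro_tco_ns: float, logic_ns: float,
                   micro_tsu_ns: float, clock_skew_ns: float = 0.0) -> float:
    """Positive slack means the path meets the clock requirement.

    clock_skew_ns: destination clock arrival minus source clock arrival (assumed convention).
    """
    arrival_ns = micro_tco_ns + logic_ns + micro_tsu_ns
    return (period_ns + clock_skew_ns) - arrival_ns


# Hypothetical 50 MHz path: 20 ns period, 1.0 ns tco, 15.0 ns of logic, 1.5 ns tsu.
print(setup_slack_ns(20.0, 1.0, 15.0, 1.5))  # 2.5
```

Once the slack is positive, forcing the source register's output to change even faster (e.g. with an extra internal Tco requirement) buys nothing; it only gives the fitter another target to chase.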

--- Quote Start ---

...so I suppose I will just relax this constraint and see how this works out in the simulation later.

--- Quote End ---

- I would say strip it out and check your simulation afterwards.

Frankly I think you're in a bloody difficult position - you've obviously had an undocumented design dumped on you which was done by somebody who may not have fully understood what all the constraints meant and so ended up over-constraining the design. You could chuck out all of the constraints and then look at the design yourself and work out from scratch what they should be - potentially risky and I'm not even sure that I'd want to attempt it myself but it depends on how confident you are of your abilities and how confident you are that the majority of the existing constraints aren't a load of guff.

Good luck - you really have my sympathy on this one.

Altera_Forum · 10-15-2008

Thanks for the sympathy Batfink. Yeah, to be honest it's an absolute nightmare. I'm still very much a beginner at this, but previous projects I have worked on were much more interesting and enjoyable. Needless to say I haven't really been enjoying my work lately, but thanks to yourself and Rysc, it's becoming a little easier.

From looking at the code so far, quite a few parts are badly done, and it looks the same with the way the design has been constrained. Quartus is ignoring at least 1000 multicycle constraints (and I haven't got to them yet!) and it seems in general that little thought was put into how the design was constrained.

Anyway, I'll keep plugging away bit by bit.

Thanks again for your help.

Altera_Forum · 10-16-2008

I think "Character Building" is the term

Altera_Forum · 10-16-2008

Hi,

I have seen a slow slew rate constraint on each of the output pins in the design. I suppose it's obvious what it does, and the Assignment Editor says that this setting will affect any Tco constraints which apply to the pins. What I am wondering is: would applying a slow slew rate constraint to each pin be necessary under normal circumstances? I have seen that by disabling this constraint I get huge improvements in the timing.

What is the danger in disabling them? Just that more noise may be generated and signals may be received incorrectly?

The design is used as a VME bus controller, so the interface with the local processor and the actual VME bus is asynchronous. Given that the interface is asynchronous, and precautions would be taken to ensure that all signals are read correctly to avoid metastability etc., would it be OK to remove these constraints? Maybe this is difficult for anyone to answer given that they are not familiar with the board etc., but any advice would be appreciated.

Just another query I have regarding the Tco constraints. Rysc said yesterday that a Tco constraint should really be from input pin to output pin, because if it starts at a register, the delay from the input pin to that register could change from one compile to the next. But if the input enters the FPGA and is clocked through various components before finally being clocked out, what is the best way to constrain this? Is it by constraining every path between clock cycles from the input pin to the output? In that case, would it be acceptable to place Tco constraints from register inputs to FPGA outputs?

One final thing on a slightly different topic:

When I look at the clock setup time warnings, Quartus reports the required longest P2P time and the actual longest P2P time. What is the difference between these? And why is the Fmax calculated using a different value?

For example one warning appears as follows:

Required setup relationship: 20 ns (50 MHz)

Required longest P2P Time: 18.8 ns

Actual Longest P2P Time: 19.8 ns

Actual Fmax: 47.62 MHz ( period = 21.000 ns )

Why does fmax have a period of 21ns and what is the difference in the required and actual P2P times?

Again many thanks for your help.

Altera_Forum · 10-16-2008

The required setup relationship is what you put in for the clock constraint. Required P2P is essentially the requirement between the registers, i.e. after clock skew is taken into account. Actual is how long that path took, and the actual Fmax is how fast that path could run according to the timing results. Right-click on a path, do a List Path, and analyze the detailed report, and most of this should come out. You won't fully understand timing analysis without relying heavily on List Path. The panels in the timing report are just summary-level info on each path.
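Working the numbers from the warning quoted earlier: assuming the gap between the 20 ns setup relationship and the 18.8 ns required P2P time is the allowance taken off for clock skew and the registers' own tco/tsu, a short Python sketch reproduces the reported Fmax. The function name is hypothetical; only the three input values come from the report.

```python
def decode_setup_warning(setup_relationship_ns: float, required_p2p_ns: float,
                         actual_p2p_ns: float) -> tuple[float, float, float]:
    # Slack: how far the actual register-to-register delay misses the requirement.
    slack_ns = required_p2p_ns - actual_p2p_ns
    # Assumption: everything outside the P2P delay (skew, micro tco/tsu) is the
    # difference between the setup relationship and the required P2P time.
    overhead_ns = setup_relationship_ns - required_p2p_ns
    # The period this path could actually run at, and the matching frequency.
    period_ns = actual_p2p_ns + overhead_ns
    fmax_mhz = 1000.0 / period_ns
    return slack_ns, period_ns, fmax_mhz


slack, period, fmax = decode_setup_warning(20.0, 18.8, 19.8)
print(f"slack = {slack:.1f} ns, period = {period:.3f} ns, Fmax = {fmax:.2f} MHz")
# slack = -1.0 ns, period = 21.000 ns, Fmax = 47.62 MHz
```

Put another way, the achievable period is simply the 20 ns requirement minus the (negative) slack, so a path that misses by 1 ns can only run with a 21 ns period.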

I believe a Tco can go through ripple clocks, i.e. where the incoming clock feeds a register, and that register's .Q pin feeds the .CLK port of the next register, and can ripple through multiple registers. It's not recommended, but it's doable. So if you have a Tco on the output pin, then you should see this full analysis. Again, analyze the List Path to see what is being analyzed and then determine if it's what you need, and if not, why?
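With a rippled clock, the Tco seen at the pin keeps growing with every register the clock passes through, since each stage adds its own clock-to-Q plus the routing to the next register's clock input. A minimal sketch with made-up delays; the stage structure is a simplification for illustration, not the exact breakdown List Path will show.

```python
def ripple_tco_ns(clk_pin_to_first_reg: float, ripple_stages: list[tuple[float, float]],
                  final_micro_tco: float, reg_to_pin: float) -> float:
    """ripple_stages: one (clock_to_q_ns, route_to_next_clk_ns) pair per rippling register."""
    total = clk_pin_to_first_reg
    for clock_to_q, route in ripple_stages:
        # Each rippling register's output becomes the next register's clock.
        total += clock_to_q + route
    return total + final_micro_tco + reg_to_pin


# Hypothetical: two rippling registers before the register that drives the pin.
print(ripple_tco_ns(2.0, [(1.0, 1.5), (1.0, 1.5)], 1.0, 6.0))  # 14.0
```

This accumulation, and the skew it introduces relative to the original clock, is one reason ripple clocks are not recommended.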

Altera_Forum · 10-16-2008

I may have misunderstood the Quartus literature when it says tco applies from an input pin to an output pin; but I had thought that the input pin is your clock and the output pin is obviously your signal output - i.e. tco is the time between the clock edge and the output changing. Ultimately the clock will come from an input pin - hence the Quartus literature saying it applies to input and output pins.

The slew rate limiting, as far as I know, is to limit the rise (and fall) times of outputs. Basically the edge slope is the critical parameter when you're considering signal integrity on the board - forget about the frequency of clocks etc. - it's the slope of the transitions on those signals that contains the high-frequency content which causes you problems. If you're not running signals at stupid MHz then you don't need them to transition that quickly, so if it's giving you grief you can limit the slew rate. Quite often this is done as standard to avoid signal integrity / EMC issues rather than finding you need it and then turning it on.
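A common rule of thumb from signal-integrity texts (not from the Altera documentation) makes the same point: the significant spectral content of an edge extends up to a "knee" frequency of roughly 0.5 divided by the 10-90% rise time, whatever the clock rate. A sketch with hypothetical rise times, not Flex10K datasheet values:

```python
def knee_frequency_mhz(rise_time_ns: float) -> float:
    """Rule of thumb: F_knee ~ 0.5 / t_rise.

    With t_rise in ns, 0.5 / t_rise is in GHz, i.e. 500 / t_rise MHz.
    """
    return 500.0 / rise_time_ns


for rise_ns in (1.0, 2.5, 5.0):  # hypothetical fast-slew vs slow-slew edges
    print(f"{rise_ns} ns edge -> ~{knee_frequency_mhz(rise_ns):.0f} MHz")
```

The clock frequency never appears in the formula: a 1 MHz signal with a 1 ns edge carries the same ~500 MHz content as a 50 MHz signal with the same edge.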

My advice would be to get a scope out and see if the signals are suffering with ringing, overshoot etc without the slew rate limit. If they're OK and they don't appear to be upsetting anything else then you're probably safe to leave the slew rate limit off.

Altera_Forum · 10-17-2008

Batfink, good idea. I am hoping to get around to looking at the outputs on a scope and comparing them with the 2 different programs. If no significant difference exists, I am hoping to disable the slow slew rate constraints.

Rysc, thanks for the advice, I hadn't been expanding the paths like that before. Doing so provides a wealth of information and it's much simpler to find where exactly in the design the problems exist.

I can understand and follow what Quartus is telling me, but there are some terms that I do not fully understand.

For example, I am looking at the worst-case setup warning, and under the "- longest register to register delay" section Quartus details 5 different sections that make up the path between the 2 registers: the 2 register nodes and 3 combinational nodes. For each one it gives the delay in the signal getting there, but it calculates the delay using the following formula:

Info: 2: + IC(4.500 ns) + CELL(1.400 ns) = 5.900 ns; Loc. = LC1_D30; Fanout = 54; COMB Node = 'vic068:vic068_inst_1|local_if:local_if_inst_1|lm_mux:lm_mux_inst_1|i_m_blt~53'

I don't understand what the IC delay refers to and what the CELL delay refers to, although they add up to 5.9 ns, which is the important part. But I'm just curious as to what the Classic Timing Analyzer is actually doing.

Again many thanks.

Altera_Forum · 10-17-2008

Digging through the Quartus help it would appear that IC is the interconnection delay and CELL is the cell (i.e. logic propagation) delay. Apparently the routing forms the majority of the delay in modern chips so this would seem to tie up.
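To see how much of a node's delay is routing, one could pull the IC and CELL figures out of the List Path messages. A minimal sketch that parses the message format quoted above, assuming every node line follows that same `IC(...) + CELL(...) = ...` pattern; the regex and function name are hypothetical, not part of any Quartus tooling.

```python
import re

# Pattern for the per-node lines in a List Path report (format as quoted above).
NODE_DELAY = re.compile(r"IC\(([\d.]+) ns\) \+ CELL\(([\d.]+) ns\) = ([\d.]+) ns")


def split_node_delay(line: str) -> tuple[float, float, float]:
    """Return (interconnect_ns, cell_ns, reported_total_ns) for one report line."""
    match = NODE_DELAY.search(line)
    if match is None:
        raise ValueError(f"no IC/CELL delay found in: {line!r}")
    ic, cell, total = (float(value) for value in match.groups())
    return ic, cell, total


line = ("Info: 2: + IC(4.500 ns) + CELL(1.400 ns) = 5.900 ns; Loc. = LC1_D30; Fanout = 54; "
        "COMB Node = 'vic068:vic068_inst_1|local_if:local_if_inst_1|lm_mux:lm_mux_inst_1|i_m_blt~53'")
ic, cell, total = split_node_delay(line)
print(f"routing share of this node: {ic / (ic + cell):.0%}")  # 76%
```

For this node roughly three quarters of the delay is interconnect, which fits the point that routing forms the majority of the delay; the fanout of 54 on the same line is a likely contributor.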

Understanding the routing delay and how Quartus has laid the chip out can help you squeeze a bit more out of the chip - e.g. where you can see a critical delay between two cells which are placed wide apart, you can use cliques to put them in the same row or LAB. You won't get that much gain by going in and looking at the routing in this way (a few ns in certain bits) but if you're right on the edge of meeting your timing constraints it can be a help.

Looking at the fan-out can help you make changes to your design - e.g. adding pipelining where you have multiple levels of combinatorial logic. If you have, say, an adder and a mux followed by another adder between two registers, and this appears to be the critical path, then stick a layer of registers between one of the adders and the mux. It uses more resources, but there is less combinatorial logic between any two registers and so the overall clock speed will increase.
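The pipelining argument in numbers: the minimum clock period is set by the slowest register-to-register stage plus the register overhead (tco + tsu), so splitting the adder-mux-adder chain shortens the worst stage. The delays below are hypothetical.

```python
def min_period_ns(stages: list[list[float]], register_overhead_ns: float) -> float:
    """Each stage is the list of combinational delays between two registers."""
    return max(sum(stage) for stage in stages) + register_overhead_ns


adder, mux = 6.0, 3.0  # hypothetical combinational delays in ns
overhead = 2.0         # hypothetical micro tco + micro tsu in ns

before = min_period_ns([[adder, mux, adder]], overhead)   # one long stage
after = min_period_ns([[adder], [mux, adder]], overhead)  # register inserted after the first adder
print(f"{before:.1f} ns -> {1000 / before:.1f} MHz")  # 17.0 ns -> 58.8 MHz
print(f"{after:.1f} ns -> {1000 / after:.1f} MHz")    # 11.0 ns -> 90.9 MHz
```

The cost is one extra clock cycle of latency on that path plus the extra registers, so the surrounding logic has to be adjusted to expect the result a cycle later.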

Also there are some settings in Quartus to automatically add (duplicate) cells (combinatorial logic) and registers to improve timing - from memory I think these are in the Fitter settings. Basically, by duplicating registers or cells, you can reduce the fanout and ease the routing delay.
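As a rough illustration of why duplication helps: if a net's routing delay grows with the number of loads it drives, splitting a register's or cell's fanout across copies shortens each copy's net. The linear delay model and its coefficients below are purely hypothetical; the fanout of 54 is borrowed from the List Path line quoted earlier in the thread.

```python
import math


def loads_per_copy(total_fanout: int, copies: int) -> int:
    """Worst-case number of loads each duplicated register or cell drives."""
    return math.ceil(total_fanout / copies)


def estimated_net_delay_ns(fanout: int, base_ns: float = 0.5, per_load_ns: float = 0.05) -> float:
    """Hypothetical linear model: a fixed routing cost plus a cost per load."""
    return base_ns + per_load_ns * fanout


for copies in (1, 2, 3):
    loads = loads_per_copy(54, copies)
    print(f"{copies} register(s): {loads} loads each, ~{estimated_net_delay_ns(loads):.2f} ns")
```

Real routing delay depends on placement and is not linear in fanout, so treat this only as a picture of the trend; the Fitter's own duplication settings make that trade-off with real timing data.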

Just a note of caution when you're looking at fast edges - use a fast scope with fast probes and don't have a long earth clip - take the earth clip off and wrap a short length of stiff bare-metal wire around the earth case of the probe - this will give you a very short earth which you can touch onto your circuit at a point (earth of course) very close to the point you are actually probing. This will give you a very low inductance earth connection and will give you a much better picture of what you're looking at. Check out oscillator outputs like this - with a standard earth clip you can get a nasty looking sine wave, with a low inductance earth you can get a nice square wave - the difference is quite surprising.

This sort of work is quite frustrating when you're doing it, but I do think that experience of bad designs and poor documentation from other people can make you a better engineer in the long run.

Good luck

Altera_Forum · 10-17-2008

Thanks for the reply Batfink.

I do not fully understand what you mean by clique, when you say "you can use cliques to put them in the same row or LAB".

It sounds like something I could try, as I'm on the edge in quite a few cases.

Thanks

Altera_Forum · 10-21-2008

Hi Ardni

Sorry for the late reply.

Having checked the help, cliques only work on ACEX 1K, FLEX 10K, FLEX 6000, or Mercury devices. You're using a Flex10k right?

Basically you can assign nodes of a design to a defined "clique" - from memory you have to give the clique a name and then assign nodes to it. Members of that clique are then placed in the same LAB or row (depending on what you set). E.g. define a new clique called "My_clique" with all its members being placed in the same LAB. Then where you have large delays between two nodes that aren't quite meeting your timing, add those nodes to the clique and Quartus will place them in the same LAB which will reduce their timing delay. If they are already in the same LAB before you start then don't bother - the clique won't make any difference.

Sorry it's a bit vague - I haven't done it in years.

I'm sure you should also just be able to make a location assignment and assign nodes to a certain row (or possibly even a particular LAB). Try fixing your critical logic elements that don't meet timing to the row right next to the device pin.

For anything you do you'll need to look at the floorplan to see where the delays are and also only change one thing at a time - it will have other knock on effects. From memory when I've done this I haven't had more than about five such assignments. If you change your design (source code) then you'll probably have to delete these assignments and start again.

It's not an easy solution and will only really be effective if you're pretty close on just a couple of paths. If you've got a screen full of timing failures then this approach probably won't help - you'll just waste a few days chasing your tail.

Good luck

Altera_Forum · 10-22-2008

Hi Batfink,

Thanks for the info. Yes, I'm using a Flex10K and I tried your suggestion. The results were positive. I tried placing certain nodes where I saw excess delay in the same LAB and I did see some savings, enough to make timing on certain paths.

I tried doing this at 50 MHz, where the design is right on the edge of making timing, but the main goal is to have this design working at 64 MHz. At the moment there are too many paths which do not make timing and, as you said, the knock-on effect is noticeable.

For this design to run at 64MHz, certain parts of the code will have to be re-done. Once this is achieved, if the design is much closer, I think using cliques could be very useful.

Just one question. You mentioned that cliques are only available on ACEX 1K, FLEX 10K, FLEX 6000, or Mercury devices, so I was just wondering how can this trick be implemented on the newer devices?

Also I was wondering why Quartus would not implement something like this automatically, to guide the fitter when it sees certain paths not making timing?

I'm sure that Quartus is doing something along those lines, but that there is a good explanation as to why some paths still fail.

Would this be an advantage of using a 3rd-party synthesis tool, that perhaps it would synthesize the design differently and obtain better results?

Anyway, although this particular project has been a real head-wrecker, I've certainly learned a lot of new stuff. I never would have known about cliques for sure and probably wouldn't have learned as much about timing issues, so once again thanks for all the help.

Altera_Forum · 10-22-2008

--- Quote Start ---

Just one question. You mentioned that cliques are only available on ACEX 1K, FLEX 10K, FLEX 6000, or Mercury devices, so I was just wondering how can this trick be implemented on the newer devices?

--- Quote End ---

To be honest I don't know. But if you were to try this on say a Cyclone (1,2 or 3) then you'd probably find that Quartus had no problem meeting the timing just because the device is faster.

--- Quote Start ---

Also I was wondering why Quartus would not implement something like this automatically, to guide the fitter when it sees certain paths not making timing?

--- Quote End ---

Quartus does try to place logic to meet timing but I think what you're doing here is possibly a bit more of a human kind of thinking and possibly not that easy to turn into an algorithm. Or of course it may be that Quartus does do this on newer devices which is why you don't have the option of using cliques.

--- Quote Start ---

Would this be an advantage of using a 3rd-party synthesis tool, that perhaps it would synthesize the design differently and obtain better results?

--- Quote End ---

Quite possibly. I used to use Leonardo (Mentor Graphics) in my previous job. I found that for the Cyclone designs that we were doing at the time, using Leonardo for synthesis gave better results than using the Quartus synthesis capability and, weirdly, better than Precision (Mentor's newer tool). Leonardo also seemed to give better results than Synplify.

Interestingly, at that time Quartus was better at the microscopic synthesis - i.e. if you carefully coded up exotic registers (clock enables, synchronous sets/clears, asynchronous sets/clears/loads) then Quartus was better than Leonardo. However, take a huge design and Leonardo was just better at optimising huge lumps of logic.

A word of caution here though - don't take any of this as a recommendation to buy that particular tool. There were a few designs where Synplify was better than Leonardo. Some Lattice designs that I worked on were better in Precision than in either Synplify or Leonardo. It's been a couple of years since I did any serious comparison and this may all have changed, and it may not be valid for the sort of designs that you're doing. Also I think Mentor have been winding Leonardo down and trying to replace it with Precision.

Try your design(s) and see. If you get in touch with the vendors of the various tools then they usually give you a trial licence.

--- Quote Start ---

Anyway, although this particular project has been a real head-wrecker, I've certainly learned a lot of new stuff. I never would have known about cliques for sure and probably wouldn't have learned as much about timing issues

--- Quote End ---

You've also learned how vitally important it is to document your designs properly! From what you've said, I seriously doubt your man would be able to explain all of his timing constraints himself. Although these sorts of jobs are a pain, I do believe they can make you a better engineer in the long run if you decide you don't want to leave a similar mess as part of your legacy.

Cheers

batfink

Altera_Forum · 10-22-2008

For the history, cliques were around because they worked very nicely into how the fitter worked. Around the time of APEX, a completely new fitting algorithm was implemented that cliques didn't tie into so nicely, and so they were disabled. (The algorithm was much, much better, consistently giving better results in shorter compile times, so the trade-off was well worth it.)

In fact, you might want to look at your fitter settings and see if there's some way to enable a different fitter (it's been many years, so I don't remember this).

Since then, LogicLock regions have been introduced, including auto-sized/floating regions, which are essentially a clique with more granularity as to how big it is. That being said, most of the time if a designer takes their critical paths or hierarchy and throws them into a floating LLR, the results are equal or worse. The reason is that the fitter is already aware of what's critical and doing a very good job at optimizing it, so just drawing a rectangle around it actually limits the fitter's effectiveness and the choices it can make, but doesn't really provide it any info.

The times I do see LogicLocking help performance is when the user does it more like a floorplanning tool, i.e. they put an LLR on one edge for the PCI core, which connects to another LLR which is the ingress hierarchy, etc. If the LogicLocking provides high-level layout information to the fitter, then it can help performance. (Not all the time, and I don't see huge gains, but it can help.)