Constrain delay of internal paths

Altera_Forum · ‎01-03-2013

Hi All,

Here is my clocking setup. The main clock comes in on a pin and goes into a 1-to-1 PLL. The PLL output is tapped off to local routing, and from that routing it fans out in a tree: one branch goes to a global buffer, and three branches go to LABs that gate the clock, whose outputs then run to global buffers. Data is passed between these clock domains, and on many synthesis runs I get failing paths that include as much as 3 ns of difference in delay between one gated clock and another, both of which are on global buffers. Other times the fitter matches the paths much more closely, within a few hundred ps. The added delay is in the routing from the first global buffer to the clock gate, then from the clock gate to the next global buffer. Additionally, I've been turning off optimize_hold_timing so that the tool shows me my worst paths and I can address them, rather than having it optimize them out and leave less critical paths as the worst-case failures. I'd like to fix the delay between these gated clocks as closely as possible, say within 100 ps of each other, so that when I disable optimize_hold_timing it shows me all the other issues.

Stated another way, in pseudocode I have

wire pllout;
wire sysclk;
wire gclk1;
wire cntrl1;

assign sysclk = pllout;
assign gclk1 = cntrl1 & pllout;

How can I tell the fitter to match the delay between pllout and sysclk to the delay between pllout and gclk1? Or how can I constrain them both to have the same min and max?

Thanks a lot,

Steve

Altera_Forum · ‎01-03-2013

Your ungated domain, which comes directly from the PLL, will just have the global clock tree for its delay. The gated clocks will have this original global clock tree (PLLs ALWAYS drive a global), then get off it, go through a LUT, and go onto another global. So those delays will be much longer. A few ideas:

- If the gated clocks don't match, you might need to lock down the LUT location and the global buffer it drives so they're right next to each other. I thought the fitter would automatically do that, so only do it if you don't see it happening.

- You could make your "non-gated" clock look like it's gated, i.e. have it go through a LUT. Just have it go through a wire with a keep attribute on it. That will then make it go through the same type of long path that the three gated clocks are going through.

- Ideally, get rid of the gated clocks. You could:

a) Create clock enables that drive all the logic, rather than gate the clock.

b) Instantiate the altclkctrl block multiple times. So basically your PLL would have 4 outputs, all identical, but they would each drive a different altclkctrl megafunction, and you would then use the enable on that block to turn the clock on and off. This would put them all onto globals driven directly by the PLL, thereby matching skews.
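A minimal sketch of option (a), assuming a single register and reusing the sysclk and cntrl1 names from the pseudocode above. The module name and the d/q ports are made up for illustration and are not taken from the real design:

```verilog
// Option (a): keep one free-running clock on a global and express the
// gating as a synchronous clock enable instead of a gated clock.
// ce_reg, d and q are hypothetical names used only for this sketch.
module ce_reg (
    input  wire sysclk,  // ungated clock, straight from the PLL
    input  wire cntrl1,  // former gate control, now used as an enable
    input  wire d,
    output reg  q
);
    always @(posedge sysclk) begin
        if (cntrl1)
            q <= d;      // updates only on edges the gated clock would have passed
    end
endmodule
```

With this form every register sees the same clock tree, so there is no clock skew between "gated" and ungated domains to match.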

Altera_Forum · ‎01-03-2013

Thanks Rysc,

I already have the keep attribute on the ungated clock, so I don't have an uneven number of global clock buffers causing the delay mismatch. Removing the clock gating is not desirable; the design is emulating an ASIC, so we want to keep alterations to a minimum. I forced all four onto the same quadrant of global buffers, and that helped a lot. Before that, the fitter had the cluster of logic in one spot that was not the chip center and was using global buffers in different quadrants. I might try your altclkctrl suggestion as well, as it might get an even closer match. Unfortunately, I think if I need more than these four, I'll need to use another quadrant, which means I'll need to control the LUT location as well and not just depend on the P&R to shove them all close to each other. I'd rather not have to figure out how to manually control placement...

Altera_Forum · ‎01-03-2013

Put the node name in the To column in the assignment editor, choose Location for assignment type, and put in an X/Y co-ordinate, like "X1_Y1". If you want to find the co-ordinates, locate them in the Chip Planner. (Wherever you point, the x/y co-ordinates are shown in the top-left)

Altera_Forum · ‎01-03-2013

I'm using a command line flow, due to tying into other CAD tools, so would it be something like

set_location_assignment -to clk_control|cntrl1|outnode|combout X1_Y1

I'm not really sure what the format after -to should be. I suppose I can use the GUI to do a test and see what it creates in the .qsf.

Thanks again

Altera_Forum · ‎01-04-2013

ssnyde27,

- Your clock gating logic is susceptible to producing clock glitches when you gate/ungate the clock. You should use a glitch free gating scheme.

- Modern ASIC synthesis tools can infer glitch free clock gating from clock enables.

- Quartus has the ability to convert clock gating into clock enables during synthesis.

Option 1: You can re-write your HDL using clock enables; then turn on automatic clock gating in your ASIC synthesis tool. This is, hands down, my preferred option if I have a modern ASIC synthesis tool to work with.

Option 2: Manually use glitch free clock gating modules in your design; for FPGA tests, provide a clock gating module that meets Quartus conventions and enable automated gated clock conversion.

Quartus convention for gated clocks, suitable for conversion.

http://quartushelp.altera.com/10.1/mergedprojects/verify/da/comp_file_rules_clock.htm

Altera_Forum · ‎01-04-2013

Thanks, the code I posted was just an example of the clock tree structure. I'm not actually implementing the gate that way; the gate control signal is latched on the opposite edge from the gated clock's active edge, to prevent glitches. How does the clock gate conversion behave if the register already has an enable? I read somewhere that it only works if the enable is available, and I know many registers have an enable...
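For reference, a minimal sketch of that kind of gate, assuming a rising-edge active clock. The module and signal names are illustrative only, not the ones in my design:

```verilog
// Glitch-free AND gate: the enable is captured on the falling edge, so it
// can only change while clk is low and the AND output is already low.
// clk_gate, en and en_n are hypothetical names for this sketch.
module clk_gate (
    input  wire clk,   // source clock (rising edge is the active edge)
    input  wire en,    // gate control from the control logic
    output wire gclk   // gated clock
);
    reg en_n;

    always @(negedge clk)
        en_n <= en;    // update the enable on the opposite (falling) edge

    assign gclk = clk & en_n;
endmodule
```

Because en_n only toggles shortly after a falling edge, turning the clock on or off never truncates a high pulse.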

I've implemented the clock control blocks using altclkctrl. I think it's working but I'm getting a troubling warning.

Info (176449): Merged following Clock Control Blocks

Warning (176704): CLKCTRL has less registers on the enable (ena) path for some destinations, and consequently, it may have a slightly different behavior than expected

I removed the node names that typically follow "CLKCTRL". The design is on a Stratix IV. I have my clock coming in on a dedicated clock pin driving GCLK11. This drives PLL_T1, which the datasheet states can drive GCLK12-15. The PLL C0 output is GCLK12, then the three altclkctrl outputs are GCLK13-15. I forced them onto these nets with GLOBAL_SIGNAL and GLOBAL_SIGNAL_CLKCTRL_LOCATION assignments. I have a C1 PLL output with 2x frequency driving an RCLK. The datasheet states that each GCLK has its own CLKCTRL, so I'm wondering what this merge is all about. Is there an issue with picking up all the CLKCTRL inputs from the same PLL output? The three altclkctrl blocks have separate enables, so I don't want them being replaced by a single gate...

Thanks in advance for your help folks.

Altera_Forum · ‎01-04-2013

I've never seen that message, so can't help much. I did see one other case with this where they were talking about it merging two clkctrl blocks in serial, not in parallel. Not sure if you removed the names, but maybe that's something to look into. Probably worth looking at the RTL viewer and Technology Map Viewer to get a good handle on what's going on. Good luck.

Altera_Forum · ‎01-04-2013

--- Quote Start ---

How does the clock gate conversion play if the register already has an enable? I read somewhere that it only works if the enable is available, and I know many registers have an enable...

--- Quote End ---

I don't actually know. Maybe Quartus can merge the clock enable produced by gated clock conversion with the clock enables from your code. Or maybe not. You'll have to test it.

--- Quote Start ---

I've implemented the clock control blocks using altclkctrl. I think it's working but I'm getting a troubling warning.

Info (176449): Merged following Clock Control Blocks

Warning (176704): CLKCTRL has less registers on the enable (ena) path for some destinations, and consequently, it may have a slightly different behavior than expected

--- Quote End ---

It looks like Quartus is replacing your multiple clock control blocks with a single one and using clock enables to achieve the desired behavior. The warning says that the behavior may be slightly different, but I suspect it just means timing may be slightly different.