Need Help to Verify Timing Constraints On Output Pins to External Device

Altera_Forum · ‎02-13-2015

Hi,

I'm relatively new to FPGA design, so forgive me if my question seems simple.

The project I am working on uses a Cyclone V to control four external DDS chips (AD9915), so I'm using FPGA pins to send data to the DDS chips. I wish to clock my design at 156.25 MHz (a period of 6.4 ns).

These DDS chips supply the 156.25 MHz clocks that I am using to clock my design in the FPGA. The DDS chip requires a setup time of 2 ns and hold time of 0 ns. (http://www.analog.com/static/imported-files/data_sheets/ad9915.pdf [pg 6])

I have read many references online that say a virtual clock has to be created for the 'set_output_delay' command to reference. Taking four pins from my design as an example, the SDC file looks like this. I have attached a register with a constant value to these output ports just to verify the SDC file, and I have also set the external device's setup and hold to 0 ns just to verify.

# These are the actual clocks coming into the FPGA from the DDS chips

create_clock -name {ddsclk1} -period 6.400 -waveform { 0.000 3.200 } [get_ports {dds_clk_1}]

create_clock -name {ddsclk2} -period 6.400 -waveform { 0.000 3.200 } [get_ports {dds_clk_2}]

create_clock -name {ddsclk3} -period 6.400 -waveform { 0.000 3.200 } [get_ports {dds_clk_3}]

create_clock -name {ddsclk4} -period 6.400 -waveform { 0.000 3.200 } [get_ports {dds_clk_4}]

# These are the virtual clocks I set up

create_clock -name {ddsclk1_ext} -period 6.400 -waveform { 0.000 3.200 }

create_clock -name {ddsclk2_ext} -period 6.400 -waveform { 0.000 3.200 }

create_clock -name {ddsclk3_ext} -period 6.400 -waveform { 0.000 3.200 }

create_clock -name {ddsclk4_ext} -period 6.400 -waveform { 0.000 3.200 }

# and then the setup time and hold time (I have set them to 0 just to verify)

set_output_delay -clock { ddsclk1_ext } -max 0.000 [get_ports {func_pin_1[0]}]

set_output_delay -clock { ddsclk1_ext } -max 0.000 [get_ports {func_pin_1[1]}]

set_output_delay -clock { ddsclk1_ext } -max 0.000 [get_ports {func_pin_1[2]}]

set_output_delay -clock { ddsclk1_ext } -max 0.000 [get_ports {func_pin_1[3]}]

# ... and the same thing for ddsclk2,3,4_ext for func_pin_2,3,4

set_output_delay -clock { ddsclk1_ext } -min 0.000 [get_ports {func_pin_1[0]}]

set_output_delay -clock { ddsclk1_ext } -min 0.000 [get_ports {func_pin_1[1]}]

set_output_delay -clock { ddsclk1_ext } -min 0.000 [get_ports {func_pin_1[2]}]

set_output_delay -clock { ddsclk1_ext } -min 0.000 [get_ports {func_pin_1[3]}]

# ... and the same thing for ddsclk2,3,4_ext for func_pin_2,3,4

I have also set false paths between the four clock domains. However, when I compile my very simple design and run TimeQuest, it reports that the setup slacks for ddsclk1,2,3,4_ext are negative. I find this strange: I would expect a Cyclone V to run much faster than 156.25 MHz, and my design is currently just a register driving the output port.
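For reference, the false paths between the four DDS clock domains mentioned above could also be written with set_clock_groups. This is only a sketch: it assumes the clock names from the SDC above and that the four DDS clocks really are unrelated to each other. Each virtual clock is kept in the same group as its real clock so that the output paths are still analyzed.

```tcl
# Sketch only: cut timing analysis between the four DDS clock domains.
# Each group pairs a real clock with its virtual clock, so paths from
# ddsclkN to ddsclkN_ext (the output pins) remain constrained.
set_clock_groups -asynchronous \
    -group {ddsclk1 ddsclk1_ext} \
    -group {ddsclk2 ddsclk2_ext} \
    -group {ddsclk3 ddsclk3_ext} \
    -group {ddsclk4 ddsclk4_ext}
```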

http://www.alteraforum.com/forum/attachment.php?attachmentid=10148&stc=1

So I suspect that I did not constrain the design properly. I hope someone can give their two cents on the way I did the timing constraints.

Thanks in advance!

Altera_Forum · ‎02-13-2015

Since you have set the output delays to zero, it means there is no IO requirement.

I believe the negative slack is on paths inside the FPGA, not on the IO.

Altera_Forum · ‎02-13-2015

0ns external delay doesn't mean no IO requirement, just that the external device isn't chewing up any of the requirement. I am guessing the setup relationship is 6.4ns, so you have 6.4ns to get out. Run:

report_timing -setup -to_clock ddsclk1_ext -npaths 20 -detail full_path -panel_name "s:ddsclk1_ext" -file "./TQ/ddsclk1_ext_setup.txt"

(You can do this by right-clicking on the summary row you listed above and going to report_timing, and modifying accordingly). My first guess is you're not using a PLL to shift the clock tree back, and without that it's easy to have a Tco greater than 6.4ns. It may be somewhat difficult to meet timing even with it, but I think it should work.

My second concern is whether your constraints are even correct. From the description, the DDS sends out a clock to the FPGA, the FPGA sends data back, which must reach the DDS in time to meet setup, and you want to do this in one clock cycle. If the period is 6.4ns and the external setup is 2ns, you need to get through the FPGA plus both board delays in 4.4ns. But that doesn't even account for how long it takes the DDS chip to send out its clock. If that's 2ns, then you're down to only 2.4ns for the FPGA and board delays.

The other option is that the FPGA sends the data back to the DDS along with a clock. I have no idea if that's even possible, but if it is, then you need to create source-synchronous timing constraints, i.e. you put a generated_clock on the output port sending a clock out, and your set_output_delay would use that clock for its -clock option.

Anyway, just throwing those out as a concern.
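A minimal sketch of the source-synchronous option described above, assuming it were possible on this board. The output clock port fwd_clk_1 is hypothetical, the generated clock is assumed to be the input clock passed straight through without division, and board skew is ignored. In a real design the -source would more likely be the PLL output or the clock pin of the output register.

```tcl
# Hypothetical ports: dds_clk_1 (clock in), fwd_clk_1 (clock sent back out
# alongside the data). Neither fwd_clk_1 nor this topology is in the
# original design; this only illustrates the constraint style.
create_clock -name ddsclk1 -period 6.400 [get_ports {dds_clk_1}]

# Generated clock on the output port that forwards the clock with the data.
create_generated_clock -name fwd_clk_1_out \
    -source [get_ports {dds_clk_1}] \
    [get_ports {fwd_clk_1}]

# Output delays now reference the forwarded clock, not a virtual clock.
# Device numbers from the thread: tSU = 2 ns, tH = 0 ns.
set_output_delay -clock fwd_clk_1_out -max 2.000 [get_ports {func_pin_1[*]}]
set_output_delay -clock fwd_clk_1_out -min 0.000 [get_ports {func_pin_1[*]}]
```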

Altera_Forum · ‎02-13-2015

--- Quote Start ---

0ns external delay doesn't mean no IO requirement, just that the external device isn't chewing up any of the requirement.

--- Quote End ---

It is not a 0ns external delay. It is a 0ns set_output_delay, which according to the SDC standard is defined as follows:

max value = tSU of external device + board delay

min value = negative of tH + external delay

So if tSU and tH are zero, what does that mean to you? If that isn't the case, then why does SDC give such definitions and equations?

Altera_Forum · ‎02-13-2015

If the device can't manage 6.4 ns for a design with just a few registers, then it could be that the IO register is not being used. Just a thought.

Altera_Forum · ‎02-13-2015

There's a clock inside the FPGA, which I assume is a 6.4ns launch clock, and the external clock that also has a 6.4ns period (I assume they're edge-aligned). So before we know the delays inside the FPGA, the board_delay or the delay of the external device (tSU), we have a 6.4ns setup relationship to get data across. What set_output_delay -max does is say how much of that is used externally, namely by the tSU of the external device + board delay. So let's say the board_delay is 250ps and tSU is 2ns: we would do set_output_delay -max 2.25 and the FPGA now has to get its data out in less than 6.4 - 2.25 = 4.15ns. Now if the external delay is 0ns, the external device is chewing up less margin, and hence we're saying the FPGA can take up to the whole 6.4ns to get its data out and still meet timing. But the 0ns doesn't mean there's no requirement, just that it's looser. The hold works in a similar manner.
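The numbers in that example could be written as the following SDC sketch. It reuses the clock and port names from the first post, and it assumes the same 0.25 ns board delay for both the max and min case and a tH of 0 ns, which the post above does not state.

```tcl
# Clock entering the FPGA and a virtual clock for the external device.
create_clock -name ddsclk1     -period 6.400 [get_ports {dds_clk_1}]
create_clock -name ddsclk1_ext -period 6.400

# Assumed numbers: board delay 0.25 ns, tSU 2 ns, tH 0 ns.
# max = board delay + tSU = 0.25 + 2.0 = 2.25 ns
# -> the FPGA must get its data out within 6.4 - 2.25 = 4.15 ns
set_output_delay -clock ddsclk1_ext -max 2.250 [get_ports {func_pin_1[*]}]

# min = board delay - tH = 0.25 - 0.0 = 0.25 ns
set_output_delay -clock ddsclk1_ext -min 0.250 [get_ports {func_pin_1[*]}]
```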

Altera_Forum · ‎02-13-2015

Hi Rysc,

Thanks for your opinion. I ran the report and got this (waveform picture attached).

http://www.alteraforum.com/forum/attachment.php?attachmentid=10163&stc=1

As you can see, the clock delay is about 6.3 ns, which I think is very large. Is this normal? What do you think?

On your first concern, I am not using a PLL in my design, I am just using the clocks from the DDSes to drive my design.

On your second concern, yes, that is the way I'm implementing this design. The DDS sends out a clock signal that I'm using to drive my design. The DDS also uses this clock signal to sample the data that I am sending it. The data-to-clock timing for the DDS is 2ns (setup) and 0ns (hold). So do you think this implementation is impractical?

On your third concern, I don't think I can do it that way, because the DDSes sample the data using their own clocks.

What do you think?

Thanks for your comments!

Altera_Forum · ‎02-14-2015

Rysc,

Let's assume another simple example where I have a design like this (picture taken from the Quartus II TimeQuest Timing Analyzer Cookbook, pg 1-16, on System Synchronous Output constraints), or like the picture you have in your TimeQuest user guide, pg 16.

https://www.alteraforum.com/forum/attachment.php?attachmentid=10165

The design is just a register that is connected to an output port that is connected to an external device.

In my SDC, shouldn't it be just the following steps:

1) create clock for the clock input

2) create virtual clock for the external device

3) and then set the output delay max and min to 0 ns (simple case)

However, I have set up a test module matching the above description (a reg driving an output) and constrained the input clock to 156.25 MHz. Even then, TimeQuest still reports that the design can't meet setup timing.

Altera_Forum · ‎02-14-2015

I think some details were given by Rysc. Let me recap my view:

If your interface were source synchronous, then it should certainly have worked for the one-register case.

In your case your interface is not source synchronous. It is system synchronous (same clock to both chips).

What this means is that the clock is opposite to the data: it travels from the external device into the FPGA while the data travels out. This makes a big difference, which unfortunately is not documented in TimeQuest at all.

Because the clock is opposite to the data, the clock takes some time to arrive at the register (clock skew). This skew is negative relative to the data and so adds to the tCO at the pins.

Thus the 6.4 ns window has to cover all of that delay (clock and data).

In fast devices this may not show up, but otherwise it does. The solution is:

1) use the fast IO register

2) apply a PLL in compensation mode (if supported). Otherwise it gets difficult, and I can think of applying a multicycle: if the data is regularly delayed by one clock, you can still sample it correctly, provided neither the min nor the max actual delay hits the 2 ns timing window that you have.

You don't really need a virtual clock here. Moreover, your max delay value must be 2 ns, with board delay ignored.
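A rough sketch of points 1) and the last remark, using the port and clock names from the first post. The first line is a project (.qsf) assignment and the rest is SDC; the wildcard is assumed to match all four bits of func_pin_1, and board delay is ignored as suggested above.

```tcl
# --- In the .qsf: ask the fitter to pack the output register into the IO cell.
set_instance_assignment -name FAST_OUTPUT_REGISTER ON -to "func_pin_1[*]"

# --- In the .sdc: reference the incoming clock directly, no virtual clock.
create_clock -name ddsclk1 -period 6.400 [get_ports {dds_clk_1}]

# tSU of the DDS is 2 ns, tH is 0 ns; board delay ignored.
set_output_delay -clock ddsclk1 -max 2.000 [get_ports {func_pin_1[*]}]
set_output_delay -clock ddsclk1 -min 0.000 [get_ports {func_pin_1[*]}]
```

Whether the register was actually placed in the IO cell can be checked in the fitter report before looking at the timing numbers.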

Altera_Forum · ‎02-17-2015

wzs, I can't really read the screenshot with your timing analysis, but yes, 6.3ns is long for a clock. I would expect the physical layout to be: dedicated_clk_input -> PLL -> Global -> output register -> output. The input to register could be ~4-5ns, but that's all the components and without the PLL in any compensation mode. So make sure the PLL is compensating. If there is no PLL, that may be your problem.

I think it should meet timing then, but not with a lot of margin. If your board is like your last post, where a clock drives both the FPGA and the external device, then what you're doing is correct (although the output delay max should be 2.0, as that's the setup timing, I believe). If the external device sends out a clock and you're sending data back, it's unlikely that the full round-trip delay can fit in one cycle.

Altera_Forum · ‎02-17-2015

Hi all,

Thanks for your replies and guidance.

Rysc,

What do you mean specifically by "make sure PLL is compensating"?

Can you elaborate more on this "compensating"? Or could you point me to a right direction to learn more about it?

Thanks!

Altera_Forum · ‎02-17-2015

Open the PLL in the MegaWizard or Qsys and look at the compensation mode. If it's Direct, then it's not compensating, but Normal mode means it has a clock fed back into the PLL which is used to shift the output clock, i.e. it tries to align the clock at the end of the global clock tree with the time it enters the PLL. Basically it's like a negative shift of a few ns.
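On the constraint side, a PLL in the clock path is usually handled as in the sketch below. It assumes the input clock and output port names from the first post; the panel name is arbitrary. The report_timing line is meant to be run from the TimeQuest console rather than placed in the .sdc file.

```tcl
# Base clock on the pin feeding the PLL; TimeQuest then derives the PLL
# output clocks from the PLL's own settings.
create_clock -name ddsclk1 -period 6.400 [get_ports {dds_clk_1}]
derive_pll_clocks
derive_clock_uncertainty

# Inspect the worst output paths. The clock path section of each report
# shows how much clock delay remains at the output register.
report_timing -setup -to [get_ports {func_pin_1[*]}] -npaths 10 \
    -detail full_path -panel_name "setup: func_pin_1"
```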

Altera_Forum · ‎03-02-2015

Hi all,

I'm really interested in resolving this post as I'm dealing with a very similar situation. I've just a few little differences in my case, but the heart of the problem is the same.

I'm interfacing a Cyclone V FPGA with the AD9789 DAC. From the DAC side, I have the same configuration as wzs, the DAC sends me the data clock (which is 144MHz), and expects to receive the data. One difference is that this IF is DDR, but I can ignore this for now to simplify a bit.

Then, on the FPGA side, I receive the DAC clock through a CLK pin, pass it through a PLL with 0ns phase shift, and drive an ALTDDIO_OUT block to get the data out. I've configured the PLL to work in Normal mode and I've selected the output clk as the feedback clock to compensate for.

This is the assignment I've manually added to do this:

set_instance_assignment -name MATCH_PLL_COMPENSATION_CLOCK ON -to "pll_dac_dco:pll_dac_dco_u|pll_dac_dco_0002:pll_dac_dco_inst|altera_pll:altera_pll_i|outclk_wire[0]"

As I understand, using the ALTDDIO megafunction automatically maps the output registers to Fast registers, so from a timing point of view, I'm in the best case.

I've also defined the set_output_delay to be 0ns to simplify things a bit, as it's the least restrictive case.

In addition, I've set false paths on the edge transfers I don't want to take into account, and I've kept what I think are the more restrictive cases, the opposite-edge transfers. I think changing this to same-edge transfers would relax timing a bit, but I tried it and couldn't meet timing anyway. I also think it shouldn't matter too much, as the data is sent continuously, so no matter which edge I use to send the data, there will be an edge that samples it correctly.

When I compile the design and run timing analysis, I get the same results as wzs: the total data delay through the FPGA is longer than the clock period. I'm attaching the report, but it doesn't have enough resolution to be read properly. Anyway, the waveform view gives some idea of what I'm talking about.

I think I can't do much to get rid of this, but there is something I don't understand. Shouldn't the PLL automatically try to phase shift the output clock to meet timing? If it doesn't, how do I properly constrain the design? I think I have to use some multicycle constraint, but I'm not sure how to do that correctly.

I was thinking of opening a new post to ask my question, but then I decided that my situation is very similar to this post and that it could help resolve how to constrain this kind of interface.

Hope anyone can help :)

Diego

Altera_Forum · ‎03-02-2015

when FPGA clock is opposite fpga dout then tCO at fpga pins could be higher than 1 clock period. The remedy options are:

1) use PLL in compensation mode. If successful the pll will deskew clock so that it arrives at register (dout) with no delay relative to pin.

check that clock skew does get close to zero.

2) set setup multicycle of 2 and hold should stay as default of zero, from dout path

alternatively adjust set_output_delays by 1 clock. I believe you need to subtract one period from max but add one period to min

because addition of one clock period means advance based on altera doc.

The reason multicycle works is that we can sample correctly as long as we get same stream sequence without edge violations

3)use manual PLL shift. This could make it difficult to manage timing figures.
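Option 2 could look like the sketch below for the single-data-rate case from the first post. The clock names ddsclk1 (launch) and ddsclk1_ext (latch) are taken from that post; a DDR interface would need per-edge handling on top of this.

```tcl
# Allow one extra clock period for the data to reach the external device.
set_multicycle_path -setup -end 2 \
    -from [get_clocks {ddsclk1}] -to [get_clocks {ddsclk1_ext}]

# No hold multicycle is given, so the hold value stays at its default of 0
# and the hold check moves with the setup check (one latch edge before it).
```

With these settings the whole capture window shifts by one period, so the data is expected to arrive between one and two periods after launch. Both the setup and the hold report should be checked afterwards.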

Altera_Forum · ‎03-02-2015

--- Quote Start ---

when FPGA clock is opposite fpga dout then tCO at fpga pins could be higher than 1 clock period. The remedy options are:

1) use PLL in compensation mode. If successful the pll will deskew clock so that it arrives at register (dout) with no delay relative to pin.

check that clock skew does get close to zero.

--- Quote End ---

I'm using a PLL in Normal compensation mode. What I understand from your answer is that if I still can't meet timing, it is because the PLL's compensation may not have succeeded.

In the example I attached in the previous post, there is a "COMP" item in the clock path that I suppose is related to this. Its value is ~ -1.6ns.

Is there a way to report the skew between the input clock and the clock at output register?

--- Quote Start ---

2) set setup multicycle of 2 and hold should stay as default of zero, from dout path

alternatively adjust set_output_delays by 1 clock. I believe you need to subtract one period from max but add one period to min

because addition of one clock period means advance based on altera doc.

The reason multicycle works is that we can sample correctly as long as we get same stream sequence without edge violations

--- Quote End ---

OK. This is the solution I'm trying.

I understand that I must set a multicycle of 2, and that this will relax the timing requirements by giving me an extra cycle. I've tried this and now have two situations to consider: same-edge transfers and opposite-edge transfers. I think that I must use set_false_path on the case I don't want to analyze, but I find it confusing to decide which one.

If I use same-edge transfers, I meet timing for both setup and hold, but it's strange to me that the setup slack is greater than half a period, so I would be latching on the opposite edge :S

If I use opposite edge transfers, I meet timing for setup but not for hold.

I'm attaching the waveform of these two cases.

Is what I'm doing correct? Is this solution applicable when using a PLL or not using it?

--- Quote Start ---

3)use manual PLL shift. This could make it difficult to manage timing figures.

--- Quote End ---

I'm not using this for now, as I understand that the PLL should adjust its phase to ease timing.

Altera_Forum · ‎03-02-2015

To see the clock skew in TimeQuest, click on report io timing and see the table of values under register to output (setup or hold).

Regarding your waveforms: they don't look right. For example, the same-edge case should fail setup, as the data arrives too early and will be latched by the opposite edge. It looks like the tool applies the multicycle as if the interface were not DDR. I am not sure why. Maybe you could also try adjusting the set_output_delay figures instead of the multicycle. If these were originally 0/0, then try -UI/+UI or even -period/+period, which in theory is the same as a multicycle of 2/0 (multicycle is only expressed in clock-period units, unlike delay, which is in time units).

I also assume you haven't set your PLL to any positive phase offset.

edit:

multicycle example from altera:

set_multicycle_path -setup -end 0 -rise_from [get_clocks data_clock] -rise_to [get_clocks output_clock]

It looks like you can specify -rise_from, -rise_to, -fall_from and -fall_to in order to apply this to DDR.

In your case you just need mcp of one UI

Altera_Forum · ‎03-04-2015

Having re-read your last post I think I misunderstood your diagram.

It looks to me that same edge is ok. The issue of slack being more than half period is ok for DDR.

Altera_Forum · ‎03-04-2015

Hi kaz. I couldn't do any test yesterday. But I still don't understand if it's correct to have a slack time greater than half a period in the DDR case.

I think the data valid window to sample the data should be less than half a period because if not I might be sampling data with the incorrect edge.

I will simplify the interface to analyze an SDR transfer case, to see whether I understand the results I get.

Thanks a lot.

Altera_Forum · ‎03-04-2015

I strongly believe it is possible for DDR slacks to be reported as more than half a period (UI) for either setup or hold, and that this is a direct outcome of setting false paths between unrelated edges.

You should not think of DDR clock as equivalent to SDR at double rate clock in which case it is not possible. The DDR will have two registers to de-interleave data and so changes are ignored by one edge or the other.

Regarding same edge Vs opposite edge latching, you are free to try either and see which one passes. Once you choose one case then your design must be aware of it. Remember a stream of Hdata => Ldata if sampled on same edge will retain the sequence and if you decide on opposite edge transfers then the sequence becomes Ldata => Hdata and as long as you interleave/deinterleave back correctly then it should work.

If you don't set false paths on irrelevant edges then slack must be less than UI and this is unrealistically too restrictive.
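A sketch of what cutting the irrelevant edges can look like for a DDR output, here keeping the same-edge transfers. All names are hypothetical since the actual ones were not posted: dac_clk launches the data inside the FPGA, dac_clk_ext is the virtual clock of the DAC, and dac_data[*] are the DDR output pins. The 0 ns delays follow the simplification used earlier in the thread.

```tcl
# Constrain the outputs against both edges of the external clock.
set_output_delay -clock dac_clk_ext -max 0.000 [get_ports {dac_data[*]}]
set_output_delay -clock dac_clk_ext -min 0.000 [get_ports {dac_data[*]}]
set_output_delay -clock dac_clk_ext -max 0.000 [get_ports {dac_data[*]}] \
    -clock_fall -add_delay
set_output_delay -clock dac_clk_ext -min 0.000 [get_ports {dac_data[*]}] \
    -clock_fall -add_delay

# Keep same-edge transfers (rise->rise, fall->fall) and cut the
# opposite-edge ones for both setup and hold.
set_false_path -rise_from [get_clocks {dac_clk}] -fall_to [get_clocks {dac_clk_ext}]
set_false_path -fall_from [get_clocks {dac_clk}] -rise_to [get_clocks {dac_clk_ext}]
```

To analyze opposite-edge transfers instead, the two set_false_path lines would cut rise-to-rise and fall-to-fall.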

Altera_Forum · ‎03-04-2015

Hi Rysc, (and Hi to Everybody else)

thanks for writing the TimeQuest User Guide. It is a great help.

I have a question about the set_clock_groups:

1) What if I declare clocks asynchronous that are actually synchronous to each other?

2) Are there clocks that come from different sources which should not be declared asynchronous?