How to fix timing issues using the timing assignment options?

Altera_Forum
Honored Contributor II
3,694 Views

Hi, everybody.

I am posting to ask for your help. I have tried hard, but I just can't solve this myself.

A description of my issue:

(My device: EP1C6Q240C8; QII: 6.1; ADC: ADC08200, National Semiconductor)

My design is about data acquisition and storage. There are four ADCs whose sample clocks are phase shifted by 90 degrees from one another. Two sample clocks are generated by a single PLL with 0 and 90 degree phase shifts, and the other two are generated by inverting the first two, which gives 180 and 270 degree phase shifts. All of them run at 250MHz. (The ADCs are being overclocked; NS says that is no problem.)
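To make the clocking scheme concrete, here is a small sketch of the arithmetic behind it. It only restates the numbers given above (four clocks at 250MHz, 90 degrees apart); the function and variable names are made up for illustration and are not part of the design.

```python
# Sketch only: illustrative names, values taken from the description above.
F_CLK_HZ = 250e6                  # sample clock of each ADC
PHASES_DEG = (0, 90, 180, 270)    # 0/90 from the PLL, 180/270 by inversion

def phase_to_delay_ns(phase_deg, f_clk_hz=F_CLK_HZ):
    """Convert a phase shift in degrees to a time offset in nanoseconds."""
    period_ns = 1e9 / f_clk_hz
    return period_ns * phase_deg / 360.0

# Time offset of each sample clock within one 4 ns period.
offsets_ns = [phase_to_delay_ns(p) for p in PHASES_DEG]

# Interleaving four ADCs gives four samples per clock period.
aggregate_rate_hz = F_CLK_HZ * len(PHASES_DEG)

print(offsets_ns)         # [0.0, 1.0, 2.0, 3.0]
print(aggregate_rate_hz)  # 1000000000.0
```

So each ADC samples 1 ns after the previous one, and the interleaved stream is equivalent to 1 GS/s.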

 

Now I can correctly receive the data acquired by these ADCs, but storing the data from the fourth ADC (the one with the 270 degree sample clock) has a problem. After reading the data back from memory and plotting it in the debug software, I can see that some data was stored while it was in transition. I take this to be a setup and hold violation. What I did was adjust the storage clock's delay to satisfy the data valid period and prevent transitioning data from being stored. I then used "Logic Cell Insertion" (single point assignment), but it seems to have no effect. I have set "Ignore LCELL Buffers" to off and unchecked "Perform WYSIWYG primitive resynthesis". I have also tried other assignments, like "Maximum Delay", "Fast Input Register", etc., but they had no effect on my design. At the same time, I am not very familiar with how and when to use them.

Does anybody have experience fixing timing issues using assignments? Or, how do I fix setup and hold violations?

If anything I wrote is ambiguous, please tell me. Sorry that my English is not good.

Thank you.

Best Regards.
15 Replies
Altera_Forum

Hi, if you have entered your timing constraints correctly, defining the setup and hold requirements for the inputs, I would think a timing violation is unlikely. Does Quartus report any timing violations?

Perhaps your tracking delays are different from what you think, particularly with respect to skew. I have found Quartus to be great at meeting setup constraints but poor at meeting hold constraints, although at least it would always report if there is going to be a problem!

Altera_Forum

frsustrated is right, in that you need to enter correct IO constraints first. You mention that you're "adjusting the storage clock's delay to satisfy data valid period and prevent the transiting data being stored." You also need to tell Quartus what requirements it needs to meet. I would stay away from inserting LCELL buffers, as that is much too coarse-grained a solution. The fitter can make use of placement, routing, and the input delay chain to meet your timing, so it generally does a much better job than manual work. If you want to mess with anything, there is the Input Delay Chain, which I believe is 0-3 in Cyclone, which you can make assignments for and control post-fit in the chip editor. But Quartus should be setting these properly to meet your timing constraints.

Now, if the interface still fails when it meets your timing constraints, you'll have to do some debugging as to why the constraints are incorrect. There's some methodology there (lock down the register and tweak the data path or the PLL shift until it starts working, to see where the window is), but usually this is just a sign of having entered incorrect constraints.

Altera_Forum

Inserting cells in order to adjust the delay is an unreliable way to solve your IO timing problems. The problem is that you also depend on the routing and the speed grade of your device. So you may end up with a design that does not work after a simple recompile because the routing changed, not to mention the impact of a change of speed grade during production.

Instead, put a dual clock fifo in your data path. This will give you a clean design in which you can solve your timing issues by adjusting the IO clock, without concern for the data reception timing in the downstream path, which runs on another clock at its own phase. You need a small state machine to fill your fifo with a few data lines after reset.

Altera_Forum

I believe you're describing a phase-comp FIFO. (It's almost easier to build your own, since you generally don't need any sort of handshaking or empty/full flags, just free-running clocks. I can post one I wrote two weeks ago if you're interested.) The other thing is that phase-comp FIFOs have a variable delay, since you don't know whether the read or write clock will occur first. So if you have a phase-comp FIFO for each A/D, it will be difficult to match up all the data. PC-FIFOs work best when the clocks have 0PPM difference (this is a requirement) but you have no idea what their phase relationship is.

I don't think it will help in this case, as the problem is writing the data in, and they'd still have to meet timing requirements when writing into the PC FIFO. If you don't have correct timing analysis between the data and the write clock, you'll potentially write different bits of data into different clock cycles, as well as introduce metastability.

Altera_Forum

Thanks very much, all of you.

I will try this on my design as you suggested and let QII meet the timing requirements through assignments. If there is something I can't solve myself, I will ask for your help again.

Actually, my project is the data acquisition system of a digital storage oscilloscope. It uses the four clocks, phase shifted by 90 degrees, to get a 1GHz sample rate. I don't use a FIFO or RAM to receive the 250MHz data flow directly, because the FPGA is a -8 speed grade. I first need to align the data. After the data is phase aligned, it is stored in a RAM.

I will attach my design for the 250MHz data flow receiver. Please give me some suggestions.

Thanks.

Have a nice day.

Altera_Forum

Your design looks OK to me, though only as a starting point, because you need some way to synchronize this block with the other 4. As it is drawn, you have no way of knowing whether sample N ends up as the high byte or the low byte on your output port. So you need a sync input to your feq-div2 block.
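The high byte / low byte ambiguity can be shown with a toy software model. This is only a sketch, not the attached design: `demux_1_to_2` and `start_phase` are hypothetical names, and which sample is called "high" is an assumption. `start_phase` stands in for the unknown power-up state of the divide-by-2 toggle.

```python
def demux_1_to_2(samples, start_phase=0):
    """Pack a sample stream into (high, low) pairs, like a 1:2 serial-to-parallel block.

    start_phase (0 or 1) models the state of the divide-by-2 toggle when the
    first sample arrives. Without a sync input this state is not known.
    """
    stream = samples[start_phase:]
    return [(stream[i], stream[i + 1]) for i in range(0, len(stream) - 1, 2)]

samples = list(range(8))  # sample N is simply the number N

print(demux_1_to_2(samples, start_phase=0))  # [(0, 1), (2, 3), (4, 5), (6, 7)]
print(demux_1_to_2(samples, start_phase=1))  # [(1, 2), (3, 4), (5, 6)]
```

In the first case the even samples land in the high position, in the second case the odd ones do. A sync input that forces the same starting state in every channel removes that uncertainty.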

 

Obviously you still need some way of absorbing the phase difference between your 4 ADC clocks. This block only halves your effective clock rate.

Constraining the timing should not involve multicycle here. In fact, as far as timing is concerned, this block is a standard single-clock synchronous block. It's the next block, which presumably will bring your clock down to 125MHz, that will benefit from multicycle constraints.

Personally I would probably (try to) use a dual clock fifo for this, as I mentioned previously, because it would be able to deal with both your phase matching problem and halving the clock rate (set fifo.we=DFFE.ena and fifo.wRclk=ClkShift1, fifo.re=Vcc, fifo.rclk=CLK125MHz). Of course the fifo needs to be fast enough to do this trick, and I'm not sure that it is.

Rysc is right to point out the problem with matching up the delay, but this problem is there no matter what you do and will require careful design and simulation.

By the way, this is an interesting project. Did you see this thread?

http://www.alteraforum.com/forum/showthread.php?t=260&highlight=adc

Can you tell us if and how you plan to match the response in the 4 channels? Here I am speaking in terms of the response from the analogue input signal to the 4 interleaved digital outputs.

Altera_Forum

I'm slightly confused about the implementation. Are you receiving 1GHz data and then capturing it with four phases of the PLL, all at 250MHz?

Note that you're in the slowest speed grade of a low-cost family that is two generations old (CIII is shipping), doing a high-speed interface. (I'm not sure what IO standard you're using.) 250MHz is going to be extremely hard to meet (if you're trying to capture 1GHz data, it won't work). Your Tsu/Th constraints are going to be extremely tight, so make sure you run them on both timing models (Assignments -> Settings -> Classic Timing Analyzer -> More Settings -> Run Combined Fast/Slow Models). There's a chance you won't be able to meet your constraints.

You probably want to do SPICE simulations of your board. You're not in a wire-bonded package, so your IO characteristics are not going to be nearly the quality of new packages (they are more expensive and harder to use, but they have a lot of good qualities).

Naturally, I'm not familiar with your situation; I'm just voicing some concerns. Is this a product, or are you just trying to get something to work?

Altera_Forum

Thanks so much for all your concern.

It's a product: a digital oscilloscope with a 1GHz sample rate. I'm using four 250MHz clocks for sampling, phase shifted by 90 degrees from each other. The data capture clocks are also 250MHz, and these four are likewise phase shifted by 90 degrees from each other. I'll attach a file to describe my design.

All the interfaces use the LVTTL IO standard. In my design, I convert the four-phase 250MHz data flow into a 125MHz data flow using the "SinPout" module attached above. Then I capture the four-phase 125MHz data flow at an appropriate time to ensure the setup and hold relationships are correct. After that, the 125MHz data flow is aligned, so I store the data into a RAM. The stored data can be accessed by a DSP chip.

By the way, I chose these devices for cost reasons. The device handbook shows that at the -8 speed grade the input clock rate is 387MHz for both row pins and column pins, and the clock tree fmax is 275MHz. So I considered it able to receive a 250MHz data flow.

But now I'm running into a bad timing issue with storage. Indeed, the timing requirement is very tight. The ADC's output has a hold time of just 2ns. Please give me suggestions about timing constraints.

Thank you.

Altera_Forum

In your PLL you should create 4 output clocks with phase shifts of 0, 90, 180 and 270 degrees. Do not use an inversion of 0 and 90 to get 180 and 270.

Remember that you will incur an additional delay by using an inverter, and you rely on a 50% duty cycle clock.

The advantage in having 4 outputs from the PLL is also that you can fine-tune the delay to take account of different routing on the PCB. 
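To put a number on the duty cycle dependence: an inverted clock rises where the original clock falls, so its phase follows the duty cycle directly. The sketch below ignores the inverter's own delay, and the 45% duty cycle is a made-up example value, not a measurement of any real clock.

```python
def inverted_clock_phase_deg(duty_cycle):
    """Phase of the inverted clock's rising edge, relative to the original rising edge.

    The inverted clock rises when the original clock falls, i.e. after the
    high time, which is duty_cycle * 360 degrees into the period.
    """
    return 360.0 * duty_cycle

def phase_error_ns(duty_cycle, f_clk_hz=250e6):
    """Timing error of the inverted edge versus an ideal 180 degree clock."""
    period_ns = 1e9 / f_clk_hz
    return (duty_cycle - 0.5) * period_ns

# Hypothetical example: a 45% duty cycle instead of the ideal 50%.
print(round(inverted_clock_phase_deg(0.45), 3))  # 162.0 (degrees, not 180)
print(round(phase_error_ns(0.45), 3))            # -0.2 (ns, edge arrives early)
```

At a 4 ns period, every 1% of duty cycle error moves the "180 degree" edge by 40 ps, which comes straight out of an already small capture window.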

 

The ADC08200 is not spec'd (guaranteed) at more than 200MS/s, so a product requiring 250MS/s is kind of risky.

Remember that you won't get an analogue bandwidth anywhere near the Nyquist frequency of 500MHz unless you precede your circuit with an S/H.

Altera_Forum

I don't agree with the comment that the inverter in the clock will add delay. (It will, but it's an inversion at the IO level that is probably tens of picoseconds at most, which should be noise no matter how tight your constraints are.) But I do agree that using 4 PLL outputs is nice in that you can hand-tweak each clock tree however you like. Since you're using Cyclone and it only has 2 outputs per PLL, you might have to go with the inverted solution.

Internally, I think it can handle a 250MHz data flow, although you're not going to be able to do a lot of logic between registers, and I would recommend eventually doubling the data path and running at half the rate. This demux can be done inside the FPGA. The Fmax is basically how fast the clock tree can toggle. It says nothing about how fast the IO can toggle. The problem with IO is that there are a lot of considerations outside of the FPGA (board layout, driving device, etc.) that lead users to do SPICE simulations of their board. (To be honest, I have little experience in this realm and can't provide a lot more info.) But I feel that for the slowest speed grade, you're definitely pushing the limits.

And if cost is a consideration, I believe moving to a more recent family, Cyclone II or CIII, would probably allow you to go to a faster device, assuming the small devices have enough IO.

Finally, you mention that you have a 2ns hold time. This, by itself, isn't an issue, since we can phase shift the clocks however we want. But any time you phase shift the clock backwards to help hold margin, you'll be hurting setup margin. So it's the combination of those two requirements that creates a usable "window" for your data to pass through. So once you have your Tsu and Th constraints, and one fails, phase shift the PLL outputs until you can get them to pass. Ideally you want equal positive slack on your Tsu and Th constraints (rather than having one just barely make it and the other make it by a lot). This can help with on-chip variations. Also be sure to run the min timing model. (Not sure if I mentioned it on this thread, but Assignments -> Settings -> Classic Timing Analyzer -> More Settings -> Enable Fast/Slow timing models)
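The "equal slack" idea can be written down as a small calculation. This is a simplified sketch, not what the timing analyzer actually computes: it treats the data at the register as valid over a fixed window measured from the unshifted clock edge, and every number in it is hypothetical. In a real design the window edges would come from the slow and fast timing model reports.

```python
def slacks(shift_ns, valid_start_ns, valid_end_ns, tsu_ns, th_ns):
    """Setup and hold slack when the capture edge is moved to shift_ns.

    valid_start_ns / valid_end_ns: when the data becomes valid and stops being
    valid at the register, relative to the unshifted clock edge at t = 0.
    """
    setup_slack = (shift_ns - valid_start_ns) - tsu_ns
    hold_slack = (valid_end_ns - shift_ns) - th_ns
    return setup_slack, hold_slack

def balanced_shift_ns(valid_start_ns, valid_end_ns, tsu_ns, th_ns):
    """Clock shift at which setup slack equals hold slack."""
    return (valid_start_ns + valid_end_ns + tsu_ns - th_ns) / 2.0

# Hypothetical numbers: data valid from -1.0 ns to +1.5 ns, Tsu 0.5 ns, Th 0.25 ns.
window = dict(valid_start_ns=-1.0, valid_end_ns=1.5, tsu_ns=0.5, th_ns=0.25)

print(slacks(0.0, **window))        # (0.5, 1.25): unbalanced, setup is the tight one
shift = balanced_shift_ns(**window)
print(shift)                        # 0.375
print(slacks(shift, **window))      # (0.875, 0.875): equal margin on both sides
print(shift / 4.0 * 360.0)          # 33.75 degrees of PLL phase at a 4 ns period
```

Moving the clock later trades hold slack for setup slack one for one, which is why the two constraints have to be looked at together.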

 

Good luck.

Altera_Forum

Thank you so much, larsen and Rysc.

Hi, Rysc. You understand my design quite well. I do have to use the PLL with inverters. I also believe that the 250MHz data flow can be handled correctly. It should only be necessary to ensure the setup and hold conditions of the input registers directly connected to the input I/O (the external 250MHz data flow input ports). I mean the four DFFs that have a 250MHz clock input (in the file I attached as "SinPout.zip"). After the data is handled correctly there, I can use it at 125MHz internally.

But there is something annoying. All the 250MHz data flow input ports are assigned to bank 2, ranging from pin 197 to pin 240; however, the on-chip RAMs are located from X17_Y1 to X17_Y20. I need 16 of the 20 blocks in total. So the routing delays from the data inputs can't be nearly equal when they reach the RAMs. I still have a small problem with data storage. Is this a fatal problem when QII fits my design? I will attach a screenshot of the timing closure floorplan.

By the way, FPGA densities are increasing, so performance rather than resource usage is the main concern. Then maybe timing constraints deserve more attention. How can I use the Assignment Editor correctly and efficiently? Apart from the QII handbook, are there any other resources or design references?

Thank you.

Best Regards.

Altera_Forum

I can't say for sure whether that will meet timing or not. It will probably be close. Since you have clock constraints (done when creating the PLL), Quartus will tell you very directly if it can make timing or not, so you need to implement that portion of the design and see what happens. Since you're capturing data, my guess is you could add another stage of registers between the IO registers and the memory, which adds one clock cycle of latency but should allow you to easily meet internal timing. Usually data capture applications can handle this quite easily, although I don't know if you have latency requirements in your system.

Altera_Forum

In my opinion, timing constraints depend on the parts you choose. You must know how much delay there is and where it occurs.

Of course, Quartus can help you analyze your work!

I am troubled by timing constraints too! Hope this helps!

Regards.

Altera_Forum

Thanks for your concern :)

I have changed the FPGA to a -6 speed grade. Now I'm troubled by device-to-device variation. The project runs well on one FPGA but fails on others, so I have to adjust the timing for every single device. It's hard work indeed.

Some of my friends have encountered this situation before; in their cases, the variation only occurred between different batches of FPGAs. Unfortunately, I'm seeing it on every piece.

So when planning your design, this variation should also be taken into account.

Altera_Forum

You are right!

I have tried the same project on different devices. The results show that different devices have different initial logic, so of course the required time is different; sometimes the timing slack, Tsu, and Th cannot even be met.

So the variation should be considered.