Design Flow Tips for Obtaining High Fmax as FPGA Gets Full

Altera_Forum · 09-15-2015

I just completed an Arria V design whose individual logic blocks achieved about 177MHz performance, but when instantiated 6 times obtained only about 120MHz performance in a chip that is about 70% full. My goal was 125MHz. For years my approach has been 1.) design low-level blocks that can surpass my desired Fmax by a good amount, 2.) instantiate the low-level blocks as needed in the device, 3.) use an SDC file to set my desired Fmax, and 4.) hit the compile button. While this approach has worked well in the past, I now think that I may need to learn some better design techniques.
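To put those numbers in terms of clock period, here is a minimal Python sketch. It assumes the reported Fmax is simply the reciprocal of the worst path delay (ignoring clock skew and uncertainty); the function names are hypothetical and only the three frequencies come from the post above.

```python
def period_ns(fmax_mhz: float) -> float:
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / fmax_mhz


def slack_ns(target_mhz: float, achieved_mhz: float) -> float:
    """Positive when the achieved Fmax beats the target, negative when it misses."""
    return period_ns(target_mhz) - period_ns(achieved_mhz)


if __name__ == "__main__":
    target, standalone, integrated = 125.0, 177.0, 120.0  # MHz, from the post
    print(f"target period:        {period_ns(target):.3f} ns")
    print(f"standalone margin:    {slack_ns(target, standalone):+.3f} ns")
    print(f"integrated slack:     {slack_ns(target, integrated):+.3f} ns")
    # Extra delay the worst path picked up between standalone and integrated fits
    added = period_ns(integrated) - period_ns(standalone)
    print(f"delay added by integration: {added:.3f} ns")
```

Seen this way, the design misses the 8 ns target by roughly a third of a nanosecond, while the worst path grew by more than 2.5 ns once the blocks were integrated.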

What is the best way to obtain high Fmax in a design as the FPGA becomes more full? Are there any recommended online courses or literature that would help answer this question? Should I investigate things like Incremental Compilation, LogicLock, or something similar?

Altera_Forum · 09-15-2015

Your approach sounds good and is probably the best design technique there is. That doesn't mean things can't break when stitched together. I might build a cross-bar mux that runs at 177MHz by itself, but when I hook it up to the transceivers on both sides of the device, it now spreads across the die and fails timing. There are a number of things that can go wrong when stitching things together, but they are really hard to plan for, so I would still recommend your approach.

The question is why your path is now at 120MHz, which is a big drop-off. I assume you're getting worse place-and-route, so spend some time in TimeQuest comparing the fit of these paths before and after. What has changed? Right-click on the six hierarchies in the Project Navigator, locate them in Chip Planner, and see how they're placed. Could logic be getting merged between these hierarchies? Do they connect in a way that forces them to be spread out, i.e. a topology where everyone talks to everyone and there's no good data flow? Is the device really full, so that Quartus is just barely able to get a fit, let alone maintain good timing?

Altera_Forum · 09-15-2015

Are all of the block inputs and outputs registered? If not, then when you compile the block by itself (all I/O external), the non-registered I/O timing paths will not be reflected in the Fmax result. But when you instantiate the blocks and interface them with other logic inside the design, the interface timing comes into play. The timing inside the block may be fine, and it's the interface timing that has become limiting. You should be able to see this quickly in the timing reports.

If that's what is going on, you'll need to register the inputs and/or outputs that are not meeting timing. This is also an answer to how to maximize Fmax: register the inputs and outputs of all blocks in the design.
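The effect can be illustrated with a rough Python model. This is only a sketch under simplifying assumptions: Fmax is taken as the reciprocal of the worst register-to-register delay, setup time and clock skew are ignored, and every delay value except the roughly 177MHz internal figure is invented for illustration. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Block:
    internal_ns: float   # worst register-to-register delay inside the block
    in_comb_ns: float    # logic delay from an input port to the first register (0 if registered)
    out_comb_ns: float   # logic delay from the last register to an output port (0 if registered)


def standalone_fmax_mhz(block: Block) -> float:
    """Fmax seen when the block is compiled alone: port-side logic is not counted."""
    return 1000.0 / block.internal_ns


def integrated_fmax_mhz(src: Block, dst: Block, route_ns: float) -> float:
    """Fmax once src drives dst: the interface becomes a register-to-register path."""
    interface_ns = src.out_comb_ns + route_ns + dst.in_comb_ns
    worst_ns = max(src.internal_ns, dst.internal_ns, interface_ns)
    return 1000.0 / worst_ns


if __name__ == "__main__":
    # Invented delays: 5.65 ns internal is roughly 177 MHz
    unregistered = Block(internal_ns=5.65, in_comb_ns=3.5, out_comb_ns=3.0)
    registered = Block(internal_ns=5.65, in_comb_ns=0.0, out_comb_ns=0.0)
    route = 1.8  # invented block-to-block routing delay in ns

    print(f"standalone:              {standalone_fmax_mhz(unregistered):.1f} MHz")
    print(f"integrated, unregistered: {integrated_fmax_mhz(unregistered, unregistered, route):.1f} MHz")
    print(f"integrated, registered:   {integrated_fmax_mhz(registered, registered, route):.1f} MHz")
```

In this model the unregistered block looks equally fast on its own, but the combined output logic, routing, and input logic form a new path that only exists after integration. Registering the ports leaves just the routing delay on the interface, at the cost of added latency.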

Altera_Forum · 09-15-2015

Are the failing paths entries to and exits from RAMs or DSPs? With plenty of room, the fitter can often put the logic right next to the RAM/DSP, but when the device gets full, these paths can get stretched because the RAMs and DSPs cannot be moved. Add extra pipeline registers to the inputs and outputs of the RAMs; this gives the fitter a break, with simple register-to-register routing and no LUTs in between to worry about.
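As a back-of-the-envelope illustration of why the extra registers help, here is a small Python sketch. It assumes an idealized case where pipeline registers can be spaced evenly along a stretched route and the LUT logic stays in the final segment; real placement is not this tidy, and all delay values below are invented. The function names are hypothetical.

```python
def worst_segment_ns(route_ns: float, lut_ns: float, pipeline_regs: int) -> float:
    """Worst register-to-register delay on a RAM/DSP path.

    The route is split into (pipeline_regs + 1) equal hops; the LUT logic
    sits in the last hop, so that hop is the slowest one.
    """
    return route_ns / (pipeline_regs + 1) + lut_ns


def min_pipeline_regs(route_ns: float, lut_ns: float, period_ns: float) -> int:
    """Smallest number of pipeline registers that fits the clock period."""
    if lut_ns >= period_ns:
        raise ValueError("logic delay alone exceeds the period; pipelining the route cannot fix it")
    regs = 0
    while worst_segment_ns(route_ns, lut_ns, regs) > period_ns:
        regs += 1
    return regs


if __name__ == "__main__":
    period = 8.0           # ns, i.e. the 125 MHz goal
    route, lut = 6.0, 2.5  # invented: stretched route to a fixed RAM plus LUT logic
    for regs in range(3):
        print(f"{regs} pipeline regs -> worst hop {worst_segment_ns(route, lut, regs):.2f} ns")
    print("registers needed:", min_pipeline_regs(route, lut, period))
```

Each added register costs one cycle of latency, so the surrounding logic has to tolerate the extra delay.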

Did you also set your false paths and multi-cycle paths in the SDC file?

Altera_Forum · 09-17-2015

Thank you for all the expert advice!

My project is part of a two-person team. I designed the lower-level blocks that get instantiated 6 times into the top level. Another engineer designed a few other parts of the chip, and has been in charge of putting everything together at the top level as well. Since he has been away from work for the past few days, I copied the design from his computer and have been trying to make sense of it myself. I have been experimenting a little with partitioning, and yesterday I was able to get a compile in which the slowest part of my design gave me an Fmax of 140MHz. The only timing violation right now has nothing to do with the logic that I designed, so I may have to wait until my fellow engineer gets back to work to proceed further. Part of me wonders if the constraints placed on this other engineer's logic (triple speed Ethernet and other stuff) were interfering with my own logic constraints or chip resources...