Stratix III MLABs are slow

Altera_Forum · ‎12-16-2008

Hi,

Has anyone here had experience with Stratix III MLABs? They seem slow, i.e. my design's fmax seems limited by MLABs more than anything else. I have all the required timing constraints, but the fitter seems unable to meet them (I need 150MHz).

If the fitter can't meet my requirement, is there any mileage to be obtained in trying to manually force the MLAB cells together with either an instance assignment or manual placement by other means? Or is the fitter likely to have tried this during the fitting process already?

I have played around with all the speed/area/balance options and numerous other synthesis and fitter options but can't quite get the performance required. My design is reasonably complex and uses about 75% of the block memory resources but not a huge number of MLAB cells. Logic usage is probably about 30% ish.

The actual failing paths are simply MLAB address to q outputs within a single clock domain i.e. setup violations involving MLAB cells - there's no other combinatorial logic in the way and the memories using MLABs are fully registered on both input and output.

Regards,

Dave.

Altera_Forum · ‎12-16-2008

What speed grade? Is it the read or write side? I've had problems on the write side because the write into the memory is negative-edge triggered, so your requirement is effectively 300MHz. The write registers need to be placed very close to the MLAB for this. The fitter should be able to do this, but I've seen issues. Another issue is if you're combining MLABs, i.e. if the write addresses fan out to several MLABs, timing can get very tight.
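The half-cycle arithmetic behind that 300MHz figure can be sketched as below. This is only an illustration: the function name is made up for the example, and it assumes the launch register is on the rising edge and the capture is exactly half a period later, ignoring clock skew and uncertainty.

```python
def half_cycle_budget(fmax_mhz):
    """Return (half-period in ns, equivalent full-cycle frequency in MHz).

    Assumes launch on the rising edge and capture on the following
    falling edge, so the path gets half a clock period.
    """
    period_ns = 1000.0 / fmax_mhz      # full clock period
    half_ns = period_ns / 2.0          # time available for the half-cycle path
    equivalent_mhz = 1000.0 / half_ns  # full-cycle clock with the same budget
    return half_ns, equivalent_mhz


half_ns, eq_mhz = half_cycle_budget(150.0)
print(f"{half_ns:.3f} ns budget, equivalent to {eq_mhz:.0f} MHz")
```

So a 150MHz clock leaves roughly 3.3 ns for the write-register-to-MLAB path, the same budget a full-cycle path would get at 300MHz.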

You could probably do an auto-sized floating LogicLock region (LLR) on each of these, although I'm not sure how good the results will be. You could try forcing some of these MLABs into M9Ks if they're available. Another thought is to make the write-address register negative edge (if writing is your problem). This moves the 300MHz requirement from wraddress_regs->MLAB back a stage to previous_regs->wraddress_regs. I'm not sure if this will work though, as it may always infer a half-clock cycle from the wraddress to the MLAB, but it's just an idea.

If it's always cases where a single wraddress reg fans out to multiple MLABs, you could replicate that one (the forum should have other stuff on register duplication). Just some thoughts.

Altera_Forum · ‎12-16-2008

Thanks Rysc,

Speed grade C4 (not an option to move up).

It's the read side:

....altsyncram_component|altsyncram_t2t1:auto_generated|rdaddr_reg[2] to

....altsyncram_component|altsyncram_t2t1:auto_generated|dataout_latch[1]

The memory is 272 deep by 8 wide so (due to the changes to the MLAB spec) will require 17 MLAB blocks I believe. This is why I was wondering if it was to do with delays caused by fragmentation.
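A quick way to check that count is sketched below. It assumes each MLAB in RAM mode provides 16 words of up to 20 bits, which is consistent with the figure of 17 above but should be checked against the current handbook; the function names are made up for the example.

```python
import math


def mlabs_needed(depth, width, mlab_depth=16, mlab_width=20):
    """Number of MLABs needed to tile a depth x width memory.

    mlab_depth and mlab_width are assumed per-MLAB RAM dimensions.
    """
    depth_slices = math.ceil(depth / mlab_depth)  # MLABs stacked in depth
    width_slices = math.ceil(width / mlab_width)  # MLABs side by side in width
    return depth_slices * width_slices


def bit_efficiency(depth, width, mlab_depth=16, mlab_width=20):
    """Fraction of the allocated MLAB bits that the memory actually uses."""
    count = mlabs_needed(depth, width, mlab_depth, mlab_width)
    return (depth * width) / (count * mlab_depth * mlab_width)


print(mlabs_needed(272, 8))    # 17
print(bit_efficiency(272, 8))  # 0.4
```

Under those assumptions the read address has to reach all 17 blocks and the 17 outputs have to be multiplexed down to one 8-bit word, so the path depends on how tightly the blocks are placed.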

M9K is an option, but I'm trying to balance my RAM usage and keep the memory efficiency as high as possible. I may have to do that if nothing else works, but I suspect the problem will just pop up somewhere else later on, so if there's a neat fix now it will help.

I'm a bit depressed because all the fitter physical synthesis options returned 0 or less than 20 ps of improvement - sounds like I've hit the end-stops on this. I may have to resign myself to using more M9Ks.

Thanks again,

Dave.

Altera_Forum · ‎12-18-2008

I tried the logic lock region but it didn't constrain the memory to a very tight area and still failed to meet timing by a tad. I'm now attempting to constrain the logic lock region manually, but it's not a very convenient way to do things as I have a lot of MLAB based memories.

If I have, say, a 272x8 memory in MLABs, would it be better to constrain the logic lock region as 1x17, 17x1, 5x4, etc.? Any ideas?
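One rough way to compare those shapes is the half-perimeter of the region's bounding box, used here as a crude stand-in for the worst-case routing distance across the region. This is only a sketch under strong assumptions: it treats the region as a grid of MLAB-sized tiles, reads each shape as (width, height), ignores the fact that MLABs sit in specific columns of the device, and says nothing about actual routing delays. The function names are made up for the example.

```python
def half_perimeter(width, height):
    """Half-perimeter of a width x height region, in tile units."""
    return width + height


def compare_shapes(blocks, shapes):
    """Rank candidate (width, height) shapes that can hold `blocks` tiles.

    Returns a list of (shape, half_perimeter, spare_tiles), smallest
    half-perimeter first. Shapes that are too small are skipped.
    """
    results = []
    for width, height in shapes:
        capacity = width * height
        if capacity < blocks:
            continue  # region cannot hold all the blocks
        results.append(((width, height), half_perimeter(width, height), capacity - blocks))
    return sorted(results, key=lambda r: r[1])


for shape, hp, spare in compare_shapes(17, [(1, 17), (17, 1), (5, 4)]):
    print(shape, "half-perimeter:", hp, "spare tiles:", spare)
```

By this measure the compact 5x4 region (half-perimeter 9) looks better than either strip (18), but only a fitter run and the timing report can confirm whether that holds on the real device.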

Has anyone else had MLAB problems?

Many thanks,

Altera_Forum · ‎01-04-2009

Could you get by with setting your MLABs to 'dont care' mode?

This attribute will speed up the MLAB, but won't be deterministic if both a read and write occurs to the same address on the same cycle. You can handle this case separately, and if so it will speed things up considerably.

Altera_Forum · ‎01-05-2009

I've already got that mode disabled - it's worse if I allow read during write.

The latest datasheet specifies 450MHz operation for MLABs regardless of mode for a C4 speed grade. I would have thought that 150MHz wouldn't be too much to ask even if the memory uses a number of MLABs.

Seriously, I would have thought someone from Altera would be jumping into this thread to explain why I can't get the performance I need - even if it's to say "it's your fault, you're asking too much of the circuitry and here's why".

I guess I'll have to file a service request.

Altera_Forum · ‎01-05-2009

I have submitted a Service Request.

Altera_Forum · ‎01-05-2009

Well, I do have a design running at over 307MHz in C4, so it looks like you are missing something. I would also make sure you are registering the inputs and outputs directly.

Altera_Forum · ‎01-05-2009

Altera have been quite helpful so far. They sent me a test project (quite simple really) to illustrate the degradation in performance when MLABs are stitched together, which has been helpful for comparison purposes. My build is very large and I suspect having to route around other logic is part of the problem. I've got a bit more investigation to do, so I'll post what I find.

Thanks for the responses!