
RAM block type mixing/balancing

Altera_Forum

Hi 

 

Does anyone know what it takes to make Quartus balance/mix the RAM block types used for a single altsyncram?

I have a fairly large memory on the chip (Stratix III) that uses more than half of all the M144K blocks. Wouldn't a mix of M9K and M144K blocks make fitting easier?

So far I have the RAMs set to block type AUTO and AUTO_RAM_BLOCK_BALANCING enabled, but that doesn't seem to do the job.

 

Thanks, 

emanuel
Altera_Forum

You may have to hand-craft the RAM to what you need (i.e. create a few RAMs targeting the sizes you want and stitch them together). It's generally not too difficult. The default is to make the RAM slices as deep as possible, which is the fastest implementation and requires the least outside logic. For example, a 6Kx8 RAM would use 8 M9Ks, each configured as 8Kx1. There is a MAXIMUM_DEPTH setting (it's in the GUI; I may have the name slightly wrong) that caps the slice depth. If you set it to 1K in this case, each M9K is configured as a 1Kx8 block and 6 of them are stacked in depth. This is slower and requires external muxing logic, but it needs only 6 M9Ks instead of 8.
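To make the arithmetic concrete, here is a minimal Python sketch of the block count for the 6Kx8 example. The shape list is an assumption (non-parity M9K widths only), and `m9k_count` is a made-up helper that just mimics "use the deepest slice allowed"; it is not what Quartus actually runs.

```python
from math import ceil

# Assumed M9K shapes as (depth, width), ignoring the parity-bit widths.
M9K_SHAPES = [(8192, 1), (4096, 2), (2048, 4), (1024, 8)]

def m9k_count(depth, width, max_depth=None):
    """Blocks needed when each slice is as deep as the cap allows."""
    shapes = [s for s in M9K_SHAPES if max_depth is None or s[0] <= max_depth]
    if not shapes:
        raise ValueError("max_depth is smaller than every shape in the list")
    slice_depth, slice_width = max(shapes)  # tuples compare by depth first
    rows = ceil(depth / slice_depth)  # slices stacked in depth (need output muxing)
    cols = ceil(width / slice_width)  # slices placed side by side in width
    return rows * cols

print(m9k_count(6 * 1024, 8))                  # deepest slices, 8Kx1
print(m9k_count(6 * 1024, 8, max_depth=1024))  # capped at 1K deep, 1Kx8
```

The first call gives 8 blocks and the second gives 6, matching the numbers above.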

 

If you really need special handcrafting, like a full M144K plus some M9Ks to top off the depth or add a little more width, you'll probably have to create two separate RAMs and stitch them together. Not ideal, but it shouldn't take more than half an hour, and once you've bitten the bullet, it's done.
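The stitching itself is just an address decode in front of the two RAMs. Here is a rough behavioral model in Python (not HDL) of the depth-stacked case; the class name and the bank depths are made up for illustration, and a real design would do the same decode in the wrapper around the two RAM instances.

```python
class StitchedRam:
    """Behavioral model: one logical RAM built from two banks stacked in depth."""

    def __init__(self, big_depth, small_depth):
        self.big_depth = big_depth
        self.big = [0] * big_depth      # stands in for the M144K-based RAM
        self.small = [0] * small_depth  # stands in for the M9K-based top-off RAM

    def _select(self, addr):
        # Address decode: low addresses hit the big bank, the rest the small one.
        if addr < self.big_depth:
            return self.big, addr
        return self.small, addr - self.big_depth

    def write(self, addr, data):
        bank, offset = self._select(addr)
        bank[offset] = data

    def read(self, addr):
        bank, offset = self._select(addr)
        return bank[offset]

# Illustrative sizes only: 16K words in the big bank, 2K words on top.
ram = StitchedRam(big_depth=16 * 1024, small_depth=2 * 1024)
ram.write(100, 0xAB)
ram.write(16 * 1024 + 5, 0xCD)
```

If the RAM is initialised, the init data has to be split at the same address boundary, one piece per bank.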
Altera_Forum

So Quartus does not balance the RAM automagically... The existence of the AUTO_RAM_BLOCK_BALANCING setting made me think it would ;) 

 

I'll play around with that if I have more time (i.e. never). 

 

Thanks a lot!
Altera_Forum

I believe that setting works when you have many RAMs, e.g. moving some from M9Ks to MLAB RAMs, or maybe moving an M144K to M9Ks or vice versa. But I'm pretty sure it makes each logical RAM 100% one type or another, so when the issue is a single large RAM, it doesn't apply. (There was a time when the fitter might target all M4Ks in Stratix II, for example, because that was the best fit per RAM, but the design as a whole would run out of them while plenty of M512s were still available.)

Altera_Forum

That makes sense - even if it is nasty for me ;) 

So patching a RAM together by hand is the only option, but since the RAMs are initialised as well, that's more work than the possible gain justifies. 

 

Again, maybe if I have more time (i.e. never) ;-). 

 

Thanks a lot!
Altera_Forum

I have similar problems and I'm torn over how to manage them. So far, what I have done is compile, import the Fitter RAM Summary report into a spreadsheet, and calculate the memory efficiency of each RAM entity, i.e. memory bits required / memory bits used. I organize the list by design entity (you can do this by clicking on the "name" tab so everything is in alphabetical order by design hierarchy). 

 

Then I identify "hot spots" of inefficiency and try to resolve things manually. 
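The same spreadsheet exercise can be sketched in a few lines of Python. The column names (`name`, `bits_required`, `bits_used`) and the sample rows are invented for illustration; a real export of the RAM summary would need its own column mapping.

```python
import csv
import io

# Made-up sample rows; real data would come from the exported RAM summary.
SAMPLE = """name,bits_required,bits_used
core|fifo_a,6144,9216
core|lut_b,36864,36864
io|buf_c,2048,9216
"""

def rank_by_efficiency(csv_text):
    """Return (name, efficiency) pairs, least efficient first."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        required = int(row["bits_required"])
        used = int(row["bits_used"])
        rows.append((row["name"], required / used))
    return sorted(rows, key=lambda r: r[1])

for name, eff in rank_by_efficiency(SAMPLE):
    print(f"{name:12s} {eff:6.1%}")
```

The entries at the top of the printed list are the "hot spots" worth looking at by hand.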

 

I have wondered, though, whether it is possible to come up with a simple algorithm that, for a given memory width and depth, calculates the best solution, i.e. the one that wastes the fewest bits. 
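A brute-force version of that idea might look like the sketch below. The shape table is an assumption (non-parity widths only, M9K and M144K only), and the function looks purely at wasted bits: it ignores timing, muxing logic, and how many blocks of each type are actually free.

```python
from math import ceil

# Assumed block shapes as (depth, width), non-parity widths only.
SHAPES = {
    "M9K": [(8192, 1), (4096, 2), (2048, 4), (1024, 8), (512, 16), (256, 32)],
    "M144K": [(16384, 8), (8192, 16), (4096, 32), (2048, 64)],
}

def best_fits(depth, width):
    """Per block type, the shape that wastes the fewest bits.

    Returns (wasted_bits, block_type, shape, block_count) tuples, least
    waste first. Ties between shapes are broken arbitrarily.
    """
    results = []
    for block_type, shapes in SHAPES.items():
        options = []
        for d, w in shapes:
            count = ceil(depth / d) * ceil(width / w)
            wasted = count * d * w - depth * width
            options.append((wasted, block_type, (d, w), count))
        results.append(min(options))
    return sorted(results)

for wasted, block_type, shape, count in best_fits(6 * 1024, 8):
    print(block_type, shape, count, "blocks,", wasted, "bits wasted")
```

For a 6Kx8 RAM this ranks a 6-block M9K arrangement (zero wasted bits) ahead of a single M144K.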

 

Has anyone else done this?
Altera_Forum

If you're going to be using most of the RAM, or at least most of any particular type of RAM, then I recommend laying it all out beforehand. The Quartus II defaults tend to do a good job with RAMs whose depth is a power of 2, especially when you use the Megafunction controls for RAM type and MAXIMUM_DEPTH. Beyond that, it's usually the odd-sized RAMs that cause problems.  

The problem with a simple algorithm is that there are a lot of parameters. Just minimizing the number of bits often stitches RAMs together in a way that makes them slow and uses extra LEs. Applying the same algorithm to each RAM in isolation might also make the design run out of a particular type of RAM, like M4Ks. And even looking at how all the RAMs are balanced together may not be enough, since the designer may be floorplanning, so a block might have more or less of a particular RAM type in its region. 

I'm not saying Quartus couldn't do a better job, just that there are a lot of variables at play, so when it gets complicated, it can get really complicated.  

(And yes, I've often cut and pasted that whole section into Excel and started analyzing, manipulating, etc.) One other issue to note is that the fitter can spread a RAM out. For example, if a RAM could fit in a single M9K but the fitter thinks it will get better timing by spreading it across two (and another one is available), it can make that move, since there's no downside. But the Fitter RAM Summary then shows it using 2 M9Ks when you think it could fit into one. Just something to watch out for.
Altera_Forum

Thanks, 

Well, I wasn't thinking of an algorithm to solve my whole problem in one go. I have all these required RAM blocks of varying shapes and sizes; my idea is to determine the best theoretical fit for each one and then sort them out manually, e.g. so that I make best use of the M144Ks before I start using M9Ks and MLABs. 
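As a rough illustration of that "M144Ks first" sorting, here is a greedy Python sketch. Everything in it is an assumption: the per-block bit capacities (non-parity), the rule that a RAM must fill at least half of an M144K to get one, and the flat bit-count model that ignores block shapes, MLABs, and timing.

```python
from math import ceil

# Assumed usable bits per block (non-parity); check the device handbook.
BLOCK_BITS = {"M144K": 131072, "M9K": 8192}

def allocate(rams, budget):
    """Greedy sketch: largest RAMs first, try M144K before M9K.

    rams:   {name: bits_required}
    budget: {block_type: blocks_available}
    Returns (plan, blocks_left); plan maps name to (type, count) or None.
    """
    left = dict(budget)
    plan = {}
    for name, bits in sorted(rams.items(), key=lambda kv: kv[1], reverse=True):
        for block_type in ("M144K", "M9K"):
            need = ceil(bits / BLOCK_BITS[block_type])
            # Arbitrary rule: only spend M144Ks on RAMs that fill half of them.
            if block_type == "M144K" and bits < need * BLOCK_BITS[block_type] / 2:
                continue
            if need <= left[block_type]:
                left[block_type] -= need
                plan[name] = (block_type, need)
                break
        else:
            plan[name] = None  # no block type has enough blocks left
    return plan, left

plan, left = allocate(
    {"big": 120000, "mid": 40000, "small": 6000},
    {"M144K": 1, "M9K": 10},
)
print(plan)
print(left)
```

With these made-up numbers, the large RAM takes the only M144K and the other two go to M9Ks, leaving 4 M9Ks free.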

 

For what it's worth, my design uses 4 Mbits in theory but 8 Mbits in practice, i.e. in a "real world" problem I get 50% memory efficiency without too much effort. Now the real work starts: improving that efficiency to squeeze out as much performance as I can. Very few of my memories are nice 1Kx8 blocks. If they all were, this wouldn't be a difficult job. 

 

Memory balancing is time consuming. Anything which can help would be very welcome.