
Maximizing memory around a Stratix-IV

Altera_Forum
Honored Contributor II

How much memory width can I cram around a Stratix-IV GX 230 in the biggest (40 = 1517-pin) package? 

 

For the top and bottom I need DDR3 running as fast as possible. I need at least 12 16-bit DDR3s; I'm not at all fussy whether it is 4 buses of 3 chips each (48-bit buses), 3 buses of 4 chips each (64-bit buses), or even 6 buses of 2 chips each, etc. 
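The bus options listed above can be enumerated with a short sketch. It assumes all 12 chips are x16 parts divided evenly across identical buses; the function name `bus_options` is made up for illustration.

```python
def bus_options(num_chips=12, chip_width=16):
    """Return (buses, chips_per_bus, bus_width_bits) for every even split."""
    options = []
    for buses in range(1, num_chips + 1):
        if num_chips % buses == 0:
            chips_per_bus = num_chips // buses
            options.append((buses, chips_per_bus, chips_per_bus * chip_width))
    return options


for buses, chips, width in bus_options():
    # Every split carries the same total number of data bits.
    print(f"{buses} bus(es) x {chips} chip(s): {width}-bit buses, "
          f"{buses * width} data bits in total")
```

Whichever split is chosen, the total comes to 192 data bits, so the choice only changes how those bits are grouped into interfaces.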

 

Looking at the docs I see that the 'x32/x36' groups support up to 47 data pins - fine, I'll live with a 47-bit bus instead of a 48-bit bus, and crank the speed by 2% (or add another bus). 
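As a quick check of the "2%" figure, the clock increase needed to make up for one missing data bit can be sketched as follows. This assumes bandwidth scales linearly with both bus width and clock rate; the function name is hypothetical.

```python
def required_speedup(wanted_bits, available_bits):
    """Fractional clock increase needed to match the bandwidth of a wider bus."""
    return wanted_bits / available_bits - 1.0


# A 47-bit bus standing in for a 48-bit bus.
print(f"{required_speedup(48, 47):.1%}")  # prints 2.1%
```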

 

But then on page 7-9 of the Stratix-IV device handbook, I see no 'x32/x36' groups on the top or the bottom of the GX230 in the FF1517, and see nothing bigger than an x32/x36 group anywhere on any FPGA. On the other hand, Google can find plenty of references to 72-bit buses to DDR2 from Stratix, so wider buses must be possible! 

 

For the sides of the Stratix-IV, DDR3 doesn't appear to run any faster than RLDRAM, so I'd naturally put all of the table RAMs there. Each bus would be a 36-bit bus for a single 576Mb RLDRAM-II, in CIO36 BL2 mode to maximize the number of 72-bit table entries readable per packet. How many of these can I fit on the sides of a Stratix-IV GX 230 in the FF1517 package? 

 

If I add up the total pins, it looks like I could fit two sets of three DDR3s on the top and another two on the bottom (even allowing for VREFs and DCIs), plus two independent CIO36 RLDRAMs on each side, which would give me the accesses I need with probably achievable speeds. 

 

But while I'm experienced with other FPGAs, I'm new to Altera, and I seem to be missing some fundamental information regarding these DQS/DQ groups and sub-banks. 

For example, all of the examples combine powers of two of these DQS/DQ groups; what if a bus needs 6 or 10 of them for some bus width? 

 

Or am I simply asking too much of the FF1517 package, and we'll need to wait for the Stratix-IV GX 290 in the 1932-pin package? And will even that have enough I/Os?
Altera_Forum

Hi Max Mem, 

 

You're confusing the "Total Memory Interface Width" with the "DQS/DQ Group Width". 

 

From your description, your optimum/easiest solution would be three 64-bit-wide DDR3 interfaces. 

 

DDR3 devices (it doesn't really matter whether they are 8 or 16 bits wide themselves) always use either x4 or x8 DQS groups. Hence, if the devices you are looking at are 16 bits wide, each is probably constructed of two x8 DQS groups. Put simply, every 8 bits of data (DQ pins) have their own DQS/DQS# strobe pair. 

 

Hence, for DDR3, you should be counting the number of x4 or x8 groups available on each side of your chosen device. It's the chosen memory device that specifies whether x4 or x8 mode is required. 

 

If we assume that your DDR3 devices are 16-bit-wide devices in x8 mode: 

The EP4SGX230F1517 has 12 x8 groups on each side of the device. 

12 x 8 = 96 bits. 

But remember that you will also require some additional pins for the CAC signals (Command, Address, and Clock). 

Also, the Altera IP is currently limited to a maximum width of 80 bits. 

 

So, as each side of the EP4SGX230F1517 could in theory support a 96-bit-wide DDR3 interface (assuming enough spare pins are available for CAC), three 64-bit-wide DDR3 interfaces, one on the top, one on a side, and one on the bottom, should be no problem.
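The counting in this reply can be written as a small sketch. The constants are simply the figures quoted in the reply (12 x8 groups per side, 80-bit IP limit), not values checked against the handbook, and the function names are hypothetical. CAC pins are not modelled.

```python
DQ_BITS_PER_GROUP = 8   # one x8 DQS/DQ group carries 8 data bits
GROUPS_PER_SIDE = 12    # figure quoted in the reply for the EP4SGX230F1517
IP_MAX_WIDTH_BITS = 80  # IP width limit quoted in the reply


def groups_needed(width_bits, group_bits=DQ_BITS_PER_GROUP):
    """Number of DQS/DQ groups needed for a data bus (ceiling division)."""
    return -(-width_bits // group_bits)


def check_interface(width_bits):
    """Report group count and whether the two quoted limits are met."""
    groups = groups_needed(width_bits)
    return {
        "groups": groups,
        "fits_groups": groups <= GROUPS_PER_SIDE,
        "within_ip_limit": width_bits <= IP_MAX_WIDTH_BITS,
    }


for width in (48, 64, 96):
    print(width, check_interface(width))
```

Under these numbers a 48-bit bus uses 6 groups and a 64-bit bus uses 8, while a 96-bit bus uses all 12 groups on a side and exceeds the quoted IP limit.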
Altera_Forum

Thank you very much for the quick response, and for pointing me to the service request process! 

 

I am comforted that there would be roughly enough pins for a 96-bit interface on the top and on the bottom, because 2x96 is the total width we need. 

However, with one of the three 64-bit interfaces on a side, if I understand the speed table at the front of chapter 7, that interface would run much slower than the top/bottom interfaces, and thus too slow for this massive-bandwidth buffer (and we also wouldn't have enough pins for the RLDRAM on the sides). 

 

But a 64-bit interface on each of the top and the bottom and a third 64-bit interface split between the top and the bottom would work, as would Top=64+32, Bottom=64+32, or 48+48, 48+48 (if there are enough address pins). 
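A minimal sketch to confirm that the proposed splits all reach the required 2x96 = 192 data bits. The layout names and dictionary structure are illustrative only, and address/command pins are not counted.

```python
REQUIRED_TOTAL_BITS = 192  # 12 chips x 16 bits, i.e. 2 x 96

# Data-bus widths placed on each edge for two of the proposed layouts.
layouts = {
    "64+32 per edge": {"top": [64, 32], "bottom": [64, 32]},
    "48+48 per edge": {"top": [48, 48], "bottom": [48, 48]},
}


def edge_totals(layout):
    """Sum the data bits placed on each edge of the device."""
    return {edge: sum(widths) for edge, widths in layout.items()}


for name, layout in layouts.items():
    totals = edge_totals(layout)
    print(name, totals, sum(totals.values()) == REQUIRED_TOTAL_BITS)
```

Both layouts put 96 data bits on each edge; they differ only in how many separate address/command buses each edge has to carry.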

 

Also, at least Samsung makes DDR3 in a 64Mx16 die (K4B1G1646D). So what you are saying is that it still needs x4 or x8 DQS/DQ groups. That explains a lot of the mystery by pointing out that there is an entire layer of pin grouping that I had not considered. 

 

I will go and learn the next level, and when I have further questions of such detail, I will use the service request process. 

 

Thanks again! 

 

 

memory_monkey response: 

Yes, unfortunately, the side banks have lower performance. 

 

An interface cannot generally be spread across the top and the bottom of the device, as core timing becomes an issue. 

 

The x16 Samsung device you mention is indeed actually constructed of two x8 DQS groups, DQSL/DQSL# = DQL0-7 & DQSU/DQSU# = DQU0-7. Hence it is x8/x9 groups that you are actually counting/using. 

 

You would need to do something like 64 & 32 in both the top and bottom banks, but I do not know whether 96 bits of total data pins and two complete sets of CAC signals will fit; you would need to try this out for yourself.
Altera_Forum

(I'm still getting used to this forum - I thought I was replying to the forum not directly to you...) 

 

I've checked and while there are enough DQS/DQ groups on the top for 96 bits of data bus, there would NOT be enough pins left to divide this into two address buses, just as you suspected.  

 

Therefore for the FF1517 the only hope would be a single 96-bit bus, which has barely enough pins and would have a load of 6 on the address bus, so I'm not optimistic about the speed achievable. 

However on the bigger 1932-pin package, there appear to be enough pins for two 64-bit DDR3 buses each on the top and on the bottom, plus two sets of address and command buses. While this is more than we need, we can always find a use for the extra memory! 
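The address-bus loading mentioned above follows directly from the bus width. A small sketch, assuming every chip on a bus is a x16 part and that each chip adds one load to the shared address/command lines (the function name is hypothetical):

```python
def address_bus_loads(bus_width_bits, chip_width_bits=16):
    """Number of memory chips sharing one address/command bus."""
    if bus_width_bits % chip_width_bits != 0:
        raise ValueError("bus width must be a whole number of chips")
    return bus_width_bits // chip_width_bits


print(address_bus_loads(96))  # single 96-bit bus: 6 loads
print(address_bus_loads(64))  # one 64-bit bus: 4 loads
```

Splitting the data into 64-bit buses lowers the load on each address bus from 6 chips to 4, at the cost of an extra set of address and command pins per bus.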

 

So thank you again for helping me figure out the limits of what the packages can do.