Cyclone V hard memory controller clarification

Altera_Forum
Honored Contributor II

Hi,

I'm relatively new to this stuff, but I have spent a few weeks working out how fast I can stream data into DDR3 SDRAM on a Cyclone V reference design (BeMicro CV). I started with the basic configuration from Dave (posted elsewhere on these forums) (dwh@ caltech) and struggled through to the point where I got things working with the custom logic I added to it (the example project was a great way to get going).

I found that the memory controller couldn't seem to keep up above the 80% bandwidth efficiency point (the BeMicro CV has a 16-bit bus at 333 MHz DDR, so the maximum is ~10.6 Gbit/s, ignoring refreshes and other overhead).
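
For reference, the peak figure quoted above can be reproduced with a few lines of arithmetic. This is only a sketch: the function name is made up for illustration, and it assumes the usual DDR convention of two transfers per clock cycle, with refresh and other overhead ignored.

```python
def peak_bandwidth_bits_per_s(bus_width_bits: int, clock_hz: float) -> float:
    """Theoretical peak of a DDR interface: two transfers per clock cycle."""
    return bus_width_bits * clock_hz * 2


# Numbers from the post: 16-bit bus, 333 MHz memory clock.
peak = peak_bandwidth_bits_per_s(16, 333e6)
print(f"peak:        {peak / 1e9:.2f} Gbit/s ({peak / 8e9:.2f} GB/s)")
print(f"80% of peak: {0.8 * peak / 1e9:.2f} Gbit/s")
```

The 80% point therefore sits at roughly 8.5 Gbit/s of sustained write traffic.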

 

Does this KB article explain that? I just want confirmation that this is indeed the case.

http://www.altera.com/support/kdb/solutions/rd10302012_952.html

I am using two ports on the memory controller MPFE (one hooked to JTAG per Dave's project, the other to my custom logic, which basically writes a timer value to memory through a wide Avalon-MM interface with long bursts). I tried long and short bursts, and the failure always occurred at about the same point. After some digging, I found this note about the hard IP and am pretty sure it explains the problem.

I believe I could push a little further by trying to load both ports, but for my design I'm not sure it is worth it; at this point I more or less just want to understand the limitations.

Thanks for any input / confirmation!

Lance
Altera_Forum

Not sure if anybody will have that specific information for you, but it is easy enough to test your hypothesis by putting SignalTap on the Avalon-MM ports and looking at how the transactions are getting broken up.

I'm not sure, but as far as that knowledge base article goes, I believe a "4 clock cycle command" would be initiating a long burst. So you might have many cycles of activity before hitting the 1 cycle for the described port polling. I don't think that article is stating that 1 out of 5 clocks is lost (the 80% you are seeing).
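
To make the difference between the two readings concrete, here is a toy model. It is not a description of how the hard controller actually behaves; it simply assumes some number of useful cycles followed by one lost polling cycle, and the function name is invented for this sketch.

```python
def efficiency(active_cycles: int, overhead_cycles: int = 1) -> float:
    """Fraction of cycles doing useful work when every run of
    active cycles is followed by a fixed number of lost cycles."""
    return active_cycles / (active_cycles + overhead_cycles)


# Reading 1: one cycle is lost after every 4-cycle command.
print(f"4 active + 1 lost:   {efficiency(4):.1%}")

# Reading 2: the command starts a long burst, so the lost cycle
# is amortized over many more cycles of activity.
for burst in (16, 64, 128):
    print(f"{burst} active + 1 lost: {efficiency(burst):.1%}")
```

Under the second reading the polling cycle alone would cost well under 1% for a 128-cycle burst, so it would not account for an 80% ceiling by itself.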
Altera_Forum

Thanks Ted. That might be correct. It seemed too convenient an answer.

My connections are as wide as the block allows with the built-in FIFOs (128 bits left for the 2nd port), and I am bursting at 128 words (I experimented with shorter bursts too).

I guess you're saying that once the burst is initiated, it is locked into one command cycle until the burst is complete. If that is true, this shouldn't be my problem for the long bursts. I know that is true on the Avalon side, but I wasn't sure it applied to the memory controller side.

I have been staring at the SignalTap signals for a while, but I'm new to troubleshooting these things.

I note that short bursts in my situation can practically get through with no wait states (though I know shorter bursts take an efficiency hit from the setup overhead). Bursts that are 64 and 128 words long tend to start seeing waitrequest asserted 15-20 words in, and it remains more persistently active through the remainder of the burst. I have tried to probe whatever signals I could find that indicate the refresh cycles, and I can sometimes see a direct correlation with the stalls, but I certainly do not have a crystal clear picture.

I didn't find a magic burst size that bucked the 80% pass/fail threshold as I varied the clock up and down.

I built in some variables for tracking the FIFOs and the number of wait states (capturing the max for a given burst) so that when failures occurred outside of what I could see in SignalTap, I had some idea of what went wrong. I'm sure I could trigger on those conditions if I were better with SignalTap, but it seemed easier to do it that way.
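
The same kind of bookkeeping can also be done offline on a captured waitrequest trace. The sketch below is an illustration only, not the logic used in the design above: it assumes one waitrequest sample per clock, a master that keeps write asserted for the whole burst (so a word is accepted on every cycle where waitrequest is low), and the names `BurstStats` and `wait_stats` are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class BurstStats:
    total_wait: int = 0     # cycles with waitrequest high during the burst
    longest_stall: int = 0  # longest run of consecutive wait cycles


def wait_stats(trace: Iterable[int], burst_len: int) -> List[BurstStats]:
    """Split a per-clock waitrequest trace (0/1) into bursts of
    burst_len accepted words and summarize the stalls in each one."""
    stats: List[BurstStats] = []
    current = BurstStats()
    accepted = 0  # words accepted so far in the current burst
    run = 0       # length of the current stall
    for wait in trace:
        if wait:
            current.total_wait += 1
            run += 1
            current.longest_stall = max(current.longest_stall, run)
        else:
            run = 0
            accepted += 1
            if accepted == burst_len:
                stats.append(current)
                current = BurstStats()
                accepted = 0
    return stats


# Two 4-word bursts: the first stalls for 2 cycles, the second for 1.
for s in wait_stats([0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0], burst_len=4):
    print(s, f"efficiency = {4 / (4 + s.total_wait):.0%}")
```

With this accounting, a 128-word burst that sees 32 wait cycles comes out at 128 / 160 = 80%.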

 

Thanks for the comments; I have more things to look at now.

Lance
Altera_Forum

Hey Lance,

Did you try running a ModelSim simulation? I think you would learn a lot from simulation waveforms, since you can see when a burst on the Avalon-MM interfaces becomes a burst on the DDR interface. You could then write the testbench to generate sufficient Avalon-MM bursts to saturate the DDR3 interface. In the case of an MPFE, you could create multiple masters and hit the DDR with burst writes only, burst reads only, or combinations of burst writes and reads.

Cheers,

Dave
Altera_Forum

Hi Dave,

Thanks for commenting. No, I haven't simulated this part of the design yet. I'm sure it is good advice. I spent some time probing the Avalon interconnects and DDR controller signals and it got pretty cumbersome in SignalTap, so that is probably the way to go if I need to get above 80% efficiency. I've not yet simulated blocks that contain Altera IP (though I have simulated many blocks that I wrote). That is definitely on my list of things I've not gotten to yet.

My post probably should have been titled "is there a way to predict" or "can anyone share their experiences with" the memory controller throughput under simplified conditions.

I can survive for now with the efficiency I'm getting and need to prioritize other tasks. I was hoping to get some feedback on any fundamental barriers I would face when trying to achieve high memory throughput efficiency with fixed write-only transfers from one port.

Thanks again,

Lance
Altera_Forum

Hi Lance,

--- Quote Start ---

Thanks for commenting. No, I haven't simulated this part of the design yet. I'm sure it is good advice. I spent some time probing the Avalon interconnects and DDR controller signals and it got pretty cumbersome in SignalTap, so that is probably the way to go if I need to get above 80% efficiency. I've not yet simulated blocks that contain Altera IP (though I have simulated many blocks that I wrote). That is definitely on my list of things I've not gotten to yet.

--- Quote End ---

I've been trying to get around to writing up my "Altera Memory Interfacing" notes into a tutorial. While I was making those notes I found a problem with the Cyclone V DDR3 controller and the various reset controls it uses. I wrote a testbench and submitted an Altera Service Request, and as far as I recall, it's basically sitting in their bug-fix queue. The bug does not affect you, but the point is that it's not that hard to simulate the Cyclone V DDR3 interface. I'll try to get around to writing up those notes, and you can go through them.

 

 

--- Quote Start ---

My post probably should have been titled "is there a way to predict" or "can anyone share their experiences with" the memory controller throughput under simplified conditions.

--- Quote End ---

The answer will be "no" for any controller. It's all black-box IP, and that IP has configuration parameters (e.g., refresh rate), so there is no one answer to the question. The answer is "simulate it" and see if it meets your requirements.

 

 

--- Quote Start ---

I can survive for now with the efficiency I'm getting and need to prioritize other tasks. I was hoping to get some feedback on any fundamental barriers I would face when trying to achieve high memory throughput efficiency with fixed write-only transfers from one port.

--- Quote End ---

Sounds good. When you want to look at simulation, ping me, and at a minimum I can send you the Cyclone V files I submitted for the Service Request.

Cheers,

Dave