
Buffer Overflow and Underflow in Clocked Video Input/Output

Altera_Forum
Honored Contributor II
3,072 Views

Hi 

 

I am designing a video system to buffer three HD 1080p video streams. The input is RGB 4:4:4 at 148.5 MHz, and the output is the same. I am using Quartus 9.0 SP1.

 

The system for a single HD stream consists of a CVI, a Frame Buffer, and a CVO block. Three such blocks are cascaded serially. (There is also a custom video processing block, but it is outside the SOPC system and not yet connected to the design.)

 

For the SOPC system, I am using a 160 MHz clock. For the CVI, I am using the recovered clock provided by the HDMI chip. For the CVO block, the output clock is 148.5 MHz, generated by a local PLL. I am also using a DDR2 controller (Microtronix) running at 300 MHz in a Stratix III FPGA.

 

I get video output, but I am also getting underflows in the video output block and occasional overflows in the video input block. Because of these errors, I get flicker in the output. I have increased the buffer sizes in both blocks to at least 4K pixels. I am also using a 160 MHz clock for the SOPC system to offset the overhead of Avalon packet generation. I still cannot get rid of the overflows and underflows, and I don't know what else I can do to solve this problem. I'll be very thankful if someone can give any insight or suggestions.

 

 

Regards 

Faisal
19 Replies
Altera_Forum
Honored Contributor II
1,081 Views

Hi Faisal, 

 

I've had a hell of a time with the VIP suite and SOPC Builder. Designs that worked in 9.0 would not run after I upgraded to 9.1, and the main reason was that I wasn't using TimeQuest to constrain my clocks and important video signals. I'm still learning, and I still have some issues, but I had a similar problem with under/overflows and flickering, both of which were corrected by timing constraints. Have you constrained your design? There are also .sdc files in C:\altera\91sp1\ip\altera\clocked_video_input\lib (and \clocked_video_output) that you should add to the design files dialog in Settings, if you have not already done so.

 

Also, is there a difference in your frame rates on the output and input video?
Altera_Forum
Honored Contributor II
1,081 Views

Hi, 

 

Assuming timing closure is the issue, there is also an .sdc file for the Frame Buffer which might help in your case. You should clear up any timing errors before you try debugging the issue further.

 

If you are not using triple buffers (with drop and repeat on) then the input frame rate and the output frame rate must be the same and the PLL that creates the output clock should be driven by your input clock (otherwise you will get glitches when the input and output clocks drift apart from each other). 
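As a rough illustration of why drifting clocks cannot be fixed with deeper FIFOs, the sketch below estimates how long a FIFO lasts before it fills or drains. The 100 ppm mismatch is a made-up placeholder, not a figure from this thread, and passing the pixel clock as the data rate ignores blanking, so treat the result as an order-of-magnitude estimate only.

```python
def seconds_until_fifo_fault(fifo_depth_pixels, pixel_rate_hz, offset_ppm):
    """Time for a FIFO to fill (or drain) when source and sink rates differ.

    offset_ppm is the assumed frequency mismatch between the two clocks
    in parts per million; it is a placeholder, not a measured value.
    """
    drift_pixels_per_second = pixel_rate_hz * offset_ppm * 1e-6
    return fifo_depth_pixels / drift_pixels_per_second


# 4096-pixel FIFO, 148.5 MHz pixel clock, hypothetical 100 ppm mismatch
fault_time = seconds_until_fifo_fault(4096, 148.5e6, 100)
print(f"FIFO slack exhausted after about {fault_time:.2f} s")
```

Doubling the FIFO depth only doubles this time, which is why deriving the output clock from the input clock (or dropping and repeating frames) is the usual remedy.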

 

I believe you have more than enough memory bandwidth, but did you do the computation? Is your design working with only one input on? jakobjones posted an Excel sheet to do the bandwidth calculation a while ago. You could also try to increase the FIFO depth and burst target parameters of the frame buffers, but this is a long shot. Make sure that you did not set the FIFO depth to be equal to the burst target.

 

Kind regards
Altera_Forum
Honored Contributor II
1,081 Views

 

--- Quote Start ---  

The system for single HD stream consists of CVI, Frame Buffer and CVO block. Three such blocks are cascaded serially. 

--- Quote End ---  

Are you attempting to do three 1080p frame buffers at the same time? This would require a lot of memory bandwidth. 

 

Assuming standard 1920x1080p60, the total horizontal width is 2200. This means you need to maintain a rate of 1920/2200*148.5=129.6 Megapixels/second in and out of the memory for each frame buffer. If you have three frame buffers, this is a total of 777.6 Megapixels/second. 

 

How many bits per pixel are you doing? How wide is your memory bus? Is anything else using the memory?
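The pixel-rate arithmetic above can be reproduced with a few lines of Python. This is only a sketch of the numbers quoted in this post; the total of 777.6 counts one write and one read stream per frame buffer.

```python
PIXEL_CLOCK_MHZ = 148.5   # 1080p60 pixel clock
ACTIVE_WIDTH = 1920       # active pixels per line
TOTAL_WIDTH = 2200        # active + horizontal blanking
FRAME_BUFFERS = 3
STREAMS_PER_BUFFER = 2    # one write (in) and one read (out)

# Average pixel rate over one line, per stream
per_stream_mpix = PIXEL_CLOCK_MHZ * ACTIVE_WIDTH / TOTAL_WIDTH
# Sum over every memory stream in the system
total_mpix = per_stream_mpix * FRAME_BUFFERS * STREAMS_PER_BUFFER

print(f"per stream: {per_stream_mpix:.1f} Mpixels/s")
print(f"total:      {total_mpix:.1f} Mpixels/s")
```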
Altera_Forum
Honored Contributor II
1,081 Views

Hi all 

 

Thank you very much for your valuable suggestions. I really appreciate your help. 

 

I have attached a top-level diagram to this message. A picture is worth a thousand words, so please have a look at the attached .JPG file.

 

bandwidth calculations: 

---------------------------------------------------- 

 

First, about the bandwidth calculation. I have DDR2 memory with a 64-bit data bus (DQ). The input is 1080p at 60 fps in RGB 4:4:4 format (24 bits or 3 bytes per pixel). That translates to an input rate of approximately 148.5 MHz with 24-bit video data.

 

Since I have three frame buffers in my design, there are a total of 6 read and write ports. So

 

input rate = 148.5MHz * 3 bytes * 6 ports = 2.673 GBytes/sec 

 

There are 64 DDR2 data lines (8 bytes), with data being read/written on both clock edges and assuming 75% availability of DDR2 memory. The DDR2 controller runs at 300 MHz clock. Therefore 

 

memory bandwidth = 300MHz * 8 bytes * 2 * 0.75 = 3.6 GBytes/sec 

 

So I think memory bandwidth is enough to handle three HDMI streams. I could not find the bandwidth calculation sheet. So please send me the link or reply with attachment. 

 

 

constraints: 

------------------------------------------------------ 

 

For constraints, I added video_input.sdc and video_output.sdc. I will add video_buffer.sdc and do the synthesis again. For other constraints, I just defined the clocks, derive_pll_clocks command and set a few paths as false paths which I am sure of. I am using Microtronix DDR2 controller which generates its own SDC file which I have included in the design as well.  

 

I haven't set input and output delays on video signals because I am not sure how to calculate them. Please let me know of any other constraints that I should create. 
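One common way to arrive at input delay values is to add the external device's clock-to-output time to the board trace delay. The sketch below does that arithmetic and prints `set_input_delay` lines for an .sdc file. Every number in it is a placeholder to be replaced with datasheet and board values, the port name `vid_data` is hypothetical, and `hdmi_rx_clk` is assumed to be the name of a clock already defined with `create_clock`. It also assumes the clock and data traces are matched, so clock skew is taken as zero.

```python
def input_delay_bounds(tco_min_ns, tco_max_ns, trace_min_ns, trace_max_ns):
    """Return (min, max) input delay in ns.

    Delay = clock-to-output of the external chip + board trace delay.
    Assumes matched clock and data traces (zero clock skew).
    """
    return tco_min_ns + trace_min_ns, tco_max_ns + trace_max_ns


# Placeholder values only: take real ones from the HDMI receiver datasheet
# and from the board layout.
delay_min, delay_max = input_delay_bounds(1.0, 4.0, 0.3, 0.5)

# Hypothetical clock and port names
sdc_lines = [
    f"set_input_delay -clock hdmi_rx_clk -max {delay_max:.2f} [get_ports {{vid_data[*]}}]",
    f"set_input_delay -clock hdmi_rx_clk -min {delay_min:.2f} [get_ports {{vid_data[*]}}]",
]
print("\n".join(sdc_lines))
```

Output delays follow the same pattern, using the setup and hold requirements of the downstream device instead of a clock-to-output time.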

 

In the top-level diagram, "frame_buffer_f0" uses triple buffering and the other frame buffers use double buffering. Should I set them to use triple buffering as well?

 

Input and output frame rates are not an issue in my design. I don't want any frame rate adaptation. I tried to use the recovered HDMI clock (hdmi_rx_clk, which is approximately 148.5 MHz) for all three video_input and video_output modules, but I did not get any output. So I generated a local 148.5 MHz clock from a PLL (hdmi_syn_clk), which is used in the video_input and video_output modules as shown in the attached diagram.

 

I hope I did not bore everyone with these details. I'll appreciate your suggestions and help.

 

Regards 

Faisal
Altera_Forum
Honored Contributor II
1,081 Views

The spreadsheet was posted in this thread: 

http://www.alteraforum.com/forum/showthread.php?t=6841 

 

What value did you use for the size of the read/write master ports of the frame buffers? With a master data port of 64-bit and R:G:B in parallel, the packing of pixels into memory word is really inefficient with only 2 pixels per 64-bit word (see wasted bits in the spreadsheet). 

 

I am not familiar with the Microtronix memory controller, so this could make things worse, but if your local Avalon-MM bus is currently 64 bits wide then perhaps you could try increasing it to 128 and either reconfigure the local interface of your controller or see if SOPC Builder can handle the switch fabric. Timing closure could be harder and you could waste logic, but this would at least take you down from 25% to about 6% waste.
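For reference, the packing waste for a given master width can be worked out as below. This is just a sketch of the arithmetic, assuming 24-bit pixels that are never split across memory words.

```python
def packing(word_bits, pixel_bits=24):
    """Return (pixels per word, fraction of the word left unused).

    Assumes whole pixels only: a pixel is never split across two words.
    """
    pixels = word_bits // pixel_bits
    wasted_bits = word_bits - pixels * pixel_bits
    return pixels, wasted_bits / word_bits


for width in (64, 128):
    pixels, waste = packing(width)
    print(f"{width}-bit word: {pixels} pixels, {waste:.2%} unused")
```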

 

Using double buffers for the second and third paths should be fine in your design. I think they are not even needed in this case so you could consider removing them to test whether memory bandwidth is really the issue.
Altera_Forum
Honored Contributor II
1,081 Views

Hi 

 

The width of the read and write master interfaces in the frame buffer is 128 bits. Yes, this wastes some memory, but that is not really an issue. The main problem is how to get rid of the buffer underflow in video_output_f2 (as shown in the attached diagram). I previously had buffer overflows in video_input_f2. Now what I get is a buffer underflow in the video_output_f2 module. I am not really sure what the reason for that could be.
Altera_Forum
Honored Contributor II
1,081 Views

 

--- Quote Start ---  

Are you attempting to do three 1080p frame buffers at the same time? This would require a lot of memory bandwidth. 

 

Assuming standard 1920x1080p60, the total horizontal width is 2200. This means you need to maintain a rate of 1920/2200*148.5=129.6 Megapixels/second in and out of the memory for each frame buffer. If you have three frame buffers, this is a total of 777.6 Megapixels/second. 

 

How many bits per pixel are you doing? How wide is your memory bus? Is anything else using the memory? 

--- Quote End ---  

 

 

 

Hi Kevin 

 

 

Thanks for your answer. I posted a reply showing my bandwidth calculations. Can you please have a look and let me know if I am doing it the right way?

 

Thanks 

Faisal
Altera_Forum
Honored Contributor II
1,081 Views

 

--- Quote Start ---  

input rate = 148.5MHz * 3 bytes * 6 ports = 2.673 GBytes/sec 

--- Quote End ---  

Due to the horizontal blanking interval, you don't actually need to maintain a rate of 148.5MHz. The average rate for a line is actually 148.5*1920/2200. The average rate for the entire frame is even less due to vertical blanking, but you can't really take advantage of that as the FIFOs are not big enough to average out the demand over an entire frame. 

 

As Vgs was mentioning, there is also some inefficiency in the way pixels get packed into a word. Therefore, simply multiplying the pixel rate by 3 bytes is not accurate. 

 

I think a more accurate calculation would be: 

 

bandwidth = 148.5M * 1920 / 2200 * 16 / 5 * 6 = 2.49 Gbytes/second (approximately).

 

The 16 / 5 factor is assuming 128 bit words. You can fit 5 pixels into each 128 bit (16 byte) word. 

 

So it seems you should have enough bandwidth. Perhaps you need to adjust FIFO sizes and thresholds on the CVI, CVO and framebuffer blocks. 
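To make the comparison explicit, here is a small sketch that recomputes the required bandwidth with the 16/5 packing factor and sets it against the available figure from the earlier post (300 MHz, 8 bytes, both clock edges, 75% efficiency). The 75% figure is the original poster's assumption, not a measured value.

```python
def required_gbytes_per_s(pixel_clock_hz, active_width, total_width,
                          word_bytes, pixels_per_word, ports):
    """Memory bandwidth needed when pixels are packed into whole words."""
    active_pixel_rate = pixel_clock_hz * active_width / total_width
    bytes_per_pixel = word_bytes / pixels_per_word
    return active_pixel_rate * bytes_per_pixel * ports / 1e9


required = required_gbytes_per_s(148.5e6, 1920, 2200, 16, 5, 6)
available = 300e6 * 8 * 2 * 0.75 / 1e9   # assumed 75% DDR2 efficiency

print(f"required:    {required:.3f} GBytes/s")
print(f"available:   {available:.3f} GBytes/s")
print(f"utilization: {required / available:.1%}")
```

This is an average over a line. It says nothing about short-term stalls, which is where FIFO depths and thresholds come in.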

 

Some other random thoughts: 

 

Do you even need to convert between clocked and avalon streaming video multiple times? Could your custom video IP be adapted to process the avalon stream directly? It would simplify your system a lot if you only had one CVI and one CVO to worry about. Do you even need all the frame buffers? Just based on your block diagram, I see no purpose for the first frame buffer. Unless the custom video IP needs to see the video from different delayed time points, I don't really see what the other frame buffers are for either. 

 

As your video IP is running at 166MHz and your memory at 300MHz, are you doing anything to address potential clock domain crossing issues? The default clock domain crossing logic in SOPC builder can be quite inefficient. You may need to explicitly add a pipeline bridge.
Altera_Forum
Honored Contributor II
1,081 Views

1 - Make sure your timing is properly constrained and that you are meeting timing requirements. 

2 - If I understand correctly, you've got a 64-bit DDR2 interface. How wide is the local interface with the Microtronix controller? Make sure all the memory masters on your frame buffers are set to match that width. Also, make sure your burst targets are set to at least 32.

3 - You should get the design working with one video processing path first. Then add to it. 

 

Jake
Altera_Forum
Honored Contributor II
1,081 Views

 

--- Quote Start ---  

 

Some other random thoughts: 

 

Do you even need to convert between clocked and avalon streaming video multiple times? Could your custom video IP be adapted to process the avalon stream directly? It would simplify your system a lot if you only had one CVI and one CVO to worry about. Do you even need all the frame buffers? Just based on your block diagram, I see no purpose for the first frame buffer. Unless the custom video IP needs to see the video from different delayed time points, I don't really see what the other frame buffers are for either. 

 

As your video IP is running at 166MHz and your memory at 300MHz, are you doing anything to address potential clock domain crossing issues? The default clock domain crossing logic in SOPC builder can be quite inefficient. You may need to explicitly add a pipeline bridge. 

--- Quote End ---  

 

 

 

 

Hi Kevin 

 

Thanks for the advice.  

 

You got it right. I am using three frame buffers because the custom video processing block actually uses delayed versions of the frames. For a particular pixel in the current frame, it looks at the pixels at the same position in the two previous frames to find the correlation between the time-delayed values. That is why I am using three frame buffers.

 

For the clock domain crossing, I inserted pipeline bridges in the previous version but I still had some problems. I will put them back in and see what happens now. As I said, there are six read and write ports in the three frame buffers. Should I put a pipeline bridge on all six ports? I was assuming that SOPC Builder would automatically detect the CDC and insert the bridges, but I think it's much better that I do it myself.

 

About the buffer sizes in CVI and CVO modules, I am using 4096 pixel buffers each. I get underflow only from the video_out_f2 (the last module) so I increased its FIFO size to 4096 pixels which should be good enough for at least two lines of 1080p video. 

 

Regards 

Faisal
Altera_Forum
Honored Contributor II
1,081 Views

 

--- Quote Start ---  

1 - Make sure your timing is properly constrained and that you are meeting timing requirements. 

2 - If I understand correctly, you've got a 64-bit DDR2 interface. How wide is the local interface with the Microtronix controller. Make sure all the memory masters on your frame buffers are set to match that width. Also, make sure your burst targets are set to at least 32. 

3 - You should get the design working with one video processing path first. Then add to it. 

 

Jake 

--- Quote End ---  

 

 

 

 

Hi Jake 

 

Yes, you are right. The DDR2 has 64 DQ lines, and the local interface of all the frame buffers to the DDR2 controller is 128 bits wide.

 

I have set the following parameters for all the frame buffers:

 

1. Master READ/WRITE interface width: 128 

2. Master READ/WRITE FIFO Depth: 1024 

3. Master READ/WRITE Burst Target: 128 

 

The Microtronix DDR2 controller has 2 settings for each port: FIFO depth and burst length. I have set both to 512. Do you think that the burst length should be less than the FIFO depth?

 

You said that "you should get the design working with one video processing path first. then add to it".  

I assume that by video processing path you mean the CVI and CVO blocks (excluding frame buffers). Can I directly connect these two? I previously tried that but it did not work. I read in some thread here that it might not work. However, I can just put in a scaler with a scaling factor of 1 to delay the data. I hope it won't create any timing violations.

 

 

 

Regards 

Faisal
Altera_Forum
Honored Contributor II
1,081 Views

You may be causing yourself problems having that burst length of 128. With a pixel width of 24 that means 5 pixels to every write/read. That means that each memory master is going to try and write/read 640 pixels each burst. I'd have to do the math but with all your masters writing/reading you may be temporarily starving the CVI and CVO. Try reducing the burst size to 32. 
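A back-of-the-envelope version of that math might look like the sketch below. It assumes the peak DDR2 rate from this thread (300 MHz, 8 bytes, both edges), six masters served one burst each in turn, and no refresh, row activation, or read/write turnaround overhead, so real stalls will be longer than these figures.

```python
PIXELS_PER_WORD = 5      # 24-bit pixels in a 128-bit word
WORD_BYTES = 16
PEAK_BYTES_PER_S = 300e6 * 8 * 2   # idealized DDR2 peak, no overhead
PIXEL_CLOCK_HZ = 148.5e6
MASTERS = 6


def burst_stats(burst_words):
    """Return (pixels per burst, burst time in s, pixels consumed while waiting).

    The wait assumes every other master gets one full burst first.
    """
    pixels = burst_words * PIXELS_PER_WORD
    burst_time = burst_words * WORD_BYTES / PEAK_BYTES_PER_S
    worst_wait = burst_time * (MASTERS - 1)
    return pixels, burst_time, worst_wait * PIXEL_CLOCK_HZ


for burst in (128, 32):
    pixels, t, starved = burst_stats(burst)
    print(f"burst {burst}: {pixels} pixels, {t * 1e9:.0f} ns, "
          f"~{starved:.0f} pixels output while waiting")
```

Under these idealized assumptions a shorter burst mainly reduces how long any one master can be kept waiting, at the cost of more bursts overall.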

 

Your original post said you had 3 paths (3xCVI->3xBuffer->3xCVO). Get one path working first. 

 

Also, set your frame buffers to triple buffering until you get things working. 

 

Jake
Altera_Forum
Honored Contributor II
1,081 Views

 

--- Quote Start ---  

You may be causing yourself problems having that burst length of 128. With a pixel width of 24 that means 5 pixels to every write/read. That means that each memory master is going to try and write/read 640 pixels each burst. I'd have to do the math but with all your masters writing/reading you may be temporarily starving the CVI and CVO. Try reducing the burst size to 32. 

 

Your original post said you had 3 paths (3xCVI->3xBuffer->3xCVO). Get one path working first. 

 

Also, set your frame buffers to triple buffering until you get things working. 

 

Jake 

--- Quote End ---  

 

 

 

Hi Jake 

 

Thanks for help. 

 

There are some other parameters I am not very clear about. I am having underflow in the video output module, and there is a parameter setting for the CVO in SOPC Builder called "FIFO level at which to start output". By default this is set to zero. Can my underflow be caused by this setting? Should I set it to some other value like 32 or 64? (I have set the FIFO size in the CVO to 4096 pixels.)

 

The second parameter is also related to the CVO and it is "active picture line". Is it the line number from which the CVO starts to output a frame, or is it the total number of lines? I have currently set it to 1080. (I am designing for 1080p60 video.)

 

I'll be thankful for your help. 

 

 

Regards 

Faisal
Altera_Forum
Honored Contributor II
1,081 Views

> I have set FIFO size in CVO to be 4096 pixels. 

You should set "FIFO level at which to start output" to something close to that value (for instance 4000 or more). Throughput "burstiness" may cause underflow.
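To put a number on the cushion this setting buys, the sketch below converts a start level into the time the CVO could keep outputting active pixels with no new data arriving. It assumes a 148.5 MHz output pixel clock and ignores the extra slack gained during blanking.

```python
PIXEL_CLOCK_HZ = 148.5e6   # 1080p60 output pixel clock


def start_level_cushion_us(start_level_pixels):
    """Microseconds of output the FIFO can sustain with no incoming data."""
    return start_level_pixels / PIXEL_CLOCK_HZ * 1e6


for level in (0, 64, 4000):
    print(f"start level {level:>4}: {start_level_cushion_us(level):6.2f} us")
```

With a start level of zero the CVO begins output with nothing in reserve, so the first stall upstream shows up as an underflow.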
Altera_Forum
Honored Contributor II
1,081 Views

Hi all, I am still an FPGA noob and I have a similar problem. It is about bandwidth calculations, but using an SDR SDRAM.

I hope someone could help me. 

My video input into the deinterlacer and the framebuffer are respectively 320*240i (5 bits per color plane) and 320*240p(5 bits per color plane). 

Deinterlacer performs only bob with no buffering, and frame buffer performs triple buffering. 

My SDR SDRAM controller settings are clock 100MHz, word width 16bits, fifos 64 bits and burst target 16 bits. 

The color planes are in parallel, and I've calculated that I need a bandwidth of 

320*240*15 (bits per pixel)*50 (fps)*6 (a read master and a write master for each buffer) = 345.6 Mbit/s 

while my available bandwidth is 16 bits*100 MHz = 1.6 Gbit/s
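The same figures, worked through in a short sketch with the units made explicit (bits per second). It also shows the requirement when each 15-bit pixel occupies a full 16-bit memory word, which is an assumption about how the pixels are packed. The available figure is the SDRAM peak and ignores refresh and row-activation overhead.

```python
WIDTH, HEIGHT, FPS = 320, 240, 50
BITS_PER_PIXEL = 3 * 5   # three color planes, 5 bits each
WORD_BITS = 16           # SDRAM word width
STREAMS = 6              # read/write master count used in the post

required_raw = WIDTH * HEIGHT * BITS_PER_PIXEL * FPS * STREAMS
# Assumption: one pixel per 16-bit word, so 1 bit per word is unused
required_packed = WIDTH * HEIGHT * WORD_BITS * FPS * STREAMS
available_peak = WORD_BITS * 100e6   # 16 bits at 100 MHz

print(f"required (raw):    {required_raw / 1e6:.1f} Mbit/s")
print(f"required (packed): {required_packed / 1e6:.2f} Mbit/s")
print(f"available (peak):  {available_peak / 1e9:.1f} Gbit/s")
```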

 

I set my avalon MM masters port width to 16, fifo depth to 64 and interface burst target to 16 both for write only and read only masters interface. 

 

The CVI FIFO is set to 720 and the CVO FIFO to 640, but the entire system doesn't work properly: I still have an input overflow. 

I think the problem could be in the bandwidth calculations, or the settings for the memory-mapped interfaces are wrong. 

I REALLY don't know how to handle all this, so any hint would be precious. 

One last consideration: at the end of it all, BEFORE the frame buffer, I have a Scaler. 

Maybe that is what stalls my video flow? 

best regards 

Phate
Altera_Forum
Honored Contributor II
1,081 Views

Hi Faisal, 

 

In one of the first replies, vgs mentioned that the CVO clock needs to be derived from the CVI clock, and that if it is not, you need to use triple buffering with frame dropping / repeating enabled. I did not see any reply indicating whether you have triple buffering enabled. If the output clock (driving the CVO) is not derived from the video input clock, the output frame rate will never match the input frame rate and you will get underflows / overflows no matter how large you make the FIFOs. If you cannot derive the output clock from the input clock, you have to periodically drop or repeat a frame (as would be done by a triple buffer).

 

From your diagram it looks as if only the first CVI gets the actual video input clock and all subsequent CVIs and CVOs are clocked by your local 148.5 MHz clock. As a test, just enable triple buffering and frame dropping / repeating on your first frame buffer. If this makes the problem go away, then you know it is due to the different clocks.
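For a feel of how often a triple buffer would have to drop or repeat a frame, the sketch below converts a clock mismatch into the time needed to accumulate one whole frame of drift. The 100 ppm value is a hypothetical placeholder; the real figure depends on the oscillators involved.

```python
def seconds_between_frame_corrections(frame_rate_hz, offset_ppm):
    """Time until input and output differ by one full frame.

    offset_ppm is an assumed clock mismatch in parts per million.
    """
    drift_fraction = offset_ppm * 1e-6
    frames_per_correction = 1.0 / drift_fraction
    return frames_per_correction / frame_rate_hz


# 60 fps video with a hypothetical 100 ppm clock mismatch
interval = seconds_between_frame_corrections(60.0, 100.0)
print(f"one frame dropped or repeated roughly every {interval:.0f} s")
```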

 

Regards, 

Niki
Altera_Forum
Honored Contributor II
1,081 Views

Just one more question. 

I looked at AN 427 provided by Altera, but I want to make a system that is a little different (as I've said in lots of other threads). 

In order to use rescalers or clippers instead of buffered deinterlacers, what would be a "fast enough" clock for the video system? 

Using a 100 MHz clock, I still have overflow in the video input, no matter how deep I set the DVI FIFO...
Altera_Forum
Honored Contributor II
1,081 Views

thanks .....

Altera_Forum
Honored Contributor II
1,081 Views

Thanks .......
