VIP Deinterlacer in Motion Adaptive Mode Questions

Altera_Forum · ‎09-09-2009

Hi,

I have a couple of questions regarding the use of the Altera Deinterlacer in Motion Adaptive mode.

How much memory bandwidth does the core require (assuming I use double buffering)? The way I see it, for each pixel that arrives, the core has to (1) store that pixel, (2) read three historic pixels, (3) write the result to one buffer and (4) read the result out again for output on the ST link. That is a total of 6 reads/writes per pixel. So if my input and output frame rates are identical, and my input data rate is X Mbps, then I need at least 6X memory bandwidth. Is this correct?

The next question is how much margin I need to add for a system to work in practice. The way the deinterlacer core accesses memory must add some overhead, as must the memory controller itself. I am planning on using the HP DDR2 controller, connected to a 32-bit DDR2 SDRAM, with 1080i60 RGB (24-bit) as input, and I need 1080p30 out. The effective pixel rate of 1080i60 (taking into account the ratio of active pixels to total pixels per line) is about 65 MPixels/s. At 24-bit RGB, this is 1.56 Gbps. Assuming that I am correct about the 6x bandwidth requirement, the memory bandwidth required is 9.36 Gbps. If I run my DDR2 memory at 166 MHz, the total memory bandwidth is (theoretically) 10.624 Gbps. This is about a 13% margin. Is this safe? I guess I could verify the margin with simulation (given enough time). Is that the preferred way?
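The arithmetic in this paragraph can be reproduced with a short sketch. It only restates the figures quoted above (65 MPixels/s, 24 bits per pixel, the assumed 6 accesses per pixel, a 32-bit DDR2 bus at 166 MHz); the variable names are illustrative, and the peak figure is a theoretical maximum that ignores controller and refresh overhead.

```python
# Sanity check of the figures quoted in the post. Names are illustrative only.
PIXEL_RATE = 65e6         # effective pixels/s for 1080i60, as estimated in the post
BITS_PER_PIXEL = 24       # RGB 4:4:4, 8 bits per channel
ACCESSES_PER_PIXEL = 6    # the poster's assumed reads + writes per pixel

DDR2_CLOCK_HZ = 166e6     # memory clock
DDR2_BUS_BITS = 32        # external data bus width
TRANSFERS_PER_CLOCK = 2   # double data rate: two transfers per clock

input_rate = PIXEL_RATE * BITS_PER_PIXEL              # bits/s on the video input
required = input_rate * ACCESSES_PER_PIXEL            # bits/s under the 6x assumption
peak = DDR2_CLOCK_HZ * TRANSFERS_PER_CLOCK * DDR2_BUS_BITS  # theoretical peak bits/s
margin = (peak - required) / required                 # headroom relative to the requirement

print(f"input rate: {input_rate / 1e9:.3f} Gbit/s")
print(f"required  : {required / 1e9:.3f} Gbit/s")
print(f"peak      : {peak / 1e9:.3f} Gbit/s")
print(f"margin    : {margin:.1%}")
```

The margin here is measured relative to the required bandwidth; measured relative to the peak it would come out slightly lower.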

In the VIP user guide, the section on the deinterlacer talks about a "motion value" being calculated and stored for different regions of the image. Are these values also stored in the external memory or are they stored in the internal memory?

Does the de-interlacer work equally well for YUV (4:4:4) and RGB (4:4:4)?

Lastly, what is the quality of the deinterlacer? If I were to use it in a set-top box type application (home entertainment), would it produce acceptable results? I know this is very subjective, so any opinions are welcome. (I do not currently have the hardware to evaluate it myself.)

Thanks!

Niki

Altera_Forum · ‎09-09-2009

Hello,

The algorithm you suggest is reasonable but this is not what Altera's deinterlacer is doing at the moment:

1) each incoming field is stored in memory

2) to produce one progressive output frame, the core has to read back 4 fields (including the current field)

For reference, here is how to compute the bandwidth requirements for the 9.0 motion adaptive deinterlacer for 1080i60 RGB (24-bit) -> 1080p RGB (24-bit) with double buffering, assuming that the data port width of the memory masters is 256 bits.

1) data sample packing efficiency:

The Avalon-MM master interfaces pack 10 pixels per 256-bit word (10 x 24 = 240 bits), which means 1.6 bits are wasted for each 24-bit pixel.

2) bandwidth required to write the 1080i60 input to memory:

1920 x 540 x (24+1.6 bits) x 60 fields/s = 1.593 Gbit/s

3a) bandwidth required to read 4 fields from memory to output 1080p30 (input frame rate = output frame rate)

1920 x 540 x 4 x (24+1.6 bits) x 30 frame/s = 3.185 Gbit/s

3b) bandwidth required to read 4 fields from memory to output 1080p60 (input field rate = output frame rate, smoother algorithm but may cause shimmering)

1920 x 540 x 4 x (24+1.6 bits) x 60 frame/s = 6.37 Gbit/s

Total for 1080p30: 1.593 + 3.185 = 4.778 Gbit/s

Total for 1080p60: 1.593 + 6.37 = 7.963 Gbit/s
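Steps 1 to 3 above can be written out as a small calculation. This is a sketch of the arithmetic in this post only, not of the core itself; the function names and parameters are made up for illustration, and it assumes a sample is never split across two memory words, as the packing description suggests.

```python
def packed_bits_per_sample(word_bits, sample_bits):
    """Effective memory bits consumed per sample when whole samples are packed into a word."""
    samples_per_word = word_bits // sample_bits   # 256 // 24 = 10 pixels per word
    return word_bits / samples_per_word           # 256 / 10 = 25.6 bits per pixel

def field_traffic(width, field_height, bits_per_sample, rate_hz, fields=1):
    """Bits per second needed to move `fields` fields `rate_hz` times per second."""
    return width * field_height * fields * bits_per_sample * rate_hz

WIDTH, FIELD_HEIGHT = 1920, 540                   # one 1080i field
bpp = packed_bits_per_sample(256, 24)

write_bw = field_traffic(WIDTH, FIELD_HEIGHT, bpp, 60)            # step 2: store each input field
read_p30 = field_traffic(WIDTH, FIELD_HEIGHT, bpp, 30, fields=4)  # step 3a: 4 fields per output frame
read_p60 = field_traffic(WIDTH, FIELD_HEIGHT, bpp, 60, fields=4)  # step 3b

total_p30 = write_bw + read_p30
total_p60 = write_bw + read_p60

for label, value in [("write", write_bw), ("read 1080p30", read_p30),
                     ("read 1080p60", read_p60), ("total 1080p30", total_p30),
                     ("total 1080p60", total_p60)]:
    print(f"{label:14s}: {value / 1e9:.3f} Gbit/s")
```

Changing the first argument of `packed_bits_per_sample` shows how the packing overhead moves with the master data width. For example, a 128-bit word holds 5 pixels, which gives the same 25.6 bits per pixel.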

Motion values are stored and read back from external memory if motion bleed is on. This will add:

4) motion sample packing efficiency

The Avalon-MM master interfaces pack 32 motion values per 256-bit word (32 x 8 = 256 bits), which means there is no waste.

5a) bandwidth required to write a motion field AND read a motion field assuming a 1080p30 output:

2 x 1920 x 540 x (8 bits) x 30 frame/s = 0.498 Gbit/s

5b) bandwidth required to write a motion field AND read a motion field assuming a 1080p60 output:

2 x 1920 x 540 x (8 bits) x 60 frame/s = 0.995 Gbit/s

Total for 1080p30 with motion bleed: 4.778 + 0.498 = 5.276 Gbit/s

Total for 1080p60 with motion bleed: 7.963 + 0.995 = 8.958 Gbit/s
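The motion bleed figures can be added to the pixel traffic in the same way. The sketch below again only mirrors the formulas in this post; the helper names are hypothetical, and the 25.6 bits per pixel comes from the packing in step 1.

```python
WIDTH, FIELD_HEIGHT = 1920, 540   # one 1080i field
WORD_BITS = 256                   # memory master data width assumed in the post
MOTION_BITS = 8                   # one motion value

# Step 4: 32 motion values fill a 256-bit word exactly, so nothing is wasted.
values_per_word = WORD_BITS // MOTION_BITS
wasted_bits = WORD_BITS - values_per_word * MOTION_BITS

def motion_traffic(output_fps):
    """Step 5: one motion field written and one read back per output frame (bits/s)."""
    return 2 * WIDTH * FIELD_HEIGHT * MOTION_BITS * output_fps

def pixel_traffic(output_fps, input_fields_per_s=60, fields_read=4, bits_per_pixel=25.6):
    """Steps 2 and 3: store every input field, read four fields per output frame (bits/s)."""
    write = WIDTH * FIELD_HEIGHT * bits_per_pixel * input_fields_per_s
    read = WIDTH * FIELD_HEIGHT * fields_read * bits_per_pixel * output_fps
    return write + read

for fps in (30, 60):
    total = pixel_traffic(fps) + motion_traffic(fps)
    print(f"1080p{fps}: motion {motion_traffic(fps) / 1e9:.3f} Gbit/s, "
          f"total with motion bleed {total / 1e9:.3f} Gbit/s")
```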

> This is about a 13% margin. Is this safe?

It is recommended to plan designs with a 25% margin and to use long bursts (32 or 64). With about 5.3 Gbit/s required for 1080p30 with motion bleed against your theoretical 10.6 Gbit/s, your margin is much better than necessary.
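As a rough illustration of the 25% rule, the check below compares the requirement from this post with the theoretical 10.624 Gbit/s from the question. The function name is hypothetical, and the available figure is a theoretical peak, so real controller efficiency would reduce it.

```python
def has_margin(required_bps, available_bps, margin=0.25):
    """True if the available bandwidth covers the requirement plus the given margin."""
    return available_bps >= required_bps * (1 + margin)

AVAILABLE = 10.624e9        # theoretical peak of 32-bit DDR2 at 166 MHz (from the question)
REQUIRED_P30 = 5.276e9      # 1080p30 with motion bleed (from this post)
REQUIRED_P60 = 8.96e9       # 1080p60 with motion bleed, approximately

headroom_p30 = (AVAILABLE - REQUIRED_P30) / REQUIRED_P30

print("1080p30 meets 25% margin:", has_margin(REQUIRED_P30, AVAILABLE))
print("1080p60 meets 25% margin:", has_margin(REQUIRED_P60, AVAILABLE))
print(f"1080p30 headroom: {headroom_p30:.0%}")
```

Under these figures the 1080p30 case passes comfortably, while 1080p60 would need more than the 166 MHz interface can theoretically supply once the 25% margin is added.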

> Does the de-interlacer work equally well for YUV (4:4:4) and RGB (4:4:4)

Yes, the algorithm would give slightly different results but this would be difficult to spot.

Kind regards,

vgs

---

edit: corrected bandwidth computation for the motion bleed

Altera_Forum · ‎09-09-2009

Hi Vgs,

Thanks for the thorough answer. Maybe I have misunderstood, but I think you have left out the accesses required to perform the double buffering. I basically agree with the accesses required for the 4 fields, but you still have to write the final calculated value to one of the two buffers while the output section is busy reading out pixels from the other buffer. So you have two additional memory accesses to account for. Not so?

Regards,

Niki

Altera_Forum · ‎09-09-2009

> I think you have left out the accesses required to perform the double buffering

No, the accesses you mention do not exist. Once they are computed, the final calculated values are sent straight to the Avalon streaming output and not to the memory. The architecture of Altera's deinterlacer is perhaps slightly different from what you are used to.

The double buffering functionality should be understood as "behaving like a double buffer" and it occurs before the deinterlacing is done. The deinterlacer behaves like a double buffer because input fields are sent to memory buffers owned by a "writer component". Meanwhile a "reader/deinterlacer component" is busy reading back four fields from a different set of buffers to produce a progressive output. In double-buffering mode, the writer and the reader swap buffers once both have completed their task. The reader component owns more buffers than the writer component (because it needs four fields) and it does not give all of them back at once, only the oldest.
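The buffer hand-off described above can be pictured with a toy model. This is only a sketch of the ownership idea, not the core's actual implementation: the class name, the total of five buffers, and the writer holding exactly one buffer at a time are all assumptions made for illustration.

```python
from collections import deque

class FieldBuffers:
    """Toy model: a writer fills one field buffer while a reader holds the latest four."""

    def __init__(self, reader_fields=4):
        self.reader_fields = reader_fields
        # Assumption: one buffer more than the reader needs, so the writer always has a target.
        self.free = deque(range(reader_fields + 1))
        self.reader = deque()               # buffers owned by the reader, oldest first
        self.writer = self.free.popleft()   # buffer currently owned by the writer

    def swap(self):
        """Called once the writer has stored a field and the reader has finished a frame."""
        self.reader.append(self.writer)     # the newly written field passes to the reader
        if len(self.reader) > self.reader_fields:
            # The reader gives back only its oldest field, not all of its buffers.
            self.free.append(self.reader.popleft())
        self.writer = self.free.popleft()   # the writer picks up a free buffer

bufs = FieldBuffers()
for field in range(6):
    bufs.swap()
    print(f"after field {field}: reader owns {list(bufs.reader)}, writer owns {bufs.writer}")
```

Note that no output frame ever lands in these buffers: in this model, as in the description above, memory holds input fields only.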

I realize this is not crystal clear, but I hope it clears up the misunderstanding.

vgs

Altera_Forum · ‎09-09-2009

Hi Vgs,

Thanks again! What you say makes sense and is surely a more optimal use of memory bandwidth.

Regards,

Niki

Altera_Forum · ‎09-09-2009

nikisteenkamp,

I've attached a spreadsheet that performs the calculations for you. The fields that you control are those highlighted in blue. Maybe it's useful for you. It is what it is. Let me know if you have any questions.

Jake

Altera_Forum · ‎09-10-2009

Hi Jake,

Thanks - it is definitely useful!

If your local memory width is 128 bits (if you are using 32-bit DDR2 memory), would it make any sense to increase the memory data width of the deinterlacer to a value above 128 (256 is the next value)? I guess this would result in larger burst transfers, which would increase efficiency (maybe slightly), but would also increase internal memory requirements.

Altera should consider putting some memory bandwidth requirements information in the user guide, especially for this core, where the internal workings are not so obvious to the user.

Regards,

Niki

Altera_Forum · ‎09-10-2009

No, there is no reason to make the deinterlacer memory interface wider than the actual memory interface. This will simply waste resources and make it more difficult to meet timing. If your local memory interface is 128 bits wide, you should make your deinterlacer memory masters 128 bits wide. If you want to perform larger bursts, simply increase the burst count.
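To put numbers on that, the data moved per burst is just the master width times the burst count. The sketch below uses the 32 and 64 burst lengths mentioned earlier in the thread; the function name is illustrative.

```python
def bytes_per_burst(data_width_bits, burst_count):
    """Payload of one burst: one data word per beat, burst_count beats."""
    return data_width_bits // 8 * burst_count

for width, count in [(128, 32), (128, 64), (256, 32)]:
    print(f"{width}-bit master, burst of {count}: {bytes_per_burst(width, count)} bytes")
```

A 128-bit master with a burst count of 64 moves the same amount of data per burst as a 256-bit master with a burst count of 32, without widening the interface.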

Jake