
cyclone3 : maximum DDR input rate with LVDS

Altera_Forum
Honored Contributor II

I'm designing a 10 megapixel still-and-video camera, and am trying to determine whether cyclone3 FPGAs are fast enough to interface to the MT9J003 image sensor (CMOS). This chip transmits its data via the HiSPi interface, which has 1 clock lane and 4 data lanes. All 5 lanes are subLVDS, which I run through 3.125Gbps LVDS repeater chips (one DS25BR440 and one DS25BR100) to generate very clean, high-quality LVDS signals to feed into the cyclone3 FPGA (EP3C5 or EP3C10 or larger). The HiSPi interface is DDR, which means one data bit must be captured from each data lane on each edge of the clock lane (rising and falling). In other words, the clock signal is 360MHz, but due to DDR sampling, the data-capture rate on each data lane is 720Mbps. 
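For reference, the DDR arithmetic above can be written out as a minimal Python sketch. The function names are illustrative only; the 360 MHz clock and the 4 data lanes are the figures quoted in the post.

```python
# DDR rate arithmetic for the HiSPi link described above.
# Function names are made up for illustration.

def ddr_lane_rate_mbps(clock_mhz: float) -> float:
    """One bit per clock edge, so two bits per clock period."""
    return 2.0 * clock_mhz

def unit_interval_ps(rate_mbps: float) -> float:
    """Duration of a single bit (unit interval) in picoseconds."""
    return 1e6 / rate_mbps

CLOCK_MHZ = 360.0   # HiSPi clock lane frequency from the post
DATA_LANES = 4      # HiSPi data lanes from the post

lane_rate = ddr_lane_rate_mbps(CLOCK_MHZ)   # Mbps per lane
aggregate = lane_rate * DATA_LANES          # Mbps over all lanes
ui_ps = unit_interval_ps(lane_rate)         # width of one bit in ps

print(f"per-lane rate : {lane_rate:.0f} Mbps")
print(f"aggregate rate: {aggregate:.0f} Mbps")
print(f"unit interval : {ui_ps:.0f} ps")
```

Each bit is therefore valid for a little under 1.4 ns, which is the budget that setup, hold, jitter and skew all have to fit into.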

 

Since the image sensor IC generates a clock signal with rising and falling edges perfectly aligned with the data signals, presumably the cyclone3 PLL would not be involved in the data capture circuit. Presumably the clock signal would be fed directly into the LVDS clock input on the same bank of the FPGA as the LVDS data signals. Also note that the image sensor is only 10mm from the LVDS repeater ICs, and the LVDS repeater ICs are only 10mm from the FPGA... AND... all 10 traces on the PCB can be kept exactly the same length between the image-sensor-pads and the LVDS-repeater-pads AND between the LVDS-repeater-pads and the FPGA-input-pads. Also note that we can send the data clock through two LVDS-repeaters with opposite input-polarity to generate two clocks that are exactly 180-degrees out of phase (so the FPGA can sample data on the rising edge of each clock, if this helps the FPGA). The output capacitance of the LVDS repeaters is only 1.2pF (typical). 

 

When I read the cyclone3 handbook, I see the following [maybe] relevant information (where three values are given, they are listed from the slowest to the fastest speed grade, i.e. C8, C7, C6 in that order, since C6 is the fastest grade). 

 

page 1-15 : clock tree performance: 402 MHz, 437 MHz, 500 MHz 

page 1-15 : PLL input clock frequency: 472 MHz 

page 1-22 : LVDS receiver timing (?clock cycle rate?): 320 MHz, 370 MHz, 437 MHz 

page 1-22 : LVDS data receive rate: 640 Mbps, 740 Mbps, 875 Mbps (must be via DDR) 

page 1-22 : data sampling window: 400ps, 400ps, 400ps 

page 1-22 : input jitter tolerance: 500ps, 500ps, 550ps 

 

So far, so good, though it appears I must adopt C7 parts to achieve my 720 Mbps rate. 
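As a quick sanity check of that conclusion, here is a small Python sketch that compares the three receive rates listed above against the 720 Mbps requirement, and expresses the 400 ps sampling window as a fraction of one bit time. The variable and function names are made up for illustration; the numbers are the ones quoted from the handbook in this post.

```python
# Compare the handbook figures quoted above with the sensor's lane rate.

SENSOR_LANE_RATE_MBPS = 720.0                 # 360 MHz DDR clock
LISTED_RX_RATES_MBPS = [640.0, 740.0, 875.0]  # as quoted from page 1-22
SAMPLING_WINDOW_PS = 400.0                    # as quoted from page 1-22

def rates_meeting(required_mbps, listed_mbps):
    """Return the listed receive rates that cover the required rate."""
    return [r for r in listed_mbps if r >= required_mbps]

def window_fraction(window_ps, rate_mbps):
    """Fraction of one unit interval consumed by the sampling window."""
    ui_ps = 1e6 / rate_mbps
    return window_ps / ui_ps

usable = rates_meeting(SENSOR_LANE_RATE_MBPS, LISTED_RX_RATES_MBPS)
fraction = window_fraction(SAMPLING_WINDOW_PS, SENSOR_LANE_RATE_MBPS)

print("listed rates that cover 720 Mbps:", usable)
print(f"sampling window uses {fraction:.1%} of each bit time")
```

Only the two faster listed rates cover 720 Mbps, and the 740 Mbps figure leaves under 3% of headroom, so the margin on the middle grade is thin.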

 

My problem is, I originally read the older cyclone3 handbook, which has vastly more detailed timing specifications than the current (new, "final") handbook. In that handbook was listed best-case setup and hold times for sampling input data, and the absolute best case setup + hold time was about 2 nanoseconds (even worse for LVDS inputs). Now, assuming we sample on both the rising and falling edges, we need the data to be valid for 2 nanoseconds + 2 nanoseconds (for the two edges), plus we need to allow 100ps ~ 1000ps for the two rise-and/or-fall times in the data lanes. Even if we allow zero time for data rise-and-fall, 4 nanoseconds implies a maximum LVDS speed of 250MHz and a maximum data receive rate of 500Mbps. 
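The reasoning in the previous paragraph can be put into a few lines of Python. This is only a sketch of the arithmetic, under the simplifying assumption made in the post that each bit must be stable for the full setup + hold time and that rise and fall times are ignored. The 2 ns figure is the older handbook's value as quoted above, and the 0.4 ns figure is the sampling window from the current handbook.

```python
# Maximum DDR rate if every bit must be stable for a given window.
# Simplification from the post: rise/fall time is ignored.

def max_ddr_rate_mbps(window_ns: float) -> float:
    """One bit per window, so rate in Mbps = 1000 / window in ns."""
    return 1000.0 / window_ns

def max_ddr_clock_mhz(window_ns: float) -> float:
    """A DDR clock period spans two bits, so the clock is half the bit rate."""
    return max_ddr_rate_mbps(window_ns) / 2.0

old_rate = max_ddr_rate_mbps(2.0)    # older handbook: setup + hold ~ 2 ns
old_clock = max_ddr_clock_mhz(2.0)
new_rate = max_ddr_rate_mbps(0.4)    # current handbook: 400 ps window

print(f"2.0 ns window: {old_rate:.0f} Mbps, {old_clock:.0f} MHz clock")
print(f"0.4 ns window: {new_rate:.0f} Mbps upper bound")
```

With a 2 ns window the link tops out well below 720 Mbps, while a 0.4 ns window leaves ample room, which is exactly the contradiction being asked about.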

 

Clearly the contradiction is the setup + hold time, which is about 2.5 to 3.0 nanoseconds in the older handbook, as compared to only 0.4 nanoseconds in the current handbook (the so-called "sample window", which must be just another name for "setup + hold" time). 

 

# 1: Should I supply complementary clocks to the FPGA (to sample data on both edges)? 

# 2: Will my design work reliably (assuming the logic in the FPGA is correct)? 

# 3: Why the *huge* difference between the older versus new handbook? 

# 4: Is the cyclone4 better for this (5 LVDS inputs = 4 data + 1 clock)? 

 

Thanks in advance to anyone who explains these issues to me.
Altera_Forum

Hi, 

 

I obtained a data sheet for the MT9J003. 

We have some experience with serial high-speed octal ADCs with LVDS outputs. So far I have managed to interface these at 300 MHz with CycloneII C6 devices. I expect that the CycloneIII C6 devices should handle the 360 MHz from this sensor. The sensor has a provision to shift the clock channel with respect to the data channels, and to shift individual data channels, which helps in aligning the data into the FPGA. Unfortunately there is no separate framing lane, so you have to sync on the fly or use the training patterns to align the words. 
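To illustrate what aligning the words on a training pattern means, here is a minimal behavioural sketch in Python (not FPGA code). The 12-bit pattern and the function name are hypothetical and are not taken from the MT9J003 data sheet. The idea is simply to try every possible bit offset until the deserialized stream splits into repeated copies of the known pattern.

```python
# Behavioural model of word alignment against a known training pattern.
# The pattern below is a made-up example, not the sensor's real pattern.

def find_word_offset(bits, pattern):
    """Return the bit offset at which `bits` splits into repeated copies
    of `pattern`, or None if no offset works."""
    word_len = len(pattern)
    for offset in range(word_len):
        starts = range(offset, len(bits) - word_len + 1, word_len)
        words = [bits[s:s + word_len] for s in starts]
        if words and all(w == pattern for w in words):
            return offset
    return None

# Hypothetical 12-bit training word, repeated by the transmitter.
PATTERN = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]

# The receiver starts listening 5 bits into the stream, so its word
# boundary is wrong until it skips the remaining 7 bits of that word.
received = (PATTERN * 4)[5:]

offset = find_word_offset(received, PATTERN)
print("skip", offset, "bits to reach a word boundary")
```

In hardware the same search is usually done with a bit-slip style shift of the deserializer output rather than by buffering the stream, but the principle is the same.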

 

You can make do with the single clock channel provided. It is well centered in the middle of the data, so you don't need a PLL to shift the incoming clock to capture the data correctly. 

It may take a while to constrain the inputs (SDC) but it will work reliably in the end. The sensor has nice test-data patterns to help. 

 

Cyclone IV may be better and/or faster to constrain. 

 

I suggest you just start a small deserialiser project and simulate it.
Altera_Forum

Thanks for sharing your experiences. 

 

I assume your reason for saying I only need the one clock channel (and not an additional inverted one) is that the innards of the FPGA can capture data on both the rising and falling edges (somehow). Otherwise, I don't understand, because the data is DDR, meaning a new bit is provided on each data-lane for both edges of the clock. 

 

What is SDC? 

 

Also, are you saying that once I experiment to achieve a good solid setup, we do not need to rebalance or actively fiddle the clock or data delays on each assembled product? Or we must make this process of sending in test patterns and trying different delays as part of our startup routine in the FPGA? 

 

From my quick view, except for the dedicated serial inputs on the cyclone4, the cyclone4 parts are no faster than the cyclone3 parts. This is extremely strange to me, but that's what I believe I'm seeing. The other reason the cyclone4 parts seem useless to me is, they (the smaller units) only have 4 dedicated serial inputs, and with the HiSPi interface that has 4 data and 1 clock lane, I would need 5 of them to make a sensible design. Also, it seems off hand these dedicated serial inputs are configured to recover clocks from data, not have a dedicated clock lane to complement the data (like the HiSPi interface has). 

 

I guess I could try some kind of simulation, but probably trying one with the real MT9J003 image sensor would be just about as easy (with its test patterns), and more definitive. Or not?
Altera_Forum

You're quite right: the logic will capture on both the rising and falling clock edge. 
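As a rough behavioural illustration (Python, not HDL), DDR capture amounts to two sets of registers, one clocked on the rising edge and one on the falling edge, whose outputs are then interleaved. The waveform below is made up, and the model assumes the first clock edge seen is a rising edge and that the data is stable at each edge.

```python
# Behavioural model of DDR capture from a single clock.
# clock and data are sampled at the same instants; values are made up.

def ddr_capture(clock, data):
    """Capture `data` at every clock transition, keeping rising-edge
    and falling-edge samples in separate lists (two register banks)."""
    rising, falling = [], []
    for prev, cur, d in zip(clock, clock[1:], data[1:]):
        if prev == 0 and cur == 1:
            rising.append(d)
        elif prev == 1 and cur == 0:
            falling.append(d)
    return rising, falling

def interleave(rising, falling):
    """Merge the two banks back into bit order (rising edge first)."""
    merged = []
    for r, f in zip(rising, falling):
        merged += [r, f]
    return merged

clock = [0, 1, 0, 1, 0, 1, 0]
data = [0, 1, 1, 0, 0, 1, 0]

rising, falling = ddr_capture(clock, data)
bits = interleave(rising, falling)
print("rising :", rising)
print("falling:", falling)
print("bits   :", bits)
```

Three clock periods yield six bits, so a single 360 MHz clock is enough to recover 720 Mbps per lane without a second, inverted clock.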

SDC stands for Synopsys Design Constraints, which you have to provide so that the Quartus Fitter, using the TimeQuest timing analyzer, can do a good job. 

Once you have constrained and fitted the logic it will work for every production board. As we are still talking about moderate speeds here, there is no need for special training and adjusting logic. Receiving data from the sensor is a one-way point-to-point connection, so the sensor's clock-to-output delays are well known. If the physical routed lengths of the channels are kept equal (to some extent, they may differ by a few mm) you do not need to take the board delay into account. 
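To put a number on "a few mm", here is a rough Python estimate of the skew caused by a trace length mismatch. The propagation delay of about 6.5 ps per mm is an assumed ballpark for a typical FR-4 board, not a figure from this thread, and the 3 mm mismatch is just an example value.

```python
# Rough skew estimate for a trace length mismatch.
# ASSUMPTION: ~6.5 ps/mm propagation delay (ballpark for FR-4).

PS_PER_MM = 6.5            # assumed propagation delay, not from the thread
LANE_RATE_MBPS = 720.0     # per-lane rate for a 360 MHz DDR clock

def skew_ps(mismatch_mm: float, ps_per_mm: float = PS_PER_MM) -> float:
    """Timing skew introduced by a given length mismatch."""
    return mismatch_mm * ps_per_mm

def skew_fraction_of_ui(mismatch_mm: float, rate_mbps: float) -> float:
    """Skew expressed as a fraction of one bit time (unit interval)."""
    ui_ps = 1e6 / rate_mbps
    return skew_ps(mismatch_mm) / ui_ps

example_skew = skew_ps(3.0)                               # 3 mm mismatch
example_fraction = skew_fraction_of_ui(3.0, LANE_RATE_MBPS)

print(f"3 mm mismatch: about {example_skew:.1f} ps of skew")
print(f"that is {example_fraction:.1%} of one bit time")
```

Under that assumption a few millimetres of mismatch costs only a percent or two of the bit time, which is why it can be ignored at these speeds.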

 

You do not need the high-speed inputs of the Cyclone IV, just use the normal LVDS channels of the devices (there may be a speed difference between the 'row' and the 'column' banks). The major advantage of Cyclone IV is mostly power consumption and of course for the GX-family the high-speed channels and the PCIe hard IP. Apart from that they seem to have an equal amount of resources as Cyclone III. 

 

Simulating is always a good idea. You can use the Quartus II internal simulator (that's what I do) or the external ModelSim simulator. ModelSim needs some model of the sensor (to obtain from the vendor, or make yourself), whereas the waveforms used by the internal simulator are easy to create, even down to picosecond precision. 

Once correctly simulated, it will also work on the board.