Intel® Quartus® Prime Software

QSYS feature-request: timing interface type

Altera_Forum
Honored Contributor II

Hi all, 

 

Given the interface types available in version 15, it is unnecessarily difficult to distribute control signals within a (nested) QSYS component and thus to implement serial data processing. Streaming data paths resemble directed graphs, while control signals should be distributed in a tree topology. Imagine a component with two streaming sinks that requires both streams to be in sync (sharing the same valid signal, for example).

 

I would like to propose a new interface type that would ease streaming designs.

 

== timing-interfaces == 

 

The new QSYS interface type "timing" is asymmetric: there are sources and sinks, much like the avalon_streaming interface type.

Only output signals are allowed on sources, only input signals on sinks. 

The interface-type will allow 1:N connections, like the clock- and reset-types do. 

 

On the other hand, signal roles should be completely free, as in conduit interfaces. Roles on source and sink need not match completely; an error is only raised if a sink's signal is missing on the source or has a different width. Every sink can use any signal it needs and silently ignore everything else (so you will not need to change a sink's definition when you introduce additional signals).
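A minimal Python sketch of the lazy role check described above, assuming (hypothetically) that each interface is described as a mapping from role name to width in bits. The function `check_timing_connection`, the role names and the sink names are made up for illustration; nothing here is an existing QSYS API.

```python
# Hypothetical model of the proposed lazy role check for a "timing" connection.
# Each interface is a dict: role name -> signal width in bits.

def check_timing_connection(source_roles: dict, sink_roles: dict) -> list:
    """Return error messages for one source -> sink connection (empty list = OK).

    Roles present only on the source are silently ignored, so a sink only
    declares the subset of signals it actually uses.
    """
    errors = []
    for role, width in sink_roles.items():
        if role not in source_roles:
            errors.append(f"sink role '{role}' is missing on the source")
        elif source_roles[role] != width:
            errors.append(f"role '{role}': sink width {width}, "
                          f"source width {source_roles[role]}")
    return errors


# One source fanned out (1:N) to sinks that each use a different subset of roles.
source = {"bitcount": 5, "lsb": 1, "msb": 1, "ram_we": 1}
sinks = {
    "costas": {"lsb": 1, "msb": 1},
    "scaler": {"bitcount": 5},
    "filter": {"bitcount": 4},   # wrong width, so this one reports an error
}
for name, roles in sinks.items():
    print(name, check_timing_connection(source, roles))
```

Adding a role to `source` leaves every existing sink definition valid, which is exactly the property the proposal is after.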

 

HDL generation for this new interface type should be straightforward.

 

== implications for streaming == 

 

Control signals for multiple synchronous streaming connections can then be distributed in a tree topology. Rather than having logic that interprets these signals in every component, it is more efficient to simply distribute a bit counter and lsb/msb signals for a serial stream. And QSYS can do some useful checking...
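To illustrate what "a bit counter and lsb/msb signals" could look like, here is a cycle-by-cycle Python model of such a shared timing. It assumes 23-bit words sent LSB first (the format used in the design described later in this thread); `serial_timing` is a hypothetical name, and this is a behavioral model, not HDL.

```python
# Hypothetical model of one timing source driving several serial components.
# Assumes 23-bit words transferred LSB first.
WORD_BITS = 23


def serial_timing(num_cycles: int, word_bits: int = WORD_BITS):
    """Yield (bitcount, lsb, msb) for each clock cycle.

    bitcount counts 0 .. word_bits-1 and wraps; lsb flags the first bit of a
    word and msb flags the last one. Computing this once and distributing it
    saves every receiver from decoding the stream position itself.
    """
    bitcount = 0
    for _ in range(num_cycles):
        yield bitcount, bitcount == 0, bitcount == word_bits - 1
        bitcount = 0 if bitcount == word_bits - 1 else bitcount + 1


for cycle, (count, lsb, msb) in enumerate(serial_timing(2 * WORD_BITS)):
    if lsb or msb:
        print(cycle, count, "lsb" if lsb else "msb")
# 0 0 lsb
# 22 22 msb
# 23 0 lsb
# 45 22 msb
```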

 

Avalon interfaces refer to "associatedClock" and "associatedReset" for good reasons. The streaming variant would be even more useful if it had a property referencing a timing. That would allow serial data transfer with implicit synchronization. So I suggest an "associatedTiming" property for avalon_streaming, checked to reference the same timing on source and sink.

 

But how is the reference made? For avalon_streaming sources and sinks, it should have the form <timing_sink_iface>.<id>, where <timing_sink_iface> is the name of a timing sink on the same component. The timings are looked up in the "declaredTimings" property (of type STRING[]) of the connected timing source. Two timings match if they reference the same <id> on the same interface of the same instance. If they do not match, an error is raised.
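A rough Python sketch of how the associatedTiming check might be resolved. The data structures (`connections`, `declared`), the interface names `ctrl` and `timing_out` and the id `word23` are hypothetical; only the matching rule itself (same id, same interface, same instance) comes from the proposal.

```python
from typing import NamedTuple


class TimingRef(NamedTuple):
    instance: str    # instance owning the timing source
    interface: str   # timing-source interface on that instance
    timing_id: str   # entry of that interface's declaredTimings


def resolve(instance, associated_timing, connections, declared):
    """Resolve an associatedTiming value '<timing_sink_iface>.<id>' on `instance`.

    connections: (instance, timing_sink_iface) -> (source_instance, source_iface)
    declared:    (source_instance, source_iface) -> list of declaredTimings
    """
    sink_iface, timing_id = associated_timing.split(".", 1)
    src = connections.get((instance, sink_iface))
    if src is None:
        raise ValueError(f"{instance}.{sink_iface} is not connected to a timing source")
    if timing_id not in declared.get(src, []):
        raise ValueError(f"{src[0]}.{src[1]} does not declare timing '{timing_id}'")
    return TimingRef(src[0], src[1], timing_id)


def check_stream(src_inst, src_assoc, snk_inst, snk_assoc, connections, declared):
    """Raise if the avalon_streaming source and sink reference different timings."""
    a = resolve(src_inst, src_assoc, connections, declared)
    b = resolve(snk_inst, snk_assoc, connections, declared)
    if a != b:
        raise ValueError(f"associatedTiming mismatch: {a} vs {b}")
    return a


connections = {
    ("preserve", "ctrl"): ("synchronizer", "timing_out"),
    ("costas", "ctrl"): ("synchronizer", "timing_out"),
}
declared = {("synchronizer", "timing_out"): ["word23"]}
print(check_stream("preserve", "ctrl.word23", "costas", "ctrl.word23",
                   connections, declared))
# TimingRef(instance='synchronizer', interface='timing_out', timing_id='word23')
```

Because the resolved reference records instance and interface, two different timing sources that both declare `word23` resolve to different references and are reported as a mismatch.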

 

== namespace issues == 

 

There will be no namespace clashes: timings defined by foreign IP will always be distinct, even if the same id is used, because each timing is defined on an interface inside that IP.

 

== multiple timings from one source == 

 

A timing source can provide multiple timings at once, which is why the declaredTimings property is a list. The relation of the delivered signals to these timings is up to the developer and is not restricted. It should even be possible to declare a timing without generating any signals.

 

If such a STRING has the form iface.id (that is, it contains one dot), it should reference a timing of another timing sink <iface> on the same component, so that a timing can be passed through.
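One possible reading of the pass-through rule, sketched in Python: an entry `iface.id` on an instance's timing source is treated as forwarding timing `id` that the same instance receives on its timing sink `iface`, and resolution follows such entries back to the originating source. This interpretation, the `bridge` instance and all interface names are assumptions for illustration.

```python
def resolve_origin(instance, sink_iface, timing_id, connections, declared):
    """Follow pass-through declarations back to the source that originates a timing.

    connections: (instance, timing_sink_iface) -> (source_instance, source_iface)
    declared:    (source_instance, source_iface) -> list of declaredTimings entries
    An entry 'iface.id' is treated as passing through timing 'id' that the
    declaring instance itself receives on its timing sink 'iface'.
    """
    while True:
        src = connections.get((instance, sink_iface))
        if src is None:
            raise ValueError(f"{instance}.{sink_iface} is not connected")
        if timing_id not in declared.get(src, []):
            raise ValueError(f"{src[0]}.{src[1]} does not declare '{timing_id}'")
        if "." not in timing_id:
            return (src[0], src[1], timing_id)   # origin reached
        # Pass-through: continue upstream from the declaring instance's own sink.
        instance = src[0]
        sink_iface, timing_id = timing_id.split(".", 1)


connections = {
    ("bridge", "in"): ("synchronizer", "timing_out"),
    ("costas", "ctrl"): ("bridge", "timing_out"),
}
declared = {
    ("synchronizer", "timing_out"): ["word23"],
    ("bridge", "timing_out"): ["in.word23"],   # forwards word23 received on "in"
}
print(resolve_origin("costas", "ctrl", "in.word23", connections, declared))
# ('synchronizer', 'timing_out', 'word23')
```

Each pass-through step strips one `iface.` prefix from the id, so the walk always terminates. Two stream endpoints would then match if they resolve to the same origin.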

 

== relation to clock and reset == 

 

If you think about it, reset and clock signals integrate very neatly with this concept. To some extent, they are timing signals too. They could be integrated as mandatory roles in the timing source. That would of course break each and every existing QSYS design, but let's spend a few thoughts on it, as it also explains the concept...

 

Clock and reset would then always be routed together in one connection. Keeping them separate makes no sense for synchronized resets anyway, because associatedClock is checked and a mismatch throws errors.

 

The distinction between different synchronizations of the reset signal could be made with different "declaredTimings". The clock/reset source should always export "reset". If it provides synchronous deassertion, it should *add* "reset-sync-deassert"; if it provides synchronous assertion, it should add "reset-sync-assert"; if it provides synchronization for both edges, it should add "reset-sync-both" to the declaredTimings. Timing receivers could reference the behaviour they need to make sure it is provided.

 

By default, the clock bridge would pass through the reset properties it receives on its input. _hw.tcl commands to enumerate the declaredTimings on a timing sink would make that possible.

 

== standards for ip == 

 

Is there any benefit in standards for the usage of timing interfaces in foreign IP? In most cases, signals passed via a timing interface will not be of much use outside the design that defines them. Signals named startofpacket, endofpacket, valid, empty and also ready, however, could be interpreted the same way as in avalon_streaming.

Passing these signals in a timing interface would ensure that they are identical for multiple streams (in a way the complement of the beatsPerCycle property, which does the opposite by introducing multiple control signals for one streaming interface).

 

In the case of the ready signal, which travels from sink to source, the component with the streaming source would have to implement a timing sink, and vice versa. If you need consolidated versions of both valid and ready, for example, you would need timing interfaces in both directions. That is a special case, since my proposal only allows referencing *one* timing.

 

== migration scenario for clock and reset == 

 

A migration scenario for existing designs would not be trivial, though. Interface types "clock" and "reset" would be disallowed after "require qsys 16.0", and the new "timing" type allowed instead. But you must expect a nested QSYS 15 component to be wired into a QSYS 16 system and vice versa, so automatic insertion of a bridge would be necessary.

 

Have a good time, 

Andreas
Altera_Forum
Honored Contributor II

Hi again, 

 

You may wonder why I am making a proposal that looks so complicated at first...

I can explain it: it has to do with the design I'm working on.

 

== the design == 

 

It all began with a bit-serial CORDIC unit for Cartesian-to-polar conversion. That unit has to process only 12,000 complex inputs per second, so a parallel design would be a waste (doing the trigonometric functions in software at 12,000 IRQ/s was not an option). My little coprocessor is therefore serial with respect to the CORDIC rounds and also uses serial adders to minimize resource usage.

 

Real and imaginary parts both have 23-bit precision. Each round takes 23 cycles and there are 22 rounds (always precision minus 1). That gives a total of 506 cycles per computation. 12,000 inputs per second take 6,072,000 cycles per second, no problem at 100 MHz. (The choice of 23 bits comes from the restriction that I want fewer than 512 cycles; more cycles would need more M9K blocks.)
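The cycle budget above can be re-checked in a few lines of Python. The only model used is the one stated in the text: one round per bit and precision minus one rounds.

```python
CLOCK_HZ = 100_000_000   # 100 MHz system clock
INPUT_RATE = 12_000      # complex inputs per second


def cycles_per_computation(bits: int) -> int:
    """One CORDIC round takes `bits` cycles, and there are bits - 1 rounds."""
    return bits * (bits - 1)


cycles = cycles_per_computation(23)
per_second = INPUT_RATE * cycles
print(cycles, per_second, f"{per_second / CLOCK_HZ:.2%}")
# 506 6072000 6.07%

# Largest precision whose schedule still stays below 512 cycles.
best = max(b for b in range(2, 64) if cycles_per_computation(b) < 512)
print(best, cycles_per_computation(best + 1))
# 23 552
```

So one more bit of precision (24 bits) would already need 552 cycles and break the 512-cycle limit.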

 

As internal tristate logic is not available in the Cyclone devices, the bit shifter for CORDIC is a pain. Ray Andraka points to dual-port memory (http://www.andraka.com/files/crdcsrvy.pdf), and that was the solution. An M9K block in true dual-port mode, reading on both ports and additionally writing on one, can do the magic. It needs a very complicated addressing scheme, but it can also implement sign extension for the shifted values on the fly.
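To show why reading from RAM can stand in for a shifter, here is a simplified Python model. It assumes one bit per RAM address with the LSB at the lowest address; this is an illustration only, not the (much more complicated) addressing scheme of the actual design or of Andraka's paper. Reading value >> k serially just means starting k addresses higher, and clamping reads at the MSB address yields the sign extension.

```python
N = 23  # word width in bits, as in the design above


def store(value: int, n: int = N) -> list:
    """Store a signed n-bit value one bit per address, LSB at address 0."""
    return [(value >> i) & 1 for i in range(n)]


def shifted_addresses(k: int, n: int = N) -> list:
    """Read addresses for bit steps 0..n-1 that yield value >> k (arithmetic).

    Addresses past the MSB are clamped to n - 1, so the sign bit is read
    repeatedly: sign extension comes for free from the address sequence.
    """
    return [min(i + k, n - 1) for i in range(n)]


def serial_read(mem: list, addresses: list) -> int:
    """Collect the bits read LSB first and interpret them as two's complement."""
    n = len(addresses)
    raw = sum(mem[a] << i for i, a in enumerate(addresses))
    return raw - (1 << n) if raw >> (n - 1) else raw


mem = store(-1000)
print(serial_read(mem, shifted_addresses(3)))   # -125
print(serial_read(mem, shifted_addresses(0)))   # -1000
```

In a CORDIC round, one port could stream the shifted operand this way while the other reads the unshifted one and the write port stores the result; how the real design interleaves these accesses is not described in the thread.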

 

I quickly decided to implement RAM address generation by simply reading the addresses from a ROM block. That was a good decision, as more control signals had to be added later on for a divider, some scalers and some filters (interacting with their own RAM blocks and also needing addressing, read/write enables and so on). By now, I generate 28 bits of control signals from this ROM, all periodic with a period of 506 cycles.

 

== how it's done in qsys == 

 

Divide and conquer: everything has its own component, some are QSYS subsystems, some use a composition callback, some an elaboration callback.

 

What I now have is a central "synchronizer" component in my design that exports those control signals to every component that needs some of them. There are 20 conduits for the control signals. The synchronizer uses an elaboration callback and has a table for the conduits, so I can add one every time I need it.

 

== the problem == 

 

The problem now is that the same set of control signals is routed to every receiver. If I add a signal, all receivers must be changed to include it, even if they do not use it. And, by the way, the system looks very complicated with all those control conduits.

 

The data paths between all those components are also made up of conduits. I would really like to use avalon_streaming, but I often have components that need data from two bit trains with identical timing (related to the control signals, of course). The only way to do this with avalon_streaming is to slice a multi-bit data role, so that both bit trains share the same valid and ready signals. Slicing is a pain when it comes to Verilog generation.

 

It would be possible to use some Altera-provided IP in some places, but I would need two components for the two bit trains travelling on the same streaming interface. So I often end up writing my own components for those standard tasks, usually with a TABLE parameter to make the names of the data trains configurable.

 

The beatsPerCycle property gave me some hope; it has a little more documentation in the recent Avalon spec. The docs are confusing though (is a default of 8 correct?!?), and testing shows that beatsPerCycle does not do what I need.

 

== solution? == 

 

A 1:N connect would be very helpful, lazy role checking (sink roles being a subset of source roles) even more so.

Associating control signals with data streams would be a dream. 

 

 

Greets, Andreas
Altera_Forum
Honored Contributor II

A diagram would help.

Altera_Forum
Honored Contributor II

I will see what I can do. Will take a few days.

Altera_Forum
Honored Contributor II

OK, I've made a rough diagram of the design I'm working on, to illustrate the problem. 

 

 

Light-Blue boxes are QSYS-components. Grey-boxes are QSYS-subsystems. 

 

Blue arrows are the data plane: dark blue is parallel data transfer (one transaction per cycle) and light blue is serial data transfer (23-bit values in 23 cycles, LSB first). All of these are avalon_streaming.

 

 

You see that the data-plane is made up of point-to-point connections. Sometimes a component has incoming connections from two different source components (a directed graph). 

 

The pink connections are the clock/reset "tree". Clock and reset are distributed hierarchically and do not follow the data signals. As data signals are useless without an associated clock, QSYS keeps track of the clock associations of these signals and will complain if you try to connect signals with different associations.

 

All components in "loop-controller" work on serial data. Synchronization on a common 506-cycle schedule is provided by a "synchronizer" component (orange), which delivers not only a schedule counter but also helper signals (read/write enables, addresses for different addressing schemes and so on), a control bus of 28 bits in total.

 

Feeding these control signals through avalon_streaming is not a good idea. Not all receivers need all signals. Components with two inputs (like the "costas" component) would also need to decide which version of the control signals to act on: the one received from "preserve" or the one from "quotient". It is of course possible to ignore one set of control signals, but as a component designer, I would like to enforce that both inputs are synchronous with respect to the same control signals. It would be much more adequate to distribute these control signals in a tree.

 

Now the problems with the tree: 

  1. I need to use conduits, because other interface types have restrictions on signal roles

  2. Conduits are 1:1, so I need a configurable number of output interfaces on the synchronizer and I am forced to use an elaboration-callback. 

  3. Every time I add a new control-signal, I need to adapt all receiver components, even the ones that do not make use of the new signal. 

  4. Association between avalon_streaming-inputs and the control-signals is not checked. 

 

 

 

My request is to have a customizable 1:N connect. 

 

The typical use of such a connect will be control signals that are needed to interpret the streaming data, much like clock/reset. If streaming data can only be interpreted with the help of control signals, then source and sink should have an association with those control signals, and QSYS should enforce that this association is the same on both ends.

 

Greets, Andreas
Altera_Forum
Honored Contributor II

The pictures are a bit small to view; I'd like to print them out to study your explanation further.

 

The concept of streaming data is exactly that everything comes along with the data. In the case of the 'controller concept' you would only need output to input connections between the blocks, given that the 'controller' times it all? Maybe it would be easiest to design the loop-controller as a hand-coded component (in VHDL, Verilog or MyHDL) and not trying to coerce Qsys to do something it was not really meant for.
Altera_Forum
Honored Contributor II
512 Views

Hi josyb, 

 

 

--- Quote Start ---  

The pictures are a bit small to view, and I'd like to print them out to study your explanation further 

--- Quote End ---  

 

Apologies, I was not aware that the forum will scale them down. 

 

 

--- Quote Start ---  

The concept of streaming data is exactly that everything comes along with the data. 

--- Quote End ---  

 

That's not true. Clock and reset do not come with the data. They are distributed in an independent topology.

 

I would be fine to use the clock- and reset-interface types if only I could extend them to include my own control-signals. 

 

 

 

--- Quote Start ---  

In the case of the 'controller concept' you would only need output to input connections between the blocks, given that the 'controller' times it all? 

--- Quote End ---  

 

What would you suggest for components like "Costas" that receive two input streams? The realization *can* be very tiny because these input streams are in sync.

 

I could of course let the optimizer infer that the control-signals for both inputs are in fact the same. But if I do so, I must at least do a realization that also works with two different control inputs (if I want to keep the component reusable). And I'm not sure that the optimizer will do the same radical simplification that I can do with the knowledge that the control-inputs will always be identical. I do not feel comfortable with that solution. 

 

 

--- Quote Start ---  

Maybe it would be easiest to design the loop-controller as a hand-coded component (in VHDL, Verilog or MyHDL) and not trying to coerce Qsys to do something it was not really meant for. 

--- Quote End ---  

 

I must admit that I did not look into alternatives deeply. But the QSYS tcl-scripting seemed a very powerful instrument. 

 

The controller is an interesting unit. It is in Verilog, but the signals are not formulated as HDL logic. Instead, they are read from an initialized ROM, and the ROM is initialized from a MIF that is generated by the _hw.tcl, depending on parameters.

There are two reasons for this: 

  1. I need very strange address-sequences for some RAMs (in the controlled components), that are hard to formulate as logic, but easy to precompute in TCL. 

  2. The generated signals also include some bit-serial constants. These are results of trigonometric functions whose input parameters depend on component parameters. Again, very easy to precompute in TCL.
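The MIF is generated from _hw.tcl in this design; the same idea is sketched here in Python for illustration. The header keywords (WIDTH, DEPTH, ADDRESS_RADIX, DATA_RADIX, CONTENT BEGIN ... END;) follow the Quartus Memory Initialization File format, but the field layout packed into each 28-bit word is entirely hypothetical.

```python
WIDTH = 28                   # control-word width, as in the design
DEPTH = 512                  # ROM depth; the 506-cycle schedule must fit
BITS = 23                    # bits per serial word
PERIOD = BITS * (BITS - 1)   # 506 cycles: 22 rounds of 23 cycles


def control_word(cycle: int) -> int:
    """Pack a HYPOTHETICAL subset of control fields for one schedule cycle.

    Assumed layout (illustration only, not the real design):
      bits 0-4   bit position inside the current 23-bit word
      bit  5     lsb flag
      bit  6     msb flag
      bits 7-11  CORDIC round (0..21)
    Remaining bits stay zero here; the real ROM also holds RAM addresses,
    write enables and bit-serial constants.
    """
    bitpos, rnd = cycle % BITS, cycle // BITS
    word = bitpos | (bitpos == 0) << 5 | (bitpos == BITS - 1) << 6 | rnd << 7
    assert word < (1 << WIDTH)
    return word


def mif_text(words, width: int = WIDTH, depth: int = DEPTH) -> str:
    """Render words as a Memory Initialization File; unused addresses get 0."""
    digits = (width + 3) // 4
    lines = [f"WIDTH={width};", f"DEPTH={depth};",
             "ADDRESS_RADIX=UNS;", "DATA_RADIX=HEX;", "CONTENT BEGIN"]
    for addr in range(depth):
        value = words[addr] if addr < len(words) else 0
        lines.append(f"    {addr} : {value:0{digits}X};")
    lines.append("END;")
    return "\n".join(lines) + "\n"


rom = [control_word(c) for c in range(PERIOD)]
mif = mif_text(rom)
print("\n".join(mif.splitlines()[:7]))
```

Writing `mif` to a .mif file referenced by the ROM instance would complete the flow; the ROM address counter would wrap at 506, leaving the last six words unused.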

 

 

 

Of course, all that is doable with QSYS 15, I do not really need that "enhancement". But when I look at my design, I feel that I could have done better, if QSYS had these features. 

And I have seen other posts with people asking about 1:N conduits, so I will not be the only one to benefit from that. 

 

I'm sure that Altera themselves will continue to develop and improve QSYS, just wanted to put a "vote" for customizable 1:N connect.
Altera_Forum
Honored Contributor II

 

--- Quote Start ---  

Hi josyb, 

 

 

Apologies, I was not aware that the forum will scale them down.  

--- Quote End ---  

 

--- Quote Start ---  

 

That's not true. clock and reset do not come with the data. They are distributed in an independent topology. 

 

--- Quote End ---  

 

I maybe should have said: "Everything comes along with the data, except clock and reset which are distributed globally" 

 

--- Quote Start ---  

 

I would be fine to use the clock- and reset-interface types if only I could extend them to include my own control-signals. 

 

--- Quote End ---  

 

--- Quote Start ---  

 

 

What would you suggest for components like "Costas" that do receive two input streams? The realization *can* be very tiny because these inputs streams are in sync.  

 

--- Quote End ---  

I have a simple Qsys component to combine two streams into one, and I have written several components that accept two (or more) input streams. No sweat. 

 

--- Quote Start ---  

 

I could of course let the optimizer infer that the control-signals for both inputs are in fact the same. But if I do so, I must at least do a realization that also works with two different control inputs (if I want to keep the component reusable). And I'm not sure that the optimizer will do the same radical simplification that I can do with the knowledge that the control-inputs will always be identical. I do not feel comfortable with that solution. 

 

 

I must admit that I did not look into alternatives deeply. But the QSYS tcl-scripting seemed a very powerful instrument. 

 

--- Quote End ---  

 

 

You mention Verilog, but if you stepped up to SystemVerilog, or VHDL (what I use(d)), you could use a struct (SV) or record (VHDL) to encapsulate your control signals. That makes adding or deleting a control signal quite easy, because every component picks what it is interested in.
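A loose software analogue of the struct/record suggestion, written in Python (the language MyHDL is built on) rather than SystemVerilog or VHDL. The field names and widths are hypothetical. Each receiver looks fields up by name, so adding a field to the layout does not require touching receivers that ignore it.

```python
# Hypothetical control-bus layout tables: (field name, width in bits), LSB first.
LAYOUT_V1 = [("bitcount", 5), ("lsb", 1), ("msb", 1)]
LAYOUT_V2 = LAYOUT_V1 + [("ram_we", 1)]   # a field added later


def offsets(layout):
    """Map each field name to (bit offset, width) within the packed bus."""
    result, pos = {}, 0
    for name, width in layout:
        result[name] = (pos, width)
        pos += width
    return result


def pack(layout, **values) -> int:
    """Pack named field values into one integer bus; missing fields are 0."""
    bus = 0
    for name, (pos, width) in offsets(layout).items():
        bus |= (values.get(name, 0) & ((1 << width) - 1)) << pos
    return bus


def field(layout, bus: int, name: str) -> int:
    """Extract one named field; consumers never hard-code bit positions."""
    pos, width = offsets(layout)[name]
    return (bus >> pos) & ((1 << width) - 1)


def consumer_sees_word_start(layout, bus: int) -> bool:
    """A receiver that only cares about 'lsb'; unchanged when fields are added."""
    return field(layout, bus, "lsb") == 1


for layout in (LAYOUT_V1, LAYOUT_V2):
    bus = pack(layout, bitcount=0, lsb=1, ram_we=1)
    print(consumer_sees_word_start(layout, bus))
```

As with a record or struct, the layout is defined once, and each consumer depends only on the field names it actually uses.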

 

--- Quote Start ---  

 

The controller is an interesting unit. It is in Verilog, but the signals are not formulated as HDL logic. Instead, they are read from an initialized ROM, and the ROM is initialized from a MIF that is generated by the _hw.tcl, depending on parameters.

There are two reasons for this: 

  1. I need very strange address-sequences for some RAMs (in the controlled components), that are hard to formulate as logic, but easy to precompute in TCL. 

  2. The generated signals also include some bit-serial constants. These are results of trigonometric functions whose input parameters depend on component parameters. Again, very easy to precompute in TCL.

 

 

--- Quote End ---  

 

I am a big Qsys fan too, but lately I have begun coding in MyHDL (http://www.myhdl.org), where you can have the elaboration (like generating the addresses you mention) alongside the RTL code, together with a self-checking testbench.

 

--- Quote Start ---  

 

 

Of course, all that is doable with QSYS 15, I do not really need that "enhancement". But when I look at my design, I feel that I could have done better, if QSYS had these features. 

And I have seen other posts with people asking about 1:N conduits, so I will not be the only one to benefit from that. 

 

I'm sure that Altera themselves will continue to develop and improve QSYS, just wanted to put a "vote" for customizable 1:N connect. 

--- Quote End ---  

 

I think that Qsys is quite complete. The addition of 'distributed control' as you propose would add considerable complexity, both to the view and to the implementation.

 

Regards, 

 

Josy
Altera_Forum
Honored Contributor II

That's valuable advice, thank you.

I decided to learn Verilog first when I began one year ago. I learned a lot about the concepts, and about the restrictions. I found that those restrictions can be partially lifted with generated HDL, like that from QSYS.

But I always wondered if really complex projects can be done with those concepts. 

I also have a SystemVerilog book but have not dug into it yet.

 

I was not even aware of MyHDL, looks promising at first glance. I think I will have a closer look. 

 

Regards, Andreas
Altera_Forum
Honored Contributor II

As a Tcl user you will love MyHDL - as it is Python ;) 

If you need a hand, you can find me on all MyHDL channels (GitHub, IRC, Gmane) 

 

Regards, 

 

Josy