For Loop Synthesis Problem

Altera_Forum

I am trying to write a function that sets all bits to the right of the most significant set bit of the input word. For some reason the following Verilog code does not work correctly in Quartus II 10.0. The code synthesizes as if the different iterations of the for loop were concurrent, even though I'm using a blocking assignment (non-blocking assignments are not allowed in functions). For example, 16'h0080 generates 16'h00e8 instead of 16'h00ff.

Any possible explanation is appreciated. 

function [31:0] mask (input [31:0] tap);
    integer index;
    begin
        mask = tap;
        // Double the shift distance on every pass (1, 2, 4, 8, 16)
        for (index = 1; index < 32; index = index * 2) begin
            mask = mask | (mask >> index);
        end
    end
endfunction

12 Replies

Altera_Forum

Did you mean index = index + 1 instead of index = index * 2?

Altera_Forum

Good question. Actually, both should work: index * 2 needs log2(n) iterations, while index + 1 needs n iterations.

 

For example, using index * 2 and an input of 32'h00800000, successive iterations give:

32'b00000000110000000000000000000000 // index == 1
32'b00000000111100000000000000000000 // index == 2
32'b00000000111111110000000000000000 // index == 4
32'b00000000111111111111111100000000 // index == 8
32'b00000000111111111111111111111111 // index == 16

However, in the end the loop actually produces:

32'b00000000110100010000000100000000
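
For anyone who wants to reproduce the expected trace in a simulator, here is a minimal simulation-only sketch. The module name mask_trace and the register name m are made up for this example; it only mirrors the loop body from the first post and says nothing about what Quartus synthesizes.

```verilog
// Simulation-only sketch: print the running value after each pass
// of the doubling loop. Names are hypothetical, not from the design.
module mask_trace;
    reg [31:0] m;
    integer index;

    initial begin
        m = 32'h00800000;  // single set bit at position 23
        for (index = 1; index < 32; index = index * 2) begin
            // Blocking assignment: each pass reads the value
            // written by the previous pass.
            m = m | (m >> index);
            $display("index = %0d  m = %b", index, m);
        end
    end
endmodule
```

In a standards-compliant simulator this should print the five intermediate values listed above, ending with 24 ones in the low bits.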

Altera_Forum

The function should be synthesizable, since there is an equivalent combinational expression:

mask(tap) == {|(16'h8000 & tap), |(16'hC000 & tap), |(16'hE000 & tap), ..., |(16'hFFFC & tap), |(16'hFFFE & tap), |(16'hFFFF & tap)}
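
The expression above can be sketched as a small module in which bit i of the result is the OR of tap[WIDTH-1:i]. This is only an illustration under assumed names (mask_reduce, WIDTH); it is not code from the original design.

```verilog
// Sketch of the reduction-OR formulation. Module and parameter
// names are assumptions made for this example.
module mask_reduce #(parameter WIDTH = 16) (
    input  wire [WIDTH-1:0] tap,
    output reg  [WIDTH-1:0] mask
);
    integer i;

    always @* begin
        for (i = 0; i < WIDTH; i = i + 1)
            // Shifting right by i drops the bits below position i,
            // so the reduction OR sees exactly tap[WIDTH-1:i].
            mask[i] = |(tap >> i);
    end
endmodule
```

Each output bit depends only on the input, so there is no chain of intermediate values for a tool to get wrong.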

Altera_Forum

I don't know Verilog well enough to know how the code should work. I do know that in a VHDL process, if you use a variable, it should work as expected.

When you talk about iterations, you are thinking like a software designer rather than a hardware designer. In the hardware that is generated from your code there aren't any iterations; everything is done in parallel. I would guess that after optimization, the index + 1 and index * 2 versions of the algorithm generate the same hardware.

Altera_Forum

 

--- Quote Start ---  

When you talk about iterations you are thinking like a software designer instead of hardware. 

--- Quote End ---  

 

 

Actually all loops have iterations, but in the case of HDL the iterations define hardware on the surface of the chip, not state as in software. 

 

The index = index * 2 version does not work as expected, which is why I'm posting. Is this a bug in Altera's implementation, or am I misunderstanding something about Verilog?

Altera_Forum

I wonder how you determined that the result is wrong. As far as I could see, Quartus produces the correct result when the function is called with a constant argument.

Altera_Forum

I know it's wrong because I assign the output of the function to a register that serves as a mask for comparison purposes. However, the comparisons were showing too many true results when I ran the hardware, and I realized that the mask had some zeros where there should only be ones (to the right of the leftmost 1). When I change index = index * 2 to index = index + 1, I get the correct result.

This tells me that the synthesis engine is unrolling the loop incorrectly. I think it's unrolling the loop as:

 

A: tap | (tap>>1) | (tap>>2) | (tap>>4) | ... // notice tap>>3 is missing 

 

whereas it should unroll the loop so that each step builds on the result of the previous one, as in:

B: m1 = tap | (tap>>1); m2 = m1 | (m1>>2); m3 = m2 | (m2>>4); ... // tap>>3 is covered by m1>>2

 

Now notice that A and B are equivalent when the loop definition uses index = index + 1 instead of index = index * 2 

 

So basically the question is: are blocking assignments supposed to be truly blocking in for loops? If so, then Quartus has a bug.
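
To make the difference between the two unrollings concrete, here is a small simulation-only sketch for the 16-bit example from the first post. The module and signal names (unroll_compare, a, m) are invented for illustration, and both unrollings are written out by hand rather than taken from any tool output.

```verilog
// Hand-written versions of unrolling A (flat) and B (chained)
// for a 16-bit input. All names here are hypothetical.
module unroll_compare;
    reg [15:0] tap, a, m;

    initial begin
        tap = 16'h0080;

        // A: every term shifts the original input, nothing is chained
        a = tap | (tap >> 1) | (tap >> 2) | (tap >> 4) | (tap >> 8);

        // B: every step shifts the result of the previous step
        m = tap;
        m = m | (m >> 1);
        m = m | (m >> 2);
        m = m | (m >> 4);
        m = m | (m >> 8);

        $display("A = %h  B = %h", a, m);  // A = 00e8  B = 00ff
    end
endmodule
```

The A value matches the wrong result reported in the first post, and B is what the blocking assignments in the loop should produce.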

Altera_Forum

Maybe I should rephrase the question a little. 

 

Do blocking assignments inside a for loop take effect in the order in which the loop "executes" them?

I know that the assignments are not actually executed at run time, but some synthesis processing occurs nonetheless. Does this processing build on the results of prior iterations of the loop, or is the loop treated as completely stateless, so that each iteration is synthesized independently of the others?

Altera_Forum

 

--- Quote Start ---  

I know it's wrong 

--- Quote End ---  

 

As I said, I got a correct result with your function above, so I know it can work. To make the problem you observed understandable, you should post a code example that reproduces it. At present, I don't expect the problem to be related to the function itself.

For the function coding, I would prefer a form that represents the intended operation more clearly: search for the leftmost nonzero bit, then set all bits to the right of it. But the result should be identical in any case.
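
One possible reading of that suggestion is sketched below. The names mask_search, found and mask_search_demo are assumptions for this example, and the sketch keeps the leftmost set bit itself in the mask, as the original function does.

```verilog
// Sketch of the "search for the leftmost 1, then fill" form.
// Function, variable and module names are hypothetical.
module mask_search_demo;
    function [31:0] mask_search (input [31:0] tap);
        integer i;
        reg found;
        begin
            found = 1'b0;
            mask_search = 32'b0;
            // Scan from the MSB down; once a 1 has been seen,
            // every remaining bit of the result is set.
            for (i = 31; i >= 0; i = i - 1) begin
                if (tap[i]) found = 1'b1;
                mask_search[i] = found;
            end
        end
    endfunction

    initial $display("%h", mask_search(32'h00800000));  // 00ffffff
endmodule
```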

Altera_Forum

That's interesting. I'm going to dump the register out to the LCD to make sure I see what I think I see. I'll report back later. Thanks FvM.

Altera_Forum

 

--- Quote Start ---  

Actually all loops have iterations, but in the case of HDL the iterations define hardware on the surface of the chip, not state as in software. 

--- Quote End ---  

 

Actually, that isn't entirely true. The loops are unrolled by the synthesizer, which looks at the resulting logic function and only then synthesizes logic to realize that function. So you won't get a chunk of logic for each loop iteration in the code. As an example, I synthesized this process (sorry, it's VHDL, I know it better than Verilog):

process(input) is
    variable tap     : unsigned(31 downto 0);
    variable counter : integer;
begin
    tap := input;
    counter := 1;
    while counter < 32 loop
        tap := tap or (tap srl counter);
        counter := counter * 2;
        -- counter := counter + 1;
    end loop;
    output <= tap;
end process;

It synthesized the same logic with counter := counter + 1, even though the synthesizer went through more loop iterations. In both cases 32 logic elements were used.

Personally, I would go for the counter := counter + 1 form, which I find easier to read, or even better, write a function that looks for the MSB and sets the less significant bits to 1, as FvM said. It will generate the same logic in the end while being more comprehensible.

As for your original question, it could be a bug in how the synthesizer sees your code, because I don't see any reason why it wouldn't work with * 2. Did you try to simulate it in ModelSim to see if you get the same result?
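
A simulation along those lines could look like the following self-checking sketch. It assumes 32-bit versions of the function, and the names mask_x2, mask_p1 and mask_tb are made up here. It only compares the two loop styles in a simulator and does not show what Quartus generates.

```verilog
// Simulation-only testbench sketch comparing the *2 and +1 loops
// for every single-bit input. All names are hypothetical.
module mask_tb;
    function [31:0] mask_x2 (input [31:0] tap);
        integer index;
        begin
            mask_x2 = tap;
            for (index = 1; index < 32; index = index * 2)
                mask_x2 = mask_x2 | (mask_x2 >> index);
        end
    endfunction

    function [31:0] mask_p1 (input [31:0] tap);
        integer index;
        begin
            mask_p1 = tap;
            for (index = 1; index < 32; index = index + 1)
                mask_p1 = mask_p1 | (mask_p1 >> index);
        end
    endfunction

    reg [31:0] tap;
    integer bit_pos, errors;

    initial begin
        errors = 0;
        for (bit_pos = 0; bit_pos < 32; bit_pos = bit_pos + 1) begin
            tap = 32'h1 << bit_pos;
            if (mask_x2(tap) !== mask_p1(tap)) begin
                errors = errors + 1;
                $display("MISMATCH tap=%h x2=%h p1=%h",
                         tap, mask_x2(tap), mask_p1(tap));
            end
        end
        $display("done, %0d mismatches", errors);
        $finish;
    end
endmodule
```

If the simulator reports no mismatches while the hardware still misbehaves, the problem is more likely in the synthesis result or in the surrounding design than in the function source.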

Altera_Forum

Daixiwen, I did not try the simulator. I'm not set up on ModelSim, and I understand there's a learning curve.

 

I dumped the function out to the 7-segment display on the dev board. Guess what? There's no error now. I tried it with index*2 and index+1 and got the same correct result. I also tried compiling in Quartus 9.1. Same result. I'm not sure what happened. 

 

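I ended up writing the function without the barrel shifter, as follows. Thanks to everyone for the help and replies.

function [31:0] mask (input [31:0] tap);
    integer index;
    begin
        mask = tap;  // bit 31 is taken straight from the input
        // Ripple from the MSB down: a bit is set if it is set in tap
        // or if the bit to its left is already set in the mask.
        for (index = 30; index >= 0; index = index - 1) begin
            mask[index] = tap[index] | mask[index + 1];
        end
    end
endfunction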