'real' bit accuracy, and bypassing 'integer' to make 'signed'/'unsigned' constant values

Altera_Forum · 07-19-2015

Greetings All:

I am currently working on some DSP filtering. I create constant values for the required coefficients based on the generic values. For example:


calc_coef := to_signed(integer(MATH_E**(-(MATH_2_PI * (actual_fc_real(rep)/real(clock_rate)))) * 2.0**(coef_width - 1)), coef_width);

'actual_fc_real(rep)' is a 'real' generic, and 'clock_rate' is an integer in this case but could be a 'real' generic. 'coef_width' scales the coefficient to the 'signed' fixed-point format. It works fine most of the time; however, every now and then I get an error when coef_width is >= 32 and the coefficient value is close to 1.0. With a small coefficient, for example around 0.0003, it sometimes passes even with more than 32 bits.
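A rough way to see why the failure depends on the coefficient and not only on coef_width: integer() has to hold the whole scaled product before to_signed ever sees it. Assuming the usual 32-bit INTEGER (integer'high = 2^31 - 1 on most tools), with c the coefficient and W = coef_width:

```latex
\begin{aligned}
&\texttt{integer}(c \cdot 2^{W-1}) \text{ is legal only if } |c| \cdot 2^{W-1} \le 2^{31}-1
  \;\Longrightarrow\; W \lesssim 32 + \log_2(1/|c|) \\
&c \approx 0.0003:\; 0.0003 \cdot 2^{42} \approx 1.32\times10^{9} \le 2^{31}-1,
  \text{ but } 0.0003 \cdot 2^{43} \approx 2.64\times10^{9} > 2^{31}-1 \;\Rightarrow\; W \le 43 \\
&0.5 \le |c| < 1:\; |c| \cdot 2^{32} \ge 2^{31} \;\Rightarrow\; W \le 32
\end{aligned}
```

At W = 32 a coefficient very close to 1.0 is right at the edge: if c * 2^31 rounds up to 2^31, the conversion overflows.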

The two questions I have are:

First: what is the bit accuracy of the 'real' type in VHDL? I noticed that the constant I used...


-- 3.1 MATH_REAL package declaration
------------------------------------------------------------------------
--
-- Copyright 1996 by IEEE. All rights reserved.
--...
constant  MATH_2_PI : REAL := 6.28318_53071_79586_47693;
--...

...has 21 decimal digits, which corresponds to about 70 bits of accuracy (log2(10**21) ≈ 69.8), more than the 'double' floating-point format used on a PC. The reason I ask is that 'float' uses a 32-bit format with only a 24-bit mantissa. I couldn't find much information about this on the web, and if the accuracy of 'real' is only around 24 bits, there's no point calculating a > 32-bit signed value for the coefficient.
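One way to answer the first question empirically is to let the tool measure it. The sketch below (package and function names are made up for illustration; it uses only predefined REAL arithmetic) keeps halving a value until adding it to 1.0 no longer changes the result. My understanding is that VHDL-2008 asks for an IEEE 754/854-style REAL of at least 64 bits, so on such a tool it should typically report 53 significand bits; the extra digits in the MATH_2_PI literal are simply rounded away when the constant is evaluated.

```vhdl
-- Hypothetical helper: estimates the significand width of the tool's REAL.
package real_probe_pkg is
  function real_significand_bits return natural;
end package real_probe_pkg;

package body real_probe_pkg is
  function real_significand_bits return natural is
    variable eps  : real    := 1.0;
    variable bits : natural := 0;
  begin
    loop
      eps := eps / 2.0;
      -- stop once 1.0 + eps rounds back to exactly 1.0
      exit when 1.0 + eps = 1.0;
      bits := bits + 1;
    end loop;
    -- bits counts the stored fraction bits; add 1 for the implicit leading bit
    return bits + 1;
  end function real_significand_bits;
end package body real_probe_pkg;

-- Example use in any architecture:
--   constant REAL_BITS : natural := real_significand_bits;
--   report "REAL has " & integer'image(REAL_BITS) & " significand bits";
```

If this returns something near 53, calculating a 36-bit coefficient from REAL is meaningful; the limit is INTEGER, not REAL.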

Second: I usually match the width of the hardware math blocks, using 36 bits once the 18-bit range is exceeded. 18 bits isn't very accurate for coefficients near 0.0 or 1.0, i.e. the ~0.0003 value I mentioned earlier. I have been limited to 32 bits because of the 'integer' range specified by VHDL. Is there a way to bypass the intermediate 'integer' step? The 'real' type handles the value (given enough accuracy, of course) and the 'signed' type handles the width, but the 'integer' range blocks the path I'm looking for.
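One possible workaround, sketched under the assumption that REAL is IEEE double precision: round the scaled value once in REAL, then split it into an upper and a lower part that each fit comfortably in INTEGER, and convert the two parts separately. The package name, function name, and the 16-bit split point are illustrative choices, not anything from MATH_REAL or numeric_std.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
use ieee.math_real.all;

-- Hypothetical package: wide signed constant from a REAL, converted in two
-- pieces so that neither piece overflows INTEGER.
package real_split_pkg is
  function real_to_signed_split(x : real; width : positive) return signed;
end package real_split_pkg;

package body real_split_pkg is
  function real_to_signed_split(x : real; width : positive) return signed is
    constant LO_BITS : positive := 16;
    constant LO_SPAN : real     := 2.0**LO_BITS;   -- 65536.0
    variable xr      : real;
    variable hi, lo  : real;
    variable result  : signed(width-1 downto 0);
  begin
    -- Assumes LO_BITS < width <= 47 and |x| < 2**(width-1), so that hi fits
    -- in INTEGER and xr is an exactly representable whole number.
    xr := round(x);                   -- round once, still in REAL
    hi := floor(xr / LO_SPAN);        -- upper part, may be negative
    lo := xr - hi * LO_SPAN;          -- lower part, always 0.0 .. 65535.0
    result(width-1 downto LO_BITS) := to_signed(integer(hi), width - LO_BITS);
    result(LO_BITS-1 downto 0)     := signed(to_unsigned(integer(lo), LO_BITS));
    return result;
  end function real_to_signed_split;
end package body real_split_pkg;
```

With this, the original expression could be written as calc_coef := real_to_signed_split(MATH_E**(-(MATH_2_PI * (actual_fc_real(rep)/real(clock_rate)))) * 2.0**(coef_width - 1), coef_width); provided the scaled coefficient stays below 2**(coef_width - 1) after rounding.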

Bonus question: if the 'integer' limit can't be bypassed, I was planning to send the generic values to my computer, calculate the coefficients there, and return them as a signal. I'm also working on an assembler, which needs access to the same generic values. It's very simple to read an 'integer' generic value: request it with an RS-232 command and return the 4 bytes on the RS-232 TX port. Is there a way to extract the 4 or 8 bytes of a 'real' generic, preferably in float or double format, so I can send them to the computer?

I hope my questions are interesting and beneficial for all involved.

David K.

Altera_Forum · 07-19-2015

Have a look at float_pkg (part of the VHDL-2008 standard). I wouldn't use any of the float types for actual logic, but for constant calculations it should work and bypass the integer. You can get a '93-compatible version (which should work with Quartus) here:

http://www.vhdl.org/fphdl/
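For the bonus question, a minimal sketch along the lines of this suggestion, assuming a VHDL-2008 tool that provides ieee.float_pkg (with the '93-compatible files from the link above, the package is, as far as I know, compiled into a library named ieee_proposed, so the library and use clauses would change). to_float with explicit 11/52 widths produces the IEEE double pattern of a REAL, and to_slv exposes the 64 bits so they can be sent one byte at a time. The package, type, and function names are made up, and the byte order (most significant byte first) is an arbitrary choice that the PC side would have to match. Like the rest of float_pkg, this is meant for constants only.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.float_pkg.all;  -- VHDL-2008; the '93 version lives in another library

-- Hypothetical package: REAL (e.g. a generic) -> 8 bytes of an IEEE double.
package real_bytes_pkg is
  subtype byte_t is std_logic_vector(7 downto 0);
  type byte_array_t is array (0 to 7) of byte_t;
  function real_to_double_bytes(x : real) return byte_array_t;
end package real_bytes_pkg;

package body real_bytes_pkg is
  function real_to_double_bytes(x : real) return byte_array_t is
    -- 11-bit exponent and 52-bit fraction give the IEEE 754 double layout
    constant bits  : std_logic_vector(63 downto 0) := to_slv(to_float(x, 11, 52));
    variable bytes : byte_array_t;
  begin
    -- bytes(0) holds the sign and the top of the exponent (big-endian order)
    for i in 0 to 7 loop
      bytes(i) := bits(63 - 8*i downto 56 - 8*i);
    end loop;
    return bytes;
  end function real_to_double_bytes;
end package body real_bytes_pkg;
```

For example, 1.0 should come out as the bytes 3F F0 00 00 00 00 00 00, which a PC can reassemble directly into a double.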

Altera_Forum · 07-20-2015

Tricky:

That sounds like a good idea, and yes, I've found that avoiding floating-point logic saves a lot of space. Floating point has other pros and cons. For example, with the ~0.0003 coefficient, floating point keeps the accuracy; but its counterpart, 1.0 - 0.0003 = 0.9997, loses about 12 bits of accuracy in floating point. That's why I prefer 36-bit fixed point over a 32-bit float.
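A quick way to see where those bits go is to compare the absolute step size (one unit in the last place, ULP) of float32 with that of a 36-bit signed fixed-point value scaled by 2^35, i.e. coef_width = 36 in the calc_coef expression. The two formats agree at 0.0003, and fixed point is 11 bits finer at 0.9997 (log2(0.9997/0.0003) ≈ 11.7):

```latex
\begin{aligned}
&\text{float32: } x = 1.f \times 2^{e},\ 23 \text{ fraction bits}
  \;\Rightarrow\; \mathrm{ulp}(x) = 2^{e-23} \\
&x = 0.0003:\; e = \lfloor \log_2 0.0003 \rfloor = -12
  \;\Rightarrow\; \mathrm{ulp} = 2^{-35} \\
&x = 0.9997:\; e = -1 \;\Rightarrow\; \mathrm{ulp} = 2^{-24} \\
&\text{36-bit signed fixed point, scaled by } 2^{35}:\; \mathrm{ulp} = 2^{-35} \text{ for every value}
\end{aligned}
```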

I have been looking at the '93-compatible VHDL files, and I see there's a lot of homework to do to find all the information. The first thing I noticed that may halt my progress is...

...
package float_pkg is
-- generic (
  -- Defaults for sizing routines, when you do a "to_float" this will be
  -- the default size.  Example float32 would be 8 and 23 (8 downto -23)
  constant float_exponent_width : NATURAL    := 8;
  constant float_fraction_width : NATURAL    := 23;
...

...the 24-bit accuracy as defined. I see the values can be adjusted, but I'm not sure whether there are any internal limits. I usually keep my own width parameter for accuracy, so I can test compilation many times with a 16- or 18-bit width until it works, then increase it to 36 bits for the final build, which takes much longer.

Before I looked at the '93-compatible version, I was considering writing my own real-to-signed/unsigned conversion for the constant coefficients. That means lots of binary compare/divide/subtract loops. Sending double-formatted bytes over RS-232 sounds more complicated, but if I have an accurate 36-bit (or wider, if required) value, I can easily send the signed/unsigned bytes after that conversion.
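The compare/subtract idea can be fairly short in practice. A sketch, assuming REAL is IEEE double precision: round once, then test the magnitude against each power of two from the MSB down, setting a bit and subtracting whenever it fits, and negate at the end for negative inputs. No INTEGER is ever formed, so the width is not limited to 32 bits; the package and function names are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
use ieee.math_real.all;

-- Hypothetical package: REAL -> SIGNED by repeated compare/subtract.
package real_bits_pkg is
  function real_to_signed_bits(x : real; width : positive) return signed;
end package real_bits_pkg;

package body real_bits_pkg is
  function real_to_signed_bits(x : real; width : positive) return signed is
    variable mag    : real := abs(round(x));  -- magnitude, rounded once
    variable weight : real;
    variable result : signed(width-1 downto 0) := (others => '0');
  begin
    -- Assumes |round(x)| < 2**(width-1); bit width-1 is left for the sign.
    for i in width-2 downto 0 loop
      weight := 2.0**i;
      if mag >= weight then
        result(i) := '1';
        mag := mag - weight;  -- exact: only removes the current leading bit
      end if;
    end loop;
    if x < 0.0 then
      result := -result;      -- two's-complement negate
    end if;
    return result;
  end function real_to_signed_bits;
end package body real_bits_pkg;
```

Because each subtraction only clears the current leading bit, it is exact in double precision, so the result is bit-for-bit the rounded REAL value. The practical limit is the roughly 53 significant bits of REAL, not the width of the result.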

Any suggestions?

David K.

Altera_Forum · 07-20-2015

I have very little floating-point experience, and what little I have has been with the Altera FP IP cores, mostly in the 8/23 format.

You should be able to just modify float_pkg to use the appropriate sizes. In VHDL-2008 you can instantiate the package with generics to set the sizes in your local package without changing the source, but Quartus doesn't support this yet (ModelSim does). I highly recommend you don't build any logic with this package: it will be very slow because it is not pipelined.
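To illustrate the VHDL-2008 route, a sketch of a local instance of ieee.float_generic_pkg whose default to_float size is IEEE double (11-bit exponent, 52-bit fraction). The name float64_pkg is made up, and the generic list is my recollection of how the standard ieee.float_pkg instance is written with only the two widths changed; compare it with the float_generic_pkg source shipped with your simulator before relying on it. As noted above, Quartus did not accept this at the time.

```vhdl
-- VHDL-2008 only: local float package with double-precision defaults.
library ieee;

package float64_pkg is new ieee.float_generic_pkg
  generic map (
    float_exponent_width => 11,   -- was 8 in ieee.float_pkg
    float_fraction_width => 52,   -- was 23 in ieee.float_pkg
    float_round_style    => ieee.fixed_float_types.round_nearest,
    float_denormalize    => true,
    float_check_error    => true,
    float_guard_bits     => 3,
    no_warning           => false,
    fixed_pkg            => ieee.fixed_pkg
  );
```

Code that does use work.float64_pkg.all then gets 64-bit results from a plain to_float(x), without editing the vendor's package source.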