Nios® V/II Embedded Design Suite (EDS)

Question about variables of type double and Nios II?

Altera_Forum
Honored Contributor II

Hello,

I'm a beginner in hardware implementation and I have a question.

I developed a program using Visual C++.

I used variables of type double in my code.

Can Nios II execute this code and give the same results as the simulation?

I ask because I heard that it doesn't support double precision.

Is that true?

Thanks in advance
Altera_Forum
Honored Contributor II

Double-precision floating point is supported by library functions (software emulation). As such it should give the same results, but it will be slow.

Single-precision FP can be supported by custom instructions - they will be less slow than using the library functions.
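
To make the difference concrete, here is a minimal sketch (the function names are my own, not from this thread): with a typical GCC toolchain for Nios II, the double arithmetic below compiles into soft-float library calls (e.g. __adddf3/__muldf3), while the float version can map onto the single-precision FP custom instructions when the hardware design includes them.

#include <stddef.h>

/* Same dot product written twice.  The double version is emulated by
 * library routines on Nios II, so it is correct but slow; the float
 * version can use the single-precision FP custom instructions when
 * they are present in the hardware. */
double dot_double(const double *a, const double *b, size_t n)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc += a[i] * b[i];
    return acc;
}

float dot_float(const float *a, const float *b, size_t n)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++)
        acc += a[i] * b[i];
    return acc;
}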
Altera_Forum
Honored Contributor II

 

--- Quote Start ---  

Double-precision floating point is supported by library functions (software emulation). As such it should give the same results, but it will be slow.

Single-precision FP can be supported by custom instructions - they will be less slow than using the library functions.

--- Quote End ---  

 

 

Good morning,

Thanks for the answer.

First, what approach can I follow to make my program fast if I use double precision?

Second, are there any techniques to keep the same precision and make it fast?

I heard about fixed point: you multiply the real number by 2^p and then work as if using the int type.

Thanks in advance
Altera_Forum
Honored Contributor II

There are certainly many applications where fixed-point arithmetic is appropriate; it will almost always be faster than using floating point.

Care does need to be taken to ensure the values don't overflow, and that the correct shifts are applied in all the required places.

Multiplying two fixed-point numbers together requires the 'high' 32 bits of the 64-bit product (actually you'll typically need the 32 bits that cross the word boundary). This might be problematic because Altera only provide the mulx instructions for FPGAs with onboard DSP functionality; they don't let you throw FPGA real estate at the problem (a 32x32 multiply can be done with 1024 3bit->2bit adders with a ripple latency of 64 adders).

Even with the mulx instructions, extracting the relevant bits may be worthy of a custom instruction - especially if your implied decimal point is always in the same place.
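
As a rough sketch of what that looks like in C (assuming a Q16.16 layout, i.e. 16 fraction bits - the format itself is an assumption, not something fixed by this thread):

#include <stdint.h>

typedef int32_t q16_16;   /* signed fixed point: 16 integer bits, 16 fraction bits */

/* Fixed-point multiply: the full product of two 32-bit values is 64 bits
 * wide, and the Q16.16 result is that product shifted right by the number
 * of fraction bits - i.e. the 32 bits that straddle the word boundary. */
static inline q16_16 q16_mul(q16_16 a, q16_16 b)
{
    int64_t prod = (int64_t)a * (int64_t)b;   /* 64-bit intermediate */
    return (q16_16)(prod >> 16);              /* keep bits 16..47 */
}

/* Addition and subtraction need no shift, only attention to overflow. */
static inline q16_16 q16_add(q16_16 a, q16_16 b)
{
    return a + b;
}

The 64-bit multiply inside q16_mul is exactly the part the mulx-style instructions (or a custom instruction) would accelerate; without them it turns into a library call.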
Altera_Forum
Honored Contributor II

Good afternoon,

To transform a variable from double to integer, I multiply each number by pow(2,P) and just take the integer part.

Here P is chosen from 8, 16 or 32.

I began by testing a 2D convolution using fixed point.

I declared my variables as integers, but I encountered two problems:

1 - First: null coefficients if I take a low value of P (P=8).

For example, I have these two coefficients, -0.00221819585464577 and -0.00877313479158837, that I want to transform to integers.

If I multiply (-0.00221819585464577)*pow(2,8) and take the integer part, I get zero as the coefficient. That's a problem.

2 - Second: an overflow in my algorithm if I take P=16:

If I multiply (-0.00221819585464577) by pow(2,16), I get an overflow, so a bad result.

Please, how can I solve this problem (to implement this algorithm in hardware)?

Thanks in advance
Altera_Forum
Honored Contributor II

You need the 64-bit result from the integer multiply of two fixed-point values; then you can extract the required bits to get the fixed-point product.

(int)(((long long)x * y) >> 16) 

should give the correct value. 

However, that 64-bit multiply is likely to become a library function call.

If you are on an FPGA with DSP multipliers then the mulxss instruction can be used to generate the high 32 bits.

Otherwise it might be faster to do the 32x32 multiply yourself as four 16x16 multiplies. 

You need to take extra care with negative values!

You might also be able to persuade yourself that adding the HL+LH+LL partial products can't actually overflow 32 bits (I'm not sure).
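
For illustration, a sketch of the four-16x16 approach (my own decomposition, not code from the thread), again assuming 16 fraction bits. Splitting each operand into a signed high half and an unsigned low half is one way of dealing with the negative-value issue mentioned above; it relies on >> being an arithmetic shift for signed values, which holds for GCC on Nios II.

#include <stdint.h>

/* (a*b) >> 16 computed from four 16x16 multiplies, avoiding the 64-bit
 * soft multiply.  With ah = a >> 16 (signed) and al = a & 0xFFFF
 * (unsigned), a == ah*65536 + al holds exactly, even for negative a, so
 *   (a*b) >> 16 == (ah*bh << 16) + ah*bl + al*bh + ((al*bl) >> 16).
 * The sums are done in unsigned arithmetic so wrap-around is well defined;
 * as with any Q16.16 multiply, the true result must still fit in 32 bits. */
static inline int32_t q16_mul_split(int32_t a, int32_t b)
{
    int32_t  ah = a >> 16;
    int32_t  bh = b >> 16;
    uint32_t al = (uint32_t)a & 0xFFFFu;
    uint32_t bl = (uint32_t)b & 0xFFFFu;

    return (int32_t)(((uint32_t)(ah * bh) << 16)
                   + (uint32_t)(ah * (int32_t)bl)
                   + (uint32_t)((int32_t)al * bh)
                   + ((al * bl) >> 16));
}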