
Altera matrix multiplication example produces compiler warning: non-vectorized loads/stores

Altera_Forum

I am attempting to compile the matrix multiply example located here: https://www.altera.com/support/support-resources/design-examples/design-software/opencl/matrix-multiplication.html 


However, when compiling it I see the following output:


aoc: Running OpenCL parser....
/home/mike/ont_core_cpp/ont_core/basecall_nn/ocl/altera_experiments/matrix_mult/device/matrix_mult.cl:105:34: warning: declaring kernel argument with no 'restrict' may lead to low kernel performance
    __global float *A,
                    ^
/home/mike/ont_core_cpp/ont_core/basecall_nn/ocl/altera_experiments/matrix_mult/device/matrix_mult.cl:106:34: warning: declaring kernel argument with no 'restrict' may lead to low kernel performance
    __global float *B,
                    ^
2 warnings generated.
aoc: OpenCL parser completed successfully.
aoc: Compiling....
aoc: Linking with IP library ...
Checking if memory usage is larger than 100%
Compiler Warning: Vectorized kernel contains loads/stores that cannot be vectorized. This might reduce performance.
+--------------------------------------------------------------------+
; Estimated Resource Usage Summary                                   ;
+----------------------------------------+---------------------------+
; Resource                               + Usage                     ;
+----------------------------------------+---------------------------+
; Logic utilization                      ; 33%                       ;
; ALUTs                                  ; 18%                       ;
; Dedicated logic registers              ; 16%                       ;
; Memory blocks                          ; 32%                       ;
; DSP blocks                             ; 23%                       ;
+----------------------------------------+---------------------------;


I am compiling with the following options: -v --report --fpc --fp-relaxed -cl-fast-relaxed-math -cl-finite-math-only 


This happens with aoc Version 17.0.0 Build 290 and also with aoc Version 16.1.2 Build 203. 


The part of the kernel that triggers the warning appears to be:


#pragma unroll
for (int k = 0; k < BLOCK_SIZE; ++k) {
    running_sum += A_local * B_local;
}


But I cannot understand why this would be a problem.
Altera_Forum

That warning relates to global memory accesses and is normal when SIMD is used. If the compiler cannot fully coalesce such accesses in the presence of SIMD, it generates that warning. What it is trying to say is: do not expect a linear performance improvement from SIMD if your global memory accesses are not contiguous. Worse, if your kernel is memory-bound and you use SIMD even though the accesses are not contiguous, performance will actually go down.


Needless to say, everyone compiling this example will get the same message, and it is completely safe to ignore in this case.