Altera example matrix multiply produces Compiler warning: non-vectorized load/stores

Altera_Forum · 05-22-2017

I am attempting to compile the matrix multiply example located here: https://www.altera.com/support/support-resources/design-examples/design-software/opencl/matrix-multiplication.html

However, when compiling I see the following:

aoc: Running OpenCL parser....
/home/mike/ont_core_cpp/ont_core/basecall_nn/ocl/altera_experiments/matrix_mult/device/matrix_mult.cl:105:34: warning: declaring kernel argument with no 'restrict' may lead to low kernel performance
                 __global float *A,
                                 ^
/home/mike/ont_core_cpp/ont_core/basecall_nn/ocl/altera_experiments/matrix_mult/device/matrix_mult.cl:106:34: warning: declaring kernel argument with no 'restrict' may lead to low kernel performance
                 __global float *B, 
                                 ^
2 warnings generated.
aoc: OpenCL parser completed successfully.
aoc: Compiling....
aoc: Linking with IP library ...
Checking if memory usage is larger than 100%
Compiler Warning: Vectorized kernel contains loads/stores that cannot be vectorized. This might reduce performance.
+--------------------------------------------------------------------+
; Estimated Resource Usage Summary                                   ;
+----------------------------------------+---------------------------+
; Resource                               + Usage                     ;
+----------------------------------------+---------------------------+
; Logic utilization                      ;   33%                     ;
; ALUTs                                  ;   18%                     ;
; Dedicated logic registers              ;   16%                     ;
; Memory blocks                          ;   32%                     ;
; DSP blocks                             ;   23%                     ;
+----------------------------------------+---------------------------;

I am compiling with the following options: -v --report --fpc --fp-relaxed -cl-fast-relaxed-math -cl-finite-math-only

This happens with aoc Version 17.0.0 Build 290 and also with aoc Version 16.1.2 Build 203.

The part of the kernel which appears to cause the warning is:

        #pragma unroll
        for (int k = 0; k < BLOCK_SIZE; ++k)
        {
            running_sum += A_local * B_local;
        }

But I cannot understand why this would be a problem.

Altera_Forum · 05-22-2017

That warning relates to global memory accesses and is normal when SIMD is used. The compiler emits it when it cannot fully coalesce those accesses across the SIMD lanes. In effect, it is telling you not to expect a linear performance improvement from SIMD if your global memory accesses are not contiguous. Moreover, if your kernel is memory-bound and you use SIMD even though the accesses are not contiguous, performance will actually go down.

Needless to say, everyone compiling this example will get the same message, and it is completely safe to ignore in this case.
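
To make "contiguous" concrete, here is a minimal host-side sketch in plain C. It is not kernel code and not part of the Altera example. It assumes a SIMD width of 4 (the kind of value you would set with num_simd_work_items) and a hypothetical row pitch of 64 elements; all function names are made up for illustration. It only checks whether adjacent work-items would touch consecutive elements, which is the basic condition for merging their accesses into one wide access. The compiler's real analysis is more involved.

```c
#include <stdbool.h>
#include <stddef.h>

#define SIMD_WIDTH 4   /* assumed SIMD width, e.g. num_simd_work_items(4) */
#define ROW_PITCH  64  /* hypothetical matrix row length, in elements */

/* Element index read by work-item `gid` when neighbouring work-items
 * read neighbouring elements (unit stride). */
size_t contiguous_index(size_t gid)
{
    return gid;
}

/* Element index read by work-item `gid` when each work-item reads from
 * its own row (stride of ROW_PITCH elements). */
size_t strided_index(size_t gid)
{
    return gid * ROW_PITCH;
}

/* Returns true when the SIMD_WIDTH adjacent work-items starting at
 * `first_gid` touch consecutive elements, so that their accesses could
 * in principle be served by a single wide load or store. */
bool lanes_are_contiguous(size_t (*index_of)(size_t), size_t first_gid)
{
    const size_t base = index_of(first_gid);
    for (size_t lane = 1; lane < SIMD_WIDTH; ++lane) {
        if (index_of(first_gid + lane) != base + lane) {
            return false;
        }
    }
    return true;
}
```

Under these assumptions, lanes_are_contiguous(contiguous_index, 0) returns true and lanes_are_contiguous(strided_index, 0) returns false. An access with the second pattern is the kind that would stay as separate narrow accesses when the kernel is vectorized.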