Sobel filter with OpenCL on FPGA

Altera_Forum · 05-15-2017

I am going through Altera's OpenCL implementation of the Sobel filter for FPGAs (code is shown below).

I understand that they are using line buffers, which are then implemented as shift registers on the FPGA. They initialize the line buffer to hold 2 rows plus 3 extra pixels, which is what a 3x3 window needs. I also understand the sliding-window concept: each pixel is buffered, and once enough pixels are available, an output is produced.
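To make the line-buffer idea concrete, here is a minimal host-side C sketch of the same shift-in step (my own illustration, not the Altera code). It assumes a 4-column image as in my example below; `COLS_EX`, `BUF_LEN`, `line_buffer_push` and `line_buffer_tap` are hypothetical names. After each push, `buf[0]` is the newest pixel and `buf[k]` is the pixel that arrived `k` pushes earlier.

```c
#include <assert.h>

/* Hypothetical example width: 4 columns, as in the 4 x 4 example. */
#define COLS_EX 4
/* Two full rows plus three pixels: enough history for a 3x3 window. */
#define BUF_LEN (2 * COLS_EX + 3)

/* Shift every element one slot toward the end and insert the new pixel
   at index 0. This mirrors the unrolled loop in the kernel, which the
   FPGA compiler can map to a shift register. */
static void line_buffer_push(int buf[BUF_LEN], int pixel)
{
    for (int i = BUF_LEN - 1; i > 0; --i) {
        buf[i] = buf[i - 1];
    }
    buf[0] = pixel;
}

/* Return the pixel that was pushed k steps before the most recent one. */
static int line_buffer_tap(const int buf[BUF_LEN], int k)
{
    assert(k >= 0 && k < BUF_LEN);
    return buf[k];
}
```

For example, after pushing the pixel values 0 through 11, `line_buffer_tap(buf, 0)` is 11 and `line_buffer_tap(buf, COLS_EX + 1)` is 6.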

In this implementation, assume a 4 x 4 image (COLS = 4), so 11 pixels are buffered (2 * COLS + 3). Initially, count is used to initialize the buffer to zero (I don't think this is necessary, because the array itself can be initialized to zero). When count is zero, the pixel at position zero is read (rows[0] = count >= 0 ? frame_in[count] : 0), multiplied by the corresponding kernel coefficient, and the result is placed in frame_out[0]. But in a normal convolution, computing the value at position zero requires the neighboring pixels, which in this case are 0, 1, 4, 5 and the border pixels. So why do they place a value computed from only one pixel at position zero? Am I missing something here?
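One way to check this is to work through the index arithmetic. Because `rows[0]` always receives the newest pixel, `rows[k]` at iteration `count` holds `frame_in[count - k]`, or the zero fill if that index is negative. The hypothetical helpers below (`tap_source`, `valid_taps`, assuming COLS = 4) list which input index each window tap `rows[i * COLS + j]` reads. At `count == 0`, only tap (0,0) lies inside the image; the other eight taps still hold zeros.

```c
#include <assert.h>

#define COLS_EX 4  /* hypothetical width for the 4 x 4 example */

/* Input index read by window tap (i, j) at a given count, given that
   rows[k] holds frame_in[count - k]. A negative result means the tap
   still holds the zero used to fill the buffer. */
static int tap_source(int count, int i, int j)
{
    assert(i >= 0 && i < 3 && j >= 0 && j < 3);
    return count - (i * COLS_EX + j);
}

/* Count how many of the nine taps hold real image pixels. */
static int valid_taps(int count)
{
    int n = 0;
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            if (tap_source(count, i, j) >= 0)
                ++n;
    return n;
}
```

Under this mapping the center tap (1,1), which is `rows[COLS + 1]`, reads `frame_in[count - COLS - 1]`, so at `count == 0` the window is centered on a position before the start of the image.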

 0  1  2  3
 4  5  6  7
 8  9 10 11
12 13 14 15

In the matrix above, to calculate the value for pixel 5, we convolve the kernel with pixels 0, 1, 2, 4, 5, 6, 8, 9, 10. The same thing happens in the code below, but the output is stored at frame_out[10]. Shouldn't it be stored at frame_out[5]?
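A quick host-side simulation is one way to test this observation. The sketch below assumes a 4 x 4 image whose pixels are already grayscale (so the RGB-to-luma step is skipped) and writes the raw magnitude |x| + |y| instead of the thresholded value; `stream_sobel` and `direct_sobel` are hypothetical names. It compares the streaming output at index `k` with a direct 3x3 window centered on pixel `k - COLS - 1`. In the streaming window, tap (0,0) is the newest (bottom-right) pixel, so the window is flipped in both directions relative to the Gx/Gy layout; for these Sobel masks the flip only negates x and y, and `abs()` removes that sign.

```c
#include <assert.h>
#include <stdlib.h>

#define ROWS_EX 4                 /* hypothetical 4 x 4 example */
#define COLS_EX 4
#define N_EX (ROWS_EX * COLS_EX)
#define BUF_LEN (2 * COLS_EX + 3)

static const int Gx[3][3] = {{-1, -2, -1}, {0, 0, 0}, {1, 2, 1}};
static const int Gy[3][3] = {{-1, 0, 1}, {-2, 0, 2}, {-1, 0, 1}};

/* Streaming version: same shift-register structure as the kernel,
   starting from a zero-filled buffer, one output per input pixel. */
static void stream_sobel(const int in[N_EX], int out[N_EX])
{
    int rows[BUF_LEN] = {0};
    for (int count = 0; count < N_EX; ++count) {
        for (int i = BUF_LEN - 1; i > 0; --i)
            rows[i] = rows[i - 1];
        rows[0] = in[count];

        int x = 0, y = 0;
        for (int i = 0; i < 3; ++i) {
            for (int j = 0; j < 3; ++j) {
                int p = rows[i * COLS_EX + j];
                x += p * Gx[i][j];
                y += p * Gy[i][j];
            }
        }
        out[count] = abs(x) + abs(y);
    }
}

/* Direct version: 3x3 window centered on (r, c), interior pixels only. */
static int direct_sobel(const int in[N_EX], int r, int c)
{
    assert(r >= 1 && r < ROWS_EX - 1 && c >= 1 && c < COLS_EX - 1);
    int x = 0, y = 0;
    for (int i = 0; i < 3; ++i) {
        for (int j = 0; j < 3; ++j) {
            int p = in[(r - 1 + i) * COLS_EX + (c - 1 + j)];
            x += p * Gx[i][j];
            y += p * Gy[i][j];
        }
    }
    return abs(x) + abs(y);
}
```

With this setup, `out[10]` matches `direct_sobel(img, 1, 1)` (pixel 5), and likewise indices 11, 14 and 15 match pixels 6, 9 and 10, i.e. a fixed offset of COLS + 1 between the center pixel and the index written.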

Code reproduced below. frame_in is the input image, frame_out is the output image, iterations is the image size (rows * cols), and threshold is 128.

// Permission is hereby granted, free of charge, to any person obtaining a copy of this
// software and associated documentation files (the "Software"), to deal in the Software
// without restriction, including without limitation the rights to use, copy, modify, merge,
// publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to
// whom the Software is furnished to do so, subject to the following conditions:
// The above copyright notice and this permission notice shall be included in all copies or
// substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
// EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
// OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
// NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
// HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
// WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
// FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
// OTHER DEALINGS IN THE SOFTWARE.
//
// This agreement shall be governed in all respects by the laws of the State of California and
// by the laws of the United States of America.

#define ROWS 5
#define COLS 5

// Sobel filter kernel
// frame_in and frame_out are different buffers. Specify restrict on
// them so that the compiler knows they do not alias each other.
__kernel
void sobel(global int * restrict frame_in, global int * restrict frame_out,
           const int iterations, const unsigned int threshold)
{
    // Filter coefficients
    int Gx[3][3] = {{-1,-2,-1},{0,0,0},{1,2,1}};
    int Gy[3][3] = {{-1,0,1},{-2,0,2},{-1,0,1}};

    // Pixel buffer of 2 rows and 3 extra pixels
    int rows[2 * COLS + 3] = {0};

    // The initial iterations are used to initialize the pixel buffer.
    // int count = -(2 * COLS + 3);
    int count = 0;

    while (count != iterations) {
        // Each cycle, shift a new pixel into the buffer.
        // Unrolling this loop allows the compiler to infer a shift register.
        #pragma unroll
        for (int i = COLS * 2 + 2; i > 0; --i) {
            rows[i] = rows[i - 1];
        }
        rows[0] = count >= 0 ? frame_in[count] : 0;

        int x_dir = 0;
        int y_dir = 0;

        // With these loops unrolled, one convolution can be computed every
        // cycle.
        #pragma unroll
        for (int i = 0; i < 3; ++i) {
            #pragma unroll
            for (int j = 0; j < 3; ++j) {
                unsigned int pixel = rows[i * COLS + j];
                unsigned int b = pixel & 0xff;
                unsigned int g = (pixel >> 8) & 0xff;
                unsigned int r = (pixel >> 16) & 0xff;

                // RGB -> luma conversion approximation.
                // Avoiding floating-point math operators greatly reduces
                // resource usage.
                unsigned int luma = r * 66 + g * 129 + b * 25;
                luma = (luma + 128) >> 8;
                luma += 16;

                // Convolve the luma value, not the packed RGB pixel.
                x_dir += luma * Gx[i][j];
                y_dir += luma * Gy[i][j];
            }
        }

        int temp = abs(x_dir) + abs(y_dir);
        unsigned int clamped;
        if (temp > threshold) {
            clamped = 0xffffff;
        } else {
            clamped = 0;
        }

        if (count >= 0) {
            frame_out[count] = clamped;
        }
        count++;
    }
}
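For checking the kernel's per-pixel arithmetic on the host, here is a small C sketch of the fixed-point RGB-to-luma approximation and the threshold step used above; `rgb_to_luma` and `clamp_edge` are hypothetical names, and pixels are assumed to be packed as 0x00RRGGBB as the kernel's masks imply. The coefficients 66, 129, 25 with rounding (+128, >> 8) and the +16 offset are the common integer approximation of BT.601 studio-range luma, so the result runs from 16 for black to 235 for white.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/* Fixed-point RGB -> luma, same arithmetic as the kernel. */
static unsigned int rgb_to_luma(unsigned int pixel)
{
    unsigned int b = pixel & 0xff;
    unsigned int g = (pixel >> 8) & 0xff;
    unsigned int r = (pixel >> 16) & 0xff;
    unsigned int luma = r * 66 + g * 129 + b * 25;
    luma = (luma + 128) >> 8;   /* divide by 256 with rounding */
    return luma + 16;           /* studio-range offset */
}

/* Threshold step: white (0xffffff) for an edge, black otherwise. */
static unsigned int clamp_edge(int x_dir, int y_dir, unsigned int threshold)
{
    /* abs(INT_MIN) is undefined in C, so reject it up front. */
    assert(x_dir != INT_MIN && y_dir != INT_MIN);
    int temp = abs(x_dir) + abs(y_dir);
    /* For realistic Sobel magnitudes temp is non-negative, so comparing it
       as unsigned (as the kernel's int > unsigned comparison does) is safe. */
    return ((unsigned int)temp > threshold) ? 0xffffffu : 0u;
}
```

With threshold 128, a magnitude of exactly 128 is still black, because the test is strictly greater than.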