Intel® Quartus® Prime Software
Intel® Quartus® Prime Design Software, Design Entry, Synthesis, Simulation, Verification, Timing Analysis, System Design (Platform Designer, formerly Qsys)
16606 Discussions

kernel panic from kernel with 512 workgroup size

Altera_Forum
Honored Contributor II
1,430 Views

Hello! 

 

I got some undesired behavior. (Opencl 14.0 on CentOs 6.5 with Pico m506) 

I have two identical kernels that differs only in __attribute__((reqd_work_group_size(32,1,1)))  

One has workgroup size 32, other 512. 

My host program has this: 

size_t global_work_size[1] = { NDRANGE };  

size_t local_work_size[1] = { WORK_GROUP_SIZE };  

status = clEnqueueNDRangeKernel(queue, kernel, 1, NULL, global_work_size, local_work_size, 1, write_event, &kernel_event); 

I have problem with launching kernel with 512 work group size with big NDRANGE value. It is ok with NDRANGE = 512, but with NDRANGE = 100000 i have kernel panic (listed below). 

I don't have this problem with kernel with workgroup size = 32 and NDRANGE = 40 000 000. 

Crash report: 

<4>  

<4>Pid: 3987, comm: aclkmdq Tainted: P --------------- 2.6.32-431.20.3.el6.x86_64 #1 (https://picocomputing.zendesk.com/tickets/1) ASUS All Series/Z87-PLUS  

<4>RIP: 0010:[<ffffffff8152a4f3>] [<ffffffff8152a4f3>] down_write+0x23/0x40  

<4>RSP: 0018:ffff8807ad497d40 EFLAGS: 00010246  

<4>RAX: 0000000000000068 RBX: 0000000000000068 RCX: 00000000000ff940  

<4>RDX: ffffffff00000001 RSI: ffff8807f48aa000 RDI: 0000000000000068  

<4>RBP: ffff8807ad497d50 R08: 0000000000000000 R09: 00000000ffffffff  

<4>aclpci_close (185):  

<7>aclpci = ffff88080d87a000, pid = 3978, dma_idle = 0  

<4>R10: 000000a47dceaa6c R11: 0000000000000001 R12: 0000000000000100  

<4>R13: ffff8807f48aa000 R14: ffff8807ad497fd8 R15: ffffe8ffffc0f048  

<4>FS: 0000000000000000(0000) GS:ffff880028340000(0000) knlGS:0000000000000000  

<4>CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b  

<4>CR2: 0000000000000068 CR3: 00000007e882a000 CR4: 00000000001407e0  

<4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000  

<4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400  

<4>Process aclkmdq (pid: 3987, threadinfo ffff8807ad496000, task ffff8807f1a22040)  

<4>Stack:  

<4> 0000000000000000 ffff88080f773500 ffff8807ad497d80 ffffffffa01458f2  

<4><d> 0000000000000000 ffff8807ad4e8030 ffff88080d87a000 00000001000633ad  

<4><d> ffff8807ad497db0 ffffffffa01434b9 ffff8807ad497db0 ffff88080d87a000  

<4>Call Trace:  

<4> [<ffffffffa01458f2>] aclpci_release_user_pages+0x32/0x70 [aclpci_drv]  

<4> [<ffffffffa01434b9>] unlock_dma_buffer+0x39/0x80 [aclpci_drv]  

<4> [<ffffffffa01441a0>] ? wq_func_dma_update+0x0/0x20 [aclpci_drv]  

<4> [<ffffffffa0143bf3>] aclpci_dma_update+0xe3/0x690 [aclpci_drv]  

<4> [<ffffffffa01441a0>] ? wq_func_dma_update+0x0/0x20 [aclpci_drv]  

<4> [<ffffffffa01441b7>] wq_func_dma_update+0x17/0x20 [aclpci_drv]  

<4> [<ffffffff81094a20>] worker_thread+0x170/0x2a0  

<4> [<ffffffff8109afa0>] ? autoremove_wake_function+0x0/0x40  

<4> [<ffffffff810948b0>] ? worker_thread+0x0/0x2a0  

<4> [<ffffffff8109abf6>] kthread+0x96/0xa0  

<4> [<ffffffff8100c20a>] child_rip+0xa/0x20  

<4> [<ffffffff8109ab60>] ? kthread+0x0/0xa0  

<4> [<ffffffff8100c200>] ? child_rip+0x0/0x20  

<4>Code: c3 e8 62 77 b4 ff 00 00 55 48 89 e5 53 48 83 ec 08 0f 1f 44 00 00 48 89 fb e8 0a ed ff ff 48 ba 01 00 00 00 ff ff ff ff 48 89 d8 <f0> 48 0f c1 10 48 85 d2 74 05 e8 fe 4c d6 ff 48 83 c4 08 5b c9  

<1>RIP [<ffffffff8152a4f3>] down_write+0x23/0x40  

<4> RSP <ffff8807ad497d40>  

<4>CR2: 0000000000000068
0 Kudos
3 Replies
Altera_Forum
Honored Contributor II
422 Views

The Linux kernel panic happens during DMA transfer, which is done by the host on the CPU with a help of Altera OpenCL Linux kernel driver (aclpci_drv). My guess here is that you're trying to DMA more memory than you actually allocated. Double-check sizes passed to clCreateBuffer versus clEnqueueReadBuffer or clEnqueueWriteBuffer. I suspect that the clEnqueueRead/WriteBuffer got a larger size argument than the one to clCreateBuffer.

0 Kudos
Altera_Forum
Honored Contributor II
422 Views

No, all sizes are equal.

0 Kudos
Altera_Forum
Honored Contributor II
422 Views

I am also stuck up in the same position. have you found out the solution for that?

0 Kudos
Reply