What is the expected behavior when the same kernel is called multiple times on the FPGA, with the input buffer also serving as the output buffer?
Suppose I have a vector increment kernel that I call K consecutive times, and each call launches only a single work-group of some dimension N. What does the FPGA board do?
A - Does it pipeline across calls, i.e. will the first work-item of the (i+1)-th call be pipelined with the last work-item of the i-th call?
B - Does the i-th call finish completely before the (i+1)-th call starts?
The increment case is trivial: I can always add K to the vector instead of calling the increment kernel K times. But consider an FFT, where I must choose between unrolling all the stages in one kernel, which means calling barrier(CLK_LOCAL_MEM_FENCE) several times and reduces kernel performance, or calling several radix-n kernels. If hypothesis B holds, the former strategy might be better; if A holds, the latter should deliver greater performance. Which one should I expect?
1 Reply
Scenario B is expected.
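To make the trivial increment case from the question concrete, here is a minimal sketch in plain C (not OpenCL host or kernel code) that models scenario B: each call runs over all N work-items before the next call begins. The function names and the `MAX_N` bound are hypothetical, chosen only for illustration, and the loops stand in for work-items rather than reflecting how the FPGA actually schedules them.

```c
#include <stddef.h>

#define MAX_N 64  /* hypothetical upper bound on the work-group size N */

/* Model of ONE kernel call: every work-item i increments buf[i] in place.
   Under scenario B, this loop completes before the next call starts. */
static void increment_call(int *buf, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        buf[i] += 1;
    }
}

/* Strategy 1: K serialized calls of the increment kernel. */
static void increment_k_calls(int *buf, size_t n, int k) {
    for (int call = 0; call < k; ++call) {
        increment_call(buf, n);
    }
}

/* Strategy 2: one fused call that adds K directly. */
static void add_k_fused(int *buf, size_t n, int k) {
    for (size_t i = 0; i < n; ++i) {
        buf[i] += k;
    }
}

/* Returns 1 if both strategies give the same buffer, 0 otherwise
   (also 0 if n exceeds the illustrative MAX_N bound). */
static int strategies_agree(size_t n, int k) {
    int a[MAX_N];
    int b[MAX_N];
    if (n > MAX_N) {
        return 0;
    }
    for (size_t i = 0; i < n; ++i) {
        a[i] = (int)i;
        b[i] = (int)i;
    }
    increment_k_calls(a, n, k);
    add_k_fused(b, n, k);
    for (size_t i = 0; i < n; ++i) {
        if (a[i] != b[i]) {
            return 0;
        }
    }
    return 1;
}
```

When the calls are serialized, the two strategies produce the same data, so the only difference is the cost of K separate launches versus one. The FFT case in the question has the same shape: one fused kernel with barriers versus several radix-n calls that each finish before the next begins.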