cuda - Optimise Algorithm Using Dynamic Parallelism

I have the following code fragment and am experimenting with the features of the new Kepler architecture. The kernel is called several times in a loop with a fixed num_iterations. Do you think shifting the loop into a parent kernel would help, i.e., is the kernel launch overhead lower when the kernel is invoked from the GPU compared to the CPU?

Would it be possible to use dynamic parallelism to increase the performance of the algorithm below? If so, could you suggest a similar use case for dynamic parallelism that I could implement in my own program?

    for (i = 0; i < num_iterations; i++) {
        kernelgpu<<<gridsize, blksize>>>(
            d_a,
            d_b,
            d_c,
            d_d,
            d_e,
            r,
            v,
            n
        );
    }

I implemented this by putting the loop in a parent kernel and using dynamic parallelism (DP), but the performance became worse (around 50% slower).
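
For reference, the parent-kernel variant described above might look roughly like the sketch below. This is not my real code: the body of kernelgpu is a stand-in, the argument types (float pointers, float r and v, int n) are assumptions, and parentKernel is a hypothetical name. A single device thread issues all the child launches; because they come from the same thread and go into the same stream, they run in order, so no device-side synchronisation is used between iterations.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Stand-in child kernel: the real kernelgpu body is not shown in the post,
// and the argument types here are assumptions.
__global__ void kernelgpu(float *d_a, float *d_b, float *d_c, float *d_d,
                          float *d_e, float r, float v, int n)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) {
        d_a[idx] += r * v;  // placeholder work only
    }
}

// Hypothetical parent kernel: moves the host loop onto the device.
__global__ void parentKernel(float *d_a, float *d_b, float *d_c, float *d_d,
                             float *d_e, float r, float v, int n,
                             int num_iterations, int gridsize, int blksize)
{
    // Only one thread launches children; launches from the same thread
    // into the same stream execute in launch order.
    if (blockIdx.x == 0 && threadIdx.x == 0) {
        for (int i = 0; i < num_iterations; i++) {
            kernelgpu<<<gridsize, blksize>>>(d_a, d_b, d_c, d_d, d_e, r, v, n);
        }
    }
}

int main()
{
    const int n = 1 << 20;
    const int num_iterations = 100;
    const int blksize = 256;
    const int gridsize = (n + blksize - 1) / blksize;

    // One allocation sliced into five arrays, zero-initialised.
    float *buf = nullptr;
    cudaMalloc(&buf, 5 * n * sizeof(float));
    cudaMemset(buf, 0, 5 * n * sizeof(float));

    parentKernel<<<1, 1>>>(buf, buf + n, buf + 2 * n, buf + 3 * n, buf + 4 * n,
                           0.5f, 2.0f, n, num_iterations, gridsize, blksize);

    // The parent grid completes only after all of its child grids complete.
    cudaError_t err = cudaDeviceSynchronize();
    printf("status: %s\n", cudaGetErrorString(err));

    float first = 0.0f;
    cudaMemcpy(&first, buf, sizeof(float), cudaMemcpyDeviceToHost);
    printf("d_a[0] = %f\n", first);

    cudaFree(buf);
    return 0;
}
```

Dynamic parallelism needs a device of compute capability 3.5 or higher and relocatable device code, so a build line would be something like `nvcc -rdc=true -arch=sm_XX file.cu -lcudadevrt`, with sm_XX replaced by the target architecture.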
