cuda - Optimise Algorithm Using Dynamic Parallelism


I have the following code fragment and am experimenting with features of the new Kepler architecture. The kernel is called several times in a loop, for a fixed num_iterations. I am thinking of shifting the loop into a parent kernel, i.e., is the kernel launch overhead lower when the kernel is invoked from the GPU than from the CPU?

Would it be possible to use dynamic parallelism to increase the performance of the algorithm below? If so, could someone suggest a similar use case for dynamic parallelism that I could implement in my own program?

    for (i = 0; i < num_iterations; i++) {
        // Host-side loop: one kernel launch per iteration.
        kernelgpu<<<gridsize, blksize>>>(d_a, d_b, d_c, d_d, d_e, r, v, n);
    }

I implemented this by putting the loop in a parent kernel and using dynamic parallelism (DP), but performance became worse (around 50% slower).
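For reference, the parent-kernel variant described above can be sketched roughly as follows. This is a minimal illustration, not my actual code: childKernel, parentKernel, the single float buffer, and the sizes are hypothetical stand-ins for kernelgpu and its arguments (d_a ... n), whose types are not shown here. It assumes a device of compute capability 3.5 or higher and a build along the lines of `nvcc -arch=sm_35 -rdc=true dp.cu -lcudadevrt`.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for kernelgpu: adds 1.0f to every element.
__global__ void childKernel(float *data, int n)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) {
        data[idx] += 1.0f;
    }
}

// Hypothetical parent kernel: a single thread issues every child launch.
// Launches made by one thread into the same (default) device stream
// run in order, so iteration i + 1 starts after iteration i finishes.
__global__ void parentKernel(float *data, int n, int numIterations,
                             int gridSize, int blkSize)
{
    if (blockIdx.x == 0 && threadIdx.x == 0) {
        for (int i = 0; i < numIterations; ++i) {
            childKernel<<<gridSize, blkSize>>>(data, n);
        }
    }
}

int main()
{
    const int n = 1 << 20;            // assumed problem size
    const int numIterations = 100;    // assumed iteration count
    const int blkSize = 256;
    const int gridSize = (n + blkSize - 1) / blkSize;

    float *d_data = nullptr;
    if (cudaMalloc(&d_data, n * sizeof(float)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }
    cudaMemset(d_data, 0, n * sizeof(float));

    // One host-side launch replaces the host-side loop.
    parentKernel<<<1, 1>>>(d_data, n, numIterations, gridSize, blkSize);

    // The parent grid is not complete until all of its children are.
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        cudaFree(d_data);
        return 1;
    }

    float first = 0.0f;
    cudaMemcpy(&first, d_data, sizeof(float), cudaMemcpyDeviceToHost);
    printf("data[0] after %d iterations: %.1f\n", numIterations, first);

    cudaFree(d_data);
    return 0;
}
```

Device-side launches carry their own overhead, and building with relocatable device code (-rdc=true) can affect how the child kernel is compiled, so a structure like this is not guaranteed to beat the host-side loop. That would be consistent with the slowdown I measured.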

