CUDA - Optimise Algorithm Using Dynamic Parallelism


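I have the following code fragment, which I am using to experiment with features of the new Kepler architecture. The kernel is called several times in a loop, a fixed num_iterations times. I think shifting the loop into a parent kernel would help, i.e., would the kernel launch overhead be lower when the kernel is invoked from the GPU rather than from the CPU?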

Would it be possible to use dynamic parallelism to increase the performance of the algorithm below? If so, could you suggest a similar use case of dynamic parallelism for me to implement in my own program?

    for (i = 0; i < num_iterations; i++) {
        kernelgpu<<<gridsize, blksize>>>(
            d_a, d_b, d_c, d_d, d_e,
            r, v, n
        );
    }
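To make the comparison concrete, the host loop above and the parent-kernel variant proposed earlier can be run side by side. This is a minimal sketch under stated assumptions: the real `kernelgpu` body and argument list are not shown, so the version here is a hypothetical stand-in that takes only `d_a`, `d_b`, `r` and `n`, and `parent` and `loops_agree` are illustrative names. Dynamic parallelism needs a device of compute capability 3.5 or higher and relocatable device code linked against the device runtime (`-rdc=true`, `-lcudadevrt`); the exact `-arch` value depends on the GPU and toolkit in use.

```cuda
// Assumed build line (adjust -arch to the target GPU):
//   nvcc -arch=sm_35 -rdc=true dp_loop.cu -o dp_loop -lcudadevrt
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical stand-in for kernelgpu: the real body and arguments are not
// shown in the question, so this version simply updates d_a in place.
__global__ void kernelgpu(float *d_a, const float *d_b, float r, int n)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n)
        d_a[idx] = d_a[idx] * r + d_b[idx];
}

// Parent kernel, launched with a single thread, that owns the iteration loop.
// Child grids launched by one thread into the default stream run in launch
// order, so iteration i + 1 sees the results of iteration i.
__global__ void parent(float *d_a, const float *d_b, float r, int n,
                       int num_iterations, int gridsize, int blksize)
{
    for (int i = 0; i < num_iterations; i++)
        kernelgpu<<<gridsize, blksize>>>(d_a, d_b, r, n);
}

// Runs the host-loop and device-loop variants on identical input and returns
// true if they produce identical results.
bool loops_agree(int n, int num_iterations)
{
    const int blksize = 256;
    const int gridsize = (n + blksize - 1) / blksize;
    const float r = 0.5f;
    size_t bytes = n * sizeof(float);

    float *h_init = (float *)malloc(bytes);
    float *h_host = (float *)malloc(bytes);
    float *h_dev  = (float *)malloc(bytes);
    for (int k = 0; k < n; k++) h_init[k] = (float)k;

    float *d_a, *d_b;
    cudaMalloc(&d_a, bytes);
    cudaMalloc(&d_b, bytes);
    cudaMemcpy(d_b, h_init, bytes, cudaMemcpyHostToDevice);

    // Variant 1: loop on the host, as in the question.
    cudaMemcpy(d_a, h_init, bytes, cudaMemcpyHostToDevice);
    for (int i = 0; i < num_iterations; i++)
        kernelgpu<<<gridsize, blksize>>>(d_a, d_b, r, n);
    cudaMemcpy(h_host, d_a, bytes, cudaMemcpyDeviceToHost);

    // Variant 2: loop inside a single-thread parent kernel. The parent grid
    // only completes once all of its child grids have completed.
    cudaMemcpy(d_a, h_init, bytes, cudaMemcpyHostToDevice);
    parent<<<1, 1>>>(d_a, d_b, r, n, num_iterations, gridsize, blksize);
    cudaMemcpy(h_dev, d_a, bytes, cudaMemcpyDeviceToHost);

    bool ok = (cudaGetLastError() == cudaSuccess);
    for (int k = 0; ok && k < n; k++)
        ok = (h_host[k] == h_dev[k]);

    cudaFree(d_a); cudaFree(d_b);
    free(h_init); free(h_host); free(h_dev);
    return ok;
}

int main()
{
    bool ok = loops_agree(1 << 16, 10);
    printf("%s\n", ok ? "results match" : "results differ");
    return ok ? 0 : 1;
}
```

The sketch only checks that both variants compute the same result. Each child launch issued by `parent` is still a kernel launch with its own cost, and a single-thread parent issues them one after another, so moving the loop onto the GPU does not by itself remove per-launch overhead; wrapping each variant in `cudaEventRecord`/`cudaEventElapsedTime` is one way to measure the difference on a given device. With a large num_iterations, the number of outstanding child launches may also reach the device runtime's pending-launch limit, which can be raised with `cudaDeviceSetLimit(cudaLimitDevRuntimePendingLaunchCount, ...)`.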

I implemented this by putting the loop in a parent kernel and using DP, but performance became worse (around 50% slower).
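The loop in the question is a flat, fixed-count sequence of identical launches, with no data-dependent nesting. A pattern where dynamic parallelism is more commonly considered a fit is nested work whose size is only discovered on the GPU: each parent thread inspects its own work item and decides how large a child grid to launch, or handles small items inline. The sketch below is illustrative only; the segmented array, `offsets`, `scale_segment`, `process_segments`, `inline_threshold` and `segments_doubled` are all hypothetical names, and the "work" (doubling each element) is a placeholder. The same build requirements as any dynamic parallelism code apply (`-rdc=true`, `-lcudadevrt`).

```cuda
// Assumed build line (adjust -arch to the target GPU):
//   nvcc -arch=sm_35 -rdc=true nested_dp.cu -o nested_dp -lcudadevrt
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical child kernel: doubles every element of one segment.
__global__ void scale_segment(float *data, int begin, int len)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < len)
        data[begin + idx] *= 2.0f;
}

// Parent kernel: one thread per segment. Segment sizes come from offsets,
// so each thread picks its own child grid size; short segments are handled
// inline to avoid paying for a launch.
__global__ void process_segments(float *data, const int *offsets,
                                 int num_segments, int inline_threshold)
{
    int s = blockIdx.x * blockDim.x + threadIdx.x;
    if (s >= num_segments) return;
    int begin = offsets[s];
    int len = offsets[s + 1] - begin;
    if (len > inline_threshold) {
        int threads = 128;
        int blocks = (len + threads - 1) / threads;
        scale_segment<<<blocks, threads>>>(data, begin, len);
    } else {
        for (int k = 0; k < len; k++)
            data[begin + k] *= 2.0f;
    }
}

// Builds segments of very uneven length, runs the parent kernel and checks
// that every element was doubled exactly once.
bool segments_doubled()
{
    const int lengths[] = {1, 5000, 3, 20000, 0, 700, 2, 9000};
    const int num_segments = sizeof(lengths) / sizeof(lengths[0]);
    int h_offsets[num_segments + 1];
    h_offsets[0] = 0;
    for (int s = 0; s < num_segments; s++)
        h_offsets[s + 1] = h_offsets[s] + lengths[s];
    const int n = h_offsets[num_segments];

    std::vector<float> h_data(n, 1.0f);
    float *d_data;
    int *d_offsets;
    cudaMalloc(&d_data, n * sizeof(float));
    cudaMalloc(&d_offsets, sizeof(h_offsets));
    cudaMemcpy(d_data, h_data.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_offsets, h_offsets, sizeof(h_offsets), cudaMemcpyHostToDevice);

    process_segments<<<1, 32>>>(d_data, d_offsets, num_segments, 64);
    cudaMemcpy(h_data.data(), d_data, n * sizeof(float), cudaMemcpyDeviceToHost);

    bool ok = (cudaGetLastError() == cudaSuccess);
    for (int k = 0; ok && k < n; k++)
        ok = (h_data[k] == 2.0f);

    cudaFree(d_data);
    cudaFree(d_offsets);
    return ok;
}

int main()
{
    bool ok = segments_doubled();
    printf("%s\n", ok ? "all segments doubled" : "mismatch");
    return ok ? 0 : 1;
}
```

The design choice here is that the launch decision depends on data the host would otherwise have to copy back and inspect first; whether that is actually faster than a flat launch on a particular device is something to measure rather than assume.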

