c - cuBLAS synchronization best practices

I read two posts on Stack Overflow, namely "Will the cuBLAS kernel functions automatically be synchronized with the host?" and "CUDA Dynamic Parallelizm; stream synchronization from device". Both recommend using a synchronization API, e.g., cudaDeviceSynchronize(), after invoking cuBLAS functions. I'm not sure it makes sense to use such a general-purpose function.

Would it be better to do the following? [Correct me if I'm wrong]:

```c
#include <cuda_runtime.h>
#include <cublas_v2.h>

cublasHandle_t cublas_handle;
cudaStream_t stream;

// initialize matrices

// CUBLAS_CALL: user-defined error-checking macro (not shown)
CUBLAS_CALL(
    cublasDgemm(cublas_handle, CUBLAS_OP_N, CUBLAS_OP_N, m, m,
                m, &alpha, d_a, m, d_b, m, &beta, d_c, m));
// cublasDgemm is non-blocking!

cublasGetStream(cublas_handle, &stream);
cudaStreamSynchronize(stream);
// now it is safe to copy the result (d_c) from the device
// to the host and use it
```

On the other hand, cudaDeviceSynchronize() could be preferable if many streams/handles are used to perform parallel cuBLAS operations. What are the "best practices" for synchronizing cuBLAS handles? Can cuBLAS handles be thought of as wrappers around streams, in the sense that they serve the same purpose from the point of view of synchronization?

If you are using a single stream, it makes no difference whether you synchronize that one stream or use cudaDeviceSynchronize(). In terms of performance and effect, they should be the same. Note that when using events to time part of the code (e.g., a cuBLAS call), it's good practice to call cudaDeviceSynchronize() to get meaningful measurements. In my experience, it doesn't impose a significant overhead and, besides, it's safer to time kernels this way.

If your application uses multiple streams, it makes sense to synchronize only against the stream you need. I believe this question will be helpful to you. You can also read the CUDA C Programming Guide, section 3.2.5.5.
