cuBLAS synchronization best practices
I read two posts on Stack Overflow, namely "Will the cublas kernel functions automatically be synchronized with the host?" and "CUDA Dynamic Parallelizm; stream synchronization from device". They recommend using a synchronization API, e.g., cudaDeviceSynchronize(), after invocations of cuBLAS functions. I'm not sure it makes sense to use such a general-purpose function.
Would it be better to do as follows? [Correct me if I'm wrong]:
    cublasHandle_t cublas_handle;
    cudaStream_t stream;
    // Create the handle (cublasCreate) and initialize the matrices
    CUBLAS_CALL(
        cublasDgemm(cublas_handle, CUBLAS_OP_N, CUBLAS_OP_N, M, M,
                    M, &alpha, d_A, M, d_B, M, &beta, d_C, M));
    // cublasDgemm is non-blocking!
    cublasGetStream(cublas_handle, &stream);
    cudaStreamSynchronize(stream);
    // Now it is safe to copy the result (d_C) from the device
    // to the host and use it
On the other hand, cudaDeviceSynchronize() may be preferable if many streams/handles are used to perform parallel cuBLAS operations. What are the "best practices" for synchronizing cuBLAS handles? Can cuBLAS handles be thought of as wrappers around streams, in the sense that they serve the same purpose from the point of view of synchronization?
If you are using a single stream, it doesn't make any difference whether you synchronize that one stream or use cudaDeviceSynchronize(). In terms of performance and effect, it should be the same. Note that when using events to time a part of your code (e.g., a cuBLAS call), it's good practice to call cudaDeviceSynchronize() to get meaningful measurements. In my experience, it doesn't impose any significant overhead and, besides, it's safer to time your kernels with it.
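As a rough illustration of the timing advice, here is a minimal sketch that times one cublasDgemm call with CUDA events and synchronizes the device before reading the elapsed time. The matrix size, the fill values, and the CUDA_CHECK/CUBLAS_CHECK macros are assumptions made for this example, not part of the original question.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>
#include <cublas_v2.h>

// Hypothetical helper: abort on any CUDA runtime error.
#define CUDA_CHECK(call)                                            \
    do {                                                            \
        cudaError_t err_ = (call);                                  \
        if (err_ != cudaSuccess) {                                  \
            fprintf(stderr, "CUDA error: %s (line %d)\n",           \
                    cudaGetErrorString(err_), __LINE__);            \
            exit(EXIT_FAILURE);                                     \
        }                                                           \
    } while (0)

// Hypothetical helper: abort on any cuBLAS error.
#define CUBLAS_CHECK(call)                                          \
    do {                                                            \
        cublasStatus_t st_ = (call);                                \
        if (st_ != CUBLAS_STATUS_SUCCESS) {                         \
            fprintf(stderr, "cuBLAS error: %d (line %d)\n",         \
                    (int)st_, __LINE__);                            \
            exit(EXIT_FAILURE);                                     \
        }                                                           \
    } while (0)

int main() {
    const int M = 512;                        // assumed matrix dimension
    const size_t bytes = sizeof(double) * M * M;
    const double alpha = 1.0, beta = 0.0;

    // Constant matrices, so every entry of C should equal 2 * M.
    std::vector<double> h_A(M * M, 1.0), h_B(M * M, 2.0), h_C(M * M, 0.0);

    double *d_A, *d_B, *d_C;
    CUDA_CHECK(cudaMalloc(&d_A, bytes));
    CUDA_CHECK(cudaMalloc(&d_B, bytes));
    CUDA_CHECK(cudaMalloc(&d_C, bytes));
    CUDA_CHECK(cudaMemcpy(d_A, h_A.data(), bytes, cudaMemcpyHostToDevice));
    CUDA_CHECK(cudaMemcpy(d_B, h_B.data(), bytes, cudaMemcpyHostToDevice));

    cublasHandle_t handle;
    CUBLAS_CHECK(cublasCreate(&handle));      // handle uses the default stream

    cudaEvent_t start, stop;
    CUDA_CHECK(cudaEventCreate(&start));
    CUDA_CHECK(cudaEventCreate(&stop));

    // Record events on the default stream, around the non-blocking call.
    CUDA_CHECK(cudaEventRecord(start, 0));
    CUBLAS_CHECK(cublasDgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, M, M, M,
                             &alpha, d_A, M, d_B, M, &beta, d_C, M));
    CUDA_CHECK(cudaEventRecord(stop, 0));

    // Wait for all device work, including the stop event, to finish.
    CUDA_CHECK(cudaDeviceSynchronize());

    float ms = 0.0f;
    CUDA_CHECK(cudaEventElapsedTime(&ms, start, stop));

    CUDA_CHECK(cudaMemcpy(h_C.data(), d_C, bytes, cudaMemcpyDeviceToHost));
    printf("dgemm took %.3f ms, C[0] = %.1f\n", ms, h_C[0]);

    CUDA_CHECK(cudaEventDestroy(start));
    CUDA_CHECK(cudaEventDestroy(stop));
    CUBLAS_CHECK(cublasDestroy(handle));
    CUDA_CHECK(cudaFree(d_A));
    CUDA_CHECK(cudaFree(d_B));
    CUDA_CHECK(cudaFree(d_C));
    return 0;
}
```

The first cuBLAS call often includes one-time initialization cost, so a warm-up call before the timed region is usually worth adding when the number matters.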
If your application uses multiple streams, it makes sense to synchronize only against the stream you want. I believe this question will be helpful to you. Also, you can read the CUDA C Programming Guide, Section 3.2.5.5.
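To make the multi-stream case concrete, the following sketch binds two cuBLAS handles to two streams with cublasSetStream and waits only on the stream whose result is needed first. The sizes, the scaling factors, and the CHECK macros are assumptions for illustration. The result is copied back with cudaMemcpyAsync on the same stream because a plain cudaMemcpy runs on the legacy default stream, which would implicitly wait for the other stream as well.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>
#include <cublas_v2.h>

// Hypothetical helpers: abort on any CUDA runtime or cuBLAS error.
#define CUDA_CHECK(call)                                            \
    do {                                                            \
        cudaError_t err_ = (call);                                  \
        if (err_ != cudaSuccess) {                                  \
            fprintf(stderr, "CUDA error: %s (line %d)\n",           \
                    cudaGetErrorString(err_), __LINE__);            \
            exit(EXIT_FAILURE);                                     \
        }                                                           \
    } while (0)
#define CUBLAS_CHECK(call)                                          \
    do {                                                            \
        cublasStatus_t st_ = (call);                                \
        if (st_ != CUBLAS_STATUS_SUCCESS) {                         \
            fprintf(stderr, "cuBLAS error: %d (line %d)\n",         \
                    (int)st_, __LINE__);                            \
            exit(EXIT_FAILURE);                                     \
        }                                                           \
    } while (0)

int main() {
    const int M = 256;                        // assumed matrix dimension
    const size_t bytes = sizeof(double) * M * M;
    const double alpha[2] = {1.0, 2.0};       // one scaling factor per stream
    const double beta = 0.0;

    std::vector<double> h_A(M * M, 1.0), h_B(M * M, 2.0);
    std::vector<double> h_C0(M * M, 0.0), h_C1(M * M, 0.0);

    // Inputs are shared (read-only); each stream writes its own output.
    double *d_A, *d_B, *d_C[2];
    CUDA_CHECK(cudaMalloc(&d_A, bytes));
    CUDA_CHECK(cudaMalloc(&d_B, bytes));
    CUDA_CHECK(cudaMemcpy(d_A, h_A.data(), bytes, cudaMemcpyHostToDevice));
    CUDA_CHECK(cudaMemcpy(d_B, h_B.data(), bytes, cudaMemcpyHostToDevice));

    cudaStream_t stream[2];
    cublasHandle_t handle[2];
    for (int i = 0; i < 2; ++i) {
        CUDA_CHECK(cudaMalloc(&d_C[i], bytes));
        CUDA_CHECK(cudaStreamCreate(&stream[i]));
        CUBLAS_CHECK(cublasCreate(&handle[i]));
        // Bind each handle to its own stream.
        CUBLAS_CHECK(cublasSetStream(handle[i], stream[i]));
    }

    // Enqueue one non-blocking dgemm per stream.
    for (int i = 0; i < 2; ++i) {
        CUBLAS_CHECK(cublasDgemm(handle[i], CUBLAS_OP_N, CUBLAS_OP_N, M, M, M,
                                 &alpha[i], d_A, M, d_B, M, &beta, d_C[i], M));
    }

    // Copy the first result back on stream 0 and wait for that stream only.
    CUDA_CHECK(cudaMemcpyAsync(h_C0.data(), d_C[0], bytes,
                               cudaMemcpyDeviceToHost, stream[0]));
    CUDA_CHECK(cudaStreamSynchronize(stream[0]));
    printf("stream 0: C[0] = %.1f\n", h_C0[0]);   // 1.0 * M * 1.0 * 2.0

    // Later, do the same for stream 1.
    CUDA_CHECK(cudaMemcpyAsync(h_C1.data(), d_C[1], bytes,
                               cudaMemcpyDeviceToHost, stream[1]));
    CUDA_CHECK(cudaStreamSynchronize(stream[1]));
    printf("stream 1: C[0] = %.1f\n", h_C1[0]);   // 2.0 * M * 1.0 * 2.0

    for (int i = 0; i < 2; ++i) {
        CUBLAS_CHECK(cublasDestroy(handle[i]));
        CUDA_CHECK(cudaStreamDestroy(stream[i]));
        CUDA_CHECK(cudaFree(d_C[i]));
    }
    CUDA_CHECK(cudaFree(d_A));
    CUDA_CHECK(cudaFree(d_B));
    return 0;
}
```

A single handle can also be moved between streams with cublasSetStream before each call. Separate handles are used here only to mirror the "lots of streams/handles" scenario from the question.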