kernel - How to run multi-queue code using OpenCL?
For example, I'm doing image processing work on every frame of a video.
Processing each frame takes 200 ms, including writing, processing, and reading. The video runs at 25 fps, so consecutive frames are 40 ms apart. The processing is therefore too slow to show a continuous result.
So here is my idea: use multiple queues to do the work.
On the CPU side:

    while (video not over) {
        1. read frame0; process frame0 using queue0; wait 40 ms;
        2. read frame1; process frame1 using queue1; wait 40 ms;
        3. 4. 5. ...
        (after 5 frames, i.e. just the 200 ms processing time)
        6. download frame0's result;
        7. read frame5; process frame5 using queue0; wait 40 ms;
        ...
    }

In other words, each frame is written, processed, and read back on a single queue, and consecutive frames go to different queues.
However, in my experiment the result is only about 2 times faster, not the speed I imagined.
Can anyone tell me how to deal with it? Thanks!
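To make the expected speed concrete, here is a minimal sketch in Python that works it out from the numbers in the question (200 ms per frame, 40 ms between frames). It only models the ideal round-robin plan and does not call OpenCL. The function names `queues_needed` and `schedule` are made up for illustration, and the plan assumes that work on different queues overlaps perfectly, which a real device may not do.

```python
import math

FRAME_INTERVAL_MS = 40   # 25 fps video, from the question
PROCESS_MS = 200         # write + kernel + read for one frame, from the question


def queues_needed(process_ms, interval_ms):
    """Frames that must be in flight at once to keep up with the video."""
    return math.ceil(process_ms / interval_ms)


def schedule(num_frames, num_queues, interval_ms, process_ms):
    """Ideal round-robin plan as (frame, queue, submit_ms, ready_ms) tuples.

    Assumes perfect overlap between queues (hypothetical best case).
    """
    return [
        (i, i % num_queues, i * interval_ms, i * interval_ms + process_ms)
        for i in range(num_frames)
    ]


n = queues_needed(PROCESS_MS, FRAME_INTERVAL_MS)
plan = schedule(7, n, FRAME_INTERVAL_MS, PROCESS_MS)

for frame, queue, submit_ms, ready_ms in plan:
    print(f"frame{frame} -> queue{queue}: submit at {submit_ms} ms, ready at {ready_ms} ms")

serial_fps = 1000 / PROCESS_MS          # one frame at a time
target_fps = 1000 / FRAME_INTERVAL_MS   # what the video needs
observed_fps = 2 * serial_fps           # "about 2 times faster" in the experiment
print(f"serial {serial_fps} fps, observed about {observed_fps} fps, target {target_fps} fps")
```

Under these assumptions, 5 queues would be enough, and frame0's result is ready just as frame5 is submitted to queue0. The measured 2x speedup falls well short of the 5x that this ideal plan needs.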
Assuming you have 1 device, here are my thoughts on this point:
- The main reason to have multiple command queues (CQs) per single OpenCL device is the ability to execute kernels and I/O operations simultaneously.
- Usually 1 CQ is enough to load a single device at ~100%. Still, the multi-CQ idea is reasonable (in my opinion), since it keeps feeding the GPU with workload.
- Look at the kernel execution time. Maybe it is big enough that the device is always executing kernels and can't go any faster.
- I think you don't need to wait 40 ms. A better solution is to process frames in a queue, in the order they are put in, to eliminate the difference between bitstream and display order.
- If you have many CQs, the OpenCL driver thread will be busy maintaining them, and performance may decrease.
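The points about overlap and kernel execution time can be illustrated with a rough throughput model. This is a sketch in Python, not a measurement: the 50/100/50 ms split of the 200 ms frame time is a hypothetical example chosen for illustration, and the model assumes that transfers can overlap with kernel execution but that kernels from different queues do not run at the same time on the one device. The helper name `frame_time_ms` is made up.

```python
def frame_time_ms(write_ms, kernel_ms, read_ms, overlap):
    """Steady-state time per frame in a simple three-stage model.

    overlap=False: stages run back to back on one queue.
    overlap=True:  stages of different frames overlap, so the slowest
                   stage sets the pace (assumes the device allows this).
    """
    stages = (write_ms, kernel_ms, read_ms)
    return max(stages) if overlap else sum(stages)


# Hypothetical split of the 200 ms from the question (not measured).
write_ms, kernel_ms, read_ms = 50, 100, 50

serial = frame_time_ms(write_ms, kernel_ms, read_ms, overlap=False)
piped = frame_time_ms(write_ms, kernel_ms, read_ms, overlap=True)
speedup = serial / piped

print(f"serial: {serial} ms/frame, overlapped: {piped} ms/frame, speedup: {speedup}x")
```

In this model, adding more queues beyond what is needed to overlap I/O with compute does not help, because the kernel stage alone already keeps the device busy. To check whether that applies here, the actual stage times would have to be measured, for example with OpenCL event profiling.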