parallel processing - How is Loop skewing making loop parallelizable? -
i reading loop tranformation techniques , have hard time understanding how loop skewing makes loop parallelizable here are 2 loops, initial 1 , seconde one, difference betwwen 2 ? 2 of them depends on previous iteration on both , j, make second loop parallisable ? or why can make interchange on second 1 , not first 1 ? both of them have dependencies on , j
for(int =2; < 5; i++){ for(int j =2; j < 5; j++){ a[i][j] = a[i-1][j] + a[i][j-1]; } } for(int =2; < 5; i++){ for(int j =2+i; j < 5+i; j++){ a[i][j-i] = a[i-1][j-i] + a[i][j-1-i]; } }
i have no credit this, formatted , copied source, , hope you
[source: ece 1754, survey of loop transformation techniques, eric laforest, march 19, 2010]
it distance between 2 executive iterations, in first distance 1 between 1 outer loop , inner loop, there dependency between them.
loop skewing says: skews execution of inner loop relative outer one. useful if inner loop has dependence on outer loop prevents running in parallel. example, following code has dependency vector of {(1, 0),(0, 1)} .neither loop can parallelized since each carry dependency. interchanging loops merely interchange indices holding dependencies, accomplishing nothing.
do = 2, n-1 j = 2, m-1 a[i,j] = (a[a-1,j] + a[i,j-1] + a[i+1,j] + a[i,j+1]) / 4 end end loop skewing implemented adding index of outer loop, times skewing factor f, bounds of inner loop , subtracting same value uses of inner loop index. subtraction keeps indices within new loop bounds, preserving correctness of program. effect on inner loop iterations shift position in array forwards f relative current outer loop, increasing dependency distance outer loop in same manner. in other words, given dependency vector (a, b), skewing transforms (a, f + b). since transformation preserves lexicographic order of dependencies, legal. applying skew factor of 1 above inner loop yields following code:
do = 2, n-1 j = 2+i, m-1+i a[i,j-i] = (a[a-1,j-i] + a[i,j-1-i] + a[i+1,j-i] + a[i,j+1-i]) / 4 end end this new code executes in same manner, dependencies of {(1, 1),(0, 1)}. both loops still carry dependency. however, interchanging loops @ point yields dependence vector {(1, 0),(1, 1)}, shown in following code:
do j = 4, m+n-2 = max(2, j-m+1), min(n-1, j-2) a[i,j-i] = (a[a-1,j-i] + a[i,j-1-i] + a[i+1,j-i] + a[i,j+1-i]) / 4 end end the inner loop can parallelized since has no loop-carried dependency on j, , dependency carried outer loop.note interchanging skewed loop bounds no longer straightforward: each loop must take account upper , lower bounds of other loop.
Comments
Post a Comment