- Each thread gets an ID. Using that ID, it fetches its triangle from memory (3 integer indices), and using those 3 indices it fetches the triangle's vertices (3 floats each).
- It then transforms the vertices into integer grid points (the grid is currently 8x8x8) and converts those points into the triangle's bounds on that grid.
- To turn the 3 points into bounds, it finds the min/max of each dimension across the three points.
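The per-thread steps in the list above could be sketched roughly as follows in plain C. This is only an illustration under assumptions not stated in the question: the names `Vec3`, `GridBounds`, `to_cell`, and `triangle_bounds` are hypothetical, the mesh is assumed to lie inside a cube `[lo, hi]` on every axis, and the index and vertex buffers are assumed to be flat arrays.

```c
#define GRID_SIZE 8  /* cells per axis, matching the 8x8x8 grid above */

typedef struct { float x, y, z; } Vec3;            /* hypothetical vertex layout */
typedef struct { int min[3], max[3]; } GridBounds; /* inclusive cell range per axis */

/* Map one coordinate to a cell index, assuming the mesh lies in [lo, hi]. */
static int to_cell(float v, float lo, float hi) {
    float t = (v - lo) / (hi - lo) * GRID_SIZE;
    if (t < 0.0f) t = 0.0f;
    if (t > GRID_SIZE - 1) t = GRID_SIZE - 1;
    return (int)t;  /* t is already clamped to [0, GRID_SIZE - 1] */
}

static int min3(int a, int b, int c) { int m = a < b ? a : b; return c < m ? c : m; }
static int max3(int a, int b, int c) { int m = a > b ? a : b; return c > m ? c : m; }

/* What one thread would do for the triangle with index tri_id. */
GridBounds triangle_bounds(const int *indices, const Vec3 *vertices,
                           int tri_id, float lo, float hi) {
    const int *tri = &indices[3 * tri_id];  /* 3 integer indices per triangle */
    const Vec3 p[3] = { vertices[tri[0]], vertices[tri[1]], vertices[tri[2]] };
    const float coords[3][3] = {
        { p[0].x, p[1].x, p[2].x },
        { p[0].y, p[1].y, p[2].y },
        { p[0].z, p[1].z, p[2].z },
    };
    GridBounds b;
    for (int d = 0; d < 3; ++d) {
        /* Convert each vertex to a grid cell, then take min/max per dimension. */
        int c0 = to_cell(coords[d][0], lo, hi);
        int c1 = to_cell(coords[d][1], lo, hi);
        int c2 = to_cell(coords[d][2], lo, hi);
        b.min[d] = min3(c0, c1, c2);
        b.max[d] = max3(c0, c1, c2);
    }
    return b;
}
```

On a GPU, `tri_id` would come from the thread ID; here it is an ordinary argument so the logic can be checked on the CPU.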
Since the programming language I am using is missing a minmax intrinsic, I wrote one myself. It looks like this:
    procedure MinMax(a, b, c):
        local min, max
        if a > b:
            max = a
            min = b
        else:
            max = b
            min = a
        if c > max:
            max = c
        else:
            if c < min:
                min = c
        return (min, max)
So on average it should be 2.5 * 3 * 3 = 22.5 comparisons, which ends up eating way more time than the actual triangle-edge intersection tests (around 100 * 11-50 instructions). In fact, I found that pre-calculating the required buckets on the CPU (single-threaded, no vectorization), stacking them in a GPU view along with the bucket definition, and making the GPU do ~4 extra reads per thread was 6 times faster than trying to figure out the bounds on the spot. (Note that the buckets get recalculated before every execution, since I'm dealing with dynamic meshes.) So why is the comparison so horrendously slow on a GPU?
Asked By : user29075
Answered By : Wandering Logic
    compare (a > b)
    assign (max = a if a>b)
    assign (min = b if a>b)
    assign (max = b if not(a>b))
    assign (min = a if not(a>b))
    compare (c > max)
    assign (max = c if c>max)
    compare (c < min if not(c>max))
    assign (min = c if not(c>max) and c<min)
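One way to read the sequence above is as straight-line code in which every operation is issued and a predicate decides whether its result is kept. The following C sketch emulates that for a single lane. It is a model for illustration, not real GPU instructions: `assign_if`, `MinMaxResult`, and `minmax_predicated` are hypothetical names, and the initial values of `min` and `max` are placeholders that are always overwritten.

```c
typedef struct { float min, max; } MinMaxResult;

/* Models a predicated assignment: the operation is always issued,
   but the old value survives unless the predicate is set. */
static float assign_if(int pred, float old_value, float new_value) {
    return pred ? new_value : old_value;
}

MinMaxResult minmax_predicated(float a, float b, float c) {
    float min = 0.0f, max = 0.0f;   /* placeholders, overwritten below */

    int p1 = a > b;                 /* compare (a > b) */
    max = assign_if(p1, max, a);    /* max = a if a>b */
    min = assign_if(p1, min, b);    /* min = b if a>b */
    max = assign_if(!p1, max, b);   /* max = b if not(a>b) */
    min = assign_if(!p1, min, a);   /* min = a if not(a>b) */

    int p2 = c > max;               /* compare (c > max) */
    max = assign_if(p2, max, c);    /* max = c if c>max */

    int p3 = !p2 && (c < min);      /* compare (c < min), masked by not(c>max) */
    min = assign_if(p3, min, c);    /* min = c if not(c>max) and c<min */

    MinMaxResult r = { min, max };
    return r;
}
```

In this model all nine operations (three compares and six assignments) are issued on every call, whichever way the comparisons turn out.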
It may be the case that on some GPUs this conversion of conditionals to predication is slower if the GPU is doing it itself. As pointed out by @PaulA.Clayton, if your programming language and architecture have a predicated conditional move operation (especially one of the form if (c) x = y else x = z), you might be able to do better (but probably not much better). Also, placing the c < min conditional inside the else of c > max is unnecessary. It certainly isn't saving you anything, and, given that the GPU has to automatically convert it to predication, nesting it inside two different conditionals may actually be hurting.
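A flattened variant along these lines could look like the C sketch below. It writes each update in the `if (c) x = y else x = z` form using the ternary operator and tests `c < min` on its own rather than inside the `else` of `c > max`. The names `MinMaxPair` and `minmax_select` are hypothetical, and whether a given compiler actually lowers these ternaries to conditional-move or select instructions depends on the compiler and target, so this is an assumption to verify rather than a guarantee.

```c
typedef struct { float min, max; } MinMaxPair;

MinMaxPair minmax_select(float a, float b, float c) {
    int a_less = a < b;              /* one compare, reused for both selects */
    float lo = a_less ? a : b;       /* select form: x = cond ? y : z */
    float hi = a_less ? b : a;

    /* Two independent updates: no nesting, each is one compare plus one select. */
    lo = (c < lo) ? c : lo;
    hi = (c > hi) ? c : hi;

    MinMaxPair r = { lo, hi };
    return r;
}
```

Because `c` cannot be both below the current minimum and above the current maximum, dropping the nesting does not change the result. It costs three compares and four selects on every call.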
Question Source : http://cs.stackexchange.com/questions/39871