What is "memory coalescing"?

Problem Detail: I came to know that graphics processing units have something called memory coalescing. After reading about it, I am still not clear on the topic. Is it in any way related to memory-level parallelism? I searched on Google but could not find a satisfactory answer. It would be helpful if someone could give a more comprehensive, easy-to-understand explanation.

Asked By : sai kiran grandhi

Answered By : Realz Slaw

“Coalescing” can also refer to coalescing memory access patterns. In this usage, coalescing means making sure that threads which run simultaneously try to access memory that is nearby. This usually matters because:

Memory is usually retrieved in large blocks from RAM.
Some processing units try to predict future memory accesses and cache ahead while still processing older parts of memory.
Memory is cached in a hierarchy of successively larger-but-slower caches.

Therefore, writing programs that use predictable memory access patterns is important. It is even more important in a threaded program, so that the memory requests do not jump all over the place; otherwise the processing unit will sit waiting for memory requests to be fulfilled.

Diagrams inspired by Introduction to Parallel Programming: Lesson 2 GPU Hardware and Parallel Communication Patterns.

[Figure: four threads with uniform memory access. The black dashed rectangle represents a single 4-word memory request.]

The memory accesses are adjacent, so they can be retrieved in one block (or at least in the smallest possible number of requests). However, if we increase the “stride” of the access between the threads, more memory requests are needed.

[Figure: 4 memory coalesced threads, and 4 threads with a stride of 2.]

Here the four threads with a stride of two require 2 memory block requests. The smaller the stride, the better; the wider the stride, the more requests are potentially required.

Worse still than a large stride is a random memory access pattern, which is nearly impossible to pipeline, cache, or predict.
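To make the stride effect concrete, here is a minimal sketch in Python that counts how many memory blocks a group of threads touches. It is a toy model and not how any particular GPU works: the function names `blocks_requested` and `blocks_for_addresses` are made up for illustration, and it assumes that memory is fetched in aligned blocks of 4 words (matching the 4-word request described above) and that thread `t` reads the word at `base + t * stride`.

```python
def blocks_for_addresses(addresses, block_words=4):
    """Count the distinct aligned blocks that cover the given word addresses.

    Assumption: memory is fetched in aligned blocks of `block_words` words,
    so the word at address `a` lives in block number `a // block_words`.
    """
    return len({addr // block_words for addr in addresses})


def blocks_requested(num_threads, stride, block_words=4, base=0):
    """Count the blocks needed when thread t reads word `base + t * stride`."""
    addresses = [base + t * stride for t in range(num_threads)]
    return blocks_for_addresses(addresses, block_words)


if __name__ == "__main__":
    for stride in (1, 2, 4):
        print(f"stride {stride}: {blocks_requested(4, stride)} block request(s)")

    # A scattered pattern: these four hypothetical addresses all fall in different blocks.
    scattered = [0, 17, 5, 42]
    print(f"scattered: {blocks_for_addresses(scattered)} block request(s)")
```

Under these assumptions, four threads with a stride of 1 need one block, a stride of 2 needs two blocks, and a stride of 4 needs four blocks, which is one request per thread. The scattered addresses also need four blocks, and unlike the strided case there is no regular pattern for the hardware to exploit.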