Parallelism is widely regarded as the future of computing and is already used with great success in many domains, such as high-performance computing (HPC), graphics accelerators, large control and embedded systems, and automotive applications. The Graphics Processing Unit (GPU) is a highly effective application of parallel processing: it provides a vast number of simple, data-parallel, deeply multithreaded cores and high memory bandwidth. GPUs were originally hardware blocks optimized for a small set of graphics operations. As demand arose for more flexibility, GPUs became ever more programmable. Early approaches to computing on GPUs cast computations into a graphics framework, allocating buffers (arrays) and writing shaders (kernel functions). Several research projects looked at designing languages to simplify this task; in late 2006, NVIDIA introduced its CUDA architecture and tools to make data-parallel computing on a GPU more straightforward. Not surprisingly, the data-parallel features of CUDA map well to the data parallelism available on NVIDIA GPUs. GPU architectures are highly programmable and offer high throughput for data-intensive operations.