The GPU Computing SDK includes 100+ code samples, utilities, whitepapers, and additional documentation to help you get started developing, porting, and optimizing your applications for the CUDA architecture. You can get quick access to many of the SDK resources on this page, or download the complete SDK.
Please note that you may need to install the latest NVIDIA drivers and CUDA Toolkit to compile and run the code samples.
CUDA Getting Started Guide (Windows)
This guide shows you how to install the CUDA development tools on Windows and verify that they are operating correctly.
CUDA Getting Started Guide (Linux)
This guide shows you how to install the CUDA development tools on Linux and verify that they are operating correctly.
CUDA Getting Started Guide (Mac OS X)
This guide shows you how to install the CUDA development tools on Mac OS X and verify that they are operating correctly.
Getting Started with CUDA SDK samples
This guide covers the introductory CUDA SDK samples that beginning CUDA developers should review before developing their own projects.
SDK Code Sample Guide to New Features in CUDA Toolkit 4.2
This guide covers what is new in CUDA Toolkit 4.2 and the new code samples that are part of the CUDA SDK 4.2.
CUDA C Programming Guide
This is a detailed programming guide for CUDA C developers.
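For readers new to the topic, CUDA C extends C with kernels that run in parallel on the GPU. The sketch below is a minimal, illustrative example of the style the guide covers; the kernel name, sizes, and launch configuration are assumptions for the example, not an excerpt from the guide.

    #include <cuda_runtime.h>

    // Each thread computes one element of c = a + b.
    __global__ void vecAdd(const float *a, const float *b, float *c, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            c[i] = a[i] + b[i];
    }

    int main()
    {
        const int n = 1024;
        float *a, *b, *c;
        // Allocate device memory (host-side initialization omitted for brevity).
        cudaMalloc((void **)&a, n * sizeof(float));
        cudaMalloc((void **)&b, n * sizeof(float));
        cudaMalloc((void **)&c, n * sizeof(float));
        // Launch enough 256-thread blocks to cover all n elements.
        vecAdd<<<(n + 255) / 256, 256>>>(a, b, c, n);
        cudaDeviceSynchronize();
        cudaFree(a); cudaFree(b); cudaFree(c);
        return 0;
    }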
CUDA C Best Practices Guide
This is a manual to help developers obtain the best performance from the NVIDIA CUDA Architecture. It presents established optimization techniques and explains coding metaphors and idioms that can greatly simplify programming for the CUDA architecture.
CUDA Occupancy Calculator
The CUDA Occupancy Calculator allows you to compute the multiprocessor occupancy of a GPU by a given CUDA kernel. The tool also provides guidance for selecting the kernel launch configuration that achieves the best possible occupancy on the GPU.
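As a rough illustration of the kind of calculation the spreadsheet automates (the kernel figures below are assumed for the example; the device limits are those of a compute capability 2.0 GPU using the 48 KB shared-memory configuration):

    Kernel: 256 threads per block (8 warps), 32 registers per thread, 4 KB shared memory per block
    Register limit      : 32768 registers / (256 threads * 32 registers) = 4 blocks per multiprocessor
    Shared-memory limit : 48 KB / 4 KB                                   = 12 blocks per multiprocessor
    Warp limit          : 48 resident warps / 8 warps per block          = 6 blocks per multiprocessor
    Resident blocks     : min(4, 12, 6, 8-block hardware limit)          = 4 blocks, i.e. 32 active warps
    Occupancy           : 32 / 48 resident warps                         = ~67%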
CUDA Developer Guide for Optimus Platforms
This document provides guidance to CUDA developers and explains how NVIDIA CUDA APIs can be used to query for GPU capabilities in Optimus systems. It is strongly recommended to follow these guidelines to ensure CUDA applications are compatible with all notebooks featuring Optimus.
OpenCL Programming Guide
This is a detailed programming guide for OpenCL developers.
OpenCL Best Practices Guide
This is a manual to help developers obtain the best performance from OpenCL.
OpenCL Overview for the CUDA Architecture
This whitepaper summarizes guidelines for choosing the implementations best suited to NVIDIA GPUs.
OpenCL Implementation Notes
This document describes the "Implementation Defined" behavior of the NVIDIA OpenCL implementation, as required by the OpenCL specification, Version 1.0. The implementation-defined behavior is listed in the order in which it is referenced in the OpenCL specification and is grouped by the specification's section numbers.
CUDA API Reference Manual (PDF)
This is the CUDA Runtime and Driver API reference manual in PDF format.
CUDA API Reference Manual (CHM)
This is the CUDA Runtime and Driver API reference manual in CHM format (Microsoft Compiled HTML Help).
The CUDA Compiler Driver (NVCC)
Compiling a CUDA source file involves several steps, and several of these steps are subtly different for different modes of CUDA compilation (such as the generation of device code repositories). The purpose of the CUDA compiler driver nvcc is to hide these intricate details of CUDA compilation from developers.
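As a rough sketch of everyday nvcc usage (the file name and architecture flag below are illustrative assumptions, not taken from the document):

    // vecAdd.cu -- a single CUDA source file containing both host and device code.
    // Typical illustrative build commands:
    //   nvcc -arch=sm_20 -o vecAdd vecAdd.cu    (build an executable for Fermi-class GPUs)
    //   nvcc -arch=sm_20 -ptx vecAdd.cu         (stop after generating PTX for inspection)
    // nvcc splits the file, hands the host code to the host C/C++ compiler,
    // and compiles the device code for the requested GPU architecture itself.
    __global__ void scaleBy(float *x, float a)
    {
        x[threadIdx.x] *= a;
    }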
PTX: Parallel Thread Execution ISA Version 3.0
This document describes PTX, a low-level parallel thread execution virtual machine and instruction set architecture (ISA). PTX exposes the GPU as a data-parallel computing device.
Compute Command Line Profiler User Guide
The Compute Command Line Profiler is a command-line profiling tool that can be used to measure performance and find potential opportunities for CUDA and OpenCL optimizations, to achieve maximum performance from NVIDIA GPUs. The Compute Command Line Profiler provides metrics in the form of plots and counter values presented in tables and as graphs. It tracks events with hardware counters on signals in the chip; this is explained in detail in the chapter entitled "Compute Command Line Profiler Counters."
CUDA Fermi Compatibility Guide
The Fermi Compatibility Guide for CUDA Applications is intended to help developers ensure that their NVIDIA CUDA applications will run effectively on GPUs based on the NVIDIA Fermi Architecture. This document provides guidance to developers who are already familiar with programming in CUDA C/C++ and want to make sure that their software applications are compatible with Fermi.
CUDA Fermi Tuning Guide
This document provides an overview of how to tune applications for the Fermi architecture to further increase their speedups. More details are available in the CUDA C Programming Guide (version 3.2 and later), as noted throughout the document.
CUBLAS Library User Guide
The CUBLAS library is an implementation of BLAS (Basic Linear Algebra Subprograms) on top of the NVIDIA CUDA runtime. It allows the user to access the computational resources of NVIDIA Graphics Processing Units (GPUs), but does not auto-parallelize across multiple GPUs.
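A minimal sketch of the CUBLAS calling pattern (an illustrative example that assumes the data already resides in GPU memory; error checking is omitted and the user guide remains the authoritative reference):

    #include <cublas_v2.h>

    // Computes y = alpha * x + y (SAXPY) on vectors already resident on the GPU.
    void saxpy_on_gpu(int n, float alpha, const float *d_x, float *d_y)
    {
        cublasHandle_t handle;
        cublasCreate(&handle);                           // initialize the library context
        cublasSaxpy(handle, n, &alpha, d_x, 1, d_y, 1);  // BLAS level-1 call with unit strides
        cublasDestroy(handle);                           // release library resources
    }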
CUFFT Library User Guide
This document describes CUFFT, the NVIDIA CUDA Fast Fourier Transform (FFT) library. The FFT is a divide-and-conquer algorithm for efficiently computing discrete Fourier transforms of complex or real-valued data sets, and it is one of the most important and widely used numerical algorithms, with applications that include computational physics and general signal processing. The CUFFT library provides a simple interface for computing parallel FFTs on an NVIDIA GPU, which allows users to leverage the floating-point power and parallelism of the GPU without having to develop a custom, GPU-based FFT implementation.
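A minimal sketch of the CUFFT workflow described above (illustrative only; error handling is omitted and the transform size is an assumption for the example):

    #include <cufft.h>

    // Performs an in-place forward FFT over N complex values already in GPU memory.
    void forward_fft(cufftComplex *d_data, int N)
    {
        cufftHandle plan;
        cufftPlan1d(&plan, N, CUFFT_C2C, 1);                // plan a 1D complex-to-complex transform
        cufftExecC2C(plan, d_data, d_data, CUFFT_FORWARD);  // execute the transform in place
        cufftDestroy(plan);                                 // release plan resources
    }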
CUSPARSE Library User Guide
The NVIDIA CUDA CUSPARSE library contains a set of basic linear algebra subroutines used for handling sparse matrices and is designed to be called from C or C++. These subroutines can be classified into four categories.
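As a rough illustration of the compressed sparse row (CSR) storage that many CUSPARSE routines consume (the matrix and zero-based indexing below are assumptions for the example, not taken from the guide):

    #include <cusparse.h>

    /* Illustrative CSR layout of a dense 3x3 matrix (zero-based indexing):
       [ 1 0 2 ]    csrVal    = { 1, 2, 3, 4, 5 }   // nonzero values, stored row by row
       [ 0 3 0 ]    csrColInd = { 0, 2, 1, 0, 2 }   // column index of each nonzero
       [ 4 0 5 ]    csrRowPtr = { 0, 2, 3, 5 }      // offset of each row's first nonzero, plus the total */

    void with_cusparse(void)
    {
        cusparseHandle_t handle;
        cusparseCreate(&handle);   // the handle is passed to every CUSPARSE routine
        /* ... sparse matrix-vector and other routines would be called here ... */
        cusparseDestroy(handle);
    }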
CURAND Library User Guide
The NVIDIA CURAND library provides facilities that focus on the simple and efficient generation of high-quality pseudorandom and quasirandom numbers.
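A minimal sketch of host-side CURAND usage (illustrative; the seed and buffer are assumptions for the example, and error checking is omitted):

    #include <curand.h>

    // Fills a device buffer with n uniformly distributed single-precision values.
    void fill_uniform(float *d_out, size_t n)
    {
        curandGenerator_t gen;
        curandCreateGenerator(&gen, CURAND_RNG_PSEUDO_DEFAULT);  // default pseudorandom generator
        curandSetPseudoRandomGeneratorSeed(gen, 1234ULL);        // fixed seed for reproducibility
        curandGenerateUniform(gen, d_out, n);                    // generate directly into GPU memory
        curandDestroyGenerator(gen);
    }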
NVIDIA Performance Primitives (NPP) Library User Guide
NVIDIA NPP is a library of functions for performing CUDA-accelerated processing. The initial set of functionality in the library focuses on imaging and video processing and is widely applicable for developers in these areas. NPP will evolve over time to encompass more of the compute-heavy tasks in a variety of problem domains. The NPP library is written to maximize flexibility while maintaining high performance.
CUDA Profiler Tools SDK Interface (CUPTI) User Guide
The CUDA Profiling Tools Interface (CUPTI) enables the creation of profiling and tracing tools that target CUDA applications. CUPTI provides four APIs: the Activity API, the Callback API, the Event API, and the Metric API. Using these APIs, you can develop profiling tools that give insight into the CPU and GPU behavior of CUDA applications. CUPTI is delivered as a dynamic library on all platforms supported by CUDA.
CUDA Profiler Tools SDK Interface Release Notes
The CUDA Profiling Tools Interface (CUPTI) release notes.
Thrust Quick Start Guide
Thrust is a C++ template library for CUDA based on the Standard Template Library (STL). Thrust allows you to implement high-performance parallel applications with minimal programming effort through a high-level interface that is fully interoperable with CUDA C.
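As a small illustration of the high-level style described above (a sketch in the spirit of Thrust's introductory examples, not an excerpt from the guide):

    #include <thrust/host_vector.h>
    #include <thrust/device_vector.h>
    #include <thrust/generate.h>
    #include <thrust/sort.h>
    #include <thrust/copy.h>
    #include <cstdlib>

    int main()
    {
        // Generate random data on the host.
        thrust::host_vector<int> h_vec(1 << 20);
        thrust::generate(h_vec.begin(), h_vec.end(), rand);

        // Transfer to the device and sort in parallel on the GPU.
        thrust::device_vector<int> d_vec = h_vec;
        thrust::sort(d_vec.begin(), d_vec.end());

        // Copy the sorted result back to the host.
        thrust::copy(d_vec.begin(), d_vec.end(), h_vec.begin());
        return 0;
    }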
NVIDIA CUDA H.264 Video Encoder Library User Guide
The NVIDIA CUDA H.264 Video Encoder is a library for performing CUDA-accelerated video encoding. The functionality in the library takes raw YUV frames as input and generates NAL packets. The encoder supports various profiles, up to High Profile at Level 4.1.
NVIDIA CUDA Video Decoder Library User Guide
The CUDA Video Decoder API gives developers access to hardware video decoding capabilities on NVIDIA GPUs. The actual hardware decode can run on either the Video Processor (VP) or CUDA hardware, depending on the hardware capabilities and the codecs. This API supports the following video stream formats on Linux and Windows platforms: MPEG-2, VC-1, and H.264 (AVCHD).
CUDA C SDK Release Notes
CUDA C SDK Release Notes.
OpenCL SDK Release Notes
OpenCL SDK Release Notes.
GPU Computing SDK End User License Agreement
This is the Software License Agreement for developers or licensees.