Location
This course will take place in person in Edinburgh.
Bayes Centre G.03, The University of Edinburgh
Overview
This short course provides an introduction to GPU computing with CUDA, aimed at scientific application programmers wishing to develop their own software. The course will give a background on the differences between CPU and GPU architectures as a prelude to introductory exercises in CUDA programming. It will discuss the execution of kernels, memory management, and shared memory operations. Common performance issues and their solutions will be discussed. Profiling will be introduced via the current NVIDIA tools.
The course will go on to consider the execution of independent streams, and the execution of work composed as a collection of dependent tasks expressed as a graph. Device management and details of device-to-device data transfer will be covered for situations where more than one GPU device is available. CUDA-aware MPI will also be covered.
The course will not discuss programming with compiler directives, but it does provide a solid grounding in the underlying principles of the CUDA model, which is useful for programmers who ultimately wish to use OpenMP or OpenACC. The course will not consider graphics programming, nor will it consider machine learning packages.
Note that the course is also appropriate for those wishing to use AMD GPUs via the HIP API, although we will not specifically use HIP.
Pre-requisite Programming Languages:
Attendees must be able to program in C or C++ (course examples and exercises will be limited to C). Familiarity with threaded programming models would be useful, but no previous knowledge of GPU programming is required.
Course attendees should bring a laptop, but do not need GPU hardware. Access to a GPU machine (Cirrus) will be provided.
Attendees are also required to abide by the Cirrus policies.
Timetable
DAY ONE
- 09:30 - 10:00 Logistics and logging in
- 10:00 - 10:30 Introduction and GPU architectures
- 10:30 - 11:00 The CUDA/HIP programming model
- 11:00 - 11:30 Break
- 11:30 - 12:00 CUDA programming: kernels
- 12:00 - 13:00 A first CUDA exercise: operation on a vector
- 13:00 - 14:00 Lunch
- 14:00 - 14:30 Programming: memory considerations
- 14:30 - 15:00 Exercise: operation on a matrix
- 15:00 - 15:20 Break
- 15:20 - 15:45 Unified/managed memory
- 15:45 - 16:00 Exercise: managed memory
- 16:00 - 16:20 Threaded programming and synchronisation
- 16:20 - 17:00 Exercise: Reduction for vector product
- 17:00 Close
DAY TWO
- 09:00 - 09:10 Detour: using profiling
- 09:10 - 10:00 Exercise: Nsight and Nsight Systems
- 10:00 - 10:40 Device management. The idea of streams and its extension to the CUDA Graph API
- 10:40 - 11:00 Exercise: graph API
- 11:00 - 11:30 Break
- 11:30 - 12:00 More than one GPU: GPU-to-GPU transfers; GPU-aware MPI
- 12:00 - 13:00 Exercises (cont.)
- 13:00 - 14:00 Lunch
- 14:00 - 14:10 Putting it all together: conjugate gradient (CG) algorithm
- 14:10 - 15:00 CG exercise
- 15:00 - 15:20 Break
- 15:20 - 15:50 CG exercise (cont.)
- 15:50 - 16:00 Some miscellaneous observations
- 16:00 Close
Feedback
Please let us know what was great about this course and anything we can improve.