I recently read with great interest the blog post “My reflections on teaching GPU Programming and Architecture” (http://bit.ly/k5XDZQ) by @pjcozzi. That excellent post inspired me to write my own version based on my CS668 Parallel Computing class of Spring 2011 at the University of Cincinnati. This year I redesigned the class around the revolutionary architecture of the GPU (graphics processing unit) for executing parallel programs.
GPU Programming in Large Class Format
My CS668 Parallel Computing course, unlike the graduate class taught by @pjcozzi, is primarily an undergraduate CS elective taught in the setting of a large class. This term’s CS668 class has 49 students enrolled, and the students span computing majors in CS, CE, and EE.
I really enjoy the large-class format, as it provides a more interactive learning environment and gives students excellent networking and group-learning opportunities. The course is designed so that students work throughout the term in small groups on lab homework projects and then design and execute a large-scale final project. Students put many hours of group effort into their final project designs, final presentations, and final reports. At the end of this post I display the schedule of all the project presentations, so that the reader can note the diversity and creativity of the project groups.
Curating Great Numerical Ideas
A number of computer science programs run a “Great Ideas”-style course as a way to introduce freshmen to the discipline of computer science. I have considered this in the past, but often shied away, since such great ideas usually require quite a bit of mathematical sophistication before students can really appreciate the depth and beauty of the results.
Near the beginning of the term, I made the decision that my CS668 Parallel Computing class would be an excellent vehicle to expose students to a variety of such “great ideas”. My goal was to present these ideas so that advanced undergraduates could grasp them with more appreciation.
One critical component in making this a success is the set of CUDA SDK demo programs that NVIDIA has made freely available. The SDK contains a large collection of demos that students can run and interact with, and use as templates for their own projects. It is a wonderful way for students to experience hands-on the numerical ideas discussed in lecture.
Idea 1. Modeling Vibration Waves
In Ian Stewart’s book Nature’s Numbers: The Unreal Reality of Mathematics (Science Masters Series) there is an essay entitled “Violins to Videos” that recalls the history of the development of the wave equation and the significance of this type of modeling in the history of science and engineering. In lecture we develop some simple parallel codes for modeling the positions in space of particles on a vibrating violin string.
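The update for each particle depends only on its immediate neighbors, which is what makes the problem map so naturally onto a GPU. As a rough serial sketch of the idea (plain Python, not our actual class code), an explicit finite-difference step for the 1D wave equation looks like:

```python
import math

def step_string(u_prev, u_curr, c2):
    """One explicit finite-difference step of the 1D wave equation
    with fixed (zero) endpoints. c2 is the squared Courant number
    (c * dt / dx)**2, which must be <= 1 for stability."""
    n = len(u_curr)
    u_next = [0.0] * n  # endpoints stay pinned at zero
    for i in range(1, n - 1):
        u_next[i] = (2.0 * u_curr[i] - u_prev[i]
                     + c2 * (u_curr[i + 1] - 2.0 * u_curr[i] + u_curr[i - 1]))
    return u_next

# Pluck the string into one half-sine hump, start it at rest,
# and advance a few time steps.
n = 64
u0 = [math.sin(math.pi * i / (n - 1)) for i in range(n)]
u_prev, u_curr = u0[:], u0[:]
for _ in range(10):
    u_prev, u_curr = u_curr, step_string(u_prev, u_curr, 0.25)
```

On a GPU, the inner loop over `i` becomes one thread per grid point, all applying the same stencil simultaneously.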
Idea 2. The Heat Equation
I really like to use the heat equation as an example numerical problem. The reasoning behind its correctness is beautifully simple, yet the math has the feel of sophistication in the expression of the underlying PDE. And as a big bonus, the code in the SDK produces lovely graphics to experiment with, such as the one at the top of this post.
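To show just how simple the reasoning is, here is a back-of-the-envelope Python sketch (my own illustration, not the SDK demo): each grid point repeatedly relaxes toward the average of its neighbors, and a GPU can apply that stencil to every point in parallel.

```python
def heat_step(u, alpha):
    """One explicit (forward-Euler) step of the 1D heat equation
    u_t = u_xx with fixed zero boundary values. alpha = dt / dx**2
    must be <= 0.5 for stability."""
    n = len(u)
    nxt = u[:]
    for i in range(1, n - 1):
        nxt[i] = u[i] + alpha * (u[i + 1] - 2.0 * u[i] + u[i - 1])
    return nxt

# A hot spike in the middle of a cold rod diffuses outward
# toward the cold (fixed at zero) endpoints.
u = [0.0] * 21
u[10] = 100.0
for _ in range(50):
    u = heat_step(u, 0.25)
```

After a few steps the spike spreads into a smooth bump, which is exactly the behavior the SDK's 2D demo renders as those lovely heat-map graphics.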
Idea 3. Monte Carlo Method of Statistical Sampling
High-performance GPU-style computing can really unleash very simple methods for solving complex math problems, and these methods are especially useful when estimated values are sufficient. The idea is to use the Monte Carlo method of statistical sampling, invented and popularized by Stanislaw Ulam, John von Neumann, and Nicholas Metropolis. The story I have heard is that Ulam first came up with the idea when challenged to compute the probability of winning a game of solitaire.
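The classic toy example, sketched here in a few lines of Python of my own (not SDK code), estimates π by throwing random darts at the unit square and counting how many land inside the quarter circle. Every sample is independent, which is precisely what makes the method so GPU-friendly.

```python
import random

def estimate_pi(n_samples, seed=0):
    """Monte Carlo estimate of pi: the fraction of uniform points in
    the unit square that fall inside the quarter circle tends to pi/4."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    inside = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / n_samples

pi_hat = estimate_pi(100_000)
```

With 100,000 samples the estimate is typically within a few thousandths of π; the error shrinks like 1/√n, so throwing GPU-scale sample counts at the problem pays off directly in accuracy.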
The CUDA SDK contains a few demos that apply Monte Carlo sampling to the financial modeling of stock prices via the Black-Scholes model (see, for example, Basic Black-Scholes: Option Pricing and Trading). This is a simple random-walk type of model in which Monte Carlo sampling produces an estimate that is clearly useful:
1. Collect standard normal samples
2. Use the GPU to compute the expected value of the option payoff
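The two steps above can be sketched serially in Python (a hedged illustration of mine with made-up parameters, not the SDK kernel): simulate many terminal stock prices from standard normal draws, then average the discounted call payoff.

```python
import math
import random

def mc_european_call(s0, k, r, sigma, t, n_paths, seed=0):
    """Monte Carlo price of a European call: simulate terminal prices
    under geometric Brownian motion and average the discounted payoff
    max(S_T - K, 0)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_paths):
        z = rng.gauss(0.0, 1.0)  # step 1: a standard normal sample
        s_t = s0 * math.exp((r - 0.5 * sigma ** 2) * t
                            + sigma * math.sqrt(t) * z)
        total += max(s_t - k, 0.0)  # step 2: accumulate the payoff
    return math.exp(-r * t) * total / n_paths

# Hypothetical example: at-the-money call, 5% rate, 20% volatility, 1 year.
price = mc_european_call(s0=100.0, k=100.0, r=0.05, sigma=0.2,
                         t=1.0, n_paths=100_000)
```

For these parameters the closed-form Black-Scholes price is about 10.45, and the Monte Carlo estimate lands close to it; on a GPU each path becomes its own thread.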
Thanks to @theg4cubepro for pointing me toward this algorithm video.
Idea 4. Fractals and Chaos
No surprise here that the SDK includes a version of Mandelbrot set generation, although unfortunately the distributed version is a bit heavy on bells and whistles. In lecture I discussed the relationship between chaotic functions and fractals, and we talked about the use of such mathematical modeling in biomedical applications. I referred back to an old blog post in which students explored visualizations of the orbit diagram of the logistic function and Robert May’s discovery of its chaotic nature.
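A few lines of Python (a sketch of the kind of exercise the students did, not their actual code) reproduce May's observation: below r ≈ 3 the orbit of the logistic map settles to a fixed point, and just past it a period-2 cycle appears. Sweeping r and plotting the surviving orbits yields the famous orbit diagram.

```python
def logistic_orbit(r, x0=0.5, transient=500, keep=8):
    """Iterate the logistic map x -> r*x*(1-x), discard a transient,
    and return the next few iterates (the long-run orbit)."""
    x = x0
    for _ in range(transient):
        x = r * x * (1.0 - x)
    orbit = []
    for _ in range(keep):
        x = r * x * (1.0 - x)
        orbit.append(x)
    return orbit

# r = 2.5: the orbit settles to the fixed point 1 - 1/r = 0.6.
stable = logistic_orbit(2.5)
# r = 3.2: the orbit oscillates forever on a period-2 cycle.
cycle = logistic_orbit(3.2)
```

Since each value of r is an independent computation, generating the full orbit diagram is embarrassingly parallel, one more natural fit for the GPU.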
There is a growing list of textbooks available to students and teachers of GPU-style parallel computing. There are two valuable textbooks that I considered for this course, although in the end I did not require the students to purchase either one. The first is a great resource for students just starting out with GPUs and CUDA programming. The second is more advanced and architecture-oriented.
CUDA by Example: An Introduction to General-Purpose GPU Programming contains many simple programs, freely available, that demonstrate the power and versatility of CUDA-style programming. The code samples all seem to compile easily and are very cleanly written. A very nice introduction to CUDA programming.
Programming Massively Parallel Processors: A Hands-on Approach (Applications of GPU Computing Series) is more of a textbook, giving a more in-depth treatment of the architectural design issues. However, it does not offer many good example programs. The book, although well written, did not suit my style or strengths, as it is not strong on parallel programming design.
Student Groups Final Presentation Schedule
I hope that students will be able to have their presentations and demos recorded for podcasts. I will try to make links to them available on this blog.
THURSDAY, MAY 26
Eric Roth, Richard Klafter, Eric Swanson
2. RAY TRACING
Jun Zhou, Sam McCoucha, Daniel Owusu-Kesseh
3. BRUTE FORCE CRYPTANALYSIS
Andrew Tefft, Adam Brandes
4. COMPUTING PI
TUESDAY, MAY 31
1. MOLECULAR DYNAMICS
Jiayang Guo, Matthew Jackson, Yuan Jiang, Hasso Pape
2. BALL COLLISION SIMULATION
Raymond Cox, Keith Wentzel, Chelsea Chase, Ben List
3. DEFORMABLE IMAGE REGISTRATIONS
Lirong Tan, Minzhe Guo
4. PRIME NUMBER GENERATION FOR RSA
Nicholas Sampanis, Thomas Bachmann, Chris Romano, Christian Denholm
5. STRING MATCHING
Cheng Zhu, Qiang Han, Mingyang Wang
THURSDAY, JUNE 2
1. GAUSSIAN ELIMINATION
Ryan Anderson, Matt Fuller, Nathan Loyer, Christopher Welch
2. SOLVING SYSTEMS AX=0
Britney Bogard, Nicholas Foltz, Jorge Moscat
3. ITEMSET MINING
Dan Lautzenheiser, Nathan Sommer, Vic Halpin
4. RAINBOW TABLES
Edward Kimball, Benjamin Jones, Adam Stylinski, Kate Khazanova
5. GROESTL HASHING
Jeff Farris, Jeremy Lavergne, Alex Padgett, Jon Wedaman