SABERTOOTH cuDNN Tutorial

Complexity:	Low
Payloads:	`SABERTOOTH`
Windows:	`PAYLOAD_SABERTOOTH`

This tutorial demonstrates cuDNN v7.5.0 by running the samples that ship with it.

Prerequisites

All tutorials require the steps outlined in the Getting Started Guide.

Overview

The tutorial comes with two scripts:

cudnn_demo - runs in orbit on the SABERTOOTH payload to build & run the cuDNN samples
deploy - run by the user on the ground to upload cudnn_demo and schedule it to execute in a PAYLOAD_SABERTOOTH window

cudnn_demo Script

cudnn_demo builds & runs the following cuDNN samples:

mnistCUDNN
RNN

First, the CUDA compiler is checked for availability:

$ /usr/local/cuda/bin/nvcc -V
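
The availability check above can be wrapped so that the script stops early with a clear message when the compiler is missing. This is a minimal sketch, not the actual contents of cudnn_demo; the function name check_nvcc and its optional path argument are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch: fail early when the CUDA compiler is not available.
# check_nvcc is a hypothetical helper; the default path is the one
# used in this tutorial, and an alternative path may be passed as $1.
check_nvcc() {
    nvcc_path="${1:-/usr/local/cuda/bin/nvcc}"
    if [ ! -x "$nvcc_path" ]; then
        echo "nvcc not found at $nvcc_path" >&2
        return 1
    fi
    # Print the compiler version, as in the command above.
    "$nvcc_path" -V
}
```

Calling check_nvcc with no arguments runs the same command as above; a non-zero return value means the remaining steps should be skipped.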

Next, the cuDNN sample sources are copied to ~/ because the system location is not writable:

$ cp -a /usr/src/cudnn_samples_v7 ~/
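
If the script may run more than once, the copy step can be made to skip work that is already done. The sketch below assumes that behaviour is wanted; stage_samples and its two optional arguments are hypothetical names, and the defaults are the paths from the command above.

```shell
#!/bin/sh
# Sketch: copy the samples to a writable location, once.
# stage_samples is a hypothetical helper.
#   $1 = source directory (default: /usr/src/cudnn_samples_v7)
#   $2 = writable parent directory (default: $HOME)
stage_samples() {
    src="${1:-/usr/src/cudnn_samples_v7}"
    dest_parent="${2:-$HOME}"
    dest="$dest_parent/$(basename "$src")"
    if [ -d "$dest" ]; then
        # A previous run already copied the samples.
        echo "already staged: $dest"
        return 0
    fi
    # -a preserves permissions, timestamps and symlinks.
    cp -a "$src" "$dest_parent/" && echo "staged: $dest"
}
```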

Finally, the cuDNN samples are built & run. Here are the commands for building & running mnistCUDNN:

$ cd ~/cudnn_samples_v7/mnistCUDNN
$ make
$ ./mnistCUDNN
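
Since the same three steps apply to each sample, they can be factored into one helper. The following is a sketch under assumptions: build_and_run is a hypothetical function, and the MAKE variable is only there so that a different build command can be substituted. Any arguments a sample binary needs are passed through unchanged; this tutorial does not list the arguments used for the RNN sample, so none are shown.

```shell
#!/bin/sh
# Sketch: build one sample directory, then run the named binary.
# Usage: build_and_run <sample_dir> <binary> [args...]
# build_and_run is a hypothetical helper, not part of cudnn_demo.
build_and_run() {
    sample_dir="$1"
    binary="$2"
    shift 2
    (
        # The subshell keeps the caller's working directory unchanged.
        cd "$sample_dir" || exit 1
        # MAKE may be overridden; it defaults to plain make.
        ${MAKE:-make} || exit 1
        "./$binary" "$@"
    )
}
```

For example, build_and_run ~/cudnn_samples_v7/mnistCUDNN mnistCUDNN performs the same steps as the three commands above and returns the exit status of the sample.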

Example Output

Linking agains cublasLt = false
CUDA VERSION: 10000
TARGET ARCH: aarch64
HOST_ARCH: aarch64
TARGET OS: linux
SMS: 30 35 50 53 60 61 62 70 72 75
/usr/local/cuda/bin/nvcc -ccbin g++ -I/usr/local/cuda/include -IFreeImage/include  -m64    -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_53,code=sm_53 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_62,code=sm_62 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_72,code=sm_72 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_75,code=compute_75 -o fp16_dev.o -c fp16_dev.cu
g++ -I/usr/local/cuda/include -IFreeImage/include   -o fp16_emu.o -c fp16_emu.cpp
g++ -I/usr/local/cuda/include -IFreeImage/include   -o mnistCUDNN.o -c mnistCUDNN.cpp
/usr/local/cuda/bin/nvcc -ccbin g++   -m64      -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_53,code=sm_53 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_62,code=sm_62 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_72,code=sm_72 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_75,code=compute_75 -o mnistCUDNN fp16_dev.o fp16_emu.o mnistCUDNN.o -I/usr/local/cuda/include -IFreeImage/include  -LFreeImage/lib/linux/aarch64 -LFreeImage/lib/linux -lcudart -lcublas -lcudnn -lfreeimage -lstdc++ -lm
FreeImage/lib/linux/aarch64/libfreeimage.a(strenc.o): In function `StrIOEncInit':
strenc.c:(.text+0x1294): warning: the use of `tmpnam' is dangerous, better use `mkstemp'

cudnnGetVersion() : 7500 , CUDNN_VERSION from cudnn.h : 7500 (7.5.0)
Host compiler version : GCC 7.4.0
There are 1 CUDA capable devices on your machine :
device 0 : sms  1  Capabilities 5.3, SmClock 921.6 Mhz, MemSize (Mb) 1980, MemClock 12.8 Mhz, Ecc=0, boardGroupID=0
Using device 0

Testing single precision
Loading image data/one_28x28.pgm
Performing forward propagation ...
Testing cudnnGetConvolutionForwardAlgorithm ...
Fastest algorithm is Algo 1
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.251667 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.382813 time requiring 57600 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 2.635729 time requiring 207360 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 12.472500 time requiring 203008 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 12.811354 time requiring 2057744 memory
Resulting weights from Softmax:
0.0000000 0.9999399 0.0000000 0.0000000 0.0000561 0.0000000 0.0000012 0.0000017 0.0000010 0.0000000
Loading image data/three_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000000 0.0000000 0.9999288 0.0000000 0.0000711 0.0000000 0.0000000 0.0000000 0.0000000
Loading image data/five_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000008 0.0000000 0.0000002 0.0000000 0.9999820 0.0000154 0.0000000 0.0000012 0.0000006

Result of classification: 1 3 5

Test passed!

Testing half precision (math in single precision)
Loading image data/one_28x28.pgm
Performing forward propagation ...
Testing cudnnGetConvolutionForwardAlgorithm ...
Fastest algorithm is Algo 1
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.171354 time requiring 3464 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.206354 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.319636 time requiring 28800 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 2.538021 time requiring 207360 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 12.519062 time requiring 203008 memory
Resulting weights from Softmax:
0.0000001 1.0000000 0.0000001 0.0000000 0.0000563 0.0000001 0.0000012 0.0000017 0.0000010 0.0000001
Loading image data/three_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000000 0.0000000 1.0000000 0.0000000 0.0000714 0.0000000 0.0000000 0.0000000 0.0000000
Loading image data/five_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000008 0.0000000 0.0000002 0.0000000 1.0000000 0.0000154 0.0000000 0.0000012 0.0000006

Result of classification: 1 3 5

Test passed!

Upload & Deploy

Run deploy. The script uploads cudnn_demo and schedules its execution in 24 hours.

$ ./deploy "[YOUR_AUTH_TOKEN]" [YOUR_SAT_ID]

Review

Once the window has completed and enough time has passed for the log to download, the log can be reviewed in AWS S3 (see the Hello World tutorial).
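
After the log has been saved locally, a quick check can confirm that both mnistCUDNN runs (single and half precision) reported success. This sketch assumes the log contains the sample output verbatim, as in Example Output, with exactly two "Test passed!" lines; check_mnist_log is a hypothetical helper and the log file name is whatever the download produced.

```shell
#!/bin/sh
# Sketch: confirm a downloaded log shows both mnistCUDNN tests passing.
# check_mnist_log is a hypothetical helper; $1 is the local log file.
check_mnist_log() {
    log_file="$1"
    if [ ! -r "$log_file" ]; then
        echo "cannot read log: $log_file" >&2
        return 1
    fi
    # grep -c exits non-zero when the count is 0, so ignore its status.
    passes=$(grep -c '^Test passed!' "$log_file" || true)
    if [ "$passes" -eq 2 ]; then
        echo "mnistCUDNN: 2 of 2 precision tests passed"
    else
        echo "mnistCUDNN: expected 2 passes, found $passes" >&2
        return 1
    fi
}
```

If other samples in the log also print "Test passed!", the expected count would need adjusting.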

Next Steps

This is the last tutorial. The next step is to better understand the available development environments.