To run GPU workloads on the EWC Kubernetes Service, two prerequisites must be met:




> kubectl get pods -n gpu-operator
NAME                                                              READY   STATUS      RESTARTS      AGE
gpu-feature-discovery-tpwr4                                       2/2     Running     0             107s
gpu-operator-745ccb5b94-dzxvk                                     1/1     Running     0             3m19s
gpu-operator-gpu-operator-node-feature-discovery-master-6fpj76g   1/1     Running     0             3m19s
gpu-operator-gpu-operator-node-feature-discovery-worker-6hk95     1/1     Running     0             3m19s
gpu-operator-gpu-operator-node-feature-discovery-worker-jb2v8     1/1     Running     0             3m18s
nvidia-container-toolkit-daemonset-7gsz7                          1/1     Running     2 (86s ago)   111s
nvidia-cuda-validator-pqt4b                                       0/1     Completed   0             46s
nvidia-dcgm-exporter-hmxx8                                        1/1     Running     0             108s
nvidia-device-plugin-daemonset-2kxfq                              2/2     Running     0             110s
nvidia-device-plugin-validator-ss74n                              0/1     Completed   0             29s
nvidia-operator-validator-6tglx                                   1/1     Running     0             111s
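Once all operator pods are Running or Completed, the device plugin should have registered the GPU with the scheduler. As a quick sanity check (a sketch; node names and GPU counts will differ per cluster), you can list the allocatable `nvidia.com/gpu` resource per node:

```shell
# Show each node and how many NVIDIA GPUs it advertises as allocatable.
# The nvidia.com/gpu resource name is the one registered by the NVIDIA device plugin;
# the dot in the key must be escaped in the custom-columns JSONPath.
kubectl get nodes -o custom-columns='NODE:.metadata.name,GPUS:.status.allocatable.nvidia\.com/gpu'
```

A node showing `<none>` in the GPUS column has no GPU advertised yet, so pods requesting `nvidia.com/gpu` would stay Pending.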
> cat << EOF | kubectl create -f -
apiVersion: v1
kind: Pod
metadata:
  name: vector-add
spec:
  restartPolicy: OnFailure
  containers:
  - name: vector-add
    image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda10.2
    resources:
      limits:
        nvidia.com/gpu: 1
EOF
> kubectl logs pod/vector-add
[Vector addition of 50000 elements]
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
Copy output data from the CUDA device to the host memory
Test PASSED
Done
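`Test PASSED` confirms the pod was scheduled onto a GPU node and could execute a CUDA kernel. Real workloads request GPUs the same way, via a `nvidia.com/gpu` resource limit. A minimal sketch of the same CUDA sample wrapped in a Job (the Job name is illustrative; the image is the one used above):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: vector-add-job   # illustrative name
spec:
  backoffLimit: 3
  template:
    spec:
      restartPolicy: OnFailure
      containers:
      - name: vector-add
        image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda10.2
        resources:
          limits:
            nvidia.com/gpu: 1   # one whole GPU; GPUs are not fractionally shared by default
```

When you are done testing, remove the validation pod with `kubectl delete pod vector-add` so it does not hold the GPU reservation.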