This post is not meant to be an in-depth guide, but rather a collection of common questions and resources on using GPUs for AI.
I'll add more points as I come across them, so feel free to check back later.
Why use a GPU instead of a CPU?
A modern CPU (Central Processing Unit) can be pretty fast and will handle most tasks you throw at it. It has a handful of powerful cores, each able to run a few threads of execution, which is great for general-purpose computing: running your operating system, web browser, and applications.
A GPU (Graphics Processing Unit) on the other hand has been optimized to perform a subset of tasks more efficiently. In particular, it can apply the same operation to a large batch of data in parallel.
AI training and inference often involve performing many operations on large matrices, which is exactly where GPUs shine. They divide the workload into many small tasks and run them simultaneously across thousands of cores, making them much faster than CPUs for this kind of work.
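To see this concretely, here's a minimal sketch that times the same matrix multiplication on the CPU and on a GPU using PyTorch (assuming a CUDA-capable GPU is available; the matrix size and the `time_matmul` helper are arbitrary choices for illustration):

```python
import time
import torch

def time_matmul(device: str, n: int = 4096) -> float:
    """Time one n x n matrix multiplication on the given device."""
    a = torch.randn(n, n, device=device)
    b = torch.randn(n, n, device=device)
    _ = a @ b  # warm-up so one-time initialization doesn't skew the result
    if device == "cuda":
        torch.cuda.synchronize()  # GPU work is async; wait for it to finish
    start = time.perf_counter()
    _ = a @ b
    if device == "cuda":
        torch.cuda.synchronize()
    return time.perf_counter() - start

print(f"CPU: {time_matmul('cpu'):.4f} s")
if torch.cuda.is_available():
    print(f"GPU: {time_matmul('cuda'):.4f} s")
```

Exact numbers depend heavily on your hardware, but the gap between the two timings is usually striking.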
Can I use a CPU for AI?
Yes, there's nothing stopping you from using a CPU for AI tasks. In fact, many AI frameworks (like TensorFlow and PyTorch) can run on CPUs.
In particular, you might get modest results with Apple Silicon chips like the M4. These combine a dedicated Neural Engine with unified memory, which can help with AI tasks.
However, performance will be significantly lower than with a dedicated GPU optimized for AI workloads. This is especially true for large models, where more cores, dedicated memory, and higher bandwidth make a big difference.
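If you want to experiment, here's a small sketch of how you might pick the best available device in PyTorch, falling back from CUDA to Apple's Metal (MPS) backend to the CPU (the toy layer and tensor shapes are just placeholders):

```python
import torch

# Prefer an NVIDIA GPU, then Apple's Metal (MPS) backend, then the CPU.
if torch.cuda.is_available():
    device = torch.device("cuda")
elif torch.backends.mps.is_available():
    device = torch.device("mps")
else:
    device = torch.device("cpu")

# A toy layer and batch, just to confirm the chosen device works.
model = torch.nn.Linear(512, 512).to(device)
x = torch.randn(8, 512, device=device)
print(device, model(x).shape)
```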
Which GPU is best for AI training / inference?
The answer depends on many factors including your specific workload. But generally, you should consider:
- Tensor Cores (specialized units for the matrix math AI relies on)
- Memory capacity and bandwidth (how much data fits on the GPU, and how fast it moves)
- Cache hierarchy (how quickly frequently used data can be reused)
- FLOPS (raw compute throughput)
For an in-depth guide on picking GPUs for AI/ML, check out this article by Tim Dettmers.
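If you already have an NVIDIA GPU, PyTorch can report some of these specs for you. Here's a quick sketch (assuming a CUDA build of PyTorch; index 0 is just the first visible GPU):

```python
import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)  # first visible GPU
    print(f"Name:               {props.name}")
    print(f"VRAM:               {props.total_memory / 1e9:.1f} GB")
    print(f"Multiprocessors:    {props.multi_processor_count}")
    print(f"Compute capability: {props.major}.{props.minor}")
else:
    print("No CUDA-capable GPU detected")
```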
Is more VRAM (GPU memory) better for AI?
AI models and training datasets can be quite large. VRAM is high-speed memory placed directly on the GPU. It provides rapid access to whatever data you load into it.
Having more VRAM helps if you have a large model, as it allows the GPU to store and access more of the model's parameters quickly. This reduces the need to fetch data from slower system memory.
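As a rough rule of thumb, you can estimate the VRAM needed just for the weights by multiplying the parameter count by the bytes per parameter. Here's a sketch in PyTorch (the `param_memory_gb` helper and the toy Transformer are stand-ins for illustration; the 7B figure in the comment is a back-of-the-envelope example):

```python
import torch

def param_memory_gb(model: torch.nn.Module) -> float:
    """Rough VRAM needed just to hold the model's weights."""
    total_bytes = sum(p.numel() * p.element_size() for p in model.parameters())
    return total_bytes / 1e9

# Toy model for illustration. At fp16 (2 bytes per parameter), a
# 7B-parameter model needs ~14 GB for weights alone, before activations,
# gradients, and optimizer state are counted.
model = torch.nn.Transformer(d_model=512, num_encoder_layers=6)
print(f"~{param_memory_gb(model):.2f} GB of weights")
```

Keep in mind that training needs considerably more memory than the weights alone, since gradients and optimizer state typically add a multiple of the model size.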
Does software matter when it comes to GPUs?
Yes, software support matters a lot. A GPU is not just hardware: it needs libraries and frameworks that support it before you can actually use it efficiently.
For example, NVIDIA provides a suite of libraries (like CUDA, cuDNN, and TensorRT) optimized for its GPUs, while AMD has ROCm for theirs. AI frameworks like TensorFlow and PyTorch build on these libraries to access the underlying GPU resources.
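You can check which pieces of this stack your PyTorch build can see with a quick diagnostic like the one below (output will vary by installation):

```python
import torch

print("PyTorch:        ", torch.__version__)
print("CUDA available: ", torch.cuda.is_available())
if torch.cuda.is_available():
    print("CUDA (build):   ", torch.version.cuda)  # CUDA version PyTorch was built against
    print("cuDNN:          ", torch.backends.cudnn.version())
```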
What’s the difference between CUDA Cores and Tensor Cores?
Both are processing units on NVIDIA GPUs but have different purposes.
CUDA Cores can handle a wider range of operations (e.g., 3D rendering and physics simulations). Tensor Cores, however, are specialized in the kind of math that AI models need most, like matrix multiplication.
If you're looking for the AMD equivalents, CUDA Cores are most similar to "Stream Processors", and Tensor Cores are similar to "Matrix Cores". But keep in mind that they're not 1:1 comparable (e.g., one CUDA Core does not equal one Stream Processor, as they operate differently).
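In practice you don't program Tensor Cores directly; frameworks route eligible operations onto them when the data types allow it. Here's a sketch of two common ways to opt in with PyTorch (assuming a CUDA GPU with Tensor Cores: Volta or newer for fp16, Ampere or newer for TF32):

```python
import torch

# On Ampere and newer, allow TF32 so even fp32 matmuls can use Tensor Cores.
torch.backends.cuda.matmul.allow_tf32 = True

a = torch.randn(4096, 4096, device="cuda")
b = torch.randn(4096, 4096, device="cuda")

# autocast runs eligible ops in fp16, which maps onto Tensor Cores
# (supported since the Volta generation).
with torch.autocast(device_type="cuda", dtype=torch.float16):
    c = a @ b
print(c.dtype)  # torch.float16
```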
Which GPU should I pick?
As a starting point, here's my broad attempt at categorizing GPUs for AI:
| Category | Examples | Best for | Performance profile |
|---|---|---|---|
| High-end | A100, H100, H200, GH200, MI250, MI300X | Training large language models, high-performance computing, large-scale inference | Highest memory capacity, memory bandwidth, and compute performance |
| Mid-range | A40, A30, A6000, V100, L4, T4 | Medium-sized model training, inference, and fine-tuning tasks | Higher memory and compute; cost-effective for many AI tasks |
| Budget | K80, M60, P100, A4000 | Small model training, experiments, low-cost inference | Moderate memory, older architectures (K80, M60), lower cost |
| Consumer | RTX 3070, RTX 4090, RTX 4000 | Gaming, content creation, and mid-range AI tasks | High compute for gaming and development; Tensor Cores make RTX cards suitable for entry-level AI inference |
Also, here's a list of GPU prices across cloud providers to help you compare options.