About the NVIDIA H200 141GB
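The NVIDIA H200 is the memory-expanded successor to the H100, released in late 2024. It has the same compute as the H100 but carries 141 GB of faster HBM3e memory (vs the H100's 80 GB of HBM3). The extra memory is the killer feature: with FP8 weights (about 70 GB), Llama 3 70B inference fits on a single H200 without sharding and still leaves room for KV cache, which dramatically simplifies serving infrastructure. At BF16 the weights alone take about 140 GB, so the unquantized model does not practically fit on one card.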
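A quick way to check whether a model fits is parameters × bytes per parameter. This minimal sketch assumes a round 70e9 parameters (Llama 3 70B is slightly larger), treats 1 GB as 1e9 bytes, and ignores activations and runtime overhead; the function names are illustrative, not from any library.

```python
# Estimate how much HBM a model's weights occupy: parameters x bytes/param.
# Assumption: 1 GB = 1e9 bytes; KV cache, activations, and framework
# overhead are NOT included and need room on top of the weights.
BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0, "int4": 0.5}

H200_GB = 141
H100_GB = 80


def weight_gb(n_params: float, dtype: str) -> float:
    """Weight footprint in GB for a given parameter count and dtype."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


def headroom_gb(n_params: float, dtype: str, gpu_gb: float) -> float:
    """GB left over after loading weights (negative means it does not fit)."""
    return gpu_gb - weight_gb(n_params, dtype)


N_PARAMS = 70e9  # assumed round figure for a 70B model
for dtype in BYTES_PER_PARAM:
    print(
        f"{dtype}: weights {weight_gb(N_PARAMS, dtype):.0f} GB, "
        f"H200 headroom {headroom_gb(N_PARAMS, dtype, H200_GB):.0f} GB, "
        f"H100 headroom {headroom_gb(N_PARAMS, dtype, H100_GB):.0f} GB"
    )
```

At BF16 the H200 is left with about 1 GB of headroom, while FP8 leaves about 71 GB for KV cache and activations.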
Specs
- Memory: 141 GB HBM3e
- Bandwidth: 4.8 TB/s
- Tensor cores: 528 (4th gen)
- FP16 Tensor Core (peak, with sparsity): 1,979 TFLOPS (about 989 TFLOPS dense)
- Architecture: Hopper
- Released: Q4 2024
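The ratio of peak compute to memory bandwidth explains why the H200's extra bandwidth matters more than its unchanged compute for many workloads. This roofline sketch assumes dense FP16 Tensor Core throughput is half of the 1,979 TFLOPS figure (NVIDIA quotes that number with sparsity) and treats batch-1 decode as roughly 1 FLOP per byte of BF16 weights; both are simplifying assumptions.

```python
# Roofline ridge point: the arithmetic intensity (FLOPs per byte read from
# HBM) at which a kernel stops being bandwidth-bound and can hit peak compute.
# Assumption: dense FP16 Tensor Core throughput is half of the 1,979 TFLOPS
# with-sparsity figure in the spec list.
SPARSE_FP16_TFLOPS = 1979.0
DENSE_FP16_TFLOPS = SPARSE_FP16_TFLOPS / 2  # ~989.5
HBM_TBPS = 4.8


def ridge_point(peak_tflops: float, bw_tbps: float) -> float:
    """FLOPs per byte needed to saturate compute (TFLOP/s / TB/s = FLOP/B)."""
    return peak_tflops / bw_tbps


def attainable_tflops(intensity: float, peak_tflops: float, bw_tbps: float) -> float:
    """Classic roofline bound: min(peak compute, bandwidth x intensity)."""
    return min(peak_tflops, bw_tbps * intensity)


print(f"ridge point: {ridge_point(DENSE_FP16_TFLOPS, HBM_TBPS):.0f} FLOP/byte")
# Batch-1 decode is roughly a matrix-vector product over BF16 weights:
# 2 FLOPs per weight / 2 bytes per weight = ~1 FLOP/byte.
print(f"batch-1 decode bound: {attainable_tflops(1.0, DENSE_FP16_TFLOPS, HBM_TBPS):.1f} TFLOPS")
```

Under these assumptions a kernel needs around 206 FLOPs per byte to become compute-bound, while single-stream decode sits near 1 FLOP per byte and is capped at about 4.8 TFLOPS by bandwidth alone.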
What's it good for?
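- Single-GPU inference for 70B-class models: Llama 3 70B runs without tensor parallelism when its weights are quantized to FP8 or INT8.
- Long-context inference: the extra HBM3e holds a larger KV cache, so longer contexts or more concurrent sequences fit on one GPU.
- Mixture-of-Experts training: MoE models keep every expert resident in memory even though each token activates only a few, so they benefit from the H200's extra capacity and bandwidth.
- Memory-bound workloads: anything that ran out of memory on an 80 GB H100, or was limited by its memory bandwidth, is the H200's sweet spot.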
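How much context "fits" can be estimated from the model's attention shape. This sketch uses the standard KV cache formula for grouped-query attention with Llama 3 70B's published configuration (80 layers, 8 KV heads, head dimension 128) and assumes a BF16 cache plus about 71 GB of free memory (141 GB minus roughly 70 GB of FP8 weights), ignoring activations and runtime overhead.

```python
# KV cache bytes per token for a grouped-query-attention decoder:
#   2 (K and V) x layers x kv_heads x head_dim x bytes_per_elem
# Config values are Llama 3 70B's published architecture; a BF16 cache
# (2 bytes/element) is assumed.
def kv_bytes_per_token(layers: int, kv_heads: int, head_dim: int,
                       bytes_per_elem: int = 2) -> int:
    return 2 * layers * kv_heads * head_dim * bytes_per_elem


def max_cached_tokens(free_gb: float, per_token_bytes: int) -> int:
    """Total tokens (summed over all concurrent sequences) that fit in free_gb."""
    return int(free_gb * 1e9 // per_token_bytes)


per_tok = kv_bytes_per_token(layers=80, kv_heads=8, head_dim=128)
print(per_tok)  # 327680 bytes, about 0.33 MB per token

# Assumed free memory: 141 GB minus ~70 GB of FP8 weights, ignoring
# activations and runtime overhead.
print(max_cached_tokens(71, per_tok))  # 216674
```

That budget is shared by all in-flight requests: for example, roughly 26 concurrent sequences at an 8,192-token context.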
When to use H200 vs alternatives
- H200 vs H100: the H200 has 76% more memory and about 43% more bandwidth (4.8 vs 3.35 TB/s on the H100 SXM), with the same compute. It is worth the ~20% price premium if your model is memory-bound.
- H200 vs B200: B200 is the next-gen Blackwell card with 2.5× compute and 192GB. H200 is cheaper and adequate for most workloads; B200 is overkill unless you're training frontier models.
- H200 vs 2× H100: two H100s give 160 GB in total, but splitting a model across them requires tensor parallelism, which needs NVLink to run fast. A single H200 is simpler and often cheaper.
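The H100 comparison and the memory-bound argument can be checked with a few lines of arithmetic. This sketch assumes H100 SXM figures (80 GB, 3.35 TB/s), which are not listed on this page; PCIe and NVL variants have different bandwidth. The decode ceiling treats each generated token as one full read of the weights, so it is only an upper bound for a single stream at batch size 1 and ignores KV cache reads.

```python
# Compare H200 with H100 SXM on memory specs, and estimate a bandwidth-only
# ceiling on single-stream decode speed: each generated token must stream
# every weight from HBM once. H100 SXM figures are an assumption.
H100 = {"mem_gb": 80, "bw_tbps": 3.35}
H200 = {"mem_gb": 141, "bw_tbps": 4.8}


def pct_more(new: float, old: float) -> float:
    """Percentage by which new exceeds old."""
    return (new / old - 1) * 100


def decode_ceiling_tok_s(bw_tbps: float, weight_gb: float) -> float:
    """Upper bound on tokens/s at batch size 1, ignoring KV cache reads."""
    return bw_tbps * 1e12 / (weight_gb * 1e9)


print(f"memory: +{pct_more(H200['mem_gb'], H100['mem_gb']):.0f}%")       # +76%
print(f"bandwidth: +{pct_more(H200['bw_tbps'], H100['bw_tbps']):.0f}%")  # +43%
# 70B model with FP8 weights (~70 GB), which fits on either card:
print(f"H200 ceiling: {decode_ceiling_tok_s(H200['bw_tbps'], 70):.0f} tok/s")  # 69
print(f"H100 ceiling: {decode_ceiling_tok_s(H100['bw_tbps'], 70):.0f} tok/s")  # 48
```

Because the same compute sits behind more bandwidth, the single-stream ceiling rises roughly in proportion to the bandwidth gain.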