AI is boiling our oceans

Jun 28, 2023

I’d like to talk about the environmental impact of training AI models on accelerator hardware.

This is interesting to me because of:

the power-hungry hardware that is needed for matrix operations
the long, sustained batch nature of a training workflow
the scale of these parallelised training runs

Major cloud providers have already pledged to be Carbon XYZ by 20XX, so why are we still talking about this?

I’ll refer you to a blog post that I’ll be writing on these three bad phrases:

Carbon neutral
Carbon negative
Net Zero

They don’t always mean what you think they mean. Keyword: offsetting -> LINK.

POWER: why is this news?

Yes, hyperscalers like GCP, AWS and Azure have always been massive consumers of electricity. However, these new workloads have power requirements that can be orders of magnitude higher.

Accelerator hardware, such as GPUs, TPUs (Tensor Processing Units), and FPGAs (Field-Programmable Gate Arrays), is power-hungry. These devices are designed to perform complex computations rapidly, which requires a substantial amount of electricity. Training large AI models, especially deep neural networks, can consume a significant amount of energy and, depending on where that electricity comes from, contribute to increased carbon emissions.

Accelerator go brr

AI model training typically involves running computations over extended periods, often taking days, weeks, or even months to complete (GPT-3 reportedly took 46 days on a large cluster of V100s). By one estimate, it would take 355 years to train GPT-3 on a single NVIDIA Tesla V100 GPU. Using 1,024 A100 GPUs, researchers calculated that OpenAI could have trained GPT-3 in as little as 34 days.

In the researchers’ own words: “Let us consider the GPT-3 model with 𝑃 = 175 billion parameters as an example. This model was trained on 𝑇 = 300 billion tokens. On 𝑛 = 1024 A100 GPUs using batch-size 1536, we achieve 𝑋 = 140 teraFLOP/s per GPU. As a result, the time required to train this model is 34 days.”
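To see where a number like 34 days comes from, here is a minimal sketch of the arithmetic. It assumes the commonly used approximation that end-to-end training needs about 8 × 𝑇 × 𝑃 floating-point operations, so training time ≈ 8𝑇𝑃 / (𝑛𝑋). That formula is my assumption about how the figure was derived and is not stated in the quote above. The function and variable names are made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def training_days(params, tokens, n_gpus, flops_per_gpu):
    """Estimate end-to-end training time in days.

    Assumes total compute of roughly 8 * tokens * params FLOPs
    (an approximation, not a measurement) spread evenly across
    n_gpus, each sustaining flops_per_gpu FLOP/s.
    """
    total_flops = 8 * tokens * params
    cluster_flops_per_second = n_gpus * flops_per_gpu
    return total_flops / cluster_flops_per_second / SECONDS_PER_DAY


# Values taken from the GPT-3 example: P, T, n and X.
gpt3_days = training_days(
    params=175e9,          # P: 175 billion parameters
    tokens=300e9,          # T: 300 billion tokens
    n_gpus=1024,           # n: A100 GPUs
    flops_per_gpu=140e12,  # X: 140 teraFLOP/s sustained per GPU
)
print(f"Estimated training time: {gpt3_days:.1f} days")
```

With the quoted values this comes out at roughly 34 days, which matches the figure above. Halving the sustained throughput per GPU would double the estimate, which is why the achieved teraFLOP/s matters as much as the GPU count.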

Unlike traditional computing workloads that may have intermittent usage patterns, AI training workloads are sustained and require constant power supply. This prolonged energy consumption can have a cumulative environmental impact, especially when multiplied across numerous training runs.
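To put a rough scale on that sustained draw, here is a back-of-the-envelope sketch: energy is just power multiplied by time. The 1,024 GPUs and 34 days come from the example above. Everything else is an assumption on my part: 400 W per GPU (the rated maximum for an A100 SXM module, not a measured draw) and a PUE of 1.1 as an illustrative data-centre overhead. It ignores CPUs, networking and storage, so treat the result as an order-of-magnitude illustration only.

```python
def training_energy_mwh(n_gpus, watts_per_gpu, days, pue=1.0):
    """Rough energy estimate in MWh for a sustained training run.

    Assumes every GPU draws watts_per_gpu continuously for the whole
    run. pue is a hypothetical multiplier for cooling and other
    facility overhead (1.0 means no overhead).
    """
    hours = days * 24
    gpu_kwh = n_gpus * (watts_per_gpu / 1000) * hours
    return gpu_kwh * pue / 1000


# ASSUMPTIONS: 400 W per GPU and PUE of 1.1 are illustrative, not measured.
energy = training_energy_mwh(n_gpus=1024, watts_per_gpu=400, days=34, pue=1.1)
print(f"Estimated energy: {energy:.0f} MWh")
```

Under these assumptions a single run lands in the hundreds of megawatt-hours, before counting failed runs, hyperparameter sweeps or retraining.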

Scale of Parallelization

AI training workloads are heavily parallelized to take advantage of the computational power of accelerator hardware. This parallelization allows for the simultaneous processing of large amounts of data, enabling faster training times.

To train the larger models without running out of memory, the OpenAI team used a mixture of model parallelism within each matrix multiply and model parallelism across the layers of the network. All models were trained on V100 GPUs on part of a high-bandwidth cluster provided by Microsoft.

However, this scale of parallelization also translates to increased power requirements. Running multiple GPUs or other accelerator devices in parallel to train models can significantly increase the energy consumption of AI workflows.

Conclusion

While major cloud providers have made commitments to become carbon neutral, carbon negative, or achieve net-zero emissions, it’s essential to understand that these pledges have varying interpretations and timelines. Additionally, achieving these goals involves complex strategies, including renewable energy procurement, energy efficiency improvements, and carbon offsetting. However, the scale and growth of AI workloads pose a challenge in achieving these environmental targets, as the demand for computational resources continues to increase.