We Have Finally Found A Solution For Extremely Energy-Efficient AI
A deep dive into 'L-Mul', the linear-complexity multiplication algorithm that makes our existing AI models faster and more energy-efficient than ever before.
Running AI models is expensive and takes a heavy toll on the environment.
In early 2023, running ChatGPT consumed an average of 564 MWh of electricity each day.
This is equivalent to the total daily electricity usage of 18,000 families in the United States.
It is also estimated that, in the worst-case scenario, Google’s AI services could consume as much electricity as the entire country of Ireland.
This is quite a lot! But why does AI need so much energy?
Internally, neural networks work with floating point parameters, and running them involves high-dimensional tensor multiplications, element-wise multiplications, and linear transformations.
And these operations are energy-expensive.
If we could reduce the amount of computation these operations require, we could save a lot of energy and speed these networks up.
Amazingly, the researchers behind a recent preprint published on arXiv have proposed a way to do exactly this.
They created an algorithm called ‘L-Mul’, or the linear-complexity multiplication algorithm, which approximates floating point multiplication using integer addition operations.
This algorithm can be integrated into existing neural networks without any need for fine-tuning.
Remarkably, this change leads to a 95% reduction in energy consumption for element-wise floating point tensor multiplications and up to 80% energy savings for dot product computations.
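To build some intuition before we dig in, here is a minimal, value-level Python sketch of the core idea, based on my reading of the preprint: instead of multiplying the two mantissas, L-Mul replaces their product with a small constant offset, so the whole operation reduces to additions. The function name `l_mul` and the exact offset rule are my interpretation of the paper; treat this as an illustration, not a reference implementation.

```python
import math

def l_mul(x: float, y: float, m: int = 10) -> float:
    """Value-level sketch of L-Mul: approximate x * y without a
    mantissa multiplication.

    Real hardware would do this with integer additions on the raw
    bit patterns; here we use Python floats to show the math.
    """
    if x == 0.0 or y == 0.0:
        return 0.0
    sign = math.copysign(1.0, x) * math.copysign(1.0, y)

    # Decompose |x| = (1 + xm) * 2**xe, with the fraction xm in [0, 1)
    frac, exp = math.frexp(abs(x))    # frexp returns frac in [0.5, 1)
    xm, xe = 2 * frac - 1, exp - 1    # rescale to the IEEE [1, 2) form
    frac, exp = math.frexp(abs(y))
    ym, ye = 2 * frac - 1, exp - 1

    # Offset term replacing the mantissa product (as described in the
    # preprint): l(m) = m if m <= 3, 3 if m == 4, 4 if m > 4
    l = m if m <= 3 else (3 if m == 4 else 4)

    # Exact:  (1 + xm)(1 + ym) = 1 + xm + ym + xm*ym
    # L-Mul:  1 + xm + ym + 2**-l   -- additions only
    return sign * (1 + xm + ym + 2.0 ** -l) * 2.0 ** (xe + ye)

print(l_mul(3.0, 5.0), 3.0 * 5.0)   # approximation vs. exact product
```

For example, `l_mul(3.0, 5.0)` returns 14.5 instead of 15, a small approximation error that, per the paper, barely affects model accuracy.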
In this story, we deep-dive into this algorithm and discuss how it makes our existing AI models faster and more energy-efficient than ever before.
Let’s go!
Let’s First Talk Numbers
Neural networks use floating point tensors to represent their inputs, outputs and parameters.
The 32-bit (the default in PyTorch) and 16-bit floating point tensors (FP32 and FP16) are commonly used for this purpose.
The IEEE 754 standard defines how these floating point numbers are represented, using a sign bit, exponent bits, and mantissa bits, and how arithmetic on them behaves.
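To make that layout concrete, here is a small Python snippet (my own illustration, not from the paper) that unpacks an FP32 value into its three IEEE 754 bit fields: 1 sign bit, 8 exponent bits, and 23 mantissa bits. FP16 follows the same scheme with 1, 5, and 10 bits respectively.

```python
import struct

def fp32_fields(x: float):
    """Split a 32-bit IEEE 754 float into its sign (1 bit),
    exponent (8 bits), and mantissa (23 bits) fields."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF   # biased by 127
    mantissa = bits & 0x7FFFFF       # fractional part of 1.m
    return sign, exponent, mantissa

s, e, m = fp32_fields(3.14)
# Reconstruct the value as (-1)**s * (1 + m / 2**23) * 2**(e - 127)
print(s, e, m)                                          # 0 128 4781507
print((-1) ** s * (1 + m / 2 ** 23) * 2 ** (e - 127))   # ≈ 3.14
```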
Next, let’s discuss operations.
The computational complexity of integer addition is linear, i.e. O(n), where n is the number of bits.
But floating point multiplication requires exponent addition (O(e)), mantissa multiplication (O(m²)), and rounding, where e and m represent the number of bits in the exponent and mantissa, respectively.
There’s an energy cost to this.
Floating point operations are more expensive than integer operations.
Multiplying floating point numbers is more costly than adding them.
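To see exactly where that cost comes from, here is a simplified sketch (my own, covering only positive normal numbers, with truncation instead of proper rounding) of how hardware multiplies two FP32 values. Notice the single integer addition for the exponents versus the full integer multiplication of the 24-bit mantissas.

```python
import struct

def fp32_mul_manual(x: float, y: float) -> float:
    """Schoolbook FP32 multiply for positive normal numbers,
    showing where the cost lives: a cheap exponent ADD versus an
    expensive 24x24-bit mantissa MULTIPLY, plus normalization."""
    def fields(v):
        b = struct.unpack(">I", struct.pack(">f", v))[0]
        # biased exponent, and mantissa with the implicit leading 1
        return (b >> 23) & 0xFF, (b & 0x7FFFFF) | (1 << 23)

    ex, mx = fields(x)
    ey, my = fields(y)

    e = ex + ey - 127          # exponent addition: cheap, O(e)
    m = mx * my                # mantissa multiplication: costly, O(m^2)

    if m >> 47:                # product in [2**47, 2**48): renormalize
        m >>= 1
        e += 1
    m = (m >> 23) & 0x7FFFFF   # truncate to 23 bits (real HW rounds)

    bits = (e << 23) | m
    return struct.unpack(">f", struct.pack(">I", bits))[0]

print(fp32_mul_manual(3.14, 2.0), 3.14 * 2.0)   # both ≈ 6.28
```

That expensive mantissa multiply is precisely the step L-Mul replaces with additions.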