TinyML and Efficient Deep Learning Computing, and Why I’m Very Excited Right Now
This spring, I’m taking a course on TinyML and Efficient Deep Learning Computing at NUST, and I’m honestly very excited about it! From what I can tell, this might be the only place in Pakistan offering this exact course and curriculum, which is pretty cool considering similar versions are taught at places like MIT and Harvard.
The course focuses on two closely related (and very hot) areas:
TinyML and Efficient Deep Learning Computing
So… what even is TinyML?
Most modern deep neural networks are massively overparameterized: too many weights, huge model sizes, and a lot of power spent on redundant computation. That’s fine if you’ve got your hands on the latest GPUs! But it’s not so great if you’re working with memory- and battery-constrained devices.
This is where TinyML comes in. TinyML is all about:
- Reducing model size
- Achieving faster computation
- Making the models less power-hungry!
- Making models deployable on ultra-low-power hardware, like MCUs
In simple (but slightly distorted) terms:
Model big. Memory small?
Squeeze model. Fit on MCU. Done.
Here's a GIF that sums it up!
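Just to make the “model big, memory small” joke a bit more concrete, here’s a tiny back-of-envelope sketch in Python. All the numbers (parameter count, MCU flash and SRAM sizes) are illustrative assumptions on my part, not figures from the course:

```python
# A back-of-envelope sketch: roughly how big is a "small" model,
# and how much memory does a typical MCU actually have?
# (All numbers below are illustrative assumptions, not course material.)

params = 1_000_000            # a modest 1M-parameter model
fp32_kb = params * 4 / 1024   # FP32 weights: 4 bytes per parameter
int8_kb = params * 1 / 1024   # INT8 weights after quantization: 1 byte per parameter

mcu_flash_kb = 1024           # e.g. ~1 MB of flash on a Cortex-M-class MCU
mcu_sram_kb = 256             # e.g. ~256 KB of SRAM

print(f"FP32 model: {fp32_kb:.0f} KB")   # ~3906 KB -> doesn't even fit in flash
print(f"INT8 model: {int8_kb:.0f} KB")   # ~977 KB  -> just about squeezes into flash
print(f"MCU flash:  {mcu_flash_kb} KB, SRAM: {mcu_sram_kb} KB")
```

Even this toy calculation shows why something as simple as FP32 → INT8 quantization can be the difference between “doesn’t fit at all” and “barely fits.”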
Okay, then what’s Efficient Deep Learning Computing?
Efficient Deep Learning Computing is the bigger picture. The goal is to maintain the same (or nearly the same) accuracy while using:
- Less memory
- Less energy
- Less computation
- Lower latency
- Ideally, cheaper hardware
And here’s the important part:
Efficient deep learning considers:
- Optimization methods like pruning and quantization
- Model architecture
- Hardware design
Because fun fact: not all compression methods are hardware-friendly. Unstructured pruning, for example, scatters zeros throughout the weight matrices, and GPUs and CPUs can’t easily skip them, so the “smaller” model doesn’t necessarily run any faster.
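To see the difference, here’s a minimal NumPy sketch of the idea (my own toy example, not code from the course or from any particular paper). Unstructured pruning zeroes out individual weights, but the matrix is still stored and multiplied as a dense array; structured pruning removes whole rows (neurons), which genuinely shrinks the computation:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(512, 512)).astype(np.float32)  # a dense weight matrix

# Unstructured (fine-grained) pruning: zero out the 90% smallest-magnitude weights.
threshold = np.quantile(np.abs(W), 0.90)
W_unstructured = np.where(np.abs(W) < threshold, 0.0, W)
# 90% of entries are now zero, but the shape (and the dense matmul cost)
# is unchanged -- a GPU/CPU still multiplies all 512x512 entries.
print(W_unstructured.shape, f"{(W_unstructured == 0).mean():.0%} zeros")

# Structured pruning: drop the rows (neurons) with the smallest L2 norm,
# keeping only the ~10% of rows with the largest norms.
row_norms = np.linalg.norm(W, axis=1)
keep = np.argsort(row_norms)[-51:]
W_structured = W[keep]
# The matrix is genuinely smaller, so dense hardware actually does less work.
print(W_structured.shape)
```

The catch, of course, is that chopping out whole neurons is a much blunter instrument, which is exactly why accuracy, compression ratio, and hardware-friendliness have to be traded off together.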
So naturally, this raises a big question: If
we change how models are designed…
Shouldn’t we also change how hardware is designed?
But… why now?
First of all, GPUs are expensive. Like, painfully expensive! 😭
We can’t all just throw NVIDIA B200s at our problems and call it a day. Not
everyone gets their hands on GPUs that are basically computing steroids. Most of us are out here training
models on the protein powder equivalent
of compute: free tiers, a lot of patience, and batch sizes of… wait for it… 1.
At the same time, Moore’s Law is slowing down, and simply waiting for “next year’s hardware” is no longer a real strategy.
So for the foreseeable future, we’re stuck with a reality where:
- Compute is expensive
- Energy is limited (especially on battery-powered boards)
- Hardware improvements are only incremental, not exponential
That’s why efficient deep learning and TinyML
matter right now.
It’s only Week 4 😅
At this point, it’s still way too early for me to talk deeply about the how of TinyML and Efficient Deep Learning. It’s only Week 4 of the semester, after all. But I’m genuinely excited to see where this course goes and what we’ll learn as it progresses.
While reading up on the topic, I came across Professor Song Han, who is widely
credited with pioneering the field of TinyML and efficient deep learning computing. Naturally, I was
curious and skimmed parts of his 2017 PhD
thesis.
Imagine inventing an entire research field as your PhD thesis. That’s the actual stuff of dreams!
Also, a small but iconic detail: Professor Song Han thanked NVIDIA’s CEO, Jensen Huang, in the acknowledgments for “the generous GPU support,” which honestly sounds extremely cool. One day, may all of us AI researchers realize the dream of having all the “generous GPU support” we could ever want. 😁
I genuinely can’t wait to see how these Deep Compression methods for Deep Neural Networks evolve. Hopefully, they’ll one day push us closer to the dream of running powerful deep learning models on cheap MCUs.
Also, full transparency:
I may or may not have written this post just to demonstrate my exemplary
GIF-making skills.

