TinyML and Efficient Deep Learning Computing, and Why I’m Very Excited Right Now



It’s been a hot minute since I last posted, so here’s a little update!
This spring, I’m taking a course on TinyML and Efficient Deep Learning Computing at NUST, and I’m honestly very excited about it! From what I can tell, this might be the only place in Pakistan offering this exact course and curriculum, which is pretty cool considering similar versions are taught at places like MIT and Harvard.


The course focuses on two closely related (and very hot) areas:

TinyML and Efficient Deep Learning Computing

So… what even is TinyML?

Most modern deep neural networks are massively overparameterized: too many weights, huge model sizes, and a lot of power spent on redundant computation. That’s fine if you’ve got your hands on the latest GPUs! But it's not so great if you’re working with memory- and battery-constrained devices.

This is where TinyML comes in. TinyML is all about:

  • Reducing model size
  • Achieving faster computation
  • Making models less power-hungry!
  • Making models deployable on ultra-low-power hardware, like MCUs

In simple (but slightly distorted) terms:

Model big. Memory small?

Squeeze model. Fit on MCU. Done.
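
To put some very rough numbers on “big” vs “small,” here’s a quick back-of-envelope sketch. The parameter count (roughly MobileNetV2-sized) and the 256 KB SRAM budget are my own illustrative assumptions, not figures from the course:

```python
# Rough, illustrative numbers: a "small" CNN vs. a typical microcontroller.
PARAMS = 3_400_000           # assumption: roughly MobileNetV2-sized parameter count
MCU_SRAM_BYTES = 256 * 1024  # assumption: a common Cortex-M class SRAM budget

fp32_bytes = PARAMS * 4      # 32-bit floats: ~13.6 MB just for the weights
int8_bytes = PARAMS * 1      # 8-bit quantized: ~3.4 MB

print(f"fp32 weights: {fp32_bytes / 1e6:.1f} MB")
print(f"int8 weights: {int8_bytes / 1e6:.1f} MB")
print(f"MCU SRAM:     {MCU_SRAM_BYTES / 1e6:.2f} MB")
# Even after the ~4x saving from quantization alone, the model still doesn't fit,
# which is why TinyML stacks pruning, quantization, and smaller architectures together.
```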


Here's a GIF that sums it up!


TinyML will be especially relevant for IoT and robotics, where real-time inference and ultra-low power consumption really matter. Instead of sending data to the cloud and waiting for a response, computation happens at the edge. The intelligence lives on the device itself!

Okay, then what’s Efficient Deep Learning Computing?

Efficient Deep Learning Computing is the bigger picture. The goal is to maintain the same (or nearly the same) accuracy while using:

  • Less memory
  • Less energy
  • Less computation
  • Lower latency
  • Ideally, cheaper hardware

And here’s the important part:
Efficient deep learning considers:

  • Optimization methods like pruning and quantization
  • Model architecture
  • Hardware design

Because fun fact: not all compression methods are hardware-friendly. Unstructured pruning, for example, leaves zeros scattered irregularly through the weight matrices, which makes it surprisingly hard to turn that sparsity into actual speedups on GPUs or CPUs.
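Since it’s only Week 4, take this with a grain of salt, but here’s a minimal PyTorch sketch of two of those optimization methods, unstructured magnitude pruning and post-training dynamic quantization, just to make the ideas concrete. The toy model and the 50% sparsity level are my own choices, not something from the course:

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# A toy model standing in for something much bigger.
model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))

# Unstructured magnitude pruning: zero out the 50% smallest weights in each Linear layer.
# The zeros end up scattered irregularly, which is exactly why this kind of pruning
# doesn't automatically run faster on stock GPU/CPU kernels.
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.5)
        prune.remove(module, "weight")  # bake the pruning mask into the weights

# Post-training dynamic quantization: store Linear weights as int8 (~4x smaller).
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)
print(quantized)
```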

So naturally, this raises a big question: If we change how models are designed…
Shouldn’t we also change how hardware is designed?

But… why now?

First of all, GPUs are expensive. Like, painfully expensive! 😭
We can’t all just throw NVIDIA B200s at our problems and call it a day. Not everyone gets their hands on GPUs that are basically computing steroids. Most of us are out here training models on the protein powder equivalent of compute: free tiers, a lot of patience, and batch sizes of… wait for it… 1.

At the same time, Moore’s Law is slowing down and simply waiting for “next year’s hardware” is no longer a real strategy.

So for the foreseeable future, we’re stuck with a reality where: 

  • Compute is expensive
  • Energy is limited, especially on battery-powered boards
  • Hardware improvements are only incremental, not exponential

That’s why efficient deep learning and TinyML matter right now.

It’s only Week 4 😅

At this point, it’s still way too early for me to talk deeply about the how of TinyML and Efficient Deep Learning. It’s only Week 4 of the semester after all. But I’m genuinely excited to see where this course goes and what we’ll learn as it progresses.

While reading up on the topic, I came across Professor Song Han, who is widely credited with pioneering the field of TinyML and efficient deep learning computing. Naturally, I was curious and skimmed parts of his 2017 PhD thesis.

Imagine inventing an entire research field as your PhD thesis. That’s the actual stuff of dreams!

Also, a small but iconic detail: Professor Song Han thanked NVIDIA's CEO, Jensen Huang, in the acknowledgments for “the generous GPU support.” Which honestly sounds extremely cool. One day, may all of us AI researchers realize the dream of having all the “generous GPU support” we could ever want. 😁

Until then… free tiers of Google Colab and Kaggle compute it is! 💪😅

I genuinely can’t wait to see how these Deep Compression methods for deep neural networks evolve. Hopefully, they will one day push us closer to the dream of running powerful deep learning models on cheap MCUs.

Also, full transparency:
I may or may not have written this post just to demonstrate my exemplary GIF-making skills.
😁

 

