IoT 📖 22 min read

TinyML: Machine Learning on Microcontrollers

Most people haven't heard of TinyML. The ones who have usually dismiss it — "what can you do with 264KB of RAM?" Turns out, more than you'd expect. I was skeptical too until I got a keyword spotting model running on an ESP32. The model was 18KB. Inference took 19ms. It recognized "yes" and "no" spoken at normal volume across a room. On a $4 chip with no internet connection. That's when I stopped dismissing it and started paying attention.

🤯 The demo that convinced me:

I strapped an MPU6050 accelerometer to a workshop exhaust fan — the kind with a 12-inch blade that runs 18 hours a day. Trained an anomaly detection model on two days of "normal" vibration data, quantized it to int8, deployed it to an ESP32. Total model size: 14KB. Three weeks later, the model flagged an anomaly at 2 AM. The vibration spectrum had shifted — a subtle low-frequency component that wasn't there before. I checked the fan the next morning and found the front bearing was developing play. It would have seized within a week. A $4 microcontroller caught a mechanical failure that I couldn't hear or feel by hand. That's when I went from "interesting toy" to "this actually belongs in production."

📌 What actually works on an ESP32:

  • Keyword spotting — small vocabulary wake words ("yes", "no", "hey device") run comfortably in under 20ms per inference. This is the most mature TinyML use case and the easiest win.
  • Gesture recognition — accelerometer-based classifiers for 3-5 distinct gestures fit in 15-50KB. Accuracy above 90% is realistic with decent training data.
  • Anomaly detection — train on "normal" sensor patterns, flag deviations. Works surprisingly well for vibration monitoring and predictive maintenance.

🚫 What doesn't work: Image classification above 96x96 resolution. The memory just isn't there. Person detection at 96x96 grayscale barely fits. Anything with color channels at higher resolution will blow past the ESP32's RAM before you even load the model.
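To put rough numbers on that claim, here's the frame-buffer math (uncompressed frames only; real usage is higher once activations and the TFLite arena are added):

Python
# Rough frame-buffer sizes against the ESP32's ~520KB of RAM
gray_96 = 96 * 96                  # 9,216 bytes: fits comfortably
gray_160 = 160 * 160               # 25,600 bytes: still fine
rgb_qvga = 320 * 240 * 3           # 230,400 bytes: nearly half the RAM for one frame

for name, size in [("96x96 gray", gray_96), ("160x160 gray", gray_160), ("320x240 RGB", rgb_qvga)]:
    print(f"{name:>12}: {size / 1024:.1f} KB")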

TinyML is machine learning designed for microcontrollers — devices with kilobytes of RAM, not gigabytes. We're talking:

  • ESP32: 520KB RAM, 240MHz, about $5
  • Arduino Nano 33 BLE Sense: 256KB RAM, built-in sensors, about $35
  • Raspberry Pi Pico: 264KB RAM, $4

These chips run on batteries for months and cost less than a coffee. The tradeoff is that you have almost no memory to work with, so the models have to be tiny.

The models themselves typically run 10KB to 500KB. They're quantized — meaning the weights get converted from 32-bit floats to 8-bit integers — to fit in memory and run fast enough to be useful. You're not running anything close to an LLM on these. But you can run:

  • Gesture recognition (accelerometer patterns)
  • Sound classification (glass breaking, baby crying, doorbell)
  • Keyword spotting ("Hey Arduino")
  • Anomaly detection (vibration patterns indicating machine failure)
  • Simple image classification (person vs. no person)
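The size math behind quantization is simple: each weight drops from 4 bytes (float32) to 1 byte (int8), so model size is roughly the parameter count in bytes. A sketch with a hypothetical 25,000-parameter model:

Python
# Back-of-envelope model size before and after int8 quantization
params = 25_000                                  # hypothetical small gesture/keyword model
print(f"float32: {params * 4 / 1024:.0f} KB")    # ~98 KB
print(f"int8:    {params * 1 / 1024:.0f} KB")    # ~24 KB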

Your First TinyML Project: Magic Wand

Let's build a gesture recognition system. Wave the device in specific patterns, and it recognizes which gesture you made. This is one of the canonical TinyML demos, but I'll show you how to customize it.

Hardware You'll Need

  • Arduino Nano 33 BLE Sense (has built-in accelerometer) OR
  • ESP32 + MPU6050 accelerometer module (~$3)

I'll use the Arduino Nano 33 BLE Sense because the accelerometer is already on board. Less wiring, fewer headaches.

The Workflow

TinyML Pipeline
1. Collect Data
 Wave device in pattern, record accelerometer data
 ↓
2. Train Model (on your PC)
 Use TensorFlow/Edge Impulse to train classifier
 ↓
3. Convert Model
 Quantize and convert to TensorFlow Lite format
 ↓
4. Deploy to Device
 Flash the model onto the microcontroller
 ↓
5. Run Inference
 Device classifies gestures in real-time (15ms)

Step 1: Set Up Edge Impulse

Edge Impulse is the easiest way to get started with TinyML. It handles data collection, training, and deployment. Free for personal projects.

  1. Create account at edgeimpulse.com
  2. Install Edge Impulse CLI:
Bash
# Install Node.js first if you don't have it
npm install -g edge-impulse-cli
  3. Connect your Arduino Nano 33 BLE Sense
  4. Flash the Edge Impulse firmware:
Bash
edge-impulse-daemon

📝 Quick tip: If edge-impulse-daemon can't find your board, make sure you have the right USB drivers installed. On Windows, the CP2102 or CH340 driver is usually what's missing.

Follow the prompts to connect your device to your Edge Impulse project.

Step 2: Collect Training Data


We'll train the model to recognize three gestures:

  • Circle: Draw a circle in the air
  • Shake: Shake side-to-side
  • Idle: No movement (important for knowing when NOT to trigger)

In the Edge Impulse dashboard:

  1. Go to Data acquisition
  2. Select your device
  3. Sensor: Accelerometer
  4. Label: "circle"
  5. Sample length: 2000ms
  6. Click Start sampling
  7. Draw circles in the air while holding the device

Repeat for each gesture. Collect at least 2 minutes of data per class. More is better.

Pro tip from experience: Training the model is the easy part. Getting it to fit on the ESP32 is where it gets interesting. My first attempt at a gesture classifier was 400KB — way too large. After quantization (float32 → int8) it shrank to 95KB and accuracy only dropped 2%. Always quantize.

Step 3: Design the Impulse

a "impulse" is Edge Impulse's term for the processing pipeline:

  1. Go to Create impulse
  2. Add a processing block: Spectral Analysis (works great for motion data)
  3. Add a learning block: Classification
  4. Set window size: 2000ms
  5. Window increase: 500ms (overlapping windows; see the sketch after this list)
  6. Save impulse
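That window increase does more work than it looks: overlapping windows turn one recording into many training examples. A quick sketch of the slicing, assuming a 62.5Hz accelerometer sample rate (adjust if your device samples differently):

Python
import numpy as np

fs = 62.5                                    # Hz, assumed sample rate
recording = np.zeros((int(10 * fs), 3))      # one 10-second capture, 3 axes
win = int(2.0 * fs)                          # 2000 ms window   -> 125 samples
step = int(0.5 * fs)                         # 500 ms increase  -> 31 samples

windows = [recording[i:i + win] for i in range(0, len(recording) - win + 1, step)]
print(len(windows), "training windows")      # 17 overlapping windows vs. 5 non-overlapping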

Generate Features

  1. Go to Spectral features
  2. Click Generate features
  3. Wait for processing

You'll see a visualization where similar gestures cluster together. If your circles and shakes are mixing, you need more diverse training data.

Step 4: Train the Model

  1. Go to Classifier
  2. Neural network settings:
    • Number of training cycles: 100
    • Learning rate: 0.0005
    • Architecture: Default (2 dense layers)
  3. Click Start training

Training takes 1-5 minutes. You should see accuracy above 90%. If it's lower:

  • Collect more diverse data
  • Make sure gestures are distinct enough
  • Try a different window size

Step 5: Deploy to Device

  1. Go to Deployment
  2. Select Arduino library
  3. Click Build
  4. Download the .zip file

In Arduino IDE:

  1. Sketch → Include Library → Add .ZIP Library
  2. Select the downloaded file
  3. File → Examples → [Your Project Name] → nano_ble33_sense_accelerometer
  4. Upload to your board

Step 6: Test It

Open Serial Monitor (115200 baud). Wave the device in different patterns. You'll see output like:

Serial Output
Predictions:
 circle: 0.92
 shake: 0.05
 idle: 0.03

Predictions:
 circle: 0.03
 shake: 0.89
 idle: 0.08

The inference time will be displayed too — typically 10-30ms. That's real-time classification on a tiny chip.

Project 2: Glass Break Detector

Sound classification is another sweet spot for TinyML. Here's a practical security application.

Hardware

  • Arduino Nano 33 BLE Sense (has built-in microphone)
  • Or ESP32 + INMP441 I2S microphone

Training Data

You need audio samples of:

  • Glass breaking: YouTube videos work (search "glass breaking sound effect")
  • Background noise: Record your actual environment
  • Similar sounds: Plates clattering, windows closing, dishes

The "similar sounds" category is important. You don't want every loud noise to trigger an alert. In my first version, clapping triggered the glass break detection. Adding a "not glass" category with clapping, dropping things, and loud music fixed it.

Processing Block

For audio, use MFE (Mel-filterbank energy) or Spectrogram instead of Spectral Analysis. These blocks are designed for audio data.
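If you want to see roughly what those blocks compute, here's the same idea in plain TensorFlow: a log-mel spectrogram that turns 1D audio into the 2D time-frequency "image" the classifier learns from. This is a sketch of the concept, not Edge Impulse's exact implementation.

Python
import tensorflow as tf

# One second of 16 kHz audio; swap the random placeholder for a real clip
audio = tf.random.normal([16000])

# Short-time Fourier transform -> power spectrogram, shape (frames, 257)
stft = tf.signal.stft(audio, frame_length=512, frame_step=256)
power = tf.abs(stft) ** 2

# Project onto 40 mel bands and take the log
mel_matrix = tf.signal.linear_to_mel_weight_matrix(
    num_mel_bins=40, num_spectrogram_bins=stft.shape[-1],
    sample_rate=16000, lower_edge_hertz=80.0, upper_edge_hertz=7600.0)
log_mel = tf.math.log(tf.matmul(power, mel_matrix) + 1e-6)

print(log_mel.shape)   # (61, 40): this is what the classifier actually sees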

Deployment

Same process as before. Once deployed, the device listens continuously. When glass breaks, it triggers an action — LED flash, buzzer, WiFi notification, whatever you need.

Power consumption while listening: about 30mA. On a typical USB power bank that's a couple of weeks, not months; the math is below.
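For reference (nominal numbers; regulator losses and duty cycling shift it either way):

Python
# Battery life = capacity / average current draw
for capacity_mah in (10_000, 20_000):        # typical USB power bank sizes
    hours = capacity_mah / 30                # ~30 mA always-listening draw
    print(f"{capacity_mah} mAh: {hours:.0f} h (~{hours / 24:.0f} days)")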

Going Deeper: Custom TensorFlow Lite

Edge Impulse is great for getting started, but sometimes you need more control. Here's how to build TinyML models from scratch using TensorFlow Lite Micro directly.

Fair warning: the TFLite Micro toolchain is not fun. The API changes between versions with minimal documentation. Error messages are useless — you'll get a generic "Invoke failed" with no indication of what went wrong. Half the time the issue is a tensor arena that's 200 bytes too small, but the library won't tell you that. You just have to guess and recompile. I've spent more time fighting the build system than actually writing model code. It works, but it's the kind of experience that makes you appreciate what Edge Impulse abstracts away.

Train in Python

Python
import tensorflow as tf
import numpy as np

# Your training data (accelerometer readings)
X_train = np.load('gestures_x.npy')  # Shape: (samples, timesteps, 3)
y_train = np.load('gestures_y.npy')  # Shape: (samples,)

# Build a small model
model = tf.keras.Sequential([
    tf.keras.layers.InputLayer(input_shape=(128, 3)),
    tf.keras.layers.Conv1D(8, 3, activation='relu'),
    tf.keras.layers.MaxPooling1D(2),
    tf.keras.layers.Conv1D(16, 3, activation='relu'),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(16, activation='relu'),
    tf.keras.layers.Dense(3, activation='softmax')  # 3 classes
])

model.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)

model.fit(X_train, y_train, epochs=50, validation_split=0.2)

# Check model size
model.summary()

Convert to TensorFlow Lite

Python
# Convert to TFLite with full integer (int8) quantization
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]

# Force int8 inputs/outputs so the MCU code can read and write int8 directly
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

# Representative dataset for calibration
def representative_dataset():
    for i in range(100):
        yield [X_train[i:i+1].astype(np.float32)]

converter.representative_dataset = representative_dataset
tflite_model = converter.convert()

# Save the model
with open('gesture_model.tflite', 'wb') as f:
    f.write(tflite_model)

print(f"Model size: {len(tflite_model) / 1024:.1f} KB")

A well-designed model should be 10-50KB. If it's larger, reduce layers or neurons.
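If you're over budget, a per-layer parameter count shows where the weight is going (run this on the model built above; it's usually the dense layers):

Python
# Where the parameters live
for layer in model.layers:
    print(f"{layer.name:30s} {layer.count_params():8d} params")
print(f"total: {model.count_params()} params (~{model.count_params() / 1024:.0f} KB at int8)")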

Convert to C Array

Bash
xxd -i gesture_model.tflite > gesture_model.h

This creates a header file you can include in your Arduino sketch.
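If you don't have xxd (Windows doesn't ship it), a few lines of Python write an equivalent header. A minimal sketch producing the same array name and length the Arduino code below expects:

Python
# xxd -i equivalent: dump the .tflite bytes as a C array
with open('gesture_model.tflite', 'rb') as f:
    data = f.read()

with open('gesture_model.h', 'w') as f:
    f.write("unsigned char gesture_model_tflite[] = {\n")
    for i in range(0, len(data), 12):
        f.write("  " + ", ".join(f"0x{b:02x}" for b in data[i:i + 12]) + ",\n")
    f.write("};\n")
    f.write(f"unsigned int gesture_model_tflite_len = {len(data)};\n")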

Run on ESP32

C++ (Arduino)
#include <TensorFlowLite_ESP32.h>
#include "tensorflow/lite/micro/all_ops_resolver.h"
#include "tensorflow/lite/micro/micro_interpreter.h"
#include "tensorflow/lite/schema/schema_generated.h"
#include "gesture_model.h"  // Your converted model (gesture_model_tflite[])

// Working memory for activations and intermediate tensors.
// If AllocateTensors() fails, this is the first number to increase.
constexpr int kTensorArenaSize = 10 * 1024;
uint8_t tensor_arena[kTensorArenaSize];

tflite::MicroInterpreter* interpreter;
TfLiteTensor* input;
TfLiteTensor* output;

void setup() {
  Serial.begin(115200);

  // Set up the model
  static tflite::AllOpsResolver resolver;
  static tflite::MicroInterpreter static_interpreter(
      tflite::GetModel(gesture_model_tflite),
      resolver,
      tensor_arena,
      kTensorArenaSize);
  interpreter = &static_interpreter;

  if (interpreter->AllocateTensors() != kTfLiteOk) {
    Serial.println("AllocateTensors() failed -- tensor arena too small?");
    while (true) {}
  }

  input = interpreter->input(0);
  output = interpreter->output(0);
}

void loop() {
  // Read accelerometer data into the input tensor (128 samples × 3 axes).
  // Placeholder values: replace with real reads from your IMU, scaled to roughly ±1 g.
  float accel_x = 0.0f, accel_y = 0.0f, accel_z = 0.0f;
  for (int i = 0; i < 128; i++) {
    // Read from accelerometer...
    input->data.int8[i * 3 + 0] = (int8_t)(accel_x * 127);
    input->data.int8[i * 3 + 1] = (int8_t)(accel_y * 127);
    input->data.int8[i * 3 + 2] = (int8_t)(accel_z * 127);
  }

  // Run inference
  unsigned long start = micros();
  interpreter->Invoke();
  unsigned long duration = micros() - start;

  // Get predictions (int8 scores, higher = more confident)
  int8_t circle = output->data.int8[0];
  int8_t shake = output->data.int8[1];
  int8_t idle = output->data.int8[2];

  Serial.printf("Inference: %lu us | circle: %d, shake: %d, idle: %d\n",
                duration, circle, shake, idle);

  delay(100);
}

Real-World Applications

Here's what I've built with TinyML:

Predictive Maintenance

Attached an accelerometer to a ventilation fan. Trained a model on "normal" vibration patterns. When the bearing started failing, the vibration pattern changed and the model detected an anomaly three days before the fan would have seized.

Pet Activity Monitor

A friend built this one: collar-mounted ESP32 with accelerometer. Classifies walking, running, sleeping, eating, and scratching. Battery life: about 2 weeks. Syncs to phone via BLE when in range.

Workshop Safety

Sound detection for table saw startup. Automatically turns on dust collection when the saw runs. Latency: 50ms. Much faster than the mechanical vibration switches I tried before.

Limitations and Gotchas

TinyML isn't magic. Here's what has actually bitten me:

  • Memory limits are real: A speech recognition model that works on a Raspberry Pi won't fit on an ESP32. I burned two full weekends trying to squeeze a model that was fundamentally too large. You have to design for the constraints from the start — pick your architecture with the target hardware in mind, keep layer counts low, check memory usage after every change. Start small. Grow the model only until you hit your accuracy target or run out of RAM.
  • Overfitting is easy: Collect diverse data or the model just memorizes your specific room and your specific voice. I had a keyword model that worked perfectly in my office and failed completely in my kitchen. Same words, different acoustics.
  • Quantization destroys some models: The standard claim is "1-3% accuracy loss." That's true for well-behaved models. But I've had models where int8 quantization killed accuracy by 15% because the weight distributions didn't quantize cleanly. If your model has a few very large weights mixed with many small ones, quantization will clip the outliers and the model falls apart. Test after quantization, not before (see the sketch after this list).
  • Audio is hard: Microphone quality varies wildly between modules. A model trained on an INMP441 MEMS mic failed completely when I swapped to a cheap electret. Had to retrain from scratch.
  • Temperature affects sensors: Accelerometers drift with temperature. If your device runs in a space with temperature swings, either calibrate periodically or feed temperature as an additional input feature.
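Here's how I sanity-check the quantization damage before anything touches hardware: evaluate the int8 .tflite file against the original Keras model on held-out data. This sketch assumes the `model` from the TensorFlow section above and hypothetical test files:

Python
import numpy as np
import tensorflow as tf

X_test = np.load('gestures_x_test.npy')      # hypothetical held-out data
y_test = np.load('gestures_y_test.npy')

# Float baseline: the Keras model trained earlier
_, float_acc = model.evaluate(X_test, y_test, verbose=0)

# Quantized model
interp = tf.lite.Interpreter(model_path='gesture_model.tflite')
interp.allocate_tensors()
inp, out = interp.get_input_details()[0], interp.get_output_details()[0]
scale, zero_point = inp['quantization']

correct = 0
for x, y in zip(X_test, y_test):
    if inp['dtype'] == np.int8:              # quantize the input the same way the converter did
        x = np.round(x / scale + zero_point).astype(np.int8)
    interp.set_tensor(inp['index'], x[np.newaxis, ...].astype(inp['dtype']))
    interp.invoke()
    correct += int(np.argmax(interp.get_tensor(out['index'])[0]) == y)

print(f"float32 accuracy: {float_acc:.3f}   int8 accuracy: {correct / len(X_test):.3f}")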

What You Can Build

The overlap between "problems worth solving" and "problems a microcontroller can solve" is bigger than most people think:

  • Keyword spotting: Wake words without a cloud connection. No API calls, no latency, no monthly bills from AWS.
  • Anomaly detection: Machines that know when they sound wrong. This is probably the most underrated application — stick an accelerometer on anything with moving parts.
  • Gesture control: Wave to control lights, double-tap to dismiss notifications. Feels gimmicky until you use it in a workshop with dirty hands.
  • Occupancy sensing: Is this room occupied? Temperature, sound, and CO2 patterns can answer that without a camera.
  • Quality control: Detect defective products by sound or vibration on a factory line. This is where the industrial money is.

All of this runs on battery-powered hardware that costs less than a sandwich. No subscription fees. No data leaving the device.

Resources

  • Edge Impulse: Best getting-started platform
  • TensorFlow Lite Micro: Official Google library
  • Pete Warden's TinyML book: Thorough and practical
  • Harvard TinyML EdX course: Free and excellent

Where This Is Going

The ESP32-S3 already has vector instructions that speed up inference noticeably. Espressif is clearly investing in ML-adjacent silicon features. The tooling is getting less terrible — Edge Impulse didn't exist three years ago and now it handles 80% of use cases without touching TFLite Micro directly. Model architectures are getting smaller and smarter. MobileNet V3 fits in places MobileNet V1 never could.

In 2-3 years, every ESP32 project will include some form of local inference. A sensor node that just reads and transmits raw data will feel as outdated as a website without HTTPS. Right now the experience is clunky — the toolchain fights you, quantization requires trial and error, and debugging on-device is painful. But the trajectory is obvious. The hardware is cheap enough, the models are small enough, and the use cases are real enough. TinyML just hasn't had its breakout moment yet.
