TinyML
TinyML is simply bringing ML to the world of highly resource-constrained devices. TinyML is not a framework or a library. It's a specialization or subset of ML.
Devices are sensor/actuator nodes typical of the Internet of Things (IoT). Devices have limited compute, connectivity, power, storage, and RAM. Traditionally, devices collect data and send them to the cloud for analysis. Cloud ML engines make inferences and send commands to the devices. TinyML enables on-device inference. The challenge is to run ML models with limited resources in a dynamic environment.
ML and embedded devices each have a long history, but TinyML is relatively new. It has progressed a lot since 2019, yet much research remains to be done. As of early 2024, on-device training, TinyMLOps and hardware accelerators are emerging areas of research.
Discussion
-
Why do we need TinyML? In some use cases, we require low latency and real-time responses. With TinyML, sensor devices needn't send data to the cloud and wait for responses. On-device inference enables the system to respond faster to current environmental conditions.
Some devices have so little memory or storage that running a networking stack is not an option, so wireless connectivity isn't possible. The same is true of devices installed in remote places. Even where connectivity is possible, wireless communication consumes precious battery power. Many IoT devices are designed to last a few years without a change of battery. In audio/video streaming applications, TinyML also saves network bandwidth.
Without dependence on network connectivity, on-device AI/ML becomes autonomous and reliable. TinyML solutions also enable rapid prototyping and easier integration into current systems.
Sending sensitive data to the cloud can compromise data privacy. Since TinyML processes data on the device, there's less chance of a privacy breach or data theft.
To reap these benefits, on-device ML models have to be optimized for one or more of storage, memory, energy, and processing.
-
What applications does TinyML enable? TinyML is applicable in many domains including healthcare, sports, augmented reality, agriculture, environmental monitoring, industrial IoT, and automotive. We mention a few specific examples.
Niolabs built a solution to optimize water use in farming. It combined soil moisture and sunlight data. Each device did ML inference based on its local conditions.
Using temperature, humidity and pressure, fire detection has been shown on an Arduino Nano 33 BLE board. Using the Edge Impulse framework, spectral analysis and classification were applied.
In industrial systems, sensors monitor temperature, vibration and noise. With TinyML, problems can be caught and machines can be shut down immediately. This improves safety and avoids equipment damage.
In one study, TinyLSTM, a compressed version of LSTM, was used to improve the performance of a hearing aid. Likewise, TinyML has been applied towards speech recognition, speech enhancement, and denoising.
With an image sensor, person detection is possible. For human activity recognition, one study used accelerometer and gyroscope sensors. Another experiment demonstrated gesture recognition. Accelerometer data captured two common gestures in boxing: punching and flexing.
-
What's the TinyML development process flow? ML models are trained on large datasets, particularly if they're deep neural networks. Training happens on GPUs, often in the cloud. The target platform could be completely different from the processor or hardware platform on which the model was trained. Hence format conversion is needed. The converted model can then be deployed to the target.
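On microcontrollers without a filesystem, a converted model is commonly compiled into the firmware as a constant byte array. The sketch below illustrates that last step of the flow. It is not any framework's official tool; the function name `bytes_to_c_array` and the variable name `g_model` are hypothetical, and the input bytes are a stand-in for a real converted model file.

```python
def bytes_to_c_array(data: bytes, var_name: str = "g_model", per_line: int = 12) -> str:
    """Render raw model bytes as C source text (similar in spirit to `xxd -i`)."""
    lines = []
    for start in range(0, len(data), per_line):
        chunk = data[start:start + per_line]
        lines.append("  " + ", ".join(f"0x{b:02x}" for b in chunk) + ",")
    body = "\n".join(lines)
    return (
        f"const unsigned char {var_name}[] = {{\n{body}\n}};\n"
        f"const unsigned int {var_name}_len = {len(data)};\n"
    )


# Stand-in bytes; in practice these would be read from the converted model file.
example = bytes([0x1C, 0x00, 0x00, 0x00, 0x54, 0x46, 0x4C, 0x33])
print(bytes_to_c_array(example))
```

The generated text can be saved as a `.c` or `.cc` file and linked into the firmware, so the model lives in Flash alongside the application code.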
DNN models are often 100MB or larger. Initial research effort focused on compressing these models for TinyML. Consider AlexNet (2012), a deep CNN model for ImageNet image classification. It has about 60M parameters and a model size of 240MB. SqueezeNet (2016) is a much smaller architecture that achieves AlexNet-level accuracy. When used with a deep compression technique, it has only 0.42M parameters and a model size of 0.47MB.
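The 240MB figure for AlexNet follows directly from its parameter count if each weight is stored as a 32-bit float. The arithmetic can be sketched as below, assuming decimal megabytes and ignoring any file format overhead. The 8-bit case is added only for illustration.

```python
def model_size_mb(num_params: float, bits_per_param: int = 32) -> float:
    """Storage needed for the weights alone, in megabytes (1 MB = 10**6 bytes)."""
    return num_params * bits_per_param / 8 / 1e6


alexnet_fp32 = model_size_mb(60e6, 32)  # 60M parameters as 32-bit floats
alexnet_int8 = model_size_mb(60e6, 8)   # same parameters stored in 8 bits each
print(f"AlexNet, float32: {alexnet_fp32:.0f} MB")  # 240 MB
print(f"AlexNet, int8:    {alexnet_int8:.0f} MB")  # 60 MB
```

Even at 8 bits per weight, such a model is far larger than typical microcontroller Flash, which is why smaller architectures are needed in addition to compression.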
The next advancement in TinyML was to automatically discover the best architecture given the training data. Via Neural Architecture Search (NAS), an optimized network topology and an inference engine are selected. MCUNet (2020) is an example. Its model for ImageNet has a model size of 1.9MB and uses 0.49MB of RAM. This led to the emergence of AutoML platforms for TinyML.
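The core idea of searching under memory constraints can be illustrated with a toy example. The sketch below exhaustively tries small fully connected networks and keeps the largest one that fits given Flash and RAM budgets. It is not MCUNet's TinyNAS, which is far more sophisticated and evaluates accuracy. Using weight count as a stand-in for model quality, the int8 storage assumption, and all names and budgets here are assumptions made only to keep the sketch self-contained.

```python
from itertools import product


def dense_cost(layer_sizes):
    """Return (weight_bytes, peak_activation_bytes) for an int8 fully connected net."""
    pairs = list(zip(layer_sizes, layer_sizes[1:]))
    # One byte per weight plus one byte per bias.
    weight_bytes = sum(n_in * n_out + n_out for n_in, n_out in pairs)
    # While a layer is computed, its input and output buffers are both alive.
    peak_activation_bytes = max(n_in + n_out for n_in, n_out in pairs)
    return weight_bytes, peak_activation_bytes


def search(n_inputs, n_outputs, widths, flash_budget, ram_budget):
    """Try every two-hidden-layer net and keep the largest one that fits."""
    best = None
    for h1, h2 in product(widths, repeat=2):
        sizes = [n_inputs, h1, h2, n_outputs]
        flash, ram = dense_cost(sizes)
        if flash <= flash_budget and ram <= ram_budget:
            if best is None or flash > best[1]:
                best = (sizes, flash, ram)
    return best


print(search(64, 4, widths=[16, 32, 64, 128], flash_budget=8_000, ram_budget=160))
# ([64, 32, 128, 4], 6820, 160)
```

A real search would replace the size-based ranking with measured or predicted accuracy, and would account for the memory behaviour of the inference engine on the target.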
-
What are the hardware constraints for TinyML? TinyML hardware typically has less than 1MB of Flash storage and 256KB of RAM. Flash storage limits the model size that can be accommodated, typically to less than a million parameters.
Clock speed is a few hundred MHz. This relates to latency, which is expected to be 300ms or less. Simple applications may achieve 1ms latency. TinyML targets include 32-bit processors and even 8-bit microcontrollers. At the low end, floating point operations aren't supported in hardware, which is the case with Cortex-M0+ and 8-bit microcontrollers.
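Where there is no floating point hardware, fractional values are often handled with fixed-point integer arithmetic. The sketch below uses the Q15 format (1 sign bit, 15 fractional bits) as an assumed example; the helper names are hypothetical and the code only illustrates the idea in Python rather than showing firmware code.

```python
Q = 15  # number of fractional bits (Q15 format, an assumption of this sketch)


def to_q15(x: float) -> int:
    """Convert a float in [-1, 1) to a 16-bit fixed-point integer, with saturation."""
    return max(-32768, min(32767, round(x * (1 << Q))))


def q15_mul(a: int, b: int) -> int:
    """Multiply two Q15 values using only integer multiply and shift."""
    return (a * b) >> Q


def from_q15(a: int) -> float:
    """Convert a Q15 integer back to a float, for inspection only."""
    return a / (1 << Q)


product = q15_mul(to_q15(0.5), to_q15(0.25))
print(from_q15(product))  # 0.125
```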
Ultra-low-power devices operate at 1.7-3.3V. 5V devices consume more power. Power consumption must be 100mW or less, 1mW at the extreme. Lower clock speeds mean lower power consumption but also longer processing time.
A design engineer selects a hardware platform that best suits the application. For the simplest sensors, kilobytes of Flash and RAM may be enough. For camera inputs and processing high-resolution images, more Flash and RAM may be needed. Likewise, to run a Wi-Fi or BLE stack, more Flash and RAM are needed. If battery operated, power budgeting and duty cycling must be carefully planned.
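Power budgeting with duty cycling comes down to a time-weighted average of active and sleep current. A minimal sketch follows; the current draws, duty cycle and battery capacity are hypothetical values chosen only for illustration, and the estimate ignores self-discharge and regulator losses.

```python
def average_current_ma(active_ma: float, sleep_ma: float, duty: float) -> float:
    """Average current for a device that is active a fraction `duty` of the time."""
    return duty * active_ma + (1.0 - duty) * sleep_ma


def battery_life_days(capacity_mah: float, avg_ma: float) -> float:
    """Ideal battery life in days, ignoring self-discharge and converter losses."""
    return capacity_mah / avg_ma / 24.0


# Hypothetical device: 10 mA while sensing and inferring, 10 uA asleep, awake 1% of the time.
avg = average_current_ma(active_ma=10.0, sleep_ma=0.01, duty=0.01)
print(f"average current: {avg:.4f} mA")
print(f"battery life on 1000 mAh: {battery_life_days(1000.0, avg):.0f} days")
```

Because the active term usually dominates, shortening inference time or waking less often tends to extend battery life more than reducing sleep current further.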
-
What software platforms support TinyML? We note the following TinyML frameworks:
- TensorFlow Lite Micro (TFLM): Widely cited; grew out of the TensorFlow framework. Written in C++17 for 32-bit processors. Supports many development boards. TensorFlow Lite may be easier to integrate on the Raspberry Pi.
- PyTorch Mobile: Enables a smooth transition from training with PyTorch to deploying in embedded environments.
- Edge Impulse: Closed-source framework that can train and optimize models/libraries for various targets (MCU, CPU, GPU). Uses less Flash and RAM than TFLM.
- µTensor: Converts a Keras model into C++ for Mbed, ST, and K64 boards.
- NNoM: Open-source framework that accepts TensorFlow models and converts to C code. Not as popular as TFLM.
- Embedded Learning Library (ELL): From Microsoft for image/audio classification on Raspberry Pi, Arduino, and micro:bit platforms.
- STM32Cube.AI: Generates and optimizes code for STM32 ARM Cortex-M-based boards. It understands TFL, ONNX, Matlab, and PyTorch models.
- µTVM: Optimizes tensor operations for MCUs.
Neuton, Edge Impulse, Latent AI, Imagimob, NanoEdge AI Studio and SensiML are platforms that bring the AutoML approach to TinyML.
-
What techniques are used to reduce model sizes in TinyML? DNNs are typically overparameterized. Some of the weights are unimportant to the model's performance. These can be removed over many training epochs. This is called pruning.
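One common way to decide which weights are unimportant is by magnitude: the weights closest to zero are removed first. The source does not specify a criterion, so magnitude-based pruning is an assumption here. The sketch also prunes in a single step, whereas in practice pruning is usually interleaved with training so the remaining weights can compensate.

```python
def prune_by_magnitude(weights: list[float], sparsity: float) -> list[float]:
    """Zero out the given fraction of weights with the smallest absolute value."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    n_prune = int(len(weights) * sparsity)
    # Indices sorted from smallest to largest magnitude.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    to_zero = set(order[:n_prune])
    return [0.0 if i in to_zero else w for i, w in enumerate(weights)]


weights = [0.8, -0.05, 0.3, 0.01, -0.6, 0.02]
print(prune_by_magnitude(weights, sparsity=0.5))
# [0.8, 0.0, 0.3, 0.0, -0.6, 0.0]
```

The zeroed weights save storage only if the model is then stored in a sparse or compressed format.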
With quantization, high-precision floating point values are reduced to low-precision fixed point values and integer operations. Applicable for weights and activations, quantization reduces processing requirements with minimal loss of accuracy. Weight sharing and tensor decomposition are other techniques.
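As an illustration of quantization, the sketch below maps floats to 8-bit integers with a scale and a zero point, then maps them back to show the rounding error. The asymmetric (affine) 8-bit scheme and the function names are assumptions of this example; frameworks differ in the exact scheme they use.

```python
def quantize_int8(values):
    """Affine quantization: q = round(x / scale) + zero_point, clamped to int8."""
    lo = min(min(values), 0.0)  # the representable range must include 0.0
    hi = max(max(values), 0.0)
    scale = (hi - lo) / 255.0 if hi > lo else 1.0
    zero_point = round(-128 - lo / scale)
    q = [max(-128, min(127, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point


def dequantize(q, scale, zero_point):
    """Recover approximate floats from int8 codes."""
    return [(code - zero_point) * scale for code in q]


activations = [-0.5, 0.0, 0.25, 1.5]
codes, scale, zero_point = quantize_int8(activations)
restored = dequantize(codes, scale, zero_point)
print(codes)
print([round(v, 3) for v in restored])
```

Each value now takes one byte instead of four, and the restored values differ from the originals by at most about one quantization step.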
Knowledge distillation is basically training a smaller simpler model that gives similar results to a larger and more complex model.
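One widely used way to train the smaller model is to have it match the larger model's softened output probabilities. The sketch below computes such a loss for a single example. The temperature-scaled cross-entropy shown here is one possible formulation, not necessarily the one used in any study cited in this article, and the logits are made-up values.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax with temperature; higher temperature gives a softer distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Cross-entropy between the teacher's and student's softened outputs."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(p_teacher, p_student))


teacher = [4.0, 1.0, 0.5]        # hypothetical logits from the large model
close_student = [3.5, 1.2, 0.4]  # student that roughly agrees with the teacher
far_student = [0.5, 4.0, 1.0]    # student that disagrees with the teacher
print(distillation_loss(close_student, teacher))
print(distillation_loss(far_student, teacher))
```

The loss is lower for the student whose outputs resemble the teacher's, so minimizing it pushes the small model towards the large model's behaviour.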
Apache TVM compiles models and tensor operations optimally for various targets. Apart from such optimizations, running the models on hardware accelerators improves speed and saves energy. From the perspective of SoC and accelerator design, designers look at processor type, bit precision configuration, reconfigurability of NN architecture, and efficient on/off-chip memory access.
-
What's the current performance of TinyML? By 2022, TinyML was possible even on 8-bit MCUs with Flash and RAM footprint of just a few kilobytes. Using the Neuton framework, a binary classification problem for gesture recognition was demonstrated on an Arduino Mega 2560 board. Input was accelerometer data. Model size was less than 3KB.
Another example is hand gesture recognition, implemented on an Arm Cortex-M7 with the accelerated Arm CMSIS-NN library. Grayscale 96x96 pixel data at 20fps came from an image sensor. TensorFlow Lite Micro and OpenMV (based on MicroPython) were used. MobileNetV1 was trained and converted to *.tflite format. Model size was about 303KB.
-
What are some limitations/challenges of TinyML? Traditional neural network architectures may not be the best fit for TinyML constraints. Model compression affects accuracy. AutoML for TinyML may be important since very few embedded engineers have AI/ML skills.
The right trade-offs are not obvious. What's an acceptable accuracy given constraints of power and memory? Different modelling approaches are needed for different applications and trade-offs. It's not clear how MACCs and FLOPs relate to resource constraints.
As environmental conditions change over time, the accuracy of TinyML models may drop. Models need to be retrained with recent data. On-device training needs more research. TinyOL is one possible solution. Moreover, models need to adapt to changing constraints (power, memory, bandwidth).
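The idea of updating a model as new data arrives can be sketched with a classifier that learns one sample at a time, so no training set has to be stored on the device. This is a generic online learning example and not the TinyOL algorithm; the class name, learning rate and data stream are all hypothetical.

```python
import math


class OnlineLogisticRegression:
    """Binary classifier updated one sample at a time, so no dataset is stored."""

    def __init__(self, n_features: int, learning_rate: float = 0.1):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = learning_rate

    def predict_proba(self, x):
        z = self.b + sum(wi * xi for wi, xi in zip(self.w, x))
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, x, y):
        """One stochastic gradient step on the log loss for sample (x, y)."""
        error = self.predict_proba(x) - y
        self.w = [wi - self.lr * error * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * error


model = OnlineLogisticRegression(n_features=1)
# Hypothetical sensor stream: low readings are class 0, high readings are class 1.
stream = [([0.2], 0), ([0.9], 1), ([0.1], 0), ([0.8], 1)] * 200
for x, y in stream:
    model.update(x, y)
print(model.predict_proba([0.9]), model.predict_proba([0.1]))
```

Memory use stays proportional to the number of features, which is what makes this style of update attractive for constrained devices.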
IoT involves various hardware and software platforms. Data format and resolution also vary. Training an ML model for this variety is a challenge. Model reuse from one platform to another even for the same application isn't easy.
TinyMLOps and life cycle management for edge devices are at a nascent stage. Tools and processes are not well developed. TinyML software frameworks lack some features.
-
Where can I learn more about TinyML? Harvard University's TinyMLedu is a good starting point for beginners. This includes a list of TinyML courses. Harvard has published many TinyML courses on edX. Edge Impulse has a course on Coursera. The University of Pennsylvania has its ESE3600 TinyML course.
Two books to read are TinyML by Warden and Situnayake (2019) and TinyML Cookbook by Iodice (2022).
A Seeed Studio webpage lists embedded modules and related frameworks that support TinyML.
For application ideas, see the Neuton homepage. Another source for ideas is Experiments with Google, which uses TFLM as the framework. Edge Impulse has shared many real-world TinyML case studies.
Milestones
The 2010s see interest in and growth of the Internet of Things (IoT). Many IoT cloud platforms are launched during 2012-2014. IoT alliances and consortia are formed from 2014. Google Trends shows interest in the term "Internet of Things" from 2013 and peaking in 2017. By 2020, there are an estimated 10-12 billion IoT devices. These developments set the stage for the emergence of TinyML towards the end of the 2010s.
2014
Chen et al. propose a DNN for keyword spotting (KWS). This is useful for hands-free interaction with conversational voice assistants. The assistant is always listening for keywords. Their DNN model can execute on mobile devices with better results than state-of-the-art HMM models. The latter require a Viterbi decoder that's more computationally intensive. In 2018, Zhang et al. achieve KWS on microcontrollers within 70KB.
2017
Google previews TensorFlow Lite, a lightweight version of TensorFlow for mobile and embedded devices. It's now possible to deploy DNNs for inference (e.g. MobileNet, Inception v3, Smart Reply) on smartphones. A converter converts a TensorFlow model into TensorFlow Lite format. Where possible, it uses hardware acceleration. The interpreter takes only 300KB compared to 1.5MB used by TensorFlow Mobile. TensorFlow Lite is an evolution of TensorFlow Mobile.
2019
Google announces TensorFlow Lite for Microcontrollers at the TensorFlow Developer Summit. They show a speech recognition demo on a SparkFun dev board having Cortex M4, 384KB RAM and 1MB Flash. Processor takes less than 1mW of power. The model takes only 20KB of Flash storage. TensorFlow Lite takes another 25KB. RAM usage is 30KB. In December, Google shares more details at the AIoT Dev Summit where they use the term "TinyML". The earliest publications on TinyML are from 2019.
2020
Arm announces its Arm Cortex-M55, the first Cortex-M processor to include Arm Helium vector processing technology. Arm also announces Arm Ethos-U55 microNPU, an NN accelerator for embedded devices. Together, these are claimed to bring a performance uplift of up to 480x for ML workloads. In October, Arm Ethos-U65 microNPU brings edge ML capability to Cortex-A, Cortex-R and Neoverse systems. Among the architectures supported by Ethos-U are DS-CNN-L, MobileNet v3, Inception v4, RNNoise, DeepSpeech-1, and Wav2Letter. Grove Vision AI V2 is an AI module with Cortex-M55 and Ethos-U55.
2020
Amazon embeds the AZ1 Neural Edge processor into its line of Echo devices. This makes on-device neural speech recognition possible for handling Alexa voice commands. Three years later (Sep 2023), Alexa is upgraded to use generative AI that's most likely done in the cloud. Amazon Echo was first made available to Amazon Prime customers back in 2014.
2020
For person detection, called Visual Wake Word (VWW), Lin et al. manage to fit a model within 1MB Flash. They call their solution MCUNet. Their TinyNAS employs a two-stage Neural Architecture Search (NAS) approach. Their TinyEngine is a memory-efficient inference library. They obtain state-of-the-art results and claim that always-on ML for IoT is now possible. However, the NAS approach is time-consuming and results can't be ported across hardware platforms.
2021
Many research papers are published that deal with on-device training. Online learning (aka incremental learning) updates the model as new data arrives. Devices may also collaborate over the network to jointly update their models based on decentralized data. Also in 2021, Neuton TinyML is founded. It can automatically discover optimized neural network architectures that can run even on 8-bit microcontrollers.
2022
Han and Siebert summarize the current state of the art in TinyML. They note that TensorFlow Lite Micro supports a wide variety of processors. Keyword spotting is a common application. Among the neural networks, CNN, DNN and RNN/LSTM architectures are common. In October, researchers present Phinet GAN for face swapping. GANs for this application are more than 100MB but Phinet GAN is under 1MB with a 6% performance penalty. Phinet GAN uses optimized upsampling blocks with bilinear filtering.
References
- Abadade, Y., A. Temouden, H. Bamoumen, N. Benamar, Y. Chtouki, and A. S. Hafid. 2023. "A Comprehensive Survey on TinyML." IEEE Access, vol. 11, pp. 96892-96922. doi: 10.1109/ACCESS.2023.3294111. Accessed 2024-01-29.
- Ancilotto, A., F. Paissan, and E. Farella. 2022. "Bringing face-swapping to the very edge with Phinets." Presentation, tinyML EMEA 2022, October 27. Accessed 2024-02-04.
- Apache TVM. 2024. "Homepage." Apache TVM, The Apache Software Foundation. Accessed 2024-01-29.
- Arm. 2024. "Arm and Partners Delivering TinyML." Arm. Accessed 2024-01-29.
- Arora, T. 2020. "Arm Ethos-U65: Powering innovation in a new world of AI devices." Blog, Arm Community, October 19. Accessed 2024-02-04.
- Cavagnis, L. 2022. "Ultra TinyML: Machine Learning for 8-bit Microcontroller." Towards Data Science, on Medium, February 21. Accessed 2024-01-31.
- Chawla, N. 2023. "Empowering the Edge: Advancements in AI Hardware and In-Memory Computing Architectures." tinyML Talks, August 1. Accessed 2024-01-29.
- Chen, G., C. Parada, and G. Heigold. 2014. "Small-footprint keyword spotting using deep neural networks." IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Florence, Italy, pp. 4087-4091. doi: 10.1109/ICASSP.2014.6854370. Accessed 2024-01-29.
- Earls, A. R. 2023. "Can tinyML Bring Machine Learning to the Masses?" Electronic Design, December 26. Accessed 2024-01-29.
- FBK. 2023. "The Growing Community of TinyML." Blog, Marvel, April 18. Updated 2023-05-29. Accessed 2024-01-29.
- Faulkner, C. 2020. "Amazon's AZ1 Neural Edge processor will make Alexa voice commands even faster." The Verge, September 24. Accessed 2024-01-29.
- Gavrilova, Y. 2023. "Introduction to Tiny ML." Blog, Serokell, June 7. Accessed 2024-01-29.
- Google. 2021. "TensorFlow Lite for Microcontrollers." Experiments with Google. Accessed 2024-01-29.
- Google Trends. 2024. "Google Trends for 'Internet of Things'." Google Trends. Accessed 2024-01-31.
- Grosse, R. 2018. "A Closer Look at AlexNet." Slides, Tutorial 6 – CNNs, CSC321, Univ. of Toronto, February 12. Accessed 2024-02-02.
- Gupta, S., S. Jain, and B. Roy. 2022. "A TinyML Approach to Human Activity Recognition." Journal of Physics: Conference Series, IOP Publishing, vol. 2273, article no. 012025. Accessed 2024-01-29.
- Haigh, K. Z., A. M. Mackay, M. R. Cook, and L. G. Lin. 2015. "Machine Learning for Embedded Systems: A Case Study." Raytheon BBN Technologies. Accessed 2024-01-30.
- Han, H. and J. Siebert. 2022. "TinyML: A Systematic Review and Synthesis of Existing Research." International Conference on Artificial Intelligence in Information and Communication (ICAIIC), Jeju Island, Republic of Korea, pp. 269-274, February 21-24. doi: 10.1109/ICAIIC54071.2022.9722636. Accessed 2024-01-31.
- Heim, L., A. Biri, Z. Qu, and L. Thiele. 2021. "Measuring what Really Matters: Optimizing Neural Networks for TinyML." Working paper, ETH Zurich, April 21. Accessed 2024-01-29.
- Iandola, F. N., S. Han, M. W. Moskewicz, K. Ashraf, W. J. Dally, and K. Keutzer. 2016. "SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size." v4, arXiv, November 4. Accessed 2024-02-02.
- Kelly, S. M. 2023. "So long, robotic Alexa. Amazon’s voice assistant gets more human-like with generative AI." CNN, September 20. Accessed 2024-01-29.
- Lawton, G. 2021. "Tips and tricks for deploying TinyML." Enterprise AI, TechTarget, December 29. Accessed 2024-01-29.
- Lin, J., W.-M. Chen, Y. Lin, J. Cohn, C. Gan, and S. Han. 2020. "MCUNet: Tiny Deep Learning on IoT Devices." NeurIPS 2020, Vancouver, Canada, December 6-12. Accessed 2024-01-31.
- Lorenser, T. 2020. "Arm Cortex-M55 and Ethos-U55 Processors: Extending the Performance of Arm’s ML Portfolio for Endpoint Devices." Blog, Arm Community, February 10. Accessed 2024-02-04.
- Lorenzetti, Laura. 2014. "Forget Siri, Amazon now brings you Alexa." Fortune, November 06. Accessed 2024-01-31.
- Lueth, K. L. 2017. "IoT Platform Comparison: How the 450 providers stack up." IoT Analytics, July 13. Accessed 2024-01-31.
- Lueth, K. L. 2020. "State of the IoT 2020: 12 billion IoT connections, surpassing non-IoT for the first time." IoT Analytics, November 19. Accessed 2024-01-31.
- Lê, M. T., P. Wolinski, and J. Arbel. 2023. "Efficient Neural Networks for Tiny Machine Learning: A Comprehensive Review." v1, arXiv, November 20. Accessed 2024-01-31.
- Mistry, S. 2022. "Building a TensorFlow Lite based computer vision emoji input device with OpenMV." ml-image-classification-example-for-openmv, Arm Developer Ecosystem, on GitHub, November 28. Accessed 2024-02-02.
- Moyer, B. 2021. "Why TinyML Is Such A Big Deal." Semiconductor Engineering, September 2. Accessed 2024-01-29.
- Mulvaney, D., I. Sillitoe, E. Swere, Y. Wang, and Z. Zhu. 2007. "Real-time machine learning in embedded software and hardware platforms." Int. J. Intelligent Systems Technologies and Applications, vol. 2, no. 2/3, pp. 87-204. Accessed 2024-01-30.
- Nadeski, M. 2019. "Bringing machine learning to embedded systems." White paper, Texas Instruments, March. Accessed 2024-01-31.
- Nekhil R. 2023. "Fire Detection with sensor fusion." Project 160533, Edge Impulse, January. Accessed 2024-01-29.
- Neuton. 2024a. "About Neuton.AI." Neuton, on LinkedIn. Accessed 2024-01-31.
- Neuton. 2024b. "Homepage." Neuton. Accessed 2024-02-04.
- O'Donnell, L. 2016. "10 IoT Consortiums, Alliances Solution Providers Should Have On Their Radar." CRN, The Channel Company, September 19. Accessed 2024-01-24.
- Raja, A. 2023. "New trends in TinyML !" LinkedIn Pulse, October 29. Accessed 2024-01-29.
- Rajapakse, V., I. Karunanayake, and N. Ahmed. 2023. "Intelligence at the Extreme Edge: A Survey on Reformable TinyML." ACM Computing Surveys, vol. 55, no. 13s, article no. 282, pp. 1-30, December. doi: 10.1145/3583683. Accessed 2024-01-29.
- Ray, P. P. 2022. "A review on TinyML: State-of-the-art and prospects." Journal of King Saud University - Computer and Information Sciences, vol. 34, no. 4, pp. 1595-1623, April. doi: 10.1016/j.jksuci.2021.11.019. Accessed 2024-01-29.
- Ren, H., D. Anicic, and T. A. Runkler. 2021. "TinyOL: TinyML with Online-Learning on Microcontrollers." International Joint Conference on Neural Networks (IJCNN), Shenzhen, China, pp. 1-8. doi: 10.1109/IJCNN52387.2021.9533927. Accessed 2024-01-29.
- Rieder, N. and R. T. Maestro. 2022. "How to quickly deploy TinyML on MCUs." Embedded, July 5. Accessed 2024-01-29.
- Schalkwyk, J. 2019. "An All-Neural On-Device Speech Recognizer." Blog, Google Research, March 12. Accessed 2024-01-31.
- Schizas, N., A. Karras, C. Karras, and S. Sioutas. 2022. "TinyML for Ultra-Low Power AI and Large Scale IoT Deployments: A Systematic Review." Future Internet, MDPI, vol. 14, no. 12, December. Accessed 2024-01-29.
- Seeed Studio. 2023. "Tiny Machine Learning(TinyML)." Wiki, July 21. Accessed 2024-01-29.
- TensorFlow. 2017. "Announcing TensorFlow Lite." Blog, Google for Developers, November 14. Accessed 2024-01-31.
- TensorFlow. 2023. "TensorFlow Lite for Microcontrollers." TensorFlow, May 23. Accessed 2024-01-29.
- TinyMLedu. 2024. "Take a Free Online Course or Teach Your Own!" TinyMLedu, Harvard University. Accessed 2024-01-29.
- Vailshery, L. S. 2023. "Number of Internet of Things (IoT) connected devices worldwide from 2019 to 2023, with forecasts from 2022 to 2030." Statista, July 27. Accessed 2024-01-31.
- Warden, P. 2019a. "Launching TensorFlow Lite for Microcontrollers." Blog, March 7. Accessed 2024-01-31.
- Warden, P. 2019b. "What's TinyML good for" AIoT Dev Summit, December 2-3. Accessed 2024-01-29.
- Wolff, I. 2023. "Is Software and Hardware Ready for TinyML Tsunami?" EE Times, December 19. Accessed 2024-01-29.
- Zhang, Y., N. Suda, L. Lai, and V. Chandra. 2018. "Hello Edge: Keyword Spotting on Microcontrollers." arXiv, v3, February 14. Accessed 2024-01-31.
- von Zitzewitz, Gustav. 2017. "Survey of neural networks in autonomous driving." Advanced Seminar Summer Semester, Technische Universität München. Accessed 2024-01-30.
Further Reading
- Ray, P. P. 2022. "A review on TinyML: State-of-the-art and prospects." Journal of King Saud University - Computer and Information Sciences, vol. 34, no. 4, pp. 1595-1623, April. doi: 10.1016/j.jksuci.2021.11.019. Accessed 2024-01-29.
- Janapa Reddi, V., B. Plancher, S. Kennedy, L. Moroney, P. Warden, L. Suzuki, A. Agarwal, C. Banbury, M. Banzi, M. Bennett, B. Brown, S. Chitlangia, R. Ghosal, S. Grafman, R. Jaeger, S. Krishnan, M. Lam, D. Leiker, C. Mann, M. Mazumder, D. Pajak, D. Ramaprasad, J. Evan Smith, M. Stewart, and D. Tingley. 2022. "Widening Access to Applied Machine Learning With TinyML." Harvard Data Science Review, vol. 4, no. 1. doi: 10.1162/99608f92.762d171a. Accessed 2024-01-29.
- Rajapakse, V., I. Karunanayake, and N. Ahmed. 2023. "Intelligence at the Extreme Edge: A Survey on Reformable TinyML." ACM Computing Surveys, vol. 55, no. 13s, article no. 282, pp. 1-30, December. doi: 10.1145/3583683. Accessed 2024-01-29.
- Abadade, Y., A. Temouden, H. Bamoumen, N. Benamar, Y. Chtouki, and A. S. Hafid. 2023. "A Comprehensive Survey on TinyML." IEEE Access, vol. 11, pp. 96892-96922. doi: 10.1109/ACCESS.2023.3294111. Accessed 2024-01-29.
- Lê, M. T., P. Wolinski, and J. Arbel. 2023. "Efficient Neural Networks for Tiny Machine Learning: A Comprehensive Review." v1, arXiv, November 20. Accessed 2024-01-31.
- Warden, P. 2019b. "What's TinyML good for" AIoT Dev Summit, December 2-3. Accessed 2024-01-29.
See Also
- TinyML Frameworks
- TinyMLOps
- TinyML-as-a-Service
- TensorFlow Lite Micro
- MicroPython
- IoT Analytics