PORTEND: A Joint Performance Model for Partitioned Early-Exiting DNNs

Maryam Ebrahimi, Alexandre da Silva Veith, Moshe Gabel, Eyal de Lara

29th IEEE International Conference on Parallel and Distributed Systems (ICPADS 2023), Ocean Flower Island, Hainan, China, December 2023



The computation and storage requirements of Deep Neural Networks (DNNs) make them challenging to deploy on edge devices, which often have limited resources. Conversely, offloading DNNs to cloud servers incurs high communication overheads. Partitioning and early exiting are attractive solutions for reducing computational costs and improving inference speed. However, current work often addresses these approaches separately and/or ignores common communication intricacies on edge networks, such as (de)serialization and data transmission overheads. We present PORTEND, a novel performance model that jointly optimizes partitioning, early exiting, and multi-tier network placement. PORTEND's joint approach outperforms state-of-the-art solutions in edge computing setups, reducing DNN inference latency by 29%.
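To make the abstract's setting concrete, the following is a minimal, hypothetical sketch of how a joint latency model for a partitioned early-exit DNN might be structured; it is not the authors' actual formulation, and all function names, parameters, and numbers are illustrative assumptions. The expected latency sums per-segment compute time plus serialization and transmission costs for the inputs that continue past each early exit:

```python
# Hypothetical sketch (not PORTEND's actual model): expected latency of a
# DNN split into segments across tiers, each segment ending in an early exit.

def transfer_time(bytes_out, bandwidth_bps, serde_s):
    """Serialization plus transmission cost of forwarding an activation."""
    return serde_s + bytes_out / bandwidth_bps

def expected_latency(segments, exit_probs):
    """segments: list of (compute_s, transfer_s) per partition segment,
    where transfer_s is the cost of the hop to the next tier (0.0 for the
    last segment). exit_probs: probability an input exits at each segment's
    head; must sum to 1. Returns the expected end-to-end latency."""
    assert abs(sum(exit_probs) - 1.0) < 1e-9
    total, elapsed = 0.0, 0.0
    for (compute_s, transfer_s), p in zip(segments, exit_probs):
        elapsed += compute_s
        total += p * elapsed      # inputs exiting here pay latency so far
        elapsed += transfer_s     # remaining inputs pay the next-tier hop
    return total

# Illustrative two-tier example: an edge segment forwarding to a cloud segment.
edge = (0.010, transfer_time(bytes_out=500_000, bandwidth_bps=10e6, serde_s=0.002))
cloud = (0.004, 0.0)
lat = expected_latency([edge, cloud], exit_probs=[0.6, 0.4])
```

A joint optimizer in this spirit would search over partition points, exit placements, and tier assignments to minimize such an expected latency, rather than tuning partitioning and early exiting independently.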