Multimodal Remote Inference

Source: arXiv. Submitted to IEEE/ACM Transactions on Networking; a preliminary version appeared in the Proceedings of the 22nd IEEE International Conference on Mobile Ad-Hoc and Smart Systems (MASS 2025).

We consider a remote inference system with multiple modalities, where a multimodal machine learning (ML) model performs real-time inference using features collected from remote sensors. When sensor observations evolve dynamically over time, fresh features are critical for inference tasks. However, timely delivery of features from all modalities is often infeasible under limited network resources. To address this challenge, we formulate a multimodal scheduling problem to minimize the ML model's inference error. We model this error as a general function of the Age of Information (AoI) vector, where AoI quantifies data freshness. We cast the problem as a semi-Markov decision process (SMDP) and derive an equivalent reformulation with a reduced state set. We then show that the problem has fundamentally different chain structures in the two-modality and multi-modality cases. For the two-modality case, we prove that the optimal policy has an index-based threshold structure. For the general multi-modality case (i.e., with more than two modalities), we develop the optimal error-aware switching-and-transmission policy (EAST), which is computed using a multichain policy iteration algorithm (MPI). To further reduce complexity, we also develop two low-complexity policies under special settings: the error-aware transmission policy (EAT) and the fixed threshold policy (FT). Numerical results from three case studies show that the proposed policies outperform several simple heuristics, including round-robin, greedy, and uniform random policies. In particular, EAST reduces the inference error by up to 44.8% compared with the best baseline in each case. In the five-modality case, EAT and FT reduce computation time by 6.6$\times$ and 3000$\times$, respectively, relative to EAST, while increasing the inference error by 20.2% and 38.6%, respectively.
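To make the scheduling setup concrete, the following is a minimal toy sketch, not the paper's model or any of its policies (EAST, EAT, FT). It assumes slotted time, one modality transmitted per slot, a one-slot transmission time, and a placeholder linear error function of the AoI vector. The abstract instead treats a general error function in an SMDP setting. All names (`step_aoi`, `make_linear_error`, `average_error`) and the weights are hypothetical. The sketch only illustrates how an AoI vector evolves under two of the baseline heuristics named above, round-robin and greedy.

```python
from typing import Callable, List, Sequence

ErrorFn = Callable[[Sequence[int]], float]
Policy = Callable[[int, Sequence[int], ErrorFn], int]


def step_aoi(aoi: Sequence[int], scheduled: int) -> List[int]:
    """Advance one slot: the scheduled modality's AoI resets to 1,
    every other modality's AoI grows by 1 (assumed one-slot transmission)."""
    return [1 if m == scheduled else age + 1 for m, age in enumerate(aoi)]


def make_linear_error(weights: Sequence[float]) -> ErrorFn:
    """Placeholder error model (an assumption): weighted sum of the AoI values."""
    def error(aoi: Sequence[int]) -> float:
        return sum(w * a for w, a in zip(weights, aoi))
    return error


def round_robin(t: int, aoi: Sequence[int], error: ErrorFn) -> int:
    """Cycle through the modalities in a fixed order."""
    return t % len(aoi)


def greedy(t: int, aoi: Sequence[int], error: ErrorFn) -> int:
    """Myopic rule: pick the modality whose update minimizes next-slot error."""
    return min(range(len(aoi)), key=lambda m: error(step_aoi(aoi, m)))


def average_error(policy: Policy, error: ErrorFn,
                  num_modalities: int, horizon: int) -> float:
    """Time-averaged error over `horizon` slots, starting from AoI = 1 everywhere."""
    aoi = [1] * num_modalities
    total = 0.0
    for t in range(horizon):
        aoi = step_aoi(aoi, policy(t, aoi, error))
        total += error(aoi)
    return total / horizon


if __name__ == "__main__":
    err = make_linear_error([3.0, 1.0])  # hypothetical weights
    print("round-robin:", average_error(round_robin, err, 2, 1000))
    print("greedy:     ", average_error(greedy, err, 2, 1000))
```

Even in this toy model, the myopic greedy rule is not guaranteed to beat round-robin, which gives some intuition for why the abstract formulates the problem as a decision process over the AoI state rather than relying on one-step heuristics.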
