High-Speed Compute-Efficient Bandit Learning for Many Arms
Abstract:
Multi-armed bandits (MABs) are online machine learning algorithms that identify the optimal arm, without prior statistical knowledge, by balancing exploration and exploitation. Both the regret (the standard performance metric) and the computational complexity of MAB algorithms degrade as the number of arms, K, increases. In applications such as wireless communication, radar systems, and sensor networks, K, i.e., the number of antennas, beams, bands, etc., is expected to be large. In this work, we consider a focused-exploration-based MAB, which outperforms conventional MABs for large K, and its mapping onto multiple edge processors and a multiprocessor system-on-chip (MPSoC) via hardware-software co-design (HSCD) and fixed-point (FP) analysis. For a large K = 100, the proposed architecture offers a 67% reduction in average cumulative regret, an 84% reduction in execution time on an edge processor, a 97% reduction in execution time using an FPGA-based accelerator, and 10% savings in resources over state-of-the-art MABs.
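The exploration-exploitation tradeoff the abstract refers to can be illustrated with the classic UCB1 index policy; this is a generic sketch for intuition, not the focused-exploration algorithm proposed in this work. Arm means and the deterministic reward model below are illustrative assumptions.

```python
import math

def ucb1(means, horizon):
    """UCB1 on a toy bandit where pulling arm i always yields reward means[i].
    Returns the per-arm pull counts after `horizon` rounds."""
    k = len(means)
    counts = [0] * k       # number of times each arm was pulled
    values = [0.0] * k     # running mean reward per arm
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1    # initialization: pull each arm once
        else:
            # index = empirical mean + exploration bonus that shrinks with pulls
            arm = max(range(k),
                      key=lambda i: values[i] + math.sqrt(2 * math.log(t) / counts[i]))
        reward = means[arm]            # deterministic reward, for illustration
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]
    return counts

counts = ucb1([0.1, 0.5, 0.9], 1000)
```

After 1000 rounds the best arm (mean 0.9) dominates the pull counts, while each suboptimal arm is pulled only on the order of log(horizon) times; this logarithmic exploration cost grows with K, which is the scaling problem the paper addresses.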