Cloud Resource Allocation via Multi-Agent Reinforcement Learning and Amortized Winner Determination

Authors

  • Muhammad Adnan Khan, University of Engineering and Technology, Pakistan
  • Zeshan Iqbal, University of Engineering and Technology, Pakistan
  • Saba Iqbal, University of Wah, Wah Cantt, Pakistan

DOI:

https://doi.org/10.61453/joit.v2026_0104

Keywords:

Cloud Computing, Resource Allocation, Multi-Agent Reinforcement Learning, Combinatorial Auctions, Winner Determination Problem

Abstract

Dynamic cloud resource allocation, particularly for heterogeneous resource bundles (combinatorial requests), presents a significant challenge, constrained by the computational intractability of the Winner Determination Problem (WDP), which is NP-hard. This paper introduces a unified framework that integrates Multi-Agent Deep Reinforcement Learning (MADRL) with an Amortized Winner Determination policy to achieve real-time, equitable, and cost-efficient cloud orchestration. Cloud brokers are modeled as decentralized Proximal Policy Optimization (PPO) agents that learn bidding strategies, while a central Auctioneer Agent uses a neural network (a learned WDP solver) to rapidly approximate the combinatorial matching task. Learning is guided by a multi-objective reward function that explicitly balances cost minimization, social welfare, and equitable resource distribution, the last quantified using Jain’s Fairness Index. Empirical evaluation in the CloudSim simulation environment shows clear advantages over traditional heuristic and exact solvers. Compared with static baselines, the MADRL framework achieved the lowest total cost (65.57) and substantially higher fairness (Jain’s Index 0.929). The amortized solver also maintained high social welfare (average 1285), close to the optimum found by Integer Linear Programming (ILP) (average 1310), while running in 40–150 milliseconds, orders of magnitude faster than ILP. This enables the system to operate effectively in dynamic, near-real-time cloud marketplaces. This integration validates amortized combinatorial optimization as a promising pathway to scalable, autonomous, and economically sound resource management.
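The abstract does not spell out the WDP or the fairness metric. As a minimal sketch, assuming a single-unit combinatorial auction in which each bid names a bundle of indivisible resource items and a price, the Python below contrasts exhaustive winner determination (standing in for the exact ILP baseline) with a price-per-item greedy heuristic, and scores each allocation with Jain’s Fairness Index, J = (Σ xᵢ)² / (n · Σ xᵢ²). The bid data, the greedy rule, the items-won fairness measure (`items_won`), and the weighted-sum `reward` with its weights are all hypothetical illustrations; this is not the authors’ learned (amortized) solver or their reward function.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List, Tuple

# Hypothetical bid format: (bidder_id, bundle of indivisible resource items, price).
Bid = Tuple[str, FrozenSet[str], float]

# Hypothetical toy instance: four brokers bidding on CPU and memory slots.
BIDDERS = ["A", "B", "C", "D"]
BIDS: List[Bid] = [
    ("A", frozenset({"cpu1", "mem1"}), 10.0),
    ("B", frozenset({"cpu1"}), 6.0),
    ("C", frozenset({"mem1"}), 6.0),
    ("D", frozenset({"cpu2", "mem1"}), 7.0),
]


def is_feasible(bids: Iterable[Bid]) -> bool:
    """Accepted bids are feasible if no resource item is allocated twice."""
    used: set = set()
    for _, bundle, _ in bids:
        if used & bundle:
            return False
        used |= bundle
    return True


def exact_wdp(bids: List[Bid]) -> Tuple[float, List[Bid]]:
    """Exhaustive winner determination: maximise social welfare (sum of accepted
    prices) over every feasible subset. The 2^n enumeration stands in for an
    exact ILP solver and shows why exact WDP scales poorly."""
    best_value, best_set = 0.0, []
    for k in range(1, len(bids) + 1):
        for subset in combinations(bids, k):
            if is_feasible(subset):
                value = sum(price for _, _, price in subset)
                if value > best_value:
                    best_value, best_set = value, list(subset)
    return best_value, best_set


def greedy_wdp(bids: List[Bid]) -> Tuple[float, List[Bid]]:
    """Heuristic baseline (an assumption, not the paper's): accept bids in
    descending price-per-item order, skipping any that conflict."""
    accepted: List[Bid] = []
    used: set = set()
    for bid in sorted(bids, key=lambda b: b[2] / len(b[1]), reverse=True):
        if not (used & bid[1]):
            accepted.append(bid)
            used |= bid[1]
    return sum(price for _, _, price in accepted), accepted


def items_won(bidders: List[str], winners: List[Bid]) -> List[float]:
    """Hypothetical per-bidder allocation measure: number of items won."""
    counts = {b: 0.0 for b in bidders}
    for bidder, bundle, _ in winners:
        counts[bidder] += len(bundle)
    return [counts[b] for b in bidders]


def jain_index(x: List[float]) -> float:
    """Jain's Fairness Index (sum x)^2 / (n * sum x^2); 1.0 means equal shares."""
    if not x:
        return 0.0
    squares = sum(v * v for v in x)
    if squares == 0:
        return 0.0
    return sum(x) ** 2 / (len(x) * squares)


def reward(cost: float, welfare: float, fairness: float,
           w_cost: float = 1.0, w_welfare: float = 1.0, w_fair: float = 1.0) -> float:
    """Hypothetical weighted-sum scalarisation of the three objectives;
    the weights are placeholders, not values from the paper."""
    return -w_cost * cost + w_welfare * welfare + w_fair * fairness


if __name__ == "__main__":
    for name, solver in (("exact", exact_wdp), ("greedy", greedy_wdp)):
        welfare, winners = solver(BIDS)
        fairness = jain_index(items_won(BIDDERS, winners))
        print(name, welfare, [w[0] for w in winners], round(fairness, 3))
```

On this toy instance the exhaustive search accepts bids B and D (welfare 13.0), while the greedy rule commits to the two single-item bids B and C (welfare 12.0). The greedy allocation happens to score slightly higher on Jain’s index (0.5 versus 0.45), a small example of the welfare and fairness tension that a multi-objective reward is meant to balance.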

References

Calheiros, R. N., Ranjan, R., Beloglazov, A., De Rose, C. A., & Buyya, R. (2011). CloudSim: A toolkit for modeling and simulation of cloud computing environments and evaluation of resource provisioning algorithms. Software: Practice and Experience, 41(1), 23–50. https://doi.org/10.1002/spe.995

Ghodsi, A., Zaharia, M., Hindman, B., Konwinski, A., Shenker, S., & Stoica, I. (2011). Dominant resource fairness: Fair allocation of multiple resource types. In Proceedings of the 8th USENIX conference on Networked systems design and implementation (NSDI'11) (pp. 323–336). https://dl.acm.org/doi/10.5555/1972457.1972490

Jain, R., Chiu, D., & Hawe, W. (1984). A quantitative measure of fairness and discrimination for resource allocation in shared computer systems. DEC Research Report TR-301.

Li, Q. (2023). A Truthful dynamic combinatorial double auctions for cloud resource allocation. Journal of Cloud Computing, 12(1), 45. https://doi.org/10.1186/s13677-023-00420-y

Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., ... & Hassabis, D. (2015). Human-level control through deep reinforcement learning. Nature, 518(7540), 529–533. https://doi.org/10.1038/nature14236

Schulman, J., Wolski, F., Dhariwal, P., Radford, A., & Klimov, O. (2017). Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347. https://doi.org/10.48550/arXiv.1707.06347

Shi, W., Zhang, J., & Liang, Z. (2019). Deep reinforcement learning for resource management in networked systems: A survey. IEEE Communications Surveys & Tutorials, 21(3), 3135–3157. https://doi.org/10.3390/s22083031

Smith, V. L. (2006). Combinatorial auctions (Vol. 1, No. 0). P. C. Cramton, Y. Shoham, & R. Steinberg (Eds.). Cambridge: MIT press. https://dl.acm.org/doi/10.5555/1076465

Sutton, R. S., & Barto, A. G. (1998). Reinforcement learning: An introduction (Vol. 1, No. 1, pp. 9–11). Cambridge: MIT press. https://doi.org/10.1017/S0263574799271172

Zhang, A., et al. (2022). Incorporating fairness into reinforcement learning for resource allocation. IEEE Transactions on Network and Service Management, 19(2), 1254–1267.

Zhou, T., Li, Y., Wang, X., Gao, H., & Zhang, B. (2024). Deep reinforcement learning for job scheduling and resource management in cloud computing: An algorithm-level review. ACM Computing Surveys, 56(3), 1–38. https://doi.org/10.48550/arXiv.2501.01007

Published

2026-03-11

How to Cite

Khan, M. A., Iqbal, Z., & Iqbal, S. (2026). Cloud Resource Allocation via Multi-Agent Reinforcement Learning and Amortized Winner Determination. Journal of Innovation and Technology, 2026(1), 33–39. https://doi.org/10.61453/joit.v2026_0104