Cloud Resource Allocation via Multi-Agent Reinforcement Learning and Amortized Winner Determination
DOI:
https://doi.org/10.61453/joit.v2026_0104
Keywords:
Cloud Computing, Resource Allocation, Multi-Agent Reinforcement Learning, Combinatorial Auctions, Winner Determination Problem
Abstract
Dynamic cloud resource allocation, particularly involving heterogeneous resource bundles (combinatorial requests), presents a significant challenge constrained by the computational intractability of the Winner Determination Problem (WDP), which is classified as NP-hard. This paper introduces a unified framework integrating Multi-Agent Deep Reinforcement Learning (MADRL) with an Amortized Winner Determination policy to achieve real-time, equitable, and cost-efficient cloud orchestration. Cloud brokers are modeled as decentralized Proximal Policy Optimization (PPO) agents learning bidding strategies, while a central Auctioneer Agent utilizes a neural network (learned WDP solver) to quickly approximate the complex combinatorial matching task. The learning process is guided by a multi-objective reward function explicitly balancing cost minimization, social welfare, and equitable resource distribution, quantified using Jain’s Fairness Index. Empirical evaluation, conducted in the CloudSim simulation environment, demonstrates significant advantages over traditional heuristic and exact solvers. The MADRL framework achieved the lowest total cost (65.57) and dramatically superior fairness (Jain’s Index 0.929) compared to static baselines. Furthermore, the amortized solver maintained high social welfare (averaging 1285) near the theoretical maximum of Integer Linear Programming (ILP) (averaging 1310), but with a computational runtime (40–150 milliseconds) that is orders of magnitude faster, enabling the system to operate effectively in dynamic, near real-time cloud marketplaces. This integration validates amortized combinatorial optimization as a promising pathway to scalable, autonomous, and economically sound resource management.
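As a rough illustration of two quantities named in the abstract, the sketch below computes Jain's Fairness Index, J = (Σx)² / (n·Σx²), and solves a toy Winner Determination Problem by exhaustive search. It is not the paper's implementation: the function names `jains_index` and `solve_wdp_exact`, the bid format, and the sample bids are all hypothetical, and the brute-force search merely stands in for the exact baseline that the learned solver is described as approximating.

```python
from itertools import combinations


def jains_index(allocations):
    """Jain's Fairness Index: (sum x)^2 / (n * sum x^2).

    Equals 1.0 when every agent receives the same share and 1/n when a
    single agent receives everything.
    """
    n = len(allocations)
    total = sum(allocations)
    squares = sum(x * x for x in allocations)
    if n == 0 or squares == 0:
        return 0.0  # assumption: treat an empty or all-zero allocation as 0
    return (total * total) / (n * squares)


def solve_wdp_exact(bids):
    """Brute-force WDP on a tiny instance (hypothetical bid format).

    bids: list of (bundle, value) pairs, where bundle is a frozenset of
    resource ids. Returns the maximum total value over sets of bids whose
    bundles do not overlap, plus the indices of the winning bids.
    The search is exponential in len(bids), which is why an approximate
    solver is attractive at scale.
    """
    best_value, best_set = 0.0, ()
    for k in range(1, len(bids) + 1):
        for subset in combinations(range(len(bids)), k):
            used = set()
            feasible = True
            for i in subset:
                bundle = bids[i][0]
                if used & bundle:  # a resource would be sold twice
                    feasible = False
                    break
                used |= bundle
            if feasible:
                value = sum(bids[i][1] for i in subset)
                if value > best_value:
                    best_value, best_set = value, subset
    return best_value, best_set


# Illustrative bids only; these are not data from the paper.
bids = [
    (frozenset({"cpu", "ram"}), 10.0),
    (frozenset({"ram", "disk"}), 8.0),
    (frozenset({"disk"}), 5.0),
    (frozenset({"cpu"}), 4.0),
]
welfare, winners = solve_wdp_exact(bids)
print(welfare, winners)
print(jains_index([1, 1, 1, 1]), jains_index([4, 0, 0, 0]))
```

In this toy instance the first and third bids win together, since their bundles are disjoint and their combined value exceeds that of any other feasible set.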
References
Calheiros, R. N., Ranjan, R., Beloglazov, A., De Rose, C. A., & Buyya, R. (2011). CloudSim: A toolkit for modeling and simulation of cloud computing environments and evaluation of resource provisioning algorithms. Software: Practice and Experience, 41(1), 23-50. https://doi.org/10.1002/spe.995
Ghodsi, A., Zaharia, M., Hindman, B., Konwinski, A., Shenker, S., & Stoica, I. (2011). Dominant resource fairness: Fair allocation of multiple resource types. In Proceedings of the 8th USENIX conference on Networked systems design and implementation (NSDI'11) (pp. 323-336). https://dl.acm.org/doi/10.5555/1972457.1972490
Jain, R., Chiu, D., & Hawe, W. (1984). A quantitative measure of fairness and discrimination for resource allocation in shared computer systems. DEC Research Report TR-301.
Li, Q. (2023). A Truthful dynamic combinatorial double auctions for cloud resource allocation. Journal of Cloud Computing, 12(1), 45. https://doi.org/10.1186/s13677-023-00420-y
Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., ... & Hassabis, D. (2015). Human-level control through deep reinforcement learning. Nature, 518(7540), 529–533. https://doi.org/10.1038/nature14236
Schulman, J., Wolski, F., Dhariwal, P., Radford, A., & Klimov, O. (2017). Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347. https://doi.org/10.48550/arXiv.1707.06347
Shi, W., Zhang, J., & Liang, Z. (2019). Deep reinforcement learning for resource management in networked systems: A survey. IEEE Communications Surveys & Tutorials, 21(3), 3135–3157. https://doi.org/10.3390/s22083031
Smith, V. L. (2006). Combinatorial auctions (Vol. 1, No. 0). P. C. Cramton, Y. Shoham, & R. Steinberg (Eds.). Cambridge: MIT Press. https://dl.acm.org/doi/10.5555/1076465
Sutton, R. S., & Barto, A. G. (1998). Reinforcement learning: An introduction (Vol. 1, No. 1, pp. 9-11). Cambridge: MIT Press. https://doi.org/10.1017/S0263574799271172
Zhang, A., & Others. (2022). Incorporating fairness into reinforcement learning for resource allocation. IEEE Transactions on Network and Service Management, 19(2), 1254-1267.
Zhou, T., Li, Y., Wang, X., Gao, H., & Zhang, B. (2024). Deep reinforcement learning for job scheduling and resource management in cloud computing: An algorithm-level review. ACM Computing Surveys, 56(3), 1-38. https://doi.org/10.48550/arXiv.2501.01007
License
Copyright (c) 2026 Journal of Innovation and Technology

This work is licensed under a Creative Commons Attribution 4.0 International License.