Vol. 5 No. 12 (2025)
Articles

Performance-Aware Lifecycle Framework for Deployment of Large Language Models in Cloud-Native and Serverless Environments

Dilnoza Zubayd qizi Ismoilova
Assistant of the Department of Medical and Biological Chemistry, Bukhara State Medical Institute, Uzbekistan

Published 2025-12-17

Keywords

  • Large Language Models
  • Cloud-native deployment
  • Serverless computing
  • Scalability benchmarking

How to Cite

Dilnoza Zubayd qizi Ismoilova. (2025). Performance-Aware Lifecycle Framework for Deployment of Large Language Models in Cloud-Native and Serverless Environments. Stanford Database Library of American Journal of Applied Science and Technology, 5(12), 136–141. Retrieved from https://oscarpubhouse.com/index.php/sdlajast/article/view/62

Abstract

Large Language Models (LLMs) have rapidly become central components of contemporary artificial intelligence applications, offering sophisticated natural language understanding, generation, and decision-making capabilities. However, their deployment at scale, especially within cloud-native and serverless infrastructures, poses significant challenges for performance, scalability, cost-efficiency, and lifecycle management. Existing literature offers detailed surveys of LLM capabilities, optimization techniques, and deep learning model compression (Hadi et al., 2023; Raiaan et al., 2024; Patil & Gudivada, 2024; Menghani, 2023), as well as broader treatments of cloud-native architectures, serverless latency, and machine learning (ML) lifecycle orchestration (Henning & Hasselbring, 2022; Golec et al., 2024; Ashmore et al., 2021; Kodakandla, 2021; Buyya et al., 2018; Nigenda et al., 2022). Yet little work integrates these threads into a unified deployment and evaluation framework tailored to LLM-driven services. In this article, we propose a comprehensive, performance-aware lifecycle framework for LLM deployment in cloud-native and serverless environments. The framework systematically addresses scalability benchmarking, resource optimization, latency mitigation (particularly cold-start issues), continuous testing and monitoring, cost optimization, and compliance with ML lifecycle best practices. We elaborate on the theoretical underpinnings of the framework, describe a methodology for its adoption, present hypothetical results illustrating potential gains, discuss limitations, and outline avenues for future research. Our goal is to equip ML engineers, system architects, and academic researchers with a cohesive, practical, and theoretically grounded guide for deploying LLM-based systems under real-world constraints.
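The cold-start latency and scalability-benchmarking concerns raised in the abstract can be illustrated with a minimal, self-contained sketch. The snippet below is not from the article: the class name `ServerlessLLMEndpoint`, the simulated load time, and the `benchmark` helper are all illustrative assumptions. It models a serverless function that lazily loads a heavyweight model on first invocation, then separates the first (cold) request latency from subsequent warm-request latencies, which is the basic measurement a cold-start mitigation strategy would target.

```python
import time


class ServerlessLLMEndpoint:
    """Toy stand-in for a serverless LLM inference function.

    The expensive model load is deferred until the first request,
    mimicking the cold-start penalty discussed in the abstract.
    All names and timings here are illustrative, not from the paper.
    """

    def __init__(self, load_seconds: float = 0.2):
        self._model = None
        self._load_seconds = load_seconds

    def _ensure_model(self) -> None:
        # Lazy initialization: simulate loading model weights once.
        if self._model is None:
            time.sleep(self._load_seconds)
            self._model = object()

    def invoke(self, prompt: str) -> float:
        """Handle one request and return its end-to-end latency in seconds."""
        start = time.perf_counter()
        self._ensure_model()
        _ = prompt.upper()  # stand-in for token generation
        return time.perf_counter() - start


def benchmark(endpoint: ServerlessLLMEndpoint, prompts: list[str]):
    """Return (cold_latency, mean_warm_latency) for a request sequence."""
    latencies = [endpoint.invoke(p) for p in prompts]
    cold, warm = latencies[0], latencies[1:]
    return cold, sum(warm) / len(warm)
```

In this toy setup the cold invocation is dominated by the simulated weight load, while warm invocations are orders of magnitude faster; keep-warm pings or provisioned capacity, as surveyed by Golec et al. (2024), aim to move user traffic off the cold path entirely.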

References

  1. Hadi, M.U., et al. (2023). Large language models: a comprehensive survey of their applications, challenges, limitations, and future prospects. Authorea Preprints, 1, 1–26.
  2. Raiaan, M.A.K., et al. (2024). A review on large language models: Architectures, applications, taxonomies, open issues and challenges. IEEE Access, 12, 26839–26874.
  3. Patil, R. & Gudivada, V. (2024). A review of current trends, techniques, and challenges in large language models (LLMs). Applied Sciences, 14(5), 2074.
  4. Henning, S. & Hasselbring, W. (2022). A configurable method for benchmarking scalability of cloud-native applications. Empirical Software Engineering, 27(6), 143.
  5. Menghani, G. (2023). Efficient deep learning: A survey on making deep learning models smaller, faster, and better. ACM Computing Surveys, 55(12), 1–37.
  6. Almutawa, M., Ghabrah, Q., & Canini, M. (2024). Towards LLM-Assisted System Testing for Microservices. In 2024 IEEE 44th International Conference on Distributed Computing Systems Workshops (ICDCSW). IEEE.
  7. Silva, L.C., et al. (2020). Benchmarking machine learning solutions in production. In 2020 19th IEEE International Conference on Machine Learning and Applications (ICMLA). IEEE.
  8. Malakar, P., et al. (2018). Benchmarking machine learning methods for performance modeling of scientific applications. In 2018 IEEE/ACM Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS). IEEE.
  9. Golec, M., et al. (2024). Cold start latency in serverless computing: A systematic review, taxonomy, and future directions. ACM Computing Surveys, 57(3), 1–36.
  10. John, B. (2025). Comparative Analysis of Data Lakes and Data Warehouses for Machine Learning Workflows: Architecture, Performance, and Scalability Considerations.
  11. Jabbar, Q.A.Z., et al. (2022). Using GPU and TPU Hardware Accelerators to Develop a Cloud-Based Genetic Algorithm System. In International Conference on Signals, Machines, and Automation. Springer.
  12. Anderson, K. (2022). Automating Machine Learning Pipelines: CI/CD Implementation on AWS.
  13. Al-Nouti, A.F., Fu, M., & Bokde, N.D. (2024). Reservoir operation based machine learning models: comprehensive review for limitations, research gap, and possible future research direction. Knowledge-Based Engineering and Sciences, 5(2), 75–139.
  14. Ashmore, R., Calinescu, R., & Paterson, C. (2021). Assuring the machine learning lifecycle: Desiderata, methods, and challenges. ACM Computing Surveys, 54(5), 1–39.
  15. Kodakandla, N. (2021). Serverless Architectures: A Comparative Study of Performance, Scalability, and Cost in Cloud-native Applications. Iconic Research And Engineering Journals, 5(2), 136–150.
  16. Bagai, R. (2024). Comparative analysis of AWS model deployment services. arXiv preprint.
  17. Borra, P. (2024). Advancing Artificial Intelligence with AWS Machine Learning: A Comprehensive Overview. International Journal of Advanced Research in Science, Communication and Technology.
  18. Bhardwaj, P. (2025). The Impact of Serverless Computing on Cost Optimization.
  19. Buyya, R., et al. (2018). A manifesto for future generation cloud computing: Research directions for the next decade. ACM Computing Surveys, 51(5), 1–38.
  20. Nigenda, D., et al. (2022). Amazon SageMaker model monitor: A system for real-time insights into deployed machine learning models. Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining.
  21. Joshi, P.K. (2025). CI/CD Automation for Payment Gateways: Azure vs. AWS.
  22. Chandra, R. (2025). Design and implementation of scalable test platforms for LLM deployments. Journal of Electrical Systems, 21(1s), 578–590.