Recent advances in natural language processing demonstrate the capability of large-scale language models (such as GPT-3) to solve a variety of NLP problems in a zero-shot setting, shifting the paradigm from supervised fine-tuning to prompt engineering and prompt tuning. However, building large language models raises challenges in data preparation, training, and deployment. In addition, while this process is well established for a few dominant languages such as English, its execution for localized languages remains limited. We'll give an overview of the end-to-end process for building large-scale language models, discuss the challenges of scaling, and describe existing solutions for efficient data preparation, distributed training, model optimization, and distributed deployment. We'll use examples in localized languages such as French and Spanish with NVIDIA NeMo Megatron, a framework for training large NLP models that is optimized for SuperPOD hardware infrastructure.
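
To make the shift from supervised fine-tuning to prompting concrete, below is a minimal sketch of zero-shot prompting. It uses the Hugging Face transformers library and a small public GPT-2 checkpoint purely for illustration; neither is part of the NeMo Megatron workflow described above, where a much larger multilingual model would be served instead.

```python
# Minimal zero-shot prompting sketch (illustrative assumptions: the
# Hugging Face "transformers" library and the public "gpt2" checkpoint,
# neither of which is mentioned in the abstract).
from transformers import pipeline

# Load a small generative model; in practice this would be a large
# multilingual model trained with NeMo Megatron.
generator = pipeline("text-generation", model="gpt2")

# Instead of fine-tuning on labeled sentiment data, the task is expressed
# directly in the prompt.
prompt = (
    "Classify the sentiment of the following review as positive or negative.\n"
    "Review: The battery life of this laptop is outstanding.\n"
    "Sentiment:"
)

# Greedy decoding of a few tokens; the continuation acts as the prediction.
output = generator(prompt, max_new_tokens=5, do_sample=False)
print(output[0]["generated_text"])
```

The same pattern extends to other tasks (translation, question answering, summarization) simply by rewording the prompt, which is why prompt engineering can replace per-task fine-tuning once the underlying model is large enough.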