Machine Learning (ML) and AI, and Deep Learning (DL) in particular, have advanced rapidly in recent years. Yet most practitioners still work on limited-scale setups, often with hidden performance trade-offs. Despite offering massive compute power, large-scale parallelism, and tightly optimized hardware, High-Performance Computing (HPC) has traditionally been the domain of computer scientists, engineers, and physicists, and has not been widely adopted by the ML community. This training aims to bridge that gap: it shows how to design standard DL models such as Convolutional Neural Networks (CNNs) and run them efficiently on HPC systems, including using multiple GPUs in parallel, running jobs through SLURM, optimizing training speed and resource use, and exporting and testing models in a reproducible way.
The training is divided into four sessions (see the agenda below), each combining theory with practical hands-on work. For the hands-on parts, participants will use the MeluXina supercomputer.
Who should attend?
This training is designed for researchers, engineers, and data scientists working with ML who want to scale their DL models.
What will you learn?
Participants will learn how CNNs are structured, trained, and applied in practice, with particular emphasis on the HPC infrastructure relevant to DL workflows. They will also learn how to scale training across multiple GPUs with PyTorch DDP.
Training outcomes
By the end of the course, the participants will be able to:
○ Design and train CNNs using either PyTorch or TensorFlow
○ Run DL jobs on GPU-based HPC systems via SLURM
○ Apply Distributed Data Parallel (DDP) to scale training across multiple GPUs
○ Tune training performance to improve throughput and resource efficiency
○ Export models and perform inference testing outside the training loop
Prerequisites
○ Python programming
○ Basic ML knowledge
○ Familiarity with the Linux shell
GPU Compute Resources
During the training, participants will have access to the MeluXina supercomputer. For more information about MeluXina, please refer to the MeluXina overview and the MeluXina – Getting Started Guide. Communication will take place via Zoom and email. All training content (presentations and code) will be provided in advance on GitHub, and the link will be shared in the training confirmation email.
Agenda
This half-day course will be hosted online on 10 June 2025, from 1:00 PM to 5:00 PM Central European Time (CET).
Session 1. Introduction to Convolutional Neural Networks (CNNs)
1. What CNNs are and how they evolved
2. CNN architecture types and design
3. Use cases
4. Challenges
Exercise: Designing a simple CNN in PyTorch or TensorFlow
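As a preview of this exercise, a minimal CNN in PyTorch might look like the sketch below. It is only an illustration: the input size (32x32 RGB images, i.e. CIFAR-10-sized) and the layer widths are assumptions, not the exact model used in the session.

    import torch
    import torch.nn as nn

    class SimpleCNN(nn.Module):
        """A small CNN for 32x32 RGB inputs (illustrative sizes)."""
        def __init__(self, num_classes: int = 10):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(3, 32, kernel_size=3, padding=1),   # 3 -> 32 channels
                nn.ReLU(),
                nn.MaxPool2d(2),                              # 32x32 -> 16x16
                nn.Conv2d(32, 64, kernel_size=3, padding=1),  # 32 -> 64 channels
                nn.ReLU(),
                nn.MaxPool2d(2),                              # 16x16 -> 8x8
            )
            self.classifier = nn.Linear(64 * 8 * 8, num_classes)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return self.classifier(torch.flatten(self.features(x), 1))

    model = SimpleCNN()
    print(model(torch.randn(1, 3, 32, 32)).shape)  # torch.Size([1, 10])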
Session 2. HPC fundamentals for Deep Learning
1. Essentials of HPC architecture (compute nodes, schedulers such as SLURM)
2. GPU vs CPU for Deep Learning
3. Storage and I/O bottlenecks
4. Scaling strategies
Exercise: SLURM job submission for training a DL model (CPU vs GPU)
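A hedged sketch of what such a submission script could look like is shown below; the partition, module, and script names are placeholders, and the actual MeluXina values are covered in the session.

    #!/bin/bash -l
    #SBATCH --job-name=cnn-train
    #SBATCH --nodes=1
    #SBATCH --ntasks=1
    #SBATCH --gpus-per-task=1      # request one GPU; omit for the CPU comparison run
    #SBATCH --time=00:30:00
    #SBATCH --partition=gpu        # placeholder partition name

    module load PyTorch            # placeholder; the actual MeluXina software stack is shown in the session

    srun python train_cnn.py       # train_cnn.py: the exercise training script

The script is submitted with sbatch, and comparing the CPU and GPU runs makes the speedup directly visible in the job output.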
Session 3. Scalable CNN training with Distributed Data Parallel on HPC
1. How DDP works
2. DDP training script – Code walkthrough
3. Writing a SLURM script for DDP
Exercise: Running a CNN with Torch DDP
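The core DDP pattern the exercise builds on can be sketched as follows. This is a minimal illustration assuming the script is launched with torchrun (which sets RANK, WORLD_SIZE, MASTER_ADDR, and LOCAL_RANK); it reuses the SimpleCNN from the Session 1 sketch, and synthetic tensors stand in for a real dataset.

    import os
    import torch
    import torch.distributed as dist
    import torch.nn as nn
    from torch.nn.parallel import DistributedDataParallel as DDP
    from torch.utils.data import DataLoader, DistributedSampler, TensorDataset

    # One process per GPU; torchrun provides the rendezvous environment variables.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Synthetic stand-in data; the exercise uses a real dataset.
    dataset = TensorDataset(torch.randn(256, 3, 32, 32), torch.randint(0, 10, (256,)))
    sampler = DistributedSampler(dataset)        # each rank sees a disjoint shard
    loader = DataLoader(dataset, batch_size=32, sampler=sampler)

    model = SimpleCNN().cuda(local_rank)         # the CNN sketched in Session 1
    model = DDP(model, device_ids=[local_rank])  # gradients are all-reduced across ranks
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()

    for epoch in range(2):
        sampler.set_epoch(epoch)                 # reshuffle the shards every epoch
        for x, y in loader:
            x, y = x.cuda(local_rank), y.cuda(local_rank)
            optimizer.zero_grad()
            loss_fn(model(x), y).backward()      # DDP synchronizes gradients here
            optimizer.step()

    dist.destroy_process_group()

Under SLURM, such a script is typically launched via srun or torchrun from a job script like the one written in the previous step.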
Session 4. Optimization and Benchmarking
1. Training optimization and best practices
2. Reproducible model export
3. Post-training inference testing
Exercise: Export the trained model and test inference
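As an illustration of this final step, the state_dict-based pattern below is one common way to export a PyTorch model and test it outside the training loop; the file name and the reuse of SimpleCNN are assumptions made for the sake of the sketch.

    import torch

    # After training: save only the weights, the reproducible and portable option.
    torch.save(model.state_dict(), "cnn_weights.pt")

    # Later, elsewhere: rebuild the architecture and reload the weights.
    model = SimpleCNN()                            # same architecture definition as in training
    model.load_state_dict(torch.load("cnn_weights.pt", map_location="cpu"))
    model.eval()                                   # switch dropout/batch-norm to inference mode

    with torch.no_grad():                          # no gradient tracking needed for inference
        logits = model(torch.randn(1, 3, 32, 32))  # dummy input standing in for real test data
        print(logits.argmax(dim=1))                # predicted class index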
Schedule
1:00 PM – 1:45 PM: Session 1
1:45 PM – 2:30 PM: Session 2
2:30 PM – 2:45 PM: Break
2:45 PM – 3:45 PM: Session 3
3:45 PM – 4:00 PM: Break
4:00 PM – 5:00 PM: Session 4
Important: Limited spots available (25 participants max)!
Contact person for more information:
Aleksandra RANCIC – aleksandra.rancic[at]uni.lu