Machine Learning (ML) and AI, and Deep Learning (DL) in particular, have advanced rapidly in recent years. Yet most practitioners still work on limited-scale setups, often with hidden performance trade-offs. Despite offering massive compute power, large-scale parallelism, and tightly optimized hardware, High-Performance Computing (HPC) has traditionally been the domain of computer scientists, engineers, and physicists, and has not been widely adopted by the ML community. This training aims to bridge that gap: it shows how to design standard DL models such as Convolutional Neural Networks (CNNs) and run them efficiently on HPC systems, including using multiple GPUs in parallel, running jobs through SLURM, optimizing training speed and resource use, and exporting and testing models in a reproducible way.
The training is divided into four sessions (see the agenda below), each combining theory with practical hands-on work. For the hands-on parts, participants will use the MeluXina supercomputer.
Who should attend?
This training is designed for researchers, engineers, and data scientists working with ML who want to scale their DL models.
What will you learn?
Participants will learn how CNNs are structured, trained, and applied in practice, with particular emphasis on the HPC infrastructure relevant to DL workflows. They will also learn how to scale training across multiple GPUs with PyTorch DDP.
Training outcomes
By the end of the course, the participants will be able to:
○ Design and train CNNs using either PyTorch or TensorFlow
○ Run DL jobs on GPU-based HPC systems via SLURM
○ Apply Distributed Data Parallel (DDP) to scale training across multiple GPUs
○ Tune training performance to improve throughput and resource efficiency
○ Export models and perform inference testing outside the training loop
Prerequisites
○ Python programming
○ Basic ML knowledge
○ Familiarity with the Linux shell
GPU Compute Resources
During the training, participants will have access to the MeluXina supercomputer. For more information about MeluXina, please refer to the MeluXina overview and the MeluXina – Getting Started Guide. Communication will take place via Zoom and email. All training content (presentations and code) will be provided in advance on GitHub, and the link will be shared in the training confirmation email.
Agenda
This half-day course will be hosted online on 10 June 2025, from 1:00 PM to 5:00 PM Central European Time (CET).
Session 1. Introduction to Convolutional Neural Networks (CNNs)
1. What CNNs are and how they evolved
2. CNN architecture types and design
3. Use cases
4. Challenges
Exercise: Designing a simple CNN in PyTorch or TensorFlow
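As a preview of this exercise, a minimal CNN in PyTorch might look like the sketch below. It is only an illustration: the input size (32x32 RGB images, i.e. CIFAR-10-sized) and the layer widths are assumptions, not the exact model used in the session.

    import torch
    import torch.nn as nn

    class SimpleCNN(nn.Module):
        """A small CNN for 32x32 RGB inputs (illustrative sizes)."""
        def __init__(self, num_classes: int = 10):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(3, 32, kernel_size=3, padding=1),   # 3 -> 32 channels
                nn.ReLU(),
                nn.MaxPool2d(2),                              # 32x32 -> 16x16
                nn.Conv2d(32, 64, kernel_size=3, padding=1),  # 32 -> 64 channels
                nn.ReLU(),
                nn.MaxPool2d(2),                              # 16x16 -> 8x8
            )
            self.classifier = nn.Linear(64 * 8 * 8, num_classes)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return self.classifier(torch.flatten(self.features(x), 1))

    model = SimpleCNN()
    print(model(torch.randn(1, 3, 32, 32)).shape)  # torch.Size([1, 10])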
Session 2. HPC fundamentals for Deep Learning
1. Essentials of HPC architecture (compute nodes, schedulers such as SLURM)
2. GPU vs CPU for Deep Learning
3. Storage and I/O bottlenecks
4. Scaling strategies
Exercise: SLURM job submission for training a DL model (CPU vs GPU)
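A hedged sketch of what such a submission script could look like is shown below; the partition, module, and script names are placeholders, and the actual MeluXina values are covered in the session.

    #!/bin/bash -l
    #SBATCH --job-name=cnn-train
    #SBATCH --nodes=1
    #SBATCH --ntasks=1
    #SBATCH --gpus-per-task=1      # request one GPU; omit for the CPU comparison run
    #SBATCH --time=00:30:00
    #SBATCH --partition=gpu        # placeholder partition name

    module load PyTorch            # placeholder; the actual MeluXina software stack is shown in the session

    srun python train_cnn.py       # train_cnn.py: the exercise training script

The script is submitted with sbatch, and comparing the CPU and GPU runs makes the speedup directly visible in the job output.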
Session 3. Scalable CNN training with Distributed Data Parallel on HPC
1. How DDP works
2. DDP training script – Code walkthrough
3. Writing a SLURM script for DDP
Exercise: Running a CNN with Torch DDP
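The core DDP pattern the exercise builds on can be sketched as follows. This is a minimal illustration assuming the script is launched with torchrun (which sets RANK, WORLD_SIZE, MASTER_ADDR, and LOCAL_RANK); it reuses the SimpleCNN from the Session 1 sketch, and synthetic tensors stand in for a real dataset.

    import os
    import torch
    import torch.distributed as dist
    import torch.nn as nn
    from torch.nn.parallel import DistributedDataParallel as DDP
    from torch.utils.data import DataLoader, DistributedSampler, TensorDataset

    # One process per GPU; torchrun provides the rendezvous environment variables.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Synthetic stand-in data; the exercise uses a real dataset.
    dataset = TensorDataset(torch.randn(256, 3, 32, 32), torch.randint(0, 10, (256,)))
    sampler = DistributedSampler(dataset)        # each rank sees a disjoint shard
    loader = DataLoader(dataset, batch_size=32, sampler=sampler)

    model = SimpleCNN().cuda(local_rank)         # the CNN sketched in Session 1
    model = DDP(model, device_ids=[local_rank])  # gradients are all-reduced across ranks
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()

    for epoch in range(2):
        sampler.set_epoch(epoch)                 # reshuffle the shards every epoch
        for x, y in loader:
            x, y = x.cuda(local_rank), y.cuda(local_rank)
            optimizer.zero_grad()
            loss_fn(model(x), y).backward()      # DDP synchronizes gradients here
            optimizer.step()

    dist.destroy_process_group()

Under SLURM, such a script is typically launched via srun or torchrun from a job script like the one written in the previous step.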
Session 4. Optimization and Benchmarking
1. Training optimization and best practices
2. Reproducible model export
3. Post-training inference testing
Exercise: Export the trained model and test inference
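As an illustration of this final step, the state_dict-based pattern below is one common way to export a PyTorch model and test it outside the training loop; the file name and the reuse of SimpleCNN are assumptions made for the sake of the sketch.

    import torch

    # After training: save only the weights, the reproducible and portable option.
    torch.save(model.state_dict(), "cnn_weights.pt")

    # Later, elsewhere: rebuild the architecture and reload the weights.
    model = SimpleCNN()                            # same architecture definition as in training
    model.load_state_dict(torch.load("cnn_weights.pt", map_location="cpu"))
    model.eval()                                   # switch dropout/batch-norm to inference mode

    with torch.no_grad():                          # no gradient tracking needed for inference
        logits = model(torch.randn(1, 3, 32, 32))  # dummy input standing in for real test data
        print(logits.argmax(dim=1))                # predicted class index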
Schedule
1:00 PM – 1:45 PM: Session 1
1:45 PM – 2:30 PM: Session 2
2:30 PM – 2:45 PM: Break
2:45 PM – 3:45 PM: Session 3
3:45 PM – 4:00 PM: Break
4:00 PM – 5:00 PM: Session 4
Important: Limited spots available (25 participants max)!
Contact person for more information:
Aleksandra RANCIC – aleksandra.rancic[at]uni.lu