This talk covers best practices and techniques for scaling machine learning workloads to build large-scale models using PyTorch. We share our experience of using PyTorch to train 175-billion- and 1-trillion-parameter models, the different training paradigms involved, and techniques for profiling and troubleshooting that will help you jumpstart your efforts in this space.
Jump to:
00:00 Introduction
00:44 Why is large model training needed?
00:59 Scaling creates training and model efficiency
01:13 Larger models = more efficient, less training, less data
01:24 Larger models can learn with few-shot learning
02:19 Democratizing large-scale language models with OPT-175B
02:51 Challenges of large model training
03:25 What is PyTorch Distributed?
04:20 Features Overview
06:00 DistributedDataParallel
06:53 FullyShardedDataParallel
08:44 FSDP Auto wrapping
09:22 FSDP Auto wrapping example
09:38 FSDP CPU Offload, Backward Prefetch policies
09:46 FSDP Mixed Precision control
09:53 Pipeline
11:06 Example Auto Partitioning
12:26 Pipeline + DDP (PDP)
13:44 Memory Saving Features
13:52 Activation Checkpointing
14:20 Activation Offloading
15:01 Activation Checkpointing & Offloading
15:45 Parameter Offloading
16:15 Memory Saving Feature & Training Paradigms
18:11 Experiments & Insights
18:16 Model Implementation
18:50 Scaling Efficiency: Varying # GPUs
20:57 Scaling Efficiency: Varying World Size
22:07 Scaling Efficiency: Varying Batch Size
23:50 Model Scale Limit
24:55 Impact of Network Bandwidth
27:08 Best Practices
28:20 Best Practices FSDP
29:01 Profiling & Troubleshooting
29:08 Profiling & Troubleshooting for Large Scale Model Training
30:35 Uber Prof (Experimental) Profiling & Troubleshooting tool
32:09 Demonstration
34:15 Combining DCGM + Profiling
35:36 Profiling for Large Scale Model Training
36:04 NVIDIA Nsight Multi-Node, Multi-GPU Profiling
36:47 PyTorch Profiler Distributed Training Profiling (single-node, multi-GPU)
37:04 Try it now
37:24 Resources
37:30 Closing Notes
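The chapters on DistributedDataParallel and FullyShardedDataParallel (06:00–09:46) walk through FSDP's auto wrapping, CPU offload, backward prefetch, and mixed precision controls. As a rough companion, here is a minimal sketch (not code from the talk) of how those knobs appear on the FSDP constructor in recent PyTorch releases; the toy model, sizes, and learning rate are placeholder assumptions, and it assumes a multi-GPU node launched with torchrun.

```python
# Hedged sketch of FSDP with auto wrapping, CPU offload, backward prefetch,
# and mixed precision. Assumes: torchrun --nproc_per_node=<num_gpus> train_fsdp.py
# on a CUDA-enabled PyTorch build; the model and sizes are placeholders, not the talk's model.
import functools
import os

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.distributed.fsdp import (
    BackwardPrefetch,
    CPUOffload,
    FullyShardedDataParallel as FSDP,
    MixedPrecision,
)
from torch.distributed.fsdp.wrap import size_based_auto_wrap_policy


def main():
    dist.init_process_group("nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Toy stand-in for a large transformer stack.
    model = nn.Sequential(
        nn.Linear(1024, 4096), nn.ReLU(), nn.Linear(4096, 1024)
    ).cuda()

    fsdp_model = FSDP(
        model,
        # Auto wrapping: shard submodules above ~100k parameters as separate FSDP units.
        auto_wrap_policy=functools.partial(
            size_based_auto_wrap_policy, min_num_params=100_000
        ),
        # CPU offload: keep sharded parameters on CPU between uses (trades speed for memory).
        cpu_offload=CPUOffload(offload_params=True),
        # Backward prefetch: fetch the next unit's parameters during the backward pass.
        backward_prefetch=BackwardPrefetch.BACKWARD_PRE,
        # Mixed precision: bf16 parameters, gradient reductions, and buffers.
        mixed_precision=MixedPrecision(
            param_dtype=torch.bfloat16,
            reduce_dtype=torch.bfloat16,
            buffer_dtype=torch.bfloat16,
        ),
    )

    optimizer = torch.optim.AdamW(fsdp_model.parameters(), lr=1e-4)
    for _ in range(10):
        loss = fsdp_model(torch.randn(8, 1024, device="cuda")).sum()
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```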
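The memory-saving chapters (13:44–16:15) discuss activation checkpointing and offloading. Below is a minimal sketch (not from the talk) of plain activation checkpointing with torch.utils.checkpoint, where each block's activations are recomputed during the backward pass instead of being cached; the Block module and sizes are illustrative assumptions.

```python
# Hedged sketch of activation checkpointing: forward activations inside each
# checkpointed block are discarded and recomputed in backward, trading extra
# compute for lower peak memory. Model and dimensions are placeholders.
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint


class Block(nn.Module):
    def __init__(self, dim: int = 1024):
        super().__init__()
        self.ff = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )

    def forward(self, x):
        return x + self.ff(x)


class CheckpointedStack(nn.Module):
    def __init__(self, num_blocks: int = 8):
        super().__init__()
        self.blocks = nn.ModuleList(Block() for _ in range(num_blocks))

    def forward(self, x):
        for block in self.blocks:
            # Recompute this block's activations in backward instead of caching them.
            x = checkpoint(block, x, use_reentrant=False)
        return x


device = "cuda" if torch.cuda.is_available() else "cpu"
model = CheckpointedStack().to(device)
out = model(torch.randn(4, 1024, device=device, requires_grad=True))
out.sum().backward()
```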
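For the profiling chapters (29:01–37:04), the following is a minimal sketch (not from the talk) of profiling a DistributedDataParallel training loop with torch.profiler on a single node with multiple GPUs; the model, profiling schedule, and log directory are placeholder assumptions.

```python
# Hedged sketch of single-node, multi-GPU profiling of a DDP step with torch.profiler.
# Assumes: torchrun --nproc_per_node=<num_gpus> profile_ddp.py on a CUDA-enabled build;
# traces land in ./log/ddp for viewing with the TensorBoard profiler plugin.
import os

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.profiler import (
    ProfilerActivity,
    profile,
    schedule,
    tensorboard_trace_handler,
)


def main():
    dist.init_process_group("nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = DDP(nn.Linear(1024, 1024).cuda(), device_ids=[local_rank])
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

    with profile(
        activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],
        # Skip 1 step, warm up 1 step, then record 3 active steps.
        schedule=schedule(wait=1, warmup=1, active=3, repeat=1),
        on_trace_ready=tensorboard_trace_handler("./log/ddp"),
        with_stack=True,
    ) as prof:
        for _ in range(6):
            loss = model(torch.randn(32, 1024, device="cuda")).sum()
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
            prof.step()  # advance the profiler schedule each iteration

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```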
Microsoft Build 2022