Qwen3-VL SFT Training Pipeline
1. Frame Sampling
2. Resize
3. Patch Align
4. Patchify
5. Temporal Pool
6. Flatten & Tokenize
7. Vision Encoder
8. Forward & Backward
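The diagram names the stages but not their parameters, so the following is only a minimal sketch of the shape bookkeeping for stages 1 to 6, under stated assumptions. The function names, the patch size of 16, the temporal group size of 2, the round-down alignment rule, and the uniform sampling scheme are all illustrative choices, not values confirmed by the source. Stages 7 and 8 (the vision encoder and the training step) are not modelled.

```python
# Hypothetical shape bookkeeping for stages 1-6 of the pipeline.
# All defaults (num_samples=8, patch=16, temporal=2) are assumptions.

def sample_frame_indices(num_frames, num_samples):
    """Stage 1: pick uniformly spaced frame indices (centre of each bin)."""
    if num_samples >= num_frames:
        return list(range(num_frames))
    step = num_frames / num_samples
    return [int(i * step + step / 2) for i in range(num_samples)]


def align_to_patch(height, width, patch=16):
    """Stages 2-3: round each side down to a multiple of the patch size,
    keeping at least one patch per side."""
    def snap(x):
        return max(patch, (x // patch) * patch)
    return snap(height), snap(width)


def token_count(num_frames, height, width,
                num_samples=8, patch=16, temporal=2):
    """Stages 4-6: number of vision tokens produced for one clip."""
    indices = sample_frame_indices(num_frames, num_samples)
    h, w = align_to_patch(height, width, patch)
    grid_h, grid_w = h // patch, w // patch      # stage 4: patch grid
    t = -(-len(indices) // temporal)             # stage 5: ceil(frames / temporal)
    return t * grid_h * grid_w                   # stage 6: flattened sequence length


if __name__ == "__main__":
    print(sample_frame_indices(100, 8))
    print(align_to_patch(360, 640))
    print(token_count(100, 360, 640))
```

Tracking only shapes keeps the sketch independent of any tensor library. A real implementation would carry pixel data through the same steps and would likely use different alignment and pooling rules.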