Qwen3-VL GRPO Training Pipeline

1. Frame Sampling
2. Resize
3. Patch Align
4. Patchify
5. Temporal Pool
6. Flatten & Tokenize
7. Vision Encoder
8. Generate G Completions
9. Reward Scoring
10. Advantage Compute
11. Policy Forward
12. Reference Forward
13. GRPO Loss
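Steps 1 through 6 turn a raw video into a flat sequence of patch vectors. The NumPy sketch below is one hedged reading of those step labels, not the Qwen3-VL implementation. The patch size (16), spatial merge factor (2) and temporal patch size (2) are assumed values, and all helper names are hypothetical. Resize and Patch Align are collapsed into a single round-to-multiple step with nearest-neighbour resampling, and Temporal Pool is modelled as folding consecutive frames into one patch.

```python
import numpy as np

# Hypothetical parameters for illustration; not confirmed Qwen3-VL values.
PATCH = 16    # spatial patch side, in pixels
MERGE = 2     # spatial merge factor (MERGE x MERGE patches -> one token)
T_PATCH = 2   # consecutive frames folded into one temporal patch


def sample_frames(video: np.ndarray, num_frames: int) -> np.ndarray:
    """Step 1: pick num_frames indices spread uniformly over the clip."""
    idx = np.linspace(0, len(video) - 1, num_frames).round().astype(int)
    return video[idx]


def aligned_size(h: int, w: int) -> tuple[int, int]:
    """Steps 2-3: round each side to the nearest multiple of PATCH * MERGE."""
    f = PATCH * MERGE
    return max(f, round(h / f) * f), max(f, round(w / f) * f)


def resize_nearest(frames: np.ndarray, h: int, w: int) -> np.ndarray:
    """Nearest-neighbour resize of a (T, H, W, C) array to (T, h, w, C)."""
    _, H, W, _ = frames.shape
    rows = np.arange(h) * H // h
    cols = np.arange(w) * W // w
    return frames[:, rows][:, :, cols]


def patchify(frames: np.ndarray):
    """Steps 4-6: cut (T, H, W, C) into flattened T_PATCH x PATCH x PATCH patches."""
    if len(frames) % T_PATCH:  # pad the time axis by repeating the last frame
        pad = T_PATCH - len(frames) % T_PATCH
        frames = np.concatenate([frames, np.repeat(frames[-1:], pad, axis=0)])
    T, H, W, C = frames.shape
    gt, gh, gw = T // T_PATCH, H // PATCH, W // PATCH
    x = frames.reshape(gt, T_PATCH, gh, PATCH, gw, PATCH, C)
    x = x.transpose(0, 2, 4, 1, 3, 5, 6)  # group values by (t, row, col) patch position
    patches = x.reshape(gt * gh * gw, T_PATCH * PATCH * PATCH * C)
    return patches, (gt, gh, gw)


video = np.random.rand(10, 100, 150, 3)           # dummy (T, H, W, C) clip
frames = sample_frames(video, 8)
h, w = aligned_size(*frames.shape[1:3])           # -> (96, 160)
patches, grid = patchify(resize_nearest(frames, h, w))
num_tokens = len(patches) // MERGE**2             # visual tokens after spatial merge
print(patches.shape, grid, num_tokens)            # (240, 1536) (4, 6, 10) 60
```

Aligning both sides to PATCH * MERGE guarantees that the patch grid splits evenly into MERGE x MERGE groups, so the token count after merging is an integer. Real preprocessors usually also clamp the total pixel count, which this sketch omits.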
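Steps 9 through 13 are where GRPO departs from PPO: there is no value network, and each completion's advantage is its reward standardized against the other G completions sampled for the same prompt. The sketch below assumes the formulation introduced with DeepSeekMath, a clipped ratio objective plus a per-token KL penalty to the reference model. The function names and the clip_eps and beta defaults are illustrative assumptions, not values taken from this pipeline.

```python
import numpy as np


def group_advantages(rewards, eps: float = 1e-6) -> np.ndarray:
    """Step 10: standardize each completion's reward within its group of G."""
    r = np.asarray(rewards, dtype=float)          # shape (num_prompts, G)
    mean = r.mean(axis=1, keepdims=True)
    std = r.std(axis=1, keepdims=True)            # population std (ddof=0)
    return (r - mean) / (std + eps)


def grpo_loss(logp_new, logp_old, logp_ref, adv, mask,
              clip_eps: float = 0.2, beta: float = 0.04) -> float:
    """Step 13: negative clipped surrogate plus a KL penalty to the reference.

    logp_*: (B, L) log-probs of the sampled tokens under the current policy
    (step 11), the policy that generated them, and the frozen reference
    model (step 12). adv: (B,) one advantage per completion.
    mask: (B, L), 1 for real tokens and 0 for padding.
    """
    ratio = np.exp(logp_new - logp_old)
    a = adv[:, None]                              # same advantage for every token
    surrogate = np.minimum(ratio * a,
                           np.clip(ratio, 1 - clip_eps, 1 + clip_eps) * a)
    d = logp_ref - logp_new
    kl = np.exp(d) - d - 1                        # non-negative per-token KL estimate
    per_token = -(surrogate - beta * kl)
    per_seq = (per_token * mask).sum(axis=1) / mask.sum(axis=1)
    return float(per_seq.mean())


rewards = [[1.0, 0.0, 1.0, 0.0]]                  # one prompt, G = 4 completions
adv = group_advantages(rewards)[0]                # roughly [1, -1, 1, -1]
logp = np.log(np.full((4, 5), 0.5))               # dummy per-token log-probs
loss = grpo_loss(logp, logp, logp, adv, np.ones((4, 5)))
```

When the policy takes a single gradient step per generation batch, logp_old equals a detached copy of logp_new, so the ratio is 1 and clipping never triggers. Averaging over tokens within each completion and then across completions follows the original formulation; some later variants average over all valid tokens in the batch instead.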