Qwen3-VL GRPO Training Pipeline
1. Frame Sampling
2. Resize
3. Patch Align
4. Patchify
5. Temporal Pool
6. Flatten & Tokenize
7. Vision Encoder
8. Generate G Completions
9. Reward Scoring
10. Advantage Computation
11. Policy Forward Pass
12. Reference Model Forward Pass
13. GRPO Loss
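Steps 1 to 6 turn a raw video into a sequence of flat patch vectors. The NumPy sketch below illustrates that flow under stated assumptions and is not the Qwen3-VL processor. Uniform frame sampling, nearest-neighbour resizing, rounding each side to a multiple of the patch size, and mean pooling over consecutive frames are assumptions. The helper names (`sample_frames`, `aligned_size`, `resize_nearest`, `patchify`, `temporal_pool`, `flatten_tokens`) and the example values (patch size 14, pool factor 2) are hypothetical. Pixel normalisation is omitted.

```python
import numpy as np

def sample_frames(total_frames: int, num_samples: int) -> np.ndarray:
    # Step 1 (assumed uniform sampling): evenly spaced frame indices.
    return np.linspace(0, total_frames - 1, num_samples).round().astype(int)

def aligned_size(side: int, patch: int) -> int:
    # Steps 2-3: choose the resize target as the nearest multiple of the
    # (hypothetical) patch size, but never smaller than one patch.
    return max(patch, round(side / patch) * patch)

def resize_nearest(frames: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    # Step 2: nearest-neighbour resize of a (T, H, W, C) clip.
    _, h, w, _ = frames.shape
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return frames[:, rows][:, :, cols]

def patchify(frames: np.ndarray, patch: int) -> np.ndarray:
    # Step 4: (T, H, W, C) -> (T, H/p, W/p, p*p*C), one vector per spatial patch.
    t, h, w, c = frames.shape
    x = frames.reshape(t, h // patch, patch, w // patch, patch, c)
    x = x.transpose(0, 1, 3, 2, 4, 5)  # grid axes first, patch contents last
    return x.reshape(t, h // patch, w // patch, patch * patch * c)

def temporal_pool(patches: np.ndarray, k: int) -> np.ndarray:
    # Step 5 (assumed mean pooling): average each group of k consecutive frames,
    # dropping any leftover frames that do not fill a whole group.
    t = patches.shape[0] - patches.shape[0] % k
    return patches[:t].reshape(t // k, k, *patches.shape[1:]).mean(axis=1)

def flatten_tokens(pooled: np.ndarray) -> np.ndarray:
    # Step 6: one row per (time, grid row, grid column) cell.
    return pooled.reshape(-1, pooled.shape[-1])

# Hypothetical 16-frame RGB clip, 100 x 150 pixels.
video = np.random.default_rng(0).random((16, 100, 150, 3))
clip = video[sample_frames(len(video), 8)]
h, w = aligned_size(100, 14), aligned_size(150, 14)  # 98, 154
tokens = flatten_tokens(temporal_pool(patchify(resize_nearest(clip, h, w), 14), 2))
print(tokens.shape)  # (308, 588): 4 pooled frames x 7 x 11 patches, 14*14*3 values each
```

Pooling after patchify, as the step order suggests, shrinks the token count by the pool factor before the vision encoder (step 7) sees it. Here 8 sampled frames become 4 pooled frames.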
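Steps 8 to 13 follow the GRPO scheme. Each prompt gets G sampled completions, and their rewards are normalised within that group to form advantages. The loss then combines a PPO-style clipped probability ratio with a KL penalty toward the frozen reference model. The NumPy sketch below computes only the loss value, with no gradients. It is not the training code behind this pipeline. Several choices are illustrative assumptions: the exact-match reward, the `exp(d) - d - 1` KL estimator, per-sequence token averaging, the helper names, and the values `clip_eps=0.2`, `beta=0.04` and `eps=1e-4`.

```python
import numpy as np

def exact_match_reward(completions: list[str], answer: str) -> np.ndarray:
    # Step 9 (hypothetical rule-based reward): 1.0 if the stripped text matches.
    return np.array([float(c.strip() == answer) for c in completions])

def group_advantages(rewards: np.ndarray, eps: float = 1e-4) -> np.ndarray:
    # Step 10: rewards has shape (num_prompts, G); normalise each row
    # against its own group mean and standard deviation.
    mean = rewards.mean(axis=1, keepdims=True)
    std = rewards.std(axis=1, keepdims=True)
    return (rewards - mean) / (std + eps)

def grpo_loss(logp_new, logp_old, logp_ref, advantages, mask,
              clip_eps: float = 0.2, beta: float = 0.04) -> float:
    """Step 13: clipped surrogate minus a KL penalty, negated for minimisation.

    logp_new / logp_old / logp_ref: (B, L) per-token log-probs of the sampled
        completions under the current policy (step 11), the policy that
        generated them (step 8), and the frozen reference model (step 12).
    advantages: (B,) one scalar per completion, shared by all its tokens.
    mask: (B, L) 1.0 for completion tokens, 0.0 for padding.
    """
    ratio = np.exp(logp_new - logp_old)
    adv = advantages[:, None]
    surrogate = np.minimum(ratio * adv,
                           np.clip(ratio, 1 - clip_eps, 1 + clip_eps) * adv)
    # Non-negative per-token KL estimate with d = log p_ref - log p_new.
    d = logp_ref - logp_new
    kl = np.exp(d) - d - 1
    per_token = -(surrogate - beta * kl)
    # Mean over each completion's real tokens, then mean over completions.
    per_seq = (per_token * mask).sum(axis=1) / np.maximum(mask.sum(axis=1), 1)
    return float(per_seq.mean())

# One prompt, G = 4 hypothetical completions.
rewards = exact_match_reward(["42", " 41", "42 ", "7"], "42")[None, :]
adv = group_advantages(rewards)
logp = np.zeros((4, 3))  # hypothetical per-token log-probs, on-policy case
print(adv.round(3), grpo_loss(logp, logp, logp, adv[0], np.ones((4, 3))))
```

The advantages of a group sum to zero. So an on-policy batch whose policy and reference log-probs coincide gives a loss near zero, although the gradient with respect to `logp_new`, which an autodiff framework would compute, is still non-zero.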