MUZERO DOES YOUTUBE

If you've been watching YouTube lately, the encoding settings for the
video you watched might have been selected by MuZero - using less of
your bandwidth for the same quality. The task here is rate control:
selecting the quantization parameters within the VP9 codec to maximize
the quality at a specified bitrate.

This is a constrained RL problem, requiring us to optimize two
conflicting objectives of variable difficulty at the same time. To
deal with this challenge we introduce a self-competition based reward
mechanism, where the reward depends on how successful other recent
episodes were at maximizing the quality while staying under the
bitrate concern. This allows the agent to improve smoothly at any
stage of learning, no matter whether it has not yet learned to stay
within the constraints, or whether it is in the final stages of
maximizing quality.

See our official blog post as well as our paper for the full details.