@dalias @raven667 @Nazani @futurebird this kind of "AI" involves a lot of training up front to produce a lightweight model that is then applied to each video, so the hefty compute cost is paid once.
I believe it's attention-based: something like "use more bits on faces" and "smooth this part, since nobody's looking at it." A musician looks at different things than a general viewer does, and because this is a generalized algorithm tuned for that general viewer, the artifacts are more obvious to them.
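A toy sketch of the "more bits on faces" idea, assuming the model hands the encoder a saliency score per region of the frame. `allocate_bits` and its parameters are made up for illustration; this isn't how any particular codec actually does it. Each region gets a minimum, and the rest of the budget is split in proportion to saliency:

```python
def allocate_bits(saliency, total_bits, min_bits=1):
    """Hypothetical helper: split total_bits across regions by saliency.

    Every region gets at least min_bits (the "smoothed" low-attention parts);
    the remaining budget is shared out in proportion to saliency, using
    largest-remainder rounding so the integer allocations sum to total_bits.
    """
    n = len(saliency)
    if total_bits < n * min_bits:
        raise ValueError("budget too small for the per-region minimum")
    spare = total_bits - n * min_bits
    weight_sum = sum(saliency)
    if weight_sum == 0:
        # No region stands out, so split the spare bits evenly
        shares = [spare / n] * n
    else:
        shares = [spare * s / weight_sum for s in saliency]
    alloc = [min_bits + int(s) for s in shares]
    leftover = total_bits - sum(alloc)
    # Give any bits lost to rounding to the regions with the largest remainders
    order = sorted(range(n), key=lambda i: shares[i] - int(shares[i]), reverse=True)
    for i in order[:leftover]:
        alloc[i] += 1
    return alloc


# e.g. a face region scored 8x more salient than two background regions
print(allocate_bits([8, 1, 1], 100))
```

If the saliency scores come from a model trained on "typical" viewers, whatever a musician is watching (hands, strings, keys) can land in the low-score regions and get the minimum.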