You know the “I forced a bot to…” meme? Well, my colleagues Noah & team actually did it, forcing bots to watch real estate videos (which feature lots of stable, horizontal tracking shots) so they could learn to synthesize animations between multiple independent images, such as the ones captured by a multi-lens phone:
“We call this problem stereo magnification, and propose a learning framework that leverages a new layered representation that we call multiplane images (MPIs). Our method also uses a massive new data source for learning view extrapolation: online videos on YouTube.”
Check out what it can enable: