I fully approve of this witchcraft. Now, to apply the “I forced a bot to watch 1,000 hours…” technique to my kids’ band recitals. 😉
VentureBeat writes,
The fully trained PixelPlayer system, given a video as the input, splits the accompanying audio and identifies the source of sound, and then calculates the volume of each pixel in the image and “spatially localizes” it — i.e., identifies regions in the clip that generate similar sound waves.
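The "volume of each pixel" idea can be sketched in a few lines of NumPy. This is purely illustrative — the array shapes and names below are assumptions, not PixelPlayer's real interface: imagine the model has already produced a separated audio signal for each spatial location, and we want the per-pixel loudness map plus the "spatially localized" regions.

```python
import numpy as np

# Hypothetical PixelPlayer-style output: one separated audio signal per
# spatial location, shaped (height, width, samples). This random array
# is a stand-in for the model's actual output.
H, W, T = 14, 14, 1024
rng = np.random.default_rng(0)
per_pixel_audio = rng.normal(size=(H, W, T)) * np.linspace(0, 1, W)[:, None]

# "Volume of each pixel": RMS energy of that pixel's separated audio.
volume = np.sqrt((per_pixel_audio ** 2).mean(axis=-1))  # shape (H, W)

# "Spatially localize": keep the regions whose energy exceeds a
# threshold, i.e. the pixels that generate most of the sound.
mask = volume > volume.mean()
print(volume.shape, int(mask.sum()))
```

Thresholding against the mean is just one crude way to pick out the loud regions; the real system identifies regions producing *similar* sound waves, which is a richer comparison than raw energy.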


[YouTube]