Visually Indicated Sounds

Andrew Owens, Phillip Isola, Josh McDermott, Antonio Torralba, Edward H. Adelson, William T. Freeman

Examples of sounds from the Greatest Hits dataset.

Materials make distinctive sounds when they are hit or scratched — dirt makes a thud; ceramic makes a clink. These sounds reveal aspects of an object's material properties, as well as the force and motion of the physical interaction. In this paper, we introduce an algorithm that learns to synthesize sound from videos of people hitting objects with a drumstick. The algorithm uses a recurrent neural network to predict sound features from videos and then produces a waveform from these features with an example-based synthesis procedure. We demonstrate that the sounds generated by our model are realistic enough to fool participants in a "real or fake" psychophysical experiment, and that they convey significant information about the material properties in a scene.
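To make the example-based synthesis step more concrete, here is a minimal sketch of one way it could work: look up the training example whose sound features are closest to the predicted features, and reuse that example's recorded waveform. This is an illustration only, not the paper's implementation. The function names (`nearest_example`, `synthesize`), the L2 distance, the two-dimensional features, and the toy data are all assumptions made for this sketch.

```python
import numpy as np


def nearest_example(predicted, train_features):
    """Index of the training example closest (L2) to the predicted features.

    predicted: array of shape (D,)
    train_features: array of shape (N, D)
    """
    dists = np.linalg.norm(train_features - predicted, axis=1)
    return int(np.argmin(dists))


def synthesize(predicted, train_features, train_waveforms):
    """Return the waveform of the best-matching training example."""
    idx = nearest_example(predicted, train_features)
    return train_waveforms[idx]


# Toy data (hypothetical): three training examples with 2-D sound features
# and short placeholder waveforms standing in for recorded audio.
train_features = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
train_waveforms = [np.zeros(4), np.ones(4), np.full(4, 2.0)]

# Pretend this vector came out of the recurrent network for one impact.
predicted = np.array([0.9, 1.2])
waveform = synthesize(predicted, train_features, train_waveforms)
```

In this toy setup the prediction lies closest to the second training example, so its waveform is returned. A real system would use much higher-dimensional features and would need to align the retrieved sound with the moment of impact in the video.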

System pipeline

 Download Paper
Data: full-res videos and labels (50GB)
          low-res videos (456 x 256) and labels (20GB)
          dataset license (Creative Commons 4.0)
          precomputed sound features (1GB)
Slides: keynote, recorded talk.
Code and additional results coming soon!
(in the meantime, please email the authors.)

This video shows clips from our dataset and results from our algorithm.

Press: Quartz, Washington Post, Boston Globe, MIT News.

The Greatest Hits dataset
A sample of videos from our dataset. Please note that these are recorded sounds (not the sounds predicted by our algorithm). More sample videos here.




Plastic bag