Abstract:
Objects make distinctive sounds when they are hit or scratched. These sounds reveal
aspects of an object's material properties, as well as the actions that produced them.
In this paper, we propose the task of predicting what sound an object makes when
struck as a way of studying physical interactions within a visual scene. We present an
algorithm that synthesizes sound from silent videos of people hitting and scratching
objects with a drumstick. This algorithm uses a recurrent neural network to predict
sound features from videos and then produces a waveform from these features with
an example-based synthesis procedure.
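
The abstract does not spell out the example-based synthesis step, so the following is only a minimal sketch under one assumption: that the predicted sound features are matched to the nearest entry in a bank of recorded examples, whose waveform is then reused. The names `synthesize_by_example`, `bank_features`, and `bank_waveforms` are hypothetical, and the toy two-dimensional features stand in for real sound features.

```python
import numpy as np

def synthesize_by_example(predicted, bank_features, bank_waveforms):
    """Return the index and waveform of the stored example whose features
    are closest (Euclidean distance) to the predicted features.
    Nearest-neighbor retrieval is an assumption, not the stated method."""
    dists = np.linalg.norm(bank_features - predicted, axis=1)
    best = int(np.argmin(dists))
    return best, bank_waveforms[best]

# Toy bank: three examples with 2-D features and 8-sample waveforms.
bank_features = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
bank_waveforms = [np.zeros(8), np.ones(8), np.full(8, 2.0)]

# Features as they might come out of the prediction network.
predicted = np.array([0.9, 1.2])
idx, wave = synthesize_by_example(predicted, bank_features, bank_waveforms)
```

Reusing a recorded waveform avoids having to generate audio sample by sample, at the cost of being limited to sounds already present in the example bank.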
This project uses a recurrent neural network to develop the software.
The main advantage of this technique is that it predicts sound features from
videos and then produces a waveform from these features with an example-based
synthesis procedure. Different neural network models are discussed, including feed-forward networks and back-propagation. After trial and error, a suitable set of training
parameters was defined, and a network structure was created consisting of 1 input layer, 2 hidden
layers, and 1 output layer, with 69 input neurons, 324 neurons in each hidden layer,
and 38 neurons in the output layer.
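
To make the stated layer sizes concrete, here is a minimal feed-forward sketch of a 69-324-324-38 network in NumPy. It assumes 324 neurons in each of the two hidden layers; the tanh activation, the weight initialization, and the helper names `init_params`, `forward`, and `count_params` are illustrative assumptions, and training by back-propagation is omitted.

```python
import numpy as np

# Layer sizes taken from the text: input, two hidden layers, output.
LAYER_SIZES = [69, 324, 324, 38]

def init_params(sizes, seed=0):
    """Create one (weights, bias) pair per pair of consecutive layers."""
    rng = np.random.default_rng(seed)
    params = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        w = rng.normal(0.0, np.sqrt(1.0 / n_in), size=(n_in, n_out))
        b = np.zeros(n_out)
        params.append((w, b))
    return params

def forward(params, x):
    """Feed-forward pass: tanh on hidden layers (assumed), linear output."""
    h = x
    for w, b in params[:-1]:
        h = np.tanh(h @ w + b)
    w, b = params[-1]
    return h @ w + b

def count_params(params):
    """Total number of trainable weights and biases."""
    return sum(w.size + b.size for w, b in params)

params = init_params(LAYER_SIZES)
batch = np.zeros((4, 69))      # four dummy input vectors of 69 features
out = forward(params, batch)   # one 38-dimensional output per input
n_params = count_params(params)
```

With these sizes the network has 140,330 trainable parameters, most of them in the 324-by-324 connection between the two hidden layers.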
We show that the sounds predicted by our model are realistic enough to fool
participants in a “real or fake” psychophysical experiment, and that they convey
significant information about material properties and physical interactions.