I need your suggestions for something that is not VCV Rack related, but could be interesting as a task (and VCV Rack is a complete lab for setting up experiments).
At a sports event, I need to:
A) detect when there is CLAPPING, and
B) extract the energy of the clapping (the easy part, once everything else is filtered out).
It seems like the inverse of the audio restoration problem, where people want to remove clapping.
You could record a sample of a single clap in that room and cross-correlate it with your input (that is, convolve the input with the time-reversed sample). Claps are essentially delta functions, so the room response largely defines their sound. Peaks of the resulting signal would correspond to claps, and their amplitudes to the strength of each clap. You could calculate the RMS of that signal per second to get a number that tracks the level of the clapping (square it if you want power).
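The single-clap idea above could be sketched roughly like this in Python with NumPy. This is only a minimal sketch under assumptions: the function names `clap_detection_signal` and `rms_per_second` are made up for illustration, the audio is assumed to be a mono float array, and the template is assumed to be one clean clap recorded in the same room.

```python
import numpy as np

def clap_detection_signal(audio, clap_template):
    """Cross-correlate the recording with a single-clap template.

    Peaks in the returned signal mark places where the input looks
    like the recorded clap (a basic matched filter).
    """
    # Remove DC and normalize so the output scale depends on the input level,
    # not on how loud the template recording happened to be.
    template = clap_template - np.mean(clap_template)
    template = template / (np.linalg.norm(template) + 1e-12)
    # mode="same" keeps the output the same length as the input audio.
    return np.correlate(audio, template, mode="same")

def rms_per_second(signal, sample_rate):
    """Return one RMS value per whole second of the signal."""
    n_seconds = len(signal) // sample_rate
    frames = signal[: n_seconds * sample_rate].reshape(n_seconds, sample_rate)
    return np.sqrt(np.mean(frames ** 2, axis=1))

if __name__ == "__main__":
    # Synthetic check (assumed data): one decaying noise burst as the "clap".
    rng = np.random.default_rng(0)
    sample_rate = 1000
    template = rng.standard_normal(50) * np.exp(-np.arange(50) / 10.0)
    audio = np.zeros(3 * sample_rate)
    audio[1200:1250] += 0.5 * template  # clap placed inside second 1
    detection = clap_detection_signal(audio, template)
    print(rms_per_second(detection, sample_rate))
```

`np.correlate` computes the correlation directly, which gets slow on long recordings; an FFT-based route such as `scipy.signal.fftconvolve` with the time-reversed template would probably be the practical choice for a full match stream.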
Will there be other interference in the audio source, such as commentary? There are a lot of variables to consider either way. When listening to clapping, the higher frequencies in the ambience can be more noticeable, depending on the listening position. At lower frequencies (pitch), the ambience can also be quite energetic!
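One way to look at the frequency question above is to measure how much of each frame's energy falls in a given band. The snippet below is a rough sketch, not a tested detector: `band_energy_fraction` is a hypothetical helper, and the band edges in the example (2 kHz to 8 kHz) are placeholders that would need to be tuned against real recordings and the actual listening position.

```python
import numpy as np

def band_energy_fraction(frame, sample_rate, low_hz, high_hz):
    """Fraction of a frame's spectral energy between low_hz and high_hz."""
    # Hann window reduces leakage from strong components into other bins.
    windowed = frame * np.hanning(len(frame))
    spectrum = np.abs(np.fft.rfft(windowed)) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    in_band = (freqs >= low_hz) & (freqs < high_hz)
    total = spectrum.sum()
    if total <= 0:
        return 0.0  # silent frame: no energy anywhere
    return float(spectrum[in_band].sum() / total)

if __name__ == "__main__":
    sample_rate = 16000
    t = np.arange(1024) / sample_rate
    high_tone = np.sin(2 * np.pi * 4000 * t)  # stands in for bright content
    low_tone = np.sin(2 * np.pi * 200 * t)    # stands in for low ambience
    # Placeholder band edges, assumed for illustration only.
    print(band_energy_fraction(high_tone, sample_rate, 2000, 8000))
    print(band_energy_fraction(low_tone, sample_rate, 2000, 8000))
```

Tracking this fraction over time, next to the overall level, might help separate clapping from commentary or low-frequency crowd rumble, but that depends heavily on the venue and the microphone placement.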
I don’t know which kind of techniques you are after specifically, but it could perhaps also be a task for machine learning. It could potentially be seen as a particular case of speech recognition.
this is a very good idea
but the material always comes from live situations, which change from event to event
like for example
a tennis match
a soccer match
an X-factor show
etc.
@marc_boule
yes, that would be perfect, but we have very few samples currently
one of the things we use AI for is detecting when a match
(NFL, NBA, MLS) starts, for every subsection, because we use that start as a differential to pinpoint single actions
I’m already evaluating audio power during the full stream to give more or less importance to parts of the event
and sometimes the commentary helps too (like in the soccer case) to gauge the excitement of the action