Detecting clapping envelope

synthi · January 29, 2020, 5:31pm

Hello Big DSP minds of the VCV RACK universe

I need your suggestions for something that is not VCV Rack related, but could be interesting as a task (and VCV Rack is a complete lab for setting up experiments).

During a sports event I need to A) detect when there is clapping and B) extract the energy of the clapping (the easy part, once everything else is filtered out).

It seems to be the inverse of the problem people have in audio restoration (where they want to remove clapping).

Thanks for any input! Antonio

Vortico · January 29, 2020, 5:56pm

You could record a sample of a single clap in that room and convolve it with your input. Claps are essentially delta functions, so the room resonance fully defines their sound. Peaks of that signal would correspond to claps, and their amplitudes to the power of each clap. You could calculate the RMS of that signal per second to get a number proportional to the power of the clapping.
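As a rough illustration of the suggestion above, here is a minimal Python/NumPy sketch. It is not Vortico's code: the function names, the 8 kHz sample rate, and the synthetic "clap" (a decaying noise burst) are all assumptions made for the example. It also reads "convolve" as cross-correlation, i.e. convolution with the time-reversed clap recording, which is the usual matched-filter form; plain convolution would be a one-line change.

```python
import numpy as np

def clap_detection_signal(audio, clap_template):
    """Matched filter: cross-correlate the input with one recorded clap."""
    # Normalise so the output scale does not depend on the template's level.
    template = clap_template / (np.linalg.norm(clap_template) + 1e-12)
    # Convolving with the time-reversed template equals cross-correlation.
    return np.convolve(audio, template[::-1], mode="same")

def rms_per_second(signal, sample_rate):
    """RMS of each whole one-second block (a trailing partial block is dropped)."""
    n_blocks = len(signal) // sample_rate
    blocks = signal[: n_blocks * sample_rate].reshape(n_blocks, sample_rate)
    return np.sqrt(np.mean(blocks ** 2, axis=1))

# --- Synthetic demo (all values below are made up for illustration) ---
sample_rate = 8000                      # assumed sample rate in Hz
rng = np.random.default_rng(0)

# Hypothetical "clap in the room": a short, exponentially decaying noise burst.
clap = rng.standard_normal(400) * np.exp(-np.arange(400) / 80.0)

# Three seconds of quiet background noise with two claps in the middle second.
audio = 0.01 * rng.standard_normal(3 * sample_rate)
audio[9600:10000] += clap               # clap starting at 1.2 s
audio[12800:13200] += 0.5 * clap        # quieter clap starting at 1.6 s

detection = clap_detection_signal(audio, clap)
power_curve = rms_per_second(detection, sample_rate)

strongest = int(np.argmax(np.abs(detection)))
print("strongest peak at %.3f s" % (strongest / sample_rate))
print("RMS per second:", power_curve)
```

On real broadcast audio the peaks will be much less clean than in this synthetic case, since commentary and crowd noise also correlate with the template to some degree, so any peak threshold would have to be tuned per venue.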

Coirt · January 29, 2020, 5:59pm

Will there be other interference in the audio source, such as commentary? There are a lot of variables to consider either way. When listening to clapping, the higher frequencies in the ambience can be more noticeable, depending on the listening position. At lower frequencies (pitch) the ambience could be energetic!

marc_boule · January 29, 2020, 6:10pm

I don’t know which kind of techniques you are after specifically, but it could perhaps also be a task for machine learning. It could potentially be seen as a particular case of speech recognition.

synthi · January 30, 2020, 10:36am

This is a very good idea, but the material always comes from live situations that change from one event to the next: for example a tennis match, a soccer match, an X-Factor show, etc.

synthi · January 30, 2020, 10:37am

Yes, there is a lot of commentary!

@marc_boule yes, that would be perfect, but we currently have very few samples.

One of the things we use AI for is detecting when an NFL, NBA, or MLS match starts (for every subsection), because we use that as a differential to pinpoint single actions.

I’m already evaluating audio power over the full stream to give more or less importance to parts of the event, and sometimes the commentary helps too (as in the soccer case) to gauge the excitement of the action.