
What's the recommended recording setup for better recognition using a SoundAnalysis / Create ML model?

I'm trying to set up AVAudioEngine with a tap node to record from the microphone and send the buffers to SoundAnalysis/Create ML to recognize the category of the sound.

I'm using a VGGish model trained with Create ML, with a window duration of 0.975 seconds.

I have this current setup:

let listeningInputFormat: AVAudioFormat! = AVAudioFormat(commonFormat: AVAudioCommonFormat.pcmFormatFloat32,
                                                         sampleRate: 44100.0,
                                                         channels: 1,
                                                         interleaved: false)

tapNode.installTap(onBus: 0, bufferSize: 1024,
                   format: listeningInputFormat) { [weak self] buffer, time in
    // Sending buffer to SoundAnalysis
}
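For completeness, the rest of my setup looks roughly like the sketch below (error handling trimmed, and `MySoundClassifier` is just a placeholder for my actual Create ML model class):

final class ResultsObserver: NSObject, SNResultsObserving {
    // Called each time the analyzer produces a classification for a window.
    func request(_ request: SNRequest, didProduce result: SNResult) {
        guard let result = result as? SNClassificationResult,
              let top = result.classifications.first else { return }
        print("Heard \(top.identifier) with confidence \(top.confidence)")
    }
    func request(_ request: SNRequest, didFailWithError error: Error) {
        print("Analysis failed: \(error)")
    }
    func requestDidComplete(_ request: SNRequest) {
        print("Request finished")
    }
}

import AVFoundation
import CoreML
import SoundAnalysis

// One analyzer per stream, created with the same format as the tap.
let analyzer = SNAudioStreamAnalyzer(format: listeningInputFormat)
let observer = ResultsObserver()

// "MySoundClassifier" is a placeholder for my compiled Create ML model class.
let request = try SNClassifySoundRequest(
    mlModel: MySoundClassifier(configuration: MLModelConfiguration()).model)
try analyzer.add(request, withObserver: observer)

// Dedicated serial queue so analysis stays off the real-time audio thread.
let analysisQueue = DispatchQueue(label: "SoundAnalysisQueue")

// Then, inside the installTap closure:
// analysisQueue.async {
//     analyzer.analyze(buffer, atAudioFramePosition: time.sampleTime)
// }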

I'm pretty much a rookie with this tool and I couldn't find any in-depth explanation of how to set it up for optimal recognition.

I'm guessing the window duration is the length of audio used for each recognition pass, i.e. 0.975 seconds in this case. Is there a recommended sample rate to use for the installTap buffer for optimal recognition with this window duration (0.975 s)?

I've seen some people use a buffer size of 44100 and others a much lower one, like 1024. The arithmetic below (see the sketch after this paragraph) is how I currently relate the two numbers.
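My rough understanding is that SNAudioStreamAnalyzer accumulates whatever buffers it receives until it has a full 0.975 s window, so the tap's bufferSize mainly changes how often the closure fires rather than the recognition quality, but I'd like confirmation. With my numbers:

let windowDuration = 0.975      // seconds, from the Create ML model settings
let sampleRate = 44_100.0       // sample rate of my tap format
let framesPerWindow = windowDuration * sampleRate   // ≈ 43_000 frames per window

// A 1024-frame tap buffer means the closure fires ~42 times per window;
// a 44_100-frame buffer means it fires roughly once per second.
print(framesPerWindow, framesPerWindow / 1024.0, framesPerWindow / 44_100.0)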

ios

swift

avaudioplayer

avaudioengine

createml
