1 year ago
#133367

Jaythaking
What's the recommended recording setup for better recognition using a Sound Analysis Create ML model?
I'm trying to set up the AVAudioEngine with a tap node to record the microphone and send the buffer to SoundAnalysis/Create ML to recognize the category of the sound.
I'm using a model of type VGGish, trained with Create ML, with a window duration of 0.975 seconds.
I have this current setup:
let listeningInputFormat: AVAudioFormat! = AVAudioFormat(commonFormat: AVAudioCommonFormat.pcmFormatFloat32,
                                                         sampleRate: 44100.0,
                                                         channels: 1,
                                                         interleaved: false)
tapNode.installTap(onBus: 0, bufferSize: 1024,
                   format: listeningInputFormat) { [weak self] buffer, time in
    // Sending buffer to SoundAnalysis
}
I'm pretty new to this tool and couldn't find any in-depth explanation of how to set it up for optimal recognition.
I'm guessing the window duration is the length of audio used for each recognition, i.e. 0.975 seconds in this case. Is there a recommended sample rate to use for the installTap recording buffer for optimal recognition with this window duration (0.975)?
I've seen some people use a buffer size of 44100 and others a much lower one like 1024.
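To make the numbers concrete, here is a rough sketch of how the two buffer sizes relate to the 0.975-second window at 44100 Hz. `TapTiming` is a hypothetical helper written for this post, not part of AVFoundation or SoundAnalysis, and it only does arithmetic. It assumes the tap actually delivers the requested number of frames, which may not hold, since the `installTap` documentation notes that the implementation may choose another buffer size.

```swift
// Hypothetical helper (not an AVFoundation or SoundAnalysis type):
// relates the tap buffer size to the model's analysis window.
struct TapTiming {
    let sampleRate: Double      // frames per second of the tap format
    let bufferSize: Int         // frames requested in installTap(bufferSize:)
    let windowDuration: Double  // model window duration in seconds

    /// Seconds of audio carried by one tap buffer.
    var bufferDuration: Double { Double(bufferSize) / sampleRate }

    /// Frames needed to cover one analysis window at this sample rate.
    var framesPerWindow: Double { windowDuration * sampleRate }

    /// Tap buffers that must arrive before one full window is available.
    var buffersPerWindow: Int {
        Int((framesPerWindow / Double(bufferSize)).rounded(.up))
    }
}

// The two buffer sizes mentioned above, with the format from my setup.
let small = TapTiming(sampleRate: 44100, bufferSize: 1024, windowDuration: 0.975)
let large = TapTiming(sampleRate: 44100, bufferSize: 44100, windowDuration: 0.975)

print("1024 frames:", small.bufferDuration, "s per buffer,",
      small.buffersPerWindow, "buffers per window")
print("44100 frames:", large.bufferDuration, "s per buffer,",
      large.buffersPerWindow, "buffers per window")
```

So a 1024-frame buffer carries roughly 23 ms of audio and about 42 of them are needed to fill one window, while a 44100-frame buffer carries a full second and already covers the window on its own. What I don't know is whether either choice changes the recognition quality, or only the latency between results.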
ios
swift
avaudioplayer
avaudioengine
createml
0 Answers