Automatically splitting an Audio File into smaller chunks based on silence
The most important part of audio classification is building a dataset of sounds. An easy way to do so is to download the audio of a video as a .WAV file, but how do we then turn that lengthy source file (e.g., 200MB+) into smaller chunks?
For the project I am working on, I did just that! So let's walk through the process I used to turn one big audio file into smaller chunks.
Downloading the Source File
First, I started by finding a video that was of interest for my dataset.
💡 Interesting files are "Compilation" files as they contain a lot of the audio that we require.
Once we have found such a file, we can download it directly as a .WAV file. For this, I used https://youtubeto.org/en/youtube-wav.html, which resulted in a ~250MB .WAV file.
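Before opening the file in an editor, it can help to confirm what we actually downloaded. Below is a minimal sketch using Python's standard `wave` module, assuming the download is an uncompressed PCM WAV file. The helper name `describe_wav` and the file name in the usage note are made up for illustration.

```python
import wave

def describe_wav(path):
    """Return basic properties of a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate_hz": rate,
            "duration_s": frames / rate,  # total length in seconds
        }
```

Calling `describe_wav("compilation.wav")` (a hypothetical file name) returns the channel count, sample width, sample rate, and duration, which tells us roughly how much audio there is to split.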
Splitting the Audio File on the Silent Parts
Now the most interesting part is to split this file into smaller chunks. Go and install Audacity (https://www.audacityteam.org/), an audio editor that is going to help us tremendously!
💡 This works best when there is a small "gap" of silence between the audio fragments.
Once installed, open your file in Audacity:
![](https://xaviergeerinck.com/content/images/2024/03/251331_image.png)
This will show the waveform on a normalized linear scale. To make our lives easier, we can switch this to a logarithmic dB scale by right-clicking on the vertical scale and selecting "dB".
![](https://xaviergeerinck.com/content/images/2024/03/251331_image-1.png)
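To get a feeling for how the linear and dB scales relate, here is a small sketch of the standard amplitude-to-decibel conversion, assuming amplitudes are normalized so that 1.0 is full scale. The function names are my own and not part of Audacity.

```python
import math

def amplitude_to_db(amplitude):
    """Convert a linear amplitude (1.0 = full scale) to dB relative to full scale."""
    amplitude = abs(amplitude)
    if amplitude == 0:
        return float("-inf")  # digital silence has no finite dB value
    return 20 * math.log10(amplitude)

def db_to_amplitude(db):
    """Convert a dB value relative to full scale back to a linear amplitude."""
    return 10 ** (db / 20)
```

On this scale, an amplitude of 0.1 (10% of full scale) sits at about -20 dB, which is why quiet parts that look flat in the linear view are much easier to tell apart in the dB view.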
To split our sound, we need to know which dB level we will treat as "silence". With the dB view open, let's zoom in a bit:
![](https://xaviergeerinck.com/content/images/2024/03/251331_image-2.png)
![](https://xaviergeerinck.com/content/images/2024/03/251331_image-3.png)
Analyzing this view, we can see that the sound goes "silent" around the -20dB mark.
![](https://xaviergeerinck.com/content/images/2024/03/251332_image.png)
This silence also lasts around 0.5s at the -20dB level.
![](https://xaviergeerinck.com/content/images/2024/03/251333_image.png)
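If reading the levels off the screen feels too imprecise, the same inspection can be roughly approximated in code. The sketch below assumes the audio is already loaded as a list of floats between -1.0 and 1.0 and reports the peak level of each short window in dB. The function name and the default 50ms window are assumptions for illustration.

```python
import math

def window_peaks_db(samples, sample_rate, window_s=0.05):
    """Peak level in dB (relative to full scale) for each consecutive window."""
    size = max(1, round(sample_rate * window_s))  # window length in samples
    levels = []
    for start in range(0, len(samples), size):
        peak = max(abs(s) for s in samples[start:start + size])
        levels.append(20 * math.log10(peak) if peak > 0 else float("-inf"))
    return levels
```

Consecutive windows that stay below a candidate threshold such as -20 dB indicate a gap, and the number of such windows multiplied by the window length gives an estimate of how long the silence lasts.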
Now that we know this, press CTRL+A to select the track and open "Label Sounds" by going to Analyze -> Label Sounds.
![](https://xaviergeerinck.com/content/images/2024/03/251333_image-1.png)
In the pop-up that opens, enter the dB level (-20dB) and silence duration (0.5s) we detected earlier.
![](https://xaviergeerinck.com/content/images/2024/03/251335_image.png)
After clicking "Apply", Audacity adds a label track that marks each chunk of sound:
![](https://xaviergeerinck.com/content/images/2024/03/251340_image.png)
💡 If you are unhappy with the result, redo these steps and play with the dB and Silence Duration settings.
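To make clearer what the dB level and silence duration control, here is a simplified sketch of threshold-based labeling. It is not Audacity's implementation, only an approximation of the idea, and it assumes the samples are floats between -1.0 and 1.0. The function name and parameters are hypothetical, with defaults taken from the values found above.

```python
def find_sound_regions(samples, sample_rate, threshold_db=-20.0, min_silence_s=0.5):
    """Return (start_s, end_s) tuples for stretches of sound that are separated
    by at least `min_silence_s` seconds below `threshold_db`."""
    threshold = 10 ** (threshold_db / 20)       # dB -> linear amplitude
    min_gap = int(min_silence_s * sample_rate)  # silence length in samples
    regions = []
    start = None      # index where the current sound region began
    last_loud = None  # index of the most recent sample at or above the threshold
    for i, s in enumerate(samples):
        if abs(s) >= threshold:
            if start is None:
                start = i
            elif i - last_loud > min_gap:
                # The quiet gap was long enough: close the region and open a new one.
                regions.append((start / sample_rate, (last_loud + 1) / sample_rate))
                start = i
            last_loud = i
    if start is not None:
        regions.append((start / sample_rate, (last_loud + 1) / sample_rate))
    return regions
```

A lower threshold treats more of the quiet audio as sound, which merges chunks, while a shorter silence duration splits on briefer pauses, which produces more and smaller chunks.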
Now, to export our chunks, go to File -> Export -> Export Multiple and click Export.
![](https://xaviergeerinck.com/content/images/2024/03/251341_image.png)
If successful, we will now see our exported files:
![](https://xaviergeerinck.com/content/images/2024/03/251343_image.png)
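For a scripted pipeline, the export step can be sketched with Python's standard `wave` module as well. The example below assumes a PCM WAV source and a list of (start, end) times in seconds, for instance copied from the labels. The function name, the `chunk` prefix, and the file naming scheme are my own choices and not what Audacity produces.

```python
import os
import wave

def export_regions(source_path, regions, out_dir, prefix="chunk"):
    """Write each (start_s, end_s) region of a WAV file to its own file."""
    os.makedirs(out_dir, exist_ok=True)
    written = []
    with wave.open(source_path, "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        for n, (start_s, end_s) in enumerate(regions, start=1):
            first = int(start_s * rate)         # first frame of the region
            count = int(end_s * rate) - first   # number of frames to copy
            src.setpos(first)
            frames = src.readframes(count)
            path = os.path.join(out_dir, f"{prefix}-{n:03d}.wav")
            with wave.open(path, "wb") as dst:
                dst.setparams(params)    # same channels, sample width, and rate
                dst.writeframes(frames)  # the header frame count is fixed up on close
            written.append(path)
    return written
```

Each output file keeps the channel count, sample width, and sample rate of the source, so the chunks can be fed into the dataset without any conversion.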
Summary
We have now split a huge WAV file into smaller chunks and are ready to use them in our dataset!