Sings Who?
Shazam helps you identify and discover new music. But what about identifying and discovering new birds?
The year 2019 has been dubbed the year of streaming music — but that was before 2020 and 2021 came along. Stuck in their homes at the height of the pandemic, millions of people around the world turned to music to calm their minds.
Thanks to streaming services like Spotify, music is now more accessible than ever before — and in a legal, piracy-free way where artists get at least slightly compensated. Gone are the days of hunting through stacks of CDs or curating MP3s on pendrives; all it now takes to listen to unlimited music is a small fee or tolerance for ads.
One poll showed 58% of Americans listing music as their coping mechanism for stress, as opposed to 43% for books and 42% for physical exercise. Other reasons for listening to music included helping people to be more productive, to sleep better, or to generally uplift their moods.
Apart from music, the other thing that came to the forefront during the pandemic was birdsong.
It was easier when the traffic stopped. But even today, if you pause and listen, you’ll find your life is populated by far more birdcalls than you might have thought.
Like music, birdcalls have been shown to have a positive effect on mental health. Of course, it depends on the kind of bird: the melody of a songbird would be musical and pleasing, while the loud and raucous call of the magpie can prove to be more stressful than therapeutic.
The reasons we like birdsong could be evolutionary. Humans evolved in a natural environment where birdsongs were commonplace, so it makes sense we’re still tuned to them. A more specific theory is that birdcalls signal the presence of life in an area, which implies comfort and safety. Extending this idea to the world of knowledge, knowing exactly which birds are calling would provide even more information on the environment…if we could do it.
Many birds have a unique identifying call; some go further and sing elaborate songs, and a few mimic other birds. However they might call, a bird’s call is often the only way to identify it. This is not always easy, even for experts.
In human society today, awareness of the natural world is going down, birdcalls included. And how could it not? Living in a concrete buildingscape, people are unsurprisingly more familiar with brand names and, of course, music.
Streaming music has been great for music discovery: the act of listening and discovering new artists and albums. Earlier, music would spread through word of mouth or through movies and radio — but it took quite an effort to go out and get a copy of the song. Today, it’s as easy as scrolling through the “Suggested Tracks” in your player.
But music discovery did exist before streaming, and there were services to help you.
One phone-in service, ‘2580’, allowed you to dial that number, play back a snippet of music, and have it identify which song was being played. Over time, this ‘2580’ service evolved into the Shazam app we are familiar with. Social networks like TikTok and Instagram started using Shazam-like algorithms to indicate what background music was added to a particular post or video.
And then, a group of ornithologists thought: why is human-made music the only thing that these algorithms identify?
In 2021, a group of researchers from Cornell University and the Chemnitz University of Technology released a new app called BirdNET. It was like Shazam, but for birdcalls. Take a recording of a bird, and the app would tell you what kind of bird it probably was.
On the technical side, BirdNET uses neural networks: computer programs that, by looking at (or listening to) thousands of samples, can learn to identify things. When a sound is sent in to a fully trained network, the final layer of the model yields probabilities that correspond to each of the bird species.
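To make that final step concrete, here is a minimal sketch of how raw scores from a last layer can be turned into per-species probabilities. It uses the softmax function, a common choice that is assumed here for illustration; BirdNET's own output layer may work differently. The species names and scores are made up.

```python
import math

def softmax(scores):
    """Convert raw final-layer scores into probabilities that sum to 1."""
    # Subtracting the largest score keeps exp() from overflowing
    # and does not change the result.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical species labels and raw scores, for illustration only.
species = ["European Robin", "Common Blackbird", "Great Tit"]
raw_scores = [2.0, 0.5, -1.0]

probabilities = softmax(raw_scores)
best = max(range(len(species)), key=lambda i: probabilities[i])
print(species[best], round(probabilities[best], 3))
```

The species with the highest raw score also gets the highest probability, and the probabilities give a rough sense of how sure the network is.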
The neural network used by BirdNET is a special kind called the ‘residual network’ or ResNet, which has become one of the most cited networks in the literature. As with any other neural network, a ResNet has several layers of nodes: the data is input to the first layer, and the last layer is trained to predict the identity of the bird species. However, most neural networks only have connections between neighbouring layers. A ResNet, on the other hand, has shortcut connections that “jump” over some layers, which turns out to improve the performance of the neural network.
But even the most advanced neural network is useless if there’s no data to train it on, and that’s where the researchers spent most of their effort.
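The shortcut idea fits in a few lines. The sketch below is a toy version, not BirdNET's actual architecture: it uses a small fully connected layer with made-up numbers, whereas real ResNets use convolutional layers and often arrange the steps a little differently. The function names are hypothetical.

```python
def relu(values):
    """Keep positive values and replace negative ones with zero."""
    return [max(0.0, v) for v in values]

def dense(values, weights, biases):
    """A plain fully connected layer: one output per row of weights."""
    return [sum(w * v for w, v in zip(row, values)) + b
            for row, b in zip(weights, biases)]

def residual_block(x, weights, biases):
    """Apply a layer, then add the original input back in via the shortcut."""
    transformed = relu(dense(x, weights, biases))
    # Shortcut connection: the input jumps over the layer
    # and is added to the layer's output.
    return [t + xi for t, xi in zip(transformed, x)]

# Hypothetical numbers. With all-zero weights the layer contributes nothing,
# so the block passes its input through unchanged.
x = [1.0, -2.0, 3.0]
zero_weights = [[0.0] * 3 for _ in range(3)]
zero_biases = [0.0] * 3
print(residual_block(x, zero_weights, zero_biases))
```

Because the input always has a direct path through the block, a layer that has learned nothing useful does no harm, which is one intuition for why very deep networks built this way are easier to train.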
There is a subtle but important difference between calls made by birds, for birds, and music designed for human ears. Most bird sounds are in the frequency range of 250 Hz to 8.3 kHz, which falls within the range of human hearing but doesn’t cover all of it. The designers of BirdNET took this into account, so that the app didn’t spend effort processing unnecessary sounds.
While training, the BirdNET engine had to also listen to non-bird sounds, so that it could learn to ignore them. These include sounds of insects, mammals, wind, rain, thunder, human speech, human whistling, footsteps, cars, aeroplanes, and sirens. Many of these came from a stock sound library, Google’s AudioSet.
Birds are known to tell apart two sounds that are close together in time much better than humans can. The researchers also incorporated this fact while training their network.
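One way to picture "not processing unnecessary sounds" is to split a recording into frequency bins and keep only those inside the range of interest. The sketch below does just the bookkeeping for that. The sample rate and number of bins are assumptions chosen for illustration, not BirdNET's real settings, and the function name is hypothetical.

```python
def band_bins(sample_rate, fft_size, low_hz, high_hz):
    """Return indices of frequency bins whose centre lies in [low_hz, high_hz]."""
    bin_width = sample_rate / fft_size  # Hz covered by each bin
    n_bins = fft_size // 2 + 1          # bins from 0 Hz up to half the sample rate
    return [k for k in range(n_bins) if low_hz <= k * bin_width <= high_hz]

# Assumed recording settings, chosen only for illustration.
SAMPLE_RATE = 48_000
FFT_SIZE = 512

kept = band_bins(SAMPLE_RATE, FFT_SIZE, low_hz=250, high_hz=8_300)
print(f"keeping {len(kept)} of {FFT_SIZE // 2 + 1} bins")
```

With these assumed settings, only about a third of the bins fall inside the bird range, so the rest can be dropped before the network ever sees them.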
Songs are unique and diverse, but birdcalls can be similar. Also, collecting birdcalls is a more specialist activity, which meant that researchers had limited data on which to train the network. To help this process, researchers usually “augment” the data. This means increasing the training data by modifying the data that you already have.
For BirdNET, the researchers did this mainly in three ways. They could “stretch” the time or the frequency of the bird sounds, creating a new modified sound snippet to use for training. “Stretching time” is similar to saying a word more quickly or more slowly, while “stretching frequency” is like saying the same word in a higher or lower voice, which shifts the pitch of that particular word (or birdcall).
Finally, researchers could modify samples by adding on different background noises. This created new samples, so that the neural network could practice detecting birdcalls against those background noises too.
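The three kinds of augmentation described above can be sketched on a tiny, made-up spectrogram: a grid where rows are frequency bins and columns are moments in time. This is an illustration under simplifying assumptions, not the researchers' actual code. The function names are hypothetical, and real pipelines typically crop the result back to a fixed size, which this sketch skips.

```python
import random

def resample(values, new_length):
    """Linearly interpolate a list of numbers to a new length."""
    if new_length == 1 or len(values) == 1:
        return [values[0]] * new_length
    step = (len(values) - 1) / (new_length - 1)
    out = []
    for i in range(new_length):
        position = i * step
        left = min(int(position), len(values) - 2)
        fraction = position - left
        out.append(values[left] * (1 - fraction) + values[left + 1] * fraction)
    return out

def stretch_time(spectrogram, factor):
    """Stretch along the time axis: every frequency row gets more or fewer frames."""
    frames = max(1, round(len(spectrogram[0]) * factor))
    return [resample(row, frames) for row in spectrogram]

def stretch_frequency(spectrogram, factor):
    """Stretch along the frequency axis: every time frame gets more or fewer bins."""
    columns = [list(col) for col in zip(*spectrogram)]
    bins = max(1, round(len(columns[0]) * factor))
    stretched = [resample(col, bins) for col in columns]
    return [list(row) for row in zip(*stretched)]

def add_noise(spectrogram, noise, level):
    """Overlay a background-noise grid of the same shape, scaled by `level`."""
    return [[s + level * n for s, n in zip(s_row, n_row)]
            for s_row, n_row in zip(spectrogram, noise)]

# A made-up spectrogram with 3 frequency bins and 4 time frames.
call = [
    [0.0, 0.1, 0.0, 0.0],
    [0.2, 0.9, 0.8, 0.1],
    [0.0, 0.2, 0.1, 0.0],
]
random.seed(0)  # fixed seed so the sketch is repeatable
noise = [[random.random() for _ in row] for row in call]

slower = stretch_time(call, 1.5)       # 4 frames become 6
higher = stretch_frequency(call, 2.0)  # 3 bins become 6
noisy = add_noise(call, noise, 0.1)
print(len(slower[0]), len(higher), len(noisy))
```

Each call produces a new training sample from the same original recording, which is how a limited collection of birdcalls can be turned into a much larger training set.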
With their final network, the researchers generated a model that could distinguish between 984 common bird species of Europe and North America with an accuracy of about 79%.
But while 984 is a large number, it’s nowhere near the total number of bird species in Europe and North America — let alone in the rest of the world.
That’s where people like you and me come in.
Remember when you entered CAPTCHAs to log into websites? They pose tasks like “select all squares containing a tree”, to prove that you’re a human. Sometimes, these CAPTCHAs also use humans to help them train: they may slip in an image where they’re not sure if it contains a tree or not — so that the humans’ responses can tell them which one it is!
A more straightforward app is TrueCaller, which lets you assign a name to any phone number. Other TrueCaller users can see the name you assigned, which lets them identify unknown callers like “Carpenter Pete”, “Credit Card Scammers”, or, unfortunately, “Undercover Journalist Who Shouldn’t Be Identified”.
But it’s not all involuntary. ‘Citizen Science’ platforms like Zooniverse allow people to voluntarily identify and tag data, to help with scientific research. For the more experienced, innovative apps like BrainDr take a leaf out of Tinder’s book, allowing participants to swipe right or left to identify brain lesion images.
This is where BirdNET differs from its music counterpart Shazam. If it’s not able to tell you which bird a particular birdcall came from, the tables turn and you can tell it.
Birds live in most environments and are usually found everywhere within the environment. Many of the birds we see visiting us in India during the winter come all the way from Siberia. They are also more easily observable than other organisms, announcing their presence with birdcalls and colourful plumage.
This means studying bird population trends is an easy way to learn a lot about the health of the ecosystem. The number of species, the number of individuals within each species, and where these are found lead to an understanding of the migration patterns of birds. This also tells us how migration has been disrupted over the last few years and provides us with information about how the health of the environment is affected by climate change.
Bird sound data is usually gathered by expert observers. These experts look for, listen to, and count birds in the field every five to ten minutes. As you can imagine, this method is prone to many errors: the accuracy depends on the expert’s experience, the surrounding weather conditions, how much time the expert has to cover the entire area, and even whether they just happen to be having a bad day.
Thanks to inventions like BirdNET, anyone with the app can now go around recognising birds — and feeding this information back into the app to help researchers with their analyses. Better still, the source code is available online, allowing people to improve not only the data but also the app itself.
I am eager to use BirdNET, to discover birds and birdcalls in the same way that I discover music. Of course, the birds were already there, and nothing was preventing me from listening in even without the app. But there is something about labels; about the fact that other people are listening to the same kinds of sounds as you, and have, through experience, developed specific names and descriptions for them, that makes listening to birds with BirdNET a more fulfilling experience.
Will 2023 be the year of the birdcall? Probably not. One reason is that listening to birdcalls is still a more niche activity than listening to music. The other reason is that birds have been singing since before humans existed.
If there ever was a “year of the birdcall”, it would have taken place millions of years ago.