Large-Scale Whale-Call Classification by Transfer Learning on Multi-Scale Waveforms and Time-Frequency Features
Open Access
- 12 March 2019
- Research article
- Published by MDPI AG in Applied Sciences
- Vol. 9 (5), 1020
- https://doi.org/10.3390/app9051020
Abstract
Whale vocal calls contain valuable information and abundant characteristics that are important for the classification of whale sub-populations and for related biological research. In this study, an effective data-driven approach based on pre-trained Convolutional Neural Networks (CNNs) using multi-scale waveforms and time-frequency feature representations is developed to classify whale calls from a large open-source dataset recorded by sensors carried by whales. Specifically, the classification is carried out through transfer learning, using pre-trained state-of-the-art CNN models from the field of computer vision. 1D raw waveforms and 2D log-mel features of the whale-call data are used as inputs to the CNN models. For raw waveform input, sliding windows capture multiple sketches of a whale-call clip at different time scales, and the features from these sketches are stacked for classification. For the log-mel features, the delta and delta-delta features are also calculated to produce a 3-channel feature representation. During training, 4-fold cross-validation is employed to reduce overfitting, and the Mix-up technique is applied for data augmentation to further improve performance. The results show that the proposed method improves accuracy by more than 20 percentage points for the classification into 16 whale pods, compared with a baseline method on the same dataset that uses groups of 2D shape descriptors of spectrograms together with Fisher discriminant scores. Moreover, classifications based on log-mel features achieve higher accuracies than those based directly on raw waveforms. A phylogeny graph is also produced to illustrate the relationships among the whale sub-populations.
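The Mix-up augmentation mentioned in the abstract forms convex combinations of random sample pairs and of their one-hot labels. The following is a minimal NumPy sketch of that idea, not the authors' implementation; the Beta-distribution parameter `alpha`, the batch shapes, and the 16-class label layout are illustrative assumptions.

```python
import numpy as np

def mixup(x_batch, y_batch, alpha=0.2, rng=None):
    """Mix-up augmentation: blend each sample (and its one-hot label)
    with a randomly chosen partner, weighted by lam ~ Beta(alpha, alpha)."""
    rng = rng or np.random.default_rng(0)
    lam = rng.beta(alpha, alpha)           # mixing coefficient in (0, 1)
    idx = rng.permutation(len(x_batch))    # random pairing of samples
    x_mixed = lam * x_batch + (1.0 - lam) * x_batch[idx]
    y_mixed = lam * y_batch + (1.0 - lam) * y_batch[idx]
    return x_mixed, y_mixed

# Toy batch: 4 one-second "waveform" clips (16 kHz) with one-hot labels
# for a hypothetical 16-pod classification task.
rng = np.random.default_rng(1)
x = rng.standard_normal((4, 16000))
y = np.eye(16)[[0, 3, 7, 12]]
x_mixed, y_mixed = mixup(x, y)
```

Because the mixed labels are convex combinations of one-hot vectors, each row of `y_mixed` still sums to 1 and can be used directly with a cross-entropy loss on soft targets.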
Funding Information
- National Natural Science Foundation of China (61806214 and 61702531)
- Science and Technology Foundation of China State Key Laboratory (614210902111804)