Voice-Changing Detection with Convolutional Neural Network
DOI:
https://doi.org/ 10.47611/harp.139Keywords:
voice-changingAbstract
Voice-changing is a voice transformation technique that directly modifies a voice’s pitch, tone, and timbre. Like other types of voice transformation, voice-changing is threatening our society from an economic, social, and legal perspective; hence in this study, we tackle this problem by adopting a convolutional neural network. As far as we know, our proposed work is the first solution to distinguish voice-changed voices and authentic voices. In addition, we also conducted experiments and gained significant insights that can help facilitate further research into voice-changing and audio-related machine learning. Foremost, we find the architecture with four convolutional layers to be the most effective. Second, we find a strong and positive correlation between the training data size and the performance in terms of accuracy and stability. Finally, we discovered that different training languages yield very different performances, with Chinese far outweighing English as a training language. We were inspired by the findings of different training languages, which motivated us to conduct a supplementary experiment: we concluded that tone is one factor that contributes to making a training language better in spotting voice-changed voices.
Downloads
Posted
License
Copyright (c) 2024 Chuntung Zhuang
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.