Age Estimation in Short Speech Utterances Based on Bidirectional Gated-Recurrent Neural Networks


Age estimation from speech has recently received growing interest because it is important for many applications, such as custom call routing, targeted marketing, and user profiling. In this work, an automatic, text-independent system for estimating age from short speech utterances is proposed. From each utterance frame, four groups of features are extracted, and 10 statistical functionals are then computed for each extracted feature dimension, followed by dimensionality reduction using Linear Discriminant Analysis (LDA). Finally, bidirectional Gated Recurrent Neural Networks (G-RNNs) are used to predict speaker age. Experiments are conducted on the VoxCeleb1 dataset to evaluate the performance of the proposed system, which is the first attempt to do so for such a system. In the gender-dependent system, the Mean Absolute Error (MAE) is 9.25 years and 10.33 years, and the Root Mean Square Error (RMSE) is 13.17 years and 13.26 years, for female and male speakers, respectively. In the gender-independent system, the MAE is 10.96 years and the RMSE is 15.47 years. The results show that the proposed system performs well on short-duration utterances, taking into consideration the high noise ratio in the VoxCeleb1 dataset.
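For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how MAE and RMSE are conventionally computed from true and predicted ages. The function names and the toy age values are illustrative assumptions; they are not taken from the paper or from VoxCeleb1.

```python
import math

def mae(y_true, y_pred):
    """Mean Absolute Error: the average of |true - predicted|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root Mean Square Error: the square root of the mean squared error."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))

# Hypothetical ages in years, for illustration only (not data from the paper).
true_ages = [30, 40, 50, 60]
pred_ages = [33, 36, 50, 65]

print(mae(true_ages, pred_ages))   # 3.0
print(rmse(true_ages, pred_ages))  # square root of 12.5, about 3.54
```

Because RMSE squares each error before averaging, it weights large errors more heavily and is never smaller than MAE on the same predictions, which is consistent with the reported RMSE values exceeding the corresponding MAE values.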