Statistical band selection for descriptors of MBSE and MFCC-based features for accent classification of Malaysian English / Yusnita M. A. ...[et al.]

Bibliographic Details
Main Authors: M. A., Yusnita (Author), M. P., Paulraj (Author), Yaacob, Sazali (Author), A. B., Shahriman (Author), Mokhtar, Nor Fadzilah (Author)
Format: Book
Published: UiTM Press, 2013-06.
Description
Summary:Accent is a major cause of speech variability that complicates speech technology systems. Ethnicity is one of the influential factors that give rise to accent in speech, so a proper approach to extracting ethnic accent information is crucial in many speech applications. This paper proposes an efficient way of analyzing ethnic accent using statistical knowledge of the log-energies of mel-filter banks derived from the Fourier transform. A simple band-selection algorithm, called the statistical band selection (SBS) method, was developed to optimize the presentation of speech features; it selects bands using the smallest variances of within-class scores. The experiments were conducted on selected accent-sensitive words spoken by male and female speakers from the three major ethnic groups in Malaysia. First, statistical descriptors of mel-band spectral energy (mean, standard deviation, kurtosis, and the ratio of standard deviation to kurtosis) and, second, mel-frequency cepstral coefficients were extracted from the selected bands to model an accent classifier implemented with a neural network model and K-nearest neighbors. Experimental results showed that SBS improved the performance of the accent classification system on the three-class accent problem, achieving accuracy gains of 4% to 6%, memory savings of 22% to 55%, and 70% faster speed on average.
Item Description:https://ir.uitm.edu.my/id/eprint/62950/1/62950.pdf
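
The band-selection idea in the summary can be illustrated with a minimal sketch. It is not the authors' implementation: the summary does not define the "within-class scores" precisely, so the code below assumes one score per mel band per utterance and ranks bands by the variance of that score inside each accent class, averaged over classes. The function names, array shapes, and the use of Pearson (non-excess) kurtosis are likewise assumptions made for illustration.

```python
import numpy as np


def band_descriptors(log_energies):
    """Per-band descriptors of mel-band log-energies (hypothetical helper).

    log_energies: array of shape (n_frames, n_bands).
    Returns an array of shape (n_bands, 4) holding, per band:
    mean, standard deviation, kurtosis, and std / kurtosis.
    """
    x = np.asarray(log_energies, dtype=float)
    mean = x.mean(axis=0)
    std = x.std(axis=0)
    centered = x - mean
    # Assumption: Pearson kurtosis (fourth central moment over squared variance).
    kurt = (centered ** 4).mean(axis=0) / (std ** 4)
    return np.stack([mean, std, kurt, std / kurt], axis=1)


def select_bands(scores, labels, n_keep):
    """Keep the n_keep bands with the smallest within-class variance.

    scores: array of shape (n_utterances, n_bands), one score per band.
    labels: array of shape (n_utterances,), the accent class of each utterance.
    Returns the sorted indices of the selected bands.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    # Variance of each band inside each class, then averaged across classes.
    per_class = [scores[labels == c].var(axis=0) for c in np.unique(labels)]
    within = np.mean(per_class, axis=0)
    return np.sort(np.argsort(within)[:n_keep])
```

Under these assumptions, a call such as select_bands(scores, labels, n_keep=2) returns the indices of the bands to retain, and band_descriptors or an MFCC computation would then be applied to those bands only. How many bands to keep is a free parameter here; the summary does not state the value used in the paper.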