Unsupervised feature selection and NMF de-noising for robust Speech Emotion Recognition

Bandela, Surekha Reddy; Kumar, T. Kishore

dc.contributor.author	Bandela, Surekha Reddy
dc.contributor.author	Kumar, T. Kishore
dc.date.accessioned	2022-02-18T09:54:17Z
dc.date.available	2022-02-18T09:54:17Z
dc.identifier.citation	Bandela S. R., Kumar T. K., "Unsupervised feature selection and NMF de-noising for robust Speech Emotion Recognition", APPLIED ACOUSTICS, vol. 172, 2021
dc.identifier.issn	0003-682X
dc.identifier.other	vv_1032021
dc.identifier.other	av_6953fbbd-79f6-4e2b-b966-f7d1c60c0c08
dc.identifier.uri	http://hdl.handle.net/20.500.12627/178204
dc.identifier.uri	https://doi.org/10.1016/j.apacoust.2020.107645
dc.description.abstract	Speech feature fusion is the most commonly used approach for improving accuracy in Speech Emotion Recognition (SER). Its disadvantage is that it increases the complexity of the SER system in terms of processing time. In addition, some of the features may be redundant and contribute nothing to SER, leading to incorrect emotion predictions and reduced SER accuracy. To overcome this problem, this paper applies unsupervised feature selection to a feature set that combines INTERSPEECH 2010 paralinguistic features, Gammatone Cepstral Coefficients (GTCC) and Power Normalized Cepstral Coefficients (PNCC). Feature Selection with Adaptive Structure Learning (FSASL), Unsupervised Feature Selection with Ordinal Locality (UFSOL) and a novel Subset Feature Selection (SuFS) algorithm are the feature dimension reduction techniques used to obtain better SER performance in this work. The proposed SER system is analyzed in both clean and noisy environments. The EMO-DB and IEMOCAP emotion databases are used to evaluate the performance of the proposed SER system. For the noise analysis, the clean speech is corrupted with different noises from the Aurora noise database and with white Gaussian noise at Signal-to-Noise Ratio (SNR) levels from -5 dB to 20 dB. A Support Vector Machine (SVM) classifier with linear and Radial Basis Function (RBF) kernels is used in this analysis, with 10-fold cross-validation and hold-out validation, and with classification accuracy and computation time as the performance metrics. The results show that the proposed SER system outperforms the baseline SER system as well as many existing works in the literature, in both clean and noisy conditions. For SNR levels >15 dB, the proposed SER system in the presence of different noises performs the same as SER in clean environments, whereas for lower SNRs (<15 dB) the performance is likely to be reduced. Therefore, to overcome this drawback and improve SER performance in noisy conditions, a dense Non-Negative Matrix Factorization (denseNMF) method is adopted to de-noise the noisy speech signal prior to SER, achieving noise robustness. (C) 2020 Elsevier Ltd. All rights reserved.
dc.language.iso	eng
dc.subject	Physical Sciences
dc.subject	Basic Sciences
dc.subject	Acoustics and Ultrasonics
dc.subject	Acoustics
dc.subject	Electromagnetism, Acoustics, Heat Transfer, Classical Mechanics and Fluid Dynamics
dc.subject	Basic Sciences (SCI)
dc.subject	Physics
dc.title	Unsupervised feature selection and NMF de-noising for robust Speech Emotion Recognition
dc.type	Article
dc.relation.journal	APPLIED ACOUSTICS
dc.contributor.department	NIT Warangal
dc.identifier.volume	172
dc.contributor.firstauthorID	3388185
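
The abstract describes corrupting clean speech with noise at SNR levels from -5 dB to 20 dB. The sketch below illustrates one common way to do that: scale the noise so that the ratio of clean-signal power to noise power matches the target SNR, then add it. It is only an illustration under stated assumptions, not the authors' procedure: the function names `mix_at_snr` and `measured_snr_db`, the synthetic sine "speech", the 16 kHz sampling rate and the 5 dB step between levels are all hypothetical choices that do not come from the record.

```python
import math
import random


def mix_at_snr(clean, noise, snr_db):
    """Return clean + scaled noise so that the mixture has the requested SNR in dB."""
    if len(clean) != len(noise):
        raise ValueError("clean and noise must have the same length")
    p_clean = sum(x * x for x in clean) / len(clean)
    p_noise = sum(n * n for n in noise) / len(noise)
    if p_clean == 0 or p_noise == 0:
        raise ValueError("signals must have non-zero power")
    # SNR_dB = 10 * log10(P_clean / (gain^2 * P_noise))  =>  solve for gain.
    gain = math.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return [x + gain * n for x, n in zip(clean, noise)]


def measured_snr_db(clean, noisy):
    """Recompute the SNR from the clean signal and the mixture."""
    p_clean = sum(x * x for x in clean)
    p_resid = sum((y - x) ** 2 for x, y in zip(clean, noisy))
    return 10 * math.log10(p_clean / p_resid)


# Assumption: a 1-second 220 Hz sine at 16 kHz stands in for clean speech,
# and seeded white Gaussian noise stands in for the corrupting noise.
rng = random.Random(0)
clean = [math.sin(2 * math.pi * 220 * t / 16000) for t in range(16000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(16000)]

# Assumption: 5 dB steps; the abstract only gives the range -5 dB to 20 dB.
for snr_db in (-5, 0, 5, 10, 15, 20):
    noisy = mix_at_snr(clean, noise, snr_db)
    print(snr_db, measured_snr_db(clean, noisy))
```

With recorded noise (for example a noise file from a noise database) the same scaling applies once the noise segment has been trimmed or looped to the length of the utterance.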
