
dc.contributor.author: Bandela, Surekha Reddy
dc.contributor.author: Kumar, T. Kishore
dc.date.accessioned: 2022-02-18T09:54:17Z
dc.date.available: 2022-02-18T09:54:17Z
dc.identifier.citation: Bandela S. R., Kumar T. K., "Unsupervised feature selection and NMF de-noising for robust Speech Emotion Recognition", APPLIED ACOUSTICS, vol. 172, 2021
dc.identifier.issn: 0003-682X
dc.identifier.other: vv_1032021
dc.identifier.other: av_6953fbbd-79f6-4e2b-b966-f7d1c60c0c08
dc.identifier.uri: http://hdl.handle.net/20.500.12627/178204
dc.identifier.uri: https://doi.org/10.1016/j.apacoust.2020.107645
dc.description.abstract: Speech feature fusion is the most commonly used technique for improving accuracy in Speech Emotion Recognition (SER). Its disadvantage is that it increases the complexity of the SER system in terms of processing time. In addition, some features may be redundant and contribute nothing to SER, leading to incorrect emotion predictions and reduced accuracy. To overcome this problem, this paper applies unsupervised feature selection to a feature set combining INTERSPEECH 2010 paralinguistic features, Gammatone Cepstral Coefficients (GTCC) and Power Normalized Cepstral Coefficients (PNCC). Feature Selection with Adaptive Structure Learning (FSASL), Unsupervised Feature Selection with Ordinal Locality (UFSOL) and a novel Subset Feature Selection (SuFS) algorithm are the feature dimension reduction techniques used to improve SER performance. The proposed SER system is analyzed in both clean and noisy environments and is evaluated on the EMO-DB and IEMOCAP emotion databases. For noise analysis, clean speech is corrupted with different noises from the Aurora noise database and with white Gaussian noise at Signal-to-Noise Ratio (SNR) levels from -5 dB to 20 dB. A Support Vector Machine (SVM) classifier with linear and Radial Basis Function (RBF) kernels is used, with 10-fold cross-validation and hold-out validation, and with classification accuracy and computation time as the performance metrics. The results show that the proposed SER system outperforms the baseline SER system, as well as many existing works in the literature, in both clean and noisy conditions. For SNR levels >15 dB, the proposed SER system in the presence of different noises performs the same as in clean environments, whereas for lower SNRs (<15 dB) performance is likely to be reduced. Therefore, to overcome this drawback and improve SER performance in noisy conditions, a dense Non-Negative Matrix Factorization (denseNMF) method is adopted to de-noise the noisy speech signal prior to SER, achieving noise robustness. © 2020 Elsevier Ltd. All rights reserved.
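The abstract mentions corrupting clean speech with white Gaussian noise at SNR levels from -5 dB to 20 dB. A minimal sketch of that kind of corruption is given below. It is not the authors' code: the function names, the 5 dB step between levels, and the synthetic 440 Hz tone standing in for a speech recording are all assumptions made for illustration.

```python
import numpy as np


def add_white_noise(clean, snr_db, rng=None):
    """Return `clean` plus white Gaussian noise scaled to the requested SNR (dB).

    Hypothetical helper for illustration; not taken from the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    clean = np.asarray(clean, dtype=float)
    signal_power = np.mean(clean ** 2)
    noise = rng.standard_normal(clean.shape)
    noise_power = np.mean(noise ** 2)
    # Choose the noise power so that 10*log10(signal_power / noise_power) == snr_db.
    target_noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = noise * np.sqrt(target_noise_power / noise_power)
    return clean + noise


def measured_snr_db(clean, noisy):
    """Recompute the SNR in dB from a clean signal and its noisy version."""
    residual = np.asarray(noisy, dtype=float) - np.asarray(clean, dtype=float)
    return 10.0 * np.log10(np.mean(np.square(clean)) / np.mean(residual ** 2))


# Placeholder signal (assumption): a 1-second 440 Hz tone sampled at 16 kHz.
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
clean = np.sin(2.0 * np.pi * 440.0 * t)

# SNR grid from -5 dB to 20 dB; the 5 dB step is an assumption.
for snr in (-5, 0, 5, 10, 15, 20):
    noisy = add_white_noise(clean, snr, rng)
    print(snr, round(measured_snr_db(clean, noisy), 2))
```

Because the noise is rescaled using its own measured power, the recomputed SNR matches the requested level up to floating-point error. Mixing in recorded noises, such as those from the Aurora database, would follow the same scaling with the recorded noise segment in place of the random samples.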
dc.language.iso: eng
dc.subject: Physical Sciences
dc.subject: Basic Sciences
dc.subject: Acoustics and Ultrasonics
dc.subject: Acoustics
dc.subject: Electromagnetism, Acoustics, Heat Transfer, Classical Mechanics and Fluid Dynamics
dc.subject: Basic Sciences (SCI)
dc.subject: Physics
dc.subject: ACOUSTICS
dc.title: Unsupervised feature selection and NMF de-noising for robust Speech Emotion Recognition
dc.type: Article
dc.relation.journal: APPLIED ACOUSTICS
dc.contributor.department: NIT Warangal
dc.identifier.volume: 172
dc.contributor.firstauthorID: 3388185

