IIIT Hyderabad Publications
INVESTIGATION OF FEATURES FOR ACOUSTIC SCENE CLASSIFICATION

Author: CHANDRASEKHAR PASEDDULA 201550804
Date: 2022-09-02
Report no: IIIT/TH/2022/114
Advisor: Suryakanth V Gangashetty

Abstract

Environmental sounds convey a large amount of information regarding day-to-day activities in nature. These sounds also provide a powerful means of communication. The acoustic scene of environmental sounds contains a set of audio events that occur over a certain duration. Human hearing enables us to recognize specific sounds and to process them continuously without any effort. Therefore, even in the absence of visual cues, humans can identify most events and sounds from acoustic cues alone. In this regard, it is necessary to develop an Acoustic Scene Classification (ASC) system that enables a machine to behave like a human being in the context of soundscapes. The ASC task mainly focuses on characterizing the acoustic surroundings of an audio recording by choosing a textual tag for it. The motivation for this work is to make electronic gadgets more intelligent by incorporating acoustic scene knowledge into context-aware devices, listening robots, hearing aids, automatic data tagging, and so on. The main objectives of this research work are: (a) to extract the desired information that characterizes the environmental sounds in an acoustic signal using known standard acoustic features as well as newly proposed features, (b) to effectively classify the features using a standard DNN model, (c) to develop an ASC system for datasets containing both clean and channel-mismatch conditions, and (d) to improve ASC system performance by combining evidence from multiple sources.

In this regard, this research work aims to address some of the issues in the development of a system that analyses and classifies acoustic scenes. We have investigated the effect of the standard features, namely Mel-Frequency Cepstral Coefficients (MFCC), Log-Mel Band Energies (LOGMEL), Linear Prediction Cepstral Coefficients (LPCC), and All-Pole Group Delay (APGD), for the representation of acoustic scenes. We have also proposed new features, namely Inverted Mel-Frequency Cepstral Coefficients (IMFCC), Spectral Centroid Magnitude Coefficients (SCMC), Subband Spectral Flux Coefficients (SSFC), and Single Frequency Filtering Cepstral Coefficients (SFFCC). The effect of these features on the acoustic scene classification of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2017 and 2018 datasets has been studied using DNN classification models.

From our studies, it has been observed that no single system performs best for all the acoustic scenes. Therefore, to reduce the confusion, we have combined the evidence for the acoustic scenes using a late fusion mechanism. From our studies on the DCASE 2017 and 2018 datasets, it is observed that suitably combining evidence from a pair of classifiers improves the performance of the ASC system.

When the number of acoustic scenes is large, it may be difficult for a single classifier to discriminate well between all the classes. In this regard, we have proposed a two-level hierarchical classification approach, in which the meta category of an acoustic scene is identified first, followed by fine-grained classification within that meta category.
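As a rough illustration of the frame-level spectral features discussed above, the following sketch computes LOGMEL and MFCC features with librosa. The file name, sampling rate, frame settings, and filter-bank size are illustrative assumptions, not the exact configuration used in the thesis.

    import librosa

    # Illustrative input; "scene.wav" and the 44.1 kHz rate are assumptions.
    y, sr = librosa.load("scene.wav", sr=44100)

    # Log-mel band energies (LOGMEL): a mel filter bank applied to the power
    # spectrogram, followed by log (dB) compression.
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=2048,
                                         hop_length=1024, n_mels=40)
    logmel = librosa.power_to_db(mel)

    # MFCC: a DCT over the log-mel energies; keep the first 20 coefficients.
    mfcc = librosa.feature.mfcc(S=logmel, sr=sr, n_mfcc=20)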
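The late fusion mentioned above can be performed at the score level. Below is a minimal sketch assuming each classifier outputs per-class posterior probabilities and that a simple weighted sum is used; the weight alpha and the feature names in the example are assumptions for illustration.

    import numpy as np

    def late_fusion(p1, p2, alpha=0.5):
        """Fuse two posterior matrices of shape (n_examples, n_classes) by a
        weighted sum and return the predicted class index per example."""
        fused = alpha * np.asarray(p1) + (1.0 - alpha) * np.asarray(p2)
        return fused.argmax(axis=1)

    # Example: the two classifiers disagree on the first example; the fused
    # scores resolve the confusion in favour of the stronger evidence.
    p_mfcc = np.array([[0.6, 0.4], [0.2, 0.8]])
    p_sffcc = np.array([[0.1, 0.9], [0.3, 0.7]])
    print(late_fusion(p_mfcc, p_sffcc))  # -> [1 1]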
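A minimal sketch of the two-level hierarchical decision described above, assuming a meta-category classifier that routes each example to a fine-grained classifier for its category; the classifier interface, category names, and scene labels are illustrative stand-ins, not the thesis's actual models.

    def hierarchical_predict(x, meta_clf, fine_clfs):
        """Level 1: identify the meta category (e.g. indoor / outdoor).
        Level 2: classify the scene with that category's classifier."""
        meta = meta_clf.predict(x)
        return meta, fine_clfs[meta].predict(x)

    class _Stub:
        # Stand-in for a trained DNN classifier (illustrative only).
        def __init__(self, label):
            self.label = label
        def predict(self, x):
            return self.label

    meta = _Stub("outdoor")
    fine = {"outdoor": _Stub("park"), "indoor": _Stub("metro_station")}
    print(hierarchical_predict([0.0], meta, fine))  # -> ('outdoor', 'park')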
Keywords: Acoustic Scene Classification, Mel-Frequency Cepstral Coefficients, Inverted Mel-Frequency Cepstral Coefficients, Spectral Centroid Magnitude Coefficients, Subband Spectral Flux Coefficients, Single Frequency Filtering Cepstral Coefficients, Deep Neural Networks, Late Fusion Mechanism, Hierarchical Classification, Detection and Classification of Acoustic Scenes and Events.

Full thesis: pdf

Centre for Language Technologies Research Centre