Abstract
- In some data analysis processes, such as the recognition of pathologies and pathological subjects, anomalous instances in the dataset are unfavorable and can lead to misleading results. This article presents a function that identifies anomalies in a dataset using the boxplot and standard deviation methods. A filling technique was also used to treat these anomalies, in which the value of each anomalous point was replaced by a limit value determined by the boxplot or standard deviation method. To improve the outlier methods, normalization processes based on the z-score, logarithmic, and square root methodologies were tested. These outlier treatments were applied to the dataset used in the recognition of vocal pathologies (dysphonia, chronic laryngitis, and vocal cord paralysis vs. control), performed by MLP and LSTM neural networks. After the experiments, both the standard deviation and the boxplot methods with z-score normalization proved very useful for pre-processing the dataset for voice pathology recognition. Accuracy improved by between 3 and 13 percentage points.
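
The filling technique described above can be illustrated with a minimal Python sketch. It is not the authors' function: the names `boxplot_limits`, `std_limits`, and `fill_outliers` are hypothetical, and the multipliers (1.5 times the interquartile range for the boxplot method, 3 standard deviations for the standard deviation method) are common defaults assumed here, since the abstract does not state the values used.

```python
import numpy as np


def boxplot_limits(x, k=1.5):
    """Lower/upper fences of the boxplot rule (k = 1.5 is an assumed default)."""
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def std_limits(x, k=3.0):
    """Limits at mean +/- k standard deviations (k = 3 is an assumed default)."""
    mu, sigma = np.mean(x), np.std(x)
    return mu - k * sigma, mu + k * sigma


def fill_outliers(x, method="boxplot"):
    """Replace each value outside the limits with the nearest limit value."""
    x = np.asarray(x, dtype=float)
    if method == "boxplot":
        low, high = boxplot_limits(x)
    elif method == "std":
        low, high = std_limits(x)
    else:
        raise ValueError(f"unknown method: {method!r}")
    # np.clip leaves in-range values untouched and maps outliers to the limits.
    return np.clip(x, low, high)


# Hypothetical feature column with one anomalous value.
feature = [1.0, 2.0, 3.0, 4.0, 5.0, 100.0]
treated = fill_outliers(feature, method="boxplot")
```

In this sketch the anomalous value is kept in the dataset but moved to the upper limit, so no instance is discarded. A normalization step such as the z-score could be applied to the column before computing the limits; where exactly it sits in the pipeline is an assumption, since the abstract does not specify the order.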