Application of Signal Classifiers in Auditory Evoked Potentials for the Detection of Pathological Patients

ABSTRACT The auditory brainstem response (ABR) by evoked potentials is a widespread auditory pathway assessment technique. This is largely applied due to its cost-effectiveness, practicality and ease of use. In contrast, it requires a trained professional to carry out the analysis of the results. This motivates several research efforts to increase the independence of the diagnostician. To this end, the present work shows the ability of three signal classification tools to differentiate ABR studies of normal hearing subjects from those who may have some pathology. As a starting point, the PhysioNet short term auditory evoked potentials databases are used to calculate the features later applied to construct the dataset. The features used are diverse classes of permutation entropy, fractal dimension, the Lyapunov exponent and the zero crossing rate. To ensure more accurate results, a Montecarlo simulation of one thousand trials is employed to train the classifiers and test the results. The goodness of the classification is performed upon the basis of the computation of quality parameters, such as the area under the receiver operating characteristic curve (ROC), the F1 score coefficient and the accuracy. In all the cases the methodology proposed gives high performance results that encourage the line of research of the present work.


INTRODUCTION
When a sound stimulus reaches the ear, it produces several responses of the auditory structures, originating a rise of the evoked auditory potential.Mathematically, this response is made up of continuous waves, and can be registered with the appropriates systems.
The technique under analysis is a non-invasive procedure, which allows the assessment of sensory and neural function of the auditory nervous system and also represents an indirect but important resource for verifying and localizing neurological dysfunction within non-sensory systems, as the sensory pathways run parallel to the motor pathways and pass close to the encephalic areas linked to the vegetative processes of consciousness and cognition.This allows the clinic to reveal nervous system dysfunction that is not otherwise apparent (cf [11,16]).It is currently a very useful study, but it has become very dependent on the experience of the audiologist or neurologist who analyzes the study.In this sense, the aim of this proposal is to develop an alternative methodology using objective parameters that contribute to the diagnosis and characterisation of the auditory response [28].
The ABR have become an integral part of current otological and audiological study methods and spectral methods have been successfully applied, as for example in [8].The relevance in the use of this kind of electro-physiological procedures in the clinic can be understood by some example applications, in this sense can be mentioned: assessment of auditory pathway function and pathology, evaluation of brainstem development and pathologies, auditory threshold testing, selection of hearing aids, intra-surgical monitoring and analysis of brain activity for the determination of the so-called "brain death", among other applications.
To accomplish the objective of this work several magnitudes are calculated from the ABR records, that can be divided in three sets: one characteristic of the information measures, other corresponding to nonlinear system and signal analysis, and the third an optimization of the permutation entropy from the best choice of embedding time delay.Due to the noisy content of the ABR signals, taken from the Physionet databases, it is necessary to apply a prefiltering in order to avoid the contamination of the results with spurious components.Joining the entropic measures with the nonlinear parameters a wide dataset is constructed with the data of pathological and normal ABRs.Added these magnitudes to the database the dataset is composed by different combinations of them to analyze in the feature space.The entropy measures calculated from the signals are the Bandt and Pompe permutation entropy, Rényi Entropy, weighted permutation entropy, an improvement of the Bandt and Pompe permutation entropy to manage the missing pattern problem of the records, and a Bandt and Pompe entropy calculated with the optimization of the parameter mentioned above.The nonlinear parameters calculated to be integrated in the dataset are the Lyapunov exponent and the fractal dimension.Lastly, the Zero Crossing Rate is also used as a feature in this study.
To formulate a model using the dataset constructed three conceptually different methodologies are analyzed: support vector machines, random forest and k-nearest neighbours.The aim of this stage of the work is to test different classification algorithms to evaluate the performance in differentiating between normal and pathological ABR records.Then, a Montecarlo simulation of one thousand trials is performed to improve the results of the classification algorithms applied.In every Montercarlo trial a seventy percent of the whole dataset is assigned to train the classifier and the remaining is used to classify.The quality of the classification is measured by the calculation of the area under the receiver operating characteristic curve (ROC), the F1 score coefficient and the accuracy.In all cases, with the datasets constructed, the selected classifiers gave very satisfactory results.
The present research is structured in the following sequence: section one dedicated to the introduction, section two where the database and prefiltering are described, section three presents the generalities of the classifiers selected, section four describe the dataset features, section five shows the results (including the features of the dataset employed), and section six contains the discussion and analysis, the conclusions are developed in section seven, the final section of the work presents the corresponding bibliography.

Database
Two databases are used in this study, both located at the PhysioNet website: Evoked Auditory Responses in Hearing Impaired (EARH) [34] and Evoked Auditory Responses in Normals (EARNDB) [33].They were provided under the same conditions, and consisted of Auditory Brainstem Response (ABR) and Otoacoustic Emission (OAE) signals, although in this work only the ABR ones are used.
The EARH database holds the signals corresponding to eight listeners with a known history of hearing impairment confirmed through clinical tests, ages ranging from 40 to 85 years old, four females and four males.The records were acquired on the ear with the best threshold: three subjects with their left ear and five with their right ear.The EARNDB provides the signals of eight subjects with normal hearing, again four females and four males, whose ages go from 19 to 31 years old.This time all of them were arbitrarily tested on their right ear.
For both sets the ABRs were recorded simultaneously with the OAEs in a sound-attenuating, electrically shielded booth.Listeners had three electrodes affixed as follows: the non-inverting electrode was positioned on the forehead, the inverting electrode was positioned on the ipsilateral mastoid (behind the ear), and the ground electrode positioned on the contralateral mastoid.
The tests used tone burst stimuli, specifically of 1 kHz tones with 4 ms duration and 4 kHz tones with 1 ms duration, each stimulus consisted of 12 concatenated 41.7 ms intervals in order to generate a train of tone bursts that lasted approximately 0.5 seconds.All signals were converted from analog to digital sampling at 48 kHz, resulting in a signal length of 10 6 points.The stimulus level ranged from the listeners threshold to 100 dB in steps of 5 dB, and due to this threshold some of the hearing impaired listeners could not be tested at 4 kHz.
From a previous study [4] with these signals, they suggested two things: a denoising filter was needed and the signals had to be trimmed to 5000 points.Besides, less than 20% of the EARNDB signals presented some issue with the recordings so those were withdrawn from this study.
After removing the questionable signals, trimming all of the rest down to 5000 points and filtering them, a unique database with 2039 signals corresponding to normal hearing subjects and 565 signals for the hearing impaired patients was generated.

Prefiltering
The signals used to construct the dataset are taken from the PhysioNet database that are available in raw format, which means they require a prefiltering to remove the noise with which they were contaminated.This signals pre-processing accomplishes to minimize the effect of noise on the calculations of the features to be used with the classifiers.
To avoid unwanted attenuation and phase shifting of the signals a wavelet filter is selected [7].In such a way a prefiltering based on denoising by wavelets is implemented using the Symlet 5 mother wavelet, in a filter bank with 6 levels of analysis and detail [36], [21], which allows the signal to be reconstructed with the contribution of the most significant levels of analysis, thus discarding the noisy component of the signal associated with the detail coefficients corresponding to the higher frequencies.The computational implementation of the denoising is done in Python using the library PyWavelets [20].

CLASSIFIERS
There is some background about the application of classifications methodologies using evoked potential records, see for example [19] [22] and [2].Evaluation of the progression of the Multiple Sclerosis Disease has been studied in [39] and distinguishing age from infants can be seen in the work presented in [29].Classifications algorithms in the use in clinical rehabilitation have been used in [35].In the pathway to increase the objectivity of the ABR analysis, the work of [23] is an interesting study.To distinguish different hearing perception levels using machine learning, the work of [25] constitutes a good source of information.In [1] a study about ABR using support vector machines (SVM) is presented.Finally, to increase the impact of the machine learning methodologies in the clinical research the results of [30] are profusely illustrative.

Support Vector Machines
Support Vector Machines (SVMs) are non probabilistic linear binary classifiers, with several applications in Biology [24].This methodology is based upon finding a hyperplane that maximizes the gap separating two categories in the multidimensional feature space.
One of the advantages of SVMs is its capability to detect outliers.This allows a limited part of the dataset to stay on the wrong side of the hyperplane, as can be seen in figure 1 without affecting the final result, taking into account the amount of outliers and the distance of these to the hyperplane.
The SVM used in this work is set up with a third degree polynomial kernel type.To fit this kernel it uses the features set selected according to the capability of the library SKLearn of Python [26] and null constant kernel coefficient.

Random forest
The random forest algorithm is an improvement of the single decision tree method, generating a forest of several trees which splits with randomly generated vectors, or random subsets of training data, and computes the score as a function of these different components [8] as shown in Figure 2.This algorithm presents a series of advantages: it is more precise than a single decision tree, it reduces the over fitting problems, and a random subset of features is selected at each node in every tree of the forest.
Among the disadvantages that can be mentioned: it is computationally more expensive than a single decision tree algorithm and it is harder to program and execute.
The Random Forest used in this study is configured with 100 decision trees.The criterion used to decide the branch selected is based upon the computation of the Shannon entropy.Due to the dichotomy of the diagnostic task, every node is split in 2 branches.Also, every decision tree uses 3 variables of the feature set.The library used is the same that indicated in the previous case.

K-Nearest Neighbors
The K-Nearest Neighbors (KNN) algorithm is one of the most basic and simplest classification methods.It is a good choice when there is little knowledge about the distribution of the data [27].This algorithm uses as classification criteria the distance among the neighbors, as can be seen in Figure 3.
The KNN algorithm used in the present research is set up with 100 neighbors weighted by the euclidean distance in which the near neighbors have more influence in the construction of the classes.This classification is implemented by the same library above mentioned.

Higuchi's Fractal Dimension
The fractal dimension is an exponent that allows us to quantify how much a fractal seems to fill a geometric space as we get closer to smaller scales.It can be used to determine the degree of irregularity of a signal as well as, for example, Fourier spectral analysis, which can be used more precisely through the power spectrum that exists on every frequency (denoted by ν).As the power spectrum of a signal follows a law of the type P(ν) ∝ ν −α , the exponent can be interpreted as an index of the irregularity of the signal.However, the algorithms normally used to calculate this spectrum, such as the fast Fourier transform, provide noisy fluctuations of these indexes, Trends Comput.Appl.Math., 24, N. 1 (2023) which is why an average of the power spectrum should be taken over a long interval to obtain a stable index.As the statistical nature of the signal generally varies over small time intervals, it would not be appropriate to use this technique.As an alternative to solve this problem, the fractal dimension can be used and in this work the algorithm introduced by Higuchi in 1988 [15] is used.
In order to calculate the fractal dimension of a time series X = {X 1 , X 2 , ..., X N } which is reconstructed in k subseries as follows: Upon every one of these subseries the length L m (k) is calculated: where N−1 k is a normalization factor of the length of the signal.Finally, to calculate the fractal dimension D a regression must be performed on the averages of the lengths of the subseries because these will follow a law of the type L(k) ∝ k −D .The fractal dimension indicates the "roughness" degree of the signal shape.For continuous time series of the type f (X) : X ∈ ℜ 1 → ℜ 1 , if the signal tends to be linear, the index will be close to 1, while if the roughness increases, the signal will seem to be filling more space and the index will be close to 2.

Lyapunov Exponents
The Lyapunov Exponents [38] [32] of a dynamic system define the divergence or convergence degree of two nearby orbits, meaning, they show the evolutionary feature of the system.
In the case of a n-dimensional system, there are n Lyapunov exponents.These n exponents can be ordered from largest to lowest, obtaining the Lyapunov spectrum of exponents for a particular system.The sign of every exponent gives a notion of where the system tends to evolve (attractors).If they are all negative, it indicates the presence of a fixed point.When they are of the same sign, but with the presence of null values, it indicates a limit cycle.Finally, when the exponents present different signs it indicates chaos in the system.
In particular, the greatest exponent is the most important.From it we can determine whether the system is chaotic or not.In case it is greater than 0, two initially nearby orbits of the attractor diverge rapidly as time progresses, exponentially, showing extreme sensitivity and a large change with respect to the initial conditions, that is, the presence of chaos.
The algorithm selected to compute the Lyapunov exponent in this work is proposed by Kantz in 1994 [17] which provides the maximal LE and requires only the time between samples as input parameter.

Permutation Entropy
This simple complexity measure of a signal waveform was introduced by Bandt and Pompe in 2002 [6] and has been used in multiple areas of research because of its robust and fast method of calculation and its signal discriminating power.Across the years different additions or changes were made to this original measure giving place to new entropies, some of which are used in this work.The algorithm is as follows: Given a time series X = {X 1 , X 2 , ..., X N } , an embedding dimension m and a time delay τ, the Permutation Entropy (PE) of this time series can be calculated.The time series X is sectioned into small sub-sequences called embedding vectors of length m.
where k = 1, 2, ..., N − (m − 1) • τ.The embedding vectors are then assigned a set of the same length consisting of indices in ascending order from 0 to m − 1.
Now the embedding vectors are sorted in ascending order and this change is reflected in their assigned set of indices.These sets are now the corresponding patterns of every embedding vector.
There will be m! different ordinal patterns in every time series.The next step is to calculate their probability distribution p i and finally use it to get the PE of the signal according to: When the logarithm base is 2, the PE is measured in bits.the number of publications that use any version of the permutation entropy is grow during the recent times, among a lot of examples, it can be advise to see [3], [5], [18], [12], [37].

Modified Permutation Entropy
One of the variations of the original PE algorithm that is used in this study is the Modified Permutation Entropy (MPE) [10].The difference lays in the way of calculating the probability distribution.Instead of using: , where c i stands for the amount of times the pattern π m i was found in the time series, for the MPE now that equation changes to: where T is the amount of different patterns found.Then the MPE is calculated in the same way that PE.
This change is based on the fact that missing patterns, called forbidden patterns, could increase the obtainable information about time series.This small modification can be applied not only to the PE but also to any other kind of entropy and, possibly, increase their signal discrimination power.

Weighted Permutation Entropy
Another variant of the PE is the Weighted Permutation Entropy (WPE) [13].First the average value X m k of every embedding vector is calculated and assigned an object from a previously defined alphabet Π containing m! elements and upon this average value is defined a weight w k .
This of course affects the probability distribution, which now is : Trends Comput.Appl.Math., 24, N. 1 ( 2023) , where the indicator function 1 A (u) is defined as 1 A (u) = 1 in the case of that the embedding vector u belongs to a set A of the alphabet and 1 A (u)=0 otherwise.
The WPE is computed exactly like the PE.This addition is motivated by the fact that PE overlooks the amplitude between points in a signal.Two subsections of a signal could have different amplitude variation between their points and yet have the same ordinal pattern associated to them.WPE offers with the addition of this new weight w k a possible solution to this problem.

Rényi Entropy
There is another entropy used in this study and that is the Rényi Entropy denoted as (RE), which is actually a generalization of the original one.The ordinal patterns and their probabilities are calculated the same way than PE, also a new parameter α is incorporated but the final way of computing this entropy is: In this work is implemented the Cuesta Frau's [10] modification to compute the Rényi Entropy thus getting a Modified Rényi Entropy (MRE).

Entropy with optimal embedding time delay
An optimal time delay is used for every signal as the last feature for the time series on this study.We follow the steps in the work of [14] to find an optimal embedding time delay based on auto-correlation and mutual information of each signal.Next is described how to determine this parameter.
Given a time series there are several methods to determine the optimal embedding time delay τ, some of them use the mutual information between the signal at time t and t + τ, searching in this case the minimum value of this parameter.To achieve this objective, the assignment [S(t), Q(t)] = [X(t), X(t + τ)] can be done and then calculate the mutual information of this pair of signals as follow: where: In this work the time delay τ is constantly increased.The first τ that generates a minimum in the mutual information function is selected.Finally with this τ and an embedding dimension of m = 6 the permutation entropy and the weighted permutation entropy are calculated.

Zero Crossing Rate
The zero crossing rate (ZCR) of a signal is the rate at which the signal changes it's sign.That is, the amount of times the values go from positive to negative and viceversa in a given time interval or frame.In this study, the ZCR is measured all across the length of the signal.ZCR is largely used in voice signal processing and other audio related applications.Given a series X = {X 1 , X 2 , ..., X N }, ZCR is calculated as follows: where R stands for the amount of zero crossings in the time series.the application of ZCR in machine learning algorithms is increased in the last twenty years specially in classification of speech signals (see i.e. [9] and [31]) .

RESULTS
As described in the previous sections, the feature space of the dataset is calculated on the basis of every of the ABR records for both classes, such as for normal hearing subjects and for those with a diagnosed pathology.By varying the feature parameters in precise ranges, the values of these are set to produce the best results in order to achieve the best classification of pathological recordings.
When calculating the Fractal dimension, Higuchi's k values are ranging from 50 to 100.For all entropies, embedding dimension m is selected in a range from 3 to 6 and the embedding time delay τ in the interval between from 1 to 20.For the Rényi entropy, α values go from 0.01 to 0.5 and from 2 to 5.
Out of all the calculated features, only eight are selected for the final classification results, which are the fractal dimension with k = 50, the Lyapunov Exponent, the ZCR, the modified permutation entropy with m = 3 and τ = 20, the modified Rényi entropy with m = 3, τ = 20 and α = 5, the permutation entropy with m = 6 and optimal time delay and the weighted permutation entropy with m = 4, τ = 20 and m = 6 and an optimal time delay.This selection is based on their scatter plots and histograms, shown in Figure 5 and 4 respectively.Both these plots show some kind of separations between the two groups, which could justify their presence.Moreover, when added to it, their contribution to the classification model improves the quality parameters that are evaluated in this work.

Fractal Dimension vs Permutation Entropy Lyapunov Exponent vs Fractal Dimension
Modified Entropy vs Rényi Entropy Permutation Entropy vs Rényi Entropy In figure 6 the boxplots of the corresponding optimal τ are shown.This embedding time delay is calculated trough the methodology proposed in 4.3.4.The mean for normal hearing subjects is τ = 47.2 and for pathological ones is τ = 32.6.
With the previously described dataset, in order to do a comparative study of the classification algorithms, 1000 Montecarlo trials of every type of classifier are performed using a 70% of the complete database randomly selected to train the model and the other 30% to classify it.From the results of this stage of the work, the area under the ROC curve, the F-1 score and the accuracy are calculated to compare the performance of every classification algorithm as can be seen in Figure 7.  overall mean values above 95%.However, the KNN classifier had an equally acceptable result with an f1 score of 94%.
The confusion matrices of one iteration were made and they are shown in the tables 1, 2 and 3 for the Random Forest, KNN and SVM classifier respectively.A low amount of false positives can be observed in these matrices.
A more visual way to see these results is through the ROC curves.Figure 8 shows one of the iterations carried out to calculate the performance of the models.

DISCUSSION AND ANALYSIS
From the box plots in Figure 7 it can be seen that SVM and Random Forest have the best results regarding the accuracy and F-1 Score while for the area under the ROC curve the best classifier is Random Forest.However, the three classifiers have satisfactory results in every classification  67 109 quality parameters.They all have an accuracy between 90% an 92%, an area under the ROC curve between 84% and 88% and an F-1 Score between 94% and 95.5%.
Analyzing the scatter plots in Figure 4, it is clear that healthy patients are placed in a concentrated point cloud, whereas pathological records may appear inside this cloud or very far away.From this observation it can be assumed that some signals from pathological patients may seem like healthy but no signal from a healthy patient is outside the mentioned cloud.This means that if one auditory evoked potentials signal from a patient is predicted as pathological, that patient is for sure pathological.The example shown in tables 2 and 3 coincides with the characteristics of the classifying methodologies of the K-Nearest Neighbours and Support Vector Machine, in respect to the prediction capability of the method.In the opposite corner is the Random Forest algorithm which has a low profile to prediction.
An interesting observation is how the ROC curve works in this case.Looking at Figure 8, it is clear how the curve is laying in the left axis, meaning the false positive rate is very low.The probability of misclassifying a normal hearing subject is minimal.
In the three proposed algorithms, the ROC curve obtained is very similar.In them, the cutoff points are very close to each other, which could not easily justify the use of one algorithm over another.
In both Figures 5 and 4 three point clouds can be seen for the Lyapunov exponent.This result is found in all the records analyzed, both normal hearing and pathological, and may be an indication of different behavioral regimes in the response of the auditory system.Not enough evidence has been found that these groupings are related to age since they manifest equally for the same patient.
The need of a classification algorithm emerges from the distribution functions, which many exhibit a remarkable overlapping as seen in Figure 5.
From Figure 6, the calculation of the optimal τ for both groups of signals give as a result lower values for the pathological subjects than the normal hearing subjects.As the embedding time delay is related with the auto correlation of a signal it is valid to express that the correlation of the pathological ABRs has a lower correlation than the normal ABRs.This result could indicate a characteristic of the auditory pathway pathologies that generates an evoked response with lower auto correlation.

CONCLUSIONS
Although three training methodologies with different operating principles were used, the results obtained with them were of the same level of quality in the classification.
Given the scatter plots and knowing that SVM classification algorithms work best when a curve that separates the point clouds is easily visualized, it is expected that SVM would have the best results in matter of accuracy and F-1 Score.Random Forest has almost as good results in these indicators and even has the greatest result in AUC whereas SVM has the lowest one.However, the fact that SVM has none false positives in the shown example of the confusion matrix and given the obtained scatter plots, it could be suggested that SVM is slightly better when classifying these signals than the other selected alternatives.
Although similar variables are calculated to be used as features in the dataset, such as the case of the WPE and PE with prefixed and optimal embedding time delay, and the case of the RE, it is clear from the corresponding histograms that the mean values of the distributions are different Trends Comput.Appl.Math., 24, N. 1 (2023) as well as their relative locations, which makes them additional variables that are incorporated in the feature space contributing to improve the result of the classification stage.
The satisfactory results obtained in the present work can be interpreted as a first step in incorporating different algorithms to classifying signals for the detection of ABRs of subjects with possible pathologies and thus contribute to the analysis and subsequent objective diagnosis.

Figure 2 :
Figure 2: Exemplification of the random forest algorithm.

Figure 3 :
Figure 3: Example with four neighbors, the green dot would be classified in the same group as the red dots as the majority of the neighbors are red.

Figure 8 :
Figure 8: Example of the ROC curve for a Montercarlo trial for every classification algorithm.

Table 1 :
Confusion matrix of one Montecarlo Trial for Random Forest classifier.

Table 2 :
Confusion matrix of one Montecarlo Trial for KNN classifier.

Table 3 :
Confusion matrix of one Montecarlo Trial for SVM classifier.