A Metaheuristic Approach to Parameter Estimation of a Flexible Parametric Mixture Cure Rate Model with Interval-Censored Data

ABSTRACT A flexible parametric mixture cure model, called the bi-lognormal cure rate model or simply the BLN model, is defined and studied. The BLN model can be effectively used to analyze survival datasets in the presence of long-term survivors, especially when the dataset presents the underlying phenomenon of latent competing risks or when there is evidence that a bimodal hazard function is appropriate to describe it, which are advantages over other cure rate models found in the literature. We discuss maximum likelihood estimation of the model parameters with interval-censored data through the differential evolution algorithm, a nature-inspired computing metaheuristic used for global optimization of functions defined on multidimensional spaces. This approach is also adopted because the likelihood function of the model is multimodal, and the direct application of gradient methods in this case is not ideal: such methods are local search methods with a high chance of getting stuck at a local maximum when the starting point is chosen outside the basin of attraction of a global maximum. In addition, a simulation study was implemented to compare the performance of the differential evolution algorithm with that of the Newton-Raphson algorithm in terms of bias, root mean square error, and the coverage probability of the asymptotic confidence intervals for the parameters. Finally, an application of the BLN model to real data is presented to illustrate that it can provide a better fit than other mixture cure rate models.


INTRODUCTION
Survival models that take into account a cure fraction, known as cure rate models or long-term survival models, are frequent topics in many theoretical and applied statistics articles. The relevance of these models has increased in the last decade, mainly due to medical advances. In many practical situations, moreover, the event of interest can only be detected at consecutive inspection times. Therefore, the exact lifetimes are not recorded, but it is known that the lifetimes belong to an interval [L, R], where L is the latest inspection time before the occurrence of the event and R is the first inspection after the occurrence of the event [45]. [L, ∞] represents the case where the event of interest is not observed by the end of the last inspection time, in which case the lifetime is considered to be right-censored. Although interval-censored data have been extensively studied in the literature, there are not many references dealing with interval-censored data in a cure rate setup; [11,25,37,43] and the references therein could be mentioned as a few examples.
As an illustration of the aforementioned problems, that is, the scenario of competing risks, long-term survivors, and interval censoring, we consider data from a field trial of 4,992 circuit boards extracted from [12]. The data consist of the lifetimes of the circuit boards observed during a period of 10,000 hours of operation or until failure. A total of 95 circuit board failures were observed. For the circuit boards that fail before the end of the experiment, engineering judgment indicates that failure can occur due to infant failure or wearout, but the exact cause of failure is unknown. Besides, the lifetimes of these failed units are interval-censored between two inspections, and the remaining lifetimes are right-censored. There is a great deal of censoring (4,897 censored lifetimes) in the data, evidencing a possible adequacy of the cure rate model approach. Furthermore, the cumulative hazard plot for this dataset was shown in [35], where it was observed that two possible causes of failure are competing. Given this, at least in principle, these behaviors indicate that models ignoring the possibility of cure and competing risks will not be adequate for this dataset. Therefore, in this paper, we consider the situation where the objective is to analyze a survival dataset in the presence of long-term survivors, in particular, when the dataset presents the underlying phenomenon of latent competing risks or when there is evidence that a bimodal hazard function is appropriate to describe it. For this situation, we propose a mixture cure rate model based on the structure proposed by Mazucheli et al. [35] with two latent causes of failure, assuming in our approach the hypothesis that the lifetime associated with a particular cause of failure follows a lognormal distribution, in the presence of interval censoring.
For the proposed model, the maximum likelihood estimators of the parameters under interval-censored data cannot be obtained in closed form, since equating the first-order log-likelihood derivatives to zero leads to a complicated system of non-linear equations. In this case, we need an iterative computational method to solve the problem. Gradient methods (such as Newton-Raphson and Fisher scoring) require the score vector and the inverse of the Hessian matrix (or a good approximation of this matrix) associated with the log-likelihood function, but these operations are hard to carry out analytically. In addition, gradient methods are local search procedures that are usually trapped at a local optimum when applied to optimize multimodal functions, even when multiple-run strategies are used. For these reasons, we propose to use a well-known evolutionary algorithm, called differential evolution (DE), to find a good approximation of a global maximum of the log-likelihood function. DE is a nature-inspired metaheuristic developed by Storn and Price [44] for global optimization of multimodal functions defined on multidimensional spaces. It uses the principles of biological evolution and natural selection to simulate the evolution of living organisms subject to biological mechanisms, which favor the descendants fittest to their environment. DE does not use derivatives of the objective function in its search procedure. Instead, it uses an ensemble of feasible solutions of the optimization problem as a population of organisms that evolve through the application of operators simulating biological mechanisms such as mutation, crossover, and selection. In other words, DE is a derivative-free algorithm that can be considered an efficient method for the general problem of global optimization, with many successful applications in various scientific fields [17,21,27,30,36,38].
In our Monte Carlo simulation study, it was observed that DE converges to a good approximation of a global maximum of the log-likelihood function in almost all runs. That is, given an a priori approximation error, DE converges with a high success rate. Besides, DE is robust and does not require any other method to choose good starting points to achieve results near a global maximum.
The remainder of this paper is structured as follows: Section 2 presents the bi-lognormal cure rate model and describes maximum likelihood estimation based on the DE algorithm to estimate the model parameters for interval-censored data. Section 3 presents a simulation study comparing the performance of the DE algorithm with that of the Newton-Raphson algorithm in terms of bias, root mean square error, and coverage probability of the asymptotic confidence intervals for the model parameters. An application of our model to a real dataset is presented in Section 4. Some final comments in Section 5 conclude the paper.

Bi-lognormal mixture cure rate model
The lognormal distribution assumes that the lifetime T has density function given by

f(t) = \frac{1}{t\sigma\sqrt{2\pi}} \exp\left\{-\frac{(\log t - \mu)^2}{2\sigma^2}\right\}, \quad t > 0, \quad (2.1)

where µ ∈ R is the mean of the lifetime logarithm (µ = E[log(T)]) and σ > 0 is the standard deviation of log(T). The corresponding survival and hazard functions are represented respectively by

S(t) = 1 - \Phi\left(\frac{\log t - \mu}{\sigma}\right) \quad \text{and} \quad h(t) = \frac{f(t)}{S(t)}, \quad (2.2)

where Φ(·) is the cumulative distribution function of the standard normal distribution. The shape of the hazard function is unimodal, but it does not have a closed analytical form; a similar comment holds for the survival function. It is important to highlight that the normal and lognormal distributions have certain equivalences, because if a random variable has a lognormal distribution, its logarithm has a normal distribution. For this reason, many mathematical properties are shared between the two distributions, as discussed in [2]. Now, in building our model, let T be the lifetime of an individual (or component). The mixture cure rate model performs a dissection of the lifetime T as

T = (1 - \eta) T^{*} + \eta \cdot \infty, \quad (2.3)

where T* denotes the lifetime of a susceptible subject (who is not a long-term survivor, i.e., not immune to the cause of failure under study) and η indicates, by the values 1 or 0, whether the sampled subject is a long-term survivor or a susceptible, independently of T*. Let p_0 = Pr(η = 1) ∈ [0, 1] be the proportion of immunes, i.e., the cure fraction. Then, the survival function of T, S_p(t), is given by

S_p(t) = p_0 + (1 - p_0) S(t), \quad (2.4)

where S(t) = Pr(T* > t) is a proper survival function (with S(∞) = 0).
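As an illustrative sketch (not part of the original analysis, which was done in R), the quantities in Eqs. (2.1)-(2.4) can be coded directly; the function names below are our own.

```python
import numpy as np
from scipy.stats import norm

def lognormal_surv(t, mu, sigma):
    """Lognormal survival, Eq. (2.2): S(t) = 1 - Phi((log t - mu)/sigma)."""
    return norm.sf((np.log(t) - mu) / sigma)

def lognormal_haz(t, mu, sigma):
    """Lognormal hazard, Eq. (2.2): h(t) = f(t)/S(t)."""
    z = (np.log(t) - mu) / sigma
    f = norm.pdf(z) / (t * sigma)  # lognormal density, Eq. (2.1)
    return f / norm.sf(z)

def mixture_surv(t, p0, mu, sigma):
    """Mixture cure survival, Eq. (2.4): S_p(t) = p0 + (1 - p0) S(t)."""
    return p0 + (1.0 - p0) * lognormal_surv(t, mu, sigma)
```

Note that, as t grows, `mixture_surv` approaches p0 rather than 0, which is exactly the improper-survival behavior that characterizes cure rate models.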
Following Mazucheli et al. [35], we suppose that an individual (or component) is subject to two independent latent causes of failure indexed by j = 1, 2. Assume further that the distribution of the lifetime related to the jth cause, X_j, may be sufficiently described by a lognormal form with density defined in Eq. (2.1). Then, the observed event time, T* = min{X_1, X_2}, is said to follow a bi-lognormal distribution if its survival and hazard functions are, respectively,

S(t) = S_1(t) S_2(t) \quad (2.5) \qquad \text{and} \qquad h(t) = h_1(t) + h_2(t), \quad (2.6)

where S_j(·) and h_j(·) are the lognormal survival and hazard functions defined in Eq. (2.2). The shape of the hazard function (2.6) is unimodal or bimodal.

Remark 1.
Although there are probability distributions that accommodate unimodal hazard functions, such as the inverse Weibull, log-logistic, and exponentiated Weibull distributions, the literature in this area lacks distributions that support bimodal hazard functions. In this sense, the bi-lognormal distribution is flexible enough to accommodate this type of hazard, which is another motivation for its formulation.
Under these considerations, based on the bi-lognormal survival function (2.5), the survival function (2.4) is given by

S_p(t) = p_0 + (1 - p_0) \prod_{j=1}^{2}\left[1 - \Phi\left(\frac{\log t - \mu_j}{\sigma_j}\right)\right]. \quad (2.7)

From here on, we refer to the model in Eq. (2.7) as the bi-lognormal mixture cure rate model, or simply, the BLN model.
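A minimal sketch of the BLN survival function in Eq. (2.7), again as our own illustrative Python code:

```python
import numpy as np
from scipy.stats import norm

def bln_surv(t, mu1, sigma1, mu2, sigma2, p0):
    """BLN survival, Eq. (2.7):
    S_p(t) = p0 + (1 - p0) * prod_j [1 - Phi((log t - mu_j)/sigma_j)]."""
    s1 = norm.sf((np.log(t) - mu1) / sigma1)
    s2 = norm.sf((np.log(t) - mu2) / sigma2)
    return p0 + (1.0 - p0) * s1 * s2
```

The function is decreasing in t and levels off at the cure fraction p0, since the product of the two lognormal survival functions vanishes as t → ∞.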

Maximum likelihood estimation for the BLN model
Let us consider the situation when the lifetime T in Eq. (2.3) is not completely observed and is subject to interval-censoring, i.e., T belongs to an interval [L, R], where L is the latest inspection time before the occurrence of the event of interest and R is the first inspection after it. Note that [L, ∞] refers to the situation where the event of interest is not observed before the last inspection time, in which case T is considered to be right-censored. The censoring indicator is thus defined as δ = I(R < ∞), which takes the value 0 if R = ∞, meaning that the event is not observed for a subject before the last inspection time, and takes the value 1 if R < ∞, meaning that the event took place but its exact time is not known and is only known to belong to the interval [L, R]. Denoting the observed data by D = {(L_i, R_i, δ_i) : i = 1, ..., n} and the parameter vector by ϑ = (µ_1, µ_2, σ_1, σ_2, p_0), the likelihood function under noninformative interval censoring is given by

L(ϑ; D) = \prod_{i=1}^{n} \left[S_p(L_i) - S_p(R_i)\right]^{\delta_i} \left[S_p(L_i)\right]^{1-\delta_i}. \quad (2.9)
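The interval-censored log-likelihood can be sketched as follows; this is our own illustrative Python code (the paper uses R), in which each observation contributes S_p(L_i) − S_p(R_i) when the event is observed in [L_i, R_i] and S_p(L_i) when right-censored.

```python
import numpy as np
from scipy.stats import norm

def bln_sp(t, theta):
    """BLN survival S_p(t), Eq. (2.7), for theta = (mu1, sigma1, mu2, sigma2, p0)."""
    mu1, s1, mu2, s2, p0 = theta
    with np.errstate(divide="ignore"):      # allow t = 0 or t = inf cleanly
        logt = np.log(t)
    surv = norm.sf((logt - mu1) / s1) * norm.sf((logt - mu2) / s2)
    return p0 + (1.0 - p0) * surv

def loglik(theta, L, R, delta):
    """Interval-censored log-likelihood of the BLN model, Eq. (2.9)."""
    L, R = np.asarray(L, float), np.asarray(R, float)
    delta = np.asarray(delta)
    spL, spR = bln_sp(L, theta), bln_sp(R, theta)   # S_p(inf) reduces to p0
    contrib = np.where(delta == 1, spL - spR, spL)
    return np.sum(np.log(np.maximum(contrib, 1e-300)))
```

In practice this is the objective function handed to the optimizer (DEoptim in the paper, after negation, since DEoptim minimizes).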
Remark 2. The likelihood function (2.9) is exchangeable, i.e., considering the BLN model, the value of (2.9) will be the same if the MLEs of µ_1 and σ_1 are exchanged with the MLEs of µ_2 and σ_2 and vice versa. Therefore, to avoid this identifiability problem, we consider the constraint µ_1 < µ_2. Remark 3. We consider the R programming language [39] through the DEoptim function, which is fully documented in the DEoptim package in R [1], to compute the MLE ϑ̂ of the parameter ϑ. The DEoptim function implements the DE algorithm that is discussed in detail in Subsection 2.3.
Asymptotic properties of the MLE ϑ̂ are required to make inferences about the unknown parameter ϑ of the BLN model. Following [15,29], under certain regularity conditions, the MLE ϑ̂ is asymptotically unbiased and efficient, and its distribution converges to a normal distribution with variance-covariance matrix given by the inverse of the Fisher information matrix. Given this, we present an approximate method for constructing confidence intervals (CIs) for ϑ using the asymptotic properties of ϑ̂. Thus, let us first denote the Fisher information matrix of ϑ by

I_E(ϑ) = -E\left[\frac{\partial^2 \log L(ϑ; D)}{\partial ϑ \, \partial ϑ^{\top}}\right]. \quad (2.10)

Since the Fisher information matrix is very complicated to obtain due to the censored observations (censoring is random and noninformative), we resort to the observed Fisher information matrix

I_O(ϑ) = -\frac{\partial^2 \log L(ϑ; D)}{\partial ϑ \, \partial ϑ^{\top}},

evaluated at the maximum likelihood estimator ϑ = ϑ̂, which is a consistent estimator of I_E. The required second derivatives are computed numerically.
The multivariate normal N_5(ϑ̂, I_O^{-1}(ϑ̂)) distribution can be used to construct asymptotic CIs for the parameters. In fact, a 100(1 − γ)% asymptotic CI for each parameter κ is given by

\hat{κ} \pm z_{γ/2} \sqrt{\left(I_O^{-1}(ϑ̂)\right)_{κκ}},

where z_{γ/2} is the upper γ/2 quantile of the standard normal distribution and (I_O^{-1}(ϑ̂))_{κκ} is the diagonal element of I_O^{-1}(ϑ̂) corresponding to κ. Remark 4. As mentioned above, the asymptotic properties of the MLE ϑ̂ are valid only under certain regularity conditions, which are not easy to verify analytically for our BLN model. Therefore, in the next section, we investigate the asymptotic properties of the MLEs by means of a simulation study. It is worth noting that many authors have performed simulations to assess the asymptotic behavior of MLEs, especially when the analytical investigation is not trivial [11,40].
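The numerical computation of the observed information and the resulting Wald-type CIs can be sketched as follows (our own illustrative code; the paper likewise computes the second derivatives numerically, though in R):

```python
import numpy as np
from scipy.stats import norm

def observed_info(loglik, theta_hat, eps=1e-5):
    """Observed Fisher information: the negative Hessian of the
    log-likelihood at theta_hat, via central finite differences."""
    theta_hat = np.asarray(theta_hat, float)
    p = theta_hat.size
    H = np.zeros((p, p))
    for i in range(p):
        for j in range(p):
            ei = np.zeros(p); ei[i] = eps
            ej = np.zeros(p); ej[j] = eps
            H[i, j] = (loglik(theta_hat + ei + ej) - loglik(theta_hat + ei - ej)
                       - loglik(theta_hat - ei + ej) + loglik(theta_hat - ei - ej)
                       ) / (4.0 * eps ** 2)
    return -H

def wald_ci(loglik, theta_hat, gamma=0.05):
    """100(1 - gamma)% asymptotic CIs: theta_hat_k +/- z_{gamma/2} * se_k."""
    theta_hat = np.asarray(theta_hat, float)
    cov = np.linalg.inv(observed_info(loglik, theta_hat))
    se = np.sqrt(np.diag(cov))
    z = norm.ppf(1.0 - gamma / 2.0)
    return theta_hat - z * se, theta_hat + z * se
```

For a log-likelihood with unit observed information, this reproduces the familiar estimate ± 1.96 interval at the 95% level.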

MLE based on differential evolution algorithm
In general, differential evolution (DE) is a nature-inspired algorithm used for global optimization of multimodal functions defined on multidimensional spaces [17,38,44]. In the context of maximum likelihood estimation, DE is described as follows. Let l(ϑ; D) be a log-likelihood function to be maximized over a given parametric space Θ ⊆ R^D. For this optimization problem, consider a population P_g of K feasible solutions in Θ. A feasible solution is a D-dimensional vector ϑ_k^g = (ϑ_{k1}^g, ..., ϑ_{kD}^g) in Θ. The index k, where k = 1, ..., K, labels the kth solution in P_g, and the index g represents the generation counter of the algorithm, which in turn acts to evolve the population P_g in order to find a good approximation to an optimal solution of the problem.
DE consists of three main steps: mutation, crossover, and selection. Mutation produces a donor vector v_k^{g+1} according to the following scheme: for each solution ϑ_k^g ∈ P_g, three distinct solutions

ϑ_{k_1}^g, ϑ_{k_2}^g, ϑ_{k_3}^g ∈ P_g \quad (2.13)

are randomly chosen to generate v_k^{g+1} as

v_k^{g+1} = ϑ_{k_1}^g + F \cdot (ϑ_{k_2}^g - ϑ_{k_3}^g), \quad (2.14)

where k, k_1, k_2, k_3 are pairwise distinct and F ∈ [0, 1] is a parameter controlling the amplitude of the differential variation. Crossover implements a discrete recombination between the donor vector v_k^{g+1} and the current solution ϑ_k^g to produce the offspring u_k^{g+1}. Crossover is controlled by a crossover probability CR ∈ [0, 1] and is implemented coordinate-wise as

u_{kd}^{g+1} = v_{kd}^{g+1} \text{ if } \mathrm{rand}_d \le CR, \quad u_{kd}^{g+1} = ϑ_{kd}^g \text{ otherwise}, \quad (2.15)

for all d = 1, ..., D, where rand_d is a uniform random number in [0, 1] drawn for each coordinate d. After the mutation step, if an element of v_k^{g+1} is found to violate the bounds of the parametric space Θ, it is reset in such a way that the bounds are respected; an analogous procedure is applied to u_k^{g+1} after the crossover step. Finally, the selection step chooses the better solution between u_k^{g+1} and ϑ_k^g to go to the next generation:

ϑ_k^{g+1} = u_k^{g+1} \text{ if } l(u_k^{g+1}; D) \ge l(ϑ_k^g; D), \quad ϑ_k^{g+1} = ϑ_k^g \text{ otherwise}. \quad (2.16)

The population P_{g+1} in generation g + 1 consists of the set of solutions {ϑ_k^{g+1} : k = 1, ..., K}. The process is repeated until some stopping criterion is met. At the end of the evolution process, DE returns the best solution ϑ* found in the last population (corresponding to the last generation of the algorithm) and its objective value l* = l(ϑ*; D), the value of the log-likelihood function at ϑ*.
The essential steps of the DE algorithm are summarized in the pseudocode shown in Algorithm 1. The parameter F is commonly chosen in the range [0.4, 1], CR is a probability in the range [0, 1], and K is commonly chosen between 5D and 10D, where D is the dimensionality of the parametric space Θ ⊆ R^D. Variants of the DE algorithm are discussed in the DE literature. These different DE strategies basically differ in the way the donor vector is created and the way the offspring is generated. In order to characterize these strategies, a general notation was adopted in the literature, namely DE/x/y/z, where x refers to the mutation scheme used to create the donor vector, y indicates the number of difference vectors used in the mutation scheme, and z indicates the crossover scheme.
The DE strategy discussed in this subsection is referred to as DE/rand/1/bin (see Algorithm 1), indicating that the donor vector is created from three randomly chosen solutions with only one difference vector, and that the offspring is generated using the binomial crossover scheme. Readers are referred to [38] and [17] for more details.
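To make the DE/rand/1/bin strategy concrete, here is an illustrative, self-contained Python sketch of the steps above (the paper itself uses the DEoptim package in R; the function name and default settings below are our own choices):

```python
import numpy as np

def de_rand_1_bin(objective, bounds, K=50, F=0.8, CR=0.9, gens=200, seed=1):
    """Maximize `objective` with DE/rand/1/bin over the box `bounds`,
    a list of (low, high) pairs, one per dimension."""
    rng = np.random.default_rng(seed)
    lo, hi = np.array(bounds, float).T
    D = len(bounds)
    pop = rng.uniform(lo, hi, size=(K, D))          # initial population in Theta
    fit = np.array([objective(x) for x in pop])
    for g in range(gens):
        for k in range(K):
            # mutation, Eq. (2.14): three distinct indices, all different from k
            k1, k2, k3 = rng.choice([i for i in range(K) if i != k], 3, replace=False)
            v = pop[k1] + F * (pop[k2] - pop[k3])
            v = np.clip(v, lo, hi)                  # reset out-of-bounds elements
            # binomial crossover, Eq. (2.15), with one guaranteed donor coordinate
            mask = rng.random(D) <= CR
            mask[rng.integers(D)] = True
            u = np.where(mask, v, pop[k])
            # selection, Eq. (2.16): keep the better of u and pop[k]
            fu = objective(u)
            if fu >= fit[k]:
                pop[k], fit[k] = u, fu
    best = np.argmax(fit)
    return pop[best], fit[best]
```

In our setting, `objective` would be the interval-censored log-likelihood of the BLN model, with `bounds` encoding the parametric space (and the constraint on µ_1, µ_2 enforced, e.g., by sorting).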
Finally, it is important to note that the DEoptim package in R [1] implements several DE strategies, including the strategy DE/rand/1/bin. The structure of the DEoptim function in the DEoptim package will be familiar to readers who are accustomed to R syntax.

SIMULATION RESULTS
As noted by Mazucheli et al. [35], in lifetime studies it is common to find datasets with a small or moderate number of observed lifetimes. In addition, in the dataset in Section 4, we can observe the presence of a large amount of censoring. Therefore, a Monte Carlo simulation study was conducted to assess the performance of the MLEs via the DE algorithm and the Newton-Raphson method in small and moderate samples when censoring is observed. To this end, we examine three measures, namely: the bias, the root mean square error, and the coverage probability of the asymptotic CIs for the parameters of the BLN model. The results were obtained from 1,000 Monte Carlo repetitions, and the simulations were carried out in the R programming language. In each replication, we consider the BLN model in Eq. (2.7) with parameters µ_1 = 4.0, µ_2 = 12.0, σ_1 = 6.5, σ_2 = 0.2, and the cure fraction fixed at p_0 = 0.3, 0.6 and 0.9, for sample sizes n = 50, 100, 200, 300, 400 and 500. To introduce random censoring, the distribution of censoring times is assumed to be exponential with rate α, which is set to control the proportion of right-censored observations. Interval-censored data (L_i, R_i, δ_i) are generated by the following steps, according to the same strategy described in [37]:

Algorithm 2 Interval-censored data generated according to the BLN model
Require: µ_1, µ_2, σ_1, σ_2, p_0, α, and n
1: Define values for µ_1, µ_2, σ_1, σ_2, p_0, and α.
2: For the ith observation, draw u_i ∼ Uniform(0, 1) and C_i ∼ Exponential(α), where α is set to control the proportion of right-censored observations.
3: If u_i < p_0, set L_i = C_i, R_i = ∞ and δ_i = 0; otherwise, generate lognormal random variables X_j with parameters µ_j and σ_j, j = 1, 2, and set T*_i = min{X_1, X_2}; if T*_i > C_i, set L_i = C_i, R_i = ∞ and δ_i = 0, and otherwise take [L_i, R_i] as the inspection interval containing T*_i and set δ_i = 1.

The generated data have approximately 100(p_0 + 0.05)% censored observations, which fits the context of models with a cure fraction p_0. Figures 1-3 show the bias of the five parameters of the BLN model, the corresponding root mean squared errors (RMSEs), and the empirical coverage probabilities (CPs) of the 95% asymptotic CIs. The results indicate that: (i) with respect to bias and RMSE, the DE algorithm is slightly better than the Newton-Raphson method in most simulation combinations; (ii) as expected and independently of the estimation algorithm, bias and RMSEs decrease as the sample size increases and increase with the amount of censoring, especially for small samples; (iii) the empirical CPs for the parameters µ_1, µ_2 and p_0 are closer to the nominal level as n increases, while those for σ_1 and σ_2 are always below the nominal level, worsening as the proportion of censoring increases. Although the CPs for σ_1 and σ_2 improve for large sample sizes, we still find them unsatisfactory even for n as large as 500, especially when the censoring proportion is very large. Therefore, the confidence interval based on the asymptotic normality of the maximum likelihood estimators should not be used unless n is considerably large, which is the situation studied in our application of Section 4. Other parameter configurations were also tried; however, the performance of the parameter estimation procedure did not present significant differences from the results of Figures 1-3. Further, we give the plots of the empirical distributions of the estimators of µ_1, µ_2, σ_1, σ_2 and p_0 when n = 500 and p_0 = 0.9 (approximately 95% censoring). The plots displayed in Figure 4 indicate that the normal distribution provides a reasonable approximation to the distributions of these estimators.
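The generation steps of Algorithm 2 can be sketched in Python as follows (our own illustrative code; the unit-width inspection grid is an assumption made only for the sketch, since the paper does not fix a particular inspection schedule here):

```python
import numpy as np

def gen_interval_censored(n, mu1, s1, mu2, s2, p0, alpha, width=1.0, seed=1):
    """Generate interval-censored BLN data (L_i, R_i, delta_i)."""
    rng = np.random.default_rng(seed)
    data = []
    for _ in range(n):
        u = rng.random()
        C = rng.exponential(1.0 / alpha)          # censoring time with rate alpha
        if u < p0:                                # long-term survivor: right-censored
            data.append((C, np.inf, 0))
            continue
        t = min(rng.lognormal(mu1, s1), rng.lognormal(mu2, s2))
        if t > C:                                 # event would occur after censoring
            data.append((C, np.inf, 0))
        else:                                     # event inside an inspection interval
            L = width * np.floor(t / width)
            data.append((L, L + width, 1))
    return data
```

Decreasing α lengthens the censoring times and hence reduces the proportion of right-censored susceptibles, which is how the overall censoring proportion is controlled in the simulation.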

APPLICATION TO THE CIRCUIT BOARD DATA
To verify the performance of the BLN model on real-world problems, we conducted an analysis of the circuit board dataset (initially described in Section 1), using the BLN model as a parametric model appropriate for describing it. In this application, the performance of the BLN model was compared with the performances obtained by other models proposed in the literature to deal with interval-censored data in a cure rate setup, namely: (1) the negative binomial Weibull (NBW) model [25], (2) the COM-Poisson (CP) model [37], and (3) the extended generalized gamma accelerated failure time (EGG-AFT) model [43]; their respective survival functions are given in the cited references.
To compare these models, we consider the max log L(·) value, the Akaike Information Criterion (AIC), and the Bayesian Information Criterion (BIC). AIC and BIC are respectively defined by −2 log L(ϑ̂_g) + 2κ and −2 log L(ϑ̂_g) + κ log(n), where ϑ̂_g is the MLE under the g-model, κ is the number of estimated parameters in the g-model, and n is the sample size. The best model corresponds to the highest value of max log L(·) and the lowest values of AIC and BIC.
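These information criteria are straightforward to compute; a small sketch:

```python
import math

def aic_bic(max_loglik, kappa, n):
    """AIC = -2 log L + 2*kappa; BIC = -2 log L + kappa*log(n)."""
    aic = -2.0 * max_loglik + 2.0 * kappa
    bic = -2.0 * max_loglik + kappa * math.log(n)
    return aic, bic
```

For the BLN model in this application, κ = 5 parameters and n = 4,992 circuit boards; BIC penalizes the extra parameters more heavily than AIC whenever log(n) > 2.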
Given the above considerations, we have fitted the BLN, NBW, COM-Poisson and EGG-AFT models using the DE algorithm for parameter estimation. Table 1 presents the max log L(·), AIC, and BIC values for all models considered. Based on these results, we can conclude that the BLN model outperforms all the other models in comparison. Furthermore, in order to assess whether the BLN model is appropriate, Figure 5 shows plots of the empirical survival function estimates, obtained with the generalized Turnbull nonparametric estimator proposed by Dehghan and Duchesne [18], against the respective predicted values obtained from the BLN, NBW, COM-Poisson and EGG-AFT models. We clearly observe from Figure 5 that the predicted values obtained from the BLN model are the closest to the values of the nonparametric estimator, suggesting that this model gives a satisfactory fit to the data, in agreement with the results in Table 1. In a way, this result corroborates our research hypothesis, since the BLN model was proposed to have properties that can adequately describe specific characteristics of the dataset considered in our analysis, namely: interval censoring, the presence of long-term survivors, and latent competing risks. We used the R programming language, through the gte function of the gte package [19], to compute the survival function estimates from the generalized Turnbull estimator.

CONCLUSIONS
In this paper, based on interval-censored data, we considered maximum likelihood estimation via the DE algorithm to estimate the parameters of a parametric mixture cure model, referred to as the bi-lognormal mixture cure rate model, or simply, the BLN model. The BLN model can be used in particular when the dataset presents the underlying phenomenon of latent competing risks or when there is evidence that a bimodal hazard function is appropriate to describe it. This feature makes the BLN model quite flexible in terms of its use to describe real datasets and represents its main advantage over other cure rate models found in the literature. A Monte Carlo simulation study was developed, indicating that MLE via the DE algorithm outperforms the Newton-Raphson method in most cases in terms of bias, root mean square error, and the coverage probability of confidence intervals. Besides, the DE algorithm is robust and does not require any other method to choose starting points to obtain good approximations of the MLE. The practical importance of the proposed methodology was demonstrated on a real-world interval-censored dataset, where it provided a good fit.