Impact of network topology on the spread of infectious diseases

ABSTRACT. Complex network theory provides a natural framework for the study of disease propagation. In this work, we study the spread of an infectious disease using this theory in combination with the Individual Based Model. More specifically, we use several complex network models widely known in the literature to assess the effects of their topology on the propagation of the disease. In general, complex networks with different properties result in curves of infected individuals with different behaviors, and thus the growth of a given disease is highly sensitive to the network model used. Disease eradication is observed when a vaccination strategy covering 10% of the population is used in combination with the random, small world or modular network models, which opens an important space for control actions that focus on changing the topology of a complex network as a way of reducing or even eliminating an infectious disease.


INTRODUCTION
Infectious or transmissible diseases are caused by biological agents, such as viruses or bacteria. Methods that help prevent these diseases and reduce their incidence have become increasingly necessary. One approach that has received great attention from the scientific community is mathematical epidemiology [20,22]. A better understanding of the transmission mechanisms of infectious diseases through mathematical models allows the establishment of more effective control strategies [8]. Examples of control techniques derived from mathematical models can be seen in [28,29].
Several strategies for modelling infectious diseases have been proposed in the literature, including compartmental models based on ordinary and partial differential equations, Markov chains, cellular automata and optimization methods, among others [6,9,11,37]. Recent studies have shown that complex network theory provides a natural framework for the study of disease propagation [24,33,40,43]. A complex network is defined by sets of nodes (vertices) and edges (connections or links), together with some type of interaction between its nodes [33]. From the application of complex network theory to infectious diseases and other ecological phenomena, the Individual Based Model (IBM) has been developed [16,17,18,19]. Other examples of IBM applications may be seen in [2,30], and an application to control in [28].
Two major lines of research can be identified in the search for effective control strategies for infectious diseases. The first is similar to the linearization stage used to control nonlinear models. Instead of working directly with a complex network, the technique produces a model based on differential equations that approximates the mean-field behavior of the complex network. Works in this direction can be seen in [28,30,31]. The second consists of translating classical control concepts, such as controllability and observability, into the field of complex networks [1,25]. Although these two lines are quite distinct, both usually make use of vaccination and isolation as control actions. Despite the major advances in these two lines and the significant attention given to this issue by European and North American communities [26,34,35,39], the effects of network topology on the spread of disease have received less attention from South American scientists. It is certainly a topic that deserves to be explored for potential control strategies.
The effect of network topology on the spread of infectious diseases has been examined in some recent works. It was observed, for example, that one pathogen may generate different epidemiological dynamics depending on the network topology [41]. Along the same lines, the resilience of some diseases may be related to the network topology, and vaccinating the network hubs may increase the potential for disease eradication [23]. For the transmission of sexually transmitted diseases in heterogeneous complex networks, analytical relations for the basic reproduction number determine the dynamics of the disease in steady state [42]. In this work, we intend to contribute to the understanding of the role of complex network topology in the spread of an infectious disease. Our approach employs the mean jump length, which describes how a network is connected and has been used as an important indicator for differentiating the properties of complex networks [5]. The novelty of this study is the establishment of a direct relationship between the mean jump length, the disease spread and the number of infected individuals. This perspective opens an important space for control actions that focus on changing the topology of complex networks as a way of reducing or even eradicating infectious diseases.
The remainder of this paper is organized as follows: after this introduction, Section 2 describes the SIR models without and with vaccination. The Individual Based Model is presented in Section 3. Measures for the characterization and discrimination of complex networks, as well as several network models, are presented in Section 4. The results are presented and discussed in Section 5, and the conclusions are presented in Section 6.

SIR MODELS

SIR model without vaccination
The SIR model [22] divides the population into three disjoint classes: individuals who are not infected but may become infected, named susceptible (S); individuals who are infected and can infect other individuals, named infected (I); and individuals who recovered from the infection and acquired temporary immunity, named recovered (R). The classic SIR model, defined by a continuous system of three ordinary differential equations, considers that the distribution of individuals is spatially and temporally homogeneous. This model (Eq. (2.1)) describes the temporal evolution of each epidemiological class, that is:

$$
\begin{aligned}
\frac{dS}{dt} &= \mu N - \mu S - \frac{\beta I S}{N}, \\
\frac{dI}{dt} &= \frac{\beta I S}{N} - \gamma I - \mu I, \\
\frac{dR}{dt} &= \gamma I - \mu R,
\end{aligned}
\tag{2.1}
$$

where N is the total number of individuals, which is constant, N = S(t) + I(t) + R(t) for all t ≥ 0; β is the contact rate, also called the transmission coefficient, that is, the average number of contacts of an individual per unit of time that cause transmission of the disease; γ is the recovery rate, that is, the rate at which infected individuals move to the recovered class per unit of time; and µ is the renewal rate, that is, the rate at which individuals die per unit of time, each being replaced by a newborn susceptible individual.
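As a minimal numerical sketch of the SIR model with renewal, the system can be integrated with a forward Euler scheme. The function name `simulate_sir`, the step size and the parameter values below are illustrative assumptions and are not taken from this work; the right-hand sides assume the standard form of the model, with infection term βIS/N and renewal term µN.

```python
def simulate_sir(s0, i0, r0, beta, gamma, mu, dt, t_final):
    """Integrate the SIR model with renewal by forward Euler (illustrative)."""
    n = s0 + i0 + r0  # total population, constant in this model
    s, i, r = float(s0), float(i0), float(r0)
    trajectory = [(0.0, s, i, r)]
    steps = int(round(t_final / dt))
    for step in range(1, steps + 1):
        new_infections = beta * i * s / n
        ds = mu * n - mu * s - new_infections
        di = new_infections - gamma * i - mu * i
        dr = gamma * i - mu * r
        s, i, r = s + dt * ds, i + dt * di, r + dt * dr
        trajectory.append((step * dt, s, i, r))
    return trajectory


# Hypothetical parameter values, chosen only for illustration.
traj = simulate_sir(s0=990, i0=10, r0=0, beta=0.5, gamma=0.1, mu=0.01,
                    dt=0.1, t_final=100.0)
t_end, s_end, i_end, r_end = traj[-1]
print(f"t={t_end:.1f}  S={s_end:.1f}  I={i_end:.1f}  R={r_end:.1f}")
```

Because the three derivatives sum to zero, S + I + R stays equal to N along the whole trajectory, which gives a quick sanity check on any implementation.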

SIR model with vaccination
Vaccination is included in the SIR model as a way of reducing the incidence of epidemics. In this model, besides the susceptible (S), infected (I) and recovered (R) classes, there is a class of vaccinated individuals (V), that is, individuals who are not infected, are vaccinated and acquire temporary immunity. The SIR model with vaccination [7], defined by a continuous system of four ordinary differential equations, considers that the distribution of individuals is spatially and temporally homogeneous. This model (Eq. (2.2)) describes the temporal evolution of each epidemiological class, that is:

$$
\begin{aligned}
\frac{dS}{dt} &= (1 - p_v)\,\mu N - \mu S - \frac{\beta I S}{N}, \\
\frac{dI}{dt} &= \frac{\beta I S}{N} - \gamma I - \mu I, \\
\frac{dR}{dt} &= \gamma I - \mu R, \\
\frac{dV}{dt} &= p_v\,\mu N - \mu V,
\end{aligned}
\tag{2.2}
$$

where β, γ and µ are the same as described in Section 2.1 and p_v is the proportion of the population to be vaccinated.
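The effect of p_v can be explored with a short forward Euler sketch. It assumes that vaccination acts on newborns, so that a proportion p_v of the renewal flow µN enters the vaccinated class and the remainder enters the susceptible class, and that N = S + I + R + V. This form, the function name `simulate_sirv` and all parameter values are assumptions made for illustration.

```python
def simulate_sirv(s0, i0, r0, v0, beta, gamma, mu, p_v, dt, t_final):
    """Forward Euler for an SIR model with newborn vaccination (assumed form)."""
    n = s0 + i0 + r0 + v0  # assumed constant: N = S + I + R + V
    s, i, r, v = float(s0), float(i0), float(r0), float(v0)
    for _ in range(int(round(t_final / dt))):
        new_infections = beta * i * s / n
        ds = (1.0 - p_v) * mu * n - mu * s - new_infections
        di = new_infections - gamma * i - mu * i
        dr = gamma * i - mu * r
        dv = p_v * mu * n - mu * v
        s, i, r, v = s + dt * ds, i + dt * di, r + dt * dr, v + dt * dv
    return s, i, r, v


# Hypothetical values; p_v = 0.10 mirrors the 10% strategy discussed in the text.
s, i, r, v = simulate_sirv(990, 10, 0, 0, beta=0.5, gamma=0.1, mu=0.01,
                           p_v=0.10, dt=0.1, t_final=1000.0)
print(f"S={s:.1f}  I={i:.1f}  R={r:.1f}  V={v:.1f}")
```

Under these assumptions the vaccinated class relaxes toward p_v N, since dV/dt vanishes when V = p_v N.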

INDIVIDUAL BASED MODEL
An IBM consists of a discrete structure in which relationships occur among a number of individuals. Their behaviors are determined by a set of characteristics [21,40] that evolve stochastically in time. The population can be expressed by an m × n matrix P, where each row represents an individual and each column represents a characteristic. For each time step t, the matrix (population) can be represented by:

$$
P_t = \begin{bmatrix}
C_{1,1} & C_{1,2} & \cdots & C_{1,n} \\
C_{2,1} & C_{2,2} & \cdots & C_{2,n} \\
\vdots & \vdots & \ddots & \vdots \\
C_{m,1} & C_{m,2} & \cdots & C_{m,n}
\end{bmatrix}
$$

In this work, the IBM for the SIR model uses the following characteristics [27]:
• C_1 is the state of an individual with respect to the epidemic (susceptible (0), infected (1) or recovered (2)).
• C_2 is the age of an individual (in units of time). At t = 0, C_2 = 0, and at each iteration ∆t is added to the age.
• C_3 is the maximum age an individual can live. At birth, this age is obtained from µ_ibm, the life expectancy of the population, and a_u, a random variable uniformly distributed between 0 and 1.
• C_4 is the time elapsed since an individual became infected (in units of time).
• C_5 is the maximum time that an individual remains infected. It is obtained from γ_ibm, the infection period.
By definition, C_4 = 0 and C_5 = 0 for susceptible and recovered individuals at t = 0. The numbers of susceptible, infected and recovered individuals are denoted by S(t), I(t) and R(t), respectively. The population under study has a constant size, that is, N = S(t) + I(t) + R(t) for all t ≥ 0. In addition, the transitions between the epidemiological states are discrete and defined by the following rules:
• R_1: A given individual dies when its maximum age is reached. Since N is constant, this individual is replaced by a new susceptible individual. Otherwise, rule R_2 or R_3 is considered.
• R_2: It occurs when a susceptible individual becomes infected and its state changes from 0 to 1.
• R_3: It occurs when an infected individual becomes immune to the disease and its state changes from 1 to 2.
Figure 1 shows the IBM flowchart for the SIR model without vaccination. At each time step, each individual of the population is considered and the rule to be applied is determined. After all N individuals are evaluated, the time is increased by ∆t. The algorithm ends when the simulation time t reaches the final value t_f.
The IBM for the SIR model with vaccination has the same characteristics and transition rules as the IBM for the SIR model without vaccination, except that a new state, vaccinated (3), is added to C_1. Moreover, in R_1 an individual who reaches its maximum age can be replaced by a new susceptible or vaccinated individual. Figure 2 presents the IBM flowchart for the SIR model with vaccination.
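The characteristics and transition rules above can be sketched as a small simulation for the case without vaccination. Several details are assumptions made here for illustration and are not confirmed by the text: maximum ages and infection periods are drawn as −mean · ln(a_u); a susceptible individual is infected with probability β_ibm · I/N per time step; and all names (`run_ibm`, `draw_time`, `new_individual`) and parameter values are hypothetical.

```python
import math
import random

SUSCEPTIBLE, INFECTED, RECOVERED = 0, 1, 2


def draw_time(mean, rng):
    """Assumed sampling rule: -mean * ln(a_u), with a_u uniform in (0, 1]."""
    return -mean * math.log(1.0 - rng.random())


def new_individual(state, mu_ibm, gamma_ibm, rng):
    """Return [C1, C2, C3, C4, C5] for a newly created individual."""
    c5 = draw_time(gamma_ibm, rng) if state == INFECTED else 0.0
    return [state, 0.0, draw_time(mu_ibm, rng), 0.0, c5]


def run_ibm(s0, i0, r0, beta_ibm, gamma_ibm, mu_ibm, dt, t_final, seed=0):
    rng = random.Random(seed)
    states = [SUSCEPTIBLE] * s0 + [INFECTED] * i0 + [RECOVERED] * r0
    pop = [new_individual(c1, mu_ibm, gamma_ibm, rng) for c1 in states]
    n = len(pop)
    history = []
    for step in range(int(round(t_final / dt)) + 1):
        counts = [0, 0, 0]
        for ind in pop:
            counts[ind[0]] += 1
        history.append((step * dt, counts[0], counts[1], counts[2]))
        infected_fraction = counts[INFECTED] / n
        for idx, ind in enumerate(pop):
            ind[1] += dt                      # C2: age advances by dt
            if ind[1] >= ind[2]:              # R1: death and renewal
                pop[idx] = new_individual(SUSCEPTIBLE, mu_ibm, gamma_ibm, rng)
            elif ind[0] == SUSCEPTIBLE:       # R2: infection (assumed rule)
                if rng.random() < beta_ibm * infected_fraction:
                    ind[0] = INFECTED
                    ind[3] = 0.0
                    ind[4] = draw_time(gamma_ibm, rng)
            elif ind[0] == INFECTED:          # R3: recovery
                ind[3] += dt                  # C4: time since infection
                if ind[3] >= ind[4]:
                    ind[0] = RECOVERED
                    ind[3] = ind[4] = 0.0
    return history


history = run_ibm(s0=95, i0=5, r0=0, beta_ibm=0.5, gamma_ibm=10.0,
                  mu_ibm=70.0, dt=1.0, t_final=50.0, seed=1)
print(history[-1])
```

Each record in `history` is (t, S, I, R). Because R_1 always replaces a dead individual with a new one, S + I + R stays equal to N at every step.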

COMPLEX NETWORKS
A complex network G = (𝒩, ℒ) is described by a set 𝒩 = {n_1, n_2, ..., n_N} of nodes, with connections between them given by a set ℒ = {l_1, l_2, ..., l_M} of edges. Systems taking the form of networks (also called "graphs" in much of the mathematical literature) have become the focus of widespread attention in interdisciplinary research over the past decades [32]. One of the reasons behind the growing popularity of complex networks is that almost any discrete structure can be suitably represented as a special case of a graph, whose features may be characterized, analyzed and, eventually, related to its dynamics [10]. A network can be mathematically expressed by an N × N adjacency matrix A. In this matrix, rows and columns are assigned to the nodes in the network and the presence of an edge is symbolized by a numerical value. A network with undirected, unweighted edges is represented by a symmetric matrix containing only the values 1 and 0, which represent the presence and absence of connections, respectively.

Network measures
Several investigations into complex networks involve the representation of the structure of interest as a network, followed by the analysis of the topological features of the obtained representation in terms of a set of informative measures, such as the density, the degree, the clustering coefficient, the shortest path length and the mean jump length [4].

Density
The density d of an undirected, unweighted network is defined as the ratio between the number of edges and the number of possible edges in a network with N nodes, that is:

$$
d = \frac{1}{N(N-1)} \sum_{i=1}^{N} \sum_{j=1}^{N} a_{ij},
$$

where a_ij (i = 1, 2, ..., N and j = 1, 2, ..., N) are the elements of the corresponding adjacency matrix A and 0 ≤ d ≤ 1.
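A minimal sketch of this definition for a symmetric 0/1 adjacency matrix stored as a list of lists; the function name `density` and the example graph are illustrative.

```python
def density(adj):
    """Ratio between existing and possible edges of an undirected network."""
    n = len(adj)
    total = sum(sum(row) for row in adj)  # each undirected edge is counted twice
    return total / (n * (n - 1))


# Path graph on four nodes: 3 edges out of 6 possible.
A = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
print(density(A))
```

Summing the whole symmetric matrix counts every edge twice, which is why the denominator is N(N − 1) rather than N(N − 1)/2.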

Degree
The degree of a node n_i, denoted K_i, is the number of edges connected to that node. For undirected, unweighted networks it can be computed as

$$
K_i = \sum_{j=1}^{N} a_{ij}.
$$

The average degree of a network is the average of K_i over all the nodes in the network, that is

$$
\langle K \rangle = \frac{1}{N} \sum_{i=1}^{N} K_i.
$$
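In code, the degree of each node is a row sum of the adjacency matrix. The sketch below uses illustrative names (`degrees`, `average_degree`) and a small star graph as an example.

```python
def degrees(adj):
    """Degree of each node: row sums of a symmetric 0/1 adjacency matrix."""
    return [sum(row) for row in adj]


def average_degree(adj):
    """Average of the node degrees over all nodes."""
    k = degrees(adj)
    return sum(k) / len(k)


# Star graph: node 0 is connected to nodes 1, 2 and 3.
A = [[0, 1, 1, 1],
     [1, 0, 0, 0],
     [1, 0, 0, 0],
     [1, 0, 0, 0]]
print(degrees(A), average_degree(A))
```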

Clustering coefficient
The tendency of a network to form tightly connected neighborhoods can be measured by the clustering coefficient. For an undirected, unweighted network and a node n_i, the local clustering coefficient CC_i is defined as the ratio between the number of triangles formed by node n_i and the number of all possible triangles that n_i could form [14]. Therefore,

$$
CC_i = \frac{\sum_{j=1}^{N} \sum_{h=1}^{N} a_{ij}\, a_{jh}\, a_{hi}}{K_i (K_i - 1)}.
$$

The global clustering coefficient CC, which represents the overall level of clustering in the network, is the average of the local clustering coefficients of all the nodes.
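The local and global coefficients can be sketched as follows. Setting CC_i = 0 for nodes with fewer than two neighbors is a convention assumed here, since such nodes cannot form triangles; the function names are illustrative.

```python
def local_clustering(adj, i):
    """Fraction of pairs of neighbors of node i that are themselves connected."""
    neighbors = [j for j, a in enumerate(adj[i]) if a]
    k = len(neighbors)
    if k < 2:
        return 0.0  # assumed convention: no triangle is possible
    links = sum(adj[u][v]
                for idx, u in enumerate(neighbors)
                for v in neighbors[idx + 1:])
    return 2.0 * links / (k * (k - 1))


def global_clustering(adj):
    """Average of the local clustering coefficients over all nodes."""
    n = len(adj)
    return sum(local_clustering(adj, i) for i in range(n)) / n


# Triangle 0-1-2 with a pendant node 3 attached to node 2.
A = [[0, 1, 1, 0],
     [1, 0, 1, 0],
     [1, 1, 0, 1],
     [0, 0, 1, 0]]
print([local_clustering(A, i) for i in range(4)], global_clustering(A))
```

In this example nodes 0 and 1 have coefficient 1, node 2 has 1/3 (one link among three neighbor pairs) and node 3 has 0, so the global value is 7/12.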

Shortest path length
The average shortest path length L, which is a measure of the efficiency of information flow on a network, is defined as the average number of steps along the shortest paths for all possible pairs of network nodes [32]. Let dist(n_1, n_2) denote the shortest distance between n_1 and n_2 (n_1, n_2 ∈ 𝒩). Assume that dist(n_1, n_2) = 0 if n_1 = n_2 or if n_2 cannot be reached from n_1, that has_path(n_1, n_2) = 0 if n_1 = n_2 or if there is no path from n_1 to n_2, and that has_path(n_1, n_2) = 1 if there is a path from n_1 to n_2. Then, L is given by:

$$
L = \frac{\sum_{i,j}^{N} \mathrm{dist}(n_i, n_j)}{\sum_{i,j}^{N} \mathrm{has\_path}(n_i, n_j)},
\tag{4.5}
$$

where N denotes the number of nodes in G, ∑_{i,j}^{N} dist(n_i, n_j) is the all-pairs shortest path length of G, and ∑_{i,j}^{N} has_path(n_i, n_j) is the number of paths in G, that is, the number of pairs of distinct nodes joined by a path [4]. Therefore, the value of L is given by the average of the shortest path lengths between all pairs of connected nodes in the network.
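For an unweighted network, the distances in this definition can be obtained with a breadth-first search from every node. The sketch below is one straightforward way to evaluate L; the function names are illustrative, and pairs with no path are simply excluded from both sums, as in the definition.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest distances (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v, a in enumerate(adj[u]):
            if a and v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def average_shortest_path(adj):
    """Sum of dist over connected ordered pairs divided by their number."""
    total_dist = 0
    total_pairs = 0
    for i in range(len(adj)):
        for j, d in bfs_distances(adj, i).items():
            if j != i:
                total_dist += d
                total_pairs += 1  # has_path(n_i, n_j) = 1
    return total_dist / total_pairs if total_pairs else 0.0


# Path graph 0-1-2-3.
A = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
print(average_shortest_path(A))
```

For the path graph the six unordered pairs have distances 1, 2, 3, 1, 2 and 1, so L = 10/6 = 5/3.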

Mean jump length
From the adjacency matrix A it is possible to compute the transition matrix W, whose elements are given by w_ij = a_ij / ∑_{j=1}^{N} a_ij. With W it is possible to perform a random walk on the network G and compute the mean jump length ∆, which is defined as follows:

$$
\Delta = \frac{1}{S} \sum_{s=1}^{S} \delta_s(i, j),
$$

where s = 1, ..., S indexes the jumps of length δ_s(i, j) = |i − j|, with i, j = 1, ..., N being the node indices, as defined by W. Previous work has provided a less time-consuming approach for the calculation of the mean jump length ∆ [4], given by:

$$
\Delta = \frac{1}{N}\, \mathrm{tr}\!\left(P W^{T}\right),
$$

where W^T is the transpose of W, P is an N × N matrix with elements p_ij = |i − j|, and tr is the trace operation.
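The trace-based computation can be sketched directly from the element definitions, since the i-th diagonal element of P W^T is the sum over j of p_ij w_ij. The normalization by the number of nodes N is an assumption of this sketch, as are the function names; isolated nodes are given an all-zero row in W.

```python
def transition_matrix(adj):
    """Row-normalize the adjacency matrix: w_ij = a_ij / sum_j a_ij."""
    w = []
    for row in adj:
        k = sum(row)
        w.append([a / k if k else 0.0 for a in row])
    return w


def mean_jump_length(adj):
    """Evaluate tr(P W^T) / N with p_ij = |i - j| (assumed normalization)."""
    n = len(adj)
    w = transition_matrix(adj)
    trace = sum(abs(i - j) * w[i][j] for i in range(n) for j in range(n))
    return trace / n


# Path graph 0-1-2-3: every possible jump has length 1.
path = [[0, 1, 0, 0],
        [1, 0, 1, 0],
        [0, 1, 0, 1],
        [0, 0, 1, 0]]
# Ring 0-1-2-3-0: the edge between nodes 0 and 3 is a jump of length 3.
ring = [[0, 1, 0, 1],
        [1, 0, 1, 0],
        [0, 1, 0, 1],
        [1, 0, 1, 0]]
print(mean_jump_length(path), mean_jump_length(ring))
```

The result depends on how the nodes are labeled, so networks should be compared under a consistent node ordering.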

Complex network models
Several complex network models have been proposed in the literature with the aim of reproducing patterns of connections found in real networks [32]. In this work, random, small world, scale-free, modular and hierarchical models were used to generate undirected, unweighted networks to better understand the implications of these patterns for the propagation of a disease.

Random model (RAN)
In the random model, starting from a network with N totally disconnected nodes, each pair of nodes is connected with probability 0 ≤ p ≤ 1. Thus, for p = 0 all the nodes remain disconnected and, on the other hand, for p = 1 the network is fully connected (a "complete graph" in graph theory). The networks produced by this model for any 0 < p < 1 are known as random networks [13]. In this work, random networks were obtained for p = 0.1.
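A minimal sketch of this construction, using only the standard library; the function name `random_network` and the network size in the example are illustrative.

```python
import random


def random_network(n, p, seed=None):
    """Connect each pair of n nodes independently with probability p."""
    rng = random.Random(seed)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i][j] = adj[j][i] = 1
    return adj


A = random_network(n=200, p=0.1, seed=42)
n_edges = sum(sum(row) for row in A) // 2
print(n_edges, 2 * n_edges / (200 * 199))  # edge count and resulting density
```

Since every pair is connected independently, the expected density of the resulting network is p.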

Small world model (SW)
Watts and Strogatz found that the connections of several real networks are neither completely regular nor completely random, but somewhere in between. That is, real networks can be highly clustered, like regular networks, and yet have a short path length between their nodes, like random graphs, which is known as the "small world" property [43]. In the small world model, each node in a circular, regular network is initially connected to its k nearest neighbors. Then, with probability p, an edge that connects a node to one of its nearest neighbors is rewired to another randomly chosen node. This process is repeated over all the network nodes until all edges have been considered for rewiring. Thus, for p = 0 the network remains regular and for p = 1 the resulting network is totally random. The networks produced by this model for any 0 < p < 1 are known as small world networks [43]. In this work, small world networks were obtained for p = 0.01 and k = 100.
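The rewiring procedure can be sketched as below. Here `k` is taken as the number of neighbors on each side of a node in the ring, which is an assumption chosen so that the network has N·k edges and density close to 2k/N; the function name and the small example sizes are illustrative, and the sketch assumes 2k < N.

```python
import random


def small_world_network(n, k, p, seed=None):
    """Ring lattice with k neighbors per side, then rewiring with probability p."""
    rng = random.Random(seed)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for step in range(1, k + 1):
            j = (i + step) % n
            adj[i][j] = adj[j][i] = 1
    for step in range(1, k + 1):
        for i in range(n):
            j = (i + step) % n
            if adj[i][j] and rng.random() < p:
                # New endpoint: any node that is not i and not already a neighbor.
                candidates = [v for v in range(n) if v != i and not adj[i][v]]
                if candidates:
                    new_j = rng.choice(candidates)
                    adj[i][j] = adj[j][i] = 0
                    adj[i][new_j] = adj[new_j][i] = 1
    return adj


A = small_world_network(n=20, k=2, p=0.3, seed=1)
print(sum(sum(row) for row in A) // 2)  # rewiring preserves the number of edges
```

Rewiring moves edges without creating or deleting any, so the density is the same for every value of p.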

Scale-free model (SF)
Barabási and Albert [3] mapped the connections between World Wide Web pages and found a few nodes with high degree (called "hubs") and many nodes with low degree. They proposed a model able to generate networks with such characteristics, based on two properties: growth, where a network with a small number of disconnected nodes receives at each step a new node with N_0 edges, and preferential attachment, where each new node connects preferentially to the most connected nodes in the network. In this work, scale-free networks were obtained for N_0 = 105.
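A compact sketch of growth with preferential attachment follows. It assumes that the network starts from N_0 disconnected nodes, that the first new node links to all of them (their degrees are zero, so no preference can be applied yet), and that every later node chooses N_0 distinct targets with probability proportional to degree. The function name and example sizes are illustrative.

```python
import random


def scale_free_network(n, n0, seed=None):
    """Grow a network to n nodes; each new node brings n0 edges."""
    rng = random.Random(seed)
    adj = [[0] * n for _ in range(n)]
    degree = [0] * n
    for new in range(n0, n):
        existing = list(range(new))
        if new == n0:
            targets = set(existing)  # seed nodes have degree 0: link to all
        else:
            weights = [degree[v] for v in existing]
            targets = set()
            while len(targets) < n0:
                # Degree-weighted draw; duplicates are discarded by the set.
                targets.add(rng.choices(existing, weights=weights)[0])
        for t in targets:
            adj[new][t] = adj[t][new] = 1
            degree[new] += 1
            degree[t] += 1
    return adj


A = scale_free_network(n=50, n0=3, seed=7)
deg = sorted((sum(row) for row in A), reverse=True)
print(deg[:5], deg[-5:])  # a few high-degree nodes, many low-degree nodes
```

Under these assumptions the network has (N − N_0)·N_0 edges, which matches the density expression 2(N − N_0)N_0 / (N(N − 1)).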

Modular model (MOD)
In a modular network the edges are densely distributed between nodes belonging to the same group (module) and sparsely distributed between nodes belonging to different groups [10]. In this model, a modular network with N nodes and M modules is obtained by connecting each pair of nodes with probability 0 ≤ p ≤ 1, while the ratio 0 ≤ r ≤ 1 defines whether the connections occur inside or outside each module [38]. The distribution of edges within each module follows the Erdős-Rényi model. In this work, modular networks were obtained for M = 10, p = 0.1 and r = 0.95.
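One possible reading of this model is sketched below. It assumes that p fixes the overall expected density and that r is the expected fraction of edges falling inside modules, which leads to separate connection probabilities for pairs inside and across modules. This interpretation, the round-robin assignment of nodes to modules and all names are assumptions of the sketch, not the procedure of [38].

```python
import random


def modular_network(n, n_modules, p, r, seed=None):
    """Random network with expected density p and a fraction r of edges inside modules."""
    rng = random.Random(seed)
    module = [i % n_modules for i in range(n)]  # assumed module assignment
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    n_in = sum(1 for i, j in pairs if module[i] == module[j])
    n_out = len(pairs) - n_in
    expected_edges = p * len(pairs)
    p_in = min(1.0, r * expected_edges / n_in) if n_in else 0.0
    p_out = min(1.0, (1.0 - r) * expected_edges / n_out) if n_out else 0.0
    adj = [[0] * n for _ in range(n)]
    for i, j in pairs:
        prob = p_in if module[i] == module[j] else p_out
        if rng.random() < prob:
            adj[i][j] = adj[j][i] = 1
    return adj, module


A, module = modular_network(n=100, n_modules=5, p=0.1, r=0.8, seed=0)
edges = [(i, j) for i in range(100) for j in range(i + 1, 100) if A[i][j]]
inside = sum(1 for i, j in edges if module[i] == module[j])
print(len(edges), inside / len(edges))
```

Within each module the edges are placed independently with a single probability, so each module is itself an Erdős-Rényi random network.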

Hierarchical model (HIE)
A hierarchical model creates networks that reproduce the scale-free property and the high clustering of the nodes at the same time, a behavior found in many social and biological networks [10]. A network produced by this model has a tree-like structure, that is, it is hierarchically organized from nodes with higher degree to nodes with lower degree [36]. Starting from a network with b_0 nodes at the first layer, each node connects to its distinct children (not more than b), which belong to the next layer. This procedure is repeated l times and a skeleton is obtained. In addition to the local links, m long-range direct connections (shortcuts) between nodes in the same layer are added with probability λ to make the average path length short [12]. In this work, hierarchical networks were obtained for b_0 = 1, b = 2, l = 10 and m = 197,901 (see Table 1 for the full set of parameters).
It is worth mentioning that the density of connections found in many real social networks is d ≈ 0.1 [15]. Knowing that d_RAN ≈ p, d_SW = 2k/N, d_SF = 2(N − N_0)N_0/(N(N − 1)), d_MOD ≈ p and d_HIE = 2(N + m − 1)/(N(N − 1)), all the parameters for the network models used in this work were chosen so as to generate networks with this density. Table 1 presents the parameters used in all network models and the network measures used to characterize the resulting networks.
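The density expressions can be checked numerically against the quoted parameter values. The network size N = 2000 used below is an assumption: it is not stated in this section and was inferred here because it makes the expressions give d ≈ 0.1.

```python
N = 2000        # assumed number of nodes (inferred, not stated in this section)
k = 100         # small world: nearest-neighbor parameter
N0 = 105        # scale-free: edges added with each new node
m = 197_901     # hierarchical: number of shortcuts

d_sw = 2 * k / N
d_sf = 2 * (N - N0) * N0 / (N * (N - 1))
d_hie = 2 * (N + m - 1) / (N * (N - 1))

for name, d in [("SW", d_sw), ("SF", d_sf), ("HIE", d_hie)]:
    print(f"d_{name} = {d:.4f}")
```

For the random and modular models the density is set directly by p = 0.1.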

RESULTS
Considering that the IBM is discrete, rules relating its parameters to those of the SIR models (without and with vaccination) described in Sections 2.1 and 2.2 can be established (Equation (5.1)). With the use of Equation (5.1), it is possible to build the equivalence between the SIR models (without and with vaccination) and the IBM, in such a way that, on average, their solutions present similar behaviors [27] (see Figure 3a).
In the SIR models without and with vaccination and in the original IBM formulation, the relationships between individuals are represented by a complete graph, that is, all individuals are connected to each other. However, this behavior is not observed in real contact networks. Thus, in this work the IBM for the SIR model (without and with vaccination) was modified to incorporate more realistic models of contact networks and to understand the effect of the network topologies on the IBM solutions. Unlike in the IBM without complex networks, in the IBM with complex networks an infected individual can infect other individuals only if they are connected.
In the random model, the spread of the disease is lower than in the hierarchical and scale-free models but higher than in the small world and modular models, since in the latter two the disease preferentially spreads between closer neighbors or to nodes belonging to the same module, respectively. Disease eradication is observed when a vaccination strategy covering 10%, 5%, 27%, 3% and 44% of the population is used in combination with the random, small world, scale-free, modular and hierarchical network models, respectively.
Figure 6 presents the IBM solutions (average number of infected individuals) with complex networks for the SIR models without (Fig. 6a) and with (Fig. 6b) vaccination. In both cases, the solutions were obtained with the same values of N, S_0, I_0, R_0, γ_ibm, β_ibm, µ_ibm, ∆t and t_f used in Figure 5 and with p_v = 0.10, over 1,000 different realizations.
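The restriction that infection travels only along edges can be illustrated with a single infection step on a contact network. The per-contact infection probability `q` and the rule "at least one successful contact among the infected neighbors" are assumptions of this sketch, since the exact infection rule is not given here; the function name is hypothetical.

```python
import random

SUSCEPTIBLE, INFECTED, RECOVERED, VACCINATED = 0, 1, 2, 3


def infection_step(adj, states, q, rng):
    """One synchronous infection step restricted to the edges of the network."""
    new_states = list(states)
    for i, state in enumerate(states):
        if state != SUSCEPTIBLE:
            continue  # infected, recovered and vaccinated nodes are not infected
        infected_neighbors = sum(
            1 for j, a in enumerate(adj[i]) if a and states[j] == INFECTED
        )
        # Assumed rule: probability of at least one successful contact.
        if rng.random() < 1.0 - (1.0 - q) ** infected_neighbors:
            new_states[i] = INFECTED
    return new_states


# Path graph 0-1-2-3: node 0 infected, node 2 vaccinated.
A = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
states = [INFECTED, SUSCEPTIBLE, VACCINATED, SUSCEPTIBLE]
print(infection_step(A, states, q=1.0, rng=random.Random(0)))
```

In this example node 1 can be infected because it is linked to node 0, while node 3 cannot, since its only neighbor is vaccinated. This is the mechanism through which vaccinating well-placed nodes can interrupt transmission paths.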
Complex networks with different topologies result in curves of infected individuals with different behaviors, and thus the propagation of a given disease is highly sensitive to the network topology used. In particular, the larger the mean jump length (Table 1), the faster the disease propagates and, consequently, the higher the number of infected individuals. Disease eradication is observed when a vaccination strategy covering 10% of the population is used in combination with the random, small world or modular network models. For illustration purposes, Figures 7 and 8 show examples of the IBM for the SIR model in combination with random, small world, scale-free, modular and hierarchical networks with N = 20 nodes at t = 0, 1, 2, 3 and 4 (without vaccination) and t = 0, 10, 11, 12 and 13 (with vaccination) for the disease spread. Nodes in blue, red, green and brown correspond to susceptible, infected, recovered and vaccinated individuals, respectively.

CONCLUSIONS
In this work, the IBM in combination with several complex network models was proposed for modeling the propagation of an infectious disease. In general, complex networks with different properties result in curves of infected individuals with different behaviors, and thus the evolution of a given disease is highly sensitive to the network model used. Disease eradication is observed when a vaccination strategy covering 10% of the population is used in combination with the random, small world or modular network models, which opens an important space for control actions that focus on changing the topology of complex networks as a way of reducing or even eliminating infectious diseases. Therefore, our approach can be a simple and effective tool for more realistic modeling of an infectious disease.