# 8 Measuring network robustness

**Alessandro Vindigni**

**session 29/03/2023**

In this session we will use the following R-packages and external sources that you are encouraged to load

```
library(rjson)
library(dplyr)
library(igraph)
library(rweboflife)
library(ggplot2)
library(latex2exp)
library(formattable)
source("~/ecological_networks_2023/downloads/Functions/helper.R")
```

## 8.1 Data collapsing and cirticality

During the lecture we have shown you several plots of the average size of the largest cluster \(C_{max}\) as a function of the fraction of removed nodes \(f\), for different types of (simulated) networks. Curves obtained for networks of different size \(N\) were normalized plotting the ratio \(C_{max}/N\) vs \(f\). However, according to the theory of **critical phenomena**, in the neighborhood of the critical fraction \(f_c\) curves obtained for different network sizes \(N\) should collapse onto a single *master curve*
if plotted as
\[\begin{equation}
y=\frac{C_{max}}{N^{\Delta_c}}
\quad \text{vs}\quad
x=\frac{f-f_c}{N^{\Delta_f}}
\end{equation}\]
We consider random removal of nodes in a **honeycomb lattice** and in the **Erdos-Renyi graph** for which the \(\Delta\) exponents can be computed analytically and take the values

model | \(\Delta_c\) | \(\Delta_f\) |
---|---|---|

Erdos-Renyi | 2/3 | 1/3 |

2D percolation | 303/288 | 3/8 |

### Exercise 1

#### honeycomb lattices

Load the data computed by us for honeycomb lattices of size \(N=900,\, 10'000,\ 22'500\) nodes and bind those datasets into a single dataframe

```
<-"~/ecological_networks_2023/downloads/Data/03-29_measuring_network_robustness/"
path_to_data load(paste0(path_to_data, "hc_lattices_30.RData"))
load(paste0(path_to_data, "hc_lattices_100.RData"))
load(paste0(path_to_data, "hc_lattices_150.RData"))
<- rbind(df_hc_30_rnd,df_hc_100_rnd,df_hc_150_rnd) df_all_hc
```

You can define \(f_c\) from the percolation threshold corresponding to a 2D honeycomb lattice. Adjust the value of \(\Delta_c\) and \(\Delta_f\) in the underlying code chuck and add the two columns that are needed to plot the *master curve* to the `df_all_hc`

dataframe

```
= 1 # set the appropriate value
Delta_c = 1 # set the appropriate value
Delta_f = 1-0.697 # 1 - (precolation threshold)
f_c <- df_all_hc %>% mutate(f = 1.-n_nodes/N,
df_collapse_hc x = (f-f_c)*(N**Delta_f),
y = C_max/(N**Delta_c))
```

Plot the master curve using the following commands and check that data for different \(N\) actually collapse in some part of the plot. In your opinion, what defines the extent of the region where data-collapsing is realized?

```
ggplot() +
ggtitle("Honeycomb lattice collapse N = 900, 10'000, 22'500")+
geom_point(data = filter(df_collapse_hc, N==22500), aes(x,y), color = "turquoise3",shape = 1,size =0.5) +
geom_point(data = filter(df_collapse_hc, N==10000), aes(x,y), color = "dodgerblue4",shape = 1,size =0.5) +
geom_point(data = filter(df_collapse_hc, N==900), aes(x,y), color = "darkseagreen4",shape = 1,size =0.5) +
ylab(TeX("$y$")) +
xlab(TeX("$x$")) + scale_y_continuous(trans='log10')
```

#### Erdos-Renyi graphs

For Erdos-Renyi graphs, instead, load the data

```
load(paste0(path_to_data, "er_graph_30.RData"))
load(paste0(path_to_data, "er_graph_100.RData"))
load(paste0(path_to_data, "er_graph_150.RData"))
<- rbind(df_er_30_rnd,df_er_100_rnd,df_er_150_rnd) df_all_er
```

Readout the initial value of \(N\) from the three dataframes to extract (with `filter()`

) the relative curves from `df_collapse_er`

. Use the values of \(\Delta_c\) and \(\Delta_f\) typical of the Erdos-Renyi model, and set \(f_c = 0.75903\).
Repeat the process to produce the master curve \(x\) vs \(y\) for this model.
Do you notice any difference between the data-collapsing realized in the two models? Is the master curve the same?

## 8.2 Haslemere dataset

In this section we analyze a dataset on human social interactions specifically gathered in 2017 with the aim of modeling the spreading of infectious diseases. These high-resolution GPS data were collected tracing 469 volunteers by means of a mobile-phone app. The volunteers were residents of the town of Haslemere, where the first evidence of UK-acquired infection with COVID-19 would later be reported. Raw data are available here, while more details on data collection can be found in this preprint.

```
# Import data from a csv file into an R dataframe.
# Note that this file separates columns with ",".
# Thus, we specify the separator when calling the read.csv function
<- read.csv(paste0(path_to_data,"Haslemere_Kissler_DataS1.csv"), sep = ",")
data_haslemere
# Assign names to the columns
names(data_haslemere) <- c("time_step","person_A", "person_B", "distance")
```

As per good practice, let us check the class of the object `data_haslemere`

and have a quick look at its data structure.

```
# Check class of the object
class(data_haslemere)
```

`## [1] "data.frame"`

`head(data_haslemere) %>% formattable()`

time_step | person_A | person_B | distance |
---|---|---|---|

1 | 2 | 215 | 9 |

1 | 2 | 246 | 18 |

1 | 2 | 265 | 45 |

1 | 5 | 367 | 32 |

1 | 5 | 430 | 17 |

1 | 8 | 10 | 13 |

### Build up networks from Haslemere dataset

The function `interaction_by_distance_and_time`

(df, \(R\), \(t_{inf}\)) takes as arguments

`df`

: dataframe containing the Haslemere dataset loaded from csv file- \(R\): maximal distance between person_A and person_B in meters
- minimum time of interaction in units of 5 min, i.e., \(t_{inf}=6\) means that two individuals have interacted at least for half an hour.

```
<- function(df, R, time_inf, day = NULL){
interaction_by_distance_and_time # filter by day
if(is.null(day)){
print("day NULL")
<- df
data_period else {
} print(paste0("day = ",day))
<- 1 + (day-1)*192
istep_min <- day*192
istep_max <- filter(df, between(time_step, istep_min,istep_max))
data_period
}
# filter by distance shorter than R[m]
<- data_period %>% filter(distance < R) %>% select(person_A, person_B)
data
# create incidence matrix
<- incidence_matrix_from_raw_data(data)
inc_mat
# filter by cumulative time spent closer than R longer than time_inf
<- 1*(inc_mat > time_inf)
incidence_matrix
return(incidence_matrix)
}
```

Note that the function `incidence_matrix_from_raw_data()`

is loaded from the file `helper.R`

.

Once you have created the incidence matrix associated with the “rule” that defines the **contact** between two people, you can obtain a graph using the function `graph_from_incidence_matrix()`

of the `igraph`

package. To highlight the effect of the time spent at a distance smaller than \(R\) between two individuals, consider the following cases

- \(R=100\) and \(t_{inf}=0\) \(\Rightarrow\) everybody is connected to everybody
- \(R=50\) and \(t_{inf}=6\) \(\Rightarrow\) pairs of people who have come closer than 50 m for at least 30 min in 3 days (48h).

The two datasets exemplified above are produced passing the following arguments to the function `interaction_by_distance_and_time()`

:

```
<- 100
R <- 0 # any contact within 48 h
time_inf <- interaction_by_distance_and_time(data_haslemere, R+1, time_inf) inc_mat_100_0
```

`## [1] "day NULL"`

```
<- 50
R <- 6 # 30 min within 48 h
time_inf <- interaction_by_distance_and_time(data_haslemere, R+1, time_inf) inc_mat_50_6
```

`## [1] "day NULL"`

We can then use the function `graph_from_incidence_matrix()`

of igraph to convert both incidence matrices into graph objects:

```
<- graph_from_incidence_matrix(inc_mat_100_0)
graph_100_0 <- graph_from_incidence_matrix(inc_mat_50_6) graph_50_6
```

Playing with the parameters \(R\) and \(t_{inf}\) networks with different topology are produced. Now we are going to visualize these two examples:

#### \(R=100\) and \(t_{inf}=0\) (any amount of time)

`plot(graph_100_0, vertex.size=2, vertex.label=NA, layout = layout_nicely) `

#### \(R=50\) and \(t_{inf}=6\) (30 min within 48 h)

`plot(graph_50_6, vertex.size=2, vertex.label=NA, layout = layout_nicely) `

To assess the robustness of this contact network we will use the package `rweboflife`

, developed by our group and available in our GitHub workspace. In particular, the function we are going to use takes a non-named `edgelist`

, with an array format, as input. The conversion from a graph object to the proper format can be made with `igraph`

:

`<- as_edgelist(graph_100_0, names = FALSE) edge_list `

Then we need to pass to the function a vector of the nodes we want to remove, named according to their index (an integer not a character!)

```
<- edge_list[,1]
vec_n1 <- edge_list[,2]
vec_n2 <- union(vec_n1, vec_n2)
vec_all <- unique(vec_all) vec_rm
```

Now we are in the position to compute the *average* size of the largest cluster \(C_{max}\) as a function of the number of removed nodes. The average is performed iterating the progressive removal of nodes (simulation) as many times as defined by the parameter `num_iter`

.
For random removal strategy (RND) we use the function `inverse_percolation()`

```
<- 20 # number of replicates of the simulation
num_iter <- 1 # how many nodes are removed between one measure of C_max and the next one
imeasure <- 7625434 # seed for the random number generator
rnd_seed <- rweboflife::inverse_percolation(edge_list, vec_rm, num_iter, imeasure, rnd_seed) df_helsemere_rnd
```

Let us add to the `df_helsemere_rnd`

dataframe a pair of columns \(f\) and \(\kappa\),
representing the fraction of removed nodes and the Molly-Reed connectivity parameter, respectively (run the underlying command).

```
<- df_helsemere_rnd %>% mutate(f = 1.-n_nodes/N,
df_helsemere_rnd_analysis kappa = (k_sd*k_sd + k_ave*k_ave)/k_ave)
```

### Exercise 2

Assume you were in the COVID-19 taskforce and you were asked for advice to hinder the pandemic. A possible strategy could be to change people habits so that their contact network becomes disconnected (subcritical phase).

- Lookup the
`df_helsemere_rnd_analysis`

dataframe

`%>% formattable() df_helsemere_rnd_analysis `

and identify the size of the largest cluster \(C_{max}\) corresponding to the critical threshold of \(\kappa \simeq 2\) (Molloy-Reed criterion).

- Play around with the values of \(R\) and \(t_{inf}\) till a similar value of \(C_{max}\) as that determined at the previous point is obtained.

**HINT:**

```
<- 10
R <- 6 # 30 min within 48 h
time_inf <- interaction_by_distance_and_time(data_haslemere, R+1, time_inf)
my_inc_mat <- graph_from_incidence_matrix(my_inc_mat)
my_graph <- max(igraph::components(my_graph)$csize) my_C_max
```

Plot the resulting graph and check visually that it actually corresponds to a disconnected network.

Provide a concrete suggestion for decision makers, such as

*“we need to set rules (e.g., lockdown) in order that ONLY people who in ordinary conditions spend at least half an hour per day at a distance smaller that 10 meters can keep interacting”*.

## 8.3 Robusteness of empirical networks

We consider here the evolution of the largest cluster under the removal of nodes of two networks that are available on Web of life:
the pollination network **M_PL_015** and the foodweb **FW_013_03**.

To download the pollination network we use the syntax described in the first exercise session:

```
# download the network from the web-of-life API
<- "https://www.web-of-life.es/"
base_url <- paste0(base_url,"get_networks.php?network_name=M_PL_015")
json_url <- jsonlite::fromJSON(json_url)
M_PL_015_nw # create a graph (meaning igraph object)
<- M_PL_015_nw %>% select(species1, species2, connection_strength) %>%
M_PL_015_graph graph_from_data_frame()
```

As done for the Haslemere dataset, we compute the *average* size of the largest cluster \(C_{max}\) as a function of the number of removed nodes \(f\), using functions in the package `rweboflife`

available in our GitHub workspace. Similarly to what done before, we need to prepare and edgelist
in terms of node indexes (an integer not a character!):

`<- as_edgelist(M_PL_015_graph, names = FALSE) edge_list `

and vector of the nodes we want to remove.

```
<- edge_list[,1]
vec_n1 <- edge_list[,2]
vec_n2 <- union(vec_n1, vec_n2)
vec_all <- unique(vec_all) vec_rm
```

For random removal strategy (RND) we use the function `inverse_percolation()`

```
<- 20
num_iter <- 1
imeasure <- 8625434
rnd_seed <- rweboflife::inverse_percolation(edge_list, vec_rm, num_iter, imeasure, rnd_seed) df_M_PL_015_rnd
```

while we should use the function `systematic_removal()`

for systematic removal of nodes, from most-to-least connected (MTL)

```
<- 2
num_iter <- 1
imeasure <- 8623457
rnd_seed <- rweboflife::systematic_removal(edge_list, vec_rm, num_iter, imeasure, TRUE, TRUE, rnd_seed) df_M_PL_015_mtl
```

or from least-to-most connected (LTM)

```
<- 2
num_iter <- 1
imeasure <- 8623457
rnd_seed <- rweboflife::systematic_removal(edge_list, vec_rm, num_iter, imeasure, FALSE, TRUE, rnd_seed) df_M_PL_015_ltm
```

respectively (note that the first of the two boolean arguments changed form **TRUE** to **FALSE**).

The returned dataframe consists of the following columns

- id = integer identifying a row

- N = initial number of nodes in the network

- n_nodes = current number of nodes at a given stage of the simulation

- n_nodes1 = current number of nodes of type 1 (those that are actively removed)

- n_nodes2 = current number of nodes of type 2

- k_ave = current average degree of the network

- k_sd = current averaged squared degree of the network

- C_max = largest cluster size (in number of nodes)

### Exercise 3

Cut and paste the code chucks provided above to produce the dataframes

`df_M_PL_015_rnd`

and`df_M_PL_015_mtl`

.**Achtung:**the second simulation (MTL) takes roughly 20 sec. per iteration: try to choose the good compromise between having a smooth enough curve and not taking ages to produce it.Add to each dataframe a pair of columns \(f\) and \(\kappa\) , representing the fraction of removed nodes and the Molly-Reed connectivity parameter, respectively (run the underlying command).

```
<- df_M_PL_015_rnd %>% mutate(f = 1.-n_nodes/N,
df_M_PL_015_rnd_analysis kappa = (k_sd*k_sd + k_ave*k_ave)/k_ave)
<- df_M_PL_015_mtl %>% mutate(f = 1.-n_nodes/N,
df_M_PL_015_mtl_analysis kappa = (k_sd*k_sd + k_ave*k_ave)/k_ave)
```

- display the dataframes
`df_M_PL_015_rnd_analysis`

and`df_M_PL_015_mtl_analysis`

and identify the value of \(f\) at which \(\kappa\) becomes lower than 2 (Molloy-Reed criterion). Assign these values to a pair of variables

```
= [???]
f_c_M_PL_015_rnd = [???] f_c_M_PL_015_mtl
```

- Plot the results of the simulations obtained for random and strategic (from-most-to-least connected) removal of nodes along with a pair of vertical lines indicating the expected threshold according to the Molloy-Reed criterion:

```
ggplot() +
# ggtitle("plot title")+
geom_point(data = df_M_PL_015_mtl_analysis, aes(f, C_max/N), color = "darkgrey",shape = 1,size =0.5) +
geom_point(data = df_M_PL_015_rnd_analysis, aes(f, C_max/N), color = "blue",shape = 1,size =0.5) +
stat_function(fun = function(x){1.-x}, color ="black", linetype = "dotted", size = 0.5) +
geom_vline(xintercept = f_c_M_PL_015_rnd, linetype = "dotted", size = 0.5, color = "blue") +
geom_vline(xintercept = f_c_M_PL_015_mtl, linetype = "dotted", size = 0.5) +
ylab(TeX("$C_{max}/N$")) +
xlab(TeX("$f$")) +
xlim(0., 1) + ylim(0., 1) +
scale_x_continuous(expand = c(0, 0), limits = c(0, 1.05)) +
scale_y_continuous(expand = c(0, 0), limits = c(0, 1.1)) +
geom_boxplot() +
theme(
panel.background = element_rect(fill='transparent'),
plot.background = element_rect(fill='transparent', color="NA"),
panel.grid.major = element_blank(),
axis.line.x = element_line(arrow = grid::arrow(length = unit(0.3, "cm"))),
axis.line.y = element_line(arrow = grid::arrow(length = unit(0.3, "cm")))
)
```

In your opinion, is the Molloy-Reed criterion satisfactory in predicting the critical fraction of nodes at which the largest cluster should

*“evaporates”*into a disconnected network? Comment on the results.**Bonus**Add to the previous plots the so-called*secondary-extinction curve*, namely the fraction of surviving consumers as a function of the removed resources.

### Exercise 4

Repeat the whole procedure for the foodweb **FW_013_03**.

Do you observe significant differences in the robustness towards random removal and strategic (from most to least connected) removal of nodes in the two different networks.

Assuming that specialized species in the two considered empirical networks are more susceptible to extinction than generalists, e.g., because of climate change, which scenario do you expect to be more representative of what might happen in Nature: a decrease of the largest cluster due to random removal of nodes or due to strategic removal from the most connected to the least connected node?

### Save your data

Please save the variables used in the plots running the following command

```
save(df_M_PL_015_rnd_analysis, df_M_PL_015_mtl_analysis, f_c_M_PL_015_rnd, f_c_M_PL_015_mtl,
file="~/ecological_networks_2023/workspace/03-29_measuring_network_robustness/data_ex_M_PL_015.RData")
save(df_FW_013_03_rnd_analysis, df_FW_013_03_mtl_analysis, f_c_FW_013_03_rnd, f_c_FW_013_03_mtl,
file="~/ecological_networks_2023/workspace/03-29_measuring_network_robustness/data_ex_FW_013_03.RData")
```

**CHECK**: Another`*RData`

file should appear in the folder`03-16_toolkit_network_analysis`

. If you want to be hundred percent sure that everything worked well, you can reload the variables from the created files:

```
load("~/ecological_networks_2023/workspace/03-29_measuring_network_robustness/data_ex_M_PL_015.RData")
load("~/ecological_networks_2023/workspace/03-29_measuring_network_robustness/data_ex_FW_013_03.RData")
```

and produce your plots again.