2 Species-level metrics

Miguel Roman and Leandro Cosmo
session 15/03/2024

2.1 Introduction

In this practical exercise we will focus on the properties of the nodes in a network, known as node metrics (or species-level metrics when applied to ecosystem modelling). We will calculate these node metrics using the igraph package. Before starting, we need to load this and some other packages and define some functions that will make the session easier to go through:

showMatrix <- function(adj_mat) {
  adj_df <- as.data.frame(adj_mat)
  rownames(adj_df) <- colnames(adj_df) <- 1:nrow(adj_df)
  visweb(adj_df, labsize=.8, type="none", clear = FALSE, plotsize = 7)
}

2.1.1 Creating an igraph network

Let’s start by creating a very simple network, the so-called “path network”, which essentially consists on nodes connected in a linear sequence.

G <- make_empty_graph(5) + edges(c(1,2, 2,3, 3,4, 4,5))

# Plot the path graph
plot(G, 
     layout = layout_with_kk,  # Layout the vertices in a line
     vertex.size = 30,         # Set the size of vertices
     vertex.color = "skyblue", # Set the color of vertices
     edge.color = "gray",      # Set the color of edges
     main = "Path network of 5 nodes"  # Main title
)

2.1.2 Matrix representation of networks

As you might know by now, networks can be represented as matrices, where rows and columns are assigned to the nodes in the network. Each cell in these matrices represents a pair of nodes, and its value will be 1 if they are connected or 0 if they aren’t. The type of representation we will use throughout this lecture is the adjacency matrix, which are always symmetric matrices.

Now let’s display the adjacency matrix representation for the network above. Take some time to compare it with the above network until you understand its meaning.

adj_mat <- as.matrix(get.adjacency(as.undirected(G)))
showMatrix(adj_mat)


2.2 Node metrics

Now let’s study the properties of the nodes in the network we created before. We will also need to create other slightly more complex networks to contrast what we get:

Note: The second network is generated using a random algorithm. You can re-run the code to obtain different structures.

G2 <- make_star(6, mode = "undirected")
plot(G2, 
     layout = layout_with_kk,  # Layout the vertices in a line
     vertex.size = 30,         # Set the size of vertices
     vertex.color = "skyblue", # Set the color of vertices
     main = "Star network of 6 nodes"  # Main title
)

G3 <- sample_k_regular(10,4)
plot(G3, 
     layout = layout_with_kk,  # Layout the vertices in a line
     vertex.size = 20,         # Set the size of vertices
     vertex.color = "skyblue", # Set the color of vertices
     edge.size =5,
     main = "K-regular graph of 10 nodes"  # Main title
)

2.2.1 Node degree

The degree of a node is the number of links connected to it. In the case of ecological networks, the degree of a species’ node means how many other species interact with it. To calculate it for each node, it is as simple as computing the sum of rows (or columns, by symmetry) of the adjacency matrix.

deg<-rowSums(adj_mat)
print(deg)
## [1] 1 2 2 2 1

Alternatively, you can use igraph::degree(G) directly on the networks to compare across all three:

print(igraph::degree(G))
## [1] 1 2 2 2 1
print(igraph::degree(G2))
## [1] 5 1 1 1 1 1
print(igraph::degree(G3))
##  [1] 4 4 4 4 4 4 4 4 4 4

Pay attention that, although looking random, the K-regular network has a degree of 4 in all of its nodes, take a look at its adjacency matrix and see how all rows (or columns) sum 4.

adj_mat <- as.matrix(get.adjacency(as.undirected(G3)))
showMatrix(adj_mat)

2.2.2 Node strength

If the network is not binary, it is known as a weighted network. Instead of 0s and 1s, links can have any real value, such as decimal numbers, for example, to describe the frequency of interaction between species. Similarly to summing the links to calculate the degree of a node, in weighted networks we can sum the weights of the links connected to a given node, and this is called its strength. Node strength can be seen as an equivalent of node degree for weighted networks. Actually, all metrics taught in this session have their “counterparts” in weighted networks, although we won’t work with them for now.

2.2.3 Closeness centrality

To define this metric, one first needs to understand the concept of shortest path. The shortest path between two nodes is the minimum sequence of links to travel from one to another, assuming it is only possible to travel through links and nodes.

set.seed(447999477)
#set.seed(238564608)
G4 <- sample_k_regular(30,3)

# Find the shortest path between nodes 1 and 2
shortest_path <- shortest_paths(G4, from = 1, to = 20)$vpath[[1]]

G4 <- set_edge_attr(G4, "color", value = "gray")
G4 <- set_edge_attr(G4, "width", value = 1)

# Highlight the shortest path in red
for (i in 1:(length(shortest_path)-1)) {
  edge <- get.edge.ids(G4, c(shortest_path[i], shortest_path[i+1]), directed = FALSE)
  G4 <- set_edge_attr(G4, "color", value = "red", index = edge)
  G4 <- set_edge_attr(G4, "width", value = 3, index = edge)  # Set the width to 3
}

# Plot the updated graph with the highlighted shortest path
plot(G4, 
     layout = layout_with_kk,  
     vertex.size = 17,         # Set the size of vertices
     vertex.color = "gold", # Set the color of vertices
     edge.size =5,
     main = "Example: shortest path from node 20 to node 1"  # Main title
)

Instead of measuring the distance \(d\) of the shortest path between nodes 1 and 20 (marked in red), we could measure their “closeness” by dividing \(\frac{1}{d}\). The fewer links forming this path, the higher the value of “closeness” between these two nodes. Therefore, the closeness between nodes 1 and 20 is 0.25, while the closeness between nodes 1 and 12 is 1.0, the maximum possible.

The closeness centrality of a node \(i\) is the sum of these closeness values from the node itself to all others in the network:

\[ C_B(i)=\sum_j\frac{1}{d_{ij}} \] where \(d_{ij}\) is the shortest path distance between nodes \(i\) and \(j\), ex. \(d_{1,20}=4\). Therefore, closeness centrality gives an idea of how “reachable” the node is from any other point in the network. This measure tends to be smaller for large networks, as some nodes can be very distant from each others. For this reason, it is usually normalized by the total number of nodes minus one (N-1).

\[ C(i)=\frac{\sum_j\frac{1}{d_{ij}}}{N-1} \] The function distances() calculates the shortest paths between each pair of nodes and places them in a matrix.

# generate a random tree network
G <- sample_tree(50, method = "lerw")
adj_mat <- as.matrix(get.adjacency(as.undirected(G))) # get adjacency matrix

d <- igraph::distances(G) # get matrix of shortest path distances
#diag(d)<-0 # Setting the diagonal to 0 (exclude self effects and loops, not necessary for the example)
cl <- 1/d # closeness = inverse of distance
diag(cl)<-0 # Setting the diagonal to 0 to avoid infinite closeness of nodes with themselves
closeness <- (rowSums(cl))/(nrow(adj_mat)-1) # Computing closeness centrality

#============= Plotting ========================
# Get node coordinates
coords <- layout_with_kk(G)

# Create a data frame with coordinates
df <- data.frame(x = coords[, 1], y = coords[, 2])

# Get link list from igraph object
edges <- get.edgelist(G)

# Convert link list to data frame
df_edges <- data.frame(
  x1 = coords[edges[,1], 1],
  y1 = coords[edges[,1], 2],
  x2 = coords[edges[,2], 1],
  y2 = coords[edges[,2], 2]
)

nodesize_tmp = closeness*13 # weighting size of the nodes for the sake of visualization
p <- ggplot(df, aes(x = x, y = y)) +
  geom_segment(data = df_edges, aes(x = x1, y = y1, xend = x2, yend = y2), color = "black") +
  geom_point(aes(color = closeness), size = nodesize_tmp) + 
  scale_color_gradient(low = 'skyblue', high = 'red',name = "Closeness centrality") +
  ggtitle("Closeness centrality example") +
  xlab("") + ylab("") +
  theme_void()
print(p)

In weighted networks, the function distances() uses weights as the cost of traveling through links (or the “length” of the link). Therefore, weights can increase or decrease the distance of the path.

2.2.4 Katz centrality

Instead of just using shortest paths as we did in our previous calculations, we could actually sum the pathways of all lengths connecting to a given node. It can be calculated analytically by inverting the matrix K: \[ K =I-\alpha A \]

where \(A\) is the adjacency matrix, \(I\) is the identity matrix, and \(\alpha\) is a constant that is lower than \(\frac{1}{\lambda}\), where \(\lambda\) is the maximum eigenvalue of \(A\), also called spectral radius. This connections made with distant neighbors are, however, penalized by an attenuation factor

In an undirected network (i.e., links don’t have direction, adjacency and incidence matrix are the same) we can calculate Katz centrality by the sum of values in each rows (or columns, by symmetry) of matrix \(K\). However, in a directed network the row or columns sums can mean the outgoing pathways or the ingoing pathways depending on how the adjacency matrix is structured. For directed networks, column sums of matrix K is the outgoing pathways from each node, while row sums is the incoming pathways. Nevertheless, we won’t work with directed networks in this session. Here we prepared a function to calculate Katz centrality for all nodes in a matrix:

katz<-function(A,penalize=0.1){
  A.eigen<-eigen(A)$values #Calculating eigenvalues
  A.maxeigen<-ifelse(is.complex(A.eigen), max(Re(A.eigen[which(Im(A.eigen)==0)])), max(A.eigen)) # Get largest eigenvalue
  alpha <- 1/(A.maxeigen+penalize) # Calculate alpha. We sum 0.1 to make the value a bit lower than the maximum allowed
  T<-alpha*A                  # Multiply A matrix with alpha
  I<-diag(1, nrow=nrow(T), ncol=ncol(T)) # Get the identity matrix I
  katz<-solve(I-T) # solve() gives the inverse matrix.
  katz_cent<-rowSums(katz) # Calculate Katz centrality for each node
  return(katz_cent)
}

Now let’s visualize this metric on the same network as in the previous example:

katz_c<-katz(adj_mat)
katz_c<-katz_c/sum(katz_c) # normalisation (optional)

#============= Plotting ========================
# Get node coordinates
coords <- layout_with_kk(G)

# Create a data frame with coordinates
df <- data.frame(x = coords[, 1], y = coords[, 2])

# Get link list from igraph object
edges <- get.edgelist(G)

# Convert link list to data frame
df_edges <- data.frame(
  x1 = coords[edges[,1], 1],
  y1 = coords[edges[,1], 2],
  x2 = coords[edges[,2], 1],
  y2 = coords[edges[,2], 2]
)

nodesize_tmp = katz_c*110 # weighting size of the nodes for the sake of visualization
p <- ggplot(df, aes(x = x, y = y)) +
  geom_segment(data = df_edges, aes(x = x1, y = y1, xend = x2, yend = y2), color = "black") +
  geom_point(aes(color = katz_c), size = nodesize_tmp) + 
  scale_color_gradient(low = 'skyblue', high = 'red',name = "Katz centrality") +
  ggtitle("Katz centrality example") +
  xlab("") + ylab("") +
  theme_void()
print(p)

2.2.4.1 Extra exercise

This exercise will not add grade to your submission. By reading the code for the katz() function, you will realize that it uses by default a penalization of 0.1. Play around with this value and plot the network again and again to understand its meaning. What happens? Hint: Try with something like katz_c <- katz(<your_matrix>,penalize=<positive_value>)

2.2.5 Wrap-up exercise

Below you can find an ecological network of seed dispersal sampled from Cañada de las Sabinas, Nava de las Correhuelas, Sierra de Cazorla, Andalucía. It consists of 12 plants and 4 birds that disperse their seeds. With a bit of copy-pasting the previous code and some minor changes, use the metrics you learned in this session to answer the following questions. Apart from the name of the species, paste the values you obtained to support your answer.

  1. What species has the highest potential to directly affect all others?
  2. What species has the highest potential to directly and in directly affect all others through direct pathways?
  3. What species has the highest potential to directly and in directly affect all others through all possible pathways?

Code:

# define the base url
base_url <- "https://www.web-of-life.es/" 
json_url <- paste0("https://www.web-of-life.es/get_networks.php?network_name=M_SD_027") #M_SD_027 is the network ID in the database. This can be changed to study other networks
nw <- jsonlite::fromJSON(json_url)
class(nw$connection_strength) <-"numeric"
# select the 3 relevant columns and create the igraph object 
G <- nw %>% dplyr::select(species1, species2, connection_strength) %>% graph_from_data_frame(directed = FALSE)
# get info about species 
my_info <- read.csv(paste0(base_url,"get_species_info.php?network_name=M_PA_002"))

isResource <- my_info$is.resource %>% as.logical() # 0/1 converted to FALSE/TRUE

# Add the "type" attribute to the vertices of the graph 
#V(G)$type <- !(isResource) 

#E(G) # to see the edges (pairwise interactions)
V(G) # to see the vertices (nodes) names = species 
## + 16/16 vertices, named, from f0ee5ac:
##  [1] Crataegus monogyna  Hedera helix        Berberis vulgaris  
##  [4] Juniperus communis  Juniperus phoenicea Juniperus sabina   
##  [7] Lonicera arborea    Prunus prostrata    Rhamnus myrtifolius
## [10] Rosa canina         Taxus baccata       Rhamnus saxatilis  
## [13] Turdus merula       Turdus iliacus      Turdus torquatus   
## [16] Turdus viscivorus
adj_mat <- as.matrix(get.adjacency(as.undirected(G))) # get adjacency matrix

katz_c<-katz(adj_mat,0.001)
katz_c<-katz_c/sum(katz_c) # normalisation (optional)

#============= Plotting ========================

# Get node coordinates
coords <- layout_with_kk(G)

# Create a data frame with coordinates
df <- data.frame(x = coords[, 1], y = coords[, 2], node = V(G)$name)

# Get link list from igraph object
edges <- get.edgelist(G, names=FALSE)

# Convert link list to data frame
df_edges <- data.frame(
  x1 = coords[edges[,1], 1],
  y1 = coords[edges[,1], 2],
  x2 = coords[edges[,2], 1],
  y2 = coords[edges[,2], 2]
)

nodesize_tmp = katz_c*100 # weighting size of the nodes for the sake of visualization
margin=0.3
p <- ggplot(df, aes(x = x, y = y)) +
  geom_segment(data = df_edges, aes(x = x1, y = y1, xend = x2, yend = y2), color = "honeydew4") +
  geom_point(aes(color = katz_c), size = nodesize_tmp) + 
  geom_label(aes(label = node, color=katz_c), size = 2.5, fontface = "bold.italic", nudge_y = 0.2,nudge_x = 0.05) +  # Adding node labels
  scale_color_gradient(low = 'skyblue', high = 'sienna1',name = "Katz centrality") +
  ggtitle("Imported network") +
  xlab("") + ylab("") +
  theme_void()+
  
  # Adjust plot limits to include a margin
  xlim(min(coords[, 1]) - margin, max(coords[, 1]) + margin) +
  ylim(min(coords[, 2]) - margin, max(coords[, 2]) + margin)
print(p)

2.2.6 Extra exercise

The network you just used comes from the web of life database. There you can find many more networks of pollinators, food webs, etc. Our network has the network ID M_SD_027 in the database, but if you’re curious you can browse other networks, find their IDs and repeat these analyses.

2.3 Feedback

We would appreciate your feedback :) You can submit it together with your exercise submissions, or alternatively through this link if you prefer to keep it anonymous. Thank you!