Chapter 3 Sampling an ecological network

Eva Knop, Vincent Grognuz and Sebastián Montoya
session 18/03/2025

Field work

For this exercise session, you will go outside to collect data on species interactions in order to build your own pollination network. You can find instructions on how to collect interaction data in the slides provided. Moreover (weather permitting), we will demonstrate what you should do at the Irchel Park.

An important thing to keep in mind is that you should record the interactions you sample in the file we provide. You can download the file here. This is the file you are expected to use when building your network in R.

Data analysis and report preparation

After having collected your data, you should do the following tasks:

Load the data you collected in the field into R
Check species names
Estimate sampling completeness
Convert the data into a network
Visualize your network

We expect you to submit a short report that combines your code, data, and visualizations. To do so, we provide you with a template Rmarkdown file that you can fill in. You can find this file here: /workspace/03-18_sampling_an_ecological_network.

Please keep in mind that the deadline to submit the report for this exercise session is Saturday, April 5th at 1:00 pm. If you have questions, please do not hesitate to contact us.

3.1 Practical demonstration

To walk you through all the steps needed to prepare your report we will work with a mock data set.

3.1.1 Load data

After you recorded your data into the file called species_interactions_protocol.csv, you will need to upload it to the Rstudio Server. To do so, you can click on the Upload button found on the bottom right panel of Rstudio. This will allow you to navigate your file system to find the file you wish to upload and then upload it directly to the server. Please upload your data in the following location: /workspace/03-18_sampling_an_ecological_network/. Once the file is on the server, we can load the relevant packages we will need and read the data into R as follows:

# Load packages
library(igraph)
library(dplyr)
library(bipartite)
library(taxize)
library(tidyr)
library(vegan)

# Read data into R
data <- read.csv("~/ecological_networks_2025/downloads/Data/sampling_an_ecological_network/species_interactions_protocol_example.csv", sep=";")

Note that you will need to change the file path so that it points to the location where your data is stored on the server(!).

Next we can check that the data was imported correctly:

# Take a glimpse at the data
head(data)

##            site       date  time             plant               insect
## 1   forest edge 22.03.2022 14:02 Primula_vulgaris     Bombus_terrestris
## 2 hedges garden 22.03.2022 14:02  Primula_vulgaris    Bombus_terrestris
## 3          park 22.03.2022 14:02        Salix_sp.  Episyrphus_balteatus
## 4   forest edge 22.03.2022 14:15  Primula_vulgaris     Hymenoptera_sp.2
## 5 hedges garden 23.03.2022 14:05        Salix_sp.            Bombus_m1 
## 6 hedges garden 23.03.2022 14:05        Salix_sp.            Bombus_m1 
##    insect_behaviour temperature weather                   remarks
## 1           resting          18   sunny visitor with missing legs
## 2           resting          18   sunny            missing petals
## 3 feeding on nectar          18  cloudy visitor has ectoparasites
## 4 feeding on nectar          18  cloudy                      <NA>
## 5 feeding on nectar          18  cloudy                      <NA>
## 6 feeding on nectar          18  cloudy                      <NA>

3.1.2 Check species names

Checking species names is a fundamental step before you analyze your data, as misspelled names will increase the number of nodes in a network.

Let’s begin by checking plant names.

# Do a visual check for plants
sort(unique(data$plant))

## [1] "Gallanthus_nivalis"    "Primula_vulgaris"      "Primula_vulgaris "    
## [4] "Salix_sp. "            "Taraxacum_officinalis" "YellowFlower _m1 "

Note that R recognizes “Primula_vulgaris” different from “Primula_vulgaris ”, because there is a blank space. Other plant names also have blank spaces.

Now, let’s check insect names.

# Do a visual check for insects
sort(unique(data$insect))

## [1] "Apis_mellifera"       "Bombus_m1 "           "Bombus_terrestris"   
## [4] "Episyrphus_balteatus" "Hymenoptera_sp.2"     "Hymenoptera_sp1"     
## [7] "Hymenoptera_sp2"      "Orthoptera_m1"

Besides a blank space in “Bombus_m1 ”, “Hymenoptera_sp.2” and “Hymenoptera_sp2” are being recognized as different.

There are two ways to solve this issue. You can either correct the names directly in your .csv file or you can use R for this. When you have a huge database of interactions, it is easier to use R than to track each problematic name in your file. Here is how you could solve it with R.

# Correct plant names
# Remove unwanted blank spaces and correct names
data$plant <- gsub(" ", "", data$plant)
# Do another visual check for plants
sort(unique(data$plant))

## [1] "Gallanthus_nivalis"    "Primula_vulgaris"      "Salix_sp."            
## [4] "Taraxacum_officinalis" "YellowFlower_m1"

# Correct insect names
# Remove unwanted blank spaces and correct names
data$insect <- gsub(" ", "", data$insect)
# Standardize names
data$insect <- gsub("_sp1", "_sp.1", data$insect)
data$insect <- gsub("_sp2", "_sp.2", data$insect)
# Do another visual check
sort(unique(data$insect))

## [1] "Apis_mellifera"       "Bombus_m1"            "Bombus_terrestris"   
## [4] "Episyrphus_balteatus" "Hymenoptera_sp.1"     "Hymenoptera_sp.2"    
## [7] "Orthoptera_m1"

Plant and insect names are now standardized.

3.1.2.1 Pay attention to taxonomy

When you sample interactions, it is your responsibility to make sure that species names are correct and updated. Remember, your data could be used by you or other researchers in the future and its reusability will depend on how carefully you handled your data set. This step is important, even if you obtain your networks from an online database.

The taxize package of R offers a useful, but not infallible, tool to quickly check your names.

# Check taxonomy for plants
gna_verifier(data$plant %>% unique, # unique species names in your data frame
             data_sources=196 # the taxonomy database to check your names with
               )[,c(1,3,9,12)]

## # A tibble: 5 × 4
##   submittedName         dataSourceTitleShort matchedName    matchedCanonicalFull
##   <chr>                 <chr>                <chr>          <chr>               
## 1 Primula_vulgaris      World Flora Online   Primula vulga… Primula vulgaris    
## 2 Salix_sp.             World Flora Online   Salix L.       Salix               
## 3 YellowFlower_m1       <NA>                 <NA>           <NA>                
## 4 Gallanthus_nivalis    World Flora Online   Galanthus niv… Galanthus nivalis   
## 5 Taraxacum_officinalis World Flora Online   Taraxacum off… Taraxacum officinale

# Check taxonomy for insects
gna_verifier(data$insect %>% unique, # unique species names in your data frame
             data_sources=1 # the taxonomy database ID to check your names with
               )[,c(1,3,9,12)]

## # A tibble: 7 × 4
##   submittedName        dataSourceTitleShort matchedName     matchedCanonicalFull
##   <chr>                <chr>                <chr>           <chr>               
## 1 Bombus_terrestris    Catalogue of Life    Bombus terrest… Bombus terrestris   
## 2 Episyrphus_balteatus Catalogue of Life    Episyrphus (Ep… Episyrphus balteatus
## 3 Hymenoptera_sp.2     Catalogue of Life    Hymenoptera     Hymenoptera         
## 4 Bombus_m1            Catalogue of Life    Bombus Latreil… Bombus              
## 5 Apis_mellifera       Catalogue of Life    Apis mellifera… Apis mellifera      
## 6 Hymenoptera_sp.1     Catalogue of Life    Hymenoptera     Hymenoptera         
## 7 Orthoptera_m1        Catalogue of Life    Orthoptera Oli… Orthoptera

In case there is an outdated name, taxize will provide the most up to date name in the “matchedCanonicalFull” column. The package can also help you in detecting misspelled names.

Be careful, if your names are not found within a given database, it does not mean they do not exist, just that the data base does not contain that name. This is very common for recently described species, poorly known species, or species from low-income countries.

Once you detect outdated names, you can change them directly in your .csv file or with R. You can find the full list of databases linked to taxize and their IDs in https://resolver.globalnames.org/data_sources

3.1.3 Estimate interaction sampling completeness

Similar to estimating sampling completeness for species richness in a community, it is possible to estimate sampling completeness for the unique pairwise interactions in a network. Here we will do it using the estaccumR function of the vegan package.

First, we will order our data as a matrix, with each unique pairwise interaction in the rows, and each sample (sampling day) in the columns.

# Create a sampling interaction matrix
samp_mat<-data.frame(interactions=paste0(data$plant, "&", data$insect),
           date=data$date, value=1) |>
  group_by(interactions, date) |>
  # summarise the groups by adding a new column which counts the number of interactions between each unique combination of interactions and sampling date
  summarise(weight = n()) |>
  ungroup() |>
  pivot_wider(names_from="date", values_from="weight", values_fill=0, names_sort = TRUE) # Transform into a matrix-like object

samp_mat

## # A tibble: 12 × 6
##    interactions `22.03.2022` `23.03.2022` `24.03.2022` `25.03.2022` `26.03.2022`
##    <chr>               <int>        <int>        <int>        <int>        <int>
##  1 Gallanthus_…            0            0            0            1            0
##  2 Gallanthus_…            1            0            0            0            0
##  3 Primula_vul…            2            0            0            0            0
##  4 Primula_vul…            1            0            0            0            0
##  5 Primula_vul…            4            0            0            0            0
##  6 Salix_sp.&B…            4            2            0            0            0
##  7 Salix_sp.&E…            1            1            0            0            0
##  8 Taraxacum_o…            0            0            0            0            1
##  9 Taraxacum_o…            1            0            0            0            0
## 10 Taraxacum_o…            1            0            0            0            0
## 11 YellowFlowe…            0            0            1            0            0
## 12 YellowFlowe…            1            0            0            0            0

Then, we will estimate interaction sampling completeness using the Chao1 estimator.

# Calculate an interaction accumulation curve and estimate the number of unique pairwise interactions per accumulated sample
pool_estimators <- estaccumR(t(samp_mat[,-1]))

# Calculate sampling completeness. To do so, divide the observed number of unique pairwise interactions by the estimated number and multiply it by 100.

max(pool_estimators$S)/max(pool_estimators$chao)*100

## [1] 40

Now, let’s have a look at the curves.

# Plot the species accumulation curve
plot(x=pool_estimators$means[,1], # accumulated samples
     y=pool_estimators$means[,2], # observed accumulated number of unique pairwise interactions
     main="Interaction Accumulation Curve", # define title
     xlab="Number of Samples", # define label for X axis
     ylab="Number of unique pairwise interactions", # define label for Y axis
     col="black", # define line color
     lwd=2, # define line width
     las=1, # define the orientation of tickmark labels
     ylim=c(0,max(pool_estimators$means[,3])+5), # define limits for Y axis
     ty="l" # define the plot as lines
     )

# Add the Chao1 estimator to the plot
lines(x=pool_estimators$means[,1], # accumulated samples
      y=pool_estimators$means[,3], # estimated number of unique pairwise interactions with Chao 1
      col = "red", # define line color 
      lwd = 2 # define line width
      )

# Add a legend
legend("bottomright", # position of the legend
       legend = c("Observed", "Chao1 Estimate"), # define labels
       col = c("black", "red"), # define line colors
       lwd = 2, # define line width
       bty = "n" # remove the box of the legend
       )

3.1.4 Convert data to a bipartite network

Now that you have loaded your data set, checked their names, and estimated sampling completeness, it is time to visualize your network. To do so, you need to convert it into an igraph object, i.e., “something” that is suitable to be visualized and to be analyzed quantitatively as a network. Let’s start by converting our raw data into an edge list:

# convert the raw data into an edge list
my_edge_list <- data |>
  # only select relevant columns
  select(plant, insect) |>
  # group by each unique combination of plant and insect
  group_by(plant, insect) |>
  # summarise the groups by adding a new column which counts the number of 
  # interactions between each unique combination of plant and insect
  summarise(weight = n()) |>
  # ungroup the plants and animals
  ungroup()

Now our data is formatted in such a way that each row summarises the number of interactions between a pair of plant and insect:

# get a glimpse at the data
head(my_edge_list)

## # A tibble: 6 × 3
##   plant              insect            weight
##   <chr>              <chr>              <int>
## 1 Gallanthus_nivalis Apis_mellifera         1
## 2 Gallanthus_nivalis Hymenoptera_sp.2       1
## 3 Primula_vulgaris   Bombus_terrestris      2
## 4 Primula_vulgaris   Hymenoptera_sp.1       1
## 5 Primula_vulgaris   Hymenoptera_sp.2       4
## 6 Salix_sp.          Bombus_m1              6

Following the procedure we have applied in the session toolkit for network analysis, we can convert our edge list into a network using the igraph package.

# Convert the edge list into network
my_network <- graph_from_data_frame(my_edge_list, directed = FALSE)

# Check network 
my_network

## IGRAPH 2b0678e UNW- 12 12 -- 
## + attr: name (v/c), weight (e/n)
## + edges from 2b0678e (vertex names):
##  [1] Gallanthus_nivalis   --Apis_mellifera      
##  [2] Gallanthus_nivalis   --Hymenoptera_sp.2    
##  [3] Primula_vulgaris     --Bombus_terrestris   
##  [4] Primula_vulgaris     --Hymenoptera_sp.1    
##  [5] Primula_vulgaris     --Hymenoptera_sp.2    
##  [6] Salix_sp.            --Bombus_m1           
##  [7] Salix_sp.            --Episyrphus_balteatus
##  [8] Taraxacum_officinalis--Episyrphus_balteatus
## + ... omitted several edges

Now we need to make our network bipartite. To do so, we can use the bipartite_mapping function to automatically assign plants and insects to separate groups:

# Assign groups to insects and plants
bipartite_mapping_output <- bipartite_mapping(my_network)

# Set these groups to the nodes of the network
V(my_network)$type <- bipartite_mapping_output$type

# Confirm that network is now bipartite
is_bipartite(my_network)

## [1] TRUE

3.1.5 Plot the network

We can plot the network using either the igraph or bipartite package. Remember, however, that each one requires our network to be represented in different formats. igraph requires a network object while bipartite requires an incidence matrix.

Let’s first plot our network using igraph:

# Set the layout for our plot
set.seed(11)
LO = layout_nicely(my_network) # Set a layout that reduces edge overlapping


# Visualize our network
plot(my_network, # define the network we want to plot 
     vertex.label=V(my_network)$Name, # define the node labels
     vertex.size=8, # define node size
     vertex.label.dist=2, # define position of node labels
     vertex.label.degree =pi/6, # define the position of node labels
     vertex.label.cex=0.8, # define label size
     vertex.color=c("darkgreen", "orange")[V(my_network)$type+1], # define colour of nodes
     vertex.shape=c("square", "circle")[V(my_network)$type+1], # define shape of nodes
     vertex.label.family="Times", # define typography
     vertex.label.color="black", # define colour of border
     edge.color="gray", # define colour of links
     edge.curved=0.1, # curve the edges
     edge.width=E(my_network)$weight, # define edge width
     main = "My Network") # define title

If we want to plot our network using the bipartite package, we first need to convert our network into an incidence matrix:

# Convert the igraph object into incidence matrix 
my_incidence_matrix <- as_incidence_matrix(
  my_network,
  attr = "weight",
  names = TRUE,
  sparse = FALSE
)

# Check matrix
my_incidence_matrix

##                       Apis_mellifera Hymenoptera_sp.2 Bombus_terrestris
## Gallanthus_nivalis                 1                1                 0
## Primula_vulgaris                   0                4                 2
## Salix_sp.                          0                0                 0
## Taraxacum_officinalis              0                1                 0
## YellowFlower_m1                    0                0                 0
##                       Hymenoptera_sp.1 Bombus_m1 Episyrphus_balteatus
## Gallanthus_nivalis                   0         0                    0
## Primula_vulgaris                     1         0                    0
## Salix_sp.                            0         6                    2
## Taraxacum_officinalis                0         0                    1
## YellowFlower_m1                      0         1                    0
##                       Orthoptera_m1
## Gallanthus_nivalis                0
## Primula_vulgaris                  0
## Salix_sp.                         0
## Taraxacum_officinalis             1
## YellowFlower_m1                   1

Now we can plot our network:

# Plot the network 
# (you can tweak the arguments to change the look of your network)

plotweb(my_incidence_matrix, # define the matrix
        text.rot=90, # rotate the text
        bor.col.interaction="gray40", # define colour of links
        col.high="orange", # define colour of top level
        col.low="darkgreen", # define colour of lower level
        y.lim=c(-1,3)) # tweak y axis so that network is visible in RMARKDOWN

3.1.6 Compare networks that you sampled on different sites

When you sample networks at different sites, or seasons, you might be interested in not aggregating interactions in a single network, but visualizing their differences. To separate our data as a list of networks per site we use the frame2webs function of the bipartite package.

# Compare the networks that you sampled on different sites
my_network_list<-
  # Split the data frame into networks based on the "site" where the interaction were recorded
  # You could use the "date" instead if you were interested in daily differences
frame2webs(data, varnames = c("plant", "insect", "site")) %>%
  lapply(graph_from_incidence_matrix, directed = FALSE, weighted=TRUE)

my_network_list

## $`forest edge`
## IGRAPH 096f7e1 UNWB 11 8 -- 
## + attr: type (v/l), name (v/c), weight (e/n)
## + edges from 096f7e1 (vertex names):
## [1] Gallanthus_nivalis   --Apis_mellifera      
## [2] Primula_vulgaris     --Bombus_terrestris   
## [3] Primula_vulgaris     --Hymenoptera_sp.1    
## [4] Primula_vulgaris     --Hymenoptera_sp.2    
## [5] Salix_sp.            --Bombus_m1           
## [6] Salix_sp.            --Episyrphus_balteatus
## [7] Taraxacum_officinalis--Episyrphus_balteatus
## [8] Taraxacum_officinalis--Orthoptera_m1       
## 
## $`hedges garden`
## IGRAPH d971ebb UNWB 6 4 -- 
## + attr: type (v/l), name (v/c), weight (e/n)
## + edges from d971ebb (vertex names):
## [1] Primula_vulgaris--Bombus_terrestris Primula_vulgaris--Hymenoptera_sp.2 
## [3] Salix_sp.       --Bombus_m1         YellowFlower_m1 --Bombus_m1        
## 
## $park
## IGRAPH e8b5a42 UNWB 9 6 -- 
## + attr: type (v/l), name (v/c), weight (e/n)
## + edges from e8b5a42 (vertex names):
## [1] Gallanthus_nivalis   --Hymenoptera_sp.2    
## [2] Primula_vulgaris     --Hymenoptera_sp.2    
## [3] Salix_sp.            --Bombus_m1           
## [4] Salix_sp.            --Episyrphus_balteatus
## [5] Taraxacum_officinalis--Hymenoptera_sp.2    
## [6] YellowFlower_m1      --Orthoptera_m1

# Set the layout for the plot
LO = lapply(my_network_list, # Each network in the list
            layout_nicely) # Use a layout that reduces edge overlapping



# Visualize the networks
layout(matrix(1:4, nrow = 2, ncol = 2))  # Set up the figure layout

par(mar=c(2,2,2,2))
for (name in names(my_network_list)) {
  x <- my_network_list[[name]]  # Get the network object
  set.seed=10
  plot(x, # define the network we want to plot 
       vertex.label=V(x)$Name, # define the node labels
       vertex.size=8, # define node size
       vertex.label.dist=2.5, # define position of node labels
       vertex.label.degree =pi/6, # define the position of node labels
       vertex.label.cex=0.8, # define label size
       vertex.color=c("darkgreen", "orange")[V(x)$type+1], # define colour of nodes
       vertex.shape=c("square", "circle")[V(my_network)$type+1], # define shape of nodes
       vertex.label.family="Times", # define typography
       vertex.label.color="black", # define colour of border
       edge.color="gray", # define colour of links
       edge.curved=0.1, # curve the edges
       edge.width=E(x)$weight, # define edge width
       main = name) # use the name of the list element as the title
}

3.2 Your task

We expect you to repeat the analysis shown above with the data set you collected during the block course. We ask you to upload the raw data you collected as a .csv file onto the server filling the template provided by us.

Please store your data set in the folder workspace/03-18_sampling_an_ecological_network/. Later you can use the 03-18_sampling_an_ecological_network.Rmd script therein to produce your report in html format. Once you are done, please download the whole folder and upload it on OLAT.

Bonus

Join forces with other two students to compare the networks that you sampled in the different sites (i.e., forest edge, park, and hedges garden). To do so, create a single .csv file. In your report, describe how you performed your sampling, including where, when (time and date) you sampled, the methods that you used to record the interactions (e.g., did you include all visitors regardless of whether they touched the anthers of the flower?). Describe each site (e.g., was there a dominant/most abundant plant species in each site?). Plot the networks and describe their differences (e.g., are the same species present in all sites? The species with most interactions in one site is also the one with most interactions in all other sites?). Did you perceive a relationship between flower abundance and the number of visitor species recorder for each species? Describe.

If you encounter technical troubles, please ask Sebastián or Fernando to help you.