Chapter 3 Sampling an ecological network
Eva Knop, Vincent Grognuz and Sebastián Montoya
session 18/03/2025
Field work
For this exercise session, you will go outside to collect data on species interactions in order to build your own pollination network. You can find instructions on how to collect interaction data in the slides provided. Moreover (weather permitting), we will demonstrate what you should do at the Irchel Park.
An important thing to keep in mind is that you should record the interactions you sample in the file we provide. You can download the file here. This is the file you are expected to use when building your network in R.
Data analysis and report preparation
After having collected your data, you should do the following tasks:
- Load the data you collected in the field into R
- Check species names
- Estimate sampling completeness
- Convert the data into a network
- Visualize your network
We expect you to submit a short report that combines your code, data,
and visualizations. To do so, we provide you with a template Rmarkdown
file that you can fill in. You can find this file here:
/workspace/03-18_sampling_an_ecological_network
.
Please keep in mind that the deadline to submit the report for this exercise session is Saturday, April 5th at 1:00 pm. If you have questions, please do not hesitate to contact us.
3.1 Practical demonstration
To walk you through all the steps needed to prepare your report we will work with a mock data set.
3.1.1 Load data
After you recorded your data into the file called
species_interactions_protocol.csv
, you will need to upload it to the
Rstudio Server. To do so, you can click on the Upload button found on
the bottom right panel of Rstudio. This will allow you to navigate your
file system to find the file you wish to upload and then upload it
directly to the server. Please upload your data in the following location:
/workspace/03-18_sampling_an_ecological_network/
. Once the file is on the server, we can load the
relevant packages we will need and read the data into R as follows:
# Load packages
library(igraph)
library(dplyr)
library(bipartite)
library(taxize)
library(tidyr)
library(vegan)
# Read data into R
data <- read.csv("~/ecological_networks_2025/downloads/Data/sampling_an_ecological_network/species_interactions_protocol_example.csv", sep=";")
Note that you will need to change the file path so that it points to the location where your data is stored on the server(!).
Next we can check that the data was imported correctly:
## site date time plant insect
## 1 forest edge 22.03.2022 14:02 Primula_vulgaris Bombus_terrestris
## 2 hedges garden 22.03.2022 14:02 Primula_vulgaris Bombus_terrestris
## 3 park 22.03.2022 14:02 Salix_sp. Episyrphus_balteatus
## 4 forest edge 22.03.2022 14:15 Primula_vulgaris Hymenoptera_sp.2
## 5 hedges garden 23.03.2022 14:05 Salix_sp. Bombus_m1
## 6 hedges garden 23.03.2022 14:05 Salix_sp. Bombus_m1
## insect_behaviour temperature weather remarks
## 1 resting 18 sunny visitor with missing legs
## 2 resting 18 sunny missing petals
## 3 feeding on nectar 18 cloudy visitor has ectoparasites
## 4 feeding on nectar 18 cloudy <NA>
## 5 feeding on nectar 18 cloudy <NA>
## 6 feeding on nectar 18 cloudy <NA>
3.1.2 Check species names
Checking species names is a fundamental step before you analyze your data, as misspelled names will increase the number of nodes in a network.
Let’s begin by checking plant names.
## [1] "Gallanthus_nivalis" "Primula_vulgaris" "Primula_vulgaris "
## [4] "Salix_sp. " "Taraxacum_officinalis" "YellowFlower _m1 "
Note that R recognizes “Primula_vulgaris” different from “Primula_vulgaris ”, because there is a blank space. Other plant names also have blank spaces.
Now, let’s check insect names.
## [1] "Apis_mellifera" "Bombus_m1 " "Bombus_terrestris"
## [4] "Episyrphus_balteatus" "Hymenoptera_sp.2" "Hymenoptera_sp1"
## [7] "Hymenoptera_sp2" "Orthoptera_m1"
Besides a blank space in “Bombus_m1 ”, “Hymenoptera_sp.2” and “Hymenoptera_sp2” are being recognized as different.
There are two ways to solve this issue. You can either correct the names directly in your .csv
file or you can use R for this. When you have a huge database of interactions, it is easier to use R than to track each problematic name in your file. Here is how you could solve it with R.
# Correct plant names
# Remove unwanted blank spaces and correct names
data$plant <- gsub(" ", "", data$plant)
# Do another visual check for plants
sort(unique(data$plant))
## [1] "Gallanthus_nivalis" "Primula_vulgaris" "Salix_sp."
## [4] "Taraxacum_officinalis" "YellowFlower_m1"
# Correct insect names
# Remove unwanted blank spaces and correct names
data$insect <- gsub(" ", "", data$insect)
# Standardize names
data$insect <- gsub("_sp1", "_sp.1", data$insect)
data$insect <- gsub("_sp2", "_sp.2", data$insect)
# Do another visual check
sort(unique(data$insect))
## [1] "Apis_mellifera" "Bombus_m1" "Bombus_terrestris"
## [4] "Episyrphus_balteatus" "Hymenoptera_sp.1" "Hymenoptera_sp.2"
## [7] "Orthoptera_m1"
Plant and insect names are now standardized.
3.1.2.1 Pay attention to taxonomy
When you sample interactions, it is your responsibility to make sure that species names are correct and updated. Remember, your data could be used by you or other researchers in the future and its reusability will depend on how carefully you handled your data set. This step is important, even if you obtain your networks from an online database.
The taxize
package of R offers a useful, but not infallible, tool to quickly check your names.
# Check taxonomy for plants
gna_verifier(data$plant %>% unique, # unique species names in your data frame
data_sources=196 # the taxonomy database to check your names with
)[,c(1,3,9,12)]
## # A tibble: 5 × 4
## submittedName dataSourceTitleShort matchedName matchedCanonicalFull
## <chr> <chr> <chr> <chr>
## 1 Primula_vulgaris World Flora Online Primula vulga… Primula vulgaris
## 2 Salix_sp. World Flora Online Salix L. Salix
## 3 YellowFlower_m1 <NA> <NA> <NA>
## 4 Gallanthus_nivalis World Flora Online Galanthus niv… Galanthus nivalis
## 5 Taraxacum_officinalis World Flora Online Taraxacum off… Taraxacum officinale
# Check taxonomy for insects
gna_verifier(data$insect %>% unique, # unique species names in your data frame
data_sources=1 # the taxonomy database ID to check your names with
)[,c(1,3,9,12)]
## # A tibble: 7 × 4
## submittedName dataSourceTitleShort matchedName matchedCanonicalFull
## <chr> <chr> <chr> <chr>
## 1 Bombus_terrestris Catalogue of Life Bombus terrest… Bombus terrestris
## 2 Episyrphus_balteatus Catalogue of Life Episyrphus (Ep… Episyrphus balteatus
## 3 Hymenoptera_sp.2 Catalogue of Life Hymenoptera Hymenoptera
## 4 Bombus_m1 Catalogue of Life Bombus Latreil… Bombus
## 5 Apis_mellifera Catalogue of Life Apis mellifera… Apis mellifera
## 6 Hymenoptera_sp.1 Catalogue of Life Hymenoptera Hymenoptera
## 7 Orthoptera_m1 Catalogue of Life Orthoptera Oli… Orthoptera
In case there is an outdated name, taxize
will provide the most up to date name in the “matchedCanonicalFull” column. The package can also help you in detecting misspelled names.
Be careful, if your names are not found within a given database, it does not mean they do not exist, just that the data base does not contain that name. This is very common for recently described species, poorly known species, or species from low-income countries.
Once you detect outdated names, you can change them directly in your .csv
file or with R. You can find the full list of databases linked to taxize
and their IDs in https://resolver.globalnames.org/data_sources
3.1.3 Estimate interaction sampling completeness
Similar to estimating sampling completeness for species richness in a community, it is possible to estimate sampling completeness for the unique pairwise interactions in a network. Here we will do it using the estaccumR
function of the vegan
package.
First, we will order our data as a matrix, with each unique pairwise interaction in the rows, and each sample (sampling day) in the columns.
# Create a sampling interaction matrix
samp_mat<-data.frame(interactions=paste0(data$plant, "&", data$insect),
date=data$date, value=1) |>
group_by(interactions, date) |>
# summarise the groups by adding a new column which counts the number of interactions between each unique combination of interactions and sampling date
summarise(weight = n()) |>
ungroup() |>
pivot_wider(names_from="date", values_from="weight", values_fill=0, names_sort = TRUE) # Transform into a matrix-like object
samp_mat
## # A tibble: 12 × 6
## interactions `22.03.2022` `23.03.2022` `24.03.2022` `25.03.2022` `26.03.2022`
## <chr> <int> <int> <int> <int> <int>
## 1 Gallanthus_… 0 0 0 1 0
## 2 Gallanthus_… 1 0 0 0 0
## 3 Primula_vul… 2 0 0 0 0
## 4 Primula_vul… 1 0 0 0 0
## 5 Primula_vul… 4 0 0 0 0
## 6 Salix_sp.&B… 4 2 0 0 0
## 7 Salix_sp.&E… 1 1 0 0 0
## 8 Taraxacum_o… 0 0 0 0 1
## 9 Taraxacum_o… 1 0 0 0 0
## 10 Taraxacum_o… 1 0 0 0 0
## 11 YellowFlowe… 0 0 1 0 0
## 12 YellowFlowe… 1 0 0 0 0
Then, we will estimate interaction sampling completeness using the Chao1 estimator.
# Calculate an interaction accumulation curve and estimate the number of unique pairwise interactions per accumulated sample
pool_estimators <- estaccumR(t(samp_mat[,-1]))
# Calculate sampling completeness. To do so, divide the observed number of unique pairwise interactions by the estimated number and multiply it by 100.
max(pool_estimators$S)/max(pool_estimators$chao)*100
## [1] 40
Now, let’s have a look at the curves.
# Plot the species accumulation curve
plot(x=pool_estimators$means[,1], # accumulated samples
y=pool_estimators$means[,2], # observed accumulated number of unique pairwise interactions
main="Interaction Accumulation Curve", # define title
xlab="Number of Samples", # define label for X axis
ylab="Number of unique pairwise interactions", # define label for Y axis
col="black", # define line color
lwd=2, # define line width
las=1, # define the orientation of tickmark labels
ylim=c(0,max(pool_estimators$means[,3])+5), # define limits for Y axis
ty="l" # define the plot as lines
)
# Add the Chao1 estimator to the plot
lines(x=pool_estimators$means[,1], # accumulated samples
y=pool_estimators$means[,3], # estimated number of unique pairwise interactions with Chao 1
col = "red", # define line color
lwd = 2 # define line width
)
# Add a legend
legend("bottomright", # position of the legend
legend = c("Observed", "Chao1 Estimate"), # define labels
col = c("black", "red"), # define line colors
lwd = 2, # define line width
bty = "n" # remove the box of the legend
)
3.1.4 Convert data to a bipartite network
Now that you have loaded your data set, checked their names, and estimated sampling completeness, it is time to visualize your network. To do so, you need to convert it into an igraph
object, i.e., “something” that is suitable to be visualized and to be analyzed quantitatively as a network. Let’s start by converting
our raw data into an edge list:
# convert the raw data into an edge list
my_edge_list <- data |>
# only select relevant columns
select(plant, insect) |>
# group by each unique combination of plant and insect
group_by(plant, insect) |>
# summarise the groups by adding a new column which counts the number of
# interactions between each unique combination of plant and insect
summarise(weight = n()) |>
# ungroup the plants and animals
ungroup()
Now our data is formatted in such a way that each row summarises the number of interactions between a pair of plant and insect:
## # A tibble: 6 × 3
## plant insect weight
## <chr> <chr> <int>
## 1 Gallanthus_nivalis Apis_mellifera 1
## 2 Gallanthus_nivalis Hymenoptera_sp.2 1
## 3 Primula_vulgaris Bombus_terrestris 2
## 4 Primula_vulgaris Hymenoptera_sp.1 1
## 5 Primula_vulgaris Hymenoptera_sp.2 4
## 6 Salix_sp. Bombus_m1 6
Following the procedure we have applied in the session
toolkit for network analysis
, we can convert our edge list into a
network using the igraph
package.
# Convert the edge list into network
my_network <- graph_from_data_frame(my_edge_list, directed = FALSE)
# Check network
my_network
## IGRAPH 2b0678e UNW- 12 12 --
## + attr: name (v/c), weight (e/n)
## + edges from 2b0678e (vertex names):
## [1] Gallanthus_nivalis --Apis_mellifera
## [2] Gallanthus_nivalis --Hymenoptera_sp.2
## [3] Primula_vulgaris --Bombus_terrestris
## [4] Primula_vulgaris --Hymenoptera_sp.1
## [5] Primula_vulgaris --Hymenoptera_sp.2
## [6] Salix_sp. --Bombus_m1
## [7] Salix_sp. --Episyrphus_balteatus
## [8] Taraxacum_officinalis--Episyrphus_balteatus
## + ... omitted several edges
Now we need to make our network bipartite. To do so, we can use the
bipartite_mapping
function to automatically assign plants and insects
to separate groups:
# Assign groups to insects and plants
bipartite_mapping_output <- bipartite_mapping(my_network)
# Set these groups to the nodes of the network
V(my_network)$type <- bipartite_mapping_output$type
# Confirm that network is now bipartite
is_bipartite(my_network)
## [1] TRUE
3.1.5 Plot the network
We can plot the network using either the igraph
or bipartite
package. Remember, however, that each one requires our network to be
represented in different formats. igraph
requires a network object
while bipartite
requires an incidence matrix.
Let’s first plot our network using igraph
:
# Set the layout for our plot
set.seed(11)
LO = layout_nicely(my_network) # Set a layout that reduces edge overlapping
# Visualize our network
plot(my_network, # define the network we want to plot
vertex.label=V(my_network)$Name, # define the node labels
vertex.size=8, # define node size
vertex.label.dist=2, # define position of node labels
vertex.label.degree =pi/6, # define the position of node labels
vertex.label.cex=0.8, # define label size
vertex.color=c("darkgreen", "orange")[V(my_network)$type+1], # define colour of nodes
vertex.shape=c("square", "circle")[V(my_network)$type+1], # define shape of nodes
vertex.label.family="Times", # define typography
vertex.label.color="black", # define colour of border
edge.color="gray", # define colour of links
edge.curved=0.1, # curve the edges
edge.width=E(my_network)$weight, # define edge width
main = "My Network") # define title
If we want to plot our network using the bipartite
package, we first
need to convert our network into an incidence matrix:
# Convert the igraph object into incidence matrix
my_incidence_matrix <- as_incidence_matrix(
my_network,
attr = "weight",
names = TRUE,
sparse = FALSE
)
# Check matrix
my_incidence_matrix
## Apis_mellifera Hymenoptera_sp.2 Bombus_terrestris
## Gallanthus_nivalis 1 1 0
## Primula_vulgaris 0 4 2
## Salix_sp. 0 0 0
## Taraxacum_officinalis 0 1 0
## YellowFlower_m1 0 0 0
## Hymenoptera_sp.1 Bombus_m1 Episyrphus_balteatus
## Gallanthus_nivalis 0 0 0
## Primula_vulgaris 1 0 0
## Salix_sp. 0 6 2
## Taraxacum_officinalis 0 0 1
## YellowFlower_m1 0 1 0
## Orthoptera_m1
## Gallanthus_nivalis 0
## Primula_vulgaris 0
## Salix_sp. 0
## Taraxacum_officinalis 1
## YellowFlower_m1 1
Now we can plot our network:
# Plot the network
# (you can tweak the arguments to change the look of your network)
plotweb(my_incidence_matrix, # define the matrix
text.rot=90, # rotate the text
bor.col.interaction="gray40", # define colour of links
col.high="orange", # define colour of top level
col.low="darkgreen", # define colour of lower level
y.lim=c(-1,3)) # tweak y axis so that network is visible in RMARKDOWN
3.1.6 Compare networks that you sampled on different sites
When you sample networks at different sites, or seasons, you might be interested in not aggregating interactions in a single network, but visualizing their differences. To separate our data as a list of networks per site we use the frame2webs
function of the bipartite
package.
# Compare the networks that you sampled on different sites
my_network_list<-
# Split the data frame into networks based on the "site" where the interaction were recorded
# You could use the "date" instead if you were interested in daily differences
frame2webs(data, varnames = c("plant", "insect", "site")) %>%
lapply(graph_from_incidence_matrix, directed = FALSE, weighted=TRUE)
my_network_list
## $`forest edge`
## IGRAPH 096f7e1 UNWB 11 8 --
## + attr: type (v/l), name (v/c), weight (e/n)
## + edges from 096f7e1 (vertex names):
## [1] Gallanthus_nivalis --Apis_mellifera
## [2] Primula_vulgaris --Bombus_terrestris
## [3] Primula_vulgaris --Hymenoptera_sp.1
## [4] Primula_vulgaris --Hymenoptera_sp.2
## [5] Salix_sp. --Bombus_m1
## [6] Salix_sp. --Episyrphus_balteatus
## [7] Taraxacum_officinalis--Episyrphus_balteatus
## [8] Taraxacum_officinalis--Orthoptera_m1
##
## $`hedges garden`
## IGRAPH d971ebb UNWB 6 4 --
## + attr: type (v/l), name (v/c), weight (e/n)
## + edges from d971ebb (vertex names):
## [1] Primula_vulgaris--Bombus_terrestris Primula_vulgaris--Hymenoptera_sp.2
## [3] Salix_sp. --Bombus_m1 YellowFlower_m1 --Bombus_m1
##
## $park
## IGRAPH e8b5a42 UNWB 9 6 --
## + attr: type (v/l), name (v/c), weight (e/n)
## + edges from e8b5a42 (vertex names):
## [1] Gallanthus_nivalis --Hymenoptera_sp.2
## [2] Primula_vulgaris --Hymenoptera_sp.2
## [3] Salix_sp. --Bombus_m1
## [4] Salix_sp. --Episyrphus_balteatus
## [5] Taraxacum_officinalis--Hymenoptera_sp.2
## [6] YellowFlower_m1 --Orthoptera_m1
# Set the layout for the plot
LO = lapply(my_network_list, # Each network in the list
layout_nicely) # Use a layout that reduces edge overlapping
# Visualize the networks
layout(matrix(1:4, nrow = 2, ncol = 2)) # Set up the figure layout
par(mar=c(2,2,2,2))
for (name in names(my_network_list)) {
x <- my_network_list[[name]] # Get the network object
set.seed=10
plot(x, # define the network we want to plot
vertex.label=V(x)$Name, # define the node labels
vertex.size=8, # define node size
vertex.label.dist=2.5, # define position of node labels
vertex.label.degree =pi/6, # define the position of node labels
vertex.label.cex=0.8, # define label size
vertex.color=c("darkgreen", "orange")[V(x)$type+1], # define colour of nodes
vertex.shape=c("square", "circle")[V(my_network)$type+1], # define shape of nodes
vertex.label.family="Times", # define typography
vertex.label.color="black", # define colour of border
edge.color="gray", # define colour of links
edge.curved=0.1, # curve the edges
edge.width=E(x)$weight, # define edge width
main = name) # use the name of the list element as the title
}
3.2 Your task
We expect you to repeat the analysis shown above with the data set you
collected during the block course. We ask you to upload the raw data you
collected as a .csv
file onto the server filling the template provided
by us.
Please store your data set in the folder
workspace/03-18_sampling_an_ecological_network/
. Later you can use the
03-18_sampling_an_ecological_network.Rmd
script therein to produce
your report in html format. Once you are done, please download the whole
folder and upload it on OLAT.
Bonus
Join forces with other two students to compare the networks that you sampled in the different sites (i.e., forest edge, park, and hedges garden). To do so, create a single .csv
file. In your report, describe how you performed your sampling, including where, when (time and date) you sampled, the methods that you used to record the interactions (e.g., did you include all visitors regardless of whether they touched the anthers of the flower?). Describe each site (e.g., was there a dominant/most abundant plant species in each site?). Plot the networks and describe their differences (e.g., are the same species present in all sites? The species with most interactions in one site is also the one with most interactions in all other sites?). Did you perceive a relationship between flower abundance and the number of visitor species recorder for each species? Describe.
If you encounter technical troubles, please ask Sebastián or Fernando to help you.