BCB 520 - Final Project

Exploring biomass and quality values in the John Day, Oregon

I will be using quality and biomass data from my sampled vegetation data in the John Day, Oregon to help visualize how forage quality changes throughout my study area over time.
Author

Alexis Means

Published

March 26, 2025

Project Description:

Understanding Bighorn Sheep Nutrition and Movement in the John Day Canyon

Located within the Columbia Plateau, the John Day Canyon is a steep, rugged landscape where sagebrush and grasses dominate the hillsides. This dynamic environment supports a unique population of bighorn sheep — one that, remarkably, has avoided the disease outbreaks that often devastate other herds.

My research focuses on understanding the nutritional requirements of these bighorn sheep and how those needs shift throughout their reproductive cycle due to changing metabolic demands. The ultimate goal is to develop a predictive model that links their nutritional requirements to movement patterns. By doing so, we can identify potential threats, such as areas where bighorn sheep may come into contact with domestic livestock, and pinpoint critical habitats essential for their survival.

To achieve this, I am collecting vegetation data to calculate a measure of “suitable biomass”—essentially, the portion of available forage that meaningfully contributes to the sheep’s energy needs at different reproductive stages. This metric is created using measures of digestible energy and digestible protein. I have currently not finalized my values of “suitable biomass”, but I do have preliminary values of digestible energy and digestible protein from a subset of my data. Eventually using this metric, I will develop a Generalized Additive Model (GAM) to map areas of high and low nutritional quality across the landscape.

These techniques will help determine which resources most influence bighorn sheep movement and allow us to predict future movement patterns—a critical step in ensuring their continued success in this landscape. By bridging nutritional ecology with movement ecology, this research will contribute valuable insights into how bighorn sheep navigate their environment and help inform conservation strategies for this unique, disease-free population.

Data Wrangling

Code
library(lpSolve)
library(openxlsx)
library(tools)
library(dplyr)
library(sf)
library(tidyverse)

# Define the path to the directory containing the file
excelpath <- "C:/Users/Alexis Means/Documents/Project/Nutrition Sampling/R code/FRESH/processed.data/"

# Define the specific file name
filename <- "FRESH.Subset.xlsx"

# Construct the full file path
excel.file <- file.path(excelpath, filename)

# Load the workbook and read the data
library(openxlsx)
habitat <- loadWorkbook(excel.file)
data <- data.frame(readWorkbook(excel.file, sheet = "FRESH Data", startRow = 1, colNames = TRUE))


plant <-read.csv("C:/Users/Alexis Means/Documents/Project/Nutrition Sampling/R code/Cleaning/processed.data/plant.csv")

data <- data %>% 
  rename(Code = Plant.Code) %>% 
  left_join(y = plant, 
            by = "Code",
            relationship = "many-to-one")

Data Dictionary

For further description as to what each of my variables are within my data set. Please see my data dictionary here.

Code
table<- read.csv("C:/Users/Alexis Means/Documents/School/BCB520/2A/attributes.csv") 
knitr::kable(table)
Attribute Type Note
Biomass Quantitative This measurement tells us what the weight of dry biomass is for each specific observation
TransectID Categorical This descriptor is a unique ID that tells us which randomized point we sampled
Quadrat Categorical There are 5 quadrats that we sample for each quadrat (20,40,60,80,100)
Code Categorical These are unique codes that describe the family, genius and species for every item observed
Phenology Categorical Each species is assigned a growth stage when we observe it - New, Budding, Flowering, Fruiting, Mature or Cured (N, B, FL, FR, M, C)
PVT Categorical This number describes the vegetation community that was being sampled, we have 5 total for the study area, it is used as part of the descriptor for each plotID
percent Quantitative This measurement tells us the percent cover that each composition_id occupies within a 1x1m quadrat
Spp Categorical This unique ID is slightly more broad and will be used to identify species/phenology combinations within each vegetation type as well as the season
Family Categorical This will be used to group quality data if we do not have enough information to determine the quality down to the smaller scale (genus)
Genus Categorical This will be used to group quality data if we do not have enough information to determine the quality down to the smaller scale (spp)
Species Categorical This will be used to group quality data if we do not have enough information to determine the quality down to the smaller scale (phenological stage)
FunctionalGroup Categorical The smallest grouping variable for my quality data (perennial or annual grass, forb, or shrub)
FGNew Categorical Shortened version of FunctionalGroup
CommonName Categorical This is another identifier for each species, it will likely not be used within the analysis so it could be removed
Duration Categorical This is another category I may use to group quality data based on the growth duration of each species
Status Categorical Each species is categorized as native or invasive
Season Categorical Our observations are grouped based on the date that they were sampled (Spring, Summer and Fall) to observe the changes in nutritional quality
Dates Categorical This keeps track of the day that each observation was sampled
Aspect Categorical Described the direction the hill was facing that each of our transects had been sampled on
Elev Quantitative Describes the elevation that each of the transects was sampled at
Lat/Long Categorical Each of the lat/long pairs plots the beginning, middle and end of each of the transects
DE Quantitative Amount of digestible energy provided by each species
DP Quantitative Amount of digestible protien provided by each species
DE.SD Quantitative Standard deviation of digestible energy
DP.SD Quantitative Standard deviation of digestible protien
Julian Date Categorical The converted date that each of the plots was sampled to julian day
Easting Categorical The converted latitude to UTM
Northing Categorical The converted longitude to UTM
TotalDE Quantitative The amount of total DE available at a transect
TotalDP Quantitative The amount of total DP available at a transect
SuitableBiomass Quantitative The amount of forage available at each transect that meet the assigned energetic demands of a lactating female sheep
AveDE Quantitative The average amount of digestible energy available at a transect
AveDP Quantitative The average amount of digestible protien available at a transect
Month Categorical The month that the plot was sampled

GPS Data

Lastly, I want to see how the sheep movements change each month with the overall biomass of both native and invasive species. I started out by creating a heat map of our overall sheep location use throughout the canyon.

Code
gps <- read.csv("C:/Users/Alexis Means/Documents/Project/Demographics/Code/cleaned.data/sheep_clean_24.csv")

gps <- gps %>% 
  rename(Easting = x_,
         Northing = y_)

# Step 1: Kernel Density Estimation (KDE) for GPS data
gps_kde <- ggplot(gps, aes(x = Easting, y = Northing)) +
  stat_density_2d(aes(fill = after_stat(density)), geom = "raster", contour = FALSE) +
  scale_fill_viridis_c(option = "inferno",
          name = "GPS Use Density",
          breaks = c(2e-09, 10e-09),  # Define specific density values
          labels = c("Low", "High")  # Replace  values with "Low" and "High"
  ) +
  labs(title = "Sheep Area Use")+
  theme_minimal()+
  theme(axis.text = element_blank(),  # Remove tick labels
        axis.ticks = element_blank()) 
  plot(gps_kde)

If the areas of high biomass density align with values of high nutrient quality in theory this means that the density of our sheep points should also follow the trends of high biomass density in the months of April and May. I wanted to see if I could overlay our sheep locations each month with the biomass density data each month to observe my hunch that sheep would follow this trend in these months. They do not appear to follow this trend, most likely due to other outside factors such as selection for escape terrain that outweigh the need to follow vegetation trends. It is also likely that areas of high biomass does not always mean areas of high nutritional quality.

Code
month.sum <- spp.map %>%
  filter(!is.na(Biomass) & Biomass > 0) %>%  # Remove NAs
  mutate(Month = case_when(
    Julian_Day >= 93 & Julian_Day < 124 ~ "April",
    Julian_Day >= 124 & Julian_Day < 155 ~ "May",
    Julian_Day >= 155 & Julian_Day < 185 ~ "June",
    Julian_Day >= 185 & Julian_Day <= 212 ~ "July",
    TRUE ~ NA_character_
  )) %>%
  filter(!is.na(Month)) %>%
  mutate(Month = factor(Month, levels = c("April", "May", "June", "July")))


# Normalize biomass within each month (to make sample sizes comparable)
month.sum <- month.sum %>%
  group_by(Month) %>%
  mutate(Total_Biomass = sum(Biomass),  # Calculate total biomass for each month
         Normalized_Biomass = Biomass / Total_Biomass) %>%  # Normalize biomass
  ungroup()  # Remove grouping

# Extract month from the GPS timestamp
gps <- gps %>%
  mutate(Month = case_when(
    month(t_) == 4 ~ "April",
    month(t_) == 5 ~ "May",
    month(t_) == 6 ~ "June",
    month(t_) == 7 ~ "July",
    TRUE ~ NA_character_
  )) %>%
  filter(!is.na(Month)) %>% 
  mutate(Month = factor(Month, levels = c("April", "May", "June", "July")))

ggplot() + # Biomass density heatmap using kernel density estimation
  stat_density_2d(
    data = month.sum,
    aes(x = Easting, y = Northing, weight = Normalized_Biomass, fill = after_stat(density)),
    geom = "raster",
    contour = FALSE
  ) +
  facet_grid(Month ~ Status) +  # Grid facet by Month (rows) and Status (columns)
  scale_fill_viridis_c(
    name = "Biomass Density", 
    option = "turbo",
    breaks = c(2e-09, 18e-09),  # Define specific density values
    labels = c("Low", "High")  # Replace numerical values with "Low" and "High"
  ) +  
  labs(title = "Normalized Density of Biomass with GPS Locations (Native vs Invasive Species by Month)",
       x = "Easting", 
       y = "Northing") +
  theme_minimal() +
  theme(axis.text = element_blank(),  # Remove tick labels
        axis.ticks = element_blank()) +
  # Overlay GPS points, filtering by Month to match the facet
  geom_point(
    data = gps, 
    aes(x = Easting, y = Northing), 
    color = "red", alpha = 0.1, size = 0.4
  )

Code
# Define the path to the directory containing the file
excelpath <- "C:/Users/Alexis Means/Documents/Project/Nutrition Sampling/R code/FRESH/processed.data/"

# Define the specific file name
filename <- "2024.subset.results.xlsx"

# Construct the full file path
excel.file <- file.path(excelpath, filename)

# Load the workbook and read the data

habitat <- loadWorkbook(excel.file)
suitable <- data.frame(readWorkbook(excel.file, sheet = "Plot-Level-Summary", startRow = 1, colNames = TRUE))


suitable.map <- suitable %>% 
  rename(PlotID = TransectID) %>% 
  left_join(y = plot,
            by = "PlotID")

suitable.map <- suitable.map %>% 
  rename(Lat = BeginLat,
         Long = BeginLong,
         Easting = BeginUTM_Easting,
         Northing = BeginUTM_Northing) %>% 
  select(PlotID, TotalDE, TotalDP, SuitableBiomass, AveDE, AveDP, Julian_Day, Lat, Long, Easting, Northing)
View(suitable.map)

# Create an sf object with the Lat and Long columns
suitable.map_UTM <- st_as_sf(suitable.map, coords = c("Long", "Lat"), crs = 4326)  # EPSG:4326 is WGS84

# Convert to UTM Zone 11N (EPSG:26911)
suitable.map_UTM <- st_transform(suitable.map_UTM, crs = 32611)

# Extract UTM coordinates (Easting and Northing)
suitable.map$Easting <- st_coordinates(suitable.map_UTM)[, 1]
suitable.map$Northing <- st_coordinates(suitable.map_UTM)[, 2]

Suitable Biomass

After completing the FRESH model using a subset of my data, I was able to get estimates of suitable biomass available at each transect that I sampled. I replicated similar graphs to my biomass density for native vs invasive species and the results are fairly similar. Like the biomass plots, there is a higher density in the south end of our study area when looking at the suitable biomass overall for the study area.

Code
suitable.summary <- suitable.map %>%
   filter(!is.na(SuitableBiomass) & SuitableBiomass > 0)  # Remove NAs
  
ggplot(suitable.summary, aes(x = Easting, y = Northing, weight = SuitableBiomass)) +
  stat_density_2d(aes(fill = after_stat(density)), geom = "raster", contour = FALSE) +
  scale_fill_viridis_c(
    name = "Suitable Biomass Density", 
    option = "turbo",
    breaks = c(0.5e-09, 3.5e-09),  # Define specific density values
    labels = c("Low", "High")  # Replace numerical values with "Low" and "High"
  ) +  
  labs(title = "Density of Suitable Biomass",
       x = "Easting", 
       y = "Northing") +
  theme_minimal() +
  theme(axis.text = element_blank(),  # Remove tick labels
    axis.ticks = element_blank())  # Rotate x-axis labels

Code
#|warning: false
#| message: false
suitablemonth.sum <- suitable.map %>%
  filter(!is.na(SuitableBiomass) & SuitableBiomass > 0) %>%
    mutate(Month = case_when(
      Julian_Day >= 93 & Julian_Day < 124 ~ "April",
      Julian_Day >= 124 & Julian_Day < 155 ~ "May",
      Julian_Day >= 155 & Julian_Day < 185 ~ "June",
      Julian_Day >= 185 & Julian_Day <= 212 ~ "July",
      TRUE ~ NA_character_
    )) %>%
  filter(!is.na(Month)) %>%
  mutate(Month = factor(Month, levels = c("April", "May", "June", "July")))

# Normalize biomass within each month (to make sample sizes comparable)
suitablemonth.sum <- suitablemonth.sum %>%
  group_by(Month) %>%
  mutate(Total_SuitableBiomass = sum(SuitableBiomass),  # Calculate total biomass for each month
         Normalized_SuitableBiomass = SuitableBiomass / Total_SuitableBiomass) %>%  # Normalize biomass
  ungroup()  # Remove grouping


# Create density plot faceted by Month and Status
ggplot(suitablemonth.sum, aes(x = Easting, y = Northing, weight = Normalized_SuitableBiomass)) +
  stat_density_2d(aes(fill = after_stat(density)), geom = "raster", contour = FALSE) +
  facet_grid("Month") +  # Grid facet by Month (rows) and Status (columns)
  scale_fill_viridis_c(
    name = "Suitable Biomass Density", 
    option = "turbo",
    breaks = c(1e-09, 9e-09),  # Define specific density values
    labels = c("Low ", "High ")  # Replace numerical values with "Low" and "High"
  ) +  
  labs(title = "Normalized Density of Suitable Biomass",
       x = "Easting", 
       y = "Northing") +
  theme_minimal() +
  theme(axis.text = element_blank(),  # Remove tick labels
        axis.ticks = element_blank())
Warning: The following aesthetics were dropped during statistical transformation:
weight.
ℹ This can happen when ggplot fails to infer the correct grouping structure in
  the data.
ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
  variable into a factor?
The following aesthetics were dropped during statistical transformation:
weight.
ℹ This can happen when ggplot fails to infer the correct grouping structure in
  the data.
ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
  variable into a factor?
The following aesthetics were dropped during statistical transformation:
weight.
ℹ This can happen when ggplot fails to infer the correct grouping structure in
  the data.
ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
  variable into a factor?
The following aesthetics were dropped during statistical transformation:
weight.
ℹ This can happen when ggplot fails to infer the correct grouping structure in
  the data.
ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
  variable into a factor?

When looking at the density of suitable biomass month by month, May still has the highest density of suitable biomass. While this is most likely the month with the largest amount of species in their nutrient dense phase, it is also the month that we sampled the most. So I feel like the sampling efforts may be slightly skewing my data.

In order to truly see these values accurately represented across my study area, I will need to start creating my predictive GAM. This will allow me to project these values of suitable biomass to other 30x30m pixels with similar covariates and predict how vegetative quality changes throughout the canyon in different times of the year.