Quality Measurement Exploration

Creating visualizations to compare levels of DE and DP

I want to create multiple visualizations that allow me to compare the levels of DE and DP for individual species. Specifically I want to see which species have the highest and the lowest levels and if they are actually species that can be consumed by the sheep (not Lupinus or Artemisia spp).I would also like to compare how invasive/ non native species compare to native species. Specifically focusing on grasses.
Author

Alexis Means

Published

July 7, 2025

Introduction

In my final project for BCB 520, I started to dive into the levels of DE and DP supplied by different phenological stages as well as different vegetation communities. Now I would like to make similar observations, but at a species level.

Data Wrangling

This chunk of code loads in my quality measurements for my species with observed biomass measurements.

Code
library(lpSolve)
library(openxlsx)
library(tools)
library(dplyr)
library(sf)
library(tidyverse)

# Define the path to the directory containing the file
excelpath <- "C:/Users/Alexis Means/Documents/Project/Nutrition Sampling/R code/FRESH/processed.data/"

# Define the specific file name
filename <- "FRESH.Subset.xlsx"

# Construct the full file path
excel.file <- file.path(excelpath, filename)

# Load the workbook and read the data
library(openxlsx)
habitat <- loadWorkbook(excel.file)
data <- data.frame(readWorkbook(excel.file, sheet = "FRESH Data", startRow = 1, colNames = TRUE))


plant <-read.csv("C:/Users/Alexis Means/Documents/Project/Nutrition Sampling/R code/Cleaning/processed.data/plant.csv")

data <- data %>% 
  rename(Code = Plant.Code) %>% 
  left_join(y = plant, 
            by = "Code",
            relationship = "many-to-one")

Highest DE and DP

First I wanted to filter the dataset to contain only the top 20 species/ phenological stage pairs with the highest levels of DE and DP. I started by grouping my observatons by their species, phenological stage and part of the plant. If there was multiple measurements for those observations, I calculated the mean for that unique plant combination.

Digestible Energy

Code
library(tidyverse)

#make sure that the phenological stages are seen as categorical variables and in the correct order
data$Phenology <- factor(data$Phenology, levels = c("N", "B", "FL", "FR", "M", "C"))

library(dplyr)
library(stringr)
library(plotly)

# Clean up any leading/trailing whitespace and convert to character
data_clean <- data %>%
  mutate(
    Code = str_trim(as.character(Code)),
    Phenology = str_trim(as.character(Phenology)),
    Part = str_trim(as.character(Part))
  )

# Step 1: Calculate mean DE and organize data
top_20_combos <- data_clean %>%
  group_by(Code, Phenology, Part) %>%
  summarise(mean_DE = mean(DE, na.rm = TRUE), .groups = "drop") %>%
  arrange(desc(mean_DE)) %>%
  slice_head(n = 20) %>% 
  mutate(combo_label = paste(Code, Phenology, Part, sep = "_"))%>%
  mutate(combo_label = reorder(combo_label, -mean_DE))

top_20_combos <- top_20_combos %>%
  left_join(
    data %>%
      select(Code, Phenology, Part, Status) %>%
      distinct(), 
    by = c("Code", "Phenology", "Part")
  ) %>%
  mutate(combo_label = paste(Code, Phenology, Part, sep = "_"),
         combo_label = reorder(combo_label, -mean_DE))


##creating interavtive scatterplot
p1 <- ggplot(top_20_combos, aes(
  x = combo_label,
  y = mean_DE,
  color = Status,
  text = paste0(
    "Code: ", Code,
    "<br>Phenology: ", Phenology,
    "<br>Part: ", Part,
    "<br>Status: ", Status,
    "<br>Mean DE: ", round(mean_DE, 2)
  )
)) +
  geom_point(size = 4) +
  labs(
    title = "Top 20 Species Combinations by Mean DE",
    x = "Code | Phenology | Part",
    y = "Mean DE",
    color = "Status"
  ) +
  theme_minimal(base_size = 12) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

ggplotly(p1, tooltip = "text")
Code
# Step 2: Filter the dataset to only include those top 20 combinations
filtered_DE <- data_clean %>%
  semi_join(top_20_combos, by = c("Code", "Phenology", "Part"))

# Count number of observations per Code–Phenology–Part combo
combo_counts <- filtered_DE %>%
  count(Code, Phenology, Part, name = "obs_count") %>%
  left_join(
    data %>%
      select(Code, Phenology, Part, Status) %>%
      distinct(),
    by = c("Code", "Phenology", "Part")
  ) %>%
  mutate(combo_label = paste(Code, Phenology, Part, sep = "_"),
         combo_label = reorder(combo_label, obs_count))

p2 <- ggplot(combo_counts, aes(
  x = combo_label,
  y = obs_count,
  fill = Status,
  text = paste0(
    "Code: ", Code,
    "<br>Phenology: ", Phenology,
    "<br>Part: ", Part,
    "<br>Status: ", Status,
    "<br>Observations: ", obs_count
  )
)) +
  geom_col() +
  coord_flip() +
  labs(
    title = "Observation Frequency of Top 20 Species Combos",
    x = "Code | Phenology | Part",
    y = "Number of Observations",
    fill = "Status"
  ) +
  theme_minimal(base_size = 12)

ggplotly(p2, tooltip = "text")

The first graph shows that Sisymbrium altissimum (SIAL2) when in its newly emergent stage (N) has the highest observed levels of digestible energy at 14.54 (units?) followed by Amsinckia menziesii (AMME) in its newly emergent stage (N) at 14.47 (units) and Equisetum hyemale (EQHY) in its newly emergent stage (N) at 14.31 (units). Most of the species that were observed with the highest DE were in the newly emergent (N) or budding (B) stages which is as to be expected. There was only two flowering species that fell within the top 20 with the highest DE which were (ERCO12) at 13.47 and (LONU2) at 13.26, one fruiting species (TRDU) at 13.26 and two mature species (CREPIS) at 13.56 and (TACA8) at 13.2. I was surprised to see any mature species make it within the top 20 highest digestible energy values. There were only 3 invasive species that within this list as well, they were new Equisetum hyemale (EQHY), and both new and budding Bromus Tectorum (BRTE).

However, the species that we observed with the highest digestible energy are not the ones that we observed most frequently or the species with the highest biomass. The second graph shows that of the top 20 species-phenology combinations, Bromus Tectorum (BRTE) in its newly emergent stage is the species that we observed the most which is a common invasive species which overruns much of the canyon.

Code
biomass_DE <- data %>% 
  group_by(Code, Phenology, Part) %>% 
  left_join(
    data %>%
      select(Code, Phenology, Part, Status, Biomass) %>%
      distinct(), 
    by = c("Code", "Phenology", "Part")
  ) %>%
  mutate(combo_label = paste(Code, Phenology, Part, sep = "_"))
Warning in left_join(., data %>% select(Code, Phenology, Part, Status, Biomass) %>% : Detected an unexpected many-to-many relationship between `x` and `y`.
ℹ Row 1 of `x` matches multiple rows in `y`.
ℹ Row 1 of `y` matches multiple rows in `x`.
ℹ If a many-to-many relationship is expected, set `relationship =
  "many-to-many"` to silence this warning.

Digestible Protein

Code
# Step 1: Calculate mean DE and organize data
top_20_DP <- data_clean %>%
  group_by(Code, Phenology, Part) %>%
  summarise(mean_DP = mean(DP, na.rm = TRUE), .groups = "drop") %>%
  arrange(desc(mean_DP)) %>%
  slice_head(n = 20) %>% 
  mutate(combo_label = paste(Code, Phenology, Part, sep = "_"))%>%
  mutate(combo_label = reorder(combo_label, -mean_DP))

top_20_DP <- top_20_DP %>%
  left_join(
    data %>%
      select(Code, Phenology, Part, Status) %>%
      distinct(), 
    by = c("Code", "Phenology", "Part")
  ) %>%
  mutate(combo_label = paste(Code, Phenology, Part, sep = "_"),
         combo_label = reorder(combo_label, -mean_DP))


##creating interavtive scatterplot
p3 <- ggplot(top_20_DP, aes(
  x = combo_label,
  y = mean_DP,
  color = Status,
  text = paste0(
    "Code: ", Code,
    "<br>Phenology: ", Phenology,
    "<br>Part: ", Part,
    "<br>Status: ", Status,
    "<br>Mean DE: ", round(mean_DP, 2)
  )
)) +
  geom_point(size = 4) +
  labs(
    title = "Top 20 Species Combinations by Mean DP",
    x = "Code | Phenology | Part",
    y = "Mean DP",
    color = "Status"
  ) +
  theme_minimal(base_size = 12) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

ggplotly(p3, tooltip = "text")
Code
# Step 2: Filter the dataset to only include those top 20 combinations
filtered_DP <- data_clean %>%
  semi_join(top_20_DP, by = c("Code", "Phenology", "Part"))

# Count number of observations per Code–Phenology–Part combo
DP_counts <- filtered_DP %>%
  count(Code, Phenology, Part, name = "obs_count") %>%
  left_join(
    data %>%
      select(Code, Phenology, Part, Status) %>%
      distinct(),
    by = c("Code", "Phenology", "Part")
  ) %>%
  mutate(combo_label = paste(Code, Phenology, Part, sep = "_"),
         combo_label = reorder(combo_label, obs_count))

p4 <- ggplot(DP_counts, aes(
  x = combo_label,
  y = obs_count,
  fill = Status,
  text = paste0(
    "Code: ", Code,
    "<br>Phenology: ", Phenology,
    "<br>Part: ", Part,
    "<br>Status: ", Status,
    "<br>Observations: ", obs_count
  )
)) +
  geom_col() +
  coord_flip() +
  labs(
    title = "Observation Frequency of Top 20 Species Combos",
    x = "Code | Phenology | Part",
    y = "Number of Observations",
    fill = "Status"
  ) +
  theme_minimal(base_size = 12)

ggplotly(p4, tooltip = "text")

Grass Species

Digestible Energy

Code
grass.DE <- data %>%
  filter(Family == "POACEAE") %>% 
  group_by(Code, Phenology, Part) %>%
  summarise(mean_DE = mean(DE, na.rm = TRUE), .groups = "drop") %>%
  arrange(desc(mean_DE)) %>%
  slice_head(n = 20) %>% 
  left_join(
    data %>%
      select(Code, Phenology, Part, Status) %>%
      distinct(), 
    by = c("Code", "Phenology", "Part")
  ) %>%
  mutate(combo_label = paste(Code, Phenology, Part, sep = "_"),
         combo_label = reorder(combo_label, -mean_DE))

##creating interavtive scatterplot
p5 <- ggplot(grass.DE, aes(
  x = combo_label,
  y = mean_DE,
  color = Status,
  text = paste0(
    "Code: ", Code,
    "<br>Phenology: ", Phenology,
    "<br>Part: ", Part,
    "<br>Status: ", Status,
    "<br>Mean DE: ", round(mean_DE, 2)
  )
)) +
  geom_point(size = 4) +
  labs(
    title = "Top 20 Grass Species by Mean DE",
    x = "Code | Phenology | Part",
    y = "Mean DE",
    color = "Status"
  ) +
  theme_minimal(base_size = 12) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

ggplotly(p5, tooltip = "text")
Code
filtered_grass.DE <- data %>%
  semi_join(grass.DE, by = c("Code", "Phenology", "Part"))

# Count number of observations per Code–Phenology–Part combo
grass.DE_counts <- filtered_grass.DE %>%
  count(Code, Phenology, Part, name = "obs_count") %>%
  left_join(
    data %>%
      select(Code, Phenology, Part, Status) %>%
      distinct(),
    by = c("Code", "Phenology", "Part")
  ) %>%
  mutate(combo_label = paste(Code, Phenology, Part, sep = "_"),
         combo_label = reorder(combo_label, obs_count))

p6 <- ggplot(grass.DE_counts, aes(
  x = combo_label,
  y = obs_count,
  fill = Status,
  text = paste0(
    "Code: ", Code,
    "<br>Phenology: ", Phenology,
    "<br>Part: ", Part,
    "<br>Status: ", Status,
    "<br>Observations: ", obs_count
  )
)) +
  geom_col() +
  coord_flip() +
  labs(
    title = "Observation Frequency of Top 20 Grass Species by DE",
    x = "Code | Phenology | Part",
    y = "Number of Observations",
    fill = "Status"
  ) +
  theme_minimal(base_size = 12)

ggplotly(p6, tooltip = "text")

Digestible Protein

Code
grass.DP <- data %>%
  filter(Family == "POACEAE") %>% 
  group_by(Code, Phenology, Part) %>%
  summarise(mean_DP = mean(DP, na.rm = TRUE), .groups = "drop") %>%
  arrange(desc(mean_DP)) %>%
  slice_head(n = 20) %>% 
  left_join(
    data %>%
      select(Code, Phenology, Part, Status) %>%
      distinct(), 
    by = c("Code", "Phenology", "Part")
  ) %>%
  mutate(combo_label = paste(Code, Phenology, Part, sep = "_"),
         combo_label = reorder(combo_label, -mean_DP))

##creating interavtive scatterplot
p7 <- ggplot(grass.DP, aes(
  x = combo_label,
  y = mean_DP,
  color = Status,
  text = paste0(
    "Code: ", Code,
    "<br>Phenology: ", Phenology,
    "<br>Part: ", Part,
    "<br>Status: ", Status,
    "<br>Mean DP: ", round(mean_DP, 2)
  )
)) +
  geom_point(size = 4) +
  labs(
    title = "Top 20 Grass Species by Mean DP",
    x = "Code | Phenology | Part",
    y = "Mean DP",
    color = "Status"
  ) +
  theme_minimal(base_size = 12) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

ggplotly(p7, tooltip = "text")
Code
filtered_grass.DP <- data %>%
  semi_join(grass.DP, by = c("Code", "Phenology", "Part"))

# Count number of observations per Code–Phenology–Part combo
grass.DP_counts <- filtered_grass.DP %>%
  count(Code, Phenology, Part, name = "obs_count") %>%
  left_join(
    data %>%
      select(Code, Phenology, Part, Status) %>%
      distinct(),
    by = c("Code", "Phenology", "Part")
  ) %>%
  mutate(combo_label = paste(Code, Phenology, Part, sep = "_"),
         combo_label = reorder(combo_label, obs_count))

p8 <- ggplot(grass.DP_counts, aes(
  x = combo_label,
  y = obs_count,
  fill = Status,
  text = paste0(
    "Code: ", Code,
    "<br>Phenology: ", Phenology,
    "<br>Part: ", Part,
    "<br>Status: ", Status,
    "<br>Observations: ", obs_count
  )
)) +
  geom_col() +
  coord_flip() +
  labs(
    title = "Observation Frequency of Top 20 Grass Species by DP",
    x = "Code | Phenology | Part",
    y = "Number of Observations",
    fill = "Status"
  ) +
  theme_minimal(base_size = 12)

ggplotly(p8, tooltip = "text")