Quality Measurement Exploration

Creating visualizations to compare levels of DE and DP

I want to create multiple visualizations that allow me to compare the levels of DE and DP for individual species. Specifically I want to see which species have the highest and the lowest levels and if they are actually species that can be consumed by the sheep (not Lupinus or Artemisia spp).I would also like to compare how invasive/ non native species compare to native species. Specifically focusing on grasses.
Author

Alexis Means

Published

July 7, 2025

Introduction

In my final project for BCB 520, I started to dive into the levels of DE and DP supplied by different phenological stages as well as different vegetation communities. Now I would like to make similar observations, but at a species level.

Data Wrangling

This chunk of code loads in my quality measurements for my species with observed biomass measurements.

Code
library(lpSolve)
library(openxlsx)
library(tools)
library(dplyr)
library(sf)
library(tidyverse)

# Define the path to the directory containing the file
excelpath <- "C:/Users/Alexis Means/Documents/Project/Nutrition Sampling/R code/FRESH/processed.data/"

# Define the specific file name
filename <- "FRESH.Subset.xlsx"

# Construct the full file path
excel.file <- file.path(excelpath, filename)

# Load the workbook and read the data
library(openxlsx)
habitat <- loadWorkbook(excel.file)
data <- data.frame(readWorkbook(excel.file, sheet = "FRESH Data", startRow = 1, colNames = TRUE))


plant <-read.csv("C:/Users/Alexis Means/Documents/Project/Nutrition Sampling/R code/Cleaning/processed.data/plant.csv")

data <- data %>% 
  rename(Code = Plant.Code) %>% 
  left_join(y = plant, 
            by = "Code",
            relationship = "many-to-one")

Highest DE and DP

First I wanted to filter the dataset to contain only the top 20 species/ phenological stage pairs with the highest levels of DE and DP. I started by grouping my observatons by their species, phenological stage and part of the plant. If there was multiple measurements for those observations, I calculated the mean for that unique plant combination.

DE vs DP

Code
library(tidyverse)

#make sure that the phenological stages are seen as categorical variables and in the correct order
data$Phenology <- factor(data$Phenology, levels = c("N", "B", "FL", "FR", "M", "C"))

library(dplyr)
library(stringr)
library(plotly)

# Clean up any leading/trailing whitespace and convert to character
data_clean <- data %>%
  mutate(
    Code = str_trim(as.character(Code)),
    Phenology = str_trim(as.character(Phenology)),
    Part = str_trim(as.character(Part))
  )

# Step 1: Calculate means and filter top 20 combos
top_20_combos <- data_clean %>%
  group_by(Code, Phenology, Part, Status) %>%
  summarise(
    mean_DE = mean(DE, na.rm = TRUE),
    mean_DP = mean(DP, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  arrange(desc(mean_DE), desc(mean_DP)) %>%  # Arrange by both
  slice_head(n = 20) %>%                     # Keep only top 20
  mutate(combo_label = paste(Code, Phenology, Part, sep = "_"))

# Step 2: Create interactive scatterplot
p1 <- ggplot(top_20_combos, aes(
  x = mean_DE,
  y = mean_DP,
  color = Status,
  text = paste0(
    "Code: ", Code,
    "<br>Phenology: ", Phenology,
    "<br>Part: ", Part,
    "<br>Status: ", Status,
    "<br>Mean DE: ", round(mean_DE, 2),
    "<br>Mean DP: ", round(mean_DP, 2)
  )
)) +
  geom_point(size = 4) +
  labs(
    title = "Top 20 Species Combinations by Mean DE and DP",
    x = "Mean DE",
    y = "Mean DP",
    color = "Status"
  ) +
  theme_minimal(base_size = 12) +
  xlim(13, 15) +   # You can adjust this as needed
  ylim(6, NA)

ggplotly(p1, tooltip = "text")

Frequency of Observation

Code
# Step 2: Filter the dataset to only include those top 20 combinations
filtered_DE <- data_clean %>%
  semi_join(top_20_combos, by = c("Code", "Phenology", "Part"))

# Count number of observations per Code–Phenology–Part combo
combo_counts <- filtered_DE %>%
  count(Code, Phenology, Part, name = "obs_count") %>%
  left_join(
    data %>%
      select(Code, Phenology, Part, Status) %>%
      distinct(),
    by = c("Code", "Phenology", "Part")
  ) %>%
  mutate(combo_label = paste(Code, Phenology, Part, sep = "_"),
         combo_label = reorder(combo_label, obs_count))

p2 <- ggplot(combo_counts, aes(
  x = combo_label,
  y = obs_count,
  fill = Status,
  text = paste0(
    "Code: ", Code,
    "<br>Phenology: ", Phenology,
    "<br>Part: ", Part,
    "<br>Status: ", Status,
    "<br>Observations: ", obs_count
  )
)) +
  geom_col() +
  coord_flip() +
  labs(
    title = "Observation Frequency of Top 20 Species Combos",
    x = "Code | Phenology | Part",
    y = "Number of Observations",
    fill = "Status"
  ) +
  theme_minimal(base_size = 12)

ggplotly(p2, tooltip = "text")

The first graph shows that Sisymbrium altissimum (SIAL2) when in its newly emergent stage (N) has the highest observed levels of digestible energy at 14.54 (units?) followed by Amsinckia menziesii (AMME) in its newly emergent stage (N) at 14.47 (units) and Equisetum hyemale (EQHY) in its newly emergent stage (N) at 14.31 (units). Most of the species that were observed with the highest DE were in the newly emergent (N) or budding (B) stages which is as to be expected. There was only two flowering species that fell within the top 20 with the highest DE which were (ERCO12) at 13.47 and (LONU2) at 13.26, one fruiting species (TRDU) at 13.26 and two mature species (CREPIS) at 13.56 and (TACA8) at 13.2. I was surprised to see any mature species make it within the top 20 highest digestible energy values. There were only 3 invasive species that within this list as well, they were new Equisetum hyemale (EQHY), and both new and budding Bromus Tectorum (BRTE).

However, the species that we observed with the highest digestible energy are not the ones that we observed most frequently or the species with the highest biomass. The second graph shows that of the top 20 species-phenology combinations, Bromus Tectorum (BRTE) in its newly emergent stage is the species that we observed the most which is a common invasive species which overruns much of the canyon.

Code
# Create a unified dataset with ordered phenological stages
phenos <- data %>%
  mutate(Stage = case_when(
    str_detect(Phenology, "N")  ~ "Newly Emergent",
    str_detect(Phenology, "B")  ~ "Budding",
    str_detect(Phenology, "FL") ~ "Flowering",
    str_detect(Phenology, "M")  ~ "Mature",
    str_detect(Phenology, "C")  ~ "Cured",
    TRUE ~ NA_character_
  )) %>%
  filter(!is.na(Stage)) %>%
  mutate(Stage = factor(Stage, levels = c("Newly Emergent", "Budding", "Flowering", "Mature", "Cured"))) %>%
  group_by(Stage, Code, Phenology, Part, Status) %>%
  summarise(
    mean_DE = mean(DE, na.rm = TRUE),
    mean_DP = mean(DP, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  arrange(Stage, desc(mean_DE), desc(mean_DP)) %>%
  group_by(Stage) %>%
  slice_head(n = 20) %>%
  mutate(combo_label = paste(Code, Phenology, Part, sep = "_"))

pheno_plot <- ggplot(phenos, aes(
  x = mean_DE,
  y = mean_DP,
  color = Status,
  text = paste0(
    "Stage: ", Stage,
    "<br>Code: ", Code,
    "<br>Phenology: ", Phenology,
    "<br>Part: ", Part,
    "<br>Status: ", Status,
    "<br>Mean DE: ", round(mean_DE, 2),
    "<br>Mean DP: ", round(mean_DP, 2)
  )
)) +
  geom_point(size = 3) +
  geom_vline(xintercept = 13.5, linetype = "dashed", color = "red") +
  geom_hline(yintercept = 7.5, linetype = "dashed", color = "red") +
  labs(
    title = "Comparing Species by Phenological Stage",
    x = "Mean DE",
    y = "Mean DP",
    color = "Status"
  ) +
  facet_wrap(~ Stage, scales = "free") +
  scale_x_continuous(
  breaks = function(x) floor(min(x)):ceiling(max(x)),
  labels = scales::number_format(accuracy = 1)
) +
scale_y_continuous(
  breaks = function(x) floor(min(x)):ceiling(max(x)),
  labels = scales::number_format(accuracy = 1)
) +
  theme_minimal(base_size = 12)

ggplotly(pheno_plot, tooltip = "text")
Code
biomass_DE <- data %>% 
  group_by(Code, Phenology, Part) %>% 
  left_join(
    data %>%
      select(Code, Phenology, Part, Status, Biomass) %>%
      distinct(), 
    by = c("Code", "Phenology", "Part")
  ) %>%
  mutate(combo_label = paste(Code, Phenology, Part, sep = "_"))
Warning in left_join(., data %>% select(Code, Phenology, Part, Status, Biomass) %>% : Detected an unexpected many-to-many relationship between `x` and `y`.
ℹ Row 1 of `x` matches multiple rows in `y`.
ℹ Row 1 of `y` matches multiple rows in `x`.
ℹ If a many-to-many relationship is expected, set `relationship =
  "many-to-many"` to silence this warning.

Grass Species

DE vs DP

Code
grass <- data %>%
  filter(Family == "POACEAE") %>% 
  group_by(Code, Phenology, Part, Status) %>%
   summarise(
    mean_DE = mean(DE, na.rm = TRUE),
    mean_DP = mean(DP, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  arrange(desc(mean_DE), desc(mean_DP)) %>%  # Arrange by both
  slice_head(n = 20) %>% 
  mutate(combo_label = paste(Code, Phenology, Part, sep = "_"))


##creating interavtive scatterplot
p5 <- ggplot(grass, aes(
  x = mean_DE,
  y = mean_DP,
  color = Status,
  text = paste0(
    "Code: ", Code,
    "<br>Phenology: ", Phenology,
    "<br>Part: ", Part,
    "<br>Status: ", Status,
    "<br>Mean DE: ", round(mean_DE, 2),
    "<br>Mean DP: ", round(mean_DP, 2)
  )
)) +
  geom_point(size = 4) +
  geom_vline(xintercept = 13.5, linetype = "dashed", color = "red") +
  geom_hline(yintercept = 7.5, linetype = "dashed", color = "red") +
  labs(
    title = "Top 20 Grass Species by Mean DE and DP",
    x = "Mean DE",
    y = "Mean DP",
    color = "Status"
  ) +
  theme_minimal(base_size = 12) +
  xlim(12, 14) +
  ylim(0, NA)

ggplotly(p5, tooltip = "text")

Frequency of Observations

Code
filtered_grass.DE <- data %>%
  semi_join(grass, by = c("Code", "Phenology", "Part"))

# Count number of observations per Code–Phenology–Part combo
grass.DE_counts <- filtered_grass.DE %>%
  count(Code, Phenology, Part, name = "obs_count") %>%
  left_join(
    data %>%
      select(Code, Phenology, Part, Status) %>%
      distinct(),
    by = c("Code", "Phenology", "Part")
  ) %>%
  mutate(combo_label = paste(Code, Phenology, Part, sep = "_"),
         combo_label = reorder(combo_label, obs_count))

p6 <- ggplot(grass.DE_counts, aes(
  x = combo_label,
  y = obs_count,
  fill = Status,
  text = paste0(
    "Code: ", Code,
    "<br>Phenology: ", Phenology,
    "<br>Part: ", Part,
    "<br>Status: ", Status,
    "<br>Observations: ", obs_count
  )
)) +
  geom_col() +
  coord_flip() +
  labs(
    title = "Observation Frequency of Top 20 Grass Species by DE",
    x = "Code | Phenology | Part",
    y = "Number of Observations",
    fill = "Status"
  ) +
  theme_minimal(base_size = 12)

ggplotly(p6, tooltip = "text")

Grass Species Phenological Stages

Code
# Create a unified dataset with ordered phenological stages
grass_combined <- data %>%
  filter(Family == "POACEAE") %>%
  mutate(Stage = case_when(
    str_detect(Phenology, "N")  ~ "Newly Emergent",
    str_detect(Phenology, "B")  ~ "Budding",
    str_detect(Phenology, "FL") ~ "Flowering",
    str_detect(Phenology, "M")  ~ "Mature",
    str_detect(Phenology, "C")  ~ "Cured",
    TRUE ~ NA_character_
  )) %>%
  filter(!is.na(Stage)) %>%
  mutate(Stage = factor(Stage, levels = c("Newly Emergent", "Budding", "Flowering", "Mature", "Cured"))) %>%
  group_by(Stage, Code, Phenology, Part, Status) %>%
  summarise(
    mean_DE = mean(DE, na.rm = TRUE),
    mean_DP = mean(DP, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  arrange(Stage, desc(mean_DE), desc(mean_DP)) %>%
  group_by(Stage) %>%
  slice_head(n = 20) %>%
  mutate(combo_label = paste(Code, Phenology, Part, sep = "_"))

p_combined <- ggplot(grass_combined, aes(
  x = mean_DE,
  y = mean_DP,
  color = Status,
  text = paste0(
    "Stage: ", Stage,
    "<br>Code: ", Code,
    "<br>Phenology: ", Phenology,
    "<br>Part: ", Part,
    "<br>Status: ", Status,
    "<br>Mean DE: ", round(mean_DE, 2),
    "<br>Mean DP: ", round(mean_DP, 2)
  )
)) +
  geom_point(size = 3) +
  geom_vline(xintercept = 13.5, linetype = "dashed", color = "red") +
  geom_hline(yintercept = 7.5, linetype = "dashed", color = "red") +
  labs(
    title = "Comparing Grass Species by Phenological Stage",
    x = "Mean DE",
    y = "Mean DP",
    color = "Status"
  ) +
  facet_wrap(~ Stage, scales = "free") +
  scale_x_continuous(
  breaks = function(x) floor(min(x)):ceiling(max(x)),
  labels = scales::number_format(accuracy = 1)
) +
scale_y_continuous(
  breaks = function(x) floor(min(x)):ceiling(max(x)),
  labels = scales::number_format(accuracy = 1)
) +
  theme_minimal(base_size = 12)

ggplotly(p_combined, tooltip = "text")