St. Lawrence Lowlands Precipitation Data: 30-Year Trends & Anomalies

By Johanie Fournier, agr., M.Sc. in rstats tidymodels tidytuesday eda viz

January 27, 2025


Precipitation plays a crucial role in Quebec’s climate, influencing everything from agriculture to hydrology and urban planning. Understanding long-term rainfall patterns is essential for assessing climate variability, detecting anomalies, and making informed environmental decisions.

In this blog post, we explore 30 years of precipitation data from the AgERA5 dataset, a high-resolution global reanalysis dataset widely used for climate studies. Using Exploratory Data Analysis (EDA) techniques, we investigate rainfall trends, seasonal variations, and precipitation anomalies across St. Lawrence Lowlands.

By the end of this analysis, you’ll gain insights into how precipitation patterns have evolved over tree decades and what this means for Quebec’s climate. Whether you’re a data scientist, climate researcher, or just curious about our weather trends, this post will provide valuable insights using reproducible R-based data analysis techniques.

Goal

The primary objective of this analysis is to explore 30 years of precipitation data from the AgERA5 dataset for Quebec using Exploratory Data Analysis (EDA). Specifically, we aim to:

  • Analyze long-term precipitation trends to understand rainfall variability across different regions.
  • Detect anomalies in precipitation patterns to identify periods of unusually high or low rainfall.
  • Visualize yearly precipitation patterns to highlight trends and deviations over time.
  • Provide data-driven insights into how precipitation has evolved and its potential implications for climate research, agriculture, and water resource management.

This analysis will help uncover climate patterns, extreme weather events, and precipitation shifts over the past tree decades in for the agricultural land of Quebec

Get the data

Country borders

We need the polygon of the region of interest. We will use the rgeoboundaries package to extract the polygon of Quebec.

qc_sf <- rgeoboundaries::gb_adm2(country = "CAN") |>
  filter(shapeName %in% c("Bas-Saint-Laurent", 
                          "Gaspésie--Îles-de-la-Madelei", 
                          "Capitale-Nationale",
                          "Chaudière-Appalaches",
                          "Estrie",
                          "Centre-du-Québec",
                          "Montérégie",
                          "Montréal",
                          "Laval",
                          "Outaouais",
                          "Abitibi-Témiscamingue",
                          "Lanaudière",
                          "Laurentides",
                          "Mauricie")) |> 
  select(shapeName, geometry) 
qc_sf #geographic coordinate
Simple feature collection with 14 features and 1 field
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -79.58688 ymin: 44.99114 xmax: -61.14201 ymax: 49.25699
Geodetic CRS:  WGS84
First 10 features:
                      shapeName                       geometry
1  Gaspésie--Îles-de-la-Madelei MULTIPOLYGON (((-66.32593 4...
2             Bas-Saint-Laurent MULTIPOLYGON (((-67.59801 4...
3            Capitale-Nationale MULTIPOLYGON (((-69.70949 4...
4          Chaudière-Appalaches MULTIPOLYGON (((-70.09711 4...
5                        Estrie MULTIPOLYGON (((-71.46412 4...
6              Centre-du-Québec MULTIPOLYGON (((-72.03755 4...
7                    Montérégie MULTIPOLYGON (((-72.3144 45...
8                      Montréal MULTIPOLYGON (((-73.47668 4...
9                         Laval MULTIPOLYGON (((-73.53145 4...
10                   Lanaudière MULTIPOLYGON (((-73.03963 4...
plot(qc_sf$geometry)

Precipitation data

We will extract precipitation data from the AgERA5 dataset using the KrigR package. The AgERA5 dataset provides high-resolution climate data, including precipitation, temperature, and wind speed, for global climate research.

# Load the KrigR package
api_user <- "*******************************" # PLEASE INSERT YOUR USER NUMBER
api_key <- "********************************" # PLEASE INSERT YOUR API TOKEN

# List of available dataset
KrigR::Meta.List()

# List of available variables
vars_df <- KrigR::Meta.Variables(
    "reanalysis-era5-land-monthly-means"
)

# Dataset description
KrigR::Meta.QuickFacts(
    "reanalysis-era5-land-monthly-means"
)
#extract precipitation data
start_date <- "1993-01-01 00:00"
end_date <- "2023-12-31 24:00"

precipitation_raw <- KrigR::CDownloadS(
    Type = "monthly_averaged_reanalysis",
    Variable = "total_precipitation",
    DataSet = "reanalysis-era5-land-monthly-means",
    DateStart = start_date,
    DateStop = end_date,
    TZone = "CET",
    FUN = "mean",
    TResolution = "month",
    TStep = 1,
    Dir = Dir.Data,
    FileName = "precipitation_raw",
    Extent = as(qc_sf, "Spatial"),
    API_User = api_user,
    API_Key = api_key,
    closeConnections = TRUE
)

Data preperation

We will convert the raster data to a dataframe and extract the precipitation values for the region of interest.

# Change layer names
months_vector <- seq(
    from = as.Date(start_date),
    to = as.Date(end_date),
    by = "month"
)
names(precipitation_raw) <- months_vector

# Raster to dataframe
precipitation_sf <- as.data.frame(
    precipitation_raw,
    xy = TRUE, na.rm = TRUE)|>
    tidyr::pivot_longer(
        !c(x, y),
        names_to = "date",
        values_to = "value"
    ) |> 
  mutate(year=year(date), 
         month=month(date)) |> 
  select(x, y, date, year, month, value) |> 
  st_as_sf(coords=c("x", "y")) |> 
  st_set_crs("WGS84") |> 
  st_intersection(qc_sf) 

 precipitation_dt<-precipitation_sf |> 
  as_tibble() |> 
  select(-geometry) |> 
  group_by(shapeName, date, year, month) |> 
  summarise(mean=mean(value, na.rm=TRUE)) |> 
  ungroup()

General trend

Let’s start by exploring the precipitation data to understand its distribution and general trends.

skimr::skim(precipitation_dt)
Name precipitation_dt
Number of rows 5208
Number of columns 5
_______________________
Column type frequency:
character 2
numeric 3
________________________
Group variables None

Data summary

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
shapeName 0 1 5 28 0 14 0
date 0 1 10 10 0 372 0

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
year 0 1 2008.00 8.95 1993.00 2000.00 2008.00 2016.00 2023.00 ▇▇▇▇▇
month 0 1 6.50 3.45 1.00 3.75 6.50 9.25 12.00 ▇▅▅▅▇
mean 0 1 277.59 11.07 249.07 267.67 278.09 288.27 297.28 ▁▇▆▆▇

Trend over time

Is there a general trend over time? Let’s find out!

precipitation_dt_year<-precipitation_dt |> 
  group_by(year) |>  
  summarise(sum=sum(mean)) |>  
  ungroup()

ggplot(data=precipitation_dt_year, aes(x=year, y=sum))+
  geom_line()

Precipitation has increased over time.

Space trend

Is there a general trend over space? Let’s find out!

precipitation_dt_site<-precipitation_dt  |> 
  group_by(shapeName) |>  
  summarise(sum=sum(mean)-102494.3) |>  
  ungroup() 

ggplot(data=precipitation_dt_site, 
       aes(x=reorder(shapeName, -sum), y=sum))+
  geom_bar(stat="identity")+
  coord_flip()+
  theme(axis.title.y = element_blank()) +
  ylab("Precipitation for 30 years (m) over Capitale-Nationale")

The total precipitation for 30 years is different for each shapeName.

Spatio-temporal trend

Can we link the spatial trend to the temporal trend? Let’s find out!

precipitation_dt |>
  group_by(shapeName, year) |> 
  summarize(value_year=sum(mean)) |> 
  mutate(year_date=as.Date(as.character(year), "%Y")) |>
  ungroup() |> 
  group_by(shapeName) |>
    plot_time_series(
        .date_var    = year_date,
        .value       = value_year,
        .interactive = FALSE,
        .facet_ncol  = 4,
        .facet_scales = "free",
    )

The trend looks similar for all the shapeName but the values are different.

Anomalies and outliers

Are there any anomalies or outliers in the precipitation data? Let’s investigate!

Time serie anomalies

What are the yearly precipitation anomalies?

library(anomalize)

precipitation_dt |> 
  group_by(shapeName, year) |> 
  summarize(value_year=sum(mean)) |> 
  mutate(year_date=as.Date(as.character(year), "%Y")) |>
  select(-year) |> 
  ungroup() |> 
  filter(shapeName %in% "Chaudière-Appalaches") |>
  time_decompose(value_year) |> 
  anomalize(remainder) |>
  plot_anomaly_decomposition()

Weather anomalies

What are monthly precipitation anomalies?

Reference group

# estimating anomalies
ref <- precipitation_dt |>
  group_by(shapeName, month) |>
  summarise(ref = mean(mean))

monthly_anomalies <- precipitation_dt |> 
  left_join(ref, by = c("shapeName", "month")) |> 
  mutate(anomalie = (mean * 100 / ref) - 100,
  sign = ifelse(anomalie > 0, "pos", "neg") |> factor(c("pos", "neg")),
  date=as.Date(date),
  month_name_abb = month(date, label = TRUE))

Statistical Metrics

data_norm <- group_by(monthly_anomalies, month_name_abb) |>
                summarise(
                  mx = max(anomalie),
                  min = min(anomalie),
                  q25 = stats::quantile(anomalie, .25),
                  q75 = stats::quantile(anomalie, .75),
                  iqr = q75 - q25
                )
DT::datatable(data_norm) |> 
  DT::formatRound(c("mx","min","q25","q75","iqr"), digits=1)

Create the graph

library(ggthemes)
library(gganimate)

gg <- ggplot(data_norm ) +
  geom_crossbar(aes(x = month_name_abb, 
                    y = 0, 
                    ymin = min, 
                    ymax = mx),
    fatten = 0, fill = "grey90", colour = "NA") + 
  geom_crossbar(aes(x = month_name_abb, 
                    y = 0, 
                    ymin = q25, 
                    ymax = q75),
  fatten = 0, fill = "grey70"
)  +
  geom_crossbar(
  data = filter(monthly_anomalies, shapeName=="Chaudière-Appalaches"),
  aes(x = month_name_abb, 
      y = 0, 
      ymin = 0, 
      ymax = anomalie, 
      group= year,
      fill = sign),
  fatten = 0, width = 0.7, alpha = .7, colour = "NA",
  show.legend = FALSE
) + 
  transition_time(as.integer(year)) +
  ggtitle('Precipitation anomaly in Chaudière-Appalaches {frame_time}') +
  shadow_mark(past=FALSE) +
  geom_hline(yintercept = 0) +
  scale_fill_manual(values = c("#99000d", "#034e7b")) +
  scale_y_continuous("Precipitation anomaly (%)",
    breaks = seq(-5, 5, 1)
  ) +
  labs(
    x = "",
    caption = "Data: AgERA5"
  ) +
  theme_hc()
num_years <- max(monthly_anomalies$year) - min(monthly_anomalies$year) + 1

# Save the animation as a GIF
gganimate::animate(gg, duration = 30, fps = 4, width = 500, height = 300, renderer = gifski_renderer())
anim_save("gif/output.gif")
# Read and display the saved GIF animation
animation <- magick::image_read("gif/output.gif")
print(animation, info = FALSE)

This animation shows the monthly precipitation anomalies in Chaudière-Appalaches over the past 30 years. The blue bars represent positive anomalies, while the red bars represent negative anomalies.

Conclusion

In this analysis, we explored 30 years of precipitation data from the AgERA5 dataset for Quebec using Exploratory Data Analysis (EDA) techniques. By analyzing long-term precipitation trends, seasonal variations, and anomalies, we uncovered valuable insights into how rainfall patterns have evolved over the past tree decades.


WANT MORE?

Sign up for exclusive content, emails & things I doesn't share anywhere else.

We respect your email privacy

Powered by AWeber Email Marketing

 

Session Info

sessionInfo()
R version 4.4.2 (2024-10-31)
Platform: aarch64-apple-darwin20
Running under: macOS Sequoia 15.2

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: America/Toronto
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base     

other attached packages:
 [1] anomalize_0.3.0      jofou.lib_0.0.0.9000 reticulate_1.40.0   
 [4] tidytuesdayR_1.1.2   tictoc_1.2.1         rgeoboundaries_1.3.1
 [7] terra_1.8-10         sf_1.0-19            pins_1.4.0          
[10] fs_1.6.5             timetk_2.9.0         yardstick_1.3.2     
[13] workflowsets_1.1.0   workflows_1.1.4      tune_1.2.1          
[16] rsample_1.2.1        parsnip_1.2.1        modeldata_1.4.0     
[19] infer_1.0.7          dials_1.3.0          scales_1.3.0        
[22] broom_1.0.7          tidymodels_1.2.0     recipes_1.1.0       
[25] doFuture_1.0.1       future_1.34.0        foreach_1.5.2       
[28] skimr_2.1.5          gganimate_1.0.9      forcats_1.0.0       
[31] stringr_1.5.1        dplyr_1.1.4          purrr_1.0.2         
[34] readr_2.1.5          tidyr_1.3.1          tibble_3.2.1        
[37] ggplot2_3.5.1        tidyverse_2.0.0      lubridate_1.9.4     
[40] kableExtra_1.4.0     inspectdf_0.0.12.1   openxlsx_4.2.7.1    
[43] knitr_1.49          

loaded via a namespace (and not attached):
  [1] rstudioapi_0.17.1   jsonlite_1.8.9      magrittr_2.0.3     
  [4] magick_2.8.5        farver_2.1.2        rmarkdown_2.29     
  [7] vctrs_0.6.5         memoise_2.0.1       hoardr_0.5.5       
 [10] base64enc_0.1-3     htmltools_0.5.8.1   progress_1.2.3     
 [13] curl_6.1.0          TTR_0.24.4          sass_0.4.9         
 [16] parallelly_1.41.0   bslib_0.8.0         KernSmooth_2.23-26 
 [19] htmlwidgets_1.6.4   zoo_1.8-12          cachem_1.1.0       
 [22] ggfittext_0.10.2    mime_0.12           lifecycle_1.0.4    
 [25] iterators_1.0.14    pkgconfig_2.0.3     Matrix_1.7-2       
 [28] R6_2.5.1            fastmap_1.2.0       digest_0.6.37      
 [31] colorspace_2.1-1    furrr_0.3.1         crosstalk_1.2.1    
 [34] labeling_0.4.3      urltools_1.7.3      timechange_0.3.0   
 [37] compiler_4.4.2      proxy_0.4-27        withr_3.0.2        
 [40] tseries_0.10-58     backports_1.5.0     DBI_1.2.3          
 [43] MASS_7.3-64         lava_1.8.1          rappdirs_0.3.3     
 [46] classInt_0.4-11     tibbletime_0.1.9    tools_4.4.2        
 [49] units_0.8-5         lmtest_0.9-40       quantmod_0.4.26    
 [52] zip_2.3.1           future.apply_1.11.3 nnet_7.3-20        
 [55] quadprog_1.5-8      glue_1.8.0          nlme_3.1-166       
 [58] grid_4.4.2          generics_0.1.3      gtable_0.3.6       
 [61] countrycode_1.6.0   tzdb_0.4.0          class_7.3-23       
 [64] data.table_1.16.4   hms_1.1.3           xml2_1.3.6         
 [67] pillar_1.10.1       splines_4.4.2       lhs_1.2.0          
 [70] tweenr_2.0.3        lattice_0.22-6      renv_1.0.7         
 [73] survival_3.8-3      tidyselect_1.2.1    urca_1.3-4         
 [76] svglite_2.1.3       forecast_8.23.0     crul_1.5.0         
 [79] xfun_0.50           hardhat_1.4.0       timeDate_4041.110  
 [82] DT_0.33             stringi_1.8.4       DiceDesign_1.10    
 [85] yaml_2.3.10         evaluate_1.0.3      codetools_0.2-20   
 [88] httpcode_0.3.0      cli_3.6.3           rpart_4.1.24       
 [91] systemfonts_1.2.1   jquerylib_0.1.4     repr_1.1.7         
 [94] munsell_0.5.1       Rcpp_1.0.14         globals_0.16.3     
 [97] triebeard_0.4.1     png_0.1-8           parallel_4.4.2     
[100] fracdiff_1.5-3      assertthat_0.2.1    gower_1.0.2        
[103] prettyunits_1.2.0   sweep_0.2.5         GPfit_1.0-8        
[106] listenv_0.9.1       viridisLite_0.4.2   ipred_0.9-15       
[109] xts_0.14.1          prodlim_2024.06.25  e1071_1.7-16       
[112] crayon_1.5.3        rlang_1.1.5        
Posted on:
January 27, 2025
Length:
11 minute read, 2146 words
Categories:
rstats tidymodels tidytuesday eda viz
Tags:
eda rstats tidymodels tidytuesday viz
See Also:
30 Years of Precipitation for Centre-du-Québec: Trends, Patterns & Anomalies
St. Lawrence Lowlands Precipitation Data: 30-Year Trends Prediction
TyT2024W21 - VIZ:Carbon Majors Emissions Data