Skip to contents

Background

This document serves as a users guide and a tutorial for the FLARE (Forecasting Lake and Reservoir Ecosystems) system (Thomas et al. 2020). FLARE generates forecasts and forecast uncertainty of water temperature and water quality for 1 to 35-dau ahead time horizon at multiple depths of a lake or reservoir. It uses data assimilation to update the initial starting point for a forecast and the model parameters based a real-time statistical comparisons to observations. It has been developed, tested, and evaluated for Falling Creek Reservoir in Vinton,VA (Thomas et al. 2020) and National Ecological Observatory Network lakes (Thomas et al. 2023

FLARE is a set of R scripts that

  • Generating the inputs and configuration files required by the General Lake Model (GLM)
  • Applying data assimilation to GLM
  • Processing and archiving forecast output
  • Visualizing forecast output

FLARE uses the 1-D General Lake Model (Hipsey et al. 2019) as the mechanistic process model that predicts hydrodynamics of the lake or reservoir. For forecasts of water quality, it uses GLM with the Aquatic Ecosystem Dynamics library. The binaries for GLM and GLM-AED are included in the FLARE code that is available on GitHub. FLARE requires GLM version 3.3 or higher.

More information about the GLM can be found here:

FLARE development has been supported by grants from National Science Foundation (CNS-1737424, DEB-1753639, EF-1702506, DBI-1933016, DEB-1926050)

Requirements

  • RStudio
  • FLAREr R package
  • FLAREr dependencies

1: Set up

First, install the FLAREr package from GitHub. There will be other required packages that will also be downloaded.

remotes::install_github("flare-forecast/FLAREr")

Second, create a directory that will be your working directory for your FLARE run

lake_directory <-  tempdir()
dir.create(file.path(lake_directory, "configuration/default"), recursive = TRUE)
dir.create(file.path(lake_directory, "targets")) # For QAQC data
dir.create(file.path(lake_directory, "drivers")) # Weather and inflow forecasts

2: Configuration files

First, FLAREr requires two configuration yaml files. The code below copies examples from the FLAREr package.

file.copy(system.file("extdata", "configuration", "default", "configure_flare.yml", package = "FLAREr"), file.path(lake_directory, "configuration", "default", "configure_flare.yml"))
#> [1] TRUE
file.copy(system.file("extdata", "configuration", "default", "configure_run.yml", package = "FLAREr"), file.path(lake_directory, "configuration", "default", "configure_run.yml"))
#> [1] TRUE

Second, FLAREr requires a set of configuration CSV files. The CSV files are used to define the states that are simulated and the parameters that are calibrated. The code below copies examples from the FLAREr package

file.copy(system.file("extdata", "configuration", "default", "parameter_calibration_config.csv", package = "FLAREr"), file.path(lake_directory, "configuration", "default", "parameter_calibration_config.csv"))
#> [1] TRUE
file.copy(system.file("extdata", "configuration", "default", "states_config.csv", package = "FLAREr"), file.path(lake_directory, "configuration", "default", "states_config.csv"))
#> [1] TRUE
file.copy(system.file("extdata", "configuration", "default", "depth_model_sd.csv", package = "FLAREr"), file.path(lake_directory, "configuration", "default", "depth_model_sd.csv"))
#> [1] TRUE
file.copy(system.file("extdata", "configuration", "default", "observations_config.csv", package = "FLAREr"), file.path(lake_directory, "configuration", "default", "observations_config.csv"))
#> [1] TRUE

Third, FLAREr requires GLM specific configurations files. For applications that require on water temperature, only the GLM namelist file is needed. Applications that require other water quality variables will require additional namelist files that are associated with the AED model.

file.copy(system.file("extdata", "configuration", "default", "glm3.nml", package = "FLAREr"), file.path(lake_directory, "configuration", "default", "glm3.nml"))
#> [1] TRUE

3: Observation and driver files

Since the FLAREr package for general application, scripts to download and process observation and drivers are not included in the package. Therefore the application of FLARE to a lake will require a set of additional scripts that are specific to the data formats for the lakes. The example includes files for application to FCR.

file.copy(from = system.file("extdata/targets", package = "FLAREr"), to = lake_directory, recursive = TRUE)
#> [1] TRUE
file.copy(from = system.file("extdata/drivers", package = "FLAREr"), to = lake_directory, recursive = TRUE)
#> [1] TRUE

First, FLAREr requires the observation file to have a specific name (observations_postQAQC_long.csv) and format.

head(read_csv(file.path(lake_directory,"targets/fcre/fcre-targets-insitu.csv"), show_col_types = FALSE))
#> # A tibble: 6 × 5
#>   datetime            site_id depth observation variable   
#>   <dttm>              <chr>   <dbl>       <dbl> <chr>      
#> 1 2022-09-02 00:00:00 fcre        0        27.0 temperature
#> 2 2022-09-03 00:00:00 fcre        0        26.5 temperature
#> 3 2022-09-04 00:00:00 fcre        0        26.2 temperature
#> 4 2022-09-05 00:00:00 fcre        0        25.8 temperature
#> 5 2022-09-06 00:00:00 fcre        0        25.2 temperature
#> 6 2022-09-07 00:00:00 fcre        0        26.7 temperature

2: Configure simulation (GLM)

The configuration functions are spread across the files. These files are described in more detail below

  • glm3.nml
  • configure_flare.yml
  • configure_run.yml
  • states_config.csv
  • observations_config.csv
  • parameter_calibration_config.csv
  • depth_model_sd.csv

configure_run.yml

This file is the configuration file that define the specific timing of the run.

  • restart_file: This is the full path to the file that you want to use as initial conditions for the simulation. You will set this to NA if the simulation is not a continuation of a previous simulation.
  • sim_name: a string with the name of your simulation. This will appear in your output file names
  • forecast_days: This is your forecast horizon. The max is 16 days. Set to 0if only doing data assimilation with observed drivers.
  • start_datetime: The date time of day you want to start a forecast. Because GLM is a daily timestep model, the simulation will start at this time. It uses YYYY-MM-DD mm:hh:ss format and must only be a whole hour. It is in the UTC time. It can be any hour if only doing data assimilation with observed drivers (forecast_days = 0). If forecasting (forecast_days > 0) it is required to match up with the availability of a NOAA forecast. NOAA forecasts are available at the following times UTC so you must select a local time that matches one of these times (i.e., 07:00:00 at FCR is the 12:00:00 UTC NOAA forecast).
    • 00:00:00 UTC
    • 06:00:00 UTC
    • 12:00:00 UTC
    • 18:00:00 UTC
  • forecast_start_datetime: The date that you want forecasting to start in your simulation. Uses the YYYY-MM-DD mm:hh:ss format (e.g., “2019-09-20 00:00:00”). The difference between start_time and forecast_start_datetime determines how many days of data assimilation occur using observed drivers before handing off to forecasted drivers and not assimilating data
  • configure_flare: name of FLARE configuration file located in your configuration/[config_set] directory (configure_flare.yml)
  • configure_obs: name of optional observation processing configuration file located in your configuration/[config_set] directory (configure_obs.yml)
  • use_s3: use s3 cloud storage for saving forecast, scores, and restart files.

glm3.nml

glm3.nml is the configuration file that is required by GLM. It can be configured to run only GLM or GLM + AED. This version is already configured to run only GLM for FCR and you do not need to modify it for the example simulation.

configure_flare.yml

configure_flare.yml has the bulk of the configurations for FLARE that you will set once and reuse. The end of this document describes all of the configurations in configure_flare.yml. Later in the tutorial, you will modify key configurations in configure_flare.yml

states_config.csv

Needs to be in configuration/[config_set]

observations_config.csv

Needs to be in your configuration/[config_set]

parameter_calibration_config.csv

Needs to be in your configuration/[config_set]

3: Run your GLM example simulation

Read configuration files

The following reads in the configuration files and overwrites the directory locations based on the lake_directory and directories provided above. In practice you will specific these directories in the configure file and not overwrite them.


next_restart <- FLAREr::run_flare(lake_directory = lake_directory,configure_run_file = "configure_run.yml", config_set_name = "default")
#>      Running forecast that starts on: 2022-09-28 00:00:00
#> Registered S3 method overwritten by 'quantmod':
#>   method            from
#>   as.zoo.data.frame zoo
#> Using GLM Version 3.3.0a9
#> Running time step 1/20 : 2022-09-28 00:00 - 2022-09-29 00:00 [2023-10-08 18:21:14.215881]
#> zone1temp: mean 11.0835 sd 1
#> zone2temp: mean 14.1078 sd 1
#> lw_factor: mean 1.014 sd 0.02
#> Running time step 2/20 : 2022-09-29 00:00 - 2022-09-30 00:00 [2023-10-08 18:21:18.770897]
#> zone1temp: mean 11.4355 sd 1
#> zone2temp: mean 14.3982 sd 1
#> lw_factor: mean 1.0153 sd 0.02
#> Running time step 3/20 : 2022-09-30 00:00 - 2022-10-01 00:00 [2023-10-08 18:21:23.277978]
#> zone1temp: mean 11.5912 sd 1
#> zone2temp: mean 14.1471 sd 1
#> lw_factor: mean 1.015 sd 0.02
#> Running time step 4/20 : 2022-10-01 00:00 - 2022-10-02 00:00 [2023-10-08 18:21:27.762072]
#> zone1temp: mean 14.3844 sd 1
#> zone2temp: mean 14.8046 sd 1
#> lw_factor: mean 0.9922 sd 0.02
#> Running time step 5/20 : 2022-10-02 00:00 - 2022-10-03 00:00 [2023-10-08 18:21:32.266219]
#> zone1temp: mean 15.125 sd 1
#> zone2temp: mean 14.5865 sd 1
#> lw_factor: mean 0.9837 sd 0.02
#> Running time step 6/20 : 2022-10-03 00:00 - 2022-10-04 00:00 [2023-10-08 18:21:36.731205]
#> zone1temp: mean 15.125 sd 1
#> zone2temp: mean 14.5865 sd 1
#> lw_factor: mean 0.9837 sd 0.02
#> Running time step 7/20 : 2022-10-04 00:00 - 2022-10-05 00:00 [2023-10-08 18:21:41.113768]
#> zone1temp: mean 15.125 sd 1
#> zone2temp: mean 14.5865 sd 1
#> lw_factor: mean 0.9837 sd 0.02
#> Running time step 8/20 : 2022-10-05 00:00 - 2022-10-06 00:00 [2023-10-08 18:21:45.562136]
#> zone1temp: mean 15.125 sd 1
#> zone2temp: mean 14.5865 sd 1
#> lw_factor: mean 0.9837 sd 0.02
#> Running time step 9/20 : 2022-10-06 00:00 - 2022-10-07 00:00 [2023-10-08 18:21:50.129635]
#> zone1temp: mean 15.125 sd 1
#> zone2temp: mean 14.5865 sd 1
#> lw_factor: mean 0.9837 sd 0.02
#> Running time step 10/20 : 2022-10-07 00:00 - 2022-10-08 00:00 [2023-10-08 18:21:54.536963]
#> zone1temp: mean 15.125 sd 1
#> zone2temp: mean 14.5865 sd 1
#> lw_factor: mean 0.9837 sd 0.02
#> Running time step 11/20 : 2022-10-08 00:00 - 2022-10-09 00:00 [2023-10-08 18:21:58.929223]
#> zone1temp: mean 15.125 sd 1
#> zone2temp: mean 14.5865 sd 1
#> lw_factor: mean 0.9837 sd 0.02
#> Running time step 12/20 : 2022-10-09 00:00 - 2022-10-10 00:00 [2023-10-08 18:22:03.29419]
#> zone1temp: mean 15.125 sd 1
#> zone2temp: mean 14.5865 sd 1
#> lw_factor: mean 0.9837 sd 0.02
#> Running time step 13/20 : 2022-10-10 00:00 - 2022-10-11 00:00 [2023-10-08 18:22:07.700255]
#> zone1temp: mean 15.125 sd 1
#> zone2temp: mean 14.5865 sd 1
#> lw_factor: mean 0.9837 sd 0.02
#> Running time step 14/20 : 2022-10-11 00:00 - 2022-10-12 00:00 [2023-10-08 18:22:12.128662]
#> zone1temp: mean 15.125 sd 1
#> zone2temp: mean 14.5865 sd 1
#> lw_factor: mean 0.9837 sd 0.02
#> Running time step 15/20 : 2022-10-12 00:00 - 2022-10-13 00:00 [2023-10-08 18:22:16.552627]
#> zone1temp: mean 15.125 sd 1
#> zone2temp: mean 14.5865 sd 1
#> lw_factor: mean 0.9837 sd 0.02
#> Running time step 16/20 : 2022-10-13 00:00 - 2022-10-14 00:00 [2023-10-08 18:22:21.038972]
#> zone1temp: mean 15.125 sd 1
#> zone2temp: mean 14.5865 sd 1
#> lw_factor: mean 0.9837 sd 0.02
#> Running time step 17/20 : 2022-10-14 00:00 - 2022-10-15 00:00 [2023-10-08 18:22:25.436976]
#> zone1temp: mean 15.125 sd 1
#> zone2temp: mean 14.5865 sd 1
#> lw_factor: mean 0.9837 sd 0.02
#> Running time step 18/20 : 2022-10-15 00:00 - 2022-10-16 00:00 [2023-10-08 18:22:30.017534]
#> zone1temp: mean 15.125 sd 1
#> zone2temp: mean 14.5865 sd 1
#> lw_factor: mean 0.9837 sd 0.02
#> Running time step 19/20 : 2022-10-16 00:00 - 2022-10-17 00:00 [2023-10-08 18:22:34.565985]
#> zone1temp: mean 15.125 sd 1
#> zone2temp: mean 14.5865 sd 1
#> lw_factor: mean 0.9837 sd 0.02
#> Running time step 20/20 : 2022-10-17 00:00 - 2022-10-18 00:00 [2023-10-08 18:22:38.979452]
#> zone1temp: mean 15.125 sd 1
#> zone2temp: mean 14.5865 sd 1
#> lw_factor: mean 0.9837 sd 0.02
#> Writing netcdf
#> Writing arrow forecast
#> starting writing dataset
#> ending writing dataset
#> Scoring forecasts
#> Warning: There was 1 warning in `dplyr::mutate()`.
#>  In argument: `depth = as.numeric(str_split_fixed(site_id, "-", 2)[, 2])`.
#> Caused by warning:
#> ! NAs introduced by coercion
#> Generating plot
#> temp
#> Warning: Removed 567 rows containing missing values (`geom_point()`).
#> salt
#> Warning: Removed 798 rows containing missing values (`geom_point()`).
#> zone1temp
#> zone2temp
#> lw_factor
#> extc
#> Warning: Removed 1 row containing missing values (`geom_line()`).
#> secchi
#> Warning: Removed 1 row containing missing values (`geom_line()`).
#> Warning: Removed 18 rows containing missing values (`geom_point()`).
#> Putting forecast
#> successfully generated flare forecats for: fcre-2022-10-02-test.nc

Visualizing output

df <- arrow::open_dataset(file.path(lake_directory,"forecasts/parquet")) |> collect()
head(df)
#> # A tibble: 6 × 12
#>   datetime            pubDate             depth family   parameter variable   
#>   <dttm>              <dttm>              <dbl> <chr>        <int> <chr>      
#> 1 2022-09-28 00:00:00 2023-10-08 18:22:43     0 ensemble         1 temperature
#> 2 2022-09-28 00:00:00 2023-10-08 18:22:43     0 ensemble         2 temperature
#> 3 2022-09-28 00:00:00 2023-10-08 18:22:43     0 ensemble         3 temperature
#> 4 2022-09-28 00:00:00 2023-10-08 18:22:43     0 ensemble         4 temperature
#> 5 2022-09-28 00:00:00 2023-10-08 18:22:43     0 ensemble         5 temperature
#> 6 2022-09-28 00:00:00 2023-10-08 18:22:43     0 ensemble         6 temperature
#> # ℹ 6 more variables: prediction <dbl>, forecast <dbl>, variable_type <chr>,
#> #   site_id <chr>, model_id <chr>, reference_datetime <chr>
df |> 
  filter(variable == "temperature",
         depth == 1) |> 
  ggplot(aes(x = datetime, y = prediction, group = parameter)) +
  geom_line() +
  geom_vline(aes(xintercept = as_datetime(reference_datetime))) +
  labs(title = "1 m water temperature forecast")

df <- arrow::open_dataset(file.path(lake_directory,"scores/parquet")) |> collect()
head(df)
#> # A tibble: 6 × 18
#>   datetime            family variable observation    crps   logs    mean  median
#>   <dttm>              <chr>  <chr>          <dbl>   <dbl>  <dbl>   <dbl>   <dbl>
#> 1 2022-09-28 00:00:00 sample extc            NA   NA      NA     NA      NA     
#> 2 2022-09-28 00:00:00 sample salt            NA   NA      NA      0.100   0.100 
#> 3 2022-09-28 00:00:00 sample tempera…        20.5  0.432   1.47  21.0    21.1   
#> 4 2022-09-29 00:00:00 sample extc            NA   NA      NA      0.87    0.87  
#> 5 2022-09-29 00:00:00 sample salt            NA   NA      NA      0.0994  0.0993
#> 6 2022-09-29 00:00:00 sample tempera…        19.6  0.0329 -0.956 19.6    19.6   
#> # ℹ 10 more variables: sd <dbl>, quantile97.5 <dbl>, quantile02.5 <dbl>,
#> #   quantile90 <dbl>, quantile10 <dbl>, horizon <dbl>, depth <dbl>,
#> #   site_id <chr>, model_id <chr>, reference_datetime <chr>
df |> 
  filter(variable == "temperature",
         depth == 1) |> 
  ggplot(aes(x = datetime)) +
  geom_ribbon(aes(ymin = quantile10, ymax = quantile90), alpha = 0.5, fill = "lightblue") +
  geom_line(aes(y = median)) +
  geom_point(aes(y = observation), color = "red") +
  geom_vline(aes(xintercept = as_datetime(reference_datetime))) +
  labs(title = "1 m water temperature forecast")

6: Modifying FLARE

Turning off data assimilation

In configure_flare.yml you can change da_method to “none”

Removing parameter estimation

Set par_config_file = .na in the configure_flare.yml

Increasing observational uncertainty

The second modification you will do is to to increase the observational uncertainty. In observations_config.csv set obs_sd = 1.

Changing the ensemble size

The variable ensemble_size allows you to adjust the size of the ensemble.

Changing the number of depths simulated

The variable modeled_depths allows you to adjust the depths that FLARE simulates. If you used a restart file, then the model_depths are required to be equal to the model_depths in the restart file.