
Background

This document serves as a user's guide and tutorial for the FLARE (Forecasting Lake and Reservoir Ecosystems) system (Thomas et al. 2020). FLARE generates forecasts, with uncertainty, of water temperature and water quality at 1- to 35-day-ahead horizons for multiple depths of a lake or reservoir. It uses data assimilation to update the initial starting point for a forecast and the model parameters based on real-time statistical comparisons to observations. It has been developed, tested, and evaluated for Falling Creek Reservoir in Vinton, VA (Thomas et al. 2020) and National Ecological Observatory Network lakes (Thomas et al. 2023).

FLARE is a set of R scripts that:

  • Generate the inputs and configuration files required by the General Lake Model (GLM)
  • Apply data assimilation to GLM
  • Process and archive forecast output
  • Visualize forecast output

FLARE uses the 1-D General Lake Model (Hipsey et al. 2019) as the mechanistic process model that predicts hydrodynamics of the lake or reservoir. For forecasts of water quality, it uses GLM with the Aquatic Ecosystem Dynamics library. The binaries for GLM and GLM-AED are included in the FLARE code that is available on GitHub. FLARE requires GLM version 3.3 or higher.

More information about GLM can be found on the GLM website.

FLARE development has been supported by grants from the National Science Foundation (CNS-1737424, DEB-1753639, EF-1702506, DBI-1933016, DEB-1926050).

Requirements

  • RStudio
  • FLAREr R package
  • FLAREr dependencies

1: Set up

First, install the FLAREr package from GitHub. Other required packages will be installed automatically as dependencies.

remotes::install_github("flare-forecast/FLAREr")

Second, download the General Lake Model (GLM) code. You can get it in multiple ways.

The easiest way is to install the GLM3r package from GitHub:

remotes::install_github("rqthomas/GLM3r")
#> Using github PAT from envvar GITHUB_PAT. Use `gitcreds::gitcreds_set()` and unset GITHUB_PAT in .Renviron (or elsewhere) if you want to use the more secure git credential store instead.
#> Downloading GitHub repo rqthomas/GLM3r@HEAD
#> ── R CMD build ─────────────────────────────────────────────────────────────────
#> * checking for file ‘/tmp/RtmpEjXQxo/remotes1fb2417e182a/rqthomas-GLM3r-977b223/DESCRIPTION’ ... OK
#> * preparing ‘GLM3r’:
#> * checking DESCRIPTION meta-information ... OK
#> * checking for LF line-endings in source and make files and shell scripts
#> * checking for empty or unneeded directories
#> * building ‘GLM3r_3.1.17.tar.gz’
#> Installing package into '/home/runner/work/_temp/Library'
#> (as 'lib' is unspecified)

After installing GLM3r you need to set an environment variable that points to the GLM code.

Sys.setenv('GLM_PATH'='GLM3r')

Third, create a directory that will be your working directory for your FLARE run. To find this directory on your computer, you can use print(lake_directory).

lake_directory <-  normalizePath(tempdir(),  winslash = "/")
dir.create(file.path(lake_directory, "configuration/default"), recursive = TRUE)
dir.create(file.path(lake_directory, "targets")) # For QAQC data
dir.create(file.path(lake_directory, "drivers")) # Weather and inflow forecasts

2: Configuration files

First, FLAREr requires two YAML configuration files. The code below copies examples from the FLAREr package.

file.copy(system.file("extdata", "configuration", "default", "configure_flare.yml", package = "FLAREr"), file.path(lake_directory, "configuration", "default", "configure_flare.yml"))
#> [1] TRUE
file.copy(system.file("extdata", "configuration", "default", "configure_run.yml", package = "FLAREr"), file.path(lake_directory, "configuration", "default", "configure_run.yml"))
#> [1] TRUE

Second, FLAREr requires a set of configuration CSV files. The CSV files define the states that are simulated and the parameters that are calibrated. The code below copies examples from the FLAREr package.

file.copy(system.file("extdata", "configuration", "default", "parameter_calibration_config.csv", package = "FLAREr"), file.path(lake_directory, "configuration", "default", "parameter_calibration_config.csv"))
#> [1] TRUE
file.copy(system.file("extdata", "configuration", "default", "states_config.csv", package = "FLAREr"), file.path(lake_directory, "configuration", "default", "states_config.csv"))
#> [1] TRUE
file.copy(system.file("extdata", "configuration", "default", "depth_model_sd.csv", package = "FLAREr"), file.path(lake_directory, "configuration", "default", "depth_model_sd.csv"))
#> [1] TRUE
file.copy(system.file("extdata", "configuration", "default", "observations_config.csv", package = "FLAREr"), file.path(lake_directory, "configuration", "default", "observations_config.csv"))
#> [1] TRUE

Third, FLAREr requires GLM-specific configuration files. For applications that only require water temperature, only the GLM namelist file is needed. Applications that require other water quality variables also need the additional namelist files associated with the AED model.

file.copy(system.file("extdata", "configuration", "default", "glm3.nml", package = "FLAREr"), file.path(lake_directory, "configuration", "default", "glm3.nml"))
#> [1] TRUE

3: Observation and driver files

Since the FLAREr package is designed for general application, scripts to download and process observations and drivers are not included in the package. Therefore, applying FLARE to a lake requires a set of additional scripts that are specific to the data formats for that lake. The example includes files for application to FCR.

file.copy(from = system.file("extdata/targets", package = "FLAREr"), to = lake_directory, recursive = TRUE)
#> [1] TRUE
file.copy(from = system.file("extdata/drivers", package = "FLAREr"), to = lake_directory, recursive = TRUE)
#> [1] TRUE

First, FLAREr requires the observation (targets) file to have a specific format:

library(readr)
head(read_csv(file.path(lake_directory, "targets/fcre/fcre-targets-insitu.csv"), show_col_types = FALSE))
#> # A tibble: 6 × 5
#>   datetime            site_id depth observation variable   
#>   <dttm>              <chr>   <dbl>       <dbl> <chr>      
#> 1 2022-09-28 00:00:00 fcre        0        20.5 temperature
#> 2 2022-09-29 00:00:00 fcre        0        19.6 temperature
#> 3 2022-09-30 00:00:00 fcre        0        18.8 temperature
#> 4 2022-10-01 00:00:00 fcre        0        17.2 temperature
#> 5 2022-10-02 00:00:00 fcre        0        16.0 temperature
#> 6 2022-10-03 00:00:00 fcre        0        15.4 temperature
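As a sketch of how a site-specific processing script might produce this layout, the following base R converts a hypothetical wide sensor table into the long targets format shown above. The raw column names (time, temp_0.0m, temp_1.0m) are made up for illustration; your sensor export will differ.

```r
# Hypothetical raw sensor export: one column per depth (names are illustrative)
raw <- data.frame(
  time      = as.POSIXct(c("2022-09-28", "2022-09-29"), tz = "UTC"),
  temp_0.0m = c(20.5, 19.6),
  temp_1.0m = c(20.4, 19.5)
)

# Reshape to the long format FLAREr expects:
# datetime, site_id, depth, observation, variable
targets <- do.call(rbind, lapply(c("temp_0.0m", "temp_1.0m"), function(col) {
  data.frame(
    datetime    = raw$time,
    site_id     = "fcre",
    depth       = as.numeric(sub("temp_(.*)m", "\\1", col)),
    observation = raw[[col]],
    variable    = "temperature"
  )
}))

# Then write to the targets directory, e.g.:
# write.csv(targets, file.path(lake_directory, "targets/fcre/fcre-targets-insitu.csv"),
#           row.names = FALSE)
```

The key point is one row per datetime-depth-variable combination, with the observation in a single column.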

4: Configure simulation (GLM)

The configuration settings are spread across several files. These files are described in more detail below:

  • glm3.nml
  • configure_flare.yml
  • configure_run.yml
  • states_config.csv
  • observations_config.csv
  • parameter_calibration_config.csv
  • depth_model_sd.csv

configure_run.yml

This is the configuration file that defines the specific timing of the run.

  • restart_file: This is the full path to the file that you want to use as initial conditions for the simulation. You will set this to NA if the simulation is not a continuation of a previous simulation.
  • sim_name: a string with the name of your simulation. This will appear in your output file names.
  • forecast_days: This is your forecast horizon. The max is 16 days. Set to 0 if only doing data assimilation with observed drivers.
  • start_datetime: The date and time of day you want to start a forecast. Because GLM is a daily timestep model, the simulation will start at this time. It uses the YYYY-MM-DD HH:MM:SS format, must be a whole hour, and is in UTC. It can be any hour if only doing data assimilation with observed drivers (forecast_days = 0). If forecasting (forecast_days > 0), it must match up with the availability of a NOAA forecast. NOAA forecasts are available at the following times in UTC, so you must select a local time that matches one of these times (e.g., 07:00:00 at FCR is the 12:00:00 UTC NOAA forecast).
    • 00:00:00 UTC
    • 06:00:00 UTC
    • 12:00:00 UTC
    • 18:00:00 UTC
  • forecast_start_datetime: The date that you want forecasting to start in your simulation. Uses the YYYY-MM-DD HH:MM:SS format (e.g., “2019-09-20 00:00:00”). The difference between start_datetime and forecast_start_datetime determines how many days of data assimilation occur using observed drivers before handing off to forecasted drivers and no longer assimilating data.
  • configure_flare: name of FLARE configuration file located in your configuration/[config_set] directory (configure_flare.yml)
  • configure_obs: name of optional observation processing configuration file located in your configuration/[config_set] directory (configure_obs.yml)
  • use_s3: whether to use S3 cloud storage for saving forecast, scores, and restart files.
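Putting these settings together, a minimal configure_run.yml might look like the following. The values are illustrative (they mirror the example run later in this tutorial), not package defaults, and your file may contain additional keys.

```yaml
restart_file: .na
sim_name: test
forecast_days: 16
start_datetime: "2022-09-28 00:00:00"
forecast_start_datetime: "2022-10-02 00:00:00"
configure_flare: configure_flare.yml
configure_obs: .na
use_s3: FALSE
```

With these values, FLARE assimilates observations for the four days between start_datetime and forecast_start_datetime, then forecasts 16 days ahead (20 daily timesteps in total).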

glm3.nml

glm3.nml is the configuration file that is required by GLM. It can be configured to run only GLM or GLM + AED. This version is already configured to run only GLM for FCR and you do not need to modify it for the example simulation.

configure_flare.yml

configure_flare.yml has the bulk of the configurations for FLARE that you will set once and reuse. The end of this document describes all of the configurations in configure_flare.yml. Later in the tutorial, you will modify key configurations in configure_flare.yml.

states_config.csv

Needs to be in your configuration/[config_set] directory.

observations_config.csv

Needs to be in your configuration/[config_set] directory.

parameter_calibration_config.csv

Needs to be in your configuration/[config_set] directory.

5: Run your GLM example simulation

Read configuration files

The following reads in the configuration files and overwrites the directory locations based on the lake_directory and directories provided above. In practice, you will specify these directories in the configuration file rather than overwrite them.

next_restart <- FLAREr::run_flare(lake_directory = lake_directory, configure_run_file = "configure_run.yml", config_set_name = "default")
#> Warning in set_up_simulation(configure_run_file, lake_directory, clean_start =
#> clean_start, : NAs introduced by coercion
#> Warning in set_up_simulation(configure_run_file, lake_directory, clean_start =
#> clean_start, : NAs introduced by coercion
#> Warning in set_up_simulation(configure_run_file, lake_directory, clean_start =
#> clean_start, : NAs introduced by coercion
#> Warning in set_up_simulation(configure_run_file, lake_directory, clean_start =
#> clean_start, : NAs introduced by coercion
#> Warning in set_up_simulation(configure_run_file, lake_directory, clean_start =
#> clean_start, : NAs introduced by coercion
#>      Running forecast that starts on: 2022-09-28 00:00:00
#> Retrieving Observational Data...
#> Generating Met Forecasts...
#> Registered S3 method overwritten by 'tsibble':
#>   method               from 
#>   as_tibble.grouped_df dplyr
#> Registered S3 method overwritten by 'quantmod':
#>   method            from
#>   as.zoo.data.frame zoo
#> Creating inflow/outflow files...
#> Setting states and initial conditions...
#> Running time step 1/20 : 2022-09-28 00:00 - 2022-09-29 00:00 [2024-11-21 14:15:12.606262]
#> performing data assimilation
#> zone1temp: mean 10.8857 sd 1.123
#> zone2temp: mean 14.1658 sd 1.1801
#> lw_factor: mean 0.9652 sd 0.066
#> Running time step 2/20 : 2022-09-29 00:00 - 2022-09-30 00:00 [2024-11-21 14:15:19.285801]
#> performing data assimilation
#> zone1temp: mean 11.0377 sd 1.1773
#> zone2temp: mean 14.3356 sd 1.3052
#> lw_factor: mean 0.9852 sd 0.0563
#> Running time step 3/20 : 2022-09-30 00:00 - 2022-10-01 00:00 [2024-11-21 14:15:25.608756]
#> performing data assimilation
#> zone1temp: mean 11.8072 sd 0.9291
#> zone2temp: mean 12.9755 sd 1.707
#> lw_factor: mean 0.9076 sd 0.0434
#> Running time step 4/20 : 2022-10-01 00:00 - 2022-10-02 00:00 [2024-11-21 14:15:31.856556]
#> performing data assimilation
#> zone1temp: mean 11.6061 sd 1.1831
#> zone2temp: mean 13.2888 sd 1.8014
#> lw_factor: mean 0.947 sd 0.0432
#> Running time step 5/20 : 2022-10-02 00:00 - 2022-10-03 00:00 [2024-11-21 14:15:38.106883]
#> zone1temp: mean 11.5244 sd 1.8083
#> zone2temp: mean 13.0512 sd 1.8351
#> lw_factor: mean 0.9472 sd 0.0474
#> Running time step 6/20 : 2022-10-03 00:00 - 2022-10-04 00:00 [2024-11-21 14:15:44.324784]
#> zone1temp: mean 11.3803 sd 2.4174
#> zone2temp: mean 12.9521 sd 1.953
#> lw_factor: mean 0.9585 sd 0.0527
#> Running time step 7/20 : 2022-10-04 00:00 - 2022-10-05 00:00 [2024-11-21 14:15:50.493612]
#> zone1temp: mean 11.2899 sd 2.5976
#> zone2temp: mean 12.9694 sd 2.2027
#> lw_factor: mean 0.9581 sd 0.054
#> Running time step 8/20 : 2022-10-05 00:00 - 2022-10-06 00:00 [2024-11-21 14:15:56.892099]
#> zone1temp: mean 11.1126 sd 3.1169
#> zone2temp: mean 12.6374 sd 2.5558
#> lw_factor: mean 0.9601 sd 0.0638
#> Running time step 9/20 : 2022-10-06 00:00 - 2022-10-07 00:00 [2024-11-21 14:16:03.156106]
#> zone1temp: mean 11.2196 sd 3.2648
#> zone2temp: mean 12.3821 sd 2.5852
#> lw_factor: mean 0.9617 sd 0.0666
#> Running time step 10/20 : 2022-10-07 00:00 - 2022-10-08 00:00 [2024-11-21 14:16:09.338974]
#> zone1temp: mean 11.0348 sd 3.4857
#> zone2temp: mean 12.2284 sd 3.1018
#> lw_factor: mean 0.9644 sd 0.073
#> Running time step 11/20 : 2022-10-08 00:00 - 2022-10-09 00:00 [2024-11-21 14:16:15.556603]
#> zone1temp: mean 11.2301 sd 3.4191
#> zone2temp: mean 12.1017 sd 3.3726
#> lw_factor: mean 0.9685 sd 0.0734
#> Running time step 12/20 : 2022-10-09 00:00 - 2022-10-10 00:00 [2024-11-21 14:16:21.749152]
#> zone1temp: mean 11.1753 sd 3.5736
#> zone2temp: mean 12.0448 sd 3.3292
#> lw_factor: mean 0.9658 sd 0.0812
#> Running time step 13/20 : 2022-10-10 00:00 - 2022-10-11 00:00 [2024-11-21 14:16:27.833605]
#> zone1temp: mean 11.1384 sd 3.8501
#> zone2temp: mean 12.2788 sd 3.8034
#> lw_factor: mean 0.9709 sd 0.0831
#> Running time step 14/20 : 2022-10-11 00:00 - 2022-10-12 00:00 [2024-11-21 14:16:34.030884]
#> zone1temp: mean 10.9551 sd 4.0738
#> zone2temp: mean 12.4865 sd 3.5276
#> lw_factor: mean 0.9703 sd 0.0892
#> Running time step 15/20 : 2022-10-12 00:00 - 2022-10-13 00:00 [2024-11-21 14:16:40.198007]
#> zone1temp: mean 10.9337 sd 4.2939
#> zone2temp: mean 12.3107 sd 3.6883
#> lw_factor: mean 0.9728 sd 0.0889
#> Running time step 16/20 : 2022-10-13 00:00 - 2022-10-14 00:00 [2024-11-21 14:16:46.311066]
#> zone1temp: mean 10.8646 sd 4.3379
#> zone2temp: mean 12.7352 sd 3.9142
#> lw_factor: mean 0.971 sd 0.0949
#> Running time step 17/20 : 2022-10-14 00:00 - 2022-10-15 00:00 [2024-11-21 14:16:52.513226]
#> zone1temp: mean 11.0585 sd 4.7625
#> zone2temp: mean 12.6473 sd 4.2614
#> lw_factor: mean 0.9713 sd 0.0985
#> Running time step 18/20 : 2022-10-15 00:00 - 2022-10-16 00:00 [2024-11-21 14:16:58.695338]
#> zone1temp: mean 11.1435 sd 4.9173
#> zone2temp: mean 12.719 sd 4.4483
#> lw_factor: mean 0.9715 sd 0.0995
#> Running time step 19/20 : 2022-10-16 00:00 - 2022-10-17 00:00 [2024-11-21 14:17:04.803752]
#> zone1temp: mean 10.9883 sd 4.9468
#> zone2temp: mean 12.6822 sd 4.2867
#> lw_factor: mean 0.9721 sd 0.1084
#> Running time step 20/20 : 2022-10-17 00:00 - 2022-10-18 00:00 [2024-11-21 14:17:10.793933]
#> zone1temp: mean 10.9189 sd 4.9409
#> zone2temp: mean 12.5622 sd 4.6109
#> lw_factor: mean 0.9674 sd 0.1093
#> Writing restart
#> Writing forecast
#> starting writing dataset
#> ending writing dataset
#> successfully generated flare forecats for: fcre-2022-10-02-test

Visualizing output

library(dplyr)
library(ggplot2)
library(lubridate)
library(arrow)

df <- arrow::open_dataset(file.path(lake_directory, "forecasts/parquet")) |> collect()
head(df)
#> # A tibble: 6 × 14
#>   reference_datetime  datetime            pub_datetime        depth family  
#>   <dttm>              <dttm>              <dttm>              <dbl> <chr>   
#> 1 2022-10-02 00:00:00 2022-09-28 00:00:00 2024-11-21 14:17:17     0 ensemble
#> 2 2022-10-02 00:00:00 2022-09-28 00:00:00 2024-11-21 14:17:17     0 ensemble
#> 3 2022-10-02 00:00:00 2022-09-28 00:00:00 2024-11-21 14:17:17     0 ensemble
#> 4 2022-10-02 00:00:00 2022-09-28 00:00:00 2024-11-21 14:17:17     0 ensemble
#> 5 2022-10-02 00:00:00 2022-09-28 00:00:00 2024-11-21 14:17:17     0 ensemble
#> 6 2022-10-02 00:00:00 2022-09-28 00:00:00 2024-11-21 14:17:17     0 ensemble
#> # ℹ 9 more variables: parameter <int>, variable <chr>, prediction <dbl>,
#> #   forecast <dbl>, variable_type <chr>, log_weight <dbl>, site_id <chr>,
#> #   model_id <chr>, reference_date <chr>
df |> 
  filter(variable == "temperature",
         depth == 1) |> 
  ggplot(aes(x = datetime, y = prediction, group = parameter)) +
  geom_line() +
  geom_vline(aes(xintercept = as_datetime(reference_datetime))) +
  labs(title = "1 m water temperature forecast")

targets_df <- read_csv(file.path(lake_directory, "targets/fcre/fcre-targets-insitu.csv"), show_col_types = FALSE)
combined_df <- left_join(df, targets_df, by = join_by(datetime, depth, variable, site_id))
combined_df |> 
  filter(variable == "temperature",
         depth == 1) |> 
  ggplot(aes(x = datetime, y = prediction, group = parameter)) +
  geom_line() +
  geom_vline(aes(xintercept = as_datetime(reference_datetime))) +
  geom_point(aes(y = observation), color = "red") +
  labs(title = "1 m water temperature forecast")
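Beyond a visual check, you can summarize forecast skill numerically, e.g., the RMSE of the ensemble mean against observations. The base R sketch below uses a toy stand-in for combined_df (same key columns as the joined data above) so the calculation is self-contained; in practice you would filter combined_df to one variable and depth and drop rows without observations first.

```r
# Toy stand-in with the same key columns as combined_df above
toy_df <- data.frame(
  datetime    = rep(as.POSIXct(c("2022-10-03", "2022-10-04"), tz = "UTC"), each = 2),
  parameter   = rep(1:2, times = 2),
  prediction  = c(15.2, 15.6, 14.8, 15.0),
  observation = rep(c(15.4, 15.1), each = 2)
)

# Ensemble mean per datetime, then RMSE against the (single) observation per datetime
ens_mean <- aggregate(prediction ~ datetime, toy_df, mean)
obs      <- aggregate(observation ~ datetime, toy_df, mean)
rmse     <- sqrt(mean((ens_mean$prediction - obs$observation)^2))
```

RMSE of the ensemble mean ignores spread; proper scores such as the CRPS additionally reward well-calibrated forecast uncertainty.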

df |> 
  filter(variable == "lw_factor") |> 
  ggplot(aes(x = datetime, y = prediction, group = parameter)) +
  geom_line() +
  geom_vline(aes(xintercept = as_datetime(reference_datetime))) +
  labs(title = "lw_factor parameter")


6: Modifying FLARE

Turning off data assimilation

In configure_flare.yml, you can change da_method to “none”.

Removing parameter estimation

Set par_config_file to .na in configure_flare.yml.

Increasing observational uncertainty

Another modification is to increase the observational uncertainty. In observations_config.csv, set obs_sd = 1.

Changing the ensemble size

The ensemble_size setting allows you to adjust the number of ensemble members.
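Pulling the modifications above together, the relevant fragment of configure_flare.yml might look like the following after editing. This is a sketch: the exact key names and nesting (e.g., whether these live under a da_setup block, and where par_config_file is set) may differ in your version of the file, so check your copied example before editing.

```yaml
da_setup:
  da_method: none        # "none" turns off data assimilation
  ensemble_size: 100     # number of ensemble members
  par_config_file: .na   # .na disables parameter estimation
```

Edit the file by hand, rerun run_flare(), and compare the resulting forecast spread to the earlier run.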