FLAREr example
flare-example-vignette.Rmd
Background
This document serves as a users guide and a tutorial for the FLARE (Forecasting Lake and Reservoir Ecosystems) system (Thomas et al. 2020). FLARE generates forecasts and forecast uncertainty of water temperature and water quality for 1 to 35-dau ahead time horizon at multiple depths of a lake or reservoir. It uses data assimilation to update the initial starting point for a forecast and the model parameters based a real-time statistical comparisons to observations. It has been developed, tested, and evaluated for Falling Creek Reservoir in Vinton,VA (Thomas et al. 2020) and National Ecological Observatory Network lakes (Thomas et al. 2023
FLARE is a set of R scripts that
- Generating the inputs and configuration files required by the General Lake Model (GLM)
- Applying data assimilation to GLM
- Processing and archiving forecast output
- Visualizing forecast output
FLARE uses the 1-D General Lake Model (Hipsey et al. 2019) as the mechanistic process model that predicts hydrodynamics of the lake or reservoir. For forecasts of water quality, it uses GLM with the Aquatic Ecosystem Dynamics library. The binaries for GLM and GLM-AED are included in the FLARE code that is available on GitHub. FLARE requires GLM version 3.3 or higher.
More information about the GLM can be found here:
FLARE development has been supported by grants from National Science Foundation (CNS-1737424, DEB-1753639, EF-1702506, DBI-1933016, DEB-1926050)
Requirements
- RStudio
-
FLAREr
R package -
FLAREr
dependencies
1: Set up
First, install the FLAREr
package from GitHub. There
will be other required packages that will also be downloaded.
remotes::install_github("flare-forecast/FLAREr")
Second, create a directory that will be your working directory for your FLARE run
lake_directory <- tempdir()
dir.create(file.path(lake_directory, "configuration/default"), recursive = TRUE)
dir.create(file.path(lake_directory, "targets")) # For QAQC data
dir.create(file.path(lake_directory, "drivers")) # Weather and inflow forecasts
2: Configuration files
First, FLAREr
requires two configuration yaml files. The
code below copies examples from the FLAREr
package.
file.copy(system.file("extdata", "configuration", "default", "configure_flare.yml", package = "FLAREr"), file.path(lake_directory, "configuration", "default", "configure_flare.yml"))
#> [1] TRUE
file.copy(system.file("extdata", "configuration", "default", "configure_run.yml", package = "FLAREr"), file.path(lake_directory, "configuration", "default", "configure_run.yml"))
#> [1] TRUE
Second, FLAREr
requires a set of configuration CSV
files. The CSV files are used to define the states that are simulated
and the parameters that are calibrated. The code below copies examples
from the FLAREr
package
file.copy(system.file("extdata", "configuration", "default", "parameter_calibration_config.csv", package = "FLAREr"), file.path(lake_directory, "configuration", "default", "parameter_calibration_config.csv"))
#> [1] TRUE
file.copy(system.file("extdata", "configuration", "default", "states_config.csv", package = "FLAREr"), file.path(lake_directory, "configuration", "default", "states_config.csv"))
#> [1] TRUE
file.copy(system.file("extdata", "configuration", "default", "depth_model_sd.csv", package = "FLAREr"), file.path(lake_directory, "configuration", "default", "depth_model_sd.csv"))
#> [1] TRUE
file.copy(system.file("extdata", "configuration", "default", "observations_config.csv", package = "FLAREr"), file.path(lake_directory, "configuration", "default", "observations_config.csv"))
#> [1] TRUE
Third, FLAREr requires GLM specific configurations files. For applications that require on water temperature, only the GLM namelist file is needed. Applications that require other water quality variables will require additional namelist files that are associated with the AED model.
file.copy(system.file("extdata", "configuration", "default", "glm3.nml", package = "FLAREr"), file.path(lake_directory, "configuration", "default", "glm3.nml"))
#> [1] TRUE
3: Observation and driver files
Since the FLAREr package for general application, scripts to download and process observation and drivers are not included in the package. Therefore the application of FLARE to a lake will require a set of additional scripts that are specific to the data formats for the lakes. The example includes files for application to FCR.
file.copy(from = system.file("extdata/targets", package = "FLAREr"), to = lake_directory, recursive = TRUE)
#> [1] TRUE
file.copy(from = system.file("extdata/drivers", package = "FLAREr"), to = lake_directory, recursive = TRUE)
#> [1] TRUE
First, FLAREr requires the observation file to have a specific name (observations_postQAQC_long.csv) and format.
head(read_csv(file.path(lake_directory,"targets/fcre/fcre-targets-insitu.csv"), show_col_types = FALSE))
#> # A tibble: 6 × 5
#> datetime site_id depth observation variable
#> <dttm> <chr> <dbl> <dbl> <chr>
#> 1 2022-09-02 00:00:00 fcre 0 27.0 temperature
#> 2 2022-09-03 00:00:00 fcre 0 26.5 temperature
#> 3 2022-09-04 00:00:00 fcre 0 26.2 temperature
#> 4 2022-09-05 00:00:00 fcre 0 25.8 temperature
#> 5 2022-09-06 00:00:00 fcre 0 25.2 temperature
#> 6 2022-09-07 00:00:00 fcre 0 26.7 temperature
2: Configure simulation (GLM)
The configuration functions are spread across the files. These files are described in more detail below
glm3.nml
configure_flare.yml
configure_run.yml
states_config.csv
observations_config.csv
parameter_calibration_config.csv
depth_model_sd.csv
configure_run.yml
This file is the configuration file that define the specific timing of the run.
-
restart_file
: This is the full path to the file that you want to use as initial conditions for the simulation. You will set this toNA
if the simulation is not a continuation of a previous simulation. -
sim_name
: a string with the name of your simulation. This will appear in your output file names -
forecast_days
: This is your forecast horizon. The max is16
days. Set to0
if only doing data assimilation with observed drivers. -
start_datetime
: The date time of day you want to start a forecast. Because GLM is a daily timestep model, the simulation will start at this time. It usesYYYY-MM-DD mm:hh:ss
format and must only be a whole hour. It is in the UTC time. It can be any hour if only doing data assimilation with observed drivers (forecast_days = 0). If forecasting (forecast_days > 0) it is required to match up with the availability of a NOAA forecast. NOAA forecasts are available at the following times UTC so you must select a local time that matches one of these times (i.e., 07:00:00 at FCR is the 12:00:00 UTC NOAA forecast).- 00:00:00 UTC
- 06:00:00 UTC
- 12:00:00 UTC
- 18:00:00 UTC
-
forecast_start_datetime
: The date that you want forecasting to start in your simulation. Uses the YYYY-MM-DD mm:hh:ss format (e.g., “2019-09-20 00:00:00”). The difference betweenstart_time
andforecast_start_datetime
determines how many days of data assimilation occur using observed drivers before handing off to forecasted drivers and not assimilating data -
configure_flare
: name of FLARE configuration file located in yourconfiguration/[config_set]
directory (configure_flare.yml
) -
configure_obs
: name of optional observation processing configuration file located in yourconfiguration/[config_set]
directory (configure_obs.yml
) -
use_s3
: use s3 cloud storage for saving forecast, scores, and restart files.
glm3.nml
glm3.nml
is the configuration file that is required by
GLM. It can be configured to run only GLM or GLM + AED. This version is
already configured to run only GLM for FCR and you do not need to modify
it for the example simulation.
3: Run your GLM example simulation
Read configuration files
The following reads in the configuration files and overwrites the directory locations based on the lake_directory and directories provided above. In practice you will specific these directories in the configure file and not overwrite them.
next_restart <- FLAREr::run_flare(lake_directory = lake_directory,configure_run_file = "configure_run.yml", config_set_name = "default")
#> Running forecast that starts on: 2022-09-28 00:00:00
#> Registered S3 method overwritten by 'quantmod':
#> method from
#> as.zoo.data.frame zoo
#> Using GLM Version 3.3.0a9
#> Running time step 1/20 : 2022-09-28 00:00 - 2022-09-29 00:00 [2023-10-08 18:21:14.215881]
#> zone1temp: mean 11.0835 sd 1
#> zone2temp: mean 14.1078 sd 1
#> lw_factor: mean 1.014 sd 0.02
#> Running time step 2/20 : 2022-09-29 00:00 - 2022-09-30 00:00 [2023-10-08 18:21:18.770897]
#> zone1temp: mean 11.4355 sd 1
#> zone2temp: mean 14.3982 sd 1
#> lw_factor: mean 1.0153 sd 0.02
#> Running time step 3/20 : 2022-09-30 00:00 - 2022-10-01 00:00 [2023-10-08 18:21:23.277978]
#> zone1temp: mean 11.5912 sd 1
#> zone2temp: mean 14.1471 sd 1
#> lw_factor: mean 1.015 sd 0.02
#> Running time step 4/20 : 2022-10-01 00:00 - 2022-10-02 00:00 [2023-10-08 18:21:27.762072]
#> zone1temp: mean 14.3844 sd 1
#> zone2temp: mean 14.8046 sd 1
#> lw_factor: mean 0.9922 sd 0.02
#> Running time step 5/20 : 2022-10-02 00:00 - 2022-10-03 00:00 [2023-10-08 18:21:32.266219]
#> zone1temp: mean 15.125 sd 1
#> zone2temp: mean 14.5865 sd 1
#> lw_factor: mean 0.9837 sd 0.02
#> Running time step 6/20 : 2022-10-03 00:00 - 2022-10-04 00:00 [2023-10-08 18:21:36.731205]
#> zone1temp: mean 15.125 sd 1
#> zone2temp: mean 14.5865 sd 1
#> lw_factor: mean 0.9837 sd 0.02
#> Running time step 7/20 : 2022-10-04 00:00 - 2022-10-05 00:00 [2023-10-08 18:21:41.113768]
#> zone1temp: mean 15.125 sd 1
#> zone2temp: mean 14.5865 sd 1
#> lw_factor: mean 0.9837 sd 0.02
#> Running time step 8/20 : 2022-10-05 00:00 - 2022-10-06 00:00 [2023-10-08 18:21:45.562136]
#> zone1temp: mean 15.125 sd 1
#> zone2temp: mean 14.5865 sd 1
#> lw_factor: mean 0.9837 sd 0.02
#> Running time step 9/20 : 2022-10-06 00:00 - 2022-10-07 00:00 [2023-10-08 18:21:50.129635]
#> zone1temp: mean 15.125 sd 1
#> zone2temp: mean 14.5865 sd 1
#> lw_factor: mean 0.9837 sd 0.02
#> Running time step 10/20 : 2022-10-07 00:00 - 2022-10-08 00:00 [2023-10-08 18:21:54.536963]
#> zone1temp: mean 15.125 sd 1
#> zone2temp: mean 14.5865 sd 1
#> lw_factor: mean 0.9837 sd 0.02
#> Running time step 11/20 : 2022-10-08 00:00 - 2022-10-09 00:00 [2023-10-08 18:21:58.929223]
#> zone1temp: mean 15.125 sd 1
#> zone2temp: mean 14.5865 sd 1
#> lw_factor: mean 0.9837 sd 0.02
#> Running time step 12/20 : 2022-10-09 00:00 - 2022-10-10 00:00 [2023-10-08 18:22:03.29419]
#> zone1temp: mean 15.125 sd 1
#> zone2temp: mean 14.5865 sd 1
#> lw_factor: mean 0.9837 sd 0.02
#> Running time step 13/20 : 2022-10-10 00:00 - 2022-10-11 00:00 [2023-10-08 18:22:07.700255]
#> zone1temp: mean 15.125 sd 1
#> zone2temp: mean 14.5865 sd 1
#> lw_factor: mean 0.9837 sd 0.02
#> Running time step 14/20 : 2022-10-11 00:00 - 2022-10-12 00:00 [2023-10-08 18:22:12.128662]
#> zone1temp: mean 15.125 sd 1
#> zone2temp: mean 14.5865 sd 1
#> lw_factor: mean 0.9837 sd 0.02
#> Running time step 15/20 : 2022-10-12 00:00 - 2022-10-13 00:00 [2023-10-08 18:22:16.552627]
#> zone1temp: mean 15.125 sd 1
#> zone2temp: mean 14.5865 sd 1
#> lw_factor: mean 0.9837 sd 0.02
#> Running time step 16/20 : 2022-10-13 00:00 - 2022-10-14 00:00 [2023-10-08 18:22:21.038972]
#> zone1temp: mean 15.125 sd 1
#> zone2temp: mean 14.5865 sd 1
#> lw_factor: mean 0.9837 sd 0.02
#> Running time step 17/20 : 2022-10-14 00:00 - 2022-10-15 00:00 [2023-10-08 18:22:25.436976]
#> zone1temp: mean 15.125 sd 1
#> zone2temp: mean 14.5865 sd 1
#> lw_factor: mean 0.9837 sd 0.02
#> Running time step 18/20 : 2022-10-15 00:00 - 2022-10-16 00:00 [2023-10-08 18:22:30.017534]
#> zone1temp: mean 15.125 sd 1
#> zone2temp: mean 14.5865 sd 1
#> lw_factor: mean 0.9837 sd 0.02
#> Running time step 19/20 : 2022-10-16 00:00 - 2022-10-17 00:00 [2023-10-08 18:22:34.565985]
#> zone1temp: mean 15.125 sd 1
#> zone2temp: mean 14.5865 sd 1
#> lw_factor: mean 0.9837 sd 0.02
#> Running time step 20/20 : 2022-10-17 00:00 - 2022-10-18 00:00 [2023-10-08 18:22:38.979452]
#> zone1temp: mean 15.125 sd 1
#> zone2temp: mean 14.5865 sd 1
#> lw_factor: mean 0.9837 sd 0.02
#> Writing netcdf
#> Writing arrow forecast
#> starting writing dataset
#> ending writing dataset
#> Scoring forecasts
#> Warning: There was 1 warning in `dplyr::mutate()`.
#> ℹ In argument: `depth = as.numeric(str_split_fixed(site_id, "-", 2)[, 2])`.
#> Caused by warning:
#> ! NAs introduced by coercion
#> Generating plot
#> temp
#> Warning: Removed 567 rows containing missing values (`geom_point()`).
#> salt
#> Warning: Removed 798 rows containing missing values (`geom_point()`).
#> zone1temp
#> zone2temp
#> lw_factor
#> extc
#> Warning: Removed 1 row containing missing values (`geom_line()`).
#> secchi
#> Warning: Removed 1 row containing missing values (`geom_line()`).
#> Warning: Removed 18 rows containing missing values (`geom_point()`).
#> Putting forecast
#> successfully generated flare forecats for: fcre-2022-10-02-test.nc
Visualizing output
df <- arrow::open_dataset(file.path(lake_directory,"forecasts/parquet")) |> collect()
head(df)
#> # A tibble: 6 × 12
#> datetime pubDate depth family parameter variable
#> <dttm> <dttm> <dbl> <chr> <int> <chr>
#> 1 2022-09-28 00:00:00 2023-10-08 18:22:43 0 ensemble 1 temperature
#> 2 2022-09-28 00:00:00 2023-10-08 18:22:43 0 ensemble 2 temperature
#> 3 2022-09-28 00:00:00 2023-10-08 18:22:43 0 ensemble 3 temperature
#> 4 2022-09-28 00:00:00 2023-10-08 18:22:43 0 ensemble 4 temperature
#> 5 2022-09-28 00:00:00 2023-10-08 18:22:43 0 ensemble 5 temperature
#> 6 2022-09-28 00:00:00 2023-10-08 18:22:43 0 ensemble 6 temperature
#> # ℹ 6 more variables: prediction <dbl>, forecast <dbl>, variable_type <chr>,
#> # site_id <chr>, model_id <chr>, reference_datetime <chr>
df |>
filter(variable == "temperature",
depth == 1) |>
ggplot(aes(x = datetime, y = prediction, group = parameter)) +
geom_line() +
geom_vline(aes(xintercept = as_datetime(reference_datetime))) +
labs(title = "1 m water temperature forecast")
df <- arrow::open_dataset(file.path(lake_directory,"scores/parquet")) |> collect()
head(df)
#> # A tibble: 6 × 18
#> datetime family variable observation crps logs mean median
#> <dttm> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 2022-09-28 00:00:00 sample extc NA NA NA NA NA
#> 2 2022-09-28 00:00:00 sample salt NA NA NA 0.100 0.100
#> 3 2022-09-28 00:00:00 sample tempera… 20.5 0.432 1.47 21.0 21.1
#> 4 2022-09-29 00:00:00 sample extc NA NA NA 0.87 0.87
#> 5 2022-09-29 00:00:00 sample salt NA NA NA 0.0994 0.0993
#> 6 2022-09-29 00:00:00 sample tempera… 19.6 0.0329 -0.956 19.6 19.6
#> # ℹ 10 more variables: sd <dbl>, quantile97.5 <dbl>, quantile02.5 <dbl>,
#> # quantile90 <dbl>, quantile10 <dbl>, horizon <dbl>, depth <dbl>,
#> # site_id <chr>, model_id <chr>, reference_datetime <chr>
df |>
filter(variable == "temperature",
depth == 1) |>
ggplot(aes(x = datetime)) +
geom_ribbon(aes(ymin = quantile10, ymax = quantile90), alpha = 0.5, fill = "lightblue") +
geom_line(aes(y = median)) +
geom_point(aes(y = observation), color = "red") +
geom_vline(aes(xintercept = as_datetime(reference_datetime))) +
labs(title = "1 m water temperature forecast")