FLAREr configurations
A guide to the variables in `configure_flare.yml` and in the `observations_config.csv`, `parameter_calibration_config.csv`, and `states_config.csv` files.
configure_flare.yml
configure_flare.yml is required to be located in your
{lake_directory}/configurations/{config_set_name}
directory.
location:
- site_id: four-letter code for the lake
- name: name of the lake
- latitude: latitude in degrees north
- longitude: longitude in degrees east
da_setup:
- da_method: code for the data assimilation method (`enkf` or `pf`)
  - `enkf`: ensemble Kalman filter
  - `pf`: bootstrap resampling particle filter
- par_fit_method: method for parameter fitting
  - `inflate` uses the `inflat_pars` parameter in the par configuration to increase the variance of the parameters when data are assimilated. This is the method used in Thomas et al. 2023.
  - `perturb` adds normal random noise to each parameter based on the `perturb_par` parameter in the par configuration.
  - `perturb_const`: data assimilation fits the mean of the parameter distribution and uses a specified variance for parameters, defined by the `perturb_par` parameter in the par configuration.
- ensemble_size: number of ensemble members
- localization_distance: distance in meters over which covariances in the EnKF covariance matrix are diminished. The distance governs the exponential decay of the covariance strength.
- no_negative_states: Force non-temperature states to be positive (`TRUE` or `FALSE`)
- assimilate_first_step: Assimilate data provided by the initial conditions. Set to `FALSE` if the initial conditions already have data assimilated. (`TRUE` or `FALSE`)
- use_obs_constraint: Assimilate observations (`TRUE` or `FALSE`)
- obs_filename: file name of the targets file. It is required to be located in the `lake_directory/targets/{site_id}` directory.
- pf_always_resample: Force the particle filter to resample each timestep (only used when `da_method = pf`) (`TRUE` or `FALSE`)
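As a concrete illustration, a `da_setup` block might look like the following (all values, including the targets file name, are illustrative choices rather than recommended defaults):

```yaml
da_setup:
  da_method: enkf              # or "pf" for the bootstrap particle filter
  par_fit_method: inflate      # inflate, perturb, or perturb_const
  ensemble_size: 256           # illustrative; choose based on compute budget
  localization_distance: 1000  # meters; illustrative
  no_negative_states: TRUE
  assimilate_first_step: FALSE
  use_obs_constraint: TRUE
  obs_filename: fcre-targets-insitu.csv  # hypothetical targets file name
  pf_always_resample: FALSE    # only used when da_method = pf
```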
model_settings:
- ncore: number of processor cores to use
- model_name: name of the process model (`glm` or `glm_aed`)
- base_GLM_nml: name of the base GLM namelist. It is required to be in the `lake_directory/configuration/{config_set}` directory.
- max_model_layers: maximum number of layers allowed in the GLM simulations
- modeled_depths: vector of depths (m) with observations or for which the user desires output. The value is for the top of the layer.
- par_config_file: name of the parameter configuration csv (e.g., parameter_calibration_config.csv). It is required to be in the `lake_directory/configuration/{config_set}` directory.
- obs_config_file: name of the observation configuration csv. It is required to be in the `lake_directory/configuration/{config_set}` directory.
- states_config_file: name of the state configuration csv. It is required to be in the `lake_directory/configuration/{config_set}` directory.
- depth_model_sd_config_file: Optional state configuration file that specifies how process uncertainty depends on depth. If used, it is required to be in the `lake_directory/configuration/{config_set}` directory.
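A minimal `model_settings` block might look like this (the depths and layer count are illustrative; the file names follow the configuration files described in this guide):

```yaml
model_settings:
  ncore: 4
  model_name: glm                  # or glm_aed
  base_GLM_nml: glm3.nml
  max_model_layers: 75             # illustrative
  modeled_depths: [0.0, 1.0, 2.0, 4.0, 8.0]
  par_config_file: parameter_calibration_config.csv
  obs_config_file: observations_config.csv
  states_config_file: states_config.csv
```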
default_init:
- lake_depth: initial lake depth (meters) 
- temp: vector of temperatures defining the initial temperature profile (°C)
- temp_depths: vector of depths (m) for the initial temperature profile
- salinity: initial salinity value (g/kg) 
- snow_thickness: initial snow thickness (m) 
- white_ice_thickness: initial white ice thickness (m) 
- blue_ice_thickness: initial blue ice thickness (m) 
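For example, a `default_init` block for an ice-free summer start might be (all values are illustrative):

```yaml
default_init:
  lake_depth: 9.4                          # m; illustrative
  temp: [25.0, 24.0, 22.0, 18.0, 12.0]     # °C; one value per temp_depths entry
  temp_depths: [0.0, 1.0, 3.0, 5.0, 9.0]   # m
  salinity: 0.0                            # g/kg
  snow_thickness: 0.0                      # m
  white_ice_thickness: 0.0                 # m
  blue_ice_thickness: 0.0                  # m
```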
met:
- future_met_model: path to the met model used for forecast days (relative to the `s3$drivers$bucket` path or `local_met_directory`). It defines the form of the path partitioning. For example, `met/gefs-v12/stage2/reference_datetime={reference_date}/site_id={site_id}` provides the path, with the parts in braces being updated within the FLARE run.
- historical_met_model: path to the met model used for historical days (relative to the `s3$drivers$bucket` path or `local_met_directory`). It defines the form of the path partitioning. For example, `met/gefs-v12/stage3/site_id={site_id}` provides the path, with the parts in braces being updated within the FLARE run.
- forecast_lag_days: number of days to look backward for a forecast
- use_ler_vars: use LER standardized met names (`TRUE` or `FALSE`)
- historical_met_use_s3: access historical met data on an s3 bucket (`TRUE` or `FALSE`)
- future_met_use_s3: access future met data on an s3 bucket (`TRUE` or `FALSE`)
- use_openmeteo: use open-meteo for meteorology inputs (`TRUE` or `FALSE`)
- openmeteo_api: the name of the open-meteo API to use (only used if `use_openmeteo = TRUE`); options are `seasonal`, `ensemble_forecast`, `historical`, and `climate`
- openmeteo_model: name of the open-meteo model to use (only used if `use_openmeteo = TRUE`); see https://open-meteo.com/en/docs for the list of models
- use_openmeteo_archive: use an archived version of open-meteo data on s3 rather than calling the API directly (only used if `use_openmeteo = TRUE`)
- local_met_directory: directory where meteorology forecasts are saved if not using s3 access. Relative to the `lake_directory`.
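Putting these together, a `met` block using the GEFS path templates from the examples above might look like this (the lag, boolean choices, and local directory are illustrative):

```yaml
met:
  future_met_model: met/gefs-v12/stage2/reference_datetime={reference_date}/site_id={site_id}
  historical_met_model: met/gefs-v12/stage3/site_id={site_id}
  forecast_lag_days: 1          # illustrative
  use_ler_vars: FALSE
  historical_met_use_s3: TRUE
  future_met_use_s3: TRUE
  use_openmeteo: FALSE
  local_met_directory: drivers  # hypothetical local directory
```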
inflow:
- include_inflow: Include inflows in simulations (`TRUE` or `FALSE`)
- include_outflow: Include outflows in simulations (`TRUE` or `FALSE`)
- future_inflow_model: path to the inflow model used for forecast days (relative to the `s3$inflow$bucket` path or `local_inflow_directory`). It defines the form of the path partitioning. For example, `inflow/model_id=historical/reference_datetime={reference_date}/site_id={site_id}` provides the path, with the parts in braces being updated within the FLARE run.
- historical_inflow_model: path to the inflow model used for historical days (relative to the `s3$inflow$bucket` path). It defines the form of the path partitioning. For example, `inflow/model_id=historical/site_id={site_id}` provides the path, with the parts in braces being updated within the FLARE run.
- local_inflow_directory: directory where inflow forecasts are saved if not using s3 access. Relative to the `lake_directory`.
- future_outflow_model: path to the outflow model used for forecast days (relative to the `s3$outflow$bucket` path or `local_outflow_directory`)
- historical_outflow_model: path to the outflow model used for historical days (relative to the `s3$outflow$bucket` path or `local_outflow_directory`)
- local_outflow_directory: directory where outflow forecasts are saved if not using s3 access. Relative to the `lake_directory`.
- use_flows_s3: access flow models on an s3 bucket (`TRUE` or `FALSE`)
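An `inflow` block using the path templates from the examples above might be (boolean choices and local directories are illustrative):

```yaml
inflow:
  include_inflow: TRUE
  include_outflow: TRUE
  future_inflow_model: inflow/model_id=historical/reference_datetime={reference_date}/site_id={site_id}
  historical_inflow_model: inflow/model_id=historical/site_id={site_id}
  use_flows_s3: FALSE
  local_inflow_directory: drivers/inflow    # hypothetical
  local_outflow_directory: drivers/outflow  # hypothetical
```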
uncertainty:
- observation: Include uncertainty in observations (`TRUE` or `FALSE`)
- process: Include normal random noise added to states during the forecast (`TRUE` or `FALSE`)
- weather: Include multiple weather forecast ensemble members (`TRUE` or `FALSE`)
- initial_condition: Include uncertainty in states at the initiation of the forecast (`TRUE` or `FALSE`)
- parameter: Include uncertainty in parameters during the forecast (`TRUE` or `FALSE`)
- inflow: Include uncertainty in inflow during the forecast (`TRUE` or `FALSE`)
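A forecast with all uncertainty sources turned on would be configured as:

```yaml
uncertainty:
  observation: TRUE
  process: TRUE
  weather: TRUE
  initial_condition: TRUE
  parameter: TRUE
  inflow: TRUE
```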
output_settings:
- diagnostics_names: names of non-state GLM variables to save
- generate_plots: generate diagnostic plots (`TRUE` or `FALSE`)
- diagnostics_daily:
  - names: the variable names in the csv file produced by GLM. Options are at https://github.com/AquaticEcoDynamics/glm-aed/wiki/Navigating-GLM-outputs
  - save_names: the name of the variable in the FLARE forecast output. This may differ from the csv name when you want to add more information to the variable name. For example, the csv may have `temp` but you want to save it as `outflow_temp` so you know it is from the outflow. If the output file is `output.nc`, then the save name needs to include the aggregation function (`mean`, `min`, and `max` are supported). For example, a value of `temp_mean` would calculate the daily mean from the sub-daily output.nc file. `nsave` in the glm3.nml file needs to be adjusted to output at sub-daily time steps (`nsave = 1` would output hourly if `dt = 3600`).
  - file: the name of the GLM output csv that has the variable. This is defined in the GLM nml. Examples are `lake.csv`, `outlet_00.csv`, and `output.nc`.
  - depth: depth for the daily diagnostic. Use `NA` for variables that are not associated with a depth (like CO2 flux).
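For example, a daily diagnostic that saves outflow temperature under a renamed variable (the diagnostic variable and file names are illustrative):

```yaml
output_settings:
  diagnostics_names: [extc]      # illustrative non-state GLM variable
  generate_plots: TRUE
  diagnostics_daily:
    names: [temp]
    save_names: [outflow_temp]   # renamed so the source file is clear
    file: [outlet_00.csv]
    depth: [NA]                  # outflow temperature has no associated depth
```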
 
s3:
- drivers:
  - endpoint: s3 endpoint of met drivers
  - bucket: s3 bucket of met drivers
- inflow_drivers:
  - endpoint: s3 endpoint of inflow drivers
  - bucket: s3 bucket of inflow drivers
- outflow_drivers:
  - endpoint: s3 endpoint of outflow drivers
  - bucket: s3 bucket of outflow drivers
- targets:
  - endpoint: s3 endpoint of target files
  - bucket: s3 bucket of target files
- forecasts_parquet:
  - endpoint: s3 endpoint of forecast parquet files
  - bucket: s3 bucket of forecast parquet files
- restart:
  - endpoint: s3 endpoint of the restart yaml file
  - bucket: s3 bucket of the restart yaml file
- scores:
  - endpoint: s3 endpoint of scores
  - bucket: s3 bucket of scores
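A partial `s3` block is sketched below; every endpoint and bucket value here is hypothetical and must be replaced with your own storage locations:

```yaml
s3:
  drivers:
    endpoint: s3.example.org     # hypothetical endpoint
    bucket: drivers/met          # hypothetical bucket path
  targets:
    endpoint: s3.example.org
    bucket: targets
  forecasts_parquet:
    endpoint: s3.example.org
    bucket: forecasts/parquet
  restart:
    endpoint: s3.example.org
    bucket: restart
  scores:
    endpoint: s3.example.org
    bucket: scores
```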
 
parameter_calibration_config.csv
parameter_calibration_config.csv is required to be
located in your
{lake_directory}/configurations/{config_set_name}
directory.
- par_names: vector of GLM names of the parameters being estimated
- par_names_save: vector of parameter names to use in output and plots
- par_file: vector of nml or csv file names that contain each parameter being estimated
- par_init: vector of initial mean values for the parameters
- par_init_lowerbound: vector of lower bounds for the initial uniform distribution of the parameters
- par_init_upperbound: vector of upper bounds for the initial uniform distribution of the parameters
- par_lowerbound: vector of lower bounds that a parameter can have
- par_upperbound: vector of upper bounds that a parameter can have
- perturb_par: the parameter controlling the noise or spread in the parameters. If the parameter fitting method is `perturb`, it is the standard deviation of the normally distributed random noise that is added to the parameters. If the parameter fitting method is `perturb_const`, it is the standard deviation of the parameter distribution. If the parameter fitting method is `inflate`, it is the variance inflation factor applied to the parameter component of the ensemble (value greater than 1).
- par_units: Units of parameter for plotting
- fix_par: 0 = fit parameter, 1 = hold parameter at par_init
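A single-parameter example row, estimating the GLM longwave factor (the parameter choice, bounds, and perturbation value are illustrative, not recommendations):

```
par_names,par_names_save,par_file,par_init,par_init_lowerbound,par_init_upperbound,par_lowerbound,par_upperbound,perturb_par,par_units,fix_par
lw_factor,lw_factor,glm3.nml,1.0,0.9,1.1,0.5,1.5,0.05,-,0
```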
states_config.csv
states_config.csv is required to be located in your
{lake_directory}/configurations/{config_set_name}
directory.
- state_names: names of the states
- initial_conditions: The initial conditions for the state if observations are not available to initialize. Assumes the initial conditions are constant over all depths, except for temperature, which uses the `default_temp_init` variable in `configure_flare.R` to set the depth profile when observations are lacking.
- model_sd: the standard deviation of the process error for the state
- vert_decorr_length: the vertical decorrelation length (m) of the process error across depths
- initial_model_sd: the standard deviation of the initial distribution of the state
- states_to_obs_mapping: a multiplier on the state to convert it to the observation. In most cases this is 1. However, in the case of phytoplankton, the model predicts biomass in mmol/m3 but the observations are chlorophyll-a in ug/L. Therefore the multiplier is the biomass-to-chlorophyll conversion.
- states_to_obs_1: the observation that the state contributes to
  - `NA` is required if there are no matching observations
  - the name in this column must match an observation name
- states_to_obs_2: a second observation that the state contributes to
  - `NA` is required if there are no matching observations
  - the name in this column must match an observation name
- init_obs_name: the name of observation that is used to initialize the state if there is an observation
- init_obs_mapping: a multiplier on the observation when used to initialize. For example, if using a combined DOC measurement to initialize two DOC states, you need to provide the proportion of the observation that is assigned to each state.
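An illustrative single-state row for temperature, using the column names described above (all numeric values are hypothetical, and the observation name must match one in observations_config.csv):

```
state_names,initial_conditions,model_sd,vert_decorr_length,initial_model_sd,states_to_obs_mapping,states_to_obs_1,states_to_obs_2,init_obs_name,init_obs_mapping
temp,20,0.5,1.0,1.0,1,temperature,NA,temperature,1
```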
depth_model_sd.csv
depth_model_sd.csv is optional and should be located in
your {lake_directory}/configurations/{config_set_name}
directory.
- depth: depth (m)
- column names: variable name for states that have depth varying process uncertainty. Values are the sd for each depth. sd will be interpolated between and extrapolated beyond the depths provided.
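For example, a file specifying temperature process uncertainty that grows with depth (depths and sd values are illustrative):

```
depth,temp
0,0.5
4,0.75
8,1.0
```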
observations_config.csv
observations_config.csv is required to be located in
your {lake_directory}/configurations/{config_set_name}
directory.
- state_names_obs: names of states with observations
- obs_sd: the standard deviation of the observation uncertainty
- target_variable: the name of the variable in the data file that is used for the observed state.
- multi_depth: 1 = observation has multiple depths, 0 = observation does not have a depth associated with it.
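An illustrative row for temperature observations (the target variable name must match the variable name in your targets file; the sd value is hypothetical):

```
state_names_obs,obs_sd,target_variable,multi_depth
temp,0.1,temperature,1
```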