Developing leapfrog

This vignette details how to make changes and add extensions to leapfrog. Leapfrog is organised and structured so that different researchers can write extensions to the model, and so that we can turn these extensions on or off at compile time to run different variants of the model.

The overall aim is to allow researchers to write these model extensions with as little overhead as possible. If you find something annoying or difficult, let me know and we can try to simplify it, or at least document it better.

Prerequisites

R for generating test data and running the R wrapper
uv for codegen and Python wrappers
(Optionally) CMake (>3.15) if running the C++ standalone wrapper or Python wrappers

Initial setup

To make changes to leapfrog you will first need to:

Create the test data by running ./scripts/create_test_data.R from the project root
Generate the C++ code by running ./scripts/generate from the project root

You should then be able to run leapfrog via R, Python, or C++. See the README.md in the specific wrappers.

Structure

Model variants

Leapfrog has a set of ModelVariants which can be run; see leapfrog-core/model_schemas/ModelVariants.json for details. Each model variant is a collection of boolean switches or enums. These turn different parts of the code on or off when it is run, so that leapfrog can have different extensions and we can compose them in any way we want. Model variants are evaluated at compile time: the compiled binary contains a separate instance of the model for each variant your code calls. This makes the binary larger, but we chose this approach because we wanted the speed of compile-time polymorphism.

All model functions are templated on the model variant, and any conditional behaviour based upon the model variant should be written as an if constexpr.
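
As a minimal sketch of this pattern, the example below branches on a compile-time flag. The variant structs, the run_hiv_simulation flag, the hiv_stages constant and the n_compartments function are hypothetical names chosen for illustration, not leapfrog's actual types. Because the branch is an if constexpr inside a template, it is discarded for variants where the flag is false, so it can refer to members that only the HIV variant defines.

```cpp
// Hypothetical variants: each exposes its switches as compile-time constants.
struct DemographicOnly {
  static constexpr bool run_hiv_simulation = false;
};

struct WithHiv {
  static constexpr bool run_hiv_simulation = true;
  static constexpr int hiv_stages = 7;  // only defined when HIV is modelled
};

// Number of compartments tracked per age group for a given variant.
template <typename ModelVariant>
constexpr int n_compartments() {
  int n = 1;  // the HIV-negative population is always tracked
  if constexpr (ModelVariant::run_hiv_simulation) {
    // Not instantiated for DemographicOnly, so the missing hiv_stages
    // member there is not an error.
    n += ModelVariant::hiv_stages;
  }
  return n;
}
```

With a plain if instead of if constexpr, n_compartments<DemographicOnly>() would fail to compile, because both branches would have to be valid for every variant.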

Model variant functions are in the models directory. Every model variant should be created as a struct. We can use the struct to alias state space variables or bits of the config to make the code more easily readable. The struct should expose at least one public function which can then be called from the project_year function.

To explain what is going on in the model struct we can look at the adult HIV model as an example:

Templating on Config - this is used to ensure that the code is not compiled when we’re running a variant in which it is disabled.
We alias parts of the config and define private state space variables so we do not need to use fully qualified names later in the code and can instead use the shorthand for readability.
We define args as part of the struct constructor. These hold the actual runtime data, such as:
- t - the current time step of the model as an index, e.g. if running for 1970:2030, this will start at 1 and loop to 61. Any input data you read based on the time step index should have 1970 at the first index, which is index 1 in R and index 0 in C++.
- pars - the parameters for this model, these are read-only values
- state_curr - the state at the current point in time. This is read-only.
- state_next - the state at the next time point. This is what we are currently populating from the previous time step and the parameters.
- intermediate - a place for storing any data used within a single time step. Use this to store intermediate values for use later in your code. This is reset to all 0s at the end of every time step.
One or more public functions that we can call from project_year
Zero or more private functions. These do the actual work of the model and will never be called from outside the struct. You can create as many or as few as you want; use them to organise different parts of the model code as needed.
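
The indexing convention for time-indexed input data can be sketched as follows. The helper names and the fixed first_year are assumptions for illustration; leapfrog does not necessarily provide functions like these.

```cpp
// Hypothetical helpers; first_year matches the 1970:2030 example above.
constexpr int first_year = 1970;

// 0-based position of a year in time-indexed input data, as read in C++.
constexpr int cpp_index(int year) { return year - first_year; }

// 1-based position of the same year in the corresponding R vector or array.
constexpr int r_index(int year) { return cpp_index(year) + 1; }
```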

Note that after each time step the code will do the following:

Optionally save out the state (see the State saving section).
Replace state_curr with state_next.
Set new state_next to all 0s.
Set intermediate to all 0s.
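
The end-of-step bookkeeping listed above can be sketched as follows. The State and Intermediate structs here are much-reduced, hypothetical stand-ins for the generated types, and end_time_step and total_after are illustrative names rather than functions from the leapfrog code base.

```cpp
#include <array>

// Hypothetical, much-reduced stand-ins for the generated types.
struct State {
  std::array<double, 2> p_totpop{};
};
struct Intermediate {
  std::array<double, 2> births{};
};

// Bookkeeping done once a time step has been projected (and optionally saved).
constexpr void end_time_step(State& state_curr, State& state_next,
                             Intermediate& intermediate) {
  state_curr = state_next;        // the projected state becomes the current state
  state_next = State{};           // new state_next starts as all 0s
  intermediate = Intermediate{};  // scratch data is cleared
}

// Toy projection: the population in slot 0 grows by one per time step.
constexpr double total_after(int n_steps) {
  State state_curr{};
  State state_next{};
  Intermediate intermediate{};
  for (int t = 1; t <= n_steps; ++t) {
    intermediate.births[0] = 1.0;
    state_next.p_totpop[0] = state_curr.p_totpop[0] + intermediate.births[0];
    end_time_step(state_curr, state_next, intermediate);
  }
  return state_curr.p_totpop[0];
}
```

Because state_next and intermediate start every step as zeros, model code can accumulate into them with += without clearing them first.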

State saving

Leapfrog runs a top-level loop over the time steps. At the end of each time step, the state is optionally saved out and eventually returned. We do it this way because it decouples the reporting of the model from each time step iteration. When you run the model with run_model you can specify which years you want to output data for. By default it will output data for all time steps, but if, say, you are only interested in the last time step, you can return just that by running e.g.

run_model(data, parameters, 1970:2030, 10, 2030)

The output for the selected time steps is managed by the internal OutputState struct; see e.g. leapfrog-core/include/leapfrog.hpp.
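
To illustrate how reporting can be decoupled from the time loop, here is a rough sketch of a store that keeps values only for the requested years. OutputSketch, save_if_requested and run_and_report_last are hypothetical names; this is not the interface of the real OutputState struct, and the population numbers are made up.

```cpp
#include <array>

// Hypothetical, reduced stand-in for an output store with one slot per
// requested output year. This is not the real OutputState interface.
template <int NOutput>
struct OutputSketch {
  std::array<int, NOutput> output_years{};
  std::array<double, NOutput> p_totpop{};

  // Copy `value` into the matching slot if `year` was requested for output.
  constexpr void save_if_requested(int year, double value) {
    for (int i = 0; i < NOutput; ++i) {
      if (output_years[i] == year) {
        p_totpop[i] = value;
      }
    }
  }
};

// Toy run over 1970..2030 that reports only the final year.
constexpr double run_and_report_last() {
  OutputSketch<1> out{};
  out.output_years[0] = 2030;
  double total = 100.0;  // made-up starting population
  for (int year = 1970; year <= 2030; ++year) {
    out.save_if_requested(year, total);
    total += 1.0;  // made-up growth of one person per year
  }
  return out.p_totpop[0];
}
```

The time loop itself does not need to know which years were requested; it offers every time step to the store, and the store decides what to keep.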

Modifying config to add new input, output or intermediate data

Leapfrog uses code generation to write the code for wiring up the input and output data. This reduces the number of locations you need to change when you add new input data or return new data from the model. In short, it amounts to updating one of the configs in leapfrog-core/model_schemas/configs and running scripts/generate.

Config structure

The config is JSON and has the following sections:

name - The name of this model variant type, it should be short. It is used in C++ code for the type which holds the state space, input data, intermediate data and output data associated with this variant.
long_name - A long name for the model variant, at the moment used only for the Delphi interface (to distinguish between the 2-digit module codes used by Avenir internally)
namespace - The name of an instance of the model variant, used in C++ code.
enable_if - A conditional for when the model variant should be active. This is a compile time conditional based on model variant booleans. When the condition is true, the input data must be supplied and output data will be returned.
state_space - An object containing named integers. These are the dimensions for the statically allocated input, intermediate and output data. A variant can use state space parameters from other variants.
pars - The inputs to the model used in this variant. You can use parameters supplied by other variants. Each parameter can define:
1. A “num_type” which should be “int” or “real_type”. “real_type” is used by TMB but for normal running is just a double.
2. (optionally) “dims” - which is an array of sizes, it can use values from the state space, options or expressions e.g. opts.proj_steps * opts.hts_per_year. If no “dims” are set, the parameter is assumed to be a scalar value.
3. (optionally) “alias” - a named list of aliases for the different language interfaces. At the moment only “r” is used, and it should be removed in the future. This should not be used for new parameters.
intermediate - used to define any intermediate bits of data used during the model run. We define them here because then they can be statically allocated instead of allocated every iteration of the time loop. These are automatically set to 0 at the end of every time step. Each piece of intermediate data needs a “num_type” and “dims”.
state - data output from the model. Defined as a JSON object, these are the results filled in during a model run. Each item needs a “num_type” and (optionally) “dims”. When a model is run the output is a named list/dictionary with keys matching those of the JSON object and values corresponding to the “dims” with an additional time dimension. So if the state defines p_totpop with dims ["SS::pAG", "SS::NS"] the output will have p_totpop with 3 dimensions of lengths pAG, NS and number of output years.
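
To make the link between “dims” and static allocation concrete, here is a rough sketch of what fixed-size storage for p_totpop could look like. The state space values, the use of std::array, the column-major layout and the helper names are all assumptions for illustration; the generated code may use a different container and layout.

```cpp
#include <array>
#include <cstddef>

// Hypothetical state space sizes; in leapfrog they come from the
// state_space section of the config.
struct SS {
  static constexpr int pAG = 81;
  static constexpr int NS = 2;
};

// dims ["SS::pAG", "SS::NS"] give a buffer whose size is fixed at compile time.
// std::array is an assumption here; the generated code may use another container.
struct State {
  std::array<double, SS::pAG * SS::NS> p_totpop{};
};

// Flat position of (age, sex), assuming a column-major layout as in R arrays.
constexpr int flat_index(int age, int sex) { return age + SS::pAG * sex; }

// Number of values returned for p_totpop once the time dimension is added.
constexpr std::size_t output_length(int n_output_years) {
  return State{}.p_totpop.size() * static_cast<std::size_t>(n_output_years);
}
```

Since every size is a compile-time constant, no memory needs to be allocated inside the time loop.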

After making changes to the config, run the generate script scripts/generate. Note that this will update several of the generated files. The generated files should never be changed manually, as the generate script will completely rewrite them.

After regenerating the code, rebuild the project and you are ready to use the new input data in C++.

Adding a new model variant

There are two types of model variants at the moment:

A flag which turns a section of code on or off. This will come with additional model inputs and outputs
A flag which changes the dimensions of some model input or outputs

The required changes differ depending on whether your new variant is of the first type, which brings additional input or output data, or of the second type, which does not bring additional data.

To add a new variant, first make the following changes in the code generation:

Add a new flag and variant in leapfrog-core/model_schemas/ModelVariants.json. Make sure the new flag is set to true or false in all other model variants.
If you have new input data or output data for this variant, add a new config file in the configs dir and fill in the details as required
Run the generate script

In the C++ code:

Add new model code for your variant. At this point I would just add a skeleton with a print line to check your setup works, and fill in the actual model code later. Model code should go in the models directory. You can use the following snippet as a template:

#pragma once

#include <iostream>

#include "../options.hpp"
#include "../generated/config_mixer.hpp"

namespace leapfrog {
namespace internal {

// model_variant_flag1 & 2 need to be in Pascal case; the model can require one or multiple model variant flags
template<typename Config>
concept {{ model_name }}Enabled = {{ model_variant_flag1 }}<Config> && {{ model_variant_flag2 }}<Config>;

template<typename Config>
struct {{ model_name }} {
  {{ model_name }}(...) {};
};

template<{{ model_name }}Enabled Config>
struct {{ model_name }}<Config> {
  using real_type = typename Config::real_type;
  using ModelVariant = typename Config::ModelVariant;
  using SS = Config::SS;
  using Pars = Config::Pars;
  using State = Config::State;
  using Intermediate = Config::Intermediate;
  using Args = Config::Args;

  // function args
  int t;
  const Pars& pars;
  const State& state_curr;
  State& state_next;
  Intermediate& intermediate;
  const Options<real_type>& opts;

  // only exposing the constructor and some methods
  public:
  {{ model_name }}(Args& args):
    t(args.t),
    pars(args.pars),
    state_curr(args.state_curr),
    state_next(args.state_next),
    intermediate(args.intermediate),
    opts(args.opts)
  {};

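  // placeholder body to check the wiring; replace with real model code later
  void run() { std::cout << "Running new model\n"; }
};

} // namespace internal
} // namespace leapfrog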

Call your new model variant at the appropriate point in the project_year function

You now need to update the wrappers you want to expose the model variant to. For the R wrapper:

Add the variant in the available model configurations.
Add a mapping from the string to the model variant struct in the Rcpp wrapper code in src/leapfrog.cpp.
Add a mapping from the string to the model variant struct in the get_leapfrog_ss function.
Add a test which calls your new model variant, to make sure everything is wired up correctly.

For the Python wrapper:

Add the variant in the available model configurations.
Add a mapping from the string to the model variant struct in the wrapper code.
Add a mapping from the string to the model variant struct in the get_leapfrog_ss function.

The C/Delphi interface and C++ interface have more specific usages so we probably don’t need to expose new variants via these interfaces. But if we do, speak to Rob for help with how to do this.