sampler_differential_evolution_deo_tempered_MCMC Class Reference

Parallel tempered Markov Chain Monte Carlo sampler.

#include "sampling/sampler_differential_evolution_deo_tempered_MCMC.h"


Public Member Functions

 sampler_differential_evolution_deo_tempered_MCMC (int seed)
 Class constructor, accepts the integer seed for a random number generator as an argument.
 
void run_sampler (likelihood _L, int start_length, int thin, int temp_stride, int chi2_stride, std::string chain_file, std::string lklhd_file, std::string chi2_file, std::string annealing_file, std::vector< double > means, std::vector< double > ranges, std::vector< std::string > var_names, bool continue_flag, int output_precision=6, int verbosity=0, std::vector< double > inverse_temperatures=std::vector< double >(0))
 Runs the sampler. Takes a likelihood object, the names of the output files, and tuning parameters for the sampler, and writes the MCMC chain, likelihoods and chi squared values to those files.
 
double RndGaussian (double, double, bool)
 Function to generate random numbers with a Gaussian distribution.
 
double RndUni (double, double)
 Function to generate real valued random numbers in an interval.
 
int RndUnint (int, int)
 Function to generate integer valued random numbers in an interval.
 
void set_cpu_distribution (int num_replicas, int num_walkers, int num_likelihood)
 Function to set the distribution of processors in different layers of parallelization.
 
void set_annealing_schedule (int num_rounds, int geometric_increase, double initial_spacing=1.15)
 Function to set the parameters of the tempering schedule, which uses the adaptation scheme from Syed et al. (2019). It requires the number of adaptation rounds and the geometric increase factor for the number of samples in each round.
 
void read_initial_ladder (std::string annealing_file)
 Reads in an annealing.dat file and sets the initial ladder for the sampler to the ladder from the final round in the annealing file.
 
void set_checkpoint (int ckpt_stride, std::string ckpt_file)
 Function to set the checkpoint/restart functionality.
 
void estimate_bayesian_evidence (std::vector< std::string > file_names, std::vector< double > temperatures, int burn_in)
 Function to estimate the Bayesian evidence using the Thermodynamic Integration method.
 
std::vector< double > find_best_fit (std::string chain_file, std::string lklhd_file)
 Finds the best fit within the provided chain file and returns it.
 

Private Member Functions

double update_annealing_params (double *beta, double R[])
 
double find_beta (Interpolator1D &fLambda, int k, double Lambda, double eps=1e-12, int MAX_ITR=1e8)
 

Private Attributes

int _nrounds
 Function that finds the communication barrier of the tempering problem.
 
int _b
 
double _initial_spacing
 
int TNum
 
int ChNum
 
int LKLHD_Num
 
int dim
 
int _ckpt_stride
 
bool default_cpu_distribution
 
bool _no_initial_ladder
 
std::string _old_annealing_file
 
std::string _ckpt_file
 
Ran2RNG _rng
 
GaussianRandomNumberGenerator< Ran2RNG > _grng
 
MPI_Comm E_COMM
 
MPI_Comm T_COMM
 
MPI_Comm L_COMM
 
MPI_Comm C_COMM
 
int E_size
 
int E_rank
 
int T_size
 
int T_rank
 
int L_size
 
int L_rank
 
int C_size
 
int C_rank
 

Detailed Description

Runs parallel tempered differential evolution ensemble sampling Markov Chain Monte Carlo chains to sample the likelihood surface. This routine uses parallel tempering on top of the differential evolution method of Cajo J. F. ter Braak (2006). The implementation closely follows that of B. Nelson et al. (2013), used for analyzing radial velocity observations (RUN DMC code). It also uses the DEO tempering and adaptation scheme from Syed et al. (2019). Given an object of type likelihood (which encompasses the likelihood, priors and chi squared), the sampler explores the likelihood surface over its dependent parameters and provides a sampling of the posterior probability distribution. The likelihood and the chi squared evaluated at the sampled points are also provided by the sampler.
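
The differential evolution move mentioned above can be sketched as follows. This is a minimal illustration of the ter Braak (2006) proposal, not the Themis implementation: the function name de_proposal, its arguments, and the use of std::mt19937 are assumptions made for this example. ter Braak suggests a scale of roughly gamma = 2.38 / sqrt(2 d) for a d-dimensional problem.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Differential evolution proposal for walker i (hypothetical helper):
//   x' = x_i + gamma * (x_a - x_b) + jitter * N(0, 1),
// where a and b are two distinct walkers, both different from i.
// Requires at least three walkers.
std::vector<double> de_proposal(const std::vector<std::vector<double>>& walkers,
                                std::size_t i, double gamma, double jitter,
                                std::mt19937& rng)
{
  const std::size_t n = walkers.size();
  std::uniform_int_distribution<std::size_t> pick(0, n - 1);

  // Draw two partner walkers, rejecting i and duplicates.
  std::size_t a, b;
  do { a = pick(rng); } while (a == i);
  do { b = pick(rng); } while (b == i || b == a);

  std::normal_distribution<double> noise(0.0, 1.0);
  std::vector<double> x(walkers[i]);
  for (std::size_t d = 0; d < x.size(); ++d)
    x[d] += gamma * (walkers[a][d] - walkers[b][d]) + jitter * noise(rng);
  return x;
}
```

The proposal would then be accepted or rejected with the usual Metropolis ratio evaluated at the walker's temperature.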

Member Function Documentation

void estimate_bayesian_evidence ( std::vector< std::string >  file_names,
std::vector< double >  temperatures,
int  burn_in 
)
Parameters
file_names: Vector of strings holding the names of the likelihood files corresponding to each tempered level.
temperatures: Vector containing the temperature values for the corresponding log-likelihood sample files. The order of temperatures and likelihood file names in their vectors must be the same.
burn_in: Number of burn-in samples to exclude from the analysis.
void read_initial_ladder ( std::string  annealing_file)
Parameters
annealing_file: File containing the annealing information for the run, namely the round, ladder, and rejection rates.
void run_sampler ( likelihood  _L,
int  start_length,
int  thin,
int  temp_stride,
int  chi2_stride,
std::string  chain_file,
std::string  lklhd_file,
std::string  chi2_file,
std::string  annealing_file,
std::vector< double >  means,
std::vector< double >  ranges,
std::vector< std::string >  var_names,
bool  continue_flag,
int  output_precision = 6,
int  verbosity = 0,
std::vector< double >  inverse_temperatures = std::vector<double>(0) 
)
Parameters
_L: An object of class likelihood.
start_length: Number of steps (differential evolution moves) taken by the ensemble sampler.
thin: Interval at which output is saved, e.g. if thin=10 then every 10th step is saved to the chain and likelihood files. Ideally this is set to the autocorrelation time of the sampler, but that is unknown before actually running the problem.
temp_stride: Number of steps between successive communications among chains of different temperatures.
chi2_stride: Number of steps between outputs of chi squared values.
chain_file: String variable holding the name of the output MCMC chain file.
lklhd_file: String variable holding the name of the output likelihood file, which contains the log-likelihood values for each MCMC step.
chi2_file: String variable holding the name of the output chi squared file, which contains the chi squared values for the MCMC chain.
annealing_file: String variable holding the name of the output file that contains the acceptance-rate statistics between tempering levels. A summary file is also produced, named by appending .summary to the annealing_file string.
means: Vector holding the mean values of the parameters, used for initializing the MCMC walkers.
ranges: Vector holding the standard deviations of the parameters, used for initializing the MCMC walkers.
var_names: Vector of strings holding the names of the sampled variables. The names are written as a header in the chain file. If the vector is empty, no header is generated. If the header is present, the Themis analysis tools can use it to correctly label the generated diagnostic plots.
continue_flag: Boolean variable. If set to true, the sampler uses a checkpoint file to restore its state and continue the run; if the output files exist, the new data is appended to them. If set to false, it starts a new chain, using the provided means and ranges to initialize it. Note that in the latter case existing output files will be overwritten.
output_precision: Sets the output precision, i.e. the number of significant digits used to represent a number in the sampler output files. The default precision is 6.
verbosity: If set to one, chain files are produced for all tempering levels; otherwise only the lowest temperature produces a chain file, which is the desired posterior probability distribution.
inverse_temperatures: Optional vector to set the \( \beta=1/T\) used for parallel tempering.
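
The communication between temperature levels that temp_stride controls can be illustrated with a deterministic even-odd (DEO) swap sweep in the spirit of Syed et al. (2019). This is a sketch under stated assumptions, not the Themis code: the function deo_sweep, its arguments, and the serial loop are hypothetical, whereas the real sampler exchanges states across MPI processes. A swap between levels k and k+1 is accepted with probability \( \min\{1, \exp[(\beta_k-\beta_{k+1})(\log L_{k+1}-\log L_k)]\} \).

```cpp
#include <cmath>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// One DEO sweep over a temperature ladder (hypothetical helper).
//   beta[k]           inverse temperature of level k
//   loglike[k]        log-likelihood of the state currently at level k
//   state_at_level[k] label of the state currently at level k
// Even sweeps propose the pairs (0,1), (2,3), ...; odd sweeps propose
// (1,2), (3,4), ... Returns the number of accepted swaps.
std::size_t deo_sweep(const std::vector<double>& beta,
                      std::vector<double>& loglike,
                      std::vector<std::size_t>& state_at_level,
                      std::size_t sweep, std::mt19937& rng)
{
  std::uniform_real_distribution<double> uni(0.0, 1.0);
  std::size_t accepted = 0;
  for (std::size_t k = sweep % 2; k + 1 < beta.size(); k += 2) {
    // Log of the Metropolis ratio for exchanging the two states.
    const double log_alpha = (beta[k] - beta[k + 1]) * (loglike[k + 1] - loglike[k]);
    if (std::log(uni(rng)) < log_alpha) {
      std::swap(loglike[k], loglike[k + 1]);
      std::swap(state_at_level[k], state_at_level[k + 1]);
      ++accepted;
    }
  }
  return accepted;
}
```

Alternating the even and odd pairings on successive sweeps is what distinguishes DEO from proposing a randomly chosen pair each time.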


void set_annealing_schedule ( int  num_rounds,
int  geometric_increase,
double  initial_spacing = 1.15 
)
Parameters
num_rounds: Number of adaptation rounds to run.
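
As a rough illustration of what a geometric increase means for the run length, the sketch below tabulates the steps per round under the assumption that round r runs start_length times geometric_increase to the power r steps. That rule, the helper name round_lengths, and the use of start_length as the first round's length are assumptions for this example and are not confirmed by the documentation.

```cpp
#include <vector>

// Steps per adaptation round, assuming each round is geometric_increase
// times longer than the previous one (hypothetical rule and helper).
std::vector<long long> round_lengths(int num_rounds, int geometric_increase,
                                     long long start_length)
{
  std::vector<long long> steps;
  long long n = start_length;
  for (int r = 0; r < num_rounds; ++r) {
    steps.push_back(n);
    n *= geometric_increase;
  }
  return steps;
}
```

Under this assumption, num_rounds = 4, geometric_increase = 2 and start_length = 100 give rounds of 100, 200, 400 and 800 steps, so the final round accounts for more than half of the total.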
void set_checkpoint ( int  ckpt_stride,
std::string  ckpt_file 
)
Parameters
ckpt_stride: Number of steps between writing a new checkpoint.
ckpt_file: String variable holding the name of the output checkpoint file.
void set_cpu_distribution ( int  num_replicas,
int  num_walkers,
int  num_likelihood 
)
Parameters
num_replicas: Integer value ( \( \geq 1 \)). Number of replicas of the Monte Carlo process, i.e. the number of temperatures. If set to one, the sampler runs without tempering.
num_walkers: Number of walkers used by the ensemble sampler. This should be at least a few times the dimension of the parameter space.
num_likelihood: Number of threads allocated for each likelihood calculation.
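
One way to picture the three layers of parallelization is as a decomposition of a flat process rank into a replica index, a walker index and a likelihood index. The sketch below assumes a row-major layout with num_replicas * num_walkers * num_likelihood processes in total. Both the layout and the names LayerIndex and decompose_rank are hypothetical, and the actual mapping used to build the sampler's MPI communicators may differ.

```cpp
// Position of a process within the three parallelization layers
// (hypothetical type for illustration).
struct LayerIndex {
  int replica;     // which temperature level
  int walker;      // which walker group within that level
  int likelihood;  // which process within one likelihood evaluation
};

// Row-major decomposition of a flat rank; the likelihood index varies fastest.
LayerIndex decompose_rank(int world_rank, int num_walkers, int num_likelihood)
{
  LayerIndex idx;
  idx.likelihood = world_rank % num_likelihood;
  idx.walker = (world_rank / num_likelihood) % num_walkers;
  idx.replica = world_rank / (num_likelihood * num_walkers);
  return idx;
}
```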

The following plot shows how the sampler scales with different numbers of walkers per MPI process. The green line shows the ideal case of linear scaling, where the run time is inversely proportional to the number of MPI processes used. The purple line shows how the sampler actually scales with the number of MPI processes. As can be seen in the figure, the scaling closely follows the linear case and always remains within \(20\%\) of ideal linear scaling.

Figure (sampler_scaling2.png): Sampler scaling plot. The green line shows the ideal linear scaling.
