convergence_tools Namespace Reference

Functions

def next_pow_two (n)
 
def autocorr_func_1d
 
def auto_window (taus, c)
 
def ensemble_autocorr
 
def mean_ess
 
def read_elklhd (lname, thin, warmup_frac)
 
def read_echain
 
def load_erun
 
def join_echains (echains)
 
def mean_echain (echain)
 
def var_echain (echain)
 
def ranknormalized_chains (chains)
 
def fold_chains (chains)
 
def bulk_split_rhat (chains)
 
def split_rhat (chains)
 
def _rhat (chains)
 
def find_eautocorr (echain)
 
def plot_rank_hist
 
def save_ensemble_diagnostics
 

Detailed Description

Some guidelines for the functions here

This contains the functions we are currently using for the convergence
tools in Themis. Since we are using ensemble samplers, to find the
autocorrelation time (ACT) I typically use the integrated ACT (IACT),
where the ACT is calculated for each chain and then averaged.

The interesting question then becomes the ESS, which is usually
the number of steps divided by the IACT. This is important because the
Monte Carlo error of the chain estimates is a function of the ESS, NOT
THE NUMBER OF STEPS. Furthermore, finding the total ESS across all
chains is not simple. This is because the walkers are all correlated,
so you can't really just add them. This is something I need to look
into further.


Additionally, if in the future you use multiple INDEPENDENT RUNS, then
you should compute the Rhat statistic between the runs. If the Rhat is
> 1.05, then the chains are not converged and you should restart the
runs and run them at least a factor of 2 longer.
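
As a rough end-to-end sketch of how these pieces fit together, using the functions documented below (the file names, thinning value, and single-run setup are placeholders, not part of the module):

    import numpy as np
    import convergence_tools as ct

    # Load a hypothetical Themis run; chain has walker,steps,param indexing.
    chain, lklhd = ct.load_erun("Chain.dat", "Lklhd.dat", thin=10)

    # IACT per parameter (averaged over walkers) and the implied per-walker ESS.
    taus = ct.find_eautocorr(chain)
    ess = chain.shape[1] / taus

    # For multiple INDEPENDENT runs, reduce each ensemble to a single chain
    # and check the rank-normalized split-rhat against the 1.05 threshold.
    runs = [chain]                                          # add more runs here
    single = np.array([ct.mean_echain(r) for r in runs])   # runs,steps,params
    if np.any(ct.bulk_split_rhat(single) > 1.05):
        print("Not converged: rerun at least a factor of 2 longer.")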

Function Documentation

def convergence_tools._rhat (   chains)
private
Utility function
Computes the rhat between the chains. This assumes that each chain is a single
(non-ensemble) chain. If you are using an ensemble method, please first convert it into a
single chain by taking the expectation of a function, i.e. a mean, across the walkers.
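
For example, assuming `echain` is an ensemble chain with walker,steps,param indexing, that conversion could look like (a sketch, not the internal code):

    import numpy as np
    import convergence_tools as ct

    single = ct.mean_echain(echain)           # average over walkers -> steps,params
    rhat = ct.split_rhat(single[np.newaxis])  # add a leading runs axis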


def convergence_tools.bulk_split_rhat (   chains)
Computes the bulk rhat using rank normalized chains. The returned value is the 
maximum of the folded rhat and regular rank normalized rhat as recommended in Vehtari 2019.
Parameters:
    - `chains` is an array of single-walker chains, with indexing runs,steps,params
Returns:
    - split rhat statistic from Gelman 2013 for each parameter in the chain using rank-normalized values.
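
For example (the two runs below are hypothetical and assumed to be already reduced to single chains, e.g. with mean_echain):

    import numpy as np
    import convergence_tools as ct

    chains = np.stack([run1_chain, run2_chain])   # indexing runs,steps,params
    rhat = ct.bulk_split_rhat(chains)             # one value per parameter
    converged = bool(np.all(rhat < 1.05))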


def convergence_tools.ensemble_autocorr (   param_chain,
  c = 5.0 
)
Computes the integrated autocorrelation time
for the chain, using emcee's formula.
`param_chain` is the chain for a parameter over n walkers,
so the `param_chain` shape is walkers,steps
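
The estimator follows the standard emcee recipe (FFT autocorrelation per walker, averaged, then Sokal's automated windowing); a self-contained sketch of that recipe, not the module's exact code:

    import numpy as np

    def _next_pow_two(n):
        i = 1
        while i < n:
            i = i << 1
        return i

    def _autocorr_func_1d(x):
        # FFT-based normalized autocorrelation of a single walker's chain.
        x = np.atleast_1d(x)
        n = _next_pow_two(len(x))
        f = np.fft.fft(x - np.mean(x), n=2 * n)
        acf = np.fft.ifft(f * np.conjugate(f))[: len(x)].real
        return acf / acf[0]

    def ensemble_autocorr_sketch(param_chain, c=5.0):
        # param_chain has shape (walkers, steps): average the walker ACFs, then
        # pick the smallest window M with M >= c * tau(M) (Sokal's heuristic).
        acf = np.mean([_autocorr_func_1d(w) for w in param_chain], axis=0)
        taus = 2.0 * np.cumsum(acf) - 1.0
        m = np.arange(len(taus)) < c * taus
        window = np.argmin(m) if np.any(m) else len(taus) - 1
        return taus[window]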


def convergence_tools.find_eautocorr (   echain)
Finds and returns the IACT for each parameter in the ensemble chain.


def convergence_tools.fold_chains (   chains)
Returns the chain folded around the median of all the chains.
This helps with measuring rhat in the tails.
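
Folding replaces each draw by its absolute deviation from the overall median; a sketch, assuming `chains` has runs,steps,params indexing:

    import numpy as np

    def fold_chains_sketch(chains):
        # Deviations from the global (per-parameter) median make rhat sensitive
        # to mismatched tails and scales rather than mismatched locations.
        med = np.median(chains, axis=(0, 1), keepdims=True)
        return np.abs(chains - med)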


def convergence_tools.join_echains (   echains)
Join a list of ensemble chains from separate runs. If the chains have different numbers of steps,
then the largest even common size is used to join them.
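
A sketch of that joining rule, assuming each element of `echains` has walker,steps,param indexing (the even truncation is my reading of "largest even common size"):

    import numpy as np

    def join_echains_sketch(echains):
        # Truncate every run to the shortest common step count, rounded down to
        # an even number, then stack the runs along a new leading axis. Assumes
        # the runs share the same number of walkers and parameters.
        nsteps = min(e.shape[1] for e in echains)
        nsteps -= nsteps % 2
        return np.stack([e[:, :nsteps, :] for e in echains])
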
def convergence_tools.load_erun (   cname,
  lname,
  thin,
  plist = [],
  warmup_frac = 0.5 
)
Loads the ensemble chain and likelihood files
Params:
    - cname<str> Is the name of the chains file
    - lname<str> Is the name of the lklhd file
    - thin <int> Is the thinning of the lklhd and chain
    - plist<arr<int>> Is the list of parameters to read in, e.g. [1,2,5] will read in the first, second, and fifth parameters from the chain file.
                      Defaults to [] which reads in all the parameters.
    - warmup_frac Is the fraction of the chain to remove due to warmup. The default is 0.5 so the first 50% of the chain will be removed.
Returns:
    chain, lklhd
Where the chain is a 3d array using the walker,steps,param indexing.
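
For example (the file names and thinning are placeholders):

    import convergence_tools as ct

    # Read every parameter, thin by 10, and drop the first half as warmup.
    chain, lklhd = ct.load_erun("Chain.dat", "Lklhd.dat", thin=10)

    # Read only a subset of parameters and keep more of the chain.
    chain, lklhd = ct.load_erun("Chain.dat", "Lklhd.dat", thin=10,
                                plist=[1, 2, 5], warmup_frac=0.25)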


def convergence_tools.mean_echain (   echain)
Computes the average of the ensemble chain for each step. That is, 
we average over the walkers at each step converting the chain into a
single particle chain of the average.
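
Conceptually this is a single numpy reduction over the walker axis; a sketch, assuming walker,steps,param indexing:

    import numpy as np

    def mean_echain_sketch(echain):
        # Average the walkers at each step, giving a steps x params array.
        return np.mean(echain, axis=0)
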
def convergence_tools.mean_ess (   y,
  c = 5.0 
)
Calculates the average ESS across chains by dividing the number of steps by the IACT.
Note that the actual ESS of the entire set of walkers is greater than this, but likely isn't
the number of walkers * mean_ess since the chains are correlated.
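
A sketch of the estimate, reusing the ensemble IACT above (`y` is assumed to have walkers,steps shape for a single parameter):

    import convergence_tools as ct

    def mean_ess_sketch(y, c=5.0):
        # ESS ~= number of steps / IACT, with the IACT averaged over walkers.
        tau = ct.ensemble_autocorr(y, c=c)
        return y.shape[1] / tau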


def convergence_tools.plot_rank_hist (   chains,
  bins = 30,
  names = None 
)
Computes the rank plots, where the rank is taken over all the runs. 
The histograms should be roughly uniform. Any structure in them signals 
that the chains aren't converged or mixing well, i.e. your Monte Carlo estimates
will be biased.

Chains must be the chains for a specific parameter over multiple runs with indexing run,steps
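
A sketch of the rank-histogram construction (matplotlib is assumed; the module's actual layout and use of `names` may differ):

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.stats import rankdata

    def plot_rank_hist_sketch(chains, bins=30, names=None):
        # chains has run,steps indexing for one parameter; rank over everything,
        # then histogram each run's ranks. Uniform histograms => good mixing.
        ranks = rankdata(chains, method="average").reshape(chains.shape)
        fig, axes = plt.subplots(1, chains.shape[0], sharey=True, squeeze=False)
        for i, ax in enumerate(axes[0]):
            ax.hist(ranks[i], bins=bins)
            ax.set_title(names[i] if names is not None else "run %d" % i)
        return fig
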
def convergence_tools.ranknormalized_chains (   chains)
Computes the rank-normalized chains, where the rank is taken over all the chains.
This allows rhat to be used when the distribution might not have a finite
mean or variance.

Returns the rank normalized chains, i.e. the z-scores and the actual ranks.
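
Rank normalization maps each draw's fractional rank through the inverse normal CDF (Vehtari et al. 2019); a sketch, assuming `chains` has runs,steps,params indexing:

    import numpy as np
    from scipy.stats import rankdata, norm

    def ranknormalized_chains_sketch(chains):
        # Rank each parameter's draws over all runs and steps, then convert the
        # fractional ranks (with the usual 3/8 offsets) to normal z-scores.
        flat = chains.reshape(-1, chains.shape[-1])
        ranks = rankdata(flat, method="average", axis=0)
        z = norm.ppf((ranks - 0.375) / (flat.shape[0] + 0.25))
        return z.reshape(chains.shape), ranks.reshape(chains.shape)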


def convergence_tools.read_echain (   cname,
  nwalkers,
  nskip,
  thin,
  plist = [] 
)
Reads in a Themis chain from the ensemble methods.
Arguments:
 - `cname` : string of the file containing the chain.
 - `nwalkers` : number of walkers used in the ensemble sampler
 - `nskip` : number of MCMC steps to skip, i.e. warmup period.
 - `thin` : controls how much we want to thin the chain, so every `thin`-th step will be stored.
Keyword arguments:
 - `plist` : list of integers of parameters to read in. Default is [] which reads in every parameter
Returns:
 chain as a 3d array with step,walker,param indexing.
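
For example (the file name and run settings are placeholders):

    import convergence_tools as ct

    # 256 walkers, skip the first 5000 steps as warmup, store every 10th step,
    # and read in only parameters 1, 2, and 5.
    echain = ct.read_echain("Chain.dat", 256, 5000, 10, plist=[1, 2, 5])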


def convergence_tools.read_elklhd (   lname,
  thin,
  warmup_frac 
)
Reads in a Themis likelihood file, `lname`, storing only the last 1-`warmup_frac` fraction of the chain
and keeping every `thin`-th step. Returns the lklhd data in a 2D array with dimensions steps x walkers,
the number of steps that were skipped, and the number of COMPLETED steps.


def convergence_tools.save_ensemble_diagnostics (   out,
  chains,
  chainsname,
  method = "rank",
  nthin = 1 
)
Outputs the diagnostics of the multiple chains in `chains` to the `out` file, where each chain is labeled by `chainsname`.
Currently it only outputs the chains' integrated
autocorrelation time for each parameter. Eventually other
diagnostics will be included, such as the Gelman-Rubin
diagnostic, i.e. split-rhat, and ESS. Currently the default method
for rhat is the rank method from Vehtari 2019.

Note that if the IACT is "significantly different" between chains, then at least one of the chains isn't converged!

`nthin` is included to multiply the IACT of the `chains` passed, i.e. to account for any thinning already applied.
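
For example (the output file and chain names are placeholders; `echain1`/`echain2` are assumed to be loaded runs):

    import convergence_tools as ct

    ct.save_ensemble_diagnostics("ensemble_diagnostics.txt",
                                 [echain1, echain2],
                                 ["run1", "run2"],
                                 method="rank", nthin=10)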


def convergence_tools.split_rhat (   chains)
Computes the split rhat statistic from Gelman 2013. This is able to deal with distributions that don't have a finite mean and/or variance. Additionally, it splits each chain in two in order to check whether the beginning and end of the chain have similar variance, as a check of intrachain convergence.
Parameters:
    - `chains` is an array of single-walker chains, with indexing runs,steps,params
Returns:
    - split rhat statistic from Gelman 2013 for each parameter in the chain.
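
A sketch of the underlying split-rhat computation from Gelman et al. 2013 (not the module's exact code), assuming `chains` has runs,steps,params indexing:

    import numpy as np

    def split_rhat_sketch(chains):
        # Split each run in half so intrachain drift shows up as between-chain
        # variance, then form the usual Gelman-Rubin ratio per parameter.
        nruns, nsteps, nparams = chains.shape
        half = nsteps // 2
        split = np.concatenate([chains[:, :half], chains[:, half:2 * half]])
        W = np.mean(np.var(split, axis=1, ddof=1), axis=0)         # within-chain
        B = half * np.var(np.mean(split, axis=1), axis=0, ddof=1)  # between-chain
        var_hat = (half - 1) / half * W + B / half
        return np.sqrt(var_hat / W)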


def convergence_tools.var_echain (   echain)
Computes the variance of the echain for each step. That is,
we compute the variance across the walkers at each step, converting the chain
into a single particle chain of the variance across the walkers.