# Transformed density rejection sampling

In the first post a short introduction to non-uniform sampling
was given. This post explains the Tinflex algorithm, which is implemented in
`mir.random.flex`

.

## Idea

The main idea of the Flex algorithm is to split the density distribution
into multiple intervals and sample from these
constructed intervals using the *composition* method.
For each interval a *hat* and *squeeze* function can be constructed.
Thus with the approximated hat and squeeze function of a selected interval
a value $x$ can be sampled
using the *rejection with inversion* method.

For example, with efficiency $\rho = 1.8$ the normal distribution is split into six intervals with different hat and squeeze functions. The efficiency of the sampling can iteratively be improved by splitting the composition into more intervals and thus yielding a better approximation to the density function.

In the following we will quickly dive into the essential basics behind the Tinflex algorithm. In case your last math lesson was long ago, feel free to jump directly to the examples and come back when you want to understand the inner workings of the Tinflex algorithm in more details.

## Tinflex transformations

Remember that the Tinflex algorithm splits the density distribution into intervals
and constructs a hat and squeeze function for each interval.
For the approximation of a hat or squeeze function, the Tinflex algorithm
constructs linear functions for *concave*, *convex* intervals (or both if the interval
contains only one inflection point).
However as most distributions aren’t *concave*, but
*log-concave*, Tinflex uses a family of transformations for the
probability density function (pdf) as it operates in *logspace*
and can thus “force” most distributions to be concave. The
transformations can be grouped in two classes: (1) $c \neq 0$ and (2) $c = 0$
, which will subsequently be analyzed.

### $c$-transformations with $c \neq 0$

The general transformation function is $T(x) = x^c$, however as its inverse function is only defined in $\mathbb{C}$, its needs to be restricted to positive numbers only:

With the usual inversion rules and if $c > 0$ we get:

which is $y^{1 / c}$ and if $c < 0$ we obtain:

which is $(-y)^{1 / c}$.

Thus in general the inverse function is defined as:

Two examples of $T_c$ transformations can be seen below:

Please note that (1) in the Tinflex paper and Mir implementation this case is split into multiple common cases to obtain simpler formulas and thus reduce numerical errors. Furthermore, (2) only with $c = 0$ the transformation is two-sided, thus for $c < 0$ the inverse transformation $T^{-1}(x)$ in $\mathbb{R}$ is only defined for $x < 0$.

### $c$-transformations with $c = 0$

The special case for $c = 0$ needs to be made as division by zero isn’t defined, however for this case the natural logarithm and it’s inverse the exponential function can be used:

### Input of Tinflex

Remember that $f(x)$ is the pdf, and is its transformation $\tilde{f}(x)$ is:

The Tinflex algorithm expects the **log**-density, which means that for $c = 0$, no
transformation needs to be applied. However, for $c \neq 0$ the inverse
is needed. This mean we first need to apply the *inverse* $T_c^{-1}(x) = exp(x)$
and then apply the other $T_c$ transformation:

## Linear functions

### Construction of linear functions

For every interval with function $\tilde{f}$ (black), we define the three linear functions: its left tangent (orange), its right tangent (red), and its secant (blue):

As it can be seen in the graphs of the two major case categories (concave, convex) at least one linear function is majorizing $\tilde{f}$ (e.g. both tangents in the concave case), and at least one linear function is majorized by $\tilde{f}$ (e.g. the secant in the concave case). Botts et. al. (2011) distinguish between eight cases and prove that this observation of finding at least one linear hat and squeeze function is valid as long as there is at most one inflection point in the interval.

### Definition of the linear functions

The following definition for linear functions is used:

Remember that the transformation is applied with $T_c(x)$ and that the hat
function majorizes the *transformed* pdf, whereas the *transformed* pdf
majorizes the squeeze function.

Both the left and right tangent can be defined as follows, where $x_0$ is either $l$ or $r$:

Moreover for the secant we can pick either $\ell$ or $r$ as root point, and thus only the definition of the slope $\beta$ is different:

With these definition we can integrate the linear function to calculate the area
below the hat and squeeze function. Furthermore, using the their integral the
inverse function for the cumulative density function can be calculated, which
is used to map a uniform point $u$ to a point $x$ which is proportional
to the density of $h(x)$. These details can be found in Tinflex paper
and in the Mir implementation. For `mir.random.flex`

additional
supplementary material is provided which explains the used integrations,
inversions and approximations.
In the following a more practical side of the Tinflex algorithm will be given by showing
two examples for the construction of a random distribution with the Tinflex method.

## Example 1: Dagum distribution

Assume you would want to draw random values from a non-common random distribution like e.g. the Dagum distribution which has the following probability density function:

For simplicity of this example we fix $a$, $b$ and $p$ to $1$ and thus obtain:

For the Tinflex algorithm the pdf function and its first and second derivative need to be given in log-space:

Its first and second derivative after $x$ are:

In case you haven’t used the differentiation syntax on Wolfram Alpha, you should familiarize yourself with it. For example, the correctness of the first derivate and second derivative can be quickly checked. Last but not least, the Tinflex algorithm requires that each interval can have at most one inflection point. Hence we need to check the plot of $\tilde{f}(x)$ for inflection points. However as there none if $x > 0$, we can freely pick any interval as starting intervals and can thus construct the random distribution:

```
import std.math : log, pow;
alias S = double;
// log-transformed pdf + first two derivatives
auto f0 = (S x) => cast(S) log(1 / pow(x + 1, 2));
auto f1 = (S x) => -2 / (1 + x);
auto f2 = (S x) => 2 / pow(1 + x, 2);
// which c-transformation function to use
S c = 0.5;
// the range of the interval that should be sampled from
// if there are multiple inflection points, more points need to be given
S[] points = [0, 10, S.max];
// target efficiency of hatArea / squeezeArea
S rho = 1.1;
flex(f0, f1, f2, c, points, rho);
```

We can also have a look under the hood and inspect hat / squeeze plot of the Dagum distribution (hat: red, squeeze: green). For a clearer visibility only the range $[0, 3]$ is shown:

As it can be seen $\rho$ has a huge impact on the resulting speed, but *not*
on the statistical quality. Similarly the predefined intervals have only a slight
impact on the algorithm itself as e.g. they limit the left and right start border.
Picking a $c$-transformation can influence the numerical errors,
construction and sampling performance and even the number of intervals.
Hence usually a pre-definded transformation like $0$ (the “pure” log transformation) or $1$ is used.
However there are some restrictions, for example for unbounded domains $c > -1$ is required,
but $c$ also needs to be sufficiently small, so usually $-0.5$ is used.

## Example 2: Gompertz distribution

Assume you would want to sample from the Gompertz distribution which has the following probability density function:

For simplicity, we set $\eta = 0.005$ and $b = 1.5$ and thus get

And consequently in log-space:

Its first and second derivative after $x$ are:

Note that for log-density also no inflection point exists and thus we can define our sampling procedure:

```
import std.math : log, pow;
alias S = double;
// log-transformed pdf + first two derivatives
auto f0 = (S x) => cast(S) log( S(0.00753759) * exp(S(-0.005) * exp(S(1.5) * x) + S(1.5) * x));
auto f1 = (S x) => S(1.5) - S(0.0075) * exp(1.5 * x);
auto f2 = (S x) => S(-0.01125) * exp(S(1.5) * x);
// which c-transformation function to use
S c = 1.5;
// the range of the interval that should be sampled from
S[] points = [0, 6, S.max];
// target efficiency of hatArea / squeezeArea
S rho = 1.1;
flex(f0, f1, f2, c, points, rho);
```

We can also have a look under the hood and inspect hat / squeeze plot of the Gompertz distribution (hat: red, squeeze: green). For a clearer visibility only the range $[0, 6]$ is shown:

## Benchmarks

### Constructing intervals

In the following the R Tinflex implementation and the Mir Flex implementation are compared. For the benchmark the time to construct the (Tin)flex intervals with efficiency $\rho$ for $n$ iterations has been measured:

$\rho$ | n | Flex | Tinflex |
---|---|---|---|

1.1 | 10K | 122 ms | 79.473s |

1.01 | 1K | 46 ms | 21.229s |

1.001 | 1K | 169 ms | 79.047s |

Due to the enormous time consumption of the R implementation, the number of iterations has been reduced from 10K to 1K for higher $\rho$. With the Mir Flex implementation a performance boost for the construction phase of the factor 500-600 in comparison to the R Tinflex implementation can be observed.

Not only has the D implementation been engineered very carefully with performance in mind,
but it also uses several tricks like a `BinaryHeap`

to pick the best interval to split,
so that the *number of constructed intervals* is also lower:

$\rho$ | Flex | Tinflex |
---|---|---|

1.5 | 8 | 9 |

1.1 | 15 | 20 |

1.01 | 45 | 49 |

1.001 | 141 | 175 |

1.0001 | 452 | 496 |

1.00001 | 1361 | 1824 |

The source of this benchmark is available online.

### Sampling

A benchmark between multiple existing normal distribution samplers of sampling
10M random values has been made.
All existing libraries (dstats,
hap,
atmosphere)
use the Box-Muller transformation
to sample random values. Moreover intervals with `mir.random.flex`

with three
different efficiencies $\rho$ ($1.3, 1.1, 1.001$) were constructed. The last method
is based on the WIP implementation of the Ziggurat algorithm,
which is a specialized algorithm for monotone decreasing distributions.

Method | Avg. time | Std.dev. time |
---|---|---|

boxMuller.naive | 850 ms | 82 ms |

boxMuller.dstats | 801 ms | 2 ms |

boxMuller.hap | 3618 ms | 4 ms |

boxMuller.atmos | 934 ms | 2 ms |

flexNormal.slow | 1406 ms | 4 ms |

flexNormal.medium | 1206 ms | 1 ms |

flexNormal.fast | 1124 ms | 0 ms |

ziggurat | 357 ms | 5 ms |

clang++ | 2216 ms | 6 ms |

g++ | 941 ms | 11 ms |

While one should look with care at performance benchmarks as statistical quality is usually a higher goal than speed, it still can be seen that (1) investing more time in constructing better approximated intervals is worth the investment if more values need to be drawn and (2) there are specialized algorithm for very common random distributions that of course should be preferred over generalized methods.

However, after all the goal of the Flex algorithm isn’t to create the fastest random method, but
a *general applicable* method that can be used to *create samplers for new distributions quickly*.

The source of this benchmark is directly available within the Mir library.

## Where to go from here

The Tinflex algorithm is available via the Mir library in `mir.random.flex`

.
Apart from the examples listed in this blog, more examples
are provided as part of Mir and can be conveniently executed with D’s package manager `dub`

.
If you want to dive deeper, you can also read the Tinflex paper,
browse the Mir implementation or see the supplementary material
for the Mir implementation in which the used equations will be explained.

## Literature references

- Devroye, Luc. “Sample-based non-uniform random variate generation.”
*Proceedings of the 18th conference on Winter simulation*. ACM, 1986. - Hörmann, Wolfgang. “A rejection technique for sampling from T-concave distributions.”
*ACM Transactions on Mathematical Software (TOMS)*21.2 (1995): 182-193. - Leydold, Josef, et al. “An automatic code generator for nonuniform random variate generation.”
*Mathematics and Computers in Simulation*62.3 (2003): 405-412. - Botts, Carsten, Wolfgang Hörmann, and Josef Leydold. “Transformed density rejection with inflection points.”
*Statistics and Computing*23.2 (2013): 251-260. - Hörmann, Wolfgang, Josef Leydold, and Gerhard Derflinger. Automatic nonuniform random variate generation.
*Springer Science & Business Media*, 2013.