Transformed density rejection sampling
In the first post a short introduction to non-uniform sampling
was given. This post explains the Tinflex algorithm, which is implemented in
mir.random.flex
.
Idea
The main idea of the Flex algorithm is to split the density distribution into multiple intervals and sample from these constructed intervals using the composition method. For each interval a hat and squeeze function can be constructed. Thus with the approximated hat and squeeze function of a selected interval a value can be sampled using the rejection with inversion method.
For example, with efficiency the normal distribution is split into six intervals with different hat and squeeze functions. The efficiency of the sampling can iteratively be improved by splitting the composition into more intervals and thus yielding a better approximation to the density function.
In the following we will quickly dive into the essential basics behind the Tinflex algorithm. In case your last math lesson was long ago, feel free to jump directly to the examples and come back when you want to understand the inner workings of the Tinflex algorithm in more details.
Tinflex transformations
Remember that the Tinflex algorithm splits the density distribution into intervals and constructs a hat and squeeze function for each interval. For the approximation of a hat or squeeze function, the Tinflex algorithm constructs linear functions for concave, convex intervals (or both if the interval contains only one inflection point). However as most distributions aren’t concave, but log-concave, Tinflex uses a family of transformations for the probability density function (pdf) as it operates in logspace and can thus “force” most distributions to be concave. The transformations can be grouped in two classes: (1) and (2) , which will subsequently be analyzed.
-transformations with
The general transformation function is , however as its inverse function is only defined in , its needs to be restricted to positive numbers only:
With the usual inversion rules and if we get:
which is and if we obtain:
which is .
Thus in general the inverse function is defined as:
Two examples of transformations can be seen below:
Please note that (1) in the Tinflex paper and Mir implementation this case is split into multiple common cases to obtain simpler formulas and thus reduce numerical errors. Furthermore, (2) only with the transformation is two-sided, thus for the inverse transformation in is only defined for .
-transformations with
The special case for needs to be made as division by zero isn’t defined, however for this case the natural logarithm and it’s inverse the exponential function can be used:
Input of Tinflex
Remember that is the pdf, and is its transformation is:
The Tinflex algorithm expects the log-density, which means that for , no transformation needs to be applied. However, for the inverse is needed. This mean we first need to apply the inverse and then apply the other transformation:
Linear functions
Construction of linear functions
For every interval with function (black), we define the three linear functions: its left tangent (orange), its right tangent (red), and its secant (blue):
As it can be seen in the graphs of the two major case categories (concave, convex) at least one linear function is majorizing (e.g. both tangents in the concave case), and at least one linear function is majorized by (e.g. the secant in the concave case). Botts et. al. (2011) distinguish between eight cases and prove that this observation of finding at least one linear hat and squeeze function is valid as long as there is at most one inflection point in the interval.
Definition of the linear functions
The following definition for linear functions is used:
Remember that the transformation is applied with and that the hat function majorizes the transformed pdf, whereas the transformed pdf majorizes the squeeze function.
Both the left and right tangent can be defined as follows, where is either or :
Moreover for the secant we can pick either or as root point, and thus only the definition of the slope is different:
With these definition we can integrate the linear function to calculate the area
below the hat and squeeze function. Furthermore, using the their integral the
inverse function for the cumulative density function can be calculated, which
is used to map a uniform point to a point which is proportional
to the density of . These details can be found in Tinflex paper
and in the Mir implementation. For mir.random.flex
additional
supplementary material is provided which explains the used integrations,
inversions and approximations.
In the following a more practical side of the Tinflex algorithm will be given by showing
two examples for the construction of a random distribution with the Tinflex method.
Example 1: Dagum distribution
Assume you would want to draw random values from a non-common random distribution like e.g. the Dagum distribution which has the following probability density function:
For simplicity of this example we fix , and to and thus obtain:
For the Tinflex algorithm the pdf function and its first and second derivative need to be given in log-space:
Its first and second derivative after are:
In case you haven’t used the differentiation syntax on Wolfram Alpha, you should familiarize yourself with it. For example, the correctness of the first derivate and second derivative can be quickly checked. Last but not least, the Tinflex algorithm requires that each interval can have at most one inflection point. Hence we need to check the plot of for inflection points. However as there none if , we can freely pick any interval as starting intervals and can thus construct the random distribution:
import std.math : log, pow;
alias S = double;
// log-transformed pdf + first two derivatives
auto f0 = (S x) => cast(S) log(1 / pow(x + 1, 2));
auto f1 = (S x) => -2 / (1 + x);
auto f2 = (S x) => 2 / pow(1 + x, 2);
// which c-transformation function to use
S c = 0.5;
// the range of the interval that should be sampled from
// if there are multiple inflection points, more points need to be given
S[] points = [0, 10, S.max];
// target efficiency of hatArea / squeezeArea
S rho = 1.1;
flex(f0, f1, f2, c, points, rho);
We can also have a look under the hood and inspect hat / squeeze plot of the Dagum distribution (hat: red, squeeze: green). For a clearer visibility only the range is shown:
As it can be seen has a huge impact on the resulting speed, but not on the statistical quality. Similarly the predefined intervals have only a slight impact on the algorithm itself as e.g. they limit the left and right start border. Picking a -transformation can influence the numerical errors, construction and sampling performance and even the number of intervals. Hence usually a pre-definded transformation like (the “pure” log transformation) or is used. However there are some restrictions, for example for unbounded domains is required, but also needs to be sufficiently small, so usually is used.
Example 2: Gompertz distribution
Assume you would want to sample from the Gompertz distribution which has the following probability density function:
For simplicity, we set and and thus get
And consequently in log-space:
Its first and second derivative after are:
Note that for log-density also no inflection point exists and thus we can define our sampling procedure:
import std.math : log, pow;
alias S = double;
// log-transformed pdf + first two derivatives
auto f0 = (S x) => cast(S) log( S(0.00753759) * exp(S(-0.005) * exp(S(1.5) * x) + S(1.5) * x));
auto f1 = (S x) => S(1.5) - S(0.0075) * exp(1.5 * x);
auto f2 = (S x) => S(-0.01125) * exp(S(1.5) * x);
// which c-transformation function to use
S c = 1.5;
// the range of the interval that should be sampled from
S[] points = [0, 6, S.max];
// target efficiency of hatArea / squeezeArea
S rho = 1.1;
flex(f0, f1, f2, c, points, rho);
We can also have a look under the hood and inspect hat / squeeze plot of the Gompertz distribution (hat: red, squeeze: green). For a clearer visibility only the range is shown:
Benchmarks
Constructing intervals
In the following the R Tinflex implementation and the Mir Flex implementation are compared. For the benchmark the time to construct the (Tin)flex intervals with efficiency for iterations has been measured:
n | Flex | Tinflex | |
---|---|---|---|
1.1 | 10K | 122 ms | 79.473s |
1.01 | 1K | 46 ms | 21.229s |
1.001 | 1K | 169 ms | 79.047s |
Due to the enormous time consumption of the R implementation, the number of iterations has been reduced from 10K to 1K for higher . With the Mir Flex implementation a performance boost for the construction phase of the factor 500-600 in comparison to the R Tinflex implementation can be observed.
Not only has the D implementation been engineered very carefully with performance in mind,
but it also uses several tricks like a BinaryHeap
to pick the best interval to split,
so that the number of constructed intervals is also lower:
Flex | Tinflex | |
---|---|---|
1.5 | 8 | 9 |
1.1 | 15 | 20 |
1.01 | 45 | 49 |
1.001 | 141 | 175 |
1.0001 | 452 | 496 |
1.00001 | 1361 | 1824 |
The source of this benchmark is available online.
Sampling
A benchmark between multiple existing normal distribution samplers of sampling
10M random values has been made.
All existing libraries (dstats,
hap,
atmosphere)
use the Box-Muller transformation
to sample random values. Moreover intervals with mir.random.flex
with three
different efficiencies () were constructed. The last method
is based on the WIP implementation of the Ziggurat algorithm,
which is a specialized algorithm for monotone decreasing distributions.
Method | Avg. time | Std.dev. time |
---|---|---|
boxMuller.naive | 850 ms | 82 ms |
boxMuller.dstats | 801 ms | 2 ms |
boxMuller.hap | 3618 ms | 4 ms |
boxMuller.atmos | 934 ms | 2 ms |
flexNormal.slow | 1406 ms | 4 ms |
flexNormal.medium | 1206 ms | 1 ms |
flexNormal.fast | 1124 ms | 0 ms |
ziggurat | 357 ms | 5 ms |
clang++ | 2216 ms | 6 ms |
g++ | 941 ms | 11 ms |
While one should look with care at performance benchmarks as statistical quality is usually a higher goal than speed, it still can be seen that (1) investing more time in constructing better approximated intervals is worth the investment if more values need to be drawn and (2) there are specialized algorithm for very common random distributions that of course should be preferred over generalized methods.
However, after all the goal of the Flex algorithm isn’t to create the fastest random method, but a general applicable method that can be used to create samplers for new distributions quickly.
The source of this benchmark is directly available within the Mir library.
Where to go from here
The Tinflex algorithm is available via the Mir library in mir.random.flex
.
Apart from the examples listed in this blog, more examples
are provided as part of Mir and can be conveniently executed with D’s package manager dub
.
If you want to dive deeper, you can also read the Tinflex paper,
browse the Mir implementation or see the supplementary material
for the Mir implementation in which the used equations will be explained.
Literature references
- Devroye, Luc. “Sample-based non-uniform random variate generation.” Proceedings of the 18th conference on Winter simulation. ACM, 1986.
- Hörmann, Wolfgang. “A rejection technique for sampling from T-concave distributions.” ACM Transactions on Mathematical Software (TOMS) 21.2 (1995): 182-193.
- Leydold, Josef, et al. “An automatic code generator for nonuniform random variate generation.” Mathematics and Computers in Simulation 62.3 (2003): 405-412.
- Botts, Carsten, Wolfgang Hörmann, and Josef Leydold. “Transformed density rejection with inflection points.” Statistics and Computing 23.2 (2013): 251-260.
- Hörmann, Wolfgang, Josef Leydold, and Gerhard Derflinger. Automatic nonuniform random variate generation. Springer Science & Business Media, 2013.