In the first post a short introduction to non-uniform sampling was given. This post explains the Tinflex algorithm, which is implemented in mir.random.flex.

Idea

The main idea of the Flex algorithm is to split the density distribution into multiple intervals and sample from these constructed intervals using the composition method. For each interval a hat and squeeze function can be constructed. Thus with the approximated hat and squeeze function of a selected interval a value xx can be sampled using the rejection with inversion method.

For example, with efficiency ρ=1.8\rho = 1.8 the normal distribution is split into six intervals with different hat and squeeze functions. The efficiency of the sampling can iteratively be improved by splitting the composition into more intervals and thus yielding a better approximation to the density function.

ρ=1.8\rho = 1.8, 6 intervals constructed
ρ=1.25\rho = 1.25, 10 intervals constructed
ρ=1.1\rho = 1.1, 14 intervals constructed
ρ=1.01\rho = 1.01, 44 intervals constructed

In the following we will quickly dive into the essential basics behind the Tinflex algorithm. In case your last math lesson was long ago, feel free to jump directly to the examples and come back when you want to understand the inner workings of the Tinflex algorithm in more details.

Tinflex transformations

Remember that the Tinflex algorithm splits the density distribution into intervals and constructs a hat and squeeze function for each interval. For the approximation of a hat or squeeze function, the Tinflex algorithm constructs linear functions for concave, convex intervals (or both if the interval contains only one inflection point). However as most distributions aren’t concave, but log-concave, Tinflex uses a family of transformations for the probability density function (pdf) as it operates in logspace and can thus “force” most distributions to be concave. The transformations can be grouped in two classes: (1) c0c \neq 0 and (2) c=0c = 0 , which will subsequently be analyzed.

cc-transformations with c0c \neq 0

The general transformation function is T(x)=xcT(x) = x^c, however as its inverse function is only defined in C\mathbb{C}, its needs to be restricted to positive numbers only:

Tc(x)=sgn(c)xc\displaystyle T_c(x) = sgn(c) \cdot x^c

With the usual inversion rules and if c>0c > 0 we get:

y=xclog(y)=log(xc)log(y)=clog(x)log(y)c=log(x)exp(log(y)c)=x\displaystyle \begin{aligned} y &= x^c \\ log(y) &= log(x^c) \\ log(y) &= c \cdot log(x) \\ \frac{log(y)}{c} &= log(x) \\ exp\left(\frac{log(y)}{c}\right) &= x \end{aligned}

which is y1/cy^{1 / c} and if c<0c < 0 we obtain:

y=xclog(y)=log(xc)log(y)=clog(x)log(y)c=log(x)exp(log(y)c)=x\displaystyle \begin{aligned} y &= -x^c \\ log(y) &= log(-x^c) \\ log(y) &= -c \cdot log(x) \\ \frac{log(y)}{c} &= -log(x) \\ exp\left(\frac{log(y)}{c}\right) &= -x \end{aligned}

which is (y)1/c(-y)^{1 / c}.

Thus in general the inverse function is defined as:

Tc1(x)=(sgn(c)x)1c\displaystyle T_c^{-1}(x) = (sgn(c) \cdot x)^{\frac{1}{c}}

Two examples of TcT_c transformations can be seen below:

x2x^2 (blue) and it’s inverse x1/2x^{1/2} (green)
x0.5-x^{-0.5} (blue) and it’s inverse (x)1/0.5(-x)^{1/-0.5} (green)

Please note that (1) in the Tinflex paper and Mir implementation this case is split into multiple common cases to obtain simpler formulas and thus reduce numerical errors. Furthermore, (2) only with c=0c = 0 the transformation is two-sided, thus for c<0c < 0 the inverse transformation T1(x)T^{-1}(x) in R\mathbb{R} is only defined for x<0x < 0.

cc-transformations with c=0c = 0

The special case for c=0c = 0 needs to be made as division by zero isn’t defined, however for this case the natural logarithm and it’s inverse the exponential function can be used:

Tc(x)=log(x)Tc1(x)=exp(x)\displaystyle T_c(x) = log(x) \quad T_c^{-1}(x) = exp(x)
Exponential function (green) and it’s inverse the natural log (blue)

Input of Tinflex

Remember that f(x)f(x) is the pdf, and is its transformation f~(x)\tilde{f}(x) is:

f~(x)=Tc(f(x))\displaystyle \tilde{f}(x) = T_c(f(x))

The Tinflex algorithm expects the log-density, which means that for c=0c = 0, no transformation needs to be applied. However, for c0c \neq 0 the inverse is needed. This mean we first need to apply the inverse Tc1(x)=exp(x)T_c^{-1}(x) = exp(x) and then apply the other TcT_c transformation:

f~(x)=Tc0(T01(x)))=Tc0(exp(x)))=sgn(c)exp(x)c=sgn(c)exp(xc)\displaystyle \begin{aligned} \tilde{f}(x) &= T_{c \neq 0}(T_0^{-1}(x))) \\ &= T_{c \neq 0}(exp(x))) \\ &= sgn(c) \cdot exp(x)^c \\ &= sgn(c) \cdot exp(x \cdot c) \end{aligned}

Linear functions

Construction of linear functions

For every interval with function f~\tilde{f} (black), we define the three linear functions: its left tangent (orange), its right tangent (red), and its secant (blue):

Concave interval
Convex interval

As it can be seen in the graphs of the two major case categories (concave, convex) at least one linear function is majorizing f~\tilde{f} (e.g. both tangents in the concave case), and at least one linear function is majorized by f~\tilde{f} (e.g. the secant in the concave case). Botts et. al. (2011) distinguish between eight cases and prove that this observation of finding at least one linear hat and squeeze function is valid as long as there is at most one inflection point in the interval.

Definition of the linear functions

The following definition for linear functions is used:

f~(x)=α+β(xx0)\displaystyle \tilde{f}(x) = \alpha + \beta \cdot (x - x_0)

Remember that the transformation is applied with Tc(x)T_c(x) and that the hat function majorizes the transformed pdf, whereas the transformed pdf majorizes the squeeze function.

Both the left and right tangent can be defined as follows, where x0x_0 is either ll or rr:

tanx0(x)=f~(x0)+f~(x0)(xx0)\displaystyle tan_{x_0}(x) = \tilde{f}(x_0) + \tilde{f}'(x_0) \cdot (x - x_0)

Moreover for the secant we can pick either \ell or rr as root point, and thus only the definition of the slope β\beta is different:

sec(x)=f~()+f~(r)f~()r(x)\displaystyle sec(x) = \tilde{f}(\ell) + \frac{\tilde{f}(r) - \tilde{f}(\ell)}{r - \ell} \cdot (x - \ell)

With these definition we can integrate the linear function to calculate the area below the hat and squeeze function. Furthermore, using the their integral the inverse function for the cumulative density function can be calculated, which is used to map a uniform point uu to a point xx which is proportional to the density of h(x)h(x). These details can be found in Tinflex paper and in the Mir implementation. For mir.random.flex additional supplementary material is provided which explains the used integrations, inversions and approximations. In the following a more practical side of the Tinflex algorithm will be given by showing two examples for the construction of a random distribution with the Tinflex method.

Example 1: Dagum distribution

Assume you would want to draw random values from a non-common random distribution like e.g. the Dagum distribution which has the following probability density function:

f(x)=apx((xb)ap((xa)a+1)p+1)\displaystyle f(x) = \frac{ap}{x} \left( \frac{ (\frac{x}{b})^{ap}}{\left((\frac{x}{a})^a + 1\right)^{p+1}} \right)

For simplicity of this example we fix aa, bb and pp to 11 and thus obtain:

f(x)=1(x+1)2\displaystyle f(x) = \frac{1}{(x + 1)^2}

For the Tinflex algorithm the pdf function and its first and second derivative need to be given in log-space:

f~(x)=log(1(x+1)2)\displaystyle \tilde{f}(x) = log \left( \frac{1}{(x + 1)^2} \right)

Its first and second derivative after xx are:

f~(x)=2x+1f~(x)=2(x+1)2\displaystyle \tilde{f}'(x) = - \frac{2}{x + 1} \quad \tilde{f}''(x) = \frac{2}{(x + 1)^2}

In case you haven’t used the differentiation syntax on Wolfram Alpha, you should familiarize yourself with it. For example, the correctness of the first derivate and second derivative can be quickly checked. Last but not least, the Tinflex algorithm requires that each interval can have at most one inflection point. Hence we need to check the plot of f~(x)\tilde{f}(x) for inflection points. However as there none if x>0x > 0, we can freely pick any interval as starting intervals and can thus construct the random distribution:

import std.math : log, pow;

alias S = double;

// log-transformed pdf + first two derivatives
auto f0 = (S x) => cast(S) log(1 / pow(x + 1, 2));
auto f1 = (S x) => -2 / (1 + x);
auto f2 = (S x) => 2 / pow(1 + x, 2);

// which c-transformation function to use
S c = 0.5;

// the range of the interval that should be sampled from
// if there are multiple inflection points, more points need to be given
S[] points = [0, 10, S.max];

// target efficiency of hatArea / squeezeArea
S rho = 1.1;

flex(f0, f1, f2, c, points, rho);
Consecutive histogram
Cumulative histogram

We can also have a look under the hood and inspect hat / squeeze plot of the Dagum distribution (hat: red, squeeze: green). For a clearer visibility only the range [0,3][0, 3] is shown:

ρ=1.5\rho = 1.5, 4 intervals
ρ=1.1\rho = 1.1, 8 intervals

As it can be seen ρ\rho has a huge impact on the resulting speed, but not on the statistical quality. Similarly the predefined intervals have only a slight impact on the algorithm itself as e.g. they limit the left and right start border. Picking a cc-transformation can influence the numerical errors, construction and sampling performance and even the number of intervals. Hence usually a pre-definded transformation like 00 (the “pure” log transformation) or 11 is used. However there are some restrictions, for example for unbounded domains c>1c > -1 is required, but cc also needs to be sufficiently small, so usually 0.5-0.5 is used.

Example 2: Gompertz distribution

Assume you would want to sample from the Gompertz distribution which has the following probability density function:

f(x)=bηebx eη exp(ηebx)\displaystyle f(x) = b \eta e^{bx} \ e^{\eta} \ exp(-\eta e^{bx})

For simplicity, we set η=0.005\eta = 0.005 and b=1.5b = 1.5 and thus get

f(x)=0.00753759e0.005e1.5x+1.5x\displaystyle f(x) = 0.00753759 e^{-0.005 e^{1.5 x} + 1.5 x}

And consequently in log-space:

f~(x)=log(0.00753759e0.005e1.5x+1.5x)\displaystyle \tilde{f}(x) = log \left( 0.00753759 e^{-0.005 e^{1.5 x} + 1.5 x} \right)

Its first and second derivative after xx are:

f~(x)=1.50.0075e1.5xf~(x)=0.01125e1.5x\displaystyle \tilde{f}'(x) = 1.5-0.0075 e^{1.5 x} \quad \tilde{f}''(x) = -0.01125 e^{1.5 x}

Note that for log-density also no inflection point exists and thus we can define our sampling procedure:

import std.math : log, pow;

alias S = double;

// log-transformed pdf + first two derivatives
auto f0 = (S x) => cast(S) log( S(0.00753759) * exp(S(-0.005) * exp(S(1.5) * x) + S(1.5) * x));
auto f1 = (S x) => S(1.5) - S(0.0075) * exp(1.5 * x);
auto f2 = (S x) => S(-0.01125) * exp(S(1.5) * x);

// which c-transformation function to use
S c = 1.5;

// the range of the interval that should be sampled from
S[] points = [0, 6, S.max];

// target efficiency of hatArea / squeezeArea
S rho = 1.1;

flex(f0, f1, f2, c, points, rho);
Consecutive histogram
Cumulative histogram

We can also have a look under the hood and inspect hat / squeeze plot of the Gompertz distribution (hat: red, squeeze: green). For a clearer visibility only the range [0,6][0, 6] is shown:

ρ=1.5\rho = 1.5, 8 intervals
ρ=1.1\rho = 1.1, 14 intervals

Benchmarks

Constructing intervals

In the following the R Tinflex implementation and the Mir Flex implementation are compared. For the benchmark the time to construct the (Tin)flex intervals with efficiency ρ\rho for nn iterations has been measured:

ρ\rho n Flex Tinflex
1.1 10K 122 ms 79.473s
1.01 1K 46 ms 21.229s
1.001 1K 169 ms 79.047s

Due to the enormous time consumption of the R implementation, the number of iterations has been reduced from 10K to 1K for higher ρ\rho. With the Mir Flex implementation a performance boost for the construction phase of the factor 500-600 in comparison to the R Tinflex implementation can be observed.

Not only has the D implementation been engineered very carefully with performance in mind, but it also uses several tricks like a BinaryHeap to pick the best interval to split, so that the number of constructed intervals is also lower:

ρ\rho Flex Tinflex
1.5 8 9
1.1 15 20
1.01 45 49
1.001 141 175
1.0001 452 496
1.00001 1361 1824

The source of this benchmark is available online.

Sampling

A benchmark between multiple existing normal distribution samplers of sampling 10M random values has been made. All existing libraries (dstats, hap, atmosphere) use the Box-Muller transformation to sample random values. Moreover intervals with mir.random.flex with three different efficiencies ρ\rho (1.3,1.1,1.0011.3, 1.1, 1.001) were constructed. The last method is based on the WIP implementation of the Ziggurat algorithm, which is a specialized algorithm for monotone decreasing distributions.

Method Avg. time Std.dev. time
boxMuller.naive 850 ms 82 ms
boxMuller.dstats 801 ms 2 ms
boxMuller.hap 3618 ms 4 ms
boxMuller.atmos 934 ms 2 ms
flexNormal.slow 1406 ms 4 ms
flexNormal.medium 1206 ms 1 ms
flexNormal.fast 1124 ms 0 ms
ziggurat 357 ms 5 ms
clang++ 2216 ms 6 ms
g++ 941 ms 11 ms

While one should look with care at performance benchmarks as statistical quality is usually a higher goal than speed, it still can be seen that (1) investing more time in constructing better approximated intervals is worth the investment if more values need to be drawn and (2) there are specialized algorithm for very common random distributions that of course should be preferred over generalized methods.

However, after all the goal of the Flex algorithm isn’t to create the fastest random method, but a general applicable method that can be used to create samplers for new distributions quickly.

The source of this benchmark is directly available within the Mir library.

Where to go from here

The Tinflex algorithm is available via the Mir library in mir.random.flex. Apart from the examples listed in this blog, more examples are provided as part of Mir and can be conveniently executed with D’s package manager dub. If you want to dive deeper, you can also read the Tinflex paper, browse the Mir implementation or see the supplementary material for the Mir implementation in which the used equations will be explained.

Literature references