6. Mathematical notes ¶

Here mathematical notes are kept.

Contents

6.1. Using LaTeX syntax for mathematics in RestructuredText and Sphinx ¶

Abstract:

This document is about using Latex in reStructuredText and Sphinx. At the link below there is a introduction to mathematical symbols in Latex.

ftp://ftp.ams.org/pub/tex/doc/amsmath/short-math-guide.pdf

6.1.1. Math/Latex in RestructuredText and Sphinx ¶

There are 2 ways of writing math in restructuredText:

Inline ``:math:`SomeLatex```

To insert relatively simple latex for mathematical expressions like $\psi(r)=\exp(-2r)$ in the text write :math:`\psi(r) = \exp(-2r)` Inside the back-tics (`) any Latex code can be written. There must be no space between :math: and the first back-tic.

Complex Latex like eg equation

For producing more complex math like eg an equation* environment in a LaTeX document write:

.. math::
    \psi(r) = e^{-2r}

you will get:

$\psi(r) = e^{-2r}$

Warning

The math markup can be used within reST files (to be parsed by Sphinx) but within your python’s docstring, the slashes need to be escaped ! :math:`\alpha` should therefore be written :math:`\\alpha` or put an “r” before the docstring

6.1.2. Examples ¶

Below are some examples of Latex commands and symbols:

To do square roots like eg $\sqrt{x^2-1}$ use :math:`\sqrt{x^2-1}`.

To do fractions like eg $\frac{1}{2}$ use :math:`\frac{1}{2}`.

In order to insert text into a formula use :math:`k_{\text{B}}T` to get like in $k_{\text{B}}T$

Use \left and \right for before brackets like in :math:`\left(\frac{1}{2}\right)^n` to get $\left(\frac{1}{2}\right)^n$ .

Displayed math can use \\ and & for line shifts and alignments, respectively.

.. math::
  a & = (x + y)^2 \\
    & = x^2 + 2xy + y^2

The result is:

$a & = (x + y)^2 \\ & = x^2 + 2xy + y^2$

The matrix environment can also contain \\ and &. To get like:

$\left(\begin{matrix} a & b \\ c & d \end{matrix}\right)$

write:

.. math::
  \left(\begin{matrix} a & b \\
    c & d \end{matrix}\right)

Equations are labeled with the label option and referred to using:

:eq:`label`

E.g:

(1) $a^2 + b^2 = c^2$

See equation (1)

is written like:

.. math::
   :label: pythag

    a^2 + b^2 = c^2

See equation :eq:`pythag`

Note that:

No spaces in label name

There should be a blank line before the actual latex code

There should be a space between :label: and the label name

When referred to label name should be surrounded by `

6.2. Splines at Scipy ¶

Abstract:

This chapter is about splines in Scipy and how they can be used for yield curve calculations.

It is shown that splines in scipy is probably b-splines. These splines gives different (but not nessacerily wrong) values when interpolating.

This information is important when using splines from scipy.

6.2.1. Natural cubic splines ¶

The natural cubic spline is already implemented in decimalpy

In order to redo the calculations below you need to install and import the package decimalpy.

>>> import decimalpy as dp

The example will be taken from [Kiusalaas] p. 119. The data are:

>>> x_data = dp.Vector([1, 2, 3, 4, 5])
>>> y_data = dp.Vector([0, 1, 0, 1, 0])

And the natural cubic spline function is instantiated by:

>>> f = dp.NaturalCubicSpline(x_data, y_data)

And the it is possible to do calculations like function values:

>>> print f(1.5), f(4.5)
0.7678571428571428571428571427 0.7678571428571428571428571427

Slopes:

>>> print f(1.5, 1), f(4.5, 1)
1.178571428571428571428571429 -1.178571428571428571428571428

and curvatures:

>>> print f(1.5, 2), f(4.5, 2)
-2.142857142857142857142857142 -2.142857142857142857142857143

and it is quite easy to do a plot:

>>> import matplotlib.pyplot as plt
>>> linspace = lambda start, stop, steps: dp.Vector([(stop - start) / (steps - 1) * i + start for i in range(steps)])
>>> xnew = linspace(1, 5, 40)
>>> x0, spread  = 2, 0.5
>>> x2new = linspace(x0 - spread, x0 + spread, 40)
>>> f0x0, f1x0, f2x0 = f(x0, 0), f(x0, 1), f(x0, 2)
>>> f0Lst = dp.Vector([f(x) for x in xnew])
>>> f1Lst = dp.Vector([f0x0 + f1x0 * (x - x0) for x in x2new])
>>> f2Lst = dp.Vector([f0x0 + f1x0 * (x - x0) + f2x0 * (x - x0)**2 / 2 for x in x2new])
>>> plt.plot(x_data, y_data, 'o', xnew, f0Lst, '-', x2new, f1Lst, '-', x2new, f2Lst, '-')   
>>> plt.legend(['data', 'fitted f', 'tangent', '2. order'], loc='best')   
>>> plt.show()

like:

../_images/naturalCubicSplineExample.png

6.2.2. Splines at Scipy ¶

On the other hand [scipy] is well equipped with spline functionality

First set up scipy and numpy:

>>> from numpy import linspace, array, float64
>>> from scipy.interpolate import InterpolatedUnivariateSpline as spline

Use the same data as before:

>>> x_data = array([1, 2, 3, 4, 5], float64)
>>> y_data = array([0, 1, 0, 1, 0], float64)

Creating the cubic spline function:

>>> cubic = spline(x_data, y_data, k=3)

And calculate the cubic spline at 1.5 and 4.5:

>>> cubic(1.5)
array(1.1250000000000002)
>>> cubic(4.5)
array(1.1250000000000002)

This isn’t the same value as before (0.767857142857).

And if one uses interp1d from scipy.interpolate to get the function cubic2:

>>> from numpy import linspace, array, float64
>>> from scipy.interpolate import interp1d
>>> cubic2 = interp1d(x_data, y_data, kind='cubic')

Using cubic2 one gets:

>>> cubic2(1.5)
array(1.1944444444444462)
>>> cubic2(4.5)
array(1.1944444444444466)

Again a new set of values. What is a bit surprising is that a third set of values appears.

So to check the functions in a graph I do:

>>> xnew = linspace(1, 5, 40)
>>> import matplotlib.pyplot as plt
>>> plt.plot(x_data, y_data, 'o', xnew, f0Lst, '-', xnew, cubic(xnew), '--', xnew, cubic2(xnew), '+-')   
>>> plt.legend(['data', 'natural spline', 'cubic', 'cubic2'], loc='best')   
>>> plt.draw()

to get the graph:

../_images/compareNaturalSplineAndScipySpline.png

And now it is obvious that scipy isn’t using the natural spline as cubic spline.

What kind of spline is used in scipy? Their code is build upon FITPACK by P. Dierckx.

The main reference is [Dierckx] which is all about b-splines. So a fair guess is that scipy also is b-splines.

This isn’t necessarily bad. Actually I haven’t looked into the differences between the 2 scipy interpolations.

Conclusion:

It matters what kind of interpolation one uses.

The difference in values is probably due to different derivations of (cubic) splines where scipy interpolation is based on b-splines.

6.3. Derivation of Cubic Spline ¶

Abstract:

This chapter is about derivation of cubic splines and the use of cubic splines for yield curve calculation.

6.3.1. Deriving the cubic spline function ¶

Assume n points $((x_{i},y_{i}))_{i=1}^{n}$ and n curvatures $(k_{i})_{i=1}^{n}$ at each point.

To ease notation we introduce:

$\triangle x_{i}=x_{i+1}-x_{i}$

The point are known, the curvatures are assumed known

$f_{i}''(x) & = \frac{x-x_{i}}{\triangle x_{i}}\cdot k_{i+1}+\frac{x_{i+1}-x}{\triangle x_{i}}\cdot k_{i}\\ & = \frac{(x-x_{i})\cdot k_{i+1}+(x_{i+1}-x)\cdot k_{i}}{\triangle x_{i}}$

Because we want to evaluate f,f’,f’’ at $x_{i}$ it makes things easier when they are formulated as functions of $(x-x_{i})$ and $(x-x_{i+1})$ . Eg it is quite simple to see that:

$f_{i-1}''(x_{i})=f_{i}''(x_{i})=k_{i}$

So the second derivative is continous and piecewise linear.

Also the first derivative f’ can be formulated as (since $\int(a\cdot x-b)\partial x=\frac{1}{2a}\cdot(a\cdot x-b)^{2}+c)$ :

$f_{i}'(x)=\frac{(x-x_{i})^{2}\cdot k_{i+1}-(x_{i+1}-x)^{2}\cdot k_{i}}{2\cdot\triangle x_{i}}+c_{i}$

The function $f_{i}$ which lies $x_{i}$ and $x_{i+1}$ for $i=1,\ldots,n-1$ must look like (integrating twice):

$f_{i}(x) & = \frac{(x-x_{i})^{3}\cdot k_{i+1}+(x_{i+1}-x)^{3}\cdot k_{i}}{6\cdot\triangle x_{i}}+c_{i}\cdot x+d_{i}\\ & = \frac{(x-x_{i})^{3}\cdot k_{i+1}+(x_{i+1}-x)^{3}\cdot k_{i}}{6\cdot\triangle x_{i}}+a_{i}\cdot(x_{i+1}-x)+b_{i}\cdot(x-x_{i})$

where $c_{i}=-a_{i}+b_{i}$ and $d_{i}=a_{i}\cdot x_{i+1}-b_{i}\cdot x_{i}$ or

$\left(\begin{array}{c}\begin{array}{c}c_{i}\end{array}\\d_{i}\end{array}\right)=\left(\begin{array}{cc}-1 & 1\\x_{i+1} & -x_{i}\end{array}\right)\cdot\left(\begin{array}{c}a_{i}\\b_{i}\end{array}\right)$

This is possible if the determinant $x_{i}-x_{i+1}\neq0$ . This is the case if there isn’t 2 points with the same x-value. An asumption which already must be in place.

And since f must be continous and go through the points $((x_{i},y_{i}))_{i=1}^{n}$ we have:

$f_{i}(x_{i}) & = y_{i}\\ \Updownarrow\\ \frac{(x_{i}-x_{i})^{3}\cdot k_{i+1}+\triangle x_{i}^{3}\cdot k_{i}}{6\cdot\triangle x_{i}}+a_{i}\cdot\triangle x_{i}+b_{i}\cdot(x_{i}-x_{i}) & = y_{i}\\ \Updownarrow\\ \frac{\triangle x_{i}\cdot k_{i}}{6}+a_{i} & = \frac{y_{i}}{\triangle x_{i}}\\ \Updownarrow\\ \frac{y_{i}}{\triangle x_{i}}-\frac{\triangle x_{i}\cdot k_{i}}{6} & = a_{i}$

and

$f_{i}(x_{i+1}) & = y_{i+1}\\ \Updownarrow\\ \frac{\triangle x_{i}^{3}\cdot k_{i+1}+(x_{i+1}-x_{i+1})^{3}\cdot k_{i}}{6\cdot\triangle x_{i}}+a_{i}\cdot(x_{i+1}-x_{i+1})+b_{i}\cdot\triangle x_{i} & = y_{i+1}\\ \Updownarrow\\ \frac{y_{i+1}}{\triangle x_{i}}-\frac{\triangle x_{i}\cdot k_{i+1}}{6} & = b_{i}$

This means that the constant $c_{i}$ :

$c_{i} & = -a_{i}+b_{i}\\ & = -(\frac{y_{i}}{\triangle x_{i}}-\frac{\triangle x_{i}\cdot k_{i}}{6})+\frac{y_{i+1}}{\triangle x_{i}}-\frac{\triangle x_{i}\cdot k_{i+1}}{6}\\ & = \frac{\triangle y_{i}}{\triangle x_{i}}-\frac{\triangle x_{i}\cdot\triangle k_{i}}{6}$

6.3.2. Ekstrapolation ¶

The question is what to do when there is a need for extrapolation. This is a situation quite common when talking about yieldcurve. Eg there are observed zero coupons up until year 20 but a zero coupon for year 25 is needed.

Talking about the natural cubic spline and financial cubic spline they are just linear extrapolations of the endpoints with the same slope as in the endpoints.

The only difference between the 2 cubic splines is that financial cubic spline is set to have a slope equal to zero at the endpoint to the right.

6.3.3. Estimation of the curvatures ¶

To estimate the curvatures one uses the first order derivatives:

$f_{i}'(x)=\frac{(x-x_{i})^{2}\cdot k_{i+1}-(x_{i+1}-x)^{2}\cdot k_{i}}{2\cdot\triangle x_{i}}+c_{i}$

And the assumption that the slopes are continous at the points $((x_{i},y_{i}))_{i=1}^{n}$ , ie

$f_{i-1}'(x_{i}) & = f_{i}'(x_{i})\;,i=2,\ldots,n-1\\ \Updownarrow\\ \frac{\triangle x_{i-1}^{2}\cdot k_{i}-(x_{i}-x_{i})^{2}\cdot k_{i-1}}{2\cdot\triangle x_{i-1}}+c_{i-1} & = \frac{(x_{i}-x_{i})^{2}\cdot k_{i+1}-\triangle x_{i}^{2}\cdot k_{i}}{2\cdot\triangle x_{i}}+c_{i}\\ \Updownarrow\\ \frac{\triangle x_{i-1}\cdot k_{i}}{2}+\frac{\triangle y_{i-1}}{\triangle x_{i-1}}-\frac{\triangle x_{i-1}\cdot\triangle k_{i-1}}{6} & = \frac{-\triangle x_{i}\cdot k_{i}}{2}+\frac{\triangle y_{i}}{\triangle x_{i}}-\frac{\triangle x_{i}\cdot\triangle k_{i}}{6}\\ \Updownarrow\\ \frac{\triangle x_{i-1}\cdot k_{i}}{3}+\frac{\triangle y_{i-1}}{\triangle x_{i}}+\frac{\triangle x_{i-1}\cdot k_{i-1}}{6} & = \frac{-\triangle x_{i}\cdot k_{i}}{3}+\frac{\triangle y_{i}}{\triangle x_{i}}-\frac{\triangle x_{i}\cdot k_{i+1}}{6}\\ \Updownarrow\\ \frac{\triangle x_{i-1}\cdot k_{i-1}}{6}+\frac{(\triangle x_{i}+\triangle x_{i-1})\cdot k_{i}}{3}+\frac{\triangle x_{i}\cdot k_{i+1}}{6} & = \frac{\triangle y_{i}}{\triangle x_{i}}-\frac{\triangle y_{i-1}}{\triangle x_{i-1}}\\ \Updownarrow\\ \triangle x_{i-1}\cdot k_{i-1}+2\cdot(\triangle x_{i}+\triangle x_{i-1})\cdot k_{i}+\triangle x_{i}\cdot k_{i+1} & = 6\cdot(\frac{\triangle y_{i}}{\triangle x_{i}}-\frac{\triangle y_{i-1}}{\triangle x_{i-1}})\;,i=2,\ldots,n-1$

This gives n - 2 linear equations to estimate n curvatures. This means that 2 asumptions are necessary in order to calculate the n curvtures.

6.3.3.1. The Natural Cubic Spline ¶

Here the asumptions are $k_{1} = k_{n} = 0$ . The other equations becomes $i=2,\ldots,n-1$ :

$\triangle x_{i-1}\cdot k_{i-1}+2\cdot(\triangle x_{i}+\triangle x_{i-1})\cdot k_{i}+\triangle x_{i}\cdot k_{i+1} & = 6\cdot(\frac{\triangle y_{i}}{\triangle x_{i}}-\frac{\triangle y_{i-1}}{\triangle x_{i-1}})$

In all n variables $(k_{i})_{i=1}^n$ and n linear equations

6.3.3.2. The Financial Cubic Spline ¶

Here the asumptions are $k_{1} = 0$ and $f_{n-1}'(x_{n}) = 0$ or:

$0 & = \frac{\triangle x_{n-1}^{2}\cdot k_{n}-(x_{n}-x_{n})^{2}\cdot k_{n-1}}{2\cdot\triangle x_{n-1}}+c_{n-1}\\ & = \frac{\triangle x_{n-1}\cdot k_{n}}{2}+\frac{\triangle y_{n-1}}{\triangle x_{n-1}}-\frac{\triangle x_{n-1}\cdot (k_{n} - k_{n-1})}{6}\\ \Updownarrow\\ -6\cdot \frac{\triangle y_{n-1}}{\triangle x_{n-1}} & = \triangle x_{n-1}\cdot k_{n-1} + 2\cdot \triangle x_{n-1}\cdot k_{n}$

The other equations becomes $i=2,\ldots,n-1$ :

$\triangle x_{i-1}\cdot k_{i-1}+2\cdot(\triangle x_{i}+\triangle x_{i-1})\cdot k_{i}+\triangle x_{i}\cdot k_{i+1} & = 6\cdot(\frac{\triangle y_{i}}{\triangle x_{i}}-\frac{\triangle y_{i-1}}{\triangle x_{i-1}})$

Also here there are in all n variables $(k_{i})_{i=1}^n$ and n linear equations

6.4. Using Barycentric Lagrange Interpolation in Finance ¶

Abstract:

Lagrange Interpolation has had a revelation since the Barycentric formulas were developed.

Here Barycentric Lagrange Interpolation will be examined as a tool for finding weights to approximate the first and second order derivative for a function only known by it’s functional values.

The first order is used for the duration in finance while the second order derivative is used for the convexity.

This section is highly inspired on [BerrutTrefethen], [SadiqViswanath] and [Fornberg].

Also as references [Kiusalaas] and [Ralston] has been invaluable.

6.4.1. Lagrange Interpolation ¶

First construct a polynomial with n + 1 different grid values $(x_j)_{j=0 \dotsc n}$ :

$l(x) = \prod_{j=0 \dotsc n}(x - x_j)$

This polynomial has degree n + 1. And it is obvious then that $p(x_j) = 0 \ \forall j$ .

Consider then the tangent for p in grid value $x_j$ , ie.:

$y & = l'(x_j) \cdot (x - x_j) + l(x_j) \\ & = l'(x_j) \cdot (x - x_j)$

Now define n + 1 polynomials as:

$l_j(x) = \frac{l(x)}{l'(x_j) \cdot (x - x_j)}$

These polynomials all satisfy:

$l_j(x_i) & = \begin{cases} 1 \ if \ i = j \\ 0 \ if \ i \neq j \end{cases} \\ & = \delta_{ij}$

In other words $l_j(x_i)$ is Kroenecker’s delta. Note that the $l_j$ ‘s all have degree n.

Since $l(x) = (x - x_j) \cdot \prod_{i \neq j} (x - x_i)$ we have:

$l'(x) = \prod_{i \neq j} (x - x_i) + (x - x_j) \cdot ( \prod_{i \neq j} (x - x_i) )'$

And then when letting the grid values be the argument of l(x) (last part above is zero since $(x_j - x_j)$ is):

$l'(x_j) & = \prod_{i \neq j} (x_j - x_i) \\ & \neq 0$

we have that the polynomials $l_j(x)$ are well defined. This is because $(x - x_j)$ can be shortened out and then the polynomials $l_j(x)$ are just scaled with a nonzero scaling constants.

Definition, Lagrange Interpolation and Remainder:

Consider n + 1 points $(x_j, f(x_j))_{j=0 \dotsc n}$ . Then the unique polynomial of the minimal degree n going through all n + 1 points is:

$p(x) = \sum_{j=0 \dotsc n} l_j(x) \cdot f(x_j)$

Define $R(x) = f(x) - p(x)$ as the Lagrange remainder.

Since $l_j(x_i)$ is Kroenecker’s delta it is obvious that $l(x_j) = l_j(x_j) \cdot f(x_j) = f(x_j)$ .

Theorem, Lagrange Remainder:

Let $]a, b[$ being an open interval containing all the n+1 grid values $(x_j)_{j=0 \dotsc n}$ and f is a n+1 times continous differentiable function on $]a, b[$ :

$\forall x \in ]a, b[ \quad \exists \zeta \in ]a, b[: f(x) &= \sum_{j=0 \dotsc n} l_j(x) \cdot f(x_j) + \frac{l(x) \cdot f^{(n+1)}(\zeta)}{(n+1)!} \\ &= p(x) + R(x)$

Proof:

For $x \in ]a,b[$ :

$R(x) &= f(x) - \sum_{j=0 \dotsc n} l_j(x) \cdot f(x_j) \\ &= f(x) - p(x)$

Observe that $f(x_j) = p(x_j) \forall j=0 \dotsc n$ i.e. $R(x)$ has n+1 zeros.

Also observe that since p(x) has degree n $R^{(n+1)}(x) = f^{(n+1)}(x)$ and $p^{(n)}(x) = (n+1)!$ .

Now define:

$g(t) = R(t) - \frac{l(t) \cdot R(x)}{l(x)}$

Observe that $g(t)$ has both $(x_j)_{j=0 \dotsc n}$ and x as roots, ie n+2 zeros.

According to Rolle’s theorem $g^{(1)}(t)$ has n+1 zeros and successively $g^{(2)}(t)$ has n zeros.

So $g^{(n+1)}(t)$ has 1 zero, ie $\exists \zeta :g^{(n+1)}(\zeta) = 0$ or

$0 &= g^{(n+1)}(\zeta)\\ 0 &= R^{(n+1)}(\zeta) - \frac{(n+1)! \cdot R(x)}{l(x)} \\ 0 &= f^{(n+1)}(\zeta) - \frac{(n+1)! \cdot R(x)}{l(x)} \\ & \Updownarrow \\ R(x) &= \frac{l(x) \cdot f^{(n+1)}(\zeta)}{(n+1)!}$

(see also)

Q.E.D.

The Lagrange Remainder can be used to limit the overall accuracy for the Lagrange Interpolation if the function f(x) is n+1 times continous differentiable on the closed interval $[a,b]$ .

This is because $\exists M : |f^{(n+1)}(x)| \le M \forall x \in [a,b]$ , ie:

$|f(x) - p(x)| = |R(x)| &\le \frac{l(x) \cdot M}{(n+1)!} \\ &\le \frac{(b-a)^{n+1} \cdot M}{(n+1)!}$

So if $(b-a)$ converges to zero then the accuracy becomes of order n+1 ie $\mathcal{O}((b-a)^{n+1})$ .

Observation:

The Lagrange Interpolation can for a set of n+1 grid values $(x_j)_{j=0 \dotsc n}$ can be seen as a way of interpolating a function value f(x) based on a weighted average of the functional values $(f(x_j))_{j=0 \dotsc n}$ where the weights are $(l_j(x))_{j=0 \dotsc n}$ and the accuracy will be of order $\mathcal{O}((b-a)^{n+1})$ .

This can be taken even further. Since only the weights depends on x the first order differentiation of $f(x),\ f^{(1)}(x)$ can be approximated by:

$p^{(1)}(x) = \sum_{j=0 \dotsc n} l_j^{(1)}(x) \cdot f(x_j)$

So the first order derivative $f^{(1)}(x)$ at x can be approximated by the functional values $(f(x_j))_{j=0 \dotsc n}$ and the weights are $(l_j^{(1)}(x))_{j=0 \dotsc n}$ and the accuracy will be of order $\mathcal{O}((b-a)^{n})$ , ie the accuracy loses 1 degree.

And successively for the second order derivative:

$p^{(2)}(x) = \sum_{j=0 \dotsc n} l_j^{(2)}(x) \cdot f(x_j) + \mathcal{O}((b-a)^{n-1})$

This makes Lagrange Interpolation well suited for finding weighted averages of functional values $(f(x_j))_{j=0 \dotsc n}$ for derivatives of any order with high accuracy, $\mathcal{O}((b-a)^{number\ of\ points - degree\ of\ differentiation})$ .

6.4.2. Barycentric Lagrange Interpolation ¶

Computationally Lagrange Interpolation isn’t that effective. But further analysis shows that the $p(x_j)$ ‘s all have the same common factor $\prod_{j=0 \dotsc n}(x - x_j)$ so we have the First Barycentric formula:

(2) $p(x) &= \sum_{j=0 \dotsc n} l_j(x) \cdot f(x_j) \\ &= \prod_{j=0 \dotsc n}(x - x_j) \cdot \sum_{j=0 \dotsc n} \frac{w_j}{x - x_j} \cdot f(x_j) \\ &= l(x) \cdot \sum_{j=0 \dotsc n} \frac{w_j}{x - x_j} \cdot f(x_j)$

where

$w_j &= \frac{1}{l'(x_j)} \\ &= \frac{1}{\prod_{i \neq j} (x_j - x_i)}$

Looking at the constant function 1 gives:

$1 = l(x) \cdot \sum_{j=0 \dotsc n} \frac{w_j}{x - x_j}$

or

$l(x) = \frac{1}{\sum_{j=0 \dotsc n} \frac{w_j}{x - x_j}}$

This way we have the Second Barycentric formula:

(3) $p(x) = \frac{\sum_{j=0 \dotsc n} \frac{w_j}{x - x_j} \cdot f(x_j)}{\sum_{j=0 \dotsc n} \frac{w_j}{x - x_j}}$

Both formulas shows a remarkable low dependence on the $f(x_j)$ ‘s since a change in one $f(x_j)$ ‘s will have a multiplicative effect only one place.

6.4.2.1. Algorithms ¶

Interpolation should be done based on the Second Barycentric formula (3).

For updating the weights with an extra point consider a set of basis points $(x_j, f(x_j))_{j=0 \dotsc n}$ to be used as a basis for Barycentric Lagrange interpolation.

When a new grid value $x_{n+1}$ has to be added the algorithm is:

Calculate $w_j := \frac{w_j}{x_j - x_{n+1}}\forall j$

Calculate $w_{n+1} := \frac{1}{\prod_{j=0 \dotsc n+1}(x_{n+1} - x_j)}$

It is obvious that there needs to be minimum 2 values. When there are only 2 grid values $x_1$ and $x_2$ then $w_1 = -w_2$ .

6.4.2.2. Numerical differentiation at grid points ¶

One of the things to use Barycentric Lagrange interpolation is to evaluate is to differentiate a curve at a point when only a set points are known with a high precision.

Usual formulas are eg, $\frac{f(x + h) - f(x - h)}{2 \cdot h}$ which is considered as a 3 point estimation based on the points with first coordinate (x - h, x, x + h).

The precision or accuracy is here of order 2.

Now consider:

$f(x) & = \sum_{j=0 \dotsc n} l_j(x) \cdot f(x_j) \\ & \Updownarrow \\ f^{(1)}(x) & = \sum_{j=0 \dotsc n} l^{(1)}_j(x) \cdot f(x_j)$

Now according to Second Barycentric formula (3):

$l_j(x) &= \frac{\frac{w_j}{x - x_j}}{\sum_{k=0 \dotsc n} \frac{w_k}{x - x_k}} \\ & \Updownarrow \\ l_j(x) \cdot \sum_{k=0 \dotsc n} \frac{w_k}{x - x_k} &= \frac{w_j}{x - x_j}$

Letting $s_i(x) = \sum_{k=0 \dotsc n} \frac{w_k \cdot (x - x_i)}{x - x_k}$ where $x_i$ is one of the roots $(x_k)_{k=0 \dotsc n}$ (Important assumption) we have:

$\begin{eqnarray} l_j(x) \cdot s_i(x) &=& \frac{w_j \cdot (x - x_i)}{x - x_j} \nonumber \\ & \Downarrow \nonumber \\ l_j^{(1)}(x) \cdot s_i(x) + l_j(x) \cdot s_i^{(1)}(x) &=& \frac{(x - x_j) - (x - x_i)}{(x - x_j)^2} \cdot w_j \nonumber \\ & \Updownarrow \nonumber \\ l_j^{(1)}(x) \cdot s_i(x) + l_j(x) \cdot s_i^{(1)}(x) &=& \frac{x_i - x_j}{(x - x_j)^2} \cdot w_j \nonumber \end{eqnarray}$

The last equation will be used several times so it gets a number:

(4) $l_j^{(1)}(x) \cdot s_i(x) + l_j(x) \cdot s_i^{(1)}(x) = \frac{x_i - x_j}{(x - x_j)^2} \cdot w_j$

Since by design $s_i(x_i) = w_i$ and $l_j(x_i) = \delta_{ij}$ we have for $i \neq j$ in (4):

$l_j^{(1)}(x_ i) \cdot s_i(x_i) + l_j(x_i) \cdot s_i^{(1)}(x_i) &= \frac{1}{x_i - x_j} \cdot w_j \\ & \Updownarrow \\ l_j^{(1)}(x_i) \cdot w_i &= \frac{1}{x_i - x_j} \cdot w_j \\ & \Updownarrow \\ l_j^{(1)}(x_i) &= \frac{w_j}{(x_i - x_j) \cdot w_i}$

Now for i = j we have:

$1 &= \sum_{k=0 \dotsc n} l_k(x) \\ & \Downarrow \\ 0 &= \sum_{k=0 \dotsc n} l^{(1)}_k(x) \\ & \Downarrow \\ 0 &= \sum_{k=0 \dotsc n} l^{(1)}_k(x_j) \\ & \Updownarrow \\ l^{(1)}_j(x_j) &= - \sum_{k \neq j} l^{(1)}_k(x_j)$

This way we have:

Theorem, Barycentric first order derivative:

Consider a set of grid values $(x_i)_{i=0 \dotsc n}$ and the function values $(f(x_i))_{i=0 \dotsc n} = (f_i)_{i=0 \dotsc n}$ . The formula for the first order derivative at one of the grid values $x_i$ , $f^{(1)}(x_i)$ is:

$f^{(1)}(x_i) = \sum_{j=0 \dotsc n} l^{(1)}_j(x_i) \cdot f(x_j)$

where

$l_j^{(1)}(x_i) &= \frac{w_j}{(x_i - x_j) \cdot w_i} \ for \ i \neq j \\ l^{(1)}_i(x_i) &= - \sum_{k \neq i} l^{(1)}_k(x_i) \ for \ i = j$

Example 1:

The formula for interpolating the first order derivative can used to optimize the approximation of the derivative. Consider eg $(x_i)_{i = 0 \dotsc 1}$ . Then $(w_i)_{i = 0 \dotsc 1} = (\frac{1}{x_1 - x_0}, \frac{-1}{x_1 - x_0})$ and:

$l^{(1)}_j(x_i) &= \left( \begin{matrix} \frac{1}{x_0 - x_1} & \frac{-1}{x_0 - x_1} \\ \frac{1}{x_0 - x_1} & \frac{-1}{x_0 - x_1} \end{matrix} \right) \\ &= \frac{1}{x_0 - x_1} \cdot \left( \begin{matrix} 1 & -1 \\ 1 & -1 \end{matrix} \right)$

where i is the row index while j is the column index.

And then with:

$\left( \begin{matrix} f^{(1)}(x_0) \\ f^{(1)}(x_1) \end{matrix} \right) = l^{(1)}_j(x_i) \cdot \left( \begin{matrix} f(x_0) \\ f(x_1) \end{matrix} \right)$

The result is:

$f^{(1)}(x_0) = f^{(1)}(x_1) = \frac{f(x_0) - f(x_1)}{x_0 - x_1} + \mathcal{O}(|x_0 -x_1|)$

And this is the formula for Newton’s difference quotient.

Example 2:

Consider eg $(x_i)_{i = 0 \dotsc 2} = (x - h, x, x + h)$ . Then $(w_i)_{i = 0 \dotsc 2} = (\frac{1}{2h^2}, \frac{1}{-h^2}, \frac{1}{2h^2})$ and:

$l^{(1)}_j(x_i) &= \left( \begin{matrix} - \sum_{k \neq (x - h)} l^{(1)}_k(x - h) & \frac{2h^2}{-h^2 \cdot ((x - h) - x)} & \frac{2h^2}{2h^2 \cdot ((x - h) -(x + h))} \\ \frac{-h^2}{2h^2 \cdot (x - (x - h))} & - \sum_{k \neq x} l^{(1)}_k(x) & \frac{-h^2}{2h^2 \cdot (x - (x + h))} \\ \frac{2h^2}{2h^2 \cdot ((x + h) - (x - h))} & \frac{2h^2}{-h^2 \cdot ((x + h) - x)} & - \sum_{k \neq (x + h)} l^{(1)}_k(x + h) \end{matrix} \right) \\ &= \frac{1}{h} \cdot \left( \begin{matrix} -1 \frac{1}{2} & 2 & \frac{-1}{2} \\ \frac{-1}{2} & 0 & \frac{1}{2} \\ \frac{1}{2} & -2 & 1 \frac{1}{2} \end{matrix} \right)$

These coefficients are to formula 25.3.4 on page 883 in [AbramowitzStegun].

They are all of accuracy $\mathcal{O}(h^{2})$ .

Looking at the middle row the formula for Newton’s difference quotient is found once again:

$f^{(1)}(x) &= \frac{\frac{-1}{2} \cdot f(x - h) + \frac{1}{2} \cdot f(x + h)}{h} \\ &= \frac{f(x + h) - f(x - h)}{(x + h) - (x - h)} \\ &= \frac{f(x + h) - f(x - h)}{2 \cdot h}$

Differentiating (4) leads to:

(5) $l_j^{(2)}(x) \cdot s_i(x) + 2 \cdot l_j^{(1)}(x) \cdot s_i^{(1)}(x) + l_j(x) \cdot s_i^{(2)}(x) = -2 \cdot \frac{x_i - x_j}{(x - x_j)^3} \cdot w_j$

Again by design $s_i(x_i) = w_i$ and $l_j(x_i) = \delta_{ij}$ we have for $i \neq j$ in (5):

$l_j^{(2)}(x_i) \cdot s_i(x_i) + 2 \cdot l_j^{(1)}(x_i) \cdot s_i^{(1)}(x_i) + l_j(x_i) \cdot s_i^{(2)}(x_i) &= \frac{-2 \cdot w_j}{(x_i - x_j)^2} \\ \Updownarrow \\ l_j^{(2)}(x_i) \cdot w_i + 2 \cdot l_j^{(1)}(x_i) \cdot s_i^{(1)}(x_i) &= \frac{-2 \cdot w_j}{(x_i - x_j)^2}$

From Barycentric first order derivative Barycentric first order derivative we have $l_j^{(1)}(x_i) = \frac{w_j}{(x_i - x_j) \cdot w_i} \ for \ i \neq j$ so:

$l_j^{(2)}(x_i) \cdot w_i + 2 \cdot \frac{w_j}{(x_i - x_j) \cdot w_i} \cdot s_i^{(1)}(x_i) = \frac{-2 \cdot w_j}{(x_i - x_j)^2}$

Finally since $s_i^{(1)}(x) = \sum_{k \neq i} w_k \cdot \frac{(x - x_k) - (x - x_i)}{(x - x_k)^2} = \sum_{k \neq i} w_k \cdot \frac{(x_i - x_k)}{(x - x_k)^2}$ :

$l_j^{(2)}(x_i) \cdot w_i + 2 \cdot \frac{w_j}{(x_i - x_j) \cdot w_i} \cdot \sum_{k \neq i} \frac{w_k}{x - x_k} &= \frac{-2 \cdot w_j}{(x_i - x_j)^2} \\ & \Updownarrow \\ l_j^{(2)}(x_i) \cdot w_i &= \frac{-2 \cdot w_j}{(x_i - x_j)^2} - 2 \cdot \frac{w_j}{(x_i - x_j) \cdot w_i} \cdot \sum_{k \neq i} \frac{w_k}{x_i - x_k} \\ & \Updownarrow \\ l_j^{(2)}(x_i) &= \frac{ - 2 \cdot w_j}{(x_i - x_j) \cdot w_i} \cdot \left(\frac{1}{x_i - x_j} + \sum_{k \neq i} \frac{w_k}{(x_i - x_k) \cdot w_i} \right) \\ & \Updownarrow \\ l_j^{(2)}(x_i) &= \frac{ - 2 \cdot w_j}{(x_i - x_j) \cdot w_i} \cdot \left(\frac{1}{x_i - x_j} + l_i^{(1)}(x_i) \right)$

Now for i = j we have something similar as for the first order case:

$1 &= \sum_{k=0 \dotsc n} l_k(x) \\ & \Downarrow \\ 0 &= \sum_{k=0 \dotsc n} l^{(2)}_k(x_j) \\ & \Updownarrow \\ l^{(2)}_j(x_j) &= - \sum_{k \neq j} l^{(2)}_k(x_j)$

This way we have:

Theorem, Barycentric second order derivative:

Consider a set of grid values $(x_i)_{i=0 \dotsc n}$ and the function values $(f(x_i))_{i=0 \dotsc n} = (f_i)_{i=0 \dotsc n}$ . The formula for the second order derivative at one of the grid values $x_i$ , $f^{(1)}(x_i)$ is:

$f^{(2)}(x_i) = \sum_{j=0 \dotsc n} l^{(2)}_j(x_i) \cdot f(x_j)$

where

$l_j^{(2)}(x_i) &= \frac{ - 2 \cdot w_j}{(x_i - x_j) \cdot w_i} \cdot \left(\frac{1}{x_i - x_j} + l_i^{(1)}(x_i) \right) \ for \ i \neq j \\ l^{(2)}_i(x_i) &= - \sum_{k \neq i} l^{(2)}_k(x_i) \ for \ i = j$

Note

There is a sign error in [BerrutTrefethen] page 513

Example 2, second order:

Consider eg $(x_i)_{i = 0 \dotsc 2} = (x - h, x, x + h)$ . Then $(w_i)_{i = 0 \dotsc 2} = (\frac{1}{2h^2}, \frac{1}{-h^2}, \frac{1}{2h^2})$ and $l_x^{(1)}(x) = 0$ so:

$l^{(2)}_j(x) &= \left( \begin{matrix} \frac{1}{h^2}, & \frac{-2}{h^2}, & \frac{1}{h^2} \\ \end{matrix} \right)$

This is usual Newton second order central finite difference approximation of accuracy $\mathcal{O}(h)$ , ie:

$f^{(2)}(x) = \frac{f(x-h) - 2 \cdot f(x) + f(x+h)}{h^2}$

Also $l_{x-h}^{(1)}(x-h) = \frac{-3}{2 \cdot h}$ so:

$l^{(2)}_j(x-h) &= \left( \begin{matrix} - \sum_{k \neq i} l^{(2)}_k(x_i), & \frac{ - 2 \cdot \frac{1}{-h^2}}{h \cdot \frac{1}{2h^2}} \cdot \left(\frac{1}{h} + \frac{-3}{2 \cdot h} \right), & \frac{ - 2 \cdot \frac{1}{2h^2}}{h \cdot \frac{1}{2h^2}} \cdot \left(\frac{1}{2 \cdot h} + \frac{-3}{2 \cdot h} \right) \\ \end{matrix} \right) \\ &= \left( \begin{matrix} - \sum_{k \neq i} l^{(2)}_k(x_i), & \frac{4}{h} \cdot \frac{-1}{2 \cdot h}, & \frac{-2}{h} \cdot \frac{-1}{2 \cdot h} \\ \end{matrix} \right) \\ &= \left( \begin{matrix} \frac{1}{h^2}, & \frac{-2}{h^2}, & \frac{1}{h^2} \\ \end{matrix} \right)$

This is usual Newton second order forward finite difference approximation of accuracy $\mathcal{O}(h)$ , ie:

$f^{(2)}(x - h) = \frac{f(x-h) - 2 \cdot f(x) + f(x+h)}{h^2}$

Both results are in accordance with table 25.2 page 914 in [AbramowitzStegun].

[SadiqViswanath] study the fact that the accuracy sometimes is 1 higher than specified by Lagrange, ie the accuracy is boosted. One of their results are:

Corollary 7 page 14:

Look at the grid values $(x_i)_{i=0 \dotsc n} = (h \cdot z_i)_{i=0 \dotsc n}$ . If the grid relative grid values $(z_i)_{i=0 \dotsc n}$ are symmetric about 0 (in other words z is a relative grid value if and only if −z is a relative grid value) and n − m (m is the order of the derivative) is odd, the order of accuracy is boosted by 1.

So according to the corollary the Newton second order central finite difference approximation is actually of accuracy $\mathcal{O}(h^2)$ since $(x_i)_{i = 0 \dotsc 2} = (x - h, x, x + h)$ has symmetric relative grid values and $n - m = 3 - 1 = 1$ is odd. On the other hand the accuracy of the Newton second order forward finite difference approximation is still $\mathcal{O}(h)$ . This is confirmed by table 25.2 page 914 in [AbramowitzStegun].

Decision, the finance package:

In the finance package the derivatives numericcally will approximated such that the accuracy $\mathcal{O}(0.0001^4)$ leading to an accuracy down to the sixteenth decimal.

6.5. A note on Gary Foshee Tuesday Problem and related probability problems ¶

Abstract:

There is no more dangerous mathematical discipline than probability theory. And this due to the fact that intuition and actual facts differs as nowhere else.

This text goes through a list of classical probability paradoxes and it is shown that there actually are no paradoxes, only bad use of conditional probabilities.

The problems that will be studied and contrasted are:

d’Alembert and the 2 coin tossing

The Two Children Problem

Getting 2 aces of the same color

Monty Hall

The thuesday problem

The main error is to reduce the probability space by conditioning some events away. This is first done by d’Alembert and is later done again by both Martin Gardner in “The Two Children Problem”, by Gary Foshee in “The Thuesday Child Problem” and by the opponents to vos Savant in the Monty Hall paradox.

What is done by conditioning before answering corresponds to the following. Before finding out what color an elephant has, the investigaters paints it red and then conclude that the color of the elephant is.... red.

The mistaken strategy of conditioning before answering is actually quite the oposite to modern use of conditioning in stochastic processes and filtering since here fewer events means that you know less. So in that light using the strategy of conditioning before answering you actually decide to get more ignorant before answering. And it leads to examples of fake infection of dependency.

6.5.1. Introduction ¶

I read about the tuesday problem in a danish e-magazine, Ingeniøren.

The problem is also described very well in Science News.

I was puzzled by the inconsistencies to my perception of probability theory. Having a master degree in applied probability theory I’ve come to learn never to trust your intuition unless you can back it up with mathematical proof and rigor.

And even then it can go wrong. As in the case of Monte Hall I’ve seen faulty probability trees used again and again to show the faulty conclusion.

And this brings me to the people that uses computer similations. Because simulations build on a wrong probability model verifies nothing just as faulty probability trees don’t prove anything.

So you seldom proves anything by a computer simulation, you just move the argument to whether or not your simulation represents the right problem.

The only way is to sit down and think the problems through throroughly.

6.5.2. d’Alembert and the 2 coin tossing ¶

The problem is eg described in Prakash Gorroochurn:Classic Problems of Probability, chapter 12

d’Alembert and the 2 coin tossing:

The question slightly reformulated is to calculate the probability of getting 2 tails when tossing 2 coins.

According to d’Alembert there are 3 outcomes {H, TH, TT} in which case the probability becomes $\frac{1}{3}$ instead of the correct probability $\frac{1}{4}$ . His argument being that we don’t need to toss the second coin when the first is Head.

Actually he is answering the question by conditioning. Letting T1 and T2 being the events of having tail in toss 1 and 2 respectively his answer is $P(T1 \cap T2 | T1 \cup T2)=\frac{1}{3}$

And hence d’Alembert is assigning a wrong probability to the outcome H by ignoring the outcomes of the second coin. And hence he is coming to the wrong result by reducing probability space by conditioning on it before answering.

Today everyone agrees that d’Alembert is wrong by not using the full and original probability space {HH, TH, HT, TT}, but at the time he actually had what appeared to be the best arguments.

But this classical error appears again and again in the so called paradoxes below.

6.5.3. The Two Children Problem ¶

We continue with The Two Children Problem.

This is also analyzed in eg Grinstead and Snell’s Introduction to Probability p. 171

The Two Children Problem:

Suppose that Mr. Smith has two children, at least one of whom is a son. What is the probability both children are boys?

Let us to start by looking at the probability space of 2 children and whether or not each child is a boy (B) or a girl (G).

Here we have the events (where i is the order the children are born):

B1, B2 is the event that child number 1, 2 is a boy, respectively

G1, G2 is the event that child number 1, 2 is a girl, respectively

and BR is short for $B1 \cap R2$ .

The probabilities are:

$P(B1)= P(G1) = P(B2) = P(G2) = \frac{1}{2}$

The 2 events B1 and B2 are assumed to be independent.

To answer the question we say that in the frase “at least one of whom is a son” we don’t know whether it is the oldest or the youngest child that is that boy mentioned.

Now given that the boy is the oldest then the probability for him to have a younger brother is due to independence $P(B2|B1) = P(B2) = \frac{1}{2}$ .

And hence by similarity:

$P(B2|B1) = P(B1|B2) = \frac{1}{2}$

Combining this gives :

$P(OtherChildIsABoy) &= P(B2|B1) \cdot P(B1) + P(B1|B2) \cdot P(B2) \\ &= \frac{1}{2} \cdot \frac{1}{2} + \frac{1}{2} \cdot \frac{1}{2} \\ &= \frac{1}{2}$

In other words: If we know which child we’ve got the information about, then the sex of the other child remains independent of the first childs sex. And hence the the probability of the other child being a boy is $\frac{1}{2}$ . Since we do not which child the information is about we have to take the “average” of the probabilities of the other child being a boy over child chosen. Still independence is preserved.

So we keep the assumption that the sex of each child is independent of the other.

This result fits into the intuition since knowing the sex of one child shouldn’t influence the probability of the other child.

The other way of looking at it is the one of Gardner’s and the referred to in Ingeniøren. It is similar to the the way d’Alembert wrongfully solved his problem.

They are saying given that there is at least 1 boy out of 2 children $(B1 \cup B2)$ what is the probability of having 2 boys $(B1 \cap B2)$ .

So they interpret the answer as a conditional probability $P(B1 \cap B2 | B1 \cup B2)$ .

The probability of at least 1 boy is:

$P(B1 \cup B2) = \frac{3}{4}$

And then the answer becomes:

$P(B1 \cap B2 | B1 \cup B2) &= \frac{P(B1 \cap B2)}{P(B1 \cup B2)} \\ &= \frac{1}{3}$

But conditioning changes the probability space totally as can be seen by:

$P(B1 | B1 \cup B2) = \frac{P(B1)}{P(B1 \cup B2)} = \frac{2}{3} \\ P(B2 | B1 \cup B2) = \frac{P(B2)}{P(B1 \cup B2)} = \frac{2}{3} \\ P(B1 \cap B2 | B1 \cup B2) = \frac{P(BB)}{P(B1 \cup B2)} = \frac{1}{3} \\$

which shows that B1 and B2 no more are independent since:

$P(B1 \cap B2 | B1 \cup B2) = \frac{1}{3} \neq \frac{4}{9} = P(B1 | B1 \cup B2) \cdot P(B2 | B1 \cup B2)$

This fact becomes the key to handling the thuesday problem later on.

So by conditioning by $(B1 \cup B2)$ one makes to 2 originally independent events B1 and B2 dependent.

There is nothing mathematically wrong in the above.

In the end it is all a matter of whether $P(OtherChildIsABoy)$ or $P(B1 \cap B2 | B1 \cup B2)$ is the answer to the question.

But as noted beforeit is easy to see that the answer $P(B1 \cap B2 | B1 \cup B2)$ is making the same mistake as d’Alembert.

At this point it might be an good idea to dwell at what conditioning actually means?

By conditioning you reduce the originally probability space to a new probability space.

Since the events in the conditioned probability space is a subset of the originally space one would expect that other bindings like independence are inherited, but they aren’t necessarily.

But it is a new probability space where the event GG never happens (just like d’Alembert paradox) whenever someone is talking about being parent to 2 children.

And that is why the conditioned probability space shouldn’t be used in answering the question.

On the other hand it is quite easy to consider the situation where you meet a person and one of his/hers children.

Looking closer you see that the accompanying child is a boy. From the conversation following it appears that he/she has another child.

In this case the probability of him/her having 2 boys when it is known that he/she has at least one boy is $P(OtherChildIsABoy)=\frac{1}{2}$ .

6.5.4. Drawing 2 cards of the same color out of 4 cards ¶

The next case shows how one sometimes can be mislead by using uniform probabilities instead of the right ones. But conditioning to get the answer can also be a mistake.

Drawing 2 cards of the same color out of 4 cards:

Draw 2 cards from 4 cards, eg 4 aces. What are the probability of drawing 2 of the same color (2 reds or 2 blacks)?

Here we have the events:

B1, B2 is the event that card number 1, 2 is black, respectively

R1, R2 is the event that card number 1, 2 is red, respectively

and BR is short for $B1 \cap R2$

Here one can be mislead to believe that there are 4 events {BB, BR, RB, RR} with equal probability leading to the (wrong) answer:

$P(BB \cup RR) = \frac{1}{2}$

But the probabilities are in fact:

$P(B1) = P(R1) = \frac{1}{2} \\ P(B2|B1) = P(R2|R1) = \frac{1}{3} \\ P(R2|B1) = P(B2|R1) = \frac{2}{3}$

leading to:

$P(BB) = P(RR) = \frac{1}{6} \\ P(RB) = P(BR) = \frac{1}{3}$

Here the right answer is $P(BB \cup RR) = \frac{1}{3}$

The not so known mistake however is to argue for the right probability by using P(B2|B1) or P(R2|R1). This is only right due to symmetry.

If there were 5 cards (3 reds and 2 blacks) one would get the probabilities:

$P(B1) &= \frac{2}{5} \\ P(R1) &= \frac{3}{5} \\ P(B2|B1) &= \frac{1}{4} \\ P(R2|B1) &= \frac{3}{4} \\ P(R2|R1) &= \frac{1}{2} \\ P(B2|R1) &= \frac{1}{2}$

Here the 2 conditional distributions are very different since one conditioning on R1 is uniform. So here one can’t use P(B2|B1) or P(R2|R1) directly.

Now the above leads to:

$P(BB) &= \frac{2}{5} \cdot \frac{1}{4} = \frac{1}{10} \\ P(BR) &= \frac{2}{5} \cdot \frac{3}{4} = \frac{3}{10} \\ P(RB) &= \frac{3}{5} \cdot \frac{1}{2} = \frac{3}{10} \\ P(RR) &= \frac{3}{5} \cdot \frac{1}{2} = \frac{3}{10}$

Here the right answer becomes $P(BB \cup RR) = \frac{4}{10}$ , which is neither $P(B2|B1) = \frac{1}{4}$ or $P(R2|R1) = \frac{1}{2}$ .

So of course one should use the right underlying probabilities, but one should also be carefull about answering in the full probability space.

6.5.5. The Monty Hall problem ¶

Here it is again a question of posing the right questions in order to build the right probability space. Also here You don’t answer the question by conditioning the probability space before answering.

Monty Hall Problem

Suppose you’re on a game show, and you’re given the choice of three doors: Behind one door is a car; behind the others, goats. You pick a door, say No. 1, and the host, who knows what’s behind the doors, opens another door, say No. 3, which has a goat. He then says to you, “Do you want to pick door No. 2?” Is it to your advantage to switch your choice?

The faulty solution says that after a door is opened then there are 2 doors with equal probability. And hence there is no gain in choosing the other door.

Here the originally probability space is reduced by the one door opened.

Proper analysis eg in Grinstead and Snell’s Introduction to Probability, p. 136 shows that the proper probability is $\frac{2}{3}$ if you choose the other door.

The argument being based of the originally probability space with 3 doors is shown below.

The columns represents the 3 independent events that describes the whole probability space:

The car is placed randomly behind one of the 3 doors
A door is chosen by the contestant
The show host (Monty) opens a door and offers the constestant to choose anew

The probability space is:

Car behind door	Door Chosen	Monty opens	Path Probability	Choose other door
$P(1)=\frac{1}{3}$	$P(1)=\frac{1}{3}$	$P(2)=\frac{1}{2}$	$\frac{1}{18}$	Loose
$P(1)=\frac{1}{3}$	$P(1)=\frac{1}{3}$	$P(3)=\frac{1}{2}$	$\frac{1}{18}$	Loose
$P(1)=\frac{1}{3}$	$P(2)=\frac{1}{3}$	$P(3)=1$	$\frac{1}{9}$	Win
$P(1)=\frac{1}{3}$	$P(3)=\frac{1}{3}$	$P(2)=1$	$\frac{1}{9}$	Win
$P(2)=\frac{1}{3}$	$P(1)=\frac{1}{3}$	$P(3)=1$	$\frac{1}{9}$	Win
$P(2)=\frac{1}{3}$	$P(2)=\frac{1}{3}$	$P(2)=\frac{1}{2}$	$\frac{1}{18}$	Loose
$P(2)=\frac{1}{3}$	$P(2)=\frac{1}{3}$	$P(3)=\frac{1}{2}$	$\frac{1}{18}$	Loose
$P(2)=\frac{1}{3}$	$P(3)=\frac{1}{3}$	$P(1)=1$	$\frac{1}{9}$	Win
$P(3)=\frac{1}{3}$	$P(3)=\frac{1}{3}$	$P(2)=\frac{1}{2}$	$\frac{1}{18}$	Loose
$P(3)=\frac{1}{3}$	$P(3)=\frac{1}{3}$	$P(3)=\frac{1}{2}$	$\frac{1}{18}$	Loose
$P(3)=\frac{1}{3}$	$P(1)=\frac{1}{3}$	$P(2)=1$	$\frac{1}{9}$	Win
$P(3)=\frac{1}{3}$	$P(2)=\frac{1}{3}$	$P(1)=1$	$\frac{1}{9}$	Win

And hence we have $P(WinByChoosingOtherDoor)=\frac{4}{9}=\frac{2}{3}$ and then it is better to choose the other door.

6.5.6. The Thuesday Child Problem ¶

Now we are ready for the Thuesday Child Problem. It is an expansion of the 2 children paradox above.

The Thuesday Child Problem:

Suppose that a person has two children, at least one of whom is a son born on a Thuesday. What is the probability both children are boys?

One should imagine that the information on which day the child was born was irrelevant. But when conditioning on the probability space before answering one by accident introduces a false dependency between birth day of week and the sex of the child and hence some change in the probabilities.

As seen at 2 Children Paradox this is the wrong way to answer the question. After showing the wrong arguments leading to the paradox, the right way to do the calculations are shown.

Finally I show how adding more and more irrelevant information actually leads to a convergence to the right probability.

First some notation. There are 2 children, 1 is short for the oldest and 2 is short of the youngest. Each child can be a boy (B) or a girl (G). Finally each child can be born on a Thuesday (T) or at any other day of the week (O).

It is assumed that child order, child sex and child birth day are independent.

In short:

1G is first child is a girl

1T is first child is born on a Thuesday

1BT is first child is a boy born on a Thuesday

1BT2GO is first child is a boy born on a Thuesday and second child is a girl born on any other day than Thuesday

etc.

And the basic probabilities are:

$P(B) &= P(G) = \frac{1}{2} \\ P(T) &= \frac{1}{7} \\ P(O) &= \frac{6}{7} \\$

As in the 2 Children Problem we choose to ignore all events not having at least one boy born on a Thuesday as an event, ie we condition with $1BT \cup 2BT$ .

Below is a probability summary:

Now:

$P(1BT) &= P(2BT) = P(B) \cdot P(T) = \frac{1}{14} \\ P(1BT \cap 2BT) &= P(1BT) \cdot P(2BT) = \frac{1}{196} \\ P(1BT \cup 2BT) &= P(1BT) + P(2BT) - P(1BT \cap 2BT) = \frac{27}{196}$

This is anything but the grey area in the probability summary.

And the probability of 2 boys, ie the other child is a boy, at least one of them born on a Thuesday is:

$P(\{1BT2BT, 1BO2BT, 1BT2BO\}) = \frac{1 + 6 + 6}{196} = \frac{13}{196}$

This is the green area in the probability summary.

And hence (ignoring the grey area):

$P(\{1BT2BT, 1BO2BT, 1BT2BO\} | 1BT \cup 2BT) = \frac{\frac{13}{196}}{\frac{27}{196}} = \frac{13}{27}$

So the probability of the other child $\frac{13}{27}$ , a little less than the right value $\frac{1}{2}$ and actually quite far from the $\frac{1}{3}$ that would be expected from the 2 Children Problem.

Let us add a paradox to the paradox. Now what happens if we didn’t knew P(T), but set it to p. Then P(O) = 1 - p.

$P(1BT) &= P(2BT) = P(B) \cdot P(T) = \frac{p}{2} \\ P(1BT \cap 2BT) &= P(1BT) \cdot P(2BT) = p^2 \\ P(1BT \cup 2BT) &= P(1BT) + P(2BT) - P(1BT \cap 2BT) = p - \frac{p^2}{4} = \frac{-p^2 + 4 \cdot p}{4} \\ P(\{1BT2BT, 1BO2BT, 1BT2BO\}) &= \frac{p^2 + (1-p) \cdot p + p \cdot (1-p)}{4} = \frac{-p^2 + 2 \cdot p}{4}$

This leads to:

$P(\{1BT2BT, 1BO2BT, 1BT2BO\} | 1BT \cup 2BT) = \frac{-p^2 + 2 \cdot p}{-p^2 + 4 \cdot p}$

Now if we make the event more rare, eg saying that the child was born on aprils fool, then p would become smaller, eg $\frac{1}{365}$ . Then we see that:

$P(\{1BT2BT, 1BO2BT, 1BT2BO\} | 1BT \cup 2BT)_{p \rightarrow 0} \rightarrow \frac{1}{2}$

Which can be seen easily by the use of the rule of l’Hospital.

Now making the event more rare corresponds to adding more and more irrelevant data, ie noise.

And what is seen from the above limit is that the more noise that is added the closer we get to the right probability $\frac{1}{2}$ (not the wrong probability from the 2 Children Problem, $\frac{1}{3}$ ).

Actually the function $f(p) = \frac{-p^2 + 2 \cdot p}{-p^2 + 4 \cdot p}$ is monotone decreasing in the interval [0, 1] and the minimum $f(1) = \frac{1}{3}$ .

So the probability of the other child being a boy too is dependent on the amount of noice added. And in the one extreme case of no noice we get the Gardner case, $\frac{1}{3}$ and in the other extreme of total noice we get the probability from independency of sexes, $\frac{1}{2}$ .

This makes no sence! The probability should be constant, ie independent of noice. And it is constant if calculations are done properly!

Let us see how one should do the calculations. The argument is similar to the 2 Children problem.

If we known which child who is the thuesday boy then the answer is simple since the sex of the other child is independent of whatever can be said about the thuesday child.

Define 1BT, 2BT as the event that child 1, 2 is the thuesday boy, respectively. And since the children are chosen randomly we have $P(1BT)=P(2BT) = \frac{1}{2}$ .

And then we have:

$P(2B | 1BT) &= P(2B) = \frac{1}{2} \\ P(1B | 2BT) &= P(1B) = \frac{1}{2} \\ P(OtherChildIsABoy) &= P(2B | 1BT) \cdot P(1BT) + P(1B | 2BT) \cdot P(2BT) = \frac{1}{2}$

I rest my case :)

Table Of Contents

Previous topic

Next topic

This Page

6. Mathematical notes ¶

6.1. Using LaTeX syntax for mathematics in RestructuredText and Sphinx ¶

6.1.1. Math/Latex in RestructuredText and Sphinx ¶

6.1.2. Examples ¶

6.2. Splines at Scipy ¶

6.2.1. Natural cubic splines ¶

6.2.2. Splines at Scipy ¶

6.3. Derivation of Cubic Spline ¶

6.3.1. Deriving the cubic spline function ¶

6.3.2. Ekstrapolation ¶

6.3.3. Estimation of the curvatures ¶

6.3.3.1. The Natural Cubic Spline ¶

6.3.3.2. The Financial Cubic Spline ¶

6.4. Using Barycentric Lagrange Interpolation in Finance ¶

6.4.1. Lagrange Interpolation ¶

6.4.2. Barycentric Lagrange Interpolation ¶

6.4.2.1. Algorithms ¶

6.4.2.2. Numerical differentiation at grid points ¶

Navigation

Previous topic

Next topic

This Page

Quick search

Navigation