---
title: "Actuarial Statistical Distributions"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Actuarial Statistical Distributions}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  results = "asis",
  message = FALSE,
  warning = FALSE
)
```
## The Importance of Statistical Distributions in Actuarial Modelling
In Property Casualty Insurance, policies are written to cover
policyholder losses that arise from unpredictable events, occur at
unforeseeable times, and carry dollar values that cannot be known in
advance (hence the invention of insurance!). Actuaries therefore rely on
mathematical statistics and predictive modelling to keep the insurance
system liquid and profitable.
Every Property Casualty claim process involves two independent random
variables:
- the Claim Size Random Variable - *Severity*
- the Claim Count Random Variable - *Frequency*
These two variables combine to create a third fundamental claim
variable: **the aggregate-loss random variable**. This represents the
total claim amount generated by the underlying claim process.
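
As a rough illustration of how frequency and severity combine, the following sketch simulates an aggregate loss as the sum of a random number of random claim sizes. The Poisson frequency, the lognormal severity, and every parameter value are assumptions chosen only for illustration; they do not come from `lossrx` data.

```{r, results = "markup"}
set.seed(123)

# Hypothetical parameters, chosen only for illustration
lambda  <- 3   # expected number of claims per period (Poisson frequency)
meanlog <- 8   # lognormal severity: mean of log claim size
sdlog   <- 1   # lognormal severity: standard deviation of log claim size

# Simulate one period's aggregate loss S = X_1 + ... + X_N
simulate_aggregate_loss <- function(lambda, meanlog, sdlog) {
  n_claims <- rpois(1, lambda)
  # With zero claims, rlnorm() returns an empty vector and sum() gives 0
  sum(rlnorm(n_claims, meanlog = meanlog, sdlog = sdlog))
}

agg_losses <- replicate(10000, simulate_aggregate_loss(lambda, meanlog, sdlog))

# With independent frequency and severity, E[S] = E[N] * E[X]
theoretical_mean <- lambda * exp(meanlog + sdlog^2 / 2)
c(simulated = mean(agg_losses), theoretical = theoretical_mean)
```

The simulated mean should land close to the theoretical value, which is a quick sanity check on any aggregate-loss simulation.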
## Severity - Claim Size
The random variable associated with the size of a claim, in other
words its dollar amount, is based on a finite population of claims
(or a sample from a larger population), so its empirical distribution is
always discrete. However, it is much more useful to assume a continuous
distribution so that calculations that rely on calculus, such as
integration, can be carried out.
Claim sizes are, by their very nature, always positive and typically
*long-tailed*, meaning that larger values are rarer. The distribution is
therefore not symmetric, but right-skewed (*positively skewed*).
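The skewness of a continuous distribution is measured by its
normalized third central moment, a useful indicator of asymmetry:

$$\gamma_1 = \frac{\mathrm{E}\left[\left(X - \mu\right)^3\right]}{\sigma^3}$$

where $\mu$ and $\sigma$ are the mean and standard deviation of $X$.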
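
The normalized third central moment described above can be estimated from a sample. The sketch below uses a hypothetical helper, `sample_skewness()`, written in base R, and applies it to simulated data; the distributions and parameters are illustrative assumptions.

```{r, results = "markup"}
# Hypothetical helper: third central moment divided by the cubed
# standard deviation (population-style moments, no bias correction)
sample_skewness <- function(x) {
  m  <- mean(x)
  m2 <- mean((x - m)^2)
  m3 <- mean((x - m)^3)
  m3 / m2^(3 / 2)
}

set.seed(1)
gamma_sample  <- rgamma(1e5, shape = 4, scale = 1000)  # right-skewed
normal_sample <- rnorm(1e5)                            # symmetric

# A gamma distribution has theoretical skewness 2 / sqrt(shape), here 1;
# the normal distribution has skewness 0
c(gamma = sample_skewness(gamma_sample),
  normal = sample_skewness(normal_sample))
```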
### The Gamma Distribution Family
Gamma distributions comprise a versatile family of probability
distributions, with many applications in statistics and probability.
Property/casualty actuaries have found them useful in constructing a
variety of insurance models -- parameter uncertainty for claim-count
distributions, approximation of aggregate-loss distributions, and
occasionally as claim-size distributions.
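The gamma distribution with positive parameters $\left(a,b\right)$,
where $a$ is taken here to be the shape and $b$ the scale, is
defined by the probability density function:

$$f(x) = \frac{x^{a-1} e^{-x/b}}{b^{a}\,\Gamma(a)}, \qquad x > 0$$

where the symbol $\Gamma$ represents the **Gamma Function**:

$$\Gamma(a) = \int_0^\infty t^{a-1} e^{-t}\,dt$$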
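
A minimal sketch of the gamma density, assuming $a$ is the shape and $b$ the scale parameter (the source does not state the parameterization, so this is an assumption). The hypothetical function `gamma_pdf()` is compared against base R's `dgamma()`, and the gamma function integral is checked numerically.

```{r, results = "markup"}
# Hypothetical helper: gamma density with shape a and scale b (assumed)
gamma_pdf <- function(x, a, b) {
  x^(a - 1) * exp(-x / b) / (b^a * gamma(a))
}

x <- c(0.5, 1, 2, 5)

# Agrees with the built-in density under the shape/scale parameterization
pdf_matches <- all.equal(gamma_pdf(x, a = 2, b = 1.5),
                         dgamma(x, shape = 2, scale = 1.5))

# The gamma function as an integral, evaluated numerically at a = 2.5
a <- 2.5
gamma_integral <- integrate(function(t) t^(a - 1) * exp(-t),
                            lower = 0, upper = Inf)$value

# The gamma function generalizes the factorial: Gamma(n) = (n - 1)!
c(pdf_matches = pdf_matches,
  integral_matches = isTRUE(all.equal(gamma_integral, gamma(a), tolerance = 1e-4)),
  factorial_matches = isTRUE(all.equal(gamma(5), factorial(4))))
```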
### Lognormal Distribution
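A random variable $X$ has a lognormal distribution with parameters $(\mu, \sigma)$ if, and
only if, $\log X$ is normally distributed with mean $\mu$ and variance $\sigma^2$. Therefore, the
lognormal variable $X$ can be expressed as $X = e^{\sigma Z + \mu}$, where $Z$ is the standard normal
random variable. As a consequence, the lognormal cumulative distribution function is:

$$F_X(x) = \Phi\left(\frac{\log x - \mu}{\sigma}\right), \qquad x > 0$$

where $\Phi$ is the standard normal cumulative distribution function.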
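
The relationship between the lognormal and the standard normal can be checked directly in base R. In this sketch the parameter values and the helper name `lognormal_cdf()` are illustrative assumptions.

```{r, results = "markup"}
mu    <- 8     # hypothetical mean of log X
sigma <- 1.2   # hypothetical standard deviation of log X

# Lognormal CDF written in terms of the standard normal CDF
lognormal_cdf <- function(x, mu, sigma) {
  pnorm((log(x) - mu) / sigma)
}

x <- c(500, 3000, 20000)
cdf_matches <- all.equal(lognormal_cdf(x, mu, sigma),
                         plnorm(x, meanlog = mu, sdlog = sigma))

# Simulate X = exp(sigma * Z + mu) and compare the empirical proportion
# below 3000 with the CDF evaluated at 3000
set.seed(2)
z     <- rnorm(1e5)
x_sim <- exp(sigma * z + mu)
c(empirical = mean(x_sim <= 3000),
  theoretical = lognormal_cdf(3000, mu, sigma))
```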
### Pareto Distribution
Pareto distributions are named after the Italian sociologist and
economist *Vilfredo Pareto* (1843–1923), who first proposed using them in 1896.
The distribution has long been attractive to property/casualty actuaries.
The computationally simple form of the distribution function (requiring only
algebraic calculations and no limit processes) and the typically heavy, long tail have
made the Pareto family the distributional family of choice to model claim size in a
variety of actuarial applications.
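
The source does not show the distribution function itself. As a sketch, assume the two-parameter (Type II, or Lomax) Pareto form that is common in actuarial work, $F(x) = 1 - \left(\theta / (x + \theta)\right)^{\alpha}$ for $x > 0$. Because this form is purely algebraic, it inverts in closed form, which makes simulation straightforward. The function names below are hypothetical and not part of any package.

```{r, results = "markup"}
# Hypothetical helpers for a two-parameter Pareto (Lomax) distribution,
# with shape alpha and scale theta (an assumed parameterization)
ppareto2 <- function(x, shape, scale) {
  1 - (scale / (pmax(x, 0) + scale))^shape   # equals 0 for x <= 0
}
qpareto2 <- function(p, shape, scale) {
  scale * ((1 - p)^(-1 / shape) - 1)          # closed-form inverse of the CDF
}
rpareto2 <- function(n, shape, scale) {
  qpareto2(runif(n), shape, scale)            # inverse-transform sampling
}

set.seed(3)
claims <- rpareto2(1e5, shape = 3, scale = 10000)

# Compare the empirical proportion of claims at or below 5,000 with the CDF
c(empirical = mean(claims <= 5000),
  theoretical = ppareto2(5000, shape = 3, scale = 10000))
```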
## Frequency Distributions
- Poisson Distribution
- [Beta-Binomial Distribution](https://reference.wolfram.com/language/ref/BetaBinomialDistribution.html)
- Beta-Negative Binomial Distribution
- Geometric Distribution
- Log Series Distribution
- Negative Binomial Distribution
## Severity Distributions
- Exponential Distribution: claim sizes that are independent with a
  constant hazard rate
- Pareto Distribution: heavy-tailed claim sizes, suited to outsized
  claims
- Spliced Distribution: separate body and tail claim-size
  distributions spliced together
- Weibull Distribution
- Half Normal Distribution
Actuarial computation deals with quantifying and redistributing risk in
insurance and finance. Risks refer to financial losses and may relate to
health, cars, life, or financial investments. Risks are
redistributed by grouping many individuals and analyzing the whole group
to determine quantities such as premiums and risk probabilities.
The Wolfram Language provides extensive support for models, data, and
computation related to finance, probability, and statistics. In life
insurance, important aspects include the time value of money combined with either
deterministic or stochastic models of lifetimes. In non-life insurance,
important aspects include the frequency and size of claims for a group,
over either the short term or the long term.
Many of the models highlighted here are related to the gamma
distribution, either directly or indirectly. The catalog of
distributions therefore starts with the gamma distribution at the top and then
branches out to the other related models.
Mathematically, the gamma distribution is a two-parameter, continuous
distribution defined using the gamma function.
The gamma sub-family includes the exponential distribution, the Erlang
distribution, and the chi-squared distribution.
These are gamma distributions with certain
restrictions on one or both of the gamma parameters.
Other distributions are obtained by raising a distribution to a power,
and still others by mixing distributions.
Here is a listing of the models:
| Derivation                              | Distribution                      |
|:---------------------------------------:|:---------------------------------:|
| Gamma Function                          | Gamma Distribution                |
| Gamma Sub-Families                      | Erlang, Exponential, Chi-Squared  |
| Independent Sum of Gammas               | Erlang, Hypo-Exponential          |
| Exponentiation                          | Lognormal                         |
| Raised to a Positive Power              | Weibull                           |
| Exponential Raised to a Positive Power  | Transformed Exponential = Weibull |
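
Several of the relationships in the table can be checked numerically with base R's distribution functions. This is a sketch with arbitrary illustrative parameter values, not an exhaustive verification.

```{r, results = "markup"}
x <- c(0.5, 1, 2, 4)

# Exponential is a gamma distribution with shape 1
exp_is_gamma <- all.equal(dexp(x, rate = 0.5),
                          dgamma(x, shape = 1, rate = 0.5))

# Chi-squared with k degrees of freedom is gamma with shape k/2 and scale 2
chisq_is_gamma <- all.equal(dchisq(x, df = 3),
                            dgamma(x, shape = 1.5, scale = 2))

set.seed(4)
n <- 1e5

# Erlang: the sum of 3 independent exponentials with a common rate is
# gamma with integer shape 3 and the same rate
erlang_sim <- rowSums(matrix(rexp(3 * n, rate = 0.5), ncol = 3))

# Weibull: a unit exponential raised to the power 1/shape, then scaled
weibull_sim <- 2 * rexp(n)^(1 / 1.5)

# Lognormal: exponentiation of a normal random variable
lognormal_sim <- exp(rnorm(n, mean = 0, sd = 1))

rbind(
  erlang    = c(empirical = mean(erlang_sim <= 5),
                theoretical = pgamma(5, shape = 3, rate = 0.5)),
  weibull   = c(empirical = mean(weibull_sim <= 2),
                theoretical = pweibull(2, shape = 1.5, scale = 2)),
  lognormal = c(empirical = mean(lognormal_sim <= 2),
                theoretical = plnorm(2, meanlog = 0, sdlog = 1))
)
```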
## Fitting Distributions
Actuaries frequently need to fit a *parametric* distribution model to a
set of claims data in order to a) smooth the empirical distribution and
b) interpolate or extrapolate between or beyond the existing data.
`lossrx` provides some useful helper functions meant to aid an actuary in
determining an optimal parametric distribution to fit the data at hand:
```{r, eval=FALSE}
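library(lossrx)

# Load the example data sets
data("claims_transactional")
data("losses")
data("exposures")

# Keep only rows from the most recent evaluation date
latest_eval <- losses |> dplyr::filter(eval_date == max(eval_date))

# Split by coverage; keep only positive incurred amounts for WC
wc_dat <- latest_eval |> dplyr::filter(coverage == "WC", total_incurred > 0)
al_dat <- latest_eval |> dplyr::filter(coverage == "AL")

# Wrapper around fitdistrplus::descdist() using 1,000 bootstrap samples
describe_distribution <- function(data, ...) {
  fitdistrplus::descdist(data = data, boot = 1000, ...)
}

# Summary statistics and skewness-kurtosis plot for WC incurred losses
describe_distribution(wc_dat$total_incurred)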
```
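
Once a descriptive summary suggests candidate families, the candidates can be fitted and compared. The sketch below uses only base R and simulated data rather than the `lossrx` helpers, whose interface is not shown here. The data, the two candidate families, and the use of AIC as the comparison criterion are all assumptions made for illustration.

```{r, results = "markup"}
set.seed(42)
# Simulated claim sizes standing in for real incurred losses (assumption)
claims <- rlnorm(500, meanlog = 8, sdlog = 1.2)

# Lognormal: the maximum likelihood estimates have a closed form
mu_hat    <- mean(log(claims))
sigma_hat <- sqrt(mean((log(claims) - mu_hat)^2))
loglik_lnorm <- sum(dlnorm(claims, meanlog = mu_hat, sdlog = sigma_hat, log = TRUE))

# Gamma: maximize the likelihood numerically, optimizing on the log scale
# so that shape and scale stay positive
negloglik_gamma <- function(par) {
  -sum(dgamma(claims, shape = exp(par[1]), scale = exp(par[2]), log = TRUE))
}
# Method-of-moments starting values
start <- log(c(mean(claims)^2 / var(claims), var(claims) / mean(claims)))
fit_gamma <- optim(start, negloglik_gamma)

# AIC = 2 * (number of parameters) - 2 * log-likelihood; lower is better
aic <- c(lognormal = 2 * 2 - 2 * loglik_lnorm,
         gamma     = 2 * 2 + 2 * fit_gamma$value)
aic
```

Because the data here were simulated from a lognormal, the lognormal fit should show the lower AIC. With real claims data the ranking is not known in advance, and graphical diagnostics are worth checking alongside any single criterion.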