{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Tweedie Regression" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In insurance premium prediction problems, the total claim amount for a covered risk usually has a continuous distribution on positive values, except that it is exactly zero when no claim occurs. A standard approach in actuarial science for modeling such data is to use compound Poisson models.\n", "\n", "##### Compound Poisson distribution\n", "\n", "Let $ N $ be a random variable with a Poisson distribution and let $ Z_1, Z_2, ... $ be independent, identically distributed random variables with a Gamma distribution that are also independent of $ N $. Define a random variable $ Z $ by\n", "\n", "$$ Z = \\begin{cases}0, & \\mbox{if}\\ N = 0\\\\Z_1 + Z_2 + ... + Z_N, & \\mbox{if}\\ N > 0\\end{cases} $$\n", "\n", "The resulting distribution of $ Z $ is called a compound Poisson distribution. In insurance premium prediction, $ N $ refers to the number of claims and $ Z_i $ refers to the amount of the $i$-th claim.
The compound Poisson distribution is a special case of the Tweedie model.\n", "\n", "The log-likelihood of the compound Poisson distribution can be written as\n", "$$ \\log p(z) = \\frac{1}{\\phi}\\left(z \\frac{\\mu^{1-\\rho}}{1-\\rho} - \\frac{\\mu^{2-\\rho}}{2-\\rho}\\right) + a$$\n", "\n", "where $ \\mu $ is the mean, $ \\phi $ is the dispersion parameter, $ 1 < \\rho < 2 $ is the variance power, and $ a $ is a term that does not depend on $ \\mu $.\n", "\n", "We will apply the Tweedie model to an auto insurance claim dataset analyzed in Yip, Yau (2005) and Zhou, Yang, Qian (2019).\n", "\n", "##### Loading dataset" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!wget https://cran.r-project.org/src/contrib/cplm_0.7-8.tar.gz" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!tar -xf cplm_0.7-8.tar.gz" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!pip install rdata" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# Suppress library warnings to keep the notebook output readable\n", "import warnings\n", "warnings.filterwarnings('ignore')\n", "\n", "import rdata\n", "\n", "# Parse the RData file from the extracted cplm package and convert it to Python objects\n", "data = rdata.parser.parse_file('cplm/data/AutoClaim.RData')\n", "df = rdata.conversion.convert(data)['AutoClaim']" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "\n", "from sklearn.model_selection import train_test_split\n", "from catboost.utils import eval_metric\n", "from catboost import CatBoostRegressor, Pool" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n",
"| | AGE | BLUEBOOK | HOMEKIDS | KIDSDRIV | MVR_PTS | NPOLICY | RETAINED | TRAVTIME | AREA | CAR_USE | CAR_TYPE | GENDER | JOBCLASS | MAX_EDUC | MARRIED | REVOLKED | CLM_AMT5 |\n",
"|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|\n",
"| 1019 | 45 | 14830 | 2 | 0 | 0 | 3 | 6 | 31 | Urban | Private | Sedan | M | Professional | Masters | Yes | No | 0 |\n",
"| 5461 | 42 | 13770 | 3 | 1 | 0 | 1 | 14 | 24 | Urban | Private | Sports Car | F | Professional | Bachelors | Yes | No | 0 |\n",
"| 7226 | 55 | 21520 | 0 | 0 | 4 | 1 | 1 | 25 | Urban | Private | Van | M | Blue Collar | <High School | No | No | 6656 |\n",
"| 6233 | 33 | 25380 | 0 | 0 | 0 | 2 | 6 | 27 | Urban | Commercial | Panel Truck | M | Blue Collar | High School | No | No | 0 |\n",
"| 8215 | 45 | 22680 | 0 | 0 | 5 | 1 | 6 | 24 | Urban | Private | Sedan | M | Professional | Masters | No | No | 6314 |\n", "