{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# $$CatBoost\\ Tutorial$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[Open In Colab](https://colab.research.google.com/github/catboost/tutorials/blob/master/python_tutorial.ipynb)\n", "\n", "In this tutorial we will explore some basic use cases of CatBoost, such as model training, cross-validation and prediction, as well as some useful features like early stopping, snapshot support, feature importances and parameter tuning.\n", " \n", "You can run this tutorial in the Google Colaboratory environment with a free CPU or GPU. Just click the link above." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## $$Contents$$\n", "* [1. Data Preparation](#$$1.\\-Data\\-Preparation$$)\n", " * [1.1 Data Loading](#1.1-Data-Loading)\n", " * [1.2 Feature Preparation](#1.2-Feature-Preparation)\n", " * [1.3 Data Splitting](#1.3-Data-Splitting)\n", "* [2. CatBoost Basics](#$$2.\\-CatBoost\\-Basics$$)\n", " * [2.1 Model Training](#2.1-Model-Training)\n", " * [2.2 Model Cross-Validation](#2.2-Model-Cross-Validation)\n", " * [2.3 Model Applying](#2.3-Model-Applying)\n", "* [3. CatBoost Features](#$$3.\\-CatBoost\\-Features$$)\n", " * [3.1 Using the best model](#3.1-Using-the-best-model)\n", " * [3.2 Early Stopping](#3.2-Early-Stopping)\n", " * [3.3 Using Baseline](#3.3-Using-Baseline)\n", " * [3.4 Snapshot Support](#3.4-Snapshot-Support)\n", " * [3.5 User Defined Objective Function](#3.5-User-Defined-Objective-Function)\n", " * [3.6 User Defined Metric Function](#3.6-User-Defined-Metric-Function)\n", " * [3.7 Staged Predict](#3.7-Staged-Predict)\n", " * [3.8 Feature Importances](#3.8-Feature-Importances)\n", " * [3.9 Eval Metrics](#3.9-Eval-Metrics)\n", " * [3.10 Learning Processes Comparison](#3.10-Learning-Processes-Comparison)\n", " * [3.11 Model Saving](#3.11-Model-Saving)\n", "* [4. 
Parameters Tuning](#$$4.\\-Parameters\\-Tuning$$)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## $$1.\\ Data\\ Preparation$$\n", "### 1.1 CatBoost installation\n", "If you have not already installed CatBoost, you can do so by running the `!pip install catboost` command.\n", " \n", "To draw plots, you should also install the ipywidgets package and enable the widgets notebook extension (`jupyter nbextension enable --py widgetsnbextension`) before launching Jupyter Notebook." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Collecting catboost\n", " Downloading https://files.pythonhosted.org/packages/49/d9/898a290d24bfd20a3e0758f4639b4da15fc338aea1e160c91e288c574195/catboost-0.11.2-cp37-none-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl (7.4MB)\n", "Requirement already satisfied: pandas>=0.19.1 in /Users/sbrazhnik/anaconda3/lib/python3.7/site-packages (from catboost) (0.23.4)\n", "Collecting enum34 (from catboost)\n", " Using cached https://files.pythonhosted.org/packages/af/42/cb9355df32c69b553e72a2e28daee25d1611d2c0d9c272aa1d34204205b2/enum34-1.1.6-py3-none-any.whl\n", "Requirement already satisfied: numpy>=1.11.1 in /Users/sbrazhnik/anaconda3/lib/python3.7/site-packages (from catboost) (1.15.4)\n", "Requirement already satisfied: six in /Users/sbrazhnik/anaconda3/lib/python3.7/site-packages (from catboost) (1.12.0)\n", "Requirement already satisfied: python-dateutil>=2.5.0 in /Users/sbrazhnik/anaconda3/lib/python3.7/site-packages (from pandas>=0.19.1->catboost) (2.7.5)\n", "Requirement already satisfied: pytz>=2011k in /Users/sbrazhnik/anaconda3/lib/python3.7/site-packages (from pandas>=0.19.1->catboost) (2018.7)\n", "Installing collected packages: enum34, catboost\n", "Successfully installed catboost-0.11.2 enum34-1.1.6\n", "Requirement already satisfied: ipywidgets in 
/Users/sbrazhnik/anaconda3/lib/python3.7/site-packages (7.4.2)\n", "Requirement already satisfied: traitlets>=4.3.1 in /Users/sbrazhnik/anaconda3/lib/python3.7/site-packages (from ipywidgets) (4.3.2)\n", "Requirement already satisfied: ipykernel>=4.5.1 in /Users/sbrazhnik/anaconda3/lib/python3.7/site-packages (from ipywidgets) (5.1.0)\n", "Requirement already satisfied: nbformat>=4.2.0 in /Users/sbrazhnik/anaconda3/lib/python3.7/site-packages (from ipywidgets) (4.4.0)\n", "Requirement already satisfied: ipython>=4.0.0; python_version >= \"3.3\" in /Users/sbrazhnik/anaconda3/lib/python3.7/site-packages (from ipywidgets) (7.2.0)\n", "Requirement already satisfied: widgetsnbextension~=3.4.0 in /Users/sbrazhnik/anaconda3/lib/python3.7/site-packages (from ipywidgets) (3.4.2)\n", "Requirement already satisfied: six in /Users/sbrazhnik/anaconda3/lib/python3.7/site-packages (from traitlets>=4.3.1->ipywidgets) (1.12.0)\n", "Requirement already satisfied: ipython-genutils in /Users/sbrazhnik/anaconda3/lib/python3.7/site-packages (from traitlets>=4.3.1->ipywidgets) (0.2.0)\n", "Requirement already satisfied: decorator in /Users/sbrazhnik/anaconda3/lib/python3.7/site-packages (from traitlets>=4.3.1->ipywidgets) (4.3.0)\n", "Requirement already satisfied: jupyter-client in /Users/sbrazhnik/anaconda3/lib/python3.7/site-packages (from ipykernel>=4.5.1->ipywidgets) (5.2.4)\n", "Requirement already satisfied: tornado>=4.2 in /Users/sbrazhnik/anaconda3/lib/python3.7/site-packages (from ipykernel>=4.5.1->ipywidgets) (5.1.1)\n", "Requirement already satisfied: jsonschema!=2.5.0,>=2.4 in /Users/sbrazhnik/anaconda3/lib/python3.7/site-packages (from nbformat>=4.2.0->ipywidgets) (2.6.0)\n", "Requirement already satisfied: jupyter-core in /Users/sbrazhnik/anaconda3/lib/python3.7/site-packages (from nbformat>=4.2.0->ipywidgets) (4.4.0)\n", "Requirement already satisfied: pickleshare in /Users/sbrazhnik/anaconda3/lib/python3.7/site-packages (from ipython>=4.0.0; python_version >= 
\"3.3\"->ipywidgets) (0.7.5)\n", "Requirement already satisfied: appnope; sys_platform == \"darwin\" in /Users/sbrazhnik/anaconda3/lib/python3.7/site-packages (from ipython>=4.0.0; python_version >= \"3.3\"->ipywidgets) (0.1.0)\n", "Requirement already satisfied: backcall in /Users/sbrazhnik/anaconda3/lib/python3.7/site-packages (from ipython>=4.0.0; python_version >= \"3.3\"->ipywidgets) (0.1.0)\n", "Requirement already satisfied: pexpect; sys_platform != \"win32\" in /Users/sbrazhnik/anaconda3/lib/python3.7/site-packages (from ipython>=4.0.0; python_version >= \"3.3\"->ipywidgets) (4.6.0)\n", "Requirement already satisfied: pygments in /Users/sbrazhnik/anaconda3/lib/python3.7/site-packages (from ipython>=4.0.0; python_version >= \"3.3\"->ipywidgets) (2.3.1)\n", "Requirement already satisfied: prompt-toolkit<2.1.0,>=2.0.0 in /Users/sbrazhnik/anaconda3/lib/python3.7/site-packages (from ipython>=4.0.0; python_version >= \"3.3\"->ipywidgets) (2.0.7)\n", "Requirement already satisfied: jedi>=0.10 in /Users/sbrazhnik/anaconda3/lib/python3.7/site-packages (from ipython>=4.0.0; python_version >= \"3.3\"->ipywidgets) (0.13.2)\n", "Requirement already satisfied: setuptools>=18.5 in /Users/sbrazhnik/anaconda3/lib/python3.7/site-packages (from ipython>=4.0.0; python_version >= \"3.3\"->ipywidgets) (40.6.3)\n", "Requirement already satisfied: notebook>=4.4.1 in /Users/sbrazhnik/anaconda3/lib/python3.7/site-packages (from widgetsnbextension~=3.4.0->ipywidgets) (5.7.4)\n", "Requirement already satisfied: pyzmq>=13 in /Users/sbrazhnik/anaconda3/lib/python3.7/site-packages (from jupyter-client->ipykernel>=4.5.1->ipywidgets) (17.1.2)\n", "Requirement already satisfied: python-dateutil>=2.1 in /Users/sbrazhnik/anaconda3/lib/python3.7/site-packages (from jupyter-client->ipykernel>=4.5.1->ipywidgets) (2.7.5)\n", "Requirement already satisfied: ptyprocess>=0.5 in /Users/sbrazhnik/anaconda3/lib/python3.7/site-packages (from pexpect; sys_platform != \"win32\"->ipython>=4.0.0; 
python_version >= \"3.3\"->ipywidgets) (0.6.0)\n", "Requirement already satisfied: wcwidth in /Users/sbrazhnik/anaconda3/lib/python3.7/site-packages (from prompt-toolkit<2.1.0,>=2.0.0->ipython>=4.0.0; python_version >= \"3.3\"->ipywidgets) (0.1.7)\n", "Requirement already satisfied: parso>=0.3.0 in /Users/sbrazhnik/anaconda3/lib/python3.7/site-packages (from jedi>=0.10->ipython>=4.0.0; python_version >= \"3.3\"->ipywidgets) (0.3.1)\n", "Requirement already satisfied: prometheus-client in /Users/sbrazhnik/anaconda3/lib/python3.7/site-packages (from notebook>=4.4.1->widgetsnbextension~=3.4.0->ipywidgets) (0.5.0)\n", "Requirement already satisfied: terminado>=0.8.1 in /Users/sbrazhnik/anaconda3/lib/python3.7/site-packages (from notebook>=4.4.1->widgetsnbextension~=3.4.0->ipywidgets) (0.8.1)\n", "Requirement already satisfied: Send2Trash in /Users/sbrazhnik/anaconda3/lib/python3.7/site-packages (from notebook>=4.4.1->widgetsnbextension~=3.4.0->ipywidgets) (1.5.0)\n", "Requirement already satisfied: jinja2 in /Users/sbrazhnik/anaconda3/lib/python3.7/site-packages (from notebook>=4.4.1->widgetsnbextension~=3.4.0->ipywidgets) (2.10)\n", "Requirement already satisfied: nbconvert in /Users/sbrazhnik/anaconda3/lib/python3.7/site-packages (from notebook>=4.4.1->widgetsnbextension~=3.4.0->ipywidgets) (5.4.0)\n", "Requirement already satisfied: MarkupSafe>=0.23 in /Users/sbrazhnik/anaconda3/lib/python3.7/site-packages (from jinja2->notebook>=4.4.1->widgetsnbextension~=3.4.0->ipywidgets) (1.1.0)\n", "Requirement already satisfied: mistune>=0.8.1 in /Users/sbrazhnik/anaconda3/lib/python3.7/site-packages (from nbconvert->notebook>=4.4.1->widgetsnbextension~=3.4.0->ipywidgets) (0.8.4)\n", "Requirement already satisfied: entrypoints>=0.2.2 in /Users/sbrazhnik/anaconda3/lib/python3.7/site-packages (from nbconvert->notebook>=4.4.1->widgetsnbextension~=3.4.0->ipywidgets) (0.2.3)\n", "Requirement already satisfied: bleach in /Users/sbrazhnik/anaconda3/lib/python3.7/site-packages (from 
nbconvert->notebook>=4.4.1->widgetsnbextension~=3.4.0->ipywidgets) (3.0.2)\n", "Requirement already satisfied: pandocfilters>=1.4.1 in /Users/sbrazhnik/anaconda3/lib/python3.7/site-packages (from nbconvert->notebook>=4.4.1->widgetsnbextension~=3.4.0->ipywidgets) (1.4.2)\n", "Requirement already satisfied: testpath in /Users/sbrazhnik/anaconda3/lib/python3.7/site-packages (from nbconvert->notebook>=4.4.1->widgetsnbextension~=3.4.0->ipywidgets) (0.4.2)\n", "Requirement already satisfied: defusedxml in /Users/sbrazhnik/anaconda3/lib/python3.7/site-packages (from nbconvert->notebook>=4.4.1->widgetsnbextension~=3.4.0->ipywidgets) (0.5.0)\n", "Requirement already satisfied: webencodings in /Users/sbrazhnik/anaconda3/lib/python3.7/site-packages (from bleach->nbconvert->notebook>=4.4.1->widgetsnbextension~=3.4.0->ipywidgets) (0.5.1)\n", "Enabling notebook extension jupyter-js-widgets/extension...\n", " - Validating: OK\n" ] } ], "source": [ "!pip install catboost\n", "!pip install ipywidgets\n", "!jupyter nbextension enable --py widgetsnbextension" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 1.2 Data Loading\n", "The data for this tutorial can be obtained from [this page](https://www.kaggle.com/c/titanic/data) (you will need to register or sign in to a Kaggle account), or you can load it with `catboost.datasets`, as in the code below." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n",
"|   | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |\n",
"|---|---|---|---|---|---|---|---|---|---|---|---|---|\n",
"| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |\n",
"| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |\n",
"| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |\n",
"| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S |\n",
"| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.0500 | NaN | S |\n",
"