# CatBoost Partial Dependence Plot (PDP) Tutorial

#### A partial dependence plot shows the nature of the relationship between the target and a feature: whether it is linear, monotonic or more complex.

#### Let $x_S$ be the features for which the partial dependence function should be plotted and let $x_C$ be the other features used in the machine learning model $f$.
#### The partial dependence function $f_{x_S}$ is estimated by averaging the model's predictions over the training data:
\begin{equation*}
f_{x_S}(x_S) = \frac{1}{n}\sum\limits_{i=1}^{n}f(x_S, x_C^{(i)})
\end{equation*}
#### where $n$ is the number of instances in the dataset and $x_C^{(i)}$ are the values of the other features for the $i$-th instance.
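
#### The averaging in the formula can be sketched directly with NumPy. This is a minimal brute-force illustration, not CatBoost's own implementation: `partial_dependence_1d`, `toy_model`, `X_demo` and `grid` are hypothetical names introduced only for this example.

```python
import numpy as np


def partial_dependence_1d(predict, X, feature, grid):
    """Estimate the partial dependence of `predict` on a single feature.

    For each grid value g, column `feature` is set to g in a copy of X
    and the predictions are averaged over all n rows.
    """
    X = np.asarray(X, dtype=float)
    averages = []
    for g in grid:
        X_mod = X.copy()
        X_mod[:, feature] = g  # fix x_S = g for every instance
        averages.append(predict(X_mod).mean())  # average over x_C^{(i)}
    return np.array(averages)


# Hypothetical toy model, used only to keep the example self-contained.
def toy_model(X):
    return 2.0 * X[:, 0] + X[:, 1] * X[:, 2]


rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))
grid = np.linspace(-1.0, 1.0, 5)

pd_values = partial_dependence_1d(toy_model, X_demo, feature=0, grid=grid)
print(pd_values)
```

#### Any function that maps a feature matrix to a vector of predictions can be passed as `predict`, for example the `predict` method of a trained model. For this toy model the result is a straight line in the grid values, shifted by the mean of the interaction term.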

#### In practice, the set of features $S$ usually contains only one feature or at most two, because one feature produces a 2D plot and two features produce a 3D plot; anything beyond that is hard to visualize.

In [2]:
import numpy as np
from catboost import CatBoost, Pool, datasets

#### Let's plot a PDP on the Higgs dataset:

In [3]:
train_df, _ = datasets.higgs()

In [4]:
# Column 0 holds the label; keep the first 1000 rows so the example runs quickly.
X, y = np.array(train_df.drop(0, axis=1))[:1000], np.array(train_df[0])[:1000]
pool = Pool(X, y)

#### Let's train CatBoost:

In [5]:
cb = CatBoost({'iterations': 50, 'verbose': False, 'random_seed': 42})
cb.fit(pool);

#### Let's choose one feature and plot its PDP:

In [None]:
features = 1

_ = cb.plot_partial_dependence(pool, features)

#### We can also plot a PDP for two features at once. This time, instead of a simple line plot, we get a 2D heatmap:

In [None]:
features = [1, 2]

_ = cb.plot_partial_dependence(pool, features)
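
#### The values behind a two-feature heatmap can be sketched by brute force: fix both features to every pair of grid values and average the predictions over the dataset. This is an illustration under stated assumptions, not CatBoost's internal algorithm; `partial_dependence_2d`, `toy_model`, `X_demo`, `grid_a` and `grid_b` are hypothetical names.

```python
import numpy as np


def partial_dependence_2d(predict, X, features, grid_a, grid_b):
    """Estimate the partial dependence of `predict` on a pair of features.

    Returns an array of shape (len(grid_a), len(grid_b)) whose entry (i, j)
    is the mean prediction with the two features fixed to
    (grid_a[i], grid_b[j]) for every instance.
    """
    X = np.asarray(X, dtype=float)
    feat_a, feat_b = features
    surface = np.empty((len(grid_a), len(grid_b)))
    for i, a in enumerate(grid_a):
        for j, b in enumerate(grid_b):
            X_mod = X.copy()
            X_mod[:, feat_a] = a
            X_mod[:, feat_b] = b
            surface[i, j] = predict(X_mod).mean()
    return surface


# Hypothetical toy model with an interaction between features 0 and 1.
def toy_model(X):
    return X[:, 0] * X[:, 1] + X[:, 2]


rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 3))
grid_a = np.linspace(-1.0, 1.0, 4)
grid_b = np.linspace(0.0, 2.0, 3)

surface = partial_dependence_2d(toy_model, X_demo, (0, 1), grid_a, grid_b)
print(surface.shape)
```

#### Each cell of `surface` corresponds to one cell of a heatmap. The cost grows with the product of the two grid sizes, since every cell requires a full pass over the data.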