# Regression on Gradient Boosting: CPU vs GPU

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/catboost/tutorials/blob/master/tools/google_colaboratory_cpu_vs_gpu_regression_tutorial.ipynb)

This is a basic tutorial that shows how to run gradient boosting regression on CPU and GPU in Google Colaboratory. It lets you see the speedup you get from GPU training. The speedup is large even on the Tesla K80 that is available in Colaboratory. On newer generations of GPUs the speedup will be much bigger.

We will use the CatBoost gradient boosting library, which is known for its good GPU performance.

# !Set GPU as hardware accelerator!
First of all, you need to select GPU as the hardware accelerator. There are two simple steps:  
#### Step 1. Navigate to the 'Runtime' menu and select 'Change runtime type'  
#### Step 2. Choose GPU as the hardware accelerator.  
That's all!
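
To confirm that the runtime actually exposes a GPU after these steps, one option is to query `nvidia-smi` from Python. The following is a minimal sketch and not part of the original tutorial: the helper name `gpu_visible` is hypothetical, and it assumes the NVIDIA driver tools are on the `PATH`, which is usually the case on Colab GPU runtimes.

```python
import shutil
import subprocess


def gpu_visible():
    """Return True if `nvidia-smi -L` runs successfully and lists a GPU."""
    # nvidia-smi is missing when no NVIDIA driver is installed
    if shutil.which("nvidia-smi") is None:
        return False
    try:
        # `-L` prints one line per detected GPU
        result = subprocess.run(
            ["nvidia-smi", "-L"],
            capture_output=True, text=True, timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0 and "GPU" in result.stdout


print("GPU visible:", gpu_visible())
```

If this prints `False`, the runtime type is probably still set to CPU, so repeat the two steps above.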

## Importing CatBoost

The next step is to install and import CatBoost in the environment. Colaboratory comes with many libraries preinstalled, and most others can be installed quickly with a simple `!pip install` command.  
Note that you need to install and import the library again every time you start a new Colab session.

In [1]:
!pip install catboost



## Including libraries
Now we need to import the following:

`CatBoostRegressor` - for regression,

`timeit` - to measure time,

`make_regression` - to generate the dataset

In [0]:
from catboost import CatBoostRegressor
import timeit
from sklearn.datasets import make_regression

## Generating dataset
The next step is generating the dataset. GPU training is most useful for large datasets: you will get a good speedup starting from about 10k objects.
For that reason, we generate a large dataset (40,000 objects and 2,000 features) for this tutorial.

We will generate the dataset with the `make_regression` function from `sklearn.datasets`, because this is the easiest way to get a large dataset into Google Colab for our tests. The target is a linear function of the features with added Gaussian noise.

The code below does this.

In [0]:
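num_rows = 40000
num_columns = 2000
# Synthetic linear-regression data with Gaussian noise
X_train, y_train = make_regression(n_samples=num_rows, n_features=num_columns,
                                   bias=100, noise=1.0, random_state=0)
# The training data is reused as the evaluation set; this is fine for timing,
# but the reported test error is not a true hold-out estimate
X_test, y_test = X_train, y_train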
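
To get a feel for how large this dataset is, the size of the feature matrix in memory can be estimated with simple arithmetic. This is a rough sketch, assuming the features are stored as 64-bit floats (8 bytes per value), which is what `make_regression` normally returns. The variable names are illustrative.

```python
num_rows = 40000
num_columns = 2000
bytes_per_value = 8  # assumption: float64 features

# Total size of the dense feature matrix
feature_bytes = num_rows * num_columns * bytes_per_value
feature_megabytes = feature_bytes / 10**6

print('Feature matrix: {:.0f} MB'.format(feature_megabytes))
```

Under these assumptions the feature matrix alone takes about 640 MB, which helps explain why Colaboratory may warn about memory usage during training.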

## Training on CPU
Now we will train the model on CPU and measure the execution time.
We will use only 100 iterations for CPU training, since more iterations would take a long time.
This run takes around 8 minutes.

In [4]:
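def train_on_cpu():
  # CatBoost trains on CPU by default
  model = CatBoostRegressor(
    iterations=100,
    learning_rate=0.03
  )

  model.fit(
      X_train, y_train,
      eval_set=(X_test, y_test),
      verbose=10  # print metrics every 10 iterations
  )

# Run the training once and measure the elapsed time in seconds
cpu_time = timeit.timeit('train_on_cpu()',
                         setup="from __main__ import train_on_cpu",
                         number=1)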

print('Time to fit model on CPU: {} sec'.format(int(cpu_time)))

0:	learn: 210.7406779	test: 210.7406779	best: 210.7406779 (0)	total: 5.42s	remaining: 8m 56s
10:	learn: 175.7370024	test: 175.7370024	best: 175.7370024 (10)	total: 51.3s	remaining: 6m 55s
20:	learn: 149.6425348	test: 149.6425348	best: 149.6425348 (20)	total: 1m 34s	remaining: 5m 56s
30:	learn: 130.0950162	test: 130.0950162	best: 130.0950162 (30)	total: 2m 18s	remaining: 5m 8s
40:	learn: 114.8508041	test: 114.8508041	best: 114.8508041 (40)	total: 3m	remaining: 4m 19s
50:	learn: 102.8672930	test: 102.8672930	best: 102.8672930 (50)	total: 3m 41s	remaining: 3m 33s
60:	learn: 93.1019234	test: 93.1019234	best: 93.1019234 (60)	total: 4m 23s	remaining: 2m 48s
70:	learn: 84.9256561	test: 84.9256561	best: 84.9256561 (70)	total: 5m 5s	remaining: 2m 4s
80:	learn: 77.9872987	test: 77.9872987	best: 77.9872987 (80)	total: 5m 46s	remaining: 1m 21s
90:	learn: 72.1042340	test: 72.1042340	best: 72.1042340 (90)	total: 6m 26s	remaining: 38.3s
99:	learn: 67.4124799	test: 67.4124799	best: 67.4124799 (99)	tot
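
The timing pattern above passes the statement to `timeit` as a string together with a `setup` import. As an alternative sketch, `timeit.timeit` also accepts a callable directly, and `time.perf_counter` can be used for a manual measurement. The `busy_work` function below is a hypothetical stand-in workload so that the example runs quickly. In the notebook you would pass `train_on_cpu` instead.

```python
import time
import timeit


def busy_work():
    """Hypothetical stand-in for a training function such as train_on_cpu."""
    return sum(i * i for i in range(200000))


# Option 1: pass the callable itself, no setup string needed
elapsed_timeit = timeit.timeit(busy_work, number=1)

# Option 2: measure manually with a monotonic high-resolution clock
start = time.perf_counter()
busy_work()
elapsed_manual = time.perf_counter() - start

print('timeit: {:.4f} sec'.format(elapsed_timeit))
print('perf_counter: {:.4f} sec'.format(elapsed_manual))
```

Both options measure wall-clock time for a single run, so the result depends on whatever else the machine is doing at that moment.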

## Training on GPU
The previous run was done on CPU. It's time to use the GPU!  
To train on GPU, pass the `task_type='GPU'` parameter. The execution time will now be much shorter :)  
BTW, if Colaboratory shows the warning 'GPU memory usage is close to the limit', just press 'Ignore'.

In [5]:
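def train_on_gpu():
  model = CatBoostRegressor(
    iterations=100,
    learning_rate=0.03,
    task_type='GPU'  # the only change compared to the CPU run
  )
  model.fit(
      X_train, y_train,
      eval_set=(X_test, y_test),
      verbose=10  # print metrics every 10 iterations
  )

# Run the training once and measure the elapsed time in seconds
gpu_time = timeit.timeit('train_on_gpu()',
                         setup="from __main__ import train_on_gpu",
                         number=1)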

print('Time to fit model on GPU: {} sec'.format(int(gpu_time)))

0:	learn: 210.6864742	test: 210.6864438	best: 210.6864438 (0)	total: 418ms	remaining: 41.4s
10:	learn: 175.3974321	test: 175.3974048	best: 175.3974048 (10)	total: 3.75s	remaining: 30.3s
20:	learn: 149.2556840	test: 149.2556786	best: 149.2556786 (20)	total: 6.84s	remaining: 25.7s
30:	learn: 129.5295858	test: 129.5295858	best: 129.5295858 (30)	total: 9.87s	remaining: 22s
40:	learn: 114.3826525	test: 114.3826420	best: 114.3826420 (40)	total: 12.9s	remaining: 18.5s
50:	learn: 102.3210672	test: 102.3210711	best: 102.3210711 (50)	total: 15.8s	remaining: 15.2s
60:	learn: 92.4963048	test: 92.4963048	best: 92.4963048 (60)	total: 18.8s	remaining: 12s
70:	learn: 84.2915749	test: 84.2915654	best: 84.2915654 (70)	total: 21.7s	remaining: 8.86s
80:	learn: 77.3639503	test: 77.3639528	best: 77.3639528 (80)	total: 24.5s	remaining: 5.75s
90:	learn: 71.4349886	test: 71.4349942	best: 71.4349942 (90)	total: 27.3s	remaining: 2.7s
99:	learn: 66.7548260	test: 66.7548230	best: 66.7548230 (99)	total: 29.8s	remai

In [6]:
print('GPU speedup over CPU: ' + '%.2f' % (cpu_time/gpu_time) + 'x')

GPU speedup over CPU: 6.71x
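
The `total` column in the two training logs can be compared as well. The sketch below uses a hypothetical helper, `parse_duration`, to convert the duration strings printed at iteration 90 of each run into seconds. It handles only the `m`, `s` and `ms` units that appear in the logs above. Treating the gap between this ratio and the `timeit` ratio as overhead outside the boosting iterations (for example data preparation inside `fit`) is an assumption, not something the logs confirm.

```python
import re

# Units seen in the logs above; other formats are not handled
UNIT_SECONDS = {'m': 60.0, 's': 1.0, 'ms': 0.001}


def parse_duration(text):
    """Convert a logged duration such as '6m 26s' or '418ms' to seconds."""
    total = 0.0
    # 'ms' is listed first so that '418ms' is not read as minutes
    for value, unit in re.findall(r'([\d.]+)\s*(ms|m|s)', text):
        total += float(value) * UNIT_SECONDS[unit]
    return total


cpu_total = parse_duration('6m 26s')  # CPU log, iteration 90
gpu_total = parse_duration('27.3s')   # GPU log, iteration 90

print('Speedup at iteration 90: %.1fx' % (cpu_total / gpu_total))
```

For these two log lines the ratio is about 14x, noticeably higher than the end-to-end figure measured with `timeit`, so which number to quote depends on whether you care about the iterations alone or the whole `fit` call.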


As you can see, the GPU is much faster than the CPU on large datasets. Fitting the model takes just 1 - 2 minutes on GPU versus 7 - 8 minutes on CPU.
This is a good reason to use GPU instead of CPU!
  
Thank you for your attention!