# Gradient Boosting: CPU vs GPU
This is a basic tutorial that shows how to run gradient boosting on CPU and GPU in Google Colaboratory. It gives you an opportunity to see the speedup you get from GPU training. The speedup is large even on the Tesla K80 that is available in Colaboratory, and on newer GPU generations it should be even bigger.

We will use the CatBoost gradient boosting library, which is known for its good GPU performance.
  
You can try it out on Colaboratory by clicking the following badge:  
 
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/catboost/tutorials/blob/master/tools/google_colaboratory_cpu_vs_gpu_tutorial.ipynb) 

## Set GPU as hardware accelerator
First of all, you need to select GPU as the hardware accelerator. There are two simple steps:  
Step 1. Open the 'Runtime' menu and select 'Change runtime type'.  
Step 2. Choose GPU as the hardware accelerator.  
That's all!
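To double-check that the runtime change took effect, you can ask the NVIDIA driver which GPUs it sees. The sketch below is a minimal, stdlib-only check that assumes the runtime exposes the `nvidia-smi` tool on `PATH` (GPU runtimes in Colab normally do, but treat that as an assumption); `visible_gpus` is a hypothetical helper name. Recent CatBoost versions also offer `catboost.utils.get_gpu_device_count()` for a similar check.

```python
import shutil
import subprocess

def visible_gpus():
    """Return the 'GPU n: ...' lines printed by `nvidia-smi -L`, or [] if none are found."""
    if shutil.which("nvidia-smi") is None:
        # No NVIDIA tools on PATH: most likely a CPU-only runtime
        return []
    result = subprocess.run(["nvidia-smi", "-L"],
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                            universal_newlines=True)
    if result.returncode != 0:
        return []
    # Keep only the per-device lines, e.g. "GPU 0: <model name> (UUID: ...)"
    return [line for line in result.stdout.splitlines() if line.startswith("GPU")]

gpus = visible_gpus()
if gpus:
    print("\n".join(gpus))
else:
    print("No GPU visible: check Runtime > Change runtime type")
```

An empty result on a runtime where you selected GPU usually means the runtime has not been restarted with the new accelerator yet.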

## Installing CatBoost

The next step is to install CatBoost in the environment. Colaboratory comes with many libraries preinstalled, and most others can be installed quickly with a simple *!pip install* command.  
Please ignore the warning message about the already imported enum package. Also note that you need to reinstall and re-import the library every time you start a new Colab session.

```
!pip install catboost
```

```
Collecting catboost
  Downloading https://files.pythonhosted.org/packages/98/03/777a0e1c12571a7f3320a4fa6d5f123dba2dd7c0bca34f4f698a6396eb48/catboost-0.12.2-cp36-none-manylinux1_x86_64.whl (55.5MB)
    100% |████████████████████████████████| 55.5MB 853kB/s 
Collecting enum34 (from catboost)
  Downloading https://files.pythonhosted.org/packages/af/42/cb9355df32c69b553e72a2e28daee25d1611d2c0d9c272aa1d34204205b2/enum34-1.1.6-py3-none-any.whl
Installing collected packages: enum34, catboost
Successfully installed catboost-0.12.2 enum34-1.1.6
```
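Because every new Colab session starts without CatBoost, a small guard can skip the download when the package is already present. This is a sketch only: `ensure_installed` is a hypothetical helper, and it assumes that calling pip through the current interpreter (`sys.executable -m pip`) installs into the notebook's environment, which is the usual behavior but not guaranteed for every setup.

```python
import importlib
import importlib.util
import subprocess
import sys

def ensure_installed(module_name, pip_name=None):
    """pip-install a package only if `module_name` cannot be imported.

    Returns True if the module was already available, False if pip had to run.
    """
    if importlib.util.find_spec(module_name) is not None:
        return True
    # Install with the interpreter that runs this notebook
    subprocess.check_call([sys.executable, "-m", "pip", "install", pip_name or module_name])
    # Make the freshly installed package visible to the import system
    importlib.invalidate_caches()
    return False

# Hypothetical usage at the top of a fresh session:
# ensure_installed("catboost")
```

Running the guard twice in the same session is cheap: the second call finds the module and returns without invoking pip.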


## Download and prepare dataset
The next step is to download the dataset. GPU training is most useful for large datasets: you can expect a good speedup starting from about 10k objects, and the more objects you have, the larger the speedup.
For this reason we have selected a large dataset for this tutorial, [Epsilon](https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary.html) (500,000 documents and 2,000 features).
We will get the data through the `catboost.datasets` module, as the code below does. It runs for approximately 10-15 minutes, so please be patient :)

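```python
from catboost.datasets import epsilon

# Download Epsilon; the train and test parts come back as pandas DataFrames
train, test = epsilon()

# Column 0 holds the label; the remaining columns are the features
X_train, y_train = train.iloc[:, 1:], train[0]
X_test, y_test = test.iloc[:, 1:], test[0]
```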
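A quick back-of-the-envelope calculation shows why loading Epsilon is slow and why Colaboratory may warn about memory: as a dense matrix it holds 500,000 × 2,000 = 10⁹ values. The sketch assumes the values are stored densely as either 4-byte `float32` or 8-byte `float64`; which type the `epsilon()` loader actually produces is not stated here, so both are shown. `dense_matrix_gib` is a hypothetical helper.

```python
def dense_matrix_gib(rows, cols, bytes_per_value):
    """Memory needed to hold a dense rows x cols numeric matrix, in GiB."""
    return rows * cols * bytes_per_value / 2**30

# Epsilon: 500,000 objects x 2,000 features (train and test together)
for dtype_name, width in [("float32", 4), ("float64", 8)]:
    print("{}: {:.1f} GiB".format(dtype_name, dense_matrix_gib(500_000, 2_000, width)))
# float32: 3.7 GiB
# float64: 7.5 GiB
```

Either way the features alone take several gigabytes, before counting any copies made during training.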

## Training on CPU
Now we will train the model on CPU and measure the execution time.
We use only 100 iterations for CPU training, since more would take a long time.
Even so, it takes around 15 minutes.

```python
from catboost import CatBoostClassifier
import timeit

def train_on_cpu():
    model = CatBoostClassifier(
        iterations=100,
        learning_rate=0.03
    )

    # Print train/test metrics every 10 iterations
    model.fit(
        X_train, y_train,
        eval_set=(X_test, y_test),
        verbose=10
    )

# Time one complete call: everything inside fit, not just the boosting iterations
cpu_time = timeit.timeit(train_on_cpu, number=1)

print('Time to fit model on CPU: {} sec'.format(int(cpu_time)))
```

```
0:	learn: 0.6878003	test: 0.6878003	best: 0.6878003 (0)	total: 7.64s	remaining: 12m 36s
10:	learn: 0.6460415	test: 0.6460415	best: 0.6460415 (10)	total: 1m 20s	remaining: 10m 53s
20:	learn: 0.6169323	test: 0.6169323	best: 0.6169323 (20)	total: 2m 30s	remaining: 9m 25s
30:	learn: 0.5948178	test: 0.5948178	best: 0.5948178 (30)	total: 3m 38s	remaining: 8m 5s
40:	learn: 0.5766751	test: 0.5766751	best: 0.5766751 (40)	total: 4m 44s	remaining: 6m 49s
50:	learn: 0.5615754	test: 0.5615754	best: 0.5615754 (50)	total: 5m 48s	remaining: 5m 35s
60:	learn: 0.5484313	test: 0.5484313	best: 0.5484313 (60)	total: 6m 52s	remaining: 4m 23s
70:	learn: 0.5370070	test: 0.5370070	best: 0.5370070 (70)	total: 7m 55s	remaining: 3m 14s
80:	learn: 0.5265792	test: 0.5265792	best: 0.5265792 (80)	total: 8m 58s	remaining: 2m 6s
90:	learn: 0.5173778	test: 0.5173778	best: 0.5173778 (90)	total: 10m 4s	remaining: 59.8s
99:	learn: 0.5097820	test: 0.5097820	best: 0.5097820 (99)	total: 11m 2s	remaining: 0us

bestTest = 0.509
```

Note that the training itself, without the data feeding, takes around 11 minutes (the `total` value on the last log line), whereas the whole process takes 14-15 minutes.
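The gap between the two numbers is time spent outside the boosting iterations themselves, presumably mostly on converting the pandas data into CatBoost's internal format. One way to see where the time goes is to time each phase separately. Below is a minimal `time.perf_counter`-based context manager; `timed` and `timings` are hypothetical names, and the commented usage assumes the `X_train`/`y_train` variables from the cells above plus `catboost.Pool`.

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(label, results):
    """Store the wall-clock seconds spent inside the `with` block in results[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Record the duration even if the block raised an exception
        results[label] = time.perf_counter() - start

timings = {}

# Hypothetical usage with the data from this notebook:
# from catboost import CatBoostClassifier, Pool
# with timed("build_pools", timings):
#     train_pool = Pool(X_train, y_train)
#     test_pool = Pool(X_test, y_test)
# with timed("fit", timings):
#     CatBoostClassifier(iterations=100, learning_rate=0.03).fit(
#         train_pool, eval_set=test_pool, verbose=10)
# for label, seconds in timings.items():
#     print('{}: {:.1f} sec'.format(label, seconds))
```

Building the `Pool` objects once also lets you reuse them for both the CPU and GPU runs instead of converting the DataFrames twice.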

## Training on GPU
The previous code ran on CPU. Now it's time to use the GPU!  
To run training on GPU, set the `task_type='GPU'` parameter. This time the execution won't take nearly as long :)  
By the way, if Colaboratory shows you a warning 'GPU memory usage is close to the limit', just press 'Ignore'.
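If you want the same notebook to run on both CPU and GPU runtimes, you can pick `task_type` from the number of available GPUs instead of hard-coding it. This is a sketch: `catboost_params` is a hypothetical helper, and the commented usage assumes a CatBoost version that provides `catboost.utils.get_gpu_device_count()`. `'CPU'` is CatBoost's default task type, so setting it explicitly only makes the choice visible.

```python
def catboost_params(gpu_count, iterations=100, learning_rate=0.03):
    """Keyword arguments for CatBoostClassifier, requesting GPU only when one is available."""
    return {
        "iterations": iterations,
        "learning_rate": learning_rate,
        # 'CPU' is CatBoost's default; 'GPU' needs a CUDA-capable device
        "task_type": "GPU" if gpu_count > 0 else "CPU",
    }

# Hypothetical usage (assumes catboost.utils.get_gpu_device_count is available):
# from catboost import CatBoostClassifier
# from catboost.utils import get_gpu_device_count
# model = CatBoostClassifier(**catboost_params(get_gpu_device_count()))
```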

```python
def train_on_gpu():
    model = CatBoostClassifier(
        iterations=100,
        learning_rate=0.03,
        task_type='GPU'  # run training on the GPU
    )

    model.fit(
        X_train, y_train,
        eval_set=(X_test, y_test),
        verbose=10
    )

gpu_time = timeit.timeit(train_on_gpu, number=1)

print('Time to fit model on GPU: {} sec'.format(int(gpu_time)))
print('GPU speedup over CPU: %.2fx' % (cpu_time / gpu_time))
```

```
0:	learn: 0.6877673	test: 0.6877673	best: 0.6877673 (0)	total: 335ms	remaining: 33.1s
10:	learn: 0.6457423	test: 0.6457424	best: 0.6457424 (10)	total: 2.54s	remaining: 20.6s
20:	learn: 0.6163271	test: 0.6163271	best: 0.6163271 (20)	total: 4.61s	remaining: 17.3s
30:	learn: 0.5943045	test: 0.5943045	best: 0.5943045 (30)	total: 6.59s	remaining: 14.7s
40:	learn: 0.5763315	test: 0.5763315	best: 0.5763315 (40)	total: 8.56s	remaining: 12.3s
50:	learn: 0.5607702	test: 0.5607704	best: 0.5607704 (50)	total: 10.5s	remaining: 10.1s
60:	learn: 0.5478195	test: 0.5478195	best: 0.5478195 (60)	total: 12.4s	remaining: 7.95s
70:	learn: 0.5360011	test: 0.5360011	best: 0.5360011 (70)	total: 14.3s	remaining: 5.83s
80:	learn: 0.5258044	test: 0.5258044	best: 0.5258044 (80)	total: 16.2s	remaining: 3.79s
90:	learn: 0.5165437	test: 0.5165438	best: 0.5165438 (90)	total: 18.1s	remaining: 1.79s
99:	learn: 0.5089656	test: 0.5089656	best: 0.5089656 (99)	total: 19.7s	remaining: 0us
bestTest = 0.5089655859
bestIteratio
```

As you can see, GPU is much faster than CPU on large datasets. Fitting the model takes just 3-4 minutes vs 14-15 minutes, and the boosting iterations themselves take about 20 seconds vs about 11 minutes (the `total` values on the last line of each log). This is a good reason to use GPU instead of CPU!
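The fit-time comparison can be reproduced from the `total:` values on the last line of each training log. The sketch below parses CatBoost's human-readable durations (such as `11m 2s`, `19.7s`, `335ms`); the set of units it accepts (`h`, `m`, `s`, `ms`, `us`) is an assumption based on the logs in this tutorial, and `parse_duration` and `speedup` are hypothetical helpers.

```python
import re

# Seconds per unit; "ms" and "us" are listed before "m" and "s" in the pattern
_UNIT_SECONDS = {"h": 3600.0, "m": 60.0, "s": 1.0, "ms": 1e-3, "us": 1e-6}
_TOKEN = re.compile(r"(\d+(?:\.\d+)?)(ms|us|h|m|s)")

def parse_duration(text):
    """Convert a CatBoost-style duration such as '11m 2s' or '335ms' to seconds."""
    parts = text.split()
    if not parts:
        raise ValueError("empty duration")
    total = 0.0
    for part in parts:
        match = _TOKEN.fullmatch(part)
        if match is None:
            raise ValueError("unrecognised duration token: {!r}".format(part))
        value, unit = match.groups()
        total += float(value) * _UNIT_SECONDS[unit]
    return total

def speedup(cpu_seconds, gpu_seconds):
    return cpu_seconds / gpu_seconds

cpu_fit = parse_duration("11m 2s")  # 'total' on the last CPU log line
gpu_fit = parse_duration("19.7s")   # 'total' on the last GPU log line
print("Fit-time speedup from the logs: {:.1f}x".format(speedup(cpu_fit, gpu_fit)))
# Fit-time speedup from the logs: 33.6x
```

This ratio covers only the boosting iterations; the wall-clock ratio measured with `timeit` is smaller because it also includes the work done outside the iterations.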
  
Thank you for your attention!