# XGBoost

XGBoost is built on ensemble learning, a method for systematically combining the predictive abilities of multiple learners into a single model that aggregates their results. XGBoost, which stands for Extreme Gradient Boosting, is a scalable, distributed gradient-boosted decision tree (GBDT) machine learning library. It provides parallel tree boosting. The term gradient boosting refers to improving a single weak model by combining it with a number of other weak models to create a collectively strong model.

It is an extension of boosting in which an objective function is specified and a gradient descent algorithm is used to generate the weak models. Initially, only Python and R packages were available for XGBoost, but it has since been extended to Java, Scala, Julia, and other languages[2016].

## Discussion

• How does XGBoost work?

XGBoost works by splitting the data into segments that lead to accurate predictions, with the splits controlled by various parameters. Each new tree takes into account the previous prediction for every data point and splits the data so as to maximise the 'gain' in prediction. If a split does not yield enough gain, as measured against a threshold set by a hyper-parameter, it is pruned to avoid overfitting. Gradient boosted trees thus build a sequence of models, with each new tree factoring in the predictions of the previous trees in order to reduce the prediction error. Once trained, the algorithm combines the predictions of all the trees to produce its final prediction.

Training is repeated iteratively. Each new tree is fitted to the residuals, that is, the errors left by the prior trees, and is then combined with the previous trees to produce the final prediction:

$$\text{New prediction} = \text{Initial prediction} + \text{Learning rate} \times \text{Residuals}$$
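
The update rule above can be sketched in a few lines of plain Python. This is a minimal illustration of generic gradient boosting for squared-error regression, not XGBoost's actual implementation: the weak learner is a one-split stump, and the helper names (`fit_stump`, `boost`) and the toy data are assumptions made for this example.

```python
# Minimal gradient-boosting sketch for squared-error regression.
# Each weak model is a one-split stump fitted to the current residuals.

def fit_stump(xs, residuals):
    """Return (threshold, left_value, right_value) minimising squared error."""
    best = None
    for threshold in sorted(set(xs))[1:]:  # every distinct value except the smallest
        left = [r for x, r in zip(xs, residuals) if x < threshold]
        right = [r for x, r in zip(xs, residuals) if x >= threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]


def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x < threshold else right_value


def boost(xs, ys, n_rounds=50, learning_rate=0.3):
    initial = sum(ys) / len(ys)  # initial prediction: the mean target
    predictions = [initial] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        # New prediction = previous prediction + learning rate * predicted residual
        predictions = [p + learning_rate * predict_stump(stump, x)
                       for x, p in zip(xs, predictions)]
    return initial, stumps, predictions


def mean_squared_error(ys, predictions):
    return sum((y - p) ** 2 for y, p in zip(ys, predictions)) / len(ys)


# Hypothetical toy data.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.2, 1.9, 3.2, 3.8, 5.1, 6.3, 6.8, 8.1]

baseline_mse = mean_squared_error(ys, [sum(ys) / len(ys)] * len(ys))
initial, stumps, boosted = boost(xs, ys)
boosted_mse = mean_squared_error(ys, boosted)
print(f"MSE with initial prediction only: {baseline_mse:.3f}")
print(f"MSE after boosting: {boosted_mse:.3f}")
```

A learning rate below 1 makes each tree correct only part of the remaining error, which is why many rounds are needed.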

• How does XGBoost determine feature importance?

XGBoost automatically provides feature importance estimates from a trained predictive model. After the boosted trees are constructed, an importance score can be retrieved for each attribute. The score reflects how useful each feature was in building the boosted decision trees within the model. In scikit-learn, these scores can be used for feature selection through the SelectFromModel class, which accepts a model and transforms a dataset into a subset containing only the selected features. This class can accept a pre-trained model, such as one trained on the whole training dataset, and decides which features to keep by applying a threshold. The threshold is applied when you call the transform() method on the SelectFromModel instance, so the same features are selected consistently on the training and test datasets. Alternatively, with scikit-learn's permutation importance, each feature is shuffled randomly in turn and the resulting change in model performance is computed.

The plot_importance function from xgboost plots how important each feature is to the model. Features chosen this way can be used not only for your XGBoost model, but also for any other similar model that runs on the same data.

• How does XGBoost handle missing values in a given dataset?

XGBoost handles missing values with a sparsity-aware split finding algorithm. The algorithm lets XGBoost deal with missing values directly while building each CART (classification and regression tree), a binary decision tree that repeatedly splits a node into two child nodes. As the accompanying figure illustrates, the optimal default direction at each node is learned from the data. The key to the algorithm is to visit only the non-missing entries $$I_k$$.

The algorithm treats non-presence as a missing value and learns the best direction in which to send missing values. It can also be applied when non-presence corresponds to a user-specified value, by limiting the enumeration to consistent solutions. XGBoost handles all sparsity patterns uniformly, and it exploits sparsity to make the computation complexity proportional to the number of non-missing entries in the input.
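
As a rough sketch of how a default direction could be learned for one candidate split, the code below accumulates statistics over the non-missing entries only, then compares the gain of sending all missing rows left against sending them right. It assumes squared-error loss (gradient = prediction minus target, hessian = 1), uses the standard gain formula with an L2 penalty `reg_lambda`, and omits the complexity penalty for brevity. The function names and the data are hypothetical.

```python
# Sketch of learning a default direction for missing values at one split.
# Assumes squared-error loss: gradient g = prediction - target, hessian h = 1.

def score(g_sum, h_sum, reg_lambda):
    """Structure score of a node: G^2 / (H + lambda)."""
    return g_sum * g_sum / (h_sum + reg_lambda)


def best_default_direction(values, grads, hess, threshold, reg_lambda=1.0):
    """Try sending missing rows left, then right; keep the higher-gain choice."""
    g_total, h_total = sum(grads), sum(hess)

    # Visit only the non-missing entries (the set I_k in the text).
    g_left = h_left = g_right = h_right = 0.0
    for v, g, h in zip(values, grads, hess):
        if v is None:
            continue
        if v < threshold:
            g_left, h_left = g_left + g, h_left + h
        else:
            g_right, h_right = g_right + g, h_right + h

    parent = score(g_total, h_total, reg_lambda)

    # Missing rows go right: the left child holds only present values below the threshold.
    gain_missing_right = 0.5 * (
        score(g_left, h_left, reg_lambda)
        + score(g_total - g_left, h_total - h_left, reg_lambda)
        - parent
    )
    # Missing rows go left: the right child holds only present values at or above it.
    gain_missing_left = 0.5 * (
        score(g_total - g_right, h_total - h_right, reg_lambda)
        + score(g_right, h_right, reg_lambda)
        - parent
    )
    if gain_missing_left > gain_missing_right:
        return "left", gain_missing_left
    return "right", gain_missing_right


# Hypothetical data: None marks a missing feature value.
values = [1.0, 2.0, None, 8.0, 9.0, None]
targets = [1.0, 1.2, 1.1, 5.0, 5.2, 0.9]
grads = [0.0 - t for t in targets]  # the current prediction is 0 for every row
hess = [1.0] * len(targets)

direction, gain = best_default_direction(values, grads, hess, threshold=5.0)
print(direction, round(gain, 3))
```

Here the rows with missing values have targets close to those of the low-valued group, so sending them left gives the larger gain.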

• What are the XGBoost data pre-processing steps?

The following are the data pre-processing steps for XGBoost:

• Examine the data and delete any unnecessary attributes.
• Convert textual values to numeric values.
• If necessary, locate and replace the missing values. Unlike with many other models, there is no need to remove predictors with zero or near-zero variance.
• Divide the dataset into two parts: training and testing.
• Scale the features or normalise the data.
• Optionally apply principal component analysis (PCA), which can help the performance of XGBoost. PCA is a statistical procedure that uses an orthogonal transformation to convert a set of correlated variables into a set of uncorrelated variables.
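
The first four steps above can be sketched as follows. The toy records, the column layout and the decision to leave missing ages as NaN are assumptions made for illustration; scaling and PCA are omitted to keep the example short.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical toy records: (record_id, age, gender, label); None marks a missing age.
records = [
    (101, 25, "female", 0),
    (102, None, "male", 1),
    (103, 47, "male", 1),
    (104, 33, "female", 0),
    (105, 52, "female", 1),
    (106, None, "male", 0),
]

# 1. Drop the attribute that carries no predictive signal (the record id).
ages = [r[1] for r in records]
genders = [[r[2]] for r in records]
labels = np.array([r[3] for r in records])

# 2. Convert textual values to numbers (categories are sorted: female=0, male=1).
gender_codes = OrdinalEncoder().fit_transform(genders)

# 3. Mark missing values as NaN so they can be replaced later or left as they are.
age_column = np.array(
    [np.nan if a is None else float(a) for a in ages]
).reshape(-1, 1)

X = np.hstack([age_column, gender_codes])

# 4. Split into training and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.33, random_state=7
)
print(X_train.shape, X_test.shape)
```
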

• What does the XGBoost leaf node weight mean, and how do you figure it out?

In general, each tree is built to predict the residuals between the target and the base predictions. The "leaf weight" is the model output associated with each leaf (exit) node of the tree. When building a tree, XGBoost tries all split points given by the data values for each feature and records the gain of each. It then picks the feature and threshold combination with the largest gain.

Here's an example of how the leaf node weight is looked up in XGBoost. Consider the following test data point: age=10, gender=female. To predict for this data point, the tree is traversed from top to bottom through a series of tests. Each intermediate node compares a feature against a threshold, and the outcome of the comparison determines whether we move to the left or the right child node. Because "age < 15" is true for (10, female), we take the left branch and arrive at the second test, "gender = male?". This is false, so we go to Leaf 2, which has a leaf weight of 0.1.
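
To make the split search and the leaf weights concrete, here is a minimal sketch for a single feature. It assumes squared-error loss (gradient = prediction minus target, hessian = 1) and an L2 penalty `reg_lambda`. With those assumptions, the optimal leaf weight is -G / (H + lambda), where G and H are the sums of gradients and hessians in the leaf. The function names and the data are hypothetical, and thresholds are taken directly from the data values.

```python
# Sketch of exact split finding on one feature with regularised leaf weights.

def leaf_weight(g_sum, h_sum, reg_lambda):
    """Optimal leaf weight: w = -G / (H + lambda)."""
    return -g_sum / (h_sum + reg_lambda)


def split_gain(g_left, h_left, g_right, h_right, reg_lambda):
    """Gain = 1/2 * [GL^2/(HL+l) + GR^2/(HR+l) - (GL+GR)^2/(HL+HR+l)]."""
    def score(g, h):
        return g * g / (h + reg_lambda)
    return 0.5 * (score(g_left, h_left) + score(g_right, h_right)
                  - score(g_left + g_right, h_left + h_right))


def best_split(values, grads, hess, reg_lambda=1.0):
    """Try every distinct value as a threshold; return the highest-gain split."""
    best = None
    for threshold in sorted(set(values))[1:]:
        left = [i for i, v in enumerate(values) if v < threshold]
        right = [i for i, v in enumerate(values) if v >= threshold]
        gl, hl = sum(grads[i] for i in left), sum(hess[i] for i in left)
        gr, hr = sum(grads[i] for i in right), sum(hess[i] for i in right)
        gain = split_gain(gl, hl, gr, hr, reg_lambda)
        if best is None or gain > best["gain"]:
            best = {
                "threshold": threshold,
                "gain": gain,
                "left_weight": leaf_weight(gl, hl, reg_lambda),
                "right_weight": leaf_weight(gr, hr, reg_lambda),
            }
    return best


# Hypothetical data: one feature (age) and a regression target.
ages = [8, 10, 12, 30, 45, 60]
targets = [2.0, 2.1, 1.9, -1.0, -1.1, -0.9]
base_prediction = 0.0
grads = [base_prediction - t for t in targets]
hess = [1.0] * len(targets)

result = best_split(ages, grads, hess)
print(result)
```

The regularisation term shrinks the leaf weights towards zero: without it, the left leaf would simply output the mean residual of its rows.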
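
The traversal just described can be written as a short sketch. Only the weight of Leaf 2 (0.1) comes from the text; the weights of Leaf 1 and Leaf 3, and the dictionary layout of the tree, are hypothetical placeholders chosen to make the example runnable.

```python
# Leaf 2's weight (0.1) comes from the text; the other two leaf weights
# are hypothetical placeholders.
TREE = {
    "test": lambda row: row["age"] < 15,
    "yes": {
        "test": lambda row: row["gender"] == "male",
        "yes": {"leaf": "Leaf 1", "weight": 2.0},   # hypothetical weight
        "no": {"leaf": "Leaf 2", "weight": 0.1},    # weight from the text
    },
    "no": {"leaf": "Leaf 3", "weight": -1.0},       # hypothetical weight
}


def predict(node, row):
    """Walk from the root to a leaf, branching on each node's test."""
    while "leaf" not in node:
        node = node["yes"] if node["test"](row) else node["no"]
    return node["leaf"], node["weight"]


leaf, weight = predict(TREE, {"age": 10, "gender": "female"})
print(leaf, weight)  # prints: Leaf 2 0.1
```

In a boosted ensemble, the weights reached in every tree are summed to give the final score for the data point.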

• What makes XGBoost better than gradient boosting?

Parallel Computing: By default, XGBoost uses all the cores of your laptop or machine, which enables parallel computation.

Tree pruning using depth-first approach: XGBoost grows each tree up to the 'max_depth' parameter rather than stopping at the first split that fails the criterion, and then prunes the tree backward. Tree pruning is a data compression technique used in machine learning and search algorithms to reduce the size of decision trees by removing non-critical and redundant sections of the tree.

Missing Values: XGBoost is designed to handle missing values internally. The missing values are treated in such a manner that any trend in missing values (if it exists) is captured by the model.

Regularization: The biggest advantage of XGBoost is that it uses regularisation in its objective function, which helps control overfitting and keeps the model simple, leading to better performance.

• Is XGBoost more efficient than a random forest?

Random forest and XGBoost are both decision tree algorithms, but they take different approaches to training. In XGBoost, when the model fails to predict an instance correctly, that instance is given more weight in subsequent iterations, which improves the model's ability to predict the under-represented class. With random forest, there is no guarantee that class imbalance will be handled properly. If most of the trees in the forest are given similar samples, the random forest is likely to overfit the data.

## Milestones

2016

Tianqi Chen creates XGBoost as a research project as part of the Distributed (Deep) Machine Learning Community (DMLC) group.

2019

XGBoost is also available on OpenCL for FPGAs. OpenCL (Open Computing Language) is a framework for writing programs that run on a variety of platforms, including CPUs, GPUs, DSPs, FPGAs, and other processors and hardware accelerators.

2021

XGBoost is integrated in Apache Hadoop, Apache Spark, Apache Flink, and Dask, which are distributed processing frameworks.

## Sample Code

# Source: https://machinelearningmastery.com/develop-first-xgboost-model-python-scikit-learn/
# Accessed: 2021-12-29
from numpy import loadtxt
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
# load data (assumption: a local CSV with 8 feature columns and 1 label column,
# such as the Pima Indians diabetes file used in the source tutorial)
dataset = loadtxt("pima-indians-diabetes.csv", delimiter=",")
# split data into X and y
X = dataset[:,0:8]
Y = dataset[:,8]
# split data into train and test sets
seed = 7
test_size = 0.33
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=test_size, random_state=seed)
# fit model on training data
model = XGBClassifier()
model.fit(X_train, y_train)
# make predictions for test data
y_pred = model.predict(X_test)
predictions = [round(value) for value in y_pred]
# evaluate predictions
accuracy = accuracy_score(y_test, predictions)
print("Accuracy: %.2f%%" % (accuracy * 100.0))

## Cite As

Devopedia. 2022. "XGBoost." Version 19, January 12. Accessed 2022-01-18. https://devopedia.org/xgboost
Contributed by
2 authors

Last updated on
2022-01-12 15:08:03