XGBoost

Article Info

Contributed by
2 authors

Last updated on
2022-01-12 15:08:03

Boosting (Machine Learning)
Ensemble Learning
Decision Trees for Machine Learning
Random Forests
CatBoost
Supervised vs Unsupervised Learning


Evolution of XGBoost. Source: Morde 2019.

Ensemble learning is the basis for XGBoost. Ensemble learning is a method for systematically combining the predictive abilities of several learners. The result is a single model that aggregates the results of several models. XGBoost, which stands for Extreme Gradient Boosting, is a scalable, distributed gradient-boosted decision tree (GBDT) machine learning library. It provides parallel tree boosting. The term gradient boosting refers to improving a single weak model by combining it with a number of other weak models to create a collectively strong model.

Gradient boosting is an extension of boosting in which the model is improved by gradient descent on an objective function, with each step adding a new weak model. Previously, only Python and R packages were available for XGBoost, but it has since been expanded to include Java, Scala, Julia, and more languages [2016].

Discussion

How does XGBoost work?
XGBoost working. Source: Amazon 2021.
XGBoost builds decision trees that divide the data into segments, with each segment leading to a prediction based on feature values. Each new tree takes into account the current prediction for every data point and splits the data so as to maximise the 'gain' in prediction. If a split does not yield enough gain, as judged against a threshold set by a hyperparameter, it is pruned to avoid overfitting. In this way, gradient boosted trees build a sequence of models, each new tree factoring in the predictions of the previous trees with the goal of reducing prediction error.
Training is repeated, adding new trees that predict the residuals (errors) of the prior trees. Once training is complete, the algorithm combines the predictions of all the trees to generate the final prediction: \(\text{New prediction} = \text{Initial prediction} + \text{Learning rate} \times \text{Predicted residuals}\)
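The residual-fitting loop described here can be sketched in a few lines of plain Python. This is a minimal illustration for squared-error regression on a single feature, with one-split "stumps" standing in for full trees. The function names, the toy data and the settings (20 trees, learning rate 0.3) are assumptions made for the example and are not part of XGBoost.

```python
# Minimal gradient boosting sketch (squared error, one feature).
# Illustrative only: XGBoost's real trees, gain formula and pruning differ.

def fit_stump(xs, residuals):
    """Fit a one-split tree to the residuals by minimising squared error."""
    best = None
    # Skip the largest value so that the right side is never empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def boost(xs, ys, n_trees=20, learning_rate=0.3):
    """Return a predictor built by repeatedly fitting stumps to residuals."""
    initial = sum(ys) / len(ys)          # initial prediction: the mean target
    predictions = [initial] * len(ys)
    stumps = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        # New prediction = previous prediction + learning rate * predicted residual
        predictions = [p + learning_rate * stump(x)
                       for x, p in zip(xs, predictions)]
    return lambda x: initial + learning_rate * sum(s(x) for s in stumps)


def mean_squared_error(predict, xs, ys):
    return sum((y - predict(x)) ** 2 for x, y in zip(xs, ys)) / len(ys)


xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.2, 1.9, 3.1, 3.9, 5.2, 5.8, 7.1, 8.0]

mean_y = sum(ys) / len(ys)
baseline_error = mean_squared_error(lambda x: mean_y, xs, ys)
boosted_error = mean_squared_error(boost(xs, ys), xs, ys)
print(f"baseline MSE: {baseline_error:.3f}, boosted MSE: {boosted_error:.3f}")
```

Each round lowers the training error a little. The learning rate controls how much of each stump's correction is applied.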
How does XGBoost determine feature importance?
XGBoost automatically provides feature importance estimates from a trained predictive model. After the boosted trees are constructed, an importance score can be retrieved for each attribute. The score reflects how useful each feature was in building the boosted decision trees. In scikit-learn, feature importance scores can be used to select features. This is done with the SelectFromModel class, which accepts a model and can transform a dataset into a subset with only the selected features. The class can accept a pre-trained model, such as one trained on the whole training dataset, and decides which features to select by applying a threshold. The same threshold is applied whenever the transform() method is called on the SelectFromModel instance, so the same features are selected consistently on the training and test datasets. An alternative is permutation importance, also available in scikit-learn: each feature is shuffled randomly in turn and the resulting difference in model performance is computed.
The plot_importance function from xgboost plots how important each feature is to the model. Features chosen this way can be used not only for XGBoost but also for any other similar model that runs on the same data.
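The permutation idea can be illustrated without any library. The sketch below is an assumption-laden toy: `toy_model` is a hypothetical stand-in for a trained model and the data is synthetic. It shuffles one feature column at a time and records the average drop in accuracy.

```python
import random


def accuracy(model, rows, labels):
    correct = sum(1 for row, label in zip(rows, labels) if model(row) == label)
    return correct / len(rows)


def permutation_importance(model, rows, labels, n_repeats=5, seed=0):
    """Mean drop in accuracy when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    importances = []
    for j in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in rows]
            rng.shuffle(column)
            shuffled = [row[:j] + [value] + row[j + 1:]
                        for row, value in zip(rows, column)]
            drops.append(baseline - accuracy(model, shuffled, labels))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical stand-in for a trained model: it only looks at feature 0.
def toy_model(row):
    return 1 if row[0] > 0.5 else 0


# Synthetic data: feature 0 determines the label, feature 1 is pure noise.
data_rng = random.Random(42)
rows = [[i / 20, data_rng.random()] for i in range(20)]
labels = [1 if row[0] > 0.5 else 0 for row in rows]

importances = permutation_importance(toy_model, rows, labels)
print(importances)
```

Shuffling the noise feature leaves accuracy unchanged, so its importance is zero. Shuffling the informative feature hurts accuracy.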
How does XGBoost handle missing values in a given dataset?
Handling the missing values. Source: Rusdah 2020.
XGBoost handles missing values with a sparsity-aware split finding algorithm. The algorithm lets XGBoost deal with missing values directly while building each CART. A CART is a binary decision tree that repeatedly separates a node into two child nodes. The above figure illustrates that the optimal default directions are learned from the data. The key to the algorithm is to visit only the non-missing entries \(I_k\).
The algorithm treats non-presence as a missing value and learns the best direction in which to send missing values. When non-presence corresponds to a user-specified value, the algorithm can also be applied by limiting the enumeration to consistent solutions. XGBoost handles all sparsity patterns in a uniform way. Exploiting sparsity makes the computational complexity proportional to the number of non-missing entries in the input.
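A simplified sketch of how a default direction might be learned for one candidate split is given below. It assumes squared-error loss with a starting prediction of zero, so each gradient is the negative target and each Hessian is 1. It scores a split with the regularised structure score \(G^2/(H+\lambda)\), leaving out constant factors. The function names and data are illustrative assumptions, not XGBoost's API.

```python
def structure_score(g_sum, h_sum, reg_lambda):
    """Regularised quality of a node, up to a constant factor."""
    return g_sum ** 2 / (h_sum + reg_lambda)


def best_default_direction(values, grads, hessians, threshold, reg_lambda=1.0):
    """Try sending missing values left and right; keep the better direction.

    Only non-missing entries are visited when accumulating statistics.
    The statistics of the other side are obtained by subtraction, which
    implicitly places the missing entries there.
    """
    g_total, h_total = sum(grads), sum(hessians)
    parent = structure_score(g_total, h_total, reg_lambda)
    present = [(v, g, h) for v, g, h in zip(values, grads, hessians)
               if v is not None]

    # Missing goes right: accumulate the left side from present values only.
    g_left = sum(g for v, g, _ in present if v < threshold)
    h_left = sum(h for v, _, h in present if v < threshold)
    gain_right = (structure_score(g_left, h_left, reg_lambda)
                  + structure_score(g_total - g_left, h_total - h_left, reg_lambda)
                  - parent)

    # Missing goes left: accumulate the right side from present values only.
    g_right = sum(g for v, g, _ in present if v >= threshold)
    h_right = sum(h for v, _, h in present if v >= threshold)
    gain_left = (structure_score(g_total - g_right, h_total - h_right, reg_lambda)
                 + structure_score(g_right, h_right, reg_lambda)
                 - parent)

    direction = "right" if gain_right >= gain_left else "left"
    return direction, gain_left, gain_right


# Toy data: the two rows with a missing feature have targets like the high-valued rows.
values = [1.0, 2.0, None, 8.0, 9.0, None]
targets = [1.0, 1.0, 5.0, 5.0, 5.0, 5.0]
grads = [-y for y in targets]      # squared error with prediction 0
hessians = [1.0] * len(targets)

direction, gain_left, gain_right = best_default_direction(
    values, grads, hessians, threshold=5.0)
print(direction, round(gain_left, 2), round(gain_right, 2))
```

Here the missing rows behave like the rows on the right of the split, so sending them right gives the larger gain.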
What are the data pre-processing steps for XGBoost?
Data pre-processing steps. Source: Panarese 2021.
The following are the data pre-processing steps for XGBoost:
- Load the data.
- Examine the data and delete any unnecessary attributes.
- Convert textual values to numeric values.
- If necessary, locate and replace missing values. Unlike for many other models, do not remove predictors with zero or near-zero variance.
- Divide the dataset into two parts: training and testing.
- Scale the features or normalise the data.
- Optionally apply principal component analysis (PCA), which may help the performance of XGBoost. PCA is a statistical procedure that uses an orthogonal transformation to convert a set of correlated variables into a set of uncorrelated variables.
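A few of the steps above (converting text to numbers, keeping track of missing values, and splitting the data) can be sketched with the standard library alone. The records, field names and split ratio are hypothetical. Missing values are kept as NaN on the assumption that the model will handle them, which is how XGBoost treats NaN by default.

```python
import math
import random

# Hypothetical raw records; an empty string marks a missing value.
raw_rows = [
    {"age": "10", "gender": "female", "target": "1"},
    {"age": "35", "gender": "male", "target": "0"},
    {"age": "", "gender": "female", "target": "0"},
    {"age": "52", "gender": "", "target": "1"},
    {"age": "23", "gender": "male", "target": "0"},
    {"age": "41", "gender": "female", "target": "1"},
]

# Convert textual values to numeric codes built from the observed categories.
categories = sorted({row["gender"] for row in raw_rows if row["gender"]})
gender_code = {name: float(index) for index, name in enumerate(categories)}


def encode(row):
    """Return ([age, gender_code], target), keeping missing values as NaN."""
    age = float(row["age"]) if row["age"] else math.nan
    gender = gender_code.get(row["gender"], math.nan)
    return [age, gender], int(row["target"])


encoded = [encode(row) for row in raw_rows]

# Divide the dataset into training and testing parts (seed fixed for repeatability).
rng = random.Random(7)
rng.shuffle(encoded)
split_at = int(len(encoded) * 0.67)
train, test = encoded[:split_at], encoded[split_at:]
print(len(train), "training rows,", len(test), "test rows")
```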
What does the XGBoost leaf node weight mean, and how do you figure it out?
XGBoost leaf nodes. Source: dmlc XGBoost 2020.
In general, each tree is built to predict the residuals, that is, the differences between the target values and the base predictions. The "leaf weight" is the model's output associated with each leaf (exit) node. While building a tree, XGBoost tries all the split points given by the data values for each feature and records their gain. It then picks the feature and threshold combination with the largest gain.
Here's an example of how a leaf node weight is looked up in XGBoost. Consider the following test data point: age=10, gender=female. To make a prediction for the data point, the tree is traversed from top to bottom through a series of tests. Each intermediate node compares a feature against a threshold. Depending on the outcome of the comparison, we move to the left or right child node. Because "age < 15" is true for (10, female), we take the left branch. The second test, "gender = male?", is false, so we reach Leaf 2, which has a leaf weight of 0.1.
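The traversal in this example can be written out as a short sketch. The tree is represented as nested dictionaries, which is an illustrative choice and not XGBoost's internal format. The weight of 0.1 for Leaf 2 comes from the text; the weights for Leaf 1 and Leaf 3 are assumed values added only to complete the tree.

```python
# Tree from the example: first test "age < 15", then "gender = male?".
tree = {
    "test": lambda point: point["age"] < 15,
    "yes": {
        "test": lambda point: point["gender"] == "male",
        "yes": {"leaf": "Leaf 1", "weight": 2.0},    # assumed weight
        "no": {"leaf": "Leaf 2", "weight": 0.1},     # weight given in the text
    },
    "no": {"leaf": "Leaf 3", "weight": -1.0},        # assumed weight
}


def predict_leaf(node, point):
    """Walk from the root to a leaf, following the outcome of each test."""
    while "leaf" not in node:
        node = node["yes"] if node["test"](point) else node["no"]
    return node["leaf"], node["weight"]


leaf, weight = predict_leaf(tree, {"age": 10, "gender": "female"})
print(leaf, weight)
```

In a boosted ensemble, the final score for a data point is the sum of the leaf weights it reaches in every tree.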
What makes XGBoost better than gradient boosting?
Tree pruning. Source: Wikipedia 2014.
Parallel computing: By default, XGBoost uses all the cores of the machine it runs on, which allows it to do computation in parallel.
Tree pruning using a depth-first approach: XGBoost first grows each tree up to the 'max_depth' parameter and then prunes it backward, removing splits that do not yield enough gain, rather than stopping at the first split that fails a criterion. Tree pruning is a data compression technique used in machine learning and search algorithms to reduce the size of decision trees by deleting non-critical and redundant sections of the tree.
Missing values: XGBoost is designed to handle missing values internally. Missing values are treated in such a way that any trend in them, if one exists, is captured by the model.
Regularization: A major advantage of XGBoost is that it uses regularisation in its objective function. This helps to control overfitting and keeps the model simple, leading to better performance.
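How regularisation and pruning interact can be sketched with the standard regularised formulas for leaf weight and split gain, \(w = -G/(H+\lambda)\) and \(\text{gain} = \tfrac{1}{2}\left[\frac{G_L^2}{H_L+\lambda} + \frac{G_R^2}{H_R+\lambda} - \frac{(G_L+G_R)^2}{H_L+H_R+\lambda}\right] - \gamma\). Here \(G\) and \(H\) are sums of first and second derivatives of the loss in a node. The parameter names `reg_lambda` and `gamma` mirror XGBoost's hyperparameters, but the functions and numbers below are illustrative assumptions.

```python
def leaf_weight(g_sum, h_sum, reg_lambda):
    """Optimal output of a leaf; a larger reg_lambda shrinks it towards zero."""
    return -g_sum / (h_sum + reg_lambda)


def split_gain(g_left, h_left, g_right, h_right, reg_lambda, gamma):
    """Improvement from a split, minus the complexity penalty gamma."""
    def score(g_sum, h_sum):
        return g_sum ** 2 / (h_sum + reg_lambda)

    return 0.5 * (score(g_left, h_left)
                  + score(g_right, h_right)
                  - score(g_left + g_right, h_left + h_right)) - gamma


def keep_split(gain):
    """A split survives pruning only if its penalised gain is positive."""
    return gain > 0


# Two hypothetical candidate splits, given as (G_left, H_left, G_right, H_right).
strong_split = (-2.0, 2.0, -20.0, 4.0)
weak_split = (-3.0, 3.0, -9.0, 3.0)

strong_gain = split_gain(*strong_split, reg_lambda=1.0, gamma=2.0)
weak_gain = split_gain(*weak_split, reg_lambda=1.0, gamma=2.0)
print(round(strong_gain, 3), keep_split(strong_gain))
print(round(weak_gain, 3), keep_split(weak_gain))
print(leaf_weight(-20.0, 4.0, reg_lambda=1.0))
```

With these numbers the strong split is kept and the weak one is pruned. Raising `gamma` prunes more aggressively.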
Is XGBoost more efficient than a random forest?
Random Forest and XGBoost are both ensembles of decision trees, but they differ in how they use the training data. In XGBoost, when the model fails to predict a sample correctly, subsequent iterations focus more on that sample, which improves the model's ability to predict the minority class. Random forest has no such mechanism, so there is no guarantee that it will handle class imbalance properly. In addition, if most of the trees in the forest are given similar samples, the random forest is likely to overfit the data.

Milestones

2016

Tianqi Chen creates XGBoost as a research project as part of the Distributed (Deep) Machine Learning Community (DMLC) group.

2019

XGBoost becomes available on OpenCL for FPGAs. OpenCL (Open Computing Language) is a framework for writing programs that run on a variety of platforms, including CPUs, GPUs, DSPs, FPGAs, and other processors and hardware accelerators.

2021

XGBoost integrates with Apache Hadoop, Apache Spark, Apache Flink, and Dask, which are distributed processing frameworks.

Sample Code

python

# Source: https://machinelearningmastery.com/develop-first-xgboost-model-python-scikit-learn/
# Accessed: 2021-12-29
from numpy import loadtxt
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
# load data
dataset = loadtxt('pima-indians-diabetes.csv', delimiter=",")
# split data into X and y
X = dataset[:,0:8]
Y = dataset[:,8]
# split data into train and test sets
seed = 7
test_size = 0.33
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=test_size, random_state=seed)
# fit model on training data
model = XGBClassifier()
model.fit(X_train, y_train)
# make predictions for test data
y_pred = model.predict(X_test)
predictions = [round(value) for value in y_pred]
# evaluate predictions
accuracy = accuracy_score(y_test, predictions)
print("Accuracy: %.2f%%" % (accuracy * 100.0))

References


Cite As

Devopedia. 2022. "XGBoost." Version 19, January 12. Accessed 2023-11-13. https://devopedia.org/xgboost
