
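In the implementation below, the first layer uses several base learners: random forest, XGBoost, LightGBM, gradient boosting, AdaBoost, and CatBoost. Each one is trained independently and produces its own predictions. The second-layer meta-learner is a linear regression that takes the base learners' predictions as its inputs and combines them into the final prediction.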
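To make the two-layer mechanism concrete, here is a minimal hand-rolled sketch of stacking. It assumes synthetic data from `make_regression` and only two scikit-learn base learners, and the helper name `stacked_predict` is hypothetical. It mirrors the general idea (out-of-fold predictions feed the meta-learner) rather than reproducing the article's model.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_predict

# Synthetic data stands in for the article's dataset (assumption).
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=42)

base_models = {
    "RF": RandomForestRegressor(n_estimators=50, random_state=42),
    "GBM": GradientBoostingRegressor(n_estimators=50, random_state=42),
}

# Layer 1: out-of-fold predictions, so the meta-learner never sees a
# prediction made on a row the base learner was trained on.
meta_features = np.column_stack(
    [cross_val_predict(model, X, y, cv=5) for model in base_models.values()]
)

# Layer 2: a linear regression over the base learners' predictions.
meta_model = LinearRegression().fit(meta_features, y)

# For inference, each base learner is refit on all training rows.
for model in base_models.values():
    model.fit(X, y)


def stacked_predict(X_new):
    """Run layer 1, then feed its predictions to the meta-learner."""
    layer1 = np.column_stack([m.predict(X_new) for m in base_models.values()])
    return meta_model.predict(layer1)


print(stacked_predict(X[:3]))
```

`StackingRegressor` packages these steps behind a single `fit`/`predict` interface, which is why the article's code only needs to list the estimators and choose `cv`.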

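How does SHAP explain a Stacking model?

Note that SHAP explains one model at a time: it measures feature importance by attributing a contribution to each feature for every prediction. For a Stacking model, this means either analyzing the layers one at a time or explaining the whole pipeline as a single function. The two approaches are:

Decompose the Stacking structure step by step and explain the base learners and the meta-learner separately
Treat the Stacking model as a single "black box" (looking only at the relationship between the input features and the final prediction)

Code implementation

Model construction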

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings("ignore")

# Plot settings: font family and correct rendering of minus signs
plt.rcParams['font.family'] = 'Times New Roman'
plt.rcParams['axes.unicode_minus'] = False

# Load the dataset; the target column is 'Y'
df = pd.read_excel('2024-12-7公眾號Python機器學習AI.xlsx')

from sklearn.model_selection import train_test_split, KFold

X = df.drop(['Y'], axis=1)
y = df['Y']

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=42)

from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor, AdaBoostRegressor, StackingRegressor
from xgboost import XGBRegressor
from lightgbm import LGBMRegressor
from catboost import CatBoostRegressor
from sklearn.linear_model import LinearRegression

# Define the first-level (base) learners
base_learners = [
    ("RF", RandomForestRegressor(n_estimators=100, random_state=42)),
    ("XGB", XGBRegressor(n_estimators=100, random_state=42, verbosity=0)),
    ("LGBM", LGBMRegressor(n_estimators=100, random_state=42, verbose=-1)),
    ("GBM", GradientBoostingRegressor(n_estimators=100, random_state=42)),
    ("AdaBoost", AdaBoostRegressor(n_estimators=100, random_state=42)),
    ("CatBoost", CatBoostRegressor(n_estimators=100, random_state=42, verbose=0))
]

# Define the second-level learner (meta-learner)
meta_model = LinearRegression()

# Build the Stacking regressor
stacking_regressor = StackingRegressor(estimators=base_learners, final_estimator=meta_model, cv=5)

# Train the model
stacking_regressor.fit(X_train, y_train)

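This builds and trains a Stacking ensemble for a regression task, with several base learners (random forest, XGBoost, and so on) and a linear regression as the meta-learner.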
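Before explaining the model, it can help to check what the fitted ensemble contains. The sketch below assumes synthetic data and two scikit-learn base learners, with `stack` as a stand-in for the article's `stacking_regressor`. It compares test-set R² for the stack and each fitted base learner, then reads the weights the linear meta-learner assigns to each base prediction.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import (
    GradientBoostingRegressor,
    RandomForestRegressor,
    StackingRegressor,
)
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic data stands in for the article's spreadsheet (assumption).
X, y = make_regression(n_samples=300, n_features=6, noise=15.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

stack = StackingRegressor(
    estimators=[
        ("RF", RandomForestRegressor(n_estimators=50, random_state=42)),
        ("GBM", GradientBoostingRegressor(n_estimators=50, random_state=42)),
    ],
    final_estimator=LinearRegression(),
    cv=5,
)
stack.fit(X_train, y_train)

# Test-set R^2 of the whole stack and of each fitted base learner.
scores = {"Stacking": r2_score(y_test, stack.predict(X_test))}
for name, model in stack.named_estimators_.items():
    scores[name] = r2_score(y_test, model.predict(X_test))

# Weight the linear meta-learner gives to each base learner's prediction.
weights = dict(zip(stack.named_estimators_.keys(), stack.final_estimator_.coef_))

print(scores)
print(weights)
```

The weights give a first, coarse view of how much the meta-learner relies on each base learner. The SHAP analysis of the second layer later in the article refines this per sample.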

Computing SHAP values for the base learners

Explaining a single model: RF

shap_dfs['RF']

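SHAP values are computed for each base learner in the Stacking model in turn and saved as DataFrames for later analysis. Only the random forest's SHAP values are shown here.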

RF beeswarm plot

import shap

# Beeswarm (dot) summary plot of the random forest's SHAP values on the test set
plt.figure()
shap.summary_plot(np.array(shap_dfs['RF']), X_test, feature_names=X_test.columns, plot_type="dot", show=False)
plt.savefig("RF summary_plot.pdf", format='pdf', bbox_inches='tight')

RF SHAP feature contribution plot

# Bar plot of mean absolute SHAP values, sorted by importance
plt.figure(figsize=(10, 5), dpi=1200)
shap.summary_plot(np.array(shap_dfs['RF']), X_test, plot_type="bar", show=False)
plt.title('SHAP_numpy Sorted Feature Importance')
plt.tight_layout()
plt.savefig("RF Sorted Feature Importance.pdf", format='pdf', bbox_inches='tight')
plt.show()

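These are the SHAP beeswarm plot and the ranked feature contribution plot for the random forest base learner. The other base learners can be analyzed in the same way.

Beeswarm plots for all base learners

Drawing a SHAP beeswarm plot for every base learner in the Stacking model shows that the SHAP explanations differ from one base learner to the next, because each base learner is trained independently and weights the features differently.

SHAP feature contribution plots for all base learners

A ranked SHAP feature contribution plot (bar chart) is drawn for every base learner in the Stacking model, showing the mean absolute SHAP value of each feature. Other SHAP visualizations can be produced in the same way.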
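The plotting code for the all-learner figure is not included in the article. As a rough sketch of the idea, the helper below (a hypothetical `plot_mean_abs_shap`) draws one sorted mean(|SHAP|) bar chart per base learner with plain matplotlib. It assumes a dict of SHAP DataFrames shaped like `shap_dfs`. The numbers here are random toy values that only exercise the layout, not results from the article.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_mean_abs_shap(shap_dfs, ncols=3):
    """Draw one sorted mean(|SHAP|) bar chart per base learner."""
    nrows = int(np.ceil(len(shap_dfs) / ncols))
    fig, axes = plt.subplots(
        nrows, ncols, figsize=(5 * ncols, 4 * nrows), squeeze=False
    )
    for ax, (name, df) in zip(axes.ravel(), shap_dfs.items()):
        # Ascending sort puts the most important feature at the top of barh.
        importance = df.abs().mean().sort_values()
        ax.barh(importance.index, importance.values)
        ax.set_title(name)
        ax.set_xlabel("mean(|SHAP value|)")
    # Hide any unused panels in the grid.
    for ax in axes.ravel()[len(shap_dfs):]:
        ax.set_visible(False)
    fig.tight_layout()
    return fig


# Toy SHAP tables (random placeholders, NOT real SHAP values).
rng = np.random.default_rng(0)
toy_shap_dfs = {
    name: pd.DataFrame(rng.normal(size=(50, 4)), columns=["X_1", "X_2", "X_3", "X_4"])
    for name in ["RF", "XGB", "LGBM", "GBM", "AdaBoost", "CatBoost"]
}
fig = plot_mean_abs_shap(toy_shap_dfs)
```

Passing the real `shap_dfs` instead of `toy_shap_dfs` and calling `fig.savefig(...)` would produce a single PDF with one panel per base learner.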

Computing SHAP values for the meta-learner

shap_df

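SHAP values are computed for the meta-learner. Its input features are the base learners' predictions, so the explanation covers only those inputs and shows how much each base learner contributes to the final prediction.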
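The article does not show how `shap_df` is produced. Because the meta-learner is a plain linear regression, its SHAP values have a closed form under the feature-independence assumption: each contribution is the coefficient times the feature's deviation from its background mean. The sketch below applies that formula with NumPy. It assumes synthetic data, two base learners, and the training-set meta-features as background. `shap.LinearExplainer` would be the library route.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.ensemble import (
    GradientBoostingRegressor,
    RandomForestRegressor,
    StackingRegressor,
)
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data stands in for the article's dataset (assumption).
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

stack = StackingRegressor(
    estimators=[
        ("RF", RandomForestRegressor(n_estimators=50, random_state=42)),
        ("GBM", GradientBoostingRegressor(n_estimators=50, random_state=42)),
    ],
    final_estimator=LinearRegression(),
    cv=5,
).fit(X_train, y_train)
names = list(stack.named_estimators_.keys())

# Meta-features: one column of predictions per base learner.
Z_train = stack.transform(X_train)
Z_test = stack.transform(X_test)

meta = stack.final_estimator_
background_mean = Z_train.mean(axis=0)

# Linear SHAP: phi_j = coef_j * (z_j - E[z_j]) for each sample.
shap_values = meta.coef_ * (Z_test - background_mean)
shap_df = pd.DataFrame(shap_values, columns=names)

# Base value: the meta-learner's output at the background mean.
base_value = meta.intercept_ + meta.coef_ @ background_mean

print(shap_df.head())
```

Each row of `shap_df` plus `base_value` sums to the stack's prediction for that sample. Base learners' predictions are usually strongly correlated, so the independence assumption behind this formula is a simplification to keep in mind when reading the plots.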

Meta-learner beeswarm plot

plt.figure()
# The second argument supplies the values used to color the points;
# passing shap_df colors them by SHAP value, not by the base learners' predictions.
shap.summary_plot(np.array(shap_df), shap_df, feature_names=shap_df.columns, plot_type="dot", show=False)
plt.title("SHAP Contribution Analysis for the Meta-Learner in the Second Layer of Stacking Regressor", fontsize=16, y=1.02)
plt.savefig("SHAP Contribution Analysis for the Meta-Learner in the Second Layer of Stacking Regressor.pdf", format='pdf', bbox_inches='tight')
plt.show()

Meta-learner SHAP feature contribution plot

# Bar plot of the mean absolute SHAP value of each base learner's prediction
plt.figure(figsize=(10, 5), dpi=1200)
shap.summary_plot(np.array(shap_df), shap_df, plot_type="bar", show=False)
plt.tight_layout()
plt.title("Bar Plot of SHAP Feature Contributions for the Meta-Learner in Stacking Regressor", fontsize=16, y=1.02)
plt.savefig("Bar Plot of SHAP Feature Contributions for the Meta-Learner in Stacking Regressor.pdf", format='pdf', bbox_inches='tight')
plt.show()

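These are the SHAP beeswarm plot and the ranked feature contribution plot for the meta-learner (the second-layer LinearRegression). They show, respectively, the distribution of each base learner's influence on the meta-learner's output and its mean importance. Together they reveal how much each base learner contributes in the second layer of the Stacking model, which indicates its relative importance in the overall model.

Combined beeswarm and feature contribution plot for the meta-learner

Combining the SHAP beeswarm plot and the feature contribution plot in a single figure makes a complex machine learning model more transparent and easier to interpret.

Explaining the Stacking model as a single "black box"

SHAP values for the whole Stacking model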

stacking_shap_df

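Here the Stacking model is explained as a whole: SHAP values measure each input feature's contribution to the model's output, looking only at the relationship between the input features and the final prediction. Only the first 100 test-set samples are explained, because the model is complex and computing SHAP values for every sample takes considerable time.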

Beeswarm plot for the whole Stacking model

plt.figure()
# The second argument supplies the values used to color the points;
# passing stacking_shap_df colors them by SHAP value, not by the original feature values.
shap.summary_plot(np.array(stacking_shap_df), stacking_shap_df, feature_names=stacking_shap_df.columns, plot_type="dot", show=False)
plt.title("Based on the overall feature contribution analysis of SHAP to the stacking model", fontsize=16, y=1.02)
plt.savefig("Based on the overall feature contribution analysis of SHAP to the stacking model.pdf", format='pdf', bbox_inches='tight')
plt.show()

Feature contribution plot for the whole Stacking model

plt.figure(figsize=(10, 5), dpi=1200)
# Feature names are taken from the columns of stacking_shap_df (the input features)
shap.summary_plot(np.array(stacking_shap_df), stacking_shap_df, plot_type="bar", show=False)
plt.tight_layout()
plt.title("SHAP-based Stacking Model Feature Contribution Histogram Analysis", fontsize=16, y=1.02)
plt.savefig("SHAP-based Stacking Model Feature Contribution Histogram Analysis.pdf", format='pdf', bbox_inches='tight')
plt.show()