
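Generate a set of normally distributed data with randomly placed missing values to serve as the experimental data for the rest of this post.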
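The generation step can be sketched as follows. The sample size, distribution parameters, missing rate, and seed are assumptions chosen for illustration, since the post does not state them; only the single column name `data` is taken from the code later on.

```python
import numpy as np
import pandas as pd

# Assumed settings (not given in the original post): 100 samples from a
# standard normal distribution, 10 of them replaced with NaN.
rng = np.random.default_rng(42)
n_samples = 100
n_missing = 10

values = rng.normal(loc=0.0, scale=1.0, size=n_samples)

# Pick distinct random positions and blank them out.
missing_pos = rng.choice(n_samples, size=n_missing, replace=False)
values[missing_pos] = np.nan

# Single-column DataFrame; later snippets refer to the column as 'data'.
df = pd.DataFrame({"data": values})
print(df["data"].isna().sum())
```

Drawing the missing positions without replacement guarantees that exactly `n_missing` values are removed.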

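    import matplotlib.pyplot as plt
    import pandas as pd

    # df is the single-column DataFrame (column 'data') generated above.
    print("Missing values in the original data:", df.isnull().sum())

    # Drop the samples with missing values and reset the index.
    df_1 = df.dropna().reset_index(drop=True)
    print("Missing values after processing:", df_1.isnull().sum())

    # Plot the data that remains after dropping missing values.
    plt.figure(figsize=(15, 5))
    plt.plot(df_1, marker='o', linestyle='-', label='Data after dropping missing values', color='blue')
    plt.title('Data after dropping missing values')
    plt.xlabel('Index', fontsize=14)
    plt.ylabel('Value', fontsize=14)
    plt.legend(fontsize=12)
    plt.grid(True, linestyle='--', alpha=0.7)
    plt.show()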

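This uses the dropna() method from Pandas. Called on a DataFrame, dropna() removes every row that contains at least one missing value (NaN). By default it returns a new DataFrame without those rows and leaves the original unchanged. Here the index is also reset after the rows are dropped, so the remaining samples are numbered consecutively.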

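The experimental data here is a single series, so for deleting columns that contain missing values we only give the corresponding usage: the first line of code drops column B, and the second line drops columns A and B together.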
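The code for this step did not survive in the post, so the following is only a sketch of what the two lines might look like. The frame `df_multi` and its columns A, B, C are hypothetical, and `drop(columns=...)` is one common way to write it rather than necessarily the author's.

```python
import numpy as np
import pandas as pd

# Hypothetical multi-column frame, used only for illustration.
df_multi = pd.DataFrame({
    "A": [1.0, np.nan, 3.0],
    "B": [np.nan, 5.0, 6.0],
    "C": [7.0, 8.0, 9.0],
})

without_b = df_multi.drop(columns=["B"])        # drop column B
without_ab = df_multi.drop(columns=["A", "B"])  # drop columns A and B together

# Alternative: drop every column that contains at least one NaN.
complete_cols = df_multi.dropna(axis=1)
```

`dropna(axis=1)` is convenient when the columns to delete are not known in advance, at the cost of discarding every observed value in those columns.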

    # Impute missing values with the mean.
    df_mean = df.fillna(df['data'].mean())
    # Impute missing values with the median.
    df_median = df.fillna(df['data'].median())
    # Impute missing values with the mode (first one if there are several).
    df_mode = df.fillna(df['data'].mode()[0])

    # Record where the missing values are.
    missing_indices = df[df['data'].isna()].index

    # Plot the mean, median, and mode imputations.
    plt.figure(figsize=(15, 5))
    plt.plot(df_mean['data'], marker='o', linestyle='-', label='Mean imputation', color='green', alpha=0.8)
    plt.plot(df_median['data'], marker='o', linestyle='-', label='Median imputation', color='orange', alpha=0.6)
    plt.plot(df_mode['data'], marker='o', linestyle='-', label='Mode imputation', color='red', alpha=0.4)

    # Highlight the imputed points.
    plt.scatter(missing_indices, df_mean.loc[missing_indices, 'data'], color='green', edgecolors='black', zorder=5, s=100, marker='s', label='Mean-imputed points')
    plt.scatter(missing_indices, df_median.loc[missing_indices, 'data'], color='orange', edgecolors='black', zorder=5, s=100, marker='^', label='Median-imputed points')
    plt.scatter(missing_indices, df_mode.loc[missing_indices, 'data'], color='red', edgecolors='black', zorder=5, s=100, marker='v', label='Mode-imputed points')

    plt.title('Mean/median/mode imputation of missing values')
    plt.xlabel('Index', fontsize=10)
    plt.ylabel('Value', fontsize=10)
    plt.legend()
    plt.grid(True, linestyle='--', alpha=0.7)
    plt.show()

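This imputes the missing values in the DataFrame df with three statistics, the mean, the median, and the mode, and plots the imputed data for comparison.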
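A small worked example may make the difference between the three statistics easier to see. The toy series below is made up for illustration and includes one outlier, which pulls the mean far away from the median and the mode.

```python
import numpy as np
import pandas as pd

# Made-up values: one gap at position 4 and one outlier (100.0).
s = pd.Series([1.0, 2.0, 2.0, 3.0, np.nan, 100.0])

fill_mean = s.mean()        # (1 + 2 + 2 + 3 + 100) / 5 = 21.6
fill_median = s.median()    # middle of [1, 2, 2, 3, 100] = 2.0
fill_mode = s.mode()[0]     # most frequent value = 2.0

filled_mean = s.fillna(fill_mean)
filled_median = s.fillna(fill_median)
filled_mode = s.fillna(fill_mode)

print(filled_mean[4], filled_median[4], filled_mode[4])
```

On roughly symmetric data such as a normal sample the mean and median land close together; for continuous data the mode is often just an arbitrary observed value, because exact repeats are rare.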

Forward/backward fill

    # Forward fill
    df_ffill = df.ffill()
    # Backward fill
    df_bfill = df.bfill()

    # Plot the forward-filled and backward-filled data.
    plt.figure(figsize=(15, 5))
    plt.plot(df_ffill['data'], marker='o', linestyle='-', label='Forward fill', color='green', alpha=0.8)
    plt.plot(df_bfill['data'], marker='o', linestyle='-', label='Backward fill', color='orange', alpha=0.6)

    # Highlight the filled points (missing_indices was computed in the previous block).
    plt.scatter(missing_indices, df_ffill.loc[missing_indices, 'data'], color='green', edgecolors='black', zorder=5, s=100, marker='s', label='Forward-filled points')
    plt.scatter(missing_indices, df_bfill.loc[missing_indices, 'data'], color='orange', edgecolors='black', zorder=5, s=100, marker='^', label='Backward-filled points')

    # Add the title and labels.
    plt.title('Forward/backward fill of missing values')
    plt.xlabel('Index', fontsize=10)
    plt.ylabel('Value', fontsize=10)
    plt.legend()
    plt.grid(True, linestyle='--', alpha=0.7)
    plt.show()

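This handles the missing values in the DataFrame df with forward fill (ffill) and backward fill (bfill). Forward fill copies the last valid observation into each gap; backward fill copies the next valid observation.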
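One detail worth checking on a toy series is what happens at the edges: forward fill has nothing to copy into a leading gap, and backward fill has nothing for a trailing gap. The values below are made up for illustration.

```python
import numpy as np
import pandas as pd

# Made-up series with a leading gap, an interior gap, and a trailing gap.
s = pd.Series([np.nan, 1.0, np.nan, np.nan, 4.0, np.nan])

forward = s.ffill()          # leading NaN stays: no earlier value exists
backward = s.bfill()         # trailing NaN stays: no later value exists
both = s.ffill().bfill()     # forward fill first, then patch the leading gap

print(forward.tolist())
print(backward.tolist())
print(both.tolist())
```

Chaining the two calls is a simple way to make sure no NaN survives, provided the series contains at least one valid value.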

    from scipy.interpolate import interp1d

    # Fit every interpolator on the observed points only.
    observed = df.dropna()

    # Linear spline interpolation
    linear_interpolator = interp1d(observed.index, observed['data'], kind='linear', fill_value="extrapolate")
    df_linear = df.copy()
    df_linear['data'] = df_linear['data'].combine_first(pd.Series(linear_interpolator(df.index), index=df.index))

    # Quadratic spline interpolation
    quadratic_interpolator = interp1d(observed.index, observed['data'], kind='quadratic', fill_value="extrapolate")
    df_quadratic = df.copy()
    df_quadratic['data'] = df_quadratic['data'].combine_first(pd.Series(quadratic_interpolator(df.index), index=df.index))

    # Cubic spline interpolation
    cubic_interpolator = interp1d(observed.index, observed['data'], kind='cubic', fill_value="extrapolate")
    df_cubic = df.copy()
    df_cubic['data'] = df_cubic['data'].combine_first(pd.Series(cubic_interpolator(df.index), index=df.index))

    # Plot the linear, quadratic, and cubic spline interpolations.
    plt.figure(figsize=(15, 5))
    plt.plot(df_linear['data'], marker='o', linestyle='-', label='Linear spline interpolation', color='blue', alpha=0.8)
    plt.plot(df_quadratic['data'], marker='o', linestyle='-', label='Quadratic spline interpolation', color='green', alpha=0.6)
    plt.plot(df_cubic['data'], marker='o', linestyle='-', label='Cubic spline interpolation', color='red', alpha=0.4)

    # Highlight the interpolated points (missing_indices was computed earlier).
    plt.scatter(missing_indices, df_linear.loc[missing_indices, 'data'], color='blue', edgecolors='black', zorder=5, s=100, marker='s', label='Linear-interpolated points')
    plt.scatter(missing_indices, df_quadratic.loc[missing_indices, 'data'], color='green', edgecolors='black', zorder=5, s=100, marker='^', label='Quadratic-interpolated points')
    plt.scatter(missing_indices, df_cubic.loc[missing_indices, 'data'], color='red', edgecolors='black', zorder=5, s=100, marker='v', label='Cubic-interpolated points')

    # Add the title and labels.
    plt.title('Linear/quadratic/cubic spline interpolation of missing values')
    plt.xlabel('Index', fontsize=10)
    plt.ylabel('Value', fontsize=10)
    plt.legend()
    plt.grid(True, linestyle='--', alpha=0.7)
    plt.show()

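This fills the missing values in the DataFrame df with linear, quadratic, and cubic spline interpolation. Each interpolator is fitted on the observed points, and combine_first keeps the observed values while taking the interpolated value only where the original is missing.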
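To see how the spline order changes the filled value, here is a minimal sketch on made-up points sampled from y = x², with the value at x = 2 treated as missing. The data and variable names are assumptions for illustration only.

```python
import numpy as np
from scipy.interpolate import interp1d

# Made-up observations of y = x**2; x = 2 is the "missing" position.
x_known = np.array([0.0, 1.0, 3.0, 4.0])
y_known = x_known ** 2

linear = interp1d(x_known, y_known, kind="linear")
quadratic = interp1d(x_known, y_known, kind="quadratic")

# Linear joins (1, 1) and (3, 9) with a straight line, giving 5 at x = 2.
linear_at_2 = float(linear(2.0))
# A quadratic spline can follow the parabola, giving approximately 4.
quadratic_at_2 = float(quadratic(2.0))

print(linear_at_2, quadratic_at_2)
```

Higher-order splines track smooth curvature better, but on noisy data such as a random normal sample they can also overshoot between observations, so a higher order is not automatically an improvement.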

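    # IterativeImputer is experimental and must be enabled explicitly before import.
    from sklearn.experimental import enable_iterative_imputer  # noqa: F401
    from sklearn.impute import IterativeImputer

    # Run the iterative imputation.
    imputer = IterativeImputer(max_iter=10, random_state=42)
    df_imputed = pd.DataFrame(imputer.fit_transform(df), columns=df.columns)

    # Plot the imputed data.
    plt.figure(figsize=(15, 5))
    plt.plot(df_imputed['data'], marker='o', linestyle='-', label='Multiple imputation', color='red', alpha=0.8)

    # Highlight the imputed points (missing_indices was computed earlier).
    plt.scatter(missing_indices, df_imputed.loc[missing_indices, 'data'], color='red', edgecolors='black', zorder=5, s=100, marker='s', label='Imputed points')

    # Add the title and labels.
    plt.title('Multiple imputation of missing values')
    plt.xlabel('Index', fontsize=10)
    plt.ylabel('Value', fontsize=10)
    plt.legend()
    plt.grid(True, linestyle='--', alpha=0.7)
    plt.show()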

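This fills the missing values with an approach based on multiple imputation. Multiple imputation handles missing data by generating several complete datasets, each obtained by filling the gaps in a different way. Here the IterativeImputer class from scikit-learn is used: in each iteration it fits a regression model to estimate the missing values of a feature from the other features, and it repeats until the estimates converge or the maximum number of iterations is reached; max_iter=10 sets that maximum to 10. Two caveats apply. With its default settings IterativeImputer returns a single completed dataset, not several. It also needs other features to regress on, so on single-column data like ours it falls back to its initial fill, which is the column mean by default.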
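One way to get several completed datasets out of IterativeImputer is to enable `sample_posterior=True` and repeat the fit with different seeds. The sketch below does that on a made-up two-column frame; the columns `x` and `y`, the sample size, and the number of imputations are all assumptions for illustration, not part of the original experiment.

```python
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# Made-up data: y depends on x, so x carries information about missing y.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 2.0 * x + rng.normal(scale=0.1, size=200)
demo = pd.DataFrame({"x": x, "y": y})
demo.loc[rng.choice(200, size=20, replace=False), "y"] = np.nan

# Draw five completed datasets; each seed samples different plausible values.
imputations = []
for seed in range(5):
    imputer = IterativeImputer(max_iter=10, sample_posterior=True, random_state=seed)
    imputations.append(imputer.fit_transform(demo))

stacked = np.stack(imputations)   # shape: (n_imputations, n_rows, n_columns)
pooled = stacked.mean(axis=0)     # average estimate per cell
spread = stacked.std(axis=0)      # disagreement between imputations per cell
```

Observed cells are identical in every completed dataset, so `spread` is zero there; for the imputed cells it gives a rough sense of how uncertain each filled value is.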

    from sklearn.impute import KNNImputer
    from sklearn.ensemble import RandomForestRegressor

    # Fill missing values with KNN imputation.
    knn_imputer = KNNImputer(n_neighbors=30)
    df_knn_imputed = pd.DataFrame(knn_imputer.fit_transform(df), columns=df.columns)

    # Fill missing values with a random forest.
    # Train on the rows without missing values, using the row index as the only feature.
    df_non_missing = df.dropna()
    X_train = df_non_missing.index.values.reshape(-1, 1)
    y_train = df_non_missing['data'].values

    # Create and train the random forest regressor.
    rf_regressor = RandomForestRegressor(n_estimators=100, random_state=42)
    rf_regressor.fit(X_train, y_train)

    # Predict the missing values from their index positions.
    X_missing = missing_indices.values.reshape(-1, 1)
    y_missing_pred = rf_regressor.predict(X_missing)

    # Write the predictions back into a copy of the original DataFrame.
    df_rf_imputed = df.copy()
    df_rf_imputed.loc[missing_indices, 'data'] = y_missing_pred

    # Plot the KNN-imputed and random-forest-imputed data.
    plt.figure(figsize=(15, 5))
    plt.plot(df_knn_imputed['data'], marker='o', linestyle='-', label='KNN imputation', color='blue', alpha=0.8)
    plt.plot(df_rf_imputed['data'], marker='o', linestyle='-', label='Random forest imputation', color='red', alpha=0.4)

    # Highlight the imputed points.
    plt.scatter(missing_indices, df_knn_imputed.loc[missing_indices, 'data'], color='blue', edgecolors='black', zorder=5, s=100, marker='s', label='KNN-imputed points')
    plt.scatter(missing_indices, df_rf_imputed.loc[missing_indices, 'data'], color='red', edgecolors='black', zorder=5, s=100, marker='^', label='Random-forest-imputed points')

    # Add the title and labels.
    plt.title('KNN and random forest imputation of missing values')
    plt.xlabel('Index', fontsize=10)
    plt.ylabel('Value', fontsize=10)
    plt.legend()
    plt.grid(True, linestyle='--', alpha=0.7)
    plt.show()