1 What is ridge regression?
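As a quick reminder of the standard formulation (not quoted from the original post), ridge regression adds an L2 penalty on the coefficients to the least-squares objective, with alpha controlling the penalty strength; this is also the objective that sklearn's Ridge minimizes:

$$\hat{\beta}^{\text{ridge}} = \arg\min_{\beta}\; \|y - X\beta\|_2^2 + \alpha \|\beta\|_2^2$$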
2 What is ridge trace analysis?
1) Definition
2) Purpose
3 How to select variables based on ridge regression?
Principles for selecting variables:
4 Python implementation
from sklearn.linear_model import Ridge
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.outliers_influence import variance_inflation_factor
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

# Toy data set: one response y and four predictors x1-x4.
df = pd.DataFrame(
    {'y': [15, 10, 8, 9, 3],
     'x1': [1, 1, 2, 3, 4],
     'x2': [2, 2, 3, 2, 1],
     'x3': [4, 6, 7, 8, 9],
     'x4': [4, 3, 4, 5, 4]},
)

# Ordinary least squares fit, then check multicollinearity via VIF
# for every predictor (column 0 of exog is the intercept).
result = smf.ols('y~x1+x2+x3+x4', data=df).fit()
VIFlist = []
for i in range(1, result.model.exog.shape[1]):
    vif = variance_inflation_factor(result.model.exog, i)
    VIFlist.append(vif)
print(pd.Series(VIFlist, index=df.columns[1:]))

# Rebuild y from a fixed linear combination of the predictors plus Gaussian noise.
eps = np.random.randn(5)
df['y'] = (-1.1584 + 0.0547 * df['x1'] + 0.1341 * df['x2']
           - 0.0548 * df['x3'] - 0.0320 * df['x4'] + eps)

# Standardize all columns, then split into predictors and response.
dfnorm = (df - df.mean()) / df.std()
Xnorm = dfnorm.iloc[:, 1:]
ynorm = dfnorm.iloc[:, 0]

# Ridge trace: refit the ridge model on the standardized data over a grid of
# alpha values and record the coefficient vector at each alpha.
clf = Ridge()
coefs = []
alphas = np.linspace(0.1, 30, 2000)
for a in alphas:
    clf.set_params(alpha=a)
    clf.fit(Xnorm, ynorm)
    coefs.append(clf.coef_)

# Plot each coefficient path against alpha.
ax = plt.gca()
ax.plot(alphas, coefs)
ax.legend(Xnorm.columns, loc='best')
plt.xlabel('alpha')
plt.ylabel('weights')
plt.title('Ridge coefficients as a function of the regularization')
plt.axis('tight')
plt.show()
As the figure shows, when alpha (k) is small the coefficients are very unstable; as k increases the coefficients stabilize, and as k tends to infinity every coefficient shrinks toward 0.
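This behaviour follows from the closed-form ridge solution (a standard result, not spelled out in the original post): as the penalty alpha grows, the inverse factor shrinks, so every coefficient is driven toward zero.

$$\hat{\beta}(\alpha) = \left(X^{\top}X + \alpha I\right)^{-1} X^{\top} y, \qquad \lim_{\alpha \to \infty} \hat{\beta}(\alpha) = 0$$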