How do I apply standardization to an SVM in scikit-learn?

小编典典


python

I am using the current stable version 0.13 of scikit-learn. I am applying a linear support vector classifier to some data using the class sklearn.svm.LinearSVC.

In the chapter on preprocessing in the scikit-learn documentation, I read the following:

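Many elements used in the objective function of a learning algorithm (such as the RBF kernel of Support Vector Machines or the l1 and l2 regularizers of linear models) assume that all features are centered around zero and have variance of the same order. If a feature has a variance that is orders of magnitude larger than that of the others, it might dominate the objective function and make the estimator unable to learn from the other features correctly as expected.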

Question 1: Is standardization useful for SVMs in general, and also for SVMs with a linear kernel function, as in my case?

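Question 2: As far as I understand, I have to compute the mean and standard deviation on the training data and apply the same transformation to the test data using the class sklearn.preprocessing.StandardScaler. What I don't understand is whether I also have to transform the training data, or only the test data, before feeding it to the SVM classifier.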

That is, do I have to do this:

scaler = StandardScaler()
scaler.fit(X_train)                # only compute mean and std here
X_test = scaler.transform(X_test)  # perform standardization by centering and scaling

clf = LinearSVC()
clf.fit(X_train, y_train)
clf.predict(X_test)

Or do I have to do this:

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)  # compute mean, std and transform training data as well
X_test = scaler.transform(X_test)  # same as above

clf = LinearSVC()
clf.fit(X_train, y_train)
clf.predict(X_test)

In short, do I have to use scaler.fit(X_train) or scaler.fit_transform(X_train) on the training data in order to get reasonable results with LinearSVC?


2021-01-20

1 answer

小编典典

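Neither.

Calling scaler.transform(X_train) without assigning the result has no effect, because the transform operation does not work in place. You have to do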

X_train = scaler.fit_transform(X_train)

X_test = scaler.transform(X_test)

or

X_train = scaler.fit(X_train).transform(X_train)
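
To make the point about transform not working in place concrete, here is a minimal sketch. The toy arrays are made up for illustration and are not from the question, and the sketch assumes a scikit-learn version in which StandardScaler copies its input by default (copy=True).

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy data (hypothetical): two features on very different scales.
X_train = np.array([[1.0, 1000.0],
                    [2.0, 2000.0],
                    [3.0, 3000.0]])
X_test = np.array([[2.0, 4000.0]])

scaler = StandardScaler()
scaler.fit(X_train)  # only learns the per-feature mean and standard deviation

X_before = X_train.copy()
scaler.transform(X_train)  # return value discarded, so X_train stays as it was
assert np.array_equal(X_train, X_before)

# Keep the returned arrays instead; both use the training statistics.
X_train_std = scaler.transform(X_train)
X_test_std = scaler.transform(X_test)

print(X_train_std.mean(axis=0))  # close to 0 for each feature
print(X_train_std.std(axis=0))   # close to 1 for each feature
print(X_test_std)                # test point expressed in training units
```

Note that the test point is scaled with the mean and standard deviation of the training data, so its standardized values need not have zero mean or unit variance.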

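You always need to apply the same preprocessing to both the training and the test data. And yes, standardization is always good if it reflects your beliefs about the data. For kernel SVMs in particular, it is often crucial.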
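
One way to make sure the training and test data go through the same preprocessing is to chain the scaler and the classifier in a sklearn.pipeline.Pipeline. The following is only a sketch under assumptions: the synthetic data from make_classification, the step names "scale" and "svc", and the random_state values are made up for illustration, and train_test_split is imported from sklearn.model_selection, which exists in recent releases but not in 0.13.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Synthetic data (hypothetical); one feature is inflated to a much larger scale.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X[:, 0] *= 1000.0

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# fit() standardizes X_train with statistics learned from X_train only;
# predict() and score() reuse those statistics on whatever data they receive.
model = Pipeline([
    ("scale", StandardScaler()),
    ("svc", LinearSVC()),
])
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
print(model.score(X_test, y_test))  # accuracy on the held-out data
```

With this arrangement there is no separate scaler.transform call to forget, and the same object can be passed to cross-validation utilities so that the scaler is refit on each training fold.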

2021-01-20