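I am working on anomaly detection for speech data. My original code uses an LSTM, but my dataset is imbalanced, so I wanted to get some insights from PyOD. To try out PyOD's sample data generation, I copied and pasted their example code into my Colab notebook, but it fails with this error: "ValueError: 'c' argument has 1000 elements, which is inconsistent with 'x' and 'y' with size 500."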
    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    from pyod.utils.data import generate_data

    contamination = 0.1  # percentage of outliers (10%)
    n_train = 500        # number of training points
    n_test = 500         # number of testing points
    n_features = 2       # number of features

    X_train, y_train, X_test, y_test = generate_data(
        n_train=n_train, n_test=n_test,
        n_features=n_features, contamination=contamination)

    # Make the 2d numpy array a pandas dataframe for easy manipulation
    X_train_pd = pd.DataFrame(X_train)
    # print(X_train_pd)
    # print(y_train)

    # Plot
    plt.scatter(X_train_pd[0], X_train_pd[1], c=y_train, alpha=0.8)
    plt.title('Scatter plot pythonspot.com')
    plt.xlabel('x')
    plt.ylabel('y')
    plt.show()
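It looks like c=y_train is the source of the error. The c option sets the marker colors, and it needs exactly one entry per plotted point, so you may need to "translate" y_train into some form of color format. Just to get the program to run without the error (though this is probably not the plot you want), change the scatter call to: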
    # Draw every point in the same red color so that c has one entry per point
    plt.scatter(X_train_pd[0], X_train_pd[1], c=[(1,0,0)]*len(X_train_pd[0]), alpha=0.8)
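A possible explanation for the 1000, offered as an assumption rather than something confirmed in this thread: 1000 is exactly 500 points times 2 features, so y_train may in fact be holding a 500 x 2 feature array. As far as I know, newer PyOD releases return the results of generate_data in the order X_train, X_test, y_train, y_test, so unpacking them as X_train, y_train, X_test, y_test would bind X_test to the name y_train. Please check the return order in the documentation of your installed version. The sketch below uses stand-in NumPy data instead of calling PyOD, and the helpers split_features_and_labels and labels_to_colors are hypothetical names introduced only for illustration.

```python
import numpy as np


def split_features_and_labels(arrays):
    """Sort four arrays into (X_train, X_test, y_train, y_test) by shape.

    Hypothetical helper: 2-D arrays are treated as features and 1-D arrays
    as labels, keeping the train-before-test order in which they arrived.
    """
    arrays = [np.asarray(a) for a in arrays]
    features = [a for a in arrays if a.ndim == 2]
    labels = [a for a in arrays if a.ndim == 1]
    if len(features) != 2 or len(labels) != 2:
        raise ValueError("expected two 2-D feature arrays and two 1-D label arrays")
    return features[0], features[1], labels[0], labels[1]


def labels_to_colors(y):
    """Map 0/1 labels to RGB rows: inliers (0) in blue, outliers (1) in red."""
    y = np.asarray(y).astype(int)
    if y.ndim != 1:
        raise ValueError(f"labels must be 1-D, got shape {y.shape}")
    palette = np.array([[0.0, 0.0, 1.0], [1.0, 0.0, 0.0]])
    return palette[y]


# Stand-in data shaped like the question's setup (500 points, 2 features).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 2))
X_test = rng.normal(size=(500, 2))
y_train = (rng.random(500) < 0.1).astype(float)
y_test = (rng.random(500) < 0.1).astype(float)

# Assumed return order: (X_train, X_test, y_train, y_test).
returned = (X_train, X_test, y_train, y_test)

# Unpacking in the question's order puts a 500 x 2 array in the label slot.
_, wrongly_named_y_train, _, _ = returned
print(wrongly_named_y_train.size)

# Sorting by shape recovers one label, and therefore one color, per point.
X_tr, X_te, y_tr, y_te = split_features_and_labels(returned)
colors = labels_to_colors(y_tr)
print(X_tr.shape, y_tr.shape, colors.shape)
```

With real data you would pass the tuple returned by generate_data to split_features_and_labels and then call plt.scatter(X_tr[:, 0], X_tr[:, 1], c=colors, alpha=0.8). If the return order is indeed the cause, simply unpacking in the documented order and keeping c=y_train should also work, since a 1-D array of 500 numbers is a valid value for c.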