Traditional_machine_learning_tutorial

Unsupervised Machine Learning

Clustering Algorithms

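K-Means clustering: randomly choose k initial centers and assign each sample to its nearest center. Each center is then recomputed as the mean of the samples assigned to it, and the two steps repeat until the assignments stop changing or the iteration limit is reached.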

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
import matplotlib.pyplot as plt

# Generate 300 two-dimensional samples grouped around 4 centers
X, _ = make_blobs(n_samples=300, centers=4, random_state=100)

# Fit K-Means and get the cluster index of every sample
kmeans = KMeans(n_clusters=4, init='k-means++', n_init=10, max_iter=300, random_state=100)
labels = kmeans.fit_predict(X)

# Plot the samples colored by cluster, with the fitted centers as red crosses
plt.scatter(X[:, 0], X[:, 1], c=labels, cmap='viridis')
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1], c='red', marker='x')
plt.title('K-Means Clustering')
plt.show()

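n_clusters: number of clusters
init: initialization method for the centers ('k-means++' or 'random')
n_init: number of runs with different initializations; the best result (lowest inertia) is kept
max_iter: maximum number of iterations per run

K-Means requires the number of clusters to be specified in advance, before clustering is run.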
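
To make the assign-and-update loop behind K-Means concrete, here is a minimal NumPy sketch. It is an illustration only, not how scikit-learn implements KMeans: the function name kmeans_sketch and its parameters are hypothetical, and it uses plain random initialization rather than 'k-means++'.

```python
import numpy as np
from sklearn.datasets import make_blobs


def kmeans_sketch(X, k, n_iter=100, seed=0):
    """Hypothetical helper: plain K-Means with random initialization."""
    rng = np.random.default_rng(seed)
    # Pick k distinct samples as the initial centers
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: distance from every sample to every center
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each center to the mean of its samples
        # (an empty cluster keeps its previous center)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        # Stop once the centers no longer move
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


X, _ = make_blobs(n_samples=300, centers=4, random_state=100)
labels, centers = kmeans_sketch(X, k=4)
print(centers.shape)
```

A single random start can settle in a poor local optimum, which is why KMeans exposes n_init to repeat the procedure and keep the best run.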

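Hierarchical Clustering (Agglomerative Clustering)

Bottom-up: every sample starts as its own cluster, and the two most similar clusters are merged step by step.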
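
A short sketch of the bottom-up merging with scikit-learn's AgglomerativeClustering, reusing the same synthetic blobs as the K-Means example. The choice of Ward linkage and the threshold value 10.0 are assumptions made for illustration, not recommendations.

```python
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=4, random_state=100)

# Option 1: merge until exactly 4 clusters remain
agg = AgglomerativeClustering(n_clusters=4, linkage='ward')
labels = agg.fit_predict(X)
print(len(set(labels)))

# Option 2: leave the cluster count open and stop merging once the
# linkage distance exceeds a threshold (n_clusters must be None here)
agg_thr = AgglomerativeClustering(n_clusters=None, distance_threshold=10.0, linkage='ward')
labels_thr = agg_thr.fit_predict(X)
print(agg_thr.n_clusters_)
```

With a threshold, the number of clusters becomes a result of the fit (n_clusters_) instead of an input, so the threshold usually needs tuning for each dataset.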

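Comparison of Unsupervised Learning Methods

Method	Type	Cluster count needed in advance	Handles noise	Suitable data shape	Computational efficiency
K-Means	Clustering	Yes	No	Spherical	High
Agglomerative	Clustering	Yes, or No when a distance threshold is used	No	Arbitrary (depends on linkage)	Low (on large datasets)
DBSCAN	Clustering	No	Yes	Arbitrary	Medium
t-SNE	Dimensionality reduction	No	n/a	n/a	Low
Isomap / LLE	Dimensionality reduction	No	n/a	Manifold structure	Medium
Isolation Forest	Anomaly detection	Requires contamination	Yes	n/a	High
One-Class SVM	Anomaly detection	Requires nu	Yes	n/a	Medium to low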
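
As a rough illustration of the DBSCAN row (no preset cluster count, noise handling, arbitrary shapes), the sketch below runs DBSCAN on two interleaved half-moons plus three far-away points. The dataset, the planted points, and the values eps=0.2 and min_samples=5 are assumptions chosen for this example; both parameters normally need tuning.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

# Two non-spherical clusters
X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

# Three isolated points planted far from both moons (illustrative only)
outliers = np.array([[3.0, 3.0], [-3.0, 3.0], [0.5, -3.0]])
X_all = np.vstack([X, outliers])

# No cluster count is given; density decides how many clusters appear
db = DBSCAN(eps=0.2, min_samples=5)
labels = db.fit_predict(X_all)

# DBSCAN marks noise samples with the label -1
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
n_noise = int(np.sum(labels == -1))
print('clusters found:', n_clusters)
print('points labelled as noise:', n_noise)
print('labels of the planted points:', labels[-3:])
```

K-Means on the same data would have to assign the planted points to some cluster, since it has no notion of noise.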
