scikit-learn: Face Recognition by Combining PCA and SVM

  • Prepare the dataset

%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np

import time
import logging
from sklearn.datasets import fetch_olivetti_faces

logging.basicConfig(level=logging.INFO, format='%(asctime)s %(message)s')

data_home='datasets/'
logging.info('Start to load dataset')
faces = fetch_olivetti_faces(data_home=data_home)
logging.info('Done with load dataset')

X = faces.data
y = faces.target
targets = np.unique(faces.target)
target_names = np.array(["c%d" % t for t in targets])
n_targets = target_names.shape[0]
n_samples, h, w = faces.images.shape
print('Sample count: {}\nTarget count: {}'.format(n_samples, n_targets))
print('Image size: {}x{}\nDataset shape: {}\n'.format(w, h, X.shape))
  • Data visualization
def plot_gallery(images, titles, h, w, n_row=2, n_col=5):
    """Plot a gallery of images."""
    plt.figure(figsize=(2 * n_col, 2.2 * n_row), dpi=144)
    plt.subplots_adjust(bottom=0, left=.01, right=.99, top=.90, hspace=.01)
    for i in range(n_row * n_col):
        plt.subplot(n_row, n_col, i + 1)
        plt.imshow(images[i].reshape((h, w)), cmap=plt.cm.gray)
        plt.title(titles[i])
        plt.axis('off')
        
n_row = 2
n_col = 6

sample_images = None
sample_titles = []
for i in range(n_targets):
    people_images = X[y==i]
    people_sample_index = np.random.randint(0, people_images.shape[0], 1)
    people_sample_image = people_images[people_sample_index, :]
    if sample_images is not None:
        sample_images = np.concatenate((sample_images, people_sample_image), axis=0)
    else:
        sample_images = people_sample_image
    sample_titles.append(target_names[i])

plot_gallery(sample_images, sample_titles, h, w, n_row, n_col)

Exploring the relationship between the explained variance ratio and the number of PCA components

from sklearn.decomposition import PCA

print("Exploring explained variance ratio for dataset ...")
candidate_components = range(10, 300, 30)
explained_ratios = []
start = time.time()
for c in candidate_components:
    pca = PCA(n_components=c)
    X_pca = pca.fit_transform(X)
    explained_ratios.append(np.sum(pca.explained_variance_ratio_))
print('Done in {0:.2f}s'.format(time.time()-start))

plt.figure(figsize=(10, 6), dpi=144)
plt.grid()
plt.plot(candidate_components, explained_ratios)
plt.xlabel('Number of PCA Components')
plt.ylabel('Explained Variance Ratio')
plt.title('Explained variance ratio for PCA')
plt.yticks(np.arange(0.5, 1.05, .05))
plt.xticks(np.arange(0, 300, 20))
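
If you would rather specify a target explained-variance ratio than read it off the curve, PCA also accepts a float n_components between 0 and 1; a minimal sketch (the 0.95 threshold and the pca_auto name are just examples):

# Keep the smallest number of components whose cumulative explained variance reaches 95%.
pca_auto = PCA(n_components=0.95, svd_solver='full')
X_pca_auto = pca_auto.fit_transform(X)
print('Components kept: {}'.format(pca_auto.n_components_))
print('Explained variance: {0:.3f}'.format(pca_auto.explained_variance_ratio_.sum()))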

Visualizing the reconstructions under different numbers of components

def title_prefix(prefix, title):
    return "{}: {}".format(prefix, title)


n_row = 1
n_col = 5

sample_images = sample_images[0:5]
sample_titles = sample_titles[0:5]

plotting_images = sample_images
plotting_titles = [title_prefix('orig', t) for t in sample_titles]
candidate_components = [140, 75, 37, 19, 8]
for c in candidate_components:
    print("Fitting and projecting on PCA(n_components={}) ...".format(c))
    start = time.time()
    pca = PCA(n_components=c)
    pca.fit(X)
    X_sample_pca = pca.transform(sample_images)
    X_sample_inv = pca.inverse_transform(X_sample_pca)
    plotting_images = np.concatenate((plotting_images, X_sample_inv), axis=0)
    sample_title_pca = [title_prefix('{}'.format(c), t) for t in sample_titles]
    plotting_titles = np.concatenate((plotting_titles, sample_title_pca), axis=0)
    print("Done in {0:.2f}s".format(time.time() - start))

print("Plotting sample images with different numbers of PCA components ...")
plot_gallery(plotting_images, plotting_titles, h, w,
    n_row * (len(candidate_components) + 1), n_col)
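
The principal axes learned by PCA can themselves be viewed as images, often called eigenfaces. A minimal sketch that reuses the plot_gallery helper above (the choice of 10 components and the pca_eigen name are arbitrary):

# Each row of components_ is one principal axis in the original 64x64 pixel space.
pca_eigen = PCA(n_components=10).fit(X)
eigenfaces = pca_eigen.components_.reshape((-1, h, w))
eigenface_titles = ['eigenface {}'.format(i) for i in range(10)]
plot_gallery(eigenfaces, eigenface_titles, h, w, n_row=2, n_col=5)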

Face recognition with PCA + SVM
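
The steps below fit PCA and the SVM on a training split and evaluate on a held-out test split, so the data is first divided into X_train/X_test and y_train/y_test; a minimal hold-out split is assumed here (the 80/20 ratio and random_state=4 are arbitrary choices):

from sklearn.model_selection import train_test_split

# Hold out 20% of the faces for testing.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=4)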

n_components = 140

print("Fitting PCA by using training data ...")
start = time.time()
pca = PCA(n_components=n_components, svd_solver='randomized', whiten=True).fit(X_train)
print("Done in {0:.2f}s".format(time.time() - start))

print("Projecting input data for PCA ...")
start = time.time()
X_train_pca = pca.transform(X_train)
X_test_pca = pca.transform(X_test)
print("Done in {0:.2f}s".format(time.time() - start))

from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

print("Searching the best parameters for SVC ...")
param_grid = {'C': [1, 5, 10, 50, 100],
              'gamma': [0.0001, 0.0005, 0.001, 0.005, 0.01]}
clf = GridSearchCV(SVC(kernel='rbf', class_weight='balanced'), param_grid, verbose=2, n_jobs=4)
clf = clf.fit(X_train_pca, y_train)
print("Best parameters found by grid search:")
print(clf.best_params_)
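
GridSearchCV also records the mean cross-validated score of the winning parameter combination, which is useful to compare against the test-set results below:

# Mean cross-validated accuracy of the best (C, gamma) pair found above.
print("Best cross-validation score: {0:.3f}".format(clf.best_score_))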



from sklearn.metrics import classification_report, confusion_matrix

start = time.time()
print("Predict test dataset ...")
y_pred = clf.best_estimator_.predict(X_test_pca)
cm = confusion_matrix(y_test, y_pred, labels=range(n_targets))
print("Done in {0:.2f}s.\n".format(time.time() - start))
print("Confusion matrix:")
np.set_printoptions(threshold=np.inf)
print(cm)

print(classification_report(y_test, y_pred, target_names=target_names))
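
With 40 classes the raw confusion matrix is hard to scan by eye; a minimal matplotlib sketch renders it as a heat map instead (the layout choices are arbitrary):

plt.figure(figsize=(8, 8), dpi=144)
plt.imshow(cm, cmap=plt.cm.Blues)   # darker cells mean more samples
plt.title('Confusion matrix of the PCA + SVM classifier')
plt.xlabel('Predicted label')
plt.ylabel('True label')
plt.colorbar()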

 

