chap7_Introduction_to_Anomaly_Detection_using_Machine_Learning

1470 days ago by takepwave

Hiroshi TAKEMOTO (take@pwv.co.jp)

Introduction to Anomaly Detection using Machine Learning

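This notebook works through the examples in 井出 剛's book "入門機械学習による異常検出" (Introduction to Anomaly Detection using Machine Learning; hereafter "the Ide book") using Sage.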

Chapter 7: Anomaly Detection in Time-Series Data

The key points of this chapter are still under study.

Preparation

As usual, we load the required libraries.

# Utilities needed for R and Pandas
# Required R libraries
r('library(ggplot2)')
r('library(jsonlite)')
# RUtil: added functions to convert between R and Pandas data frames, plus GgSave (2015/07/12)
load(DATA + 'RUtil.py')
       
# Load the FNN library
r('library(FNN)')
       
 [1] "FNN"       "jsonlite"  "ggplot2"   "stats"     "graphics"  "grDevices" "utils"     "datasets" 
 [9] "methods"   "base"     
# Run the R example as-is
r('dt <- read.table(file="%s")' % (DATA + "qtdbsel102.txt"))
r('w <- 100; nk <- 1')
r('Xtr <- dt[1:3000, 2]; Dtr <- embed(Xtr, w)')
r('X <- dt[3001:6000, 2]; D <- embed(X, w)')
r('d <- knnx.dist(Dtr, D, k=nk); a <- d[,1]')
# Plot the anomaly score
graph = preGraph("fig7.1.pdf")
r('plot(a, ylab="anomaly score", type="l")')
postGraph(graph)
       
# Check how embed generates its data
Dtr = sageobj(r('Dtr'))
Dtr
       
2901 x 100 dense matrix over Real Double Field (use the '.str()' method to see the entries)
# embed takes its samples in descending order, starting from the 100th element!
Dtr[0]
       
(4.83, 4.84, 4.855, 4.84, 4.83, 4.83, 4.845, 4.84, 4.83, 4.83, 4.845, 4.845, 4.845, 4.83, 4.85, 4.86, 4.85, 4.845, 4.86, 4.87, 4.86, 4.86, 4.875, 4.88, 4.87, 4.87, 4.89, 4.89, 4.885, 4.885, 4.885, 4.895, 4.89, 4.885, 4.88, 4.88, 4.87, 4.84, 4.835, 4.835, 4.815, 4.805, 4.785, 4.785, 4.77, 4.75, 4.73, 4.73, 4.73, 4.705, 4.695, 4.675, 4.68, 4.66, 4.65, 4.65, 4.66, 4.65, 4.635, 4.625, 4.65, 4.635, 4.625, 4.625, 4.645, 4.655, 4.64, 4.645, 4.665, 4.675, 4.65, 4.655, 4.675, 4.68, 4.66, 4.655, 4.675, 4.675, 4.66, 4.65, 4.66, 4.655, 4.635, 4.635, 4.645, 4.665, 4.68, 4.67, 4.67, 4.675, 4.685, 4.68, 4.675, 4.685, 4.695, 4.71, 4.75, 4.805, 4.82, 4.77)
# Dtr[1] takes 100 samples in descending order, starting from the 101st element
Dtr[1]
       
(4.835, 4.83, 4.84, 4.855, 4.84, 4.83, 4.83, 4.845, 4.84, 4.83, 4.83, 4.845, 4.845, 4.845, 4.83, 4.85, 4.86, 4.85, 4.845, 4.86, 4.87, 4.86, 4.86, 4.875, 4.88, 4.87, 4.87, 4.89, 4.89, 4.885, 4.885, 4.885, 4.895, 4.89, 4.885, 4.88, 4.88, 4.87, 4.84, 4.835, 4.835, 4.815, 4.805, 4.785, 4.785, 4.77, 4.75, 4.73, 4.73, 4.73, 4.705, 4.695, 4.675, 4.68, 4.66, 4.65, 4.65, 4.66, 4.65, 4.635, 4.625, 4.65, 4.635, 4.625, 4.625, 4.645, 4.655, 4.64, 4.645, 4.665, 4.675, 4.65, 4.655, 4.675, 4.68, 4.66, 4.655, 4.675, 4.675, 4.66, 4.65, 4.66, 4.655, 4.635, 4.635, 4.645, 4.665, 4.68, 4.67, 4.67, 4.675, 4.685, 4.68, 4.675, 4.685, 4.695, 4.71, 4.75, 4.805, 4.82)
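
The row layout observed above can be reproduced in a few lines of plain Python. The following is only a minimal sketch of what R's embed appears to do for a one-dimensional series, not the R implementation itself; the function name embed and the toy input are chosen here for illustration.

```python
def embed(series, w):
    """Mimic R's embed(x, dimension=w) for a 1-D sequence (illustrative sketch).

    Row i holds series[i + w - 1], series[i + w - 2], ..., series[i],
    so each window of length w is listed newest value first.
    """
    if not 1 <= w <= len(series):
        raise ValueError("w must be between 1 and len(series)")
    return [list(series[i:i + w][::-1]) for i in range(len(series) - w + 1)]


# Toy input (an assumption for illustration, not the ECG data)
rows = embed([1, 2, 3, 4, 5], 3)
print(rows)  # [[3, 2, 1], [4, 3, 2], [5, 4, 3]]

# 3000 samples with w = 100 give 3000 - 100 + 1 = 2901 windows,
# which matches the 2901 x 100 shape of Dtr
print(len(embed(list(range(3000)), 100)))  # 2901
```

Each row is the previous row shifted by one sample, which is why Dtr[1] starts with one new value followed by the first 99 values of Dtr[0].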
# Solve the same problem with Sage and sklearn
# Build nbrs, fitted on the training data
from sklearn.neighbors import NearestNeighbors
nbrs = NearestNeighbors(n_neighbors=1).fit(Dtr)
       
# Compute the distances for the validation data
D = sageobj(r('D'))
distances, indices = nbrs.kneighbors(D)
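
To make explicit what the distances are, here is a brute-force sketch of the same idea in plain Python: the anomaly score of each validation window is its Euclidean distance to the nearest training window (k = 1). The toy signal, the window width w = 4, and the helper names embed and nn_anomaly_scores are assumptions made for illustration; the notebook itself uses w = 100 on the ECG data and delegates the search to knnx.dist and NearestNeighbors.

```python
import math


def embed(series, w):
    """Sliding windows of length w, newest value first (as R's embed lays them out)."""
    return [list(series[i:i + w][::-1]) for i in range(len(series) - w + 1)]


def nn_anomaly_scores(train_windows, test_windows):
    """Distance from each test window to its nearest training window (k = 1)."""
    return [min(math.dist(t, q) for t in train_windows) for q in test_windows]


# Toy data (hypothetical): a periodic training signal and a test signal with one spike
period = [0.0, 1.0, 0.0, -1.0]
train = period * 10
test = period * 5
test[9] = 5.0  # injected anomaly; the normal value at this position is 1.0

w = 4
scores = nn_anomaly_scores(embed(train, w), embed(test, w))

# Only the windows that contain the spike (starting at indices 6..9) score above zero
print([i for i, s in enumerate(scores) if s > 0])  # [6, 7, 8, 9]
print(max(scores))  # 4.0
```

The descending order inside each window does not affect the result, because the Euclidean distance is unchanged when the same reordering of coordinates is applied to both vectors.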
       
# Plot the distances
list_plot(distances, plotjoined=True, figsize=5)
       
# The ECG data used for validation
X = sageobj(r('X'))
list_plot(X, plotjoined=True, figsize=5)