Hans Jung's Blog : Introduction of Statistical Data Mining

Statistical Data Mining
This post references and translates the material linked above, with the author's own annotations added.

Tutorial

The tutorial is divided into Classification Algorithms, Regression Algorithms, and Data Mining Operations.

Decision Tree: The most widely used classification technique. Explains how information gain can be used to build a model that handles data arriving later.
Information gain: Covers entropy theory. Entropy serves as the key measure underlying information gain.
Probability: Covers basic probability first, then topics such as density estimation. It then moves on to Bayesian statistical methods and finally to multivariate density functions.
Gaussian: (Author's note: need to look this up.)
MLE: Covers maximum likelihood estimation, a technique for finding model parameters.
Cross Validation: A topic about "explanatory power", that is, how well a model built from existing data will explain future unseen data.
Neural Networks: Starts with linear regression and uses it to derive the sum-of-squared-errors (SSE) method. Nonlinear models are also covered.
Regression Algorithm: Regression Trees, Cascade Correlation, Group Method of Data Handling (GMDH), Multivariate Adaptive Regression Splines (MARS), Multilinear Interpolation, Radial Basis Functions, Robust Regression, Cascade Correlation + Projection Pursuit (I am not sure what this last one is.)
Bayesian Networks: Covers probabilistic models and joint distributions, along with their drawbacks. Bayesian statistics is introduced as an alternative, and statistical inference based on it is also covered. A typical use of inference is "I've got a temperature of 101, I'm a 37-year-old Male and my tongue feels kind of funny but I have no headache. What's the chance that I've got bubonic plague?".
Gaussian Mixture Model: One of the most widely used approaches for clustering as well as density estimation. Explains clustering and then Expectation Maximization.
Markov Model: Discrete-time Markov chains (DTMC) and continuous-time Markov chains (CTMC).
VC dimension: Covers the foundations of machine learning.
Game Theory: Covers zero-sum game theory and non-zero-sum game theory.
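
To make the Decision Tree and Information gain entries above more concrete, here is a minimal sketch of how entropy and information gain could be computed for a candidate split. The function names and the toy data are assumptions made up for illustration and are not taken from the tutorial.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(labels, feature_values):
    """Reduction in entropy of `labels` after splitting on `feature_values`."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted average entropy of the groups produced by the split
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Toy data, invented for this example
play = ["yes", "yes", "no", "no"]
windy = ["no", "no", "yes", "yes"]      # separates the classes perfectly
humid = ["high", "low", "high", "low"]  # tells us nothing about the class

gain_windy = information_gain(play, windy)
gain_humid = information_gain(play, humid)
print(gain_windy, gain_humid)
```

A decision tree learner would typically pick the feature with the larger gain (here `windy`) as the next split.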

The elements of statistical learning - Data mining, Inference, and Prediction by Prof. Trevor Hastie, Robert Tibshirani, Jerome Friedman

Supervised learning

In supervised learning, one learns how to predict an outcome from input information.

Overview of supervised learning

Two simple approaches to prediction : Least squares & nearest neighbors
Statistical decision theory
Statistical Model : Joint distributions & Function approximation
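
The two simple approaches to prediction listed above can be sketched side by side. This is only an illustration under simplifying assumptions (a single input variable, made-up data, hypothetical function names), not code from the book.

```python
def fit_least_squares(xs, ys):
    """Fit y = intercept + slope * x by minimising the sum of squared errors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def knn_predict(xs, ys, x_new, k=3):
    """Average the outputs of the k training points closest to x_new."""
    nearest = sorted(zip(xs, ys), key=lambda pair: abs(pair[0] - x_new))[:k]
    return sum(y for _, y in nearest) / len(nearest)


# Toy data lying exactly on y = 1 + 2x
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

intercept, slope = fit_least_squares(xs, ys)
linear_prediction = intercept + slope * 2.5
knn_prediction = knn_predict(xs, ys, 2.5, k=2)
print(linear_prediction, knn_prediction)
```

Least squares assumes a global linear form, while nearest neighbors makes only a local averaging assumption. On data that really is linear, as here, the two predictions agree.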

Linear model for regression

Linear regression and Least squares
Shrinkage method
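
As a rough illustration of what shrinkage means, the sketch below uses ridge regression, one common shrinkage method, in its simplest form: a single input and no intercept. The penalty weight `lam`, the function name, and the data are assumptions chosen for the example.

```python
def ridge_slope(xs, ys, lam):
    """Slope b minimising sum((y - b*x)^2) + lam * b^2 for a no-intercept model.

    Setting the derivative to zero gives b = sum(x*y) / (sum(x*x) + lam).
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


# Toy data lying exactly on y = 2x
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

# lam = 0 is ordinary least squares; larger lam pulls the slope toward zero
slopes = [ridge_slope(xs, ys, lam) for lam in (0.0, 1.0, 10.0)]
print(slopes)
```

With `lam = 0` the estimate is the ordinary least squares slope, and increasing `lam` shrinks it toward zero, trading some bias for lower variance.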

Linear method for classification
Basis Expansion and Regularization
Kernel smoothing method
Model Assessment and Selection
Model inference and averaging
Additive models, Tree, and related method
Boosting and additive trees
Neural networks
Prototype method and Nearest Neighbors

Unsupervised learning

In unsupervised learning, no outcome is predicted. Instead, one learns how to find patterns and relationships among the input measures.

Association rules
Cluster analysis
Principal components, curves and surfaces
Matrix factorization
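
As one example of finding patterns without an outcome variable, here is a minimal sketch of k-means, a standard cluster analysis method, restricted to one-dimensional data. The starting centers, the data, and the function name are assumptions made for illustration.

```python
def kmeans_1d(points, centers, iterations=10):
    """Lloyd's algorithm on 1-D data, starting from the given centers."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its points
        # (a center with no points assigned keeps its previous position)
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters


# Toy data with two obvious groups, around 1 and around 5
points = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
centers, clusters = kmeans_1d(points, centers=[0.0, 6.0])
print(centers, clusters)
```

No labels are supplied. The grouping emerges only from the distances between the inputs.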

Random forest
Ensemble Learning
Graphical method


Wednesday, January 15, 2014
