Mar 2, 2010

Machine Learning Summary

A grand tool box for vision people

chap 2
Linear Regression with Maximum likelihood estimation
Normal distribution
Student t distribution: SGD (Stochastic Gradient Descent), EM (with Gaussian Scale Mixture)
Laplace distribution: Linear Programming, EM (with Gaussian Scale Mixture), or Huber loss function
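A minimal sketch of the Huber loss option, assuming a single input feature and fitting by iteratively reweighted least squares (IRLS). The IRLS fitting method, the function names, the toy data and delta=1.0 are all illustrative assumptions, not from the course notes.

```python
def weighted_line_fit(xs, ys, ws):
    """Weighted least squares for y = a + b*x; returns (a, b)."""
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def huber_line_fit(xs, ys, delta=1.0, iters=100):
    """Minimize the Huber loss of the residuals by IRLS."""
    ws = [1.0] * len(xs)  # the first pass is ordinary least squares
    for _ in range(iters):
        a, b = weighted_line_fit(xs, ys, ws)
        residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
        # Huber weights: 1 in the quadratic zone, delta/|r| in the linear zone
        ws = [1.0 if abs(r) <= delta else delta / abs(r) for r in residuals]
    return a, b


# Toy data (assumed): the true line is y = 2x + 1, with one gross outlier.
xs = [float(i) for i in range(10)]
ys = [2.0 * x + 1.0 for x in xs]
ys[9] = 100.0  # the clean value would be 19

a_ols, b_ols = weighted_line_fit(xs, ys, [1.0] * len(xs))
a_hub, b_hub = huber_line_fit(xs, ys)
print("OLS slope:", b_ols, "Huber slope:", b_hub)
```

The Gaussian (least squares) fit is dragged toward the outlier, while the Huber fit stays close to the slope of the clean points.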

Censored regression (Kevin: not a big deal, since it only moves the line slightly up; why are there hundreds of papers on it?)

chap 3
Logistic Regression
Objective: convex
Parameter Estimation: no closed-form solution
1. Newton's method (IRLS)
2. minFunc in Matlab

Always reaches the global optimum, because the objective is convex
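A rough sketch of the Newton (IRLS) update for logistic regression, assuming one feature plus an intercept so that the 2x2 Newton system can be solved by hand. The function names and the toy data are assumptions for illustration, and there is no line search or regularization, so it expects data that is not linearly separable.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def logistic_irls(xs, ys, iters=25):
    """Fit p(y=1|x) = sigmoid(a + b*x) with plain Newton steps."""
    a, b = 0.0, 0.0
    for _ in range(iters):
        ps = [sigmoid(a + b * x) for x in xs]
        # Gradient of the log-likelihood: X^T (y - p)
        g0 = sum(y - p for y, p in zip(ys, ps))
        g1 = sum((y - p) * x for x, y, p in zip(xs, ys, ps))
        # Negative Hessian: X^T S X with S = diag(p * (1 - p))
        ss = [p * (1.0 - p) for p in ps]
        h00 = sum(ss)
        h01 = sum(s * x for s, x in zip(ss, xs))
        h11 = sum(s * x * x for s, x in zip(ss, xs))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system H * step = g by Cramer's rule
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


# Toy data (assumed): the classes overlap, so the optimum is finite.
xs = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0, 3.0]
ys = [0, 0, 1, 0, 1, 0, 1, 1]
a, b = logistic_irls(xs, ys)
print("intercept:", a, "slope:", b)
```

Because the objective is convex, the point where the gradient vanishes is the global optimum, whatever the starting point.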

Multidim Regression: no big deal
Probit Regression: convex objective; fit it with gradient descent (minFunc) or EM (slow).

chap 4
Model Selection
1. Bayesian approach: P(D|M). Average over all possible theta to protect against overfitting. (need concrete example)
2. BIC approximation. dof(M) can be estimated by the minimum encoding length of the model (information theory). Good if there are many models and there is some way to get dof(m) from another model's dof(m').
3. Cross validation: not suitable when there are many candidate models; takes too much time.
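For reference, the first two items can be written out as follows. This is the standard textbook form rather than something copied from the course; the symbols \hat{\theta} (the maximum likelihood estimate) and N (the number of data points) are notation assumed here.

```latex
% 1. Marginal likelihood: average the likelihood over all theta under the prior
p(D \mid M) = \int p(D \mid \theta, M) \, p(\theta \mid M) \, d\theta

% 2. BIC approximation to the log marginal likelihood
\log p(D \mid M) \approx \log p(D \mid \hat{\theta}, M) - \frac{\mathrm{dof}(M)}{2} \log N
```

The second term penalizes models with more degrees of freedom, which is why only dof(M) and the fitted likelihood are needed to compare many models.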

L2 regularization:

L1 regularization (Lasso):
Problem: the Laplace prior is not differentiable at the origin
Sol: soft-threshold the weights near the origin to zero
Problem with sol: no longer an unbiased estimator
Sol to the above: re-estimate the nonzero w with least squares (an unbiased estimator)
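A small sketch of the soft threshold and the re-estimation step. It assumes an orthonormal design (X^T X = I), in which case the lasso solution is just the soft-thresholded least squares estimate and refitting on the selected features returns the least squares values. The numbers and names are made up for illustration.

```python
def soft_threshold(w, lam):
    """Shrink w toward zero by lam; anything within [-lam, lam] becomes 0."""
    if w > lam:
        return w - lam
    if w < -lam:
        return w + lam
    return 0.0


# Hypothetical least squares estimates under an orthonormal design.
w_ols = [3.0, -0.2, 0.1, -1.5]
lam = 0.5

# Lasso solution: sparse, but the surviving weights are biased toward zero.
w_lasso = [soft_threshold(w, lam) for w in w_ols]

# Debiasing: keep the support chosen by the lasso, refit with least squares.
# With orthonormal columns the refit simply restores the OLS coefficients.
support = {j for j, w in enumerate(w_lasso) if w != 0.0}
w_debiased = [w if j in support else 0.0 for j, w in enumerate(w_ols)]

print(w_lasso)
print(w_debiased)
```

For a general design matrix the lasso needs an iterative solver and the refit is an ordinary least squares problem restricted to the selected columns.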

Linear Programming (not editor's choice)
SCAD (not editor's choice): just an ad hoc approach; cannot be put into a Bayesian framework
NEG: best but slow

Neural networks
Cascade of linear and non-linear models (the non-linearity has to be there, or the different layers will collapse into a single linear layer)
Use gradient descent for estimation (back-propagation algorithm)
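A quick illustration of the collapse argument, with made-up 2x2 weight matrices and ReLU picked as an arbitrary non-linearity: two stacked linear layers give exactly the same output as the single layer W2*W1, while inserting the non-linearity breaks that equivalence.

```python
def matvec(W, x):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def matmul(A, B):
    """Matrix-matrix product for list-of-rows matrices."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def relu(v):
    return [max(0.0, vi) for vi in v]


# Hypothetical weights and input, chosen only for illustration.
W1 = [[1.0, -2.0], [0.5, 1.0]]
W2 = [[2.0, 1.0], [-1.0, 3.0]]
x = [1.0, 2.0]

two_linear = matvec(W2, matvec(W1, x))        # layer 2 applied after layer 1
collapsed = matvec(matmul(W2, W1), x)         # one layer with weights W2*W1
with_relu = matvec(W2, relu(matvec(W1, x)))   # non-linearity in between

print(two_linear, collapsed, with_relu)
```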

chap 12
Generative model

Discriminant Analysis
p(x,y) pic here

Discriminative method (logistic regression)
p(y|x) pic here

chap 13
Feature selection:
Forward Feature Selection:
Greedily add one feature at a time (editor's choice; simple and better than stochastic approaches like genetic algorithms, simulated annealing, ...)
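The greedy loop can be sketched as below. The score function is a stand-in: in practice it would be something like validation accuracy of a model trained on the chosen subset, but here it is a toy additive score with a per-feature penalty, and all names and numbers are assumptions.

```python
def forward_select(n_features, score):
    """Greedy forward selection: add the best single feature until no gain."""
    selected = []
    best = score(selected)
    remaining = set(range(n_features))
    while remaining:
        # Try each remaining feature and keep the one with the highest score.
        cand_score, cand = max((score(selected + [j]), j) for j in sorted(remaining))
        if cand_score <= best:
            break  # no single feature improves the score any more
        selected.append(cand)
        remaining.remove(cand)
        best = cand_score
    return selected, best


# Toy score (assumed): each feature has a fixed utility, each one costs 0.1.
utility = [0.5, 0.0, 0.3, 0.05]


def toy_score(subset):
    return sum(utility[j] for j in subset) - 0.1 * len(subset)


selected, best = forward_select(len(utility), toy_score)
print(selected, best)
```

Each round costs one score evaluation per remaining feature, which is what keeps the method cheap compared with searching over subsets.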

More priors:
Normal Gamma: spikier at the origin, with a flatter tail than Laplace

chap 14
Mixture Model
Different from chap 12: there the label Yi is given, but here Zi is hidden (needs to be inferred with EM).
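A bare-bones EM sketch for a 1-D mixture of Gaussians, to show how the hidden Zi is handled: the E-step computes p(Zi = k | xi) and the M-step re-estimates the parameters from those soft assignments. The data, the initial values and the small variance floor are assumptions for illustration.

```python
import math


def normal_pdf(x, mu, var):
    return math.exp(-((x - mu) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_gmm_1d(xs, mus, variances, weights, iters=50, var_floor=1e-6):
    """EM for a 1-D Gaussian mixture; returns (means, variances, weights)."""
    mus, variances, weights = list(mus), list(variances), list(weights)
    k = len(mus)
    for _ in range(iters):
        # E-step: responsibilities resp[i][j] = p(z_i = j | x_i)
        resp = []
        for x in xs:
            joint = [weights[j] * normal_pdf(x, mus[j], variances[j]) for j in range(k)]
            total = sum(joint)
            resp.append([p / total for p in joint])
        # M-step: weighted maximum likelihood updates
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(xs)
            mus[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            variances[j] = (
                sum(r[j] * (x - mus[j]) ** 2 for r, x in zip(resp, xs)) / nj
                + var_floor  # guard against a variance collapsing to zero
            )
    return mus, variances, weights


# Toy data (assumed): two well-separated clusters around -2 and 3.
xs = [-2.2, -2.0, -1.8, -2.1, -1.9, 3.0, 3.2, 2.8, 3.1, 2.9]
mus_hat, vars_hat, weights_hat = em_gmm_1d(xs, [-1.0, 1.0], [1.0, 1.0], [0.5, 0.5])
print(mus_hat, vars_hat, weights_hat)
```

Unlike the logistic regression objective, this likelihood is not convex, so the result depends on the initial values.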