A grand toolbox for vision people
chap 2
Linear Regression with Maximum likelihood estimation
Normal distribution: ordinary least squares in closed form (see the sketch below)
Student-t distribution: SGD (Stochastic Gradient Descent) or EM (with a Gaussian scale mixture)
Laplace distribution: Linear Programming, EM (with a Gaussian scale mixture), or the Huber loss function
Censored regression (Kevin: not a big deal, since it only shifts the fitted line slightly upward; why are there hundreds of papers on it?)
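A minimal sketch of the Gaussian case above: under Normal noise the maximum-likelihood w is exactly ordinary least squares. The toy data and variable names here are my own, not from the book.

    import numpy as np

    # Hypothetical toy data: y = 1 + 2*x + Gaussian noise
    rng = np.random.default_rng(0)
    x = rng.uniform(-1, 1, size=100)
    X = np.column_stack([np.ones_like(x), x])        # design matrix with bias column
    y = 1.0 + 2.0 * x + rng.normal(scale=0.1, size=100)

    # Gaussian-noise MLE of w is the least-squares solution
    w_mle, *_ = np.linalg.lstsq(X, y, rcond=None)

    # MLE of the noise variance is the mean squared residual
    sigma2_mle = np.mean((y - X @ w_mle) ** 2)
    print(w_mle, sigma2_mle)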
chap 3
Logistic Regression
Objective: convex
Parameter Estimation: no closed-form solution
1. Newton's method (IRLS); see the sketch after this chapter's notes
2. minfunc in Matlab
Either way we always reach the global optimum, since the objective is convex
Multidim Regression: no big deal
Probit Regression: convex objective; fit it with gradient descent (minfunc) or EM (slow).
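A rough sketch of the Newton/IRLS update for logistic regression; the function and variable names are my own, and the small damping term on the Hessian is an assumption for numerical stability, not part of the textbook recipe. Each iteration is a weighted least-squares solve, and convexity guarantees the global optimum.

    import numpy as np

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    def logreg_irls(X, y, n_iter=20):
        """Fit logistic regression by Newton's method (IRLS); y in {0, 1}."""
        w = np.zeros(X.shape[1])
        for _ in range(n_iter):
            mu = sigmoid(X @ w)                    # predicted probabilities
            s = mu * (1.0 - mu)                    # IRLS weights (Hessian diagonal)
            # Newton step: w += (X^T S X)^{-1} X^T (y - mu)
            H = X.T @ (s[:, None] * X) + 1e-8 * np.eye(X.shape[1])
            g = X.T @ (y - mu)
            w = w + np.linalg.solve(H, g)
        return w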
chap 4
Model Selection
1. Bayesian approach: P(D|M). Average over all possible theta to protect against overfitting. (need a concrete example)
2. BIC approximation. dof(M) can be estimated from the minimum encoding length of the model (information theory). Good when there are many models and there is some way to get dof(m) from another model's dof(m'). (see the sketch below)
3. Cross-validation: not suitable when there are many candidate models; it takes too much time.
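A small sketch of the BIC score, just to make item 2 concrete; the helper names and the Gaussian-residual likelihood are my own illustration, not the book's notation.

    import numpy as np

    def bic_score(loglik, dof, n):
        """BIC = -2 * log-likelihood + dof * log(n); lower is better."""
        return -2.0 * loglik + dof * np.log(n)

    def gaussian_loglik(y, y_hat):
        """Maximized log-likelihood of the residuals under Gaussian noise."""
        n = len(y)
        sigma2 = np.mean((y - y_hat) ** 2)
        return -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)

Models with different dof can then be compared on the same data by picking the lowest bic_score.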
L2 regularization:
QR
SVD
Gradient
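One way to see the SVD route: once X = U S V^T is computed, the ridge solution for any lambda is just a reweighting of the singular values. A sketch under my own naming, not the book's code:

    import numpy as np

    def ridge_svd(X, y, lam):
        """Ridge solution w = V diag(s / (s^2 + lam)) U^T y via the thin SVD of X."""
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        shrink = s / (s ** 2 + lam)              # shrink each singular direction
        return Vt.T @ (shrink * (U.T @ y))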
L1 regularization (Lasso):
Problem: the L1 (Laplace) penalty is not differentiable at the origin
Solution: soft-thresholding, which sets coefficients near the origin exactly to zero (see the sketch below)
Problem with this solution: the estimator is no longer unbiased
Solution: re-estimate the nonzero w by Least Squares (an unbiased estimator)
Linear Programming (not the editor's choice)
LARS
SCAD (not the editor's choice): just an ad hoc approach; it cannot be put into a Bayesian framework
NEG: best but slow
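A rough sketch of the soft-threshold operator and the debiasing step mentioned above; the coordinate-descent loop that would call soft_threshold is omitted, and the function names are my own.

    import numpy as np

    def soft_threshold(a, lam):
        """Shrink a toward zero by lam; values inside [-lam, lam] become exactly zero."""
        return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)

    def debias(X, y, w_lasso):
        """Refit the nonzero coefficients by plain least squares (an unbiased estimator)."""
        support = np.flatnonzero(w_lasso)
        w = np.zeros_like(w_lasso)
        if support.size > 0:
            w[support], *_ = np.linalg.lstsq(X[:, support], y, rcond=None)
        return w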
chap 5
Neural networks
Non-convex
Cascade of linear and non-linear models (it has to be non-linear in between, otherwise the layers would collapse into a single linear layer)
Use gradient descent for estimation (the back-propagation algorithm); see the sketch below
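A minimal two-layer network with hand-written back-propagation; the layer sizes, toy data, and squared-error loss are arbitrary choices of mine. The tanh between the layers is what keeps the two weight matrices from collapsing into one linear map.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 3))                  # toy inputs
    y = np.sin(X.sum(axis=1, keepdims=True))       # toy targets
    W1 = rng.normal(scale=0.1, size=(3, 8))        # first-layer weights
    W2 = rng.normal(scale=0.1, size=(8, 1))        # second-layer weights
    lr = 0.1

    for step in range(1000):
        # forward pass: linear -> tanh -> linear
        h = np.tanh(X @ W1)
        y_hat = h @ W2
        err = y_hat - y                            # gradient of squared error w.r.t. y_hat

        # backward pass (back-propagation)
        grad_W2 = h.T @ err / len(X)
        grad_h = (err @ W2.T) * (1 - h ** 2)       # chain rule through tanh
        grad_W1 = X.T @ grad_h / len(X)

        # gradient descent step
        W1 -= lr * grad_W1
        W2 -= lr * grad_W2

(Biases are dropped for brevity.)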
chap 12
Generative model
PI->Yi->Xi
Discriminant Analysis
p(x,y) pic here
Discriminative method (logistic regression)
p(y|x) pic here
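To make the generative/discriminative contrast concrete, here is a small sketch of (linear) discriminant analysis: fit class priors, class means, and a shared covariance from the joint p(x, y), then classify through Bayes' rule. Function names and structure are my own, not the book's.

    import numpy as np

    def fit_gda(X, y):
        """Generative fit: class priors pi_c, class means mu_c, shared covariance."""
        classes = np.unique(y)
        pi = np.array([np.mean(y == c) for c in classes])
        mu = np.array([X[y == c].mean(axis=0) for c in classes])
        Xc = np.concatenate([X[y == c] - mu[i] for i, c in enumerate(classes)])
        Sigma = Xc.T @ Xc / len(X)
        return classes, pi, mu, Sigma

    def predict_gda(x, classes, pi, mu, Sigma):
        """Pick the class maximizing log p(x | y) + log p(y), i.e. maximizing p(y | x)."""
        P = np.linalg.inv(Sigma)
        scores = [np.log(pi[i]) - 0.5 * (x - mu[i]) @ P @ (x - mu[i])
                  for i in range(len(classes))]
        return classes[int(np.argmax(scores))]

Logistic regression (the discriminative route) would instead fit p(y|x) directly, as in the chap 3 sketch.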
chap 13
Feature selection:
Forward Feature Selection:
Greedily add one feature at a time (the editor's choice: simple, and better than stochastic approaches like genetic algorithms, simulated annealing, ...); see the sketch below
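A sketch of greedy forward selection, scoring each candidate by the residual sum of squares of a least-squares refit; the function name and scoring choice are my own illustration.

    import numpy as np

    def forward_select(X, y, k):
        """Greedily add the feature that most reduces the residual sum of squares."""
        chosen, remaining = [], list(range(X.shape[1]))
        for _ in range(k):
            def rss(j):
                cols = X[:, chosen + [j]]
                w, *_ = np.linalg.lstsq(cols, y, rcond=None)
                return np.sum((y - cols @ w) ** 2)
            best = min(remaining, key=rss)      # feature giving the best refit
            chosen.append(best)
            remaining.remove(best)
        return chosen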
More priors:
Normal-Gamma: spikier at the origin and heavier (flatter) tails than Laplace
chap 14
Mixture Model
PI->Zi->Xi
Different from chap 12: Zi is hidden (it has to be inferred, e.g. by EM), whereas Yi is given; see the EM sketch below.
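A compact EM sketch for a 1-D Gaussian mixture: the E-step infers the hidden Zi as responsibilities, the M-step re-estimates the mixing weights, means, and variances. Initialization and variable names are my own choices.

    import numpy as np

    def gmm_em_1d(x, K, n_iter=50):
        """EM for a 1-D Gaussian mixture; x is a 1-D array, returns (pi, mu, var)."""
        rng = np.random.default_rng(0)
        pi = np.full(K, 1.0 / K)
        mu = rng.choice(x, size=K, replace=False)   # init means at random data points
        var = np.full(K, np.var(x))
        for _ in range(n_iter):
            # E-step: responsibilities r[i, k] = p(Z_i = k | x_i)
            logp = (-0.5 * np.log(2 * np.pi * var)
                    - 0.5 * (x[:, None] - mu) ** 2 / var + np.log(pi))
            r = np.exp(logp - logp.max(axis=1, keepdims=True))
            r /= r.sum(axis=1, keepdims=True)
            # M-step: re-estimate parameters from the weighted data
            Nk = r.sum(axis=0)
            pi = Nk / len(x)
            mu = (r * x[:, None]).sum(axis=0) / Nk
            var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / Nk
        return pi, mu, var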