A grand toolbox for vision people

chap 2

Linear Regression with Maximum likelihood estimation

Normal distribution
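With Gaussian noise the maximum likelihood estimate reduces to ordinary least squares. A minimal sketch, assuming NumPy and made-up synthetic data (the function name `fit_linear_mle` is only for illustration):

```python
import numpy as np

def fit_linear_mle(X, y):
    """MLE of linear regression under Gaussian noise: ordinary least squares."""
    Xb = np.column_stack([np.ones(len(X)), X])   # prepend a bias column
    w, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    resid = y - Xb @ w
    sigma2 = float(np.mean(resid ** 2))          # MLE of the noise variance (divides by N)
    return w, sigma2

# Synthetic data, made up for illustration: y = 1 + 2x + noise
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = 1.0 + 2.0 * X[:, 0] + rng.normal(scale=0.5, size=200)
w, sigma2 = fit_linear_mle(X, y)
print(w, sigma2)
```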

Student t distribution: SGD (Stochastic Gradient Descent), EM (with Gaussian Scale Mixture)

Laplace distribution: Linear Programming, EM (with Gaussian Scale Mixture), or Huber loss function
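A rough sketch of the Huber loss option for robust regression. The solver here (iteratively reweighted least squares) is my own choice for illustration, not something the notes prescribe, and the data and `delta` value are made up:

```python
import numpy as np

def huber_irls(X, y, delta=1.0, n_iter=50):
    """Fit a line under the Huber loss by iteratively reweighted least squares."""
    Xb = np.column_stack([np.ones(len(X)), X])
    w = np.linalg.lstsq(Xb, y, rcond=None)[0]          # start from ordinary least squares
    for _ in range(n_iter):
        r = y - Xb @ w
        a = np.maximum(np.abs(r), 1e-12)
        weights = np.where(a <= delta, 1.0, delta / a)  # down-weight large residuals
        sw = np.sqrt(weights)
        w = np.linalg.lstsq(Xb * sw[:, None], y * sw, rcond=None)[0]
    return w

# Synthetic data, made up for illustration: a clean line plus ten gross outliers
rng = np.random.default_rng(1)
x = np.linspace(0, 10, 100)
y = 0.5 + 1.5 * x + rng.normal(scale=0.3, size=100)
y[-10:] += 30.0
w_ols = np.linalg.lstsq(np.column_stack([np.ones(100), x]), y, rcond=None)[0]
w_huber = huber_irls(x[:, None], y, delta=1.0)
print("OLS slope:", w_ols[1], "Huber slope:", w_huber[1])
```

The Huber fit should stay much closer to the true slope than least squares, because each outlier's influence is capped at `delta`.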

Censored regression (Kevin: not a big deal, since it only moves the line slightly up; why are there hundreds of papers on it?)

chap 3

Logistic Regression

Objective: convex

Parameter Estimation: no closed-form solution

1. Newton's method (IRLS)

2. minfunc in Matlab

always gets the optimal solution (the objective is convex, so any local optimum is global)
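A minimal sketch of the Newton (IRLS) update for logistic regression, assuming NumPy and synthetic data. The tiny `ridge` term is only there to keep the Hessian invertible and is my addition:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logreg_newton(X, y, n_iter=25, ridge=1e-8):
    """Logistic regression MLE by Newton's method (equivalently IRLS)."""
    Xb = np.column_stack([np.ones(len(X)), X])
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = sigmoid(Xb @ w)
        grad = Xb.T @ (p - y)                  # gradient of the negative log-likelihood
        s = p * (1.0 - p)                      # IRLS weights
        H = Xb.T @ (Xb * s[:, None]) + ridge * np.eye(Xb.shape[1])
        step = np.linalg.solve(H, grad)
        w = w - step
        if np.max(np.abs(step)) < 1e-10:
            break
    return w

# Synthetic data, made up for illustration
rng = np.random.default_rng(2)
X = rng.normal(size=(500, 2))
true_w = np.array([-0.5, 2.0, -1.0])
p_true = sigmoid(true_w[0] + X @ true_w[1:])
y = (rng.uniform(size=500) < p_true).astype(float)
w_hat = logreg_newton(X, y)
print(w_hat)
```

Because the objective is convex, a vanishing gradient at `w_hat` is enough to say the optimum was found (assuming the classes are not perfectly separable).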

Multidim Regression: no big deal

Probit Regression: convex objective; fit it with gradient descent (minfunc) or EM (slow).

chap 4

Model Selection

1. Bayesian approach: P(D|M). Average over all possible theta to protect against overfitting. (need concrete example)

2. BIC approximation. dof(M) can be estimated from the minimum encoding length of the model (information theory). Good if there are many models and there is some way to get dof(m) from another model's dof(m')

3. cross validation: not suitable when there are many candidate models, takes too much time
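One possible concrete example for approaches 1 and 2, using a coin-flip toy problem of my own choosing (not from the notes). M0 says the coin is fair. M1 gives theta a Beta(1, 1) prior and averages over it. The BIC score for M1 uses dof = 1:

```python
from math import lgamma, log

def log_beta(a, b):
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def log_evidence_fair(heads, tails):
    """log P(D | M0): the coin is fair, so there is no free parameter."""
    return (heads + tails) * log(0.5)

def log_evidence_biased(heads, tails, a=1.0, b=1.0):
    """log P(D | M1): theta ~ Beta(a, b), integrated out analytically."""
    return log_beta(heads + a, tails + b) - log_beta(a, b)

def bic_biased(heads, tails):
    """BIC score for M1: max log-likelihood minus (dof / 2) * log N, with dof = 1.
    Assumes 0 < heads < N so that the logs are finite."""
    n = heads + tails
    theta = heads / n
    loglik = heads * log(theta) + tails * log(1.0 - theta)
    return loglik - 0.5 * log(n)

# Two made-up data sets: nearly balanced, and clearly biased
for heads, tails in [(52, 48), (80, 20)]:
    print(heads, tails,
          log_evidence_fair(heads, tails),
          log_evidence_biased(heads, tails),
          bic_biased(heads, tails))
```

With nearly balanced data the simpler model M0 gets the higher evidence even though M1 fits at least as well, which is the protection against overfitting. With clearly biased data M1 wins, and the BIC score lands close to the exact log evidence.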

L2 regularization:

QR

SVD

Gradient
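A sketch of the QR and SVD routes for ridge regression, checked against the regularized normal equations. It assumes centered data with no bias term, and the function names are hypothetical. The gradient route is not shown:

```python
import numpy as np

def ridge_normal_eq(X, y, lam):
    """Ridge solution from the regularized normal equations."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def ridge_svd(X, y, lam):
    """Same solution via the thin SVD X = U diag(s) Vt."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    shrink = s / (s ** 2 + lam)          # each singular direction is shrunk separately
    return Vt.T @ (shrink * (U.T @ y))

def ridge_qr(X, y, lam):
    """Same solution via QR on the augmented system [X; sqrt(lam) I] w = [y; 0]."""
    d = X.shape[1]
    Xa = np.vstack([X, np.sqrt(lam) * np.eye(d)])
    ya = np.concatenate([y, np.zeros(d)])
    Q, R = np.linalg.qr(Xa)
    return np.linalg.solve(R, Q.T @ ya)

# Synthetic data, made up for illustration
rng = np.random.default_rng(7)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=50)
w_ne, w_svd, w_qr = (f(X, y, 2.0) for f in (ridge_normal_eq, ridge_svd, ridge_qr))
print(w_ne, w_svd, w_qr)
```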

L1 regularization(Lasso):

Problem: the Laplace prior is not differentiable at the origin

Sol: soft-threshold the points near the origin

Problem of sol: not an unbiased estimator anymore

Sol of above: re-estimate the nonzero w with least squares (an unbiased estimator)
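The soft threshold and the least-squares refit can be sketched as below. Coordinate descent is the outer loop I picked for illustration (the notes do not say which solver to use), and `lam` and the data are made up:

```python
import numpy as np

def soft_threshold(a, t):
    """Shrink a toward zero by t, and set it exactly to zero when |a| <= t."""
    return np.sign(a) * np.maximum(np.abs(a) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Minimize 0.5 * ||y - X w||^2 + lam * ||w||_1 by coordinate descent."""
    n, d = X.shape
    w = np.zeros(d)
    col_sq = np.sum(X ** 2, axis=0)
    for _ in range(n_iter):
        for j in range(d):
            r_j = y - X @ w + X[:, j] * w[j]     # residual with feature j removed
            w[j] = soft_threshold(X[:, j] @ r_j, lam) / col_sq[j]
    return w

def debias(X, y, w):
    """Refit the selected (nonzero) coefficients by unpenalized least squares."""
    support = np.flatnonzero(w)
    w_ls = np.zeros_like(w)
    if support.size:
        w_ls[support] = np.linalg.lstsq(X[:, support], y, rcond=None)[0]
    return w_ls

# Synthetic data, made up for illustration: only features 0 and 3 matter
rng = np.random.default_rng(3)
X = rng.normal(size=(200, 6))
true_w = np.array([3.0, 0.0, 0.0, -2.0, 0.0, 0.0])
y = X @ true_w + rng.normal(scale=0.5, size=200)
w_lasso = lasso_cd(X, y, lam=40.0)
w_refit = debias(X, y, w_lasso)
print(w_lasso)
print(w_refit)
```

The lasso coefficients come out shrunk toward zero. The refit keeps the same sparsity pattern but removes that shrinkage.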

Linear Programming (not the method of choice)

LARS

SCAD (not the method of choice): just an ad hoc approach; cannot be put into a Bayesian framework

NEG: best but slow

chap 5

Neural networks

Non-convex

Cascade of linear and non-linear models (the non-linearity has to be there, or the different layers will collapse into a single linear layer)

Use gradient descent for estimation (back-propagation algorithm)
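A small sketch of both points: two stacked linear layers equal one linear layer, and back-propagation gives the gradient for a network with a tanh layer in between. The layer sizes, the tanh choice and the squared-error loss are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(4)
W1 = rng.normal(size=(4, 3))
W2 = rng.normal(size=(2, 4))
x = rng.normal(size=3)

# Two stacked linear layers equal one linear layer with weight W2 @ W1.
stacked = W2 @ (W1 @ x)
collapsed = (W2 @ W1) @ x

def forward(W1, W2, x):
    h = np.tanh(W1 @ x)       # the nonlinearity keeps the layers from collapsing
    return h, W2 @ h

def loss_and_grads(W1, W2, x, target):
    """Squared-error loss and its gradients, computed by back-propagation."""
    h, out = forward(W1, W2, x)
    err = out - target
    loss = 0.5 * float(err @ err)
    gW2 = np.outer(err, h)                    # dL/dW2
    dh = W2.T @ err                           # error propagated back to the hidden layer
    gW1 = np.outer(dh * (1.0 - h ** 2), x)    # tanh'(a) = 1 - tanh(a)^2
    return loss, gW1, gW2

target = np.array([1.0, -1.0])
loss, gW1, gW2 = loss_and_grads(W1, W2, x, target)

# Finite-difference check of one entry of gW1.
eps = 1e-6
W1p = W1.copy()
W1p[0, 0] += eps
W1m = W1.copy()
W1m[0, 0] -= eps
numeric = (loss_and_grads(W1p, W2, x, target)[0]
           - loss_and_grads(W1m, W2, x, target)[0]) / (2 * eps)
print(gW1[0, 0], numeric)
```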

chap 12

Generative model

PI->Yi->Xi

Discriminant Analysis

p(x,y) pic here
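A hedged sketch of discriminant analysis as a generative model: fit the class prior and class-conditional Gaussians, then classify with Bayes' rule. The shared covariance (LDA-style) and the function names are my assumptions:

```python
import numpy as np

def fit_gda(X, y):
    """Fit class priors pi, class means, and a shared covariance (LDA-style)."""
    classes = np.unique(y)
    pi = np.array([np.mean(y == c) for c in classes])
    mu = np.array([X[y == c].mean(axis=0) for c in classes])
    centered = X - mu[np.searchsorted(classes, y)]
    cov = centered.T @ centered / len(X)
    return classes, pi, mu, cov

def predict_gda(model, X):
    """Pick argmax_c of log pi_c + log N(x | mu_c, cov), i.e. Bayes' rule on p(x, y)."""
    classes, pi, mu, cov = model
    prec = np.linalg.inv(cov)
    diff = X[:, None, :] - mu[None, :, :]                  # shape (n, classes, d)
    maha = np.einsum('ncd,de,nce->nc', diff, prec, diff)
    scores = np.log(pi) - 0.5 * maha                        # shared terms cancel across classes
    return classes[np.argmax(scores, axis=1)]

# Synthetic data, made up for illustration: two Gaussian blobs
rng = np.random.default_rng(8)
X = np.vstack([rng.normal([-1.5, 0.0], 1.0, size=(200, 2)),
               rng.normal([1.5, 0.0], 1.0, size=(200, 2))])
y = np.repeat([0, 1], 200)
model = fit_gda(X, y)
accuracy = float(np.mean(predict_gda(model, X) == y))
print(accuracy)
```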

Discriminative method (logistic regression)

p(y|x) pic here

chap 13

Feature selection:

Forward Feature Selection:

Greedily add one feature at a time (method of choice; simple, and better than stochastic approaches like genetic algorithms, simulated annealing, ...)
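A minimal sketch of greedy forward selection. It scores candidates by training residual sum of squares and stops after a fixed `k` features; both choices are assumptions for illustration (a validation score would be the more usual criterion):

```python
import numpy as np

def rss(X, y, features):
    """Residual sum of squares of a least-squares fit on the given columns (plus bias)."""
    A = np.column_stack([np.ones(len(y))] + [X[:, j] for j in features])
    w = np.linalg.lstsq(A, y, rcond=None)[0]
    r = y - A @ w
    return float(r @ r)

def forward_select(X, y, k):
    """Greedily add, k times, the feature that lowers the training RSS the most."""
    chosen = []
    remaining = list(range(X.shape[1]))
    for _ in range(k):
        best = min(remaining, key=lambda j: rss(X, y, chosen + [j]))
        chosen.append(best)
        remaining.remove(best)
    return chosen

# Synthetic data, made up for illustration: only features 2 and 5 matter
rng = np.random.default_rng(5)
X = rng.normal(size=(300, 8))
y = 4.0 * X[:, 2] - 2.0 * X[:, 5] + rng.normal(scale=0.5, size=300)
picked = forward_select(X, y, k=2)
print(picked)
```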

More prior:

Normal Gamma: spikier at the origin, with a flatter (heavier) tail than Laplace

chap 14

Mixture Model

PI->Zi->Xi

Different from chap 12: Zi is hidden (it needs to be inferred with EM), whereas Yi is given.
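A rough EM sketch for a 1-D Gaussian mixture, where the responsibilities play the role of the hidden Zi. The quantile-based initialization and the small variance floor are my own choices, and the data are synthetic:

```python
import numpy as np

def em_gmm_1d(x, k=2, n_iter=100):
    """EM for a 1-D Gaussian mixture: pi are mixing weights, z_i the hidden labels."""
    pi = np.full(k, 1.0 / k)
    mu = np.quantile(x, (np.arange(k) + 0.5) / k)    # spread the initial means over the data
    var = np.full(k, np.var(x))
    for _ in range(n_iter):
        # E-step: responsibilities r[i, c] = p(z_i = c | x_i)
        log_p = (np.log(pi) - 0.5 * np.log(2 * np.pi * var)
                 - 0.5 * (x[:, None] - mu) ** 2 / var)
        log_p -= log_p.max(axis=1, keepdims=True)
        r = np.exp(log_p)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: weighted maximum-likelihood updates
        nk = r.sum(axis=0)
        pi = nk / len(x)
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-9
    return pi, mu, var

# Synthetic data, made up for illustration: 30% from N(-2, 0.5^2), 70% from N(3, 1)
rng = np.random.default_rng(6)
x = np.concatenate([rng.normal(-2.0, 0.5, 300), rng.normal(3.0, 1.0, 700)])
pi, mu, var = em_gmm_1d(x)
print(pi, mu, var)
```

If Zi were observed (as Yi is in chap 12), the E-step would be unnecessary and the M-step alone would give the estimates.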

## Mar 2, 2010

### Machine Learning Summary

Posted by Lono at 13:14
