Seam Carving
Content-aware resizing.
Intuition: preserve the most interesting content by removing pixels with low gradient energy. To reduce (or increase) the size in one dimension, remove (or insert) irregularly shaped "seams". The optimal seam is found via dynamic programming.
$Energy(f) = \sqrt{\left(\frac{\partial f}{\partial x}\right)^2 + \left(\frac{\partial f}{\partial y}\right)^2}$
We want to remove seams where they won't be noticeable, so energy is measured as the gradient magnitude.
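As a rough sketch, the energy map can be computed from finite-difference partial derivatives, e.g. with NumPy (the grayscale-input assumption and the function name are illustrative):

```python
import numpy as np

def energy_map(gray):
    """Gradient-magnitude energy: sqrt((df/dx)^2 + (df/dy)^2) per pixel."""
    # np.gradient returns finite-difference partials along each axis (rows, cols).
    dy, dx = np.gradient(gray.astype(np.float64))
    return np.sqrt(dx ** 2 + dy ** 2)
```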
Choose the seam as the minimum total-energy path across the image, subject to 8-connectedness.
Algorithm:
Let a vertical seam $s$ consist of $h$ positions that form an 8-connected path.
Let the cost of a seam be $Cost(s) = \sum_{i=1}^h Energy(f(s_i))$
The optimal seam minimizes the cost: $s^* = \arg\min_{s} Cost(s)$
Compute efficiently with DP.
Identify the min-cost seam (image height $h$, width $w$):
A greedy row-by-row choice does not guarantee the minimum total cost, so build a cumulative cost table $M$ with DP.
For each entry $(i, j)$:
$M(i,j) = Energy(i,j) + \min\big(M(i-1,j-1),\ M(i-1,j),\ M(i-1,j+1)\big)$
The minimum value in the last row of $M$ marks the end of the minimal connected vertical seam. Backtrack upward, at each row selecting the minimum of the 3 entries above in $M$.
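A minimal sketch of the DP and backtracking step, assuming `energy` is the $h \times w$ map from `energy_map` above (names and border handling are illustrative):

```python
import numpy as np

def find_vertical_seam(energy):
    """Return, for each row, the column index of the min-cost 8-connected vertical seam."""
    h, w = energy.shape
    M = energy.astype(np.float64).copy()          # cumulative minimum cost table
    for i in range(1, h):
        left  = np.r_[np.inf, M[i - 1, :-1]]      # M(i-1, j-1), inf at the left border
        up    = M[i - 1, :]                       # M(i-1, j)
        right = np.r_[M[i - 1, 1:], np.inf]       # M(i-1, j+1), inf at the right border
        M[i] += np.minimum(np.minimum(left, up), right)

    # Backtrack: start at the minimum of the last row, then pick the min of the 3 above.
    seam = np.empty(h, dtype=int)
    seam[-1] = int(np.argmin(M[-1]))
    for i in range(h - 2, -1, -1):
        j = seam[i + 1]
        lo, hi = max(j - 1, 0), min(j + 2, w)
        seam[i] = lo + int(np.argmin(M[i, lo:hi]))
    return seam
```

Removing the seam then deletes one pixel per row (e.g. `np.delete` along the width axis).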
Seams can also be inserted to increase the size of the image in either dimension (duplicate the optimal seam and average it with its neighbors).
$M$ accumulates the minimum gradient energy along a path; high-gradient regions (edges and other important areas) have high energy, so seams avoid them.
Alignment problem: fit the parameters of a transformation according to a set of matching feature pairs.
Parametric Warping
$p = (x, y)$ → $T$ → $p^\prime = (x^\prime, y^\prime)$
The transformation $T$ is a coordinate-changing machine:
$p^\prime = T(p)$
$T$ being global means it is the same for any point $p$ and can be described by just a few numbers.
Matrix representation of T:
$p^\prime = Mp$
$\begin{bmatrix} x^\prime \\ y^\prime \end{bmatrix} = M \begin{bmatrix} x \\ y \end{bmatrix}$
Scaling
Scaling a coordinate means multiplying each of its components by a scalar.
Uniform scaling means this scalar is the same for all components.
Non-uniform scaling: different scalars per component.
E.g.:
$x^\prime = ax$, $y^\prime = by$
Matrix:
$\begin{bmatrix} x^\prime \\ y^\prime \end{bmatrix} = \begin{bmatrix} a & 0 \\ 0 & b \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}$
Transformations represented by matrices:
Only linear 2D transformations can be represented by a 2×2 matrix; translation cannot (see the sketch below).
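A small sketch applying the 2×2 scaling matrix above to a point (the numeric values are illustrative); note that a constant offset cannot be produced by a 2×2 matrix alone, which is what motivates homogeneous coordinates next:

```python
import numpy as np

a, b = 2.0, 0.5                       # non-uniform scale factors (illustrative)
S = np.array([[a, 0.0],
              [0.0, b]])
p = np.array([3.0, 4.0])              # p = (x, y)
p_prime = S @ p                       # p' = Mp = (a*x, b*y) -> (6.0, 2.0)
```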
Homogeneous Coordinates
Convenient: they let translation (and compositions of transformations) be written as matrix multiplication.
$(x, y) \implies \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}$ (convert to homogeneous coordinates).
$\begin{bmatrix} x \\ y \\ w \end{bmatrix} \implies (\frac{x}{w}, \frac{y}{w})$ (convert from homogeneous coordinates).
How to represent 2D translation as a 3×3 matrix using homogeneous coordinates:
$x^\prime = x + t_x$, $y^\prime = y + t_y$
Using rightmost column:
$Translation = \begin{bmatrix} 1 & 0 & t_x \\ 0 & 1 & t_y \\ 0 & 0 & 1 \end{bmatrix}$
$\begin{bmatrix} x^\prime \\ y^\prime \\ 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & t_x \\ 0 & 1 & t_y \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = \begin{bmatrix} x+t_x \\ y + t_y \\ 1 \end{bmatrix}$
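A sketch of translation via the 3×3 homogeneous matrix (the numeric values are illustrative):

```python
import numpy as np

tx, ty = 5.0, -2.0                    # translation offsets (illustrative)
T = np.array([[1.0, 0.0, tx],
              [0.0, 1.0, ty],
              [0.0, 0.0, 1.0]])
p_h = np.array([3.0, 4.0, 1.0])       # (x, y) converted to homogeneous coordinates
xp, yp, wp = T @ p_h                  # -> (x + tx, y + ty, 1)
p_prime = (xp / wp, yp / wp)          # back from homogeneous: (8.0, 2.0)
```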
Basic 3×3 transformations
Affine transformations: combinations of linear transformations and translations.
E.g. parallel lines remain parallel.
$\begin{bmatrix} x^\prime \\ y^\prime \\ w^\prime \end{bmatrix} = \begin{bmatrix} a & b & c \\ d & e & f \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ w \end{bmatrix}$
Projective transformations: combinations of affine transformations and projective warps. Parallel lines do not necessarily remain parallel.
$\begin{bmatrix} x^\prime \\ y^\prime \\ w^\prime \end{bmatrix} = \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix} \begin{bmatrix} x \\ y \\ w \end{bmatrix}$
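A sketch of applying a general 3×3 projective matrix; the divide-by-$w$ step matters here because the bottom row is no longer $(0, 0, 1)$ (the matrix entries are illustrative):

```python
import numpy as np

H = np.array([[1.0,   0.2,   3.0],    # a generic projective matrix (illustrative values)
              [0.1,   1.0,   1.0],
              [0.001, 0.002, 1.0]])

def warp_point(H, x, y):
    """Apply a projective transform to (x, y) and convert back from homogeneous."""
    xp, yp, wp = H @ np.array([x, y, 1.0])
    return xp / wp, yp / wp           # parallel lines need not stay parallel
```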
Other definitions:
Mosaic: obtain a wider-angle view by combining multiple images.
Image warping: re-project from one image plane to another (e.g. the image plane in front → the image plane below), as in image rectification.
Deep Learning Fundamentals
Linear Classifier
NN
Parametric approach: $f(x, W) = Wx + b$
Linear classifier example: hard if the data is not linearly separable.
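A minimal sketch of the parametric linear classifier $f(x, W) = Wx + b$; the shapes (D input dimensions, C classes) and the random initialization are illustrative:

```python
import numpy as np

D, C = 3072, 10                        # e.g. a flattened 32x32x3 image, 10 classes (illustrative)
rng = np.random.default_rng(0)
W = 0.01 * rng.standard_normal((C, D)) # weight matrix
b = np.zeros(C)                        # bias vector
x = rng.standard_normal(D)             # one flattened input image
scores = W @ x + b                     # f(x, W): one score per class
pred = int(np.argmax(scores))          # predicted label = highest-scoring class
```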
Now we need a loss function and an optimization procedure.
Loss Function
Given a dataset of examples $\{ (x_i, y_i) \}_{i=1}^N$, where $x_i$ is an image and $y_i$ is its label.
Loss over the dataset: $L = \frac{1}{N}\sum_i L_i(f(x_i, W), y_i)$
Multiclass SVM loss: the score of the correct class should be higher than that of any other class (by some margin).
Let the score vector be $s = f(x_i, W)$
SVM loss: $L_i = \sum_{j \neq y_i} \begin{cases} 0 & \text{if } s_{y_i} \geq s_j + 1 \\ s_j - s_{y_i} + 1 & \text{otherwise} \end{cases}$
Can be simplified to: $L_i = \sum_{j\neq y_i} \max (0, s_j - s_{y_i} + 1)$
Over the full dataset: $L = \frac{1}{N} \sum_{i=1}^N L_i$
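A sketch of the multiclass SVM (hinge) loss in the simplified form above, for one example and averaged over a dataset (function and variable names are illustrative):

```python
import numpy as np

def svm_loss_single(scores, y):
    """L_i = sum over j != y of max(0, s_j - s_y + 1)."""
    margins = np.maximum(0.0, scores - scores[y] + 1.0)
    margins[y] = 0.0                   # exclude the correct class (j != y_i)
    return margins.sum()

def svm_loss(all_scores, labels):
    """L = (1/N) * sum_i L_i over the dataset."""
    return float(np.mean([svm_loss_single(s, y) for s, y in zip(all_scores, labels)]))
```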
Regularization
Prevents the model from fitting the training data too closely (i.e. combats overfitting).
$L(W) = \frac{1}{N}\sum_i L_i(f(x_i, W), y_i) + \lambda R(W)$
$\lambda$: regularization strength (hyperparameter)
L2: $R(W) = \sum_k \sum_l W_{k,l}^2$
L1: $R(W) = \sum_k \sum_l |W_{k,l}|$
Elastic Net: $R(W) = \sum_k \sum_l \beta W_{k,l}^2 + |W_{k,l}|$
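A sketch of the regularized objective with the three penalties above; `lam` and `beta` are illustrative hyperparameter names:

```python
import numpy as np

def l2_reg(W):
    return np.sum(W ** 2)                          # sum of squared weights

def l1_reg(W):
    return np.sum(np.abs(W))                       # sum of absolute weights

def elastic_net_reg(W, beta=0.5):
    return np.sum(beta * W ** 2 + np.abs(W))       # weighted mix of L2 and L1

def total_loss(data_loss, W, lam=1e-3, reg=l2_reg):
    """L(W) = data loss + lambda * R(W)."""
    return data_loss + lam * reg(W)
```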
Other forms of regularization: Dropout, Batch Normalization, Stochastic Depth, fractional pooling, etc.