Seam carving

A content-aware method for image resizing

Introduction

Image resizing is one of the most frequently used operations in modern digital media. However, existing methods take only geometric constraints (aspect ratio etc.) into account and are oblivious to the content of the image, which causes undesirable distortions.

Current approaches

Existing methods differ mainly in how pixel values are interpolated from the original size to the target size:
- nearest-neighbor
- bilinear interpolation
- bicubic interpolation
- lanczos interpolation
- box sampling
All of them disregard the features of the image
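
As an illustration of why these methods are content-oblivious, here is a minimal sketch of nearest-neighbor resizing. The `Gray` struct and the function name `resizeNearest` are hypothetical and only serve this example. Note that the source coordinate is computed purely from the target coordinate and the size ratio; pixel values never influence which pixels survive.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical minimal image type for illustration: row-major grayscale.
struct Gray {
    int width;
    int height;
    std::vector<std::uint8_t> data; // size = width * height
    std::uint8_t at(int x, int y) const {
        return data[static_cast<std::size_t>(y) * width + x];
    }
};

// Nearest-neighbor resize: each target pixel copies the source pixel found by
// scaling the target coordinates. Only geometry is used, never the content.
Gray resizeNearest(const Gray& src, int newWidth, int newHeight) {
    Gray dst{newWidth, newHeight,
             std::vector<std::uint8_t>(static_cast<std::size_t>(newWidth) * newHeight)};
    for (int y = 0; y < newHeight; ++y) {
        const int sy = y * src.height / newHeight; // source row for target row y
        for (int x = 0; x < newWidth; ++x) {
            const int sx = x * src.width / newWidth; // source column for target column x
            dst.data[static_cast<std::size_t>(y) * newWidth + x] = src.at(sx, sy);
        }
    }
    return dst;
}
```

Halving the width this way keeps every second column regardless of whether it crosses an important object or empty background.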

Seam carving is a method meant to mitigate such side effects when resizing images. It was originally developed by Shai Avidan and Ariel Shamir and published in the paper "Seam Carving for Content-Aware Image Resizing" (SIGGRAPH 2007).

How it works

Remove 'seams' of pixels. A seam is a connected path of pixels containing exactly one pixel per row (when resizing horizontally, i.e. reducing the width) or one pixel per column (when resizing vertically, i.e. reducing the height).
Find pixels of interest using the Canny edge detection method.
Anything enclosed by an edge is assumed to be an object of interest and should be avoided when deciding which pixels to remove.
Use dynamic programming to find seams in a computationally efficient way.
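
The definition of a seam can be made concrete with a small check. This is a sketch under the assumption that a vertical seam is stored as one column index per row; the function name `isVerticalSeam` is hypothetical.

```cpp
#include <cstdlib>
#include <vector>

// seam[r] is the column of the seam pixel in row r (assumed representation).
// A valid vertical seam has exactly one pixel per row, stays inside the image,
// and moves at most one column between consecutive rows.
bool isVerticalSeam(const std::vector<int>& seam, int width, int height) {
    if (static_cast<int>(seam.size()) != height) return false;
    for (int r = 0; r < height; ++r) {
        if (seam[r] < 0 || seam[r] >= width) return false;
        if (r > 0 && std::abs(seam[r] - seam[r - 1]) > 1) return false;
    }
    return true;
}
```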

Steps of the algorithm

Convert the image to grayscale.
Apply a noise reduction algorithm: use a Gaussian kernel with a radius of 3-5 pixels.
Apply the Sobel kernel to the image and calculate the magnitude and angle of the image gradient for each pixel.
Using the magnitude and angle, find the pixels whose gradient magnitude is a local maximum along the gradient direction (non-maximum suppression).
Apply a double-thresholding algorithm that classifies the remaining pixels as strong edges, weak edges, or non-edges.
Connect strong edges to weak edges after the thresholding step using a BFS-labeling algorithm.
After these steps, each pixel has been assigned a value of interest (high value: probably part of an object, do not touch; low value: can be removed without distorting the content).
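
The thresholding and edge-linking steps above are not shown in the appendix, so here is a minimal sketch of how they could look. It assumes a row-major magnitude map after non-maximum suppression; the function name `linkEdges` and the parameters `low` and `high` are hypothetical. Pixels at or above `high` are strong edges, and pixels at or above `low` are kept only if a BFS starting from a strong edge reaches them.

```cpp
#include <cstddef>
#include <cstdint>
#include <queue>
#include <utility>
#include <vector>

// Double thresholding followed by BFS edge linking (hysteresis).
// Returns 255 for edge pixels and 0 for everything else.
std::vector<std::uint8_t> linkEdges(const std::vector<std::uint8_t>& mag,
                                    int width, int height,
                                    std::uint8_t low, std::uint8_t high) {
    std::vector<std::uint8_t> edges(mag.size(), 0);
    std::queue<std::pair<int, int>> frontier; // (x, y) of confirmed edge pixels

    // Seed the search with all strong edge pixels.
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            const std::size_t idx = static_cast<std::size_t>(y) * width + x;
            if (mag[idx] >= high) {
                edges[idx] = 255;
                frontier.push(std::make_pair(x, y));
            }
        }
    }

    // Grow from confirmed edges into 8-connected weak pixels.
    while (!frontier.empty()) {
        const std::pair<int, int> p = frontier.front();
        frontier.pop();
        for (int dy = -1; dy <= 1; ++dy) {
            for (int dx = -1; dx <= 1; ++dx) {
                const int nx = p.first + dx;
                const int ny = p.second + dy;
                if (nx < 0 || nx >= width || ny < 0 || ny >= height) continue;
                const std::size_t nidx = static_cast<std::size_t>(ny) * width + nx;
                if (edges[nidx] == 0 && mag[nidx] >= low) {
                    edges[nidx] = 255;
                    frontier.push(std::make_pair(nx, ny));
                }
            }
        }
    }
    return edges;
}
```

Weak pixels that are not connected to any strong edge stay at 0, which is what filters out isolated noise responses.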

Find the 'path of least resistance', i.e. the seam for which the sum of the values assigned to its pixels is minimal.
Naive approaches:
- greedy: fast, but does not guarantee an optimal solution
- exhaustive search (BFS/DFS over all paths): yields the optimal solution, but the number of candidate seams grows exponentially with the image size (roughly n x 3^m for a vertical seam in an image of width n and height m)
Solution: dynamic programming. For each pixel, store the minimum sum of the partial path leading up to it and the direction to take in order to achieve it. Once the minimum value is found in the last row, reconstruct the path from the stored directions. Runtime: O(n x m), where n is the width and m the height of the image.
Remove the seam from the image.
Repeat until the target size is reached.
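
The find, remove, and repeat steps can be sketched without OpenCV as follows. This is an illustration under stated assumptions, not the appendix implementation: images are stored as `std::vector<std::vector<int>>`, the names `Grid`, `findVerticalSeam`, `removeVerticalSeam`, and `carve` are hypothetical, and the backtracking step re-compares the accumulated costs instead of storing directions. The energy map is carved together with the image rather than recomputed after each removal, which is a simplification.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

using Grid = std::vector<std::vector<int>>; // grid[row][col]

// Returns, for each row, the column of a minimum-energy vertical seam.
// Assumes a non-empty, rectangular energy map.
std::vector<int> findVerticalSeam(const Grid& energy) {
    const int rows = static_cast<int>(energy.size());
    const int cols = static_cast<int>(energy[0].size());
    Grid cost = energy; // cost[r][c]: cheapest seam from row 0 ending at (r, c)
    for (int r = 1; r < rows; ++r) {
        for (int c = 0; c < cols; ++c) {
            int best = cost[r - 1][c];
            if (c > 0) best = std::min(best, cost[r - 1][c - 1]);
            if (c + 1 < cols) best = std::min(best, cost[r - 1][c + 1]);
            cost[r][c] += best;
        }
    }
    // Start at the cheapest pixel of the last row and walk upwards.
    std::vector<int> seam(rows);
    const std::vector<int>& last = cost[rows - 1];
    seam[rows - 1] = static_cast<int>(std::min_element(last.begin(), last.end()) - last.begin());
    for (int r = rows - 2; r >= 0; --r) {
        const int c = seam[r + 1];
        int bestCol = c;
        if (c > 0 && cost[r][c - 1] < cost[r][bestCol]) bestCol = c - 1;
        if (c + 1 < cols && cost[r][c + 1] < cost[r][bestCol]) bestCol = c + 1;
        seam[r] = bestCol;
    }
    return seam;
}

// Removes one pixel per row, shrinking the width by one.
void removeVerticalSeam(Grid& image, const std::vector<int>& seam) {
    for (std::size_t r = 0; r < image.size(); ++r) {
        image[r].erase(image[r].begin() + seam[r]);
    }
}

// Repeats find + remove `count` times (count must be smaller than the width).
void carve(Grid& image, Grid& energy, int count) {
    for (int k = 0; k < count; ++k) {
        const std::vector<int> seam = findVerticalSeam(energy);
        removeVerticalSeam(image, seam);
        removeVerticalSeam(energy, seam);
    }
}
```

Each pixel is visited a constant number of times per seam, which matches the O(n x m) bound stated above.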

Results

Content-aware resizing that gives reasonable results and can be used in numerous applications such as image retargeting, content amplification, and object removal.

Strengths and weaknesses

The algorithm performs well on images that have a moderate amount of noise, i.e. not too many edges. For example:

The algorithm yields subpar results when the image is very noisy, i.e. contains many edges. For example:

Future improvements

Seam insertion: the opposite operation, enlarging an image beyond its original size without distorting the content.
Object removal: manually assigning energy values to certain areas in order to favor them for seam removal.
Video seam removal: changing the aspect ratio of video without distorting the content.
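
To make the seam insertion idea concrete, here is a minimal sketch. It assumes the same one-column-per-row seam representation as the appendix and a plain nested-vector image; the names `Grid` and `insertVerticalSeam` are hypothetical, and blending the seam pixel with its right neighbor is only one possible choice for the new pixel value.

```cpp
#include <cstddef>
#include <vector>

using Grid = std::vector<std::vector<int>>; // grid[row][col]

// Widens the image by one column: in every row a new pixel is inserted directly
// to the right of the seam pixel. Its value is the average of the seam pixel
// and its right neighbor (or the seam pixel itself at the right border).
void insertVerticalSeam(Grid& image, const std::vector<int>& seam) {
    for (std::size_t r = 0; r < image.size(); ++r) {
        std::vector<int>& row = image[r];
        const int c = seam[r];
        const int right = (c + 1 < static_cast<int>(row.size())) ? row[c + 1] : row[c];
        const int blended = (row[c] + right) / 2;
        row.insert(row.begin() + c + 1, blended);
    }
}
```

Because the inserted pixels lie along a low-energy seam, the added column ends up in a region where it is least noticeable.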

Bibliography

http://graphics.cs.cmu.edu/courses/15-463/2012_fall/hw/proj3-seamcarving/imret.pdf

Appendix

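#include <climits>
#include <cmath>
#include <cstdint>
#include <utility>
#include <vector>
#include <opencv2/core.hpp>

using GrayScalePixel = uint8_t;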
using GrayScaleImage = cv::Mat_<GrayScalePixel>;
using LabeledImage = cv::Mat_<float>;
using RGBPixel = cv::Point3_<uint8_t>;
using RGBImage = cv::Mat_<RGBPixel>;

using Kernel = cv::Mat_<float>;

struct KernelPair {
	const Kernel horizontal;
	const Kernel vertical;
};

auto getGaussianBlurKernel(size_t radius = 1, float std_dev = 0.0f) -> KernelPair {
	int size = 2 * radius + 1;
	if (std_dev == 0) std_dev = radius / 6.0;
	Kernel horizontal(size, 1), vertical(1, size);
	for (int i = 0; i < size; i++) {
		// Gaussian: 1 / (sqrt(2 * pi) * sigma) * exp(-d^2 / (2 * sigma^2)), d = offset from the kernel center
		float d = i - size / 2.0f + 0.5f;
		float val = (1.0f / (sqrt(2 * PI) * std_dev)) * expf(-(d * d) / (2.0f * std_dev * std_dev));
		horizontal(i, 0) = val;
		vertical(0, i) = val;
	}
	return KernelPair{ horizontal, vertical };
}

Calculation of Gaussian kernel

auto runKernelOnPixel(const GrayScaleImage& img, const Kernel& kernel, int x, int y) -> float {
	float p = 0.0f;
	for (int i = 0; i < kernel.rows; i++) {
		for (int j = 0; j < kernel.cols; j++) {
			int dx = kernel.rows / 2;
			int dy = kernel.cols / 2;
			int row = x + i - dx >= 0 ? (x + i - dx < img.rows ? x + i - dx : x) : x;
			int col = y + j - dy >= 0 ? (y + j - dy < img.cols ? y + j - dy : y) : y;
			p += img(row, col) * (kernel(i, j));
		}
	}
	return p;
}

auto runKernelOnImage(const GrayScaleImage& img, const Kernel& kernel, bool normalize = true) -> cv::Mat_<float> {
	cv::Mat_<float> result(img.rows, img.cols);
	for (int i = 0; i < img.rows; i++) {
		for (int j = 0; j < img.cols; j++) {
			result(i, j) = runKernelOnPixel(img, kernel, i, j);
		}
	}
	if (normalize) return normalizeImage(result, kernel);
	return result;
}

auto runSeparableKernelOnImage(const GrayScaleImage& img, const KernelPair& kernels, bool normalize = true) -> GrayScaleImage {
	return runKernelOnImage(runKernelOnImage(img, kernels.vertical, normalize), kernels.horizontal, normalize);
}

Running kernels on images

auto edgeDetectCanny(const GrayScaleImage& img, float p, float k) -> GrayScaleImage {
	//blur to reduce noise
	GrayScaleImage blurredImage = gaussianBlur(img, 2, 0.5);

	//run the horizontal and vertical kernels on the image
	KernelPair sobelKernel = getSobelKernel();
	auto horizontalGradient = runKernelOnImage(blurredImage, sobelKernel.horizontal, false);
	auto verticalGradient = runKernelOnImage(blurredImage, sobelKernel.vertical, false);

	//calculate the magnitude and direction of the gradient
	cv::Mat_<float> magnitudeImage(img.rows, img.cols);
	cv::Mat_<float> directionImage(img.rows, img.cols);
	for (int i = 0; i < img.rows; i++) {
		for (int j = 0; j < img.cols; j++) {
			magnitudeImage(i, j) = sqrtf(horizontalGradient(i, j) * horizontalGradient(i, j) + verticalGradient(i, j) * verticalGradient(i, j));
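			// atan2 takes (y, x); this assumes horizontalGradient is the x-derivative and verticalGradient the y-derivative
			directionImage(i, j) = atan2f(verticalGradient(i, j), horizontalGradient(i, j));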
		}
	}

	// non-maximum suppression: keep a pixel only if it is the largest along its gradient direction
	std::pair<int, int> offsets[] = { {1, 0}, {1, -1}, {0, -1}, {-1, -1} };

	GrayScaleImage normalizedMagnitudeImage = normalizeFloatImage(magnitudeImage);
	GrayScaleImage intermediate = normalizedMagnitudeImage.clone();
	for (int i = 0; i < img.rows; i++) {
		for (int j = 0; j < img.cols; j++) {
			//map from angle of gradient to an octant 0...3
			double alpha = directionImage(i, j);
			alpha = alpha < 0 ? alpha + 2 * PI : alpha;
			int octant = (int)floor(4 * alpha / PI + 0.5) % 4;
			auto& [dx, dy] = offsets[octant];
			if (isInside(normalizedMagnitudeImage, i + dy, j + dx)) {
				if (normalizedMagnitudeImage(i, j) <= normalizedMagnitudeImage(i + dy, j + dx)) {
					intermediate(i, j) = 0x00;
				}
			}
			if (isInside(normalizedMagnitudeImage, i - dy, j - dx)) {
				if (normalizedMagnitudeImage(i, j) <= normalizedMagnitudeImage(i - dy, j - dx)) {
					intermediate(i, j) = 0x00;
				}
			}
		}
	}

	return intermediate;

}

Canny edge detection

auto getMinimumEnergyPath(const GrayScaleImage& img) -> std::vector<int> {
	cv::Mat_<int> accumulativePaths(img.rows, img.cols);
	cv::Mat_<int> directions(img.rows, img.cols);

	//copy last row
	for (int i = 0; i < img.cols; i++) {
		accumulativePaths(img.rows - 1, i) = (int)img(img.rows - 1, i);
	}
	for (int i = img.rows - 2; i >= 0; i--) {
		for (int j = 0; j < img.cols; j++) {
			int min = INT_MAX;
			int dir = 0;
			for (int dx = -1; dx <= 1; dx++) {
				if (j + dx < 0 || j + dx >= img.cols)
					continue;
				if (accumulativePaths(i + 1, j + dx) < min) {
					min = accumulativePaths(i + 1, j + dx);
					dir = dx;
				}
			}
			accumulativePaths(i, j) = (int)img(i, j) + min;
			directions(i, j) = dir;
		}
	}

	//find best path from first row of pixels
	std::vector<int> path;
	int min = INT_MAX;
	int index = 0;
	for (int i = 0; i < img.cols; i++) {
		if (accumulativePaths(0, i) < min) {
			min = accumulativePaths(0, i);
			index = i;
			
		}
	}

	//rebuild path
	path.push_back(index);
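	// the last row has no stored direction, so stop one row early; path[i] is the seam column in row i
	for (int i = 0; i < img.rows - 1; i++) {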
		int current_x = path.back();
		int current_y = i;
		int dir = directions(current_y, current_x);
		int new_x = current_x + dir;
		path.push_back(new_x);
	}
	return path;
}

Finding the best seam to remove