Lossless and near-lossless image compression is of paramount importance to professional users in many technical fields, such as medicine, remote sensing, precision engineering and scientific research. However, despite rapidly growing research interest in learning-based image compression, no published method offers both lossless and near-lossless modes. In this paper, we propose a unified and powerful deep lossy plus residual (DLPR) coding framework for both lossless and near-lossless image compression. In the lossless mode, the DLPR coding system first performs lossy compression and then lossless coding of the residuals. We solve the joint lossy and residual compression problem with variational auto-encoders (VAEs), and add autoregressive context modeling of the residuals to enhance lossless compression performance. In the near-lossless mode, we quantize the original residuals to satisfy a given ℓ∞ error bound, and propose a scalable near-lossless compression scheme that works for variable ℓ∞ bounds instead of training multiple networks. To expedite the DLPR coding, we increase the degree of algorithm parallelization with a novel design of coding context, and accelerate the entropy coding with an adaptive residual interval. Experimental results demonstrate that the DLPR coding system achieves state-of-the-art lossless and near-lossless image compression performance with competitive coding speed.
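To make the near-lossless mode concrete, here is a minimal NumPy sketch of the uniform residual quantizer that such ℓ∞-constrained schemes rely on (variable names are ours, not code from the paper): quantizing with bin size 2τ+1 guarantees that every reconstruction error stays within τ.

```python
import numpy as np

def quantize_residual(r: np.ndarray, tau: int) -> np.ndarray:
    """Uniform residual quantizer with bin size 2*tau + 1.

    Guarantees |r - r_hat| <= tau elementwise, so reconstructing
    x_hat = x_lossy + r_hat keeps the whole image within the given
    L-infinity error bound tau of the original x = x_lossy + r.
    """
    if tau == 0:
        return r  # lossless mode: residuals are coded exactly
    step = 2 * tau + 1
    return np.sign(r) * step * ((np.abs(r) + tau) // step)

# All quantization errors stay within tau:
r = np.arange(-12, 13)
assert np.all(np.abs(r - quantize_residual(r, tau=2)) <= 2)
```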
AAAI
Towards End-to-End Image Compression and Analysis with Transformers
We propose an end-to-end image compression and analysis model with Transformers, targeting cloud-based image classification applications. Instead of placing an existing Transformer-based image classification model directly after an image codec, we redesign the Vision Transformer (ViT) model to perform image classification from the compressed features and to facilitate image compression with long-term information from the Transformer. Specifically, we first replace the patchify stem (i.e., image splitting and embedding) of the ViT model with a lightweight image encoder modeled by a convolutional neural network. The compressed features generated by the image encoder are injected with convolutional inductive bias and are fed to the Transformer for image classification, bypassing image reconstruction. Meanwhile, we propose a feature aggregation module that fuses the compressed features with selected intermediate features of the Transformer, and feeds the aggregated features to a deconvolutional neural network for image reconstruction. The aggregated features acquire long-term information from the self-attention mechanism of the Transformer, which improves the compression performance. The rate-distortion-accuracy optimization problem is finally solved by a two-step training strategy. Experimental results demonstrate the effectiveness of the proposed model in both the image compression and the classification tasks.
ACMMM
ChebyLighter: Optimal Curve Estimation for Low-light Image Enhancement
Low-light enhancement aims to recover a high-contrast, normally-lit image from a low-light image with poor exposure and low contrast. Inspired by curve adjustment in photo editing software and by Chebyshev approximation, this paper presents a novel model for brightening low-light images. The proposed model, ChebyLighter, learns to estimate pixel-wise adjustment curves for a low-light image recurrently to reconstruct an enhanced output. In ChebyLighter, a Chebyshev image series is first generated. Pixel-wise coefficient matrices are then estimated with Triple Coefficient Estimation (TCE) modules, and the final enhanced image is recurrently reconstructed by Chebyshev Attention Weighted Summation (CAWS). The TCE module is designed around a dual attention mechanism with three necessary inputs. Our method performs well because the model obtains the adjustment curves by numerical approximation. With extensive quantitative and qualitative experiments on diverse test images, we demonstrate that the proposed method performs favorably against state-of-the-art low-light image enhancement algorithms.
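As an illustration of the Chebyshev machinery named above, here is a minimal NumPy sketch of a pixel-wise Chebyshev image series and its weighted summation; the per-pixel coefficient maps, which ChebyLighter predicts with its TCE modules, are placeholders here.

```python
import numpy as np

def chebyshev_series(img: np.ndarray, order: int) -> list:
    """Pixel-wise Chebyshev polynomial series T_0..T_order of an image.

    Uses the recurrence T_{k+1} = 2*x*T_k - T_{k-1}; img is assumed
    normalized to [-1, 1], the natural Chebyshev domain.
    """
    series = [np.ones_like(img), img]
    for _ in range(2, order + 1):
        series.append(2.0 * img * series[-1] - series[-2])
    return series[: order + 1]

# Weighted summation: the per-pixel coefficient maps below are random
# placeholders standing in for the TCE-module predictions.
img = np.random.uniform(-1.0, 1.0, size=(64, 64))
terms = chebyshev_series(img, order=3)
coeffs = [np.full_like(img, c) for c in (0.1, 0.8, 0.05, 0.05)]
enhanced = sum(c * t for c, t in zip(coeffs, terms))
```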
ACMMM
Multi-Camera Collaborative Depth Prediction via Consistent Structure Estimation
Depth map estimation from images is an important task in robotic systems. Existing methods fall into two groups: multi-view stereo and monocular depth estimation. The former requires cameras with large overlapping areas and sufficient baselines between them, while the latter, which processes each image independently, can hardly guarantee structural consistency between cameras. In this paper, we propose a novel multi-camera collaborative depth prediction method that does not require large overlapping areas while maintaining structural consistency between cameras. Specifically, we formulate depth estimation as a weighted combination of depth bases, in which the weights are updated iteratively by a refinement network driven by the proposed consistency loss. During the iterative update, the depth estimates are compared across cameras, and information from the overlapping areas is propagated to the whole depth maps with the help of the basis formulation. Experimental results on the DDAD and NuScenes datasets demonstrate the superior performance of our method.
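A short transcription of the basis formulation as we read it from the abstract (the symbols D, W_k, B_k are ours, not the paper's notation):

```latex
\[
  D \;=\; \sum_{k=1}^{K} W_k \odot B_k
\]
% D: predicted depth map; B_k: shared depth bases; W_k: pixel-wise
% weight maps refined iteratively by the consistency-driven network.
% Because each basis has global support, a correction inferred in an
% overlapping region propagates to the entire depth map.
```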
2021
CVPR
Learning scalable ℓ∞-constrained near-lossless image compression via joint lossy image and residual compression
Yuanchao Bai, Xianming Liu, Wangmeng Zuo, Yaowei Wang, Xiangyang Ji
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021
We propose a novel joint lossy image and residual compression framework for learning ℓ∞-constrained near-lossless image compression. Specifically, we obtain a lossy reconstruction of the raw image through lossy image compression and uniformly quantize the corresponding residual to satisfy a given tight ℓ∞ error bound. When the error bound is zero, i.e., lossless image compression, we formulate the joint optimization problem of compressing both the lossy image and the original residual in terms of variational auto-encoders and solve it with end-to-end training. To achieve scalable compression with error bounds larger than zero, we derive the probability model of the quantized residual by quantizing the learned probability model of the original residual, instead of training multiple networks. We further correct the bias of the derived probability model caused by the context mismatch between training and inference. Finally, the quantized residual is encoded according to the bias-corrected probability model and concatenated with the bitstream of the compressed lossy image. Experimental results demonstrate that our near-lossless codec achieves state-of-the-art performance for lossless and near-lossless image compression, and achieves competitive PSNR with much smaller ℓ∞ error than lossy image codecs at high bit rates.
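The scalable step admits a one-line derivation; the following is our paraphrase in LaTeX, not the paper's exact notation. Since the uniform quantizer maps every residual within τ of a bin center to that center, an entropy model for any τ > 0 follows from the τ = 0 model by bin-wise summation:

```latex
\[
  P\!\left(\hat{r} \mid \mathbf{c}\right)
  \;=\; \sum_{r \,:\, |r - \hat{r}| \le \tau} p\!\left(r \mid \mathbf{c}\right)
\]
% p: learned pmf of the original residual given coding context c;
% P: derived pmf of the quantized residual, valid for any bound tau
% without retraining (bias correction then compensates for the
% context mismatch between training and inference).
```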
2020
TCSVT
Single-Image Blind Deblurring Using Multi-Scale Latent Structure Prior
Blind image deblurring is a challenging problem in computer vision, which aims to restore both the blur kernel and the latent sharp image from only a blurry observation. Inspired by the prevalent self-example prior in image super-resolution, we observe in this paper that a sufficiently coarse image down-sampled from a blurry observation is approximately a low-resolution version of the latent sharp image. We prove this phenomenon theoretically and define the coarse image as a latent structure prior of the unknown sharp image. Starting from this prior, we propose to restore sharp images from the coarsest to the finest scale on a blurry image pyramid, progressively updating the prior image with the newly restored sharp image. These coarse-to-fine priors are referred to as multi-scale latent structures (MSLSs). Leveraging the MSLS prior, our algorithm comprises two phases: 1) we first preliminarily restore sharp images at the coarse scales, and 2) we then apply a refinement process at the finest scale to obtain the final deblurred image. At each scale, to achieve lower computational complexity, we alternately perform sharp image reconstruction with fast local self-example matching, accelerated kernel estimation with error compensation, and fast non-blind image deblurring, instead of computing any computationally expensive non-convex priors. We further extend the proposed algorithm to the more challenging non-uniform blind image deblurring problem. Extensive experiments demonstrate that our algorithm achieves competitive results against state-of-the-art methods at much faster running speed.
TSP
Fast Graph Sampling Set Selection Using Gershgorin Disc Alignment
Graph sampling set selection, where a subset of nodes is chosen to collect samples for reconstructing a smooth graph signal, is a fundamental problem in graph signal processing (GSP). Previous works employ an unbiased least-squares (LS) signal reconstruction scheme and select samples via expensive extreme eigenvector computation. Instead, we assume a biased graph Laplacian regularization (GLR) based scheme that solves a system of linear equations for reconstruction. We then choose samples to minimize the condition number of the coefficient matrix; specifically, we maximize its smallest eigenvalue λ_min. Circumventing explicit eigenvalue computation, we instead maximize a lower bound of λ_min, given by the smallest left-end of all Gershgorin discs of the matrix. To achieve this efficiently, we first convert the optimization to a dual problem, in which we minimize the number of samples needed to align all Gershgorin disc left-ends at a chosen lower-bound target T. Algebraically, the dual problem amounts to optimizing two disc operations: i) shifting disc centers due to sampling, and ii) scaling disc radii due to a similarity transformation of the matrix. We further reinterpret the dual as an intuitive disc coverage problem bearing strong resemblance to the famous NP-hard set cover (SC) problem. This reinterpretation enables us to derive a fast approximation scheme from a known error-bounded SC approximation algorithm. We find an appropriate target T efficiently via binary search. Extensive simulation experiments show that our disc-based sampling algorithm runs substantially faster than existing sampling schemes and outperforms other eigen-decomposition-free sampling schemes in reconstruction error.
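To fix ideas, here is a minimal NumPy sketch of the two quantities the abstract reasons about: the Gershgorin lower bound on λ_min that replaces eigen-decomposition, and the GLR coefficient matrix whose disc centers shift when a node is sampled. The notation (B, a, mu) is ours.

```python
import numpy as np

def gershgorin_lower_bound(B: np.ndarray) -> float:
    """Smallest left-end of all Gershgorin discs of B.

    By the Gershgorin circle theorem this lower-bounds lambda_min(B),
    so maximizing it avoids explicit eigen-decomposition.
    """
    centers = np.diag(B)
    radii = np.abs(B).sum(axis=1) - np.abs(centers)
    return float(np.min(centers - radii))

def coefficient_matrix(L: np.ndarray, samples: list, mu: float) -> np.ndarray:
    """GLR reconstruction matrix B = diag(a) + mu * L for a sample set.

    Sampling node i sets a_i = 1, shifting the i-th disc center right
    by 1; the second operation in the paper, disc scaling, corresponds
    to a similarity transform S B S^{-1} and is omitted here.
    """
    a = np.zeros(L.shape[0])
    a[samples] = 1.0
    return np.diag(a) + mu * L
```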
TCSVT
Contrast Enhancement via Dual Graph Total Variation-Based Image Decomposition
Xianming Liu, Deming Zhai, Yuanchao Bai, Xiangyang Ji, Wen Gao
IEEE Transactions on Circuits and Systems for Video Technology, 2020
Images captured in low-light environments suffer from both low luminance contrast and noise corruption. However, most existing contrast enhancement algorithms only consider contrast boosting, which tends to reveal or amplify noise that is originally invisible in the dark areas. In this paper, we propose a joint contrast enhancement and denoising algorithm based on structure/texture layer decomposition via the minimization of dual forms of graph total variation (GTV). Specifically, the structure layer is expected to be generally smooth but with sharp edges at the foreground/background boundaries, for which we propose a quadratic form of GTV (QGTV) as a prior that promotes signal smoothness along the graph structure. For the texture layer, a reweighted GTV (RGTV) is tailored to noise removal while preserving true image details. We provide theoretical analysis of the filtering behavior of these two priors. Furthermore, a boost factor is derived per patch via optimal contrast-tone mapping to improve the overall brightness level of the patch. Finally, an optimization objective function is formulated, which casts image decomposition, brightness boosting, and noise reduction into a unified optimization framework. We further propose a fast approach to efficiently solve the optimization and analyze its convergence. Experimental results show that the proposed method outperforms state-of-the-art works in subjective, objective, and statistical quality evaluations.
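For reference, here is our transcription of the two priors named above (edge weights w_{ij}, layer signal x on a graph with edge set E and Laplacian L); this follows the standard GTV definitions rather than the paper's exact equations:

```latex
\[
  \mathrm{QGTV}(x) \;=\; \sum_{(i,j)\in\mathcal{E}} w_{ij}\,(x_i - x_j)^2 \;=\; x^{\top} L\, x,
  \qquad
  \mathrm{RGTV}(x) \;=\; \sum_{(i,j)\in\mathcal{E}} w_{ij}(x)\,\lvert x_i - x_j \rvert .
\]
% QGTV is a quadratic (Laplacian) smoothness prior for the structure
% layer; in RGTV the weights w_{ij}(x) depend on the signal itself and
% are re-estimated iteratively while denoising the texture layer.
```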
AAAI
FFA-Net: Feature fusion attention network for single image dehazing
In this paper, we propose an end-to-end feature fusion attention network (FFA-Net) to directly restore the haze-free image. The FFA-Net architecture consists of three key components: 1) A novel Feature Attention (FA) module that combines channel attention with pixel attention, motivated by the observations that different channel-wise features carry entirely different weighted information and that the haze distribution is uneven across image pixels. FA treats different features and pixels unequally, which provides additional flexibility in dealing with different types of information and expands the representational ability of CNNs. 2) A basic block structure consisting of local residual learning and feature attention; local residual learning allows less important information, such as thin haze regions or low-frequency content, to be bypassed through multiple local residual connections, letting the main network focus on more effective information. 3) An attention-based feature fusion (FFA) structure across different levels, in which the feature weights are adaptively learned from the FA module, giving more weight to important features. This structure also retains information from shallow layers and passes it to deep layers. Experimental results demonstrate that our proposed FFA-Net surpasses previous state-of-the-art single image dehazing methods by a very large margin both quantitatively and qualitatively, boosting the best published PSNR from 30.23 dB to 36.39 dB on the SOTS indoor test dataset. Code has been made available on GitHub.
2019
TIP
Graph-Based Blind Image Deblurring From a Single Photograph
Blind image deblurring, i.e., deblurring without knowledge of the blur kernel, is a highly ill-posed problem. The problem can be solved in two parts: 1) estimate a blur kernel from the blurry image, and 2) given the estimated blur kernel, deconvolve the blurry input to restore the target image. In this paper, we propose a graph-based blind image deblurring algorithm by interpreting an image patch as a signal on a weighted graph. Specifically, we first argue that a skeleton image, a proxy that retains the strong gradients of the target but smooths out the details, can be used to accurately estimate the blur kernel and has a unique bi-modal edge weight distribution. We then design a reweighted graph total variation (RGTV) prior that can efficiently promote a bi-modal edge weight distribution given a blurry patch. Further, to analyze RGTV in the graph frequency domain, we introduce a new weight function to represent RGTV as a graph ℓ1-Laplacian regularizer. This leads to a graph spectral filtering interpretation of the prior with desirable properties, including robustness to noise and blur, strong piecewise-smooth filtering, and sharpness promotion. Minimizing a blind image deblurring objective with RGTV results in a non-convex, non-differentiable optimization problem. Leveraging the new graph spectral interpretation of RGTV, we design an efficient algorithm that solves for the skeleton image and the blur kernel alternately. Specifically for Gaussian blur, we propose a further speedup strategy for blind Gaussian deblurring using accelerated graph spectral filtering. Finally, with the computed blur kernel, recent non-blind image deblurring algorithms can be applied to restore the target image. Experimental results demonstrate that our algorithm successfully restores latent sharp images and outperforms state-of-the-art methods quantitatively and qualitatively.
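As a concrete illustration of why RGTV promotes a bi-modal edge weight distribution, here is a minimal NumPy sketch on a 1-D path graph, assuming the common Gaussian intensity kernel for the signal-dependent weights (the paper's actual weight function and graph topology may differ):

```python
import numpy as np

def rgtv(x: np.ndarray, sigma: float = 0.1) -> float:
    """Reweighted graph TV of a 1-D signal on a path graph.

    The edge weight is a Gaussian kernel on the intensity difference,
    so it depends on the signal itself. The per-edge penalty
    d * exp(-d^2 / (2 sigma^2)) is small both at d = 0 and at large d,
    which is why minimizing RGTV drives inter-pixel differences toward
    the bi-modal (flat-or-sharp) distribution of a skeleton image.
    """
    d = np.abs(np.diff(x))
    w = np.exp(-(d ** 2) / (2.0 * sigma ** 2))
    return float(np.sum(w * d))

# A step edge is penalized less than a smooth ramp of the same height:
step = np.concatenate([np.zeros(8), np.ones(8)])
ramp = np.linspace(0.0, 1.0, 16)
assert rgtv(step) < rgtv(ramp)
```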
ICASSP
Reconstruction-cognizant Graph Sampling Using Gershgorin Disc Alignment
Graph sampling with noise is a fundamental problem in graph signal processing (GSP). Previous works assume an unbiased least-squares (LS) signal reconstruction scheme and select samples greedily via expensive extreme eigenvector computation. A popular biased scheme using graph Laplacian regularization (GLR) instead solves a system of linear equations for its reconstruction. Assuming this GLR-based scheme, we propose a reconstruction-cognizant sampling strategy to maximize the numerical stability of the linear system, i.e., minimize the condition number of the coefficient matrix. Specifically, we maximize the eigenvalue lower bounds of the matrix, represented by the left-ends of its Gershgorin discs. To accomplish this efficiently, we propose an iterative algorithm that traverses the graph nodes via breadth-first search (BFS) and aligns the left-ends of all corresponding Gershgorin discs at a lower-bound threshold T using two basic operations: disc shifting and scaling. We then perform binary search to maximize T given a sample budget K. Experiments on real graph data show that the proposed algorithm effectively promotes large eigenvalue lower bounds, and its reconstruction MSE is the same as or smaller than that of existing sampling methods for different budgets K at much lower complexity.
2018
ICASSP
Blind Image Deblurring Via Reweighted Graph Total Variation
Yuanchao Bai, Gene Cheung, Xianming Liu, Wen Gao
IEEE International Conference on Acoustics, Speech and Signal Processing, 2018
Blind image deblurring, i.e., deblurring without knowledge of the blur kernel, is a highly ill-posed problem. The problem can be solved in two parts: i) estimate a blur kernel from the blurry image, and ii) given the estimated blur kernel, deconvolve the blurry input to restore the target image. In this paper, by interpreting an image patch as a signal on a weighted graph, we first argue that a skeleton image, a proxy that retains the strong gradients of the target but smooths out the details, can be used to accurately estimate the blur kernel and has a unique bi-modal edge weight distribution. We then design a reweighted graph total variation (RGTV) prior that can efficiently promote a bi-modal edge weight distribution given a blurry patch. However, minimizing a blind image deblurring objective with RGTV results in a non-convex, non-differentiable optimization problem. We propose a fast algorithm that solves for the skeleton image and the blur kernel alternately. Finally, with the computed blur kernel, recent non-blind image deblurring algorithms can be applied to restore the target image. Experimental results show that our algorithm can robustly estimate blur kernels of large size, and the reconstructed sharp images are competitive against state-of-the-art methods.
FCCM
FPGA-Based Real-Time Super-Resolution System for Ultra High Definition Videos
Zhuolun He, Hanxian Huang, Ming Jiang, Yuanchao Bai, Guojie Luo
IEEE 26th Annual International Symposium on Field-Programmable Custom Computing Machines, 2018
The market now offers a wide range of Ultra High Definition (Ultra-HD) displays, yet most existing cameras capture video only in Full-HD. To upgrade existing videos without extra storage costs, we propose an FPGA-based super-resolution system that enables real-time, high-quality Ultra-HD upscaling. Our system crops each frame into blocks, measures their total variation, and dispatches them accordingly to either a neural network or an interpolation module for upscaling. This approach balances FPGA resource utilization, attainable frame rate, and image quality. Evaluations demonstrate that the proposed system achieves superior performance in both throughput and reconstruction quality compared to current approaches.
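A minimal NumPy sketch of the dispatch rule described above; the block size and TV threshold are placeholders we chose for illustration, not values from the FPGA design:

```python
import numpy as np

def dispatch_blocks(frame: np.ndarray, block: int = 32, tv_thresh: float = 0.02):
    """Crop a frame into blocks and route each one by its total variation.

    High-TV (detailed) blocks go to the neural-network upscaler, flat
    blocks to cheap interpolation, trading quality against resources.
    """
    nn_blocks, interp_blocks = [], []
    h, w = frame.shape[:2]
    for y in range(0, h - block + 1, block):
        for x in range(0, w - block + 1, block):
            b = frame[y:y + block, x:x + block]
            tv = np.abs(np.diff(b, axis=0)).mean() + np.abs(np.diff(b, axis=1)).mean()
            (nn_blocks if tv > tv_thresh else interp_blocks).append((y, x, b))
    return nn_blocks, interp_blocks
```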
AO
High cost-efficient and computational gigapixel video camera based on commercial lenses and CMOS chips
Heng Mao, Jie He, Jiazhi Zhang, Yuanchao Bai, Muyue Zhai, Haiwen Li, Xiange Wen, Rui Chen, Huizhu Jia, Louis Tao, Ming Jiang
State-of-the-art commercial telephoto lenses already provide nearly one giga space-bandwidth product. Since a single image sensor cannot exploit such sampling capacity, we implement a four-parallel-boresight imaging system using four such lenses and 64 image sensors to cover the full field of view (FOV), achieving 0.8 gigapixels over 15.6°×10.5°. Mosaicking multiple sensors allows most online computation and data transfer to run in parallel, enabling a gigapixel video camera. Meanwhile, owing to the four-parallel-boresight configuration, the flat image plane simplifies image registration and stitching, and allows us to maintain high imaging performance over the full frame after geometric and optical calibration and correction. Furthermore, considering that changes in working distance introduce additional x/y offsets between the sensor arrays, we propose a computation-based method and introduce an eight-axis automatic motion mechanism into the system to perform online active displacement. Our prototype camera using 16 sensors has been validated in 50 m indoor and 145 m outdoor experiments. The effective angular resolution of the 0.2-gigapixel, 24 Hz video output is 18 μrad, which is two times the instantaneous FOV of the lens. Compared with other gigapixel cameras, it is superior in optical system simplicity and cost efficiency, which would potentially benefit numerous unmanned aerial vehicle photogrammetric applications that pursue high angular resolution over a moderate FOV.
ICME
Robust Contrast Enhancement via Graph-Based Cartoon-Texture Decomposition
In this paper, we propose a robust contrast enhancement algorithm based on cartoon and texture layer decomposition. Specifically, the cartoon layer is expected to be generally smooth but with sharp edges at the foreground and background boundaries, for which we propose a quadratic form of graph total variation (GTV) as a prior to promote signal smoothness along the graph structure. For the texture layer, a reweighted GTV is tailored to remove noise while preserving true image details. Finally, an optimization objective function is formulated, which casts image decomposition, contrast enhancement, and noise reduction into a unified framework, and we propose an efficient algorithm to solve it. Experimental results show that our generated images noticeably outperform state-of-the-art schemes in subjective quality evaluation.
2017
ICSI
Computational Calibration and Correction for Gigapixel Imaging System
Jiazhi Zhang, Jie He, Haiwen Li, Yuanchao Bai, Huizhu Jia, Louis Tao, Heng Mao
International Conference on Sensing and Imaging, 2017
Large field-of-view (FOV) imaging with high spatial resolution has been increasingly required in numerous applications in recent years. A conventional photosensitive detector with tens of megapixels clearly cannot satisfy this requirement. As a result, gigapixel cameras based on multi-aperture imaging have become a possible solution to overcome this limitation. In this paper, we develop an alternative gigapixel imaging system that mosaics multiple CMOS chips in the external optical path, and present computational methods for calibrating the vignetting distributions and other geometric parameters of the system. Consequently, our gigapixel imaging system achieves 24 Hz, 0.2-gigapixel video with single-pixel resolution.
2015
VCIP
A fast super-resolution method based on sparsity properties
Super-resolution enhancement is a promising approach to increasing the spatial resolution of images. To obtain a satisfying super-resolved result, regularization term design and blur kernel estimation are two important aspects that need to be carefully considered. In this paper, we propose a robust regularized super-resolution reconstruction approach based on two sparsity properties to address these two aspects. First, we design a sparse reweighted TV-L1 prior to constrain the first derivative of the upsampled image. Then, noticing that only deblurring sparse high-gradient areas can sharpen the super-resolution result, we design an over-deblurring control method to reduce the artifacts caused by inaccurate blur kernel estimation. We also design a fast optimization algorithm to solve our model. Experimental results show that the proposed approach achieves remarkable performance in both visual quality and running time.
2014
PCM
A Multi-exposure Fusion Method Based on Locality Properties
A new method is proposed for fusing a multi-exposure sequence of images into a high-quality image based on the locality properties of the sequence. We divide the images into uniform blocks and use variance to represent the information content of each block. The richest-information (RI) image is computed by piecing together the blocks with the largest variances. We assume that the images in the sequence are high-dimensional data points lying in the same neighbourhood, and borrow the idea of locally linear embedding (LLE) to fuse a result image that is closest to the RI image. The result is comparable to state-of-the-art tone mapping operators and other exposure fusion methods.
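A minimal NumPy sketch of the RI-image construction step described above (the block size is a placeholder; the LLE-based fusion that follows it is omitted):

```python
import numpy as np

def richest_information_image(seq: np.ndarray, block: int = 16) -> np.ndarray:
    """Assemble the RI image: per block, copy the exposure with max variance.

    seq has shape (n_exposures, H, W); variance serves as the block's
    information measure, as in the abstract.
    """
    n, h, w = seq.shape
    ri = np.zeros((h, w))
    for y in range(0, h, block):
        for x in range(0, w, block):
            blocks = seq[:, y:y + block, x:x + block]
            best = np.argmax(blocks.reshape(n, -1).var(axis=1))
            ri[y:y + block, x:x + block] = blocks[best]
    return ri
```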
VCIP
Layer-based image completion by Poisson surface reconstruction
Image completion has been widely used to repair damaged regions of a digital image in a visually plausible way. However, when critical parts of an image are missing, it is difficult to infer appropriate content from the original image alone while keeping the result globally coherent. To address this problem, we propose a novel layer-divided image completion scheme, which consists of two major steps. First, we extract the foregrounds of both the target image and the source image, and then apply a guided Poisson surface reconstruction technique to complete the target foreground according to parameters obtained from an optimal-matching calculation. Second, to fill the remaining damaged parts, we further devise a related exemplar-based image completion algorithm. Several experiments and comparisons show the effectiveness and robustness of our proposed algorithm.
PCM
An Adaptive Perceptual Quantization Algorithm Based on Block-Level JND for Video Coding
It has been widely demonstrated that integrating efficient perceptual measures into the traditional video coding framework can significantly improve subjective coding performance. In this paper, we propose a novel block-level JND (just-noticeable-distortion) model, which not only adjusts pixel-level JND thresholds using additional block characteristics but also integrates them into a block-level model. The model is then applied to adaptive perceptual quantization for video coding. Experimental results show that our model can save up to 24.5% of the bit rate on average with negligible degradation of perceptual quality.
2013
An efficient multi-path self-organizing strategy in Internet of Things