Activiteit

  • Hardin Kara heeft een update geplaatst 1 week, 2 dagen geleden

    The reconstruction of a high resolution image given a low resolution observation is an ill-posed inverse problem in imaging. Deep learning methods rely on training data to learn an end-to-end mapping from a low-resolution input to a highresolution output. Unlike existing deep multimodal models that do not incorporate domain knowledge about the problem, we propose a multimodal deep learning design that incorporates sparse priors and allows the effective integration of information from another image modality into the network architecture. Our solution relies on a novel deep unfolding operator, performing steps similar to an iterative algorithm for convolutional sparse coding with side information; therefore, the proposed neural network is interpretable by design. The deep unfolding architecture is used as a core component of a multimodal framework for guided image super-resolution. Tipranavir molecular weight An alternative multimodal design is investigated by employing residual learning to improve the training efficiency. The presented multimodal approach is applied to super-resolution of near-infrared and multi-spectral images as well as depth upsampling using RGB images as side information. Experimental results show that our model outperforms state-ofthe-art methods.This paper presents a novel framework, namely Deep Cross-modality Spectral Hashing (DCSH), to tackle the unsupervised learning problem of binary hash codes for efficient cross-modal retrieval. The framework is a two-step hashing approach which decouples the optimization into (1) binary optimization and (2) hashing function learning. In the first step, we propose a novel spectral embedding-based algorithm to simultaneously learn single-modality and binary cross-modality representations. While the former is capable of well preserving the local structure of each modality, the latter reveals the hidden patterns from all modalities. In the second step, to learn mapping functions from informative data inputs (images and word embeddings) to binary codes obtained from the first step, we leverage the powerful CNN for images and propose a CNN-based deep architecture to learn text modality. Quantitative evaluations on three standard benchmark datasets demonstrate that the proposed DCSH method consistently outperforms other state-of-the-art methods.This paper proposes a novel bi-directional motion compensation framework that extracts existing motion information associated with the reference frames and interpolates an additional reference frame candidate that is co-located with the current frame. The approach generates a dense motion field by performing optical flow estimation, so as to capture complex motion between the reference frames without recourse to additional side information. The estimated optical flow is then complemented by transmission of offset motion vectors to correct for possible deviation from the linearity assumption in the interpolation. Various optimization schemes specifically tailored to the video coding framework are presented to further improve the performance. To accommodate applications where decoder complexity is a cardinal concern, a block-constrained speed-up algorithm is also proposed. Experimental results show that the main approach and optimization methods yield significant coding gains across a diverse set of video sequences. Further experiments focus on the trade-off between performance and complexity, and demonstrate that the proposed speed-up algorithm offers complexity reduction by a large factor while maintaining most of the performance gains.Collaborative filters perform denoising through transform-domain shrinkage of a group of similar patches extracted from an image. Existing collaborative filters of stationary correlated noise have all used simple approximations of the transform noise power spectrum adopted from methods which do not employ patch grouping and instead operate on a single patch. We note the inaccuracies of these approximations and introduce a method for the exact computation of the noise power spectrum. Unlike earlier methods, the calculated noise variances are exact even when noise in one patch is correlated with noise in any of the other patches. We discuss the adoption of the exact noise power spectrum within shrinkage, in similarity testing (patch matching), and in aggregation. We also introduce effective approximations of the spectrum for faster computation. Extensive experiments support the proposed method over earlier crude approximations used by image denoising filters such as Block-Matching and 3D-filtering (BM3D), demonstrating dramatic improvement in many challenging conditions.We introduce BSD-GAN, a novel multi-branch and scale-disentangled training method which enables unconditional Generative Adversarial Networks (GANs) to learn image representations at multiple scales, benefiting a wide range of generation and editing tasks. The key feature of BSD-GAN is that it is trained in multiple branches, progressively covering both the breadth and depth of the network, as resolutions of the training images increase to reveal finer-scale features. Specifically, each noise vector, as input to the generator network of BSD-GAN, is deliberately split into several sub-vectors, each corresponding to, and is trained to learn, image representations at a particular scale. During training, we progressively “de-freeze” the sub-vectors, one at a time, as a new set of higher-resolution images is employed for training and more network layers are added. A consequence of such an explicit sub-vector designation is that we can directly manipulate and even combine latent (sub-vector) codes which model different feature scales. Extensive experiments demonstrate the effectiveness of our training method in scale-disentangled learning of image representations and synthesis of novel image contents, without any extra labels and without compromising quality of the synthesized high-resolution images. We further demonstrate several image generation and manipulation applications enabled or improved by BSD-GAN.In this paper, we present a novel end-to-end learning neural network, i.e., MATNet, for zero-shot video object segmentation (ZVOS). Motivated by the human visual attention behavior, MATNet leverages motion cues as a bottom-up signal to guide the perception of object appearance. To achieve this, an asymmetric attention block, named Motion-Attentive Transition (MAT), is proposed within a two-stream encoder network to firstly identify moving regions and then attend appearance learning to capture the full extent of objects. Putting MATs in different convolutional layers, our encoder becomes deeply interleaved, allowing for close hierarchical interactions between object apperance and motion. Such a biologically-inspired design is proven to be superb to conventional two-stream structures, which treat motion and appearance independently in separate streams and often suffer severe overfitting to object appearance. Moreover, we introduce a bridge network to modulate multi-scale spatiotemporal features into more compact, discriminative and scale-sensitive representations, which are subsequently fed into a boundary-aware decoder network to produce accurate segmentation with crisp boundaries.

Deel via Whatsapp