David Benedict Bradshaw, University of Cambridge (U.K.)
Nick G. Kingsbury, University of Cambridge (U.K.)
The technique outlined in this paper extends the ability of current warping motion estimation schemes to allow occlusion and uncovering effects to be modelled. Current methods use a continuous rubber-sheet approach consisting of non-overlapping polygons. The new technique estimates and compensates affine and then translational motion within the scene. The latter is achieved by allowing polygons to overlap through the introduction of rips into the sheet, located in areas where occlusion or uncovering is thought to occur. Results show a reduction in the prediction error when compared to both traditional block-based methods and recently developed warping schemes (without rips).
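As an illustration (not the authors' code), the affine stage of such a warping scheme amounts to solving for an affine map from control-point correspondences and backward-warping the reference frame. The sketch below shows that operation in Python; the function names and the nearest-neighbour interpolation are our own assumptions.

    import numpy as np

    def affine_from_points(src, dst):
        """Solve for A such that [x', y'] = A @ [x, y, 1], from three point pairs."""
        M = np.hstack([src, np.ones((3, 1))])      # src, dst: (3, 2) arrays
        ax = np.linalg.solve(M, dst[:, 0])         # one solve per coordinate
        ay = np.linalg.solve(M, dst[:, 1])
        return np.vstack([ax, ay])                 # (2, 3) affine matrix

    def warp_affine(frame, A):
        """Backward-map every pixel of the prediction into the reference frame."""
        h, w = frame.shape
        ys, xs = np.mgrid[0:h, 0:w]
        coords = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])
        sx, sy = A @ coords                        # source coordinates
        sx = np.clip(np.round(sx).astype(int), 0, w - 1)
        sy = np.clip(np.round(sy).astype(int), 0, h - 1)
        return frame[sy, sx].reshape(h, w)         # nearest-neighbour warp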
Candemir Toklu, University of Rochester (U.S.A.)
Arif Tanju Erdem, University of Rochester (U.S.A.)
Ahmet Murat Tekalp, University of Rochester (U.S.A.)
This paper addresses 2-D mesh-based object tracking and mesh-based object mosaic construction for synthetic transfiguration of video objects with deformable boundaries in the presence of another occluding object and/or self-occlusion. In particular, we update the 2-D triangular mesh model of a video object incrementally to account for the newly uncovered parts of the object as they are detected during the tracking process. Then, the minimum number of reference views (still images of a replacement object) needed to perform the synthetic transfiguration (object replacement and animation) is determined, depending on the complexity of the motion of the object to be replaced, and the transfiguration of the replacement object is accomplished by 2-D mesh-based texture mapping in between these reference views. The proposed method is demonstrated by replacing an orange juice bottle with a cranberry juice bottle in a real video clip.
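For illustration only, a minimal sketch of the per-triangle texture mapping that mesh-based transfiguration builds on, assuming nearest-neighbour sampling, in-bounds triangles, and function names of our own:

    import numpy as np

    def cross2(a, b):
        return a[0] * b[1] - a[1] * b[0]           # z of the 2-D cross product

    def map_triangle(src_img, src_tri, dst_img, dst_tri):
        """Warp the texture of src_tri (in src_img) onto dst_tri (in dst_img)."""
        # Affine transform taking destination coordinates back to the source.
        M = np.hstack([dst_tri, np.ones((3, 1))])
        ax = np.linalg.solve(M, src_tri[:, 0])
        ay = np.linalg.solve(M, src_tri[:, 1])
        v0, v1, v2 = dst_tri
        denom = cross2(v1 - v0, v2 - v0)
        xmin, ymin = np.floor(dst_tri.min(axis=0)).astype(int)
        xmax, ymax = np.ceil(dst_tri.max(axis=0)).astype(int)
        for y in range(ymin, ymax + 1):
            for x in range(xmin, xmax + 1):
                p = np.array([x, y]) - v0          # barycentric inside test
                a = cross2(p, v2 - v0) / denom
                b = cross2(v1 - v0, p) / denom
                if a >= 0 and b >= 0 and a + b <= 1:
                    sx = ax @ np.array([x, y, 1.0])
                    sy = ay @ np.array([x, y, 1.0])
                    dst_img[y, x] = src_img[int(round(sy)), int(round(sx))]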
Holger Eggers, Tech. University Harburg (Germany)
Fabrice Moscheni, Swiss Federal Institute of Technology (Switzerland)
Roberto Castagno, Swiss Federal Institute of Technology (Switzerland)
This paper presents an improved object tracking algorithm in the context of spatio-temporal segmentation. By incorporating invariants for the spatial characterization, the information supplied by the tracking algorithm to the current segmentation is extended from a purely temporal to a more comprehensive spatio-temporal description of the objects in the scene. Thereby, the extraction and tracking of meaningful objects in video sequences are enhanced. The proposed spatial characterization is shown to be efficiently implementable due to the additivity in feature space of the chosen class of invariants.
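The additivity the abstract relies on can be made concrete with a deliberately simple stand-in descriptor. In the sketch below (our assumption, not the paper's invariants), a grey-level histogram is additive, so the descriptor of two merged regions is the sum of the regions' descriptors and regions can be regrouped without rescanning their pixels:

    import numpy as np

    def region_histogram(image, mask, bins=32):
        """Additive descriptor of one region (pixels where mask is True)."""
        h, _ = np.histogram(image[mask], bins=bins, range=(0, 256))
        return h

    # Additivity: for disjoint regions, descriptor(A merged with B)
    # equals descriptor(A) + descriptor(B).
    img = np.random.randint(0, 256, (64, 64))
    m1 = np.zeros((64, 64), dtype=bool); m1[:32] = True
    m2 = ~m1
    assert np.array_equal(region_histogram(img, m1 | m2),
                          region_histogram(img, m1) + region_histogram(img, m2))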
Roland Mech, University of Hannover (Germany)
Michael Wollborn, University of Hannover (Germany)
An algorithm for automatic, noise-robust segmentation of moving objects in image sequences is presented. Such algorithms are required for object-based coding techniques like the upcoming ISO/MPEG-4 standard. In the first step, a mask of changed image areas is estimated by a local thresholding relaxation technique. Then, areas of uncovered background are removed from this mask, taking into account an estimated displacement vector field. The resulting object mask is finally improved by applying a grey-level edge adaptation and an object mask memory. The algorithm is compared to a global thresholding technique known from the literature. Experimental results show the improvement of the estimated object masks.
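A much-simplified sketch of the first two stages, with global thresholding standing in for the local relaxation and an assumed rule for uncovered background removal (both are ours, not the authors'):

    import numpy as np

    def change_mask(prev, curr, thresh=15):
        """Global-threshold stand-in for the local thresholding relaxation."""
        return np.abs(curr.astype(int) - prev.astype(int)) > thresh

    def remove_uncovered(mask, dx, dy):
        """Assumed rule: a changed pixel stays in the object mask only if its
        position displaced by (dx, dy) also lies inside the change mask;
        otherwise it is treated as uncovered background."""
        h, w = mask.shape
        ys, xs = np.nonzero(mask)
        tx = np.clip(np.round(xs + dx[ys, xs]), 0, w - 1).astype(int)
        ty = np.clip(np.round(ys + dy[ys, xs]), 0, h - 1).astype(int)
        keep = mask[ty, tx]
        out = np.zeros_like(mask)
        out[ys[keep], xs[keep]] = True
        return out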
Jae Gark Choi, ETRI (Korea)
Si-Woong Lee, KAIST (Korea)
Seong-Dae Kim, KAIST (Korea)
The paper presents a morphological spatio-temporal segmentation method based on a new similarity measure. This similarity measure jointly considers spatial and temporal information and therefore consists of two terms. The first term minimizes the displaced frame difference under an affine motion model; the second maximizes the spatial homogeneity of the luminance values within each region. The procedure towards complete segmentation consists of three steps: joint marker extraction, boundary decision, and motion-based region fusion. By incorporating spatial and temporal information simultaneously, we obtain visually meaningful segmentation results. Simulation results demonstrate the efficiency of the proposed method.
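As a hedged illustration, a two-term cost of the kind described might combine a region's displaced frame difference under affine motion with its luminance variance; the weighting and all names below are ours:

    import numpy as np

    def dfd_term(curr, prev, mask, A):
        """Mean displaced frame difference of a region under affine motion A (2x3)."""
        ys, xs = np.nonzero(mask)
        coords = np.stack([xs, ys, np.ones_like(xs)])
        px, py = A @ coords
        px = np.clip(np.round(px).astype(int), 0, prev.shape[1] - 1)
        py = np.clip(np.round(py).astype(int), 0, prev.shape[0] - 1)
        return np.abs(curr[ys, xs].astype(float) - prev[py, px]).mean()

    def homogeneity_term(curr, mask):
        """Luminance variance inside the region (smaller = more homogeneous)."""
        return np.var(curr[mask].astype(float))

    def joint_cost(curr, prev, mask, A, lam=0.1):
        """Two-term similarity: temporal (DFD) plus spatial (homogeneity)."""
        return dfd_term(curr, prev, mask, A) + lam * homogeneity_term(curr, mask)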
Jeho Nam, University of Minnesota (U.S.A.)
Ahmed H. Tewfik, University of Minnesota (U.S.A.)
We present a new approach to segmenting video sequences into individual shots. Unlike previous approaches, our technique segments the video sequence by combining two streams of information extracted from the visual track with audio-track segmentation information. The visual streams of information are computed from the coarse data in a 3-D wavelet decomposition of the video track. They consist of (i) information derived from temporal edges detected along the time evolution of the intensity of each pixel in temporally sub-sampled, spatially filtered coarse frames, and (ii) information derived from the coarse spatio-temporal evolution of intra-frame edges in the spatially filtered coarse frames. Our approach is particularly well matched to progressively transmitted video.
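A rough sketch of the temporal-edge idea, with block averaging standing in for the 3-D wavelet coarse band and a threshold of our own choosing:

    import numpy as np

    def coarse_frames(frames, step=2, block=4):
        """Temporal sub-sampling plus crude spatial low-pass (block averaging)."""
        sub = frames[::step].astype(float)
        t, h, w = sub.shape
        sub = sub[:, :h - h % block, :w - w % block]
        return sub.reshape(t, h // block, block, w // block, block).mean(axis=(2, 4))

    def shot_cuts(frames, thresh=20.0):
        """Flag coarse frames whose content jumps sharply from the previous one."""
        c = coarse_frames(frames)
        d = np.abs(np.diff(c, axis=0)).mean(axis=(1, 2))  # temporal edge strength
        return np.nonzero(d > thresh)[0] + 1              # indices in coarse time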
Paul M. Antoszczyszyn, University of Edinburgh (U.K.)
John M. Hannah, University of Edinburgh (U.K.)
Peter M. Grant, University of Edinburgh (U.K.)
This paper addresses the problem of wire-frame tracking by accurate analysis of the motion and the shape of the facial features in head-and-shoulders scenes. Accurate wire-frame tracking is of paramount importance for correct reconstruction of the encoded image, especially in the areas occupied by the lips and the eyes. An entirely new algorithm for tracking the motion of a semantic wire-frame (Candide) by analysis of the principal components of sub-images containing important facial features of the speaker's face is proposed. This algorithm is suitable for tracking both global motion (motion of the speaker's head) and local motion (motion of the facial features). The algorithm was tested on numerous head-and-shoulders sequences with excellent results.
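One plausible core of such PCA-based feature tracking, sketched under our own assumptions (training patches, component count, and the reconstruction-error score are ours):

    import numpy as np

    def fit_pca(patches, n_components=8):
        """patches: (n, d) flattened training sub-images of one facial feature."""
        mean = patches.mean(axis=0)
        _, _, vt = np.linalg.svd(patches - mean, full_matrices=False)
        return mean, vt[:n_components]      # mean and principal axes

    def match_score(patch, mean, axes):
        """Reconstruction error; small values mean the candidate sub-image
        looks like the trained facial feature."""
        x = patch.ravel().astype(float) - mean
        proj = axes.T @ (axes @ x)          # projection onto the PCA subspace
        return np.linalg.norm(x - proj)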
Jörn Ostermann, AT&T Labs - Research (U.S.A.)
In this paper, the motion estimation and motion compensation within a block-based hybrid coder are modified. Input to the motion estimator is the original frame and a representation of the previously decoded frame generated by means of a second prediction loop. The second prediction loop works in parallel to the prediction loop of the decoder. It differs from the conventional coder prediction loop in that, for blocks with transmitted DCT coefficients, the original image signal is fed into the second prediction loop, while for blocks without transmitted DCT coefficients, the motion-compensated signal is fed into the loop. Thus, the image generated by the second prediction loop is influenced by motion but not by the quantization noise of the DCT. The motion-compensated prediction image from the second prediction loop can easily be used to control the encoder. This coder control is not influenced by the actual quantization selected by the encoder and is hence very stable over a wide range of quantizer settings.
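Schematically, as we read the abstract (names and the coding stand-in are ours), the two loops differ only in what they store for coded blocks:

    import numpy as np

    def quantized_recon(orig, pred, q=8):
        """Stand-in for DCT coding: coarsely quantize the prediction error."""
        err = orig.astype(float) - pred
        return pred + np.round(err / q) * q

    def update_loops(orig, mc_pred, mc_pred2, coded):
        """One frame update; `coded` is a per-pixel map, True inside blocks
        whose DCT coefficients are transmitted."""
        decoded = np.where(coded, quantized_recon(orig, mc_pred), mc_pred)
        loop2 = np.where(coded, orig, mc_pred2)  # original, not decoded signal
        return decoded, loop2                    # loop2 carries no DCT noise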
K. Panusopone, University of Texas, Arlington (U.S.A.)
K.R. Rao, University of Texas, Arlington (U.S.A.)
This paper describes a new motion estimation algorithm for block-based video compression. Unlike other fast algorithms, the proposed method works efficiently with an adaptive amount of data depending on the local information. Motion vectors estimated by this mechanism achieve an accuracy close to that of a full search, yet fewer search points are employed, leading to lower complexity. Although the number of search points is greater than that of hierarchical or other fast algorithms, the method is less vulnerable to local minima and is an efficient tool for the existing compression standards since it operates on a fixed block size. The simulation results show the closeness in performance between the new method and the full-search BMA.
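For reference, the full-search BMA the abstract measures against, with assumed block size and search range:

    import numpy as np

    def full_search(curr, ref, bx, by, block=16, rng=7):
        """Exhaustive SAD search for the block at (bx, by); returns ((dx, dy), sad)."""
        h, w = curr.shape
        tgt = curr[by:by + block, bx:bx + block].astype(int)
        best, best_mv = None, (0, 0)
        for dy in range(-rng, rng + 1):
            for dx in range(-rng, rng + 1):
                y, x = by + dy, bx + dx
                if 0 <= y <= h - block and 0 <= x <= w - block:
                    sad = np.abs(tgt - ref[y:y + block, x:x + block]).sum()
                    if best is None or sad < best:
                        best, best_mv = sad, (dx, dy)
        return best_mv, best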
Fengqi Yu, UCLA (U.S.A.)
Alan N. Willson, UCLA (U.S.A.)
This paper discusses the design of a fast algorithm for motion estimation with emphasis on hardware cost considerations, real-time application, network adaptation, and flexibility. To achieve the best trade-off among hardware cost, computational complexity, and distortion performance, we propose a multi-stage pixel-subsampling motion estimation algorithm. The algorithm has a lower hardware cost than Liu's subsampling algorithm and the three-step hierarchical search algorithm (3SHS) in terms of data flow control, I/O bandwidth, and regularity. Its computational complexity is close to that of 3SHS and its distortion performance, which is better than that of Liu's algorithm and 3SHS, is close to that of full search.
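A hedged sketch of multi-stage pixel sub-sampling (the stage count, 4:1 sub-sampling factor, and survivor count are our assumptions, not necessarily the paper's): early stages rank candidates with a sparse SAD, and only survivors are re-scored at full pixel density.

    import numpy as np

    def sparse_sad(tgt, cand, step):
        """SAD on a regular sub-grid of pixels; step=1 is the full SAD."""
        return np.abs(tgt[::step, ::step].astype(int) - cand[::step, ::step]).sum()

    def multistage_search(curr, ref, bx, by, block=16, rng=7, keep=8):
        """Stage 1 ranks all candidates with a 4:1 sub-sampled SAD; stage 2
        re-scores only the `keep` best survivors at full pixel density."""
        tgt = curr[by:by + block, bx:bx + block]
        cands = []
        for dy in range(-rng, rng + 1):
            for dx in range(-rng, rng + 1):
                y, x = by + dy, bx + dx
                if 0 <= y <= ref.shape[0] - block and 0 <= x <= ref.shape[1] - block:
                    patch = ref[y:y + block, x:x + block]
                    cands.append((sparse_sad(tgt, patch, 2), (dx, dy)))
        survivors = sorted(cands)[:keep]
        full = [(sparse_sad(tgt, ref[by + dy:by + dy + block,
                                     bx + dx:bx + dx + block], 1), (dx, dy))
                for _, (dx, dy) in survivors]
        return min(full)[1]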
Robert M. Armitano, Georgia Institute of Technology (U.S.A.)
Ronald W. Schafer, Georgia Institute of Technology (U.S.A.)
Frederick L. Kitson, Georgia Institute of Technology (U.S.A.)
Vasudev Bhaskaran, Georgia Institute of Technology (U.S.A.)
Video coding standards use the block-matching algorithm (BMA) and motion-compensated prediction to reduce temporal redundancies present in image sequences. Block matching is used since it is computationally efficient and produces a minimal representation of the motion field that is transmitted as side information. In order to build a robust coder the motion-estimation technique must be able to track motion in a noisy source. The approach presented in this paper uses spatio-temporal motion prediction, providing an accurate motion estimate even in the presence of noise. With this approach, noisy sources can be compressed efficiently and robustly in standard video coders (e.g., MPEG-1, MPEG-2, H.261, and H.263) with little increase in complexity.
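One common way to realize spatio-temporal motion prediction, offered only as an assumed illustration, is a median over causal spatial neighbours and the co-located vector of the previous frame, used as the centre of a small refinement search; the median is what lets the estimate resist noise:

    import numpy as np

    def predict_mv(mv_field, prev_mv_field, bi, bj):
        """Component-wise median over causal spatial neighbours and the
        temporally co-located vector; fields have shape (rows, cols, 2)."""
        cands = [prev_mv_field[bi, bj]]              # temporal candidate
        for di, dj in [(0, -1), (-1, 0), (-1, 1)]:   # left, top, top-right
            i, j = bi + di, bj + dj
            if 0 <= i < mv_field.shape[0] and 0 <= j < mv_field.shape[1]:
                cands.append(mv_field[i, j])
        return np.median(np.array(cands), axis=0).round().astype(int)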
Eckehard G. Steinbach, University of Erlangen (Germany)
Bernd Girod, University of Erlangen (Germany)
Subhasis Chaudhuri, IIT Bombay (India)
Given two frames of a dynamic scene with several rigid body objects undergoing different motions in the three-dimensional space, we robustly estimate the motion and structure of each object. The least median of squares (LMedS) estimator is integrated into a robust 3D motion parameter estimation and scene structure recovery framework to deal with the multi-motion problem. Experimental results underline the capability of the approach to deal successfully with multi-component motion. We apply the approach presented in this paper to the problem of automatic insertion of artificial objects in real image sequences.
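A minimal LMedS sketch (ours, with a 2-D affine motion standing in for the paper's 3-D motion-and-structure parameters): fit a model to random minimal subsets of correspondences and keep the fit whose squared residuals have the smallest median, which tolerates up to half the points being outliers or belonging to another motion.

    import numpy as np

    def fit_affine(src, dst):
        """Least-squares 2-D affine motion from point correspondences."""
        M = np.hstack([src, np.ones((len(src), 1))])
        ax = np.linalg.lstsq(M, dst[:, 0], rcond=None)[0]
        ay = np.linalg.lstsq(M, dst[:, 1], rcond=None)[0]
        return np.vstack([ax, ay])

    def lmeds(src, dst, trials=200, seed=0):
        """Keep the fit whose squared residuals have the smallest median."""
        rng = np.random.default_rng(seed)
        M = np.hstack([src, np.ones((len(src), 1))])
        best_med, best_A = np.inf, None
        for _ in range(trials):
            idx = rng.choice(len(src), size=3, replace=False)  # minimal sample
            A = fit_affine(src[idx], dst[idx])
            res = ((M @ A.T - dst) ** 2).sum(axis=1)           # squared residuals
            med = np.median(res)
            if med < best_med:
                best_med, best_A = med, A
        return best_A, best_med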