Video and Multimedia


Audio as a Support to Scene Change Detection and Characterization of Video Sequences

Authors:

Caterina Saraceno, SCL-DEA/UNIBS (Italy)
Riccardo Leonardi, SCL-DEA/UNIBS (Italy)

Volume 4, Page 2597

Abstract:

A challenging problem in constructing video databases is the organization of video information. The development of algorithms able to organize video data according to its semantic content is becoming increasingly important, as it allows operations such as indexing and retrieval to work more efficiently. Until now, attempts to extract semantic information have relied on the video signal alone. Since a video sequence is a 2-D projection of a 3-D scene, video processing has shown its limitations, especially for problems such as object identification or object tracking, which reduces the ability to extract semantic characteristics. One way to overcome this limitation is to use additional information, and the associated audio signal is the most natural source. This paper presents a technique which combines video and audio information for classification and indexing purposes. The classification is performed on the audio signal; a general framework that uses the results of this classification is then proposed for organizing the video information.
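
A rough sketch of the kind of audio classification such a framework could build on is frame labelling from two classic low-level features, short-time energy and zero-crossing rate. The features and thresholds below are illustrative assumptions, not the authors' classifier:

```python
def classify_audio(frame):
    """Label an audio frame as silence / speech / music from two
    classic low-level features: short-time energy and zero-crossing
    rate (ZCR). Thresholds are illustrative, not from the paper."""
    energy = sum(s * s for s in frame) / len(frame)
    zcr = sum(1 for a, b in zip(frame, frame[1:])
              if (a < 0) != (b < 0)) / len(frame)
    if energy < 0.001:
        return "silence"
    # Speech alternates voiced/unvoiced segments and tends to show a
    # high ZCR; music is typically smoother. A single-frame proxy:
    return "speech" if zcr > 0.1 else "music"
```

A real system would of course aggregate many frames and use richer features before segmenting the video by audio class.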

ic972597.pdf




Motion and Shape Signatures for Object-Based Indexing of MPEG-4 Compressed Video

Authors:

Ahmet Müfit Ferman, University of Rochester (U.S.A.)
Bilge Günsel, University of Rochester (U.S.A.)
Ahmet Murat Tekalp, University of Rochester (U.S.A.)

Volume 4, Page 2601

Abstract:

The emerging MPEG-4 standard enables direct access to individual objects in the video stream, along with boundary/shape, texture, and motion information about each object. This paper proposes an object-based video indexing method that is directly applicable to MPEG-4 compressed video bitstreams. The method aims to provide object-based content interactivity, and thus defines the audio-visual object as the indexing unit. The scheme involves object-based temporal segmentation of the video bitstream, selection of key frames and key video object planes, and characterization of the motion and/or shape of each video object, including the background object. We also propose syntax and semantics for an indexing field to meet the content-based access requirement of MPEG-4. Experimental results are shown on two MPEG-4 test sequences.
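
The idea of a shape signature can be illustrated with a few elementary descriptors computed from a binary object mask (an alpha plane). These particular features are illustrative assumptions, not the signatures defined in the paper:

```python
def shape_signature(mask):
    """Elementary shape descriptors for a binary object mask:
    area, bounding-box fill ratio, and aspect ratio. Illustrative
    features only, not the paper's motion/shape signatures."""
    pixels = [(r, c) for r, row in enumerate(mask)
              for c, v in enumerate(row) if v]
    rows = [r for r, _ in pixels]
    cols = [c for _, c in pixels]
    h = max(rows) - min(rows) + 1
    w = max(cols) - min(cols) + 1
    area = len(pixels)
    return {"area": area, "fill": area / (h * w), "aspect": w / h}
```

Signatures of this kind, computed per video object plane, can then be compared across key frames to index objects by shape.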

ic972601.pdf




Using Feature Selection to aid an Iconic Search through an Image Database

Authors:

Kieron Messer, University of Surrey (U.K.)
Josef Kittler, University of Surrey (U.K.)

Volume 4, Page 2605

Abstract:

In this paper a method that facilitates an iconic query of an image/video database is presented. A query object is characterised by colour and texture properties. The same characteristics are computed locally for the database images. A statistical decision rule is then used to test for similarity between the iconically specified query and the database image descriptors. We show that by carefully selecting the set of descriptors the false alarm rate can be significantly reduced. The floating search feature selection method has been adapted to make it applicable to the hypothesis testing based query processing. The dimensionality reduction not only improves the performance but also enhances the computational efficiency of the method.
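
The floating search method referred to above can be sketched in its generic form: plain sequential floating forward selection over an arbitrary subset-scoring function. The paper's adaptation to hypothesis-testing-based query processing is not reproduced here:

```python
def sffs(features, score, k):
    """Sequential Floating Forward Selection: greedily add the best
    feature, then conditionally drop any feature whose removal
    improves the subset score (the 'floating' backward step)."""
    selected = []
    while len(selected) < k:
        # Forward step: add the feature that most improves the score.
        best = max((f for f in features if f not in selected),
                   key=lambda f: score(selected + [f]))
        selected.append(best)
        # Floating step: drop a feature if that actually helps.
        while len(selected) > 2:
            worst = max(selected,
                        key=lambda f: score([g for g in selected if g != f]))
            if score([g for g in selected if g != worst]) > score(selected):
                selected.remove(worst)
            else:
                break
    return selected
```

The backward step is what distinguishes floating search from plain greedy forward selection: a descriptor that looked useful early on can be discarded once better-complementing descriptors join the set.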

ic972605.pdf




Hidden Markov Model Parsing of Video Programs

Authors:

Wayne Wolf, Princeton University (U.S.A.)

Volume 4, Page 2609

Abstract:

This paper introduces statistical parsing of video programs using hidden Markov models (HMMs). The fundamental units of a video program are shots and transitions (fades, dissolves, etc.). Those units are in turn used to create more complex structures, such as scenes. Parsing a video allows us to recognize higher-level story abstractions: dialog sequences, transitional scenes, and so on. These higher-level story elements can be used to create summarizations of the programs, to recognize the most important parts of a program, and for many other purposes. Lexical analysis classifies shots; statistical parsing identifies most-likely state sequences, which translate into syntactic structures corresponding to story units.
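
The most-likely state sequence computation at the heart of such parsing is standard Viterbi decoding. Below is a minimal sketch with a hypothetical two-state story model (shot classes as observations); the model parameters are invented for illustration, not taken from the paper:

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most-likely hidden state sequence for an observation sequence
    (Viterbi decoding over a discrete-emission HMM)."""
    V = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        V.append({})
        back.append({})
        for s in states:
            prob, prev = max((V[t - 1][p] * trans_p[p][s] * emit_p[s][obs[t]], p)
                             for p in states)
            V[t][s] = prob
            back[t][s] = prev
    # Trace the best path backwards from the most probable final state.
    path = [max(states, key=lambda s: V[-1][s])]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]

# Hypothetical story model: "dialog" scenes favour close-up shots,
# "action" scenes favour wide shots.
states = ("dialog", "action")
start_p = {"dialog": 0.6, "action": 0.4}
trans_p = {"dialog": {"dialog": 0.8, "action": 0.2},
           "action": {"dialog": 0.3, "action": 0.7}}
emit_p = {"dialog": {"closeup": 0.7, "wide": 0.3},
          "action": {"closeup": 0.2, "wide": 0.8}}
labels = viterbi(["closeup", "closeup", "wide", "wide"],
                 states, start_p, trans_p, emit_p)
```

Here the lexical analysis step would supply the shot-class observations ("closeup", "wide", ...), and the decoded state sequence is the story-unit parse.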

ic972609.pdf




A Human-Machine Interface for Medical Image Analysis and Visualization in Virtual Environments

Authors:

Christian Krapichler, Institute of Medical Informatics and Health Services Research (Germany)
Michael Haubner, Institute of Medical Informatics and Health Services Research (Germany)
Andreas Lösch, Institute of Medical Informatics and Health Services Research (Germany)
Karl-Hans Englmeier, Institute of Medical Informatics and Health Services Research (Germany)

Volume 4, Page 2613

Abstract:

Virtual worlds open new dimensions in human-machine and even human-human communication. Medicine is predestined to benefit from this new technology in many ways. For the visualization and analysis of tomography data, an application is introduced which expedites the identification of spatial coherencies and the exploration of pathological regions. To facilitate work in such an environment and to avoid long periods of familiarization, a human-oriented interface is required that allows physicians to interact as naturally as in the real world. Hand gesture recognition (with a data glove) and eye tracking (using biosignals) are essential to fulfil this demand. Their integration into the virtual environment as two components of the human-machine interface is presented.

ic972613.pdf




Gaze Tracking for Multimodal Human-Computer Interaction

Authors:

Rainer Stiefelhagen, University of Karlsruhe (Germany)
Jie Yang, Carnegie Mellon University (U.S.A.)

Volume 4, Page 2617

Abstract:

This paper discusses the problem of gaze tracking and its applications to multimodal human-computer interaction. The function of a gaze tracking system can be either passive or active. For example, a system can identify a user's message target by monitoring the user's gaze, or the user can use gaze to directly control an application or launch actions. We have developed a real-time gaze tracking system that estimates the 3D position and rotation (pose) of a user's head. We demonstrate the applications of the gaze tracker to human-computer interaction with two examples. The first shows that the gaze tracker can help speech recognition systems by switching the language model and grammar based on the user's gaze; the second illustrates combining the gaze tracker with a speech recognizer to view a panoramic image.

ic972617.pdf




Digital Watermarking of MPEG-2 Coded Video in the Bitstream Domain

Authors:

Frank Hartung, University of Erlangen (Germany)
Bernd Girod, University of Erlangen (Germany)

Volume 4, Page 2621

Abstract:

Embedding information into multimedia data, also called watermarking, is a topic that has gained increased attention recently. For video broadcast applications, watermarking schemes operating on compressed video are desirable. We present a scheme for robust watermarking of MPEG-2 encoded video. The watermark is embedded into the MPEG-2 bitstream without increasing the bit rate, and can be retrieved even from the decoded video and without knowledge of the original, unwatermarked video. The scheme is robust and of much lower complexity than a complete decoding process followed by watermarking in the pixel domain and re-encoding. Although an existing MPEG-2 bitstream is partly altered, the scheme avoids visible artifacts by adding a drift compensation signal. The scheme has been implemented and the results confirm that a robust watermark can be embedded into MPEG encoded video which can be used to securely transmit arbitrary binary information at a data rate of several bytes/second. The scheme is also applicable to other hybrid coding schemes like MPEG-1, H.261, and H.263.
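
The underlying spread-spectrum idea can be sketched generically on a block of transform coefficients. This is only the classic embed/correlate scheme; the paper's contribution, performing it directly on the MPEG-2 bitstream with drift compensation, is not reproduced here:

```python
import random

def embed_watermark(coeffs, bits, strength=2.0, seed=42):
    """Spread-spectrum embedding: each payload bit modulates a
    pseudo-random +/-1 chip sequence added to the coefficients."""
    rng = random.Random(seed)
    chips = [rng.choice((-1.0, 1.0)) for _ in coeffs]
    spread = len(coeffs) // len(bits)       # chips per payload bit
    marked = list(coeffs)
    for i, bit in enumerate(bits):
        sign = 1.0 if bit else -1.0
        for j in range(i * spread, (i + 1) * spread):
            marked[j] += strength * sign * chips[j]
    return marked

def detect_watermark(marked, nbits, seed=42):
    """Blind correlation detector: recover each bit from the sign of
    the correlation with the same chip sequence (no original needed)."""
    rng = random.Random(seed)
    chips = [rng.choice((-1.0, 1.0)) for _ in marked]
    spread = len(marked) // nbits
    return [1 if sum(marked[j] * chips[j]
                     for j in range(i * spread, (i + 1) * spread)) > 0 else 0
            for i in range(nbits)]
```

Because detection only needs the chip sequence (i.e. the key), not the original video, this is a blind scheme, matching the retrieval property the abstract describes.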

ic972621.pdf




Error Concealment Improvements for MPEG-2 Using Enhanced Error Detection & Early Re-Synchronization

Authors:

Susanna Aign, DLR (Germany)

Volume 4, Page 2625

Abstract:

For digital TV transmission, the video signal has to be highly protected by channel coding, since it is very sensitive to channel disturbances. However, under bad reception conditions, residual errors in the video signal may still occur. Hence, error concealment techniques may be required at the receiver. The aim of this article is to study different error concealment techniques for MPEG-2 video sequences that exploit the remaining error-free part of the bitstream as much as possible. This is done by combining enhanced error detection in the channel decoder with early re-synchronization. In I-pictures, where no motion vectors (MVs) exist, the gain of early re-synchronization with enhanced error detection over other techniques is up to 2.3 dB.
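
A minimal example of the kind of concealment applicable in I-pictures, where no motion vectors are available, is spatial interpolation from correctly received neighbouring lines. This is a generic sketch of spatial concealment, not the article's exact method:

```python
def conceal_row(frame, lost_row):
    """Spatial concealment for an I-picture: replace a lost row of
    pixels by averaging the rows above and below it. `frame` is a
    list of pixel rows; the damaged row index comes from the error
    detector. Earlier re-synchronization shrinks the damaged span,
    leaving more intact rows to interpolate from."""
    above = frame[lost_row - 1]
    below = frame[lost_row + 1]
    frame[lost_row] = [(a + b) // 2 for a, b in zip(above, below)]
    return frame
```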

ic972625.pdf




MPEG-2 nonlinear temporally scalable coding and hybrid quantization

Authors:

Sadik Bayrakeri, Ga. Tech, Atlanta (U.S.A.)
Russell M. Mersereau, Ga. Tech, Atlanta (U.S.A.)

Volume 4, Page 2629

Abstract:

In this paper, we investigate the MPEG-2 temporal scalability syntax and introduce a new approach to temporally scalable coding. Temporal scalability is provided by employing various nonlinear prediction and demultiplexing schemes. A nonlinear deinterlacing algorithm is presented, and related issues in interlaced, progressive and mixed-mode video processing are addressed. In addition to the considered scalability techniques, a lookahead quantization scheme is presented for P- and B-type picture coding, which improves coding performance by selectively combining DCT-domain scalar quantization and entropy-constrained vector quantization. A remarkable performance improvement over simulcast coding is achieved.
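
Nonlinear deinterlacing is commonly built on a median filter; a per-pixel sketch of that general class of algorithm (an assumption about the approach, not the paper's specific scheme) is:

```python
def deinterlace_pixel(above, below, prev_same):
    """Median-filter deinterlacing of one missing pixel: the median
    of the line above, the line below, and the co-sited pixel of the
    previous field. In static areas the temporal sample wins; under
    motion the result falls back toward spatial interpolation."""
    return sorted((above, below, prev_same))[1]

def deinterlace_field(line_above, line_below, prev_line):
    """Interpolate one missing line of a field, pixel by pixel."""
    return [deinterlace_pixel(a, b, p)
            for a, b, p in zip(line_above, line_below, prev_line)]
```

The median is what makes the operator nonlinear: unlike a linear average, it adapts per pixel without an explicit motion detector.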

ic972629.pdf




Transcoding Of MPEG-2 Video In The Frequency Domain

Authors:

Pedro A.A. Assunção, University of Essex (U.K.)
Mohammed Ghanbari, University of Essex (U.K.)

Volume 4, Page 2633

Abstract:

Video transcoding techniques offer the possibility of matching coded video to transmission channels of lower capacity by reducing the bit rate of compressed bit streams. In this paper we propose a new frequency-domain video transcoder for bit rate reduction of compressed bit streams. A motion compensation (MC) loop, operating in the frequency domain, is used for drift compensation at reduced computational complexity. We derive approximate matrices for fast computation of the MC blocks in the frequency domain. By using Lagrangian optimisation to calculate the best quantiser scales for transcoding, we show that pictures transcoded from a high-quality bit stream are better than those encoded from the original frames at the same reduced bit rates.
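
At its simplest, bit rate reduction in the DCT domain amounts to requantising the coefficients with a coarser step. The open-loop version below omits the drift-compensating MC loop and the Lagrangian quantiser selection that the paper adds:

```python
def requantise(levels, q_in, q_out):
    """Open-loop DCT-domain requantisation: dequantise each coded
    level with the incoming step q_in, then requantise with the
    coarser outgoing step q_out. Without a drift-compensation loop,
    the accumulated requantisation error in predicted pictures is
    exactly the drift the paper's MC loop corrects."""
    return [round(level * q_in / q_out) for level in levels]
```

Coarser levels cost fewer bits under entropy coding, which is where the rate reduction comes from.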

ic972633.pdf




Optimisation of Two-Layer SNR Scalability for MPEG-2 Video

Authors:

David Wilson, University of Essex (U.K.)
Mohammed Ghanbari, University of Essex (U.K.)

Volume 4, Page 2637

Abstract:

SNR scalability, used in two-layer video coding, guarantees good base quality pictures at the expense of increased overall bit-rate. By understanding the inherent inefficiencies of enhancement layer coding we have developed an optimisation method called optimal coefficient adjustment in order to reduce overall bit-rates to levels consistent with single-layer operation.

ic972637.pdf




Non-linear predictive rate control for constant bit rate MPEG video coders

Authors:

Yoo-Sok Saw, University of Edinburgh (U.K.)
Peter M. Grant, University of Edinburgh (U.K.)
John Grant, University of Edinburgh (U.K.)
Bernard Mulgrew, University of Edinburgh (U.K.)

Volume 4, Page 2641

Abstract:

A nonlinear predictive approach has been employed in MPEG (Moving Picture Experts Group) video transmission in order to improve the rate control performance of the video encoder. A nonlinear prediction and quantisation technique has been applied to the video rate control which employs a transmission buffer for constant bit rate video transmission. A radial basis function (RBF) network has been adopted as a video rate estimator to predict the rate value of a picture in advance of encoding. The quantiser control surfaces based on nonlinear equations, which map both estimated and current buffer occupancies to a suitable quantisation step size, have also been used to achieve quicker responses to dramatic video rate variation. This scheme aims to adequately accommodate non-stationary video in the limited capacity of the buffer. Performance has been evaluated in comparison to the MPEG2 Test Model 5 (TM5) in terms of the buffer occupancy and picture quality.
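
The control surface described above maps buffer occupancy and a rate prediction to a quantisation step. The sketch below uses a simple quadratic surface in place of the paper's RBF-driven design; both the surface shape and the parameter ranges are illustrative assumptions:

```python
def quantiser_step(buffer_now, rate_pred, buffer_size, q_min=2, q_max=31):
    """Map the current buffer occupancy plus the predicted bits of
    the next picture (in a real system, from the RBF rate estimator)
    to an MPEG quantiser scale in [q_min, q_max]: the fuller the
    buffer is projected to become, the coarser the quantiser."""
    projected = min(1.0, max(0.0, (buffer_now + rate_pred) / buffer_size))
    # Nonlinear (quadratic) control surface: react gently when the
    # buffer is nearly empty, sharply as it approaches overflow.
    return round(q_min + (q_max - q_min) * projected ** 2)
```

Acting on the *predicted* occupancy rather than the current one is what gives the quicker response to sudden rate variation that the abstract describes.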

ic972641.pdf
