Chair: Antonio Ortega, University of Southern California, USA
Timothy J Wark, Queensland University of Technology (Australia)
Sridha Sridharan, Queensland University of Technology (Australia)
This paper presents a novel technique for the tracking and extraction of features from lips for the purpose of speaker identification. In noisy or otherwise adverse conditions, the performance of identification from the speech signal can degrade significantly, so additional information that complements the speech signal is of particular interest. In our system, syntactic information is derived from chromatic information in the lip region. A model of the lip contour is formed directly from the syntactic information, with no minimization procedure required to refine estimates. Colour features are then extracted from the lips via profiles taken around the lip contour. The lip features are further improved via linear discriminant analysis (LDA). Speaker models are built from the lip features based on the Gaussian Mixture Model (GMM). Identification experiments are performed on the M2VTS database, with encouraging results.
Rakesh Mohan, IBM (U.S.A.)
We present a novel scheme to match a video clip against a large database of videos. Unlike previous schemes that match videos based on image similarity, this scheme matches videos based on similarity of temporal activity, i.e., it finds similar 'actions.' Furthermore, it provides precise temporal localization of the actions in the matched videos. Video sequences are represented as a sequence of feature vectors called fingerprints. The fingerprint of the query video is matched against the fingerprints of videos in a database using sequential matching. The fingerprints are computed directly from compressed MPEG videos. The matching is much faster than real-time. We have used this scheme to find similar actions in sporting events, such as diving and baseball. Keywords: video matching, video search, video databases.
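The sequential matching step can be pictured as sliding the query fingerprint along a database fingerprint and keeping the best alignment, which also yields the temporal location of the match. The sketch below is only an illustration under stated assumptions: the abstract does not specify the feature vectors or the distance measure, so the per-frame L1 distance and the names `frame_distance` and `best_match` are hypothetical.

```python
from typing import List, Sequence, Tuple

# Assumed layout: a fingerprint is a list with one feature vector per frame.
Fingerprint = List[Sequence[float]]


def frame_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """L1 distance between two frame feature vectors (an assumed measure)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def best_match(query: Fingerprint, candidate: Fingerprint) -> Tuple[int, float]:
    """Slide the query over the candidate and return (offset, mean distance)
    of the alignment with the lowest mean per-frame distance."""
    n, m = len(query), len(candidate)
    if n == 0 or n > m:
        raise ValueError("query must be non-empty and no longer than candidate")
    best_offset, best_cost = 0, float("inf")
    for offset in range(m - n + 1):
        cost = sum(
            frame_distance(q, candidate[offset + i]) for i, q in enumerate(query)
        ) / n
        if cost < best_cost:
            best_offset, best_cost = offset, cost
    return best_offset, best_cost
```

The returned offset is the frame index at which the query action starts in the candidate video; a database search would repeat this over every stored fingerprint and rank the results by distance.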
Jeho Nam, University of Minnesota (U.S.A.)
Ahmed H. Tewfik, University of Minnesota (U.S.A.)
We present a novel motion-based video indexing scheme for fast content-based browsing and retrieval in a video database. The proposed technique constructs a dictionary of prototype objects to support query by motion. The first step in our approach extracts moving objects by analyzing layered images constructed from the coarse data in a 3-D wavelet decomposition of the video sequence. These images capture motion information only. Moving objects are modeled as collections of interconnected rigid polygonal shapes in the motion sequences that we derive from the wavelet representation. The motion signatures of an object are computed from the rotational and translational motions associated with the elemental polygons that form the object. These signatures are finally stored as potential query terms.
Xia Wan, University of Southern California (U.S.A.)
C. C. Jay Kuo, University of Southern California (U.S.A.)
We propose a multiresolution color feature extraction scheme based on an octree data structure to achieve efficient and robust image retrieval. With the proposed method, multiple color features, including the dominant color, the number of distinctive colors and the color histogram, can be naturally integrated into one framework. A selective filtering strategy is also described to speed up the retrieval process. Retrieval examples are given to illustrate the performance of the proposed approach.
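One way to see how an octree can hold all three features in one framework: at depth d, each pixel falls into the node addressed by the top d bits of its R, G and B values, and a coarser level is obtained by merging the eight children of each node. The sketch below is a minimal illustration, not the authors' implementation; it assumes 8-bit RGB pixels and treats "number of distinctive colors" as the number of occupied nodes. The function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Color = Tuple[int, int, int]  # 8-bit (R, G, B), an assumption of this sketch


def octree_histogram(pixels: Iterable[Color], depth: int) -> Counter:
    """Count pixels per octree node at the given depth (1 = coarsest, 8 = finest)."""
    if not 1 <= depth <= 8:
        raise ValueError("depth must be between 1 and 8")
    shift = 8 - depth
    return Counter((r >> shift, g >> shift, b >> shift) for r, g, b in pixels)


def coarsen(hist: Counter) -> Counter:
    """Move one level up the octree by merging each node's eight children."""
    parent: Counter = Counter()
    for (r, g, b), count in hist.items():
        parent[(r >> 1, g >> 1, b >> 1)] += count
    return parent


def color_features(hist: Counter) -> Dict[str, object]:
    """Dominant node, number of occupied nodes, and the histogram itself."""
    dominant, _ = hist.most_common(1)[0]
    return {"dominant": dominant, "distinct": len(hist), "histogram": dict(hist)}
```

Because `coarsen` reproduces the histogram of the next coarser depth, a retrieval system could compare images at a coarse level first and descend to finer levels only for promising candidates.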
Zixiang Xiong, University of Hawaii (U.S.A.)
Beong-Jo Kim, Rensselaer Polytechnic Institute (U.S.A.)
William A Pearlman, Rensselaer Polytechnic Institute (U.S.A.)
We address multiresolutional encoding and decoding within the embedded zerotree wavelet (EZW) framework for both images and video. By varying a resolution parameter, one can obtain decoded images at different resolutions from one single encoded bitstream, which is already rate scalable for EZW coders. Similarly one can decode video sequences at different rates and different spatial and temporal resolutions from one bitstream. Furthermore, a layered bitstream can be generated with multiresolutional encoding, from which the higher resolution layers can be used to increase the spatial/temporal resolution of the images/video obtained from the low resolution layer. In other words, we have achieved full scalability in rate and partial scalability in space and time. This added spatial/temporal scalability is significant for emerging multimedia applications such as fast decoding, image/video database browsing, telemedicine, multipoint video conferencing, and distance learning.
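The idea of decoding several resolutions from one set of wavelet coefficients can be illustrated with a one-dimensional Haar transform: skipping the finest detail bands during reconstruction halves the output length for each band skipped. This is only a sketch under simplifying assumptions. It uses an unnormalized Haar averaging transform, not the EZW coder, and performs no quantization or entropy coding; the function names and the `drop_finest` parameter are hypothetical.

```python
from typing import List, Sequence, Tuple


def haar_decompose(
    signal: Sequence[float], levels: int
) -> Tuple[List[float], List[List[float]]]:
    """Return (coarsest approximation, detail bands ordered finest first)."""
    approx = list(signal)
    details: List[List[float]] = []
    for _ in range(levels):
        if len(approx) % 2:
            raise ValueError("length must be divisible by 2**levels")
        pairs = list(zip(approx[0::2], approx[1::2]))
        details.append([(a - b) / 2 for a, b in pairs])
        approx = [(a + b) / 2 for a, b in pairs]
    return approx, details


def haar_reconstruct(
    approx: Sequence[float], details: Sequence[Sequence[float]], drop_finest: int = 0
) -> List[float]:
    """Invert the transform, ignoring the `drop_finest` finest detail bands.
    Each ignored band halves the length of the reconstructed signal."""
    out = list(approx)
    for detail in reversed(details[drop_finest:]):
        nxt: List[float] = []
        for s, d in zip(out, detail):
            nxt.extend((s + d, s - d))
        out = nxt
    return out
```

With `drop_finest=0` the original signal is recovered exactly; larger values give progressively lower-resolution versions from the same coefficients, which is the behaviour the resolution parameter in the abstract controls.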
Dimitrios Androutsos, University of Toronto (Canada)
Kostas N Plataniotis, Ryerson Polytechnic University (Canada)
Anastasios N Venetsanopoulos, University of Toronto (Canada)
We present a technique for coarsely extracting the regions of natural color images that contain directional detail, e.g., edges and texture, which we then use for image database indexing. As a measure of color activity, we use a perceptually modified distance measure based on the sum-of-angles criterion. We then apply histogram thresholding techniques to separate the image into smooth color regions and busy regions where edge, texture and color activity exist. Database indices are then created from the busy regions using the directional detail histogram technique, and retrieval is performed using these indices.
Edwin A Heredia, Thomson Consumer Electronics (U.S.A.)
With the arrival of terrestrial digital TV, a distribution network able to deliver up to 19 Mbits/s in each of the physical transmission channels will become available. Using the adopted data broadcast protocols, simultaneous transmission of multimedia documents to large population segments can be achieved. While these protocols describe methods for recognizing files in data streams, no method is yet known for distributing large collections of files across one or more data streams. This paper addresses this problem. The proposed method allocates objects to multiple streams according to their sizes and access probabilities, in such a way that the average access latency is minimized. We show that the minimization problem can be described as a particular form of the NP-hard quadratic allocation model, for which an algorithmic solution for finding local minima exists.
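To make the allocation problem concrete, the sketch below uses an assumed latency model that is not taken from the paper: each stream repeats its objects cyclically at a fixed rate, so the expected wait for an object is half the cycle time of its stream. The average latency is then the sum, over streams, of (total access probability in the stream) times (total size in the stream) divided by twice the rate, which is quadratic in the assignment. A simple single-move local search, also an assumption and not the paper's algorithm, finds a local minimum.

```python
from typing import List, Sequence, Tuple


def average_latency(
    assignment: Sequence[int],
    sizes: Sequence[float],
    probs: Sequence[float],
    n_streams: int,
    rate: float = 1.0,
) -> float:
    """Expected wait under the assumed cyclic-broadcast model (half a cycle)."""
    load = [0.0] * n_streams    # total object size per stream
    weight = [0.0] * n_streams  # total access probability per stream
    for obj, stream in enumerate(assignment):
        load[stream] += sizes[obj]
        weight[stream] += probs[obj]
    return sum(w * l / (2.0 * rate) for w, l in zip(weight, load))


def local_search(
    sizes: Sequence[float], probs: Sequence[float], n_streams: int, rate: float = 1.0
) -> Tuple[List[int], float]:
    """Start with every object in stream 0 and move one object at a time
    while a move strictly lowers the average latency."""
    assignment = [0] * len(sizes)
    best = average_latency(assignment, sizes, probs, n_streams, rate)
    improved = True
    while improved:
        improved = False
        for obj in range(len(sizes)):
            current = assignment[obj]
            for stream in range(n_streams):
                if stream == current:
                    continue
                assignment[obj] = stream
                cost = average_latency(assignment, sizes, probs, n_streams, rate)
                if cost < best - 1e-12:
                    best, current, improved = cost, stream, True
                else:
                    assignment[obj] = current  # undo the trial move
    return assignment, best
```

In a small case with two large, rarely requested objects and two small, popular ones, this search separates the two groups into different streams, so the popular objects are not delayed by the large ones.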
Houngjyh Wang, University of Southern California (U.S.A.)
C. C. Jay Kuo, University of Southern California (U.S.A.)
The design of an integrated image coding and watermarking system based on the wavelet transform is examined in this work. First, the multi-threshold wavelet codec (MTWC) is used for image compression. Unlike other embedded wavelet coders, which use a single initial threshold in their successive approximation quantization (SAQ), MTWC adopts different initial thresholds in different subbands. MTWC achieves a superior rate-distortion tradeoff with low computational complexity. Then, a non-invertible progressive watermark scheme is incorporated in MTWC for copyright protection. This watermark scheme uses the user input data to produce a Gaussian-distributed pseudorandom watermark in the wavelet domain. The performance of the proposed watermark technology is supported by experimental results.
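The step of turning user input into a Gaussian pseudorandom watermark can be sketched as seeding a random number generator from the user's data and drawing one Gaussian value per coefficient. Everything beyond that is an assumption for illustration: the SHA-256 seeding, the additive embedding with strength `alpha`, and the correlation detector are common textbook choices, not details given in the abstract, and the sketch does not model the non-invertibility or the progressive property of the proposed scheme.

```python
import hashlib
import random
from typing import List, Sequence


def gaussian_watermark(user_key: str, length: int) -> List[float]:
    """Derive a reproducible zero-mean, unit-variance Gaussian sequence from a key."""
    seed = int.from_bytes(hashlib.sha256(user_key.encode("utf-8")).digest(), "big")
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(length)]


def embed(coeffs: Sequence[float], user_key: str, alpha: float = 1.0) -> List[float]:
    """Additive embedding (assumed): marked = coefficient + alpha * watermark."""
    mark = gaussian_watermark(user_key, len(coeffs))
    return [c + alpha * w for c, w in zip(coeffs, mark)]


def detector_response(coeffs: Sequence[float], user_key: str) -> float:
    """Mean correlation between the coefficients and the key's watermark.
    It is close to alpha when the key matches and close to zero otherwise."""
    mark = gaussian_watermark(user_key, len(coeffs))
    return sum(c * w for c, w in zip(coeffs, mark)) / len(coeffs)
```

A detector would compare the response against a threshold chosen for the desired false-alarm rate; the same key always regenerates the same watermark, so the watermark itself never needs to be stored.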
Jack B Lacy, AT&T Labs (U.S.A.)
Schuyler R. Quackenbush, AT&T Labs (U.S.A.)
Amy R Reibman, AT&T Labs (U.S.A.)
David H Shur, AT&T Labs (U.S.A.)
James H Snyder, AT&T Labs (U.S.A.)
A watermark is a data stream inserted into multimedia content. It contains information relevant to the ownership or authorized use of the content. A watermark that can be recovered without a priori knowledge of the identity of the content could be used by web search mechanisms to flag unauthorized distribution of the content. Since media on web sites will be compressed, a mark detection algorithm that operates in the compressed domain would be useful. In this paper we describe a watermark algorithm that operates in the compressed domain and does not require a reference.
Constantine Kotropoulos, Aristotle University of Thessaloniki (Greece)
Anastasios Tefas, Aristotle University of Thessaloniki (Greece)
Ioannis Pitas, Aristotle University of Thessaloniki (Greece)
Two novel variants of Dynamic Link Architecture are proposed, namely the Morphological Dynamic Link Architecture and the Morphological Signal Decomposition-Dynamic Link Architecture. Both are based on mathematical morphology and incorporate local coefficients that weigh the contribution of each node according to its discriminatory power in elastic graph matching. They are tested for face authentication in a cooperative scenario where the candidates claim an identity to be checked. Their performance is evaluated in terms of their receiver operating characteristic and the Equal Error Rate achieved on the M2VTS database. An Equal Error Rate of 6.6% to 6.8% is reported.
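For readers unfamiliar with the metric, the Equal Error Rate is the operating point at which the false acceptance rate (impostors accepted) equals the false rejection rate (genuine clients rejected). The sketch below estimates it from two lists of scores. It assumes similarity scores where a higher value means a better match (distance-based matchers would negate their scores first), and the function name is hypothetical; it is not the evaluation code used by the authors.

```python
from typing import Sequence


def equal_error_rate(genuine: Sequence[float], impostor: Sequence[float]) -> float:
    """Sweep every observed score as a threshold (accept if score >= threshold)
    and return the mean of FAR and FRR where the two are closest."""
    if not genuine or not impostor:
        raise ValueError("both score lists must be non-empty")
    best_gap, eer = float("inf"), 1.0
    for threshold in sorted(set(genuine) | set(impostor)):
        far = sum(s >= threshold for s in impostor) / len(impostor)
        frr = sum(s < threshold for s in genuine) / len(genuine)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer
```

Sweeping the same thresholds and recording each (FAR, FRR) pair, rather than only the closest one, would trace out the receiver operating characteristic mentioned in the abstract.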