Authors:
Kunio Kashino,
Gavin Smith,
Hiroshi Murase,
Page (NA) Paper number 1792
Abstract:
This paper proposes a search method that can quickly detect and locate
a known sound (video) in a long audio (video) stream. The method is based
on active search. Active search reduces the number of candidate matches
between reference and input signals by approximately 10 to 100 times
compared to exhaustive search, while guaranteeing the same retrieval
accuracy. We proposed a quick search method in our previous paper,
and here we focus on improving its accuracy. To that end, the feature
used has been extended to the audio power spectrum, and temporal division
of the histogram windows has been introduced to incorporate time information.
Tests carried out under practical circumstances clearly show the accuracy
improvement. The proposed method is still so fast that it can correctly
retrieve a 15-s commercial in a 6-h recording of TV broadcasting within
2 s, once the features are calculated.
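The abstract does not spell out the skipping rule, so the following is only a minimal Python sketch of histogram-based active search under stated assumptions: features are already quantized into integer symbols, similarity is the histogram intersection count, and the names `active_search` and `min_matches` are hypothetical. Shifting the window by one symbol can raise the intersection count by at most one, so a window that falls short of the required count by k cannot be followed by a match within the next k - 1 positions. That bound is what lets the search skip candidates without losing any match that exhaustive search would find.

```python
from collections import Counter


def intersection_count(h_ref, h_win):
    """Histogram intersection: number of symbols the two windows share."""
    return sum(min(count, h_win[symbol]) for symbol, count in h_ref.items())


def active_search(reference, stream, min_matches):
    """Find windows of `stream` whose histogram matches `reference`.

    `min_matches` (hypothetical parameter) is the smallest intersection
    count accepted as a match. Returns (hits, evaluated), where hits is a
    list of (position, count) and evaluated is the number of windows
    actually compared.
    """
    n = len(reference)
    h_ref = Counter(reference)
    hits = []
    evaluated = 0
    pos = 0
    while pos + n <= len(stream):
        count = intersection_count(h_ref, Counter(stream[pos:pos + n]))
        evaluated += 1
        if count >= min_matches:
            hits.append((pos, count))
            pos += 1
        else:
            # The count can grow by at most 1 per one-symbol shift, so the
            # next possible match is (min_matches - count) positions ahead.
            pos += min_matches - count
    return hits, evaluated


reference = [1, 2, 3, 4]
stream = [0, 0, 0, 0, 0, 0, 1, 2, 3, 4, 0, 0]
hits, evaluated = active_search(reference, stream, min_matches=4)
```

On this toy stream the sketch evaluates 5 of the 9 candidate windows and still finds the single exact match at position 6. For brevity it rebuilds each window histogram from scratch, which a practical implementation would avoid.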
Authors:
Stefan Eickeler,
Stefan Müller,
Page (NA) Paper number 1683
Abstract:
This paper presents a new approach to content-based video indexing
using Hidden Markov Models (HMMs). In this approach one feature vector
is calculated for each image of the video sequence. These feature vectors
are modeled and classified using HMMs. This approach has many advantages
compared to other video indexing approaches. The system has automatic
learning capabilities. It is trained by presenting manually indexed
video sequences. To improve the system we use a video model that allows
the classification of complex video sequences. The presented approach
works three times faster than real-time. We tested our system on TV
broadcast news. The rate of 97.3% correctly classified frames shows
the efficiency of our system.
Authors:
Tong Zhang,
C.-C. Jay Kuo,
Page (NA) Paper number 1600
Abstract:
A hierarchical system for audio classification and retrieval based
on audio content analysis is presented in this paper. The system consists
of three stages. The first stage is called the coarse-level audio segmentation
and classification, where audio recordings are segmented and classified
into speech, music, several types of environmental sounds, and silence,
based on morphological and statistical analysis of temporal curves
of short-time features of audio signals. In the second stage, environmental
sounds are further classified into finer classes such as applause,
rain, birds' sound, etc. This fine-level classification is based on
time-frequency analysis of audio signals and use of the hidden Markov
model (HMM) for classification. In the third stage, the query-by-example
audio retrieval is implemented where similar sounds can be found according
to an input sample audio. It is shown that the proposed system has
achieved an accuracy higher than 90% for coarse-level audio classification.
Examples of audio fine classification and audio retrieval are also
provided.
Authors:
Yasuyuki Nakajima,
Yang Lu,
Masaru Sugano,
Akio Yoneyama,
Hiromasa Yanagihara,
Akira Kurematsu,
Page (NA) Paper number 2299
Abstract:
Audio information classification has become a very important task for
purposes such as automatic keyword spotting and other content-based
audio-visual query systems. In this paper, we describe a fast and accurate
audio data classification method in the MPEG coded data domain. First,
silent segments are detected using an approach that is robust to different
recording conditions. Then the non-silent segments are classified into
three types (music, speech, and applause) using the temporal density, bandwidth,
and center frequency of subband energy. To be as robust as possible to a
variety of audio sources, we use a Bayes discriminant
function for the multivariate Gaussian distribution instead of manually
adjusting a threshold for each discriminator. In the experiment, each
one-second segment of MPEG audio data is classified, and about 90% of audio and
speech segments are successfully detected. As for the detection
speed, less than 20% of the processing power of MPEG audio decoding is required.
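For reference, the standard Bayes discriminant function for multivariate Gaussian class densities can be written as below. The notation is an assumption, not taken from the paper: x is the feature vector of a segment (for instance, its temporal density, bandwidth, and center frequency), and each class i has mean vector mu_i, covariance matrix Sigma_i, and prior probability P(omega_i), all estimated from training data. The class-independent constant is omitted.

```latex
g_i(\mathbf{x}) = -\tfrac{1}{2}\,(\mathbf{x}-\boldsymbol{\mu}_i)^{\top}\,\boldsymbol{\Sigma}_i^{-1}\,(\mathbf{x}-\boldsymbol{\mu}_i)
                  \;-\; \tfrac{1}{2}\,\ln\lvert\boldsymbol{\Sigma}_i\rvert
                  \;+\; \ln P(\omega_i),
\qquad
\hat{\imath} = \arg\max_{i}\; g_i(\mathbf{x})
```

A segment is assigned to the class with the largest g_i(x), so the decision boundaries follow from the estimated statistics rather than from hand-tuned thresholds.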
Authors:
Jose A Lay,
Ling Guan,
Page (NA) Paper number 1221
Abstract:
With the increasing popularity of the use of compressed images, an
intuitive approach for lowering computational complexity towards a
practically efficient image retrieval system is to propose a scheme
that is able to perform retrieval computation directly in the compressed
domain. In this paper, we investigate the use of energy histograms
of the low frequency DCT coefficients as features for the retrieval
of DCT compressed images. We propose a feature set that is able to
identify similarities despite changes in image representation caused by several
lossless DCT transformations. We then use the features to construct
an image retrieval system based on the real-time image retrieval model.
We observe that the proposed features are sufficient for performing
high-level retrieval on medium-size image databases. By introducing
transpositional symmetry, the features can be made to accommodate
several lossless DCT transformations such as horizontal and vertical
mirroring, rotating, transposing, and transversing.
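The abstract does not define the feature set, so the following Python sketch only illustrates the DCT properties that make such invariance possible. Transposing a block transposes its DCT coefficients, and mirroring a block only flips the signs of some coefficients. Energy pooled over the coefficient pair (u, v) and (v, u) is therefore unchanged by both operations. The helper names `dct2` and `symmetric_energy` are hypothetical, and the naive transform is written for clarity rather than speed.

```python
import math


def dct2(block):
    """Naive orthonormal 2-D DCT-II of a square block (list of rows)."""
    n = len(block)

    def alpha(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            total = 0.0
            for x in range(n):
                for y in range(n):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            out[u][v] = alpha(u) * alpha(v) * total
    return out


def symmetric_energy(coeffs, u, v):
    """Energy pooled over the coefficient pair (u, v) and (v, u)."""
    return coeffs[u][v] ** 2 + coeffs[v][u] ** 2


block = [[1, 2, 3, 4], [5, 6, 7, 9], [2, 0, 1, 3], [8, 1, 4, 6]]
transposed = [list(row) for row in zip(*block)]
mirrored = [row[::-1] for row in block]

base = symmetric_energy(dct2(block), 0, 1)
after_transpose = symmetric_energy(dct2(transposed), 0, 1)
after_mirror = symmetric_energy(dct2(mirrored), 0, 1)
```

Up to floating-point rounding, the three energies agree. A histogram built from such pooled energies would not change under these transformations.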
Authors:
Yu-Len Huang, Department of Computer Science and Information Engineering, National Chung Cheng University, Taiwan, R.O.C. (Taiwan)
Ruey-Feng Chang, Department of Computer Science and Information Engineering, National Chung Cheng University, Taiwan, R.O.C. (Taiwan)
Page (NA) Paper number 1241
Abstract:
The multiresolution wavelet transform has been shown to be an effective
technique and has achieved very good performance for texture analysis.
However, a large number of images are compressed by methods based
on the discrete cosine transform (DCT). Hence, decompression
by inverse DCT is needed to obtain texture features based on the
wavelet transform for a DCT-coded image. This paper proposes the
use of the multiresolution reordered features for texture analysis.
The proposed features are directly generated by using the DCT coefficients
from the DCT-coded image. Comparisons with the subband-energy features
extracted from the wavelet transform and from the conventional DCT, using the Brodatz
texture database, indicate that the proposed method provides the best
texture pattern retrieval accuracy and a much better correct
classification rate. The proposed DCT-based features are expected to
be very useful and efficient for texture pattern retrieval and classification
in large DCT-coded image databases. Detailed simulation results can
be found on the web page: http://www.cs.ccu.edu.tw/~hyl/mrdct/.
Authors:
Yining Deng,
B. S. Manjunath,
Page (NA) Paper number 2075
Abstract:
In this work, an efficient low-dimensional color indexing scheme for
region-based image retrieval is presented. The colors in each image
region are first quantized so that only a small number of cluster centroids
are needed to represent the region color information. The proposed
color feature descriptor consists of these quantized colors and their
percentages in the region. A similarity distance measure is defined
and shown to be equivalent to the quadratic color histogram distance
measure. The quantized colors are indexed in the 3-D color space so
that high-dimensional indexing can be avoided. During the search process,
each quantized color in the query is used as a separate cue to find
matches containing that color. The matches from all the query colors
are then joined to obtain the final retrievals. Experimental results
show that the proposed scheme is fast and accurate compared to the
color histogram approach.
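The abstract does not give the distance formula, so the following Python sketch only shows how a quadratic-form histogram distance can be evaluated directly on compact descriptors made of (color, percentage) pairs. The similarity coefficient in `color_similarity`, its linear falloff, the cutoff `d_max`, and the use of Euclidean distance between color triples are all assumptions made for illustration.

```python
import math


def color_similarity(c1, c2, d_max):
    """Hypothetical similarity coefficient: 1 for identical colors,
    falling linearly to 0 at Euclidean distance d_max."""
    return max(0.0, 1.0 - math.dist(c1, c2) / d_max)


def quadratic_distance_sq(desc_a, desc_b, d_max=50.0):
    """Squared quadratic-form distance between two descriptors.

    Each descriptor is a list of (color, percentage) pairs. Descriptor A
    contributes positive weights and descriptor B negative weights, which
    is the histogram difference restricted to the colors actually present.
    """
    signed = [(c, p) for c, p in desc_a] + [(c, -q) for c, q in desc_b]
    total = 0.0
    for c1, w1 in signed:
        for c2, w2 in signed:
            total += color_similarity(c1, c2, d_max) * w1 * w2
    return total


region_a = [((255, 0, 0), 0.6), ((0, 0, 255), 0.4)]
region_b = [((250, 5, 0), 0.6), ((0, 0, 255), 0.4)]
region_c = [((0, 255, 0), 1.0)]

near = quadratic_distance_sq(region_a, region_b)
far = quadratic_distance_sq(region_a, region_c)
```

Only the few quantized colors of each region enter the double sum, so the cost depends on the descriptor size rather than on the number of bins in a full color histogram.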
Authors:
Elif Albuz,
Erturk D Kocalar,
Ashfaq A Khokhar,
Page (NA) Paper number 2208
Abstract:
This paper presents an efficient content-based indexing and retrieval
mechanism based on vector wavelet coefficients of color images. We
use highly decorrelated wavelet coefficient planes to acquire a search
efficient feature space. The feature space is subsequently indexed
using properties of all the images in the database. Therefore the
feature key of an image corresponds not only to the content of
the image itself but also to how much the image differs from the
other images stored in the database. The search time depends
only on the number of images similar to the query image but not on
the size of the entire database. The system is scalable and provides
fast retrievals. We show that in a database of 1000 images, a query search
takes less than 50 ms on a 266 MHz Pentium processor, compared to
several seconds of retrieval time in earlier systems proposed in
the literature.