There are many diverse areas where CBIR can play a key role in the use of
images[1]:
Art galleries and museum management
Architectural and engineering design
Interior design
Remote sensing and natural resource management
Geographic information systems
Scientific database management
Weather forecasting
Retailing
Fabric and fashion design
Trademark and copyright database management
Law enforcement and criminal investigation
Picture archiving and communication systems
Education
Entertainment
With so many applications, CBIR has attracted the attention of researchers
across several disciplines.
Content-based image retrieval is the semantic analysis and retrieval of images. Semantic analysis may involve manual intervention, or it may be entirely automated. Manual analysis involves human interpretation to associate semantic properties with an image. Automated semantic analysis extracts image features that are correlated with some semantic meaning of the image. Both analysis methods have their advantages and their drawbacks.
When the data are images rather than text, summarizing the data with labels becomes considerably more difficult. For example, consider a repository of news photographs. A user may wish to pose a query such as:

Give me all news photographs containing a US President and a communist leader.

To support queries like this, images require labeling that indicates the people in the images, their titles, their nationalities, and their political alignments.
It is not known how humans process electromagnetic signals and convert them into highly detailed semantic interpretations. Human analysis is therefore required to generate labels that support sophisticated queries like the one above, but human analysis has problems of its own.
Automated image analysis avoids the cost of human intervention, but despite its appeal it suffers from drawbacks as well. The primary problem with statistical analysis is that the extracted features can support only a very specific type of query. The features apply to a particular domain, but they are not useful for posing general-purpose queries against diverse data sets.
Consider an image database indexed by color histogram. For each image, a feature vector is generated such that each element of the vector represents the percentage of a color quantum found in the image. A three-element vector could have quanta representing red, green, and blue (in practice a color feature vector requires more than three elements). The feature vector for an image then contains the quantized percentages of red, green, and blue. The more quanta available, the greater the accuracy of the feature vector, and the greater the cost of indexing and comparison.
If the database contained fabric images, a color histogram would be a powerful way to pose a query. A user interested in designing a men's casual shirt for spring wants bright, spring-like colors. The query is posed with the desired color mix, and all fabrics containing similar mixes of the specified colors are retrieved. On the other hand, if the database contained news photographs as described earlier, color histograms would not be very useful: the semantics of those images do not correlate well with color histograms.
The delineation between primitive and logical features is not always clear. Consider an image which is a 2D representation of a 3D scene containing several objects. Features representing the objects might be either primitive or logical features. If the extraction generates a feature containing edge information, then it is a primitive feature. On the other hand, if the extraction identifies the object by name, say by utilizing a model-based approach, it is a logical feature.
Primitive features are often used as the basis for generating logical features. A common CBIR system architecture layers logical feature extraction on top of primitive feature extraction. Primitive features are extracted directly from the image to generate a segmented image; from this information, more abstract, logical features are generated[6]. Segmentation is the process of dividing the image into regions that correspond to structural units of interest[10].
In content-based visual databases, all items (images or objects) are represented by pre-computed visual features. The key attribute of each image is a feature vector corresponding to a point in a multi-dimensional feature space, and search is based on similarities between feature vectors. Therefore, "to achieve a fast and effective retrieval...requires an efficient multi-dimensional indexing scheme"[11]. Multiple indexing schemes may be required to support queries involving a combination of features. To utilize multiple indexes, a hierarchical approach is often used in which each component of a query is applied against an appropriate index; a higher layer merges the results for presentation to the user.
CBIR queries are posed in a fuzzy fashion. The user is typically interested in results according to similarity rather than equality. This requirement influences the indexing scheme, the methods of feature comparison, and the means by which queries are solicited from the user.
Image similarity is usually determined by computing a distance measure between the query and the appropriate feature vectors in the index structure. Similar images are ranked according to distance. Thresholding may be used to reduce the number of similar images presented to the user.
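A minimal sketch of this ranking-and-thresholding step follows; the function name and the precomputed distances are illustrative assumptions, not part of any cited system:

```cpp
#include <algorithm>
#include <utility>
#include <vector>

class Image;

// Rank candidate images by distance to the query, dropping anything
// beyond the similarity threshold. The input pairs each image with a
// distance already computed from its feature vector.
std::vector<std::pair<double, Image*>>
rankBySimilarity(const std::vector<std::pair<double, Image*>>& scored,
                 double threshold)
{
    std::vector<std::pair<double, Image*>> ranked;
    for (const auto& candidate : scored)
        if (candidate.first <= threshold)          // thresholding step
            ranked.push_back(candidate);
    std::sort(ranked.begin(), ranked.end(),
              [](const std::pair<double, Image*>& a,
                 const std::pair<double, Image*>& b)
              { return a.first < b.first; });      // most similar first
    return ranked;
}
```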
A query is created by composing primitive and logical feature vectors. To present a simple and structured query environment, CBIR systems define query classes. Typical query classes include color, texture, shape, location, and text[1].
An alternative to user-composed queries is query by example. The user submits a query in the form of a prototype image, and the system uses the feature vector(s) of the appropriate query class(es). Often a session begins with user-composed queries, which are then refined through query by example.
The CORE system[6] is a retrieval engine that supports a wide range of features, including visual browsing, color similarity measures, and text. Primitive features are combined to create higher-level, logical features the authors call concepts.
The QBIC system[3][12] extracts features that support image query classes for color, texture, shape, sketching, location, and text. The system also supports a set of video oriented query classes.
For color histograms, many different extraction methods are used. The first issue is the dimension of the color feature vector, i.e., the number of colors. Typical dimensions range from 64 to 256 (256 being the number of unique colors representable with one byte). The higher the dimension of the feature vector, the finer the color distinctions it can capture.
The values in each bin of a color histogram are usually either the total number of pixels, or the percentage of pixels for the given color in the entire image.
The use of multiple query classes to compose a query is of particular interest to researchers. Although many systems claim to support composite queries, few of the papers explain how to combine query classes successfully.
Every distance metric has advantages and drawbacks. For example, Stricker[7] analyzes two common distance metrics, the L1 and L2 (Euclidean) norms. The L1 norm computes the distance d between two n-element color histograms H and I as:

d_{L_1}(H, I) = \sum_{i=1}^{n} |H_i - I_i|

And the L2 norm is computed as:

d_{L_2}(H, I) = \sqrt{ \sum_{i=1}^{n} (H_i - I_i)^2 }
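Both norms are straightforward to compute; a minimal sketch over percentage histograms of equal length:

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// L1 distance between two n-element color histograms H and I.
double l1Distance(const std::vector<double>& H, const std::vector<double>& I)
{
    double d = 0.0;
    for (std::size_t i = 0; i < H.size(); ++i)
        d += std::fabs(H[i] - I[i]);
    return d;
}

// L2 (Euclidean) distance between the same histograms.
double l2Distance(const std::vector<double>& H, const std::vector<double>& I)
{
    double d = 0.0;
    for (std::size_t i = 0; i < H.size(); ++i)
        d += (H[i] - I[i]) * (H[i] - I[i]);
    return std::sqrt(d);
}
```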
Stricker states that "Using the L1-metric results in false negatives, i.e., not all the images with similar color composition are retrieved because the L1-metric does not take color similarity into account. Using a metric similar to the L2-metric results in false positives, i.e., histograms with many non-zero bins are close to any other histogram and thus are retrieved always."
The QBIC system[12] uses a 64- or 256-dimensional color histogram in which the i-th element is the percentage of color i. The distance between a query histogram r and a database image histogram q is computed as the quadratic form

d(r, q) = (r - q)^T A (r - q)

where T denotes the transpose and each entry a(i, j) of A contains the distance between color i and color j.
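A sketch of the quadratic-form computation; the dense double loop shown is the naive O(n^2) evaluation, not QBIC's optimized implementation:

```cpp
#include <cstddef>
#include <vector>

// Quadratic-form histogram distance d(r, q) = (r - q)^T A (r - q),
// where A[i][j] holds the distance between colors i and j.
double quadraticDistance(const std::vector<double>& r,
                         const std::vector<double>& q,
                         const std::vector<std::vector<double>>& A)
{
    const std::size_t n = r.size();
    std::vector<double> diff(n);
    for (std::size_t i = 0; i < n; ++i)
        diff[i] = r[i] - q[i];

    double d = 0.0;
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            d += diff[i] * A[i][j] * diff[j];
    return d;
}
```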
IBM's Ultimedia Manager[13] uses a 64-dimensional vector of color percentages. Each dimension represents a range in color space. At analysis time the color of each pixel is quantized into one of the 64 ranges based on its location in RGB space.
By using similarity metrics only for initial indexing, Picard and Minka[5] avoid the problem of combining query classes that have different metrics. Similarity is encoded into clusters of image regions in a tree structure, and distance is measured by the ancestral distance between clusters. This has the effect of normalizing different query classes so they can be treated identically.
The importance of extensibility was recognized by Wu et al.[6] when they developed CORE based on a generic framework for a multimedia DBMS. To demonstrate the flexibility of the architecture, they developed two different applications: a computer-aided facial image inference and retrieval system (CAFIIR) and a trademark archival and registration system (STAR). A medical information system is currently in development. In their conclusion, they state: "Object orientation is very important for a retrieval engine like CORE. One advantage...is an increase in the reusability of codes...to increase its reusability and extensibility."
Wu et al.[6] apply fuzzy reasoning to queries in which the logical features of the query are only partially defined by the user. They also use a neural-network model that learns from experience to generate self-organizing nodes in a content-based index tree. This allows their system to fuse composite feature measures to support complex and fuzzy queries.
Designers could compose queries using primitive and logical features and specify constraints according to design principles. The results of a series of queries, for wallpaper, paint, and carpet, must be self-consistent according to designer-specified rules of form, space, color, and style. These requirements focused my interest on two query classes: color and pattern.
The StorageManager class, stormgr.h, provides member functions to access the image database. A client uses store() to associate the pixmap in a file with an image object and store the image in the database. This function is used by the AnalysisManager to process a user request to add an image to the database; it also attaches the pixmap to the image, which allows the AnalysisManager to perform feature extraction on the image.
Clients can request a single image using getImage(), or the entire image database list using getImageList(). The loadDB() function is used to initialize the StorageManager object.
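From this description, the interface in stormgr.h can be sketched approximately as follows; the parameter and return types are guesses, since the text does not give exact signatures:

```cpp
#include <list>
#include <string>

class Image;

// Approximate reconstruction of the StorageManager interface.
class StorageManager {
public:
    bool   loadDB(const std::string& dbPath);     // initialize from the database
    Image* store(const std::string& pixmapFile);  // add an image, attach its pixmap
    Image* getImage(const std::string& name);     // fetch a single image
    std::list<Image*>& getImageList();            // fetch the entire database list
private:
    std::list<Image*> images;  // unordered list; a real system would keep indices
};
```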
The current implementation of the StorageManager maintains an unordered list of Image objects representing the entire database. In a real system, the StorageManager would maintain multiple indices used to retrieve images. The Image objects would be at the leaves of the index, e.g., in the case of a tree-based index structure.
For efficiency, this implementation only loads the pixmap data for an image when necessary. The Image object determines when to load the data; the task of loading is delegated to the StorageManager.
The Image class, image.h, encapsulates the details of the pixmap implementation and feature vectors. The system manipulates Image objects for convenience. The Pixmap class provides an interface to a color pixel representation of an image (pixmap.h). This class allows the implementation of the on-screen image to change without affecting the rest of the code.
The Features class, features.h, encapsulates a set of feature vectors. This class allows new feature vectors to be added as new query classes are added.
Two changes would be required in a production system to manage memory efficiently. First, the current implementation only loads pixmaps on demand and never invalidates them. Invalidating pixmaps would conserve large amounts of physical memory: a typical pixmap requires 512 x 512 = 262,144 bytes of memory just to store the image data, assuming 256 colors (one byte per pixel), and a pixmap also carries a color table and other overhead, for a total of about 264,000 bytes.
A second memory saver would be the use of thumbnail versions of the pixmaps. Thumbnails are smaller representations, usually 64 x 64 = 4096 bytes, which are used for display. The current system only requires the entire image during feature extraction. Support for thumbnails would either require a time tradeoff to reduce a large pixmap when it is loaded into memory, or a disk space tradeoff if the thumbnails were generated at load time and stored in the database.
When a user adds a new image to the system, the user interface calls the analyze() member function. This function uses the StorageManager to create a new Image which it then analyzes by calling extract() on each installed feature extractor. New feature extractor objects are added by calling addFeature() when the system is initialized.
The FeatureExtractor abstract class provides an interface definition to be used by all feature extractors (features.h). Concrete feature extractor classes inherit the interface and provide implementations appropriate for the needs of the feature vectors they generate. A feature extractor object is responsible for analyzing the image, creating a FeatureVector, and installing the FeatureVector in the Feature object of the Image.
The FeatureExtractor classes play an important role: they encapsulate all important information for a particular query class. When adding a new query class, most of the effort is in creating a new subclass of FeatureExtractor. Other classes can get query class specific information from feature extractors. For example, the QueryManager calls the similarityFunc() when comparing two images.
The AnalysisManager, feature extractors, Features, and the FeatureVector have an important relationship. At initialization time, all the feature extractors are installed into the singleton instance of the AnalysisManager (init.cpp). The AnalysisManager tells each feature extractor its position in the set of feature vectors stored in the Features objects. It also tells the Features class how many feature vectors every Features object must be capable of storing. Every feature extractor implementation knows how large to make the data portion of its FeatureVector. By encapsulating knowledge in this fashion, new features can easily be added.
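An approximate reconstruction of this arrangement (exact signatures are not given in the text):

```cpp
class Image;
class Features;

// Approximate reconstruction of the abstract interface in features.h.
class FeatureExtractor {
public:
    virtual ~FeatureExtractor() {}

    // Analyze the image, create a FeatureVector, and install it in the
    // image's Features object at this extractor's assigned position.
    virtual void extract(Image& image) = 0;
};

// A concrete extractor also supplies a static similarity function (see
// below) so it can be passed around as an ordinary function pointer:
//
//     static double similarityFunc(const Features& a, const Features& b);
//
// At initialization (init.cpp), each extractor is registered with the
// singleton AnalysisManager, which assigns it a slot in the set of
// feature vectors that every Features object must hold:
//
//     analysisManager.addFeature(new ColorHistogram);
```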
Similarity functions are static member functions of the derived feature extractor classes; making them static allows them to be passed as ordinary function pointers. A similarity function accepts two Features objects and computes the distance between them, returning a non-negative value, or a negative value to indicate that the two feature vectors are dissimilar. The QueryManager uses the return value to rank images by distance. After the QueryManager completes the processing of a query, it passes the ranked list of images to the DisplayManager for presentation.
The current implementation contains similarity functions that operate on only a single query class. Since the specifics of the similarity computation are external to the QueryManager, it is possible to provide similarity functions that take multiple query classes into account. One way to do this is to provide a hierarchy of queries in which the results of one level of queries are passed to a different similarity function.
A client of the display manager can submit a list of images by calling display(). This will replace the existing list of images held by the display manager with the list passed to this function. Then the images will be displayed.
Note that the Image lists that many of the manager objects maintain are lists of pointers to Image objects. To avoid memory leaks, clients pass a deletion policy to the display() member function telling the DisplayManager how to manage the memory in the image list.
The user interface provides typical menu options in the Windows(tm) style. In the main application area, each image is displayed with a label giving the name of the database file containing its pixmap.
Composing a color query is a multi-step process involving two dialogs. The Query By Color dialog allows the user to specify up to 3 color percentages for the query. To specify each color, the user manipulates the controls in the Color dialog. This is a standard Windows dialog that allows selection of colors directly from the system color palette. The user may also create up to sixteen custom colors.
Once the query is composed, the user presses the OK button and the query is submitted to the QueryManager. All similar images will be displayed by the DisplayManager.
Selecting the Features... item presents the user with the Image Features dialog, which contains a list box of all the image names. The user selects an image and presses the desired button in the Display group; pressing the ColorHistogram button, for example, opens the Color Histogram dialog.
The Color Histogram dialog presents two histograms of the selected image. The Original Image Histogram is the color histogram of the image as stored in the database; each bar represents the relative number of pixels assigned the displayed color. The Total Color Bins field gives the number of bins in each histogram, and the Total Empty Bins field gives the number of bins that contain no pixels. The Image Feature Histogram is the histogram of the feature vector, a quantized version of the original histogram. A high percentage of empty bins in the feature histogram is common.
Selecting the Pattern Histogram button from the Image Features dialog presents the Pattern Histogram dialog, which shows the directional bias of the image. The numbers below each of the three bins represent the relative bias in each direction: integer values from 0 to 3 representing WEAK, NOT STRONG, STRONG, and VERY STRONG. For the example image, the dialog shows a very strong vertical bias and a strong horizontal bias.
The Color Values... item on the View menu presents the user with a dialog for viewing the predominant feature colors. The Image Color Values dialog displays the five most prevalent colors, their percentage of distribution, and their RGB value. The color and the RGB values shown are the value at the center of each quantization range. This dialog is very useful for testing the quality of the similarity function.
ColorHistogram creates a 64-dimensional color histogram feature vector
similar to the one used in IBM's Ultimedia Manager[13].
Each element of the feature vector represents a cube-shaped subspace of RGB space.
Each pixel is quantized into one of the bins. After traversing the image, the histogram contains the number of pixels contained in each color cube. Each bin value is then converted to a percentage representing the relative number of pixels contained in that color cube.
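A sketch of this quantization, assuming four levels per channel (4 x 4 x 4 = 64 cubes); the actual cube boundaries used by Ultimedia Manager are not given in the text:

```cpp
#include <array>
#include <cstdint>
#include <vector>

struct Rgb { std::uint8_t r, g, b; };

// Map a pixel to one of 64 cube-shaped RGB subspaces: four levels per
// channel, each level covering a range of 64 intensity values.
int binIndex(const Rgb& p)
{
    int r = p.r / 64, g = p.g / 64, b = p.b / 64;  // each in 0..3
    return (r << 4) | (g << 2) | b;                // index in 0..63
}

std::array<double, 64> colorHistogram64(const std::vector<Rgb>& pixels)
{
    std::array<double, 64> bins{};                 // zero-initialized counts
    for (const Rgb& p : pixels)
        bins[binIndex(p)] += 1.0;
    if (!pixels.empty())
        for (double& bin : bins)
            bin = 100.0 * bin / pixels.size();     // counts -> percentages
    return bins;
}
```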
The similarity function of ColorHistogram computes the distance between two histograms using the L2 norm. This distance is compared to a user-definable threshold value to determine similarity, and the QueryManager uses the returned value to order images, most similar first.
The L2 norm works well only on sparse histograms. This is fine when the user provides three colors via the dialog, because only bins with values are compared. When querying by example, the similarity function ignores small percentages that are present in both histograms, reducing false positives caused by the accumulation of insignificant differences between histograms.
Query by example requires a larger similarity threshold, since it usually involves many more comparisons than queries via the dialog. For the data I used, thresholds of 9.0 to 24.0 worked for three-color queries, and 25.0 to 40.0 worked well for query by example.
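A hedged sketch of this modified comparison; the 1% cutoff below is an illustrative assumption, not the prototype's actual value:

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// L2 distance that skips bins where both histograms hold only a small
// percentage, reducing false positives from insignificant differences.
double modifiedL2(const std::vector<double>& H, const std::vector<double>& I,
                  double cutoff = 1.0)   // percentage cutoff (assumed value)
{
    double d = 0.0;
    for (std::size_t i = 0; i < H.size(); ++i) {
        if (H[i] < cutoff && I[i] < cutoff)
            continue;                    // ignore shared small percentages
        d += (H[i] - I[i]) * (H[i] - I[i]);
    }
    return std::sqrt(d);
}
```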
The pattern extraction algorithm is a multistep process. First it creates a greyscale copy of the original image, converting RGB values to brightness values using the same translation as for black-and-white television. Television images are transmitted using the YIQ color system, and black-and-white television receivers display only the Y portion, the luminance signal:

Y = 0.299 R + 0.587 G + 0.114 B
The greyscale image is then processed to detect edges using the Sobel operator[10]. This method applies two 3 x 3 kernels to the neighborhood of each pixel to estimate the brightness derivatives dB/dx and dB/dy. For the derivative in the horizontal direction, the following kernel is used:
     1   0  -1
     1   0  -1
     1   0  -1

And for the vertical direction, the following kernel is used:

     1   1   1
     0   0   0
    -1  -1  -1

Conceptually, a kernel is applied to an image by sliding the kernel over the image and summing the products of the kernel values with the brightness values under them. The result is the derivative, or brightness slope, at the pixel under the center of the kernel. After the application of both kernels to a pixel neighborhood, the magnitude is computed:

|\nabla B| = \sqrt{ (\partial B / \partial x)^2 + (\partial B / \partial y)^2 }
This value is assigned to the pixel under the center of the kernel. To emphasize the edges, the algorithm thresholds the value to black or white, producing a binary edge image.
Finally, the edge image is traversed, applying the two kernels again. This time the derivatives are much stronger along the bias of the image. The direction of the slope is determined by:

\theta = \arctan\left( \frac{\partial B / \partial y}{\partial B / \partial x} \right)

taken modulo \pi, so that it lies between zero and \pi. The algorithm quantizes the direction into one of the three bins of the histogram. After this process is applied to the entire image, the sums in the bins are converted to percentages.
The result is a feature vector containing the percentages of bias in the horizontal, diagonal, and vertical directions. The distance metric ranks the three directional biases as very strong, strong, not strong, or weak, then computes the L1-norm distance between the two histograms. This distance is compared to a user-definable threshold value to determine similarity, and the returned distance is used by the QueryManager to order similar images.
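The final binning step might look like the following sketch; the bin boundaries and the atan2 folding are assumptions, since the text only states that directions between zero and pi are quantized into three bins:

```cpp
#include <cmath>

// Quantize a gradient direction into one of the three bias bins.
// The 30- and 60-degree boundaries are illustrative guesses; the text
// does not specify where the prototype draws them.
enum Bias { BIN_A = 0, BIN_B = 1, BIN_C = 2 };

Bias directionBin(double dBdx, double dBdy)
{
    const double PI = 3.14159265358979323846;
    double theta = std::atan2(dBdy, dBdx);   // direction in (-pi, pi]
    if (theta < 0.0)
        theta += PI;                         // fold into [0, pi)
    const double deg = theta * 180.0 / PI;
    if (deg < 30.0 || deg >= 150.0) return BIN_A;  // near 0 or pi
    if (deg < 60.0 || deg >= 120.0) return BIN_B;  // diagonal range
    return BIN_C;                                  // near pi/2
}
```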
An example-based query uses the color histogram of a displayed wallpaper sample as the query key. The user clicks the right mouse button on the desired example and is presented with a popup menu. Selecting Query by Color Example will initiate the retrieval. STRIPE7.BMP was used as the example wallpaper image and the threshold was set at 38.
The current implementation cannot distinguish directional bias caused by lines from bias caused by noisy patterns. Doing so would require a more sophisticated feature extraction algorithm; for example, a smoothing step could be used to suppress the noise before performing the gradient computations.
A sample user, unfamiliar with image processing, easily navigated through the system, but was confused about the meaning of the threshold parameters, which must be set as numbers. This confusion could be alleviated by presenting a slider control representing a scale of relative distance.
The prototype could be made embeddable by converting it to a server in a client/server arrangement. First, the user interface code must be externalized. This code is sensitive to change and properly resides in the client application. Second, a client/server communication protocol must be used. In the Microsoft Windows environment, OLE would provide this capability.
Image management is too crude for a production system. The entire image database is loaded into memory, and no mechanism exists for maintaining only a working subset of images. There is no support for thumbnail versions of images, and images are stored at about 512 x 512 pixels even though the software always scales them down to 64 x 64 for display.
The software is a prototype intended for exploring ideas and
therefore does not contain the polish required in production systems.
In the field of interior design, designers and their customers search through hundreds of carpet, drapery, paint, and wallpaper samples. Their selections must be combined to create a pleasing result, and producing an aesthetic result often requires even more searching.
Through the use of CBIR, designers could access vast amounts of material (either on CD-ROM or in a vendor database) and rapidly create high-quality interior decorating solutions. Sophisticated query environments could assist in applying practical design constraints to ensure attractive results. Expensive sample inventories would become obsolete, and significant savings would be passed on to customers.
More and better query classes are needed to support all aspects of interior design materials. Classes for color, texture, pattern, and style are needed to create aesthetic designs. Text and other attributes are necessary to represent manufacturer information, wear characteristics, cost, etc.
Judicious use of artificial intelligence will improve system performance. Fuzzy logic is useful in determining image similarity for certain query classes. Expert systems can be built that assist designers in creating solutions that conform to traditional design idioms.
Finally, sophisticated user interfaces will be needed to give designers the power and flexibility their work demands. Users must be able to incrementally build design solutions. The interface must provide access to designers' portfolios so they can use and modify past work, allowing quick response to customer demands.
There are many challenges ahead for future CBIR systems builders. The field of interior
design presents its own challenges. The results of my work show promise, but there
is still much to do.
[2] A. Desai Narasimhalu: Special section on content-based retrieval, Multimedia Systems, 1995, 3:1-2.
[3] Flickner, Sawhney, Niblack, et al.: Query by Image and Video Content: The QBIC System, IEEE Computer, September 1995.
[4] Rajiv Mehrotra, James E. Gray: Similar-Shape Retrieval in Shape Data Management, IEEE Computer, September 1995.
[5] R. W. Picard, T. P. Minka: Vision Texture for Annotation, Multimedia Systems, 1995, 3:3-14.
[6] J. K. Wu, A. Desai Narasimhalu, B. M. Mehtre, C. P. Lam, Y. J. Gao: CORE: A Content-Based Retrieval Engine for Multimedia Information Systems, Multimedia Systems, 1995, 3:25-41.
[7] Markus A. Stricker: Bounds for the discrimination power of color indexing techniques, Proceedings SPIE Storage and Retrieval for Image and Video Databases II, 1994, 15-24.
[8] Nagarajan Ramesh, Ishwar K. Sethi: Feature Identification as an Aid to Content-Based Image Retrieval, Proceedings SPIE Storage and Retrieval for Image and Video Databases III, 1995, 2-11.
[9] Virginia E. Ogle, Michael Stonebraker: Chabot: Retrieval from a Relational Database of Images, IEEE Computer, September 1995, 40-48.
[10] John C. Russ: The Image Processing Handbook, Second Edition, CRC Press, 1995.
[11] Hong Jiang Zhang, Di Zhong: A Scheme for Visual Feature based Image Indexing, Proceedings SPIE Storage and Retrieval for Image and Video Databases III, 1995, 36-46.
[12] Jonathan Ashley, Ron Barber, Myron Flickner, James Hafner, Denis Lee, Wayne Niblack, Dragutin Petkovic: Automatic and Semi-Automatic Methods for Image Annotation and Retrieval in QBIC, Proceedings SPIE Storage and Retrieval for Image and Video Databases III, 1995, 24-35.
[13] Harold Treat, Ed Ort, Jean Ho, Mimi Vo, Jing-Song Jang, Laura Hall, Frank Tung, Dragutin Petkovic: Searching Images Using Ultimedia Manager, Proceedings SPIE Storage and Retrieval for Image and Video Databases III, 1995, 204-213.
[14] James Rumbaugh, Michael Blaha, William Premerlani, Frederick Eddy, William Lorensen: Object-Oriented Modeling and Design, Prentice Hall, 1991.