Sean Landis' CS718 Project, Fall 1995


Content-Based Image Retrieval Systems for Interior Design




Table of Contents

Introduction
Background
..........Manual Image Analysis
..........Automated Image Analysis
..........Image Features
..........Indexing and Queries
Current Research
..........Feature Extraction
..........Query Specification
..........Distance Metrics
..........Indexing
..........Extensibility
..........Artificial Intelligence
..........Maximizing Domain Knowledge
Project Overview
..........Project Definition
..........Implementation
..........Storage Manager
..........Analysis Manager
..........Query Manager
..........Display Manager and the User Interface
....................Image Menu
....................View Menu
..........Query by Color Algorithms
..........Query by Pattern Algorithms
Results
..........Color Queries
..........Pattern Queries
..........User Interface
..........Design
..........Limitations
Conclusions
..........Usefulness
..........Future Work
References


Introduction

Computers are beginning to replace photographic archives as the preferred form of repository. Computer-based image repositories provide a flexibility that cannot be attained with collections of printed images. Recently there has been an explosion in the number of images available to computer users. As this number grows, users require more sophisticated methods of retrieval. Content-based image retrieval (CBIR) promises to fill this requirement.

There are many diverse areas where CBIR can play a key role in the use of images[1]:

Art galleries and museum management
Architectural and engineering design
Interior design
Remote sensing and natural resource management
Geographic information systems
Scientific database management
Weather forecasting
Retailing
Fabric and fashion design
Trademark and copyright database management
Law enforcement and criminal investigation
Picture archiving and communication systems
Education
Entertainment

With so many applications, CBIR has attracted the attention of researchers across several disciplines.


Background

Content-based retrieval is based on an understanding of the semantics of the objects in a collection. Semantic analysis is performed when the object is inserted into the collection. Given a semantic representation of the objects in a collection, a user can compose a query that retrieves a set of objects with similar semantics. Query analysis is usually performed on an index structure that summarizes the data in the collection.

Content-based image retrieval is the semantic analysis and retrieval of images. Semantic analysis may involve manual intervention, or it may be entirely automated. Manual analysis involves human interpretation to associate semantic properties with an image. Automated semantic analysis extracts image features that are correlated with some semantic meaning of the image. Both analysis methods have their advantages and their drawbacks.

Manual Image Analysis

Traditional databases use text key words as labels to efficiently access large quantities of text data. Even complex text data can be automatically summarized and labeled using natural language processing and artificial intelligence[5].

When the data are images rather than text, summarizing the data with labels becomes considerably more difficult. For example, consider a repository of news photographs. A user may wish to pose a query such as

Give me all news photographs containing a US President and a communist leader.
To support queries like this, images require labeling that indicates the people in the images, their titles, their nationalities, and their political alignments.

It is not known how humans process electromagnetic signals and convert them into highly detailed semantic interpretations, so this ability cannot yet be automated. Therefore, human analysis is required to generate labels that support sophisticated queries like the one above. But there are problems with human analysis:

Personal perspective
One person's interpretation of the important features of an image may not match another person's interpretation. Personal perspective leads to variance in image analysis and labeling.
Domain mismatch
A person's domain of interest may influence image feature selection and analysis.
Interface expressiveness
Human-computer interfaces provide a limited bandwidth of expressive capability. Image analysis is limited by the expressiveness of the interface.
Data entry errors
Humans are error-prone, especially when set to a task which is tedious or redundant.
Because of these and other problems, it is best to automate image analysis as much as possible. Where human intervention is required, the user should be limited to a set of unambiguous choices.

Automated Image Analysis

Automated image analysis calculates approximately invariant statistics which can be correlated to the semantics of the image data. Example statistics are color histograms, invariants of shape moments, and edges. Statistical analysis is useful because it provides information about the image without fickle and costly human interaction.

Despite its appeal, automated image analysis suffers drawbacks. The primary problem with statistical analysis is that extracted features can only support a very specific type of query. The features apply to a particular domain, but they are not useful for posing general purpose queries against diverse data sets.

Consider an image database indexed by color histogram. For each image, a feature vector is generated such that each element of the vector represents the percentage of a color quantum found in the image. A three-element vector could have quanta representing red, green, and blue (in practice a color feature vector requires more than three elements). The feature vector for an image then contains the quantized percentages of red, green, and blue. The more quanta available, the greater the accuracy of the feature vector, and the greater the cost of indexing and comparison.
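
To make this concrete, the following sketch (in modern C++, not code from any system described here) builds such a three-element vector; the Pixel type and the dominant-channel quantization rule are illustrative placeholders.

    #include <cstddef>
    #include <vector>

    // Hypothetical pixel type, used only for illustration.
    struct Pixel { unsigned char r, g, b; };

    // Build a three-quantum histogram: the percentage of pixels whose
    // dominant channel is red, green, or blue. Real feature vectors
    // use many more quanta (e.g., the 64-bin scheme described later).
    std::vector<double> colorHistogram(const std::vector<Pixel>& pixels)
    {
        std::vector<double> bins(3, 0.0);
        if (pixels.empty())
            return bins;
        for (std::size_t i = 0; i < pixels.size(); ++i) {
            const Pixel& p = pixels[i];
            if (p.r >= p.g && p.r >= p.b) bins[0] += 1.0;   // red-dominant
            else if (p.g >= p.b)          bins[1] += 1.0;   // green-dominant
            else                          bins[2] += 1.0;   // blue-dominant
        }
        for (std::size_t i = 0; i < bins.size(); ++i)
            bins[i] = 100.0 * bins[i] / pixels.size();      // counts -> percentages
        return bins;
    }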

If the database contained fabric images, a color histogram would be a powerful way to pose a query. A user interested in designing a men's casual shirt for spring wants bright, spring-like colors. The query is posed with the desired color mix, and all fabrics containing similar mixes of the specified colors are retrieved. On the other hand, if the database contained news photographs as described earlier, then color histograms would not be very useful. The semantics of the images in the database do not correlate well with color histograms.

Image Features

An image feature is a piece of semantic information extracted from the image. There are several properties for measuring the quality of a feature:
Capacity
The number of distinguishable images that can be represented[7].
Maximal Match Number
The maximum number of images a query could possibly retrieve[7].
Complexity
The amount of computation required to determine if two images are similar for a particular feature.
Compactness
The amount of space required to store and compare a feature.
Image features can be categorized as either primitive or logical[1]. A primitive feature is a low-level or statistical attribute of an image such as an object boundary or color histogram. Primitive features are automatically extracted directly from the image. A logical feature represents an abstract attribute such as the label grass assigned to a region of an image. Logical features rely on information beyond that contained in the image.

The delineation between primitive and logical features is not always clear. Consider an image which is a 2D representation of a 3D scene containing several objects. Features representing the objects might be either primitive or logical features. If the extraction generates a feature containing edge information, then it is a primitive feature. On the other hand, if the extraction identifies the object by name, say by utilizing a model-based approach, it is a logical feature.

Primitive features are often used as the basis for generating logical features. A common CBIR system architecture layers logical feature extraction on top of primitive feature extraction. Primitive features are extracted directly from the image to generate a segmented image; segmentation is the process of dividing the image into regions that correspond to structural units of interest[10]. From this information, more abstract, logical features are generated[6].

Indexing and Queries

The goal of indexing is to create a compact summary of the database contents to provide an efficient mechanism for retrieval of the data. The summary data is based on feature vectors:
Since in content based visual databases, all items (images or objects) are represented by pre-computed visual features, the key attribute for each image will be a feature vector which corresponds to a point in a multi-dimensional feature space; and search will be based on similarities between the feature vectors. Therefore, to achieve a fast and effective retrieval...requires an efficient multi-dimensional indexing scheme[11].
Multiple indexing schemes may be required to support queries involving a combination of features. To utilize multiple indexes, a hierarchical approach is often used where each component of a query is applied against an appropriate index. A higher layer merges results for presentation to the user.

CBIR queries are posed in a fuzzy fashion. The user is typically interested in results according to similarity rather than equality. This requirement influences the indexing scheme, the methods of feature comparison, and the means by which queries are solicited from the user.

Image similarity is usually determined by computing a distance measure between the query and the appropriate feature vectors in the index structure. Similar images are ranked according to distance. Thresholding may be used to reduce the number of similar images presented to the user.

A query is created by composing primitive and logical feature vectors. To present a simple and structured query environment, CBIR systems define query classes. Some typical query classes are[1]:

Color
A partial histogram is created by specifying colors and percentages[3][6][7][12][13].
Texture
Texture features include directionality, periodicity, randomness, roughness, regularity, coarseness, color distribution, contrast, and complexity[5][12][13].
Sketch
The user creates a sketch representing an outline to be matched against dominant image edges[3][12].
Shape
An example shape is created using simple painting tools. The shape is compared to objects within images for similarity[3][4][12][13].
Volume
Volumetric relationships are specified using 3D tools. Feature vectors contain 3D information.
Spatial constraints
The feature vector contains topological relationships among the objects in an image.
Browsing
The user is presented with a structured method of viewing the entire database.
Objective features
Objective features are attributes such as date of image acquisition, light direction, and view direction. These features lend themselves to the methods used in traditional databases[5][9].
Subjective features
Feature extraction is manual or semi-automatic and is subject to human interpretation. Examples are region labels and manual object identification[5][9].
Motion
Motion is applicable to a series of images such as video segments. Motion features measure movement of objects in the sequences or other movement such as camera viewpoint and camera focal point[3].
Text
Either simple or complex text can be associated with images. For the simple case, traditional database methods can be used. Complex systems use natural language processing and artificial intelligence to reason about text annotations[5].
Domain concepts
Domain information lends itself to specific forms of feature vectors and queries.
Query classes provide a meaningful way for a user to create feature vectors that correspond to their notion of image semantics. Queries can be composed of multiple query classes.

An alternative to user-composed queries is query by example. The user submits a query in the form of a prototype image, and the system uses the feature vector(s) of the appropriate query class(es). Often a session will begin with user-composed queries which are then refined through query by example.



Current Research

A large number of researchers are exploring CBIR-related topics. I focused on recent work, which has been the most productive. The following sections describe some of the important topics I studied.

Feature Extraction

Feature extraction is performed when an image is added to the database. CBIR systems provide support for multiple query classes. Pickard and Minka[5] use 6 different features to characterize images from the MIT Photobook image retrieval system.

The CORE system[6] is a retrieval engine that supports a wide range of features, including visual browsing, color similarity measures, and text. Primitive features are combined to create higher-level, logical features the authors call concepts.

The QBIC system[3][12] extracts features that support image query classes for color, texture, shape, sketching, location, and text. The system also supports a set of video oriented query classes.

For color histograms, many different extraction methods are used. The first issue is the dimension of the color feature vector, i.e., the number of colors. Typical dimensions range from 64 to 256 (256 being the number of unique colors representable in one byte). The higher the dimension of the feature vector, the greater its capacity.

The values in each bin of a color histogram are usually either the total number of pixels, or the percentage of pixels for the given color in the entire image.

Query Specification

The papers I read treated query specification as a secondary issue, although researchers recognized the need for simple ways to specify queries. Unlike text-based databases, where the desired information is usually retrieved with a single query, finding a suitable image may require many queries. CBIR systems typically return several of the best images for selection by the user. Many systems allow the user to select one of these images as an example for another query; this is query refinement. Researchers are exploring ways of providing easy query refinement that yields high success rates.

The use of multiple query classes to compose a query interests researchers. Although many systems claim to support composite queries, few of the papers explained how to combine query classes successfully.

Distance Metrics

Most image query classes rely on similarity metrics rather than exact matching. Distance metrics produce a relative distance between two image feature vectors. A threshold is used to determine if two features are similar. In many cases, the user can control the threshold to relax or constrain a query.

Every distance metric has advantages and drawbacks. For example, Stricker[7] analyzes two common distance metrics, the L1 and L2 (Euclidean) norms. The L1 norm computes the distance d between two n-element color histograms H and I as:

			d1(H,I) = |H[1] - I[1]| + |H[2] - I[2]| + ... + |H[n] - I[n]|

And the L2 norm is computed as:

			d2(H,I) = sqrt( (H[1] - I[1])^2 + (H[2] - I[2])^2 + ... + (H[n] - I[n])^2 )

Stricker states that "Using the L1-metric results in false negatives, i.e., not all the images with similar color composition are retrieved because the L1-metric does not take color similarity into account. Using a metric similar to the L2-metric results in false positives, i.e., histograms with many non-zero bins are close to any other histogram and thus are retrieved always."

The QBIC system[12] uses a 64- or 256-dimensional color histogram where the i-th element is the percentage of color i. The distance between a query histogram r and a database image histogram q is computed as (r - q)^T A (r - q), where ^T denotes the transpose. The entries a(i,j) of the matrix A contain the distance between color i and color j.
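
A minimal sketch of this quadratic-form distance follows; the matrix A and the two histograms are assumed to be supplied by the caller with matching dimension, and this is an illustration, not QBIC's code.

    #include <cstddef>
    #include <vector>

    // Quadratic-form histogram distance d = (r - q)^T A (r - q).
    // A[i][j] is assumed to encode the relationship between colors i
    // and j, as in the QBIC description above.
    double quadraticDistance(const std::vector<double>& r,
                             const std::vector<double>& q,
                             const std::vector< std::vector<double> >& A)
    {
        const std::size_t n = r.size();
        std::vector<double> diff(n);
        for (std::size_t i = 0; i < n; ++i)
            diff[i] = r[i] - q[i];                 // (r - q)

        double d = 0.0;
        for (std::size_t i = 0; i < n; ++i)
            for (std::size_t j = 0; j < n; ++j)
                d += diff[i] * A[i][j] * diff[j];  // (r - q)^T A (r - q)
        return d;
    }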

IBM's Ultimedia Manager[13] uses a 64-dimensional vector of color percentages. Each dimension represents a range in color space. At analysis time the color of each pixel is quantized into one of the 64 ranges based on its location in RGB space.

Indexing

The predominant CBIR research is in the area of image feature indexing. There are many difficult problems to solve. First, image features are typically high dimensional, requiring complex, multi-dimensional indexing. Second, traditional indexing assumes exact matching; similarity matching complicates the indexing structure. Finally, it is difficult to combine multi-dimensional, similarity-based indexing methods to efficiently support queries composed of multiple query classes.

By using similarity metrics only for initial indexing, Pickard and Minka[5] avoid the problem of combining query classes with different metrics. Similarity is encoded into clusters of image regions in a tree structure and distance is measured by ancestral distance between clusters. This has the effect of normalizing different query classes so they can be treated identically.

Extensibility

Systems must be extensible to overcome the immaturity of indexing methods, query specification, and feature extraction. Most of the important work in CBIR lies ahead.

The importance of extensibility was recognized by Wu et al.[6] when they developed CORE based on a generic framework for a multimedia DBMS. To demonstrate the flexibility of the architecture, they developed two different applications: a computer-aided facial image inference and retrieval system (CAFIIR) and a trademark archival and registration system (STAR). A medical information system is currently in development. In their conclusion, they state: "Object orientation is very important for a retrieval engine like CORE. One advantage...is an increase in the reusability of codes...to increase its reusability and extensibility."

Artificial Intelligence

Research focuses on three applications of Artificial Intelligence to CBIR: reasoning about logical features, similarity metrics, and index construction and maintenance. For example, Pickard and Minka[5] apply AI when annotating images to dynamically select from multiple feature models based on how the user labels image regions. Their system is also capable of improving the indexing structure based on new positive and negative examples.

Wu et al.[6] apply fuzzy reasoning to queries where the logical features of the query are only partially defined by the user. They also use an experience-based learning neural network model to generate self-organizing nodes in a content-based index tree. This allows their system to fuse composite feature measures to support complex and fuzzy queries.

Maximizing Domain Knowledge

Query performance can be drastically improved in cases where assumptions can be made about the nature, or domain, of the images in the database. Wu et al. observe that "CORE has comprehensive functions. However each application has its domain-specific problems." And: "In application development, domain expertise must be added to customize the indexing and retrieval module."


Project Overview

This project explores CBIR problems in the domain of interior design. I am interested in the CBIR methods that apply to problems faced by interior designers. Interior designers work with paint, wallpaper, fabric, and floor coverings. They also follow general principles of form, space, color, and style. Designers and their customers regularly face the tedious chore of manually searching for, and matching, materials according to general design principles and taste. There is an opportunity for a high degree of computer assistance with these tasks.

Designers could compose queries using primitive and logical features and specify constraints according to design principles. Results of a series of queries, for wallpaper, paint, and carpet, must be self-consistent according to designer-specified rules of form, space, color, and style. These requirements led my interest toward the following query classes:

Color
Fabric, wallpaper, etc., are often selected based upon color content. This is a perfect application of color histograms.
Texture
Floor coverings, wallpaper, and fabric all have important textural components ideal for queries based on textural features.
Shape
Examples of shapes are stripes, plaids, and floral patterns.
Objective features
Styles such as Victorian could be modeled as objective features.
Subjective features
Taste-, mood-, or sensation-related design concepts can be specified using subjective feature queries. Examples are feminine vs. masculine, cheery, cool, warm, etc.
Text
Product attributes such as part numbers, supplier information, and first production date are all text features.
Domain specific
The design rules described above are domain specific features.
Query by example is a powerful tool for interior designers. For example, given a particular carpet sample, the designer could find a window covering that complements the carpet.

Project Definition

I implemented a software prototype that allowed me to explore the following areas:
Color
Color is an important feature of the materials interior designers use. Color also allows automatic image analysis. I explored a few implementations of color histogram feature vectors and related similarity metrics.
Pattern
Patterns are also very important to interior designers. I focused on automated feature vector generation using edge detection.
Software Design
One of the difficulties of CBIR systems is that no small set of query classes provides enough flexibility to support interior design. Further, the technology of image feature vectors, similarity metrics, and indexing is immature. A real-world system needs to be extensible and configurable. Using object-oriented techniques, I designed a system that encapsulates the areas of highest change. I created a framework to support the addition of new query classes and distance metrics. The framework divides major system tasks among manager objects that interact through well-defined interfaces.
User Interface
I defined a simple user interface for color histogram queries and provided the ability to query by example using color histograms and shapes.
Image Management
Basic to a CBIR system is the ability to efficiently manage the performance, storage, and memory requirements of images. Because of their size and complexity, images have special computation and resource requirements. I have identified some of these issues and suggest some solutions.
I specifically avoided exploring indexing issues in this project.


Implementation

I implemented a software prototype for the purpose of exploring the details of CBIR. The software was written using the Microsoft(tm) Visual C++ compiler, version 2.0. The platform was an Intel 486DX2 system running the Windows NT operating system, version 3.51. The software architecture is shown in the following block diagram:


This architecture is loosely based on CORE[6]. Each of the managers is implemented as a singleton object; the class definition restricts instantiation to only one system-wide object.
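
The singleton restriction can be expressed as in the following sketch; the class name and accessor shown here are illustrative, not the project's actual code.

    // One-of-a-kind manager object: the private constructor and copy
    // operations prevent clients from creating additional instances.
    class DisplayManager
    {
    public:
        static DisplayManager& instance()
        {
            static DisplayManager theManager;   // created once, on first use
            return theManager;
        }
    private:
        DisplayManager() {}
        DisplayManager(const DisplayManager&);            // not implemented
        DisplayManager& operator=(const DisplayManager&); // not implemented
    };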

Storage Manager

The Storage Manager provides an interface to the image database. It is responsible for maintaining an in-memory virtual mapping of images and for performing system specific I/O operations. The following class diagram[14] depicts the key relationships.

The StorageManager class, stormgr.h, provides member functions to access the image database. A client uses store() to associate the pixmap in file with the image object and store the image in the database. This function is used by the AnalysisManager to process a user request to add an image to the database. It also attaches the pixmap to the image which allows the AnalysisManager to perform feature extractions on the image.

Clients can request a single image using getImage(), or the entire image database list using getImageList(). The loadDB() function is used to initialize the StorageManager object.
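
Pulling these functions together, the interface might read as in the sketch below; the parameter and return types are reconstructions from the prose, not the actual contents of stormgr.h.

    #include <list>
    #include <string>

    class Image;   // defined in image.h

    // Reconstructed sketch of the StorageManager interface.
    class StorageManager
    {
    public:
        // Read the image database from disk at startup.
        bool loadDB(const std::string& path);

        // Associate the pixmap in `file` with `img` and add the image
        // to the database; called by the AnalysisManager when the user
        // adds a new image.
        bool store(const std::string& file, Image& img);

        // Retrieve one image by name, or the entire unordered list.
        Image* getImage(const std::string& name);
        std::list<Image*>& getImageList();
    };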

The current implementation of the StorageManager maintains an unordered list of Image objects representing the entire database. In a real system, the StorageManager would maintain multiple indices used to retrieve images. The Image objects would be at the leaves of the index, e.g., in the case of a tree-based index structure.

For efficiency, this implementation only loads the pixmap data for an image when necessary. The Image object determines when to load the data; the task of loading is delegated to the StorageManager.

The Image class, image.h, encapsulates the details of the pixmap implementation and feature vectors. The system manipulates Image objects for convenience. The Pixmap class provides an interface to a color pixel representation of an image (pixmap.h). This class allows the implementation of the on-screen image to change without affecting the rest of the code.

The Features class, features.h, encapsulates a set of feature vectors. This class allows new feature vectors to be added as new query classes are added.

Two changes would be required in a production system to efficiently manage memory. First, the current implementation only loads pixmaps on demand and never invalidates them. Invalidating pixmaps would conserve large amounts of physical memory. A typical pixmap requires 512 x 512 = 262144 bytes of memory just to store the image data, assuming 256 colors. Also a pixmap has a color table and other overhead making for a total of about 264000 bytes.

A second memory saver would be the use of thumbnail versions of the pixmaps. Thumbnails are smaller representations, usually 64 x 64 = 4096 bytes, which are used for display. The current system only requires the entire image during feature extraction. Support for thumbnails would either require a time tradeoff to reduce a large pixmap when it is loaded into memory, or a disk space tradeoff if the thumbnails were generated at load time and stored in the database.

Analysis Manager

The Analysis Manager, analmgr.h, performs analysis on images as they are added to the system. The result is a set of feature vectors, one for each type of query class. The greatest change in a CBIR system is likely to be the addition of new query classes. The Analysis Manager provides extensibility via registration of feature extractors which generate feature vectors. The class relationships are shown in the following class diagram.

When a user adds a new image to the system, the user interface calls the analyze() member function. This function uses the StorageManager to create a new Image which it then analyzes by calling extract() on each installed feature extractor. New feature extractor objects are added by calling addFeature() when the system is initialized.

The FeatureExtractor abstract class provides an interface definition to be used by all feature extractors (features.h). Concrete feature extractor classes inherit the interface and provide implementations appropriate for the needs of the feature vectors they generate. A feature extractor object is responsible for analyzing the image, creating a FeatureVector, and installing the FeatureVector in the Feature object of the Image.

The FeatureExtractor classes play an important role: they encapsulate all important information for a particular query class. When adding a new query class, most of the effort is in creating a new subclass of FeatureExtractor. Other classes can get query class specific information from feature extractors. For example, the QueryManager calls the similarityFunc() when comparing two images.

The AnalysisManager, feature extractors, Features, and the FeatureVector have an important relationship. At initialization time, all the feature extractors are installed into the singleton instance of the AnalysisManager (init.cpp). The AnalysisManager tells each feature extractor its position in the set of feature vectors stored in the Features objects. It also tells the Features class how many feature vectors every Features object must be capable of storing. Every feature extractor implementation knows how large to make the data portion of its FeatureVector. By encapsulating knowledge in this fashion, new features can easily be added.
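
The following sketch summarizes these relationships. The names extract(), addFeature(), and similarityFunc() come from the text; the exact signatures are assumptions.

    class Image;
    class Features;

    // Abstract interface inherited by all feature extractors (features.h).
    class FeatureExtractor
    {
    public:
        virtual ~FeatureExtractor() {}

        // Analyze the image, build a FeatureVector, and install it at
        // this extractor's assigned position in the Image's Features.
        virtual void extract(Image& img) = 0;
    };

    // A concrete extractor adds a static similarity function that can
    // be handed to the QueryManager as an ordinary function pointer.
    class ColorHistogram : public FeatureExtractor
    {
    public:
        virtual void extract(Image& img);
        static double similarityFunc(const Features& a, const Features& b);
    };

    // Registration at startup (as in init.cpp); the AnalysisManager
    // assigns each extractor its slot in the set of feature vectors:
    //     analysisMgr.addFeature(new ColorHistogram);
    //     analysisMgr.addFeature(new Pattern);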

Query Manager

The QueryManager class provides two important functions: the ability to query the database and the ability to maintain a query history (querymgr.h). The Query Manager accepts a Features object and a similarityFunc() from the user interface and formulates a query. The formulated query is used to retrieve similar images from the Storage Manager. A client requests an initialized Features object from the AnalysisManager (which knows the position and size information for every feature) and fills it in with the information to be matched in the query. The client gets the similarityFunc() from the feature extractor class defined for the query class.

Similarity functions are static member functions of the derived feature extractor class; this allows them to be passed as ordinary function pointers. A similarity function accepts two Features objects and computes the distance between them. It must return a non-negative value representing the distance between the feature vectors, or, optionally, a value less than zero to indicate that the two feature vectors are dissimilar. The QueryManager uses the return value to rank images by distance. After the QueryManager completes the processing of a query, it passes the ranked list of images to the DisplayManager for presentation.
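
The ranking step might look like the following sketch; the featuresOf() accessor and the container types are assumptions, not the project's code.

    #include <algorithm>
    #include <list>
    #include <utility>
    #include <vector>

    class Features;
    class Image;
    const Features& featuresOf(const Image& img);   // hypothetical accessor

    typedef double (*SimilarityFunc)(const Features&, const Features&);
    typedef std::pair<double, const Image*> Ranked;

    static bool closer(const Ranked& a, const Ranked& b)
    {
        return a.first < b.first;
    }

    // Rank database images by distance from the query. A negative
    // return from the similarity function marks a pair as dissimilar,
    // so that image is dropped from the result.
    std::vector<Ranked> rankByDistance(const Features& query,
                                       const std::list<const Image*>& db,
                                       SimilarityFunc dist)
    {
        std::vector<Ranked> ranked;
        for (std::list<const Image*>::const_iterator it = db.begin();
             it != db.end(); ++it) {
            double d = dist(query, featuresOf(**it));
            if (d >= 0.0)
                ranked.push_back(std::make_pair(d, *it));
        }
        std::sort(ranked.begin(), ranked.end(), closer);  // most similar first
        return ranked;
    }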

The current implementation contains similarity functions that only operate upon a single query class. Since the specifics of the similarity computation are external to the QueryManager it is possible to provide similarity functions which take into account multiple query classes. One way to do this is to provide a hierarchy of queries in which the results of one level of queries are passed to a different similarity function.

Display Manager and the User Interface

The display manager, displmgr.h, is responsible for displaying images on the screen and tracking mouse selection of images. This class is closely related to the user interface provided by the operating system. It maintains the list of currently displayed images and responds to paint messages generated by the window system.

A client of the display manager can submit a list of images by calling display(). This will replace the existing list of images held by the display manager with the list passed to this function. Then the images will be displayed.

Note that the Image lists that many of the manager objects maintain are lists of pointers to Image objects. To avoid memory leaks, clients pass a deletion policy to the display() member function telling the DisplayManager how to manage the memory in the image list.
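
Such a policy could be expressed as a simple enumeration, as in this sketch; the enumerator names are illustrative, not the project's.

    #include <list>

    class Image;

    // Illustrative deletion policy for DisplayManager::display().
    enum DeletionPolicy
    {
        KEEP_IMAGES,    // caller retains ownership of the Image objects
        DELETE_IMAGES   // DisplayManager deletes them when they are replaced
    };

    class DisplayManager
    {
    public:
        void display(const std::list<Image*>& images, DeletionPolicy policy);
    };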

The user interface provides typical menu options according to the Windows(tm) style. Each image has a label which is the name of the database file containing the pixmap.

Image Menu

The Image menu provides options for adding images and for querying the database. Selecting a query option presents a dialog designed to solicit a query for a particular query class. For example, the following screen capture shows the dialogs used to compose a query by color.

Composing a color query is a multi-step process involving two dialogs. The Query By Color dialog allows the user to specify up to 3 color percentages for the query. To specify each color, the user manipulates the controls in the Color dialog. This is a standard Windows dialog that allows selection of colors directly from the system color palette. The user may also create up to sixteen custom colors.

Once the query is composed, the user presses the OK button and the query is submitted to the QueryManager. All similar images will be displayed by the DisplayManager.

View Menu

The View menu provides selections that allow the user to view information about the database and its images. The Database item causes the DisplayManager to display the entire image database. In a real application, this would be impractical; intelligent browsing tools would be necessary for traversing a large set of images.

Selecting the Features... item presents the user with the Image Features dialog containing a list box of all the image names. The user selects an image and presses the desired button in the Display group. The following screen capture shows what happens after the ColorHistogram button is pressed.

The Color Histogram dialog presents two histograms of the selected image. The Original Image Histogram is the color histogram of the image as stored in the database. Each bar represents the relative number of pixels that are assigned the displayed color. The Total Color Bins field is the number of bins in each histogram, and the Total Empty Bins field is the number of bins which do not contain any pixels.

The Image Feature Histogram is the histogram of the feature vector. It is a quantized version of the original histogram. A high percentage of empty bins in the feature histogram is common.

Selecting the Pattern Histogram button from the Image Features dialog presents the Pattern Histogram dialog. The dialog shows the directional bias of the image. The numbers below each of the three bins represent the relative biases in each direction: the integer values range from 0 to 3, representing WEAK, NOT STRONG, STRONG, and VERY STRONG. The dialog shows that the image is very strongly biased vertically and strongly biased horizontally.

The Color Values... item on the View menu presents the user with a dialog for viewing the predominant feature colors. The Image Color Values dialog displays the five most prevalent colors, their percentage of distribution, and their RGB value. The color and the RGB values shown are the value at the center of each quantization range. This dialog is very useful for testing the quality of the similarity function.

Query by Color Algorithms

When an image is added to the database, the AnalysisManager calls the extract() member function on the ColorHistogram object, clrhistx.h. This function is responsible for creating a query class specific feature vector and installing it in the Image.

ColorHistogram creates a 64-dimensional color histogram feature vector similar to the one used in IBM's Ultimedia Manager[13]. Each element of the feature vector represents a cube-shaped subspace of RGB space as shown in the following figure.

Each pixel is quantized into one of the bins. After traversing the image, the histogram contains the number of pixels contained in each color cube. Each bin value is then converted to a percentage representing the relative number of pixels contained in that color cube.
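
With four ranges per channel (4 x 4 x 4 = 64 cubes), the quantization reduces to integer division, as in this sketch; the pixel type is a placeholder, not the project's Pixmap class.

    #include <cstddef>
    #include <vector>

    struct Pixel { unsigned char r, g, b; };   // illustrative pixel type

    // Quantize RGB space into 4 x 4 x 4 = 64 cube-shaped bins and
    // return the percentage of the image's pixels in each cube.
    std::vector<double> extractColorHistogram(const std::vector<Pixel>& pixels)
    {
        std::vector<double> bins(64, 0.0);
        if (pixels.empty())
            return bins;
        for (std::size_t i = 0; i < pixels.size(); ++i) {
            int r = pixels[i].r / 64;              // each channel: 0..3
            int g = pixels[i].g / 64;
            int b = pixels[i].b / 64;
            bins[(r * 4 + g) * 4 + b] += 1.0;      // index into the 64 cubes
        }
        for (std::size_t i = 0; i < bins.size(); ++i)
            bins[i] = 100.0 * bins[i] / pixels.size();  // counts -> percentages
        return bins;
    }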

The similarity function of ColorHistogram computes the distance between two histograms using the L2 norm. This distance is compared to a user-definable threshold value to determine similarity. The QueryManager uses the return value to order images, most similar first.

The L2 norm only works well on sparse histograms. This is fine when the user provides three colors via a dialog, because only entries with values are compared. When querying by example, the similarity function ignores small percentages if they are present in both histograms, reducing the problem of false positives due to the contribution of insignificant differences between histograms.

Query by example requires a larger similarity threshold since it usually involves many more comparisons than queries via the dialog. For the data I used, 9.0 to 24.0 worked for three color queries and 25.0 to 40.0 was good for query by example.

Query by Pattern Algorithms

Query by pattern algorithms automatically recognize the directional biases of an image. When an image is added to the database, the AnalysisManager calls the extract() member function on the Pattern class, patternx.h. Pattern creates a 3-dimensional feature vector where each element contains a percentage of bias in the horizontal, diagonal, and vertical directions.

The pattern extraction algorithm is a multistep process. First it creates a greyscale copy of the original image. RGB values are converted to brightness values using the same translation as for black-and-white television. Television images are transmitted using the YIQ color scale; black-and-white television receivers display only the Y portion, the luminance signal:

			Y = 0.299 R + 0.587 G + 0.114 B
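
A sketch of the per-pixel conversion implied by this formula:

    // NTSC luminance: Y = 0.299 R + 0.587 G + 0.114 B.
    // Converts one RGB pixel to an 8-bit greyscale value.
    unsigned char luminance(unsigned char r, unsigned char g, unsigned char b)
    {
        double y = 0.299 * r + 0.587 * g + 0.114 * b;
        return static_cast<unsigned char>(y + 0.5);   // round to nearest
    }
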
The greyscale image is then processed to detect edges using the Sobel operator[10]. This method applies two 3 x 3 kernels to the neighborhood of each pixel to estimate the brightness derivatives dB/dx and dB/dy. For the derivative in the horizontal direction, the following kernel is used:
			 1   0  -1
			 1   0  -1
			 1   0  -1
And for vertical direction, the following kernel is used:
			 1   1   1
			 0   0   0
			-1  -1  -1
Conceptually, a kernel is applied to an image by sliding the kernel over the image and summing the products of the values in the kernel with the brightness values under them. The result is the derivative, or brightness slope, at the pixel under the center of the kernel. After the application of both kernels to a pixel neighborhood, the gradient magnitude is computed:

			magnitude = sqrt( (dB/dx)^2 + (dB/dy)^2 )

This value is assigned to the pixel under the center of the kernel. To emphasize the edges, the algorithm thresholds the value to black or white. The following image is a result of this process:

Finally, the edge image is traversed, applying the two kernels again. This time the derivatives are much stronger along the bias of the image. The direction of the slope is determined by:

			direction = arctan( (dB/dy) / (dB/dx) )

The direction is folded into a value between zero and pi. The algorithm quantizes the direction into one of the three bins in the histogram. After this process is applied to the entire image, the sums in the bins are converted to percentages.
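
The folding and binning might look like the sketch below; the bin cutoffs (multiples of pi/6) are an assumption, since the report does not give exact values.

    #include <cmath>

    enum DirectionBin { HORIZONTAL, DIAGONAL, VERTICAL };

    // Quantize a gradient direction into one of the three histogram
    // bins. The angle is folded into [0, pi); the cutoffs used here
    // are illustrative guesses, not the report's actual values.
    DirectionBin quantizeDirection(double dBdx, double dBdy)
    {
        const double PI = 3.14159265358979323846;
        double theta = std::atan2(dBdy, dBdx);      // in (-pi, pi]
        if (theta < 0.0)
            theta += PI;                            // fold into [0, pi)

        const double step = PI / 6.0;
        if (theta < step || theta >= 5.0 * step)
            return HORIZONTAL;                      // slope near 0 or pi
        if (theta < 2.0 * step || theta >= 4.0 * step)
            return DIAGONAL;
        return VERTICAL;                            // slope near pi/2
    }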

The result is a feature vector containing percentages of bias in horizontal, diagonal, and vertical directions. The distance metric ranks the three direction biases as: very strong, strong, not strong, and weak. It then computes the L1 norm distance between the two histograms. This distance is compared to a user-definable threshold value to determine similarity. The returned distance is used by the QueryManager to order similar images.


Results

The results of my project were positive. Using a scanner, I was able to produce on-line pixmaps for 50 wallpaper samples. I implemented software that supported both color and pattern based queries of a wallpaper database. The software provides an intuitive user interface and supports easy addition of new query classes.

Color Queries

There are two ways to query the database by color: dialog and example. The Query By Color dialog was described above. A dialog query using three user-selected colors demonstrates the effectiveness of the query. Three images are returned in order of similarity.

An example-based query uses the color histogram of a displayed wallpaper sample as the query key. The user clicks the right mouse button on the desired example and is presented with a popup menu. Selecting Query by Color Example will initiate the retrieval. STRIPE7.BMP was used as the example wallpaper image and the threshold was set at 38.

Pattern Queries

The software currently supports pattern queries by example only. The example-based queries are initiated from a popup menu. A query based on wallpaper sample PLAID1.BMP results in a set of wallpapers containing strong directional bias in vertical and horizontal directions. This query was restricted to display 10 images.

The current implementation does not provide enough capacity to distinguish between directional bias caused by lines and bias caused by noisy patterns. This would require a more sophisticated feature extraction algorithm. For example, a smoothing step could be used to suppress the noise before performing the gradient computations.

User Interface

The user interface is fairly intuitive. The interface presents a familiar environment to the Windows user by following Microsoft Windows interface guidelines.

A sample user, unfamiliar with image processing, easily navigated through the system but was confused about the meaning of the threshold parameters, which must be set as numbers. This confusion could be alleviated by presenting a slider control representing a scale of relative distance.

Design

The software design I used takes full advantage of object-oriented principles. The key features of the design are:
Change is encapsulated
The area of greatest change is the addition of new query classes. This functionality is encapsulated in the feature extractor concrete classes. A new query class is added by copying an existing feature extractor and modifying it to suit the needs of the new query class.
Implementation details are encapsulated
Two classes hide implementation details: Pixmap and StorageManager. To support multiple image types, Pixmap could be converted into an abstract class providing only interface; specific image types would inherit from this class. The StorageManager class hides details of the database and operating system and is responsible for logical image management. In a full-featured application, the StorageManager would delegate system details to another class, and would only manage images.
Abstraction is enforced
Abstraction is enforced using well-defined manager classes.
Interface inheritance eases expansion
The FeatureExtractor abstract class defines the interface for all concrete feature extractors. Client code can be written to an interface without knowledge of query class-specific details.
These design principles produced an extensible and embeddable system.

The prototype could be made embeddable by converting it to a server in a client/server arrangement. First, the user interface code must be externalized. This code is sensitive to change and properly resides in the client application. Second, a client/server communication protocol must be used. In the Microsoft Windows environment, OLE would provide this capability.

Limitations

The limitations of the prototype are performance and resource management. Adding a new image to the database takes a long time because of image analysis. Although the current algorithms are not optimized, this step will always be too expensive to perform in real time. A production system would have so many feature vectors that a batch processing mechanism would be necessary.

Image management is too crude for a production system. The entire image database is loaded into memory, and no mechanism exists for maintaining only a working subset of images. There is no support for thumbnail versions of images: images are stored at about 512 x 512 even though the software always scales them down to 64 x 64 for display.

The software is a prototype intended for exploring ideas and therefore does not contain the polish required in production systems.


Conclusions

The project goals were met: a prototype CBIR system was built demonstrating color and pattern queries; an intuitive user interface provides ease of use; an object-oriented design supports extensibility and embeddability.

Usefulness

With the vast number of images available on-line, quality CBIR systems are critical. By using the right system, people can quickly find the image they need.

In the field of interior design, designers and their customers search through hundreds of carpet, drapery, paint, and wallpaper samples. Their selections must be combined to create a pleasing result. Producing an aesthetic result often requires even more searching.

Through the use of CBIR, designers could access vast amounts of material, (either on CD-ROM or in a vendor database), and rapidly create high-quality interior decorating solutions. Sophisticated query environments could assist in applying practical design constraints to ensure attractive results. Expensive sample inventories would be obsolete. Significant savings would be passed on to customers.

Future Work

A CBIR system for interior design requires access to large databases of flooring, paint, fabric, and wallpaper samples. Efficient retrieval from multiple, large, image databases relies on new data representations and indexing methods. These methods must support queries composed of multiple query classes.

More and better query classes are needed to support all aspects of interior design materials. Classes for color, texture, pattern, and style are needed to create aesthetic designs. Text and other attributes are necessary to represent manufacturer information, wear characteristics, cost, etc.

Judicious use of artificial intelligence will improve system performance. Fuzzy logic is useful in determining image similarity for certain query classes. Expert systems can be built that assist designers in creating solutions that conform to traditional design idioms.

Finally, sophisticated user interfaces will be needed to give designers the power and flexibility their work demands. Users must be able to incrementally build design solutions. The interface must provide access to designers' portfolios so they can use and modify past work, allowing quick response to customer demands.

There are many challenges ahead for future CBIR systems builders. The field of interior design presents its own challenges. The results of my work show promise, but there is still much to do.


References

o [1] Venkat N. Gudivada, Vijay V. Raghavan: Content-Based Image Retrieval Systems, IEEE Computer, September 1995.

o [2] A. Desai Narasimhalu: Special section on content-based retrieval, Multimedia Systems, 1995, 3:1-2.

o [3] Flickner, Sawhney, Niblack, et al.: Query by Image and Video Content: The QBIC System, IEEE Computer, September 1995.

o [4] Rajiv Mehrotra, James E. Gray: Similar-Shape Retrieval In Shape Data Management, IEEE Computer, September 1995.

o [5] R. W. Pickard, T. P. Minka: Vision Texture For Annotation, Multimedia Systems, 1995, 3:3-14.

o [6] J. K. Wu, A. Desai Narasimhalu, B. M. Mehtre, C. P. Lam, Y. J. Gao: CORE: A Content-Based Retrieval Engine for Multimedia Information Systems, Multimedia Systems, 1995, 3:25-41.

o [7] Markus A. Stricker: Bounds for the discrimination power of color indexing techniques, Proceedings SPIE Storage and Retrieval for Image and Video Databases II, 1994, 15-24.

o [8] Nagarajan Ramesh, Ishwar K. Sethi: Feature Identification As An Aid to Content-based Image Retrieval, Proceedings SPIE Storage and Retrieval for Image and Video Databases III, 1995, 2-11.

o [9] Virginia E. Ogle, Michael Stonebraker: Chabot: Retrieval from a Relational Database of Images, IEEE Computer, September 1995, 40-48.

o [10] John C. Russ: The Image Processing Handbook, Second Edition, CRC Press, 1995.

o [11] Hong Jian Zhang, Di Zhong: A Scheme for Visual Feature based Image Indexing, Proceedings SPIE Storage and Retrieval for Image and Video Databases III, 1995, 36-46.

o [12] Jonathan Ashley, Ron Barber, Myron Flickner, James Hafner, Denis Lee, Wayne Niblack, Dragutin Petkovic: Automatic and Semi-Automatic Methods for Image Annotation and Retrieval in QBIC, Proceedings SPIE Storage and Retrieval for Image and Video Databases III, 1995, 24-35.

o [13] Harold Treat, Ed Ort, Jean Ho, Mimi Vo, Jing-Song Jang, Laura Hall, Frank Tung, Dragutin Petkovic: Searching Images Using Ultimedia Manager, Proceedings SPIE Storage and Retrieval for Image and Video Databases III, 1995, 204-213.

o [14] James Rumbaugh, Michael Blaha, William Premerlani, Frederick Eddy, William Lorensen: Object-Oriented Modeling and Design, Prentice Hall, 1991.




Send questions and comments to Sean Landis at scl@isis.com
Last modified 12/7/95 by Sean Landis