May 25th, 2010
LIG - UMR CNRS 5217 Laboratoire d'Informatique de Grenoble / Université Joseph Fourier (UJF)
LIP6 - UMR CNRS 7606 Laboratoire d'Informatique de Paris 6 / Université Pierre et Marie Curie-Paris 6 (UPMC), project leader
LSIS - UMR CNRS 6168 Laboratoire des Sciences de l'Information et des Systèmes / Université du Sud Toulon-Var (USTV)
LTCI - UMR CNRS 5141 Laboratoire Traitement et Communication de l'Information / TELECOM ParisTech (ENST)
People (in alphabetical order)
Massih-Reza Amini (LIP6), Isabelle Bloch (LTCI), Marine Campedel (LTCI), Marcin Detyniecki (LIP6),
Ali Fakeri Tabrizi (LIP6), Marin Ferecatu (LTCI), Patrick Gallinari (LIP6),
Hervé Glotin (LSIS), Young Min Kim (LIP6), Jacques Le Maitre (LSIS),
Xi Li (LTCI), Henri Maître (LTCI), Philippe Mulhem (LIG), Trong-Ton Pham (LIG), Georges Quenot (LIG),
Hichem Sahbi (LTCI), Sabrina Tollari (LIP6), Zhong-Qiu Zhao (LSIS)
Leader Patrick Gallinari, email@example.com
AVEIR (ANR-06-MDCA-002) is a project funded under the Masse de Données et Connaissances ambiantes (MDCA) call for projects of the French National Research Agency (ANR). It is labelled by the Cap Digital regional business cluster (« pôle de compétitivité »).
Retrieving images in very large databases has been an active field for several years now.
Image retrieval systems roughly fall into two categories: content-based image retrieval (CBIR)
and retrieval using manual keyword annotation. For CBIR, queries are images, image parts or
sometimes a mixture of drawing and image characteristics. This approach has never succeeded
in closing the gap between the user's information need and the limited expressiveness of
query-by-example techniques in the image domain. Web search engines (e.g. Google, Yahoo) have
developed image retrieval techniques relying on keyword annotations of images, which are
limited to simple keyword queries. Both approaches have so far failed to reduce the well-known
semantic gap between user expectations and the expressive power of images. CBIR is mostly
limited to (sometimes complex) comparisons based on low-level image features. Retrieval by
text is limited by its weak recall: only images that were indexed with high confidence can be
accessed, while the others are ignored. Moreover, such search engines fail completely whenever
the user is interested in the visual aspects of the image itself.
An emerging and perhaps more challenging direction in this domain is automatic concept
recognition from visual features. It relies on two key issues: feature detection with rich
image representation and indexing, and robust and accurate image annotation. The project
targets these two specific problems and proposes new, original solutions.
The overall goal of the project is to enrich image retrieval systems with semantic indexing
and annotation and with symbolic relational descriptions, all automatically extracted and
built from the textual and image content of documents and web pages. This semantic and
symbolic information will be used to reduce visual ambiguity in images and to enhance
the retrieval of images from large databases.
As a target application, the project will consider general, multi-thematic families of
images such as those found on web pages, in documents and in professional collections like
the classical Corel database. The project is organised around three research axes.
- The first axis focuses on image analysis, feature extraction and visual feature
representations. Most annotation systems divide images into blobs and annotate the collection
of blobs. The originality of our proposal is to go beyond this baseline approach and to develop
rich image representations. First, state-of-the-art image segmentation algorithms, chosen for
the robustness of their segmentations, will be used to identify the salient components of an
image, and the spatial relations between them (geometry, topology, adjacency) will be
extracted; both will be embedded in a high-level attributed graph representation. Second, the
representation will rely on multiple views (facets) of the image.
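Such an attributed graph can be sketched as follows. This is a minimal illustration, not the project's actual implementation: the region names, attributes and relation labels are made-up examples.

```python
# Sketch of an attributed graph over segmented image regions:
# nodes carry region attributes, edges carry spatial relations.
class RegionGraph:
    def __init__(self):
        self.nodes = {}   # region id -> attribute dict (colour, area, ...)
        self.edges = {}   # frozenset({a, b}) -> spatial relation label

    def add_region(self, rid, **attributes):
        self.nodes[rid] = attributes

    def relate(self, a, b, relation):
        # One entry per unordered pair of regions.
        self.edges[frozenset((a, b))] = relation

    def neighbours(self, rid):
        return [next(iter(pair - {rid}))
                for pair in self.edges if rid in pair]

# Toy segmented image: sky above grass, sun inside the sky region.
g = RegionGraph()
g.add_region("sky", colour="blue", area=0.55)
g.add_region("grass", colour="green", area=0.40)
g.add_region("sun", colour="yellow", area=0.05)
g.relate("sky", "grass", "above")
g.relate("sun", "sky", "inside")

print(sorted(g.neighbours("sky")))  # -> ['grass', 'sun']
```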
- The second axis is concerned with the automatic labeling of image components or objects
with textual concepts. Labeling is formulated here as a classification problem where the
labels are noisy and imprecisely defined: labels are often attached to the whole image
(not to the targeted component) and come with uncertainty. We propose to explore different
formal statistical settings developed in the machine learning (ML) community and to adapt
ML paradigms to the annotation problem in order to make this labeling task fully automatic.
The proposed techniques rely heavily on state-of-the-art and new machine learning methods.
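One common baseline for annotation under image-level labels is label transfer: propagate the keywords of the visually nearest training images to a new image. The sketch below illustrates that idea only; the feature vectors and keyword sets are illustrative assumptions, not AVEIR data or models.

```python
import math
from collections import Counter

# Toy training set: (global feature vector, image-level keyword set).
train = [
    ([0.9, 0.1], {"sky", "sea"}),
    ([0.8, 0.2], {"sky", "cloud"}),
    ([0.1, 0.9], {"grass", "tree"}),
]

def annotate(features, k=2, n_labels=2):
    """Transfer the n_labels most frequent keywords of the k nearest images."""
    nearest = sorted(train, key=lambda ex: math.dist(features, ex[0]))[:k]
    votes = Counter(label for _, labels in nearest for label in labels)
    return [label for label, _ in votes.most_common(n_labels)]

print(annotate([0.85, 0.15]))  # -> ['sky', 'sea']
```

Because the labels vote at the image level, a keyword can win even when only one region of one neighbour actually depicts it — exactly the noise the axis sets out to handle.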
- The third axis considers image retrieval and the evaluation of the proposed algorithms.
Retrieval will exploit the rich image representations developed in the first axis, allowing
the user to issue high-level semantic queries. The fusion of visual and semantic queries
will also be studied in this axis. Tests will be performed on classical benchmarks and on
different multimedia document collections; specific annotated corpora will be developed
within the project and released as deliverables to the community.
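A simple way to picture the fusion of visual and semantic queries is late fusion: score each image separately against the visual query and the textual query, then combine the scores. The weighted sum and the scores below are illustrative assumptions, not the fusion model the project will develop.

```python
# Late-fusion sketch: rank images by a weighted combination of a visual
# similarity score and a semantic (text) relevance score.
def fuse(visual_scores, semantic_scores, alpha=0.5):
    """Rank image ids by alpha * visual + (1 - alpha) * semantic."""
    fused = {img: alpha * visual_scores[img] + (1 - alpha) * semantic_scores[img]
             for img in visual_scores}
    return sorted(fused, key=fused.get, reverse=True)

visual = {"img1": 0.9, "img2": 0.4, "img3": 0.7}
semantic = {"img1": 0.2, "img2": 0.9, "img3": 0.8}

print(fuse(visual, semantic, alpha=0.3))  # -> ['img3', 'img2', 'img1']
```

Varying `alpha` moves the ranking between a purely visual and a purely semantic answer, which is the trade-off a fusion study has to resolve.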
Main open problems
The main open problems and challenges addressed in this project are:
- Reducing the semantic gap between images and their textual descriptions.
- Developing new, rich image representations that limit ambiguity and allow visual and semantic queries.
- Using machine learning techniques to reduce the gap between the semantics of the text and the signal-level description of an image.
- Finding, within a structured multimedia document, which part of the text describes the image.
- Combining text-specific techniques with image-specific techniques: because of their fundamental differences, text and image techniques have been developed in parallel. One challenge is to confront them and develop specific fusion algorithms.
- New learning challenges:
  - Treating images and text on the same level raises new learning problems, for instance how to learn from multi-instance, multi-faceted examples with non-univocal and noisy object labels.
  - Learning in high-dimensional spaces with sparse labeled examples is a challenge for state-of-the-art semi-supervised learning approaches.
  - Defining evaluation criteria is not trivial in this case, since semantics, structure and visual aspects are mixed.
- Developing new collections for the evaluation of image retrieval.
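The multi-instance setting mentioned above can be made concrete: labels are attached to whole images (bags) while classification is wanted at the region (instance) level. Under the standard multiple-instance assumption, a bag is positive iff at least one of its instances is positive. The sketch below shows only that assumption via max pooling; the region scores are invented.

```python
# Multiple-instance sketch: image-level prediction from region-level scores.
def bag_label(instance_scores, threshold=0.5):
    """A bag (image) is positive iff its best-scoring region passes the threshold."""
    return max(instance_scores) >= threshold

# Two images, each a bag of region scores for some concept.
print(bag_label([0.1, 0.8, 0.3]))  # True: one region matches the concept
print(bag_label([0.1, 0.2, 0.3]))  # False: no region does
```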
Axis | No. | Task | Deliverable | Leader | Partners | Schedule
 | 0 | Web site | Web site | LIP6 | LIP6 | T0-T2
1 | 1 | State of the art and specification for the characterization of relevant concept-dependent visual features | Report | LTCI | LIG, LTCI, LSIS | T0-T6
1 | 2 | Library of advanced image processing techniques (relational modeling of images) and specifications for the implementation into the prototype | Algorithm | | | T7-T12
1 | 3 | Development of the software modules and tests | Model | | | T13-T24
2 | 4 | Analysis of the annotation problem from a machine learning perspective and state of the art on the correspondence between text and image | Report | LIP6 | ALL | T0-T6
2 | 5 | Annotations for rich image descriptions | Algorithm | | | T7-T12
2 | 6 | Development of the software and tests | Model | | | T13-T36
3 | 7 | Contacts with European partners for multimedia collections and corpus specification | Report | LIG | ALL | T0-T12
3 | 8 | Corpus development (Web corpus) | Corpus | | | T7-T24
3 | 9 | Final prototype of the retrieval engine | Search engine | | | T25-T36
The main results expected at the end of the AVEIR project are:
- definition of a model that represents the different facets (views) of images,
- definition of probabilistic approaches for the automatic annotation of images according to their visual content and the text describing them,
- definition of a set of test collections for the evaluation of image annotation and retrieval,
- a prototype image retrieval system based on the different advances of AVEIR.
Multi-facet descriptions reduce image ambiguity and open promising perspectives for
querying large image databases. The semantic labeling of complex image descriptions,
however, remains an open problem: so far, simple blob-like representations have been used
for automatic annotation, and adapting complex representations to general families of
image databases is also challenging.
We believe that the proposed approach has the potential to meet these challenges and to
overcome the limitations of current approaches.
The project addresses both very practical problems (the design of efficient and expressive
image search engines) and open theoretical problems in visual concept representation,
semantic concept extraction and machine learning.
Developing robust and accurate solutions for the automatic semantic annotation of images
has important consequences for many applications in the multimedia domain. The project
will provide principled methods for this problem, which could be developed into large-scale
applications through future industrial collaborations. The project may also have a strong
impact on the development of national and European R&D projects.