AVEIR 2007-2009
Automatic annotation and Visual concept Extraction for Image Retrieval
Next meeting
May 25th, 2010


Cap Digital
Research institutes


LIG - UMR CNRS 5217 Laboratoire d'Informatique de Grenoble / Université Joseph Fourier (UJF)
LIP6 - UMR CNRS 7606 Laboratoire d'Informatique de Paris 6 / Université Pierre et Marie Curie-Paris 6 (UPMC), project leader
LSIS - UMR CNRS 6168 Laboratoire des Sciences de l'Information et des Systèmes / Université du Sud Toulon-Var (USTV)
LTCI - UMR CNRS 5141 Laboratoire Traitement et Communication de l'Information / TELECOM ParisTech (ENST)

People (in alphabetical order)

Massih-Reza Amini (LIP6), Isabelle Bloch (LTCI), Marine Campedel (LTCI), Marcin Detyniecki (LIP6), Ali Fakeri Tabrizi (LIP6), Marin Ferecatu (LTCI), Patrick Gallinari (LIP6), Hervé Glotin (LSIS), Young Min Kim (LIP6), Jacques Le Maitre (LSIS), Xi Li (LTCI), Henri Maître (LTCI), Philippe Mulhem (LIG), Trong-Ton Pham (LIG), Georges Quenot (LIG), Hichem Sahbi (LTCI), Sabrina Tollari (LIP6), Zhong-Qiu Zhao (LSIS)


Patrick Gallinari, aveir@poleia.lip6.fr

AVEIR (ANR-06-MDCA-002) is a project of the call for projects Masse de Données et Connaissances ambiantes (MDCA) of the French National Research Agency (ANR). It is labelled by the regional business cluster (known as a « pôle de compétitivité ») Cap Digital.


Retrieving images in very large databases has been an active field for several years now. Image retrieval systems roughly fall into two categories: content-based image retrieval (CBIR) and retrieval using manual keyword annotations. In CBIR, queries are images, image parts or sometimes a mixture of drawings and image characteristics. This approach has never succeeded in closing the semantic gap between the user's information need and the limited expressiveness of query-by-example techniques in the image domain. Web search engines (e.g. Google, Yahoo) have developed image retrieval techniques that rely on keyword annotations of images and are limited to simple keyword queries. Up to now, both approaches have failed to reduce the well-known semantic gap between user expectations and the expressive power of images. CBIR is mostly limited to (sometimes complex) comparisons of low-level image features. Text-based retrieval suffers from weak recall: only images that were indexed with high confidence can be accessed, while the others are ignored. Moreover, such search engines fail completely whenever the user is interested in the visual aspects of the image itself.

A newer and possibly more challenging field in this domain is the automatic recognition of concepts from visual features. It relies on two key issues: feature detection with rich image representation and indexing, and robust, accurate image annotation. The project targets these two specific problems and proposes new and original solutions.

The overall goal of the project is to enrich image retrieval systems with semantic indexing and annotation and with symbolic relational descriptions, all automatically extracted and built from the textual and image content of documents and web pages. This semantic and symbolic information will be used to reduce the visual ambiguity of images and to improve image retrieval from large databases.

As target applications, the project will consider general, multi-thematic families of images such as those found on web pages, in documents and in professional collections like the classical Corel database. The project will develop three research axes.

  • The first axis focuses on image analysis, feature extraction and visual feature representations. Most annotation systems divide images into blobs and annotate the collection of blobs. The originality of our proposal is to go beyond this baseline approach and to develop rich image representations. First, state-of-the-art image segmentation algorithms, selected for their robustness, will be used to identify the salient components of the image, and the spatial relations between these components (geometry, topology, adjacency) will be extracted; both will be embedded in a high-level attributed graph representation. Second, the representation will rely on multiple views (facets) of the image.
  • The second axis is concerned with the automatic labeling of image components or objects with textual concepts. Labeling is formulated here as a classification problem in which the labels are noisy and imprecisely defined: they are often given at the global image level (not at the level of the targeted component) and with uncertainty. We propose to explore different formal statistical settings developed in the machine learning (ML) community and to adapt some ML paradigms to the annotation problem in order to make this labeling task fully automatic. The proposed techniques rely heavily on both state-of-the-art and new machine learning methods.
  • The third axis considers image retrieval and the evaluation of the proposed algorithms. Retrieval will make use of the rich image representations developed in the first axis, allowing the user to formulate high-level semantic queries. Fusion of visual and semantic queries will be studied in this axis. Tests will be performed on classical benchmarks and on different multimedia document collections; specific annotated corpora will be developed within the project, released as project deliverables and made available to the community.
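The attributed graph representation of the first axis can be illustrated with a minimal sketch. It assumes that a segmentation step has already produced an integer label map (one region id per pixel); nodes carry simple attributes (area, centroid) and edges link 4-adjacent regions, annotated with a coarse relative position computed from the centroids. The names Region, RegionGraph and build_region_graph, as well as the choice of attributes and relations, are hypothetical and only meant as an example, not as the representation actually developed in the project.

```python
from dataclasses import dataclass, field


@dataclass
class Region:
    """Graph node: one segmented image component (hypothetical attribute set)."""
    label: int
    area: int = 0
    sum_r: float = 0.0
    sum_c: float = 0.0

    @property
    def centroid(self):
        # Mean (row, column) position of the pixels in the region.
        return (self.sum_r / self.area, self.sum_c / self.area)


@dataclass
class RegionGraph:
    nodes: dict = field(default_factory=dict)  # region label -> Region
    edges: dict = field(default_factory=dict)  # (a, b) with a < b -> relation attributes


def directional_relation(ca, cb):
    """Coarse position of region b relative to region a, from their centroids."""
    dr, dc = cb[0] - ca[0], cb[1] - ca[1]
    if abs(dc) >= abs(dr):
        return "right_of" if dc > 0 else "left_of"
    return "below" if dr > 0 else "above"


def build_region_graph(label_map):
    """Build an attributed region graph from a 2-D label map (list of lists)."""
    graph = RegionGraph()
    rows, cols = len(label_map), len(label_map[0])
    adjacent = set()
    for r in range(rows):
        for c in range(cols):
            lab = label_map[r][c]
            node = graph.nodes.setdefault(lab, Region(lab))
            node.area += 1
            node.sum_r += r
            node.sum_c += c
            # 4-connectivity: only look right and down so each pixel pair is checked once.
            for nr, nc in ((r, c + 1), (r + 1, c)):
                if nr < rows and nc < cols and label_map[nr][nc] != lab:
                    adjacent.add(tuple(sorted((lab, label_map[nr][nc]))))
    # Edges are added after the scan, once all centroids are known.
    for a, b in sorted(adjacent):
        ca, cb = graph.nodes[a].centroid, graph.nodes[b].centroid
        graph.edges[(a, b)] = {"adjacent": True, "position": directional_relation(ca, cb)}
    return graph
```

For a toy label map with two regions side by side on top of a third one, e.g. `[[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 3, 3]]`, the graph has three nodes and three edges, with region 2 to the right of region 1 and region 3 below both. A real system would use richer attributes (colour, texture, shape) and relations (inclusion, distance), which this sketch leaves out.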

Main open problems

The main open problems and challenges addressed in this project are:

  • Reduction of the semantic gap between images and their textual descriptions
    • Developing new rich image representations that limit ambiguity and support both visual and semantic queries.
    • Using machine learning techniques to reduce the gap between the semantics of the text and the signal-level description of an image.
    • Finding which part of the text in a (structured) multimedia document describes the image.
  • Combining text-specific techniques with image-specific techniques: because of their fundamental differences, text and image techniques have been developed separately. One challenge is to bring them together and to develop specific fusion algorithms.
  • New learning challenges:
    • Treating images and text on the same level raises new learning problems, for instance how to learn from multi-instance, multi-faceted examples with ambiguous (non-univocal) object labels and noisy labels.
    • Learning in high-dimensional spaces with few labeled examples is a challenge for state-of-the-art semi-supervised learning approaches.
  • Evaluation
    • The definition of evaluation criteria is not a trivial task in this case, since we mix semantics, structures and visual aspects.
    • Developing new collections for the evaluation of image retrieval.
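The multi-instance learning problem listed above can be made concrete: an image is a bag of region feature vectors, and only the image as a whole carries a label. The sketch below assumes the standard multi-instance assumption (an image contains a concept if at least one of its regions does) and uses a max-pooling perceptron as the learner. This learner is an illustrative choice, not the method developed in AVEIR, and it does nothing specific about label noise beyond capping the number of epochs; the function names are hypothetical.

```python
def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def bag_score(w, bag):
    """Image-level score = score of the best region (standard multi-instance assumption)."""
    return max(dot(w, x) for x in bag)


def witness(w, bag):
    """Region currently responsible for the image-level score."""
    return max(bag, key=lambda x: dot(w, x))


def train_mi_perceptron(bags, labels, n_features, epochs=20):
    """Max-pooling perceptron trained from image-level labels only.

    bags:   list of images, each a list of region feature vectors
            (a constant 1.0 feature can be appended to act as a bias).
    labels: +1 if the concept is annotated for the image, -1 otherwise.
    """
    w = [0.0] * n_features
    for _ in range(epochs):  # the epoch cap also stops training on noisy, non-separable data
        mistakes = 0
        for bag, y in zip(bags, labels):
            if y * bag_score(w, bag) <= 0:
                # Update only with the witness region: the label is known for the
                # image, not for any particular region.
                x = witness(w, bag)
                w = [wi + y * xi for wi, xi in zip(w, x)]
                mistakes += 1
        if mistakes == 0:
            break
    return w
```

Once trained, `witness(w, bag)` points to the region most likely to carry the concept, which is one simple way of moving from global image labels to region-level labels.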

Main deliverables

Axis | No. | Deliverable | Type | Responsible | Participants | Period
-----|-----|-------------|------|-------------|--------------|-------
0 | 0 | Web site | Web site | LIP6 | LIP6 | T0-T2
1 | 1 | State of the art and specification for the characterization of relevant concept-dependent visual features | Report | LTCI | LIG, LTCI, LSIS | T0-T6
1 | 2 | Library for advanced image processing techniques (relational modeling of images) and specifications for the implementation in the prototype | Algorithm | | | T7-T12
1 | 3 | Development of the software modules and tests | Model | | | T13-T24
2 | 4 | Analysis of the annotation problem from a machine learning perspective and state of the art on the correspondence between text and image | Report | LIP6 | ALL | T0-T6
2 | 5 | Annotations for rich image descriptions | Algorithm | | | T7-T12
2 | 6 | Development of the software and tests | Model | | | T13-T36
3 | 7 | Contacts with European partners for multimedia collections and corpus specification | Report | LIG | ALL | T0-T12
3 | 8 | Corpus development - Web corpus | Corpus | | | T7-T24
3 | 9 | Final prototype of the retrieval engine | Search engine | | | T25-T36

Expected results

The main results expected at the end of the AVEIR project are:

  • definition of a model that represents different facets (views) of the images;
  • definition of probabilistic approaches for the automatic annotation of images according to the image content and the text describing the images;
  • definition of a set of test collections for the evaluation of image annotation and retrieval;
  • a prototype image retrieval system based on the different advances of AVEIR.

Multi-faceted descriptions reduce image ambiguity and open promising perspectives for querying large image databases. However, the semantic labeling of complex image descriptions is still an open problem: so far, only simple blob-like representations have been used for automatic annotation. Adapting complex representations to general families of image databases is also challenging.

We believe that the proposed approach has the potential to meet these challenges and to overcome the limitations of current approaches. The project addresses both very practical problems (the design of efficient and expressive image search engines) and open theoretical problems in visual concept representation, semantic concept extraction and machine learning.

Developing robust and accurate solutions for the automatic semantic annotation of images has important consequences for many applications in the multimedia domain. The project will provide principled methods for this problem, which could be developed into large-scale applications through future industrial collaborations. The project may also have a strong impact on the development of national and European R&D projects.