Communicative Images – New Approach to Accessibility of Graphics - Plhak, Jaromir


Ivan Kopeček, Radek Ošlejšek
Faculty of Informatics, Masaryk University, Brno, Czech Republic



Abstract

1 Introduction

Although image recognition techniques are still far from being able to fully describe an analyzed picture, current technologies enable us to associate many useful pieces of information with images. For example, the date and time of a snapshot, GPS information, and in some cases recorded sound are often directly associated with photographs by the camera. Typically, this information is stored in the image file itself and can be exploited for image classification and semantics retrieval, see e.g. [1–3].
 
Most of the relevant information, however, is not directly available. Let us imagine a photo from a holiday ten years ago: the woman in the middle is my wife, but who is the guy standing behind her? The place is apparently somewhere in the Alps, but where exactly? What is that peak in the background? Such pieces of information cannot be read from the photo itself. However, some of them can be retrieved using current information technologies. GPS coordinates allow us to determine where the photo was taken. Face recognition [4–6] may help reveal the identities of persons. The orientation of the camera may help to identify some objects in the picture (e.g. the peak in the background).
 
The idea of communicative images presented in this paper lies in enabling people to communicate with images, i.e. enabling users to easily obtain relevant pieces of information from the images while enabling the images to gather relevant information about themselves from the users. Communicative images learn from the communication, enlarge their knowledge base and use it in further communication. They also provide an intelligent and efficient natural language dialogue with the user.
 
The concept of communicative images is very general and allows many applications, e.g. in image searching, e-learning, image recognition, etc. However, probably the most important and direct impact is in making the images accessible for blind and visually impaired people.
 
Because of space limitations, this extended abstract presents only some basic ideas and examples to introduce the communicative images concept. A more detailed description and implementation details will be presented in the full paper.
 
2 Building up Communicative Images and Communicating with Them
 
To make a standard image communicative, it is necessary to transform it into a format that supports structured annotation data. Our approach exploits the ability of the SVG format [7] to wrap the original raster image and to integrate annotation data. The general knowledge base is described by OWL ontologies [8] that are linked from the SVG pictures via the owl:imports statement. In this way, the knowledge base can be shared by many pictures. Concrete annotation data, i.e. concrete values of the properties prescribed by the ontologies, are on the other hand stored directly in the SVG file in the form of XML elements. Technical details related to coupling OWL ontologies with the SVG format are discussed in [9].
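
As an illustration only, the following Python sketch shows one way an SVG document could wrap a raster image, link a shared ontology via owl:imports and carry concrete annotation values as XML elements. The go: namespace, the Object element, the property names and all example URLs are hypothetical assumptions; the actual coupling of OWL with SVG is the one described in [9] and may differ.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
XLINK_NS = "http://www.w3.org/1999/xlink"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OWL_NS = "http://www.w3.org/2002/07/owl#"
# Hypothetical namespace standing in for a shared graphical ontology.
GO_NS = "http://example.org/graphical-ontology#"

for prefix, uri in [("", SVG_NS), ("xlink", XLINK_NS), ("rdf", RDF_NS),
                    ("owl", OWL_NS), ("go", GO_NS)]:
    ET.register_namespace(prefix, uri)


def make_communicative_svg(image_href, width, height, ontology_url, annotations):
    """Return SVG text wrapping a raster image together with annotation data.

    `annotations` maps an object identifier to a dict of property values.
    """
    size = {"width": str(width), "height": str(height)}
    svg = ET.Element(f"{{{SVG_NS}}}svg", size)

    # The original raster image is referenced, not converted.
    ET.SubElement(svg, f"{{{SVG_NS}}}image",
                  {f"{{{XLINK_NS}}}href": image_href, **size})

    # The shared knowledge base is linked, so many pictures can reuse it.
    metadata = ET.SubElement(svg, f"{{{SVG_NS}}}metadata")
    rdf = ET.SubElement(metadata, f"{{{RDF_NS}}}RDF")
    onto = ET.SubElement(rdf, f"{{{OWL_NS}}}Ontology", {f"{{{RDF_NS}}}about": ""})
    ET.SubElement(onto, f"{{{OWL_NS}}}imports",
                  {f"{{{RDF_NS}}}resource": ontology_url})

    # Concrete property values are stored directly in the SVG file.
    for obj_id, props in annotations.items():
        obj = ET.SubElement(rdf, f"{{{GO_NS}}}Object", {f"{{{RDF_NS}}}ID": obj_id})
        for name, value in props.items():
            ET.SubElement(obj, f"{{{GO_NS}}}{name}").text = value

    return ET.tostring(svg, encoding="unicode")


svg_text = make_communicative_svg(
    "holiday.jpg", 800, 600,
    "http://example.org/graphical-ontology.owl",
    {"peak": {"dominantColor": "white", "size": "unusually big"}},
)
```

Keeping the ontology outside the file and only the property values inside it mirrors the split between shared knowledge and per-image annotation described above.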
 
Once an image is transformed into the SVG format, the system tries to acquire as much information about the image as possible, using auto-detection and image recognition techniques, e.g. face detection and recognition algorithms [4–6], similarity search algorithms searching in large collections of tagged pictures [10–12], EXIF data extraction from photos, etc. After this initial stage, the user is informed about the estimated content and invited to confirm or refute the information and to continue with questioning. New pieces of the acquired information are stored in the image ontology and reused by subsequent interactions.
 
2.1 Graphical Ontologies
 
OWL ontologies provide a formal specification of a shared conceptualization. They describe the semantics of data by modeling domain concepts, their relationships and attributes. In [13], we presented a basic graphical ontology which restricts abstraction to the aspects that are suitable and usable for dialogue-based investigation of graphical content. Using this ontology, the annotator can express, for instance, that some object is “mostly red, oval and unusually big”, etc.
 
Although the graphical ontologies cover basic visual characteristics that are useful for generic dialogue interactions, verbal descriptions of domain-specific pictures require specialized ontology extensions that help to generate domain-specific dialogues. For example, a domain model, People, provides vocabulary and background knowledge to classify people by their relationships, similarly to the popular “circles” known from social networks. People in a photo are identified and assigned to “circles” (e.g. my family, friends, colleagues, etc.). Similar models can be found in [14, 15].
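
To make the two kinds of knowledge more tangible, the sketch below models a visual description and a People-style domain model in plain Python. It is a simplified stand-in, not the ontology from [13]: the class names, attribute names and the wording of the generated sentence are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class VisualObject:
    """Generic visual characteristics of one annotated object."""
    name: str
    color: Optional[str] = None
    shape: Optional[str] = None
    size: Optional[str] = None

    def describe(self) -> str:
        traits = []
        if self.color:
            traits.append(f"mostly {self.color}")
        if self.shape:
            traits.append(self.shape)
        if self.size:
            traits.append(self.size)
        if not traits:
            return f"The {self.name} has no recorded visual attributes."
        if len(traits) == 1:
            listing = traits[0]
        else:
            listing = ", ".join(traits[:-1]) + " and " + traits[-1]
        return f"The {self.name} is {listing}."


@dataclass
class PeopleModel:
    """Domain extension: people grouped into user-related circles."""
    circles: Dict[str, Set[str]] = field(default_factory=dict)

    def assign(self, person: str, circle: str) -> None:
        self.circles.setdefault(circle, set()).add(person)

    def circles_of(self, person: str) -> List[str]:
        return sorted(c for c, members in self.circles.items() if person in members)


balloon = VisualObject("balloon", color="red", shape="oval", size="unusually big")
people = PeopleModel()
people.assign("Peter", "friends")
people.assign("Eve", "friends")
people.assign("Peter", "colleagues")
```

Here balloon.describe() produces a sentence of the kind quoted above, while people.circles_of("Peter") lists the circles a dialogue could mention when introducing Peter.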
 
2.2 Dialogues with Communicative Images
 
Instead of a more detailed discussion, which will be included in the full version of the paper, we present two examples in this section showing typical dialogues between the user and communicative images (U stands for User, I for Image).
 
Example 1.
 
U: [activates the dialogue with the image]
I: This photo was taken on January 16, 2012. There is a group of people in the picture, among them your friends Peter and Eve.
/In the corresponding ontology, GROUP_OF_PEOPLE is a predefined ontology relation.
FRIEND is an ontology relation related to the user (modal ontology relation).
PETER and EVE are objects of the ontology. The photo is of the type PEOPLE, therefore the description of the environment is omitted in the brief summary./
U: Who are the other people?
I: The other people are John, Mary and Michal. John is the brother of Mary, and Michal is a friend of John and Mary. Do you know John, Mary or Michal?
U: Yes, they are my schoolmates.
/This information is encoded into the ontology format and compared with relevant ontology information. If it does not lead to conflicts and does not need further explanation, this piece of information is added to the graphical ontology./
In this example, the user is assumed to be sighted. An important feature of the communicative images concept is that annotations made by sighted people can be exploited in further communication by blind and visually impaired users. The Internet, and especially social networks, form a suitable environment for this concept. Moreover, because communicative images also gather information about the user, the image is able to apply special dialogue strategies when communicating with blind users.
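
The acceptance step annotated in Example 1 can be sketched as follows. This is only a minimal illustration under strong assumptions: facts are held as (subject, relation, value) triples, and a "conflict" is taken to mean a second, different value for a relation declared single-valued. The relation names and the FUNCTIONAL_RELATIONS set are hypothetical; a real system would rely on OWL reasoning instead.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical: relations that admit only one value per subject.
FUNCTIONAL_RELATIONS = {"LIVES_IN", "BROTHER_COUNT"}


class ImageKnowledgeBase:
    """Stores facts about an image and rejects contradicting ones."""

    def __init__(self) -> None:
        self.facts: Set[Triple] = set()

    def conflicts_with(self, fact: Triple) -> List[Triple]:
        """Return stored facts that contradict the candidate fact."""
        subject, relation, value = fact
        if relation not in FUNCTIONAL_RELATIONS:
            return []
        return sorted(
            f for f in self.facts
            if f[0] == subject and f[1] == relation and f[2] != value
        )

    def tell(self, fact: Triple) -> bool:
        """Add the fact if it causes no conflict; report whether it was added."""
        if self.conflicts_with(fact):
            return False  # the dialogue would ask the user for clarification
        self.facts.add(fact)
        return True


kb = ImageKnowledgeBase()
accepted_schoolmate = kb.tell(("John", "SCHOOLMATE_OF", "user"))
accepted_first_city = kb.tell(("Michal", "LIVES_IN", "Brno"))
accepted_second_city = kb.tell(("Michal", "LIVES_IN", "Prague"))
```

In this sketch the third statement is refused because it contradicts a stored single-valued fact, which is the point at which a dialogue strategy could ask the user to explain.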
 
Example 2.
 
U: [activates the dialogue with the image]
I: This photo was taken on January 16, 2012. This is a view of the city of Brno with the Spilberk castle and the Petrov cathedral.
U: Where is Brno located?
I: Brno is located in the eastern part of the Czech Republic.
U: What is the population of Brno?
I: The population is about 400 000 people.
U: Is there a friend of mine that lives in Brno?
I: Yes, your friend Michal Novak lives there.
U: Tell me some interesting facts about Brno.
/The requests activate the search engine, which first searches the internal pieces of information encoded in the graphical ontology of the image and, if this search fails, starts a real-time Internet search. If this search is successful, the relevant pieces of information are used to answer the request./
 
In this example, the user could be either sighted or blind. In both cases, the dialogue remains fully functional.
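
The two-stage lookup annotated in Example 2 can be sketched as below. The function and parameter names are hypothetical, and the Internet search is represented by a caller-supplied function so that the sketch stays self-contained; no real search service is implied.

```python
from typing import Callable, Dict, Optional, Tuple

Query = Tuple[str, str]  # (entity, requested property)


def answer_request(
    query: Query,
    image_facts: Dict[Query, str],
    internet_search: Callable[[Query], Optional[str]],
) -> Tuple[Optional[str], str]:
    """Return (answer, source), trying the image ontology first."""
    local = image_facts.get(query)
    if local is not None:
        return local, "image ontology"

    # Only when the internal search fails is the external search activated.
    remote = internet_search(query)
    if remote is not None:
        return remote, "internet"
    return None, "not found"


# Facts assumed to be encoded in the image, taken from Example 2.
facts = {("Brno", "population"): "about 400 000 people"}


def stub_search(query: Query) -> Optional[str]:
    """Stand-in for a real-time Internet search."""
    canned = {("Brno", "location"): "the eastern part of the Czech Republic"}
    return canned.get(query)


population = answer_request(("Brno", "population"), facts, stub_search)
location = answer_request(("Brno", "location"), facts, stub_search)
unknown = answer_request(("Brno", "mayor"), facts, stub_search)
```

Returning the source alongside the answer is a design choice of this sketch; it would let a dialogue distinguish knowledge stored in the image from externally retrieved information.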
 
3 Exchange
 
The management of dialogue strategies for the communication between the user and the image is based on graphical ontologies, as the basic source of information, and on a suitable logic that standardizes syntax and semantics for information interchange. The interface between natural language and the formalized ontology framework is provided by an engine that transforms natural language into corresponding formal schemes. Typically, we can restrict ourselves to a small fragment of natural language, so that the engine can be based on relatively simple grammars in combination with frame technology and standard techniques for resolving misunderstandings.
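
A toy version of such an engine, restricted to a very small fragment of English, might look as follows. It is only a sketch under our own assumptions: the patterns, the relation names and the frame layout are hypothetical, and regular expressions stand in for the simple grammars mentioned above.

```python
import re
from typing import Dict, Optional

# Each pattern of the restricted language fills one frame type.
PATTERNS = [
    (re.compile(r"^where is (?P<entity>[\w ]+?) located\??$", re.I), "LOCATION_OF"),
    (re.compile(r"^what is the population of (?P<entity>[\w ]+?)\??$", re.I),
     "POPULATION_OF"),
    (re.compile(r"^who (?:is|are) (?P<entity>[\w ]+?)\??$", re.I), "IDENTITY_OF"),
]


def parse_question(utterance: str) -> Optional[Dict[str, str]]:
    """Map an utterance to a frame, or None if it is not understood."""
    text = " ".join(utterance.split())  # normalize whitespace
    for pattern, relation in PATTERNS:
        match = pattern.match(text)
        if match:
            return {"relation": relation, "entity": match.group("entity")}
    return None  # the caller would start a clarification sub-dialogue


frame = parse_question("Where is Brno located?")
```

A None result marks the point where a technique for resolving misunderstandings would take over, for example by asking the user to rephrase the request.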
 
The complexity and strength of the chosen logic determine the complexity and strength of the dialogue strategies. Predicate logic, the logics developed within the Semantic web (e.g. Common Logic [16]), modal logics, temporal logics, Transparent Intensional Logic [17], etc., can be used for this purpose. This approach makes it possible to develop a single general scheme into which standardized formalizations of different logics can be easily implemented.
 
4 Communicative Images and the People with Special Needs
 
The concept of communicative images seems to be a promising approach to the accessibility of graphics for people with special needs, especially for blind and visually impaired people. Moreover, the communicative images paradigm makes it possible to build other useful applications. In [18], Chai et al. proposed an intelligent photo album that allows a collection of family photos to be organized and searched by means of ontologies and SWRL questioning [19]. If communicative images are implemented into this scheme, we see a straightforward way to enhance its functionality. Because the photos in the album are organized by means of an OWL ontology, it might be possible to employ the mechanism of generating dialogues from domain ontologies. In this way, the user could organize photos via dialogue as well.
 
E-learning is another field that could benefit from communicative images. A very important feature of the presented concept is that, because it is based on formal ontologies, it is fully compatible with the Semantic web paradigm and at the same time fully supports multilinguality.
 
References
 
[1] Sandnes, F. Where was that photo taken? Deriving geographical information from image collections based on temporal exposure attributes. Multimedia Systems, 2010. p. 309–318.
[2] Boutell, M., Luo, J. Photo classification by integrating image content and camera metadata. In Pattern Recognition, 2004. ICPR 2004. Proceedings of the 17th International Conference on. Volume 4. 2004. p. 901–904.
[3] Yuan, J., Luo, J., Wu, Y. Mining Compositional Features From GPS and Visual Cues for Event Recognition in Photo Collections. IEEE Trans. on Multimedia, 7. 2010. p. 705–716.
[4] Li, S. Z., Jain, A. K. (Eds.). Handbook of Face Recognition. Second edn. Springer-Verlag, 2011.
[5] Wright, J., Yang, A., Ganesh, A., Sastry, S., Ma, Y. Robust face recognition via sparse representation. Pattern Analysis and Machine Intelligence, IEEE Trans. on 31(2). 2009. p. 210–227.
[6] Haddadnia, J., Ahmadi, M. N-feature neural network human face recognition. Image and Vision Computing (12). 2004. p. 1071–1082.
[7] Dahlström, E., et al. Scalable Vector Graphics (SVG) 1.1 (Second Edition). 2011. http://www.w3.org/TR/SVG/.
[8] Lacy, L.W. OWL: Representing Information Using the Web Ontology Language. Trafford Publishing, January 2005.
[9] Kopeček, I., Ošlejšek, R. G.a.t.e. to accessibility of computer graphics. In Proceedings of ICCHP. Berlin: Springer-Verlag, 2008. p. 295–302.
[10] Batko, M., Dohnal, V., Novak, D., Sedmidubsky, J. MUFIN: A Multi-Feature Indexing Network. In: SISAP 2009: 2009 Second Int. Workshop on Similarity Search and Applications, IEEE Computer Society. 2009. p. 158–159.
[11] Jaffe, A., Naaman, M., Tassa, T., Davis, M. Generating summaries and visualization for large collections of geo-referenced photographs. In Proceedings of the 8th ACM internat. workshop on Multimedia information retrieval, ACM. 2006. p. 89–98.
[12] Abbasi, R., Chernov, S., Nejdl, W., Paiu, R., Staab, S. Exploiting Flickr Tags and Groups for Finding Landmark Photos. In Advances in Information Retrieval. Berlin: Springer-Verlag, 2009. p. 654–661.
[13] Ošlejšek, R. Annotation of pictures by means of graphical ontologies. In Proc. Int. Conf. on Internet Computing ICOMP 2009. CSREA Press, 2009. p. 296–300.
[14] Oellinger, T., Wennerberg, P.O. Ontology based modeling and visualization of social networks for the web. In GI Jahrestagung (2). Volume 94 of LNI., GI. 2006. p. 489–497.
[15] Wennerberg, P. Ontology based knowledge discovery in social networks. JRC Joint Research Center, 2005.
[16] Obitko, M. Common logic. 2007. http://www.obitko.com/tutorials/ontologies-semanticweb/common-logic.html.
[17] Tichý, P. The Foundations of Frege’s Logic. Berlin and New York: De Gruyter, 1988.
[18] Chai, Y., Xia, T., Zhu, J., Li, H. Intelligent digital photo management system using ontology and SWRL. In Computational Intelligence and Security (CIS), 2010 International Conference on. 2010. p. 18–22.
[19] Boley, H., et al. SWRL: A semantic web rule language combining OWL and RuleML. 2004. http://www.w3.org/Submission/SWRL/.
