
Paper details

Integrated and Innovative Solutions to support Deaf Students during Class Attendance


Licia Sbattella, Roberto Tedesco
Dipartimento di Elettronica e Informazione, Politecnico di Milano, Italy



Abstract

Motivation

The main idea behind the innovative solution, named PoliLips, arose during conversations with deaf students at the MultiChancePoliTeam (the service for students with disabilities at Politecnico di Milano). Students reported strong difficulty attending classes (and their end-of-course scores confirmed the problem).

Considering that students did not request sign-language translation, we initially tried to adopt only an Automatic Speech Recognition (ASR) system for on-the-fly transcription of the teacher's voice, but we did not obtain reliable results. In particular, accuracy was deeply affected by the terminology the teacher adopted: whenever the teacher used a highly specific technical lexicon during classes, recognition accuracy decreased. This is the case, for example, in scientific and technological courses, which often mix Italian and English words and force teachers to use very specific technical speech.

Several deaf students reported that lipreading was their preferred compensation mechanism, sometimes combined with aural information. During classes, however, several factors affect lipreading effectiveness: some words are inherently hard to lip-read; some people are particularly hard to understand (especially those who talk very fast, mumble, or cover their mouth when speaking); and, finally, the position of the speaker can prevent good observation of facial movements.

Main Idea and Contribution

None of the aforementioned compensation mechanisms provided an effective solution to the problem; thus, our idea was to combine the three information modalities we can collect from the teacher – visual (lipreading), aural, and textual (generated by an ASR) – in a single, integrated solution. In doing so our goal was twofold: first, we argued that each modality could compensate for errors present in, or induced by, the others (for example, if the ASR fails to transcribe a word, the student can use lipreading to identify the correct word); second, the resulting system could handle different degrees of hearing loss, as well as students' preferences among compensation mechanisms. The device could be particularly useful for students with severe hearing impairment, where no aural information is perceived.

PoliLips captures, and sends to students' laptops via a wired or wireless network, an aural/visual/textual stream composed of a video of the teacher's face, her/his voice, and a textual transcription generated by an ASR. The device is simple and relatively cheap to build and deploy. Moreover, PoliLips facilitates class attendance when the student cannot see the teacher's face (for example, whenever the teacher writes on the blackboard), when the teacher is too far away, or when she/he is not in front of the student. Finally, PoliLips allows students to attend classes remotely, as the aural/visual/textual stream can be sent over the Internet. Of course, the device could prove useful not only in university classrooms, but in any context where a speaker talks to a large audience and network connections are available.

Supporting Deaf Students

Several aids can support deaf students during classes. Voice recorders – coupled with an ASR for off-line voice transcription – are often used. This solution, however, is limited by the accuracy of the ASR, which is often sub-optimal. Magnetic induction loops can be used with students who wear a compatible cochlear implant, but are expensive to deploy (small, portable magnetic induction loops exist, but work well only for very small audiences). Subtitling services – based on human captioning experts – are often too expensive. Sign-language interpreters are also expensive and usually not requested by deaf students (in our university, the office for students with disabilities has not received a single request for a sign-language interpreter in more than 10 years). PoliLips tries to address these issues, providing a simple and cheap, yet effective, solution.

The PoliLips solution

PoliLips is a hardware/software solution: the teacher wears a hardware device, while dedicated software applications are installed on the teacher's and students' laptops. We designed and built the hardware from off-the-shelf components and developed the applications; ASR functionality was provided by a commercial application.

The PoliLips hardware consists of a wearable device and a base station. The wearable device is composed of a tiny video camera, coupled with a high-quality, noise-cancelling microphone, and a transmitter unit. The camera/microphone mounting weighs 127 grams and proved light and stable enough for the teacher to wear without any particular problem. The audio and video signals enter the transmitter unit, which comprises two elements – a video transmitter and a high-quality audio transmitter – both powered by battery packs.

The critical components of the wearable device turned out to be the microphone and the audio transmitter. The cheap camera and video transmitter we used, in fact, proved good enough for our goals; the lesson we learnt is that, once a decent resolution, frame rate, and signal-to-noise ratio are provided, increasing these specifications does not further improve the viewer's ability to lip-read.

Conversely, the audio signal must be as clean as possible for the ASR to generate accurate text. The cheap microphone (without noise cancellation) and transmitter we tested in our first prototype proved almost useless; switching to a professional, high-quality microphone and transmitter solved the problem.

The base station is composed of two elements: a video acquisition box and a high-quality audio receiver. The video acquisition box, which contains a video receiver and a video capture device, digitizes the video signal and provides the stream through a USB connector.

The video acquisition box and the audio receiver are connected to the teacher’s laptop, where the ASR and the PoliLips server software are installed.

The PoliLips software is composed of two parts: a server and a client. The server is installed on the teacher's laptop (along with the ASR), while the client is installed on the students' laptops; the two communicate over a wired or wireless network.

The PoliLips server acquires the digital video stream and the audio, combining them into a single audio/video stream; the audio is also fed to the ASR, which generates the text. The server then waits for connections from clients and, once a connection has been established, sends the visual/aural/textual stream. The server application also shows a preview of the visual/textual stream, so that the teacher can check whether the camera is well positioned and the ASR is working properly.
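The abstract does not specify the wire format used between server and clients. As a minimal sketch, one way to interleave the three modalities over a single connection is a tagged-frame protocol: each frame carries a one-byte type tag ('V' for video, 'A' for audio, 'T' for text), a 4-byte big-endian payload length, and the payload itself. The functions below are illustrative, not part of the actual PoliLips implementation.

```python
import struct

# Hypothetical frame format: 1-byte tag + 4-byte big-endian length + payload.
FRAME_KINDS = (b"V", b"A", b"T")  # video, audio, text

def pack_frame(kind: bytes, payload: bytes) -> bytes:
    """Serialize one frame of the combined visual/aural/textual stream."""
    assert kind in FRAME_KINDS
    return kind + struct.pack(">I", len(payload)) + payload

def unpack_frame(buf: bytes):
    """Parse one frame from the front of buf.

    Returns (kind, payload, bytes_consumed), or None if buf does not yet
    contain a complete frame (the caller should read more bytes).
    """
    if len(buf) < 5:
        return None
    kind = buf[:1]
    (length,) = struct.unpack(">I", buf[1:5])
    if len(buf) < 5 + length:
        return None
    return kind, buf[5:5 + length], 5 + length
```

A client receive loop would append socket data to a buffer, call `unpack_frame` until it returns `None`, and dispatch each complete frame to the video pane, audio output, or caption area according to its tag.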

To save money, we chose not to integrate the ASR into the PoliLips server (the ASR Software Development Kit is quite expensive); instead, we used the regular desktop ASR application and integrated it with the PoliLips server in a naive but effective way. The user interface of the PoliLips server contains a text field which the teacher must click (giving it the "focus" of the user interface) before starting the ASR; then, simply relying on the dictation ability of the ASR, the generated text enters that text field and is captured by the application engine, which adds the newly inserted words to the stream.
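The capture logic behind this trick can be sketched as follows, assuming (as the abstract implies) that the ASR only appends text to the focused field during dictation. The class below is a hypothetical illustration: on each poll it compares the field's current content against what has already been forwarded and returns only the newly dictated words.

```python
class DictationBuffer:
    """Illustrative sketch of the text-field capture trick: track the text
    the ASR has dictated into the focused field and, on each poll, return
    only the words that appeared since the previous poll."""

    def __init__(self):
        self._consumed = 0  # number of characters already forwarded

    def poll(self, field_text: str) -> list[str]:
        # Assumes the ASR only appends, so a length comparison is enough.
        if len(field_text) <= self._consumed:
            return []
        new_words = field_text[self._consumed:].split()
        self._consumed = len(field_text)
        return new_words
```

In the real application, a periodic timer in the server's UI would read the text field, pass its content to such a buffer, and push the returned words into the outgoing textual stream.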

The PoliLips client, once connected to the server, displays the aural/visual/textual stream.

Conclusions and Future Work

The PoliLips prototype – hardware and software – is ready to be tested with all the deaf students at Politecnico di Milano. Preliminary tests have been carried out, and the results are encouraging. A controlled experiment is planned to gather quantitative measures of the system's effectiveness. We plan to add new functionality to PoliLips; in particular, the ability to save the aural/visual/textual stream on the student's laptop; integration with our note-taking application, PoliNotes; and automatic compensation for the "barrel distortion" caused by the camera lens.

