
Hearing impaired users' needs for the VOICE subtitling prototype


The methodology
The model
The outline of the interviews
The answers
Storage of the entire captured text for a subsequent use for educational purposes
Video recording of the entire subtitled lecture for a subsequent use (archive)
Simplicity of use of the subtitling: Speaker's training (easy and fast)
Simplicity of use of the subtitling: Use of the system by more persons
Simplicity of use of the subtitling: Multimedia accessories
Displaying of the subtitles for a time long enough to read them
Subtitles' size, colour and font
Background
Languages
Conclusions

The methodology

The survey was carried out to steer the programming of the prototype towards choices that meet the subtitling needs of its users (deaf, hearing impaired or elderly).
The adopted methodology involved a series of interviews based on an outline of subjects to be freely discussed. The task of each interviewer was to identify indications which, compared with those highlighted by the other interviewers, could help to set the basis for the macro- and micro-analysis to be passed to the analysts/programmers building the prototype.
A first investigation level was therefore defined, in order to establish the outline of the subjects. This was followed by a series of interviews based on that outline and, finally, by the comparison of the collected information and its transfer for analysis and programming purposes.

The model

The first interviews, used to define the model, were held with representatives of associations. The interlocutors were chosen for their level of knowledge (or, better, their synthesis of knowledge), which allowed the outline to be built up rapidly.
The answers obtained in this phase also provided an effective basis for comparison with the needs subsequently highlighted by the interviewed users.

This phase was set up and developed by Giuliano Pirelli of JRC-ISIS and Angelo Paglino of the FBL Software House.
Representatives of four associations (Alfa, Cecoev, Afa, and Ascolta e Vivi Onlus) were interviewed, for a total of eight interviews: two per association, first with an operative representative and then with the top representative, who could confirm the collected information.
In addition to making the targets of the interview more precise, the interviews highlighted an unexpected issue: the high priority given to the request for the storage of the text, that is, of the subtitle, during its generation, for educational purposes.

After a few months we widened the field of interviewees to schools, in order to get a more neutral opinion and a broader follow-up. Eight new interviews were carried out with four schools, again first with some teachers and then with the headmasters. The schools involved in the survey were: Arona II Primary School, Artistic High School of Varese, Scientific High School of Mortara and Scientific High School of Ravenna.

The outline of the interviews

The subjects to be covered with the interviewees are summarised in the following outline.
This is not a rigid model; it is just a reminder for the interviewer. At the end of each interview it was therefore always possible to collect and classify the information and to generate the input for the programming.

Storage of the entire captured text for a subsequent use for educational purposes.
Video recording of the entire subtitled lecture for a subsequent use (archive).
Simplicity of use of the subtitling:
- Speaker's training (easy and fast);
- Use of the system by more persons in a sequence and, sometimes, at the same time;
- Multimedia accessories (slides, videos and sounds) to keep the audience's attention level high.
Displaying of the subtitles for a time long enough to read them.
Subtitles' size, colour and font.
Background (coloured or transparent).
Languages.

Note that the display characteristics (size, colour, font and background) were left to the end of the interview, even though they seem the most obvious subjects to deal with.

The answers

For the initial input, the interviews were carried out at events presenting Project Voice, and later at events where the prototype was used to set up a demo version of the system.
When we started being able not only to "speak about the demo version", but also to show the audience the results achieved by a microphone and a computer, we noticed a higher level of attention, and therefore better suggestions.

This regular interaction between developers and final users generated a series of requests, which were implemented progressively, on average within a short timeframe, and immediately presented to the users for further validation.
It should also be noted that the presentation of the demonstrator aroused remarkable interest among both the users and the Associations involved.

We cannot provide the exact number of interviews, because several people often took part in the same one. From the interviewers' notes we can count 283 interviews. The number is impressive: we were able to collect a lot of information and opinions, since each of the almost 100 presentation initiatives (lectures, seminars, meetings with school classes) was preceded and followed by one or more survey sessions.
Each interviewer kept his or her own version of the outline, in whatever form suited him or her best.

The results of the interviews were compared in 8 meetings held in Ispra.
The first two (February 20, 1998 at Ispra, a meeting with the associations at which some schools were also present, and April 19, 1998 at the AFA Association in Cantù) helped to draw up the outline and to provide the first elements for the programming.
The following two (May 9, Ispra and May 28, 1998 during the Accamatica seminar in Crema) were used to define the analysis of the demonstrator.
Four meetings (September 2-3, 1998 in Vienna with the Austrian associations, November 26, 1998 in Bologna during the Handimatica conference, March 15-17, 1999 at the ASL of Pavia and November 5-6, 1999 with RAI and the European television broadcast networks in Bologna) were held to check the remarks about the demonstrator.

We must say that in the 8 follow-up meetings we received no particular suggestions, but rather confirmation that the prototype itself and the programming choices were accepted. We were also informed about specific needs that could hardly have been foreseen, because they stem from very particular situations: for instance, very strong contrast and very large fonts for sight impaired users, and specific techniques for the training of blind users.

At a certain point, we had to give priority to meetings with teachers and doctors rather than with deaf users, because the latter proved to be so impressed by the novelty of on-line subtitling that they were little inclined to offer critical analysis.

Storage of the entire captured text for a subsequent use for educational purposes

We have already pointed out that this request surprised us, not so much in itself as for the priority given to it.
The availability of the text at the end of a lecture, and above all at the end of a lesson given in a classroom, was indicated as a very helpful tool for making up for lapses in attention by students, both deaf and hearing.

In our opinion (as technicians), having the lesson available on magnetic media (as a text file) allows a faster and more precise training of the speaker, because it provides the recognition tool with a language, that is, the expressions (form and contents) related to a certain subject.
The text is stored following the pauses of the speaker. During the speakers' training this allowed us to analyse these pauses together with them, that is, to check whether and how the sentences have a complete meaning, and so to provide an educational tool that improves the teachers' training.
During the follow-up, teachers always highlighted this last point, even when not specifically asked about it.

In addition, since the text is in electronic format, mistakes can easily be corrected and the text can easily be formatted before printing. Storing the text alone poses no problem of disk space, whereas storing the voice files requires a lot of it.

Video recording of the entire subtitled lecture for a subsequent use (archive)

The availability of subtitled videotapes seemed a rather "normal" request, since, often, lectures (and sometimes also lessons) are filmed and later subtitled.

Attention focused on the possibility of mistakes by the recognition tool: during the follow-up meetings we could see how the evolution of the recognition tools strongly reduced these worries, even if some mistakes still remain.
The correction of the mistakes on the videotape can be made only with rather sophisticated and expensive techniques.
A minimum level of mistakes is however accepted, provided that it does not impact the meaning of the speech.

One particular request was to add a "time code" to the text file. Once any mistakes have been corrected by means of an ad-hoc programme, the time code allows a subtitled video to be created from two inputs: the videotape and the (corrected!) text file.
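The time-coded text file described above can be sketched as follows. This is only an illustration: the tab-separated layout, the HH:MM:SS.mmm time code and the names `TimedLine`, `dump` and `load` are assumptions made for the example, not the format used by the prototype.

```python
from dataclasses import dataclass


@dataclass
class TimedLine:
    start_s: float  # seconds from the start of the recording
    text: str       # recognised (and later corrected) subtitle text


def format_timecode(seconds: float) -> str:
    """Render a time in seconds as HH:MM:SS.mmm (hypothetical layout)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def parse_timecode(code: str) -> float:
    """Inverse of format_timecode: HH:MM:SS.mmm back to seconds."""
    hours, minutes, rest = code.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(rest)


def dump(lines: list[TimedLine]) -> str:
    """Serialise subtitles as one 'timecode<TAB>text' row per line."""
    return "\n".join(f"{format_timecode(l.start_s)}\t{l.text}" for l in lines)


def load(data: str) -> list[TimedLine]:
    """Read the rows back, e.g. after the text has been corrected by hand."""
    result = []
    for row in data.splitlines():
        code, text = row.split("\t", 1)
        result.append(TimedLine(parse_timecode(code), text))
    return result
```

Because each row keeps its time code, a corrected text can be put back in step with the videotape without listening to the recording again.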

Simplicity of use of the subtitling: Speaker's training (easy and fast)

The recognition tool used in the demonstrator has its own mechanism for capturing information about the speaker's voice and, being Voice Dependent, for optimising its understanding of that voice.
The first 30 minutes, required only once for the training, were considered a reasonable amount of time and accepted by all users. The improvement of the recognition tool and the reduction in the training time to 10-12 minutes improved the degree of acceptance also by occasional users.

More complex, but extremely well accepted and with an increasing degree of satisfaction, was the training for the management of pauses. Almost all of the trained users acknowledged that the system, by imposing rhythms (due to the need to read the subtitles) and pronunciation rules (to improve the recognition), improves the speaker's expression skills. This remark was made mostly by teachers.
The overall result is definitely better if the teacher complies, in the preparation of the text of the lesson, with some simple basic rules:

identification of the pauses, thus breaking the text into sentences with a complete meaning;
check of the text with the spelling utility, a function which identifies the words not included in the vocabulary and proposes their inclusion, but which also examines the context, thus making the sentence more understandable.

During the lesson there will be no need to use the same sentences as those previously prepared, but it is important to use the same style (words and context).

Simplicity of use of the subtitling: Use of the system by more persons in a sequence and, sometimes, at the same time

The request for simultaneity comes from lecture rooms: during discussions it would be desirable to subtitle all the contributions, even when they are made at the same time.
Contributions made in sequence, typical of the school classroom, pose no problem. Since the first release of the demonstrator it has been possible to keep several voice profiles on the same personal computer and to activate them one at a time.
Even the more recent and updated recognition tools have voice profile load times which do not match the needs of a discussion; a technically valid alternative, which is however very expensive, is the possibility to assign to each speaker a personal computer with his or her voice profile.

Simplicity of use of the subtitling: Multimedia accessories (slides, videos and sounds) to keep the audience attention level high

The idea of introducing multimedia accessories during the subtitling of a lesson came not from the users but from the programming team.
During a lesson there are always moments when attention drops; a subtitled lesson, where certain rhythms (acceleration of the voice and reduction in the pauses) are to be avoided, can be boring. The activation of an image or a video (or a sound) by means of a voice command is useful to catch the listeners' attention. Moreover, it makes the lesson more "modern" and pushes the teacher to get information from the Internet, the "enormous container".
Therefore, these aspects were not highlighted in the initial requests, but they were highly appreciated, mostly, by teachers. These techniques were more and more accepted during the validation meetings.

Displaying of the subtitles for a time long enough to read them

The problem was highlighted by the users, that is, by the deaf, who often cannot read the subtitles because they remain on the screen for too short a time.
Determining the ideal display time is a problem we could not solve, since it depends on the capabilities of each individual and on his or her knowledge of the spoken language (oral deaf, signing deaf, hearing foreigner, elderly). A frequently accepted indication, tested by Dr. Ioghà, children's neuro-psychiatrist at the ASL of Pavia, is 30 characters per second (cps). That is, the deaf reader reads 30 characters each second and, as a consequence, a subtitle of 90 characters should remain on the screen for at least 3 seconds.

The ways to achieve the results are two, and both have already been implemented.

To train the speaker: responses were on average good, sometimes excellent, sometimes poor. However, teachers have always admitted that this training improved their presentation skills in the classroom.
To introduce a software mechanism which, by counting the characters, ensures a minimum display time. The aim is to educate the speaker: by delaying the display of a subtitle with respect to its pronunciation, the mechanism causes an evident lag between lip-reading and the reading of the subtitle, thus forcing the speaker to keep a pace appropriate for reading, according to the type of users.

Given the impossibility of determining an ideal average value, the programmer chose to let the speaker select a pace ranging from 1 to 50 cps.
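The character-counting rule can be illustrated with a short sketch. It assumes the simplest possible reading of the text, namely minimum display time = number of characters / selected cps; the function names and the scheduling logic are hypothetical and are not taken from the prototype.

```python
MIN_CPS = 1   # slowest selectable pace, from the text
MAX_CPS = 50  # fastest selectable pace, from the text


def min_display_seconds(subtitle: str, cps: int) -> float:
    """Minimum time a subtitle must stay on screen at the given pace."""
    if not MIN_CPS <= cps <= MAX_CPS:
        raise ValueError(f"cps must be between {MIN_CPS} and {MAX_CPS}")
    return len(subtitle) / cps


def schedule(entries: list[tuple[float, str]], cps: int) -> list[tuple[float, str]]:
    """Delay each subtitle until the previous one has been readable long enough.

    entries: (time the sentence was spoken in seconds, subtitle text).
    Returns (time the subtitle is shown in seconds, subtitle text).
    """
    shown = []
    free_at = 0.0  # earliest moment the screen may change
    for spoken_at, text in entries:
        start = max(spoken_at, free_at)
        shown.append((start, text))
        free_at = start + min_display_seconds(text, cps)
    return shown
```

With a 90-character subtitle at 30 cps, `min_display_seconds` gives the 3 seconds quoted above. In `schedule`, a speaker who talks faster than the selected pace sees the subtitles fall progressively behind the speech, which is the lag the mechanism relies on to slow the speaker down.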

Subtitles' size, colour and font

There are standards, but they are tied, not only in Italy, to the use of videotext to subtitle television broadcasts. It is the analogue broadcasting system, still in use, that requires subtitles to be delivered via videotext, whereas digital broadcasts will broaden the array of possibilities that television networks can offer to meet users' needs.

The currently adopted standard is 35 characters per line, on one or two lines.
Speaking about choices is therefore difficult in this case, because everybody today thinks that television-style subtitling is the only viable option.
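As an illustration of the 35-characters-per-line, two-line layout, the following sketch splits a recognised text into subtitles using Python's standard `textwrap` module. The function name and the greedy word-wrapping strategy are assumptions made for the example; the source does not describe how the prototype breaks lines.

```python
import textwrap


def split_into_subtitles(text: str, width: int = 35, max_lines: int = 2) -> list[list[str]]:
    """Wrap text into lines of at most `width` characters,
    then group the lines into subtitles of at most `max_lines` lines."""
    lines = textwrap.wrap(text, width=width)
    return [lines[i:i + max_lines] for i in range(0, len(lines), max_lines)]


if __name__ == "__main__":
    sample = ("The teacher repeats each question so that "
              "everybody in the class can follow the lesson")
    for subtitle in split_into_subtitles(sample):
        print("\n".join(subtitle))
        print("---")
```

Exposing `width` and `max_lines` as parameters mirrors the approach of offering such values as options rather than fixing them.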

The different colours, which characterise dialogues, are used only in the subtitling of television movies and not for news broadcasts.
The programmer introduced the MS Windows colour and font tables, thus offering a wider choice of font+size+colour.
The adopted solution obtained the unanimous consensus of users.

One subject was sometimes debated: the use of uppercase or lowercase letters in the subtitles. Users are almost equally divided on which to adopt, with a slight preference for ALL UPPERCASE, which seems easier to read and does not create problems with names.

We have faced the following problems:

Text on a single line: the text runs through;
Text on two lines;
Text always written on the last line, with the lines scrolling towards the top of the screen;
Maximum number of lines.

However, we were able to obtain only generic indications, such as "the line that scrolls up makes me sick", "the reading of the text that runs through is wearing", "three lines are too many but I don't know why".
Once more, therefore, it was preferred to increase the number of settings offered as "options" of the prototype, in order to leave more choices to the final user.

Environmental issues were also monitored: teachers or speakers talk facing the class or the audience, while a camera films them, often providing close-up images that allow lip-reading.

The ideal environment was described as follows:

A screen behind the speaker where the image generated by the personal computer (camera image and subtitles) is projected;
A monitor in front of the speaker who can then check the image and the subtitles (exactness and display timing, thus allowing corrections or slowing down if necessary).

In schools, lighting must be kept under control: the class must work in a bright environment, so the projector must be powerful enough (700 lumens).
The solution suggested by some for cost-containment reasons, namely installing a monitor in front of the deaf student instead of using the video projector, was not accepted: the student would be the only one to see the subtitles and would no longer be part of the group-class.
The teacher or the speaker will have to repeat the questions asked by the students or the audience: this is the only way to convey the information to the deaf.

Background

"Coloured or transparent, this is the question" we could recite with the poet.
Transparent is preferred because it does not affect the image, but sometimes the subtitle can be difficult to read because of the lack of contrast with the underlying image itself.
A coloured background band makes it possible to select a font colour that contrasts with it, so the subtitle is always visible, but the band cuts out the lower part of the image.
Users almost unanimously agreed that the band is preferable for lectures and school lessons and that a transparent background is better for movies. In this case too, the programmer preferred to offer the user a large choice of options, so that different specific situations can be handled.

Languages

The criteria for language selection are the same as those applied for the selection of the user, described above.

This requires the recognition tool to be installed in every language used for the subtitling. The demonstrator was tested in five languages: Italian, English, French, German and Spanish. Other languages can be added, provided that the relevant version of the recognition tool exists.

In some cases Secondary Junior Schools adopted the product in several languages to teach a foreign language.
The interviewed teachers of Italian for foreigners admitted that they have not yet gained enough experience to give a reliable judgement.

Conclusions

The global acceptance level of the demonstrator was very good.

As mentioned above, the users in the Associations of the hearing impaired considered that it fully met their needs.

Contrasting opinions emerged only within groups of teachers. Most of them agreed to use the subtitling system, even though they were aware that, at least at the beginning, their workload would increase with the preparation of lessons focused on the multimedia aspects and with the trials needed to find an appropriate pace for managing pauses. Additional workload comes from the preparation of images and videos to be used during the lesson.
These teachers stated that the inclusion of the disabled student in the group-class and the information (text and recorded images) provided by the system justify the effort and the increased workload.
Other teachers assessed the additional workload as too much and decided not to use the system.

July 2000
Angelo Paglino and Giuliano Pirelli
