On Thursday, 3 September 1998, a workshop was held as part of the Austrian days of the ICCHP. The workshop had been announced at a press conference, and information about it had been sent to all Austrian schools and special education teachers. These efforts resulted in a high turnout of 65 participants.
The workshop intended to
b. Contents of the Workshop
The contents of the workshop consisted of two parts:
In the first part, a prototype of the voice recognition system was demonstrated. The aim of the demonstration was to give an overview of the capabilities and the imperfections of the voice-to-text recognition system.
In the second part, building on the demonstration, the audience was invited to discuss the usability of the system.
c. Workshop Programme
Part 1 - presentation
Part 2 - discussion
1. Welcome
Mr Marckhgott welcomes the workshop participants. He briefly presents the background of the project and gives an overview of the activities of the Institut für Hör- und Sehbildung.
2. Introduction
Mr Pirelli introduces the Joint Research Centre (JRC) in Ispra, Northern Italy. The Joint Research Centre is the European Commission's own research centre. It was created to share, at European level, the large investments needed to carry out research on nuclear energy. Over the years its tasks have extended into other areas in which a common approach at European level is necessary. The JRC provides neutral and independent advice in support of the formulation and implementation of the European Union's policies. In addition, it offers unique training services to individuals and companies and organises workshops for scientific and technical workers in advanced sectors of science.
Main objectives of the Project are:
3. Technical aspects of the prototype
One of the objectives of the VOICE Project is the development of a demonstrator needed to generate awareness and stimulate discussion regarding the possible applications of voice-to-text recognition. The Project aims not only to promote new technologies in the field of voice-to-text recognition, but also to stimulate and increase the use of new, upcoming technologies (such as the Internet), with particular emphasis on the problems that may be encountered by the deaf.
Within the last few years, voice recognition software has improved considerably, while prices have fallen to a small fraction of the original amount. The hardware needed to run voice recognition software can be regarded as standard: PC Pentium 200 MMX, 64 MB RAM, Creative SoundBlaster, Matrox Millennium Video and Rainbow Runner video card. The VOICE prototype software consists of a voice recognition package such as Dragon Systems NaturallySpeaking or IBM ViaVoice and the video software supplied with the Matrox video card. A video camera captures the image of a speaker while, simultaneously, the voice recognition software converts the spoken text into written text. The output of these two standard software packages is combined by the newly developed VOICE prototype software and displayed on the screen in the form of a subtitled image.
The hardware and software costs of this prototype are comparatively low, ranging between 1500 and 2500 ECU.
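The combination step described above, in which recognised text is placed under the camera image as a subtitle, can be illustrated with a small sketch. It is not the VOICE prototype's actual code: the class name `SubtitleBuffer`, the function `subtitle_frame`, the line width and the two-line limit are all assumptions chosen for illustration, and the video frame is represented only by a label rather than a real image.

```python
import textwrap
from collections import deque


class SubtitleBuffer:
    """Hypothetical helper: holds recognised words and exposes the latest subtitle lines."""

    def __init__(self, width=40, max_lines=2):
        self.width = width          # assumed maximum characters per subtitle line
        self.max_lines = max_lines  # assumed number of lines shown under the image
        self.words = deque()        # grows without bound; acceptable for a short demo

    def add_text(self, recognised):
        """Append a chunk of text as a speech recogniser might deliver it."""
        self.words.extend(recognised.split())

    def visible_lines(self):
        """Wrap all words to the line width and keep only the last max_lines lines."""
        lines = textwrap.wrap(" ".join(self.words), width=self.width)
        return lines[-self.max_lines:]


def subtitle_frame(frame_label, buffer):
    """Pair one video frame (here only a label) with the current subtitle lines."""
    return frame_label, buffer.visible_lines()


if __name__ == "__main__":
    buf = SubtitleBuffer(width=20, max_lines=2)
    buf.add_text("welcome to the workshop")
    buf.add_text("on voice recognition")
    print(subtitle_frame("frame-0001", buf))
```

In a real system the recogniser would deliver text with some delay and occasional corrections, so the pairing of frames and subtitle lines would need to tolerate late or revised text; the sketch ignores both.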
4. The communication needs of the deaf
(Alessandro Mezzanotte / Gernot Kerschbaumer)
The difficulties of the deaf go beyond the loss of hearing itself and point to a more general problem of lack of communication. The auditory handicap creates a barrier to the autonomy of deaf people, who have to base most of their communication on visual channels of information. They look to information technology for help with their needs, such as using telematics systems for communication (Internet, e-mail, Internet phone software) and, more particularly, television subtitling. The statements underline the importance of collaboration between deaf and hearing people in telematics projects and show how the VOICE Project may help users. Voice-to-text recognition systems, developed purely as commercial tools, may become socially useful and help to achieve equal opportunities for the deaf through their application to television, providing subtitles for the international TV network.
5. Presentation of the system
Gerold Wagner presents the demonstrator prototype. For the presentation a prepared text is read and subtitled live.
6. Discussion
After the introduction by Klaus Miesenberger, invited speakers present their opinions on the system:
6.1. Hannes Märk, ORF (Austrian broadcasting corporation)
On the one hand, the demonstration was impressive. On the other hand, open questions remain: there are still too many errors in the automatically produced subtitles. If the system worked reliably, however, it would be an opportunity to produce more subtitles at the same cost. Even so, introducing the new system would initially mean additional effort and cost.
6.2. Erich Pammer (special education centre, Freistadt)
The centre for special education in Freistadt has experience with the use of computers for special education in the following six areas:
The advances in the area of speech recognition are encouraging. Speech recognition makes sense not only for hearing impaired students, but also for spastic students.
6.3. Walter Rainwald (Odilieninstitut, Graz)
Walter Rainwald is a special education teacher at the Odilieninstitut Graz. He talks about his experiences with speech synthesis. In his view, the monotony of today's speech synthesis makes listening to longer documents very fatiguing. As a result, pupils do not like to use speech synthesis at all. Speech recognition can of course be helpful for hearing impaired persons, but perhaps speech synthesis can also be further improved in conjunction with the improvements in voice recognition.
6.4. Klaus Miesenberger (University of Linz)
Blind computer users mostly use a keyboard for input and a Braille display for output. Both devices are operated with the fingers, which means that users have to move between the Braille display and the keyboard each time they switch between reading and writing. Speech recognition software might allow users to dictate the input and at the same time (with a slight delay) read the entered items on the Braille display.
6.5. Klaus Ortner (Institut für Hör- und Sehbildung, Linz)
People who cannot read sign language could also be supported tremendously by such a subtitling system, be it in conferences or at school. The use of voice recognition in combination with the telephone, whether privately, in education or on the job, would greatly help to close the gap between hearing impaired and normally hearing persons. This would be very important because current telephone systems for hearing impaired persons only work when adaptive devices are used at both ends of the line. The new system would be used by the hearing impaired user only, while the hearing partner uses a normal telephone as usual.
6.6. Hans Domes (Österreichisches Statistisches Zentralamt)
In his job, Mr Domes often finds that no sign language interpreter is available for a meeting. In this case he is not able to follow the discussion. If the demonstrated system could be used in this context, it would help a lot, although he would of course prefer to have a sign language interpreter at hand in every meeting.
6.7. Wolfgang Zagler (Technical University of Vienna)
Some years ago Mr Zagler had a vision that in some 15 years hearing impaired persons would be supported by special glasses which provide subtitles of the text spoken by their communication partner. Today it has become clear to him that a part of this vision will come true in the near future.
6.8. Anton Neuber (IBM Austria)
He gives an overview of the technical issues of speech recognition software, starting with the incoming audio signal and ending with the written text on the screen. He also mentions the wide range of applications of speech recognition systems. Normally these systems work reliably when the spoken texts belong to a particular context, e.g. in a lawyer's office, and the context does not change too often.
His personal experiences with voice recognition software are encouraging: a spastic person who could hardly be understood by humans was able to train the system, and afterwards the voice recognition worked reliably. He also knows a person who wrote his thesis using voice recognition.