The difficulties of the deaf go beyond the loss of hearing itself, and underline a more general problem of lack of communication. Automatic recognition of speech in conversations, conferences and telephone calls, with its translation into messages on a PC screen, could be a powerful help in reducing the gap between the deaf and the hearing world. The paper presents an overview of the VOICE Project, an EC Telematics Programme Accompanying Measure. The Project proposes to promote new technologies in the field of voice to text recognition and to unite, by means of an Internet VOICE Forum, Associations, producers and organisations interested in this research.
2. The Joint Research Centre of the European Commission
The Joint Research Centre (JRC) is the EC's own research centre. It was created to share the large investments needed to carry out research on nuclear energy. Over time its tasks have extended into other areas in which a common approach at EU level is necessary. JRC provides neutral and independent advice in support of the formulation and implementation of EU policies. In addition, it offers unique training services and organises workshops in advanced sectors of science.
2.1. JRC's Institute for Systems, Informatics and Safety (ISIS)
The Institute for Systems, Informatics and Safety (ISIS), based at Ispra in Italy, is JRC's impartial centre of expertise in the multi-disciplinary analysis of industrial, socio-technical and environmental systems, the innovative application of information and communication technology, and the science and technology of safety management. The activity areas of ISIS's Unit for Software Technologies and Automation (STA) include dependable software applications (safety critical computing systems, requirements engineering), multimedia network applications (animation in medical imaging, multimedia techniques in training and education), sensor based applications (surveillance techniques, 3D reconstruction of real environments, learning approaches for control), robotics and remote handling.
2.2. JRC-ISIS's Exploratory Research Programme
In 1996, JRC-ISIS's role in the themes above was oriented towards providing scientific and technical support to EU services and initiatives. Moreover, a levy of 6% of the institutional budget was used to finance Exploratory Research. In 1996 the scientific staff of ISIS made a total of 65 proposals. The ISIS Scientific Committee judged the proposals on originality, appropriateness, soundness and cost and produced a shortlist of 16 proposals, 12 of which were then funded. In particular, two projects carried out by the STA Unit concern the interface between life science and information technology to provide help for the disabled and the elderly: Information technology aids for people with special needs - Voice to text conversion for the deaf; Brain-actuated control - using EEG pattern recognition to help the disabled.
3. Applications of voice to text recognition for the deaf
Although voice to text recognition packages are marketed primarily as a means of allowing people in businesses to create documents without using the keyboard, the technology holds great advantages for the hearing impaired, blind, physically handicapped and elderly, as well as for people without special needs. These systems are reaching a very good level of development and are becoming widely available for PCs. Software that until now could only recognise words separated by short pauses is being replaced by new releases, which offer very significant improvements and recognise continuously dictated text. This growth in the market, in affordability and in user-friendliness encourages the search for ways of adapting such software for use by disabled people.
The difficulties of the deaf go beyond the loss of hearing itself,
and underline a more general problem of lack of communication. One of
the main forms of modern communication, the telephone, is of no or of
very little use to this community for oral communication (while it is
useful for the transmission of faxes). Other modern means of communication,
although not completely useless, generate frustration by providing only
part of the information in a form accessible to them. An example is
the television which, when not subtitled, supplies very limited information.
In some European countries, it is usual to think that hearing impaired
people would have difficulties trying to learn to lip-read and speak
and should therefore make use of sign language and attend special schools.
In others there is another approach to the problem. In Italy the law
encourages the integration of deaf children in normal schools, with
a remedial teacher, without the use of sign language. Some Associations
of the deaf, like ALFA in Milan, are getting very good results by
helping children with this approach, from those joining primary school
right through to those finishing university and finding a job
afterwards. Despite the fact that good results are achievable,
they demand an enormous effort, which could be greatly reduced through
the use of new technologies.
A widely used application is the Teletext subtitling of television
transmissions, a very powerful help for deaf people. The importance of
the educational aspect lies in the fact that subtitles are, for a deaf
child, one of the most powerful learning tools, just as a hearing child
learns from the things it hears. Similarly, subtitling gives hearing
impaired adults the opportunity to enrich their vocabulary. Since subtitling
is the result of a manual preparation of files to be transmitted via
Teletext, most of the subtitled transmissions are films. Subtitling
of live programs and of the news is rarely performed.
Subtitling of conferences, even those addressed to the deaf, is usually
not available. Sign language interpreters provide significant help
for the deaf who are used to sign language, but other deaf participants,
as well as the partially hearing impaired, the elderly and foreigners,
are unable to understand sign language. Moreover, the interpretation is
lost after the conference, being of no use for producing proceedings or
abstracts.
In telephone communication, Text-telephones have already proved themselves
vital from a deaf person's point of view. These systems do, however,
present one major problem: everyone wishing to contact a deaf
person on such a machine must possess one themselves. This makes such
a means of communication awkward and expensive, both for the deaf and
those they wish to call.
3.3. Autonomy and quality of life
As more conferences, meetings and discussions become subtitled, participation by the deaf will grow. With increased subtitling capabilities, television will become a more useful source of information. The use of subtitles in telephone calls, which are part of everyday communication in society, will greatly increase the interaction of the deaf community. This contribution will increase the effect that their decisions have on the surrounding environment, which will subsequently improve their standard of living. Easier access to schools and universities will allow a more satisfying life, a better choice of work corresponding to personal capabilities and, at large, greater economic productivity for society.
3.4. Market situation and prospects
It is worth remembering that the market of the hearing impaired consists
of between 1% and 5% of the population (according to the degree of the
hearing loss), which represents millions of people in Europe. This field
can be enlarged to take into account also those losing their hearing,
having hearing problems and even normal hearing people who cannot hear
due to the noise in their environment. Moreover, a lack of communication
similar to that experienced by the deaf also affects the disadvantaged,
people living in foreign environments and the elderly. Taken together,
these groups make up more than 30% of the total population.
The new products seem well suited to the needs of the deaf. The modifications
necessary for some tests are of limited extent, but the deaf rarely
have the technical awareness and the social power needed to pursue
such activities. Nevertheless this could be an opportunity of great
interest for the producers of speech recognition systems, since the
deaf could accept the present limited accuracy of recognition as a
complement to their lip-reading skills. Even the more limited accuracy
of recognition over the telephone line is an interesting starting point
for the deaf. The Associations of the deaf are considered both the
most interested and the most critical user group for all the possible
applications in this area, and thus the most motivated to test a system which
will be improved for all users, also in related fields, such as video-telephones
or on-line television subtitling in several languages.
Producers of hardware, software and services hesitate to invest more, since the user needs are not translated into technical specifications and are sometimes not even known. On the other hand, the Associations of the disabled have a limited overview of possible new technical solutions and rarely have the opportunity to participate in the feasibility studies of new projects. What is lacking is a better definition, from a technical point of view, of the needs of the disabled, to enhance collaborative work between technicians and non-technicians. Broader co-operation and a European dimension are of great importance, allowing the study and development of technical aids to be carried out on a large scale and ensuring a wide impact of the results. The multilingual aspects should also be considered at a European level, since most of the Associations concerned operate only at a national level.
4.1. The VOICE Project's first steps
Since the beginning of 1996, JRC-ISIS has undertaken a number of
the tasks described here. The first step was the setting up of a VOICE
Laboratory provided with the necessary software, hardware and network
capabilities. Contacts with producers of speech recognition systems,
research centres, telecommunications firms and television broadcasters
created a coherent overview of the state of the art. Regular contacts
with the Associations of the deaf gave the opportunity to analyse the
special needs resulting from difficulties in hearing and in speech in
many aspects of everyday life.
To facilitate contacts and establish a common goal,
JRC-ISIS gave some Associations the opportunity to create a VOICE
Forum on the Internet, by allocating space for them on a Web server
and providing technical assistance. Since then, the Associations have
shown great interest in participating in the Project. The VOICE Forum
is becoming a known Internet site, and several Associations of the hearing
impaired are adding information to it or communicating their interest
in testing the demonstrator and participating in the planned meetings
and workshops.
4.2. The VOICE Project - a Telematics Applications' Accompanying Measure
We felt that all the activities started at JRC-ISIS with the collaboration
of its Italian Partners could get a particularly important push if
the tests and the dissemination of the results could be organised in
several countries. So we enlarged our group, first inviting the
Educational Endeavour Computer Science for the Blind of the Institute
for Computer Science of the Johannes Kepler University in Linz and
the Institute for Auditory and Visual Training (IHSB) in Linz to join
us.
We prepared a proposal for an Accompanying Measure, which we submitted
to the Telematics Applications Programme Call in April 1997. The proposal:
VOICE - Giving a VOICE to the deaf, by developing awareness of VOICE
to text recognition capabilities, has been selected and we are at present
(March 1998) in the last negotiation phases for starting the Project.
The Project proposes to continue the activities in this field, extending
the contacts to a broader European dimension and spreading awareness
of the capabilities of voice to text recognition systems. The Project
will provide an Accompanying Measure playing a technical and social
role in collecting information and presenting it in a coherent way to
the producers of speech recognition systems and researchers. The aim
is that of disseminating information on how the producers may help the
users with disabilities by limited improvement of their standard products
and on how the users with special needs may collect useful information
and translate it into technical specifications.
JRC-ISIS is acting as scientific and technical co-ordinator of the Project,
developing several specific aspects of the research. The FBL software house,
experienced in applications of speech recognition for the disabled, is
developing additional software and integrating it into the demonstrator
to turn off-the-shelf voice to text recognition packages into user-friendly
programs modelled on the requirements of the users. Each step of the
activity is discussed and checked with the ALFA and CECOEV Associations
of the deaf in Milan. Kepler University examines the Italian results,
verifying their validity in Austria and helping IHSB in the Austrian
validation phase.
5. Objectives and strategic approach
The main objectives of the VOICE Project are: to investigate voice to text recognition for automatic subtitling of conferences, school lessons, television transmissions and telephone conversations; to spread the use of general purpose voice to text recognition systems and to improve the prototypes developed until now; to demonstrate the prototypes to relevant organisations and at international conferences; to use a VOICE Forum on the Internet as a Project tool for collecting and spreading information on technical aids for the deaf.
5.1. Technical aspects of the demonstrator
One of the objectives is the setting up of a cluster of demonstrator applications
related to voice to text recognition, on the basis of a multimedia laboratory
prototype. The system could be of use for subtitling conferences and live
television transmissions. This operational capability involves manipulating
both the functions available in the commercial dictation packages
and the generated text (converting strings of text into groups of subtitles,
positioning them against blank screens, displaying them with video signals
and providing various other options).
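The conversion of recognised text into groups of subtitles can be illustrated with a minimal sketch. It is not the demonstrator's actual code: the function name `group_subtitles` and the default limits (36 characters per line, two lines per subtitle) are assumptions chosen only for illustration.

```python
import textwrap


def group_subtitles(text, max_chars=36, lines_per_subtitle=2):
    """Split recognised text into subtitles.

    Each subtitle is a list of at most `lines_per_subtitle` lines, and
    each line holds at most `max_chars` characters. Both limits are
    hypothetical defaults, not values taken from the Project.
    """
    # Break the text into lines at word boundaries.
    lines = textwrap.wrap(text, width=max_chars)
    # Group consecutive lines into subtitles.
    return [
        lines[i:i + lines_per_subtitle]
        for i in range(0, len(lines), lines_per_subtitle)
    ]


if __name__ == "__main__":
    dictated = (
        "Automatic recognition of speech in conferences could be a "
        "powerful help in reducing the gap between the deaf and the "
        "hearing world"
    )
    for number, subtitle in enumerate(group_subtitles(dictated), start=1):
        print(f"Subtitle {number}:")
        for line in subtitle:
            print("   ", line)
```

In a live setting the same grouping would be applied incrementally to the text stream produced by the dictation package; the sketch handles only a complete string.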
The system will also help any user to produce, at the same time,
a first draft of the conference proceedings. A prototype has been developed
for generating subtitles of live television transmissions and broadcasting
them by Teletext systems. A different approach is also being considered
for specific television transmissions or radio broadcasts, in order
to make the generated subtitling lines available through the Internet.
The subtitles do not necessarily have to be created by broadcasting
companies themselves. Independent members of the public with the correct
equipment and programs could listen to the radio or television and summarise
what is being said into a microphone, and the subtitles would be broadcast
world-wide over the Internet.
For the use of voice to text recognition with telephones, the basic
principle is that a person would speak down the phone line, the message
would be passed into a PC at the deaf person's end and the words (via
some form of voice to text recognition) would be displayed on his or her
screen. In this situation only the deaf person would need the appropriate
equipment. The application will also include a text to speech system
to allow the deaf person to reply (should he/she have difficulties in
speaking), which may also be useful in providing the person at the dictating
end with feedback on whether what was said has been recognised correctly.
The VOICE Project, in line with JRC-ISIS background and TIDE policy, aims to develop prototype applications using, as far as possible, hardware and software commonly available on the market. This reduces development costs and times as well as the future maintenance costs of the products. Moreover, it helps to improve the quality of products for the normal market, for any user, eliminating the new barriers that are often created by new information technology tools. The cost of some commercial voice to text recognition packages is about 100 ECU (10% of the original price). The approximate cost of the basic structure for the application (a Pentium 200 MMX PC with 64 Mbyte RAM, CD-ROM and Sound Blaster 16 sound card) is 1500 ECU. This includes a fully operational PC that can be used in many other useful ways. A great advantage is also the fact that the system is not dependent on any particular company or software release.
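The figures quoted above can be combined in a short worked calculation. This is only a sketch under two assumptions not stated in the text: that the 1500 ECU figure covers the hardware alone, and that a single recognition package is bought per PC. The function names are hypothetical.

```python
PACKAGE_PRICE_ECU = 100       # quoted price of a recognition package
PRICE_AS_PERCENT_OF_ORIGINAL = 10  # quoted: 10% of the original price
PC_PRICE_ECU = 1500           # quoted price of the basic PC configuration


def original_package_price(current_price, percent_of_original):
    """Price the package had before falling to the given percentage."""
    return current_price * 100 / percent_of_original


def total_setup_cost(pc_price, package_price):
    """Cost of one PC plus one package (assumes hardware price excludes software)."""
    return pc_price + package_price


if __name__ == "__main__":
    print("Original package price (ECU):",
          original_package_price(PACKAGE_PRICE_ECU, PRICE_AS_PERCENT_OF_ORIGINAL))
    print("Total cost of one workstation (ECU):",
          total_setup_cost(PC_PRICE_ECU, PACKAGE_PRICE_ECU))
```

Under these assumptions the package originally cost about 1000 ECU, and a complete workstation comes to about 1600 ECU.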
5.3. Conferences, school lessons
All the phases of the Project will be developed with the continuous and
close participation of the users. Several European conferences and workshops
will be organised to help them discuss their needs with
industry and service providers: ICCHP-98, Vienna and Budapest,
August 98; HANDImatica-98, Bologna, November 98; Linz, first semester
99; JRC-Ispra, second semester 99. The demonstrator will be presented
and used for generating prototype live subtitling for the deaf participating
in the conferences. The meetings will not only concern the technical
aspects, but will also try to bring the manufacturers and producers
closer to the users' needs.
The prototype system has been presented to some schools, where it will
be tested in real situations of use for subtitling school lessons for
the benefit of deaf students. It will display the dialogue spoken
during foreign language lessons, for the benefit of hearing
students, or during lessons in the host country's language, for the benefit
of any user, particularly immigrants. Some tests are also
planned for subtitling university lessons and printing summaries.
One of the Project's aims is to stimulate and increase the use of new,
widely available technologies, namely the Internet. The objective is
to unite, by means of an Internet VOICE Forum, Associations,
companies, universities, schools, public administrations and anyone
else interested in voice recognition who could benefit from such research.
The Forum will become an intermediary between the different concerned
groups and will help in collecting information on the user needs and
on the validation of the prototype demonstrator, as well as in disseminating
the results.
At present JRC hosts and maintains the sites of the AFA, ALFA, CECOEV and
ENS Associations of the deaf, with information including: statutes,
contact numbers and addresses, meetings, electronic copies of a selection
of their newspapers, a study of the hours and accuracy
of the television broadcasters, and a list of their archive of subtitled
videocassettes. The current site provides a very b foundation on which
the awareness-raising side of the VOICE Project can be built. This
is an important part of the Project itself, since it demonstrates, to
all those involved, the effectiveness of this means of communication
for the deaf community.
5.5. VOICE Special Interest User Group
The linguistic aspect has been addressed by choosing software packages already available in several European languages. Since most of the new IT packages are produced in English, JRC-ISIS is testing them in English and the users are testing them in Italian and German, so as to cover different linguistic approaches. The acquired know-how will be made available for applications in other languages. Some contacts have already been established with the University of York and the NDCS Association in the UK, the French ANPEDA and the Belgian APEDAF and TELECONTAC, which showed interest in following the Project. As a complement to the VOICE Forum, a VOICE Special Interest User Group is being created and will hold its first meeting during the ICCHP-98 Conference in Vienna. It will provide the Project with a larger audience and will participate in the peer review of the deliverables for which this is appropriate.