European Commission Joint Research Centre, Institute for the Protection and Security of the Citizen, VOICE Project

Presentations of the VOICE Project and of the prototype demonstrator

THE VOICE PROJECT
Giving a VOICE to the deaf
by developing awareness of VOICE to text recognition capabilities

ICCHP Congress, Vienna 1998

Giuliano Pirelli
Istituto per i Sistemi, l'Informatica e la Sicurezza

voice@jrc.it
http://voice.jrc.it

Abstract: the difficulties of the deaf go beyond the loss of hearing itself and underline a more general problem of lack of communication. The paper presents an overview of the VOICE Project, an Accompanying Measure of the European Commission's Telematics Applications Programme. The Project is co-ordinated by the Institute for Systems, Informatics and Safety of the Joint Research Centre, in collaboration with Kepler University of Linz, the Software Solutions and FBL software houses near Milan, the ALFA and CECOEV Associations of the deaf of Milan, and the Institute for the deaf of Linz. The project proposes the promotion of automatic recognition of speech in conversation, conferences, television broadcasts and telephone calls, with their translation into PC screen messages. It also proposes to stimulate and increase the use of new, widely diffused technologies, namely the Internet, with the objective of uniting, by means of an Internet VOICE Forum, Associations, companies, universities, schools, public administrations and anyone else who may be interested in voice recognition and could benefit from such research.

1. The Joint Research Centre of the European Commission

The Joint Research Centre is the European Commission's own research centre. It was created to share, at European level, the large investments needed to carry out research on nuclear energy. Over time its tasks have extended into other areas in which a common approach at European level is necessary. JRC provides neutral and independent advice in support of the formulation and implementation of the European Union's policies. In addition, it offers unique training services to individuals and companies and organises workshops for scientific and technical workers in advanced sectors of science.

The activity areas of the Institute for Systems, Informatics and Safety (ISIS) and of its Unit for Software Technologies and Automation (STA) include the innovative application of information and communication technology, dependable software, animation in medical imaging, and network multimedia techniques in training and education.

1.1. JRC-ISIS's Exploratory Research Programme

In 1996, JRC-ISIS's role in the above areas was oriented towards providing scientific and technical support to the EU services and initiatives. Moreover, a levy of 6% of the institutional budget was used to finance exploratory research. In 1996 the scientific staff of ISIS made a total of 65 proposals. The ISIS Scientific Committee judged the proposals on originality, appropriateness, soundness and cost and produced a shortlist of 16 proposals, 12 of which were then funded. In particular, two projects carried out by the STA Unit concern the interface between Life Science and information technology, to provide help for the disabled and the elderly:

Information technology aids for people with special needs - Voice to text conversion for the deaf

Brain-actuated control - using EEG pattern recognition to help the disabled.

I had the honour of reporting to the ICCHP-96 Conference on the start of these Projects. Since then, they have achieved encouraging results and are providing a better definition of the requirements of people with special needs and closer collaboration between technicians and non-technicians in these interdisciplinary activities.

1.2. The VOICE Project's first steps

Since the beginning of 1996, JRC-ISIS has undertaken a number of the tasks described in more detail later on, related to integrating voice to text recognition into local conversation and telephone conversation for the hearing impaired. The objective was the development of a demonstrator needed to generate awareness and stimulate discussion regarding the possible applications of voice to text recognition. The technical objective of this research was to set up a cluster of laboratory prototype applications related to voice to text recognition, intended for any user and particularly for the deaf. This VOICE Laboratory included the necessary software, hardware and network capabilities.

Contacts with producers of voice to text recognition systems, research centres, telecommunications firms and television broadcasters created a coherent overview of the state of the art in voice to text recognition, voice analysis and text to speech systems. Regular contacts with the Associations of the hearing impaired gave the opportunity to analyse the special needs resulting from difficulties in hearing and in speech. Applications of information technology have been considered in relation to a general problem of lack of communication, in many aspects of the life of the deaf (and in a different way for the blind) and of the elderly.

With the aim of facilitating contacts and establishing a common goal, JRC-ISIS gave some Associations the opportunity to create a VOICE Forum on the Internet, by allocating space for them on a Web server and providing technical assistance. Since then, the Associations have shown great interest in participating in the Project and they feel reassured by JRC-ISIS's mission as an impartial European R&D centre with expertise in innovative applications of information technology. The VOICE Forum is becoming a known Internet site and several Associations are adding information to it or communicating their interest in testing the demonstrator and participating in the planned meetings and workshops.

Additional software is being created and integrated into the system to turn commercial speech recognition packages into user-friendly programs modelled on the requirements of the users. The technical part of the Project is developed in collaboration with the FBL software house, which is experienced in applications of voice to text recognition systems for the disabled. The demonstrator should finally reach the standard of functionality needed to prove the validity of such applications to companies and manufacturers, in subtitling school lessons, conferences, television transmissions and telephone calls.

2. Applications of voice to text recognition for the deaf

Although voice to text recognition packages are marketed primarily as a means of allowing people in businesses to create documents without using the keyboard, they hold great advantages for the hearing impaired, the blind and the physically handicapped, as well as for people without special needs.

2.1. State of the art

A great deal of money and man-hours have been invested in developing voice to text products in the last ten years, but progress has become really noticeable only in the last three years. This is in part due to the wider diffusion of PCs with greater processing power. Voice recognition systems are reaching a very good level of development and are becoming widely available for PCs. They are used by lawyers for preparing drafts that will be read and checked for errors and by radiologists, who do not have their hands free and make use of a very specific dictionary. The software that until now could only recognise words separated by short pauses (disjointed speaking) is being replaced by new releases, which present very significant improvements and recognise continuous dictated text (continuous speaking).

Our interests are concentrated on systems that run on PCs, since they are more affordable and appropriate to the final user. In this sector, IBM and Dragon Systems offer systems working in several European languages. This growth in market, affordability and user-friendliness in fact encourages the search for ways of adapting such software for use by disabled people.

A widely used application is the subtitling of television transmissions, a very powerful help for deaf people, particularly for language learning and training for deaf children. The importance of the educational aspect lies in the fact that, for a deaf child, subtitles are one of the most powerful tools for learning any language, just as a hearing child learns from what it hears. Similarly, subtitling gives hearing impaired adults the opportunity to enrich their vocabulary. Since subtitling of television transmissions is the result of a manual preparation of files to be transmitted in Teletext format, most of the subtitled transmissions are films. Subtitling of live programs and of the news is rarely performed.

Subtitling of conferences, even those addressed to the deaf, is usually not available. Sign language interpreters provide significant help for deaf people who know sign language, but other participants, such as the partially hearing impaired, the elderly and foreigners, are unable to understand sign language. Moreover, the interpretation is lost after the conference, being of no use for producing proceedings or abstracts.

In telephone communication, Text-telephones have already proved themselves vital from a deaf person's point of view. These systems do, however, present one major problem, that is, all people wishing to contact a deaf person on such a machine must possess one themselves. This makes such a means of communication awkward and expensive, both for the deaf and for those they wish to call.

2.2. User needs

The difficulties of the deaf go beyond the loss of hearing itself, and underline a more general problem of lack of communication. Help in reducing the gap between the deaf and the hearing world should be reinforced. Automatic recognition of speech in conversation, conferences and telephone calls, with their translation into PC screen messages, could be a powerful help. Please refer to another paper (The VOICE Project - Part 3 - The communication needs of the deaf), presented by Alessandro Mezzanotte, President of CECOEV, to the VOICE Workshop.

Hearing impairment is a particularly important disability to be taken into consideration since it affects people of all ages and is something that often becomes worse with age. It is also important since one of the main forms of modern communication, the telephone, is as yet of no or of very little use to this community for oral communication (while it is useful for the transmission of faxes). Other modern means of communication, although not completely useless, generate frustration by providing only part of the information in a form accessible to them. An example of this is the television which, when not subtitled, supplies very limited information.

In some European countries, it is usual to think that hearing impaired people would have difficulties trying to learn to lip-read and speak and should therefore make use of sign language and attend special schools. In others there is a different approach to the problem. In Italy the law encourages the integration of deaf children in normal schools, with a remedial teacher, without the use of sign language. Some Associations, like ALFA in Milan, are getting very good results from helping children who follow this approach, from those joining primary school right through to those finishing university and finding a job afterwards. Although good results are achievable, they demand an enormous effort, which could be greatly reduced through the use of new technologies.

2.3. Market situation and prospects

It is worth remembering that the market of the hearing impaired consists of between 1% and 5% of the population (according to the degree of hearing loss), which represents millions of people in Europe. This field can be enlarged to include those who are losing their hearing or have hearing problems, those who can hear but are vocally impaired, and even people with normal hearing who cannot hear because of the noise in their environment. Moreover, a lack of communication similar to that experienced by the deaf also affects the disadvantaged, people living in foreign environments and the elderly. Taken together, this group consists of more than 30% of the total population.

The new products seem well suited to the needs of the deaf. The modifications necessary for some tests are of limited extent and could have been foreseen and developed by the producers of voice to text recognition systems, if only they had had the time and the willingness to concentrate on this aspect. But one consequence of the rapid growth of voice recognition systems is that the experts in this field are very few and work on the development of other aspects of more immediate use.

Nevertheless, this could be an opportunity of great interest for the producers of speech recognition systems, since deaf users could accept the present limited accuracy of recognition as a complement to their lip-reading skills. Even the more limited accuracy of recognition over the telephone line is an interesting starting point for the deaf. The Associations of the deaf are considered both the most interested and the most critical user group for all the possible applications in this area, and thus the most motivated for testing a system which will be improved for all users, also in related fields, such as video-telephones or on-line television subtitling in several languages.

The proposed alterations or additions to existing software could easily be added to future releases by the software producers interested in enlarging their target market. This will improve the quality of life of people who at present have difficult access to information and communication. The proposed demonstrator will encourage better use of standard products and the definition of new services. The market is ready to accept and spread them, as soon as their quality has improved and users consider it good.

3. European Dimension

Producers of hardware, software and services for voice to text systems hesitate to invest more, since the user needs are not translated into technical specifications and are sometimes not even known. On the other hand, the Associations of the disabled have a limited overview of possible new technical solutions and rarely have the opportunity to participate in the feasibility studies of new projects. Those who have to take decisions in associations, institutions, political bodies, information technology factories and telecommunications services need a valid reference point. All the concerned parties look for Positive Actions, which might be of specific use to them and an important reference for others.

What is lacking is essentially a better definition, from a technical point of view, of the needs of the disabled, to enhance collaborative work between technicians and non-technicians. The VOICE Forum could play an important role in this field and the European dimension of such a broader co-operation is of great importance, allowing a scale factor for the study and the development of technical aids and ensuring a large impact of the results. This will improve mobility and accessibility to information, offering an additional means to participate fully in the information society and improving the quality of life.

There are technical solutions, at a pre-competitive stage, to help the deaf, and an effort is required to promote them at EU level, so as to benefit from a large scale factor. Multilingual aspects should also be considered at a European level, since most of the concerned Associations operate only at a national level: JRC-ISIS will provide know-how independent of the language. The expertise of the Partners, the previous analysis of the user needs, the availability of laboratories (hardware/software) as well as of demonstrations, and the experience in organising meetings and workshops will help in expanding the present VOICE Forum at EU level and in using it as an Internet server for the deaf.

3.1. The VOICE Project - a Telematics Applications' Accompanying Measure

We felt that all the activities started at JRC-ISIS with the collaboration of its Italian Partners, could get a particularly important push if the tests and the dissemination of the results could be organised in several countries. So we enlarged our group, proposing at first to the Institute for Computer Science of the Johannes Kepler University of Linz and to the Institut für Hör- und Sehbildung (IHSB) of Linz to join us.

We created a Consortium of partners with whom we could collaborate on the Project. In order to bring the activities to a broader European level, we prepared a proposal for an Accompanying Measure, which we submitted to the Telematics Applications Programme Call in April 1997. The proposal, "VOICE - Giving a VOICE to the deaf, by developing awareness of VOICE to text recognition capabilities", has been selected and we are at present (April 1998) in the last negotiation phases for starting the Project.

The Consortium proposes to continue and enlarge the activities in this field, and to develop awareness of the capabilities of voice to text recognition systems. The Consortium will play a technical and social role in collecting information and presenting it in a coherent way to the producers of voice to text recognition systems and researchers. The aim is that of disseminating information on how the producers may help the users with disabilities by limited improvement of their standard products and on how the users with special needs may collect useful information and translate it into technical specifications.

JRC-ISIS is acting as scientific and technical co-ordinator of the Project and is developing several specific aspects of the research. FBL software house, which is experienced in applications of speech recognition to the disabled, is developing additional software and integrating it into the demonstrator to turn commercial speech recognition packages into user-friendly programs modelled on the requirements of the users. Each step of the activity is discussed and checked with ALFA and CECOEV Associations of the deaf in Milan. Kepler University examines the Italian results, verifying their validity in Austria and helping IHSB in the Austrian validation phase.

3.2. Objectives and strategic approach

The main objectives of the VOICE Project are: to investigate voice to text recognition for automatic subtitling of conferences, school lessons, television transmissions and telephone conversations; to spread the use of general purpose voice to text recognition systems and to improve the prototypes developed until now; to demonstrate the prototypes to relevant organisations and in international conferences; to use a VOICE Forum on the Internet as a Project tool for collecting and spreading information on technical aids for the deaf.

The VOICE Project proposes not only to promote new technologies in the field of voice to text recognition, but also to stimulate and increase the use of new, widely diffused technologies, namely the Internet. The objective of the project is that of uniting, by means of an Internet VOICE Forum, Associations, companies, universities, schools, public administrations and anyone else who is interested in voice recognition and could benefit from such research. The Forum will become an intermediary between the different concerned groups and will help in collecting information on the user needs and on the validation of the prototype demonstrator. It will enhance collaborative work between technicians and non-technicians and will help in disseminating the results.

At present JRC hosts and maintains the sites of the AFA, ALFA, CECOEV and ENS Associations of the deaf, with information including: statutes, contact numbers and addresses, meetings, electronic copies of a selection of their newspapers, a study carried out into the hours and accuracy of the television broadcasters, and a list of their archive of subtitled videocassettes. The current site provides a very strong foundation on which the awareness-raising side of the VOICE Project can be built. This is an important part of the Project itself, since it demonstrates, to all those involved, the effectiveness of this means of communication for the deaf community.

All the phases of the Project will be developed with continuous and close participation of the users. Several European conferences and workshops will be organised to help them discuss their needs with industry and service providers: ICCHP-98, Vienna and Budapest, August 98; HANDImatica-98, Bologna, November 98; Linz, first semester 99; JRC-Ispra, second semester 99. The demonstrator will be presented and used for generating prototype live subtitling for the deaf participating in the conferences. The meetings will not only concern the technical aspects, but will also try to bring the manufacturers and producers closer to the users' needs.

The Partners of the Consortium represent different sectors of experience and activity (research, universities, private IT companies, Associations of the deaf and of their families, Institute for the deaf) and provide the complementary skills needed to cover all the aspects of the VOICE Project. ALFA, CECOEV and IHSB, which together have more than one thousand members, represent three different ways of approaching the problems of deafness, due to different cultural and linguistic aspects. JRC-ISIS, as an impartial European R&D centre, is in an ideal position to facilitate the dissemination of information and the understanding of user requirements.

The linguistic aspect has been addressed by choosing software packages already available in several European languages. Since most of the new IT packages are produced in English, JRC-ISIS is testing them in English and the users are doing so in Italian and German, so as to cover different linguistic approaches. The acquired know-how will be made available for applications in the other languages. Some contacts have already been established with the University of York and the NDCS Association in the UK, the French ANPEDA and the Belgian APEDAF and TELECONTAC, which showed interest in following the Project. As a complement to the VOICE Forum, a VOICE Special Interest User Group is being created and will hold its first meeting during the ICCHP-98 Conference in Vienna. It will provide the Project with a larger audience and will participate in the peer review of the deliverables for which this is appropriate.

3.3. Technical aspects of the demonstrator

One of the objectives is the extension of a cluster of demonstrator applications related to voice to text recognition, some of which have been developed on the basis of a multimedia laboratory prototype. The system could be of use for subtitling conferences, television transmissions and telephone conversations. It involves integrating standard speech recognition software into flexible applications that will help in ensuring low costs and easy use. The technical aspects are described in a second paper (The VOICE Project - Part 2) presented to ICCHP-98. In view of another objective of the Project, which is the VOICE Forum, the laboratory will also provide a means of generating and managing Web pages on the Internet, as well as e-mail capabilities.
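As a rough illustration of what integrating a standard recogniser into a subtitling application can involve, the sketch below takes text already produced by a speech recognition package and cuts it into short subtitle pages for display. It is not the demonstrator's code: the function name `to_subtitle_pages`, the line width and the number of lines per page are assumptions made only for this example.

```python
import textwrap


def to_subtitle_pages(recognised_text, width=40, lines_per_page=2):
    """Split recognised text into subtitle pages of a few short lines each.

    Hypothetical helper: width and lines_per_page are illustrative
    defaults, not values taken from the VOICE demonstrator.
    """
    # Break the text into lines no longer than `width` characters,
    # splitting only at spaces where possible.
    lines = textwrap.wrap(recognised_text, width=width)
    # Group consecutive lines into pages shown one after the other.
    return [lines[i:i + lines_per_page]
            for i in range(0, len(lines), lines_per_page)]


if __name__ == "__main__":
    text = "the quick brown fox jumps over the lazy dog"
    for page in to_subtitle_pages(text, width=10, lines_per_page=2):
        print("\n".join(page))
        print("--")
```

Keeping the display step separate from the recogniser in this way means the same formatting could in principle be applied to whichever commercial package supplies the text.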

On the basis of the first experiences, a new prototype demonstrator of automatic subtitling of conferences, based on speech recognition, has been developed. It was presented in the first quarter of 1998 to some schools that had declared their interest in participating in the Project. The presentation of the Project was followed by a simulation of a school lesson, with topics on literature, history, world explorations, spatial geography, electronics and art, using the prototype demonstrator for subtitling the speaker's voice.

The prototype will be tested in real situations of use for subtitling school lessons for the benefit of deaf students. It will display the dialogue spoken during foreign language lessons, for the benefit of hearing students, or during lessons in the host country's language, for the benefit of any user, particularly immigrants. Some tests are also planned for subtitling university lessons and printing summaries. The use of the VOICE Forum and of the Internet will be encouraged, since this aspect is particularly important for deaf students in order to communicate with their hearing friends about homework and for social contacts.

4. Final goal, autonomy and quality of life

The impact that such a Project may have is enormous, changing several aspects of everyday life for an important portion of the population. At present, the difficulties in communication keep the deaf community rather isolated from the rest of society. This entails significant costs for sign language interpreters or non-automatic subtitling. Moreover, these services are not available in meetings which are considered less interesting for the deaf, thus increasing the communication difficulties of the hearing impaired and their feeling of being confined to a few specific fields of interest.

A wider diffusion of subtitles will greatly increase the interaction of the deaf with each other as well as with the society in which they live. As more conferences, meetings and discussions gradually become subtitled, there will be an increase in participation from the hearing impaired community. Once started, this improvement of their integration and interaction in society will have a snowball effect, and it is therefore this initial push that is so vital.

With an increase in subtitling capabilities, television will become a more useful source of information. The use of subtitles in telephone calls, which involve everyday communication in society, will greatly increase the interaction of the deaf community. This contribution will increase the effect that their decisions have on the surrounding environment, which will subsequently improve their standard of living. If it becomes possible to close the huge gap caused by inappropriate means of communication, national spending on benefits for the deaf will also be reduced. More integration in society and more autonomy in their everyday life are the basis for any further improvement. Easier access to schools and universities will allow a more satisfying life and also a better choice of work corresponding to personal capabilities and, at large, more economic productivity for society.

The demonstrator will be tested not only from a technical point of view, but also as an opportunity for discussing other problems related to the technical ones. The different implications will be discussed with the users, the producers of voice to text systems, television broadcasters, telecommunications firms, etc. in order to see, foresee and understand the problems that will arise in the exploitation of the systems. The Consortium will be at the disposal of the Associations of the deaf, which may contact the developers, as representatives of the needs of a large group of users, and clarify some precise technical points. Thanks to the experience gained, the deaf users should be in a position to influence, by valid technical results, some aspects of the commercial development of voice products and to convince the service producers of the opportunity of using the newly available products.

We feel that the proposed way of managing the pauses in speech (as explained in the aforementioned second paper) gives a very deep feeling of communication between the speaker and the audience. The speaker may thus decide at each moment the rate of speaking according to the audience, their familiarity with the vocabulary, their fluency in reading, etc. This proposal is quite different from many other projects, since we do not propose to develop specific software. We simply feel that the commercial products will reach good results in the near future and we try to convince the producers to take into account the needs of the deaf. At the same time we also try to help the deaf to get ready to use the systems and to explain their expectations to the service providers.
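The actual mechanism for managing pauses is described in the second paper and is not reproduced here. Purely to illustrate how a speaker's pauses could pace what appears on screen, the sketch below assumes that recognised words arrive with start and end times in seconds and groups them into a new subtitle segment whenever the silence between two words reaches a threshold. The function name `split_at_pauses`, the input format and the one-second threshold are all assumptions made for this example.

```python
def split_at_pauses(timed_words, min_pause=1.0):
    """Group (word, start, end) tuples into segments separated by pauses.

    Hypothetical illustration: a new segment starts whenever the gap
    between the end of one word and the start of the next is at least
    `min_pause` seconds.
    """
    segments = []
    current = []
    previous_end = None
    for word, start, end in timed_words:
        # A long enough silence closes the segment collected so far.
        if current and start - previous_end >= min_pause:
            segments.append(" ".join(current))
            current = []
        current.append(word)
        previous_end = end
    # Flush the last segment, if any words remain.
    if current:
        segments.append(" ".join(current))
    return segments


if __name__ == "__main__":
    words = [("good", 0.0, 0.4), ("morning", 0.5, 1.0),
             ("today", 2.5, 3.0), ("we", 3.1, 3.2), ("begin", 3.3, 3.8)]
    for segment in split_at_pauses(words):
        print(segment)
```

With such a rule the speaker, by pausing for longer or shorter intervals, would control how much text the audience has to read at once.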

The technical goal of the Project is to develop a prototype with just the basic functions for holding conferences. The final aim is not that of developing a final commercial tool, but on the contrary that of using a prototype demonstrator of limited life time (possibly less than the two years' life of the Project) for disseminating awareness so that the producers will include some of the basic functions of the demonstrator into their standard commercial products.

