Even though more than a century has passed since Alexander Graham Bell
invented the telephone in 1876, it is quite amazing how little the
technology has changed.
Most of us are still using our most primeval communication method -
voice - to communicate with another party. Video telephony and
Internet-based video-conferencing have not had much impact; in fact, a
lot of people have reservations about sending their own video images to
the other party. It is still a mystery why there is this instinctive
aversion to chatting 'face-to-face' with someone else.
However, it is a different story with text-based chat systems. These
are growing at a tremendous rate on the Internet today. This interesting
aspect of human behaviour suggests that people are uncomfortable about
revealing their emotional state, and can converse more freely if they
hide their identity.
Text-based communication can nonetheless be monotonous and tedious. For
avid Internet chatters jaded by text-based chat systems, help is at
hand.
Research is being carried out at the National University of Singapore's
Department of Electrical and Computer Engineering to design and
implement a three-dimensional (3D) model-based facial animation system
that can be incorporated into a 3D visual chat environment. A 3D avatar
or graphical image is created to represent the user, complete with
expressions, while he or she conducts an online conversation.
The interactive 3D model-based text-to-audio-visual synthesis (TTAVS)
system can be an alternative for low bandwidth video-conferencing or
informal chat sessions.
The system incorporates a 3D model of a human head with facial
animation parameters (emotion parameters) and speech producing
capabilities (lip-sync). At the transmitter side, the user inputs text
sentences via the keyboard, which are sent through the communication
channel to the correspondent's PC. At the receiving end, the system
converts incoming text into speech. The receiver sees a 3D head model -
with appropriate facial emotions and lip movements - and hears speech
corresponding to the text sent.
The user can type symbols from a predefined set to express certain
emotions, which are in turn reproduced at the receiving end. The chat
session is thus enhanced, although it cannot match the quality of
high-bandwidth video-conferencing.
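
As a rough illustration of how typed symbols might be turned into
emotion parameters at the receiving end, the sketch below splits a
message into text segments, each tagged with an emotion. The symbol
table, the emotion names and the rule that a symbol applies to the text
following it are all assumptions made for this example; the article
does not describe the actual symbol set or parsing rules.

```python
import re
from dataclasses import dataclass

# Hypothetical symbol table: the real system's symbols are not documented here.
EMOTION_SYMBOLS = {
    ":)": "happy",
    ":(": "sad",
    ":o": "surprised",
    ">:(": "angry",
}


@dataclass
class Segment:
    text: str     # words handed to the text-to-speech stage
    emotion: str  # emotion applied to the head model while they are spoken


# Try longer symbols first so that ">:(" is not misread as ":(".
_PATTERN = re.compile(
    "|".join(re.escape(s) for s in sorted(EMOTION_SYMBOLS, key=len, reverse=True))
)


def parse_message(message: str, default: str = "neutral") -> list[Segment]:
    """Split a typed message into segments tagged with an emotion.

    Assumption: a symbol sets the emotion for the text that follows it,
    until the next symbol appears.
    """
    segments = []
    emotion = default
    pos = 0
    for match in _PATTERN.finditer(message):
        text = message[pos:match.start()].strip()
        if text:
            segments.append(Segment(text, emotion))
        emotion = EMOTION_SYMBOLS[match.group()]
        pos = match.end()
    tail = message[pos:].strip()
    if tail:
        segments.append(Segment(tail, emotion))
    return segments


for seg in parse_message("Hello there :) nice to meet you"):
    print(seg.emotion, "->", seg.text)
```

Only the short text string and its symbols need to cross the
communication channel; the speech and the facial animation are
generated on the receiver's PC.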
There are advantages to this approach. Because only text is
transmitted, it avoids the problems that video-conferencing runs into
when transferring large volumes of data, while still providing a
reasonably natural image appearance.
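
To give a feel for the difference in scale, the sketch below compares
the data rate of typed text with that of a compressed video stream.
Every figure in it (typing speed, frame size, frame rate, bits per
pixel and compression ratio) is an illustrative assumption, not a
measurement from the project.

```python
def text_bitrate(chars_per_second: float, bits_per_char: int = 8) -> float:
    """Bits per second needed to send typed text."""
    return chars_per_second * bits_per_char


def video_bitrate(width: int, height: int, fps: float,
                  bits_per_pixel: float, compression_ratio: float) -> float:
    """Bits per second for a video stream after compression."""
    return width * height * fps * bits_per_pixel / compression_ratio


# Assumed figures, for illustration only.
text = text_bitrate(chars_per_second=5)
video = video_bitrate(width=176, height=144, fps=10,
                      bits_per_pixel=12, compression_ratio=100)

print(f"text:  {text:.0f} bit/s")
print(f"video: {video:.0f} bit/s")
print(f"video needs roughly {video / text:.0f} times the bandwidth of text")
```

Different assumptions will change the ratio, but typed text stays far
lighter than video under any realistic settings.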
The system can work in a similar way in duplex mode, which basically
allows the transmitter and the receiver to switch roles at will. The
entire process can also be implemented as a virtual chat room with more
than two users.
The visual chat system could also be applied in a classroom,
where teaching can be delivered in a more interesting manner. In
long-distance learning, students could interact online with their
teachers, with exchanges expressed through 3D avatars.
The system can also be fine-tuned for computer game enthusiasts to
create virtual worlds and 3D scenarios that are more interactive and
realistic.
Implementation Issues
The researchers have already addressed the lip-synchronisation issues
and developed a fully working 3D model of human lips with a database of
the most common lip shapes.
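
One way to picture how a database of lip shapes can drive lip-sync is
sketched below: each speech sound (phoneme) is looked up in a table of
stored shapes, and in-between frames are produced by blending from one
shape to the next. The shape names, the two numbers describing each
shape, the phoneme labels and the linear blending are all hypothetical
choices for this example, not the researchers' actual database or
method.

```python
# Hypothetical lip-shape database: (mouth_width, mouth_opening), each in 0..1.
LIP_SHAPES = {
    "closed": (0.5, 0.0),
    "open": (0.6, 0.8),
    "rounded": (0.3, 0.4),
    "spread": (0.9, 0.2),
}

# Hypothetical mapping from phoneme labels to stored lip shapes.
PHONEME_TO_SHAPE = {
    "p": "closed", "b": "closed", "m": "closed",
    "aa": "open", "ae": "open",
    "uw": "rounded", "ow": "rounded",
    "iy": "spread", "s": "spread",
}


def lip_keyframes(phonemes, default="closed"):
    """Look up one stored lip shape per phoneme (unknown phonemes use the default)."""
    return [LIP_SHAPES[PHONEME_TO_SHAPE.get(p, default)] for p in phonemes]


def interpolate(a, b, t):
    """Linear blend between two lip shapes, with t running from 0 to 1."""
    return tuple(x + (y - x) * t for x, y in zip(a, b))


def animate(phonemes, frames_per_phoneme=4):
    """Expand a phoneme string into a list of lip-shape frames."""
    keys = lip_keyframes(phonemes)
    frames = []
    for a, b in zip(keys, keys[1:]):
        for step in range(frames_per_phoneme):
            frames.append(interpolate(a, b, step / frames_per_phoneme))
    if keys:
        frames.append(keys[-1])  # finish exactly on the last shape
    return frames


for frame in animate(["m", "aa", "m"], frames_per_phoneme=2):
    print(frame)
```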
Non-Uniform Rational B-Splines (NURBS) surfaces were used to model the
lips and 3D face. NURBS are a mathematical representation used in
computer-based 3D modelling to describe contours and shapes. In NURBS
modelling, the surface is not defined by joining points in 3D space, as
in a conventional polygonal 3D model, but is given its shape by control
points. When the control points are moved, the shape of the NURBS
surface changes with them, while the underlying surface remains
smooth.
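
The effect of control points can be seen in a small sketch. For
brevity it evaluates a NURBS curve rather than a full surface, using
the standard Cox-de Boor recursion for the B-spline basis functions.
The control points, weights and knot vector are made-up values chosen
for illustration; they are not taken from the lip model.

```python
def basis(i, p, u, knots):
    """Cox-de Boor recursion: i-th B-spline basis function of degree p at u."""
    if p == 0:
        return 1.0 if knots[i] <= u < knots[i + 1] else 0.0
    left = right = 0.0
    d1 = knots[i + p] - knots[i]
    if d1 > 0:
        left = (u - knots[i]) / d1 * basis(i, p - 1, u, knots)
    d2 = knots[i + p + 1] - knots[i + 1]
    if d2 > 0:
        right = (knots[i + p + 1] - u) / d2 * basis(i + 1, p - 1, u, knots)
    return left + right


def nurbs_point(u, control_points, weights, knots, degree):
    """Point on a NURBS curve: weighted basis functions blend the control points."""
    num = [0.0] * len(control_points[0])
    den = 0.0
    for i, (pt, w) in enumerate(zip(control_points, weights)):
        b = basis(i, degree, u, knots) * w
        den += b
        num = [n + b * c for n, c in zip(num, pt)]
    if den == 0.0:
        raise ValueError("u lies outside the half-open parameter range of the curve")
    return tuple(n / den for n in num)


# Illustrative quadratic curve with three control points.
control = [(0.0, 0.0), (1.0, 2.0), (2.0, 0.0)]
weights = [1.0, 1.0, 1.0]
knots = [0, 0, 0, 1, 1, 1]

before = nurbs_point(0.5, control, weights, knots, degree=2)
control[1] = (1.0, 4.0)  # drag the middle control point upwards
after = nurbs_point(0.5, control, weights, knots, degree=2)
print(before, after)
```

Moving a single control point pulls the curve towards it without any
point on the curve being edited directly, which is why the shape stays
smooth.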
The various parameters required for a realistic appearance of the lips
were obtained from video clips of a natural speaker. The entire
system has been designed with existing sound wave editors and 3D
modelling and animation software. The final NURBS model of the human
head is incorporated into an interactive model-based visual chat
environment.
The research team is also exploring the possibility of using the
Festival Speech Synthesis System at the input end of the system to
convert text into speech.
The Festival Speech Synthesis System is text-to-speech software
developed at the University of Edinburgh, from which intermediate
phoneme information can be extracted for lip-sync. The researchers are
looking into the design and methodology for a text-to-audio-visual
system that takes a string of phonemes, or speech sounds, as input and
outputs them in the form of "talking lips".
The use of NURBS is a relatively new concept in 3D lip modelling, and
a recent demonstration of the work has attracted the interest of a
number of 3D graphics companies. The concept is being extended to
computer graphics-based sign language generation, in collaboration with
the Human Interface Engineering Lab at the University of Osaka, Japan.
The main idea here is to convert typed text or spoken words into
animated computer graphics depicting sign language. This concept can be
used in communication with hearing-impaired people.
Dr Liyanage C De Silva is an Assistant Professor at the Department of
Electrical and Computer Engineering, NUS.
Hari Gurung is a Master of Engineering student working with De Silva on
the project.
For more information, contact De Silva at: elelcds@nus.edu.sg
or check out: http://face.ee.nus.edu.sg