History
(1962-1977): The first generation of speech synthesis used formant synthesis, converting the phonetic breakdown of a phrase into formant-frequency contours, which was the state of the art at the time. Because of the limited computing resources of the era, the output had low precision and naturalness, and the approach was soon replaced by its successors.
(1977-1992): In the second generation of speech synthesis, intelligibility improved with the adoption of LPC (linear predictive coding) parameters, although many would say that the lifelikeness of the output still remained low. These systems worked by converting text input into appropriate units and then rendering those units as speech.
(1992-present): The third generation can be customized to suit a given task and uses "unit selection synthesis", which, according to a web-based article, was introduced by Sagisaka at ATR Labs in Kyoto. The latest version is available for American and British English, Danish, Finnish, French, German, Icelandic, Italian, Norwegian, Spanish, Swedish, and Dutch. Digital Equipment Corporation's (DEC) DECtalk system [3] descends from MITalk and Klattalk and offers nine voice personalities: four male, four female, and one child (depending on the equipment). The present DECtalk system is based on digital formant synthesis.
Speech synthesis has gained commercial acclaim in modern applications over the past decade, driven largely by advances in, and growing requirements from, existing research organizations, most of which aim to reduce the cost and time of standard procedures. The core product is a text-to-speech (TTS) system that converts plain-text input into synthetic speech; additional phonological and acoustic detail must be supplied for greater accuracy. With the large increase in widely used speech databases, simpler applications have been developed to meet the established standards, and while these waveform techniques remain in great demand, improvements have been made to the original TTS systems still in use at many research companies. The growth of speech applications, for both recognition and synthesis, has accelerated as computers have developed. The implemented software or hardware product can also render symbolic linguistic representations, such as phonetic transcriptions, as synthetic speech, and speech synthesizers have been used to allow people with disabilities to communicate with others.

Historically, the Danish scientist Christian Kratzenstein is credited with building a prototype machine capable of producing five distinct sounds, commonly identified as the five vowels. This was followed by the bellows-operated "acoustic-mechanical speech machine" of Wolfgang von Kempelen of Pressburg, Hungary, described in a 1791 paper; his machine added models of the tongue and lips, enabling it to produce consonants as well as vowels. Further developments of the field include
attempts to program emotion into operational speech-synthesis systems; several small studies have been conducted as attempts at progress on emotional speech synthesis. Both the engineers and the linguists working in TTS research are trying to compile the great deal of data that TTS requires. One European author claims that the general view of this field is dominated by Americans, even though the pioneers of these systems are almost exclusively European.