
Books for listening

Keywords: audio books, speech synthesis, prosodic markers of text structure, phonetic database of foreign names, text-to-speech interface for library materials


The article describes the text-to-speech system Vox Populi, the Iselugeja (Self-Reader) application of Elisa Raamat, and the interactive service for reading library materials aloud in the DIGAR user environment. The digitisation of written library materials, electronic book circulation and developments in speech technology now enable the linguistically challenged, that is, people with a visual impairment or dyslexia, to receive library services as texts read aloud in synthetic speech or as audio books instead of paper books (no matter whether published or still in print). The text-to-speech system consists of two components: an editor interface and a synthesis interface. The editor interface analyses the input text and compiles a list of presumed foreign words and unidentified character strings. The online environment of the phonetic database of foreign names makes it possible to add pronunciation equivalents and to check the pronunciation of foreign names. The synthesis interface lets the user choose between different synthetic voices and adjust the speech rate of the selected voice. Together, the two interfaces convert text files into synthetically voiced audio files. The core of the system is the analysis of the input text and the processing of abbreviations, digits and special characters. Perception tests have been used to determine how best to reflect text structure (paragraphs, titles and direct speech) in synthesised speech. The text-to-speech system Vox Populi is available as a public service, and Iselugeja is a free additional feature of the Elisa Raamat application. Thus, the devices for listening to audio books and for mediating the digital archives of libraries are evidently taking us to an era in which being read to interactively is becoming an everyday service for library readers.
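The input analysis step described in the abstract can be illustrated with a minimal Python sketch. It is not the Vox Populi implementation: the word list, the abbreviation and special-character tables, and the function names are all hypothetical, and the sketch uses toy English data where the actual system would rely on a morphological analyser and the phonetic database of foreign names. It only shows the general idea of expanding abbreviations, digits and special characters while collecting unknown strings for an editor to review.

```python
import re

# Hypothetical toy data, for illustration only.
KNOWN_WORDS = {"the", "library", "opens", "at", "doctor", "street", "and"}
ABBREVIATIONS = {"dr": "doctor", "st": "street"}
SPECIAL_CHARS = {"&": "and", "%": "percent"}
DIGIT_NAMES = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]

# A token is a run of letters, a run of ASCII digits, or one other character.
TOKEN_RE = re.compile(r"[^\W\d_]+|[0-9]+|\S")


def spell_digits(token):
    """Read a digit string digit by digit (a deliberate simplification)."""
    return " ".join(DIGIT_NAMES[int(ch)] for ch in token)


def analyse(text):
    """Return (spoken_tokens, review_list) for the given input text.

    spoken_tokens holds the normalised tokens to be synthesised;
    review_list holds unknown words (possible foreign names or
    unidentified strings) that an editor would need to check.
    """
    spoken, review = [], []
    for token in TOKEN_RE.findall(text):
        lower = token.lower()
        if re.fullmatch(r"[0-9]+", token):
            spoken.append(spell_digits(token))
        elif lower in ABBREVIATIONS:
            spoken.append(ABBREVIATIONS[lower])
        elif token in SPECIAL_CHARS:
            spoken.append(SPECIAL_CHARS[token])
        elif token.isalpha():
            spoken.append(lower)
            if lower not in KNOWN_WORDS and token not in review:
                review.append(token)
        # Any other character (e.g. punctuation) is skipped in this sketch.
    return spoken, review


if __name__ == "__main__":
    spoken, review = analyse("Dr Mihkla opens the library at 10 & Xqzt")
    print(spoken)
    print(review)
```

In this sketch the review list plays the role of the editor interface's list of presumed foreign words and unidentified character strings; a real system would attach pronunciations to those entries before synthesis.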


Meelis Mihkla (b. 1955), PhD, Institute of the Estonian Language, Senior Researcher

Indrek Hein (b. 1963), Institute of the Estonian Language, Senior Software Developer

Andrus Hiiepuu (b. 1966), Elisa Estonia, Head of Private Customer Division and Member of the Board

Indrek Kiissel (b. 1968), Institute of the Estonian Language, Software Developer

Raivo Ruusalepp (b. 1970), MA, National Library of Estonia, Director of Development; Tallinn University, School of Digital Technologies, Lecturer in Records Management

Urmas Sinisalu (b. 1974), National Library of Estonia, IT Architect







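DIGAR = digital archive of the National Library of Estonia (Rahvusraamatukogu digiarhiiv).

Elisa Raamat.

EPR = web library of the Estonian Library for the Blind (Eesti Pimedate Raamatukogu veebiraamatukogu).

Vabamorf. Morphological analyser of Estonian (Eesti keele morfanalüsaator).

Vox Populi.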


