Replacing Festival Embedded Scheme with Python

A very long time ago, Dr. Alan Black and colleagues put together a text-to-speech framework that allowed for both research and practical applications: the Festival Speech Synthesis System.

These days, deep learning (DL) techniques are the state of the art in TTS, and the differences between DL systems and Festival are striking.

However, the computational requirements of running text-to-speech with Festival are low (lower still with its simplified sister project, Flite), which means that for many people and language communities around the world, Festival is the only available resource.

I looked into Festival in detail when I taught a workshop on building synthetic voices at Foulab. The system includes a full NLP pipeline (part-of-speech tagging, recognition of date and time expressions, etc.) and a full classifier/regressor implementation based on a type of decision tree (CART), with high-level scripting available to its users through an embedded Scheme interpreter.

As it stands, Festival is a great tool for both teaching and implementing TTS for languages with a small number of speakers, but the use of Scheme is currently a big barrier to adoption. Using Scheme was a great decision at the time, as it was accessible to NLP researchers and many universities were teaching it in intro programming courses. These days, we would use Python instead.

After having used Festival to build new voices, it seems to me that the Scheme scripts can be automatically transformed into Python equivalents. And replacing the embedded Scheme with Python might be as straightforward as turning Festival into a library that Python can load. Such changes could help Festival extend its appeal for decades to come.
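
To give a feel for what such an automatic transformation could look like, here is a minimal sketch in Python. It parses simple Scheme expressions and rewrites them as Python call syntax. The Python-side names it produces (for example Parameter_set) are hypothetical: no such Python binding for Festival is assumed to exist, and the naming rule is only one possible choice. The sketch handles plain calls, quoted symbols, numbers, strings and set!, and ignores comments, quoted lists, define, lambda and everything else a real converter would need.

```python
import re

def tokenize(source):
    """Split Scheme source into parentheses, string literals, and atoms."""
    return re.findall(r'"(?:\\.|[^"\\])*"|[()]|[^\s()"]+', source)

def parse(tokens):
    """Consume tokens from the front and return one nested-list expression."""
    token = tokens.pop(0)
    if token == "(":
        items = []
        while tokens[0] != ")":
            items.append(parse(tokens))
        tokens.pop(0)  # drop the closing ")"
        return items
    return token

def py_name(symbol):
    """Turn a Scheme symbol into a Python identifier (hypothetical naming rule)."""
    return re.sub(r"\W", "_", symbol)

def to_python(expr):
    """Render a parsed expression as a line of Python source."""
    if isinstance(expr, list):
        if not expr:
            return "[]"  # empty list
        head, *args = expr
        if head == "set!":  # assignment
            return f"{py_name(args[0])} = {to_python(args[1])}"
        # Anything else is treated as a function call.
        return f"{to_python(head)}({', '.join(to_python(a) for a in args)})"
    if expr.startswith('"') or re.fullmatch(r"-?\d+(\.\d+)?", expr):
        return expr  # string and number literals carry over unchanged
    if expr.startswith("'"):
        return repr(expr[1:])  # quoted symbol becomes a Python string
    return py_name(expr)

def translate(source):
    """Translate a single Scheme expression into Python source."""
    return to_python(parse(tokenize(source)))

print(translate('(SayText "Hello world")'))
# prints: SayText("Hello world")
print(translate("(Parameter.set 'Duration_Stretch 1.5)"))
# prints: Parameter_set('Duration_Stretch', 1.5)
```

The translated lines would only run if Festival were exposed as a Python library providing functions with those names, which is the second half of the proposal above.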