What to evaluate?

Depending on our goals, we may need to evaluate the whole end-to-end TTS system, or just some of its components.


Sometimes system testing is called black-box testing because the evaluation progresses without any knowledge of how the system operates, and by contrast unit testing is sometimes called glass-box testing since we are “seeing inside” the system (“box”) when doing the tests. (Taylor, 2009; p523)

This is potentially confusing because Taylor doesn’t define what he means by “unit”. He is using standard software engineering terminology, where a unit is the “smallest testable part of an application”. Taylor conflates unit in that software engineering sense with a component of a text-to-speech system, such as the letter-to-sound module. Such a component might be large and complex: it may involve several algorithms and use external data resources.

Let’s try to clear this up:

Unit testing: a software engineering methodology for checking that code executes correctly and produces the expected output for a (probably small and fixed) test case.

Component testing: evaluating the performance of a sub-part of a text-to-speech system; this may be possible to do objectively, or may require subjective testing (see Taylor).

System testing: evaluating the end-to-end performance of a complete text-to-speech system.

References:

Paul Taylor (2009) Text-to-Speech Synthesis. Cambridge University Press, Cambridge, UK. DOI: 10.1017/CBO9780511816338.019
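To make the difference between unit testing and component testing concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the toy `letter_to_sound` function, the tiny lexicon, the phone symbols and the held-out word list are all invented, and a real letter-to-sound module would be far more complex. The unit test checks one fixed input against one expected output; the component test measures how well the module performs over a set of reference pronunciations.

```python
# Hypothetical toy letter-to-sound (LTS) module: a lookup table plus a naive fallback.
# The lexicon entries and phone symbols are invented for this example.
TOY_LEXICON = {
    "cat": ["k", "ae", "t"],
    "dog": ["d", "ao", "g"],
}


def letter_to_sound(word):
    """Return a phone list for `word`; unknown words fall back to one symbol per letter."""
    key = word.lower()
    return TOY_LEXICON.get(key, list(key))


def unit_test_known_word():
    """Unit test: a small, fixed test case with a single expected output."""
    return letter_to_sound("Cat") == ["k", "ae", "t"]


def component_word_accuracy(test_set):
    """Component test: proportion of words whose predicted phones match the reference."""
    correct = sum(letter_to_sound(word) == reference for word, reference in test_set)
    return correct / len(test_set)


# Invented reference pronunciations, for illustration only.
HELD_OUT = [
    ("cat", ["k", "ae", "t"]),
    ("dog", ["d", "ao", "g"]),
    ("ship", ["sh", "ih", "p"]),
    ("bit", ["b", "ih", "t"]),
]

if __name__ == "__main__":
    print("unit test passed:", unit_test_known_word())
    print("word accuracy:", component_word_accuracy(HELD_OUT))
```

Note that the unit test can pass (the code does what it was written to do) while the component still scores poorly on the held-out words, which is why the two kinds of testing answer different questions.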