Introduction to Web Speech Synthesis

7th July 2023

On the web, there's a cool thing called Speech Synthesis, also known as Text-to-Speech (TTS). It's a technology that turns written text into human-like speech. It's like having a computer that can talk! In this post, we'll explore what Web Speech Synthesis is all about, how it helps people, and why it's so cool.

What is Web Speech Synthesis?

Before diving into the main topic, let's first understand what Web Speech Synthesis or Speech Synthesis in general means.

Speech synthesis is a way of transforming text into speech using a computerized voice. In its simplest form, the computer reads typed text aloud in a human-like voice. On the web, SpeechSynthesis is the controller interface for the speech service of the Web Speech API. It is used to retrieve information about the synthesis voices available on the device, start and pause speech, and issue other related commands. In short, with SpeechSynthesis we can tell the browser to read a given text aloud.

So, we just came across a term called the Web Speech API. Now, let's explore what it actually refers to.

What is Web Speech API?

The Web Speech API was first introduced in 2012, and web developers use it to provide audio input and output features in web apps. It is built into the browser, so we don't need any third-party apps or plugins. The API has two major interfaces: SpeechSynthesis and SpeechRecognition.

How to use Web Speech Synthesis?

Now let's see how to use the magic of Web Speech Synthesis. Using it in your application is simple and straightforward. Because Speech Synthesis is built into the browser, we don't need anything complex to get started. The browser exposes a Speech Synthesis controller, which we can access through window.speechSynthesis.
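
Since not every environment exposes the controller, it can be worth checking for it before use. Here is a minimal sketch of such a check. The helper name getSpeechSynthesis is made up for this example and is not part of the Web Speech API.

```javascript
// Returns the SpeechSynthesis controller, or null when the Web Speech API
// is not available (for example in Node.js or an older browser).
// `getSpeechSynthesis` is a hypothetical helper name used for illustration.
function getSpeechSynthesis(globalObject = globalThis) {
  return 'speechSynthesis' in globalObject ? globalObject.speechSynthesis : null;
}

const synth = getSpeechSynthesis();
if (synth) {
  console.log('Speech synthesis is available.');
} else {
  console.log('Speech synthesis is not supported in this environment.');
}
```

In a browser, globalThis is the same object as window, so this returns window.speechSynthesis when it exists.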

SpeechSynthesis has its own methods, properties, and events. It also inherits from its parent interface, EventTarget, which lets an object receive events and have listeners attached to it.

Speech Synthesis Properties and Events

Below are the most common properties provided by Speech Synthesis:

SpeechSynthesis.paused
A boolean value that returns true if the SpeechSynthesis object is in a paused state.
SpeechSynthesis.pending
A boolean value that returns true if the utterance queue contains as-yet-unspoken utterances.
SpeechSynthesis.speaking
A boolean value that returns true if an utterance is currently in the process of being spoken.
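
As a rough illustration of how these three flags might be read together, here is a small sketch. The helper name describeSynthState and the labels it returns are assumptions made for this example.

```javascript
// Turns the three read-only flags into a single label.
// `paused` is checked first because a paused controller may still report
// `speaking` as true while an utterance is only partly spoken.
// `describeSynthState` is a hypothetical helper, not part of the API.
function describeSynthState(synth) {
  if (synth.paused) return 'paused';
  if (synth.speaking) return 'speaking';
  if (synth.pending) return 'pending';
  return 'idle';
}

// Browser-only usage, guarded so the helper can also load elsewhere.
if (typeof window !== 'undefined' && 'speechSynthesis' in window) {
  console.log(describeSynthState(window.speechSynthesis));
}
```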

Below is the event provided by Speech Synthesis:

voiceschanged

Fired when the list of SpeechSynthesisVoice objects that would be returned by the SpeechSynthesis.getVoices() method has changed. Also available via the onvoiceschanged property.

Speech Synthesis Methods

Below are the most common methods provided by Speech Synthesis:

SpeechSynthesis.cancel()
Removes all utterances from the utterance queue.
SpeechSynthesis.getVoices()
Returns a list of SpeechSynthesisVoice objects representing all the available voices on the current device.
SpeechSynthesis.pause()
Puts the SpeechSynthesis object into a paused state.
SpeechSynthesis.resume()
Puts the SpeechSynthesis object into a non-paused state: resumes it if it was already paused.
SpeechSynthesis.speak()
Adds an utterance to the utterance queue; it will be spoken when any other utterances queued before it have been spoken.
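
The methods above could be wired together roughly as follows. This is only a sketch: togglePause and stopSpeaking are hypothetical helper names, and the controller is passed in as a parameter purely to keep the helpers easy to try out.

```javascript
// Pauses speech if something is being spoken, resumes it if it is paused.
// `synth` is expected to be window.speechSynthesis in a browser.
function togglePause(synth) {
  if (synth.paused) {
    synth.resume();
    return 'resumed';
  }
  if (synth.speaking) {
    synth.pause();
    return 'paused';
  }
  return 'idle';
}

// Empties the utterance queue; speech in progress stops as well.
function stopSpeaking(synth) {
  synth.cancel();
}

// Browser-only usage. Some browsers only allow speech after a user
// gesture such as a click, so this may stay silent on page load.
if (typeof window !== 'undefined' && 'speechSynthesis' in window) {
  const synth = window.speechSynthesis;
  synth.speak(new SpeechSynthesisUtterance('First sentence.'));
  synth.speak(new SpeechSynthesisUtterance('Second sentence, queued behind the first.'));
  console.log(`${synth.getVoices().length} voices currently loaded`);
}
```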

SpeechSynthesisUtterance (Utterance)

The previous section mentioned the term "utterance" several times, so let's take a moment to understand what exactly an utterance is.

The SpeechSynthesisUtterance interface of the Web Speech API represents a speech request. It contains the content the speech service should read and information about how to read it (e.g. language, pitch, and volume). In short, it's the message or text we want the browser to read.

It has its own properties, such as .pitch, .rate, .text, .voice, and .volume.

It has its own events, such as end, resume, start, and pause.
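
To make these properties and events more concrete, here is a minimal sketch of configuring an utterance. The helpers clamp and applySpeechOptions are hypothetical names, and the 'en-US' default language is an assumption for this example. The numeric ranges follow the Web Speech API: pitch from 0 to 2, rate from 0.1 to 10, and volume from 0 to 1.

```javascript
// Keeps a number inside the inclusive range [min, max].
function clamp(value, min, max) {
  return Math.min(max, Math.max(min, value));
}

// Copies speech settings onto an utterance, clamping them to the
// ranges the API accepts. Hypothetical helper, not part of the API.
function applySpeechOptions(utterance, options = {}) {
  const { lang = 'en-US', pitch = 1, rate = 1, volume = 1 } = options;
  utterance.lang = lang;
  utterance.pitch = clamp(pitch, 0, 2);
  utterance.rate = clamp(rate, 0.1, 10);
  utterance.volume = clamp(volume, 0, 1);
  return utterance;
}

// Browser-only usage, guarded so the helpers can also load elsewhere.
if (typeof window !== 'undefined' && 'speechSynthesis' in window) {
  const utterance = applySpeechOptions(
    new SpeechSynthesisUtterance('Hello from the Web Speech API!'),
    { pitch: 1.2, rate: 0.9 }
  );
  utterance.onstart = () => console.log('Started speaking');
  utterance.onend = () => console.log('Finished speaking');
  utterance.onerror = (event) => console.error('Speech error:', event.error);
  window.speechSynthesis.speak(utterance);
}
```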

Now let's look at some benefits and limitations of Speech Synthesis.

Benefits of Speech Synthesis

There are various useful applications of Speech Synthesizers and some of them are:

It can help visually impaired people by reading books and web pages.
In education, Speech Synthesizers can be used to teach spelling and pronunciation.
It can also be used in the multimedia and telecommunication fields to reduce human effort, for example by reading emails and messages aloud over phone lines.
Let your users engage with your application and device through a voice user interface.
Personalize communication with users based on preference.
Improve customer interaction with intelligent and natural responses.

Limitations of Speech Synthesis

Like everything else, Speech Synthesis has some limitations.

It has some quirks and issues, most of which can be solved after spending some time on Google. For example, it sometimes stops speaking if the message is very long.
Not all browsers support Speech Synthesis, but all the major modern browsers do.
Available voices differ between browsers, so relying on one specific voice can cause cross-browser compatibility issues.
It doesn't have good support.
It's not widely used, but it's good for less complex requirements, and you should give it a try.
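
For the problem of speech stopping on very long messages, one commonly suggested workaround is to split the text into shorter pieces and queue each piece as its own utterance. The sketch below is an assumption about how that could look, not a guaranteed fix. The names splitIntoChunks and speakInChunks are made up for this example, and the 200-character limit is an arbitrary choice.

```javascript
// Splits text into chunks of whole sentences, each at most `maxLength`
// characters where possible. A single sentence longer than `maxLength`
// is kept whole as its own chunk.
function splitIntoChunks(text, maxLength = 200) {
  const sentences = (text.match(/[^.!?]+[.!?]*/g) || [])
    .map((sentence) => sentence.trim())
    .filter((sentence) => sentence.length > 0);

  const chunks = [];
  let current = '';
  for (const sentence of sentences) {
    const candidate = current ? `${current} ${sentence}` : sentence;
    if (current === '' || candidate.length <= maxLength) {
      current = candidate;
    } else {
      chunks.push(current);
      current = sentence;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}

// Queues one utterance per chunk and returns how many were queued.
// `makeUtterance` defaults to the browser constructor; it is a parameter
// only so the function can be tried outside a browser.
function speakInChunks(
  text,
  synth,
  maxLength = 200,
  makeUtterance = (chunk) => new SpeechSynthesisUtterance(chunk)
) {
  const chunks = splitIntoChunks(text, maxLength);
  chunks.forEach((chunk) => synth.speak(makeUtterance(chunk)));
  return chunks.length;
}
```

In a browser, this could be called as speakInChunks(longText, window.speechSynthesis). Because speak() adds each utterance to the queue, the chunks are read one after another.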

Example

You can see it working here: https://github.com/corevalue-technologies/js-text-reader

Now, let's examine how the application operates:

The application's primary functionality allows us to hear the given text in various voices using SpeechSynthesis.
It uses speechSynthesis.getVoices() to retrieve all the available voices supported by the particular browser.
Then, it uses a SpeechSynthesisUtterance to associate our provided text with the voice to be spoken.
Once a specific voice is selected from the available choices and the text is associated with the utterance, we can use speechSynthesis.speak() to listen to the provided message.
This project is a straightforward implementation that incorporates fundamental HTML, CSS, and Bootstrap design.
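
The steps above can be sketched in a few lines. This is not the code from the linked repository, only a minimal illustration of the same flow. The helper names loadVoices, pickVoice, and readAloud, as well as the 'en-US' default, are assumptions made for this example.

```javascript
// Resolves with the available voices. Some browsers fill the list
// asynchronously, so when the first call returns nothing we wait for
// the voiceschanged event. If that event never fires, the promise
// stays pending, so real code may want a timeout as well.
function loadVoices(synth) {
  return new Promise((resolve) => {
    const voices = synth.getVoices();
    if (voices.length > 0) {
      resolve(voices);
      return;
    }
    synth.addEventListener('voiceschanged', () => resolve(synth.getVoices()), { once: true });
  });
}

// Prefers a voice for the requested language, then the browser's
// default voice, then whatever comes first. Returns null for an empty list.
function pickVoice(voices, lang) {
  return (
    voices.find((voice) => voice.lang === lang) ||
    voices.find((voice) => voice.default) ||
    voices[0] ||
    null
  );
}

// Browser-only: associates the text with a voice and speaks it.
async function readAloud(text, lang = 'en-US') {
  const synth = window.speechSynthesis;
  const voices = await loadVoices(synth);
  const utterance = new SpeechSynthesisUtterance(text);
  const voice = pickVoice(voices, lang);
  if (voice) utterance.voice = voice;
  synth.speak(utterance);
}
```

Calling readAloud from a button's click handler is usually the safest option, since some browsers only allow speech after a user gesture.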

Below is a screenshot for the application:

Conclusion

Speech synthesis is an amazing technology that turns written words into spoken language. It helps people with reading difficulties, improves communication for those who can't speak, and makes digital experiences more accessible and enjoyable. As this technology continues to advance, we can expect even more exciting features and possibilities. So, get ready for a world where computers can talk and make our lives even more awesome!