Create a React Teleprompter using SpeechRecognition, string-similarity, and scrollIntoView

Instructor: Elijah Manor

Published a year ago
Updated 10 months ago

In this lesson we are going to build a React Teleprompter web application using the Web Speech API. In particular, we'll use the SpeechRecognition interface to recognize the user's voice, match the words to a predefined script, and automatically scroll to the next unspoken position.

Instructor: [0:00] In this lesson, we're going to build a teleprompter web application using the Web Speech API. Here's some docs on MDN. We'll be using the SpeechRecognition interface to build this app.

[0:12] The intent is we'll be able to type up some text in the above text area, click "Start," and then it'll recognize your voice and will automatically scroll for you so you don't have to do it manually.

[0:25] On the left, we have the beginnings of a web application, which is handling text input, button interactions, and managing the progress of the teleprompter. However, the teleprompter itself isn't written yet.

[0:38] It has just been stubbed out. It currently accepts an array of words, a Boolean if it should be listening, the current progress, and a callback function when the progress changes. The teleprompter component is just a shell of what we'll be building.

[0:54] Right now, we're just looping over the words and displaying them in spans. Eventually, we want to wire up the SpeechRecognition API, and auto-scroll the content for us as the user speaks. The first thing we're going to do is to create an instance of SpeechRecognition.

[1:11] To do this, we'll create a recog reference using React.useRef(null). Then, we'll add a React.useEffect that executes only after the initial render of the component. In it, we'll reference either the standard SpeechRecognition constructor or the vendor-prefixed webkitSpeechRecognition version.

[1:39] Depending on which one exists, we'll create a new instance of it and assign it to the current property of the recog reference. Then, we'll set continuous to true so that results keep being captured after we start instead of just one result, and we'll also set interimResults to true.

[1:59] This will let us get access to quicker results, although they aren't final and may not be as accurate if we waited a bit longer. Now, we'll add another React useEffect so that we can toggle whether or not our SpeechRecognition system should start or stop listening.
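As a rough sketch of the setup just described, the constructor lookup and the two flags might look like the following. The `createRecognizer` helper and its `win` parameter (a stand-in for `window`) are hypothetical names, and the `null` return for unsupported browsers is an added assumption, not something the lesson shows.

```javascript
// Hypothetical helper: `win` stands in for the browser's `window` so the
// lookup can also be exercised outside a browser.
function createRecognizer(win) {
  // Prefer the standard constructor, fall back to the vendor-prefixed one.
  const SpeechRecognition =
    win.SpeechRecognition || win.webkitSpeechRecognition;

  // Assumption: return null when the API is unavailable.
  if (!SpeechRecognition) return null;

  const recog = new SpeechRecognition();
  recog.continuous = true; // keep capturing results after the first one
  recog.interimResults = true; // deliver quicker, not-yet-final results
  return recog;
}

// Inside the component, the effect could then be (assumed shape):
// React.useEffect(() => {
//   recog.current = createRecognizer(window);
// }, []);
```

Passing `window` in rather than reading it directly is only a convenience for trying the helper with a fake object.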

[2:15] We'll add our listening prop inside of our useEffect dependency array. If we are listening, then we will tell our recog ref to start. Otherwise, we will stop the recognition instance. At this point, starting or stopping doesn't do anything yet.

[2:32] Let's wire that up. We'll add another React useEffect. In this one, we'll grab the recog reference and add an event listener for the result event, handled by the handleResult callback, which we haven't defined yet but will very soon.

[2:51] Also, we'll want to clean up after ourselves, so we'll return a function that removes the event listener for the result event bound to the handleResult function. Now, let's define the handleResult function that we've wired up to the recog reference.
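The two effects described above, toggling recognition and subscribing to the result event with a cleanup, can be sketched as plain helpers. The names `syncListening` and `subscribeToResults` are hypothetical, and `recog` is assumed to be any object exposing `start`, `stop`, `addEventListener`, and `removeEventListener`.

```javascript
// Hypothetical helper: start or stop recognition to match the listening prop.
function syncListening(recog, listening) {
  if (listening) {
    recog.start();
  } else {
    recog.stop();
  }
}

// Hypothetical helper: subscribe to "result" events and return the matching
// cleanup function, which is the shape React expects an effect to return.
function subscribeToResults(recog, handleResult) {
  recog.addEventListener("result", handleResult);
  return () => recog.removeEventListener("result", handleResult);
}

// Inside the component, the effects could then be (assumed shape):
// React.useEffect(() => {
//   syncListening(recog.current, listening);
// }, [listening]);
// React.useEffect(() => subscribeToResults(recog.current, handleResult));
```

Removing the listener with the same function reference that was added is what makes the cleanup effective.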

[3:07] In here, we'll grab the results property from the event passed to us. We'll create an interim variable, convert the SpeechRecognitionResultList to an array, and filter it down to the results that are not final. We'll grab the first transcript from each of those and then join them all together into one big string.

[3:33] For now, so we can see how everything works, let's temporarily also create a final variable that is just like interim, but takes the final results instead. Then, we'll save these results in state with setResults, passing an object with these two values.
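The interim and final extraction described above can be sketched as a pure function. The `splitTranscripts` name is hypothetical, and joining with a single space is an assumption, since the lesson only says the transcripts are joined into one string.

```javascript
// Hypothetical helper: `results` is the array-like SpeechRecognitionResultList
// from the event. Each result has an isFinal flag, and its first alternative
// (index 0) carries the transcript.
function splitTranscripts(results) {
  const list = Array.from(results);

  const collect = (wantFinal) =>
    list
      .filter((result) => result.isFinal === wantFinal)
      .map((result) => result[0].transcript)
      .join(" "); // assumption: space-separated

  return { interim: collect(false), final: collect(true) };
}

// Inside the component (assumed shape):
// const handleResult = ({ results }) => setResults(splitTranscripts(results));
```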

[3:49] To do this, we'll come up and add new state, const [results, setResults], using React.useState, defaulting to an empty object. In order to see these results in our app, we'll come up to our styled teleprompter definition, convert it to a textarea instead of a div, and temporarily comment out the huge font declaration.

[4:14] Then we'll come down, comment out the children of the teleprompter, and give it a value of the JSON.stringify version of our results, passing null and 2 as the extra arguments to provide some formatting. Now, we should be able to come into our app and start listening.

[4:35] "This is a test to see how things work. This should scroll as you approach the next word. If all goes well, you can talk and it will move along." What you can see is that the first words start getting recognized in the interim bucket. Then, once it figures out what you most likely said, it moves to the final bucket.

[5:04] With that in mind, let's remove the stringified results from our teleprompter, uncomment the words being added to it as children, change the textarea back to the div we had before, and restore the font size.

[5:19] Instead of storing both the interim and final states, let's just save the interim. We'll initialize our results to an empty string. Then down here we won't need the final portion anymore. When we set results, we will only provide the interim value.

[5:36] To see these interim results, let's come down to our return and conditionally show the results if they exist. Now we can come back to our app and click "Start." "This is a test to see how things work." Then we'll stop.

[5:59] Now let's focus on scrolling. We'll come up and first create a scrollRef using React.useRef, setting it to null. We'll grab that, come down to our teleprompter, and add the ref to it. To our span we'll add an HTML5 data attribute, data-index, and assign it the index of the word.

[6:22] Then we'll manually add a color style for this word. If the word index is less than the progress, meaning it has already been said, then it'll look grey. Otherwise it'll look black. In order to add the scrolling logic, we'll add another useEffect. This one we want to be invoked once the progress prop has changed.
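The per-word attributes described above can be sketched as a small function that builds the span's props. The `wordSpanProps` name is hypothetical.

```javascript
// Hypothetical helper: props for one word's <span>.
function wordSpanProps(index, progress) {
  return {
    "data-index": index, // lets the scroll logic find this word later
    style: {
      // Words before the progress index have already been spoken.
      color: index < progress ? "grey" : "black",
    },
  };
}

// In JSX (assumed shape):
// words.map((word, i) => (
//   <span key={i} {...wordSpanProps(i, progress)}>{word} </span>
// ));
```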

[6:43] We'll grab the scrollRef's current value and use querySelector to find the element whose data-index is three words past the current progress. That's to hopefully scroll before we've run out of words that are in view.

[6:56] Here we'll use the optional chaining operator in case nothing was found. If an element was found, we'll call its scrollIntoView method, passing behavior: "smooth", block: "nearest", and inline: "start". Since we used the optional chaining operator, we've angered the no-unused-expressions ESLint rule. For now, let's just disable that rule and proceed along.
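A minimal sketch of the scrolling step described above follows. The `scrollToUpcomingWord` name and the `lookahead` parameter are hypothetical, with the default of 3 taken from the "three words past" in the lesson, and the optional chaining on `container` is an added guard.

```javascript
// Hypothetical helper: `container` is the element held by the scroll ref.
function scrollToUpcomingWord(container, progress, lookahead = 3) {
  // Look a few words ahead so we scroll before running out of visible words.
  const target = container?.querySelector(
    `[data-index="${progress + lookahead}"]`
  );

  // Optional chaining: do nothing when no such word exists.
  target?.scrollIntoView({
    behavior: "smooth",
    block: "nearest",
    inline: "start",
  });
}

// Inside the component (assumed shape):
// React.useEffect(() => {
//   scrollToUpcomingWord(scrollRef.current, progress);
// }, [progress]);
```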

[7:20] Now let's go back over to our app component and test out what we've done so far. Down in the progress prop, let's hard-code the value to three and save. On the right, the first three words show as if they've been spoken. If we change the number a bit higher, let's say to 10, then our div on the right will scroll to upcoming content.

[7:43] Now let's put back progress and switch back to our teleprompter component. We'll start focusing on updating the progress index based on what the user has spoken. Here we'll create a newIndex by taking our interim string and breaking it back up into a word array, splitting on spaces.

[8:00] Then we'll reduce, taking our memo and the word we are testing. We'll initialize the memo to the current progress index value. In reduce, usually the first thing we want to do is return the memo. We want to compare each word in our interim string with the next word that is expected to be spoken.

[8:21] To do that, we'll install a helper module so that the word doesn't have to be an exact match. The module is string-similarity, and we'll install it from npm. Then we'll come up to the top of our file and import stringSimilarity from "string-similarity".

[8:39] Now we can use our new module by calling stringSimilarity's compareTwoStrings method, passing the current word that we're reducing over and the next unspoken word in our teleprompter. However, each of our words may have spaces or punctuation inside of them that can interfere with our comparison.

[8:58] Let's make a cleanWord function to help results be a little bit more accurate. Our cleanWord function will take a string, trim any whitespace from the beginning or end, lowercase the whole string, and then replace any non-alpha character with an empty string.
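The cleaning steps just listed can be sketched as below. The exact regular expression is an assumption: the lesson only says "non-alpha", and this version keeps just the letters a to z, so digits and accented letters are stripped too.

```javascript
// Normalize a word before comparing it with another word.
function cleanWord(word) {
  return word
    .trim() // drop surrounding whitespace
    .toLowerCase() // make the comparison case-insensitive
    .replace(/[^a-z]/g, ""); // assumption: keep only a-z
}
```

For example, both `" Hello, "` and `"hello"` clean to the same string, so punctuation in the script no longer affects the match.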

[9:18] Now that we have a clean comparison, let's adjust the new index accordingly. If the similarity between our words is greater than 75 percent, then we'll increment our index by one. Otherwise we'll keep it the same.

[9:32] Then, if our new index is greater than it previously was and is still less than the total number of words, we'll call onChange to let our consuming component know that something has changed. In our reduce, let's add one final bit of code. We will exit early if our memo is equal to or greater than the total number of words.

[9:54] In this case we'll just return the same memo. Since we added quite a bit of code to our useEffect, we should double-check our dependency array. We should include onChange, progress, and words. This is important because of closures: we want the data inside of the function to be up to date.
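Putting the matching steps above together, a sketch of the reduce might look like this. The `advanceProgress` name is hypothetical. So that the block runs without installing anything, `bigramSimilarity` is a stand-in written for this example (a Dice coefficient over letter pairs); it is not the library's code, and in the app you would pass `stringSimilarity.compareTwoStrings` as the `compare` argument instead.

```javascript
// Stand-in similarity (assumption): Dice coefficient over adjacent letter
// pairs, returning a score between 0 and 1.
function bigramSimilarity(a, b) {
  if (a === b) return 1;
  if (a.length < 2 || b.length < 2) return 0;

  const counts = new Map();
  for (let i = 0; i < a.length - 1; i++) {
    const pair = a.slice(i, i + 2);
    counts.set(pair, (counts.get(pair) || 0) + 1);
  }

  let shared = 0;
  for (let i = 0; i < b.length - 1; i++) {
    const pair = b.slice(i, i + 2);
    const remaining = counts.get(pair) || 0;
    if (remaining > 0) {
      counts.set(pair, remaining - 1);
      shared++;
    }
  }
  return (2 * shared) / (a.length + b.length - 2);
}

function cleanWord(word) {
  return word.trim().toLowerCase().replace(/[^a-z]/g, "");
}

// Walk the spoken words, advancing the index each time a spoken word is
// similar enough to the next unspoken word in the script.
function advanceProgress(interim, words, progress, compare = bigramSimilarity) {
  return interim.split(" ").reduce((memo, word) => {
    if (memo >= words.length) return memo; // nothing left to match
    const similarity = compare(cleanWord(word), cleanWord(words[memo]));
    return similarity > 0.75 ? memo + 1 : memo;
  }, progress);
}
```

In the component, the returned index would then be compared with the current progress, and `onChange` called only when it has moved forward.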

[10:14] Now, we should be able to test our teleprompter. We'll click "Start," and: "This is the test to see how things work. This should scroll as you approach the next word. If all goes well, you can talk and it will move along."

[10:32] After I stop the teleprompter and click the reset button, the words automatically scroll to the top. That's because our consuming component was listening and set the progress to zero, which caused our scrolling logic to kick in and scroll us to the next unspoken word, which happens to be up at the top.