How IVR Design Can Help Us Improve VUI with Maria Spyropoulou, Speech Systems Analyst at Eckoh

Inside VOICE


I think coming from a linguistic background into the voice industry has helped me so much, because sometimes I have an advantage because of the courses I have done. For example, I have done a lot of courses on speech recognition, so if someone says, "Let's ask the user: do you want to get your parcel today or Saturday?", I know how speech recognition works, and I know that we can never ask that, because the system will very easily confuse the two words. But also, in university we did a lot of courses on sociolinguistics, on analyzing text and syntax and all of that, and we did a lot of semantics and pragmatics, and a lot of work on how to structure conversation. We even broke down humor, so that was very helpful for me.

That is so interesting. I personally am fascinated by linguistics. We've had a lot of linguists on this show, and they have brought so much value to the show and to the voice space as well. And you currently work designing IVR systems. Can you describe, to those that don't know, what an IVR system is, and why you think they are just like any other speech-enabled application, skill, or action?

The IVR is what some people call the original voice user interface, the original VUI. It's an automated telephone system, basically, that interacts with callers over the phone, gathers information, and routes the call to the appropriate agents. And it's on what we call the voice web: while web applications use the visual web, an IVR does the same thing on the voice web, and we even have voice browsers. Just like you have a graphical user interface, you have a voice user interface; and just like you have Chrome, which is a browser that collects data from servers and displays visual information to you, we have voice browsers that go to the servers and collect information from a database.
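Maria's "today or Saturday" point can be illustrated with a toy check. This is a minimal sketch, not a real ASR confusability model (production tools compare phoneme sequences, not spelling): it uses orthographic overlap as a rough stand-in for acoustic similarity, and the `0.5` threshold and function names are illustrative assumptions.

```python
from difflib import SequenceMatcher
from itertools import combinations

def confusable_pairs(options, threshold=0.5):
    """Flag pairs of menu options whose wording overlaps enough to
    risk misrecognition. Spelling similarity is only a crude proxy
    for acoustic similarity; here it catches the shared "-day"."""
    flagged = []
    for a, b in combinations(options, 2):
        score = SequenceMatcher(None, a.lower(), b.lower()).ratio()
        if score >= threshold:
            flagged.append((a, b, round(score, 2)))
    return flagged

# "today" and "Saturday" share the "-day" ending, so they get flagged;
# rewording the second option removes the overlap.
print(confusable_pairs(["today", "Saturday"]))
print(confusable_pairs(["today", "this weekend"]))
```

A designer could run a check like this over every prompt's option list before recording; the fix is usually to reword one option, as in "today or this weekend".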
And they speak to you instead of displaying it to you: the voice browser collects prompts and plays them back to you. So that is what a voice browser does, and many people have made that connection: it is the same as the visual internet, but it's the voice internet, the voice web. So yeah, IVRs were the original voice user interfaces, and many of the most prominent people in the industry now originally worked on IVRs. All the principles that we follow now when we develop voice skills or actions came, of course, from years and years and years of developing IVRs, making mistakes, and learning how to fix them by looking at the data. And IVR systems and speech-enabled skills, your Alexa or Google systems, have the exact same things. They both do automatic speech recognition and natural language understanding. They both utilize memory: when you use a skill, the skill remembers it, so when you use it the second time it will say, "Welcome back, Carrie." It uses memory; it saves that information. We save that information in the IVR system too. So a caller phones their bank, and let's say the system says, "Hello, what can I help you with today?", and you say, "I would like my balance," and then you hang up. Then you call back after a week, and the system remembers, so it might ask you, "Last time you called, you wanted to check your balance. Would you like to do the same thing today?" So we have that functionality as well in the IVR system. And of course the whole design process is the same: we have flows, we have prompts, we have intents, utterances, slots. It's the same thing, really, just a different format.

And you were saying to me before we started this podcast that you actually think there is a great advantage of IVRs over skills or actions. What would you say that is, and why do you think it's such an advantage?

Yeah, they have a lot of similarities: IVRs, skills and actions, or capsules.
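The "welcome back" behaviour Maria describes, persisting a caller's last intent between sessions, can be sketched roughly as below. The class, greeting wording, and dictionary storage are hypothetical illustrations, not Eckoh's actual platform; a real IVR would persist this in a database keyed by caller line ID.

```python
class CallMemory:
    """Toy session store keyed by caller ID, illustrating the
    returning-caller behaviour an IVR (or a skill) can offer."""

    def __init__(self):
        self._last_intent = {}  # caller_id -> name of the last intent used

    def greet(self, caller_id):
        """First-time callers get a generic greeting; returning callers
        are offered a shortcut to what they did last time."""
        last = self._last_intent.get(caller_id)
        if last is None:
            return "Hello, what can I help you with today?"
        return (f"Last time you called, you wanted to {last}. "
                "Would you like to do the same thing today?")

    def record(self, caller_id, intent):
        # Save the intent so next week's call can reference it.
        self._last_intent[caller_id] = intent

memory = CallMemory()
print(memory.greet("+44-555-0100"))   # first call: generic greeting
memory.record("+44-555-0100", "check your balance")
print(memory.greet("+44-555-0100"))   # a week later: personalized shortcut
```

The same pattern applies on either side of the parallel Maria draws: an Alexa skill would keep this state in its persistence layer, an IVR in the platform's caller database.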
That's if you're talking about Bixby. There are also differences, but the greatest difference is that when we are building IVRs, we are using enterprise-scale platforms, and they store the audio files of what the users say, so that we can improve the system. Let's say I want to see how the recognition of numbers is doing in my system. I'll just go and download one thousand audio files from the server, and then I'll put them through the grammar that I have created, and then I'll do the process we call tuning, and see that I have, for example, eighty percent accuracy. Then I can work on that and see why it is not recognizing everything. What do I need to change? Is my prompt not clear? Is my recognition grammar not covering all the ways people say numbers? Am I cutting off the user too early? So I can work with that. But when you have an Alexa skill or a Google action, you don't have access to those audio files on either platform, and it's actually one of the highest requests from developers. What you have is a transcription of what the system thinks the user said. Of course, that is highly problematic, because, I mean, let's say that I have a restaurant and people can order food, and I have given my dishes extraordinarily extravagant names, like "the Magic Fountain" or whatever, and a user is trying to order these dishes. If the system doesn't give me the correct transcription of what people said, I don't know what my client ordered, whereas a human can actually listen to those audio files and know what was said. So I think that's the great disadvantage, and I think in the future that will change.
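The tuning loop Maria describes, pulling recorded utterances, running them through a grammar, and measuring accuracy, can be sketched as below. Everything here is an illustrative assumption: a real tuning run decodes actual audio against a compiled recognition grammar (for example SRGS/GRXML), whereas this sketch stands in transcribed text and a vocabulary set, and the sample utterances are invented.

```python
def in_grammar(utterance, vocab):
    """Stand-in for grammar parsing: an utterance is 'in grammar' if
    every word is covered by the grammar's vocabulary."""
    return all(word in vocab for word in utterance.lower().split())

def tune(transcripts, vocab):
    """Measure coverage over pulled utterances and collect the failures,
    so a designer can ask: unclear prompt? missing grammar path?
    barge-in/timeout cutting callers off too early?"""
    failures = [t for t in transcripts if not in_grammar(t, vocab)]
    accuracy = 100.0 * (len(transcripts) - len(failures)) / len(transcripts)
    return accuracy, failures

DIGITS = {"zero", "one", "two", "three", "four", "five",
          "six", "seven", "eight", "nine", "oh"}

# Five hypothetical utterances pulled from the platform; one fails
# because "double four" is a phrasing this grammar does not cover.
calls = ["four two seven one", "oh five five five", "nine nine one",
         "double four seven", "three oh two"]
accuracy, failures = tune(calls, DIGITS)
print(f"{accuracy:.0f}% in-grammar")
print("needs grammar coverage:", failures)
```

The failure list is the actionable part: here it would tell the designer to add "double <digit>" as a grammar path rather than, say, rewording the prompt. This inspect-the-misses step is exactly what is lost when a platform returns only its own best-guess transcription.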
