Mobile App Development Blog


Don’t Type. Just Speak: Voice Enabled Mobile Apps

March 29, 2016


Don’t type. Just Speak. And don’t let Gwen Stefani tell you any different.

Like it or not, it is imperative that you keep up with technology; partly because it sparks some really amazing ideas, and partly because you need to know about its compelling use cases. In this post, we are excited to tell you all about voice-based apps and how they can help you step up your mobile experience.

There was a time when you could look loony talking to your phone. Today, you'd just look 'smart.' By integrating voice capabilities, you can change the way your users interact with their devices and give them a truly hands-free experience.

If you have a fitness app, you could let users log their meals on the go by verbally describing what they ate. 'Don't have enough time' won't be their number one excuse anymore. You could increase user engagement and improve health outcomes just by adopting 'speech.' And you can give 'voice' to any app; don't feel limited by your app category.

Why is ‘Now’ the Best Time to Integrate Voice Commands into Your Mobile App?

Naturally, you would expect real advantages from integrating speech into your application; why else go through all the trouble? With that in mind, here are some of the pros that may convince you to incorporate voice search:

  • Exclusive: Not many apps have actually integrated voice search functionality, so including it is a good way to make yours stand out. Mention it in your app description on the Google Play Store or iTunes.

Do keep in mind that you won't enjoy this exclusivity for long, perhaps a year or two, because many developers are quick to adopt features that make their tech look bleeding edge. Voice search is shaping up to be one of the most popular technologies of this decade (ask Siri if you don't believe us).

  • Increased convenience: Voice search is massively popular these days because of the convenience and hassle-free operation it provides. Combine that with accurate voice recognition and you have every reason to include it in your app. It can even distinguish seemingly similar inputs (say, 'Billy' versus 'Lily').

Even though the success rate of such recognition is not 100%, it is still quite high. And when the system does fail to differentiate between Billy and Lily (this will happen), it is smart enough to display a screen listing both contacts. You choose who you want to call, and the system stores that selection to improve future recognition. In other words, the more you use the feature, the better it gets at recognizing your voice.
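The fallback described above can be sketched in a few lines. Everything here is hypothetical (a real recognizer returns its own result format), but it shows the idea: when the top two candidates are too close in confidence, ask the user instead of guessing.

```python
# Hypothetical sketch of the "Billy or Lily?" fallback: if the best
# recognition result does not beat the runner-up by a clear margin,
# show the user both candidates instead of guessing.

def pick_contact(candidates, margin=0.10):
    # candidates: list of (name, confidence) pairs, sorted best-first,
    # as a recognition library might return them.
    best, runner_up = candidates[0], candidates[1]
    if best[1] - runner_up[1] < margin:
        # Too close to call -- let the user choose. A real app would
        # also store the choice so future recognitions improve.
        return {"ask_user": [best[0], runner_up[0]]}
    return {"call": best[0]}

print(pick_contact([("Billy", 0.51), ("Lily", 0.49)]))  # ambiguous
print(pick_contact([("Billy", 0.90), ("Lily", 0.40)]))  # clear winner
```

The margin is a tuning knob: a larger value asks the user more often but misdials less.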

  • Little performance hit: More often than not, developers refrain from adding certain features (fancy animations, for example) because they consume too many resources on the end user's phone. No one wants their app's performance to take a hit and create a negative impression on the user.

Voice search, however, won't let you down: most of a voice recognition system's work is done in the cloud, so the impact on device performance is minimal.

  • Great dev support: Had it been 2010 or 2011, we wouldn't have recommended integrating voice search, because even the major tech giants – Apple, Google and Microsoft – were still striving to perfect their own voice assistants: Siri, Google Now and Cortana.

But now, with their massive databases and much more stable assistant software, various APIs (Application Programming Interfaces) have become available for regular developers to integrate those features into their apps. The one exception is iOS, which does not yet provide an official voice recognition and voice command API to developers, meaning you will have to stick with the available third-party SDKs and APIs until Apple decides to make that change (and it should happen soon).

How is Voice integrated into mobile apps?

There are various voice recognition libraries available, including Nuance, OpenEars, AT&T Speech Library, Kaldi, PocketSphinx, and VoxSigma. Some of these perform recognition online, which means an internet connection is required, while others work offline.

Generally speaking, voice recognition converts spoken words to text, and this is the same principle used to provide voice search in different types of apps. Here is a step-by-step view of how it is done:

  • The user provides voice input by invoking voice recognition in the app and speaking a few words aloud.
  • The spoken words are captured via the microphone, then processed by the voice recognition software and converted to text.
  • The converted text is then fed as input to the search mechanism, which returns the results.

Developers typically use one or more of these voice recognition libraries in the apps they build and write the logic that connects the app's functionality to the library's. By invoking the library's speech-to-text conversion and then feeding the resulting text into search, the developer integrates voice search functionality into the app.
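The three steps above can be sketched as a minimal pipeline. The recognizer and audio capture below are placeholder stand-ins for whatever library you choose (Nuance, PocketSphinx, and so on); only the shape of the flow is the point.

```python
# Minimal sketch of a voice-search pipeline: capture -> recognize -> search.
# capture_audio and recognize_speech are hypothetical stand-ins for a
# real microphone API and a real speech-to-text library.

def capture_audio():
    # Step 1: a real app records from the microphone; we return
    # placeholder bytes here.
    return b"\x00\x01"

def recognize_speech(audio_bytes):
    # Step 2: a real library sends the audio to a cloud service (or
    # decodes it on-device) and returns the transcribed text.
    return "pizza margherita"

def search(query, index):
    # Step 3: feed the transcribed text into the app's normal search.
    return [item for item in index if query in item.lower()]

def voice_search(index):
    audio = capture_audio()
    text = recognize_speech(audio)
    return search(text, index)

meals = ["Pizza Margherita (850 kcal)", "Caesar Salad (420 kcal)"]
print(voice_search(meals))  # -> ['Pizza Margherita (850 kcal)']
```

Swapping in a real library only changes steps 1 and 2; the search step is your app's existing logic.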

Challenges faced when Integrating Voice Capabilities

Since voice integration is a relatively new technology, many programmers may find it daunting, but there is nothing that a few fixes can’t improve.

  • Real-time responsive behavior: Understandably, this depends on the device's capabilities, the network connection, and, not to forget, the microphone. When the user issues a voice command or provides speech input, the app must communicate with a server to convert the speech to text. Once the text is sent back to the device, the app can execute an action; this round trip defines the app's "real-time" responsiveness. If that action is a search, the device sends another request to the server to fetch results. Network latency is usually the main challenge in such cases.

The best thing you can do right now is make sure your app's code is properly optimized (enable compiler optimizations, run static analysis tools over your code, and so on). Another way to improve overall performance is to move the actual voice recognition and search work to the server side. With a properly designed system and sensible delegation of work to the server, performance can be kept acceptable.
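Since the network round trip can stall, one common defensive pattern is to run the remote call with a timeout so the UI can react instead of hanging. This is a generic sketch, not any particular SDK's API; `fetch_transcript` is a made-up stand-in for a cloud speech-to-text request.

```python
# Sketch of latency-aware recognition: run the (hypothetical) remote
# speech-to-text call with a timeout so the app can show a "try again"
# message instead of freezing on a slow network.
from concurrent.futures import ThreadPoolExecutor, TimeoutError
import time

def fetch_transcript(audio):
    # Stand-in for the network round trip to a recognition server.
    time.sleep(0.05)  # simulated latency
    return "set a reminder for 5pm"

def recognize_with_timeout(audio, timeout_s=2.0):
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(fetch_transcript, audio)
        try:
            return future.result(timeout=timeout_s)
        except TimeoutError:
            return None  # caller can fall back to retry/offline UI

print(recognize_with_timeout(b"..."))  # -> set a reminder for 5pm
```

On a real platform you would use the SDK's own async callbacks rather than a thread pool, but the principle (bound the wait, handle the timeout) is the same.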

  • Languages: Not all voice recognition software supports every language, and there are both free and paid solutions. Developers need to identify not only the regions where they want to deploy their apps, but also make a strategic choice of voice conversion service. That decision should be based on the service's capabilities, such as which languages and accents it recognizes.
  • Punctuation: One of the biggest challenges that even the giants of the tech industry have yet to solve is inferring where punctuation does and does not belong in transcribed speech. That makes it even harder for an independent developer to tackle, since not all of us have 3,000+ employees to spare for a single feature. Unfortunately, even the best algorithms struggle here, mainly because there are virtually endless possible sentences with all sorts of punctuation in them. You can wait this one out; given the technology's infancy, no one expects perfectly punctuated output from voice-based software yet.
  • Accent: Accent is a problem similar to language: a user's speech may simply not be recognized because of how they pronounce English. Since a single country alone can contain many regional accents, it is difficult to target and recognize each one. Fortunately, Google's API supports a wide range of accents thanks to the gigabytes upon gigabytes of voice data behind it, and for now it is the best way to make your app handle many different accents.
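The language and accent decisions above boil down to a capability check: which candidate services cover every locale you plan to ship to? A toy sketch (the service names and their coverage sets here are entirely made up):

```python
# Hypothetical sketch: choose a recognition service by checking which
# ones claim to support every language/locale your app targets.
# Service names and coverage below are invented for illustration.

SERVICES = {
    "ServiceA": {"en-US", "en-GB", "es-ES"},
    "ServiceB": {"en-US", "hi-IN", "pt-BR"},
}

def services_for(languages):
    # Return the services whose coverage is a superset of what we need.
    needed = set(languages)
    return sorted(name for name, langs in SERVICES.items()
                  if needed <= langs)

print(services_for(["en-US", "hi-IN"]))  # -> ['ServiceB']
```

In practice you would build the coverage sets from each vendor's published language list, and weigh pricing alongside coverage.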

iOS vs Android – Which is a better platform for Voice Integration?

In case you thought this would be a heated debate comparing the two leading smartphone operating systems, it isn't (there are already plenty of those out there). Instead, we aim to point out the differences you might notice while developing voice-based apps on the two platforms:

  • Cost: Working on an Android app is a bit more economical: Google charges a one-time $25 registration fee, while Apple charges $99 every year. That shouldn't be a big problem, since the amount is usually covered through app monetization (ads, in-app purchases and so on). This doesn't affect your ability to include voice search, but it does reduce the money available to invest in the app overall, which may be something to think about.
  • Restriction: Don't mistake us for Google fanboys (we aren't), but this is another department where Google trumps Apple in software development. While Apple keeps a restricted (and thus secure) development environment, Android takes a 'you are in control' approach, giving you plenty of options to tweak, and thus letting you add more voice-based functionality than Apple allows. If you don't believe us, try 'Google Now' on an iPhone: you'll notice plenty of glitches and features that don't work because of platform restrictions (not to mention Siri getting upset with you).
  • Consistency: This is where Apple beats Android any day of the week. While you can publish your Android app on a number of stores besides the Google Play Store (potentially increasing downloads), you will find inconsistencies across the reviews on each of them. iTunes, on the other hand, is a one-stop shop: every download of your app goes through iTunes and every review is published there, making it far easier to track your progress.
  • SDKs: The third-party SDKs available for building voice features also differ between platforms. Popular ones such as Nuance's are available for both, while others, such as iSpeech's iPhone SDK, target one environment; likewise, each platform has its own primary tooling (Xcode for iOS, Android Studio for, you guessed it, Android). To be honest, there isn't a definitive answer as to which is better, as both environments offer some great tools.

How Can You Benefit from This Invisible Technology?

There are many benefits you can get from voice search; benefits that are genuinely useful in your day-to-day life. Increased productivity is one of them. Which is easier: navigating through your phone to find the Calendar app, selecting a time and setting a reminder, or just saying "Set a reminder for 5PM"? The latter, without a doubt. Even though touchscreens have made mundane tasks marginally easier, they still eat into your productive time. With voice commands, you can make to-do lists, set alarms and send messages much faster.
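Under the hood, a command like the one above goes through the same speech-to-text step described earlier, followed by a mapping from text to an action. A deliberately toy version of that mapping (real assistants use far more sophisticated intent parsing; the pattern and action names here are invented):

```python
# Toy sketch of mapping a transcribed phrase to an app action -- a very
# simplified stand-in for the intent parsing a real assistant performs.
import re

def parse_command(text):
    # Recognize one hypothetical intent: "set a reminder for <hour><am/pm>".
    m = re.match(r"set a reminder for (\d{1,2})\s*(am|pm)", text.lower())
    if m:
        return {"action": "reminder",
                "hour": int(m.group(1)),
                "meridiem": m.group(2)}
    return {"action": "unknown"}

print(parse_command("Set a reminder for 5PM"))
```

Each new voice feature in your app is essentially another pattern-to-action entry in a table like this, with the hard part (the transcription) handled by the recognition library.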

Voice-driven device control also makes your day-to-day entertainment and usage tasks much easier. Phrases like "Play some music" or "Call Jack Jenkins" get the job done quicker and more efficiently. So it shouldn't come as a surprise that we suggest adding voice capabilities to your application; not only to increase downloads and revenue, but also to learn something and keep up with recent trends.

Looking for a Mature Technology Partner to Build your App?

Let’s Connect