In the third part of our “Alexa, How’s the Surf?” blog series, I explain how I transformed an Alexa Skill into a serverless, microservice, multi-modal chatbot platform.
Introduction
A few years ago, I wrote a blog about an emerging technology called Amazon Echo. That blog post chronicled my journey down the path of creating an Alexa Skill (Amazon's term for a voice application built for Alexa) called Surf Check.
While that first post focuses on Alexa, the same principles apply to Google Home and some of the other chatbot technologies available on the market today.
And though I’d like to claim the skill is hugely popular with thousands of hits per day, in reality, Surf Check is just my playground for learning new concepts with Voice User Interfaces. Over the last few years, I tweaked and refactored the original skill as I discovered new Alexa and AWS features. I documented some of those updates, such as creating a Facebook Messenger interface, in a follow-up blog.
In this next iteration of the blog series, I take a more holistic view of the various modules to show how, with a little bit of refactoring and a microservice architecture, I transformed my original Surf Check Alexa Skill into a “serverless, microservice, multi-modal chatbot monster”! Hyperbole aside, the result is certainly serverless, leverages microservices, and provides several interfaces for interaction: Alexa, Facebook Messenger, Text Messaging, and a good old-fashioned Telephone Call.
How I Did It
As you will see, my journey involved a lot of exploration, experimentation, and trial-and-error. While I can’t guarantee that you’ll end up in the same place I did, I’d like to walk you through my process in the hopes it may help you develop your own, if you aren’t able to replicate mine exactly.
1. Review of Existing Components
As I mentioned in the introduction, I’ve implemented several iterations of my Surf Check Alexa Skill. Starting with a fundamental Alexa Skill developed in C#, each successive iteration benefited from refactoring in some cases, and wholesale replacement with better components in others. The images in Figure 1 show the basic flow of the Alexa interaction.
The table in Figure 2 provides a glimpse into the components that served as a starting point for the current iteration.
2. Introducing Amazon Pinpoint and Connect
With these existing components in mind, I set out to add two new ways of asking “How’s the surf?” My goal was to preserve much of the work already done. I wanted to add text messaging as the first new method. Several services now offer this capability, including my bank, which allows me to text simple commands to a phone number.
For example, I can check the balance of my account by texting the word “balance” to a specific phone number. Likewise, I can withdraw money from my account by texting “withdraw.” Amazon provides this capability through a relatively newer service called Pinpoint. Amazon Pinpoint positions itself as a marketing and analytics service that enables an organization to engage with users by sending email, SMS, and push notifications.
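To make the Pinpoint flow concrete, here is a minimal sketch of a text-messaging handler. Pinpoint publishes inbound SMS messages to an SNS topic, which can trigger a Lambda function; the function parses the keyword and texts back a reply. The keyword table, reply text, and the `YOUR_PINPOINT_PROJECT_ID` placeholder are all hypothetical stand-ins, not the actual Surf Check implementation.

```python
import json

# Keywords the SMS interface understands (hypothetical surf-check commands).
KEYWORDS = {
    "surf": "Checking the surf for your default break...",
    "help": "Text SURF to this number to hear today's surf report.",
}

def parse_inbound_sms(sns_record):
    """Extract the sender and a reply from a Pinpoint two-way SMS event.

    Each SNS record's Message is a JSON string that includes the
    originationNumber (the sender) and messageBody (the text itself).
    """
    body = json.loads(sns_record["Sns"]["Message"])
    sender = body["originationNumber"]
    keyword = body["messageBody"].strip().lower()
    return sender, KEYWORDS.get(keyword, "Sorry, I didn't recognize that command.")

def send_sms(destination, message):
    # boto3 is imported lazily so the module loads without the AWS SDK installed.
    import boto3
    client = boto3.client("pinpoint")
    client.send_messages(
        ApplicationId="YOUR_PINPOINT_PROJECT_ID",  # placeholder project id
        MessageRequest={
            "Addresses": {destination: {"ChannelType": "SMS"}},
            "MessageConfiguration": {
                "SMSMessage": {"Body": message, "MessageType": "TRANSACTIONAL"}
            },
        },
    )

def lambda_handler(event, context):
    for record in event["Records"]:
        sender, reply = parse_inbound_sms(record)
        send_sms(sender, reply)
```

The point of splitting `parse_inbound_sms` from `send_sms` is that the parsing logic stays a plain, testable function while the AWS-specific call sits at the edge.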
The second new method is a bit more pedestrian, but it arguably covers a broader swath of users: a telephone service that allows callers to dial a phone number and ask the virtual assistant, “How’s the surf?” Amazon provides this capability through another of its newer services, called Connect.
Amazon Connect uses the same technology as Alexa and Lex to deliver a cloud-based customer contact service via telephone. Older Interactive Voice Response (IVR) systems, based on decision trees and universally loathed by customers, force the customer to press numbers on their phone to arrive at pre-configured answers. Connect, on the other hand, uses natural, intuitive conversation to assist callers.
In my case, a caller receives a couple of options, including the now-familiar “How’s the surf?” The caller can speak the instructions instead of pressing a number on his or her phone.
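Because Connect hands the caller's speech to a Lex bot, the telephone interface ultimately resolves to a Lex fulfillment Lambda. The sketch below shows the shape of a Lex (V1-style) "Close" response; the intent name `SurfCheckIntent` and the canned surf report are hypothetical examples, not the real skill's values.

```python
def close_response(message):
    """Build a Lex 'Close' response that ends the dialog with a spoken message."""
    return {
        "dialogAction": {
            "type": "Close",
            "fulfillmentState": "Fulfilled",
            "message": {"contentType": "PlainText", "content": message},
        }
    }

def lambda_handler(event, context):
    # Lex passes the resolved intent in the event; route on its name.
    intent = event["currentIntent"]["name"]
    if intent == "SurfCheckIntent":  # hypothetical intent name
        return close_response("The surf is three to four feet and clean.")
    return close_response("You can ask me: how's the surf?")
```

Connect reads the `message` content back to the caller over the phone, so the same response format serves both a chat window and a voice call.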
The images in Figure 3 show some of the new methods for asking, “How’s the surf?”
The table in Figure 4 provides a glimpse into the components used to add the new functionality in the current iteration.
3. Microservices Thinking
In addition to building the two new methods, I also examined how all four methods for asking “How’s the surf?” did their work. Breaking an application’s modules into loosely coupled services, known as microservices, is now one of the more popular trends in software architecture. Each microservice performs a very fine-grained function with few or no dependencies on other services in the system.
An application stitches these microservices together with lightweight messaging protocols, which allows each service to be developed and scaled independently. With this microservices thinking in mind, I noted that the original Alexa Skill already leveraged several Python methods to answer the question; every incoming request is funneled through AWS Lambda, which calls those methods.
When adding the two new methods (Text Messaging and Telephone Calls), my design goal was to reuse all of the Python methods I had developed previously. Conceptually, the only new code should be limited to the specifics of each new interface. In that respect, I mostly succeeded.
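The reuse hinges on one Lambda serving several event sources. A minimal sketch of that idea is below: the handler inspects the event's shape to tell an Alexa request from a Lex request (which also covers Connect and Pinpoint chatbot traffic routed through Lex), then calls the same shared business-logic function. The function name `get_surf_report` and its placeholder data are my own illustration, not the actual Surf Check code.

```python
def get_surf_report(spot="Ocean Beach"):
    """Shared business logic reused by every interface (placeholder data)."""
    return f"The surf at {spot} is three to four feet."

def lambda_handler(event, context):
    report = get_surf_report()
    if "request" in event:  # Alexa requests carry a top-level 'request' object
        return {
            "version": "1.0",
            "response": {
                "outputSpeech": {"type": "PlainText", "text": report},
                "shouldEndSession": True,
            },
        }
    if "currentIntent" in event:  # Lex events (also used by Connect)
        return {
            "dialogAction": {
                "type": "Close",
                "fulfillmentState": "Fulfilled",
                "message": {"contentType": "PlainText", "content": report},
            }
        }
    return {"body": report}  # fallback for direct invocations
```

Only the thin response-formatting branches differ per interface; the surf-report logic itself is written once, which is the microservices payoff described above.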
The diagram in Figure 5 illustrates how I brought all of the AWS components together to provide a seamless Alexa and Lex experience.
Not surprisingly, a lot has changed in the technology world in only a couple of years. Hopefully, this blog has shown the improvements made around Voice Assistants, Chatbots, the Cloud, and Serverless Computing.
By bringing all of those pieces together, I managed the following:
- Reused my existing Alexa Skill to create a Chatbot hosted in Facebook Messenger
- Re-architected my Alexa Skill handler to use Lambda, an AWS Serverless technology, thereby ridding me of the cost and complexity of maintaining a web server
- Added a text messaging interface through Amazon Pinpoint that uses the same Lambda used by the previous interfaces
- Added a telephone interface through Amazon Connect that uses the same Lambda used by the previous interfaces
Conclusion
I did all of the above without needing a data center or much money. My AWS bill for the month was less than $1 – including all of the components described in this blog and my entire personal web site! I hope these steps help you try your hand at improving or upgrading your own chatbots.