This article is intended for anyone interested in technology. Most of the sections are at a high level when describing the technology.
However, I have included some fairly detailed sections on programming for Alexa. Those sections will be annotated with a (nerd alert!) warning. In addition, the last section will provide links for references and further reading for the initiated reader.
If you’re like most people, the end of the year tends to be a time for reflection, maybe looking ahead at some goals for the New Year. And like most people, you probably had a few days off from work to think about these things.
The end of the year is also a good time for technologists like me to take stock of their skill set, look at industry trends, and maybe peek under the hood of that one technology you’ve always wanted to learn more about, but just never found the time for in your busy schedule.
I must have been a good developer this year because Santa left a shiny new Amazon Echo under my tree, so I was stoked to be able to spend a couple of hours learning about this technology. Before I knew it, a couple of hours blurred into a few days as I coded away feverishly, struggled through the debugging process, waited an agonizingly long time for my code to be approved by Amazon, and generally annoyed my family by constantly saying “Alexa, how’s the surf?” to no one in particular.
Now that I’ve had some time to digest those early experiences, I thought it might be useful to give some insight into the technology, its potential uses, and some tips on developing for this new platform.
What is Amazon Echo?
Before jumping into the deep end of the technology pool, let’s first take a step back and describe what Amazon Echo is – and what is not.
Amazon Echo is a wireless speaker and voice command device developed by Amazon. The device responds to the name “Alexa” or “Amazon” and operates in a similar fashion to the Apple Siri service. For example, I can say “Alexa, what is the weather like today?” The main distinction between Alexa and Siri is that Alexa does not require an iPhone; anyone can walk up to Alexa and ask (her) a question. Technically, all you need to do is plug Alexa in, connect (her) to your wireless network, and start asking (her) questions.
Note: I will dispense with the (her) idiom from here forward except to note that, much like Siri, Alexa is designed to have human-like interactions. So it feels natural to refer to the device using the “her” pronoun.
The comparison of Alexa to Siri is a bit of an oversimplification. While both services allow a user to ask questions using spoken language, there are quite a few differences in how they operate and what developers are allowed to do with the platform. For one, Alexa does not require a smartphone; it is a self-contained device. Second, Alexa offers a way for developers to tap into this platform and extend it. Here are a few of the services advertised by Alexa.
| Feature | Description |
| --- | --- |
| General Search | Report information about general topics, similar to what a Wikipedia lookup or a typical web search would return |
| Weather | Report the weather for the local area or a specific city |
| News | Report the news for the local area or a specific city |
| Music | Play music from Amazon Music, Pandora, or other sources |
| Alarm | Alert the user at a certain time of day |
| Timer | Run a countdown timer for a specific task, like baking |
| Shopping or To-Do List | Maintain a list of items to buy at the grocery store or tasks to be performed |
| Calendar | Report upcoming events in a user’s calendar |
These are just a few of the common commands that Alexa responds to. What makes this platform even more exciting is that developers can extend the list to suit the needs of their application or service.
Why Should I Care About This New Platform?
The biggest difference between Alexa and Siri is also the reason you should care about this platform. Siri has been available to users for several years but has never been exposed to developers for use in third-party apps. Apple recently provided minimal Siri integration with the iOS 9 Search API; however, this is a glorified playback mechanism, not true integration. This is in keeping with Apple’s “walled garden” approach to apps and services.
Conversely, Amazon has invited developers to the Echo party by allowing them to write code that creates entirely new voice-driven capabilities for Alexa. The Alexa Skills Kit (ASK) lets developers extend Alexa’s behavior, something Apple has been loath to allow with Siri. ASK could become a whole new service platform for Amazon Echo; with ASK, developers define a unique “Skill” for Alexa.
A Skill can be defined as follows:
- An interaction model
- Key phrases that are recognized
- Prompts that Alexa will speak
- Custom code for providing responses to questions
- Icons and graphics for the Alexa smartphone app
Imagine a customer service kiosk that can be customized for a specific company with unique phrases and responses to questions. For example, a bank could embed an Alexa in an ATM and allow customers to ask, “What is my account balance?” Or a grocery store could put an Alexa in the store and allow shoppers to ask, “What aisle has potato chips?” These are only a couple of the scenarios that are possible with ASK.
The next section goes into the details of creating your own Alexa Skill.
How Do I Get Started?
Creating your own Alexa Skill with ASK will require some tools and services. Here’s what you need in order to get started:
- An account on the Amazon Developer Portal. The developer portal is where you configure a new Skill and prepare it for deployment.
- The ability to develop and deploy a cloud-based service to an Internet-accessible endpoint. The service processes user questions and returns replies to the user.
- A development environment appropriate for the language you plan to use. Language choices include Node.js, Java, and Python.
- An Amazon Echo for testing.
Not surprisingly, you can choose tools and services from Amazon, such as Amazon Web Services (AWS). This is probably the best place to start unless you are already well versed in another cloud platform.
Once you have chosen all of the tools and services, you are ready to begin developing your Skill in earnest.
For the next few sections, I will use the Skill I created for checking the surf as a sample. The Skill allows the user to ask Alexa the following:
- What is my favorite surf spot?
- My favorite surf spot is ABC.
- How’s the surf?
- How’s the surf at ABC?
(nerd alert!)
Create the Skill on the Developer Portal
At the time of this writing, the steps for creating a new Skill were as follows:
- Sign in to the Amazon Developer Portal using your favorite browser (https://developer.amazon.com/home.html)
- Click on Apps & Services.
- Click on Alexa.
- Click on Alexa Skills Kit.
- Click on Add a New Skill.
- Give your new Skill a name (this is the name that is displayed in the Alexa App on your smartphone).
- Give your new Skill an Invocation name (this is the name users will say to interact with your Skill).
- Provide an Endpoint URL for your skill (this is the code that Alexa will use to process your Skill).
(nerd alert!)
Define your Voice Interface
The next step in making a Skill is defining the Interaction Model, which describes how the customer will use your Skill.
The steps below outline the flow of an interaction:
- Customer asks a question.
- Alexa identifies the proper Skill name and sends the request to that Skill.
- Skill processes the request and sends a text response.
- Alexa converts the text response to speech and streams it to the speaker.
- (optional) Skill sends a graphical response for any companion app.
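To make the third and fourth steps concrete, here is a minimal sketch in Python (one of the supported Skill languages) of the response a Skill sends back. The envelope fields follow the Alexa Skills Kit response format; the surf report text itself is an invented example.

```python
def build_response(speech_text, should_end_session=True):
    """Build a minimal Alexa response envelope.

    Alexa converts the plain-text outputSpeech to speech and
    streams it to the Echo's speaker.
    """
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {
                "type": "PlainText",
                "text": speech_text,
            },
            "shouldEndSession": should_end_session,
        },
    }

# Example: the text reply a surf-check Skill might send back
reply = build_response("The surf at Lake Worth is two to three feet.")
print(reply["response"]["outputSpeech"]["text"])
```

The Skill only ever deals in text; the text-to-speech conversion happens entirely on Amazon’s side.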
The definition of an Interaction Model for a Skill has the following components:
- An Intent Schema – a JSON structure which declares the set of intents (actions) your Skill can accept and process.
- Sample Utterances – a structured text file that connects the intents to likely spoken phrases; it should contain as many representative phrases as possible.
Let’s take a look at a simple Intent Schema for letting a user set his/her favorite surf spot. The user would do this by saying “SurfCheck, my favorite surf spot is <spot name here>”.
{
  "intents": [
    {
      "intent": "SurfCheck",
      "slots": [
        {
          "name": "SurfSpot",
          "type": "ListOfSurfSpots"
        }
      ]
    }
  ]
}
If the JSON above looks foreign to you, then you’re probably not a developer.
Fear not, I will break down what each section means below.
- intents – defines all of the intents (or actions) your Skill will support
- intent – defines a specific intent with a given name (SurfCheck)
- slots – defines all of the slots (or placeholders) within an intent
- name – defines the name of the slot (SurfSpot)
- type – defines the type of the slot (ListOfSurfSpots). This could be a custom type or one of the many pre-defined types provided by Amazon.
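As an aside, a custom slot type such as ListOfSurfSpots is supplied to the Developer Portal as a plain list of values, one per line. The spot names below are illustrative examples, not the actual list from my Skill:

```text
South Beach
Lake Worth
Juno Beach
Sebastian Inlet
```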
So with just a few lines of JSON, I’ve defined an Intent that lets the user set his/her favorite surf spot. But this definition is only half of the story. We need to also tell Alexa which phrases interact with this intent. For that, we need to provide some sample utterances.
SurfCheck my favorite surf spot is {SurfSpot}
SurfCheck {SurfSpot}
SurfCheck what is my favorite surf spot?
Notice anything familiar in these utterances? The first is the name of the intent, SurfCheck. This ties the utterance back to one of the intent definitions above (we only had one intent, but you get the idea). The second is the name of the slot, SurfSpot. This tells the intent to only accept phrases that are defined in the slot definition. In this case, the slot accepts the name from a list of surf spots, such as South Beach.
Both the Intent definition (JSON) and Sample Utterances (text file) are set in the Developer Portal for your skill in the Interaction Model section.
Once you have defined your Interaction Model, the next step is to provide the code that processes the requests for your Skill.
(nerd alert!)
Process Requests
Recall that the last step in creating a new Skill is to provide an Endpoint URL. This URL is where the code that processes requests is hosted.
For this endpoint, there are currently two choices:
- Develop an Alexa Skill as a Lambda Function.
- Develop an Alexa Skill as a Web Service.
For small, fairly simple Skills, a Lambda function is the easiest way to go: it is hosted by AWS, loads on demand (meaning you don’t need a dedicated server), and scales automatically. If you insist on hosting your own service to process your Skill, Amazon provides a well-defined interface for doing so; however, the availability and scalability of your server are then your responsibility.
For the purposes of this article, a Lambda function is perfect. Creating one is done from the AWS console under Services. I opted for JavaScript (Node.js) as the language for developing my Skill. (Java and Python are the other options.) The AWS console provides a plain text editor where you can code and test your Skill all in one place. However, I found it easier to use my favorite text editor to develop the code before copying the finished product into the AWS Lambda console for testing. The code for processing a Skill must handle the following requests:
- LaunchRequest – invoked when Alexa identifies your Skill as being requested by the user.
- IntentRequest – invoked when Alexa identifies an intent within your Skill as being requested by the user.
- SessionEndedRequest – invoked when Alexa ends the session with your Skill.
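Putting the three request types together, a skeletal handler might look like the following sketch (shown in Python, one of the supported languages). The dispatch structure mirrors the request types above; the function names, prompts, and placeholder logic are my own illustration, not code from the Alexa SDK.

```python
def lambda_handler(event, context=None):
    """Entry point AWS Lambda calls with the JSON request from Alexa."""
    request = event["request"]
    request_type = request["type"]

    if request_type == "LaunchRequest":
        # User invoked the Skill without naming a specific intent
        return build_speech("Welcome to Surf Check. Ask me how the surf is.")
    elif request_type == "IntentRequest":
        intent = request["intent"]
        if intent["name"] == "SurfCheck":
            spot = intent["slots"]["SurfSpot"]["value"]
            # A real Skill would look up or store data for this spot
            return build_speech(f"Saving {spot} as your favorite surf spot.")
        return build_speech("Sorry, I didn't understand that.")
    elif request_type == "SessionEndedRequest":
        # No speech response is sent when the session ends
        return {}


def build_speech(text):
    # Minimal Alexa response envelope; Alexa speaks `text` aloud
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": True,
        },
    }
```

The real work of a Skill happens inside the IntentRequest branch, where the slot values carried in the request are extracted and acted upon.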
(nerd alert!)
Test your Skill
Once you have defined your Voice Interface and developed the code for processing requests for your Skill, the next step is to test your Skill. Testing is done from the AWS console: the Lambda function section provides the tools you need by letting you define Input Test Events in the form of JSON. This JSON simulates the requests that come from Alexa in response to a user command. For example, if a user says, “SurfCheck, my favorite surf spot is Lake Worth”, the following JSON is sent to my Skill for processing:
{
  "session": {
    "new": false,
    "sessionId": "session1234",
    "attributes": {},
    "user": {
      "userId": null
    },
    "application": {
      "applicationId": "amzn1.echo-sdk-ams.app.stoked-software.surf-check"
    }
  },
  "version": "1.0",
  "request": {
    "intent": {
      "slots": {
        "SurfSpot": {
          "name": "SurfSpot",
          "value": "Lake Worth"
        }
      },
      "name": "SurfCheck"
    },
    "type": "IntentRequest",
    "requestId": "request5678"
  }
}
The values in this JSON should look familiar based on the Voice Interface and Interaction Model sections above. Further breakdown of the Input Test Event mechanism is beyond the scope of this article. Suffice it to say that every intent supported by your Skill can (and should) be tested using this service. Testing is the last step in the process before submitting your Skill to Amazon for certification.
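As a rough illustration of what your Skill code does with such an event, the sketch below (in Python, one of the supported languages) walks a trimmed version of the test event and pulls out the intent name and slot value. A real Skill would do this inside its request handler; the extraction path simply mirrors the JSON structure shown above.

```python
import json

# A trimmed version of the Input Test Event shown above
test_event = json.loads("""
{
  "version": "1.0",
  "request": {
    "type": "IntentRequest",
    "requestId": "request5678",
    "intent": {
      "name": "SurfCheck",
      "slots": {
        "SurfSpot": {"name": "SurfSpot", "value": "Lake Worth"}
      }
    }
  }
}
""")

# Walk the structure the same way a Skill handler would
intent = test_event["request"]["intent"]
spot = intent["slots"]["SurfSpot"]["value"]
print(intent["name"], "received slot value:", spot)
```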
Submit for Certification
Now that you’ve gone through all the work of configuring, defining, developing and testing your new Alexa Skill, you want the world to rejoice in your brilliance, don’t you? For an Alexa Skill, that means you have to submit your Skill for certification, which is tantamount to publishing a smartphone app to one of the app stores. Amazon has provided a submission checklist to help with this process, but the basics are that your Skill must follow the rules defined by Amazon. For example, the Skill must not be targeted at children or solicit sensitive customer information, such as social security numbers, financial account information, health information or similar data. Finally, your Skill must do what it advertises in its Sample Utterances. In my SurfCheck sample, if the Skill didn’t allow the user to set his/her favorite surf spot, the Skill would not get certified. Assuming you pass all of these tests, your new Skill will be made available to ALL Amazon Echo devices!
What Is The Bigger Picture?
The Amazon Echo is a cool device all by itself. When I showed it off to some of my guests over the holidays, they were amazed by the accuracy of voice recognition. The ability to speak to a device in your home, ask a random question, and usually get a decent answer would have been unheard of only a few years ago. Apple’s Siri service made that somewhat more commonplace to iPhone users, but it was limited and not always accurate.
The bigger picture with services like Alexa is all about smart assistants and home automation. “Alexa, turn on the TV and dim the lights” is already a phrase that will do exactly what you think it does (if you have the right equipment!) – and this is just the beginning.
Remember, there are two parts to each Skill: the voice interface and request processing. First, you request something from Alexa with your voice. Second, that request is processed by code floating out in the cloud somewhere. With the use of home automation devices, that code could turn on your TV, dim the lights, start your car in the winter, lock your front door at night, and so on. The possibilities are limitless and maybe a little frightening. As I’m writing this, I can only imagine what hackers might be thinking of…but I digress. Needless to say, security with anything in the cloud is always a concern. But putting those concerns aside, it should be clear how powerful technology like Amazon Echo is poised to become.
My example talked mostly about the consumer-facing aspects, but the commercial possibilities of the technology are equally as promising. Customer (self-)service as we know it could be revolutionized by an offering like Alexa. Recall the bank and grocery store scenarios posited earlier in the article. These sorts of kiosks could answer most customers’ basic questions and allow human employees to focus on the more difficult tasks, saving enterprises time and money.
For now, the technology is in the early adopter phase. Gadget gurus and technologists like myself have embraced it, and the masses have shown that Siri is, if anything, entertaining. The crucial next step will be the use of this technology in home automation and smart assistants. Amazon Echo looks to be a good contender for making that step. At the very least, I can now check the surf with the sound of my voice, one small step for (this) man.
References and Further Reading
- https://developer.amazon.com/appsandservices/solutions/alexa/alexa-skills-kit/getting-started-guide
- https://developer.amazon.com/public/solutions/alexa/alexa-skills-kit/docs/alexa-skills-kit-interaction-model-reference
- https://developer.amazon.com/public/solutions/alexa/alexa-skills-kit/docs/developing-an-alexa-skill-as-a-lambda-function
- https://developer.amazon.com/public/solutions/alexa/alexa-skills-kit/docs/developing-an-alexa-skill-as-a-web-service
- https://developer.amazon.com/public/solutions/alexa/alexa-skills-kit/docs/alexa-skills-kit-submission-checklist
- https://www.amazon.com/echo