The Text and Image Utilities in the Alexa Skills Kit Node.js SDK
In the last post in the Dig Deep series, we looked at the new template builders in the Alexa Skills Kit Node.js SDK that allowed us to assemble our templates for the visual representation on the Echo Show. A handful of times we saw utilities that create text and image objects and, while we looked at them from a high level, we didn’t “dig deep.” We’ll do that in this post.
As a reminder, this is the Dig Deep series, where we look line-by-line at the tools and libraries we use to build voice-first experiences. This is not the place to go for tutorials, but if you want to learn interesting little nuggets about what you use every day, off we go…
TextUtils
First up, we’ve got the text utilities. You might use them like so:
“That’s it?” you ask yourself. “There must be more to it than that!” There is, but only very little. We’ll dig deep into the code and you’ll come out of it both wondering “why?” and being very grateful that it’s there. Life can be funny that way.
Here is the code for the text utilities (the TextUtils class):
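As a stand-in for the listing, here is a minimal sketch written from the behavior described in this post. It is not copied from the SDK source, so treat the exact layout as an assumption; the sample strings at the bottom are made up.

```javascript
'use strict';

// Sketch of a TextUtils-style class, written from the behavior described in
// this post rather than copied from the SDK source.
class TextUtils {
  // Wraps a string in a plain text object.
  static makePlainText(text) {
    return { type: 'PlainText', text };
  }

  // Wraps a string (which may contain rich text markup) in a rich text object.
  static makeRichText(text) {
    return { type: 'RichText', text };
  }

  // Adds each level of text only when a truthy value was passed in.
  static makeTextContent(primaryText, secondaryText, tertiaryText) {
    const textContent = {};
    if (primaryText) textContent.primaryText = primaryText;
    if (secondaryText) textContent.secondaryText = secondaryText;
    if (tertiaryText) textContent.tertiaryText = tertiaryText;
    return textContent;
  }
}

// Illustrative usage: skip the secondary text by passing a falsy value.
const textContent = TextUtils.makeTextContent(
  TextUtils.makePlainText('Dig Deep'),
  null,
  TextUtils.makeRichText('<i>Text and image utilities</i>')
);
console.log(JSON.stringify(textContent, null, 2));
```

Because the secondary argument is null, the resulting object has only the primaryText and tertiaryText keys.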
That’s all there is to it: three static methods.
(If you’re not familiar, static methods are methods that can be used directly off of the class rather than an instance of the class. You would say TextUtils.makeTextContent rather than const textUtilsInstance = new TextUtils(); textUtilsInstance.makeTextContent.)
Once we get to makeRichText we’re going to take a mighty detour, so we won’t go from top to bottom. Instead, we’ll start with makeTextContent.
makeTextContent
The makeTextContent method is used in situations like ListTemplate1 where you would have three levels of text:
(Note that in the template above, there is only primary and tertiary text.)
In all templates other than ListTemplate1, the different levels of text are concatenated so there is, in reality, not much use for them.
This method returns an object with potential keys of primaryText, secondaryText, and tertiaryText. It checks each argument in turn and adds the matching property only if the argument is truthy. The upshot is that if you want to skip one of the texts, as the list template above does, you pass in a falsy value (undefined, null, empty string; it’s your call).
The tricky thing, though, is that you are not sending in text for each of these values. You are sending in text objects that you’ve created with makePlainText or makeRichText.
makePlainText
The makePlainText method takes in text and returns an object with two keys. A key of type has the value of 'PlainText' and the key of text has the text that was passed in as an argument. This is the point where you say “Yeah, this is really simple, but I’m glad I don’t have to create this object over and over again. Thanks, Amazon!”
This method’s a lot less interesting than makeRichText.
makeRichText
With makePlainText you could add text and that was it. Quite literally, the text stands alone.
But with rich text… oh boy. Bold! Line breaks! Actions! Rich text is similar to the HTML you wrote twenty years ago (although, I have to add, in reality this is XML). So forget what you’ve learned about avoiding the use of <b> for bold or <i> for italics and check out what you can do with rich text for the Echo Show.
Here’s what you can do inside rich text:
- Bold
- Italics
- Underline
- Font size
- Line break
- Images
- Actions
Bold, Italics, Underline, Line Break, Font Size
Use <b> for bold, <i> for italics, and <u> for underline. Got it? Got it.
Line breaks work just as you know them from HTML: add a break between lines with a self-closing tag, <br/>.
Font size is interesting. Put away your pixels and don’t even think about reaching for your ems or rems. With the Echo Show, you’re sizing with numbers which in turn correspond to pixel sizes. What’s tricky is that there are just four numbers: 2, 3, 5, and 7. The default size is 3, equivalent to 32px.
| Value | Pixel Size |
|---|---|
| 2 | 28px |
| 3 | 32px |
| 5 | 48px |
| 7 | 68px |
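To make the tags and sizes concrete, here is a small sketch. The FONT_SIZE_PX map simply restates the table above, richText is a local stand-in for makeRichText so the snippet runs without the SDK, withFontSize is a hypothetical helper, and the sample copy is made up.

```javascript
'use strict';

// Restates the table above: the only accepted font sizes and their pixel sizes.
const FONT_SIZE_PX = { 2: 28, 3: 32, 5: 48, 7: 68 };

// Local stand-in for TextUtils.makeRichText so the snippet runs on its own.
const richText = (text) => ({ type: 'RichText', text });

// Hypothetical helper: wraps text in a <font> tag, rejecting unsupported sizes.
function withFontSize(size, text) {
  if (!Object.prototype.hasOwnProperty.call(FONT_SIZE_PX, size)) {
    throw new RangeError(`Unsupported font size: ${size}`);
  }
  return `<font size="${size}">${text}</font>`;
}

// Bold heading in the largest size, a line break, then italic and underlined text.
const body = richText(
  withFontSize(7, '<b>Today</b>') +
  '<br/>' +
  '<i>Partly cloudy</i>, <u>bring a jacket</u>'
);

console.log(body.text);
```

Guarding the size up front is a design choice for the sketch: with only four valid values, it is easy to reach for a 4 or a 6 out of habit.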
Inline Images
In the previous post, I mentioned seeing a skill that “hacked” BodyTemplate1 in order to get a grid of items. The way this worked was by using rich text and, more specifically, inline images. These are set using the <img> tag:
Note that the source must be absolute (of course) and the width and height are absolute values with no unit. While the height doesn’t have a specified limit, it should fit within the Echo Show screen, which is 600px minus an unknown amount of padding on the top and bottom. The width cannot be larger than 880px, which accounts for the width of the Echo Show minus 72px of padding on the left and the right.
You are not required to add anything to your inline image tag but the src. If you’re like me, the alt attribute might seem pretty pointless, as there is no cursor with which to display a tooltip and there are no “SEO” benefits to come. Then you realize that the Echo Show comes with a screen reader and that accessibility is important, so you remember to always add the alt attribute to your inline images.
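Putting those attributes together, here is a sketch of a helper that assembles an inline image tag. inlineImage is a hypothetical helper, not part of the SDK; the URL is a placeholder, and the 880px check just encodes the width limit mentioned above.

```javascript
'use strict';

// Width limit mentioned above for inline images on the Echo Show.
const MAX_INLINE_WIDTH = 880;

// Hypothetical helper (not part of the SDK) that builds an <img> tag string.
// Attribute values are inserted as-is, so they must not contain single quotes.
function inlineImage({ src, width, height, alt }) {
  new URL(src); // throws a TypeError when src is not an absolute URL
  if (width > MAX_INLINE_WIDTH) {
    throw new RangeError(`width must be at most ${MAX_INLINE_WIDTH}`);
  }
  const attrs = [`src='${src}'`];
  if (width) attrs.push(`width='${width}'`); // unitless number
  if (height) attrs.push(`height='${height}'`); // unitless number
  if (alt) attrs.push(`alt='${alt}'`); // read out by the screen reader
  return `<img ${attrs.join(' ')} />`;
}

const tag = inlineImage({
  src: 'https://example.com/images/sunny.png',
  width: 100,
  height: 100,
  alt: 'Sunny weather icon',
});

console.log(tag);
```

The returned string can be concatenated into the text you pass to makeRichText.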
Actions
Actions allow the user to interact with the skill through items on the display templates. Your template might display the user’s shopping cart and have two actions: purchase or cancel.
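In rich text, that cart might be sketched as below. The confirm_purchase token is the one used later in this section; the cancel_purchase token, the cart copy, and the local richText stand-in for makeRichText are made up for illustration.

```javascript
'use strict';

// Local stand-in for TextUtils.makeRichText so the snippet runs on its own.
const richText = (text) => ({ type: 'RichText', text });

// Each <action> wraps the touchable text and carries a token that identifies it.
const cart = richText(
  'Your cart: 2 items<br/>' +
  "<action token='confirm_purchase'>Purchase</action> " +
  "<action token='cancel_purchase'>Cancel</action>"
);

console.log(cart.text);
```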
If a user touches “Purchase” the Display.ElementSelected event will be sent to your fulfillment. Using the ASK Node.js SDK, you won’t listen for that event precisely. Instead you’ll listen for ElementSelected. All prefixed events are stripped of their prefix in the EventParser class.
To determine which action was chosen, you’ll look to this.event.request.token. For the user in this example who wishes to purchase, the value will be 'confirm_purchase'.
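A sketch of such a handler is below. It assumes the handler-object style of the SDK, where this.event holds the incoming request and this.emit(':tell', ...) sends a spoken response. The cancel_purchase token and the spoken replies are made up, and the fake context at the bottom exists only so the snippet runs without the SDK.

```javascript
'use strict';

const handlers = {
  // Keyed by ElementSelected, without the "Display." prefix.
  ElementSelected() {
    const token = this.event.request.token;
    if (token === 'confirm_purchase') {
      this.emit(':tell', 'Your order is on its way.');
    } else if (token === 'cancel_purchase') {
      this.emit(':tell', 'Okay, I cancelled your order.');
    } else {
      this.emit(':tell', 'Sorry, I did not recognize that selection.');
    }
  },
};

// Fake context standing in for what the SDK would bind as `this`.
const emitted = [];
const fakeContext = {
  event: {
    request: { type: 'Display.ElementSelected', token: 'confirm_purchase' },
  },
  emit: (...args) => emitted.push(args),
};

handlers.ElementSelected.call(fakeContext);
console.log(emitted[0]);
```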
Actions can really set apart your skill on the Echo Show, but don’t forget that Alexa is still a voice-first platform. Don’t expect that the actions will be the primary means of interaction, even when using the Echo Show. Most users will still use the Show from afar and will only touch the screen in rare circumstances.
Because they’re using the Show from afar, images are a useful way to add at-a-glance information to your templates. Doing this will involve the image utilities.
ImageUtils
The ImageUtils class serves the same purpose as the TextUtils class, but for images: building the object that will be sent to the Alexa service.
Here is an example of how it would be used:
And here’s the code powering it:
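As a stand-in for both listings, here is a minimal sketch written from the behavior this post describes, not copied from the SDK source. The property names (url, widthPixels, heightPixels, size, contentDescription, sources) follow the Display image object format as I understand it and should be treated as assumptions, as should the rule that dimensions are only set when both are given. The URL and description are placeholders.

```javascript
'use strict';

// Sketch of an ImageUtils-style class; not the SDK source.
class ImageUtils {
  // Builds a single image source, then hands it to makeImages as a one-item array.
  static makeImage(url, widthPixels, heightPixels, size, description) {
    const imgObj = { url };
    // Assumption: dimensions are only set when both were provided.
    if (widthPixels && heightPixels) {
      imgObj.widthPixels = widthPixels;
      imgObj.heightPixels = heightPixels;
    }
    if (size) imgObj.size = size;
    return ImageUtils.makeImages([imgObj], description);
  }

  // Wraps an array of image sources and one shared description.
  static makeImages(imgArr, description) {
    const image = {};
    if (description) image.contentDescription = description;
    if (imgArr) image.sources = imgArr;
    return image;
  }
}

// Illustrative usage: a size descriptor and a description, but no dimensions.
const image = ImageUtils.makeImage(
  'https://example.com/images/sunny.png',
  null,
  null,
  'X_SMALL',
  'A sunny sky'
);
console.log(JSON.stringify(image, null, 2));
```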
There are two methods to examine: makeImage and makeImages.
makeImage accepts the following arguments:
- URL: Must be served over HTTPS, be a JPEG or PNG, and the CORS settings must be configured to allow the Alexa service to access the image.
- width and height: Optional
- size: What? We specified the size with width and height. Do they expect a size in MB? No, this is a size descriptor and is a string that is one of the following:
– X_SMALL: 480px x 320px
– SMALL: 720px x 480px
– MEDIUM: 960px x 640px
– LARGE: 1200px x 800px
– X_LARGE: 1920px x 1280px
– If you only care about the Echo Show, only provide the X_SMALL size. If any larger size is provided, it will be given precedence and scaled down.
- description: Used for visually impaired people with the Echo Show.
If you want to provide a size string and a description, but you don’t want to specify a width or a height, set those values to something falsy (empty string, undefined, null; the actual values, not the quoted strings, which would be truthy) as the method checks for their presence before setting the attributes.
After assembling an image object, makeImage passes it along as a single item in an array with the description to the makeImages method.
makeImages is used by makeImage but can also be used on its own. If you use it on its own, you’ll have something like this:
Note that the description is per image set, not per image. The code that powers that is:
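As a sketch of both halves: the function below follows the behavior just described (one description for the whole set), and the call underneath passes two sources. It is written from this post's description, not copied from the SDK; the property names follow the Display image object format and are assumptions here, as are the placeholder URLs.

```javascript
'use strict';

// Sketch of a makeImages-style function: one description shared by all sources.
function makeImages(imgArr, description) {
  const image = {};
  if (description) image.contentDescription = description;
  if (imgArr) image.sources = imgArr;
  return image;
}

// Illustrative usage: the same picture offered in two sizes.
const imageSet = makeImages(
  [
    { url: 'https://example.com/images/sunny-480.png', size: 'X_SMALL' },
    { url: 'https://example.com/images/sunny-1200.png', size: 'LARGE' },
  ],
  'A sunny sky'
);

console.log(JSON.stringify(imageSet, null, 2));
```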
There’s not much interesting here, other than what we just saw: the description applies to all of the images, not to each individual image.
Here we have it: text and image utilities for the Alexa Skills Kit SDK for Node.js. These utilities assemble the data provided to them into the object that must be sent to the Alexa platform and will be used for Echo Show templates.
That’s it for this post. Until next time…