The Text and Image Utilities in the Alexa Skills Kit Node.js SDK

In the last post in the Dig Deep series, we looked at the new template builders in the Alexa Skills Kit Node.js SDK, which let us assemble templates for the visual display on the Echo Show. Along the way we saw a handful of utilities for creating text and image objects and, while we looked at them at a high level, we didn’t “dig deep.” We’ll do that in this post.

As a reminder, this is the Dig Deep series, where we look line-by-line at the tools and libraries we use to build voice-first experiences. This is not the place to go for tutorials, but if you want to learn interesting little nuggets about what you use every day, off we go…

TextUtils

First up, we’ve got the text utilities. You might use them like so:

const makePlainText = Alexa.utils.TextUtils.makePlainText;
const makeTextContent = Alexa.utils.TextUtils.makeTextContent;
const textContent = makeTextContent(makePlainText('Chunky Bacon'));

“That’s it?” you ask yourself. “There must be more to it than that!” There is, but only very little. We’ll dig deep into the code and you’ll come out of it both wondering “why?” and being very grateful that it’s there. Life can be funny that way.

Here is the code for the text utilities (the TextUtils class):

'use strict';

class TextUtils {
  static makePlainText(text) {
    return {
      text : text,
      type : 'PlainText'
    };
  }

  static makeRichText(text) {
    return {
      text : text,
      type : 'RichText'
    };
  }

  static makeTextContent(primaryText, secondaryText, tertiaryText) {
    const textContent = {};
    if(primaryText) {
      textContent.primaryText = primaryText;
    }

    if(secondaryText) {
      textContent.secondaryText = secondaryText;
    }

    if(tertiaryText) {
      textContent.tertiaryText = tertiaryText;
    }

    return textContent;
  }
}

module.exports.TextUtils = TextUtils;

That’s all there is to it: three static methods.

(If you’re not familiar, static methods are methods that are called directly on the class rather than on an instance of the class. You would say TextUtils.makeTextContent(...) rather than const textUtilsInstance = new TextUtils(); textUtilsInstance.makeTextContent(...).)

Once we get to makeRichText we’re going to take a mighty detour, so we won’t go from top-to-bottom. Instead, we’ll start with makeTextContent.

makeTextContent

The makeTextContent method is used in situations like ListTemplate1 where you would have three levels of text:

ListTemplate1

(Note that in the template above, there is only primary and tertiary text.)

In all templates other than ListTemplate1, the different levels of text are concatenated so there is, in reality, not much use for them.

This method returns an object with potential keys of primaryText, secondaryText, and tertiaryText. It checks each argument in turn and adds the matching property only if the argument is truthy. The upshot of this is that if you want to skip one of the texts, as is done in the list template above, you’ll pass in a falsy value (undefined, null, empty string; it’s your call).

The tricky thing, though, is that you are not sending in text for each of these values. You are sending in text objects that you’ve created with makePlainText or makeRichText.
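
To make that concrete, here is a minimal sketch that wraps strings in text objects before handing them to makeTextContent. The TextUtils class in it is a trimmed local copy of the SDK listing above, included only so the snippet runs on its own; in a skill you would reach it through Alexa.utils.TextUtils. The strings are made up.

```javascript
'use strict';

// Trimmed local copy of the SDK's TextUtils (see the listing above) so this runs standalone.
class TextUtils {
  static makePlainText(text) {
    return { text: text, type: 'PlainText' };
  }

  static makeRichText(text) {
    return { text: text, type: 'RichText' };
  }

  static makeTextContent(primaryText, secondaryText, tertiaryText) {
    const textContent = {};
    if (primaryText) { textContent.primaryText = primaryText; }
    if (secondaryText) { textContent.secondaryText = secondaryText; }
    if (tertiaryText) { textContent.tertiaryText = tertiaryText; }
    return textContent;
  }
}

// Primary and tertiary text only: pass a falsy value to skip the secondary slot.
const textContent = TextUtils.makeTextContent(
  TextUtils.makePlainText('Chunky Bacon'),
  null,
  TextUtils.makeRichText('<b>Extra</b> crispy')
);

console.log(JSON.stringify(textContent, null, 2));
```

The result has primaryText and tertiaryText keys and no secondaryText key at all, which is what a template with only two levels of text needs.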

makePlainText

static makePlainText(text) {
  return {
    text : text,
    type : 'PlainText'
  };
}

The makePlainText method takes in text and returns an object with two keys. A key of type has the value of 'PlainText' and the key of text has the text that was passed in as an argument. This is the point where you say “Yeah, this is really simple, but I’m glad I don’t have to create this object over and over again. Thanks Amazon!”

This method’s a lot less interesting than makeRichText.

makeRichText

static makeRichText(text) {
  return {
    text : text,
    type : 'RichText'
  };
}

With makePlainText you could add text and that was it. Quite literally, the text stands alone.

But with rich text… oh boy. Bold! Line breaks! Actions! Rich text is similar to the HTML you wrote twenty years ago (although, I have to add, in reality this is XML). So forget what you’ve learned about avoiding the use of <b> for bold or <i> for italics and check out what you can do with rich text for the Echo Show.

Here’s what you can do inside rich text:

- Bold
- Italics
- Underline
- Font size
- Line break
- Images
- Actions

Bold, Italics, Underline, Line Break, Font Size

Use <b> for bold, <i> for italics, and <u> for underline. Got it? Got it.

Line breaks are just like you know from HTML: add a <br/> in between lines. Because this is XML, remember to self-close the tag.

Font size is interesting. Put away your pixels and don’t even think about reaching for your ems or rems. With the Echo Show, you’re sizing with numbers which in turn correspond to pixel sizes. What’s tricky is that there are just four numbers: 2, 3, 5, and 7. The default size is 3, equivalent to 32px.

<font size="2">This text is smaller than the default</font>

Value	Pixel Size
2	28px
3	32px
5	48px
7	68px
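
Since only four sizes are accepted, a tiny helper can catch a bad size before it ever reaches the device. This is a sketch, not part of the SDK: withFontSize and FONT_SIZES are names invented for this post, and the pixel values come straight from the table above.

```javascript
'use strict';

// Size-to-pixel mapping from the table above. FONT_SIZES is a hypothetical name, not an SDK export.
const FONT_SIZES = { 2: '28px', 3: '32px', 5: '48px', 7: '68px' };

// Hypothetical helper: wraps text in a <font> tag and rejects any size not in the table.
function withFontSize(size, text) {
  if (!Object.prototype.hasOwnProperty.call(FONT_SIZES, size)) {
    throw new RangeError('Font size must be one of 2, 3, 5, or 7; got ' + size);
  }
  return '<font size="' + size + '">' + text + '</font>';
}

// Same shape that TextUtils.makeRichText returns.
const richText = {
  text: '<b>Chunky Bacon</b><br/>' + withFontSize(2, 'Now with extra crunch'),
  type: 'RichText'
};

console.log(richText.text);
```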

Inline Images

In the previous post, I mentioned seeing a skill that “hacked” BodyTemplate1 in order to get a grid of items. The way this worked was by using rich text and, more specifically, inline images. These are set using the <img> tag:

<img src='https://example.com/image.png' width='200' height='200' alt='My image'/>

Note that the source must be an absolute URL (of course) and the width and height are plain numbers with no unit. While the height doesn’t have a specified limit, it should fit within the Echo Show screen, which is 600px tall minus an unknown amount of padding on the top and bottom. The width cannot be larger than 880px, which accounts for the width of the Echo Show minus 72px of padding on the left and the right.

You are not required to add anything to your inline image tag but the src. If you’re like me, the alt attribute might seem pretty pointless, as there is no cursor with which to display a tooltip and there are no “SEO” benefits to come. Then you realize that the Echo Show comes with a screen reader and that accessibility is important, so you remember to always add the alt attribute to your inline images.
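
If you build a lot of inline images, a small function can enforce those habits for you: keep the width within 880px and always supply alt text. Treat this as a sketch under assumptions: inlineImage and MAX_INLINE_WIDTH are made-up names rather than SDK exports, and the helper does not escape special characters in its arguments.

```javascript
'use strict';

// Echo Show width minus 72px of padding on each side (hypothetical constant name).
const MAX_INLINE_WIDTH = 880;

// Hypothetical helper that builds an <img> tag for use inside rich text.
function inlineImage(src, width, height, alt) {
  if (!/^https?:\/\//i.test(src)) {
    throw new Error('src must be an absolute URL');
  }
  if (width > MAX_INLINE_WIDTH) {
    throw new RangeError('width cannot be larger than ' + MAX_INLINE_WIDTH);
  }
  if (!alt) {
    throw new Error('alt text is required for screen reader users');
  }
  return "<img src='" + src + "' width='" + width + "' height='" + height + "' alt='" + alt + "'/>";
}

const tag = inlineImage('https://example.com/image.png', 200, 200, 'My image');
console.log(tag);
```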

Actions

Actions allow the user to interact with the skill through items on the display templates. Your template might display the user’s shopping cart and have two actions: purchase or cancel.

<action value='confirm_purchase'>Purchase</action> <action value='cancel_purchase'>Cancel</action>

If a user touches “Purchase” the Display.ElementSelected event will be sent to your fulfillment. Using the ASK Node.js SDK, you won’t listen for that event precisely. Instead you’ll listen for ElementSelected. All prefixed events are stripped of their prefix in the EventParser class.

To determine which action was chosen, you’ll look to this.event.request.token. For the user in this example who wishes to purchase, the value will be 'confirm_purchase'.
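
A handler for that might look like the sketch below. The ElementSelected key and this.event.request.token come from the description above; the spoken responses are invented, and the fake context at the bottom exists only so the snippet can run without the SDK or a device. In a real skill the SDK supplies this when it invokes your registered handlers.

```javascript
'use strict';

const handlers = {
  // The SDK strips the "Display." prefix, so the handler key is just ElementSelected.
  'ElementSelected': function () {
    const token = this.event.request.token;

    if (token === 'confirm_purchase') {
      this.emit(':tell', 'Your order is on its way.');
    } else if (token === 'cancel_purchase') {
      this.emit(':tell', 'Okay, I cancelled the order.');
    } else {
      this.emit(':tell', 'Sorry, I am not sure what you selected.');
    }
  }
};

// Stand-in for the context the SDK binds to "this"; it only records what was emitted.
const emitted = [];
const fakeContext = {
  event: { request: { type: 'Display.ElementSelected', token: 'confirm_purchase' } },
  emit: function () { emitted.push(Array.from(arguments)); }
};

handlers.ElementSelected.call(fakeContext);
console.log(emitted);
```

Branching on the token keeps every touch target in one handler, and the final else covers tokens from templates you may add later.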

Actions can really set apart your skill on the Echo Show, but don’t forget that Alexa is still a voice-first platform. Don’t expect that the actions will be the primary means of interaction, even when using the Echo Show. Most users will still use the Show from afar and will only touch the screen in rare circumstances.

Because they’re using the Show from afar, images are a useful way to add at-a-glance information to your templates. Doing this will involve the image utilities.

ImageUtils

The ImageUtils class serves the same purpose as the TextUtils class, but for images: building the object that will be sent to the Alexa service.

Here is an example of how it would be used:

const builder = new Alexa.templateBuilders.BodyTemplate6Builder();
const image = Alexa.utils.ImageUtils.makeImage(
                'https://example.com/image.png',
                800,
                600,
                'MEDIUM',
                'Standings List'
              );

const template = builder
                  .setTitle('Baseball Scores')
                  .setImage(image);

And here’s the code powering it:

'use strict';

class ImageUtils {

  static makeImage(url, widthPixels, heightPixels, size, description) {
    var imgObj = {
      url : url
    };

    if (widthPixels && heightPixels) {
      imgObj.widthPixels = widthPixels;
      imgObj.heightPixels = heightPixels;
    }

    if (size) {
      imgObj.size = size;
    }

    return ImageUtils.makeImages([imgObj], description);
  }

  static makeImages(imgArr, description) {
    var image = {};
    if(description) {
      image.contentDescription = description;
    }
    image.sources = imgArr;
    return image;
  }
}

module.exports.ImageUtils = ImageUtils;

There are two methods to examine: makeImage and makeImages.

makeImage accepts the following arguments:

- URL: The image must be a JPEG or PNG served over HTTPS, and the CORS settings must be configured to allow the Alexa service to access the image.
- width and height: Optional
- size: What? We specified the size with width and height. Do they expect a size in MB? No, this is a size descriptor and is a string that is one of the following:
  - X_SMALL: 480px x 320px
  - SMALL: 720px x 480px
  - MEDIUM: 960px x 640px
  - LARGE: 1200px x 800px
  - X_LARGE: 1920px x 1280px
  If you only care about the Echo Show, only provide the X_SMALL size. If any larger size is provided, it will be given precedence and scaled down.
- description: Used for visually impaired users of the Echo Show.

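If you want to provide a size string and a description, but you don’t want to specify a width or a height, set those values to something falsy (empty string, undefined, or null, not the quoted strings 'undefined' or 'null', which are truthy) as the method checks for their presence before setting the attributes. Note that both width and height must be truthy for either to be set.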
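
Here is a quick sketch of that case. The ImageUtils class is a trimmed local copy of the SDK listing above (with const in place of var), included only so the snippet runs by itself; in a skill you would call Alexa.utils.ImageUtils.makeImage.

```javascript
'use strict';

// Trimmed local copy of the SDK's ImageUtils (see the listing above) so this runs standalone.
class ImageUtils {
  static makeImage(url, widthPixels, heightPixels, size, description) {
    const imgObj = { url: url };
    if (widthPixels && heightPixels) {
      imgObj.widthPixels = widthPixels;
      imgObj.heightPixels = heightPixels;
    }
    if (size) { imgObj.size = size; }
    return ImageUtils.makeImages([imgObj], description);
  }

  static makeImages(imgArr, description) {
    const image = {};
    if (description) { image.contentDescription = description; }
    image.sources = imgArr;
    return image;
  }
}

// Skip width and height by passing undefined; size and description still come through.
const image = ImageUtils.makeImage(
  'https://example.com/image.png',
  undefined,
  undefined,
  'MEDIUM',
  'Standings List'
);

console.log(JSON.stringify(image, null, 2));
```

The single source in the result carries only url and size, with no widthPixels or heightPixels keys.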

After assembling an image object, makeImage passes it along as a single item in an array with the description to the makeImages method.

makeImages is used by makeImage but can also be used on its own. If you use it on its own, you’ll have something like this:

ImageUtils.makeImages([
  {
    url: 'https://example.com/image_x_small.png',
    size: 'X_SMALL'
  },
  {
    url: 'https://example.com/image_x_large.png',
    size: 'X_LARGE'
  }
], 'A photo of Steve Buscemi');

Note that the description is per image set, not per image. The code that powers that is:

static makeImages(imgArr, description) {
  var image = {};
  if(description) {
    image.contentDescription = description;
  }
  image.sources = imgArr;
  return image;
}

There’s not much of interest here beyond what we just saw: the description applies to the whole set of images, not to each individual image.

Here we have it: text and image utilities for the Alexa Skills Kit SDK for Node.js. These utilities assemble the data provided to them into the object that must be sent to the Alexa platform and will be used for Echo Show templates.

That’s it for this post. Until next time…