Kindle support for Unicode pt1: dispelling a myth

I seem to be working through a mountain of jobs formatting and editing eBooks and paperbacks at the moment. It’s gratifying that people trust me with their precious writing, but because I work on so many books, I get to discover all the dark secrets of eBook design. In fact, the darkness isn’t in eBook design at all; it lies in the design of the tablets and eReaders and the software they use to display eBooks. If there’s a secret to eBook design, it is to know the current set of idiosyncrasies and bugs in reading devices, and how to code your way around them. One day these inconsistencies and workarounds will be a thing of the past, and we will be able to concentrate on good typography and layout. But I think we’re still years away from that.

Since I have to face these problems head on, I’ll share some of what I find on the website. After all, I can’t only post about Lego creations. Can I?

A problem I faced recently was with the ePUB version of an anthology called Looking Landwards (in association with the Institute of Agricultural Engineers, no less). One of the stories had extracts in Polish, which meant accented Eastern European characters, and that means Unicode.

I was inspired to write this post after coming across a post from someone who provides a freelance eBook production service. In her post, the author explains that Kindles only support the basic ASCII character set. In other words, pretty much what you can type on a basic keyboard. To be fair to the person who wrote that, she was probably influenced by advice from Amazon themselves. Now, in comparison with Barnes & Noble, Sony and Kobo, Amazon are by far the best of the bunch for documenting how to write books for their eReader device. Not great, but better than the others, so I’m not having a go at Amazon.

What this blogger had probably read was Amazon’s simple guidance for self-publishers who upload a Word document (possibly saved as an HTML file) and get Amazon to convert it automatically to the Kindle format. That approach makes sense for many self-publishers, but sometimes the simple guide for beginners says things that contradict the Amazon Coding Guide PDF. I’m sure Amazon are trying to simplify things for beginners and shield them from topics, such as Unicode, where it is easy for the unwary to become ensnared. But because some self-publishers, and even freelance book designers, blog about how to make eBooks without ever getting as far as reading the coding guidelines, various myths and half-truths spread.

One of these is that eBooks, and Kindles in particular, don’t support Unicode. In fact, Unicode support on Kindles and other eReaders is very good, though not perfect (for example, see the Kindle screenshots of Devanagari support here).

I almost have some slight sympathy for the mistaken view that Kindles only support ASCII, because if you go to the Amazon KDP site, you will see this comment:

In case the text is too small to read in your browser, I’ve ringed passages that say: ‘avoid UTF-8 as the encoding type’, ‘Amazon Kindle Direct Publishing supports text in the Latin-1 format’, and ‘all characters from that set not currently supported are: spades, clubs, hearts, up-arrow, down-arrow.’ In other words: don’t use Unicode!

But hold on: I’ve just said I’ve made a book with passages in Polish, and some of the characters used are not in the Latin-1 character set. And I have most definitely encoded the content files for my eBook in UTF-8, the very character encoding that Amazon says you should not use. In fact, I always use UTF-8 encoding, and this is far from the first time I have used characters outside the Latin-1 character set.
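If you want to see for yourself why Polish text can’t be squeezed into Latin-1, here’s a minimal Python sketch. It tries to encode a handful of Polish letters (my own sample, not taken from the book) as Latin-1 and prints each letter’s code point and UTF-8 bytes. Latin-1 only has room for 256 characters, so most of the Polish-specific letters simply have no slot.

```python
# Sample Polish letters chosen for illustration: ó is in Latin-1, the rest are not.
SAMPLE = "óąęłńśźż"

def latin1_misfits(text: str) -> list[str]:
    """Return the characters in text that cannot be encoded as Latin-1 (ISO-8859-1)."""
    misfits = []
    for ch in text:
        try:
            ch.encode("latin-1")
        except UnicodeEncodeError:
            misfits.append(ch)
    return misfits

if __name__ == "__main__":
    for ch in SAMPLE:
        # Every character has a code point, and UTF-8 can encode all of them.
        print(f"{ch}  U+{ord(ch):04X}  UTF-8 bytes: {ch.encode('utf-8').hex(' ')}")
    print("Not in Latin-1:", "".join(latin1_misfits(SAMPLE)))
```

UTF-8, by contrast, can encode every character Unicode defines, which is why I save my content files in it.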

I’ve also just worked on a book called Jack the Fish, set in New York. There are plenty of references to ‘I ♥ New York’ t-shirts. Plenty of I ♥… all sorts of things; it’s a running gag. Now, in internet-world I can’t be sure those last two sentences displayed correctly: you should have seen a heart symbol. That’s the same heart symbol that Amazon explicitly says you shouldn’t use. But I used the Unicode black heart suit symbol (code point U+2665), and it displays fine on Kindle devices from the Kindle 3 onwards.
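For the curious, Python’s standard `unicodedata` module can confirm the heart’s official name and code point. This is just a quick sketch for checking a character before you use it; it says nothing about whether a particular device’s fonts include it.

```python
import unicodedata

heart = "\u2665"  # the heart symbol used in Jack the Fish

print(unicodedata.name(heart))   # BLACK HEART SUIT
print(f"U+{ord(heart):04X}")     # U+2665

# Latin-1 stops at U+00FF, so this character lies well outside it.
print("In Latin-1 range?", ord(heart) <= 0xFF)

# The code point round-trips back to the same character.
assert chr(0x2665) == heart
```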

I worked in the software industry for twenty years before I came to make eBooks. There’s a common saying there: you don’t know what you don’t know. Sounds a bit Donald Rumsfeld, I know, but it reflects the fact that in such a fast-changing world as software coding, your knowledge is constantly becoming outdated, and one of the biggest risks is that there is no cheap and easy way to learn what new things have come along that mean you are now out of date. The same is true with making eBooks.

But I’m not working from the beginners’ guide. I’m referring to the next level up: Amazon’s coding guidelines. You can find these on the Amazon KDP site, but the easiest way to reach them is from the Help menu in Amazon Kindle Previewer, as in this screenshot.

(By the way, if you are a self-publisher and don’t have Kindle Previewer, you really should. The only reason not to is if you already own all the devices. There used to be problems with the accuracy of its rendering. There still are, but the previewer is vastly improved on earlier versions.)

And what do the Kindle Publishing Guidelines say? Unicode is fine (see section 3.1.4). There’s no reference whatsoever to there being anything special about ‘Latin-1’. After all, if you look through the Unicode glyphs (a glyph is a single character or symbol in a font) defined in the fonts on a Kindle then you will discover there are thousands of them. Why would Amazon put thousands of glyphs onto their Kindles and then tell you never to use them?

Thousands of glyphs? An exaggeration? Actually, pick up this free book (here) and load it onto your Kindle. It simply lists all the Unicode code points in turn. Any code point supported on your Kindle will display nicely. Any that isn’t will get the little ‘huh?’ box. (A code point is basically a serial number for a character, which a font then draws as a glyph. That serial number is defined by the Unicode Consortium. If you want to know more about Unicode, this post is a good place to start, and there’s more from me in my part 2 follow-up post.)
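The idea behind that test book is simple enough to sketch. Below is a hypothetical Python script (not how that book was actually made, since I don’t know who made it or how) that writes a UTF-8 XHTML page listing every code point in a chosen range with its number and name. Drop the page into an eBook and you can see which characters your reader draws and which get the ‘huh?’ box. The file name and the range are arbitrary choices for illustration.

```python
import html
import unicodedata

# Categories with no visible glyph to test: unassigned, control, format,
# surrogate and private-use code points.
SKIP = {"Cn", "Cc", "Cf", "Cs", "Co"}

def code_point_table(start: int, end: int) -> str:
    """Build XHTML paragraphs listing code points start..end (inclusive)."""
    rows = []
    for cp in range(start, end + 1):
        ch = chr(cp)
        if unicodedata.category(ch) in SKIP:
            continue
        name = unicodedata.name(ch, "")
        # Escape the character in case it is markup-significant (<, >, &).
        rows.append(f"<p>U+{cp:04X} {html.escape(ch)} {name}</p>")
    return "\n".join(rows)

def write_test_page(path: str, start: int, end: int) -> None:
    """Write a complete XHTML page, saved as UTF-8 so every character survives."""
    page = (
        '<?xml version="1.0" encoding="utf-8"?>\n'
        '<html xmlns="http://www.w3.org/1999/xhtml">\n'
        "<head><title>Code point test</title></head>\n"
        f"<body>\n{code_point_table(start, end)}\n</body>\n</html>\n"
    )
    with open(path, "w", encoding="utf-8") as f:
        f.write(page)

if __name__ == "__main__":
    # Hypothetical example: Latin Extended-A (U+0100 to U+017F),
    # which holds the Polish letters that Latin-1 lacks.
    write_test_page("codepoints.xhtml", 0x0100, 0x017F)
```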

Here’s a shot of that book on my iPad, which is running the Kindle app for iOS. I’d love to credit the author, by the way, but I can’t find his or her name anywhere. These glyphs also display correctly on my Kindle 3, Kindle Touch, Kindle Fire, Kindle for PC, iBooks on the iPad, Nook Glow, and very nearly on my Kobo Mini, though there’s a problem with the Kobo I’ll come to in my next post.

Here’s another shot, showing our Looking Landwards Polish characters looking fine on a Nook Glow, Kobo Mini, and Kindle Touch. Look closely and you will see there are also proper double quotes, not the ‘straight up’ typewriter quotes. There are other Unicode symbols elsewhere in this book, such as some fancy right arrows.

The take-aways from this post

Hopefully I’ve dispelled the myth that ‘Kindle can’t handle Unicode’.

On the way, I’ve also pointed out that Amazon seems to give Kindle book formatting advice at two levels: one for beginners who upload a Word .doc file (or one saved as HTML), and a more detailed one for those who build the eBook themselves and upload the resulting file. If you want to take the second approach, you do need to read the Kindle Publishing Guidelines I showed you.

The third take-away is to beware of what people post on the internet about how to make eBooks because there is a lot out there that isn’t accurate. Treat whatever you find with suspicion, test your books thoroughly, and try to get multiple opinions. I’m not setting myself up as the definitive expert, though. The advice to treat what you find online with suspicion, of course, goes for my posts too, every bit as much as anyone else’s. I might have taken the time to actually read Amazon’s documentation about how to make Kindle books, which is something some people who post advice and sell formatting how-to books have clearly never bothered to do, but I still suffer from the same Rumsfeld-esque weakness as everyone else.

You don’t know what you don’t know.

I first heard that 20 years ago. It’s just as true today.

What I haven’t done is go beyond hinting at what Unicode is and why you would want to use it. Nor have I explained how you can use it. Well, this post has stretched on long enough, so I’ll give you all those goodies in the next post.

Take care and beware.

Tim

Click here for part 2 of my Unicode posts

Follow this link to my other writing and publishing tips

Available now: ‘Format Your Print Book for Createspace: 2nd Edition’, as a Kindle eBook and as a 296-page paperback:

eBook: at amazon.com | amazon.co.uk

Paperback: at amazon.com | amazon.co.uk


About Tim C. Taylor

Science fiction publisher and author of the bestselling Human Legion series. I live with my wife and young family in an English village. I am currently writing full time, when I'm not roped into building Lego.
This entry was posted in Writing Tips.

