Having an ebook reader means you can take a library with you to remote locations, and in that library you might want to include tech docs that are available online as HTML. The ebook conversion tool calibre can convert HTML to many formats, but getting non-ugly results relies on a bit of fiddling. This is how I’m doing it today.

My ideal is to have a downloaded collection of web pages included as chapters in the ebook. To do this, download each page (via Firefox) with ‘Save Page As...’ > ‘Web Page, complete’ and put every page into one shared folder. Then create an index page that links to each of those pages, in order. Test it by pointing your browser at your index page and checking that the links work.
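If you have more than a handful of pages, the index page can be generated rather than typed. What follows is only a minimal sketch in Python, not part of calibre: the function name `build_index`, the file name `index.html` and the default title are all my own assumptions, and the list of pages is whatever you saved into your shared folder.

```python
from html import escape
from pathlib import Path
from urllib.parse import quote


def build_index(folder, pages, title="My ebook"):
    """Write index.html into `folder`, linking to each file in `pages`, in order.

    `build_index` is a hypothetical helper; `pages` holds file names
    relative to `folder`, in the order the chapters should appear.
    """
    items = []
    for name in pages:
        href = quote(name)               # percent-encode spaces and the like
        label = escape(Path(name).stem)  # file name without its suffix
        items.append(f'    <li><a href="{href}">{label}</a></li>')

    html = "\n".join([
        "<!DOCTYPE html>",
        "<html>",
        "<head>",
        '  <meta charset="utf-8">',
        f"  <title>{escape(title)}</title>",
        "</head>",
        "<body>",
        f"  <h1>{escape(title)}</h1>",
        "  <ol>",
        *items,
        "  </ol>",
        "</body>",
        "</html>",
        "",
    ])

    index_path = Path(folder) / "index.html"
    index_path.write_text(html, encoding="utf-8")  # always save as UTF-8
    return index_path
```

Calling something like `build_index("docs", ["Intro.html", "Setup.html"])` (made-up names) would write `docs/index.html`, which you can then open in your browser to check the links.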

calibre needs some configuration first.

Common Options. In ‘Look & Feel’ leave ‘Input character encoding’ blank.

Output Options. In ‘HTMLZ Output’ leave ‘CSS’ as ‘class’ and ‘class based CSS’ as ‘external’.

Adding Books. Set ‘Automerge added books’ to ‘Overwrite existing duplicate formats’.

Plugins. In ‘File type plugins’ > ‘HTML to ZIP’ set the character encoding to ‘UTF-8’ and tick ‘Add linked files in breadth first order’.

Then:

  1. add your book by adding your created index page,
  2. convert, changing the output format to HTMLZ,
  3. extract the resulting htmlz file (it’s really just a zip),
  4. edit the generated index.html and style.css,
  5. keep editing till your browser confirms you have it how you want it,
  6. re-zip the files back into a file with .htmlz suffix,
  7. use that file to create your real target format (e.g. mobi).
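Steps 3 and 6 (extract, then re-zip) can be done with any zip tool. If you'd rather script them, here is a rough sketch using Python's standard `zipfile` module. The helper names `unpack_htmlz` and `repack_htmlz` are my own, and I'm relying only on the fact noted above that an htmlz file is really just a zip.

```python
import zipfile
from pathlib import Path


def unpack_htmlz(htmlz_path, out_dir):
    """Extract an .htmlz file (a plain zip) into `out_dir` for editing."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(htmlz_path) as zf:
        zf.extractall(out_dir)
    return out_dir


def repack_htmlz(src_dir, htmlz_path):
    """Zip everything under `src_dir` back into a file ending in .htmlz."""
    src_dir = Path(src_dir)
    htmlz_path = Path(htmlz_path)
    with zipfile.ZipFile(htmlz_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(src_dir.rglob("*")):
            # Skip directories, and skip the archive itself in case it
            # was created inside the folder being zipped.
            if path.is_file() and path.resolve() != htmlz_path.resolve():
                # Store paths relative to the folder, with forward slashes.
                zf.write(path, path.relative_to(src_dir).as_posix())
    return htmlz_path
```

In use, that might look like `unpack_htmlz("book.htmlz", "book_src")`, then editing `book_src/index.html` and `book_src/style.css`, then `repack_htmlz("book_src", "book_fixed.htmlz")`. Those file names are placeholders, not anything calibre produces by default.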

Without those steps you won’t just get ugly fonts and layout, you’ll also have unprintable characters thanks to faulty handling of encodings. For example, Firefox saves “—” as a UTF-8 ‘em-dash’, and if that’s fed to calibre using its guessed default encoding it will come out as a garbled blob in your ebook. The above steps ensure calibre does the ‘add’ using UTF-8 (smartly converting it back to —) and is then allowed to make correct guesses from that point onwards.
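To see why a wrong guess produces a garbled blob, here is a small Python illustration. It does not show what calibre does internally; `cp1252` is simply my assumed example of a wrongly guessed encoding, chosen because it is a common Windows default.

```python
EM_DASH = "\u2014"  # the em-dash character

# The three bytes a browser writes when saving the page as UTF-8.
raw = EM_DASH.encode("utf-8")

# Reading those bytes with the wrong encoding (cp1252 is an assumption
# here) turns one character into three unrelated ones.
wrong = raw.decode("cp1252")

# Reading them as UTF-8 gives the em-dash back.
right = raw.decode("utf-8")

print(len(raw), repr(wrong), repr(right))
```

The single dash becomes three bytes on disk, and each byte is shown as its own character when the encoding is misread. Forcing UTF-8 at the ‘add’ stage avoids that.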
