Archive for January, 2012

Display EPUB metadata and rename EPUB files accordingly

Wednesday, January 25th, 2012

I’ve been programming. First, a fast replacement for displaying the metadata of an EPUB file:

$ epub-meta -v 1632.epub
File: 1632.epub
Title: 1632
Author: aut: Eric Flint(Flint, Eric )
ID: ISBN (Unspecified:0-671-31972-8)
ID: uuid (uuid_id:b1d2b16c-f68d-4a80-9d02-f6fcac36e1f3)
Subject: Science Fiction
Publisher: Baen Publishing Enterprises
Date: Unspecified: 2000-02-07T05:00:00+00:00
Lang: en
Contrib: bkp: calibre (0.6.51) [http://calibre.kovidgoyal.net]
Meta: calibre:series_index: 1.0
Meta: calibre:timestamp: 2010-05-07T16:59:51.299000+00:00
Meta: cover: 0671578499_Cover
Meta: calibre:series: Ring of Fire
Meta: calibre:user_categories: {}
Meta: calibre:author_link_map: {}

It has command-line switches to select which metadata is displayed. And it is very fast: 3 ms on my system, as opposed to the 670 ms that ebook-meta from Calibre needs. However, it can only display the metadata, not change it. For changing metadata, you still need ebook-meta.
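For readers curious where this metadata lives, here is a minimal Python sketch of the lookup. It is not the epub-meta source (which is written in C), and the function name read_epub_metadata is made up for this illustration. It relies only on the standard EPUB layout: META-INF/container.xml points to the package (OPF) file, and the OPF file carries the Dublin Core elements.

```python
import zipfile
import xml.etree.ElementTree as ET

CONTAINER_NS = "urn:oasis:names:tc:opendocument:xmlns:container"
DC_NS = "http://purl.org/dc/elements/1.1/"


def read_epub_metadata(source):
    """Return a dict mapping Dublin Core names to lists of values.

    `source` is a path or a file-like object holding an EPUB (a ZIP archive).
    Hypothetical helper for illustration, not part of epub-meta.
    """
    with zipfile.ZipFile(source) as epub:
        # META-INF/container.xml names the package (OPF) file.
        container = ET.fromstring(epub.read("META-INF/container.xml"))
        rootfile = container.find(f".//{{{CONTAINER_NS}}}rootfile")
        opf = ET.fromstring(epub.read(rootfile.attrib["full-path"]))

    metadata = {}
    # Collect every Dublin Core element: dc:title, dc:creator, dc:subject, ...
    for element in opf.iter():
        if isinstance(element.tag, str) and element.tag.startswith(f"{{{DC_NS}}}"):
            name = element.tag.split("}", 1)[1]
            metadata.setdefault(name, []).append((element.text or "").strip())
    return metadata
```

Calling read_epub_metadata("1632.epub") would return the title under "title", the authors under "creator", and so on. Calibre-specific entries such as calibre:series are stored as meta elements rather than Dublin Core elements, so this sketch does not pick them up.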

Being able to display metadata quickly made it practical to rename EPUB files. Initially, I had the idea to do that in C too, but working with strings is actually quite tedious in C, so I decided on Perl. So there’s now also a program called epub-rename, which renames EPUB files according to their metadata in the format Author - Series SeriesIndex - Title. Moreover, it can also, through ebook-meta, fix certain issues in metadata tags: it can swap inverted Title/Author tags, fix Author tags which are in the wrong(!) “Last, First” format, and some more.
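To make the naming pattern concrete, here is a small Python sketch of it. This is not the Perl code from epub-rename: the function name epub_filename, the set of replaced characters and the formatting of the series index are all assumptions made for this illustration.

```python
import re


def epub_filename(author, title, series=None, series_index=None):
    """Build 'Author - Series SeriesIndex - Title.epub' from metadata values.

    Hypothetical helper; `series_index` is expected to be a number.
    """
    parts = [author]
    if series:
        label = series
        if series_index is not None:
            # Render 1.0 as "1" but keep fractional indexes such as 1.5.
            label += f" {series_index:g}"
        parts.append(label)
    parts.append(title)
    name = " - ".join(parts)
    # Assumption: replace characters that are unsafe in file names.
    name = re.sub(r'[/\\:*?"<>|]', "_", name)
    # Collapse runs of whitespace left over from messy tags.
    name = re.sub(r"\s+", " ", name).strip()
    return name + ".epub"
```

With the metadata shown above, epub_filename("Eric Flint", "1632", "Ring of Fire", 1.0) gives "Eric Flint - Ring of Fire 1 - 1632.epub". Books without a series simply drop the middle part.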

Well, here’s the “--help”:
$ epub-rename --help
Usage: [options] [directory ...]

Options:
-c|--compat
-f|--fix
-h|--help
-t|--title
-r|--rename
-x|--exchange
-v|--verbose

Options:
-c|--compat
Use ebook-meta from calibre instead of epub-meta. Much slower.

-t|--title
Fix title. This means the tag gets sanitized, as it would if
destined for a filename, and then written back to the metadata.
Uses ebook-meta.

-f|--fix
Fix all tags: author, title, and in some cases date. Uses
ebook-meta and touches every file, even those that don't need
fixing. Slow.

-h|--help
Print a brief help message and exit.

-r|--rename
rename files to the pattern "Author - Series SeriesIndex -
Title"

-x|--exchange
changes title for author-tag and vice versa. For all those files
that have the author in the title-field and the title in the
author- field. Uses ebook-meta, thus is slow.

-v|--verbose
Show how all files would be renamed, not just those really
renamed.

And here’s the program itself: epub-meta-0.2.tar.gz. MIT-Licensed. Enjoy.

If you don’t like the spaces, punctuation and UTF-8-characters in the output filenames, I’d recommend another program of mine: bicapitalize.

How to Enter EPUB Metadata

Friday, January 20th, 2012

If you have a library of e-books from different sources (e.g. Baen, Gutenberg, Archive.org, Google Books), you will notice a bewildering variety of styles for annotating EPUB files, some of them blatantly wrong and in violation of the EPUB standard itself.

So this is a how-to on entering this metadata correctly. I’ll mostly cover the program “ebook-meta” (part of Calibre), which is available on just about every platform.

Encoding

EPUB uses UTF-8 (the specification also permits UTF-16, but stick to UTF-8). Still, your tags are less likely to get messed up if you avoid things like typographic left and right quotes and backquotes. Ideally, only use the plain single quote “'”.
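If your tags already contain such quotes, a small Python sketch like the following could clean them up before you write them back. The name plain_quotes and the exact set of characters are assumptions for this example, and mapping double quotes to the single quote simply follows the advice above.

```python
# Map typographic quotes and the backquote to the plain ASCII single quote.
QUOTE_MAP = str.maketrans({
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
    "\u201c": "'",  # left double quotation mark
    "\u201d": "'",  # right double quotation mark
    "`": "'",       # backquote
})


def plain_quotes(text):
    """Return `text` with typographic quotes replaced by a plain single quote."""
    return text.translate(QUOTE_MAP)
```

All other UTF-8 characters, such as accented letters in names, pass through unchanged.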

Vocabulary

Try to be consistent in the vocabulary for tags (genres, categories). Sadly, no vocabularies are specified by the standards right now.

Tags

  • Title: This will contain the title as it’s read. Don’t put in the author (yes, seen that). Don’t anticipate sorting by naming it “Title, The”; that is the task of the library program. Don’t enter Series and Series Index.
  • Title sort: You don’t need to enter that; at least ebook-meta usually sets this correctly.
  • Author(s): Enter the author as named. Don’t enter the title here (also seen..), and don’t enter things like series or title after the author’s name. Don’t anticipate sorting by naming it “Last, First”. Enter it in the form “First Middle Last”. If the author’s name is usually used with initials, use these: don’t enter “John Ronald Reuel Tolkien”, but “J. R. R. Tolkien”. After an initial, enter a dot and a space. If there are several authors, enter all of them; when using “ebook-meta”, separate them with “&”.
  • Author sort: You don’t need to enter that; at least ebook-meta usually sets this correctly.
  • Publisher: This is the original publisher. If you’re preparing an out-of-copyright e-book, don’t enter yourself as the publisher. Also, don’t anticipate sorting, but enter it as given.
  • Languages: At least one language must be set; you can set several if the book is multilingual. The language code is the two-letter ISO 639-1 code. Apparently localized codes such as “en-gb” are ignored.
  • Published: This is the original publishing date. Not the date you’re preparing the e-book!
  • Rights: Enter the year and copyright holder, if applicable, and a license if necessary. Like this: “Copyright 1954 by J. R. R. Tolkien” or “Copyright 2012 by Peter Keel, License CC-By-2.5” or “Public Domain” if the work is not protected by copyright anymore.
  • Identifiers: Here go ISBN or ISSN (for magazines) or UUID. You can put in as many as you like. “ebook-meta” only allows setting the ISBN and a BookID specifically.
  • Comments: This is actually the “Description”-tag, and it’s supposed to hold the blurb which would otherwise go onto the flap or the back of a physical book. And it should not contain HTML-tags. Also, don’t make this too long.
  • Series: This is a Calibre-specific tag, however it’s honored in many e-book-readers, so you really want to use this. Enter the series as spelled. Don’t take sorting into account. Don’t enter any series number.
  • Series Index: Also Calibre-specific, but it goes along with support for the “Series” tag. Enter a number here, corresponding to the book’s number in the series.
  • Tags: This one is really the “Subject” tag. It contains as many tags as you wish on what the book is about. Enter the genre here as well. Enter tags separated by commas. Do NOT enter a blurb here.
  • Category: This is probably the “Type” tag, but support seems to be rather limited. In any case, the genre does NOT go into that, but rather things like “textbook” or “novel”.
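As a rough illustration of the author rules above, and of how the values end up on an ebook-meta command line, here is a Python sketch. The helper names normalize_author and ebook_meta_command are made up for this example, and the heuristics are assumptions: a name with exactly one comma is treated as “Last, First”, which will be wrong for some names. The command is only built, not run.

```python
import re


def normalize_author(name):
    """Return the name in 'First Middle Last' form with spaced initials."""
    name = name.strip()
    # Assumption: exactly one comma means "Last, First"; flip it.
    if name.count(",") == 1:
        last, first = (part.strip() for part in name.split(","))
        name = f"{first} {last}"
    # Put a dot and a space after each initial: "J.R.R." -> "J. R. R. ".
    name = re.sub(r"\b([A-Z])\.\s*", r"\1. ", name)
    return re.sub(r"\s+", " ", name).strip()


def ebook_meta_command(path, title=None, authors=(), series=None,
                       index=None, tags=(), language=None):
    """Build (but do not run) an ebook-meta command line as a list."""
    cmd = ["ebook-meta", path]
    if title:
        cmd += ["--title", title]
    if authors:
        # Several authors are separated with "&".
        cmd += ["--authors", " & ".join(normalize_author(a) for a in authors)]
    if series:
        cmd += ["--series", series]
    if index is not None:
        cmd += ["--index", str(index)]
    if tags:
        # Tags are separated by commas.
        cmd += ["--tags", ",".join(tags)]
    if language:
        cmd += ["--language", language]
    return cmd
```

The resulting list can be passed to subprocess.run() once you have checked it. For example, normalize_author("Tolkien, J.R.R.") gives "J. R. R. Tolkien".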