Archive for the 'Computers' Category

Questions for game-system rule makers

Sunday, November 2nd, 2014

As I’ve been playing some computer RPGs and reading some of their changelogs and wishlists, I’ve noticed some issues relating to history and physics, which I will address here in the hope that it might help game designers achieve more believable game (or mostly combat) systems.

Q: Does a weapon which has the whole mass distributed along its length do more damage on impact than one that centers it on the top?

A: A two-handed sword distributes its maybe 2.5kg along, say, 160cm of blade, with its centre of gravity actually near the hilt. A polearm has a lot of its 2.5kg near the top of its 240cm. It’s clear that the momentum of the polearm upon hitting will be much higher than that of the sword, and thus so will the damage it can inflict. On stabbing motions, however, both weapons will inflict similar damage, as this depends mostly on the user. The advantage of the sword is of course control, which is much better with the centre of gravity near the user’s hand.
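A rough back-of-the-envelope calculation illustrates this. The numbers are illustrative assumptions, not measurements: both weapons swung with the same angular velocity ω of about 10 radians per second around the hands, with the momentum being p = m·ω·r:

p(sword) ≈ 2.5kg × 10/s × 0.3m ≈ 7.5 kg·m/s (centre of gravity roughly 30cm ahead of the hands)
p(polearm) ≈ 2.5kg × 10/s × 2m ≈ 50 kg·m/s (most of the mass near the head)

So the polearm’s head arrives with several times the momentum, even though both weapons weigh the same.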

Of course, this is something a lot of games get wrong.

Q: Heavy is bad?

A: For armour, yes. You want the maximum protection at the least weight, or rather at a weight you can still wear in battle, which is around 20-40kg, depending on your size. Heavier armour than that was only worn for tournaments, and there only on horseback.

So don’t make your full suits of armour heavier than that. If your game considers size, have it influence the weight of the armour (and the fun of having the player find armour which just doesn’t fit), otherwise make it 30kg (or less for especially good armour…)

For weapons, you want something rather heavy that you can still control with ease, which brings us to…

Q: How much mass and momentum on an elongated hitting device can you control with one hand or with two hands?

A: This depends a bit on the length and centre of gravity, but it’s about 1kg for a 1 meter long thing, and 2.5kg for a 2.5 meter long thing. Which is nicely supported by historical evidence: One-handed weapons tend to have a weight around 1kg with 1 meter length, and all polearms weigh around 2.5kg with a length of 2.5m. Two-handed swords also tend to weigh 2.5kg with a length of 150cm (with shorter ones being lighter).

Of course this may vary a bit depending on who made it and who wants to wield it, but usually history shows weapons to be much lighter than their equivalents in game systems (it has gotten a bit better: D&D 1st ed. lists a one-handed sword at 6lbs and a halberd at 15lbs; D&D 5th ed. lists them at 3lbs and 6lbs).

Q: Why do you think there are flat wide arrowheads used for hunting and tetragonal ones for war?

A: It’s about damage to flesh versus armour-piercing. This can be generalised very well: a pointy bit used for stabbing that’s broad and flat will probably do more damage to flesh, but its chance to penetrate armour will be lower than that of one with a square cross-section.

This means that with thrusting, damage will depend on that cross-section, and mostly on one other factor: whether the weapon is used two- or one-handed, or what device is used to launch the thrust.

Q: What’s the difference between a blade and a pointy extrusion?

A: Pick or axe? As far as the pointy versus cutty is concerned, this again is a question of damage to armour versus damage to flesh.

I already answered the case where the stabbing bit is at the end of something and is used to thrust. But this is a bit different, since we’re actually hitting, not thrusting. The momentum will vary a lot depending on how long the thing is and whether it’s used with both hands or not; also, the momentum will usually be much bigger than with a straight thrust.

A lot of polearms will allow you to choose whether you want to hit your opponent with a blade or a pick, depending on what kind of armour he wears.

Q: What’s the difference between a rounded blade and a straight blade?

A: The question is whether we have a cutting or a hitting edge. This is also a question about the armour worn on the other side. The difference between round and straight edges will probably be small, with the straight edge transferring more energy to the target, whereas the round edge will convert some of that energy into a lateral motion (cutting). The cutting will be rather useless against things it can’t cut, so it’s probably less useful against things like chainmail, whereas the damage might be bigger against things it can cut (leather, skin, flesh).

The other thing of course is the question of what happens if the whole blade is curved, and there the answer is that with curved blades you can stab around something, making stabs more difficult to parry.

All in all, if you don’t have mechanisms to take these two issues into consideration, just treat them as equal.

Q: Why would you want to ditch a shield for a two-handed weapon?

A: Because if you’ve got two hands on the weapon, you’ve got more control, can use longer weapons, have more momentum and inflict more damage. And as for incoming projectiles, you have armour to take care of those.

You’ll notice that shields vanish from the battlefields with the advent of late medieval plate armour, made of steel and getting harder and harder over time, because that’s the thing that stops most projectiles. You’ll also notice that the Romans had some kind of “plate” armour too but still carried shields. That’s because theirs was made of iron (or even bronze), not steel, and could be penetrated rather easily by arrows.

Q: Why wouldn’t you want to ditch a shield for a second weapon?

A: If it’s about parrying, the bigger the thing you use to parry, the bigger the chance of not getting hit. If it’s about projectiles, a second weapon won’t help you, but a shield will. And lastly: you can’t hit somebody with two weapons at the same time. So you’ll use one to bind the other’s weapon (parry) and attack with the other one. And where’s the advantage in that? You could do it with a shield as well.

Of course, if your opponent only has a one-handed weapon and no shield, you will have an advantage (or no disadvantage, if the other also uses a second weapon).

Dual wielding is inferior to everything except wielding a single one-handed weapon with nothing in the other hand.

Q: If I had a blade on the other end of the weapon could I hit the enemy with it?

A: Yes. As long as the blades are short (or just a pointy bit) and the stick is long, it makes perfect sense. If not, the blade on the other end makes it impossible to fight with others alongside you, and it makes you lose momentum and control because of the counterweight.

So these things that are basically two swords attached to a hilt in opposite directions are completely useless. Unwise.

Q: What about the difference between a longsword and a broadsword?

A: Actually, the “longsword” does not exist. The “long sword” does, and it refers to a smallish late medieval two-handed sword, about as big as you can still carry on the hip. In the 19th century it was mis-named “bastard sword” or “one-and-a-half-hander”.

“Broadsword”, on the other hand, is a term used in the 17th century to distinguish it from the smallsword. Both broadsword and smallsword have about the same length (around 1 meter) and the same weight (around 1kg); the blade of a broadsword is just that much broader and thinner. This probably has some implications regarding bigger wounds inflicted versus reduced capability of piercing armour compared to the smallsword.

Unless your setting incorporates smallswords, forget about broadswords.

Q: How do you carry a weapon?

A: On the hip. And if it’s too long, on the shoulder. Yes, that’s it. Apart from small weapons in your boot, throwing knives on your arms or chest and other things like that, you carry it on your hip. Even quivers, unless you’re an American Indian. And you don’t carry weapons you intend to use on your back, because you wouldn’t be able to draw them.

Yes, you could draw some short sword or machete from your back, but then, while you’re drawing it, you’re wide open to attack. There’s a reason nobody ever did that in history.

Q: How are quality differences in arms and armour expressed?

A: Basically, it varies with a) the materials and b) the techniques used to process these materials, and both tend to get better with time (unless suddenly constrained by finances or logistics).

Usable materials for weapons are wood (sharpened stick, hardened in fire), flint, copper (yes, there was actually a “copper age”), bronze, iron and steel, plus maybe some mythical metals such as mithril. For armour it’s leather, wood, copper, bronze, paper, cloth, iron and steel (with leather and copper being so bad you don’t want them).

The general mechanics are these: the material must be workable; it should not break; it should be able to take an edge; it should keep an edge; it should not bend and stay bent; and it must have the right weight. You can’t really have a sword of a material with a totally different density unless you make it smaller or bulkier. Since weight and possible damage are mostly fixed by the form the product takes, you need to differentiate mostly on durability. Which of course is more interesting for armour, because there it also impacts the protection it offers.

Usually the materials that matter are iron and steel.

And there’s huge variance between different things made of steel. Depending on the techniques used (and on whether the ore found already contained the right traces of other elements and carbon), you get anything from rather soft (Roman lorica segmentata) to incredibly hard and resilient (Gothic plate armour).

So rather than invent a plethora of new materials, just add techniques (look for “damascene steel”, “crucible steel” and “wootz”) or flowery names of where the steel (or even the product) should come from. It was even common to refer to the workshop. So a “Helmschmied breast plate” or an “Ulfberht sword” might be rather exquisite.

With armour, you could also conflate the several layers of armour that were worn on top of each other at various times: tunic and unriveted chainmail, tunic and riveted chainmail, tunic and lorica segmentata, gambeson and chainmail, gambeson and chainmail and coat-of-plates, light gambeson and chainmail and brigandine, light gambeson and chainmail and (soft) full plate, arming doublet and (hardened) full plate. You get the picture.

Q: But a bronze weapon will cause less damage than a steel weapon?

A: No. Bronze is soft and can be ground to an extremely sharp edge in a very short time. Which it will also lose rather fast. It tends to bend and can’t be worked into very long shapes, which is why bronze axes are more interesting than bronze swords. But against flesh, the effect of a bronze weapon is the same as if it were iron or steel.

Things change very much when it comes to armour. A bronze edged weapon goes through leather like butter, has trouble against bronze armour, and can’t do anything against anything made of iron (except battering, which incidentally also works extremely well against bronze armour, nicely against iron and soft steel, and not at all against hardened steel).

Q: Leather armour is bad?

A: Well, it’s not armour in most cases, since even stone weapons cut through it nicely, let alone bronze weapons or even medieval eating knives (yes, I rammed an eating knife through 1cm of so-called cuir-bouilli with ease). It’s one of these roleplaying-game myths.

Leather was used within armour, though, as a carrier for riveting on small metal plates for instance, and sometimes also to cover these up (leading to something which looks like leather with rivets on it).

Just don’t use it; if you need light armour, go with gambesons or other armour made of layers of cloth, or with only parts of a full armour. A Gothic breast plate and an open helmet don’t inhibit any movement, make no noise (even less than leather would), and they weigh about 4kg and 2.5kg respectively, while protecting your vitals.

Q: Can I swim with armour?

A: Basically, no. Wearing chainmail, you’re 8-20kg heavier than otherwise, and most people can swim only 2-4 meters with that. Plus there’s probably some gambeson beneath your chainmail. With a gambeson alone, in the league of 4-8kg, your chances are better; you might get some 50 meters until any trapped air has gone out and the whole thing starts sucking up water.

The useful thing to do is to get rid of armour while you’re sinking, which actually might be possible with chainmail or a gambeson (although you need to lose your belt), or some parts of plate armour (though probably neither the shoulders nor arms and legs).

Pictures from the late middle ages show soldiers swimming for an attack in their underpants, shoes (they’re rather light: my reconstruction half-high boots are 480g each) and hats, with their pikes(!).

Swisscom Peering Policy Perversions

Wednesday, May 21st, 2014

What is peering?

If you want to send data from one Internet provider to another, it goes in the first instance through an upstream, a larger Internet provider to which other Internet providers are connected. You pay for this upstream.

A peering is when you set up a direct line to the other ISP, and route all traffic from and to that ISP (but only that) directly over it. This is relatively inexpensive if you already have infrastructure in the same data centre, and there are associations, in this case SwissIX, which operate the shared infrastructure (the switches) in these data centres.

With a peering, both sides save upstream costs, and the customers profit from shorter paths, i.e. faster access. In most cases, it’s a win-win situation.

There are isolated cases where one partner profits more than the other; typically, the one who pulls in more data than he delivers profits more.

Swisscom sucks

Swisscom is one of the biggest consumer ISPs in Switzerland, and thus also one of the biggest receivers of data. One would expect Swisscom to have a very strong interest in peering, especially with providers whose customers run sites popular with Swisscom customers.

Instead, Swisscom demands a monthly rent. In other words: Swisscom saves upstream costs, Swisscom’s customers get better access to websites, and Swisscom gets paid for it on top.

The other ISP saves some upstream costs, and then immediately hands money back to Swisscom instead. Financially, this can only work out at very large volumes of saved traffic, when the savings are bigger than the monthly rent paid to Swisscom.

The only reason this can work at all is that the customers of other ISPs who run websites have an interest in their sites reaching Swisscom customers quickly. And they have that interest because Swisscom has the lion’s share of all end customers connected to its network. A very clear abuse of a dominant market position.

Indeed, providers in Switzerland have already fought back against this; Init7, for instance, has won a partial victory against Swisscom. But the fact that Swisscom can still charge money for peerings shows clearly that there is no trace of competition here, and that Swisscom is as willing as ever to play off its customers, and the quality of their Internet connection, against smaller Internet providers.

The losers of this monopoly-rent policy of Swisscom are the other Internet providers, their service-offering customers, and Swisscom’s own customers.

A Guide to Movie Encoding

Saturday, April 26th, 2014

This is a guide to encoding and recoding movies, mostly on Linux, and also partly a rant against the most egregious practices.

I’m talking of encoding here, but actually, just about all the sources you can get movies from will already be encoded, be it DVDs, Blu-rays, modern cameras or files. Very rarely will you get an unencoded stream, e.g. from a VHS. So all of this actually applies mostly to re-encoding.

Also, being on Linux, one of the main requirements is that all the formats are supported by open source software. I don’t care about any possible patent violations, because those would involve software patents, and those would have been granted illegally anyway.

The tools used and denoted by fixed font are Linux commands or Debian/Ubuntu packages; but most of the software is available on other platforms as well.

Use the source

The quality of the encoding relies most heavily on the quality of the source you have. The more artifacts — no matter where from, be it from the actual film, dust, scratches, the projector, the camera, the VHS-drive, or the more modern electronic encoding-artifacts — the bigger the encoding will get to retain the same quality as the source. Some of the worst things I’ve seen are early black and white movies with loads of dust, scratches and grain.

Basically, artifacts increase the entropy, and the more entropy the less compression is possible.

  • Use the best source available. Usually Blu-ray, unless the producer just interpolated from a DVD, in which case adjust the resolution back down to the DVD level, usually 720 wide (but 704 or 352 is possible).
  • Codecs matter. Some are so notoriously efficient at encoding artifacts that any re-encode will actually increase the file size. DIV3 is one such.
  • Otherwise you might gain from 20% to 50% by re-encoding DIVX, XVID or DX50 with a better codec, with no loss in visible quality. And of course, with MPEG2 from DVDs you can gain around 80-90% space, and with MPEG-4 AVC or VC-1 from Blu-rays, around 50-80%, depending on your quality needs.
  • Generally, a 500MB file encoded from a Blu-ray will look much better than the same 500MB file encoded from a DVD, at the same resolution. Actually, you might even get a better resolution from the Blu-ray, at the same file size.

Acquiring target

For the target, there are basically three factors that matter in the overall quality: container, codec and encoder. Apart from resolution, of course, but there the maximum is dictated by the source.

  • Container is easy: It must support multiple audio streams, multiple subtitles, preferably in textual format (e.g. srt or ssa), and metadata, preferably also cover images. This Comparison of container formats makes clear this is Matroska, probably followed by MP4.
  • Codec is a bit more tricky. But basically, you want one of the newer ones; they offer consistently better quality at lower file size. That about leaves H.264 and VP9. You probably want H.264; Blu-rays mostly come in it already, and so do YouTube videos nowadays.
  • Stop using DIV3, DIVX, XVID, DX50 right now. They’re vastly inferior to what modern codecs deliver at half the file size.
  • Audio codecs don’t have a large influence on file size. But as AC3 can’t do VBR, you don’t want that, and MP3 can’t do more than 2 channels. That leaves AAC and Opus as viable options, which happen to be the defaults to go with H.264 and VP9 respectively. Don’t use AC3, and don’t use DTS; both are obsolete.
  • Fortunately, handbrake-gtk already comes with H.264 and AAC as defaults; you only need to set the container to Matroska, and you’re good (see the command-line sketch after this list). A quality factor RF of 20 is usually good; 25 is still acceptable; everything more is visually bad.
  • If you’ve already got a load of MP4 files encoded with H.264 and AAC, mmg (from mkvtoolnix-gui) can rewrite the container of the file to Matroska without re-encoding. And it also supports adding more audio tracks, subtitles and image attachments.
  • If you want to reduce the dimensions of the movie in order to reduce file size, don’t go below a width of 720. Actually, rather reduce the quality somewhat before reducing dimensions; the visual impact is less noticeable.
  • Don’t ever go for a “filesize of 700MB”, that’s just stupid. Nobody wants to put a movie on a CD (and actually most people wouldn’t, even 15 years ago).
  • But be careful about file size. Sadly, there are still VFAT filesystems out there which can’t contain files bigger than 2GB, some of them used by today’s “Smart” TVs.
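For reference, the same H.264/AAC/Matroska encode from the command line might look like this; treat the exact flag and encoder names as assumptions to check against HandBrakeCLI --help, since they vary between HandBrake versions:

# re-encode to H.264/AAC in a Matroska container at quality RF 20
HandBrakeCLI -i source.mpg -o target.mkv -f mkv -e x264 -q 20 -E av_aac

And rewriting an already H.264/AAC-encoded MP4 into a Matroska container without re-encoding, which is what mmg does graphically:

mkvmerge -o target.mkv source.mp4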

Dub stands for Dumb

There is only one reason for dubbing a movie — making it available to children who haven’t learned to read yet, and to the illiterate.

  • Whoever ever had, and has, the idea to do a voice-over instead of just leaving the original language alone and subtitling it, is a total moron. And so is everyone encoding a movie with such an audio track. However, it is acceptable to voice-over parts with foreign speakers in documentaries (but not the whole documentary!).
  • If you still want to encode a dubbed audio track, make sure to also include the original language track. If it’s not possible with your container format, you’re using the wrong one.
  • Since not everyone is expected to read every language, include all available subtitles. Again, if your container doesn’t allow that, you’re using the wrong one.
  • Hardcoded subtitles (within the movie stream itself) probably means you’re either a moron or using the wrong software. It’s only acceptable if the source had them too.
  • Those pesky vobsub files, which are actually (mpeg-)streams, can be OCR’d to text files (srt, ssa) with vobsub2srt. Whatever vobsub2srt cannot recognize can be OCR’d with SubRip (works with wine), for instance, but that will require heavy work. So you would be better off either getting them from opensubtitles.org or just including the vobsub (see the sketch after this list).
  • Subtitles that are out of sync can be fixed with subtitleeditor. If they just start late or early, you can also just set an offset within mmg (from mkvtoolnix-gui).
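A quick sketch of both subtitle steps; the --lang option and the pass-the-basename convention are assumptions to verify against your vobsub2srt version:

# OCR movie.idx/movie.sub into movie.srt
vobsub2srt --lang en movie

# mux an srt in with a fixed offset of +2.5 seconds (track 0 of the srt),
# the command-line equivalent of setting an offset in mmg
mkvmerge -o target.mkv source.mkv --sync 0:2500 subtitles.srt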

Finishing Touches

After having a decent file, you might want to add metadata and (if applicable) cover-images.

  • The minimum metadata you need to provide is title, year and director (yes, there are at least two movies with the same name, published the same year).
  • If the movie is a known published one, a scraper like MediaElch can fetch the metadata, and my nfo2xml can convert it into a Matroska metadata XML which can be muxed in with mmg, as sketched below.
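The muxing step from the command line, assuming nfo2xml has already produced a movie.xml next to movie.mkv:

# nfo2xml walks the directory and writes one .xml per .nfo it finds
nfo2xml
mkvpropedit movie.mkv --tags all:movie.xml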

Scanning Books on Linux

Monday, March 24th, 2014

I’ve been scanning books for a long time, and it’s always kinda problematic with the toolchain, and with the workflow. Now I’ve pretty much found out what works, and what does not.

As a note: All the shell code on this page assumes your files do not have spaces and other weird characters like “;” in them. If they do, my bicapitalize can fix that.

Scanner

The first thing you want to have is a decent scanner, preferably one with an automatic document feeder (ADF). According to the internet, what you need would be the Fujitsu ScanSnap iX500, since it appears to be the most reliable.

However, that’s not the one I have, mine is an EPSON Perfection V500, a combined flatbed/ADF scanner, which needs iscan with the epkowa interpreter. It works, but it’s rather slow.

Scanning Software

I mostly use xsane. With the epkowa-interpreter, it gives me some rather weird choices of dpi; that’s why I mostly scan at 200×200 dpi (I would recommend 300×300 dpi, but epkowa does not give me that choice, for some weird reason). Also, I usually scan to png, since this gives me the best choices later on, and is better suited to text than jpeg.

Of course, never scan black-and-white; always colour or greyscale. Also, don’t scan to pdf directly. Your computer can produce better pdf files than your scanner does, and also, you would need to tear the files apart anyway for postprocessing.

Get Images from PDF

If you happen to have your images already inside a pdf, you can easily extract them with pdfimages (from poppler-utils):

pdfimages -j source.pdf prefix

Usually, they will come out as (the original) jpeg files, but sometimes you will get .ppm or .pbm. In that case, just convert them, something like so:

for i in *.ppm; do convert $i `basename $i .ppm`.jpg; done

(The convert command is of course from graphicsmagick or imagemagick)

Postprocessing Images

Adjust colour levels/unsharp mask

Depending on how your scan looks, you might want to change colour-levels or unsharp mask first. For that, I’ve written some scheme-scripts for gimp:

batch-level.scm
batch-level.sh
batch-unsharp-mask.scm
batch-unsharp-mask.sh

The scheme files belong in your ~/.gimp-2.8/scripts/ directory, the shell scripts in your path. Yes, they’re for batch-processing images from the command line.

Fix DPI

If the DPI is screwed, or not the same for every file, you might want to fix that too (without resampling the image):

convert -density 300 -units PixelsPerInch source.jpg target.jpg

Tailor Scans

If your scans basically look good, as far as brightness and gamma are concerned, the thing you need is scantailor. With it, you can correct skew, artifacts at the edges, spine shadows, and even somewhat alleviate errors in brightness.

Be sure to use the same dpi in the output as in the input, as scantailor will happily blow up your output at no gain of quality. Also, don’t set the output to black-and-white, because this will most probably produce very ugly tooth-patterns everywhere.

You will end up with a load of .tif images in the out-folder, which you can either shove off to OCR directly, or turn into a pdf.

Don’t even try to use unpaper directly. It requires all the files converted to pnm (2MB jpeg will give 90MB pnm), and unless your scans are extremely consistent and you give it the right parameters, it will just screw up.

Create PDF from Images

We first want to convert the tif-images to jpeg, as it will be possible to insert them into a pdf file directly, without converting them to some intermediate format. Most of all, this will allow us to do it via pdfjam (from texlive-extra-utils) which will do it in seconds instead of hours.

for i in *.tif; do convert $i `basename $i .tif`.jpg; done

And then:

pdfjam --rotateoversize false --outfile target.pdf *.jpg

NEVER, ever use convert to create pdf files directly. It will run for minutes to hours at 100% load, fill up all your memory or your disk, and produce huge pdf files.

Create PDF Index

Even if your PDF consists entirely of images, it might still be worthwhile to add an index. You create a file like this:
[ /Title (Title)
/Author (Author)
/Keywords (keywords, comma-separated)
/CreationDate (D:19850101092842)
/ISBN (ISBN)
/Publisher (Publisher)
/DOCINFO pdfmark
[/Page 1 /View [/XYZ null null null] /Title (First page) /OUT pdfmark

And then add it to the PDF with gs:
gs -sDEVICE=pdfwrite -q -dBATCH -dNOPAUSE \
-sOutputFile=target.pdf index.info \
-f source.pdf

The upper part, the one with the metadata, is entirely optional, but you really might want to add something like this. There are some other options for adding metadata (see below).

Another option is jpdfbookmarks, however it doesn’t seem to be very comfortable either.

OCR

The end product you want with this is either a) a PDF (or EPUB) in which text is really native text and not an image of text, rendered in a system font, or b) a PDF in which the whole image is underlaid with text, in a way in which each image of a character is underlaid with the (hopefully correctly recognized) character itself.

Sadly, I don’t know any software on Linux which can do the latter. Unless you want to create an EPUB file, or a PDF which does not contain the original image on top of the text, you need to use some OCR software on some other platform. The trouble of course is, going all the way (no original image of the page) means your OCR needs to be perfect, as there is no way to re-OCR, or sometimes even no way to correct the text manually. And of course, the OCR software should retain the layout.

For books, doing a native text version is of course preferred, but for some things like invoices, you really do need the original image on top of the text.

Apparently, tesseract-ocr now incorporates some code to overlay images on text, but I haven’t tested that. Also, there seems to be some option with tesseract and hocr2pdf. But I’m not keen to try it, since ocropus, which can’t do that, has consistently had the better recognition rate, and even that one is lower than those of commercial solutions.
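Untested here, as said, but the tesseract variant would presumably be a one-liner (assuming a tesseract version that ships the pdf output config, 3.03 or later):

# writes page.pdf: the scanned image on top of an invisible text layer
tesseract page.tif page -l eng pdf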

Adding metadata to PDF files

I’ve written about this, and I’ve also written some scripts to handle this. You can do it by hand, with exiftool, or you can use my exif-meta which will do it automatically, based on the file- and directory-name, for a lot of files.

For books, unless Your name is “Windows User” and your scientific Paper is called “Microsoft Word – Untitled1” you want to at least add Author, Title, Publishing Year, Publisher. ISBN if you have one.

Needed software

On a decent operating system (Linux) with a decent package-management (Debian or derivative), you can do:

apt-get install scantailor graphicsmagick exiftool xsane poppler-utils texlive-extra-utils

to get all the packages. The rest is linked in the text.

See also

I’ve found some other related pages you might find interesting:

Life with Calibre

Tuesday, November 26th, 2013

Calibre is undisputedly the number one when it comes to e-book management. It’s HUGE. It’s got a plethora of functions.

And it’s got quirks, design decisions which may not suit your workflow. Certainly a lot of them don’t suit mine.

  • Calibre’s own space. Every document imported into the library ends up copied into some private directory of calibre, and named according to some /Author/Title/Title scheme. The way I cope with this is to import into calibre, and save-to-disk again (see the sketch after this list).
  • Metadata on the filesystem Metadata is stored not within the file, but in some database, and apparently in some opf-file with the book as well. Luckily, calibre tries to put metadata into the file when saving to disk. So the solution here is the same as above.
  • Name like Yoda, A When writing files, it misnames them to some library sort order, with the article appended at the end. To fix this, there’s a parameter in “Preferences” -> “Tweaks” -> “Control Formatting of Title and Series when used in Templates”, called save_template_title_series_sorting which needs to be set to strictly_alphabetic
  • No such Character There’s a set of characters Calibre does not want in file names. They are the same on all platforms, and while it’s not wise to use asterisks and such on unix filesystems, because they would wreak havoc on shell-processing, they would still work. The only character really not allowed is the “/”. But Calibre also replaces various characters that are ballast only on Windows, including desirable critters like “:” and “+”. The way to fix this is to edit
    /usr/lib/calibre/calibre/__init__.py and have them removed from _filename_sanitize_unicode.
  • Publishing by the Month Before the advent of e-books, publishing dates were by definition expressed in years. Copyright law also uses the year only. To get rid of the ridiculous month in the publishing date, go to “Preferences” -> “Tweaks” -> “Control how dates are displayed” and set gui_pubdate_display_format to yyyy
  • Not unique As librarians know, in the absence of ISBN, books are identified by author, title, publishing year and publisher. Now when saving pdf files, Calibre neither puts in an ISBN, nor the publishing year, nor the publisher. Apparently, this is a problem of podofo, which does not know these. Speaking of which:
  • podofail Sometimes podofo also fails to write some tags. It’s not quite clear when this happens, as none of my pdf files have any encryption, and exiftool can write metadata to them without problems.
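The import-and-save-back workflow from the first point can also be done from the command line; a sketch, with the exact flags to be checked against calibredb --help:

# import a book, then let calibre write it back out with its
# metadata embedded in the saved copy
calibredb add book.epub
calibredb export --all --to-dir exported/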

Over time, I’ve written a slew of scripts to read and set metadata, these are:

  • epub-meta (c) — A very fast EPUB metadata-viewer based on ebook-tools’ libepub from Ely Levy.
  • epub-rename (perl) — A script to rename epub-files according to the EPUB’s metadata. Needs epub-meta and ebook-meta (from calibre).
  • exif-rename (perl) — A script to rename files according to their EXIF-tags. Tested with PDF, DJVU, M4V, OGM, MKV and MP3
  • exif-meta (perl) — A script to set EXIF/XMP-metatags according to the filename.
  • exif-info (perl) — Displays metadata and textual content of PDF files. Intended as a filter for Midnight Commander

For further technical information and rants, you might want to read How to Enter EPUB Metadata, Display EPUB metadata and rename EPUB files accordingly, and Your name is “Windows User” and your scientific Paper is called “Microsoft Word – Untitled1″, also on this blog.

Minecraft: Semi-Automatic Farm

Thursday, October 24th, 2013

Welcome, this is my “1890 Fruit Company”, an automatic farm for Minecraft, which isn’t even about fruit. It looks rather 1890s, though, and I couldn’t resist the name.

1890 Fruit Co.

It produces potatoes, carrots, wheat and seeds. You need to sow and plant yourself; fertilizing and harvesting are pretty much automated, and the products are automatically sorted.

The schematic

The license of these files and my screenshots is the OPL 1.0 (which is about the same as CC-by-sa).

Matroska and the State of Movie Metadata

Saturday, September 21st, 2013

I like the metadata for my files within the file. The reason is simple: a filesystem is only temporary storage for a file, and things like filenames or paths only make sense within the filesystem itself. If you move the file, the next filesystem might not support your particular metadata.

Starting with the path. For instance, /movies/Pirate/ won’t exist on other people’s machines, and it actually can’t even exist on stupid windows filesystems. So the fact that the file residing within this path is probably a pirate movie would get lost. And of course, not every filesystem supports all characters or encodes them the same way, and thus the movie “Pippi Långstrump på de sju haven” might end up with a totally garbled title on another filesystem.

Since I work on the Unix shell and on the web a lot, spaces in filenames tend to get garbled (“%20”) or interfere with commandline processing. So my filenames do not have spaces or umlauts in them, they are instead BiCapitalized. In fact, I’ve written a program bicapitalize to do just that.

Enter Matroska

When it comes to metadata, the one container format that can contain just about everything is Matroska. MP4 would be a possibility, but it’s rather constricted in its use of subtitles, codecs, audio tracks or even cover images. Also, Matroska looks much less “designed by committee” than MP4 does, and is generally better supported by open source software. Not quite well enough, as we’ll see.

To get from, say, avi containers to mkv is easy (after apt-get install mkvtoolnix):

for i in *.avi; do mkvmerge -o `basename $i .avi`.mkv --language 1:eng --title "`respacefilter $i | cut -d . -f 1`" $i ; done

This only changes the container, it won’t recode anything. It usually works with avi, mp4, mpeg, flv, ogm and more, but not with wmv.

You’ll notice the program respacefilter, which I’ve written to display BiCapitalized filenames as strings containing spaces. And if you’ve got some experience with the unix shell, you’ll also notice the above commandline will fail for files containing spaces. That’s exactly the reason why spaces in filenames are bad.

The above command also sets the “Title” tag to something probably meaningful, and the language of the first audio track to English. You can change the latter later on with
mkvpropedit --edit track:a1 --set language=ger target.mkv

If the title is screwed, you could set it with
mkvpropedit --edit info --set title="This Movie" target.mkv

Of course, if you already do have Matroska files, and their title tags are not set or wrong, you might not want to set all titles by hand. I’ve also written a script called titlemkv to fix this. It can also fix some drawn-out all-caps acronyms. Apart from the mkvtools, this needs mediainfo (install on Debian/Ubuntu with apt-get install mediainfo).

All the above can also be done, one file at a time, with the graphical interface mmg (of course: apt-get install mkvtoolnix-gui).

By now, you should have all your movie files in Matroska containers, and if not, because things like wmv files or files containing ancient codecs can’t just be re-containered, there’s HandBrake (as usual, apt-get install handbrake-gtk)

Matroska Metadata

Apart from the title and the languages of audio tracks and subtitles, Matroska files do not contain any metadata directly. Instead, the metadata goes into an XML file, which is muxed into the container. Which makes the whole process obviously rather tedious. You don’t want to do it by hand.
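For illustration, a minimal tags file might look like this; a sketch, with the canonical element names to be checked against the Matroska tagging spec:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Tags SYSTEM "matroskatags.dtd">
<Tags>
  <Tag>
    <Targets>
      <TargetTypeValue>50</TargetTypeValue> <!-- 50 means "movie" -->
    </Targets>
    <Simple>
      <Name>TITLE</Name>
      <String>This Movie</String>
    </Simple>
    <Simple>
      <Name>DATE_RELEASED</Name>
      <String>1985</String>
    </Simple>
  </Tag>
</Tags>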

Also, it turns out, most applications do not read any metadata from the containers AT ALL. mediainfo of course can do it. So can avinfo, surprisingly. vlc can display most of it in a special window. mpv will display the Title as the window title. But the ones really needing metadata, the media center applications, CAN’T. Neither MythTV, nor xbmc. Instead, both of these rely on filenames, and put the metadata into their database, with the added option of using some accompanying file with the movie which gets interpreted as well.

To add insult to injury, given one of these accompanying files with correct data, xbmc will display it, but when trying to fill in the blanks, it will happily try to look it up — by interpreting the filename again, wrongly. At least MediaElch can do this right (and that’s why it gets linked).

So the questions are a) how do we get these “accompanying files” (assuming they’re really needed for getting metadata from the web), b) how do we get better metadata into them, and c) how do we put this metadata into the files themselves.

For this, titlemkv can produce a rudimentary .nfo file for xbmc when given the -n switch. It will contain the title, and the year, if it is already set in the mkv. Going from this, MediaElch or any other non-broken scraper can now fill in the blanks and produce .nfo files which contain a lot of information, like directors, actors, summaries and so on.

The last piece is my nfo2xml script, which will walk over a directory and produce a mkv-compatible XML file out of every .nfo file it finds. The XML can then be muxed into the mkv container, thus:
for i in *.mkv; do mkvpropedit $i --tags all:`basename $i .mkv`.xml ; done

The Future

I’ll probably update titlemkv to generate complete .nfo files from mkv metadata (or split that functionality into another program). Also, I want to look at the question of how to incorporate cover images and such. First, I want all my files to contain useful metadata; and second, as long as this sorry state persists, I want to be able to generate whatever external metadata an application wants out of the incorporated metadata (which has its own merits: I would also be able to rename and sort my whole collection solely according to the metadata in the files themselves).

(Edit 1: I wrote a rather stupid shellscript mkvattachcover to convert and attach cover images. It expects them with the filenames provided by MediaElch.)
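Attaching a cover by hand works roughly like this (a sketch of what mkvattachcover automates; the Matroska convention expects the attachment to be named cover.jpg):

# attach a cover image to an existing Matroska file in place
mkvpropedit movie.mkv --attachment-name cover.jpg \
  --attachment-mime-type image/jpeg --add-attachment cover.jpg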

(Edit 2: For use with mediainfo --inform=file:///path/to/AVinfo.csv I put up a decent template, AVinfo.csv, which will show Matroska-specific tags. No, I have no idea why they’re calling their templates .csv; they aren’t.)

But crucially, the media center applications and the file managers will need to support metadata incorporated into files; just as one expects with audio files where this is absolutely the case.

Metadata MUST reside within the same file. I do understand that certain programs do not want to incorporate code to change this metadata, but just about everything accessing these files must be able to READ it, including media players, scrapers and file managers.

(Edit 3: nautilus displays either cover.jpg or small_cover.jpg as icon. But that’s it, apparently it can’t read any other metadata.)

Patents on Bronze Age Technology

Friday, May 10th, 2013

This here is from Apple’s Slide-to-Unlock patent, which is currently being invalidated.
Slide to Unlock Patent
However, the question remains why this could be granted in the first place. Laziness? A case of “it said computer, so I turned off my brain”? Or job-blindness: “I couldn’t find any prior art in the patent database”?

Because the amount of prior art is actually staggering. This here is one of the earliest I could casually find:
 Abydos King List. Temple of Seti I, Abydos
Yes, it’s hieroglyphs, and they’re from roughly 1290 B.C. The topmost hieroglyph is a “z” (or hard “s”), and the symbol is that of a door bolt. And since hieroglyphs are rather old, and Seti I was by no means one of the early pharaohs, this means there’s most probably much older evidence out there for “slide-to-unlock”.

And I’d wager there’s so much more of this crap out there. Chances are very slim that this is an isolated case; this is most probably endemic, inherent in the system.

Your name is “Windows User” and your scientific Paper is called “Microsoft Word – Untitled1”

Saturday, April 20th, 2013

At least that is what I get from the metadata in your publication.

Google finds about 250’000 of these papers. It gets much worse if you only search for documents called “untitled1”. Not just the documents themselves have this meta-information, but all kinds of conversions, to html, and to pdf as well.

Sometimes, to make the whole thing even more ironic, the publisher has added his own information — but neither the title, nor the author.

Yes, metadata is kind of a pet issue for me, and I’ve even written about How to Enter EPUB Metadata, apart from also having written software to fix metadata in PDF and epub files (epub-meta/epub-rename and exif-meta/exif-rename. The latter works for PDF; the name comes from exiftool, although technically the PDF metadata is XMP).

But still, if your paper should be worth anything, it should be worth being found, and that also means being provided with accurate meta-information.

Librarians work with an ISBN where one exists, and if no ISBN can be found (because the work was published before 1969, or because no ISBN was ever registered), they need the following to correctly identify a work:

  • Author
  • Title
  • Publishing Year
  • Publisher

So you should take care that at least the first three of those are correctly filled in. If you’re doing a paper or book in the course of your work or study and publish it on the internet, consider entering the university or company as publisher.
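With exiftool, filling these in on a PDF might look like this; a sketch with hypothetical values, the Publisher and Date fields going into XMP (and assuming the file isn’t encrypted):

exiftool -Title="Some Paper" -Author="J. Doe" \
  -XMP-dc:Publisher="Example University" -XMP-dc:Date=2013 paper.pdf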

Minecraft: Mob Factory

Tuesday, December 18th, 2012

I noticed some time ago that multiple spawners can be active at the same time, as long as the player is within 16 blocks of each of them. However, if too many mobs of the kind a spawner spawns are within 9x16x16 blocks around it, it stops spawning after 6 mobs or so.

So in principle, it must be possible to have a lot, maybe 8, spawners, complete with their delivery and killing systems, within a sphere of 16 blocks around the player, all churning out mobs and items. So that’s where I started:

In green you can see the 16-block sphere around where the player would be standing; in yellow are the 9x16x16 areas where no mobs of the same type should be (and consequently, the area any spawned mobs need to leave as soon as possible). The cyan circle is the ground layout, and of no consequence. The spawners along with their spawn-boxes are in brown and in stone. The structures made of end-stone are elevators and droppers; to the left is one for skeletons, to the right one for cave spiders.

This made for a rather cramped internal layout, with 7 spawners and all the mobs which needed to be led out, upwards, thrown down, and led to the middle again. Plus the redstone, mostly for lighting. It was a mess: the spider-grinder didn’t really work, for blazes and endermen I hadn’t implemented any automatic system, and I didn’t know where to put them for lack of space.

Then I watched Etho Plays Minecraft – Episode 234: Egg Delivery, where he demonstrated that with Minecraft 1.4, items will go up through a solid block if there’s no other space around where they could go. So I redesigned the whole interior. I decided that only blazes would be left to kill for XP, and the other mobs would just get killed as soon as possible, and their items sent up to the central room.

This I did. And I moved the spiders to one side, making space for another spawner, for slimes, bringing it to the whole 8 spawners I initially envisioned. Of course, if I hadn’t cared for isolating zombies, creepers and skeletons from each other, it would have been possible to put in more spawners. Probably all of them. So this isn’t as efficient as it could be.

I initially had some problems with the redstone circuits, but I finally realised that something simpler would do the job just as well. Now it’s only two clocks, one for the item elevator and one for the grinder, a T-flipflop, also for the grinder, and a pulser, for sweeping out items.

The two mobs posing the biggest problems were blazes and slimes. Blazes, because they need a light level of 11 or higher in order not to spawn (which I solved with lots of redstone lamps and a smaller spawn area), and the slimes, which would spawn in any light. I now put half their spawn area under water when spawning is turned off, but small slimes still spawn. For the cave spiders, I just turned the item elevator above them into a killing machine, killing spiders and sending up items at the same time.

Right now, I’m still not entirely happy with the blaze-situation, I would like to have them delivered to the central room, so I can kill blazes while I wait for items, but I’ve not yet found a good solution.

Finally, I couldn’t resist giving the thing a facade, and I decided upon a late 19th century industrial look. Half of it is buried in the ground, which makes the main control room in the middle of the structure easily accessible from ground level:

I call it “The Manufacture”, although it’s of course not one. But this fits with the 19th century theme, where factories sometimes were still called manufactures, although the production wasn’t really “handmade” any more. And it works day and night ;):

Level and schematic:

Update:

Mob Products

Minecraft 1.5 is out, and it makes item-handling so much easier. So this is the totally revamped mob factory, now called “Mob Products”, featuring lettering on the roof (idea and font by Etho), and using hopper conveyors, dropper elevators and an item sorter.

Also, I went over it with 1.6.2 and the HostileSpawnOverlay mod, and fixed some lighting.

Update 2:

Mob Products Front

I fixed the typo on the roof ;).

No new level but simply the schematic:

The license of these files and my screenshots is the OPL 1.0 or CC-by-sa 3.0.