Acquiring Images - To Scan or Not to Scan?

Traditionally there are two ways of getting physical text into digital form: typing or scanning. Typing isn't really going to happen, is it? So let's move on to scanning. Hand-held scanners don't traditionally produce very steady or high-quality scans, and aren't very popular. Most people, however, have access to a flat-bed A4 scanner, and you can pick one up from eBay for a pittance. If this is the route you choose to go down, I'd go for a relatively fast USB one (parallel-port ones are horrendously slow!). Make sure the lid comes off; it'll make life a lot easier.

Now I'll take you through the method I use. I haven't found much info about it on the web, but I've found it by far the best. The Gutenberg guide refers to something called a planetary scanner, which is basically a high-end digital camera mounted on a stand, and which retails at US$20,000 upwards. That got me thinking. I know for a fact that the guide was written a number of years ago, and digital imaging has moved on in leaps and bounds since then. So I started playing with my camera...

Now, my camera is not bottom-end: it's a Sony Alpha A100 10 MP DSLR, and it set me back a small fortune, but adequate images should be possible with lower-end cameras as well. I use a 25-70mm lens, normally at about 50mm. I leave the ISO, shutter speed and aperture on automagical. I pop up the on-board flash (the paper in the sample book isn't shiny, so the flash isn't a problem). I open the book and put it on the floor between my feet. I hold the book open and the page flat with my left hand, and with my right I hold the camera in portrait orientation with the strap out of the way. I adjust the zoom so one page is showing, with a bit of space around it. I let the auto-focus do its thing and... CLICK
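To get a feel for why a 10 MP camera is enough, here's a rough back-of-the-envelope sketch. It assumes a 3:2 sensor of roughly 3872 x 2592 pixels (about 10 MP) or 3000 x 2000 (about 6 MP), held in portrait, with the frame covering about 200 mm of page height including the margin. Those numbers are illustrative assumptions, and the 300 dpi benchmark is just a commonly quoted comfortable level for OCR, not a hard rule.

```python
MM_PER_INCH = 25.4


def effective_dpi(sensor_px_long: int, framed_height_mm: float) -> float:
    """Approximate pixels per inch along the page height.

    sensor_px_long:   pixels along the sensor's long side (with the camera
                      in portrait, this runs along the page height)
    framed_height_mm: real-world height covered by the frame, i.e. the page
                      plus the bit of space left around it
    """
    return sensor_px_long / (framed_height_mm / MM_PER_INCH)


# Illustrative assumptions: ~10 MP (3872 x 2592) and ~6 MP (3000 x 2000)
# sensors, each framing about 200 mm of page height.
for label, long_side in (("10 MP", 3872), ("6 MP", 3000)):
    print(f"{label}: about {effective_dpi(long_side, 200):.0f} dpi")
```

This prints "10 MP: about 492 dpi" and "6 MP: about 381 dpi". Both clear 300 dpi, which fits the idea that lower-end cameras should cope too, although focus, lighting and lens quality matter as much as the pixel count.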
Some things to remember...
  • You don't want the page too dark or too light, and you may need to adjust the settings.
  • You want the page as FLAT as possible. Even a slight curvature can screw things up a lot!
  • Try and keep the pages straight and parallel with the sides of the photo. The OCR process can straighten to a certain extent, but you may as well make its life easy.
  • If the end of the chapter only has a few lines on the page, point the centre of the lens at the writing rather than the centre of the page. This normally helps it auto-focus. Manual focusing often goes wrong.
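If you'd rather not eyeball every shot against the first tip, a quick brightness check can flag photos worth retaking. This is a minimal sketch assuming greyscale pixel values from 0 to 255; the TOO_DARK and TOO_LIGHT thresholds are made-up starting points to tune against your own shots, and it says nothing about flatness or straightness.

```python
from statistics import mean

# Hypothetical thresholds on a 0-255 greyscale scale. Tune them for your
# own camera, lighting and paper; they are assumptions, not standard values.
TOO_DARK = 90
TOO_LIGHT = 235


def exposure_verdict(pixels) -> str:
    """Classify a page photo by its mean greyscale brightness.

    `pixels` is any iterable of 0-255 greyscale values. With Pillow
    installed, one way to get them is:
        from PIL import Image
        pixels = Image.open("page.jpg").convert("L").getdata()
    """
    avg = mean(pixels)
    if avg < TOO_DARK:
        return "too dark"
    if avg > TOO_LIGHT:
        return "too light"
    return "ok"


print(exposure_verdict([200] * 100))
```

A mostly white page with dark text usually averages fairly high, so expect to nudge TOO_LIGHT once you've seen what a good shot scores.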
How many photos you take in one session depends on a number of things. Now that I've got the process down, I tend to do a book in one long go. This has upsides and downsides, which we'll go over as we hit them. For now you may find it easier to do a chapter at a time. I'm hoping to get hold of a tripod and remote release, which should make the process less physically hard work. As a rule of thumb, a 200-page book should take about half an hour to an hour to photograph once you get going.
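Put another way, assuming one photo per page (one page in the frame, as above), that rule of thumb works out to one shot every 9 to 18 seconds. A trivial sketch of the arithmetic, handy if you want to plan a session for a book of a different length:

```python
def seconds_per_page(pages: int, minutes: float) -> float:
    """Average time per shot, covering the page turn, flattening and click."""
    return minutes * 60 / pages


for minutes in (30, 60):
    print(f"{minutes} min: {seconds_per_page(200, minutes):.0f} s per page")
```

This prints "30 min: 9 s per page" and "60 min: 18 s per page".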


Back Up! (28 Sep 2008 22:58)
Well, after a period of nigh on a year, my website is back online. I'll try to post more details about my life shortly, but to summarise: I'm on a new host, the gallery is now based on Flickr, and that's about it for the site! More to come in the future (including an easier-to-use colour scheme, as some people don't like grey!)
Website Issues (28 Oct 2007 13:37)
I've just moved the site over to a new host, which (I hope) is substantially more reliable, and which definitely gives me a lot more space! However, until I get the sub-domains moved over and the database sorted, there may be a few issues. Deal with it :)
PGCE (09 Oct 2007 07:28)
As most people now know, I am currently studying for a PGCE, a Postgraduate Certificate in Education. Yes, I'm learning to become a teacher! Specifically, a secondary AICT teacher. It's good; I'm really enjoying the course, but as you may have noticed from the lack of recent posts, I'm very busy. More information will be forthcoming!
Digital Photoframes (09 Oct 2007 07:24)
My digital photo frame project is now defunct. Partly 'cos I broke the laptop I was using, but mainly because I have received two digital photo frames as presents. They're very pretty!
Digital Photoframe (18 Jun 2007 04:25)
I've always liked the idea of them, but (a) I resent paying for ANYTHING, and (b) the ones available commercially just don't seem to hit the spot. Read how I made my own here.
Timelines (15 Jun 2007 05:42)
I've been playing with timelines recently, and have got SIMILE's timeline working. It's FUN! Check through my links to have a look at the timelines I'm creating at the moment!
New CMS (01 May 2007 03:43)
I've been working on my content management system a lot recently. I've adjusted the way a document set is saved on the server. They're now stored as what I refer to as stories, and the content is kept completely separate from the display. This lets me use the built-in editor in the CMS to edit stories really easily. Good, innit? You guys don't care, and probably won't see any difference, but I'm chuffed :)
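For anyone curious what "content kept separate from the display" means in practice, here's a minimal sketch of the idea. It isn't the CMS's actual code: the Story fields, the JSON storage format and the HTML template are all hypothetical, chosen just to show content being saved on its own and only merged with the layout at render time.

```python
import html
import json
from dataclasses import asdict, dataclass
from string import Template


@dataclass
class Story:
    """Content only; no layout or styling lives here (hypothetical fields)."""
    title: str
    date: str
    body: str


# Display only: a hypothetical HTML template with named placeholders.
PAGE = Template('<h2>$title</h2>\n<p class="date">$date</p>\n<div>$body</div>')


def save(story: Story) -> str:
    """Serialise a story's content to JSON for storage on the server."""
    return json.dumps(asdict(story))


def load(raw: str) -> Story:
    """Rebuild a Story from its stored JSON."""
    return Story(**json.loads(raw))


def render(story: Story) -> str:
    """Merge stored content with the display template at view time.

    Title and date are HTML-escaped; the body is assumed to be trusted
    HTML produced by the CMS's own editor.
    """
    return PAGE.substitute(
        title=html.escape(story.title),
        date=html.escape(story.date),
        body=story.body,
    )


story = Story("XJ600", "06 Mar 2007", "<p>I've acquired a new bike!</p>")
print(render(load(save(story))))
```

The payoff of this split is that restyling the site only means editing PAGE; the stored stories, and the editor that works on them, stay untouched.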
XJ600 (06 Mar 2007 01:24)
I've acquired a new bike! A 1984-model Yamaha XJ600 Japanese import, one of the first Japanese 600s to come onto the market. Read the restoration story here.
Books books books (16 Jan 2007 06:17)
I've got a lot of books. I do mean a lot. So I'm selling loads off, as most of them are stuff I never read/wanted/understood. And what better way to do this than to have a bit of interweb fun? The cataloguing system uses a barcode reader and a lookup on both isbndb.org and Amazon. This captures maybe 75% of the books (many are bizarre ones!). The results then go into a database, and I am working on an automatic script to post to Gumtree. I'm also thinking about getting together with some friends who've purchased a large number of surplus books, and using something like Abe to sell them in a more formal manner. I will of course also be setting up a website of my own for direct sales, as all the major ones (eBay, Abe, Amazon) charge a flippin' fortune! The website will be up at books.londonis.co.uk once it's ready.
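The barcode on a book is normally an EAN-13, which for books is the ISBN-13, so a sensible first step before any lookup is to check the scan and store it. Below is a minimal sketch of that step using Python's built-in sqlite3. The table layout and the open_catalogue/add_scan names are hypothetical, and the isbndb.org and Amazon lookups are deliberately left out rather than guessed at. The ISBN-10 conversion only applies to codes starting 978, and is there because some listings still use the 10-digit form.

```python
import sqlite3
from typing import Optional


def is_valid_isbn13(code: str) -> bool:
    """True if `code` is 13 ASCII digits with a correct EAN-13 check digit."""
    if len(code) != 13 or not (code.isascii() and code.isdigit()):
        return False
    # Weights alternate 1, 3, 1, 3, ... across the first 12 digits.
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(code[:12]))
    return (10 - total % 10) % 10 == int(code[12])


def isbn13_to_isbn10(code: str) -> Optional[str]:
    """Convert a valid '978' ISBN-13 to ISBN-10 ('979' codes have no ISBN-10)."""
    if not is_valid_isbn13(code) or not code.startswith("978"):
        return None
    core = code[3:12]  # the nine digits shared by both forms
    # ISBN-10 weights run 10, 9, ..., 2; the check digit is mod 11 ('X' = 10).
    total = sum(int(d) * w for d, w in zip(core, range(10, 1, -1)))
    check = (11 - total % 11) % 11
    return core + ("X" if check == 10 else str(check))


def open_catalogue(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the hypothetical catalogue database."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS books ("
        " isbn13 TEXT PRIMARY KEY,"
        " isbn10 TEXT,"
        " title TEXT,"  # left NULL here; a lookup step (not shown) would fill it
        " status TEXT NOT NULL DEFAULT 'for sale')"
    )
    return conn


def add_scan(conn: sqlite3.Connection, code: str) -> bool:
    """Record one barcode scan; returns False for invalid codes."""
    code = code.strip()
    if not is_valid_isbn13(code):
        return False
    # INSERT OR IGNORE means scanning the same book twice is harmless.
    conn.execute(
        "INSERT OR IGNORE INTO books (isbn13, isbn10) VALUES (?, ?)",
        (code, isbn13_to_isbn10(code)),
    )
    conn.commit()
    return True


conn = open_catalogue()
add_scan(conn, "9780306406157")
print(conn.execute("SELECT isbn13, isbn10, status FROM books").fetchall())
```

This prints [('9780306406157', '0306406152', 'for sale')]. Rows whose title is still NULL after the lookup step would be a natural list of the "bizarre ones" to catalogue by hand.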