./Proofing Images - The most important stage!\.

Proofing images doesn't take long. I use ACDSee, and just blast through the images one at a time. What you're looking for here is blurred or out-of-focus text, or curvature of the page. Either of these often causes the OCR process to go totally wrong, so time spent replacing the duff images with good ones is time well spent. I re-photograph the blurred or otherwise useless images, and rename them while they are still on the camera so they have the same filenames as the ones they are to replace. This means that when they are copied over, they overwrite the existing ones. See below for common problems that can be picked up by proofing, and what trouble they can cause.
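If you'd rather not rely on a drag-and-drop overwrite, the copy-over step can be scripted. What follows is only a rough sketch, not how I actually do it: the function name and the two folder arguments are made up for illustration, and it assumes the retakes have already been renamed to match the images they replace.

```python
import shutil
from pathlib import Path


def replace_with_retakes(retake_dir: Path, book_dir: Path) -> list[str]:
    """Copy re-photographed images over the originals with the same filename.

    Returns the names of the files that were overwritten. Retakes with no
    matching original are skipped, since that usually means a renaming slip.
    Both folder arguments are hypothetical; point them wherever your
    photos actually live.
    """
    replaced = []
    for retake in sorted(retake_dir.iterdir()):
        target = book_dir / retake.name
        if retake.is_file() and target.is_file():
            shutil.copy2(retake, target)  # overwrites the duff image
            replaced.append(retake.name)
    return replaced
```

Calling it with the camera folder and the book folder gives back a list of what was replaced, which is handy for checking against your list of duff images.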


Some people may wish to edit the images in something like Photoshop, and this is open to debate. The problem is that Photoshop can change the resolution of the photos, and possibly the physical size, which can mean the OCR software won't play nicely. The upside of using Photoshop would be access to sharpening filters and what-not to improve the readability of the images. If someone can send me some useful actions that improve images substantially I'll think about changing my mind, but as it is I recommend NOT using Photoshop.

At the same time, you may choose to break your images down into manageable chunks. Doing the OCR in one large go means that there is less post-processing to do, fewer files to play around with, etc. However, it also means that if you mess something up you mess up the whole thing, and some readers (such as the notepad on my old smartphone) won't handle such large files. It also means you can't chop and change what you're doing, so you can get VERY bored of it! I sometimes do full books, but normally do one chapter at a time: I create folders 01 to, say, 22, then drop the photos from each chapter into the relevant folder.
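The chapter folders can be filled by a script rather than by hand. This is a minimal sketch under a couple of assumptions: the photos have zero-padded camera filenames (so sorting by name gives page order), and you have noted the filename of the first photo of each chapter. The function name and the `first_photos` parameter are invented for this example.

```python
import shutil
from bisect import bisect_right
from pathlib import Path


def sort_into_chapters(photo_dir: Path, first_photos: list[str]) -> dict[str, list[str]]:
    """Move photos into numbered chapter folders (01, 02, ...).

    first_photos holds the filename of the first photo of each chapter.
    Assumes filenames sort into page order, e.g. IMG_0001.JPG, IMG_0002.JPG.
    Returns a mapping of folder name to the photos moved into it.
    """
    starts = sorted(first_photos)
    # List the photos up front so the new chapter folders are never scanned.
    photos = sorted(p for p in photo_dir.iterdir() if p.is_file())
    moved: dict[str, list[str]] = {}
    for photo in photos:
        chapter = bisect_right(starts, photo.name)  # 1-based chapter number
        if chapter == 0:
            continue  # comes before the first chapter, so leave it alone
        folder = photo_dir / f"{chapter:02d}"
        folder.mkdir(exist_ok=True)
        shutil.move(str(photo), str(folder / photo.name))
        moved.setdefault(folder.name, []).append(photo.name)
    return moved
```

Folders are only created for chapters that actually receive photos, so a typo in `first_photos` shows up as a missing or oddly sized folder rather than a pile of empty ones.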

Back Up! ./28 Sep 2008 22:58\.
Well, after a period of nigh on a year, my website is back online. I'll try to leave more details on my life shortly, but to summarise, I'm on a new host, the gallery is now based on flickr, and that's about it for the site! More to come in the future (including an easier-to-use colour scheme; some people don't like grey!).
Website Issues ./28 Oct 2007 13:37\.
I've just moved the site over to a new host, which (I hope) is substantially more reliable, and which definitely gives me a lot more space! However, until I get the sub-domains moved over, and the database sorted, there may be a few issues. Deal with it :)
PGCE ./09 Oct 2007 07:28\.
As most people now know, I am currently studying for a PGCE. This is a Postgraduate Certificate in Education. Yes, I'm learning to become a teacher! Specifically, a secondary AICT teacher. It's good, I'm really enjoying the course, but as you may have noticed by the lack of recent posts, I'm very busy. More information will be forthcoming!
Digital Photoframes ./09 Oct 2007 07:24\.
My digital photo frame project is now defunct. Partly 'cos I broke the laptop I was using, but mainly because I have received two digital photo frames as presents. They're very pretty!
Digital Photoframe ./18 Jun 2007 04:25\.
I've always liked the idea of them, but (a) I resent paying for ANYTHING, and (b) the ones available commercially just don't seem to hit the spot. Read how I made my own here
Timelines ./15 Jun 2007 05:42\.
I've been playing with timelines recently, and have got SIMILE's timeline working. It's FUN! Check through my links to have a look at the timelines I'm creating at the moment!
New CMS ./01 May 2007 03:43\.
I've been working on my content management system a lot recently. I've adjusted the way a document set is saved on the server. They're now stored as what I refer to as stories, and the content is kept completely separate from the display. This enables me to use the in-built editor in the CMS to edit stories really easily. Good innit? You guys don't care, and probably won't see any difference, but I'm chuffed :)
XJ600 ./06 Mar 2007 01:24\.
I've acquired a new bike! A 1984 model Yamaha XJ600 Japanese import. The first Japanese 600s to come onto the market. Read the restoration story here.
Books books books ./16 Jan 2007 06:17\.
I've got a lot of books. I do mean a lot. So I'm selling loads off as most of them are stuff I never read/wanted/understood. And what better way to do this than to have a bit of interweb fun? So the cataloguing system uses a barcode reader and a lookup on both isbndb.org and amazon. This captures maybe 75% of the books (many are bizarre ones!). This is then going into a database, and I am working on an automatic script to post to gumtree. Also thinking about getting together with some friends who've purchased a large number of surplus books, and using something like abe to sell them in a more formal manner. I will of course also be setting up a website of my own for direct sales, as all the major ones (ebay, abe, amazon) charge a flippin' fortune! The website will be up at books.londonis.co.uk once it's ready.