Wednesday, November 15, 2006

OCRing Books for classes

OK, this is a rough, unedited comment that I posted on Student Tablet PC.com.
I have 10 minutes before I need to get ready for classes, so here it is.

Well, let's just say that this whole "paperless" experience is becoming a pain. I LOVE my tablet PC. I love having my books as PDFs, apart from the occasional headache from staring at a computer screen all day. My problem is this:
When I first got my tablet and bought my OpticBook scanner, I was scanning away at 300 DPI grayscale. My books were enormous. I even OCRed them, and that added only a tiny amount of space. Cool, I thought! Then I found out I was doing it "wrong." I kept the page images and OCRed the book on top of them, but I never spent the time to correct the OCR errors. Keeping the images (and the file size) is what let me get away with that, because I was still reading the scanned page, not the recognized text.
To get a PDF down to a really small size (a 3-gig book down to 5 megs), you have to throw the images away, and then you'd literally spend DAYS correcting every stupid little spacing problem, every period it reads as a comma, every sigma or alpha or square root or fraction or... You get my point. It's much, much, much more hassle than it's worth. And to think I spent thousands of dollars to do this. I spent more time this quarter trying to figure out how to use my tablet efficiently than actually using it, and I got lost doing it. I love that there are options, but this one is nowhere close to what it should be. The OCR technology isn't there yet. I used ABBYY FineReader Pro 8.0, the latest and greatest (ranked #1 by businesses and ratings sites), and it still falls short by a wide margin.
As I said, I went into this experience a year ago with wide-open eyes and great expectations. I still use my tablet daily (I'm taking it to school this morning for my round of Econ, Finance and Math classes), and I've actually been trying to get hold of Fujitsu to see if I can get some sort of return, because it falls far short of what their sales and repair staff promised. This isn't really a problem with Fujitsu as a whole. I say that because I still love the form factor of this computer (much like I'm sure Tracy loves her Motion), but the technology isn't there yet. I constantly read what all of these tablet websites say. I also watch the ink shows the way an economist would. I think it's lacking, period.
It will get there, but I don't know when, and I don't know how many trendsetters' and early adopters' dollars it will cost before it actually delivers what it should.
That small rant aside, even 600 DPI still gives me errors on a printed page with a thousand stock quotes. Many math symbols aren't in English ASCII, so they won't be recognized. Pictures have to be resized and bordered, and you'll spend a lot of time going over everything in your software to make sure it's perfect. Rs are seen as

Good luck. This is the last book I'll scan and try to turn into a text-only, ebook-style PDF. I'd rather buy one for $80 than spend two or three days on it.