[nzlug] computer read/write access to paper?
Michael Field
michael.field at concepts.co.nz
Mon Jun 25 08:39:49 NZST 2007
> Vague thoughts: I could hack up my own method by choosing a subset of
> characters which are consistently well recognized by ordinary OCR
> software.
>
> Karl.
Now that is simple to test...
I've got a program that I wrote to allow me to suck binary files down a
dial-up terminal connection (without X/Y/Z modem), which I slightly
modified. You could use that, but just customize the alphabet. My goal
was a small encoder that I could cut-and-paste to the remote system
and compile there, or write the equivalent in any other handy language.
# echo "this is a test" | ./ocrdump
EHJGIGDHACIGDHACBGACEHFGDHEHKA
# echo "EHJGIGDHACIGDHACBGACEHFGDHEHKA" | ./ocrundump
this is a test
#
Should be OK to test ideas with - it might actually be useful. Just
customize "outchars[]" in both programs to use the alphabet of your
choice.
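
If you do swap in your own alphabet, it is worth checking it before printing anything, because the decoder looks characters up by position and a duplicate would silently decode to the wrong nibble. A minimal sketch of such a check follows; alphabet_ok() is a hypothetical helper and is not part of ocrdump or ocrundump, and the "printable, no whitespace" rule is my assumption about what OCR can cope with.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of ocrdump/ocrundump.
 * Returns 1 if the 16 characters in alphabet[] are all distinct,
 * printable ASCII and not whitespace; returns 0 otherwise. */
int alphabet_ok(const char alphabet[16])
{
    size_t i, j;

    assert(alphabet != NULL);
    for (i = 0; i < 16; i++) {
        unsigned char c = (unsigned char)alphabet[i];

        /* Reject control characters, space and non-ASCII */
        if (c <= ' ' || c > '~')
            return 0;
        /* Reject duplicates: each nibble needs its own character */
        for (j = i + 1; j < 16; j++)
            if (alphabet[i] == alphabet[j])
                return 0;
    }
    return 1;
}
```

Calling it once at the top of main() in both programs, with outchars as the argument, would catch a bad edit early.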
A 'real' solution would most probably use:
- Sequence numbers at the start of lines (of some sort), in case the
pages get shuffled!
- Distributed ECC to correct one or two character errors on the line.
- Checksumming of lines to ensure integrity.
Even just implementing a per-line parity character would be pretty good
at picking up most errors. But checksumming combined with ECC would
really reduce the number of OCR errors that go undetected or need
fixing up manually.
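
As a rough sketch of the per-line parity character idea, assuming the check character is simply the XOR of all the 4-bit values on the line, written out in the same alphabet: line_check_char() and line_is_valid() are hypothetical names, and this is one possible scheme rather than anything the two programs below do. It catches any single misread character, but not swapped characters or two errors that cancel out.

```c
#include <assert.h>
#include <string.h>

/* Same 16 characters as outchars[], plus a NUL so strchr() can be used */
static const char alphabet[17] = "ABCDEFGHJIKLMNPQ";

/* Map a character back to its 4-bit value, or -1 if it is not in
 * the alphabet. */
static int nibble_of(char ch)
{
    const char *p = (ch != '\0') ? strchr(alphabet, ch) : NULL;

    return p ? (int)(p - alphabet) : -1;
}

/* XOR together the nibble values of every character in line and
 * return the result as an alphabet character. Returns '?' if the
 * line contains a character outside the alphabet. */
char line_check_char(const char *line)
{
    int x = 0;

    assert(line != NULL);
    for (; *line != '\0'; line++) {
        int n = nibble_of(*line);

        if (n < 0)
            return '?';
        x ^= n;
    }
    return alphabet[x];
}

/* Returns 1 if line (data followed by its check character) is
 * consistent: the XOR over data and check character is zero. */
int line_is_valid(const char *line)
{
    size_t i, len = strlen(line);
    int x = 0;

    if (len < 2)
        return 0;
    for (i = 0; i < len; i++) {
        int n = nibble_of(line[i]);

        if (n < 0)
            return 0;
        x ^= n;
    }
    return x == 0;
}
```

The encoder would append line_check_char() to each line before the newline, and the decoder would run line_is_valid() on each line and report the line number on failure.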
But even better than running OCR on images of text would be some
kind of Braille-like system. Have a character-sized space diced up into a
3x3 grid. Use the nine cells to store a byte's bits plus parity (and to
allow the OCR to 'key'). Use a solid block at the start and end of the
lines (and maybe in the middle) to allow the decoder program to sync
onto the grid.
You should be able to achieve a reliable 6k per page (data cells at
25dpi), with minimal complexity in the encoding/decoding software. But
it would be harder to correct scanning errors!
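
The 3x3 cell described above could be sketched along these lines. The bit order (row by row, least significant bit first), the use of odd parity and the '#'/'.' rendering are all my assumptions for illustration; odd parity is picked here only because it guarantees every cell has at least one mark in it. The function names are hypothetical.

```c
#include <assert.h>

/* Pack a byte into 9 cell bits: bits 0..7 are the data bits and
 * bit 8 is an odd-parity bit, so the number of set bits is odd. */
unsigned cell_encode(unsigned char byte)
{
    unsigned bits = byte, ones = 0, i;

    for (i = 0; i < 8; i++)
        ones += (bits >> i) & 1u;
    if ((ones & 1u) == 0)
        bits |= 1u << 8;
    return bits;
}

/* Unpack 9 cell bits. Returns the byte (0..255), or -1 if the
 * parity check fails. */
int cell_decode(unsigned bits)
{
    unsigned ones = 0, i;

    assert(bits <= 0x1ffu);
    for (i = 0; i < 9; i++)
        ones += (bits >> i) & 1u;
    if ((ones & 1u) == 0)
        return -1;
    return (int)(bits & 0xffu);
}

/* Render the cell as three NUL-terminated rows of three characters,
 * '#' for a set bit and '.' for a clear one. */
void cell_render(unsigned bits, char out[3][4])
{
    int r, c;

    for (r = 0; r < 3; r++) {
        for (c = 0; c < 3; c++)
            out[r][c] = ((bits >> (r * 3 + c)) & 1u) ? '#' : '.';
        out[r][3] = '\0';
    }
}
```

A single flipped cell always fails the parity check in cell_decode(), but as with the parity character it can only detect the error, not correct it.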
Mike
Use this to encode a stream:
-----------------------------------------------------------------------
/* ocrdump.c
 *
 * Dump binary file into a reduced character set for OCR retrieval
 *
 */
#include <stdio.h>

/* 16-character output alphabet, one character per 4-bit nibble */
static char outchars[16] = "ABCD" "EFGH" "JIKL" "MNPQ";
#define WIDTH 78

int main(void)
{
    int ch, count = 0;

    while ((ch = getc(stdin)) != EOF)
    {
        /* Low nibble first, then high nibble */
        putchar(outchars[ch & 0xf]);
        putchar(outchars[ch >> 4]);
        count += 2;
        if (count >= WIDTH)
        {
            putchar('\n');
            count = 0;
        }
    }
    if (count != 0)
        putchar('\n');
    return 0;
}
-----------------------------------------------------------------------
And this to decode a stream:
-----------------------------------------------------------------------
/* ocrundump.c
 *
 * Restore a binary file dumped with ocrdump.c - reads stdin,
 * writes stdout
 *
 */
#include <stdlib.h>
#include <stdio.h>

/* Must match outchars[] in ocrdump.c */
static char outchars[16] = "ABCD" "EFGH" "JIKL" "MNPQ";
#define WIDTH 78

int main(void)
{
    int ch, lastchi = -1, line = 1, count = 0;

    while ((ch = getc(stdin)) != EOF)
    {
        int i;

        if (ch == '\n') {
            count = 0;
            line++;
            continue;
        }
        count++;
        /* Scan for char */
        for (i = 0; i < (int)sizeof(outchars); i++)
            if (ch == outchars[i])
                break;
        if (i == (int)sizeof(outchars))
        {
            fprintf(stderr, "Error - Unknown char '%c' on line %i pos %i\n", ch,
                    line, count);
            exit(1);
        }
        if (lastchi != -1)
        {
            /* Second character of a pair is the high nibble */
            putchar((i << 4) + lastchi);
            lastchi = -1;
        }
        else
            lastchi = i; /* First character of a pair is the low nibble */
    }
    if (lastchi != -1)
    {
        fprintf(stderr, "Error - Uneven number of input characters\n");
        exit(1);
    }
    return 0;
}
-----------------------------------------------------------------------