--- tesseract-2.04.orig/debian/tesseract-ocr.install +++ tesseract-2.04/debian/tesseract-ocr.install @@ -0,0 +1,4 @@ +usr/bin/* +usr/share/tessdata/configs/* usr/share/tesseract-ocr/tessdata/configs/ +usr/share/tessdata/confsets usr/share/tesseract-ocr/tessdata/ +usr/share/tessdata/tessconfigs/* usr/share/tesseract-ocr/tessdata/tessconfigs/ --- tesseract-2.04.orig/debian/wordlist2dawg.1 +++ tesseract-2.04/debian/wordlist2dawg.1 @@ -0,0 +1,32 @@ +.TH WORDLIST2DAWG 1 "August 21, 2007" +.SH NAME +wordlist2dawg \- convert a wordlist to a DAWG for tesseract +.SH SYNOPSIS +Part of the process to train tesseract for a new language. Tesseract uses 3 dictionary files for each language. Two of the files are coded as a Directed Acyclic Word Graph (DAWG), and the other is a plain UTF-8 text file. To make the DAWG dictionary files, you first need a wordlist for your language. The wordlist is formatted as a UTF-8 text file with one word per line. Split the wordlist into two sets: the frequent words, and the rest of the words, and then use wordlist2dawg to make the DAWG files: +.PP +.B wordlist2dawg +.RI "frequent_words_list freq-dawg" +.PP +.B wordlist2dawg +.RI "words_list word-dawg" +.SH DESCRIPTION +This manual page documents briefly the +.B wordlist2dawg +command. +.PP +\fBtesseract\fP is a commercial quality OCR engine originally developed at +HP between 1985 and 1995. In 1995, this engine was among the top 3 evaluated +by UNLV. It was open-sourced by HP and UNLV in 2005. +.SH SEE ALSO +.BR feh (1), +.BR convert (1), +.BR mftraining (1), +.BR cntraining (1), +.BR unicharset_extractor (1), +.BR tesseract (1). +.br +.SH AUTHOR +tesseract was written by Ray Smith. +.PP +This manual page was written by Jeffrey Ratcliffe , +for the Debian project (but may be used by others).
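Note on wordlist2dawg.1 above: the page asks for the wordlist to be split into frequent words and the rest, but does not say how. A minimal sketch of one way to do it follows, assuming word frequency is counted from a tokenized corpus and that the most common top_n distinct words count as "frequent". The function names and the top_n cutoff are hypothetical and are not part of tesseract; only the output file format (UTF-8, one word per line) comes from the manual page.

```python
from collections import Counter


def split_wordlist(tokens, top_n):
    """Split corpus tokens into (frequent_words, other_words).

    tokens: iterable of words, repeats included.
    top_n:  hypothetical cutoff for how many distinct words are "frequent".
    """
    counts = Counter(tokens)
    # Rank by descending count; break ties alphabetically so the
    # result is deterministic.
    ranked = sorted(counts, key=lambda word: (-counts[word], word))
    return ranked[:top_n], sorted(ranked[top_n:])


def write_wordlist(path, words):
    """Write one word per line as UTF-8, as the manual page requires."""
    with open(path, "w", encoding="utf-8") as handle:
        for word in words:
            handle.write(word + "\n")
```

The two lists could then be written with write_wordlist("frequent_words_list", frequent) and write_wordlist("words_list", others), and each file passed to wordlist2dawg as shown in the SYNOPSIS.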
--- tesseract-2.04.orig/debian/compat +++ tesseract-2.04/debian/compat @@ -0,0 +1 @@ +7 --- tesseract-2.04.orig/debian/cntraining.1 +++ tesseract-2.04/debian/cntraining.1 @@ -0,0 +1,31 @@ +.TH CNTRAINING 1 "August 21, 2007" +.SH NAME +cntraining \- character normalization training for tesseract +.SH SYNOPSIS +Part of the process to train tesseract for a new language. When the character features of all the training pages have been extracted, we need to cluster them to create the prototypes. The character shape features can be clustered using the mftraining and cntraining programs: +.PP +.B cntraining +.RI "fontfile_1.tr fontfile_2.tr ..." +.PP +This will output the normproto data file (the character normalization sensitivity prototypes). +.SH DESCRIPTION +This manual page documents briefly the +.B cntraining +command. +.PP +\fBtesseract\fP is a commercial quality OCR engine originally developed at +HP between 1985 and 1995. In 1995, this engine was among the top 3 evaluated +by UNLV. It was open-sourced by HP and UNLV in 2005. +.SH SEE ALSO +.BR feh (1), +.BR convert (1), +.BR mftraining (1), +.BR tesseract (1), +.BR unicharset_extractor (1), +.BR wordlist2dawg (1). +.br +.SH AUTHOR +tesseract was written by Ray Smith. +.PP +This manual page was written by Jeffrey Ratcliffe , +for the Debian project (but may be used by others). --- tesseract-2.04.orig/debian/control +++ tesseract-2.04/debian/control @@ -0,0 +1,38 @@ +Source: tesseract +Section: graphics +Priority: optional +Maintainer: Jeffrey Ratcliffe +Build-Depends: debhelper (>= 7.0.50~), libtiff4-dev, quilt (>= 0.46-7~) +Standards-Version: 3.8.3 +Homepage: http://code.google.com/p/tesseract-ocr/ +DM-Upload-Allowed: yes + +Package: tesseract-ocr +Architecture: any +Depends: ${shlibs:Depends}, ${misc:Depends}, tesseract-ocr-eng | tesseract-ocr-language +Replaces: tesseract-ocr-data +Description: Command line OCR tool + The Tesseract OCR engine was originally developed at HP between 1985 and 1995.
+ It was open-sourced by HP and UNLV in 2005 and Google has led further + development. + . + The Tesseract OCR engine was one of the top 3 engines in the 1995 UNLV + Accuracy test. Between 1995 and 2006 it had little work done on it, but it + is probably one of the most accurate open source OCR engines available. It + will read a binary, grey or color image and output text. + +Package: tesseract-ocr-dev +Architecture: any +Depends: ${shlibs:Depends}, ${misc:Depends}, tesseract-ocr +Description: Development files for the tesseract command line OCR tool + The Tesseract OCR engine was originally developed at HP between 1985 and 1995. + It was open-sourced by HP and UNLV in 2005 and Google has led further + development. + . + The Tesseract OCR engine was one of the top 3 engines in the 1995 UNLV + Accuracy test. Between 1995 and 2006 it had little work done on it, but it + is probably one of the most accurate open source OCR engines available. It + will read a binary, grey or color image and output text. + . + This package contains the header files + --- tesseract-2.04.orig/debian/copyright +++ tesseract-2.04/debian/copyright @@ -0,0 +1,218 @@ +This package was debianized by Jeffrey Ratcliffe +on Mon, 06 Aug 2007 21:27:22 +0200. + +It was downloaded from http://code.google.com/p/tesseract-ocr/ + +Upstream Authors: +Ray Smith (lead developer) +Phil Cheatle +Simon Crouch +Dan Johnson +Mark Seaman +Sheelagh Huddleston +Chris Newton +... and several others. + +Copyright: + + Copyright 2007 Google Inc. + +License: + + Licensed under the Apache License, Version 2.0 (the "License"); you + may not use this file except in compliance with the License. You may + obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and + limitations under the License. + +On a Debian system the complete text of the Apache-2.0 license can be found in +/usr/share/common-licenses/Apache-2.0 + +The Debian packaging is copyright 2007--2009, +Jeffrey Ratcliffe and is licensed under the +Apache-2.0 licence. + + +The files below have different copyright notices: + +Files: cutil/listio.cpp cutil/listio.h cutil/oldlist.cpp cutil/tessarray.cpp + dict/lookdawg.cpp dict/lookdawg.h dict/makedawg.cpp dict/makedawg.h + dict/reduce.cpp dict/reduce.h classify/baseline.cpp + classify/baseline.h classify/hideedge.cpp classify/hideedge.h + classify/protos.cpp classify/protos.h cutil/cutil.cpp cutil/cutil.h + cutil/oldlist.h cutil/tessarray.h dict/choicearr.h dict/dawg.cpp + dict/dawg.h dict/hyphen.cpp dict/hyphen.h dict/permdawg.cpp + dict/permdawg.h dict/permnum.cpp dict/permnum.h dict/trie.cpp + dict/trie.h wordrec/chop.cpp wordrec/chop.h wordrec/chopper.cpp + wordrec/chopper.h wordrec/closed.cpp wordrec/closed.h + wordrec/findseam.cpp wordrec/findseam.h wordrec/gradechop.cpp + wordrec/gradechop.h wordrec/heuristic.cpp wordrec/heuristic.h + wordrec/makechop.cpp wordrec/makechop.h wordrec/measure.h + wordrec/metrics.cpp wordrec/metrics.h wordrec/olutil.cpp + wordrec/olutil.h wordrec/pieces.cpp wordrec/pieces.h + wordrec/plotseg.cpp wordrec/plotseg.h wordrec/seam.cpp + wordrec/seam.h wordrec/split.cpp wordrec/split.h wordrec/tally.cpp + wordrec/tally.h +Copyright: Copyright 1987, Hewlett-Packard Company +License: Apache-2.0 + +Files: classify/adaptive.cpp classify/adaptive.h classify/adaptmatch.cpp + classify/adaptmatch.h classify/blobclass.cpp classify/blobclass.h + classify/cluster.cpp classify/cluster.h classify/clusttool.cpp + classify/clusttool.h classify/cutoffs.cpp classify/cutoffs.h + classify/extract.cpp classify/extract.h classify/featdefs.cpp + classify/featdefs.h classify/flexfx.cpp classify/flexfx.h + classify/float2int.cpp 
classify/float2int.h classify/fpoint.cpp + classify/fpoint.h classify/fxdefs.cpp classify/fxdefs.h + classify/intfx.cpp classify/intfx.h classify/intmatcher.cpp + classify/intmatcher.h classify/intproto.cpp classify/intproto.h + classify/kdtree.cpp classify/kdtree.h classify/mf.cpp + classify/mfdefs.cpp classify/mfdefs.h classify/mf.h + classify/mfoutline.cpp classify/mfoutline.h classify/mfx.cpp + classify/mfx.h classify/normfeat.cpp classify/normfeat.h + classify/normmatch.cpp classify/normmatch.h classify/ocrfeatures.cpp + classify/ocrfeatures.h classify/outfeat.cpp classify/outfeat.h + classify/picofeat.cpp classify/picofeat.h classify/sigmenu.cpp + classify/sigmenu.h classify/speckle.cpp classify/speckle.h + classify/xform2d.cpp classify/xform2d.h cutil/bitvec.cpp + cutil/bitvec.h cutil/danerror.cpp cutil/danerror.h cutil/efio.cpp + cutil/efio.h cutil/emalloc.h cutil/funcdefs.h cutil/general.h + cutil/minmax.h cutil/oldheap.cpp cutil/oldheap.h dict/matchdefs.h + dict/stopper.cpp dict/stopper.h training/cnTraining.cpp + training/mergenf.cpp training/mergenf.h training/mfTraining.cpp + training/name2char.h wordrec/badwords.cpp wordrec/badwords.h + wordrec/mfvars.cpp wordrec/mfvars.h +Copyright: Copyright Hewlett-Packard Company, 1988 +License: Apache-2.0 + +Files: ccutil/host.h +Copyright: Copyright Hewlett-Packard Company, 1988-1996 +License: Apache-2.0 + +Files: cutil/globals.cpp ccstruct/blobs.cpp ccstruct/blobs.h + ccstruct/vecfuncs.cpp ccstruct/vecfuncs.h classify/fxid.h + cutil/debug.cpp cutil/debug.h cutil/globals.h cutil/tordvars.h + cutil/variables.cpp cutil/variables.h dict/choices.cpp dict/choices.h + dict/permute.cpp dict/permute.h wordrec/djmenus.cpp + wordrec/msmenus.cpp wordrec/outlines.cpp wordrec/outlines.h + wordrec/plotedges.cpp wordrec/plotedges.h wordrec/render.cpp + wordrec/render.h ccmain/expandblob.cpp ccutil/errcode.cpp ccutil/globaloc.cpp +Copyright: Copyright 1989, Hewlett-Packard Company +License: Apache-2.0 + +Files: classify/extern.h
cutil/freelist.h cutil/structures.cpp + cutil/structures.h cutil/tordvars.cpp dict/context.cpp dict/context.h + dict/states.cpp dict/states.h wordrec/associate.cpp + wordrec/associate.h wordrec/bestfirst.cpp wordrec/bestfirst.h + wordrec/djmenus.h wordrec/matchtab.cpp wordrec/matchtab.h + wordrec/matrix.cpp wordrec/matrix.h wordrec/msmenus.h + wordrec/wordclass.cpp wordrec/wordclass.h ccutil/basedir.cpp + ccutil/basedir.h ccutil/errcode.h ccutil/fileerr.h ccutil/globaloc.h + ccutil/lsterr.h ccutil/memryerr.h ccutil/memry.h ccutil/serialis.cpp + ccutil/serialis.h ccutil/stderr.h image/imgerrs.h image/img.h + image/imgio.cpp image/imgio.h image/imgs.cpp image/imgs.h + image/imgtiff.cpp image/imgtiff.h image/imgunpk.h +Copyright: Copyright 1990, Hewlett-Packard Company +License: Apache-2.0 + +Files: ccmain/pgedit.cpp textord/blkocc.cpp textord/blkocc.h + ccmain/charcut.h ccmain/pagewalk.cpp ccmain/pagewalk.h + ccmain/tessio.h ccstruct/blckerr.h ccstruct/blread.cpp + ccstruct/blread.h ccstruct/coutln.cpp ccstruct/coutln.h + ccstruct/crakedge.h ccstruct/genblob.cpp ccstruct/genblob.h + ccstruct/ipoints.h ccstruct/linlsq.cpp ccstruct/linlsq.h + ccstruct/mod128.cpp ccstruct/mod128.h ccstruct/ocrblock.cpp + ccstruct/ocrblock.h ccstruct/ocrrow.cpp ccstruct/ocrrow.h + ccstruct/pdblock.cpp ccstruct/pdblock.h ccstruct/points.cpp + ccstruct/points.h ccstruct/polyblob.cpp ccstruct/polyblob.h + ccstruct/polyvert.cpp ccstruct/polyvert.h ccstruct/poutline.cpp + ccstruct/poutline.h ccstruct/quadratc.cpp ccstruct/quadratc.h + ccstruct/quspline.cpp ccstruct/quspline.h ccstruct/rect.cpp + ccstruct/rect.h ccstruct/statistc.cpp ccstruct/statistc.h + ccstruct/stepblob.cpp ccstruct/stepblob.h ccstruct/werd.cpp + ccstruct/werd.h ccutil/bits16.cpp ccutil/bits16.h ccutil/clst.cpp + ccutil/clst.h ccutil/elst2.cpp ccutil/elst2.h ccutil/elst.cpp + ccutil/elst.h ccutil/mainblk.cpp ccutil/mainblk.h ccutil/ndminx.h + ccutil/strngs.cpp ccutil/strngs.h ccutil/varable.cpp ccutil/varable.h + 
image/bitstrm.cpp image/bitstrm.h textord/drawedg.cpp + textord/drawedg.h textord/edgblob.cpp textord/edgblob.h + textord/edgloop.cpp textord/edgloop.h textord/scanedg.cpp + textord/scanedg.h textord/tessout.h +Copyright: Copyright 1991, Hewlett-Packard Ltd +License: Apache-2.0 + +Files: ccmain/adaptions.h ccmain/callnet.cpp ccmain/callnet.h + ccmain/charcut.cpp ccmain/control.cpp ccmain/control.h + ccmain/fixxht.cpp ccmain/fixxht.h ccmain/imgscale.cpp + ccmain/imgscale.h ccmain/reject.cpp ccmain/reject.h + ccmain/scaleimg.cpp ccmain/scaleimg.h ccmain/tessbox.cpp + ccmain/tessbox.h ccmain/tessedit.cpp ccmain/tessedit.h + ccmain/tessembedded.cpp ccmain/tessembedded.h + ccmain/tesseractmain.cpp ccmain/tesseractmain.h ccmain/tessvars.cpp + ccmain/tessvars.h ccmain/tfacep.h ccmain/tfacepp.cpp ccmain/tfacepp.h + ccmain/tstruct.cpp ccmain/tstruct.h ccmain/werdit.cpp ccmain/werdit.h + ccstruct/blobbox.cpp ccstruct/blobbox.h ccstruct/lmedsq.cpp + ccstruct/lmedsq.h ccstruct/normalis.cpp ccstruct/normalis.h + ccstruct/pageres.cpp ccstruct/pageres.h ccstruct/pdclass.h + ccstruct/ratngs.cpp ccstruct/ratngs.h ccutil/hashfn.cpp + ccutil/hashfn.h ccutil/memblk.cpp ccutil/memblk.h ccutil/memry.cpp + ccmain/adaptions.cpp textord/drawtord.cpp textord/drawtord.h + textord/makerow.cpp textord/makerow.h textord/pithsync.cpp + textord/pithsync.h textord/pitsync1.cpp textord/pitsync1.h + textord/tordmain.cpp textord/tordmain.h textord/wordseg.cpp + textord/wordseg.h wordrec/drawfx.cpp wordrec/drawfx.h + wordrec/tessinit.cpp wordrec/tessinit.h wordrec/tface.cpp +Copyright: Copyright 1992, Hewlett-Packard Ltd +License: Apache-2.0 + +Files: ccmain/applybox.cpp ccmain/applybox.h ccmain/blobcmp.cpp + ccmain/blobcmp.h ccmain/charsample.cpp ccmain/matmatch.cpp + ccmain/matmatch.h ccmain/paircmp.cpp ccmain/paircmp.h + ccstruct/labls.cpp ccstruct/labls.h ccstruct/polyaprx.cpp + ccstruct/polyaprx.h ccstruct/polyblk.cpp ccstruct/polyblk.h + ccstruct/quadlsq.cpp ccstruct/quadlsq.h ccstruct/rwpoly.cpp
+ ccstruct/rwpoly.h ccstruct/txtregn.cpp ccstruct/txtregn.h + textord/blobcmpl.h textord/fpchop.cpp textord/fpchop.h + textord/oldbasel.cpp textord/oldbasel.h textord/sortflts.cpp + textord/sortflts.h textord/topitch.cpp textord/topitch.h + textord/tovars.cpp textord/tovars.h wordrec/charsample.h + ccmain/fixspace.cpp ccmain/fixspace.h +Copyright: Copyright 1993, Hewlett-Packard Ltd +License: Apache-2.0 + +Files: ccmain/docqual.cpp ccmain/docqual.h ccmain/output.cpp ccmain/output.h + ccstruct/rejctmap.cpp ccstruct/rejctmap.h textord/underlin.cpp + textord/underlin.h +Copyright: Copyright 1994, Hewlett-Packard Ltd +License: Apache-2.0 + +Files: ccutil/nwmain.h ccutil/tessopt.cpp ccutil/tessopt.h ccutil/tprintf.cpp + ccutil/tprintf.h +Copyright: Copyright 1995, Hewlett-Packard Co +License: Apache-2.0 + +Files: ccstruct/callcpp.cpp ccstruct/hpddef.h ccutil/debugwin.cpp + ccutil/debugwin.h ccutil/notdll.h ccutil/ocrclass.h + ccutil/ocrshell.cpp ccutil/ocrshell.h cutil/callcpp.h +Copyright: Copyright 1996, Hewlett-Packard Co +License: Apache-2.0 + +Files: image/imgbmp.cpp image/imgbmp.h +Copyright: Copyright 1998, Ray Smith +License: Apache-2.0 + +Files: ccmain/baseapi.cpp ccutil/scanutils.cpp ccutil/scanutils.h + image/svshowim.cpp image/svshowim.h ccutil/unichar.cpp + ccutil/unichar.h ccutil/unicharmap.cpp ccutil/unicharmap.h + ccutil/unicharset.cpp ccutil/unicharset.h + training/unicharset_extractor.cpp training/wordlist2dawg.cpp +Copyright: Copyright 2006, Google Inc +License: Apache-2.0 + +Files: tessdll.cpp tessdll.h +Copyright: Copyright 2007, Jetsoftdev +License: Apache-2.0 --- tesseract-2.04.orig/debian/unicharset_extractor.1 +++ tesseract-2.04/debian/unicharset_extractor.1 @@ -0,0 +1,32 @@ +.TH UNICHARSET_EXTRACTOR 1 "August 21, 2007" +.SH NAME +unicharset_extractor \- extract the unicharset data file from tesseract box files +.SH SYNOPSIS +Part of the process to train tesseract for a new language. Tesseract needs to know the set of possible characters it can output.
To generate the unicharset data file, use the unicharset_extractor program on the training pages bounding box files: +.PP +.B unicharset_extractor +.RI "fontfile_1.box fontfile_2.box ..." +.SH DESCRIPTION +This manual page documents briefly the +.B unicharset_extractor +command. +.PP +\fBtesseract\fP is a commercial quality OCR engine originally developed at +HP between 1985 and 1995. In 1995, this engine was among the top 3 evaluated +by UNLV. It was open-sourced by HP and UNLV in 2005. +.PP +Tesseract needs to have access to character properties isalpha, isdigit, isupper, islower. This data must be encoded in the unicharset data file. Each line of this file corresponds to one character. The character in UTF-8 is followed by a hexadecimal number representing a binary mask that encodes the properties. Each bit corresponds to a property. If the bit is set to 1, it means that the property is true. The bit ordering is (from least significant bit to most significant bit): isalpha, islower, isupper, isdigit. +.PP +.SH SEE ALSO +.BR feh (1), +.BR convert (1), +.BR mftraining (1), +.BR cntraining (1), +.BR tesseract (1), +.BR wordlist2dawg (1). +.br +.SH AUTHOR +tesseract was written by Ray Smith. +.PP +This manual page was written by Jeffrey Ratcliffe , +for the Debian project (but may be used by others). --- tesseract-2.04.orig/debian/README.source +++ tesseract-2.04/debian/README.source @@ -0,0 +1,59 @@ +This package uses quilt to manage all modifications to the upstream +source. Changes are stored in the source package as diffs in +debian/patches and applied during the build. 
+ +To configure quilt to use debian/patches instead of patches, you want +either to export QUILT_PATCHES=debian/patches in your environment +or use this snippet in your ~/.quiltrc: + + for where in ./ ../ ../../ ../../../ ../../../../ ../../../../../; do + if [ -e ${where}debian/rules -a -d ${where}debian/patches ]; then + export QUILT_PATCHES=debian/patches + fi + done + +To get the fully patched source after unpacking the source package, cd to +the root level of the source package and run: + + quilt push -a + +The last patch listed in debian/patches/series will become the current +patch. + +To add a new set of changes, first run quilt push -a, and then run: + + quilt new <patch> + +where <patch> is a descriptive name for the patch, used as the filename in +debian/patches. Then, for every file that will be modified by this patch, +run: + + quilt add <file> + +before editing those files. You must tell quilt with quilt add what files +will be part of the patch before making changes or quilt will not work +properly. After editing the files, run: + + quilt refresh + +to save the results as a patch. + +Alternately, if you already have an external patch and you just want to +add it to the build system, run quilt push -a and then: + + quilt import -P <patch> /path/to/patch + quilt push -a + +(add -p 0 to quilt import if needed). <patch> as above is the filename to +use in debian/patches. The last quilt push -a will apply the patch to +make sure it works properly. + +To remove an existing patch from the list of patches that will be applied, +run: + + quilt delete <patch> + +You may need to run quilt pop -a to unapply patches first before running +this command. + + -- Jeffrey Ratcliffe , Thu, 17 Dec 2009 21:48:48 +0100 --- tesseract-2.04.orig/debian/mftraining.1 +++ tesseract-2.04/debian/mftraining.1 @@ -0,0 +1,31 @@ +.TH MFTRAINING 1 "August 21, 2007" +.SH NAME +mftraining \- cluster character shape features into prototypes for tesseract +.SH SYNOPSIS +Part of the process to train tesseract for a new language.
When the character features of all the training pages have been extracted, we need to cluster them to create the prototypes. The character shape features can be clustered using the mftraining and cntraining programs: +.PP +.B mftraining +.RI "fontfile_1.tr fontfile_2.tr ..." +.PP +This will output two data files: inttemp (the shape prototypes) and pffmtable (the number of expected features for each character). (A third file called Microfeat is also written by this program, but it is not used.) +.SH DESCRIPTION +This manual page documents briefly the +.B mftraining +command. +.PP +\fBtesseract\fP is a commercial quality OCR engine originally developed at +HP between 1985 and 1995. In 1995, this engine was among the top 3 evaluated +by UNLV. It was open-sourced by HP and UNLV in 2005. +.SH SEE ALSO +.BR feh (1), +.BR convert (1), +.BR tesseract (1), +.BR cntraining (1), +.BR unicharset_extractor (1), +.BR wordlist2dawg (1). +.br +.SH AUTHOR +tesseract was written by Ray Smith. +.PP +This manual page was written by Jeffrey Ratcliffe , +for the Debian project (but may be used by others). 
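Note on unicharset_extractor.1 above: the property mask it describes (least significant bit first: isalpha, islower, isupper, isdigit) can be illustrated with a short sketch. This is only an illustration under stated assumptions, not tesseract's own code: it uses Python's Unicode-aware string predicates, which may classify some characters differently from tesseract, and it assumes a single space between the character and the hexadecimal mask. The names PROPERTY_BITS, encode_properties, decode_properties and format_entry are hypothetical.

```python
# Bit order from the manual page, least significant bit first.
PROPERTY_BITS = ("isalpha", "islower", "isupper", "isdigit")


def encode_properties(char):
    """Return the property mask for one character as an integer."""
    mask = 0
    for bit, name in enumerate(PROPERTY_BITS):
        # str.isalpha, str.islower, str.isupper and str.isdigit are
        # standard Python string methods.
        if getattr(char, name)():
            mask |= 1 << bit
    return mask


def decode_properties(mask):
    """Return a dict mapping each property name to True or False."""
    return {
        name: bool((mask >> bit) & 1)
        for bit, name in enumerate(PROPERTY_BITS)
    }


def format_entry(char):
    """Format one entry as the character followed by its hexadecimal mask.

    The single-space separator is an assumption.
    """
    return "%s %x" % (char, encode_properties(char))
```

Under these assumptions a lowercase letter gets mask 3 (isalpha and islower), an uppercase letter gets 5 (isalpha and isupper), a digit gets 8, and punctuation gets 0.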
--- tesseract-2.04.orig/debian/tesseract-ocr.manpages +++ tesseract-2.04/debian/tesseract-ocr.manpages @@ -0,0 +1,6 @@ +debian/tesseract.1 +debian/mftraining.1 +debian/cntraining.1 +debian/unicharset_extractor.1 +debian/wordlist2dawg.1 + --- tesseract-2.04.orig/debian/watch +++ tesseract-2.04/debian/watch @@ -0,0 +1,3 @@ +version=3 +http://code.google.com/p/tesseract-ocr/downloads/list http://tesseract-ocr.googlecode.com/files/tesseract-(.*)\.ta?r?\.?gz + --- tesseract-2.04.orig/debian/tesseract-ocr-dev.install +++ tesseract-2.04/debian/tesseract-ocr-dev.install @@ -0,0 +1,3 @@ +usr/lib/* +usr/include/tesseract/*.h + --- tesseract-2.04.orig/debian/docs +++ tesseract-2.04/debian/docs @@ -0,0 +1 @@ +README --- tesseract-2.04.orig/debian/tesseract.1 +++ tesseract-2.04/debian/tesseract.1 @@ -0,0 +1,44 @@ +.TH TESSERACT 1 "December 16, 2009" +.SH NAME +tesseract \- command line OCR tool +.SH SYNOPSIS +.B tesseract +.RI "imagename outputbase [configfile] [-l <language>]" +.SH DESCRIPTION +This manual page documents briefly the +.B tesseract +command. +.PP +\fBtesseract\fP is a commercial quality OCR engine originally developed at +HP between 1985 and 1995. In 1995, this engine was among the top 3 evaluated +by UNLV. It was open-sourced by HP and UNLV in 2005. +.SH OPTIONS +.RI imagename +must be a TIFF image with a .tif extension. +.P +.RI outputbase +is the basename of the text file created with the OCR output (tesseract appends .txt). +.P +.RI configfile +is a file of control parameters used for debugging or modifying tesseract's behaviour. +Config files are stored in +.RI /usr/share/tesseract-ocr/tessdata/configs/ +.P +The +.RI "-l <language>" +option must come last. At the time of writing, there are language packages available for +English (eng), German (deu), German fraktur (deu-f), French (fra), Italian (ita), Dutch +(nld), Portuguese (por), Spanish (spa), and Vietnamese (vie). +.SH SEE ALSO +.BR feh (1), +.BR convert (1), +.BR mftraining (1), +.BR cntraining (1), +.BR unicharset_extractor (1), +.BR wordlist2dawg (1).
+.br +.SH AUTHOR +tesseract was written by Ray Smith. +.PP +This manual page was written by Jeffrey Ratcliffe , +for the Debian project (but may be used by others). --- tesseract-2.04.orig/debian/changelog +++ tesseract-2.04/debian/changelog @@ -0,0 +1,111 @@ +tesseract (2.04-2.1) unstable; urgency=low + + * Non-maintainer upload. + * Bump build-dependency on quilt to >= 0.46-7~. + * Disable xterm-based debug windows (closes: #612032, LP: #607297). Thanks + to Kees Cook for the bug report. + + -- Jakub Wilk Thu, 10 Feb 2011 16:35:45 +0100 + +tesseract (2.04-2) unstable; urgency=low + + * Fix FTBFS with gcc4.4 (Closes: #504885) + * Changed language dependency to tesseract-ocr-eng | tesseract-ocr-language + (Closes: #464085) + * Bumped standards to 3.8.3 (no changes needed) + * Updated debhelper build dependency to 7.0.50~ as override_dh_ targets are + used + * Added README.source + * Improved manpage (Closes: #551522) + + -- Jeffrey Ratcliffe Fri, 16 Dec 2009 17:35:24 +0100 + +tesseract (2.04-1) unstable; urgency=low + + * New upstream version (Closes: #484052) + * Added -fPIC to CFLAGS + * Removed --as-needed from LDFLAGS + * Bumped standards to 3.8.2 (no changes needed) + * Adapted java patch to fix distclean target + * Moved to dh7 + * Added watch file + * Updated copyright file according to http://dep.debian.net/deps/dep5/ + + -- Jeffrey Ratcliffe Fri, 03 Jul 2009 23:35:24 +0200 + +tesseract (2.03-3) unstable; urgency=low + + * Patch wordlist2dawg + * Bumped standards + * Fixed lintian errors in copyright + + -- Jeffrey Ratcliffe Thu, 15 Aug 2008 23:59:00 +0200 + +tesseract (2.03-2) unstable; urgency=low + + * Patch ccmain/baseapi.cpp to allow use with ocropus (Closes: #483896) + + -- Jeffrey Ratcliffe Thu, 12 Jun 2008 23:17:00 +0200 + +tesseract (2.03-1) unstable; urgency=low + + * Initial release of 2.03 (Closes: #478556) + * Switch to quilt for managing patches + * Patch java/makefile to fix install and distclean targets + * Patch ccutil/Makefile.* to fix 
redefine warnings (Closes: #455397) + * Patch viewer/scrollview.cpp, viewer/svmnode.cpp & viewer/svutil.cpp + to fix FTBFS with gcc 4.3 + * Corrected debian/copyright (thanks Winnie) + + -- Jeffrey Ratcliffe Tue, 22 Apr 2008 20:35:09 +0200 + +tesseract (2.01-4) unstable; urgency=low + + * + libtiff dependency (Closes: #459811) + * Updated description (Closes: #418991) + * Bumped standards + * + Uploaders: Gürkan Sengün + * + XS-DM-Upload-Allowed: yes + + -- Jeffrey Ratcliffe Tue, 08 Jan 2008 22:10:17 +0100 + +tesseract (2.01-3) unstable; urgency=low + + * - Recommends: (Closes: #451865) + + -- Jeffrey Ratcliffe Tue, 20 Nov 2007 21:14:26 +0100 + +tesseract (2.01-2) unstable; urgency=low + + * + Replaces: tesseract-ocr-data (Closes: #451042) + + -- Jeffrey Ratcliffe Thu, 15 Nov 2007 20:16:59 +0100 + +tesseract (2.01-1) unstable; urgency=low + + * Initial release of 2.01 (Closes: #434152) + * Applied tesseract-2.01.patch1.tar.gz + * Changed packaging licence to GPLv3 + + -- Jeffrey Ratcliffe Sat, 20 Oct 2007 09:07:28 +0200 + +tesseract (1.02-3) unstable; urgency=medium + + * Applied patch of Bryan Stillwell to fix + FTBFS on 64 bit arches. (Closes: #398379) + + -- Gürkan Sengün Mon, 11 Dec 2006 11:23:00 +0100 + +tesseract (1.02-2) unstable; urgency=low + + * Applied patch to fix tessdata directory access. (Closes: #400183) + * Split the data to a data package. + + -- Gürkan Sengün Mon, 27 Nov 2006 11:11:31 +0100 + +tesseract (1.02-1) unstable; urgency=low + + * Initial release. 
(Closes: #390204) + + -- Gürkan Sengün Mon, 9 Oct 2006 17:15:29 +0200 + --- tesseract-2.04.orig/debian/rules +++ tesseract-2.04/debian/rules @@ -0,0 +1,13 @@ +#!/usr/bin/make -f +CFLAGS = -Wall -g -fPIC -DTESSDATA_PREFIX=/usr/share/tesseract-ocr/ +%: + dh --with quilt $@ + +override_dh_auto_test: + +override_dh_auto_clean: + dh_auto_clean + dh_clean java/com/Makefile java/com/google/Makefile java/com/google/scrollview/Makefile java/com/google/scrollview/events/Makefile java/com/google/scrollview/ui/Makefile + +override_dh_auto_configure: + ./configure --host=$(DEB_HOST_GNU_TYPE) --build=$(DEB_BUILD_GNU_TYPE) --prefix=/usr --mandir=\$${prefix}/share/man --infodir=\$${prefix}/share/info CFLAGS="$(CFLAGS)" CXXFLAGS="$(CFLAGS)" LDFLAGS="-Wl,-z,defs" --- tesseract-2.04.orig/debian/patches/java +++ tesseract-2.04/debian/patches/java @@ -0,0 +1,29 @@ +# Description: Fixes FTBFS due to distclean not working in java directory +# Origin: Adapted from 2.03-1 +Index: tesseract-2.04/java/Makefile.in +=================================================================== +--- tesseract-2.04.orig/java/Makefile.in 2009-07-01 00:24:19.000000000 +0200 ++++ tesseract-2.04/java/Makefile.in 2009-07-04 00:06:08.000000000 +0200 +@@ -402,7 +402,7 @@ + + clean-am: clean-generic mostlyclean-am + +-distclean: distclean-recursive ++distclean: + -rm -f Makefile + distclean-am: clean-am distclean-generic distclean-tags + +Index: tesseract-2.04/java/makefile +=================================================================== +--- tesseract-2.04.orig/java/makefile 2009-07-04 00:16:06.000000000 +0200 ++++ tesseract-2.04/java/makefile 2009-07-04 00:16:52.000000000 +0200 +@@ -47,6 +47,9 @@ + clean : + rm -f ScrollView.jar *.class + ++distclean: clean ++ -rm -f Makefile ++ + # all-am does nothing, to make the java part optional. 
+ all all-am install : + --- tesseract-2.04.orig/debian/patches/gcc4.4 +++ tesseract-2.04/debian/patches/gcc4.4 @@ -0,0 +1,14 @@ +# Description: Fixes FTBFS for gcc4.4 +# Origin: From #504885 +Index: tesseract-2.04/viewer/svutil.cpp +=================================================================== +--- tesseract-2.04.orig/viewer/svutil.cpp 2009-06-03 18:29:38.000000000 +0200 ++++ tesseract-2.04/viewer/svutil.cpp 2009-12-17 17:38:10.865904977 +0100 +@@ -43,6 +43,7 @@ + #endif + + #include ++#include + + const int kBufferSize = 65536; + const int kMaxMsgSize = 4096; --- tesseract-2.04.orig/debian/patches/series +++ tesseract-2.04/debian/patches/series @@ -0,0 +1,3 @@ +gcc4.4 +java +debugwin-xterm --- tesseract-2.04.orig/debian/patches/debugwin-xterm +++ tesseract-2.04/debian/patches/debugwin-xterm @@ -0,0 +1,20 @@ +Description: Disable xterm-based debug windows. +Author: Jakub Wilk +Bug-Debian: http://bugs.debian.org/612032 +Bug-Ubuntu: http://launchpad.net/bugs/607297 +Bug: http://code.google.com/p/tesseract-ocr/issues/detail?id=448 +Forwarded: no +Last-Update: 2011-02-10 + +--- a/ccutil/debugwin.cpp ++++ b/ccutil/debugwin.cpp +@@ -23,7 +23,8 @@ + + DLLSYM INT_VAR (debug_lines, 256, "Number of lines in debug window"); + +-#ifndef GRAPHICS_DISABLED ++#if 0 ++/* Disabled for Debian */ + + #ifdef __MAC__ + #include