Comment 9 for bug 933162

Jeff Breidenbach (jeff-jab) wrote: Re: Sync tesseract 3.02.01-1 (universe) from Debian sid (main)

>I see on the linked Debian bug that sikuli has reported worse performance with this new series.
>Is that not a concern or is their usecase no longer as well supported?

Tesseract upstream is in communication with Sikuli upstream. A 10% drop in recognition
performance is considered acceptable by Sikuli upstream, and future releases
of Sikuli may eliminate that penalty now that the two upstreams are in communication.
Here is the relevant quote from Sikuli upstream developer Tsung-Hsiang (Sean) Chang:

  "The main reason we [are] not switching to tesseract 3 in an official release is
  that its recognition performance is worse than 2.04 in our dataset. (Not very bad,
  about 10% worse as I recall.) So I think it's fine to wrap the tesseract 3 branch for
  Debian sid."

>Could you please give an explicit list of all packages to be synced?

Appended.

>I must admit to being a bit concerned about the way that ocropus was broken
>without apparently warning its maintainer too, especially given that there is
>no replacement available yet.

This is a reasonable concern. I assume you are referring to
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=659597

From an etiquette perspective, I have been in email communication (8 threads
in the last 16 days) with Jeffrey Ratcliffe, who is the first listed maintainer for
both Tesseract and Ocropus. Over the same period I have also corresponded with
Jakub Wilk through the bug tracker (7 bugs); Jakub has been incredibly helpful in
filing those packaging bugs. That said, my strategy, with the blessing of my
co-maintainer, was to bring Tesseract 3 into Debian unstable and then find and fix
problems as quickly as possible. I apologize for the surprise this caused.

However, that leaves the issue of Ocropus. If Ubuntu 12.04 accepts Tesseract 3, it
will lose Ocropus. I respect Ubuntu's decision whichever way it goes. Please
consider the number of users affected on either side, as well as the comments of
Ocropus upstream author Tom Breuel:

  "the version of OCRopus that has been packaged is completely outdated. OCRopus
  is now a set of Python libraries with a little bit of C++ in each. The complete final
  package structure isn't settled yet, but I want different components to be fairly
  independent of each other. Now, during my sabbatical, I've finally had time to actually
  work on it more than just a little on the side. The best thing for Debian probably would
  be to discontinue the current packaging for OCRopus and start over again when the
  new release is out."

Thank you for your consideration.

=========

Full list of source packages to remove:

ocropus
tesseract-ocr-deu-f

Full list of binary (non-source) packages to remove (this may go away automatically):

tesseract-ocr-dev

Full list of source packages to sync (note the absence of tesseract-lat-lid):

sikuli
tesseract
tesseract-afr
tesseract-ara
tesseract-aze
tesseract-bel
tesseract-ben
tesseract-bul
tesseract-cat
tesseract-ces
tesseract-chi-sim
tesseract-chi-tra
tesseract-chr
tesseract-dan
tesseract-deu
tesseract-deu-frak
tesseract-ell
tesseract-eng
tesseract-enm
tesseract-epo
tesseract-equ
tesseract-est
tesseract-eus
tesseract-fin
tesseract-fra
tesseract-frk
tesseract-frm
tesseract-glg
tesseract-heb
tesseract-hin
tesseract-hrv
tesseract-hun
tesseract-ind
tesseract-isl
tesseract-ita
tesseract-ita-old
tesseract-jpn
tesseract-kan
tesseract-kor
tesseract-lav
tesseract-lit
tesseract-mal
tesseract-mkd
tesseract-mlt
tesseract-msa
tesseract-nld
tesseract-nor
tesseract-osd
tesseract-pol
tesseract-por
tesseract-ron
tesseract-rus
tesseract-slk
tesseract-slk-frak
tesseract-slv
tesseract-spa
tesseract-spa-old
tesseract-sqi
tesseract-srp
tesseract-swa
tesseract-swe
tesseract-tam
tesseract-tel
tesseract-tgl
tesseract-tha
tesseract-tur
tesseract-ukr
tesseract-vie