No Package(s) for Language-specific Stemming Dictionary and Affix Files

Bug #301770 reported by Duncan McGreggor
Affects | Status | Importance | Assigned to | Milestone
Hardy Backports | Won't Fix | Undecided | Unassigned |
postgresql-8.3 (Ubuntu) | Fix Released | Undecided | Martin Pitt |
postgresql-common (Ubuntu) | Fix Released | Medium | Martin Pitt |

Bug Description

Binary package hint: postgresql-common

Currently, PostgreSQL 8.3 full text search only provides simple stemming support by default. postgresql-common does not install files needed to support full stemming. In order for Ubuntu-packaged PostgreSQL to support full stemming, ispell (or myspell or hunspell) dictionary and affix files for the desired languages need to be installed. They need to be UTF-8 files, and as of now, they need to be installed in the postgres "tsearch_data" directory.

If packaging support were provided, full text search with improved stemming could be used in environments that require Ubuntu packages for all software/source code installations.
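
To make the two requirements from the description concrete (both files present in "tsearch_data", both UTF-8), here is a minimal Python sketch that checks them for one language. The function name, the directory argument, and the lowercase base name are assumptions for illustration only; the ".dict" and ".affix" extensions are the ones PostgreSQL looks for.

```python
from pathlib import Path
from typing import List


def dictionary_problems(tsearch_dir: str, base: str) -> List[str]:
    """Return a list of problems found for <base>.dict and <base>.affix.

    tsearch_dir is whatever tsearch_data directory the local PostgreSQL
    installation uses (an assumption here, it is passed in by the caller).
    """
    problems = []
    for ext in (".dict", ".affix"):
        path = Path(tsearch_dir) / f"{base}{ext}"
        if not path.is_file():
            problems.append(f"{path.name}: missing")
            continue
        try:
            # PostgreSQL needs these files in UTF-8, so a failed decode
            # means the file cannot be used as it is.
            path.read_bytes().decode("utf-8")
        except UnicodeDecodeError:
            problems.append(f"{path.name}: not valid UTF-8")
    return problems
```

An empty list means both files exist and decode cleanly as UTF-8. It does not prove that they are well-formed dictionary or affix files.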

Martin Pitt (pitti) wrote :

As per our email discussion:

  - can't directly use the hunspell dictionaries in /usr/share/myspell/dicts/, since they are often not UTF-8 encoded, which PostgreSQL requires

  - p-common gets a dpkg trigger which iconvs available hunspell dictionaries to /var/lib/postgresql/dicts/

  - p-common gets test cases based on Duncan's Launchpad-private scripts.

  - p-8.3 gets a patch which falls back to /var/lib/postgresql/dicts/ if no available dictionary is found in the postgres tsearch-data/ directory.
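
The conversion step in the plan above can be sketched roughly as follows, in Python rather than as the actual trigger script. This is not the code that ended up in postgresql-common: the function names are hypothetical, the source and destination directories are passed in by the caller, and falling back to ISO-8859-1 when the affix file has no "SET" line is an assumption. It relies on the hunspell convention that the ".aff" file declares its encoding in a "SET" line.

```python
import re
from pathlib import Path


def declared_encoding(aff_bytes: bytes, default: str = "iso-8859-1") -> str:
    """Read the encoding from the affix file's SET line, if there is one."""
    match = re.search(rb"^SET[ \t]+(\S+)", aff_bytes, flags=re.MULTILINE)
    return match.group(1).decode("ascii") if match else default


def convert_dictionary(src_dir, dest_dir, name: str) -> None:
    """Re-encode <name>.aff/.dic as UTF-8 <name>.affix/.dict in dest_dir."""
    src, dest = Path(src_dir), Path(dest_dir)
    aff = (src / f"{name}.aff").read_bytes()
    dic = (src / f"{name}.dic").read_bytes()

    # Both files of a hunspell dictionary share the encoding named in .aff.
    # Python must know the encoding under that name, otherwise decode()
    # raises LookupError; a real tool would need a mapping for such names.
    encoding = declared_encoding(aff)

    dest.mkdir(parents=True, exist_ok=True)
    # The SET line itself is left untouched in this sketch.
    (dest / f"{name}.affix").write_bytes(aff.decode(encoding).encode("utf-8"))
    (dest / f"{name}.dict").write_bytes(dic.decode(encoding).encode("utf-8"))
```

Calling `convert_dictionary("/usr/share/myspell/dicts", "/var/lib/postgresql/dicts", "en_US")` would mirror the paths named in the discussion, assuming an "en_US" dictionary is installed there.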

Changed in postgresql-common:
assignee: nobody → pitti
importance: Undecided → Medium
status: New → In Progress
Martin Pitt (pitti)
Changed in postgresql-8.3:
assignee: nobody → pitti
status: New → In Progress
Martin Pitt (pitti) wrote :

Fixed in bzr trunk.

For postgresql-8.3 I have a working patch; only its upstream inclusion still needs to be discussed.

Changed in postgresql-common:
status: In Progress → Fix Committed
Changed in postgresql-8.3:
status: In Progress → Fix Committed
Martin Pitt (pitti) wrote :

For the record, upstream wasn't happy with my original approach, so I updated the patch and the -common infrastructure, and sent it upstream again. I'll wait for some more feedback before I upload, just to avoid having people actually put it to use and then having to change it later.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package postgresql-common - 95

---------------
postgresql-common (95) experimental; urgency=low

  * Add automatic building of PostgreSQL tsearch/stem dictionaries:
    - Add pg_updatedicts: Build dictionaries and affix files from installed
      hunspell/myspell dictionary packages.
    - Add t/150_tsearch_stemming.t: Test cases for pg_updatedicts, tsearch
      functionality, and word stem handling.
    - t/001_packages.t: Ensure that hunspell-en-us is installed, above new
      test relies on it.
    - debian/postgresql-common.install: Install pg_updatedicts.
    - debian/rules: Create man page from pg_updatedicts POD.
    - Add debian/postgresql-common.triggers: Register interest on
      /usr/share/myspell/dicts.
    - debian/postgresql-common.postinst: Call pg_updatedicts on upgrade to
      this version, fresh install, and our trigger.
    - debian/postgresql-common.postrm: Remove /var/cache/postgresql on purge.
    - (LP: #301770)

postgresql-common (94) unstable; urgency=low

  * t/070_non_postgres_clusters.t: Test that all cluster configuration files
    are owned by the cluster superuser. Reproduces #481349.
  * pg_createcluster: Make the cluster configuration directory, "start.conf",
    and "environment" owned by the cluster superuser instead of root.
    (Closes: #481349)
  * t/030_errors.t: Check behaviour of starting of clusters with colliding
    ports. Reproduces #472627.
  * pg_ctlcluster: Error out with a port collision message if another cluster
    is already running on the port. (Closes: #472627)
  * t/090_multicluster.t: Don't reconfigure cluster on conflicting port, since
    that now fails with above fix.

 -- Martin Pitt <email address hidden> Sat, 06 Dec 2008 11:35:52 -0800

Changed in postgresql-common:
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package postgresql-8.3 - 8.3.5-2

---------------
postgresql-8.3 (8.3.5-2) experimental; urgency=low

  * Add 15-dict-fallback-dir.patch: If a tsearch/stem dictionary is
    not found in sharedir/tsearch_data/ll_cc.{dict,affix}, fall back
    to sharedir/tsearch_data/system_ll_cc.{dict,affix}, where
    postgresql-common creates them from system directories. (LP: #301770)

 -- Martin Pitt <email address hidden> Sat, 06 Dec 2008 11:39:31 -0800
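
The lookup order described in the changelog entry above can be illustrated with a small Python sketch. The real change is a patch to the PostgreSQL server, not Python; the function name and arguments here are hypothetical and only the "ll_cc" then "system_ll_cc" order is taken from the changelog.

```python
from pathlib import Path
from typing import Optional


def resolve_dictionary_file(tsearch_dir, base: str, ext: str) -> Optional[Path]:
    """Return the first existing file among <base>.<ext> and system_<base>.<ext>.

    base is a name such as "en_us"; ext is "dict" or "affix".
    """
    for candidate in (f"{base}.{ext}", f"system_{base}.{ext}"):
        path = Path(tsearch_dir) / candidate
        if path.is_file():
            # A file placed directly in tsearch_data always wins over the
            # system_ copy generated by postgresql-common.
            return path
    return None
```

With this order, an administrator can still override a generated dictionary by dropping a file with the plain "ll_cc" name into tsearch_data.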

Changed in postgresql-8.3:
status: Fix Committed → Fix Released
Martin Pitt (pitti) wrote :

Subscribed backports team for approval of a hardy backport of both packages. I keep -common and all the server packages backportable all the time.

Martin Pitt (pitti) wrote :

Duncan, I assume you need this for hardy, not for dapper? The -8.3 package needs a small modification to be backportable to dapper, since dapper didn't have the new python world order yet.

Dan Streetman (ddstreet)
Changed in hardy-backports:
status: New → Won't Fix