[MIR] mecab

Bug #1781529 reported by Lars Tangvald
14
This bug affects 1 person
Affects Status Importance Assigned to Milestone
mecab (Ubuntu)
Fix Released
High
Unassigned
mecab-ipadic (Ubuntu)
Fix Released
Undecided
Unassigned

Bug Description

Note: Both mecab and MySQL have a dependency on mecab-ipadic, so the request is for both mecab and mecab-ipadic

[Availability]

mecab and mecab-ipadic are in universe/misc (they are part of the same piece of software with upstream source https://github.com/taku910/mecab, but separate source packages)

[Rationale]

Mecab is used by MySQL for Japanese text parsing (https://dev.mysql.com/doc/refman/8.0/en/fulltext-search-mecab.html), and is a build-dependency for Upstream/Debian MySQL packages. Since it is currently in universe, we maintain a diff for Ubuntu that removes this dependency (also removing the plugin). Including mecab and mecab-ipadic in main would let us include the text parsing features and allow us to drop the diff between Debian and Ubuntu for MySQL.
The package is a simple language parsing tool that has very few changes (but packaging is kept up to date by the mantainers).

[Security]

- Only found CVE was http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2007-3231 (affecting versions older than 0.96). This was patched in 0.95 package before version update.
- No security relevant binaries

[Quality assurance]

- The packages are very simple, being just simple text parser util and csv files.
- Only outstanding bug is https://bugs.launchpad.net/ubuntu/+source/mecab/+bug/1469406
- Package itself hasn't changed since 2013, but packaging is kept up-to-date by maintainers
- Package is missing d/watch and|or d/README.source

[Dependencies]

mecab depends on mecab-ipadic. Other dependencies are in main

[Standards compliance]

No issues reported by lintian, and package is synced directly from Debian.

[Maintenance]

Both packages are maintained by Natural Language Processing (Japanese) <email address hidden>
Last few years have had little maintainer work except for keeping packaging up-to-date (new standards versions, vcs urls etc.).

[Background information]

- The mecab package provides Japanese text parsing (through library or stdin interface)
- mecab-ipadic is an international phonetic alphabet (ipa) dictionary for mecab

CVE References

description: updated
Revision history for this message
Mathieu Trudel-Lapierre (cyphermox) wrote :

These packages appear to be missing a subscribing team, as is required for MIRs.

Have you discussed including these packages in main with the server team? I've subscribed them to this bug to have their opinion on whether they are okay with the added effort of looking after mecab, mecab-ipadic, considering it could reduce the work on mysql packages.

Revision history for this message
David Britton (dpb) wrote :

+1 for maintenance from an ubuntu-server perspective.

Revision history for this message
Mathieu Trudel-Lapierre (cyphermox) wrote :

mecab-ipadic reviewed; it's basically only data in EUC-JP format, with an additional package that builds from that into UTF-8 format at install time. While that seems to be suboptimal to me, there's no particular objection.

MIR approved for mecab-ipadic.

Changed in mecab-ipadic (Ubuntu):
status: New → Fix Committed
Revision history for this message
Mathieu Trudel-Lapierre (cyphermox) wrote :

Let's get a code review for mecab by the security team. It parses data to re-encode...

Changed in mecab (Ubuntu):
assignee: nobody → Ubuntu Security Team (ubuntu-security)
Revision history for this message
Steve Langasek (vorlon) wrote :

This has now found its way into cosmic-proposed via a sync from Debian, and the cosmic release version is missing the last round of security fixes (because of build regressions on ppc64el). It should be a priority for cosmic to either complete this security review and promote mecab, or revert the dependency from mysql-5.7.

Changed in mecab (Ubuntu):
importance: Undecided → High
Revision history for this message
Seth Arnold (seth-arnold) wrote :

As far as I've read the code so far, it looks like overly-complicated pre-C++-11 code: I don't think I've ever seen so many 'new' and 'delete' calls in a source package before. As one concrete example -- there's a StringBuffer class. I can't figure out *why* there's a StringBuffer class, as C++ already has std::string. (It *might* be to make it easier to work with C-strings alongside std::string -- I can't speak to how well or poorly that actually works in C++ -- but I do know that I've never seen a StringBuffer implemented in C++ before.)

So, a few questions:

- Mecab was in our MySQL packages previously. Was it vendored in by Oracle or by Debian?

- I understand Debian is dropping MySQL. Is this merge from Debian our last?

- When Mecab was vendored in to mysql source packages, we could at least examine discovered flaws with knowledge, however poor, of how Mecab was going to be used by exactly one package. With Mecab in main on its own, we may not have that luxury, and may need to support this tool for far more issues than before.

So: if there's no future in syncing MySQL package updates from Debian, is this part of the change actually useful? Does having this separate package benefit anybody? What do we lose by returning to our previous MySQL packages and keeping the tarball updated as Oracle releases them? (Does Oracle actually provide security support for Mecab in this hypothetical configuration?)

Thanks

Revision history for this message
Robie Basak (racb) wrote :

> Mecab was in our MySQL packages previously. Was it vendored in by Oracle or by Debian?

It wasn't vendored at all to my knowledge. It's always been a separate package since I noticed it appearing. In Ubuntu, we've chosen to build MySQL without the mecab plugin due to the component mismatch. This MIR is driven by the wish to get back into sync with Debian - it's the only delta left and we keep having to do package merges for the sake of this one difference, which isn't acceptable to Debian to take.

> I understand Debian is dropping MySQL. Is this merge from Debian our last?

Debian still maintains MySQL in unstable. This is preferable for Ubuntu to more easily coordinate with MariaDB packaging so the two work together correctly in the archive.

> So: if there's no future in syncing MySQL package updates from Debian, is this part of the change actually useful?

We want to continue maintaining MySQL in Debian so that Ubuntu can sync from it. This way MySQL and MariaDB will play together in the archive - both for Debian unstable users and for Ubuntu users.

There are possibilities though. If you're not happy putting mecab into main from a security perspective, I might be able to arrange the packaging to use dpkg-vendor and when building on Ubuntu build a separate binary package, if that's acceptable to archive admins (src:mysql-5.7:debian/control not listing a mysql-plugins-extra-5.7 binary package or similar, dynamically added in debian/rules via dpkg-vendor that contains the built MySQL mecab plugin).

Revision history for this message
Robie Basak (racb) wrote :

Another option is to continue building without mecab on Ubuntu using dpkg-vendor, but we did have at least one request for it from a user.

Revision history for this message
Seth Arnold (seth-arnold) wrote :

Hello, I reviewed mecab version 0.996-6 as checked into cosmic. This is
not a full security audit but rather a quick gauge of maintainability.

- One CVE for mecab in our CVE database.
- mecab is a japanese natural language parser

- Build-Depends: debhelper
- Does not daemonize
- Does not do networking
- postinst looks autogenerated
- No initscript
- No systemd unit files
- No dbus services
- No setuid
- mecab-config and mecab binaries in path
- No sudo fragments
- No udev rules
- Some tests run during the build, I didn't investigate their depth
- No cronjobs
- Relatively clean build logs

- No subprocesses spawned
- Memory management is way too manual; I didn't spot any errors but this
  code would benefit from a C++14-aware rewrite.
- Logging looked good
- Environment variables HOME and MECABRC used, looked fine
- No privileged operations
- No cryptography
- No networking
- No privileged portions of code
- No temporary files
- No WebKit
- No JavaScript
- No PolicyKit

Here's some issues I found while reading the code; these may or may not
have security relevance. The misleading error messages are just going to
be annoying for users.

- cppcheck reports uninitialized variable:
  [src/darts.h:117]: (error) Uninitialized struct member: tmp_node.right

- StringBuffer::reserve() doesn't appear to handle irresponsible length
  increases -- it moves the security boundary out to all callers of this
  routine, including StringBuffer::write(const char* str, size_t length)

- dtoa() in ./src/utils.h can be made to overflow the 64 byte buffer
  provided by _DTOA() if called with DBL_MAX or potentially other inputs.
  This is then exposed via StringBuffer operator<<().

- Iconv::convert() in ./src/iconv_utils.cpp looks vulnerable to an integer
  overflow if given a too-long str parameter

- copy() in ./src/dictionary_generator.cpp has a misleading error message
  "permission denied" that may not reflect the actual error.

- genmatrix() in ./src/dictionary_generator.cpp has a misleading error
  message "permission denied" that may not reflect the actual error.

- compile() in ./src/dictionary.cpp has a misleading error message
  "permission denied" that may not reflect the actual error.

I couldn't actually tell what the code *does* but it appears to do a good
job of checking calls for errors, checking inputs where that makes sense,
etc. As much as I'd love to see this moved to a C++-14 style, there's
something to be said for code that also appears to be pretty static.
(CVE-2007-3231 was fixed in mecab version 0.96. Maybe mecab is *too*
static, the webpage I found suggests no new changes since 2013.)

Security team ACK for promoting mecab to main.

Thanks

Changed in mecab (Ubuntu):
assignee: Ubuntu Security Team (ubuntu-security) → nobody
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

That was the last bit missing, thereby MIR approved for mecab
Since the changes that pull it in are (currently) not in the archive the state is "in progress" [1], please go on pulling it in with your early merges for 19.04.

[1]: https://wiki.ubuntu.com/MIRTeam#Process_states

Changed in mecab (Ubuntu):
status: New → In Progress
Revision history for this message
Robie Basak (racb) wrote :

Thanks! We'll sync mysql-5.7 early next cycle then, which should pull in mecab.

Revision history for this message
Steve Langasek (vorlon) wrote :

Override component to main
mecab 0.996-6 in disco: universe/misc -> main
libmecab-dev 0.996-6 in disco amd64: universe/libdevel/optional/100% -> main
libmecab-dev 0.996-6 in disco arm64: universe/libdevel/optional/100% -> main
libmecab-dev 0.996-6 in disco armhf: universe/libdevel/optional/100% -> main
libmecab-dev 0.996-6 in disco i386: universe/libdevel/optional/100% -> main
libmecab-dev 0.996-6 in disco ppc64el: universe/libdevel/optional/100% -> main
libmecab-dev 0.996-6 in disco s390x: universe/libdevel/optional/100% -> main
libmecab2 0.996-6 in disco amd64: universe/libs/optional/100% -> main
libmecab2 0.996-6 in disco arm64: universe/libs/optional/100% -> main
libmecab2 0.996-6 in disco armhf: universe/libs/optional/100% -> main
libmecab2 0.996-6 in disco i386: universe/libs/optional/100% -> main
libmecab2 0.996-6 in disco ppc64el: universe/libs/optional/100% -> main
libmecab2 0.996-6 in disco s390x: universe/libs/optional/100% -> main
mecab 0.996-6 in disco amd64: universe/misc/optional/100% -> main
mecab 0.996-6 in disco arm64: universe/misc/optional/100% -> main
mecab 0.996-6 in disco armhf: universe/misc/optional/100% -> main
mecab 0.996-6 in disco i386: universe/misc/optional/100% -> main
mecab 0.996-6 in disco ppc64el: universe/misc/optional/100% -> main
mecab 0.996-6 in disco s390x: universe/misc/optional/100% -> main
mecab-utils 0.996-6 in disco amd64: universe/misc/optional/100% -> main
mecab-utils 0.996-6 in disco arm64: universe/misc/optional/100% -> main
mecab-utils 0.996-6 in disco armhf: universe/misc/optional/100% -> main
mecab-utils 0.996-6 in disco i386: universe/misc/optional/100% -> main
mecab-utils 0.996-6 in disco ppc64el: universe/misc/optional/100% -> main
mecab-utils 0.996-6 in disco s390x: universe/misc/optional/100% -> main
25 publications overridden.

Changed in mecab (Ubuntu):
status: In Progress → Fix Released
Revision history for this message
Andreas Hasenack (ahasenack) wrote :

Pinged in #ubuntu-devel about mecab-ipadic

Revision history for this message
Steve Langasek (vorlon) wrote :

Override component to main
mecab-ipadic 2.7.0-20070801+main-2 in disco: universe/misc -> main
mecab-ipadic 2.7.0-20070801+main-2 in disco amd64: universe/misc/optional/100% -> main
mecab-ipadic 2.7.0-20070801+main-2 in disco arm64: universe/misc/optional/100% -> main
mecab-ipadic 2.7.0-20070801+main-2 in disco armhf: universe/misc/optional/100% -> main
mecab-ipadic 2.7.0-20070801+main-2 in disco i386: universe/misc/optional/100% -> main
mecab-ipadic 2.7.0-20070801+main-2 in disco ppc64el: universe/misc/optional/100% -> main
mecab-ipadic 2.7.0-20070801+main-2 in disco s390x: universe/misc/optional/100% -> main
mecab-ipadic-utf8 2.7.0-20070801+main-2 in disco amd64: universe/misc/optional/100% -> main
mecab-ipadic-utf8 2.7.0-20070801+main-2 in disco arm64: universe/misc/optional/100% -> main
mecab-ipadic-utf8 2.7.0-20070801+main-2 in disco armhf: universe/misc/optional/100% -> main
mecab-ipadic-utf8 2.7.0-20070801+main-2 in disco i386: universe/misc/optional/100% -> main
mecab-ipadic-utf8 2.7.0-20070801+main-2 in disco ppc64el: universe/misc/optional/100% -> main
mecab-ipadic-utf8 2.7.0-20070801+main-2 in disco s390x: universe/misc/optional/100% -> main
13 publications overridden.

Changed in mecab-ipadic (Ubuntu):
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.