kmimetypefinder5 misidentifies mimetype of python files containing certain strings

Bug #1857824 reported by Nathaniel Beaver
This bug affects 1 person
Affects: qtbase-opensource-src (Ubuntu)
Status: Fix Released
Importance: Undecided
Assigned to: Dmitry Shachnev

Bug Description

Expected behavior:

    $ kmimetypefinder5 example.py
    text/x-python3

or

    $ kmimetypefinder5 example.py
    text/x-python

or

    $ kmimetypefinder5 example.py
    text/plain

Actual behavior:

    $ kmimetypefinder5 example.py
    application/xhtml+xml

Summary: Python scripts with a string containing HTML can be misidentified as HTML files by kmimetypefinder5.

For example, this python script is identified as "application/xhtml+xml":

#! /usr/bin/env python3
example_string = \
"""\
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>Example title</title>
  </head>
  <body>
    <p>Example body</p>
  </body>
</html>
"""
print('Hello, world!')

This difficulty is not shared by other mimetype identification tools.

    $ kmimetypefinder5 example.py
    application/xhtml+xml
    $ cat example2.py
    #! /usr/bin/env python3
    print('Hello, world!')
    $ kmimetypefinder5 example2.py
    text/x-python3
    $ mimetype 'example.py'
    example.py: text/x-python
    $ mimetype 'example2.py'
    example2.py: text/x-python
    $ file --mime-type 'example.py'
    example.py: text/plain
    $ file --mime-type 'example2.py'
    example2.py: text/plain
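
The comparison above can also be scripted. The following is a minimal sketch, not part of the original report: it writes the example file and runs whichever of the tools mentioned in this bug happen to be installed, using only the command lines that appear in this thread. The names `EXAMPLE`, `COMMANDS`, `write_example` and `query_tools` are made up for illustration.

```python
import shutil
import subprocess
from pathlib import Path

# The example script from this report, stored verbatim (raw string keeps the backslashes).
EXAMPLE = r'''#! /usr/bin/env python3
example_string = \
"""\
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>Example title</title>
  </head>
  <body>
    <p>Example body</p>
  </body>
</html>
"""
print('Hello, world!')
'''

# Command lines taken from this bug report; the file name is appended at call time.
COMMANDS = {
    "kmimetypefinder5": ["kmimetypefinder5"],
    "mimetype": ["mimetype"],
    "file": ["file", "--mime-type"],
    "xdg-mime": ["xdg-mime", "query", "filetype"],
}


def write_example(path):
    """Write the example script to `path` and return it as a Path."""
    target = Path(path)
    target.write_text(EXAMPLE)
    return target


def query_tools(path):
    """Run each installed tool on `path`; tools that are not installed are skipped."""
    results = {}
    for name, argv in COMMANDS.items():
        if shutil.which(argv[0]) is None:
            continue
        proc = subprocess.run(argv + [str(path)], capture_output=True, text=True)
        results[name] = proc.stdout.strip()
    return results


if __name__ == "__main__":
    for tool, output in query_tools(write_example("example.py")).items():
        print(f"{tool}: {output}")
```

The output depends on the installed tool and database versions, so no expected output is given here.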

$ lsb_release -rd
Description: Ubuntu 18.04.3 LTS
Release: 18.04
$ apt-cache policy kde-cli-tools
kde-cli-tools:
  Installed: 4:5.12.8-0ubuntu0.1
  Candidate: 4:5.12.8-0ubuntu0.1
  Version table:
 *** 4:5.12.8-0ubuntu0.1 500
        500 http://us.archive.ubuntu.com/ubuntu bionic-updates/universe amd64 Packages
        100 /var/lib/dpkg/status
     4:5.12.4-0ubuntu1 500
        500 http://us.archive.ubuntu.com/ubuntu bionic/universe amd64 Packages

ProblemType: Bug
DistroRelease: Ubuntu 18.04
Package: kde-cli-tools 4:5.12.8-0ubuntu0.1
ProcVersionSignature: Ubuntu 4.15.0-72.81-generic 4.15.18
Uname: Linux 4.15.0-72-generic x86_64
ApportVersion: 2.20.9-0ubuntu7.9
Architecture: amd64
CurrentDesktop: KDE
Date: Sun Dec 29 13:28:37 2019
InstallationDate: Installed on 2018-12-12 (381 days ago)
InstallationMedia: Kubuntu 18.04.1 LTS "Bionic Beaver" - Release amd64 (20180725)
SourcePackage: kde-cli-tools
UpgradeStatus: No upgrade log present (probably fresh install)

Nathaniel Beaver (nathanielmbeaver) wrote :
Nathaniel Beaver (nathanielmbeaver) wrote :
Alex A. D. (hinell) wrote :

Is this still relevant?

Alex A. D. (hinell) wrote :

Wow. I'm experiencing the same problem.

I run the following test case:

$ tee index.py <<'eol'
#! /usr/bin/env python3
example_string = \
"""\
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>Example title</title>
  </head>
  <body>
    <p>Example body</p>
  </body>
</html>
"""
print('Hello, world!')
eol

$ kmimetypefinder5 index.py # => application/xhtml+xml

The last command outputs the wrong type. Seriously, it's a bug. I have the same problem in a separate bug report. Check out: https://bugs.launchpad.net/ubuntu/+source/shared-mime-info/+bug/1890716

affects: kde-cli-tools (Ubuntu) → shared-mime-info (Ubuntu)
Sebastien Bacher (seb128) wrote :

It's not a shared-mime-info issue; in a GNOME session the result is right:

$ xdg-mime query filetype index.py
text/x-python

affects: shared-mime-info (Ubuntu) → kde-cli-tools (Ubuntu)
Kai Kasurinen (kai-kasurinen) wrote :

> dpkg -S /usr/bin/mimetype
libfile-mimeinfo-perl: /usr/bin/mimetype
> /usr/bin/mimetype --magic-only foo.py
foo.py: application/xhtml+xml

affects: kde-cli-tools (Ubuntu) → shared-mime-info (Ubuntu)
Sebastien Bacher (seb128) wrote :

@Kai, so you use "the command is provided by a Perl utility" as a reason to reassign to shared-mime-info; what's the logic? Also, xdg-mime works fine under GNOME or when using gio, which uses shared-mime-info...

affects: shared-mime-info (Ubuntu) → kde-cli-tools (Ubuntu)
Kai Kasurinen (kai-kasurinen) wrote :

Testing with xdgmime [1] (the reference implementation of the spec [2], also used by the shared-mime-info tests):

> ../xdgmime/src/print-mime-data .
test.py:
        name: text/x-python
        data: application/xhtml+xml
        file: application/xhtml+xml

> ../xdgmime/src/test-mime test.py
File "test.py" has a mime-type of application/xhtml+xml

[1] https://gitlab.freedesktop.org/xdg/xdgmime
[2] https://freedesktop.org/wiki/Specifications/shared-mime-info-spec/

--

kmimetypefinder5 is just a tiny wrapper over QMimeDatabase [3][4]. So the bug is likely in shared-mime-info or in the Qt implementation.

[3] https://github.com/KDE/kde-cli-tools/blob/master/kmimetypefinder/kmimetypefinder.cpp
[4] https://bugs.freedesktop.org/show_bug.cgi?id=99672

affects: kde-cli-tools (Ubuntu) → shared-mime-info (Ubuntu)
Sebastien Bacher (seb128) wrote :

Could you please stop that reassigning game? You are right, it's either in shared-mime-info or in the Qt integration. However, the result of the provided test case is correct on GNOME (see comment 5), and it's also correct on a cloud instance (i.e. no desktop integration), so it's only buggy on KDE. You should rather reassign to the Qt integration; I'm unsure what the right component for that is, though.

affects: shared-mime-info (Ubuntu) → kde-cli-tools (Ubuntu)
Alex A. D. (hinell) wrote :
Kai Kasurinen (kai-kasurinen) wrote :
affects: kde-cli-tools (Ubuntu) → qtbase-opensource-src (Ubuntu)
Dmitry Shachnev (mitya57) wrote :

Qt actually prefers the system copy from shared-mime-info if it's available. Here is the code for 5.12.8 (used in Ubuntu 20.04):

https://code.qt.io/cgit/qt/qtbase.git/tree/src/corelib/mimetypes/qmimedatabase.cpp?h=5.12.8#n99

The internal copy (":/qt-project.org/qmime") is only used when the system one is missing (fdoIterator == mimeDirs.constEnd()).
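
As a rough illustration of the fallback described above, here is a Python sketch (not the Qt code itself). The marker file `packages/freedesktop.org.xml`, the position of the internal copy in the returned list, and the names `choose_mime_dirs` and `exists` are assumptions made for this example.

```python
from pathlib import Path

# Path of Qt's bundled copy, as quoted in the comment above.
INTERNAL_DB = ":/qt-project.org/qmime"


def choose_mime_dirs(system_dirs, exists=None):
    """Return the MIME directories to load.

    The internal copy is added only when no system directory contains the
    freedesktop.org database (assumed marker: packages/freedesktop.org.xml).
    `exists` is injectable so the logic can be tried without touching the disk.
    """
    if exists is None:
        exists = lambda p: Path(p).exists()
    has_system_db = any(
        exists(f"{d}/packages/freedesktop.org.xml") for d in system_dirs
    )
    dirs = list(system_dirs)
    if not has_system_db:
        # Assumption: the internal copy goes last; the real ordering may differ.
        dirs.append(INTERNAL_DB)
    return dirs
```

For instance, `choose_mime_dirs(["/usr/share/mime"], exists=lambda p: True)` leaves the list unchanged, while passing `exists=lambda p: False` appends the internal copy.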

Furthermore, in Ubuntu 20.10 and 21.04 we are building Qt with -no-mimetype-database, which makes Qt use *only* the system copy:

https://salsa.debian.org/qt-kde-team/qt/qtbase/-/commit/f0d53be16a31ea55

So I think it is a bug in shared-mime-info only, not in Qt. Do you agree?

Kai Kasurinen (kai-kasurinen) wrote :

Specifically:
"The spec hasn't changed, but I made the same mistake in xdgmime
(the reference implementation) and in Qt: when multiple globs match,
and the result from magic sniffing is unrelated to any of those globs,
then I used the magic result, but that's wrong, globs have priority
and one of them should be picked up."

And there are multiple glob matches:
<mime-type type="text/x-python3">
<glob pattern="*.py" weight="50"/><!-- lower priority than in text/x-python -->
<mime-type type="text/x-python">
<glob pattern="*.py" weight="60"/>
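
The rule in the quote above can be sketched in Python as follows. This is a simplified illustration, not the xdgmime or Qt implementation: "related" is reduced to "the magic result is one of the glob candidates" (real implementations also consider subclasses), and the function name `resolve_mime`, its parameters and the `FALLBACK` constant are hypothetical.

```python
FALLBACK = "application/octet-stream"


def resolve_mime(glob_matches, magic_match, globs_have_priority=True):
    """Pick a MIME type from glob matches and a magic-sniffing result.

    glob_matches: list of (mime_type, weight) pairs that matched the file name.
    magic_match:  MIME type found by content sniffing, or None.
    globs_have_priority=False reproduces the behaviour described as a mistake.
    """
    if not glob_matches:
        return magic_match or FALLBACK
    candidates = {mime for mime, _weight in glob_matches}
    if magic_match in candidates:
        # Magic agrees with one of the globs: use it to disambiguate.
        return magic_match
    if magic_match is not None and not globs_have_priority:
        # Old behaviour: an unrelated magic result overrides the globs.
        return magic_match
    # Corrected behaviour: ignore the unrelated magic result, take the heaviest glob.
    return max(glob_matches, key=lambda match: match[1])[0]


# The two *.py globs quoted above, with their weights.
py_globs = [("text/x-python3", 50), ("text/x-python", 60)]
print(resolve_mime(py_globs, "application/xhtml+xml"))         # text/x-python
print(resolve_mime(py_globs, "application/xhtml+xml", False))  # application/xhtml+xml
```

With the two `*.py` globs from the snippet, an unrelated magic result no longer wins once globs have priority.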

Kai Kasurinen (kai-kasurinen) wrote :
Dmitry Shachnev (mitya57) wrote :

Thank you for the links!

Changed in qtbase-opensource-src (Ubuntu):
status: New → In Progress
assignee: nobody → Dmitry Shachnev (mitya57)
Dmitry Shachnev (mitya57) wrote :

The fix should land in Impish (21.10) very soon:
https://launchpad.net/ubuntu/+source/qtbase-opensource-src/5.15.2+dfsg-6.

Do you also want a fix for Hirsute (21.04)? If yes, you can help me by filling in the SRU template in the description (see https://wiki.ubuntu.com/StableReleaseUpdates#SRU_Bug_Template).

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package qtbase-opensource-src - 5.15.2+dfsg-6

---------------
qtbase-opensource-src (5.15.2+dfsg-6) unstable; urgency=medium

  * Backport upstream patch to adjust QMimeDatabase behavior (LP: #1857824).
  * Make qtbase5-dev break qt5-default (see #976389, LP: #1920130).

 -- Dmitry Shachnev <email address hidden> Sat, 29 May 2021 12:04:21 +0300

Changed in qtbase-opensource-src (Ubuntu):
status: In Progress → Fix Released