src:texinfo fails to import (importer) or download (pull-debian-source) with ASCII decoding issue

Bug #1700846 reported by Nish Aravamudan
This bug affects 1 person
Affects                      Status          Importance    Assigned to    Milestone
git-ubuntu                   Fix Released    Undecided     Robie Basak
ubuntu-dev-tools (Ubuntu)    Fix Released    Undecided     Robie Basak

Bug Description

06/27/2017 12:50:58 - DEBUG:Updating importer/ubuntu/breezy-devel to importer/ubuntu/breezy-security
06/27/2017 12:51:00 - DEBUG:Cache hit on (<gitubuntu.git_repository.GitUbuntuRepository object at 0x7f0257da97b8>, 'af4fca4aa53c2a05132803d35e451a8e0caf44fe', <ChangelogField.version: 1>) {} = 4.7-2.2ubuntu2.1..
06/27/2017 12:51:00 - DEBUG:Cache hit on (<gitubuntu.git_repository.GitUbuntuRepository object at 0x7f0257da97b8>, 'af4fca4aa53c2a05132803d35e451a8e0caf44fe', <ChangelogField.previous_version: 2>) {} = 4.7-2.2ubuntu2..
06/27/2017 12:51:00 - DEBUG:Executing: sh -c 'echo 1f4bae2aaa704ea4a731a341b45e49e089fc1bb4:debian/changelog | git cat-file --batch --follow-symlinks | sed -n '1{/^[^ ]* blob/!{p;q1}};2,$p' | dpkg-parsechangelog -l- -n1 -SVersion'
06/27/2017 12:51:00 - DEBUG:Executing: sh -c 'echo 1f4bae2aaa704ea4a731a341b45e49e089fc1bb4:debian/changelog | git cat-file --batch --follow-symlinks | sed -n '1{/^[^ ]* blob/!{p;q1}};2,$p' | dpkg-parsechangelog -l- -n1 -o1 -SVersion'
06/27/2017 12:51:00 - DEBUG:Cache hit on (<gitubuntu.git_repository.GitUbuntuRepository object at 0x7f0257da97b8>, 'f32b9f41e23679b8375bc37842fcbf584ac824f3', <ChangelogField.maintainer: 3>) {} = Kees Cook <email address hidden>..
06/27/2017 12:51:00 - DEBUG:Cache hit on (<gitubuntu.git_repository.GitUbuntuRepository object at 0x7f0257da97b8>, 'f32b9f41e23679b8375bc37842fcbf584ac824f3', <ChangelogField.date: 4>) {} = Fri, 3 Nov 2006 17:08:46 -080..
06/27/2017 12:51:00 - DEBUG:Executing: git commit-tree f32b9f41e23679b8375bc37842fcbf584ac824f3 -p af4fca4aa53c2a05132803d35e451a8e0caf44fe -p 1f4bae2aaa704ea4a731a341b45e49e089fc1bb4 -F /tmp/tmpgzlnl4_t
06/27/2017 12:51:00 - DEBUG:Cache hit on (<gitubuntu.git_repository.GitUbuntuRepository object at 0x7f0257da97b8>, '3be8275ab70e9085b0bb52d544e59d6f58966802', <ChangelogField.version: 1>) {} = 4.8.dfsg.1-3..
06/27/2017 12:51:00 - DEBUG:Cache hit on (<gitubuntu.git_repository.GitUbuntuRepository object at 0x7f0257da97b8>, '3be8275ab70e9085b0bb52d544e59d6f58966802', <ChangelogField.previous_version: 2>) {} = 4.8.dfsg.1-2..
06/27/2017 12:51:04 - INFO:Importing patches-unapplied 4.8.dfsg.1-4 to ubuntu/feisty
/usr/lib/python3/dist-packages/debian/deb822.py:216: UnicodeWarning: decoding from utf-8 failed; attempting to detect the true encoding
  UnicodeWarning)
06/27/2017 12:51:11 - DEBUG:EUC-JP Japanese prober hit error at byte 19
06/27/2017 12:51:11 - DEBUG:EUC-KR Korean prober hit error at byte 19
06/27/2017 12:51:11 - DEBUG:CP949 Korean prober hit error at byte 19
06/27/2017 12:51:11 - DEBUG:EUC-TW Taiwan prober hit error at byte 19
06/27/2017 12:51:11 - DEBUG:utf-8 not active
06/27/2017 12:51:11 - DEBUG:CP932 Japanese confidence = 0.01
06/27/2017 12:51:11 - DEBUG:EUC-JP not active
06/27/2017 12:51:11 - DEBUG:GB2312 Chinese confidence = 0.01
06/27/2017 12:51:11 - DEBUG:EUC-KR not active
06/27/2017 12:51:11 - DEBUG:CP949 not active
06/27/2017 12:51:11 - DEBUG:Big5 Chinese confidence = 0.01
06/27/2017 12:51:11 - DEBUG:EUC-TW not active
06/27/2017 12:51:11 - DEBUG:windows-1251 Russian confidence = 0.01
06/27/2017 12:51:11 - DEBUG:KOI8-R Russian confidence = 0.01
06/27/2017 12:51:11 - DEBUG:ISO-8859-5 Russian confidence = 0.01
06/27/2017 12:51:11 - DEBUG:MacCyrillic Russian confidence = 0.01
06/27/2017 12:51:11 - DEBUG:IBM866 Russian confidence = 0.01
06/27/2017 12:51:11 - DEBUG:IBM855 Russian confidence = 0.01
06/27/2017 12:51:11 - DEBUG:ISO-8859-7 Greek confidence = 0.0
06/27/2017 12:51:11 - DEBUG:windows-1253 Greek confidence = 0.0
06/27/2017 12:51:11 - DEBUG:ISO-8859-5 Bulgairan confidence = 0.01
06/27/2017 12:51:11 - DEBUG:windows-1251 Bulgarian confidence = 0.01
06/27/2017 12:51:11 - DEBUG:TIS-620 Thai confidence = 0.01
06/27/2017 12:51:11 - DEBUG:ISO-8859-9 Turkish confidence = 0.7729647837244535
06/27/2017 12:51:11 - DEBUG:windows-1255 Hebrew confidence = 0.0
06/27/2017 12:51:11 - DEBUG:windows-1255 Hebrew confidence = 0.0
06/27/2017 12:51:11 - DEBUG:windows-1255 Hebrew confidence = 0.0
06/27/2017 12:51:11 - DEBUG:windows-1251 Russian confidence = 0.01
06/27/2017 12:51:11 - DEBUG:KOI8-R Russian confidence = 0.01
06/27/2017 12:51:11 - DEBUG:ISO-8859-5 Russian confidence = 0.01
06/27/2017 12:51:11 - DEBUG:MacCyrillic Russian confidence = 0.01
06/27/2017 12:51:11 - DEBUG:IBM866 Russian confidence = 0.01
06/27/2017 12:51:11 - DEBUG:IBM855 Russian confidence = 0.01
06/27/2017 12:51:11 - DEBUG:ISO-8859-7 Greek confidence = 0.0
06/27/2017 12:51:11 - DEBUG:windows-1253 Greek confidence = 0.0
06/27/2017 12:51:11 - DEBUG:ISO-8859-5 Bulgairan confidence = 0.01
06/27/2017 12:51:11 - DEBUG:windows-1251 Bulgarian confidence = 0.01
06/27/2017 12:51:11 - DEBUG:TIS-620 Thai confidence = 0.01
06/27/2017 12:51:11 - DEBUG:ISO-8859-9 Turkish confidence = 0.7729647837244535
06/27/2017 12:51:11 - DEBUG:windows-1255 Hebrew confidence = 0.0
06/27/2017 12:51:11 - DEBUG:windows-1255 Hebrew confidence = 0.0
06/27/2017 12:51:11 - DEBUG:windows-1255 Hebrew confidence = 0.0
06/27/2017 12:51:20 - DEBUG:Executing: git checkout --orphan master
06/27/2017 12:51:20 - DEBUG:Executing: git reset --hard
06/27/2017 12:51:20 - DEBUG:Executing: git clean -f -d
Traceback (most recent call last):
  File "/home/nacc/work/usd-importer/gitubuntu/importer.py", line 1094, in import_publishes
    import_func(srcpkg_information)
  File "/home/nacc/work/usd-importer/gitubuntu/importer.py", line 753, in import_unapplied_spi
    GitUbuntuDsc(spi.dsc_pathname),
  File "/home/nacc/work/usd-importer/gitubuntu/dsc.py", line 22, in __init__
    super(GitUbuntuDsc, self).__init__(dscf)
  File "/usr/lib/python3/dist-packages/debian/deb822.py", line 1251, in __init__
    self._bytes(s, encoding) for s in sequence)
  File "/usr/lib/python3/dist-packages/debian/deb822.py", line 649, in split_gpg_and_payload
    for line in sequence:
  File "/usr/lib/python3/dist-packages/debian/deb822.py", line 1251, in <genexpr>
    self._bytes(s, encoding) for s in sequence)
  File "/usr/lib/python3.5/codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xfc in position 314: invalid start byte

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/nacc/work/usd-importer/bin/git-ubuntu", line 18, in <module>
    main()
  File "/home/nacc/work/usd-importer/gitubuntu/__main__.py", line 203, in main
    args.func(args)
  File "/home/nacc/work/usd-importer/gitubuntu/importer.py", line 1240, in main
    ubuntu_head_versions=ubuntu_head_versions)
  File "/home/nacc/work/usd-importer/gitubuntu/importer.py", line 1110, in import_publishes
    raise GitUbuntuImportException(msg) from e
gitubuntu.importer.GitUbuntuImportException: Unable to import patches-unapplied 4.8.dfsg.1-4 to ubuntu
06/27/2017 12:51:21 - INFO:Leaving /tmp/tmpt998hlld as directed

Revision history for this message
Nish Aravamudan (nacc) wrote :

Robie, can you look at this and see if you can reproduce? I see it on my bastion and on my laptop with 17.10.

Robie Basak (racb)
tags: added: import-edge-case
Revision history for this message
Robie Basak (racb) wrote :

Reproduced on 16.04.

Revision history for this message
Robie Basak (racb) wrote :

"pull-debian-source -d texinfo 4.8.dfsg.1-4" also fails with a similar error.

Revision history for this message
Robie Basak (racb) wrote :

However "pull-lp-source -d texinfo 4.8.dfsg.1-4" succeeds.

Revision history for this message
Nish Aravamudan (nacc) wrote :

On 17.10:

$ pull-debian-source -d texinfo 4.8.dfsg.1-4
pull-debian-source: Downloading texinfo version 4.8.dfsg.1-4
pull-debian-source: Using rmadison for component determination
pull-debian-source: Guessing component from most recent upload
/usr/lib/python2.7/dist-packages/debian/deb822.py:216: UnicodeWarning: decoding from utf-8 failed; attempting to detect the true encoding
  UnicodeWarning)
pull-debian-source: Error: Signature on texinfo_4.8.dfsg.1-4.dsc could not be verified
pull-debian-source: Error: Failed to download: Signature on texinfo_4.8.dfsg.1-4.dsc could not be verified

$ pull-lp-source -d texinfo 4.8.dfsg.1-4
pull-lp-source: Downloading texinfo version 4.8.dfsg.1-4
/usr/lib/python2.7/dist-packages/debian/deb822.py:216: UnicodeWarning: decoding from utf-8 failed; attempting to detect the true encoding
  UnicodeWarning)
pull-lp-source: Downloading texinfo_4.8.dfsg.1.orig.tar.gz from archive.ubuntu.com (1.837 MiB)
pull-lp-source: Downloading texinfo_4.8.dfsg.1.orig.tar.gz from launchpad.net (1.837 MiB)
pull-lp-source: Downloading texinfo_4.8.dfsg.1-4.diff.gz from archive.ubuntu.com (0.097 MiB)
pull-lp-source: Downloading texinfo_4.8.dfsg.1-4.diff.gz from launchpad.net (0.097 MiB)

Revision history for this message
Robie Basak (racb) wrote :

The cause is that the original uploaded dsc contains invalid UTF-8:

Uploaders: Frank K<FC>ster <email address hidden>
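A minimal sketch of why that one byte breaks a strict UTF-8 decode. It assumes the field was written in ISO-8859-1 (Latin-1), where 0xFC is "ü"; the bug report itself only shows the raw byte, so the choice of Latin-1 here is an assumption, and the sample bytes are reconstructed for illustration.

```python
# Sample bytes modelled on the Uploaders field above (email omitted).
# Assumption: the original file was Latin-1 encoded, so 0xFC means "ü".
raw = b"Uploaders: Frank K\xfcster"

try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError as exc:
    # 0xFC can never start a UTF-8 sequence, so strict decoding fails here.
    utf8_ok = False
    bad_byte = raw[exc.start]

# Decoding with the assumed legacy encoding succeeds.
latin1_text = raw.decode("iso-8859-1")

print(utf8_ok, hex(bad_byte), latin1_text)
```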

Changed in usd-importer:
assignee: nobody → Robie Basak (racb)
status: New → In Progress
Revision history for this message
Robie Basak (racb) wrote :

Debian policy 3.8.1.0 first mandated UTF-8 in control files in March 2009. The dsc file in question is from November 2006. So it was valid by the policy that applied at the time. This suggests that we must be able to handle non-UTF-8 correctly for historical source packages.

With an undefined codec, perhaps errors='replace' would be appropriate.
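As a rough illustration of the errors='replace' idea, the sketch below decodes control-file bytes as UTF-8 and substitutes U+FFFD for anything undecodable. The helper name `decode_control_bytes` is hypothetical and is not taken from the importer or from python-debian.

```python
def decode_control_bytes(data: bytes) -> str:
    """Decode as UTF-8, replacing undecodable bytes with U+FFFD.

    Hypothetical helper for illustration only; it never raises
    UnicodeDecodeError, at the cost of losing the original byte.
    """
    return data.decode("utf-8", errors="replace")


# Sample bytes modelled on the Uploaders field from the failing dsc.
text = decode_control_bytes(b"Uploaders: Frank K\xfcster")
print(text)
```

The trade-off is that the result is lossy: the original character cannot be recovered from the replacement character, but the import no longer aborts on pre-2009 files in an unknown encoding.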

Revision history for this message
Robie Basak (racb) wrote :

Looks like pull-debian-source already opens as binary, but pull-debian-source is still Python 2 and in that case the autodetection appears to fail. Converting pull-debian-source to Python 3 with no other direct change fixes it.

So we need two fixes: one for the importer, and one in ubuntu-dev-tools.

summary: - src:texinfo fails to import with ASCII decoding issue
+ src:texinfo fails to import (importer) or download (pull-debian-source)
+ with ASCII decoding issue
Changed in ubuntu-dev-tools (Ubuntu):
assignee: nobody → Robie Basak (racb)
status: New → In Progress
Robie Basak (racb)
tags: added: hash-abi-break
Nish Aravamudan (nacc)
Changed in usd-importer:
status: In Progress → Fix Released
Revision history for this message
Dan Streetman (ddstreet) wrote :

The problem was not that the encoding was non-UTF-8: python-debian correctly detects that and uses chardet to autodetect the right encoding. The 'decoding from utf-8 failed' message is harmless.

The problem here is that the old package signature required a public key that's no longer in the keyring; ubuntutools/archive.py treats that as an error and fails the sig verification, which errors out pull-debian-source. This 'works' for pull-lp-source because it uses UbuntuSourcePackage which inherits from SourcePackage which calls check_dsc() without verification turned on. Only DebianSourcePackage calls check_dsc() with verify_signature=True, so dsc sig verification is done only for debian packages.
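The inheritance behaviour described above can be sketched roughly as follows. This is a simplified stand-in, not the real ubuntutools/archive.py code: the class names and the `verify_signature` parameter come from the comment, while the `pull_dsc` method and the recorded return values are assumptions made for illustration.

```python
class SourcePackage:
    def check_dsc(self, verify_signature=False):
        # Stand-in: report whether signature verification was requested.
        return "verified" if verify_signature else "unverified"

    def pull_dsc(self):
        # Base behaviour: check the dsc without verifying its signature.
        return self.check_dsc()


class UbuntuSourcePackage(SourcePackage):
    # Inherits the base behaviour, so no signature verification happens.
    pass


class DebianSourcePackage(SourcePackage):
    def pull_dsc(self):
        # Only the Debian variant asks for signature verification.
        return self.check_dsc(verify_signature=True)


print(UbuntuSourcePackage().pull_dsc())   # path taken by pull-lp-source
print(DebianSourcePackage().pull_dsc())   # path taken by pull-debian-source
```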

I have already fixed this in my rewrite of pull-* in bug 1453330: signatures are verified by default, but only a warning is printed if the public key isn't available. I also added the --no-verify-signature param (defaulting to false, i.e. verification is done by default).
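One way to picture that policy is the sketch below: verify by default, warn when the public key is missing, and fail only on a bad signature. All names here (`SigStatus`, `SignatureError`, `handle_signature`) are hypothetical and do not reflect the actual ubuntu-dev-tools implementation.

```python
import enum
import logging

logger = logging.getLogger("pull-sketch")


class SigStatus(enum.Enum):
    GOOD = "good"            # signature checked out
    BAD = "bad"              # signature present but invalid
    NO_PUBKEY = "no_pubkey"  # signing key not in the keyring


class SignatureError(Exception):
    """Raised when a dsc signature is definitely invalid."""


def handle_signature(status, verify=True):
    """Return True if the download may proceed (hypothetical policy)."""
    if not verify:
        # Corresponds to passing --no-verify-signature.
        return True
    if status is SigStatus.GOOD:
        return True
    if status is SigStatus.NO_PUBKEY:
        # Missing key: warn, but do not fail the download.
        logger.warning("Public key not found, could not verify signature")
        return True
    raise SignatureError("Signature on dsc could not be verified")
```

Under this policy an old upload whose key has left the keyring, such as the texinfo version in this bug, would download with a warning rather than abort.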

Also note that in trusty, the public key for this package version *is* in the keyring (in my test container, at least) and signature verification succeeds. In xenial and later, the public key isn't found.

Please feel free to merge bug 1453330, which will fix this bug for pull-debian-source (and will change pull-lp-source and pull-uca-source to actually start verifying dsc signatures).

Mattia Rizzolo (mapreri)
Changed in ubuntu-dev-tools (Ubuntu):
status: In Progress → Fix Committed
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package ubuntu-dev-tools - 0.170

---------------
ubuntu-dev-tools (0.170) unstable; urgency=medium

  [ Robie Basak ]
  * pull-debian-source:
    + Add a new --no-verify-signature option, to download a source
      package without checking its signature.
    + Port to Python 3. LP: #1700846

  [ Mattia Rizzolo ]
  * d/control:
    + Bump debhelper compat level to 12.
  * reverse-depends:
    + prevent crash when specifying a specific architecture. Closes: #933018
  * ubuntutools/archive:
    + Default to checking signatures while pulling a .dsc.

 -- Mattia Rizzolo <email address hidden> Mon, 05 Aug 2019 13:28:23 +0200

Changed in ubuntu-dev-tools (Ubuntu):
status: Fix Committed → Fix Released