[Upstream] Office does not try to detect character set of CSV

Bug #694188 reported by komputes
This bug affects 3 people
Affects                  Status     Importance  Assigned to  Milestone
LibreOffice              Confirmed  Wishlist
OpenOffice               Invalid    Undecided   Unassigned
libreoffice (Ubuntu)     Won't Fix  Wishlist    Unassigned
openoffice.org (Ubuntu)  Won't Fix  Undecided   Unassigned

Bug Description

Binary package hint: openoffice.org

When importing a CSV file in OOo Calc, the Lucid version of OOo does not recognize the correct character set. The CSV import module just seems to default to the last character set used.

When doing this with the OOo from 8.04LTS, or with the official version from Openoffice.org, autodetection does its job.

How to reproduce:
1) make one or two CSV files, for example:
echo '"text","test","vest","vast"' > /tmp/1.csv
echo '"text","test","vest","vast"' > /tmp/2.csv
2) open it with OpenOffice.org:
openoffice.org /tmp/1.csv
Choose "unicode" for import. (This is wrong, but please do this anyway).
The result shows various non-Latin characters, which is expected as we chose the wrong import character set.
Now close the file and reopen any other file (or, for that matter, the same file):
openoffice.org /tmp/2.csv
Now Unicode is the default import type. There simply seems to be no autodetection.
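
To illustrate why step 2 shows non-Latin characters, here is a minimal sketch that decodes the same bytes both ways. It assumes that "Unicode" in the import dialog means UTF-16 (little-endian is picked here for the illustration); the variable names are only for this example.

```python
# The bytes written by: echo '"text","test","vest","vast"' > /tmp/1.csv
raw = b'"text","test","vest","vast"\n'

# Correct interpretation: the file is plain ASCII.
as_ascii = raw.decode("ascii")

# Assumption: the "Unicode" choice means UTF-16 (little-endian here).
# Each pair of ASCII bytes is then read as one 16-bit code unit.
as_utf16 = raw.decode("utf-16-le")

print(as_ascii.strip())
print(len(raw), "bytes ->", len(as_utf16), "UTF-16 code units")

# Every resulting character lies outside the ASCII range, which is why
# the spreadsheet fills up with non-Latin characters.
print(all(ord(ch) > 0x7F for ch in as_utf16))
```

Under that assumption, every two ASCII bytes collapse into a single unrelated character, so nothing of the original text survives the wrong choice.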

ProblemType: Bug
DistroRelease: Ubuntu 10.04
Package: openoffice.org-calc 1:3.2.0-7ubuntu4.1
ProcVersionSignature: Ubuntu 2.6.32-26.48-generic 2.6.32.24+drm33.11
Uname: Linux 2.6.32-26-generic i686
Architecture: i386
Date: Fri Dec 24 15:39:04 2010
InstallationMedia: Ubuntu 10.04 "Lucid Lynx" - Release Candidate i386 (20100419.1)
ProcEnviron:
 LANGUAGE=en_US:en_CA:en
 LANG=en_US.utf8
 SHELL=/bin/bash
SourcePackage: openoffice.org

Revision history for this message
In , Björn Michaelsen (bjoern-michaelsen) wrote :

It would be great to have charset autodetection as an option for the csv import in Calc. Currently the last charset gets remembered.

I imagine that this could cause quite a bit of trouble to endusers who accidentally change their charset import setting to something that is not the default, but looks like an innocent choice (e.g. "Unicode").

see also:

 https://bugs.launchpad.net/ubuntu/+source/openoffice.org/+bug/694188

Revision history for this message
In , Björn Michaelsen (bjoern-michaelsen) wrote :

A basic implementation of charset detection already exists in the Writer text import, as SwIoSystem::IsDetectableText:

http://opengrok.libreoffice.org/xref/writer/sw/source/filter/basflt/iodetect.cxx#427

It is old and ugly, but could be a starting point. Obviously, it would have to be moved out of Writer and polished a bit so that it can be used in other applications too.
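
For anyone wondering what a basic detection could look like, here is a rough sketch. It is not the logic of SwIoSystem::IsDetectableText, just one common approach: check for a byte order mark, then test whether the data is valid UTF-8, and otherwise fall back to a legacy charset. The function name `guess_charset` and the `cp1252` fallback are assumptions made for this example.

```python
import codecs

# BOM signatures. UTF-32 comes before UTF-16 because the UTF-32 LE BOM
# starts with the same two bytes as the UTF-16 LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_charset(data: bytes, fallback: str = "cp1252") -> str:
    """Return a Python codec name guessed from the raw bytes (hypothetical helper)."""
    # 1) A byte order mark is the strongest hint.
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    # 2) Pure ASCII and well-formed UTF-8 both decode cleanly as UTF-8.
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        # 3) Assumption: anything else is treated as a legacy 8-bit charset.
        return fallback
    return "utf-8"


print(guess_charset(b'"text","test","vest","vast"\n'))
```

A guess like this could only serve as the preselected value in the import dialog; the user would still need to be able to override it, since legacy 8-bit charsets cannot be told apart reliably this way.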

Revision history for this message
Björn Michaelsen (bjoern-michaelsen) wrote :

Reproducible in 3.3.1 on Natty.

Revision history for this message
Björn Michaelsen (bjoern-michaelsen) wrote :

Talked about it with upstream. This is currently intended behavior -- there is no autodetection and the last import gets reused.

Changed in openoffice.org (Ubuntu):
status: New → Confirmed
Changed in openoffice:
importance: Unknown → Wishlist
status: Unknown → Confirmed
Revision history for this message
Valentijn Sessink (valentijn) wrote :

I don't understand the "intended behaviour" part. I don't think I should file a bug in OpenOffice.org because autodetection does work correctly there, or should I? ;-)

penalvch (penalvch)
Changed in openoffice:
importance: Wishlist → Undecided
status: Confirmed → New
summary: - OpenOffice.org 3.2 does not try to detect character set of CSV
+ Office does not try to detect character set of CSV
Revision history for this message
penalvch (penalvch) wrote : Re: Office does not try to detect character set of CSV

Confirmed in Ubuntu 10.10 LibreOffice Calc via the Terminal:

cd ~/Desktop && echo '"text","test","vest","vast"' > ~/Desktop/example1.csv && localc -nologo example1.csv

Character Set Unicode drop down box -> OK button -> File -> Exit

cd ~/Desktop && echo '"text","test","vest","vast"' > ~/Desktop/example2.csv && localc -nologo example2.csv

Changed in libreoffice (Ubuntu):
status: New → Confirmed
Changed in df-libreoffice:
importance: Unknown → Wishlist
status: Unknown → Confirmed
penalvch (penalvch)
Changed in libreoffice (Ubuntu):
importance: Undecided → Wishlist
Revision history for this message
In , Alexander-balzer (alexander-balzer) wrote :

It would be nice if the implementation also worked with 38637 - Better handling for csv-Files

Changed in openoffice.org (Ubuntu):
status: Confirmed → Won't Fix
Revision history for this message
Björn Michaelsen (bjoern-michaelsen) wrote : migrating packaging from OpenOffice.org to Libreoffice

[This is an automated message.]
There are no new official OpenOffice.org releases in Ubuntu packaging anymore => Won't Fix

If the problem persists, please mark this bug as "also affects project Libreoffice" or "also affects distribution Libreoffice (Ubuntu)" if that has not happened already.

Please leave references to upstream OpenOffice.org bugs in place to allow cross pollination.

Revision history for this message
penalvch (penalvch) wrote : Re: Office does not try to detect character set of CSV

No reference URL.

Changed in openoffice:
status: New → Invalid
summary: - Office does not try to detect character set of CSV
+ [Upstream] Office does not try to detect character set of CSV
Changed in libreoffice (Ubuntu):
status: Confirmed → Won't Fix
Revision history for this message
Björn Michaelsen (bjoern-michaelsen) wrote :

This is:
- a clearcut upstream issue
- wishlist/feature request unlikely to be fixable with a simple patch
- won't be fixed in libreoffice packaging
Resolving as WONTFIX in libreoffice packaging. This does not mean the issue will not be cared about, but if it is cared about (even by Ubuntu/Canonical contributors), it is done upstream at LibreOffice.

Libreoffice (Ubuntu) => WONTFIX

Revision history for this message
Valentijn Sessink (valentijn) wrote :

The bug clearly states: "When doing this with the OOo from 8.04LTS, or with the official version from Openoffice.org, autodetection does its job."

Hence, this is NOT an upstream issue. It is a bug introduced by the Ubuntu 10.04 packages.

Revision history for this message
Björn Michaelsen (bjoern-michaelsen) wrote :

Upstream for Libreoffice is Libreoffice -- not Openoffice.org.

Revision history for this message
In , Björn Michaelsen (bjoern-michaelsen) wrote :

[This is an automated message.]
This bug was filed before the changes to Bugzilla on 2011-10-16. Thus it
started right out as NEW without ever being explicitly confirmed. The bug is
changed to state NEEDINFO for this reason. To move this bug from NEEDINFO back
to NEW please check if the bug still persists with the 3.5.0 beta1 or beta2 prereleases.
Details on how to test the 3.5.0 beta1 can be found at:
http://wiki.documentfoundation.org/QA/BugHunting_Session_3.5.0.-1

more detail on this bulk operation: http://nabble.documentfoundation.org/RFC-Operation-Spamzilla-tp3607474p3607474.html
