CSV and TAB format use wrong content-type

Bug #344788 reported by Xavier Brochard
Affects: Woda
Status: Fix Committed
Importance: Low
Assigned to: Xavier Brochard
Milestone: merge-with-pro

Bug Description

The content-type for the CSV, CSVS and TAB formats is set to text/plain in Search and in Export.
It should be set to text/csv.

Affected subs:
sub printSearchHttpHeader
sub wbExport

Affects woda 4x (at least)

Xavier Brochard (xavier)
Changed in woda:
assignee: nobody → Xavier Brochard (xavier)
status: New → Confirmed
importance: Undecided → Medium
importance: Medium → Low
Malcolm Fitzgerald (malcolm-notyourhomework) wrote :

I agree. We should be more specific about the header. It allows the browser to pass the content off to another application. At the least it would save the output as a file and that saves the user a step.

There is also the XMP issue with the TAB format. Is XMP meant to be the old HTML "example" tag or the Adobe Extensible Metadata Platform? Can we remove it, or is it needed?

sub printSearchHttpHeader {

    local ( $format, $yh, $yf ) = @_;
    local ( $html, $x );

    if ( $format =~ m/^(\d+)$/ && $WBB{"formatHttpHead$1"} ne '' ) {
        $x = $WBB{"formatHttpHead$1"};
        $x =~ s/\n*$//g; # remove trailing newlines
        print "$x\n\n" if $yh;
        $html = $yh = $yf = 0; # no more head or foot stuff for sure
    }
    elsif ( $format =~ m/^CSV/ ) {
        print "Content-type: text/csv\n" if $yh;
        print "Content-Disposition: attachment; filename=\"woda export.csv\"\n\n" if $yh;
        $html = $yh = $yf = 0; # no more head or foot stuff for sure
    }
    elsif ( $format =~ m/^TAB$/ ) {
        print "Content-type: text/tab-separated-values\n" if $yh;
        print "Content-Disposition: attachment; filename=\"woda export.tsv\"\n\n" if $yh;
        $html = $yh = $yf = 0; # no more head or foot stuff for sure
    }
    elsif ( $format =~ m/^RAW$/ ) {
        print "Content-type: text/plain\n\n" if $yh;
        $html = $yh = $yf = 0; # no more head or foot stuff for sure
    }
    elsif ( $format =~ m/^XML/ ) {
        printXmlHead() if $yh;
        $html = $yh = $yf = 0; # no more head or foot stuff for sure
    }
    elsif ( $format =~ m/^RSS/ ) {
        print "Content-type: text/xml\n\n" if $yh;
        $html = $yh = $yf = 0; # no more head or foot stuff for sure
    }
    else {
        $html = 1;
    }

    return ( $html, $yh, $yf );

}

Xavier Brochard (xavier) wrote : Re: [Bug 344788] Re: CSV and TAB format use wrong content-type

On Friday 26 March 2010 14:25:08, you wrote:
> There is also the XMP issue with TAB format. Is XMP meant to be old HTML
> "example" tag or Adobe Extensible Metadata Platform? Can we remove it or
> is it needed?

It is the old HTML example tag: deprecated, obsolete, and removed a long
time ago (around 13 years!).

We should replace it with PRE (I will).

Xavier

Xavier Brochard (xavier) wrote :

On Friday 26 March 2010 14:25:08, you wrote:
> sub printSearchHttpHeader

Don't forget that each content-type must end with 2 \n

I'm guessing:
shouldn't text/csv also be used for tab-separated values (TAB)? It is (was)
very common. I'm not against text/tab-separated-values, but I really don't know
whether it is widely understood by browsers, operating systems or other software.

The charset is also missing.
We should have
(code from sub printHead)
        if ( $WBB{'intlCharset'} ne '' ) {
            $chs = "; charset=$WBB{'intlCharset'}";
        }
        else {
            $chs = '';
        }

And then for each format:
elsif ( $format =~ m/^CSV/ ) {
        print "Content-type: text/csv$chs\n\n"

About TAB again: there is no need for XMP or PRE in sub formatFoundSeparators
as long as the content-type is not text/html. But is that code different from
the wbExport code?
Todo: the charset is also missing in sub wbExport

Xavier
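
The charset suggestion above could be factored into one small helper so that every format branch builds its header the same way. This is only a minimal sketch, not Woda's actual code: content_type_header is a hypothetical name, and the charset is passed in as an argument where Woda would presumably pass $WBB{'intlCharset'}.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Woda): build one Content-type header
# line, appending "; charset=..." only when a charset is configured.
sub content_type_header {
    my ( $mime, $charset ) = @_;
    my $chs = ( defined $charset && $charset ne '' ) ? "; charset=$charset" : '';
    return "Content-type: $mime$chs\n";
}

# Assumption: in Woda the second argument would be $WBB{'intlCharset'}.
print content_type_header( 'text/csv', 'utf-8' );
print content_type_header( 'text/tab-separated-values', '' );
```

The helper returns a single header line ending in one newline, so the caller still has to print the empty line after the last header (for example after Content-Disposition).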

Malcolm Fitzgerald (malcolm-notyourhomework) wrote :

On 27/03/2010, at 1:02 AM, Xavier Brochard wrote:

> On Friday 26 March 2010 14:25:08, you wrote:
>> There is also the XMP issue with TAB format. Is XMP meant to be old HTML
>> "example" tag or Adobe Extensible Metadata Platform? Can we remove it or
>> is it needed?
>
> it is the old HTML example tag, deprecated, obsolete and removed long long
> time ago (around 13 years!)
>
> we should replace it with PRE (I will)

I don't think it should be there at all. If a user selects TAB export they don't want it wrapped in PRE. I've always stripped it out because it doesn't make sense to be there.

Malcolm
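
One way to reconcile the two positions in this thread is to wrap tab-separated output in PRE only when the response really is text/html, and to send it bare otherwise. The following is a minimal sketch under that assumption; wrap_tab_output and its arguments are hypothetical and are not taken from Woda's formatFoundSeparators.

```perl
use strict;
use warnings;

# Hypothetical helper: turn rows (array refs) into tab-separated text.
# $as_html should be true only when the Content-type is text/html.
sub wrap_tab_output {
    my ( $rows, $as_html ) = @_;
    my $body = join '', map { join( "\t", @$_ ) . "\n" } @$rows;
    return $body unless $as_html;

    # Escape HTML metacharacters before embedding the text in a PRE element.
    $body =~ s/&/&amp;/g;
    $body =~ s/</&lt;/g;
    $body =~ s/>/&gt;/g;
    return "<pre>\n$body</pre>\n";
}

# Export case: no wrapper, the data goes out exactly as tab-separated text.
print wrap_tab_output( [ [ 'id', 'name' ], [ 1, 'a<b' ] ], 0 );
```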

Malcolm Fitzgerald (malcolm-notyourhomework) wrote :

On 27/03/2010, at 2:10 AM, Xavier Brochard wrote:

> On Friday 26 March 2010 14:25:08, you wrote:
>> sub printSearchHttpHeader
>
> Don't forget that each content-type must end with 2 \n

Yes. If you look carefully you'll see that the final header line always ends with two line feed characters.

You are right about the character set. It should be there too.

text/tab-separated-values should be safe, and it is the correct type. Using text/csv for tab-delimited data is a workaround that relies on the fact that the CSV delimiter does not have to be a comma. Semicolon is the common alternative, but people have used tab.
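
To make the choice of type per format explicit, the mapping could be kept in a small lookup table. This is a sketch only: %EXPORT_TYPE and export_headers are hypothetical names, while the MIME types and file names are the ones used in the printSearchHttpHeader code posted earlier in this thread.

```perl
use strict;
use warnings;

# Hypothetical lookup (illustrative names, not Woda's): MIME type and file
# extension for each delimited export format discussed in this bug.
my %EXPORT_TYPE = (
    CSV => { mime => 'text/csv',                  ext => 'csv' },
    TAB => { mime => 'text/tab-separated-values', ext => 'tsv' },
);

# Return the complete header block for a delimited format, or nothing for
# any other format. The block ends with an empty line ("\n\n").
sub export_headers {
    my ($format) = @_;
    my $key;
    if    ( $format =~ m/^CSV/ ) { $key = 'CSV' }    # matches CSV and CSVS
    elsif ( $format eq 'TAB' )   { $key = 'TAB' }
    else                         { return }

    my $t = $EXPORT_TYPE{$key};
    return "Content-type: $t->{mime}\n"
        . "Content-Disposition: attachment; filename=\"woda export.$t->{ext}\"\n\n";
}

print export_headers('TAB');
```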


Xavier Brochard (xavier) wrote :

On Saturday 27 March 2010 08:09:36, you wrote:
> Yes, If you look carefully you'll see that the final line always has two
> line feed characters.

aaaaargh!
I will change my glasses

> The tab-separated-values should be safe. It is correct. Using text/csv
> is a work-around which takes advantage of the fact that the CSV
> delimiter does not have to be comma. Semi-colon is the common
> alternative but people have used tab.

ok

Xavier

Xavier Brochard (xavier)
Changed in woda:
status: Confirmed → Fix Committed
Xavier Brochard (xavier)
Changed in woda:
milestone: none → merge-with-pro