Wrong result with semijoin materialization and blob fields

Bug #823930 reported by Timour Katchaounov
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
MariaDB
Fix Released
High
Sergey Petrunia

Bug Description

Two test cases from subselect_sj_mat that test materialization
with blobs produce incorrect empty result with semijoin
materialization. The two failing test cases are the ones that test
blobs with size 1024, and 1025. Here I extracted and simplified
a bit the test case marked with:
# BLOB == 1024 (group_concat_max_len == 1024).
For length 1025 the wrong result is the same.

Revision history for this message
Timour Katchaounov (timour) wrote :

create table t1_1024 (a1 blob(1024), a2 blob(1024));
create table t2_1024 (b1 blob(1024), b2 blob(1024));
create table t3_1024 (c1 blob(1024), c2 blob(1024));

insert into t1_1024 values
 (concat('1 - 00', repeat('x', 1018)), concat('2 - 00', repeat('x', 1018)));
insert into t1_1024 values
 (concat('1 - 01', repeat('x', 1018)), concat('2 - 01', repeat('x', 1018)));
insert into t1_1024 values
 (concat('1 - 02', repeat('x', 1018)), concat('2 - 02', repeat('x', 1018)));

insert into t2_1024 values
 (concat('1 - 01', repeat('x', 1018)), concat('2 - 01', repeat('x', 1018)));
insert into t2_1024 values
 (concat('1 - 02', repeat('x', 1018)), concat('2 - 02', repeat('x', 1018)));
insert into t2_1024 values
 (concat('1 - 03', repeat('x', 1018)), concat('2 - 03', repeat('x', 1018)));

insert into t3_1024 values
 (concat('1 - 01', repeat('x', 1018)), concat('2 - 01', repeat('x', 1018)));
insert into t3_1024 values
 (concat('1 - 02', repeat('x', 1018)), concat('2 - 02', repeat('x', 1018)));
insert into t3_1024 values
 (concat('1 - 03', repeat('x', 1018)), concat('2 - 03', repeat('x', 1018)));
insert into t3_1024 values
 (concat('1 - 04', repeat('x', 1018)), concat('2 - 04', repeat('x', 1018)));

-- Correct result
set @@optimizer_switch='semijoin=off,materialization=on,in_to_exists=off';

select left(a1,7), left(a2,7)
from t1_1024
where a1 in (select substring(b1,1,1024) from t2_1024 where b1 > '0');

-- Wrong result
set @@optimizer_switch='semijoin=on,materialization=on,in_to_exists=off';

select left(a1,7), left(a2,7)
from t1_1024
where a1 in (select substring(b1,1,1024) from t2_1024 where b1 > '0');

Changed in maria:
status: New → Confirmed
importance: Undecided → Medium
assignee: nobody → Sergey Petrunia (sergefp)
milestone: none → 5.3
Changed in maria:
importance: Medium → High
Revision history for this message
Sergey Petrunia (sergefp) wrote :

Looking at EXPLAIN outputs, I see that

- The query with "Correct result" settings does not use Materialization (even though optimizer_switch value instructs it to do it)

- The query with "Wrong result" settings does use Materialization.

It seems, the problem is that SJ-Materialization attempts to handle this case, while it should refuse it, like regular materialization did.

Revision history for this message
Sergey Petrunia (sergefp) wrote :

In the case of non-semijoin materialization, we proceed as far as entering
subselect_hash_sj_engine::init(). Inside that function, we execute up to this
code:

  if (result_sink->create_result_table(thd, tmp_columns, TRUE,
                                       tmp_create_options,
           name, TRUE, TRUE))
    DBUG_RETURN(TRUE);

  tmp_table= result_sink->table;
  result= result_sink;

  /*
    If the subquery has blobs, or the total key lenght is bigger than
    some length, or the total number of key parts is more than the
    allowed maximum (currently MAX_REF_PARTS == 16), then the created
    index cannot be used for lookups and we can't use hash semi
    join. If this is the case, delete the temporary table since it
    will not be used, and tell the caller we failed to initialize the
    engine.
  */
  if (tmp_table->s->keys == 0)
  {

And then we find out that "tmp_table->s->keys == 0" which causes us not to use
materialization strategy, and switch to IN->EXISTS instead.

Revision history for this message
Sergey Petrunia (sergefp) wrote :

It is difficult to reuse this approach for SJ-Materialization, because SJ-Materialization creates temp.table after join optimization has been finished. At that point, it is not easy to go back and redo the join optimization.

A better option would be to make subquery_types_allow_materialization() do its job properly and detect all cases where materialization is not applicable.

Changed in maria:
status: Confirmed → Fix Committed
Changed in maria:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.