MySQL++

Encoding error when upgrading from Ubuntu 18 to 20
Login

Encoding error when upgrading from Ubuntu 18 to 20

(1) By bruno on 2020-07-10 16:57:40 [link] [source]

Hi,

After I have upgraded my Ubuntu 18 to 20 (that means some libraries were upgraded, such as mysql++ from 3.2.2 to 3.2.5, and gcc 7 to 9), using a local DB server with the Docker MySql 8 image, I am having encoding errors for the following query:

mysqlpp::Query query = conn.query();
std::string position("\xC8\x03\x18\x00", 4);
query << "UPDATE Player p JOIN (SELECT " 
      << mysqlpp::quote << position 
      << " AS xxx, 2 AS player_id) u USING (player_id) SET p.position=u.xxx";

And the error is:

Incorrect string value: '\xC8\x03\x18\x00' for column 'xxx' at row 1

Before upgrading, I didn't have that error. The Player.position column is of type tinyblob. It seems that mysql is interpreting the derived table column as an encoded string instead of just a hex blob. Executing this using the mysql cli works:

UPDATE local.Player p JOIN (SELECT x'C8031800' AS xxx, 2 AS player_id) u USING (player_id) SET p.position=u.xxx;

Just executing a regular update works as well:

query << "UPDATE Player SET position=" << mysqlpp::quote << position;

The char sets are (but in this case the column is just a blob, it should not have any encoding, right?):

'character_set_client', 'utf8mb4'
'character_set_connection', 'utf8mb4'
'character_set_database', 'utf8mb4'
'character_set_filesystem', 'binary'
'character_set_results', 'utf8mb4'
'character_set_server', 'utf8mb4'
'character_set_system', 'utf8'

What am I missing? What could have changed in this upgrade that have caused this to start failing?

Thanks in advance

(2) By Warren Young (tangent) on 2020-07-10 17:15:51 in reply to 1 [link] [source]

I've tried feeding that string to 3 different UTF-8 decoders, and none of them gave me anything sensible. Indeed, the null byte is highly suspect: UTF-8 was designed specifically to ensure that null bytes never appear in output text that did not have them in the input.

The fact that you're using a string for a "player_id" column is also suspect. Why not just use a 32-bit or larger integer?

(3.2) By bruno on 2020-07-10 18:02:08 edited from 3.1 in reply to 2 [link] [source]

Hi, that string isn't UTF-8 encoded, it is just 4 raw bytes. The query I have shared is just an example that I have simplified so that I can make this error reproducible.

What is odd is that forcing it to use ascii encoding it then works (it seems to allow extended ascii, ie 8-bit):

query << "UPDATE Player p JOIN (SELECT _ascii" << mysqlpp::quote << position << " AS xxx, 2 AS player_id) u USING (player_id) SET p.position=u.xxx";

But I still would like to understand why was it working before and not now, and why a regular update on the 'position' column works but not an update that involves a derived column. Any ideas?

Edit-- The _binary character set also works (and makes more sense for this case: https://dev.mysql.com/doc/refman/8.0/en/charset-binary-set.html).

(4) By Warren Young (tangent) on 2020-07-10 18:06:24 in reply to 3.1 [link] [source]

just an example that I have simplified so that I can make this error reproducible.

It's irreproducible here. You haven't provided nearly enough info for that.

If you want me to help you further, modify the examples and post a diff to show the symptom. That will at least allow a bisect to find the checkin that changed the behavior, if any.

Alternately bisect it yourself. Pointing me at a specific commit will be far more helpful than giving a range of versions spanning four years.

There's a fair chance this isn't a MySQL++ issue at all, but instead a change in the way the MySQL/MariaDB C API handles strings. There have been some Unicode changes in the past few years.

Anyway, I still don't see why it's a good idea to use a string-encoded 4-byte blob instead of a 32-bit integer. The latter will work with any DBMS back to the stone age. Your hack was teetering on the edge of madness from the start. I'm not surprised that some small change in the way the software now works kicked it over the edge.

If this was a BLOB column, then I'd understand better, but then I have to ask why you're using raw C strings instead of mysqlpp::sql_blob.

(5) By bruno on 2020-07-10 18:42:53 in reply to 4 [link] [source]

Thanks I will look into that and the MySQL changes in these years. Let me expand the example code, and you need the table as well:

CREATE TABLE `Player` (
  `player_id` bigint NOT NULL,
  `position` tinyblob NOT NULL);

And the code:

void test() {
  mysqlpp::Connection conn(false);
  if (!conn.connect("local","127.0.0.1","root","root", 3306)) {
    std::cout << conn.error() << std::endl;
    exit(1);
  }

  mysqlpp::Query query = conn.query();
  std::string position("\xFF\xFF\xFF\x00", 4);
  query << "UPDATE Player p JOIN (SELECT _binary" 
        << mysqlpp::quote << position 
        << " AS xxx, 2 AS player_id) u USING (player_id) SET p.position=u.xxx";

  if (mysqlpp::SimpleResult result = query.execute()) {
    std::cout << "Success! " << std::endl;
    exit(0);
  }

  std::cout << query.error() << std::endl;

  exit(1);
}

Please let me know if you need more details. If you can, please try testing that with the head version of mysqlpp in your machine and let me know if it works.

Anyway, I still don't see why it's a good idea to use a string-encoded 4-byte blob instead of a 32-bit integer. The latter will work with any DBMS back to the stone age. Your hack was teetering on the edge of madness from the start. I'm not surprised that some small change in the way the software now works kicked it over the edge.

That is not what the original code does, this is just an example to showcase this error.

If this was a BLOB column, then I'd understand better, but then I have to ask why you're using raw C strings instead of mysqlpp::sql_blob.

It is a blob column. If I am not mistaken it will implicitly convert it, right? Again, this is just an example. I am using raw C strings because of other libraries I am using in the original code which serialize some objects that I save in the DB, and it uses std::strings.

Thank you

(6) By Warren Young (tangent) on 2020-07-10 19:59:18 in reply to 5 [link] [source]

please try testing that with the head version of mysqlpp in your machine and let me know if it works.

It does.

I'm running the Homebrew build of MariaDB 10.4.13 on macOS 10.14.

Are you sure it doesn't need any test data to properly prove success, though? I'm a little fuzzy on what your SQL is trying to accomplish, but it looks like a no-op when run on an empty Player table.

I therefore changed your table schema to make player_id an AUTO_INCREMENT PRIMARY KEY column and replaced the body of examples/simple1.cpp with this:

  mysqlpp::Connection conn(false); 
  if (!conn.connect("mysql_cpp_data", "127.0.0.1", "test", "test")) { 
    std::cout << conn.error() << std::endl; 
    exit(1); 
  } 
 
  mysqlpp::Query query = conn.query(); 
  mysqlpp::sql_blob position("\xFF\xFF\xFF\x00", 4); 
  query << "INSERT INTO Player(position) VALUES(" << 
        mysqlpp::quote << position << ')'; 
 
  if (mysqlpp::SimpleResult result = query.execute()) { 
    std::cout << "Success! " << std::endl; 
    exit(0); 
  } 
 
  std::cout << query.error() << std::endl; 

It runs successfully here with tip-of-trunk.

I am using in the original code which serialize some objects that I save in the DB, and it uses std::strings.

You aren't using std::string directly in this example, though, you're using C strings converted to C++ strings. The distinction is that copying C++ strings doesn't require special handling of the null byte.

There is an explicit conversion from std::string to mysqlpp::sql_blob in MySQL++. Changing the data type back to std::string in the above diff doesn't change the result: it still works, but maybe switching to mysqlpp::sql_blob in your code will work because of local differences.

(7) By bruno on 2020-07-10 21:12:34 in reply to 6 [source]

Are you sure it doesn't need any test data to properly prove success, though? I'm a little fuzzy on what your SQL is trying to accomplish, but it looks like a no-op when run on an empty Player table.

Sorry, it does need one row. Please insert one row and use the player_id of it in the query.

I therefore changed your table schema to make player_id an AUTO_INCREMENT PRIMARY KEY column and replaced the body of examples/simple1.cpp with this:

That also works here, but if you use a derived table (like in my example) then it fails. Try this version (with the ID you have inserted):

    mysqlpp::Connection conn(false);
    if (!conn.connect("mysql_cpp_data", "127.0.0.1", "test", "test")) {
      std::cout << conn.error() << std::endl;
      exit(1);
    }

    mysqlpp::Query query = conn.query();
    mysqlpp::sql_blob position("\xFF\xFF\xFF\x00", 4);
    query << "UPDATE Player p JOIN (SELECT "
          << mysqlpp::quote << position
          << " AS position, 1 AS player_id) u USING (player_id) SET p.position=u.position";

    if (mysqlpp::SimpleResult result = query.execute()) {
      std::cout << "Success! " << std::endl;
      exit(0);
    }

    std::cout << query.error() << std::endl;

There is an explicit conversion from std::string to mysqlpp::sql_blob in MySQL++. Changing the data type back to std::string in the above diff doesn't change the result: it still works, but maybe switching to mysqlpp::sql_blob in your code will work because of local differences.

Got it, good to know thanks. I have tried using the sql_blob (as above), but I get the same error (with the derived table version).

(8) By Warren Young (tangent) on 2020-07-10 22:20:56 in reply to 7 [link] [source]

Your new code does indeed fail here, but rolling back to MySQL++ 3.2.2 doesn't fix that.

I think you're running up against a change in MariaDB here, not MySQL++.

Until you can prove otherwise, I've got to stop spending time on this.