MySQL++

Check-in [9c2b57cbf6]

Overview
Comment: Added MYSQLPP_UTF8_CS macro. Using it in resetdb example. Documented it in the user manual, along with an explanation of the MySQL history of Unicode support to justify its existence.
SHA3-256: 9c2b57cbf63ef09589d8c371ae0d15e1679d42e969c3dd4e22fe0d29c3b46b66
User & Date: tangent 2018-07-26 16:54:24
Context
2018-07-26
16:59 Wrote up the changes-so-far for MySQL++ 3.2.4 in the ChangeLog.md file. check-in: 1484202d13 user: tangent tags: trunk
16:54 Added MYSQLPP_UTF8_CS macro. Using it in resetdb example. Documented it in the user manual, along with an explanation of the MySQL history of Unicode support to justify its existence. check-in: 9c2b57cbf6 user: tangent tags: trunk
15:31 Removed the Subversion revision numbers from the ChangeLog entries, since the Subversion repo it refers to is no longer available. check-in: 775aba61ff user: tangent tags: trunk
Changes

Changes to doc/userman/unicode.dbx.

34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
..
60
61
62
63
64
65
66




































67
68
69
70
71
72
73
    subset covering the most economically valuable cases takes two bytes
    per character, so many Unicode-aware programs only support this
    subset, storing characters as 2-byte values, rather than use 4-byte
    characters so as to cover all possible cases, however rare. This
    subset of Unicode is called the Basic Multilingual Plane, or
    BMP.</para>

    <para>Unfortunately, Unicode was invented about two decades
    too late for Unix and C. Those decades of legacy created an
    immense inertia preventing a widespread move away from 8-bit
    characters. MySQL and C++ come out of these older traditions, and
    so they share the same practical limitations. MySQL++ currently
    doesn't have any code in it for Unicode conversions; it just
    passes data along unchanged from the underlying MySQL C API,
    so you still need to be aware of these underlying issues.</para>

    <para>During the development of the <ulink
    url="http://en.wikipedia.org/wiki/Plan_9_from_Bell_Labs">Plan
    9</ulink> operating system (a kind of successor to Unix) Ken
    Thompson <ulink
    url="http://www.cl.cam.ac.uk/~mgk25/ucs/utf-8-history.txt">invented</ulink>
    the <ulink url="http://en.wikipedia.org/wiki/UTF-8">UTF-8
................................................................................
    they have no explicit support for UTF-8. (Follow the last link above
    to see how the design of UTF-8 allows this.) Thus, when explicit
    support for Unicode was added in MySQL v4.1, they chose to make
    UTF-8 the native encoding, to preserve backward compatibility with
    programs that had no Unicode support.</para>
  </sect2>


  <sect2 id="unicode-unix">
    <title>Unicode on Unixy Systems</title>

    <para>Linux and Unix have system-wide UTF-8 support these days. If
    your operating system is of 2001 or newer vintage, it probably has
    such support.</para>

New version, lines 34-55 and 60-109:

    subset covering the most economically valuable cases takes two bytes
    per character, so many Unicode-aware programs only support this
    subset, storing characters as 2-byte values, rather than use 4-byte
    characters so as to cover all possible cases, however rare. This
    subset of Unicode is called the Basic Multilingual Plane, or
    BMP.</para>

    <para>Unfortunately, Unicode was invented about two decades too late
    for Unix and C. Those decades of legacy created an immense inertia
    preventing a widespread move away from 8-bit characters. MySQL and
    C++ come out of these older traditions, and so they share the same
    practical limitations. MySQL++ doesn&#x2019;t have any code in it
    for Unicode conversions, and it likely never will; it just passes
    data along unchanged from the underlying MySQL C API, so you still
    need to be aware of these underlying issues.</para>

    <para>During the development of the <ulink
    url="http://en.wikipedia.org/wiki/Plan_9_from_Bell_Labs">Plan
    9</ulink> operating system (a kind of successor to Unix) Ken
    Thompson <ulink
    url="http://www.cl.cam.ac.uk/~mgk25/ucs/utf-8-history.txt">invented</ulink>
    the <ulink url="http://en.wikipedia.org/wiki/UTF-8">UTF-8
................................................................................
    they have no explicit support for UTF-8. (Follow the last link above
    to see how the design of UTF-8 allows this.) Thus, when explicit
    support for Unicode was added in MySQL v4.1, they chose to make
    UTF-8 the native encoding, to preserve backward compatibility with
    programs that had no Unicode support.</para>
  </sect2>


  <sect2 id="unicode-mysql">
    <title>Unicode in MySQL</title>

    <para>Since MySQL comes out of the Unix world, and it predates the
    widespread use of UTF-8 in Unix, it started out not supporting
    Unicode at all. You could store raw UTF-8 strings in old versions of
    MySQL, but it wouldn&#x2019;t know how to do things like sort a
    column of UTF-8 strings.</para>

    <para>MySQL 4.1 added the first true support for Unicode. This
    version of MySQL supported only the BMP, meaning that if you told it
    to expect strings to be in UTF-8, it could only use up to 3 bytes
    per character.</para>

    <para>MySQL 5.5 was the first release to completely support Unicode.
    Because the BMP-only Unicode support had been in the wild for about
    6 years by that point, and changing to the new character set
    requires a table rebuild, the new one was called
    &#x201C;utf8mb4&#x201D; rather than change the longstanding meaning
    of &#x201C;utf8&#x201D; in MySQL. This release also added a new
    alias for the old UTF-8 subset character set,
    &#x201C;utf8mb3.&#x201D;</para>

    <para>Finally, in MySQL 8.0, &#x201C;utf8mb4&#x201D; became the
    default character set. For backwards compatibility,
    &#x201C;utf8&#x201D; remains an alias for
    &#x201C;utf8mb3.&#x201D;</para>

    <para>As of MySQL++ 3.2.4, we&#x2019;ve defined the
    <varname>MYSQLPP_UTF8_CS</varname> macro which expands to
    &#x201C;utf8mb4&#x201D; when you build it against MySQL 5.5 and
    newer and to &#x201C;utf8&#x201D; otherwise. We use this macro in
    our <filename>resetdb</filename> example.</para>
  </sect2>


  <sect2 id="unicode-unix">
    <title>Unicode on Unixy Systems</title>

    <para>Linux and Unix have system-wide UTF-8 support these days. If
    your operating system is of 2001 or newer vintage, it probably has
    such support.</para>
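
To make the 3-byte limit described in the new "Unicode in MySQL" section concrete, here is a minimal C++ sketch that computes how many bytes UTF-8 needs for a given code point. It is not part of MySQL++ or of this check-in, and the function names `utf8_length` and `fits_in_utf8mb3` are hypothetical, chosen only for illustration. Code points in the BMP need at most three bytes, so anything that needs four cannot be stored in a column using MySQL's older "utf8" (utf8mb3) character set.

```cpp
#include <cstdint>

// Number of bytes UTF-8 needs to encode one Unicode code point.
// Returns 0 for values that are not valid Unicode scalar values
// (UTF-16 surrogates and anything above U+10FFFF).
inline int utf8_length(std::uint32_t cp)
{
    if (cp <= 0x7F) return 1;                    // ASCII
    if (cp <= 0x7FF) return 2;
    if (cp >= 0xD800 && cp <= 0xDFFF) return 0;  // surrogate range
    if (cp <= 0xFFFF) return 3;                  // rest of the BMP
    if (cp <= 0x10FFFF) return 4;                // supplementary planes
    return 0;
}

// True if the code point can be stored in a 3-byte-per-character
// UTF-8 subset such as MySQL's "utf8" (utf8mb3).
inline bool fits_in_utf8mb3(std::uint32_t cp)
{
    const int n = utf8_length(cp);
    return n >= 1 && n <= 3;
}
```

For example, `fits_in_utf8mb3(0x20AC)` (the euro sign) is true, while `fits_in_utf8mb3(0x1F600)` (an emoji outside the BMP) is false, which is the case "utf8mb4" exists to cover.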

Changes to examples/resetdb.cpp.

Old version, lines 1-17 and 137-151:

/***********************************************************************
 resetdb.cpp - (Re)initializes the example database, mysql_cpp_data.
	You must run this at least once before running most of the other
	examples, and it is helpful sometimes to run it again, as some of
	the examples modify the table in this database.

 Copyright (c) 1998 by Kevin Atkinson, (c) 1999-2001 by MySQL AB, and
 (c) 2004-2009 by Educational Technology Resources, Inc.  Others may
 also hold copyrights on code in this file.  See the CREDITS file in
 the top directory of the distribution for details.

 This file is part of MySQL++.

 MySQL++ is free software; you can redistribute it and/or modify it
 under the terms of the GNU Lesser General Public License as published
 by the Free Software Foundation; either version 2.1 of the License, or
 (at your option) any later version.
................................................................................
				"  item CHAR(30) NOT NULL, " <<
				"  num BIGINT NOT NULL, " <<
				"  weight DOUBLE NOT NULL, " <<
				"  price DECIMAL(6,2) NULL, " << // NaN & inf. == NULL
				"  sdate DATE NOT NULL, " <<
				"  description MEDIUMTEXT NULL) " <<
				"ENGINE = InnoDB " <<
				"CHARACTER SET utf8 COLLATE utf8_general_ci";
		query.execute();

		// Set up the template query to insert the data.  The parse()
		// call tells the query object that this is a template and
		// not a literal query string.
		query << "insert into %6:table values " <<
				"(%0q, %1q, %2, %3, %4q, %5q:desc)";

New version, lines 1-17 and 137-151:

/***********************************************************************
 resetdb.cpp - (Re)initializes the example database, mysql_cpp_data.
	You must run this at least once before running most of the other
	examples, and it is helpful sometimes to run it again, as some of
	the examples modify the table in this database.

 Copyright © 1998 by Kevin Atkinson, © 1999-2001 by MySQL AB, and
 © 2004-2009, 2018 by Educational Technology Resources, Inc.  Others may
 also hold copyrights on code in this file.  See the CREDITS file in the
 top directory of the distribution for details.

 This file is part of MySQL++.

 MySQL++ is free software; you can redistribute it and/or modify it
 under the terms of the GNU Lesser General Public License as published
 by the Free Software Foundation; either version 2.1 of the License, or
 (at your option) any later version.
................................................................................
				"  item CHAR(30) NOT NULL, " <<
				"  num BIGINT NOT NULL, " <<
				"  weight DOUBLE NOT NULL, " <<
				"  price DECIMAL(6,2) NULL, " << // NaN & inf. == NULL
				"  sdate DATE NOT NULL, " <<
				"  description MEDIUMTEXT NULL) " <<
				"ENGINE = InnoDB " <<
				"CHARACTER SET " MYSQLPP_UTF8_CS " COLLATE utf8_general_ci";
		query.execute();

		// Set up the template query to insert the data.  The parse()
		// call tells the query object that this is a template and
		// not a literal query string.
		query << "insert into %6:table values " <<
				"(%0q, %1q, %2, %3, %4q, %5q:desc)";

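The new CREATE TABLE line relies on the compiler joining adjacent string literals, so the macro's value is spliced into the SQL text at compile time. One thing to watch: MySQL requires the collation to belong to the named character set, so when the macro expands to "utf8mb4", the hard-coded utf8_general_ci is likely to be rejected by the server. The sketch below shows one possible way to keep the two in step by deriving the collation name from the same macro. The helper `table_options` is hypothetical, and the `#ifndef` fallback is only a stand-in so the fragment compiles without the MySQL++ headers; this is not necessarily how MySQL++ itself handles it.

```cpp
#include <string>

// Stand-in so this sketch compiles on its own.  In real code the
// definition would come from MySQL++'s lib/common.h instead.
#ifndef MYSQLPP_UTF8_CS
#   define MYSQLPP_UTF8_CS "utf8mb4"
#endif

// Hypothetical helper: builds the table options clause.  Adjacent
// string literals are concatenated by the compiler, so the character
// set and its matching "_general_ci" collation always agree.
inline std::string table_options()
{
    return "ENGINE = InnoDB "
           "CHARACTER SET " MYSQLPP_UTF8_CS
           " COLLATE " MYSQLPP_UTF8_CS "_general_ci";
}
```

With the stand-in value above, `table_options()` yields `ENGINE = InnoDB CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci`; if the macro were "utf8", the collation would become utf8_general_ci, matching the original query.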
Changes to lib/common.h.

Old version, lines 214-222:

// while actually working with C++.  This is why we disobey the MySQL
// developer docs, which recommend including my_global.h before mysql.h.
#if defined(MYSQLPP_MYSQL_HEADERS_BURIED)
#	include <mysql/mysql.h>
#else
#	include <mysql.h>
#endif

#endif // !defined(MYSQLPP_COMMON_H)

New version, lines 214-229:

// while actually working with C++.  This is why we disobey the MySQL
// developer docs, which recommend including my_global.h before mysql.h.
#if defined(MYSQLPP_MYSQL_HEADERS_BURIED)
#	include <mysql/mysql.h>
#else
#	include <mysql.h>
#endif

// The Unicode chapter of the user manual justifies the following.
#if MYSQL_VERSION_ID >= 50500
#   define MYSQLPP_UTF8_CS "utf8mb4"
#else
#   define MYSQLPP_UTF8_CS "utf8"
#endif

#endif // !defined(MYSQLPP_COMMON_H)
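
The macro above is resolved at compile time from `MYSQL_VERSION_ID`, which packs the version of the MySQL client headers as major * 10000 + minor * 100 + patch, so 50500 means 5.5.0. That means it reflects the library MySQL++ was built against, not necessarily the server a program later connects to. As a rough sketch of the same decision made at run time, the hypothetical function below takes a version number in that packed format; a caller might obtain one from the C API's `mysql_get_server_version()`, though that wiring is an assumption and is left out here so the fragment stands alone.

```cpp
#include <string>

// Pack a version triple the way MYSQL_VERSION_ID does:
// major * 10000 + minor * 100 + patch.
constexpr unsigned long pack_version(unsigned major, unsigned minor,
                                     unsigned patch)
{
    return major * 10000UL + minor * 100UL + patch;
}

// Hypothetical run-time counterpart of MYSQLPP_UTF8_CS: choose the
// full-Unicode character set when the version is 5.5.0 or newer,
// otherwise fall back to the older BMP-only "utf8".
inline std::string utf8_charset_for(unsigned long version_id)
{
    return version_id >= pack_version(5, 5, 0) ? "utf8mb4" : "utf8";
}
```

Under these assumptions, `utf8_charset_for(pack_version(5, 1, 73))` gives "utf8" and `utf8_charset_for(pack_version(8, 0, 11))` gives "utf8mb4".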