MySQL++

Check-in [8469cf623d]

Overview
Comment:Polishing pass on the new Unicode material in the user manual.
SHA3-256: 8469cf623de79670cb0e76a1e5f95bf50bb8b3049835dcf7cb3bbd3c02549b48
User & Date: tangent 2018-07-27 04:39:12.031
Context
2018-07-27
04:45
Modified the "most economically valuable" stuff in the userman's Unicode chapter to handle the "except for emoji" case. check-in: 0bd33dc4fc user: tangent tags: trunk
04:39
Polishing pass on the new Unicode material in the user manual. check-in: 8469cf623d user: tangent tags: trunk
04:34
Updated user manual Docbook version from 4.2 to 4.4, effectively dropping CentOS 3 and 4 as build platforms, since the current Homebrew Docbook XSL stylesheets throw lots of errors if you specify 4.2 or 4.3. Updated the user manual's README.txt file accordingly. check-in: 3b3678d64e user: tangent tags: trunk
Changes
Changes to doc/userman/unicode.dbx.
Old version, lines 53 to 110:
    Thompson <ulink
    url="http://www.cl.cam.ac.uk/~mgk25/ucs/utf-8-history.txt">invented</ulink>
    the <ulink url="http://en.wikipedia.org/wiki/UTF-8">UTF-8
    encoding</ulink>. UTF-8 is a superset of 7-bit ASCII and is
    compatible with C strings, since it doesn&#x2019;t use 0 bytes
    anywhere as multi-byte Unicode encodings do. As a result, many
    programs that deal in text will cope with UTF-8 data even though
    they have no explicit support for UTF-8. (Follow the last link above
    to see how the design of UTF-8 allows this.) Thus, when explicit
    support for Unicode was added in MySQL v4.1, they chose to make
    UTF-8 the native encoding, to preserve backward compatibility with
    programs that had no Unicode support.</para>
  </sect2>


  <sect2 id="unicode-mysql">
    <title>Unicode in MySQL</title>

    <para>Since MySQL comes out of the Unix world, and it predates the
    widespread use of UTF-8 in Unix, it started out not supporting
    Unicode at all. You could store raw UTF-8 strings in old versions of
    MySQL, but it wouldn&#x2019;t know how to do things like sort a
    column of UTF-8 strings.</para>

    <para>MySQL 4.1 added the first true support for Unicode. This
    version of MySQL supported only the BMP, meaning that if you told it
    to expect strings to be in UTF-8, it could only use up to 3 bytes
    per character.</para>

    <para>MySQL 5.5 was the first release to completely support Unicode.
    Because the BMP-only Unicode support had been in the wild for about
    6 yeras by that point, and changing to the new character set
    requires a table rebuild, the new one was called
    &#x201C;utf8mb4&#x201D; rather than change the longstanding meaning
    of &#x201C;utf8&#x201D; in MySQL. This release also added a new
    alias for the old UTF-8 subset character set,
    &#x201C;utf8mb3.&#x201D;</para>

    <para>Finally, in MySQL 8.0, &#x201C;utf8mb4&#x201D; became the
    default character set. For backwards compatibility,
    &#x201C;utf8&#x201D; remains an alias for
    &#x201C;utf8mb3.&#x201D;</para>

    <para>As of MySQL++ 3.2.4, we&#x2019;ve defined the
    <varname>MYSQLPP_UTF8_CS</varname> and
    <varname>MYSQLPP_UTF8_COL</varname> macros which expand to
    &#x201C;utf8mb4&#x201D; and &#x201C;utf8mb4_general_ci&#x201D; when
    you build MySQL++ against MySQL 5.5 and newer and to
    &#x201C;utf8&#x201D; and &#x201C;utf8_general_ci&#x201D; otherwise.
    We use these macros in our <filename>resetdb</filename>
    example.</para>
  </sect2>


  <sect2 id="unicode-unix">
    <title>Unicode on Unixy Systems</title>

    <para>Linux and Unix have system-wide UTF-8 support these days. If







New version, lines 53 to 107:
    Thompson <ulink
    url="http://www.cl.cam.ac.uk/~mgk25/ucs/utf-8-history.txt">invented</ulink>
    the <ulink url="http://en.wikipedia.org/wiki/UTF-8">UTF-8
    encoding</ulink>. UTF-8 is a superset of 7-bit ASCII and is
    compatible with C strings, since it doesn&#x2019;t use 0 bytes
    anywhere as multi-byte Unicode encodings do. As a result, many
    programs that deal in text will cope with UTF-8 data even though
    they have no explicit support for UTF-8. Follow the last link above
    to see how the design of UTF-8 allows this.</para>



  </sect2>


  <sect2 id="unicode-mysql">
    <title>Unicode in MySQL</title>

    <para>Since MySQL comes out of the Unix world, and it predates the
    widespread use of UTF-8 in Unix, the early versions of MySQL had no
    explicit support for Unicode. From the start, you could store raw
    UTF-8 strings, but it wouldn&#x2019;t know how to do things like
    sort a column of UTF-8 strings.</para>

    <para>MySQL 4.1 added the first explicit support for Unicode. This
    version of MySQL supported only the BMP, meaning that if you told it
    to expect strings to be in UTF-8, it could only use up to 3 bytes
    per character.</para>

    <para>MySQL 5.5 was the first release to completely support Unicode.
    Because the BMP-only Unicode support had been in the wild for about
    6 years by that point, and changing to the new character set
    requires a table rebuild, the new one was called
    &#x201C;utf8mb4&#x201D; rather than change the longstanding meaning
    of &#x201C;utf8&#x201D; in MySQL. This release also added a new
    alias for the old UTF-8 subset character set,
    &#x201C;utf8mb3.&#x201D;</para>

    <para>Finally, in MySQL 8.0, &#x201C;utf8mb4&#x201D; became the
    default character set. For backwards compatibility,
    &#x201C;utf8&#x201D; remains an alias for
    &#x201C;utf8mb3.&#x201D;</para>

    <para>As of MySQL++ 3.2.4, we&#x2019;ve defined the
    <varname>MYSQLPP_UTF8_CS</varname> and
    <varname>MYSQLPP_UTF8_COL</varname> macros which expand to
    &#x201C;utf8mb4&#x201D; and &#x201C;utf8mb4_general_ci&#x201D; when
    you build MySQL++ against MySQL 5.5 and newer and to
    &#x201C;utf8&#x201D; and &#x201C;utf8_general_ci&#x201D; otherwise.
    We use these macros in our <filename>resetdb</filename> example;
    you're welcome to use them in your code as well.</para>
  </sect2>


  <sect2 id="unicode-unix">
    <title>Unicode on Unixy Systems</title>

    <para>Linux and Unix have system-wide UTF-8 support these days. If