MySQL++

Check-in [8469cf623d]

Overview
Comment:Polishing pass on the new Unicode material in the user manual.
SHA3-256: 8469cf623de79670cb0e76a1e5f95bf50bb8b3049835dcf7cb3bbd3c02549b48
User & Date: tangent 2018-07-27 04:39:12
Context
2018-07-27
04:45
Modified the "most economically valuable" stuff in the userman's Unicode chapter to handle the "except for emoji" case. check-in: 0bd33dc4fc user: tangent tags: trunk
04:39
Polishing pass on the new Unicode material in the user manual. check-in: 8469cf623d user: tangent tags: trunk
04:34
Updated user manual Docbook version from 4.2 to 4.4, effectively dropping CentOS 3 and 4 as build platforms, since the current Homebrew Docbook XSL stylesheets throw lots of errors if you specify 4.2 or 4.3. Updated the user manual's README.txt file accordingly. check-in: 3b3678d64e user: tangent tags: trunk
Changes

Changes to doc/userman/unicode.dbx.

    53     53       Thompson <ulink
    54     54       url="http://www.cl.cam.ac.uk/~mgk25/ucs/utf-8-history.txt">invented</ulink>
    55     55       the <ulink url="http://en.wikipedia.org/wiki/UTF-8">UTF-8
    56     56       encoding</ulink>. UTF-8 is a superset of 7-bit ASCII and is
    57     57       compatible with C strings, since it doesn&#x2019;t use 0 bytes
    58     58       anywhere as multi-byte Unicode encodings do. As a result, many
    59     59       programs that deal in text will cope with UTF-8 data even though
    60         -    they have no explicit support for UTF-8. (Follow the last link above
    61         -    to see how the design of UTF-8 allows this.) Thus, when explicit
    62         -    support for Unicode was added in MySQL v4.1, they chose to make
    63         -    UTF-8 the native encoding, to preserve backward compatibility with
    64         -    programs that had no Unicode support.</para>
           60  +    they have no explicit support for UTF-8. Follow the last link above
           61  +    to see how the design of UTF-8 allows this.</para>
    65     62     </sect2>
    66     63   
    67     64   
    68     65     <sect2 id="unicode-mysql">
    69     66       <title>Unicode in MySQL</title>
    70     67   
    71     68       <para>Since MySQL comes out of the Unix world, and it predates the
    72         -    widespread use of UTF-8 in Unix, it started out not supporting
    73         -    Unicode at all. You could store raw UTF-8 strings in old versions of
    74         -    MySQL, but it wouldn&#x2019;t know how to do things like sort a
    75         -    column of UTF-8 strings.</para>
           69  +    widespread use of UTF-8 in Unix, the early versinos of MySQL had no
           70  +    explicit support for Unicode. From the start, you could store raw
           71  +    UTF-8 strings, but it wouldn&#x2019;t know how to do things like
           72  +    sort a column of UTF-8 strings.</para>
    76     73   
    77         -    <para>MySQL 4.1 added the first true support for Unicode. This
           74  +    <para>MySQL 4.1 added the first explicit support for Unicode. This
    78     75       version of MySQL supported only the BMP, meaning that if you told it
    79     76       to expect strings to be in UTF-8, it could only use up to 3 bytes
    80     77       per character.</para>
    81     78   
    82     79       <para>MySQL 5.5 was the first release to completely support Unicode.
    83     80       Because the BMP-only Unicode support had been in the wild for about
    84         -    6 yeras by that point, and changing to the new character set
           81  +    6 years by that point, and changing to the new character set
    85     82       requires a table rebuild, the new one was called
    86     83       &#x201C;utf8mb4&#x201D; rather than change the longstanding meaning
    87     84       of &#x201C;utf8&#x201D; in MySQL. This release also added a new
    88     85       alias for the old UTF-8 subset character set,
    89     86       &#x201C;utf8mb3.&#x201D;</para>
    90     87   
    91     88       <para>Finally, in MySQL 8.0, &#x201C;utf8mb4&#x201D; became the
................................................................................
    95     92   
    96     93       <para>As of MySQL++ 3.2.4, we&#x2019;ve defined the
    97     94       <varname>MYSQLPP_UTF8_CS</varname> and
    98     95       <varname>MYSQLPP_UTF8_COL</varname> macros which expand to
    99     96       &#x201C;utf8mb4&#x201D; and &#x201C;utf8mb4_general_ci&#x201D; when
   100     97       you build MySQL++ against MySQL 5.5 and newer and to
   101     98       &#x201C;utf8&#x201D; and &#x201C;utf8_general_ci&#x201D; otherwise.
   102         -    We use these macros in our <filename>resetdb</filename>
   103         -    example.</para>
           99  +    We use these macros in our <filename>resetdb</filename> example;
          100  +    you're welcome to use them in your code as well.</para>
   104    101     </sect2>
   105    102   
   106    103   
   107    104     <sect2 id="unicode-unix">
   108    105       <title>Unicode on Unixy Systems</title>
   109    106   
   110    107       <para>Linux and Unix have system-wide UTF-8 support these days. If