Overview
Comment: | Polishing pass on the new Unicode material in the user manual. |
---|---|
SHA3-256: |
8469cf623de79670cb0e76a1e5f95bf5 |
User & Date: | tangent 2018-07-27 04:39:12.031 |
Context
2018-07-27
04:45 | Modified the "most economically valuable" stuff in the userman's Unicode chapter to handle the "except for emoji" case. check-in: 0bd33dc4fc user: tangent tags: trunk | |
04:39 | Polishing pass on the new Unicode material in the user manual. check-in: 8469cf623d user: tangent tags: trunk | |
04:34 | Updated user manual Docbook version from 4.2 to 4.4, effectively dropping CentOS 3 and 4 as build platforms, since the current Homebrew Docbook XSL stylesheets throw lots of errors if you specify 4.2 or 4.3. Updated the user manual's README.txt file accordingly. check-in: 3b3678d64e user: tangent tags: trunk | |
Changes
Changes to doc/userman/unicode.dbx.
︙ | ︙ | |||
Thompson <ulink url="http://www.cl.cam.ac.uk/~mgk25/ucs/utf-8-history.txt">invented</ulink> the <ulink url="http://en.wikipedia.org/wiki/UTF-8">UTF-8 encoding</ulink>. UTF-8 is a superset of 7-bit ASCII and is compatible with C strings, since it doesn’t use 0 bytes anywhere, as other multi-byte Unicode encodings do. As a result, many programs that deal in text will cope with UTF-8 data even though they have no explicit support for UTF-8. Follow the last link above to see how the design of UTF-8 allows this.</para>
</sect2>

<sect2 id="unicode-mysql">
<title>Unicode in MySQL</title>

<para>Since MySQL comes out of the Unix world, and it predates the widespread use of UTF-8 in Unix, the early versions of MySQL had no explicit support for Unicode. From the start, you could store raw UTF-8 strings, but MySQL wouldn’t know how to do things like sort a column of UTF-8 strings.</para>

<para>MySQL 4.1 added the first explicit support for Unicode. This version of MySQL supported only the BMP, meaning that if you told it to expect strings to be in UTF-8, it could only use up to 3 bytes per character.</para>

<para>MySQL 5.5 was the first release to completely support Unicode. Because the BMP-only Unicode support had been in the wild for about 6 years by that point, and changing to the new character set requires a table rebuild, the new one was called “utf8mb4” rather than changing the longstanding meaning of “utf8” in MySQL. This release also added a new alias for the old UTF-8 subset character set, “utf8mb3.”</para>

<para>Finally, in MySQL 8.0, “utf8mb4” became the default character set. For backwards compatibility, “utf8” remains an alias for “utf8mb3.”</para>

<para>As of MySQL++ 3.2.4, we’ve defined the <varname>MYSQLPP_UTF8_CS</varname> and <varname>MYSQLPP_UTF8_COL</varname> macros, which expand to “utf8mb4” and “utf8mb4_general_ci” when you build MySQL++ against MySQL 5.5 and newer and to “utf8” and “utf8_general_ci” otherwise. We use these macros in our <filename>resetdb</filename> example; you’re welcome to use them in your code as well.</para>
</sect2>

<sect2 id="unicode-unix">
<title>Unicode on Unixy Systems</title>

<para>Linux and Unix have system-wide UTF-8 support these days. If
︙ | ︙ |
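The 3-byte limit described in the "Unicode in MySQL" section of this diff follows from how UTF-8 sizes its encodings: every code point in the BMP fits in at most 3 bytes, while anything above U+FFFF needs 4. The sketch below illustrates that arithmetic only. The helper names `utf8_length` and `fits_utf8mb3` are hypothetical and are not part of MySQL++ or the MySQL C API.

```cpp
#include <cstdint>

// Hypothetical helper: number of bytes UTF-8 needs to encode one
// Unicode code point, or 0 if the value cannot be encoded at all.
constexpr int utf8_length(std::uint32_t cp)
{
    return cp < 0x80                      ? 1   // 7-bit ASCII
         : cp < 0x800                     ? 2
         : (cp >= 0xD800 && cp <= 0xDFFF) ? 0   // surrogates are not encodable
         : cp < 0x10000                   ? 3   // remainder of the BMP
         : cp <= 0x10FFFF                 ? 4   // supplementary planes
         : 0;                                   // beyond the Unicode range
}

// Hypothetical helper: true if the code point fits within the 3 bytes
// per character that the text above attributes to "utf8mb3".
constexpr bool fits_utf8mb3(std::uint32_t cp)
{
    return utf8_length(cp) >= 1 && utf8_length(cp) <= 3;
}

// Checked at compile time, so including this in any translation unit
// is enough to exercise it.
static_assert(utf8_length(0x41) == 1, "ASCII 'A' is a single byte");
static_assert(fits_utf8mb3(0x20AC), "the euro sign is in the BMP");
static_assert(!fits_utf8mb3(0x1F600), "an emoji above U+FFFF needs 4 bytes");
```

Under these assumptions, a string containing any code point for which `fits_utf8mb3` returns false would need a “utf8mb4” column, which is the case the `MYSQLPP_UTF8_CS` macro selects when building against MySQL 5.5 and newer.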