Overview
Comment: Modified the "most economically valuable" stuff in the userman's Unicode chapter to handle the "except for emoji" case.
SHA3-256: 0bd33dc4fce31306a6af9c11415241c8
User & Date: tangent 2018-07-27 04:45:20.243
Context
2018-07-27
05:00 Squished Clang complaint in pedantic builds about beemutex's pmutex_ private member being unused when thread-awareness is not enabled. (check-in: a014eece1d, user: tangent, tags: trunk)
04:45 Modified the "most economically valuable" stuff in the userman's Unicode chapter to handle the "except for emoji" case. (check-in: 0bd33dc4fc, user: tangent, tags: trunk)
04:39 Polishing pass on the new Unicode material in the user manual. (check-in: 8469cf623d, user: tangent, tags: trunk)
Changes
Changes to doc/userman/unicode.dbx.
︙
New text, lines 26 to 45:

common 7-bit ASCII subset. Either people used approximations like a
plain “c” instead of the French “ç”, or they invented things like
HTML entities (“&ccedil;” in this case) to encode these additional
characters using only 7-bit ASCII.</para>

<para>Unicode solves this problem. It encodes every character used
for writing in the world, using up to 4 bytes per character. Before
emoji became popular, the subset covering the most economically
valuable cases fit into the lower 65536 code points, so you could
encode most texts using only two bytes per character. Many nominally
Unicode-aware programs only support this subset, called the Basic
Multilingual Plane, or BMP.</para>

<para>Unfortunately, Unicode was invented about two decades too late
for Unix and C. Those decades of legacy created an immense inertia
preventing a widespread move away from 8-bit characters. MySQL and
C++ come out of these older traditions, and so they share the same
practical limitations. MySQL++ doesn’t have any code in it for
Unicode conversions, and it likely never will; it just passes
︙
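To make the byte counts in the changed paragraph concrete, here is a minimal C++ sketch. It is not part of this check-in or of MySQL++. The function names in_bmp and utf8_length are hypothetical, and UTF-8 is assumed as the encoding for the "up to 4 bytes" case, which the changed text itself does not name.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: true if the code point lies in the Basic
// Multilingual Plane, i.e. the lower 65536 code points (U+0000..U+FFFF).
constexpr bool in_bmp(std::uint32_t cp)
{
    return cp <= 0xFFFF;
}

// Hypothetical helper: number of bytes UTF-8 (assumed here) needs for
// one code point. It does not reject the surrogate range U+D800..U+DFFF.
inline int utf8_length(std::uint32_t cp)
{
    assert(cp <= 0x10FFFF);      // highest code point Unicode defines
    if (cp < 0x80)    return 1;  // 7-bit ASCII
    if (cp < 0x800)   return 2;  // e.g. U+00E7, the French c-cedilla
    if (cp < 0x10000) return 3;  // rest of the BMP
    return 4;                    // beyond the BMP, e.g. most emoji
}
```

Under these assumptions, in_bmp(0x00E7) is true for “ç”, while in_bmp(0x1F600) is false for an emoji such as U+1F600, which is the case the "before emoji became popular" wording covers.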
New text, lines 138 to 159:

in two versions. One version supports only 1-byte “ANSI” characters
(a superset of ASCII), so they end in 'A'. Windows also supports the
2-byte subset of Unicode called <ulink
url="http://en.wikipedia.org/wiki/UCS-2">UCS-2</ulink><footnote><para>Since
Windows XP, Windows actually uses the <ulink
url="http://en.wikipedia.org/wiki/UTF-16">UTF-16</ulink> encoding,
not UCS-2. This means that if you use characters beyond the 16-bit
BMP range, they get encoded as 4-byte characters. But again, since
the most economically valuable subset of Unicode is the BMP if you
ignore emoji, many programs ignore this distinction and assume
Unicode strings on Windows are always 2 bytes per
character.</para></footnote>. Some call these “wide” characters, so
the other set of functions end in 'W'. The <function><ulink
url="http://msdn.microsoft.com/library/en-us/winui/winui/windowsuserinterface/windowing/dialogboxes/dialogboxreference/dialogboxfunctions/messagebox.asp">MessageBox</ulink>()</function>
API, for instance, is actually a macro, not a real function. If you
define the <symbol>UNICODE</symbol> macro when building your program,
the <function>MessageBox()</function> macro evaluates to
<function>MessageBoxW()</function>; otherwise, to
<function>MessageBoxA()</function>.</para>
︙
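The footnote's point about UTF-16 versus UCS-2 can be illustrated with a short C++ sketch of how one code point maps to UTF-16 code units. This is an illustration only, not code from MySQL++ or from this check-in, and the function name to_utf16 is hypothetical.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: encode one Unicode code point as UTF-16.
// BMP code points take one 16-bit unit (2 bytes); anything beyond
// the BMP takes a surrogate pair, i.e. two units (4 bytes).
inline std::u16string to_utf16(char32_t cp)
{
    // Must be a valid scalar value: in range and not itself a surrogate.
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));

    std::u16string out;
    if (cp < 0x10000) {
        out.push_back(static_cast<char16_t>(cp));
    } else {
        const char32_t v = cp - 0x10000;  // 20 bits remain
        out.push_back(static_cast<char16_t>(0xD800 + (v >> 10)));   // high surrogate
        out.push_back(static_cast<char16_t>(0xDC00 + (v & 0x3FF))); // low surrogate
    }
    return out;
}
```

A program that assumes 2 bytes per character would treat the result of to_utf16(0x1F600) as two characters, since it has a size() of 2, whereas to_utf16(0x00E7) has a size() of 1. That is the distinction the footnote says many programs ignore.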