rejig: Translate RPN Programs Between Formats

Motivation

The R47 offers two different ways to save a single program out to its USB flash storage:

🟦 I/O WRITEP produces a text file containing a sequence of internal R47 byte codes.
🟦 I/O XPORTP produces a pretty-printed RTF version that relies on the user to have the C47 custom font installed.

The first is technically UTF-8 plaintext, but it isn’t human-readable, whereas the second is human-readable but only via a word processor. Even then, it’s functionally read-only because the R47 offers no corresponding “IMPORTP” function that will parse an edited RTF file back into byte codes.

My rejig tool rejiggers R47 program files between any combination of:

P47 input: This is the WRITEP format referenced above, using the *.p47 file name extension. It is the default input format, which you may specify redundantly with the --from/-f p47 option.
P47 output: The tool can write out a file using that same format. Combined in this way, rejig acts as a byte-code verifier: if you get out the same file content as you put in, the input is a valid R47 program. Select this with the --to/-t p47 option.
UTF-8 output: This is the default output format, translating the input into pretty-printed Unicode plaintext. It accepts either -t utf8 or -t p47u as an alias, after the tentative *.p47u file extension.¹ We prefer -t utf8 because it is more visually distinct when combined with -f p47 on command lines.
UTF-8 input: The -f utf8 option — a.k.a. -f p47u — effectively reverses the prior mode. Its primary use case is paired with -t p47, allowing you to write an R47 program on your big development machine using your favorite text editor or IDE, then “assemble” that UTF-8 text file back into a *.p47 file that the R47 can read.
DM32 input/output: The --from/to dm32 options read/write SwissMicros DM32 statefiles, *.d32. See below for details.
HP-15C input/output: The --from/to hp15c options use Torsten Manz’ HP-15C Simulator as a partial lingua franca owing to its broad compatibility. We also have a parser for the RLM-15CX simulator via --from/to rlm15cx. See below for details.
HP-11C input/output: The --from/to hp11c options use the RLM-11CX simulator format for lack of a better lingua franca for the HP-11C. The proper name for this facility is therefore --from/to rlm11cx in the hope that I might in the future learn of a more widespread format which better deserves the generic name. See below for details.

Broadly, rejig adds the virtues of the Unix Way to the R47 world. It takes inspiration from Pandoc, but without nearly as much ambition behind it.

Examples

My triangle solver is available here in four different forms:

The hand-annotated version at the top of the linked article.
The *.p47 version produced on an R47 via WRITEP.
The RTF version produced on an R47 via XPORTP.
The Unicode pretty-printed version produced by rejig in its default operating mode: -f p47 -t utf8

Comparing them should prove instructive.

For more examples, see the parent directory, where most of the *.p47 files have a corresponding *.p47u file.

Downloads

Pre-built static binaries are available for download here for Windows, macOS, and Linux, for both 64-bit ARM and Intel CPUs.

The installation process is:

Unpack the archive.
Copy the binary to the destination folder.
There is no step 3. You’re done already.

You may wish to consult the change log to see what’s new.

If those binaries do not appeal, your best bet is the…

Source Code

rejig is written in Go and hosted here.

You may clone it and everything else on this section of my site via Fossil.

On a macOS or Universal Blue² host:

brew install fossil go go-task
fossil clone https://tangentsoft.com/rpn/
cd rpn/r47/rejig
make

That should get you a rejig binary in the current working directory, built from the current source code thanks to the magic of Task and go build.

See the Fossil quick start guide for more information on using Fossil, including updating to current versions and more. For an amuse-bouche, try this:

fossil ui

You should now be seeing a copy of this very website in your default web browser. Cool, yeah?

Implied Formats

By default, the program operates as if you gave -f p47 -t utf8, but the command line parser consults those defaults only when no file names are given. Otherwise, it checks the file name extensions for the known file format specifiers accepted in --from/to options, and if there is a match, it uses it. The following commands are therefore equivalent:

$ rejig - < input.p47 > output.p47u
$ rejig input.p47 -o output.p47u

The first works because these are the default operating mode formats, while the second actually overrides those defaults with the same values taken from the extensions, redundantly giving the default behavior.

Where this becomes more useful is when doing non-default things:

$ rejig input.utf8 -o output.p47
$ rejig --from p47u input.utf8 --to p47 -o output.p47

Those do the same thing, but the second is needlessly verbose.

This behavior makes --from/to options largely redundant when using files, since you should be naming them in accord with the conventions this program supports regardless.

However, these options may be necessary when the first input is given as - to mean “read from stdin” or when writing output to the default location — stdout — where file name extension guessing cannot work. For example:

$ rejig -f utf8 -t p47 - < input.p47u > output.p47

This is the overly-verbose example above recast using shell redirection, making the -f/t options fully justified because they override defaults we do not want to accept in this instance.
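The extension-based guessing described in this section can be sketched in a few lines of Go. This is a minimal illustration, not the code rejig actually uses: the guessFormat function and the extFormats table are hypothetical names, the table lists only a few of the extensions mentioned in this document, and the handling of explicit --from/to overrides is left out.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// extFormats maps lower-cased file name extensions to format names.
// Illustrative only; the real table in rejig may differ.
var extFormats = map[string]string{
	".p47":  "p47",
	".p47u": "utf8",
	".utf8": "utf8",
	".d32":  "dm32",
}

// guessFormat returns the format implied by the extension of name.
// It returns fallback when name is empty, is "-" (stdin/stdout),
// or carries an extension that is not in the table.
func guessFormat(name, fallback string) string {
	if name == "" || name == "-" {
		return fallback
	}
	if f, ok := extFormats[strings.ToLower(filepath.Ext(name))]; ok {
		return f
	}
	return fallback
}

func main() {
	// Named files: the extensions decide.
	fmt.Println(guessFormat("input.p47u", "p47"), guessFormat("output.p47", "utf8"))
	// Stdin and stdout: only the defaults are available.
	fmt.Println(guessFormat("-", "p47"), guessFormat("", "utf8"))
}
```

The second call in main shows why -f/t remain necessary with stdin and stdout: with no extension to inspect, only the fallback can apply.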

Command Option Flags

rejig accepts these option flags:

--fix-obsolete: When parsing input, accept [obsolete] ops instead of failing.
--line-numbers: In UTF-8 output, include line numbers like the R47 uses for RTF outputs and in PEM.
--output/o: Write output to a named file. You may give - here to redundantly tell rejig to write to stdout, but because that is the default behavior, the option may as well be dropped.
--verbose: Show debugging output.
--version/v: Show the program’s version number.

The defaults overridden by --line-numbers and --verbose were chosen because rejig focuses on writing outputs that allow lossless round-tripping, enforced by a test that runs frequently during development. It therefore elides metadata such as program byte counts, line numbers, and total line counts unless you ask for them.

Operating Modes

Depending on which options and file arguments you give, rejig acts as…

…a pretty-printer for all “source code” type --to flag formats; programs become more readable not only through sensible style choices, but also by having idiosyncrasies canonicalized
…a verifier when given the same --from format as the --to; for byte code formats, if the program passes without complaint, the destination calculator should accept it; for source formats, a successful verification pass means the program is well-formed, conforming to the known instruction processing rules; this pairing is useful for verifying that rejig is functionally lossless.³
…an assembler when given a source code type --from flag and a byte code format as --to, transforming symbolic op codes into the byte code the calculator can run
…a disassembler when given a byte code format as --from and a source code format as --to
…a transpiler when given different source code formats for --from and --to, either lowering a program written in a high-level format to a lesser target, or the reverse; raising is more reliable than lowering, but both directions can be lossy due to incompletely overlapping feature sets

This all extends to implied formats, of course.

A Matter of Style

rejig -t utf8 implements my preferred RPN programming style, with no option for customization short of changing the code.

There are plans to change this, to some extent, but in the meantime, here is what rejig does:

Identifier Names

The R47 programming language was not intended to be parsed, as such, but to support its round-tripping features, rejig makes several changes to the operation names, system flags, and so forth:

spaces: To avoid a need for quoted identifiers — which brings on a whole pile of other trouble — rejig either removes spaces or replaces them with hyphens. A good example of this is the “a b/c” function on the calculator for entering fraction mode, which rejig spells a-b/c. I do realize that this is ambiguous with respect to subtraction, but it is a common way of expressing a mixed fraction, as with “1-⅔” in US cookbooks.

The biggest single case you need to be aware of is with the numeric tests. What the R47 styles “x≠ ?” is spelled without the space in rejig.
question mark: The R47 uses a trailing question mark in operation names for three incompatible purposes:
- tests take the calculator state — often the X value alone — and check some property of that state. For instance, LINT? determines whether X is a long integer. A “false” result causes program execution to skip the next instruction, implementing the classic HP RPN do-if-true logic.
- getters like BATT? put their result into X without otherwise affecting further program execution
- screens like VERS? and WHO? do neither; they simply put up a screenful of information that goes away on the next keystroke.
This ambiguity is inconvenient within the rejig code, where we would like to use a trailing question mark to determine whether to indent the next line to mark it as affected by do-if-true logic. rejig renames all the others with a leading “get” prefix and drops the trailing question mark.

A good second reason to rename the getters is to enable lossless round-tripping. For instance, the R47 defines two different “FRACT” operations, one to get that system flag’s current value and the other to set it. This is fine on the calculator where it can use the path you took to the menu item to disambiguate the cases. It is even fine in the WRITEP output, where differing raw op codes distinguish the cases. We finally run into trouble in the UTF-8 → UTF-8 automatic reformatting case, where rejig would otherwise collapse both cases down to the same op code.

If you wish to write R47 programs using rejig as a byte-code compiler, you will have to be aware of these name changes. There is presently no documentation for all the changes other than what you find in the source code, starting about ⅔ of the way down in the ops file, in the internalOpInfo map.

Comments

rejig recognizes three styles of comments:

# shell script
; assembly
// C family

When one of those is encountered outside quotes, everything from that point to the end of the line is ignored. The intent is that the rejig format be as useful as a programmer-to-programmer communication medium as it is as a way to get programs from your computer into an R47.
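As a rough illustration of the rule above, here is a minimal Go sketch of comment stripping. It is not taken from the rejig source. The stripComment name is hypothetical, and two behaviors are assumptions of mine: both single and double quotes delimit strings, and a marker only starts a comment at the beginning of a line or after whitespace, which keeps a literal such as 7F#16 intact.

```go
package main

import (
	"fmt"
	"strings"
)

// stripComment removes a trailing #, ; or // comment from line,
// ignoring markers that sit inside a quoted string.
// Assumption: a marker counts only at the start of the line or
// after a space or tab, so "7F#16" is not cut at the '#'.
func stripComment(line string) string {
	var quote rune // the open quote character, or 0 outside quotes
	prev := ' '    // treat the start of the line like whitespace
	for i, r := range line {
		atBoundary := prev == ' ' || prev == '\t'
		prev = r
		switch {
		case quote != 0:
			if r == quote {
				quote = 0 // closing quote
			}
		case r == '"' || r == '\'':
			quote = r // opening quote
		case !atBoundary:
			// In the middle of a token: never a comment start.
		case r == '#', r == ';', strings.HasPrefix(line[i:], "//"):
			return strings.TrimRight(line[:i], " \t")
		}
	}
	return line
}

func main() {
	for _, l := range []string{
		"RCL 01 ; recall the first side",
		"'a;b' // semicolon inside quotes survives",
		"7F#16 # hex literal survives",
	} {
		fmt.Printf("%q\n", stripComment(l))
	}
}
```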

One of the first things you are likely to notice on studying my triangle solver is all the explanatory comments, one on nearly every line. I’ve always been a big fan of documenting code, and I find comments especially helpful with terse languages like the one backing the R47.

Take the comments showing stack register movements: they aid the reader in understanding the program by documenting how the data moves on the stack at each step. As currently written, these comments assume the simpler SSIZE4 mode even though the program was tested with SSIZE8 mode; my choice was to pick one for clarity or document both options, muddying the presentation. Since the only material difference is that the interactions involving T affect the R47’s D register instead, I took this simpler tack.

Indenting

rejig takes inspiration from go fmt: both implement a common formatting scheme for their respective ecosystems, doing away with local idiosyncrasies. One may quibble with the choices made, but when the rules are fixed and automatically enforced, developer focus tends to shift to substantive matters.

The rejig scheme is simple: it adds a 2-space indent for each LBL after the first and subtracts a level for each RTN. The op following a do-if-true op gets an extra indent level. An END op brings the indent level back to 1, and .END. zeroes it. Easy.

The sole exception results from a combination of the above: a do-if-true RTN does not reduce the indent level because it is conditionally the end of the subroutine.

The fact that this does not always produce neat hierarchies of subroutine calls reflects the unstructured nature of RPN programming. If you find yourself questioning the formatting, first ask whether it is a fair expression of the ops as given. Surprising indent levels might be telling you something important about how the program actually works, which you might wish to address.

That is my experience, at any rate. I find that my scheme makes far more sense than the one built into the R47’s PEM mode and its RTF outputs.

Blank Lines

Doubtless because of the limited screen real estate on calculators, even biggish ones like the R47, it is traditional in RPN to write one line after the next, all the way through, without any blank lines.

Yet, when writing the program out as described above, I do believe we can accept adding one blank line above each LBL for readability. This is another reason to suppress line numbers by default in UTF-8 exports.

rejig adds these when pretty-printing and strips them back out when reading text-form programs back in.

If a LBL is preceded by a REM, the blank line is placed before the REM instead, under the assumption that it is commenting on the following subroutine. This does not yet work for a block of multiple REM statements.

Automatic Reformatting

Above I brought up the example of the Go programming language’s automatic formatting. In case it is not clear, my intent in allowing option combos like

rejig -f utf8 -t utf8 < a > b

…is to provide a similar facility, canonicalizing the input in as lossless a manner as may be hoped for.

Compatibility Breakages

Nearly all cases where rejig will not read a *.p47 file result from lack of a feature, which is why the version number begins with “0.” — see SemVer rule 4.

However, there are certain cases worth pointing out as likely traps:

Obsolete Operation Names

When rejig encounters an obsolete op, it complains and stops input processing immediately unless you give the --fix-obsolete option. It does so for the same reason that attempting to run a program with one of these ops on an R47 raises the error:

Function has changed, please replace.

There are two common situations that cause an op to be obsoleted like this:

A new one was added with the old name, indicating that the new one is not strictly compatible, requiring the programmer to evaluate the change and decide how to cope.
The op was removed entirely, requiring that the programmer find a new way to get the desired effect.

Ops are never⁴ removed outright because they need to retain their op code index for compatibility with existing programs.

These conditions make the --fix-obsolete option risky. All it does is strip the angle brackets⁵ and retry.

Number Handling

The R47 has an extremely powerful set of numeric data types, both real and integer. In the main, the R47 preserves these by encoding them in “string” form, not one of the internal binary formats. This in turn allows rejig to store and regurgitate the byte codes faithfully without actually needing to process the data in their native forms.

The sole “binary” number format rejig currently supports is for what the R47 calls “short” integers: unsigned 64-bit integers tagged with a base value. When the calculator shows you “7F#16”, it is storing the decimal value 127 internally as a short integer, tagged for hex display. The R47 can store these in either string or binary form. It uses binary for positive numbers when the string form would take more memory, and it uses string form when the number is negative — to preserve the sign irrespective of the current number display mode — or when doing this saves space over the binary form.
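To make the “7F#16” example concrete, here is a small Go sketch that splits the displayed form of a short integer into its value and base tag. It only illustrates the notation and says nothing about how the byte codes are laid out in a *.p47 file. The parseShortInt name is hypothetical, and limiting the base to 2 through 16 is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseShortInt splits text such as "7F#16" into an unsigned
// 64-bit value and the base it is tagged with.
func parseShortInt(s string) (value uint64, base int, err error) {
	digits, baseText, ok := strings.Cut(s, "#")
	if !ok {
		return 0, 0, fmt.Errorf("no base tag in %q", s)
	}
	base, err = strconv.Atoi(baseText)
	if err != nil || base < 2 || base > 16 { // assumed range of bases
		return 0, 0, fmt.Errorf("bad base in %q", s)
	}
	value, err = strconv.ParseUint(digits, base, 64)
	return value, base, err
}

func main() {
	v, b, err := parseShortInt("7F#16")
	fmt.Println(v, b, err)
}
```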

The code uses binary formats in a few other cases, but rejig currently includes no code for interpreting them. If and when we do encounter a *.p47 file containing one of these, the first versions supporting it will likely be lossy. The only way to avoid that would be to bake the same decNumber library into rejig for this sole purpose.

DM32 Limitations

The DM32 support in rejig is implemented as a subset of its R47 support. Reading a DM32 statefile translates it into R47 ops, and lowering it reverses this. Because the R47 way of doing things isn’t exactly the same as how the HP-32SII did it — as emulated in the DM32 — and the R47 isn’t a 100% superset regardless, these conversions are necessarily lossy.

Output

When reading --from another type of file format than dm32 — or equivalently, giving as input a file named other than *.d32 — it is important to limit yourself to functions the DM32 supports. There are very few cases where rejig will emit RPN code to implement missing ops. The current sole exception is the R47 DROP𝑥 op, which is common enough that it is emulated in DM32 output as CL𝑥 plus addition, making use of the additive identity: the prior Y becomes the new X=Y+0.⁶

The primary limitations resulting from this stance are:

Operations: Any other R47 op that has no direct DM32 equivalent halts the conversion process. One example is that while the DM32 has the GRAD angle mode, there is no →GRAD angle value conversion op. While rejig could paper over simple cases like this, there are likely hundreds of them in total, and they’d all have to be backed out for the reverse conversion.
Literals: The DM32 does not have much of a type system; nowhere near what the R47 provides, at any rate:
- String literals are nonexistent as such on the DM32. There are a few cases where rejig will translate R47 string literals into a DM32 approximation, as with statefile comments to REM ops. Otherwise, these translations range from highly lossy to utterly hopeless.
- Date, time, and DMS angle literals lower from R47 typed form to the old HP decimal format, which is perfectly fine for a one-way conversion but means the round-trip cannot be lossless without adding heuristics to guess the intended type from context on raising back into the R47 form.
Complex numbers: The DM32 takes the real and imaginary parts of complex numbers from the stack in pairs, while the R47 has first-class complex number support, with each value maintained as a whole, taking a single spot on the stack. rejig is smart enough to translate R47 style complex numbers down into the DM32 scheme when they appear immediately before a CMPLX op: two such values before a binary op get translated into four real values. Anything else is likely to fail in an R47 → DM32 lowering.
Flags, variables, and registers: No local flags, named flags, or system flags; no global numbered registers; no local registers; no register shuffling; no indirection through registers. Variables are single capital letters only; rejig tries to take the first letter of an alpha variable when given, but it will map to the first available one if there is a clash or it starts with a non-ASCII character.
Program end: The DM32 format has a single program space and so does not make the END vs .END. distinction the R47 does. The first one seen in the input op stream produces a PGMEND in the output DM32 statefile, terminating the conversion.
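The variable mapping described under “Flags, variables, and registers” above can be sketched as follows. This is an illustration under stated assumptions, not the rejig code: the pickLetter name is hypothetical, the available letters are assumed to be A through Z, and folding a lower-case first letter to upper case is my guess at a sensible behavior.

```go
package main

import "fmt"

// pickLetter chooses a single-letter DM32 variable for an R47
// variable name. used records letters already taken and is updated
// when a letter is assigned.
func pickLetter(name string, used map[byte]bool) (byte, error) {
	if name != "" {
		c := name[0]
		if c >= 'a' && c <= 'z' {
			c -= 'a' - 'A' // assumption: fold lower case to upper
		}
		// Prefer the name's own first letter when it is ASCII and free.
		if c >= 'A' && c <= 'Z' && !used[c] {
			used[c] = true
			return c, nil
		}
	}
	// Clash or non-ASCII start: fall back to the first free letter.
	for c := byte('A'); c <= 'Z'; c++ {
		if !used[c] {
			used[c] = true
			return c, nil
		}
	}
	return 0, fmt.Errorf("no free variable letter for %q", name)
}

func main() {
	used := map[byte]bool{}
	for _, name := range []string{"alpha", "Area", "θ1"} {
		c, err := pickLetter(name, used)
		fmt.Printf("%s -> %c (%v)\n", name, c, err)
	}
}
```

Note that the fallback makes the mapping depend on the order in which variables are first seen, which is one more reason such conversions cannot be expected to round-trip.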

Input

rejig supports every known operation type that can be written into a DM32 statefile, with one small class of known exceptions. The DM32 has several stats ops that operate on (𝑥,𝑦) points, and of those that operate on linear sequences of scalar values, it has separate versions for 𝑥 and 𝑦. The 𝑥 versions all exist in the R47, but there appear to be no R47 stats ops equivalent to these DM32 ones: 𝑦̅, s𝑦, σ𝑦

I hesitate to file an R47 feature request to add equivalents I can use to map these ops. Their purpose seems dubious to me, reflecting the internal design of the HP-32S and its successors rather than a genuine end-user need. If you want to get stats on a list of values, why would you put them into the calculator in (𝑥,𝑦) form unless that was the only method offered?

Beyond that, the primary area where one runs into trouble with the rejig DM32 input handling is where it is forced to do a translation:

It makes no attempt to reverse the DROP𝑥 conversion described above. The quibble noted above for our CL𝑥-then-add workaround is equally acceptable in other HP RPN calculators, including the R47, to the point that this 2-op alternative might have been the programmer’s intent, making us reluctant to unconditionally translate every such pair seen to DROP𝑥. The downside is that DROP𝑥 will not round-trip.
The heuristics rejig uses to reverse the CMPLX transformation described above are even more likely to fail on the input path, where it attempts to “raise” the op sequence in a DM32 statefile into R47 form. The R47 does not special-case the arithmetic operators for this as the DM32 does: where the DM32 requires you to issue a CMPLX+ to add two complex numbers, the R47 lets you add them, period.

The most reliable way to confuse rejig in this regard is to chain CMPLX arithmetic operations without intervening args. Since that is the normal way of doing things in RPN, it suffices to say that programs doing anything at all clever with complex numbers will likely require hand editing after conversion to properly express the original intent of the DM32 input.
Scalar values found in DM32 input must always be treated as real when raising them into the intermediate processing form. The HP-32S family doesn’t have integers, as such, but instead BCD-coded reals that might be considered integer-like while the fractional part of the mantissa happens to be zero. Because the R47 does have separate integer data types, faithfully maintaining this distinction requires rejig to modify inputs that would otherwise look like integers to it. We accommodate this difference in outlook within rejig by adding a decimal radix to all numbers that lack one. (Exception: The number uses “e” notation, which serves as an adequate clue.)

Round-tripping can therefore fail for a DM32 program that includes integer-like reals, but the solution is easy: save the pretty-printed version in place of the original so that the implied radix is made explicit.
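The radix rule just described fits in a few lines of Go. This is a minimal sketch with a hypothetical addRadix name, and it assumes that a period is the radix mark and that “e” notation may be written in either case:

```go
package main

import (
	"fmt"
	"strings"
)

// addRadix marks an integer-looking number as real by appending a
// decimal radix. Numbers that already contain a radix or use "e"
// notation are returned unchanged.
func addRadix(num string) string {
	if strings.ContainsAny(num, ".eE") {
		return num
	}
	return num + "."
}

func main() {
	for _, n := range []string{"42", "3.14", "1e3", "-7"} {
		fmt.Println(n, "->", addRadix(n))
	}
}
```

Applying the function a second time changes nothing, which is why saving the pretty-printed version makes later round trips stable.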

Other translations to accommodate missing DM32 features should be lossless: full-line comments to REM and back, etc.
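To see why the CL𝑥-plus-addition pair can stand in for DROP𝑥 in the DM32 output path, it helps to model the stack directly. The Go sketch below assumes a classic four-level stack with T replicated on drop, and it ignores LAST𝑥 and stack-lift state; the type and method names are hypothetical.

```go
package main

import "fmt"

// stack models a classic four-level RPN stack as X, Y, Z, T.
type stack [4]float64

// drop discards X: everything moves down and T is replicated.
func (s stack) drop() stack { return stack{s[1], s[2], s[3], s[3]} }

// clx replaces X with zero, leaving the other levels alone.
func (s stack) clx() stack { s[0] = 0; return s }

// add replaces X and Y with their sum, dropping the stack one level.
func (s stack) add() stack { return stack{s[1] + s[0], s[2], s[3], s[3]} }

func main() {
	s := stack{1, 2, 3, 4}
	fmt.Println(s.drop())      // the op being emulated
	fmt.Println(s.clx().add()) // the two-op replacement: new X = Y + 0
}
```

Both lines print the same stack, which is the additive-identity argument in executable form.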

HP-15C Limitations

The primary HP-15C support in rejig depends on Torsten Manz’ HP-15C Simulator as an intermediary. You can either use his PC program directly to read/write programs for exchange with rejig or use its Devices feature as an interface to the USB connections on the HP-15C Collector’s Edition or SwissMicros’ DM15 family of machines. Its file format has also been adopted by the Jovial JRPN simulator. I wish there was a more nearly universal HP-15C program format, but I’ll gladly accept the limited upside: compatibility with four different calculators by implementing a single format.

Documentation is only distributed with the program itself,⁷ and that path varies based on the host OS. On macOS, it is here.

Because the support for this format in rejig shares many of the --from/to dm32 limitations — and for the same essential reasons — I will not repeat all that here. Instead, I will give the differences particular to this format.

Input

The following HP-15C features do not translate well at all:

The solver in the R47 takes after the HP-42S, where you pick a program that serves as the equation to be solved, then pick an MVAR defined in that program which you want the programmed equation to be solved for. The key difference in the HP-15C is that it always uses the X register for this instead. How is rejig to map from one to the other without losing information, breaking round-tripping?
Numeric integration fails to map 1:1 for much the same reason, because it operates on similar principles.
Matrices work altogether differently in the HP-15C and the R47, to the point that about all they share is the name and the mathematical underpinnings. Until and unless rejig becomes smart enough to rewrite programs using matrices on the fly, all it is able to offer in this regard is to preserve the input ops for round-tripping, as when using rejig as a pretty-printer or a verifier. At present, the only HP-15C matrix-related op it knows how to do this with is DIM because there is no “MATRIX” op in the R47 as such.
Random number generation works entirely differently on the R47 due to its support for swappable distribution functions and its ability to change the endpoints of that distribution. Because of this, when translating an HP-15C program that uses its simple RAN# facility to an R47 output format, the result refers to a generic named variable of that name having no effect on the behavior of the actual random number generator. This allows lossless round-tripping plus translation to DM32 which works similarly, but it does not allow rejig to “upgrade” a 15C program using these facilities to the equivalent code needed on the R47.

Even when round-tripping 15-to-15, everything gets translated up to an R47-based ideal form during ingest, which then gets converted down to the output format. The upshot is that these features won’t work even when using rejig as a source verifier or pretty-printer due to the semantic mismatch between the high-level R47 expression modeled in rejig and the primitive facilities HP pioneered way back in 1982.

Future versions of rejig may learn how to map one or more of those, but I’m making no promises.

Output

Of what’s left, I believe everything round-trips cleanly. Where you are most likely to run into trouble with --to hp15c is when starting with --from $SOMETHING_ELSE.

There are several minor cosmetic differences between rejig output and that of the simulators:

The decoded form of the LAST𝑥 op in output from the Manz simulator misuses the Greek capital letter chi (Χ) instead of the mathematical capital italic (𝑋) letter. I could make rejig conform, but I don’t wish to perpetuate the error.
The JRPN simulator uses plain ASCII output, with the decoded ops shown like x^2 instead of 𝑥².
The Manz simulator defaults⁸ to UTF-16 + BOM output in service of its enhanced op decoding info.
The header comment styles differ considerably:
- Manz:
```
  HP-15C Simulator program
  Created with version 5.1.00
```
  The version number changes in each release.
- JRPN:
```
  Program produced by JRPN 15C.
  Character encoding:  UTF-8
  Generated 2025-9-27 11:41 MDT.
  Program occupies 21 bytes.
```
  It doesn’t declare the writing program’s version number, but the timestamp line changes every minute even for identical output.
- rejig melds the best of both:
```
  HP-15C program generated by rejig --to hp15c (v0.10.0)
  Character encoding: UTF-8
  Byte count: 17
```
  In order to reduce unhelpful differences in make test output while running a pre-release build, the bit at the end of the first line changes to “trunk build” (or whatever the branch name was) instead of showing the commit ID prefix as the --version flag does. This same motivation leads me to resist adding a timestamp to the output; that’s what mtimes are for!
The op decoder in the Manz simulator is inconsistent⁹ in its use of Unicode thin spaces, making me reluctant to match these details in rejig output.
The Manz format allows storing certain types of metadata in comments, editable from within the simulator via the File → Program Description… menu item. Because rejig strips all comments outside the section containing program data, that metadata is lost when round-tripping.

The point of listing these differences is that they affect my round-tripping tests. If you load one of my programs up into one of the simulators and save it right back out, the differences listed will make this test appear to fail. A proper round-trip test uses rejig throughout the chain.

RLM-15CX

This is another fine HP-15C simulator, which uses a different (JSON-based) file format. Because it uses the *.15C file name extension and rejig matches that case-insensitively, you must give --from/to rlm15cx to read these files in/write them out.

This format can include non-program data such as the contents of the stack, the simulator settings, etc., but rejig ignores all that. Because the RLM simulator preserves elements not explicitly overwritten, round-tripping a program through rejig does not necessarily lose these other elements. Simply load the new program atop the existing configuration, then save it back out with those items checked in the “Select data to Save” dialog that pops up.

HP-11C Limitations

The sole support for HP-11C programs in rejig at present is via the RLM Tools simulator. You might think that this is a simple extension to the RLM-15CX file format support described above, but there are a significant number of differences between the HP-15C and the 11C that make it a more substantial translation effort.

At a superficial level, all unshifted keys are the same between the two machines, but 35% of the f/g-shifted ones differ. In many cases, all this requires is an alternate mapping, as with the change in the location of 𝜋 from f EEX on the HP-15C to f CHS on the HP-11C. Round-tripping between .15c and .11c may therefore change key codes even when the program logic is identical.

A more annoying case is that the HP-11C arranges its eight comparison ops sensibly as shifted versions of the four basic arithmetic keys at the right edge of the calculator. When HP extended that to produce the HP-15C, the many additional functions they crammed into a space originally designed to accommodate the HP-11C feature set forced the design team to hide all but two of these comparison ops away in the TEST menu, requiring the user to flip the calculator over to look up the test number.¹⁰ Worse, the two ops visible on the HP-15C front panel are in different locations than on the HP-11C, requiring that all comparison ops be remapped.

There are a few cases where rejig is forced to do something more clever than simple key code remapping:

Swaps: Whereas the HP-15C offers the general-purpose 𝑥⇄ op taking any of the legal registers, the only options on the HP-11C are to swap the X register directly with I or indirectly thru it.
Complex: Not only must rejig elide this for the HP-11C, it has to account for the fact that the I key (f-shifted TAN) has these additional meanings. In other words, the HP-15C allows use of f I as an operation, while the HP-11C does not. The RLM-11CX simulator will code this sequence in a program, but if it has a defined meaning in the HP-11C manual, I can’t find it.

The HP-15C format handlers do not support use of “(i)” as an op, but that is because it is a momentary version of the Re⇄Im op that rejig does support. This calculator feature is meant for interactive use, not for use in HP-15C programs. The point here is that it is another no-op on the HP-11C.

One further difference needs pointing out: because the RLM simulators use JSON as their file format, rejig does not output the metadata headers you get with --to hp15c when you say --to hp11c instead.¹¹ Although this same restriction affects --to rlm15cx, the point here is that there is no “better” HP-11C format where you can get that extra info if you want it. If you want a byte count for an HP-11C program, you can get into the ballpark with “rejig --to hp15c”, but beware that the other differences above can throw this result off; comparisons in particular are likely to encode to different lengths.

Unicode Discrepancies

The R47 programming language makes heavy use of math symbols, superscripts, Greek letters, and so forth. In support of this, they ship a custom font that must be installed on the host system in order for the R47’s RTF output from XPORTP to display as intended. As of this writing, it defines 705 glyphs in a 15-bit subset of the available 21-bit range.

Unfortunately, there are a number of discrepancies, in several classes:

Count	Description
1	diacritic mismatch (ķ rendered as k̂)
1	“ℐ” drawn as double-struck capital I
1	“∜” drawn as xth-root, changing its meaning
1	“⇀” drawn as a short-armed arrow
1	“⇄” drawn in classic HP “swap” style
1	“⇍” drawn as the undo symbol (should be U+238C = “⎌”)
1	“ẝ” misused as the f-shift indicator
1	“Ϳ” misused as x-under-root (Coptic Greek “yot”)
1	“Ȳ” drawn as y-under-root (visually similar if you squint, but semantically different)
10	superscript Arabic digits at Roman Ⅰ thru Ⅹ
27	unassigned spots taken over; e.g. “x̅” in the block reserved for Coptic Greek
57	reassignments; e.g. Δ/∇-looking glyphs overlaying ⇉/⇋; x-over-y glyph overlaying ⧰
114	similar meaning but different rendering; e.g. “Ⓩ” used as “^Z”
218	TOTAL DISCREPANCIES

Much of this is harmless, as with the “loss” of lame characters like the parenthesized numerals, which can be adequately rendered without: Unicode ⑻ ≈ ASCII (8). Another example is that Unicode’s Roman numeral “Ⅷ” renders nearly identically to the plain ASCII alternative “VIII” in many fonts. We should not mourn the loss of these characters.

Where we have a problem is when meanings change.

Take the C47 font’s xth-root glyph, which overwrites what Unicode set aside as U+221C, the fourth-root glyph. One may say, “The C47/R47 doesn’t have a fourth-root feature, so we can safely take this over,” but that still makes us ask, “Why doesn’t the custom font put this nonstandard character up in the PUA where it belongs?”

I have no problem with the decision to leave the fourth-root character undefined in the custom font. My complaint is that it was overwritten with a glyph having a different meaning. If you copy-paste this character out of an XPORTP file into a document using a different font, it will visually change from xth-root to 4th-root! Pasting it back will restore the meaning, but why allow the confusion? If Unicode doesn’t define what you need, it is better that the paste shows an undefined character in the font you’re using to clue you into the problem rather than hide it under another meaning.

Test Method

Those curious about the method used to come to the conclusions above may wish to study:

The program used to extract the defined code points from the C47 source code and produce the raw TSV file I examined for this study. The header comments go into further details on my method.
The TSV data file distilled from that process. You might wish to load that up into a spreadsheet and reapply the font changes suggested in that script’s header comment to make your own local evaluation, for instance.

A Plan to Improve This Situation

Reworking the R47 at this late date to drag it into line with Unicode would be a tremendous amount of work, to no visual benefit, but to considerable semantic benefit: copy-pasting text from an XPORTP output file into one using a different font would not cause it to change meaning.

The following multi-step evolution would improve matters:

Strip the 16th bit.

The C47/R47 code restricts itself to a 15-bit subset¹² of Unicode by forcibly setting the high bit on all uint16_t character values beyond the 7-bit ASCII subset. This flags non-ASCII characters so that they can be recognized by testing the high bit in their first byte when stored in big-endian fashion, which in turn lets it use a single byte for text where ASCII suffices, saving considerable RAM and flash space.

The main downside — from our immediate perspective — is that reassigning the meaning of that top bit cuts off half the UCS-2 character space. That not only rules out these characters as sources of solutions to the discrepancies above, it means the C47 font can’t shove its nonstandard characters into a PUA block since the first is way up at U+E000–U+F8FF, requiring that sixteenth bit.

Much the same benefit results from the more complicated UTF-8 encoding scheme,¹³ which solves the entire problem by encoding every Unicode character in a variable-length sequence of 1-4 bytes. We can dream of a UTF-8 based R47, but it won’t happen any time soon.

Move all wholly custom characters to the PUA.

That’s what it’s for. No compatibility will be lost with this change, because there is no cross-font compatibility in this case regardless.

This includes the characters currently squatting on unassigned spots in Unicode. Future standards may provide new characters here, which we won’t be able to take advantage of under the current situation.

Consider moving the rest on a case-by-case basis.

While the bulk of the current discrepancies are harmless, the main reason not to move everything to the PUA once it is open for use is to reduce the upset from doing the move. The fewer things that change at once, the simpler debugging those changes will be.
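
To make the first step concrete, here is a minimal Go sketch of the high-bit scheme as this section describes it: one byte for 7-bit ASCII, otherwise two big-endian bytes with the top bit forced on. The names encode, decode, and errOutOfRange are hypothetical, chosen for illustration only; this models the described behavior and is not taken from the C47/R47 source.

```go
package main

import (
	"errors"
	"fmt"
)

// errOutOfRange reports a code point that cannot fit in 15 bits.
var errOutOfRange = errors.New("code point needs more than 15 bits")

// encode models the scheme described above: 7-bit ASCII takes one byte,
// and anything else below U+8000 takes two bytes, big-endian, with the
// top bit forced on. (Hypothetical helper, not taken from the C47 code.)
func encode(r rune) ([]byte, error) {
	switch {
	case r < 0 || r >= 0x8000:
		return nil, errOutOfRange
	case r < 0x80:
		return []byte{byte(r)}, nil
	default:
		v := uint16(r) | 0x8000
		return []byte{byte(v >> 8), byte(v)}, nil
	}
}

// decode reads one character from the front of b, returning the code
// point and the number of bytes consumed.
func decode(b []byte) (rune, int, error) {
	if len(b) == 0 {
		return 0, 0, errors.New("empty input")
	}
	if b[0]&0x80 == 0 {
		return rune(b[0]), 1, nil // high bit clear: plain ASCII
	}
	if len(b) < 2 {
		return 0, 0, errors.New("truncated two-byte character")
	}
	v := uint16(b[0])<<8 | uint16(b[1])
	return rune(v &^ 0x8000), 2, nil // strip the flag bit
}

func main() {
	// 'A' is ASCII, U+221A is the square root sign, U+E000 starts the PUA.
	for _, r := range []rune{'A', 0x221A, 0xE000} {
		b, err := encode(r)
		if err != nil {
			fmt.Printf("U+%04X: %v\n", r, err)
			continue
		}
		fmt.Printf("U+%04X: % X\n", r, b)
	}
}
```

The square root sign comes out as the bytes A2 1A, agreeing with the example in the footnote, while U+E000 is rejected because the flag bit has already claimed the sixteenth bit that the PUA needs.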

Taking up the Burden

I created rejig under the presumption that no one will volunteer to undertake that huge task, and after much discussion on the forum, that appears to be an accurate view.

And that’s perfectly fine! Unlike the R47 project proper, rejig doesn’t have a choice in this matter: its default output format is UTF-8 plaintext, where we cannot count on any particular font being used. It must of necessity conform to the Unicode standard.

rejig can solve these discrepancies by starting with P47 byte codes and emitting the proper Unicode from there.

Practical Transformations

rejig uses these transformations in service of the goal of offering pretty Unicode output from P47 byte codes:

superscripts/subscripts: Unicode defines a large number of these, but that offering does not cover the entire Latin + Greek alphabet. For instance, there is a superscript “x” character in Unicode allowing one to render the common calculator function yˣ in plain Unicode, but there is no subscript “Q” needed to render the R47’s `x_(Q1)` function.

rejig uses what Unicode provides by preference, with fallbacks to LaTeX math notation: a_b means `a_b`, and c^d means `c^d`. Multi-letter super/subscripts must be given in parens because a_bc renders as `a_bc` in LaTeX. You must give it as a_(bc) to get the desired effect: `a_(bc)`

There are extensive aliases for these in rejig, allowing you to type ASCII/LaTeX equivalents such as log10(x) for `log_(10)(x)` or arctan(x) for `tan^(-1)(x)`.
swaps: There are several different left/right arrow styles in Unicode. The closest in visual appearance to the glyph used for swaps on-screen and on the R47 keyboard is the “greater-than-or-less-than” operator, ≷, but for rejig and this site more broadly, I have settled on “⇄” as closer in semantic meaning. You may also use “<>” as an alias.
operators: We use proper math symbols by preference, but where that takes us outside the ASCII range, we offer aliases inspired by programming languages: != for ≠, * for ×, and so forth. Note that we do not use <> to mean inequality as in BASIC and Pascal, as we already used that for swaps and do not wish to create a language ambiguity.
math italics: rejig prefers Unicode mathematical italic letters in UTF-8 output, and it will happily accept them on UTF-8 input, but it also offers plain ASCII aliases. The cos(𝑥) function is canonical, but we also accept cos(x), and since we do case-insensitive matching on ASCII letters, it means we also accept COS(X), plus the plain COS. We eschew case-folding for other alphabets to avoid confusing Σ with σ, or worse, confusing ASCII X with Greek 𝛸, the uppercase letter chi.
roots: When one wishes to go beyond cubes and cube roots, Unicode starts to become tricky enough that it isn’t worth the bother. That is why the R47’s `root(x)(y)` function is spelled root(x)(y) by rejig per LaTeX notation. Prior versions used root(x,y) for this, but that encourages experienced programmers to violate the no-spaces rule by misspelling it “root(x, y)”.
constants: The R47 spells out references to the built-in constants as “# 29 mPL mass.planck” but this, too, violates the no-spaces rule. In rejig, you just give one of the two alternative names: either m_(PL) or mass.planck in this example. Note that this also invokes the subscript transformation.
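
The “Unicode by preference, plain notation as fallback” rule from the superscripts/subscripts item can be sketched in Go roughly as follows. The subscripts table and the withSubscript function are hypothetical names invented for this illustration, and the table is a small subset of what Unicode defines; rejig’s actual tables and output choices may well differ.

```go
package main

import (
	"fmt"
	"strings"
)

// subscripts lists a few of the subscript forms Unicode defines.
// Illustrative subset only; note the absence of capitals such as Q.
var subscripts = map[rune]rune{
	'0': '₀', '1': '₁', '2': '₂', '3': '₃', '4': '₄',
	'5': '₅', '6': '₆', '7': '₇', '8': '₈', '9': '₉',
	'a': 'ₐ', 'e': 'ₑ', 'o': 'ₒ', 'x': 'ₓ',
}

// withSubscript attaches sub to base using Unicode subscript characters
// when every rune has one, else falls back to the a_b / a_(bc) notation.
// (Hypothetical helper for illustration, not rejig's implementation.)
func withSubscript(base, sub string) string {
	var b strings.Builder
	for _, r := range sub {
		u, ok := subscripts[r]
		if !ok {
			// One missing character forces the fallback for the whole
			// subscript; multi-letter subscripts need the parens.
			if len([]rune(sub)) > 1 {
				return base + "_(" + sub + ")"
			}
			return base + "_" + sub
		}
		b.WriteRune(u)
	}
	return base + b.String()
}

func main() {
	fmt.Println(withSubscript("a", "10")) // all digits have subscript forms
	fmt.Println(withSubscript("a", "b"))  // no subscript b in Unicode
	fmt.Println(withSubscript("x", "Q1")) // no subscript Q in Unicode
}
```

Falling back for the whole subscript, rather than mixing Unicode and ASCII characters within one, keeps the result unambiguous to parse on the way back in.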

Because rejig converts all these transformations and aliases to a canonical form, you can use it to “upgrade” a plain ASCII version of a program to the pretty-printed Unicode version:

$ rejig -f utf8 -t utf8 - < ascii.txt > utf8.p47u
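
One way to picture the canonicalization behind that command is a table lookup performed after folding ASCII case only, which is the matching rule described under “math italics” above. The Go sketch below assumes a tiny hypothetical alias table built from the examples in this section; the names aliases, foldASCII, and canonical are made up for illustration and do not come from rejig’s source.

```go
package main

import (
	"fmt"
	"strings"
)

// aliases maps lowercase ASCII spellings to a canonical Unicode form.
// Hypothetical table holding only examples mentioned in the text.
var aliases = map[string]string{
	"!=":     "≠",
	"*":      "×",
	"<>":     "⇄",
	"cos":    "cos(𝑥)",
	"cos(x)": "cos(𝑥)",
}

// foldASCII lowercases only A-Z, leaving every other code point alone so
// that Σ is never confused with σ, nor ASCII X with a Greek chi.
func foldASCII(s string) string {
	return strings.Map(func(r rune) rune {
		if r >= 'A' && r <= 'Z' {
			return r + ('a' - 'A')
		}
		return r
	}, s)
}

// canonical returns the preferred spelling of op, or op itself when no
// alias is known for it.
func canonical(op string) string {
	if c, ok := aliases[foldASCII(op)]; ok {
		return c
	}
	return op
}

func main() {
	for _, op := range []string{"COS(X)", "cos", "!=", "<>", "Σ"} {
		fmt.Printf("%s -> %s\n", op, canonical(op))
	}
}
```

Restricting the fold to A-Z is the design point: a general Unicode case fold would map Σ to σ, which is exactly the confusion the text says rejig avoids.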

(You may now wish to return to my R47 article index.)

License

¹ This is not standardized anywhere. I picked it for lack of a better plan and reserve the right to change it later.
² Homebrew comes preinstalled on Aurora, Bazzite, Bluefin…
³ Indeed, developers can run “task test” to run the unit tests and then verify its round-tripping ability with several programs, including a variant of the R47’s “AllOps” demo. Running “task intcov” produces an integration test report showing how much of the Go code the all-ops tests “touch,” ideally approaching 100% except for the known-error cases.
⁴ There is a proposal under consideration to age out obsolete ops once enough time has passed. We wish to balance the decreasing likelihood of anyone trying to load an old program using such an op with the fact that this probability will never equal zero. The trick is setting those thresholds to an acceptable level. Maintaining backward compatibility forever carries a space cost for all the holes it creates in the op table.
⁵ That being the R47 convention for marking >OBSOLETE< op names.
⁶ This workaround has the side effect of clearing the LAST𝑥 register, but keep in mind that the alternative here is no conversion at all. If you are on a calculator with DROP𝑥, prefer it over this hack.
⁷ The author “…sees no advantage in offering the help files without the simulator.” I see one: so I can point people directly to the docs I used!
⁸ The Manz sim will conform to my strong preference for UTF-8 if one goes into its Preferences and unchecks the “Encode programs in UTF-16 (LE)” option on the Files tab.
⁹ There are three f-shifted ops with a thin-space between the first and second letters — Py,x →R, →RAD — where the closely-related g-shifted op lacks one in the same spot. I’d follow either convention, if it was consistently applied, but as it is, I pick one and end up “wrong” in the other half.
¹⁰ HP had not yet advanced their tech to allow on-screen menus in this class of machine. That would have to wait until the first Pioneer series units came out in 1988.
¹¹ Yes, I checked. The simulator pops up an error dialog when you ask it to open an .11c file containing a comment. Strictly speaking, it is correct to complain that JSON does not allow comments, but what this tells us more broadly is that the RLM simulators do not support one of the extensions that relax this restriction, such as JSON5.
¹² For instance, the square root symbol at U+221A is referenced in the code as "\xA2\x1A" = 0xA21A = 0x8000 | 0x221A
¹³ If the C47 scheme is standardized anywhere, I'm not aware of it.
