Enter RPN


rejig: Translate RPN Programs Between Formats

Motivation

The R47 offers two different ways to save a single program out to its USB flash storage:

  1. WRITEP, which produces a *.p47 file
  2. XPORTP, which produces an RTF file

The first is technically UTF-8 plaintext, but it isn’t human-readable, whereas the second is human-readable but only via a word processor. Even then, it’s functionally read-only because the R47 offers no corresponding “IMPORTP” function that will parse an edited RTF file back into byte codes.

My rejig tool rejiggers R47 program files between any combination of:

Broadly, rejig adds the virtues of the Unix Way to the R47 world. It takes inspiration from Pandoc, but without nearly as much ambition behind it.

Examples

My triangle solver is available here in four different forms:

  1. The hand-annotated version at the top of the linked article.
  2. The *.p47 version produced on an R47 via WRITEP.
  3. The RTF version produced on an R47 via XPORTP.
  4. The Unicode pretty-printed version produced by rejig in its default operating mode: -f p47 -t utf8

Comparing them should prove instructive.

For more examples, see the parent directory, where most of the *.p47 files have a corresponding *.p47u file.

Downloads

Pre-built static binaries are available for download here for Windows, macOS, and Linux, for both 64-bit ARM and Intel CPUs.

The installation process is:

  1. Unpack the archive.
  2. Copy the binary to the destination folder.
  3. There is no step 3. You’re done already.

You may wish to consult the change log to see what’s new.

If those binaries do not appeal, your best bet is the…

Source Code

rejig is written in Go and hosted here.

You may clone it and everything else on this section of my site via Fossil.

On a macOS or Universal Blue[2] host:

brew install fossil go go-task
fossil clone https://tangentsoft.com/rpn/
cd rpn/r47/rejig
make

That should get you a rejig binary in the current working directory, built from the current source code thanks to the magic of Task and go build.

See the Fossil quick start guide for more information on using Fossil, including updating to current versions and more. For an amuse-bouche, try this:

fossil ui

You should now be seeing a copy of this very website in your default web browser. Cool, yeah?

Implied Formats

By default, the program operates as if you gave -f p47 -t utf8, but the command line parser consults those defaults only when no file names are given. Otherwise, it checks the file name extensions for the known file format specifiers accepted in --from/to options, and if there is a match, it uses it. The following commands are therefore equivalent:

$ rejig - < input.p47 > output.p47u
$ rejig input.p47 -o output.p47u

The first works because these are the default operating mode formats, while the second overrides those defaults with the same values inferred from the extensions, redundantly yielding the default behavior.

Where this becomes more useful is when doing non-default things:

$ rejig input.utf8 -o output.p47
$ rejig --from p47u input.utf8 --to p47 -o output.p47

Those do the same thing, but the second is needlessly verbose.

This behavior makes --from/to options largely redundant when using files, since you should be naming them in accord with the conventions this program supports regardless.

However, these options may be necessary when the first input is given as - to mean “read from stdin” or when writing output to the default location — stdout — where file name extension guessing cannot work. For example:

$ rejig -f utf8 -t p47 - < input.p47u > output.p47

This is the overly verbose example above recast using pipes, making the -f/t options fully justified because they override defaults we do not want to accept in this instance.
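The precedence described above can be sketched in Go, the language rejig is written in. An explicit option wins, then the file name extension, then the built-in default for "-" or unrecognized names. Everything here is hypothetical: `inferFormat` and the `extFormats` table are illustrations, not rejig's actual code, and the table lists only extensions mentioned on this page. The last two calls show why a case-insensitive match forces an explicit option for RLM *.15C files.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// extFormats is a hypothetical extension-to-format table; rejig's real one may differ.
var extFormats = map[string]string{
	".p47":  "p47",
	".p47u": "utf8",
	".utf8": "utf8",
	".d32":  "dm32",
	".15c":  "hp15c",
	".11c":  "hp11c",
}

// inferFormat picks the format for one file argument. An explicit
// --from/--to value always wins; "-" (stdin/stdout) and unknown
// extensions fall back to the built-in default.
func inferFormat(name, explicit, fallback string) string {
	if explicit != "" {
		return explicit
	}
	if name == "" || name == "-" {
		return fallback
	}
	ext := strings.ToLower(filepath.Ext(name)) // case-insensitive match
	if f, ok := extFormats[ext]; ok {
		return f
	}
	return fallback
}

func main() {
	fmt.Println(inferFormat("input.p47", "", "p47"))        // p47
	fmt.Println(inferFormat("output.p47u", "", "utf8"))     // utf8
	fmt.Println(inferFormat("prog.15C", "", "p47"))         // hp15c, not rlm15cx
	fmt.Println(inferFormat("prog.15C", "rlm15cx", "p47")) // rlm15cx
}
```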

Command Option Flags

rejig accepts these option flags:

The defaults overridden by --line-numbers and --verbose were chosen because rejig focuses on writing outputs that allow lossless round-tripping, enforced by a test that runs frequently during development. It therefore elides metadata such as program byte counts, line numbers, and total line counts unless you ask for them.

Operating Modes

Depending on which options and file arguments you give, rejig acts as…

This all extends to implied formats, of course.

A Matter of Style

rejig -t utf8 implements my preferred RPN programming style, with no option for customization short of changing the code.

There are plans to change this, to some extent, but in the meantime, here is what rejig does:

Identifier Names

The R47 programming language was not intended to be parsed, as such, but to support its round-tripping features, rejig makes several changes to the operation names, system flags, and so forth:

If you wish to write R47 programs using rejig as a byte-code compiler, you will have to be aware of these name changes. There is presently no documentation for all the changes other than what you find in the source code, starting about ⅔ the way down in the ops file, in the internalOpInfo map.

Comments

rejig recognizes three styles of comments:

When one of those is encountered outside quotes, everything from that point to the end of the line is ignored. The intent is that the rejig format be as useful for programmer-to-programmer communication as it is for getting programs from your computer into an R47.

One of the first things you are likely to notice on studying my triangle solver is all the explanatory comments, one on nearly every line. I’ve always been a big fan of documenting code, and I find comments especially helpful with terse languages like the one backing the R47.

Take the comments showing stack register movements: they aid the reader by documenting how the data moves on the stack at each step. As currently written, these comments assume the simpler SSIZE4 mode even though the program was tested in SSIZE8 mode. The choice was between picking one for clarity and documenting both, which would muddy the presentation. Since the only material difference is that the interactions involving T affect the R47’s D register instead, I took the simpler tack.
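The quote-aware comment rule can be sketched as a byte scan that tracks whether it is inside a double-quoted string. This copy of the page does not reproduce the three comment styles, so the hypothetical `stripComment` function takes its markers as a parameter. The `//` marker in the example is purely illustrative. The function also assumes that strings are delimited by ASCII double quotes.

```go
package main

import (
	"fmt"
	"strings"
)

// stripComment drops everything from the first comment marker that
// appears outside a double-quoted string, then trims trailing blanks.
// Scanning bytes is safe for UTF-8 input because ASCII bytes such as
// '"' never occur inside a multi-byte sequence.
func stripComment(line string, markers []string) string {
	inQuote := false
	for i := 0; i < len(line); i++ {
		if line[i] == '"' {
			inQuote = !inQuote
			continue
		}
		if inQuote {
			continue
		}
		for _, m := range markers {
			if strings.HasPrefix(line[i:], m) {
				return strings.TrimRight(line[:i], " \t")
			}
		}
	}
	return line
}

func main() {
	markers := []string{"//"} // illustrative marker only
	fmt.Println(stripComment(`"a // b" // not part of the string`, markers)) // "a // b"
}
```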

Indenting

rejig takes inspiration from go fmt: both implement a common formatting scheme for their respective ecosystems, doing away with local idiosyncrasies. One may quibble with the choices made, but when the rules are fixed and automatically enforced, developer focus tends to shift to substantive matters.

The rejig scheme is simple: it adds a 2-space indent for each LBL after the first and subtracts a level for each RTN. The op following a do-if-true op gets an extra indent level. An END op brings the indent level back to 1, and .END. zeroes it. Easy.

The sole exception results from a combination of the above: a do-if-true RTN does not reduce the indent level because it is conditionally the end of the subroutine.

The fact that this scheme does not always produce neat hierarchies of subroutine calls reflects the unstructured nature of RPN programming. If you find yourself questioning the formatting, first ask whether it is a fair expression of the ops as given. Surprising indent levels might be telling you something important about how the program actually works, which you might wish to address.

That is my experience, at any rate. I find that my scheme makes far more sense than the one built into the R47’s PEM mode and its RTF outputs.
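The indenting rules above can be sketched as a single pass over the ops. This is a hypothetical Go illustration, not rejig's implementation, and it makes several stated assumptions:

- Any op whose name ends in "?" counts as a do-if-true test.
- The level starts at 1, the value END restores.
- Each line is printed at the current level before its own op adjusts the level.
- END also restarts the "first LBL" tracking for the next program.

```go
package main

import (
	"fmt"
	"strings"
)

// mnemonic returns the op name without its arguments.
func mnemonic(op string) string {
	if f := strings.Fields(op); len(f) > 0 {
		return f[0]
	}
	return ""
}

// isTest stands in for real op classification (assumption: names ending in "?").
func isTest(op string) bool {
	return strings.HasSuffix(mnemonic(op), "?")
}

// indent prefixes each op with two spaces per indent level.
func indent(ops []string) []string {
	var out []string
	level, seenLBL, prevTest := 1, false, false
	for _, op := range ops {
		depth := level
		if prevTest {
			depth++ // the op after a test is conditional
		}
		out = append(out, strings.Repeat("  ", depth)+op)
		switch m := mnemonic(op); {
		case m == "LBL":
			if seenLBL {
				level++ // every LBL after the first nests one level
			}
			seenLBL = true
		case m == "RTN" && !prevTest:
			// A conditional RTN only possibly ends the routine, so it keeps the level.
			if level > 0 {
				level--
			}
		case m == "END":
			level, seenLBL = 1, false // assumption: per-program tracking
		case m == ".END.":
			level = 0
		}
		prevTest = isTest(op)
	}
	return out
}

func main() {
	prog := []string{"LBL A", "1", "LBL 01", "x=0?", "RTN", "+", "RTN", "END"}
	for _, line := range indent(prog) {
		fmt.Println(line)
	}
}
```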

Blank Lines

Doubtless because of the limited screen real estate on calculators, even biggish ones like the R47, it is traditional in RPN to write one line after the next, all the way through, without any blank lines.

Yet, when writing the program out as described above, I do believe we can accept adding one blank line above each LBL for readability. This is another reason to suppress line numbers by default in UTF-8 exports.

rejig adds these when pretty-printing and strips them back out when reading text-form programs back in.

If a LBL is preceded by a REM, the blank line is placed before the REM instead, under the assumption that it is commenting on the following subroutine. This does not yet work for a block of multiple REM statements.
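The blank-line rule, including the single-REM case, can be sketched as follows. The `addBlankLines` helper is hypothetical and looks only at the first word of each line. Like the behavior described above, it does not handle a block of several REM lines.

```go
package main

import (
	"fmt"
	"strings"
)

// mnemonic returns the op name without its arguments.
func mnemonic(op string) string {
	if f := strings.Fields(op); len(f) > 0 {
		return f[0]
	}
	return ""
}

// addBlankLines puts a blank line above every LBL except one at the top,
// or above a single REM that immediately precedes the LBL.
func addBlankLines(ops []string) []string {
	var out []string
	for _, op := range ops {
		if mnemonic(op) == "LBL" && len(out) > 0 {
			last := len(out) - 1
			if mnemonic(out[last]) == "REM" {
				if last > 0 {
					// Move the REM down one slot and put the blank line above it.
					out = append(out[:last], "", out[last])
				}
			} else {
				out = append(out, "")
			}
		}
		out = append(out, op)
	}
	return out
}

func main() {
	for _, l := range addBlankLines([]string{"LBL A", "1", `REM "sub"`, "LBL 01", "2", "LBL 02"}) {
		fmt.Printf("%q\n", l)
	}
}
```

Reading text back in is the reverse operation: blank lines are simply dropped.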

Automatic Reformatting

Above I brought up the example of the Go programming language’s automatic formatting. In case it is not clear, my intent in allowing option combos like

rejig -f utf8 -t utf8 < a > b

…is to provide a similar facility, canonicalizing the input in as lossless a manner as may be hoped for.

Compatibility Breakages

Nearly all cases where rejig will not read a *.p47 file result from lack of a feature, which is why the version number begins with “0.” — see SemVer rule 4.

However, there are certain cases worth pointing out as likely traps:

Obsolete Operation Names

When rejig encounters an obsolete op, it complains and stops input processing immediately unless you give the --fix-obsolete option. It does so for the same reason that attempting to run a program with one of these ops on an R47 raises the error:

Function has changed, please replace.

There are two common situations that cause an op to be obsoleted like this:

Ops are never[4] removed outright because they need to retain their op code index for compatibility with existing programs.

These conditions make the --fix-obsolete option risky. All it does is strip the angle brackets[5] and retry.
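The --fix-obsolete behavior amounts to a bracket strip and a retry. The Go sketch below is hypothetical. `resolveOp` and its `known` map stand in for rejig's real op table, the error text is illustrative, and "ABC" is a placeholder op name.

```go
package main

import (
	"fmt"
	"strings"
)

// isObsolete reports whether name carries the R47's >NAME< marking.
func isObsolete(name string) bool {
	return len(name) > 2 && strings.HasPrefix(name, ">") && strings.HasSuffix(name, "<")
}

// resolveOp looks name up in a table of known op names. Without
// fixObsolete, a marked op is a hard error, mirroring the R47's
// "Function has changed, please replace." With it, the brackets are
// stripped and the lookup retried. That retry is why the option is
// risky: the bare name may now mean something different.
func resolveOp(name string, fixObsolete bool, known map[string]bool) (string, error) {
	if isObsolete(name) {
		if !fixObsolete {
			return "", fmt.Errorf("obsolete op %s; replace it or use --fix-obsolete", name)
		}
		name = name[1 : len(name)-1]
	}
	if !known[name] {
		return "", fmt.Errorf("unknown op %q", name)
	}
	return name, nil
}

func main() {
	known := map[string]bool{"ABC": true} // placeholder op table
	fmt.Println(resolveOp(">ABC<", false, known))
	fmt.Println(resolveOp(">ABC<", true, known))
}
```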

Number Handling

The R47 has an extremely powerful set of numeric data types, both real and integer. In the main, the R47 preserves these by encoding them in “string” form, not one of the internal binary formats. This in turn allows rejig to store and regurgitate the byte codes faithfully without actually needing to process the data in their native forms.

The sole “binary” number format rejig currently supports is for what the R47 calls “short” integers: unsigned 64-bit integers tagged with a base value. When the calculator shows you “7F#16”, it is storing the decimal value 127 internally as a short integer, tagged for hex display. The R47 can store these in either string or binary form. It uses binary for positive numbers when the string form would take more memory, and it uses string form when the number is negative — to preserve the sign irrespective of the current number display mode — or when doing this saves space over the binary form.
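To make the “7F#16” notation concrete, the hypothetical Go helper below splits such a display string into its value and base tag using the standard `strconv` package. It assumes bases from 2 to 16. It says nothing about how rejig decodes the binary form stored in a *.p47 file.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseShortInt splits a display string such as "7F#16" into the
// unsigned 64-bit value and its base tag. strconv.ParseUint rejects a
// leading minus sign, so negative values are out of scope here.
func parseShortInt(s string) (uint64, int, error) {
	i := strings.LastIndexByte(s, '#')
	if i < 0 {
		return 0, 0, fmt.Errorf("%q has no #base suffix", s)
	}
	base, err := strconv.Atoi(s[i+1:])
	if err != nil || base < 2 || base > 16 {
		return 0, 0, fmt.Errorf("%q has an invalid base", s)
	}
	v, err := strconv.ParseUint(s[:i], base, 64)
	if err != nil {
		return 0, 0, err
	}
	return v, base, nil
}

func main() {
	v, base, err := parseShortInt("7F#16")
	fmt.Println(v, base, err) // 127 16 <nil>
}
```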

The code uses binary formats in a few other cases, but rejig currently includes no code for interpreting them. If and when we do encounter a *.p47 file containing one of these, the first versions supporting it will likely be lossy. The only way to avoid that would be to bake the same decNumber library into rejig for this sole purpose.

DM32 Limitations

The DM32 support in rejig is implemented as a subset of its R47 support. Reading a DM32 statefile translates it into R47 ops, and lowering it reverses this. Because the R47 way of doing things isn’t exactly the same as how the HP-32SII did it — as emulated in the DM32 — and the R47 isn’t a 100% superset regardless, these conversions are necessarily lossy.

Output

When reading --from a file format other than dm32 (or, equivalently, giving as input a file not named *.d32), it is important to limit yourself to functions the DM32 supports. There are very few cases where rejig will emit RPN code to implement missing ops. The current sole exception is the R47 DROP𝑥 op, which is common enough that it is emulated in DM32 output as CL𝑥 followed by +, making use of the additive identity: the prior Y becomes the new X = Y + 0.[6]
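A toy stack model makes the additive-identity trick concrete and also shows the LAST𝑥 side effect noted in footnote 6. This is a hypothetical Go sketch, not rejig code. The `stack` type ignores stack-lift rules, and it assumes DROP𝑥 duplicates T as the stack falls.

```go
package main

import "fmt"

// stack is a simplified four-level RPN stack plus LAST𝑥.
type stack struct {
	x, y, z, t, lastX float64
}

// drop models DROP𝑥: the stack falls and T is duplicated (assumption).
func (s stack) drop() stack {
	return stack{x: s.y, y: s.z, z: s.t, t: s.t, lastX: s.lastX}
}

// clx models CL𝑥: only X changes.
func (s stack) clx() stack {
	s.x = 0
	return s
}

// add models +: LAST𝑥 receives the old X, then the stack falls.
func (s stack) add() stack {
	return stack{x: s.y + s.x, y: s.z, z: s.t, t: s.t, lastX: s.x}
}

func main() {
	s := stack{x: 1, y: 2, z: 3, t: 4, lastX: 9}
	fmt.Printf("DROP𝑥: %+v\n", s.drop())       // {x:2 y:3 z:4 t:4 lastX:9}
	fmt.Printf("CL𝑥 +: %+v\n", s.clx().add()) // {x:2 y:3 z:4 t:4 lastX:0}
}
```

The X, Y, Z, and T registers match in both cases. Only LAST𝑥 differs.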

The primary limitations resulting from this stance are:

Input

rejig supports every known operation type that can be written into a DM32 statefile, with one small class of known exceptions. The DM32 has several stats ops that operate on (𝑥,𝑦) points, and of those that operate on linear sequences of scalar values, it has separate versions for 𝑥 and 𝑦. The 𝑥 versions all exist in the R47, but there appear to be no R47 stats ops equivalent to these DM32 ones: 𝑦̅, s𝑦, σ𝑦

I hesitate to file an R47 feature request to add equivalents I can use to map these ops. Their purpose seems dubious to me, reflecting the internal design of the HP-32S and its successors rather than a genuine end-user need. If you want to get stats on a list of values, why would you put them into the calculator in (𝑥,𝑦) form unless that was the only method offered?

Beyond that, the primary area where one runs into trouble with the rejig DM32 input handling is where it is forced to do a translation:

Other translations to accommodate missing DM32 features should be lossless: full-line comments to REM and back, etc.

HP-15C Limitations

The primary HP-15C support in rejig depends on Torsten Manz’ HP-15C Simulator as an intermediary. You can either use his PC program directly to read/write programs for exchange with rejig or use its Devices feature as an interface to the USB connections on the HP-15C Collector’s Edition or SwissMicros’ DM15 family of machines. Its file format has also been adopted by the Jovial JRPN simulator. I wish there was a more nearly universal HP-15C program format, but I’ll gladly accept the limited upside: compatibility with four different calculators by implementing a single format.

Documentation is only distributed with the program itself,[7] and that path varies based on the host OS. On macOS, it is here.

Because the support for this format in rejig shares many of the --from/to dm32 limitations — and for the same essential reasons — I will not repeat all that here. Instead, I will give the differences particular to this format.

Input

The following HP-15C features do not translate well at all:

Even when round-tripping 15-to-15, everything gets translated up to an R47-based ideal form during ingest, which then gets converted down to the output format. The upshot is that these features won’t work even when using rejig as a source verifier or pretty-printer due to the semantic mismatch between the high-level R47 expression modeled in rejig and the primitive facilities HP pioneered way back in 1982.

Future versions of rejig may learn how to map one or more of those, but I’m making no promises.

Output

Of what’s left, I believe everything round-trips cleanly. Where you are most likely to run into trouble with --to hp15c is when starting with --from $SOMETHING_ELSE.

There are several minor cosmetic differences between rejig output and that of the simulators:

The point of listing these differences is that they affect my round-tripping tests. If you load one of my programs up into one of the simulators and save it right back out, the differences listed will make this test appear to fail. A proper round-trip test uses rejig throughout the chain.

RLM-15CX

This is another fine HP-15C simulator, which uses a different (JSON-based) file format. Because it uses the *.15C file name extension, and rejig matches extensions case-insensitively, these files are indistinguishable by name from the *.15c files of the hp15c format. You must therefore give --from/to rlm15cx to read these files in/write them out.

This format can include non-program data such as the contents of the stack, the simulator settings, etc., but rejig ignores all that. Because the RLM simulator preserves elements not explicitly overwritten, round-tripping a program through rejig does not necessarily lose these other elements. Simply load the new program atop the existing configuration, then save it back out with those items checked in the “Select data to Save” dialog that pops up.

HP-11C Limitations

The sole support for HP-11C programs in rejig at present is via the RLM Tools simulator. You might think that this is a simple extension to the RLM-15CX file format support described above, but there are a significant number of differences between the HP-15C and the 11C that make it a more substantial translation effort.

At a superficial level, all unshifted keys are the same between the two machines, but 35% of the f/g-shifted ones differ. In many cases, all this requires is an alternate mapping, as with the change in the location of 𝜋 from f EEX on the HP-15C to f CHS on the HP-11C. Round-tripping between .15c and .11c may therefore change key codes even when the program logic is identical.

A more annoying case is that the HP-11C arranges its eight comparison ops sensibly as shifted versions of the four basic arithmetic keys at the right edge of the calculator. When HP extended that to produce the HP-15C, the many additional functions they crammed into a space originally designed to accommodate the HP-11C feature set forced the design team to hide all but two of these comparison ops away in the TEST menu, requiring the user to flip the calculator over to look up the test number.[10] Worse, the two ops visible on the HP-15C front panel are in different locations than on the HP-11C, requiring that all comparison ops be remapped.

There are a few cases where rejig is forced to do something more clever than simple key code remapping:

One further difference needs pointing out: because the RLM simulators use JSON as their file format, rejig does not output the metadata headers you get with --to hp15c when you say --to hp11c instead.[11] Although this same restriction affects --to rlm15cx, the point here is that there is no “better” HP-11C format where you can get that extra info if you want it. If you want a byte count for an HP-11C program, you can get into the ballpark with “rejig --to hp15c”, but beware that the other differences above can throw this result off; comparisons in particular are likely to encode to different lengths.

Unicode Discrepancies

The R47 programming language makes heavy use of math symbols, superscripts, Greek letters, and so forth. In support of this, the project ships a custom font that must be installed on the host system in order for the R47’s RTF output from XPORTP to display as intended. As of this writing, it defines 705 glyphs in a 15-bit subset of the available 21-bit range.

Unfortunately, there are a number of discrepancies, in several classes:

Count  Description
    1  diacritic mismatch (ķ rendered as k̂)
    1  “ℐ” drawn as double-struck capital I
    1  “∜” drawn as xth-root, changing its meaning
    1  “⇀” drawn as a short-armed arrow
    1  “⇄” drawn in classic HP “swap” style
    1  “⇍” drawn as the undo symbol (should be U+238C = “⎌”)
    1  “ẝ” misused as the f-shift indicator
    1  “Ϳ” misused as x-under-root (Coptic Greek “yot”)
    1  “Ȳ” drawn as y-under-root (visually similar if you squint, but semantically different)
   10  superscript Arabic digits placed at Roman Ⅰ thru Ⅹ
   27  unassigned spots taken over; e.g. “x̅” in the block reserved for Coptic Greek
   57  reassignments; e.g. Δ/∇-looking glyphs overlaying ⇉/⇋; x-over-y glyph overlaying ⧰
  114  similar meaning but different rendering; e.g. “Ⓩ” used as “Z”
  218  TOTAL DISCREPANCIES

Much of this is harmless, as with the “loss” of lame characters like the parenthesized numerals, which can be adequately rendered without: Unicode ⑻ ≈ ASCII (8). Another example is that Unicode’s Roman numeral “Ⅷ” renders nearly identically to the plain ASCII alternative “VIII” in many fonts. We should not mourn the loss of these characters.

Where we have a problem is when meanings change.

Take the C47 font’s xth-root glyph, which overwrites what Unicode set aside as U+221C, the fourth-root glyph. One may say, “The C47/R47 doesn’t have a fourth-root feature, so we can safely take this over,” but that still makes us ask, “Why doesn’t the custom font put this nonstandard character up in the PUA where it belongs?”

I have no problem with the decision to leave the fourth-root character undefined in the custom font. My complaint is that it was overwritten with a glyph having a different meaning. If you copy-paste this character out of an XPORTP file into a document using a different font, it will visually change from xth-root to 4th-root! Pasting it back will restore the meaning, but why allow the confusion? If Unicode doesn’t define what you need, it is better that the paste shows an undefined character in the font you’re using to clue you into the problem rather than hide it under another meaning.

Test Method

Those curious about the method used to come to the conclusions above may wish to study:

  1. The program used to extract the defined code points from the C47 source code and produce the raw TSV file I examined for this study. The header comments go into further details on my method.

  2. The TSV data file distilled from that process. You might wish to load that up into a spreadsheet and reapply the font changes suggested in that script’s header comment to make your own local evaluation, for instance.

A Plan to Improve This Situation

Reworking the R47 at this late date to drag it into line with Unicode would be a tremendous amount of work, to no visual benefit, but to considerable semantic benefit: copy-pasting text from an XPORTP output file into one using a different font would not cause it to change meaning.

The following multi-step evolution would improve matters:

  1. Strip the 16th bit.

    The C47/R47 code restricts itself to a 15-bit subset[12] of Unicode by forcibly setting the high bit on all uint16_t character values beyond the 7-bit ASCII subset. This flags non-ASCII characters so that they can be recognized by testing the high bit in their first byte when stored in big-endian fashion, which in turn lets it use a single byte for text where ASCII suffices, saving considerable RAM and flash space.

    The main downside — from our immediate perspective — is that reassigning the meaning of that top bit cuts off half the UCS-2 character space. That not only rules out these characters as sources of solutions to the discrepancies above, it means the C47 font can’t shove its nonstandard characters into a PUA block since the first is way up at U+E000–U+F8FF, requiring that sixteenth bit.

    Much the same benefit results from the more complicated UTF-8 encoding scheme,[13] which solves the entire problem by encoding every Unicode character in a variable-length sequence of 1 to 4 bytes. We can dream of a UTF-8 based R47, but it won’t happen any time soon.

  2. Move all wholly custom characters to the PUA.

    That’s what it’s for. No compatibility will be lost with this change, because there is no cross-font compatibility in this case regardless.

    This includes the characters currently squatting on unassigned spots in Unicode. Future standards may provide new characters here, which we won’t be able to take advantage of under the current situation.

  3. Consider moving the rest on a case-by-case basis.

    While the bulk of the current discrepancies are harmless, the main reason not to move everything to the PUA once it is open for use is to reduce the upset from doing the move. The fewer things that change at once, the simpler debugging those changes will be.
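The encoding described in step 1 and footnote 12 can be sketched in Go to show why the PUA is out of reach and how the scheme compares with UTF-8. The `encodeC47` and `decodeC47` names are hypothetical, and the sketch handles one character at a time.

```go
package main

import (
	"errors"
	"fmt"
	"unicode/utf8"
)

// encodeC47 stores ASCII as one byte and anything else as a big-endian
// uint16 with the high bit set, per the scheme described above.
func encodeC47(r rune) ([]byte, error) {
	switch {
	case r >= 0 && r < 0x80:
		return []byte{byte(r)}, nil
	case r >= 0x80 && r < 0x8000:
		v := uint16(r) | 0x8000
		return []byte{byte(v >> 8), byte(v)}, nil
	default:
		// Includes the BMP PUA at U+E000–U+F8FF and everything above it.
		return nil, fmt.Errorf("U+%04X does not fit in the 15-bit scheme", r)
	}
}

// decodeC47 reads one character, returning it and the bytes consumed.
func decodeC47(b []byte) (rune, int, error) {
	if len(b) == 0 {
		return 0, 0, errors.New("empty input")
	}
	if b[0]&0x80 == 0 {
		return rune(b[0]), 1, nil
	}
	if len(b) < 2 {
		return 0, 0, errors.New("truncated two-byte character")
	}
	v := uint16(b[0])<<8 | uint16(b[1])
	return rune(v &^ 0x8000), 2, nil
}

func main() {
	for _, r := range []rune{'A', '√', 0xE000} {
		b, err := encodeC47(r)
		if err != nil {
			fmt.Printf("U+%04X: C47 error (%v); UTF-8 needs %d bytes\n", r, err, utf8.RuneLen(r))
			continue
		}
		fmt.Printf("U+%04X: C47 % X; UTF-8 needs %d bytes\n", r, b, utf8.RuneLen(r))
	}
}
```

Running it shows the trade-off. √ takes two bytes (A2 1A) in the C47 scheme but three in UTF-8, while the PUA code point cannot be stored in the C47 scheme at all.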

Taking up the Burden

I created rejig under the presumption that no one will volunteer to undertake that huge task, and after much discussion on the forum, that appears to be an accurate view.

And that’s perfectly fine! Unlike the R47 project proper, rejig doesn’t have a choice in this matter: its default output format is UTF-8 plaintext, where we cannot count on any particular font being used. It must of necessity conform to the Unicode standard.

rejig can solve these discrepancies by starting with P47 byte codes and emitting the proper Unicode from there.
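One way to picture this is a per-rune substitution applied on output. This Go sketch uses the standard `strings.Map`. The `glyphFixes` table is hypothetical and holds only the one replacement that the discrepancy table spells out: ⇍ (U+21CD) becomes ⎌ (U+238C).

```go
package main

import (
	"fmt"
	"strings"
)

// glyphFixes maps code points the C47 font repurposes to the standard
// Unicode character for the intended meaning. Illustrative only.
var glyphFixes = map[rune]rune{
	'⇍': '⎌', // U+21CD is drawn as "undo"; U+238C is Unicode's undo symbol
}

// toStandardUnicode rewrites any repurposed code points in s.
func toStandardUnicode(s string) string {
	return strings.Map(func(r rune) rune {
		if fixed, ok := glyphFixes[r]; ok {
			return fixed
		}
		return r
	}, s)
}

func main() {
	fmt.Println(toStandardUnicode("⇍ undo")) // ⎌ undo
}
```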

Practical Transformations

rejig uses these transformations in service of the goal of offering pretty Unicode output from P47 byte codes:

Because rejig converts all these transformations and aliases to a canonical form, you can use it to “upgrade” a plain ASCII version of a program to the pretty-printed Unicode version:

$ rejig -f utf8 -t utf8 - < ascii.txt > utf8.p47u

(You may now wish to return to my R47 article index.)

License

This work is © 2026 by Warren Young and is licensed under CC BY-NC-SA 4.0


  1. ^ This is not standardized anywhere. I picked it for lack of a better plan and reserve the right to change it later.
  2. ^ Homebrew comes preinstalled on Aurora, Bazzite, and Bluefin.
  3. ^ Indeed, developers can run “task test” to run the unit tests and then verify its round-tripping ability with several programs, including a variant of the R47’s “AllOps” demo. Running “task intcov” produces an integration test report showing how much of the Go code the all-ops tests “touch,” ideally approaching 100% except for the known-error cases.
  4. ^ There is a proposal under consideration to age out obsolete ops once enough time has passed. We wish to balance the decreasing likelihood of anyone trying to load an old program using such an op with the fact that this probability will never equal zero. The trick is setting those thresholds to an acceptable level. Maintaining backward compatibility forever carries a space cost for all the holes it creates in the op table.
  5. ^ That being the R47 convention for marking >OBSOLETE< op names.
  6. ^ This workaround has the side effect of clearing the LAST𝑥 register, but keep in mind that the alternative here is no conversion at all. If you are on a calculator with DROP𝑥, prefer it over this hack.
  7. ^ The author “…sees no advantage in offering the help files without the simulator.” I see one: so I can point people directly to the docs I used!
  8. ^ The Manz sim will conform to my strong preference for UTF-8 if one goes into its Preferences and unchecks the “Encode programs in UTF-16 (LE)” option on the Files tab.
  9. ^ There are three f-shifted ops with a thin-space between the first and second letters — Py,x →R, →RAD — where the closely-related g-shifted op lacks one in the same spot. I’d follow either convention, if it was consistently applied, but as it is, I pick one and end up “wrong” in the other half.
  10. ^ HP had not yet advanced their tech to allow on-screen menus in this class of machine. That would have to wait until the first Pioneer series units came out in 1988.
  11. ^ Yes, I checked. The simulator pops up an error dialog when you ask it to open an .11c file containing a comment. Strictly speaking, it is correct to complain that JSON does not allow comments, but what this tells us more broadly is that the RLM simulators do not support one of the extensions that relax this restriction, such as JSON5.
  12. ^ For instance, the square root symbol at U+221A is referenced in the code as "\xA2\x1A" = 0xA21A = 0x8000 | 0x221A
  13. ^ If the C47 scheme is standardized anywhere, I'm not aware of it.