PEP001.PA

Or, How I Managed to Spend Three Days Solving FizzBuzz

FizzBuzz is a test problem for software developers, often given early in interviews to quickly reject incompetent candidates. It is famously easy for a competent programmer to solve, yet somehow I — a self-styled competent programmer — managed to spend three days solving it. Let me tell you the story.

The Challenge

I set myself the challenge of solving Project Euler Problem #1 — a close relative of FizzBuzz — in PDP-8 assembly language as a way to force myself to learn the platform. Nothing motivates like an achievable goal, particularly when one's good self-image is on the line.

When I started, I had only been playing with my PiDP-8/I for a few weeks. I had no prior experience with a PDP-8 or any similar machine. The closest prior experience I had was with 6502 assembly language on the Apple ][ series in the early to late 1980s. I knew little more than how to run one of the many PDP-8 assemblers at the start of this challenge.

There are many different commonly-used assemblers for the PDP-8, a testament to the importance of assembly language on the PDP-8. Fortunately, most of them share a common syntax. The major differences among them have to do with pseudo-operations, macros, and peripheral features like how they deal with the PDP-8's page-and-field memory layout.

I also knew that the basic PDP-8 instruction set is really tiny, only 8 instructions at the high level, or more like 24 if you consider the stock microcoded instructions as distinct instructions and not all part of the OPR instruction. For comparison, Microchip's PIC10F200 — which is also a 12-bit processor, like the PDP-8 — has 33 instructions and costs only about 40 cents in quantity. This is a really constrained instruction set!

On the positive side of the balance scale, I've been a software developer for about a third of a century now, most of that in a professional capacity. I've learned an awful lot of programming languages and CPUs in that time, including a few other assembly languages.

How hard could it be?

Trap 1: No ROM, No Runtime

One universal characteristic of all personal computers is some sort of ROM or basic run time system, a fixed chunk of code with a well-defined API that contains commonly-used routines, such as one to print a line of text to the user's display. That could be anything from the PROM chip on an Apple ][ to the BIOS on an IBM PC to glibc on a Linux box. The point is, modern PCs all have some base level of functionality that you can count on being present so you don't have to continually reinvent the wheel.

I consider the PDP-8 the spiritual ancestor of the personal computer: inexpensive, simple, and small when compared to its contemporaneous competition. Just like its microcomputer descendants, the PDP-8 ran BASIC, but real programmers more commonly wrote in assembly instead. The PDP-8 wasn't quite a "personal" computer, but it was as close as you could get circa 1970, the heyday of the PDP-8, before microcomputers ate low-end minicomputers' lunch.¹

The PDP-8 didn't have a ROM. When you turn a PDP-8 on, the CPU initializes itself, and then nothing else happens. It just sits there waiting for you to give it some code to run. Every single word of program code that the machine runs had to be put in its memory manually by some human.² If you want the PDP-8 to boot an operating system, the operator must manually bootstrap the system, which may involve several stages. Or, the operator could just use the machine without loading anything first, toggling your program in via the front panel switches.

So that brings me to Trap 1: How does this program print an answer to the screen? You can't just call on some pre-existing ROM or runtime routine.³ No, one must write his own decimal number printing and ASCII string printing routines! In assembly language. Debugged via the front panel.

There's one day gone.

Trap 2: No Modulo Operator

The standard solution to FizzBuzz is to use the modulo operator, which returns the remainder when dividing one number into another. 6 mod 4 is 2, for example.

Major problem: the PDP-8's stock instruction set doesn't have a modulo operator, or even a division operator. This was a common limitation of early microprocessors, too: the 6502, the 8080, and the Z80 all lacked a hardware multiplier, and often when you were lucky enough to get a multiplier, you didn't also get a divider, as that's generally more difficult to implement.

Fortunately for me, DEC produced an add-on to the PDP-8/I that I used for this project called the KE8/I, also known as the Extended Arithmetic Element. The PDP-8 simulator I used for this project does have an EAE feature, so I was able to use its DVI instruction.

But of course the standard PDP-8 assembler doesn't know about the DVI instruction, or any of the special flags, registers, and microcode instructions you need to use along with it. You must actually define all of this at the top of your program so that the assembler knows how to generate the necessary code!

DVI is an integer divide instruction that leaves the remainder in the PDP-8's accumulator, which means it effectively implements the modulo operation as a side effect of doing division. But in order to figure that out, you must read the KE8/I specification document, which describes in excruciating detail how the KE8/I is implemented at the silicon gate level. There are even gate-level schematics!

Nowhere in this document does it simply say that "DVI divides the value following the DVI instruction into {AC:MQ}, with the quotient left in MQ and the remainder left in AC." It does say all of that, piece by piece, over the five pages documenting the DVI instruction, but you must extract it yourself. Here, I've done that for you in a single sentence. You're welcome.

There's another day gone.

Trap 3: No Precision

The problem I set myself states that you must add up the numbers that match the given criteria. The Project Euler web site then prompts you for that large random-looking integer, by which it knows you must have correctly solved the problem.

No, I'm not going to give you that integer. Go write a 5-line Python program to calculate it yourself.

Me, I took 156 PDP-8 instructions to solve it, not counting space for constants, variables, and links, the latter being a dirty trick that PDP-8 assemblers use for jumping between 128-word "pages" of memory. And you thought the 64k segments of real mode x86 assembly were bad!

But I digress.

The PDP-8 is a 12-bit machine. Since the standard arithmetic operations assume you want signed integer arithmetic, this means the largest positive number you can do math on within a PDP-8 is 2047. As the problem is posed on the Project Euler site, you need to add up the matching numbers in the range 1 to 999. That means there are many combinations of only three matching values in that range that, when added together, would overflow our 2047 limit. You simply can't compute an answer that large in a single integer on a PDP-8.⁴

I solved this by making my program report subtotals along the way each time the amount exceeded 1024. Then it would reset the total so far back to 0 and continue work. I chose that threshold as the largest "round number" in binary terms that would not overflow when adding the largest value that matched the criteria, 999.

The program writes this series of subtotals out to the console as a long string of additions:

ANSWER: 1064 + 1059 + 1060 + 1137 + 1158 + 1125 + 1047 + 1125 + ...

It goes on like that for many lines. Since you are probably running this program via some kind of terminal program, you can then copy that string of additions and paste it into bc(1), which will do all the adding up using a processor many orders of magnitude more powerful than the PDP-8.

Cheating? Maybe. I briefly considered saving each subtotal to core memory, then writing a routine to add the values up in grade school fashion, column-by-column, right-to-left, carries and all.

I decided that, day 3 gone, that I'd done well enough.

The Solution

Here is my winning solution to the problem, all 217 well-commented lines of it:

/ Project Euler Problem #1, Multiples of 3 and 5:
/
/   If we list all the natural numbers below 10 that are multiples of
/   3 or 5, we get 3, 5, 6 and 9. The sum of these multiples is 23.
/   Find the sum of all the multiples of 3 or 5 below 1000.
/ 
/ Solution by Warren Young of tangentsoft.com, 2016.11.30
////////////////////////////////////////////////////////////////////////


////////////////////////////////////////////////////////////////////////
/ DIY ASSEMBLER INSTRUCTIONS

/ Our assembler doesn't know the EAE instructions, so teach it
DVI=7407                / integer divide .+1 into {AC:MQ}, answer in MQ

/ Combined microcoded instruction aliases
CLR=CLA CLL             / clear both AC and L
TCA=CMA IAC             / two's complement the accumulator


//// MAIN //////////////////////////////////////////////////////////////
/ Program entry point.   We purposely reinitialize global variables and
/ processor state in case we're restarting this program in-core.

PAGE 1
MAIN,   CLR
        TAD (3)
        DCA CURR        / start with 3, because we know 1 & 2 can't work
        DCA TOTAL       / reset total to 0
        TLS             / send null to terminal to get flags set right
        TAD (ANSWER)    / write "ANSWER: " to the terminal
        JMS PRINTS
        JMP MLCORE
CURR,   0               / current number we're checking
TOTAL,  0               / the answer so far; at the end, printed out

/ Constants
        DECIMAL
MAX,    999             / check natural numbers CURR to MAX; must be < 2048!
STMAX,  1024            / subtotal max; avoids overflow of 12-bit signed int

        OCTAL
CRLF,   15;12;0         / ASCII character values; don't forget trailing 0!
PLUS,   40;53;40;0
ANSWER, 101;116;123;127;105;122;72;40;0


//// MLCORE ////////////////////////////////////////////////////////////
/ The core of the main loop.  MAIN just inits the globals and calls us.

        / Try dividing 3 into CURR first
MLCORE, CLR
        TAD (3)
        JMS ISMOD0
        SNA             / if ISMOD0 left AC = 0, CURR divided evenly by
        JMP NXITER      / 3, so skip 5 lest we count multiples of 15 2x

        / That didn't work out, so try dividing 5 into it
        CLA
        TAD (5)
        JMS ISMOD0

        / Loop cleanup
NXITER, CLA             
        TAD CURR
        TCA
        TAD MAX         / = 0 if CURR == MAX
        SNA             / if so, leave calculation loop
        JMP MLDONE

        CLR             / CURR still < MAX, so increment CURR
        TAD CURR        
        IAC
        DCA CURR

        TAD TOTAL       / if TOTAL is getting too big, print...
        TCA             / a subtotal and zero TOTAL so we don't...
        TAD STMAX       / overflow the 12-bit limit
        SZL
        JMP MLCORE      / STMAX - TOTAL > 0 so re-enter loop core
        JMS SHOTOT      / exceeded threshold, so display subtotal and " + "
        CLA
        DCA TOTAL       / take advantage of free zero
        TAD (PLUS)
        JMS PRINTS
        JMP MLCORE      

        / Calculation complete.  Show answer and exit gracefully.
MLDONE, JMS SHOTOT
        CLA
        TAD (CRLF)
        JMS PRINTS
        JMP ENDG


//// ISMOD0 ////////////////////////////////////////////////////////////
/ If passed AC divides evenly into CURR (in C-speak, CURR % AC == 0)
/ add CURR to TOTAL and return 0 in AC.  Else, return nonzero in AC and
/ leave TOTAL untouched.

ISMOD0, 0
        / Divide CURR by DIVISOR, passed as AC
        DCA DIVISOR
        TAD CURR        / load CURR into just-cleared AC
        MQL DVI         / move CURR to MQ, divide by DIVISOR...
DIVISOR,0               / ...quotient in MQ, remainder in AC
        SZA
        JMP I ISMOD0    / remainder nonzero, so leave early

        / Division left AC empty, so CURR divides evenly by DIVISOR!
        TAD CURR        / don't need to clear AC; prior test says AC == 0
        TAD TOTAL
        DCA TOTAL
        JMP I ISMOD0


//// SHOTOT ////////////////////////////////////////////////////////////
/ Write TOTAL to terminal in decimal, nothing following.

SHOTOT,0
        CLR
        TAD TOTAL
        JMS DECPRT      / print answer on console, in decimal
        TSF             / wait for terminal to be ready again
        JMP .-1

        JMP I SHOTOT    / and done



//// PRINTS ////////////////////////////////////////////////////////////

PRINTS,0
        DCA SADDR       / save AC as string address
PSNEXT, TAD I SADDR     / load next character
        SNA
        JMP I PRINTS    / found the null terminator; leave

        TSF             / wait for terminal to be ready
        JMP .-1
        TLS             / write character to the terminal

        CLA             / increment string address pointer
        TAD SADDR
        IAC
        DCA SADDR

        JMP PSNEXT      / look at next character
SADDR,  0


//// ENDG //////////////////////////////////////////////////////////////
// End program gracefully, either re-entering OS/8 if we can see that
// its entry point looks sane, or halting with the answer in AC so the
// user can see the answer on the front panel.

ENDG,   CLA
        TAD OS8ENT
        TCA
        TAD OS8INS1
        SNA
        JMP 7600        / re-enter OS/8
        CLR
        TAD TOTAL
        HLT             / not running under OS/8, so halt
OS8ENT, 7600            / OS/8 entry point
OS8INS1,4207            / first instruction at that entry point


//// DECPRT ////////////////////////////////////////////////////////////
// Decimal number printer; see examples/routines/decprt.pal

PAGE 2
DECPRT, 0
        DCA VALUE       /SAVE INPUT
        DCA DIGIT       /CLEAR
        TAD CNTRZA
        DCA CNTRZB      /SET COUNTER TO FOUR
        TAD ADDRZA
        DCA ARROW       /SET TABLE POINTER
        SKP
        DCA VALUE       /SAVE
        CLL
        TAD VALUE
ARROW,  TAD TENPWR      /SUBTRACT POWER OF TEN
        SZL
        ISZ DIGIT       /DEVELOP BCD DIGIT
        SZL
        JMP ARROW-3     /LOOP
        CLA             /HAVE BCD DIGIT
        TAD DIGIT       /GET DIGIT
        TAD K260        /MAKE IT ASCII
        TSF             /OR TAD DIGIT
        JMP .-1         /JMS TDIGIT(SEE 8-19-U)
        TLS             /TYPE DIGIT
        CLA
        DCA DIGIT       /CLEAR
        ISZ ARROW       /UPDATE POINTER
        ISZ CNTRZB      /DONE ALL FOUR?
        JMP ARROW-1     /NO: CONTINUE
        JMP I DECPRT    /YES: EXIT
ADDRZA, TAD TENPWR
CNTRZA, -4
TENPWR, -1750           /ONE THOUSAND
        -0144           /ONE HUNDRED
        -0012           /TEN
        -0001           /ONE
K260,   260
VALUE,  0
DIGIT,  0
CNTRZB, 0


//// END ///////////////////////////////////////////////////////////////
/ Assembler-generated constants will appear below this in the list file
$

Allow me to repeat that in most modern programming languages, you get the same result in about 5 lines of code, and you don't need to fob the arithmetic off on bc(1).

Isn't it glorious?

Epilogue

There is now an optimized version of the above code checked into the code repository which fits everything into a single PDP-8 page of core, just barely: it takes up exactly 128 words of core memory.

Most of that optimization work was done by Rick Murphy.

Footnotes and Digressions

Minicomputers did, of course, live on quite some time after the microcomputer revolution: there were VAXen and S/390s and such. The point is that they all moved to the high end because they couldn't compete with PCs at the low end. Then the workstation revolution of the mid 1980s further eroded the minicomputer marketplace, so that by 1990, new minicomputer designs weren't all that "mini" at all: they had to morph into mighty beasts in order to find a profitable niche.
There were several different peripherals that could automatically boot a PDP-8 on power-up, such as the MI8E bootstrap loader, giving the same effect as the boot ROM on a more modern computer. These were all extra-cost add-ons, however, even as late as 1974 with the PDP-8/a.
It was common for PDP users to have libraries of common routines laying around, but because PDP-8 assemblers generally don't have include or linking features, you ended up having to textually include such routines into your program, as I have done above with the DECPRT routine I found in one of DEC's publications.

And let's be clear: I didn't just copy-and-paste that text into my program: I had to manually type it in from a PDF I found online. Back in the day, they'd have a printout of such programs in a 3-ring binder somewhere, given to them by DEC when they bought the system, mailed to them as part of their service contract, or exchanged on paper tape with other PDP users at a DECUS meeting. If you were lucky, someone else at your site already did all that and saved the assembly text to disk or tape, so you could textually include it in your program, too. That's not what we call a library today. It certainly doesn't constitute a system ROM or a basic runtime system.

Higher level languages for the PDP-8 series computers did include some sort of runtime system, but we assembly language programmers had to do all this manually.
There are "double precision" integer methods for the PDP-8, which combine two 12-bit words into a 24-bit value, which would be large enough to hold the answer.

(There, I've gone and told you that the answer is under 16.7 million. Such a generous hint. </sarcasm>)

One way to do this is in software, which has the same sort of problem as in the prior footnote: you had to textually include the code for these routines into your program.

Many floating-point arithmetic implementations for the PDP-8 — including DEC's own Floating Point Processor — do solve this problem by effectively giving you 23 or 24-bit integers if you ignore the exponent part, and do everything in the mantissa. For the purposes of solving this problem, I wasn't willing to assume the existence of the FPP, but other solutions to this problem in examples/pep001.* do use the FPP or a software emulation of it.

PiDP-8/I Software