PEP001.PA
Or, How I Managed to Spend Three Days Solving FizzBuzz
FizzBuzz is a test problem for software developers, often given early in interviews to quickly reject incompetent candidates. It is famously easy for a competent programmer to solve, yet somehow I — a self-styled competent programmer — managed to spend three days solving it. Let me tell you the story.
The Challenge
I set myself the challenge of solving Project Euler Problem #1 — a close relative of FizzBuzz — in PDP-8 assembly language as a way to force myself to learn the platform. Nothing motivates like an achievable goal, particularly when one's good self-image is on the line.
When I started, I had only been playing with my PiDP-8/I for a few weeks. I had no prior experience with a PDP-8 or any similar machine. The closest prior experience I had was with 6502 assembly language on the Apple ][ series in the early to late 1980s. I knew little more than how to run one of the many PDP-8 assemblers at the start of this challenge.
There are many different commonly-used assemblers for the PDP-8, a testament to the importance of assembly language on the PDP-8. Fortunately, most of them share a common syntax. The major differences among them have to do with pseudo-operations, macros, and peripheral features like how they deal with the PDP-8's page-and-field memory layout.
I also knew that the basic PDP-8 instruction set is really tiny, only 8 instructions at the high level, or more like 24 if you consider the stock microcoded instructions as distinct instructions and not all part of the OPR instruction. For comparison, Microchip's PIC10F200 — which is also a 12-bit processor, like the PDP-8 — has 33 instructions and costs only about 40 cents in quantity. This is a really constrained instruction set!
On the positive side of the balance scale, I've been a software developer for about a third of a century now, most of that in a professional capacity. I've learned an awful lot of programming languages and CPUs in that time, including a few other assembly languages.
How hard could it be?
Trap 1: No ROM, No Runtime
One universal characteristic of all personal computers is some sort of ROM or basic run time system, a fixed chunk of code with a well-defined API that contains commonly-used routines, such as one to print a line of text to the user's display. That could be anything from the PROM chip on an Apple ][ to the BIOS on an IBM PC to glibc
on a Linux box. The point is, modern PCs all have some base level of functionality that you can count on being present so you don't have to continually reinvent the wheel.
I consider the PDP-8 the spiritual ancestor of the personal computer: inexpensive, simple, and small when compared to its contemporaneous competition. Just like its microcomputer descendants, the PDP-8 ran BASIC, but real programmers more commonly wrote in assembly instead. The PDP-8 wasn't quite a "personal" computer, but it was as close as you could get circa 1970, the heyday of the PDP-8, before microcomputers ate low-end minicomputers' lunch.¹
The PDP-8 didn't have a ROM. When you turn a PDP-8 on, the CPU initializes itself, and then nothing else happens. It just sits there waiting for you to give it some code to run. Every single word of program code that the machine runs had to be put in its memory manually by some human.² If you want the PDP-8 to boot an operating system, the operator must manually bootstrap the system, which may involve several stages. Or, the operator could just use the machine without loading anything first, toggling your program in via the front panel switches.
So that brings me to Trap 1: How does this program print an answer to the screen? You can't just call on some pre-existing ROM or runtime routine.³ No, one must write his own decimal number printing and ASCII string printing routines! In assembly language. Debugged via the front panel.
There's one day gone.
Trap 2: No Modulo Operator
The standard solution to FizzBuzz is to use the modulo operator, which returns the remainder when dividing one number into another. 6 mod 4 is 2, for example.
Major problem: the PDP-8's stock instruction set doesn't have a modulo operator, or even a division operator. This was a common limitation of early microprocessors, too: the 6502, the 8080, and the Z80 all lacked a hardware multiplier, and often when you were lucky enough to get a multiplier, you didn't also get a divider, as that's generally more difficult to implement.
Fortunately for me, DEC produced an add-on to the PDP-8/I that I used for this project called the KE8/I, also known as the Extended Arithmetic Element. The PDP-8 simulator I used for this project does have an EAE feature, so I was able to use its DVI
instruction.
But of course the standard PDP-8 assembler doesn't know about the DVI
instruction, or any of the special flags, registers, and microcode instructions you need to use along with it. You must actually define all of this at the top of your program so that the assembler knows how to generate the necessary code!
DVI
is an integer divide instruction that leaves the remainder in the PDP-8's accumulator, which means it effectively implements the modulo operation as a side effect of doing division. But in order to figure that out, you must read the KE8/I specification document, which describes in excruciating detail how the KE8/I is implemented at the silicon gate level. There are even gate-level schematics!
Nowhere in this document does it simply say that "DVI divides the value following the DVI instruction into {AC:MQ}, with the quotient left in MQ and the remainder left in AC." It does say all of that, piece by piece, over the five pages documenting the DVI
instruction, but you must extract it yourself. Here, I've done that for you in a single sentence. You're welcome.
There's another day gone.
Trap 3: No Precision
The problem I set myself states that you must add up the numbers that match the given criteria. The Project Euler web site then prompts you for that large random-looking integer, by which it knows you must have correctly solved the problem.
No, I'm not going to give you that integer. Go write a 5-line Python program to calculate it yourself.
Me, I took 156 PDP-8 instructions to solve it, not counting space for constants, variables, and links, the latter being a dirty trick that PDP-8 assemblers use for jumping between 128-word "pages" of memory. And you thought the 64k segments of real mode x86 assembly were bad!
But I digress.
The PDP-8 is a 12-bit machine. Since the standard arithmetic operations assume you want signed integer arithmetic, this means the largest positive number you can do math on within a PDP-8 is 2047. As the problem is posed on the Project Euler site, you need to add up the matching numbers in the range 1 to 999. That means there are many combinations of only three matching values in that range that, when added together, would overflow our 2047 limit. You simply can't compute an answer that large in a single integer on a PDP-8.⁴
I solved this by making my program report subtotals along the way each time the amount exceeded 1024. Then it would reset the total so far back to 0 and continue work. I chose that threshold as the largest "round number" in binary terms that would not overflow when adding the largest value that matched the criteria, 999.
The program writes this series of subtotals out to the console as a long string of additions:
ANSWER: 1064 + 1059 + 1060 + 1137 + 1158 + 1125 + 1047 + 1125 + ...
It goes on like that for many lines. Since you are probably running this program via some kind of terminal program, you can then copy that string of additions and paste it into bc(1)
, which will do all the adding up using a processor many orders of magnitude more powerful than the PDP-8.
Cheating? Maybe. I briefly considered saving each subtotal to core memory, then writing a routine to add the values up in grade school fashion, column-by-column, right-to-left, carries and all.
I decided that, day 3 gone, that I'd done well enough.
The Solution
Here is my winning solution to the problem, all 217 well-commented lines of it:
/ Project Euler Problem #1, Multiples of 3 and 5:
/
/ If we list all the natural numbers below 10 that are multiples of
/ 3 or 5, we get 3, 5, 6 and 9. The sum of these multiples is 23.
/ Find the sum of all the multiples of 3 or 5 below 1000.
/
/ Solution by Warren Young of tangentsoft.com, 2016.11.30
////////////////////////////////////////////////////////////////////////
////////////////////////////////////////////////////////////////////////
/ DIY ASSEMBLER INSTRUCTIONS
/ Our assembler doesn't know the EAE instructions, so teach it
DVI=7407 / integer divide .+1 into {AC:MQ}, answer in MQ
/ Combined microcoded instruction aliases
CLR=CLA CLL / clear both AC and L
TCA=CMA IAC / two's complement the accumulator
//// MAIN //////////////////////////////////////////////////////////////
/ Program entry point. We purposely reinitialize global variables and
/ processor state in case we're restarting this program in-core.
PAGE 1
MAIN, CLR
TAD (3)
DCA CURR / start with 3, because we know 1 & 2 can't work
DCA TOTAL / reset total to 0
TLS / send null to terminal to get flags set right
TAD (ANSWER) / write "ANSWER: " to the terminal
JMS PRINTS
JMP MLCORE
CURR, 0 / current number we're checking
TOTAL, 0 / the answer so far; at the end, printed out
/ Constants
DECIMAL
MAX, 999 / check natural numbers CURR to MAX; must be < 2048!
STMAX, 1024 / subtotal max; avoids overflow of 12-bit signed int
OCTAL
CRLF, 15;12;0 / ASCII character values; don't forget trailing 0!
PLUS, 40;53;40;0
ANSWER, 101;116;123;127;105;122;72;40;0
//// MLCORE ////////////////////////////////////////////////////////////
/ The core of the main loop. MAIN just inits the globals and calls us.
/ Try dividing 3 into CURR first
MLCORE, CLR
TAD (3)
JMS ISMOD0
SNA / if ISMOD0 left AC = 0, CURR divided evenly by
JMP NXITER / 3, so skip 5 lest we count multiples of 15 2x
/ That didn't work out, so try dividing 5 into it
CLA
TAD (5)
JMS ISMOD0
/ Loop cleanup
NXITER, CLA
TAD CURR
TCA
TAD MAX / = 0 if CURR == MAX
SNA / if so, leave calculation loop
JMP MLDONE
CLR / CURR still < MAX, so increment CURR
TAD CURR
IAC
DCA CURR
TAD TOTAL / if TOTAL is getting too big, print...
TCA / a subtotal and zero TOTAL so we don't...
TAD STMAX / overflow the 12-bit limit
SZL
JMP MLCORE / STMAX - TOTAL > 0 so re-enter loop core
JMS SHOTOT / exceeded threshold, so display subtotal and " + "
CLA
DCA TOTAL / take advantage of free zero
TAD (PLUS)
JMS PRINTS
JMP MLCORE
/ Calculation complete. Show answer and exit gracefully.
MLDONE, JMS SHOTOT
CLA
TAD (CRLF)
JMS PRINTS
JMP ENDG
//// ISMOD0 ////////////////////////////////////////////////////////////
/ If passed AC divides evenly into CURR (in C-speak, CURR % AC == 0)
/ add CURR to TOTAL and return 0 in AC. Else, return nonzero in AC and
/ leave TOTAL untouched.
ISMOD0, 0
/ Divide CURR by DIVISOR, passed as AC
DCA DIVISOR
TAD CURR / load CURR into just-cleared AC
MQL DVI / move CURR to MQ, divide by DIVISOR...
DIVISOR,0 / ...quotient in MQ, remainder in AC
SZA
JMP I ISMOD0 / remainder nonzero, so leave early
/ Division left AC empty, so CURR divides evenly by DIVISOR!
TAD CURR / don't need to clear AC; prior test says AC == 0
TAD TOTAL
DCA TOTAL
JMP I ISMOD0
//// SHOTOT ////////////////////////////////////////////////////////////
/ Write TOTAL to terminal in decimal, nothing following.
SHOTOT,0
CLR
TAD TOTAL
JMS DECPRT / print answer on console, in decimal
TSF / wait for terminal to be ready again
JMP .-1
JMP I SHOTOT / and done
//// PRINTS ////////////////////////////////////////////////////////////
PRINTS,0
DCA SADDR / save AC as string address
PSNEXT, TAD I SADDR / load next character
SNA
JMP I PRINTS / found the null terminator; leave
TSF / wait for terminal to be ready
JMP .-1
TLS / write character to the terminal
CLA / increment string address pointer
TAD SADDR
IAC
DCA SADDR
JMP PSNEXT / look at next character
SADDR, 0
//// ENDG //////////////////////////////////////////////////////////////
// End program gracefully, either re-entering OS/8 if we can see that
// its entry point looks sane, or halting with the answer in AC so the
// user can see the answer on the front panel.
ENDG, CLA
TAD OS8ENT
TCA
TAD OS8INS1
SNA
JMP 7600 / re-enter OS/8
CLR
TAD TOTAL
HLT / not running under OS/8, so halt
OS8ENT, 7600 / OS/8 entry point
OS8INS1,4207 / first instruction at that entry point
//// DECPRT ////////////////////////////////////////////////////////////
// Decimal number printer; see examples/routines/decprt.pal
PAGE 2
DECPRT, 0
DCA VALUE /SAVE INPUT
DCA DIGIT /CLEAR
TAD CNTRZA
DCA CNTRZB /SET COUNTER TO FOUR
TAD ADDRZA
DCA ARROW /SET TABLE POINTER
SKP
DCA VALUE /SAVE
CLL
TAD VALUE
ARROW, TAD TENPWR /SUBTRACT POWER OF TEN
SZL
ISZ DIGIT /DEVELOP BCD DIGIT
SZL
JMP ARROW-3 /LOOP
CLA /HAVE BCD DIGIT
TAD DIGIT /GET DIGIT
TAD K260 /MAKE IT ASCII
TSF /OR TAD DIGIT
JMP .-1 /JMS TDIGIT(SEE 8-19-U)
TLS /TYPE DIGIT
CLA
DCA DIGIT /CLEAR
ISZ ARROW /UPDATE POINTER
ISZ CNTRZB /DONE ALL FOUR?
JMP ARROW-1 /NO: CONTINUE
JMP I DECPRT /YES: EXIT
ADDRZA, TAD TENPWR
CNTRZA, -4
TENPWR, -1750 /ONE THOUSAND
-0144 /ONE HUNDRED
-0012 /TEN
-0001 /ONE
K260, 260
VALUE, 0
DIGIT, 0
CNTRZB, 0
//// END ///////////////////////////////////////////////////////////////
/ Assembler-generated constants will appear below this in the list file
$
Allow me to repeat that in most modern programming languages, you get the same result in about 5 lines of code, and you don't need to fob the arithmetic off on bc(1)
.
Isn't it glorious?
Epilogue
There is now an optimized version of the above code checked into the code repository which fits everything into a single PDP-8 page of core, just barely: it takes up exactly 128 words of core memory.
Most of that optimization work was done by Rick Murphy.
Footnotes and Digressions
Minicomputers did, of course, live on quite some time after the microcomputer revolution: there were VAXen and S/390s and such. The point is that they all moved to the high end because they couldn't compete with PCs at the low end. Then the workstation revolution of the mid 1980s further eroded the minicomputer marketplace, so that by 1990, new minicomputer designs weren't all that "mini" at all: they had to morph into mighty beasts in order to find a profitable niche.
There were several different peripherals that could automatically boot a PDP-8 on power-up, such as the MI8E bootstrap loader, giving the same effect as the boot ROM on a more modern computer. These were all extra-cost add-ons, however, even as late as 1974 with the PDP-8/a.
It was common for PDP users to have libraries of common routines laying around, but because PDP-8 assemblers generally don't have include or linking features, you ended up having to textually include such routines into your program, as I have done above with the DECPRT routine I found in one of DEC's publications.
And let's be clear: I didn't just copy-and-paste that text into my program: I had to manually type it in from a PDF I found online. Back in the day, they'd have a printout of such programs in a 3-ring binder somewhere, given to them by DEC when they bought the system, mailed to them as part of their service contract, or exchanged on paper tape with other PDP users at a DECUS meeting. If you were lucky, someone else at your site already did all that and saved the assembly text to disk or tape, so you could textually include it in your program, too. That's not what we call a library today. It certainly doesn't constitute a system ROM or a basic runtime system.
Higher level languages for the PDP-8 series computers did include some sort of runtime system, but we assembly language programmers had to do all this manually.
There are "double precision" integer methods for the PDP-8, which combine two 12-bit words into a 24-bit value, which would be large enough to hold the answer.
(There, I've gone and told you that the answer is under 16.7 million. Such a generous hint. </sarcasm>)
One way to do this is in software, which has the same sort of problem as in the prior footnote: you had to textually include the code for these routines into your program.
Many floating-point arithmetic implementations for the PDP-8 — including DEC's own Floating Point Processor — do solve this problem by effectively giving you 23 or 24-bit integers if you ignore the exponent part, and do everything in the mantissa. For the purposes of solving this problem, I wasn't willing to assume the existence of the FPP, but other solutions to this problem in
examples/pep001.*
do use the FPP or a software emulation of it.
License
Copyright © 2017-2018 by Warren Young. This document is licensed under the terms of the SIMH license.