PiDP-8/I Software Forum

cc8 cross compiler

(1) By poetnerd on 2018-12-23 06:43:02 [link]

I've spent a fair bit of time working out why the cc8 cross compiler has problems.

The root cause is the basic assumption that sizeof int = sizeof char * = sizeof int *, and that you can just willy nilly say things like:

int foo;
char *bar;
foo = bar;

printf ("%s", foo); 

and everything will work.

After wailing on the cc8-64-bit branch for a while, I got some of the basics working by re-defining various ints to long, and by having various routines that returned int return char * instead.

The Mac was consistently blowing out when a pointer was copied into a variable defined as an int, and then later an attempt was made to use the variable as a pointer.
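
A minimal sketch of that failure mode, assuming a host where int is 32 bits and pointers are 64 bits (as on a current Mac). The function names are hypothetical and are not taken from cc8; intptr_t is the standard C type intended for holding a pointer as an integer.

```c
#include <stdint.h>

/* Hypothetical helper: push a pointer through a plain int, as the old
   cross compiler effectively does. Returns 1 if the value survives and
   0 if bits were lost. The narrowing conversion is implementation-defined;
   on a host with 32-bit int and 64-bit pointers it usually drops the
   upper half of the address. */
int survives_int_round_trip(const char *p)
{
    intptr_t wide = (intptr_t)(const void *)p;
    int narrow = (int)wide;             /* upper bits may be dropped here */
    return (intptr_t)narrow == wide;
}

/* Hypothetical helper: the same round trip through intptr_t, which is
   wide enough to hold a pointer on hosts that provide it. */
int survives_intptr_round_trip(const char *p)
{
    intptr_t wide = (intptr_t)(const void *)p;
    const char *back = (const char *)(const void *)wide;
    return back == p;
}
```

Whether survives_int_round_trip() reports a loss depends on the host and on where the pointed-to object happens to live, which is why the old code could work on one machine and crash on another.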

Unfortunately, that is how for and while are implemented at the most basic level. So the array of 7 integers, some of which hold buffer pointers, will have to be rewritten as a struct. This turns out to be major surgery. However, this work has already been done on a modern port of Small C at:

The code is MUCH cleaner, even though the structure and modularity of the code is the SAME. Everything is where you would expect it to be if you were familiar with one codeline or the other.

The differences / improvements are as follows:

  1. Names are cleaner. (longer but easier to understand.)
  2. Indentation is saner.
  3. Instead of pointers, everything is an integer index into arrays of the appropriate data type.
  4. Complex data structures, instead of being arrays of integers interpreted on the fly without casts, are broken down into structures, and accessed with proper static types.
  5. Lots of comments have been added to the code.
  6. In places where changes were radical, the old version of the code is kept in comments.
  7. union and struct, constructs missing from the original Small C, have been added in.
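
Item 3 in the list above can be sketched as follows. This is an illustration only: the table layout, the size limits and the function names are hypothetical and are not copied from SmallC-85. The point is that callers hold an int index, which is the same size on every host, rather than a pointer.

```c
#include <string.h>

#define SYM_NAME_LEN   8    /* hypothetical limit on significant characters */
#define SYM_TABLE_SIZE 64   /* hypothetical table capacity */

struct symbol {
    char name[SYM_NAME_LEN + 1];
    int  type;
    int  offset;
};

static struct symbol symbol_table[SYM_TABLE_SIZE];
static int symbol_count = 0;

/* Add a symbol and return its index, or -1 if the table is full.
   Names longer than SYM_NAME_LEN are truncated. */
int add_symbol(const char *name, int type, int offset)
{
    if (symbol_count >= SYM_TABLE_SIZE)
        return -1;
    strncpy(symbol_table[symbol_count].name, name, SYM_NAME_LEN);
    symbol_table[symbol_count].name[SYM_NAME_LEN] = '\0';
    symbol_table[symbol_count].type = type;
    symbol_table[symbol_count].offset = offset;
    return symbol_count++;
}

/* Return the index of a symbol, or -1 if it is not in the table.
   Only the first SYM_NAME_LEN characters are compared. */
int find_symbol(const char *name)
{
    int i;
    for (i = 0; i < symbol_count; i++)
        if (strncmp(symbol_table[i].name, name, SYM_NAME_LEN) == 0)
            return i;
    return -1;
}
```

An index returned by add_symbol() can be stored in any int variable and later used as symbol_table[index], so nothing breaks when int and pointer sizes differ.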

At this point, in order to get cc8/cross working on the Mac, while.c and stmt.c will require radical surgery.
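
The kind of surgery involved, replacing the array of 7 integers with a struct, might look roughly like this. The field names, the nesting limit and the helper functions are all hypothetical; they are not the actual definitions from cc8 or SmallC-85.

```c
#include <stddef.h>

#define MAX_LOOP_DEPTH 20   /* hypothetical nesting limit */

/* One record per active loop. Every field has a real type, so the
   buffer pointer is never squeezed through an int. Field names are
   illustrative assumptions. */
struct loop_record {
    int   symbol_index;
    int   stack_depth;
    int   kind;
    int   test_label;
    int   continue_label;
    int   exit_label;
    char *body_buffer;      /* a pointer gets a pointer-typed field */
};

static struct loop_record loop_stack[MAX_LOOP_DEPTH];
static int loop_depth = 0;

/* Enter a loop: copy the record onto the stack. Returns -1 if full. */
int push_loop(const struct loop_record *rec)
{
    if (loop_depth >= MAX_LOOP_DEPTH)
        return -1;
    loop_stack[loop_depth++] = *rec;
    return 0;
}

/* Innermost active loop, or NULL when not inside any loop. */
struct loop_record *current_loop(void)
{
    return loop_depth > 0 ? &loop_stack[loop_depth - 1] : NULL;
}

/* Leave the innermost loop. Returns -1 if no loop is active. */
int pop_loop(void)
{
    if (loop_depth == 0)
        return -1;
    loop_depth--;
    return 0;
}
```

With typed fields the compiler itself rejects an attempt to store a pointer in an int slot, which is exactly the mistake that was crashing the old code at run time.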

I vote instead for porting code8.c into SmallC-85 and adopting it instead. I think the code8.c module should be pretty straightforward to port to the new code line.

The only problem I see is that there's no license described on the github repository. It is described as:

"Ron Cain's Small C public domain compiler revived after 30 years."

After a bit of digging, I've discovered that one needs to publish a license or else the default is Copyright with all rights reserved. I've forked the SmallC-85 repo, and proposed "the unlicense" (as amended by you on tangentsoft to restrict use of author names in advertising) as a pull-up into the repo.

(2) By poetnerd on 2018-12-24 05:59:39 [link] in reply to 1

The owner of the SmallC-85 tree merged in my proposed license. That obstacle is removed.

I've done a lot of the work of re-targeting SmallC-85 to the PDP-8. The day I spent learning why the old cross didn't work served me well in chasing down problems with the re-targeting.

At the present time my libc.s (haven't merged in the name change code) is not quite identical to the one on my Pi, but the compilation gets all the way through generating output instead of seg-faulting.

(3) By tangent on 2018-12-24 14:39:28 [link] in reply to 2

Sounds fine. Just keep it on a branch until this next release gets out. The only CC8 related stuff I intend to go into the next release is if Ian Schofield decides to send me updates to the existing version.

(6) By poetnerd on 2018-12-30 03:49:06 [link] in reply to 3


(16) By poetnerd on 2019-01-01 18:48:27 [link] in reply to 3

Ian sent me a zip file that he called the latest version of the native compiler.

It differs in various ways from what we have checked in now.

Do you have anything in your inbox that's labeled an update?

(17) By tangent on 2019-01-02 14:02:13 [link] in reply to 16

I haven't received anything from Ian in many months.

I'm rather disillusioned with the whole idea of a native C compiler for the PDP-8, because "C compiler" puts a rather definite idea in my head, and from what I've seen, you can't provide that within the constraints of the PDP-8.

It wouldn't be so bad if what was claimed was something like "K&R C compiler" or "UNIX V5 C compiler," so I could dial back my expectations to some definite target, but Ian refused to put any kind of design target on his efforts. That leaves me to my own expectations, which are those of someone who first learned C around 1991.

Cross-compilers are a different deal entirely, since the much looser constraints of modern PC platforms mean you can spend many more resources on optimization, error messages, source analysis, etc. I believe the current CC8 offering does make some steps in this direction, but what we have today is miles from GCC for the PDP-8.

Why GCC? Because when you say "C compiler" in 2019 (!) that's the sort of thing many programmers immediately think about. That's why it bugs me so much that there is no definite target, an exemplar that we can match, which when achieved, will tell us when we're done.

At this point, if I had to write a complicated program in a high-level language for the PDP-8, I think I'd write it in FORTRAN IV.

(18) By poetnerd on 2019-01-02 17:50:39 [link] in reply to 17

I agree with you on many points here.

Complicated programs for the PDP-8 probably need to live in FORTRAN-IV.

"C Compiler" and "PDP-8" are, I think, provably "architectural non-sequiturs": Although B, the predecessor to C, and perhaps the first version of C were developed on a PDP-7, the simplest, and I think very accurate "sound-bite description" of C would be, "Syntactic sugar around the PDP-11 instruction set." The PDP-11 architecture was conceived as pretty much antithetical to the PDP-8 architecture:

  1. Larger address space.
  2. Larger word size.
  3. Stack support as a hardware primitive.
  4. Multiple accumulators with an orthogonal approach to their support.
  5. Re-entrant subroutine support at the hardware level.

C relies on all those features, none of which are in the PDP-8.

Additionally, the way the C parsing and code generation is designed to happen is based around having a minimum of 28K of working-set all available at run-time. FORTRAN-IV can finesse this issue by having a simpler syntax, and a more straightforward code generation design. Remember that FORTRAN-IV doesn't even support recursion!

I still remember the last "Fortran Man" comic in People's Computer Company Magazine: "Fortran Man vs. Pascal Man". The punchline was:

  • Pascal Man calls himself recursively and multiple instances of himself attack Fortran Man.
  • Fortran Man tries to call himself recursively.
  • Pascal Man sees this and calls out, RETURN!
  • Fortran Man, lost in an infinite loop, is defeated.

The best we can ever hope for in a native C compiler for the PDP-8 is a toy with significant compromises. Yet it is precisely these compromises that give us an appreciation of computer architecture.

Mind you, this sort of, "Start with a toy and flesh it out into something real," has a long tradition with PDP-8s. Look at the difference between FOCAL-69 and U/W FOCAL.

I agree totally with the notion that explicit expectation setting for Ian's cc8 for OS/8 is useful and important. However, I'd also say that he may not, himself, fully enough understand where the limits lie.

Bottom Line: CC8 is an evolving version of Small C. Each pass currently suffers from living in a 4K instead of a 28K working set.

Given all this, I suggest the following as the concept behind the action plan:

Let our PiDP-8 distribution provide a reasonable, working cross compiler to enable further play with cc8. Mind you this does mean more work for me and smc-85-cc8. :-)

(19) By tangent on 2019-01-03 06:53:22 in reply to 18

C relies on all those features, none of which are in the PDP-8.

While all of those differences are of course true, I think that list misses the main problem, which is just that the PDP-8 can't address enough memory to provide all of "C," for arbitrary definitions of "C."
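
To put numbers on that limit, here is a small sketch of PDP-8 addressing, assuming the usual extended-memory layout: 12-bit addresses within a 4096-word field, 128-word pages, and at most 8 fields. The struct and function names are hypothetical, chosen for illustration.

```c
#define PDP8_FIELD_WORDS 4096u  /* words reachable with a 12-bit address */
#define PDP8_MAX_FIELDS  8u     /* fields selectable with a 3-bit field number */
#define PDP8_MAX_WORDS   (PDP8_FIELD_WORDS * PDP8_MAX_FIELDS)

/* A 15-bit extended address broken into its parts (hypothetical type). */
struct pdp8_address {
    unsigned field;   /* 0..7   */
    unsigned page;    /* 0..31  */
    unsigned offset;  /* 0..127 */
};

/* Split a 15-bit address into field, page and offset within the page. */
struct pdp8_address split_address(unsigned addr)
{
    struct pdp8_address a;
    a.field  = (addr >> 12) & 07u;
    a.page   = (addr >> 7) & 037u;
    a.offset = addr & 0177u;
    return a;
}

/* Reassemble the 15-bit address from its parts. */
unsigned join_address(struct pdp8_address a)
{
    return (a.field << 12) | (a.page << 7) | a.offset;
}
```

PDP8_MAX_WORDS works out to 32768 twelve-bit words for code, data and everything else combined, which is the ceiling any definition of "C" on this machine has to fit under.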

What I want to get to is putting a pin into the development timeline of "C" that tells what we're trying to achieve here. We might find that it's possible to match C as of UNIX V5, but not as of K&R 1978, for example.

On the address space issue, I'd prefer that we think about that differently for the cross compiler than for the native compiler. As I understand it, the two are nearly forked already in the current offering. If we can get a better C by concentrating our efforts on the cross compiler and treating the native compiler as a fun subset toy, that's fine with me. C cross-compilers targeting 8 and 12-bit CPUs have a very long history. This is do-able.

The word size issue is also not a serious problem. My wildest dream in this area is to be able to achieve what several other language implementations have on the PDP-8: double-word integer and floating point precision, both in software and hardware implementations. The current design of CC8, which bases it partly on the FORTRAN II infrastructure of OS/8, should make this easier, though I wonder if some future version of CC8 will be based on RALF/FORTRAN IV instead, so it can get the features of FRTS.
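
As a rough illustration of double-word integers done in software, the sketch below adds two 24-bit values held as pairs of 12-bit words, propagating the carry by hand. It is host-side C that models the arithmetic, not CC8 output, and the type and function names are hypothetical.

```c
#include <stdint.h>

#define WORD_BITS 12
#define WORD_MASK 07777u

/* A 24-bit value stored as two 12-bit words (hypothetical type). */
struct dword12 {
    uint16_t hi;
    uint16_t lo;
};

/* Pack the low 24 bits of a host integer into two 12-bit words. */
struct dword12 dword12_from_u32(uint32_t value)
{
    struct dword12 d;
    d.hi = (uint16_t)((value >> WORD_BITS) & WORD_MASK);
    d.lo = (uint16_t)(value & WORD_MASK);
    return d;
}

/* Unpack two 12-bit words into a host integer. */
uint32_t dword12_to_u32(struct dword12 d)
{
    return ((uint32_t)d.hi << WORD_BITS) | d.lo;
}

/* Add the low words first, then feed the carry into the high words.
   The result wraps modulo 2^24, like a real double-word add would. */
struct dword12 dword12_add(struct dword12 a, struct dword12 b)
{
    uint32_t lo = (uint32_t)a.lo + b.lo;
    uint32_t carry = lo >> WORD_BITS;
    uint32_t hi = (uint32_t)a.hi + b.hi + carry;
    struct dword12 sum;
    sum.lo = (uint16_t)(lo & WORD_MASK);
    sum.hi = (uint16_t)(hi & WORD_MASK);
    return sum;
}
```

On the PDP-8 itself the carry would come from the Link bit after the low-word add; the shift used here is just the host-side equivalent.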

I have zero expectation of CC8 ever getting beyond 24 bit values.

The stack issue is a red herring. You can do stacks in software, and our compiler is free to invent its own calling convention. I don't know if CC8 currently uses JMS instructions, but it certainly doesn't have to. It can do everything in terms of JMP. We can just say something like "page 1 of field 0 is the stack." I assume something like this has already been done, since one of the current CC8 examples is recursive.
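
A minimal sketch of the "page 1 of field 0 is the stack" idea, simulated in host C. It assumes page 1 spans octal addresses 0200 through 0377 and that the stack grows upward; both choices, and all the names, are illustrative assumptions rather than a description of what CC8 actually does. The factorial routine emulates recursion with explicit pushes and pops instead of relying on hardware call support.

```c
#include <stdint.h>

#define WORD_MASK   07777u
#define STACK_BASE  0200u   /* first word of page 1 in field 0 */
#define STACK_LIMIT 0400u   /* one past the last word of page 1 */

static uint16_t field0[4096];            /* simulated field 0 */
static unsigned stack_ptr = STACK_BASE;  /* next free word; grows upward */

/* Push one 12-bit word. Returns -1 if the page is full. */
int push_word(unsigned value)
{
    if (stack_ptr >= STACK_LIMIT)
        return -1;
    field0[stack_ptr++] = (uint16_t)(value & WORD_MASK);
    return 0;
}

/* Pop one 12-bit word. Returns -1 if the stack is empty. */
int pop_word(unsigned *value)
{
    if (stack_ptr == STACK_BASE)
        return -1;
    *value = field0[--stack_ptr];
    return 0;
}

/* n! modulo 4096. Each pending "call" saves its argument on the software
   stack; the "returns" then unwind it. Returns -1 and restores the stack
   if the page overflows. */
int factorial12(unsigned n, unsigned *result)
{
    unsigned mark = stack_ptr;
    unsigned acc = 1;
    unsigned arg;

    while (n > 1) {                     /* "call" phase */
        if (push_word(n) != 0) {
            stack_ptr = mark;
            return -1;
        }
        n--;
    }
    while (stack_ptr > mark) {          /* "return" phase */
        pop_word(&arg);
        acc = (acc * arg) & WORD_MASK;
    }
    *result = acc;
    return 0;
}
```

A single 128-word page limits the depth to 128 saved words, so a real implementation would have to decide what happens on overflow; here the call simply fails.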

he may not, himself, fully enough understand where the limits lie.

The bigger problem, to my mind, is that you can't reliably construct working programs purely by following the restrictions in the current documentation.

That problem has two forks:

  1. The current docs don't fully capture the current CC8 limitations. That's partly why I filed all those Low priority bugs against CC8.

  2. Many of those limitations are weaknesses in the implementation, not intentional design limits. While doing the initial CC8 integration work, I must have sent Ian half a dozen programs that did not compile, and he'd rearrange each one in some way to make it work. I don't want to have to understand CC8 to his level before I can write working programs. I'm a pragmatic programmer: I want someone to give me a set of hard and fast rules for the language. Then I can play within those rules.

(4) By poetnerd on 2018-12-28 05:38:22 [link] in reply to 1

Status report:

I've got a first pass completed with the new SmallC-85 port targeting the PDP-8.

src/cc8/os8/{libc.c,n8.c,p8.c,c8.c} all seem to compile without complaint, and have been validated against builds on the Pi.

Differences are microscopic. (See context diffs below.) I actually think that where mine differ, I'm correct.

Today n8.c and p8.c would not compile on my Pi. The complaint was about many illegal definitions. I think there's some include file it's not finding. I'll chase that down soon.

Here are context diffs:

wdc-home2:scratch wdc$ diff -c ../os8/
***	2018-12-28 00:09:26.000000000 -0500
--- ../os8/	2018-12-27 23:55:12.000000000 -0500
*** 1,5 ****
  /	SMALL C PDP8 CODER (1.0:27/1/99)
! /	FRONT END (1.0:27/1/99)
--- 1,7 ----
  /	SMALL C PDP8 CODER (1.0:27/1/99)
! /FRONT END (2.7,84/11/28)
! /FRONT END FOR ASXXXX (2.8,13/01/20)
*** 4018,4028 ****
! GBLS,	128

  MCC0,	0
--- 4020,4030 ----
! GBLS,	240

  MCC0,	0
wdc-home2:scratch wdc$ diff -c ../os8/
***	2018-12-28 00:02:40.000000000 -0500
--- ../os8/	2018-12-28 00:11:25.000000000 -0500
*** 1,5 ****
  /	SMALL C PDP8 CODER (1.0:27/1/99)
! /	FRONT END (1.0:27/1/99)
--- 1,7 ----
  /	SMALL C PDP8 CODER (1.0:27/1/99)
! /FRONT END (2.7,84/11/28)
! /FRONT END FOR ASXXXX (2.8,13/01/20)
*** 400,410 ****
  CC0,	114;0;67;67;46;67;0;119
! GBLS,	128

  MCC0,	0
--- 402,412 ----
  CC0,	114;0;67;67;46;67;0;119
! GBLS,	132

  MCC0,	0

(5) By poetnerd on 2018-12-30 03:48:17 [link] in reply to 4

New status report:

I've got SmallC-85 building with no warnings with the default clang C compiler settings. That codeline is on github at:

I put it there because I wanted to be able to push changes upstream to the active developer. Note that that codeline is NOT what I'm currently building src/cc8/os8 with. That is because that codeline is going to be my CLEAN port of what is working now.

The "dirty" port in the cc8-64-bit branch now builds identical .sb files except for two comments:

  • The front end version string
  • The GLOBAL POOL count.

I'm discussing with Ian what the most useful form the latter should take. It's an indicator of how much of the symbol table is consumed by cross when performing a compile. Currently I output a count of globals consumed, whereas he output the number of bytes consumed in the global symbol table. (Symbol table entries are a larger data structure in SmallC-85, but that does not affect the code output on the targeted platform.)

My next step is to make a new branch off trunk (which has the latest of everything, including the src/cc8/os8 sources), pull down the clean SmallC-85 and redo my port onto the cleaned up SmallC-85 base.

(7) By poetnerd on 2018-12-30 06:15:16 [link] in reply to 5

I've just checked in the clean, re-targeted SmallC-85.

But now that I go to look for branch smc85-cc8, I can't find it.

Apparently I have checked it into the trunk. This annoys me since it is NOT what I intended.

I was originally going to call it cc8-85-plus, renamed the checkout directory a couple times and then created the branch. Here are excerpts from my shell history output:

937  mkdir cc8-85-plus
938  cd cc8-85-plus/
939  ls
940  fossil open ~/museum/pidp8i.fossil 
941  ls
942  cd ..
943  mv cc8-85-plus SmC85-cc8-plus
944  cd SmC85-cc8-plus/
945  cd ..
946  mv SmC85-cc8-plus smc85-cc8-plus
947  mv smc85-cc8
948  mv smc85-cc8-plus smc85-cc8
949  cd smc85-cc8/
950  fossil ci --branch smc85-cc8
951  fossil sync
952  ls
953  ./configure prefix=/Users/wdc/PDP-8/PiDP-8/runtime
954  tools/mmake
955  cd src/cc8

Then I redid my port to the cleaner SmallC-85 codeline and tested it. Then went back to fossil:

1049  cp -p old-cross/ctype.h cross
1050  fossil status
1051  ls cross
1052  cd cross
1053  fossil add LICENSE
1054  fossil add README
1055  fossil add extern.h
1056  fossil add initials.c
1057  fossil add struct.c
1058  fossil status
1059  fossil commit
1060  fossil undo
1061  history | grep fossil
1062  history | less

When I saw I was on the trunk I tried an undo but it wouldn't un-commit the changes.

Warren, if you want to push this commit into a branch, you're welcome to. I'm sorry that I messed up the trunk.

Why was this not happening in branch smc85-cc8? "Creating Branches" in says:

That is to say, you make your changes as you normally would; then when you go to check them in, you give the --branch option to the ci/checkin command to put the changes on a new branch, rather than add them to the same branch the changes were made against.

Was my mistake that I needed to say --branch in the commit command, and that the ci command was ignored because there was nothing changed?

(8) By tangent on 2018-12-30 11:04:45 [link] in reply to 7

Apparently I have checked it into the trunk.

Yes, which means fixing it was difficult because a simple merge would have backed out everything that's happened on trunk since your initial fork of smallc-85 from trunk.

I've tried to repair it as best I can, which may be "poorly" since I have no idea what's going on in that directory. All I know is that it compiles and doesn't revert anything I want kept. You'll have to check it for sanity beyond that.

Note in particular that extern.h is back, due to a #include of it from one of the diffs in the accidental trunk checkin. If it's supposed to go away for good, some manual rework will be needed.

While doing this, I found that you had two "smallc-85" branches, one merely claiming to start the branch, but with no real content, and the other having the actual work.

The useless branch was created on line 950 in your history transcript: note that no changes were made to files tracked by Fossil in prior commands.

I wonder if you're coming from recent experience with another VCS that requires that you create a branch before doing any work on it. With Fossil, you generally want to complete the first checkin's worth of work on a new branch, then check it in with

 $ fossil ci --branch new-branch-name

to create the first checkin on the new branch.

This order of operations makes more sense to me, since I often don't know I need a new branch until I'm about ready to check something in, and realize it can't go on the tip of its parent branch. Creating a new branch lets me save the new content durably without messing up the parent branch. It is the File → Save As of the VCS world. :)

As soon as you create two branches with the same name, Fossil begins warning you about it on every sync operation. Since the useful version of your smallc-85 branch has several checkins on it, Fossil warned you several times. Fossil isn't prone to issuing warnings for poor reasons. When it does, it's trying to help.

When I saw I was on the trunk

Lesson 1: Run fossil status before doing a checkin after any stretch of work complex enough to cause you to lose situational awareness. It's just like checking your mirrors occasionally when driving.

Lesson 2: I can see from the length of your checkin messages that you are composing them in a text editor — as opposed to fossil ci -m — which means Fossil tells you up front which branch it's going to put the checkin on in the "tags" line within the commented-out section of the checkin message buffer. A branch is just a special type of tag.

Lesson 3: Pay attention to the output of Fossil. Not just warnings, but also the post-sync message that tells you what it did. You didn't have to guess that the checkin went on trunk: it told you. You can then fix it like I did:

fossil up ab1ea                         # "stub" branch
fossil merge --integrate 50ae87         # merge real-work branch into stub
mmake                                   # test merge; mmake is in my PATH
bin/cc8                                 # does it run?
fossil diff | less                      # try to understand changes; fail
fossil ci                               # double-tipped branch fix
fossil merge --cherrypick 8ef9b5fae0    # lots of merge conflicts; no good
fossil revert                           # start over
fossil up 8ef9b5fae0                    # use accidental checkin instead
cp -r src/cc8/ ~/tmp                    # save modified cc8 cross code
fossil up smallc-85                     # get to branch 8ef9b should be on
cp -r ~/tmp/cross src/cc8               # crudely apply 8ef9b to branch
fossil diff | less                      # try to understand; fail again
fossil add src/cc8/cross/extern.h       # diff showed this was needed
mmake                                   # does it build?
bin/cc8                                 # does it run?
fossil ci                               # okay, hope it's sane

When I saw I was on the trunk I tried an undo but it wouldn't un-commit the changes.

Fossil's undo only affects the working checkout. Once a checkin is written to the Fossil block-chain, it's permanent, on purpose. The only sensible option at that point is the rework I show above.

Was my mistake that I needed to say --branch in the commit command

No, it's that you somehow got yourself onto trunk, then did work that should have been on a branch.

This is why I show, in the HACKERS file, a directory structure where every working branch is in a separate checkout tree. Wherever possible, you want to change working branches with "cd", not with "fossil up". Switching branches within a working directory opens you to problems like this.

In my transcript above, you see a lot of switching branches in place, but that's an exception case, done briefly only to clean up the mess. As soon as it was cleaned up, I switched that checkout directory back to its original branch.

As to your literal question, you only give --branch in a fossil ci command on the first checkin creating that branch. You don't need to give it on each subsequent checkin. That's how a branch differs from any other tag in Fossil: it's self-propagating. Once you're on a given branch, every subsequent checkin goes on that same branch until you give --branch again to create another fork in the development history, or you use a command like fossil up other-branch-name to switch branches.

(9) By poetnerd on 2018-12-30 15:38:32 [link] in reply to 8

From all your tutelage, my misunderstanding seems to boil down to, "coming from recent experience with another VCS that requires that you create a branch before doing any work on it." As you know I'm not very experienced with branching, and while learning branching in git to interface with github for the SmallC-85 baseline, I did indeed think that both git and fossil needed the branch creation first.

The deltas look right to me:

  • Refreshed files from old .../src/cc8/cross
  • New files initials.c, struct.c, LICENSE, README
  • Re-purposing of extern.h from cc8-64-bit exploration to cover all external uses.

My intent with that branch was simply:

  • With latest trunk as baseline.
  • Replace src/cc8/cross with new content that is SmallC-85 from github, with a re-importation of Ian's targeting for the PDP-8.
  • Include the LICENSE file that was adopted upstream.
  • Include the README file from upstream.
  • Create extern.h analogous to the exploration in cc8-64-bit but with more stuff in it to cover all external references to procedures across modules.

When I do:

fossil up smallc-85

a whole bunch of stuff changes. I presume this is because the smallc-85 branch has as its baseline the trunk from a year ago.

Now that I understand better, I'm going to create the branch smc85-cc8 with trunk as baseline and move forward from there.


What are the lines marked EXECUTABLE in the fossil status? I didn't touch those files myself, as far as I know, and they weren't reported before my branch was shunted to BOGUS.

wdc-home2:smc85-cc8 wdc$ fossil status
repository:   /Users/wdc/museum/pidp8i.fossil
local-root:   /Users/wdc/src/pidp8i/smc85-cc8/
config-db:    /Users/wdc/.fossil
checkout:     8ef9b5fae0d50f0c614b6b3d45444cb61cf70eab 2018-12-30 05:59:38 UTC
parent:       85596d132e12b212593d708162f9d60d23154e3a 2018-12-28 12:57:40 UTC
tags:         BOGUS
comment:      New code base for cc8/cross: SmallC-85. SmallC-85 is a revived and updated port of
          SmallC that is much cleaner. This version compiles on the Mac using clang silently
          without warnings, and runs well enough to generate .sb files that are functionally
          the same as what was generated on the non-portable cross. (Accidentally checked in
          on trunk, moved to BOGUS for later cherry-pick to the smallc-85 branch.) (user:
EXECUTABLE autosetup/autosetup-config.guess
EXECUTABLE autosetup/autosetup-config.sub
EXECUTABLE autosetup/autosetup-test-tclsh
EXECUTABLE autosetup/jimsh0.c
EXECUTABLE autosetup/migrate-autoconf
EXECUTABLE autosetup/sys-find-tclsh

(10) By tangent on 2018-12-30 16:45:33 [link] in reply to 9

I'm going to create the branch smc85-cc8 with trunk as baseline and move forward from there.

That shouldn't be necessary. Just merge the latest trunk into that branch:

$ fossil up smallc-85
$ fossil merge trunk

Build, test, and check it in if it's all okay. Then the diff from the tip of trunk to the tip of smallc-85 will contain just the actual differences you've been working on, plus whatever remains of my cc8-64-bit branch's work, if anything.

The only time it becomes different is if any of the files involved have had changes to the same area of the file in both branches since the fork. Then you need to do a manual merge of those areas. Since CC8 hasn't changed in quite a while, that's not too likely, and if it did happen, you probably just want to overwrite the old cc8 with the new SmallC base.

This is a common thing to do on long-lived branches that are intended to eventually merge back into their parent branch.

What are the lines marked EXECUTABLE in the fossil status?

It means the file has the POSIX executable bit set in the checkout, and it didn't have it in the prior checkin on that branch. It's mainly done as a warning for Windows users, since for historical reasons, Windows maps its "archive" file bit to the POSIX executable bit where the two systems are in use at the same time. e.g. Windows client talking to Linux box over Samba; or Cygwin, etc. Thus it's common on Windows for files to get +x bits set for stupid reasons; Fossil is cluing you into the problem so you don't cause others' checkouts to have unwanted +x bits on files.

Of those files, only autosetup-find-tclsh is executable on trunk. Since you're not on Windows, I guess you've got some other interfering process setting +x bits on files. Maybe you're swapping tarballs with Ian Schofield, who does use Windows? Regardless of the reason, say "fossil revert file-name" to reset the file's mode properly. (I'm assuming there is no actual file content difference in those files.)

You might think that autosetup/autosetup needs to be +x, but it's not run directly. It's run indirectly via the top-level configure script using tclsh or jimsh0, so the shebang line at the top is never actually important.

(13) By poetnerd on 2018-12-30 18:14:14 [link] in reply to 10

Thanks for the additional tutelage. The concept of long lived branches makes sense.

Hopefully, the next time I need such a merge, I’ll have the confidence to do it with fossil.

(11) By poetnerd on 2018-12-30 16:53:19 [link] in reply to 9

It was simplest to just re-create my branch from baseline.

We now have the smc85-cc8 I intended to create yesterday.

The smallc-85 branch is based on too old a trunk. How do I kill it?

(12) By tangent on 2018-12-30 18:09:35 [link] in reply to 11

Click on the Timeline link above, click the checkin ID of the tip of the branch you want closed, click Edit, and mark the branch Closed.

All that does is hide it from the Branches list, but it's good hygiene to keep obsolete branches off that page.

(14) By poetnerd on 2018-12-30 18:22:57 [link] in reply to 12

I think I followed your instructions, but I still see the branch on the timeline. I also saw a separate “Branch Hiding” bit. Should I set that?

(15) By tangent on 2018-12-30 23:53:43 [link] in reply to 14

It's fine. I should have said that it's marked "closed" on the Branches page so that when you sort by the Status column, all of the active branches are grouped together.

Hiding the branch only affects the timeline. I usually don't bother hiding things from the timeline, since it'll fall off the first screen quickly enough with subsequent activity.