PiDP-8/I SoftwareForum
Not logged in

Pi B+ build of os8-v3f-extensions fails

(1) By tangent on 2018-12-03 11:30:19 [link]

I'm seeing a repeatable — 3 times now — test failure on my Pi B+ when trying to build the software in-tree:

bin/os8-run ./media/os8/scripts/all-tu56.os8 --enable td12k
Running script file: ./media/os8/scripts/all-tu56.os8
Building OS/8 v3d tu56 image
for td12k configuration of TD8E.
before:
after: .
before: Y
after: SYS BUILT
Traceback (most recent call last):
  File "bin/os8-run", line 222, in <module>
    main()
  File "bin/os8-run", line 206, in main
    os8.run_script_file (script_file)
  File "/home/pidp8i/pidp8i/lib/os8script.py", line 917, in run_script_file
    retval = commands[m.group(1)](rest, script_file)
  File "/home/pidp8i/pidp8i/lib/os8script.py", line 1366, in os8_command
    self.simh.os8_send_cmd ("\\.", os8_comm)
  File "/home/pidp8i/pidp8i/lib/simh.py", line 303, in os8_send_cmd
    self._child.expect ("\n%s$" % prompt)
  File "/usr/local/lib/python2.7/dist-packages/pexpect/spawnbase.py", line 327, in expect
    timeout, searchwindowsize, async_)
  File "/usr/local/lib/python2.7/dist-packages/pexpect/spawnbase.py", line 355, in expect_list
    return exp.expect_loop(timeout)
  File "/usr/local/lib/python2.7/dist-packages/pexpect/expect.py", line 104, in expect_loop
    return self.timeout(e)
  File "/usr/local/lib/python2.7/dist-packages/pexpect/expect.py", line 68, in timeout
    raise TIMEOUT(msg)
pexpect.exceptions.TIMEOUT: Timeout exceeded.
<pexpect.pty_spawn.spawn object at 0xb69007b0>
command: /home/pidp8i/pidp8i/bin/pidp8i-sim
args: ['/home/pidp8i/pidp8i/bin/pidp8i-sim']
buffer (last 100 chars): 'R CCL\r\n'
before (last 100 chars): 'R CCL\r\n'
after: <class 'pexpect.exceptions.TIMEOUT'>
match: None
match_index: None
exitstatus: None
flag_eof: False
pid: 22766
child_fd: 5
closed: False
timeout: 30
delimiter: <class 'pexpect.exceptions.EOF'>
logfile: <open file '/home/pidp8i/pidp8i/obj/os8-run.log', mode 'w' at 0xb64fd230>
logfile_read: None
logfile_send: None
maxread: 2000
ignorecase: False
searchwindowsize: None
delaybeforesend: None
delayafterclose: 0.1
delayafterterminate: 0.1
searcher: searcher_re:
    0: re.compile("
\.$")   
Makefile:611: recipe for target 'bin/v3d-td12k.tu56' failed
make: *** [bin/v3d-td12k.tu56] Error 1

These test failures have various attempts at voodoo debugging interspersed. (make clean, make reconfig, etc.)

Here's the tail end of obj/os8-run.log:

.RU RKA0:FOTP.SV^MRU RKA0:FOTP.SV^M
*DTA0:<RKA0:TDCOPY.SV^MDTA0:<RKA0:TDCOPY.SV^M
*DTA0:<RKA0:TDFRMT.SV^M^[DTA0:<RKA0:TDFRMT.SV^M
*$^M
.^E^M
Simulation stopped, PC: 01207 (KSF)^M
sim> show td0
show td0^M
TD0 190KW, attached to /home/pidp8i/pidp8i/bin/v3d-td12k.tu56, write enabled^M
    12b^M
sim> boot td0
boot td0^M
^M
.R CCL^MR CCL^M

The last appears to be line 281 from all-tu56.os8. It appears to be expecting a prompt after running CCL which never arrives.

While the B+ is a single-core ARM, causing mmake to only run a single build step at a time, I was able to avoid the error by re-running with just plain make. I don't know if that's a fluke or if there is some essential difference beween make and make -j1 on Raspbian Stretch, so I'm going to re-do the test under plain make again after removing the V3F tapes. That'll take another half hour or so, so I'm choosing to post this now rather than wait for a fifth run.

(2) By poetnerd on 2018-12-03 18:22:52 [link] in reply to 1

I think this is another manifestation of the ^C in BUILD race condition that I thought I fixed, but was only active when debugging was enabled. Update now with the expectation that it won't happen again.

(3) By tangent on 2018-12-03 19:54:54 [link] in reply to 2

Still happening:

bin/os8-run ./media/os8/scripts/v3f-control.os8
WARNING: Failed to lock GPIO for exclusive use.  Won't use front panel.
Running script file: ./media/os8/scripts/v3f-control.os8
copy command:
    from: /home/pidp8i/pidp8i/obj/v3f-build.rk05,
    to: /home/pidp8i/pidp8i/bin/v3f-made.rk05
Traceback (most recent call last):
  File "bin/os8-run", line 222, in <module>
    main()
  File "bin/os8-run", line 215, in main
    s.send_cmd ("detach all")
  File "/home/pidp8i/pidp8i/lib/simh.py", line 676, in send_cmd
    self._child.expect ("\n\\.$")
  File "/usr/local/lib/python2.7/dist-packages/pexpect/spawnbase.py", line 327, in expect
    timeout, searchwindowsize, async_)
  File "/usr/local/lib/python2.7/dist-packages/pexpect/spawnbase.py", line 355, in expect_list
    return exp.expect_loop(timeout)
  File "/usr/local/lib/python2.7/dist-packages/pexpect/expect.py", line 104, in expect_loop
    return self.timeout(e)
  File "/usr/local/lib/python2.7/dist-packages/pexpect/expect.py", line 68, in timeout
    raise TIMEOUT(msg)
pexpect.exceptions.TIMEOUT: Timeout exceeded.
<pexpect.pty_spawn.spawn object at 0xb69380f0>
command: /home/pidp8i/pidp8i/bin/pidp8i-sim
args: ['/home/pidp8i/pidp8i/bin/pidp8i-sim']
buffer (last 100 chars): 'SUBMIT CCL.BI\r\n$JOB ASSEMBLE AND LINK CCL.MA\r\n\r\n$JOB ASSEMBLE AND LINK RESORC.MA\r\n\r\n'
before (last 100 chars): 'SUBMIT CCL.BI\r\n$JOB ASSEMBLE AND LINK CCL.MA\r\n\r\n$JOB ASSEMBLE AND LINK RESORC.MA\r\n\r\n'
after: <class 'pexpect.exceptions.TIMEOUT'>
match: None
match_index: None
exitstatus: None
flag_eof: False
pid: 23698
child_fd: 5
closed: False
timeout: 30
delimiter: <class 'pexpect.exceptions.EOF'>
logfile: <open file '/home/pidp8i/pidp8i/obj/os8-run.log', mode 'w' at 0xb653d910>
logfile_read: None
logfile_send: None
maxread: 2000
ignorecase: False
searchwindowsize: None
delaybeforesend: None
delayafterclose: 0.1
delayafterterminate: 0.1
searcher: searcher_re:
    0: re.compile("
\.$")   
Makefile:595: recipe for target 'bin/v3f-made.rk05' failed
make: *** [bin/v3f-made.rk05] Error 1

And the tail end of os8-run.log:

.COPY DSK:<RKA1:CCL.BI^MCOPY DSK:<RKA1:CCL.BI^M
FILES COPIED:^M
CCL.BI^M
^M
.SUBMIT CCL.BI^MSUBMIT CCL.BI^M
$JOB ASSEMBLE AND LINK CCL.MA^M
^M
$JOB ASSEMBLE AND LINK RESORC.MA^M

Also, I've noticed that when a pexpect timeout occurs, os8-run doesn't clean up after itself, so a subsequent build thinks the half-built output media is finished. Apparently you need some kind of exception, signal, or on-exit handler.

I didn't notice this until recently, so when I wrote earlier that it's happening repeatedly, it was happening on subsequent media builds, not always on the same medium.

Also, I've seen it happen now with both make and make -j1.

(4) By poetnerd on 2018-12-03 20:58:02 [link] in reply to 3

The build process is supposed to be:

  1. Create an empty image that will contain the buildable source: obj/v3f-build.rk05.
  2. Copy in the sources from the POSIX tree using os8-cp -v --action-file actions.txt
  3. Run os8-run v3f-control.os8 to make a copy of the buildable source image into bin/os8-v3f-made.rk05as the image to use to build and then
  4. Perform build actions.

The all-tu56.os8 script that creates the 4 tu56 images should just use bin/os8-v3f-made.rk05 in a read-only mode to copy built bits. Makefile.in should have dependencies that guarantees it's built and ready before any of the 4 runs happen. Those 4 runs should be able to happen in parallel because we boot a scratch image, and use bin/os8-v3f-made.rk05 in a read-only mode. (I see that all-tu56.os8 doesn't mount it read-only but it sure can and probably should.)

v3f-build.rk05 is supposed to be read-only, so if it exists, step 2 to the end should just work.

If it doesn't, hand-removing obj/v3f-build.rk05 and bin/os8-v3f-made.rk05 should put things in a clean ground state.

Would you please send me the output of:

bin/os8-run -v -d ./media/os8/scripts/v3f-control.os8

on the platform that's failing the build.

(5) By tangent on 2018-12-03 22:41:01 [link] in reply to 4

Here it is:

script_files: ['./media/os8/scripts/v3f-control.os8']
options_enabled[]
options_disabled[]
os8-run -- Language Version: 1.0
Running script file: ./media/os8/scripts/v3f-control.os8
New line_ct_stack: [0]
Temp file name is: /home/pidp8i/pidp8i/bin/v3d-temp-Vk8PFk.rk05
Line 7: mount: att rk0 /home/pidp8i/pidp8i/bin/v3d-temp-Vk8PFk.rk05
copy command: 
    from: /home/pidp8i/pidp8i/obj/v3f-build.rk05,
    to: /home/pidp8i/pidp8i/bin/v3f-made.rk05
Pre-existing /home/pidp8i/pidp8i/bin/v3f-made.rk05 found.  Saving as /home/pidp8i/pidp8i/bin/v3f
Overwriting old /home/pidp8i/pidp8i/bin/v3f-made.rk05.save
Line 9: mount: att rk1 /home/pidp8i/pidp8i/bin/v3f-made.rk05
Line 11: boot rk0
Assembling RKA1:BUILD.PA
Line: 15: RU SYS:ABSLDR.SV
Line: 16: * RKB1:BUILD.BN
Line 17: end cdprog SYS:ABSLDR.SV
Line: 19: os8_command: SAVE RKB1:BUILD.SV
os8_send_cmd: expecting: \.
    Last match before: {RU SYS:ABSLDR.SV^M}
    Last match after: {
*}
Assembling RKA1:OS8.PA
Assembling RKA1:CD.PA
Assembling RKA1:RL0.PA
Assembling RKA1:RL1.PA
Assembling RKA1:RL2.PA
Assembling RKA1:RL3.PA
Assembling RKA1:RLC.PA
Assembling RKA1:RXSY1.PA
Assembling RKA1:RXSY1.PA
Assembling RKA1:RXNS.PA
Assembling RKA1:BATCH.PA
Line: 33: RU SYS:ABSLDR.SV
Line: 34: * RKB1:BATCH.BN
Line 35: end cdprog SYS:ABSLDR.SV
Line: 36: os8_command: SAVE RKB1:BATCH.SV
os8_send_cmd: expecting: \.
    Last match before: {RU SYS:ABSLDR.SV^M}
    Last match after: {
*}
Line: 38: os8_command: ASSIGN RKA1 IN
os8_send_cmd: expecting: \.
    Last match before: {RKB1:BATCH.BN^M
*$^M}
    Last match after: {
.}
Line: 39: os8_command: ASSIGN RKB1 OUT
os8_send_cmd: expecting: \.
    Last match before: {SAVE RKB1:BATCH.SV^M
^M}
    Last match after: {
.}
Line: 40: os8_command: COPY DSK:<RKA1:RESORC.BI
os8_send_cmd: expecting: \.
    Last match before: {ASSIGN RKA1 IN^M}
    Last match after: {
.}
Line: 41: os8_command: COPY DSK:<RKA1:CCL.BI
os8_send_cmd: expecting: \.
    Last match before: {ASSIGN RKB1 OUT^M}
    Last match after: {
.}
Line: 42: os8_command: SUBMIT CCL.BI
os8_send_cmd: expecting: \.
    Last match before: {COPY DSK:<RKA1:RESORC.BI^M
FILES COPIED:^M
RESORC.BI^M
^M}
    Last match after: {
.}
popped line_ct_stack: []
Deleting scratch_copy: /home/pidp8i/pidp8i/bin/v3d-temp-Vk8PFk.rk05
Traceback (most recent call last):
  File "bin/os8-run", line 222, in <module>
    main()
  File "bin/os8-run", line 215, in main
    s.send_cmd ("detach all")
  File "/home/pidp8i/pidp8i/lib/simh.py", line 676, in send_cmd
    self._child.expect ("\n\\.$")
  File "/usr/local/lib/python2.7/dist-packages/pexpect/spawnbase.py", line 327, in expect
    timeout, searchwindowsize, async_)
  File "/usr/local/lib/python2.7/dist-packages/pexpect/spawnbase.py", line 355, in expect_list
    return exp.expect_loop(timeout)
  File "/usr/local/lib/python2.7/dist-packages/pexpect/expect.py", line 104, in expect_loop
    return self.timeout(e)
  File "/usr/local/lib/python2.7/dist-packages/pexpect/expect.py", line 68, in timeout
    raise TIMEOUT(msg)
pexpect.exceptions.TIMEOUT: Timeout exceeded.
<pexpect.pty_spawn.spawn object at 0xb69bd6b0>
command: /home/pidp8i/pidp8i/bin/pidp8i-sim
args: ['/home/pidp8i/pidp8i/bin/pidp8i-sim']
buffer (last 100 chars): 'SUBMIT CCL.BI\r\n$JOB ASSEMBLE AND LINK CCL.MA\r\n\r\n'
before (last 100 chars): 'SUBMIT CCL.BI\r\n$JOB ASSEMBLE AND LINK CCL.MA\r\n\r\n'
after: <class 'pexpect.exceptions.TIMEOUT'>
match: None
match_index: None
exitstatus: None
flag_eof: False
pid: 25190
child_fd: 5
closed: False
timeout: 30
delimiter: <class 'pexpect.exceptions.EOF'>
logfile: <open file '/home/pidp8i/pidp8i/obj/os8-run.log', mode 'w' at 0xb65b8230>
logfile_read: None
logfile_send: None
maxread: 2000
ignorecase: False
searchwindowsize: None
delaybeforesend: None
delayafterclose: 0.1
delayafterterminate: 0.1
searcher: searcher_re:
    0: re.compile("
\.$")

As far as I can tell, it doesn't say anything the above doesn't: that it's timing out waiting for the OS/8 dot prompt while running SUBMIT CCL.BI, which is the last line of the script you wanted me to run.

How long is that command supposed to take? Is it possible that the default pexpect timeout is just too short when running that command on a 700 MHz single-core ARMv6 processor? When running the SIMH PDP-8 simulator un-throttled, a Pi B+ runs about ¼ the speed as a Pi 3 in the same condition.

I tried to find a way to increase the timeout, but it looks like there is no way to do that per-command in the current os8-run scheme.

The above was done with my latest checkin which passes the os8-run -d flag down into lib/simh when handling os8 script commands. Was that not the intent?

(6) By poetnerd on 2018-12-04 05:15:26 [link] in reply to 5

Re: passing debug to simh.py:

I'd just never gotten around to hooking it up. Your amendment is most welcome.

Re: How long to run the assembler batch job:

I never bothered to time it because I was always running on hideously fast platforms. It could take a few minutes on a slower platform. It's performing a pretty ambitious set of runs of the MACREL assembler and linker. (17 source .MA files which then go to the linker. Yup. I could take a while.

Re: timeout pass-in:

I would agree that the current os8-run setup does not have a way to pass the timeout through. According to the pexpect doc child.expect can take a timeout argument.

Is there a value somewhere that could be read that exposes the slower speed of your Pi B+? Perhaps we could finesse the timeout issue by scaling the global timeout by the CPU speed?

Otherwise we should come up with a syntax for temporary or permanent amendment to the timeout values.

(7) By tangent on 2018-12-04 09:30:05 [link] in reply to 6

The closest thing would be the teco-pi-demo -b benchmark mode, but that depends on having bootable OS/8 RK05 media!

Next best would be to parse the configuration string emitted by the GPIO thread on startup: [cake] == "runs really fast," [pi3b] == "runs pretty fast", [pi1b] == "slow as molasses." The main problem with this is that it's brittle: it'll have to be updated every time we update the code that generates it, and that code lags reality. (It's actually outdated at the moment: it doesn't know about the Pi 3B+ yet.)

See pi_type() in src/pidp8i/gpio-common.c.

(8) By poetnerd on 2018-12-05 04:01:01 [link] in reply to 7

Ok. How about a completely different approach:

New os8-run statement:

pragma whale

The statement following the pragma gets a bigger time out because it is a "whale sized" action that might time out in normal operation.

What do you think?

(9) By tangent on 2018-12-05 14:50:02 [link] in reply to 8

That amounts to hard-coding a timer using obscure language. I'd rather just make it explicit, like timeout 100 to change the default timeout to 100 seconds.

Another option would be something like complexity 5 to declare that the following statement is about 5 sigma out from the median time for commands. We can add a mode to os8-run that logs time taken for each command. When sorted and graphed, I think you'll get a power law, with most of the commands taking very little time, but a long tail of commands that take arbitrarily long to run. You'd set the default timeout to cover maybe 99% of the commands, then override it for the <1% of exceptions.

You could couple that with timeout scaling, set by adding a build step between building the simulator and building the first generated medium that benchmarks a simple machine code program that takes a large fraction of a second to run on your development machine and several seconds on a Pi B+. So, "complexity 1" might mean 50 ms on your development box but a full second on a slow Pi.

(10) By poetnerd on 2018-12-05 18:48:16 [link] in reply to 9

I'd agree that timeout 100 or complexity <n> would be a more understandable syntax.

I think threading a time-taken output mode to os8-run is a non-trivial undertaking. So too would be a pre-build benchmark test. I wonder if such efforts are warranted.

I would prefer not to introduce major disruptions to os8-run or the build process to get to the other side of a problem of the form, "one of the commands in one of the scripts takes longer than the default hard-coded timeout on a slow platform."

Does v3f-control.os8 run to completion if you double the timeout value globally?

(11) By tangent on 2018-12-06 04:47:12 [link] in reply to 10

I've checked in a minimal change that fixes this problem.

I chose 30 seconds because 15 seconds is too short. 22 seconds also worked, but I'd like a bit of wiggle room.

I've validated dozens of tests under this on my B+. I should have everything re-tested by morning.

(12) By tangent on 2018-12-06 18:03:34 [link] in reply to 11

Thirteen hours and change later, the B+ has gotten though all 200 tests.

That's the final threshold: you can merge this branch down to trunk whenever you're ready.

(13) By poetnerd on 2018-12-06 21:08:20 [link] in reply to 12

Whoo hoo!

(14) By poetnerd on 2018-12-08 15:27:29 in reply to 11

By the way, if we wanted to "obtain relative CPU speed" from the python execution environment, either once at build time, or in real time, there is a library to do that, py-cpuinfo:

https://github.com/workhorsy/py-cpuinfo