HP 3000 Simulator Debugging Completed ===================================== Listed below are problems in the HP 3000 simulator that have been debugged and fixed. The writeups may be useful when debugging other sections of the simulator. ---------------------- CST Expansion Firmware ---------------------- Cold loading MPE V/E with CSTX firmware enabled succeeds through *WELCOME* and then aborts with: Escape Code = %000010 %000000 17:00/13/SP#6/SPOOLED OUT **** PASCAL EXECUTION ERROR (PASCERR *) **** SYSTEM FAILURE #310 STATUS %140057 DELTA-P %053760 Programmed halt, CIR: 030377 (HALT 17), P: 131275 (PCAL 1) sim> Tracing instructions after WELCOME with and without firmware enabled yields this difference in segment 27: [new, line 40706]: >>CPU instr: 01.166640 041606 LOAD Q-6 >>CPU data: 04.125672 103003 stack read >>CPU reg: 04.125730 000001 A 103003, X 000002, M I t r o c CCL >>CPU instr: 01.166641 031000 PCAL 0 >>CPU data: 04.125731 000002 stack write >>CPU data: 04.125732 041012 stack write >>CPU data: 04.125733 140427 stack write >>CPU data: 04.125734 000034 stack write >>CPU data: 00.000000 012140 absolute read >>CPU data: 00.012140 000677 absolute read >>CPU opnd: 00.000000 000677 physical label 103003 resolves to CST entry 0003 >>CPU data: 00.012154 141067 absolute read >>CPU data: 00.012154 161067 absolute write >>CPU data: 00.012156 000400 absolute read >>CPU data: 00.012157 030336 absolute read >>CPU data: 04.125735 000002 stack write >>CPU data: 04.125736 076304 stack write >>CPU data: 04.125737 140403 stack write >>CPU data: 04.125740 000004 stack write >>CPU reg: 04.125740 000000 X 000002, M I t r o c CCL >>CPU reg: 00.000000 000003 PB 030336, PL 004333, DL 124500, DB 175430, Q 125740, Z 133350 P [old, line 39413]: >>CPU instr: 01.166640 041606 LOAD Q-6 >>CPU data: 04.106672 103303 stack read >>CPU reg: 04.106730 000001 A 103303, X 000002, M I t r o c CCL >>CPU instr: 01.166641 031000 PCAL 0 >>CPU data: 04.106731 000002 stack write >>CPU data: 04.106732 001012 stack write >>CPU data: 04.106733 140427 stack write >>CPU data: 04.106734 000034 stack write >>CPU data: 00.000001 013674 absolute read >>CPU data: 00.013674 000003 absolute read >>CPU opnd: 00.000001 000003 label 103303 resolves to CSTX entry 0003 >>CPU data: 00.013710 062002 absolute read >>CPU data: 00.013710 062002 absolute write >>CPU data: 00.013712 000005 absolute read >>CPU data: 00.013713 077430 absolute read >>CPU data: 05.107437 040133 program read >>CPU data: 05.107431 041311 program read >>CPU opnd: 00.107437 040133 STT 006 is 041311 locals 100 total 133 >>CPU reg: 04.106734 000000 X 000002, M I t r o c CCL >>CPU reg: 03.000000 000303 PB 077430, PL 107437, DL 105500, DB 174630, Q 106734, Z 114350 P label is coming from Q-7 before trace starts. Expanding trace to start at LOG FILE produces: [new, 1,620,546]: >>CPU instr: 03.174245 033450 LLBL 50 >>CPU data: 03.177617 006502 program read >>CPU data: 03.177547 003003 program read >>CPU opnd: 00.177617 006502 STT 050 is 003003 locals 015 total 102 >>CPU opnd: 00.000000 000000 STT 050 resolves to logical label 003003 >>CPU reg: 04.125655 000001 A 103003, X 000001, M I t r o c CCG Note that RA now contains a physical mapping! When that label is resolved by the PCAL 0, it resolves to CST 3 rather than CSTX 3. Kaboom! Fundamental problem is that by setting the "old" external bit, LLBL loses the physical/logical meaning. When a label lookup is used directly by PCAL, EXIT, etc., it works OK because the mapping is carried with the PLABEL. With CSTX firmware, PCAL 0 is defined as taking an external label, so bit 0 is redundant. But as a result, the mapping has been lost, resulting in transfer to the wrong segment! Fixed by redoing PLABEL to store the label in the new format and keep the internal/external status rather than the mapping mode. ------- MPE V/E ------- + Cannot boot with the disc in REALTIME mode. MPE hangs after reporting, "PART 6 OF 6 COMPLETED - SYSTEM PROCESS CREATION." A trace shows that the channel program is looping on a DSJ instruction. The problem seems to be that MPE executes an HIOP instruction while a disc channel program is executing. The HIOP occurs after the DSJ has addressed the card but before the card returns the DSJ value. The program status is 100010, executing and waiting for a device status request (PHI interrupt on inbound FIFO occupied). The HIOP changes that to 040010 (halt pending and waiting for device status request), and New Status asserts CSRQ. At the same time, the DSJ byte arrives, and the PHI also asserts CSRQ. The CPP is entered, program status is correctly identified as a channel request that is waiting for the PHI, and the prescribed action taken ("write the device number to Register F, unmask all PHI interrupts, and fetch the next channel instruction, as the program cannot be halted at this point." The "next" channel instruction is still the DSJ, which should be reentered to get the data at this point. However, the process_channel code (line 2741) sets "reentry" unconditionally FALSE. This causes the DSJ executor to readdress the device, send the DSJ command, and wait for the reply. Because the status is still 040010, the above process then repeats (forever). There is a big window for the HIOP to hit (3658 event ticks) in REALTIME mode, whereas the window is only 4 ticks in FASTTIME mode. [FIX] Derive reentry status from the channel status word in DRT3 on entry. + RESTORE hangs after reading tape file 4 of "subsys-compilers.tape". MPE attempts to read 16384 bytes from the 576-byte record in record mode, which completes normally. Read Record, Wait, then PPR after 64 bytes in FIFO, then Read Data, then PPR after FIFO reaches 65 again, then FIFO drops to 0/1 and stays there until last byte in, then PPR disabled, then stop phase completes, then PPR. Then, channel does End | clear DSJ | inhibit poll | clear poll, then Read Byte Count, then Wait. The End is supposed to clear the polls generated during the record transfer, but it actually clears the end-of-command poll. So the process hangs at the Wait. [FIX] Extend the FASTTIME stop time; REALTIME mode works fine. + Can't get the CI prompt at any aux terminal. Pressing RETURN disconnects the session, even in CALTIME mode. Problem is MPE sets modifier 2 Interface Control to 0 (none) after I press RETURN, and this calls tmxr_control_line, which sees that DTR is not in the setting, so it disconnects the line. If I comment out the tmxr call, then aux terminals work fine. [FIX] Omit the call until the first time modifier 2 is called with a modem control (normally, it's either "echo" or "none") and once it's set, continue to call for that port. That's how the ATC works -- the port is "direct" until an ATCC call, then it becomes "data set". + Reload gets to "LOADING OF SYSTEM FILES IN PROGRESS" and then fails with: TAPE UNIT # 0 WENT NOT READY STACK MARKER TRACE Q-7 Q-6 Q-5 Q-4 X REG DELTAP STATUS DELTAQ SEGMENT NAME ------ ------ ------ ------ ------ ------ ------ ------ --------------- 1736 200 42 100 42 1561 100012 150 FILEIO 30370 50 0 0 16 1532 103020 71 MAINSEG1B 0 15 100002 4 BOOTSTRAP 0 0 0 0 ERROR #451 TAPE FILE LABEL CHECKSUM ERROR Programmed halt, CIR: 030364 (HALT 4), P: 012310 (RSW) There seem to be (at least) two problems. First, a Read Record terminates with a tape mark seen, but DSJ = 0, so then the Read Data command is sent. But because the tape is in the error phase, DMA times out and then aborts. Second, an eventual Read Status does not show the tape mark. [FIX] Add ma->dsj = 1 to the SCPE error return from the tape library. + MPE V/E boots and runs, but after logging the operator off, the console is unresponsive to CTRL+A. I can press RETURN to get the CI prompt, but the console remains dead, except for RETURN. [FIX] The problem is the console has a timeout on it, and because I am using REALTIME mode for the clock, it times out before I can enter any characters. If I switch to CALTIME, CTRL+A works normally but =SHUTDOWN never completes because the calibrated clock is not incrementing. Switching back to REALTIME after entering SHUTDOWN completes the process normally. + Pressing BREAK while in QUERY works as expected. Pressing BREAK while at the CI prompt causes the terminal to stop responding. Pressing BREAK after logging the operator off causes SF 715, "console terminal failure." [FIX] Reading the NUL produced by the BREAK should reset DSJ 2, but it doesn't. Clearing UART BREAK status fixes the issue. + Pressing BREAK while a program is running doesn't produce the ":" prompt until a few seconds elapse. This is a clock issue, as if the CPU clock is REALTIME, the colon appears immediately. [FIX] The issue should fix itself once the calibrated clock is working. --------------------------- GICDIAG 2.11 (S28S231C.SPL) --------------------------- + Step 26 fails with "PARERR failed to set." and "DMASTAT=0 when PARERR=1." The diag uses the MCS instruction to write bad parity, and we don't handle this. The diag does A = 174054, B = 000000, MCS to set ACR = %174, disable PER, disable ECC, and use ACR. The combination of the first two perform a write initialize, using the ACR for check bits rather than the ECC generation. Then it does M [%120] = 0, which produces a location with bad parity. Then it does A = 000000, B = 000000, MCS to reset the controller to normal. DMA has been set up to do a one-byte transfer from memory location 120, so starting DMA produces a parity error abort. The diag then "fixes" memory by doing A = 000050, B = 000000, MCS, which sets up another write initialize, this time using generated ECC. Then it does M [%120] = %134. Finally, it does A = 000000, B = 000000, MCS to reset the controller to normal. [FIX] This can be faked by having MCS set a flag so that (only) IMB memory reads fail. The flag sets with MCS xxxx54, clears with xxxx50, and MCS 000000 is ignored. Or maybe it only gives an error on the next read and resets the flag automatically (might be safer). Only the GIC uses IMB memory accesses. + Step 31 fails with "DMA clock rate=0 MHz, expected 9 MHz." The diag starts a 10,000 byte transfer, waits a measured time, and then aborts DMA and checks the DMA byte count, which is 999 in the failure case. The diag then calculates (1000 - 999) * 9 / 1469. The result must be between 7 and 11 to pass, i.e., it expects between 1142 and 1795 bytes to have been transferred in that time. Diag does INIT, reg 6 = outbound, reg 8/9 = bank/offset, reg A = 10000, reg B = address freeze + inbound + no diag (so the DMA clock runs free). DMA goes from state 0 to 8 and then blocks until data arrives. The diag then loops for about 1000 instructions, aborts DMA, and reads the remaining count from reg A. It is 999 because DMA blocked and never ran. INIT sets PHI offline and unaddressed. Setting reg 6 = outbound asserts DMARQ continuously because the outbound FIFO is never full (nothing is filling it). This should essentially "bypass" the input wait, as DMARQ, which should be "FIFO has data" is actually asserted continuously. So junk data is transferred without waiting. Bug is that this "illegal" condition is not tested, so the normal input wait is done. (Reading from the PHI with no data results in an "indeterminate value and the HANDSHAKE ABORT bit in reg 2 is set.") However, another issue is that DMA cannot be allowed to run continuously in this mode, as it will never yield, and so will run the entire transfer between one CPU instruction and the next! Should allow only so many DMA cycles to occur in one go before yielding. But then, there is no mechanism to continue the transfer in the absence of a data transfer! NRFD_service could be scheduled, but it calls transfer_data to resume an outbound transfer, and that routine will not call dma_cycle unless there is data in the FIFO! So maybe NRFD_service needs to have two modes, depending on whether the PHI (not DMA) is inbound or outbound (DMA is inbound in either case -- NRFD is holding off other GIC talker). + Step 33 fails with "DEVRQ/RIOC fails for NS." The diag sets reg 3 to 140377, then does IFC, which ends with a PP and a CSRQ for the PHI interrupt for the empty FIFO (channel request), then sets PP7 sense and mask bits, which causes CSRQ and device request, then clears mask bit and does SIOP, which sets New Status and CSRQ for PHI interrupt and New Status change (channel request). Diag is expecting device request. Current reg F code says if PHI IRQ, then will be chanrq, either with device = PPR if poll active, or with device = reg B if not. + Step 33 fails with 'DEVRQ false when NS>0 and not CIC." This is a consequence of the prior Step 33 failure, as DEVRQ is not being decoded properly. + Step 34 fails with "CHANRQ/OBSI not false when DMIN is false." This is a CHANRQ decoding failure. + Step 32 fails with "DMA timeout before one second." and "CPVA(0)=!E004." The diag sets up a one-byte offline inbound transfer, which hangs in state 8. It then delays in a loop for 900 msec. and then checks CPVA 0 for a value other than E004, then delays 2000 msec, and then checks CPVA 0 again for the E004 value. The timeout seems to be about 11,000 instructions short, or about 30 msec. REALTIME. Adjusting the timeout period to 1.03 seconds succeeds (1.02 seconds fails). Limits allowed are 1.03 - 2.43 seconds, suggesting that 1.73 seconds = 9 MHz. + Step 41 fails with "IFC did not clear device." Diag sets Listen Always, then does IFC and checks for no listener. For PHI, IFC clears T/L status while it is asserted, then reestablishes T/L when IFC is denied. Bug is that T/L is being reestablished immediately during IFC assertion; should be moved to IFC denied! + Step 42 fails with "no poll response" and "no poll respond bits set." Diag does PP configure dev 1-7 and sets poll mask ch 11 reg 4 0-7. Then sets ch 10 reg 7 dev to 0-7 and expects poll response in ch 11 reg 2. [FIX] Change to set poll_configure unconditionally on PPC and configure on SCG only if R1_LISTENER; either way, set poll_active after config SCG. + Step 43 fails with "Device did not handshake." Diag sets ch 11 talk always and ch 10 listen always and sends byte, but it's reported as "no acceptors". Problem is that ch 10 listen always does not set phi->listeners bit in ch 11 controller. [FIX] Setting LA or TA must also set bits in controller, which is always the second dispatch table entry assuming this PHI is a device. And must also set listening bit in controller's dispatch table entry for the device GIC! + Step 43 fails with "REN not cleared at device." Problem is denying REN by writing to ch 11 reg 6 does not deny remote status in ch 10 reg 1. The other case, asserting REN at ch 11 reg 6, does set remote status at ch 10 reg 1. [FIX] R1_REMOTE is cleared only when entering LADS via Listen. Should also clear in hpib_accept when REN is denying. + Step 22 fails with "DMA abort failed, state = 19". The step seems to do one more RIOC to clock the state machine than it should. DMA moves correctly from state 10 to 18 when the abort is done, but then GICDIAG does four clocks, which moves from 18 to 19 before checking the state. The move is correct, but the test is coming one clock late for some reason. Reason is that the DMAENF signal is registered, so it appears one state after it is asserted! --------------------------- GICDIAG 1.26 (S28S231C.SPL) --------------------------- + Step 72 fails with "status is 00 should be 10." Problem is that status is set when dma_cycle is entered in state 18, but status is being tested when dma_cycle EXITS in state 18. State 18 is one of several that assert the SETEND signal, and this appears when the ROM outputs are clocked in phase 2 of the DMA clock, i.e., after phase 1 has clocked the state. FIX: dma_cycle needs to generate these phase-2 signals separately after each phase-1 state. + Step 81 fails with "DMA state machine is 00 expected 08". Diag is doing B-write (start DMA), 8-read, 8-read, dma_cycle 0 -> 0, E-write (abort DMA), 8-read, 8-read (state 0), dma_cycle 0 -> 8, fail. FIX: When starting DMA, do not initialize dma_clock if it is already running, as the diagnostic depends on maintaining synchronization from test to test. --- DUS --- + DUS 3.06 boots and appears to run OK. But when executing IOMAP, a system halt 4 (stack overflow on the ICS) occurs. A CPU trace shows that a CST violation trap occurs at the timer IXIT before the mayhem ensues. [FIX] The SED 1 deferred timer interrupt is not setting up the CR and TR as it should (as it does when the interrupt is not deferred). Also, it set a timer interrupt without checking to see if the timer is enabled by TON. + Running ADCCDIAG, at the ">" prompt, the first character of any command is not echoed. The program appears to run with the interrupt system off, so the last Interrupt/Halt channel instruction remains pending. The ADCC is coded to give priority to IRQ, so the SIOP that would configure the ADCC to read the command line never executes, because the unanswered IRQ holds off the CSRQ that would start the program. DUS itself runs with interrupts on, so this doesn't happen with DUS commands. + GIADIAG executes an MCS instruction (unimplemented). It does: 01.017407: LDI 0 01.017410: ZERO,NOP 01.017411: MCS 01.017413: DELB,NOP 01.017414: DELB,NOP 01.017415: STOR Q-4,I ----------- DUS-III AID ----------- AID 1.01 for the Series III has a bug in the RUN command initialization. The bug stems from the the fact that the DRT layout for HP-IB devices differs between the Series III and the Series 3x-7x machines, as follows: Word Series 3x/4x HP-IB Series III HP-IB ---- -------------------------------- -------------------------------- 0 Channel program absolute address Channel program absolute address 1 CPVA absolute address Interrupt handler program label 2 Interrupt handler program label CPVA absolute address 3 Channel program status Channel program status Words 1 and 2 are exchanged for the Series III so that the interrupt microcode can set up the correct program label. There are two places in AID that set up the DRTs: in procedure CLEARVARTABLE, which is called from the RUN executor, and in the RSIO executor. The former sets up the WRONGINTERRUPT handler label and the default CPVA pointer in all DRTs. The latter sets up the SERVICEINTPT handler label and either the user-specified CPVA pointer or the default pointer. The RSIO executor uses the correct Series III format. The RUN executor does not. As long as interrupts from HP-IB devices occur as a result of running channel programs, everything works fine. However, if an interrupt occurs outside of a channel program, or more precisely before a channel program has executed for that DRT, AID dies with a Bounds Violation trap as a result of interpreting the CPVA pointer as the interrupt handler label. There are two possible workarounds. The first is to insert an empty channel program on the specified channel and device into the AID program to be run: BSIO xx IN H RSIO This will initialize DRT words 1 and 2 correctly. However, should an unexpected interrupt occur on another DRT, the crash will still occur. Also, it must be executed each time the program is run. The second workaround is to change the code in AID itself to correct the use of DRT words 1 and 2. The AID code in procedure CLEARVARTABLE is essentially: HOLD:=%10; REPEAT BEGIN DRTW(HOLD, 1, %77771); DRTW(HOLD, 2, LOGICAL ASMB(LLBL WRONGINTERRUPT)); END;WHILE (HOLD := HOLD + 1) < %1001; The two DRTW calls are rendered as: 01.037541 041402 LOAD Q+2 01.037542 131401 LDX Q+1 01.037543 020321 PSTA 01.037544 120401 INCM Q+1 01.037545 033553 LLBL 153 01.037546 131401 LDX Q+1 01.037547 020321 PSTA 01.037550 120401 INCM Q+1 Exchanging the LOAD Q+2 with the LLBL 153 corrects the DRT word 1 and 2 assignments: sim> examine -m 01.037541 01.037541: LOAD Q+2 sim> examine -m 01.037545 01.037545: LLBL 153 sim> deposit 01.037541 033553 sim> deposit 01.037545 041402 sim> examine -m 01.037541 01.037541: LLBL 153 sim> examine -m 01.037545 01.037545: LOAD Q+2 This can be corrected on the DUS-III tape by searching for this pattern: B301 20D1 A101 (LDX Q+1 ; PSTA ; INCM Q+1) ...and exchaning the words that appear before the two instances of this sequence (4302 and 376B). ---------------------------- HP32002E.01.00 MPE Operation ---------------------------- + Running OPT/3000 and issuing the #T (Enter Systems Tables Context) command and then accepting the default display by responding to "ENTER COMMAND FOR DISPLAY" with a CR starts to display the Table Utilization Report, but then MPE dies with: **** SYSTEM FAILURE #16 STATUS %100001 DELTA-P %002347 The crash is repeatable. However, if the GIC channel number is changed from the default 11 to any other number, the command succeeds. Setting the simulator to trace CPU instructions from the problematic command to the system failure shows that the failure occurs while OPT is walking through the DRT list. It starts at DRT 8, the first possible HP-IB DRT, and continues by eights until it reaches the declared highest DRT number. For each four-word DRT examined, it retrieves DRT words 1 and 2. On the Series III, these are the interrupt handler program label and the interrupt handler data base, respectively. It then adds 8 to the contents of word 1 and loads the value to which that points. So it is adding 8 to a program label, which is nonsense. However, it's not nonsense on an HP-IB machine, such as the Series 4x. On those machines, the DRT layout is different. Word 1 is the CPVA absolute address, and word 2 is the interrupt handler program label. So adding 8 to the CPVA pointer, which actually points to the ILT entry, accesses the word that is a pointer to the channel program. After some more offsets and indirect loads, OPT attempts to move a 6-word block from a calculated system data segment to OPT's data segment. But because the bank and offset values supplied to the MFDS instruction are bogus, the microcode traps with a DST violation. The move is conditional on one word loaded in the bogus indirect sequence matching another word loaded in the bogus indirect sequence. In the case that fails, the values happen to match, and the MFDS is executed. In the cases where it succeeds, the values do not match, and the move is skipped. But in both cases, OPT is looking at the wrong values in the DRT. On the V/R system, there is a file named MPEMIT2.DOCUMENT.SYS. Its title is, "Multiprogramming Executive Operating System (MPE V/R) for Series II/III/30/33 machines," and it has a table that is captioned, "The following products have been update/changed on MPE V/R." Mentioned in the table is OPT/3000. The V/P-delta-1 version is A.00.12. For V/R, the version is A.33.20. The version we tested is A.00.31. There's an additional note that says, "The following products were patched for this MIT." The first one listed is "HP32238 OPT." So the bottom line is that the tested OPT accesses DRTs using the HP-IB system layout rather than the SIO system layout. I presume that's why a special V/R version of OPT was needed. + Defining a WELCOME message appears to work, but when the next logon attempts to print the message, an infinite number of CRLFs are printed instead. Routine WELCOMEMES in module 51 prints the message. The message is stored in the extra data segment whose number appears in location 0.001277. This value is multiplied by 4 and added to the value in 0.000002 to give the DST entry of the segment. Words 3 and 4 of the entry give the bank and offset of the XDS. If no welcome message is defined, location 0.001277 is 0. sim> e 0.1277 00.001277: 000057 sim> e 0.2 00.000002: 004574 sim> e 0.5070/4 00.005070: 020037 00.005071: 001002 00.005072: 000003 00.005073: 163430 sim> e 3.163430/20 03.163430: 100001 03.163431: 000174 03.163432: 021400 03.163433: 040502 03.163434: 041504 03.163435: 020115 03.163436: 042506 03.163437: 043510 03.163440: 020104 03.163441: 044512 03.163442: 045447 03.163443: 046115 03.163444: 047117 03.163445: 020063 03.163446: 030060 03.163447: 030040 sim> e -c 3.163430/20 03.163430: \200,SOH 03.163431: NUL,'|' 03.163432: '#',NUL 03.163433: 'A','B' 03.163434: 'C','D' 03.163435: ' ','M' 03.163436: 'E','F' 03.163437: 'G','H' 03.163440: ' ','D' 03.163441: 'I','J' 03.163442: 'K',''' 03.163443: 'L','M' 03.163444: 'N','O' 03.163445: ' ','3' 03.163446: '0','0' 03.163447: '0',' ' The WELCOMMES comments say that the first record appears at DB+3. DB+0 is the "usage count", DB+1 is the length of the segment, DB+2 upper byte is the character '#', and DB+2 lower byte is the length of the first line. Each subsequent line starts on a word boundary and is padded with a blank if necessary to make the string length odd, and the byte preceding is the line length. ==> Problem is that the first line length is 0 above. The routine outputs the zero-length line (resulting in CRLF), then increments the pointer by "line length + 1 >> 1". So the pointer never changes, and a succession of zero-length lines are output. It appears that the lines are written incorrectly by the WELCOME command. The code in CXWELCOME (module 85) that writes the line length is: NEXTLINE(-1):=IOCOUNT; <> ...where NEXTLINE is a BYTE POINTER to the location where the text of the line has been written by an FREAD on the terminal. Is there a problem with negative byte addressing? CXWELCOME is in segment OPLOW, which is in CST 17, at offset 007565. The routine starts at 04.060015. READLOOP is 04.060167. FREAD call is 04.060240. The NEXTLINE(-1) statement is: 04.060312: LOAD Q+4 04.060313: LDXN 1 04.060314: STB Q+15,I,X >>CPU fetch: 04.060314 166415 instruction fetch >>CPU reg: 04.036566 000001 A 000005, X 177777, M I t r o c CCG >>CPU reg: 03.000000 000017 PB 050230, PL 066007, DL 035474, DB 163430, Q 036551, Z 047704 >>CPU fetch: 04.060315 041404 instruction fetch >>CPU instr: 04.060314 166415 STB Q+15,I,X >>CPU data: 04.036566 000006 stack read >>CPU data: 03.063432 140005 data read >>CPU data: 03.063432 140005 data write >>CPU instr: 04.060315 140017 simulation stop: Step expired ==> BUG! Should have written the value 5 to the right-hand byte of location DB+2, so to location 03.163432. But it wrote to 03.063432, so it lost the sign bit! Problem is in cpu_ea. "base" is 163430 (DB), "displacement" is 6 (X), "selector" is lower, DBANK is 03, SBANK is 04. Bug is in this line: displacement = displacement + X; ...which sets displacement to 200005 (000006 + 177777). Then this line: base = base + (displacement >> 1) & LA_MASK; ...sets base to 063432 (163430 + (200005 >> 1), or 163430 + 100002). ==> FIX: mask the displacement indexing to 16 bits. Additional notes: negative byte displacements are also used in the following locations during COOLSTART and SHUTDOWN: ANY CHANGES? NO >> PBANK = 07, P = 037321, PB = 013077, X = 177777, CIR = 166604 STB Q-4,I,X (loop) >> PBANK = 07, P = 037321, PB = 013077, X = 177776, CIR = 166604 STB Q-4,I,X (loop) >> PBANK = 01, P = 055715, PB = 047430, X = 177777, CIR = 166604 STB Q-4,I,X DATE (M/D/Y)?03/13/88 TIME (H:M)?16:48 SUN, MAR 13, 1988, 4:48 PM? (Y/N)Y LOG FILE NUMBER 357 ON *WELCOME* :HELLO OPERATOR.SYS;HIPRI >> PBANK = 03, P = 127437, PB = 127430, X = 177777, CIR = 156700 LDB S-0,I,X (4x) 16:48/13/SP#6/SPOOLED OUT 16:48/#S1/14/LOGON FOR: OPERATOR.SYS,OPERATOR ON LDEV #20 HP3000 / MPE V E.01.00 (BASE E.01.00). SUN, MAR 13, 1988, 4:48 PM >> PBANK = 04, P = 023563, PB = 023430, X = 177777, CIR = 156700 LDB S-0,I,X (loop) : The locations are: 07.037317 (Mitroc 003, P0 = 2.24220) INITIAL/RESIDENT, PROCEDURE DNTOA, called by PROCEDURE LDMAP via NTOA other half of data segment seems to be HALT %10s. 01.055715 (MItroc 301, P0 = 0.6263) maybe PROCEDURE NTOA in module LOAD? 03.127435 (MItroc 102, P0 = 0.7) NURSERY, PROCEDURE UPSHIFT, called for ":HELLO OPERATOR.SYS;HIPRI" by PROCEDURE STARTDEVICE doing upshift in place, LDB tests for zero byte at end 04.023563 (MItroc 106) UDC + When restoring the compilers after a system reload, a System Halt was encountered. This occurred immediately on resumption after the tape simulator was set for REALTIME and then set online. The halt was "System halt, P: 076146 (EXIT 5)". The code leading up to it is: 00.076126: LSEA 00.076127: LOAD Q+40 00.076130: BRE P+7 00.076131: LOAD Q+24 00.076132: EXF #0:#8 00.076133: CMPI 1 00.076134: BNE P+3 00.076135: INCA,NOP 00.076136: BR P+6 00.076137: LOAD Q+24 00.076140: ANDI 377 00.076141: CMPI 1 00.076142: BNE P+2 00.076143: INCA,NOP 00.076144: SSEA 00.076145: DDEL,NOP 00.076146: EXIT 5 This appears to be the exit from procedure FETCHSEGMENT in module 92 (KERNELC, lines 8763-8770). But reentering the commands that caused the failure do not reproduce it, even after repeated attempts. Another failure was noted when attempting to store the example files: HP3000 / MPE V E.01.00 (BASE E.01.00). MON, FEB 22, 1988, 4:36 PM :FILE T;DEV=TAPE :STORE SQR,TEN,TENCMD;*T;SHOW STORE/RESTORE, VERSION 2 (C) 1981 HEWLETT-PACKARD CO. MON, FEB 22, 1988, 6:56 PM ?18:56/#S1/15/LDEV# FOR "T" ON TAPE (NUM)? Simulation stopped, P: 071144 (PAUS 0) sim> att ms0 ex.tape MS: creating new file sim> c System halt 4, P: 076146 (EXIT 5) sim> This time, we capture syshalt 4, which is "stack overflow on the ICS." The stack registers are: Q: 013062 SM: 013062 SR: 0 Z: 013772 SBANK: 00 ...and it appears that the abort is in cpu_exit_procedure, because the stack marker at Q-3 to Q-0 is: 00.013057: 000347 00.013060: 003070 00.013061: 141110 00.013062: 100066 [Q] 00.013063: 000000 00.013064: 001000 The abort occurs when new_q is checked: if (new_q > Z || new_sm > Z) /* if either the new Q or SM exceed the stack limit */ MICRO_ABORT (trap_Stack_Overflow); /* then trap with a stack overflow */ new_sm is calcuated as "Q - 4 - field & R_MASK", which is 013051 and so is below Z. However, new_q is "Q - operand & R_MASK", where "operand" is the delta Q value at 013062, and so is 112774, which is above Z. This causes the stack overflow trap and the subsequent syshalt. Bit 0 in delta Q looks wrong. An interrupt ICS stack marker can have bit 0 set if the dispatcher was interrupted, and that is handled for IXIT but not for EXIT, which should not be called for an interrupt. The only place that bit 0 is set is within cpu_setup_ics_irq if the cpx1_ICSFLAG and cpx1_DISPFLAG are both set. There is another problem: at syshalt, STA = 140000. That is, the current code segment number is 0, which is illegal! ==> Theory: is there a problem when two interrupts occur simultaneously? CTRL+E is recognized only when the console keyboard is polled during an ATCD service. The PCLK, ATCD, and CLK are co-scheduled and serviced in that order. So when CTRL+E is recognized and the simulator stops, the CLK service is about to be called. The clock nominally ticks each millisecond and interrupts every 100 ticks. However, for idling, the clock actually ticks every 10 msec and interupts after every 10 ticks. To maintain the illusion of ticking every msec, the clock count register is updated each time the simulator stops. A clock interrupt, therefore, will immediately follow after resumption if the count register is 99 (service adds one tick and compares to the limit register, which is set to 100). If the tape drive is attached or set online at the simulator stop, a tape interrupt is set as part of the online action. If this occurs when the clock count register is 99, both tape (DEVNO 6) and clock (DEVNO 3) will be requesting interrupts when the simulator is resumed. Is this handled properly? ==> Test: Stop the simulator, set the clock COUNT register to 99, attach a tape, and resume. Does a syshalt occur? YES! Enabling tracing for the CPU, IOP, CLK, and MS devices shows: >>CPU instr: 00.071145 000000 external interrupt The parameter, i.e., the interrupting device, is 0! If the clock is set to interrupt on resumption, but the tape is not, then the trace shows: >>CPU instr: 00.071145 000003 external interrupt ...which is correct. ==> Problem is that iop_poll is called twice before resuming execution: once on the first pass, when the micromachine is halted, and then a second time when the micromachine is running. The second call returns 0 because the original interrupting device (3) is holding off lower-priority interrupts (6). But because cpx1_EXTINTR is set, cpu_run_mode_interrupt is called with device = 0. cpu_run_mode_interrupt calls cpu_setup_irq_handler and passes parameter = 0. For an external interrupt, the routine reads the interrupt handler label from the DRT at location "parameter * 4 + 1". However, parameter is 0, so the location read is 00.000001, which is the origin of the CST extension. The value there, 007710, is interpreted as an internal label, so the code segment number in STA is not changed from the 140000 (STATUS_M | STATUS_I) value established by the routine. P is set up based on the "label" offset, and execution goes off into the weeds. Luckily, the EXIT instruction encountered shortly after causes the syshalt as noted above. ==> FIX: The hardware will not conduct another interrupt poll until EXTINTR is cleared. iop_poll should return IOA if cpx1_EXTINTR is set, rather than executing another poll (and returning 0). + Attempting to restore the compilers tape hangs the RESTORE program. Doing: :FILE T;DEV=TAPE :RESTORE *T;@.@.@;SHOW ?14:04/#S1/14/LDEV# FOR "T" ON TAPE (NUM)? =REPLY 14,7 ...just waits at PAUS. A trace shows that a Select Unit 0 command is never starting. The problem is that the order of execution is: - The previous channel program ends with a SETINT - The tape interface does INTREQ - The CPU runs the interrupt handler - The handler does DSTARTIO - The channel program does PCONTSTB with Select Unit 0 - The handler does DRESETINT Because the INTACT FF is set when the handler does PCONTSTB, the command is noted but not started by calling the controller. When DRESETIO is done on handler IXIT, the INTACT FF is cleared. The interface then checks the INTREQ FF, which is clear, and then the SIOBSY FF, which is set. This prevents the interface from starting the pending command. This is wrong. If PCONTSTB defers a command, that means that the channel is running, so requiring the channel to be inactive before starting the command that was set up by PCONTSTB is incorrect. >>> Removed the sio_busy test from the call_controller code in DRESETIO. Now, the Select Unit command starts, and completes immediately, but no SR is done, so the channel hangs. The problem is INTOK is set after the DRESETIO, so a poll is done, and that overwrites the RQSRV that is returned by the Select Unit command. >>> INTOK must not be set if sio_busy is TRUE. Now, returning with RQSRV does not set SR. device_sr is set by call_controller, but SRn is never set. SRn is set only in CHANSO, which does not accompany DRESETIO. >>> DRESETIO now falls into CHANSO. Now, it works for REALTIME but fails for FASTTIME. Problem is that if a rewind completes while in an interrupt handler (so INTOK is not set), then attention sets but the poll is deferred. But after DRESETINT, the poll is never done, so the attention interrupt never occurs. >>> DRESETIO now calls the controller if CMRDY or INTOK is set. The :RESTORE now works properly for either REALTIME or FASTTIME. + Mounting a magnetic tape does not produce the "volume mounted" message. Attaching a tape invokes auto-volume recognition (S08), which runs an SIO program. After asserting INTREQ, MS controller sits in a loop until INTREQ + INTACT is zero. However, AVC is called from within the interrupt processor, so INTACT is still set. Consequently, the SIO program hangs at first command (Select Unit 0). GIP (S55, line 1480) calls monitor with state = 6 for unexpected interrupt at 00.064553 PCAL 31 (00.064554 is IXIT). Monitor for tape is SIODM (SA8). STATE6 is line 3983 (00.045457 LOAD Q-5,I). Line 4001 is AWAKE(DEVRECPCBP,JUNKWAIT,NOWAIT) (00.045524 PCAL 161). DEVREC is S08; ASSEMBLE(MFDS 4) line 596 is 04.001003 MFDS 4. At DOTAPE line 744 (04.000331 PCAL 2), seems to want to rewind the tape, which does an SIO while INTACT is still set. --> I don't see how this works. S18 line 440 after STARTIO call is 04.003557 BLE P+5. --> It might work, if RIN was sent to the tape controller. Once the INTACK FF was cleared, the SIO program would continue. The fact that RIN is never sent is suspicious. A Rewind could take a long time, so the interrupt handler must certainly return to wait for it. There is one absent code segment trap on IXIT. No other traps are present, and this one executes on the user stack, so it returns with EXIT. in KERNELC: 00.111503 PSEB (QI-18 2 => 1) 00.111504 DISP ([Q] => %100000) 00.043766 PSEB (QI-18 1 => 0, dispatch interrupt) ...but we're still on the ICS from the tape IRQ. >>> PSEB isn't checking for execution on ICS before starting the dispatcher! With that fixed, mounting a tape produces: 15:53/10/LDEV 7 UNIT FAILURE. I/O STATUS % 54 However, with SET MS REALTIME, the result is: 15:55/10/VOL (UNLABELLED) MOUNTED ON LDEV# 7 >>> Looks like another race condition. >>> 9/23/15 I didn't change the fast timing, but this now works! Must have been some other issue that is now corrected. + Running QUERY and entering ADD causes PROGRAM ERROR #29 :STACK UNDERFLOW. Code is doing an ADDS 0 with RA = 004000 and SM = 174743. Resulting sum is 000743, which is < DB 137030, so underflow trap. Microcode [1553] sets SP1 = 003777 ([S] - 1), then SP1 = SM + SP1 + 1, then JSB BNDC. BNDC [1751] does Z - SP1 (175051 - 000743) = 174106, then POS (fails), so JMP EX11! Actually checks that Z - SP1 is positive, i.e., within 32K of each other. Also checks that SP1 - DB is positive for stack underflow. BNDC called from 1533 (SETR Q), 1541 (SETR S), 1557 (ADDS/SUBS), 2452 (SXIT). BND2 called from 1451 (PSHR). >>> Modify bounds checking to account for arithmetic overflow or underflow. + Running BASIC and calling XDBEXPLAIN produces a bounds violation when the error message is to be printed. Violation is on MVB PB,3 with A 011721, B 046157, C 003114, PB 113630, PL 125727, DL 044674, DB 047030, Q 050451, Z 054573. A is a signed byte count, so that looks wrong here. B is the PB-relative source byte displacement, and this also looks wrong, as the source word address is 136717, so > PL. C is the DB-relative destination byte displacement, which is word address 50476 (OK). The test program called XDBOPEN with a password not terminated by a semicolon. This preceding instrucion pair isn't right, though: 04.124012 131403 LDX Q+3 04.050454 177777 stack read 04.050542 000001 A 161000, X 177777, M I T r o c CCL 04.124013 014600 SCAN,X 04.050542 000001 A 142000, X 200000, M I T r o c CCL Note that X has overflowed! >>> The SCAN instruction doesn't mask its register additions! + Attempting to RESTORE the SPL source tape fails with RESTORE waiting for the drive. >>> Problem is that the first Read Record fails with Tape Mark Seen. An interrupt occurs, and another channel program is started that issues a Select Unit 0. But this is held off because the interrupt hasn't been reset. When it is, the command is executed and SRn is asserted, but the IOP, which issued the DRESETINT, doesn't do anything with it, so the MPX is now hung. Once an interrupt is asserted, the controller sits in a tight loop waiting for the interrupt to be reset. When it is, the controller returns to the idle loop and looks for the next command. This is simulated by the command_pending flip-flop, but resumption doesn't work properly. >>> Fixed by having MS PCONTSTB call mpx_assert_SRn instead of asserting SRn when invoked via DRESETINT. The other option would have been to respond to SRn in iop_direct_io by calling mpx_assert_SRn, but this must only be done when processing ioRIN (else if TIO called while a channel program is in process, and device_sr is set, then MPX will get a spurious SR). + If a read is pending on the system console (i.e., ":" is present), then entering =SHUTDOWN works but does not print the expected "ALL JOBS LOGGED OFF" message. Instead it prints a single digit (0, 1, or 2) on a line, followed as expected by SHUT. The digit is the leading digit of the time. For example: : =SHUTDOWN 2 SHUT Shutdown occurs in CONSSHUTDOWN in S09. GENMSG is in S59. Call is GENMSG(1,206,,,,,,,0), which should write "nn:nn/[#J/S]/PIN/" prefix and then the message, e.g., "12:06/1/ALL JOBS LOGGED OFF". Earlier GENMSGs are OK, e.g., "18:18/13/SP#6/SPOOLED OUT". Message catalog is SCAS002C.txt. The "ALL JOBS LOGGED OFF" message correctly appears if either of these is true: - The operator session on the console is logged in. - No read is pending (i.e., SHUTDOWN entered immediately after :BYE). Originally, it appeared that the message appeared if the system clock was set to REALTIME instead of CALTIME. However, doing so accelerated the read timeout from 120 seconds to 2 seconds, so the read was expiring before SHUTDOWN could be entered. Once this was done via a script, the message reverted to the single digit. "<< DELAY UNTIL ALL I/O COMPLETE >>" is line 2306 (00.161350 LDNI 1/PCAL 102) (362,274 of shutdown.log). Looks like ATTACHIO (2314) is called a bunch of times. (Note that sync is sent at 182,120 and the '0' of the truncated message is sent at 187,795.) ASSEMBLE (TSBC 0, X) at 2275 is 00.161323 TSBC #0,X (322,393). MOVE STOPBUF (9) := "DELETE" at 2258 is 00.161270 MVB PB,3 (169,749). 00.161215 PCAL 52 (31,222) is GENMSG call. 02.036053 EXIT 16 (163,891) is GENMSG return (value is 177777). "0:00/1/" stored at 33,677, "ALL JOBS LOGGED OFF" stored starting at 158,451 and ending at 159,189. The buffer is 00.150365-150401, and it contains the proper message. The write starts at 00.063610 WIO 1 (162,353). The first character is obtained from 00.002455 (buffer is 00.2455-002471, moved from 02.044035-044051 at 01.016304 MDS 5) (buffer was 02.044035-044051, moved from 00.150365-150401 at 02.036073 MTDS 0) Looking at the ATC interrupt after sending sync and then again after sending '0', we have: - first interrupt at 182,577; second interrupt at 187,799 - interrupts are to TIP (line 2488 of S55) - lines 2507-08: TOS := DITP(DIOQP); << SET IOQP >> TOS := DITP; << GET DFLAG >> ...yield A 141601, B 013503 for 1st irq and A 141400, B 000000 for 2nd irq from addresses 00.003436 and 00.003434, respectively. - X := S0.DSTATE (2529) loads X = 1 for 1st irq and X = 0 for 2nd irq (from 141601/141400) - jump is to WRITINGL (3559) for the 1st irq and OUT for the 2nd - WRITINGL sends next char; OUT does DITP := TOS and IXIT. So how does 00.003434 go from 141601 to 141400? - 1st interrupt is at 182,577 - stores 141601 just before IXIT at 00.062251 STOR Q+6,I (182,829) - goes from 141601 to 161401 at 01.017132 STOR Q-5,I (184,917) - goes from 161401 to 161400 at 01.017352 STOR Q-5,I (185,042) [state WRITING => NULL] seems to do DISABLE; IF DITP.DSTATE = WRITING THEN DITP.DSTATE := NULL; starting at 01.017342 SED 0 through 01.017352 STOR Q-5,I. --> this is ABORTL (1493 in S22) --> line 1489 X := IOQP; << TEST BIT 0 FOR ABORT >> succeeds! data 00.014503 is 104000 abort is being set at 184,608 (01.013412 LOAD Q+3,I / TSBC #0 / STOR Q+3,I) - goes from 161400 to 141400 at 01.017073 STOR Q-5,I (185,999) - 2nd interrupt is at 187,799 - tested at 00.056424 LOAD Q+6,I (187,870) DRESETINT is delivered at 182,839; next ATC event is character '0' sent and IRQ at 187,795. In S09, the timeline is: - write starts at 162,353 - MOVE STOPBUF (9) := "DELETE" at 2258 is line 169,749. - first send interrupt (sync) at 182,577 - I/O abort occurs at 184,608 - second interrupt (of '0') at 187,799 - ASSEMBLE (TSBC 0, X) at 2275 is line 322,393 Testing with and without a read pending on the console ----------------------- FINISHREAD (S55): - DITP(DTRLX).READTRLX [line 2379] is 0 or 3 for no-read or read case. - READTIMED is FALSE in either case. - MPXCONTROL(ECHOOFF,DITP) [line 2381] calls are identical. - STOPTIMEOUT(BLOCKTIMEOUT,DITP) [line 2418] calls are identical. - CHECKTQUEUE(DITP) [line 2425] is 00.056233 PCAL 130. - through 00.062252 IXIT, execution is identical. >>> This is an MPE bug. DEVREC calls ABORTIO to cancel the logon read, but this also cancels PROGEN's write of the ALL JOBS LOGGED OFF message. UMIT M09 has: ABORTIO (CONSOLELDEV); <> ...added before GENMSG is called to eliminate this problem. + Restoring tape 1 of 2 (e.g., 32033-11009_1of2.tape) does not end cleanly. Instead, RESTORE reports "TAPE READ ERROR" for the final (split) file and "CATASTROPHIC ERROR" for all files on tape 2 of 2. --> Is this because EOM returns Tape Runaway, and the image doesn't end in the triple-EOF that RESTORE expects? NO, RUNAWAY IS NEVER INDICATED! For :RESTORE *T;SS@.@.@;SHOW, the actions are: - a bunch of FSFs to get to position 36414192 - a bunch of RDRs until EOF is seen (reads the first part of the first file) - a RDR to read the trailer (40 words at position 36528996) - a RWO to rewind the tape and take it offline - a RDR which should fail because the unit is offline, but it succeeds! - a huge number of REWs which succeed even with the unit offline The RWO trace is: >>MS cmds: Unit 0 control is Rewind Offline >>MS cmds: Unit 0 Rewind Offline tape command started at position 36529084 >>MS cmds: Unit 0 status is interrupt | no error >>MS cmds: Unit 0 status is SIO OK | no error >>MS xfer: Unit 0 rewind of 22830.6 inches succeeded >>MS cmds: Unit 0 Rewind Offline tape command completed at position 0 >>> There should be no interrupt for RWO! The RDR trace is: >>MS cmds: Unit 0 control is Read Record >>MS cmds: Unit 0 Read Record tape command started at position 0 >>MS xfer: Unit 0 read terminated with tape mark seen >>MS cmds: Unit 0 Read Record tape command completed at position 4 >>MS cmds: Unit 0 status is interrupt | tape mark | no error >>MS cmds: Unit 0 requested attention >>MS cmds: Unit 0 status is SIO OK | tape mark | unit interrupt >>> There should be a Command Reject status here! + Entering CTRL+Y while LISTF @.PUB.SYS,2 is writing hangs the terminal. The CPU sits at the PAUS instruction. Actually, it hangs if any character is entered. Also, SHUTDOWN with a second terminal session active produces: 18:18/#S2/15/LOGO -- on the console, and: SESSION -- on the other terminal ...and then things stop with the CPU in PAUS. Note that CTRL+Y works correctly if no output is occurring (e.g., in a BASIC 10 GOTO 10 loop). >>> Problem is that channel 1 requests an interrupt while currently in the channel 0 interrupt handler; iop_direct_io (ioRIN) clears the interrupt, removing the pending interrupt, but the ATC data flag is still set, so no further interrupt requests are allowed. Sequence is: - recv int - iop_assert_INTREQ - iop_interrupt_request_set = 01 - interrupt_poll_set = 01 - iop_poll - call interface (INTPOLLIN) => INTACK - iop_interrupt_request_set = 00 (poll still set) - STA = 100031 - CIO with CN_ACKN - xmit int - iop_assert_INTREQ - iop_interrupt_request_set = 01 - interrupt_poll_set = 01 - iop_direct_io (ioRIN) - interrupt_poll_set = 00 (iop_interrupt_request_set still set) - iop_poll (eventually, once I bit set again) - iop_interrupt_request_set = 01 - interrupt_poll_set = 00 ==> so request not granted! ...and there we stop. Restarting calls iop_initialize, which sees INTREQ and no INTACK in the DIB and so sets both iop_interrupt_request_set and interrupt_poll_set to 01. The next poll then services the interrupt, and everything continues. >>> Fixed by having ioRIN processor in iop_direct_io clear the poll bit ONLY IF the IRQ bit is also clear (meaning that a second interrupt is not pending). + Restore from magnetic tape does not work. There appear to be problems with the status presented by the MS device. Testing with: :FILE MT;DEV=7 :RESTORE *MT;BASIC.PUB.SYS;SHOW ?17:58/#S1/14/IS "MT" ON LDEV#7 (Y/N)? =REPLY 14,Y ...with MS0 not attached gives: 17:58/#S1/14/LDEV 7 UNIT FAILURE. I/O STATUS % 54 ?17:58/#S1/14/IS "MT" ON LDEV#7 (Y/N)? Replying YES a second time hangs the process until the I/O is aborted. Attaching an image does not produce the "volume mounted" message. With MS0 mounted, replying hangs until I/O is aborted. >>> With volume mounting fixed, this now works OK. + CTRL+X cancels a line, but CTRL+H does not cancel a character or echo '\' on the TTY. It does work correctly for HP terminals. >>> Problem is DEL = %10 by default, so CTRL+H returns DEL to MPE. + Attempting to run some programs in PUB.SYS produces bounds violation aborts: :FCOPY FROM=*MT;TO=FRED HP32212A.3.19 FILE COPIER (C) HEWLETT-PACKARD CO. 1984 ABORT :FCOPY.PUB.SYS.%0.%4404 PROGRAM ERROR #24 :BOUNDS VIOLATION PROGRAM TERMINATED IN AN ERROR STATE. (CIERR 976) :RUN LISTDIR2 LISTDIR2 E.01.00 (C) HEWLETT-PACKARD CO., 1977 TYPE 'HELP' FOR AID >HELP ABORT :LISTDIR2.PUB.SYS.%0.%2456 PROGRAM ERROR #24 :BOUNDS VIOLATION PROGRAM TERMINATED IN AN ERROR STATE. (CIERR 976) :EDITOR HP32201A.7.16 EDIT/3000 WED, APR 8, 1987, 3:57 PM (C) HEWLETT-PACKARD CO. 1984 /ADD ABORT :EDITOR.PUB.SYS.%3.%31 PROGRAM ERROR #24 :BOUNDS VIOLATION PROGRAM TERMINATED IN AN ERROR STATE. (CIERR 976) On the other hand, some programs (e.g., LISTEQ2, FREE2, QUERY) work OK. (The source for LISTDIR2 is SLDS002C.txt) For FCOPY, the bounds violation occurs here: 04.054124 000003 stack register A 000020, B 015554, C 015655 PB 056430, PL 065627, DL 035474, DB 036230, Q 054027, Z 056031 04.063034 020023 MOVE 3 ...so word count = %20, source = 054004-054023, dest = 054105-054124. But the calculated ending address is DB + C + RA, which is 054125, which is > SM, which causes the bounds violation. >>> Problem is the upper bounds check for MOVE is incorrect! It should be testing DB + C + RA - 1. For LISTDIR2, the bounds violation occurs here: 04.024571 000004 stack register A 000001, B 000000, C 003703, D 002675 PB 062630, PL 067467, DL 022474, DB 022630, Z 026631 04.065306 020043 MVB PB,3 ...so byte count = 1, source = PB + 0 (byte), dest = DB + 003703 (byte). This transfers the upper byte of PB + 0 to the lower byte of DB + 1741. But the ending address is calculated as PB - 1, which causes the bounds violation. >>> Problem is the upper bounds check for MVB is incorrect! + Running LISTDIR2.PUB.SYS and entering HELP at the prompt results in a system halt. The failure is a CS Absent trap for segment 1. >>> Problem is that cpu_setup_code_segment sets STA to the relative segment number (i.e., relative to CST or CSTX), not the absolute segment number. So when CSTX entry 1 (segment number 193) is referenced, and it's absent, cpu_segment_abort checks STA syshalts because the segment number in STA is 1, not 193. These failures must have been related, as they no longer occur with this fix: :EDITOR ABORT :EDITOR.PUB.SYS.%7.%433:SYSL.%34.%5 PROGRAM ERROR #24 :BOUNDS VIOLATION :FCOPY HP32212A.3.19 FILE COPIER (C) HEWLETT-PACKARD CO. 1984 ABORT :FCOPY.PUB.SYS.%0.%433 PROGRAM ERROR #17 :STT UNCALLABLE :RUN DBRECOV.PUB.SYS ABORT :DBRECOV.PUB.SYS.%5.%267:SYSL.%17.%10663 PROGRAM ERROR #24 :BOUNDS VIOLATION Of course, before the fix, the code segment number pushed on the stack for a PCAL was incorrect, so an EXIT was to the wrong code segment! + CI produces spurious errors for some commands, while others are OK. For example, HELLO and BYE are OK, but EDITOR says "ONLY ONE PARAMETER, THE LIST FILE, IS ALLOWED", REDO says "THIS COMMAND HAS NO PARAMETERS" when none are supplied, and RUN says "PARAMETER EXCEEDS 255 CHARACTERS" even when no parameter is supplied. Check which commands work (LISTF, SHOWME) and which don't? All of these messages are due to MYCOMMAND returning CCG or CCL. MYCOMMAND is in S70. The fact that all of the process stuff seems to work suggests that this is an error in a string-processing instruction, e.g., MVBW. When REDOing a command, there are a bunch of NULs sent after the command name, suggesting that a count is off somewhere. CI READ from the terminal is at line 21,252 of S51. Test: :RUN PARAMETER EXCEEDS 255 CHARACTERS 03.050476 STOR Q+22 is LENGTH := READ (*, -LEFT) (line 21,252 of S51). (length is 3, which is correct) PNTR(LENGTH) := %15 (21,308) --> string is now "RUN" at 04.000475-476. 03.051214 MVBW AS,0 is THEN MOVE PNTR := PNTR WHILE AS,0 (21,398); COMLEN := 3. IF LOGICAL(@PNTR) at 21,411 is true, so string is blank padded to "RUN " and LENGTH = 4. PNTR := %15; <> (21,421) pads with a CR, so string is "RUN ". GETIMAGE returns to line 21,848 (03.052762 LOAD DB+255). Label XEQIT is 21,861 (03.053007 ZERO,NOP). Calls COMSEARCH at 21,863; return is 03.053021 STOR Q+63 (value is 177777). Parameters are COMMAND, COMLEN, CAP, ACCESS, EXECPLABEL, and CAPERR; return is LOGICAL. Move into REDO buffer at 21,868 looks OK. At 21,888, EXECPLABEL = 110054, @PARMNUM = 000533, @ERRNUM = 000532, @PARMSP = 000005 03.053103 PCAL 0 is line 21,889, which is call to executor (CXRUN). CXRUN entry starts at 04.015132 LDNI 1 (line 10,200). MOVE SAVEDCOMIMAGE := BCOMIMAGE, (BCOMMANDBUFLEN) is 04.015142 MVB 3, which looks OK. SAVEDCOMIMAGE starts at 04.003233 and gets "RUN " and a bunch of NULs. Call to MYCOMMAND(PARMSP,,MAXPARAM,NUMPARMS,PARMS) at 10208 is 04.015153 PCAL 24. MYCOMMAND is in S70 and starts at 01.036534 LOAD P+72. (in MYCOMMAND) Parameters are COMIMAGE, DELIMITERS, MAXPARMS, NUMPARMS, PARMS, DICT, DEFN. TOS := (LIMS := CHEK (MYCOMMANDHANG, %107, %27217D, , %43)) (1282) is 01.036553 PCAL 64. Return is A 003145, B 000000. IF PARMCOUNT.(14:1) THEN (1289) is 01.036602 LOAD Q-4. PARMCOUNT = 000134. MYCOMMAND := 0 in 1306. SCAN PNTR WHILE CRBLANK, 1 (1311) is 01.036701 SCW 1 Line 1350-53 are: SCAN DELIMITERS UNTIL (%6400 LOR LOGICAL(PNTR)), 1; IF CARRY THEN << 1. CHARACTER IS CR, 2. CHARACTER IS SPECIAL OR BLANK. >> The string is ",=;" and the test and terminal bytes are both . HOWEVER, carry is not set after the SCU! >>> Problem is SCU clears carry if both bytes are the same, but microcode shows carry is set in this case (2213-2214)! ------------------------------- HP32002E.01.00 MPE Installation ------------------------------- + MPE boot gives "Unimplemented instruction, P: 165605 (FLT,NOP)" after setting the date. Code is: 01.165604: LSR #1 01.165605: FLT,NOP 01.165606: LOAD Q+16 01.165607: FLT,FDIV 01.165610: LDD DB+306 01.165611: FCMP,NOP 01.165612: BGE P+3,I The above is line 2118 of DIRC (S53S002C) in subroutine ZDISTRIBUTE: IF REAL (TOS & LSR(1)) / REAL (TEMP) < GOODPERCENT THEN [...] --> Floating point instructions are not implemented yet. Continuing with GO -B cause unimplemented instruction trap and eventually: **** SYSTEM FAILURE #310 STATUS %140047 DELTA-P %004725 SF #310 is from ABORTDMP/ABORT ("system process aborted"). >>> Boot up is successful with FP implemented! + MPE boot prints DATE (M/D/Y)? and then WED, NOV 1, 1972, 12:00 AM, as though CR had been entered, but it wasn't. PAUSes at trace lines 927 ('?'), 41764 (disc), 98299 (LF) Call to READ is at 00.156770 PCAL 70 (trace line 9672). Return from READ is at 00.156771 ZERO,XCH (trace line 13972). BUT...there are no calls on ATCD between those lines (next call is line 68492)! IN FACT, there are no I/O instructions either! Looks as though ATTACHIO is returning 0 for the read length and 000053 for the status (qualifying status = 5, general status = 3). IOTERM0 says 53 - NOT ON LINE, DATA SET NOT READY OR DISCONNECT --> Maybe something goes wrong in the console initialization that fails future ATTACHIOs? For instance, maybe it thinks the terminal is offline? Status is set here: 01.017165 LDI 53 [line 12219] ...which is: TOS := OFFLINE; [line 1424 of IOTERM0] --> Status occurs because DITP (00.003434 100400) does not have bit 1 set. It's tested here: TESTBIT UP'); [line 1412 of IOTERM0] 01.017131 TBC #1 ...and that enters the block at line 1416, where: IF NOT(1<=(FUNCTION-ALLOCATE+1)<=16) ... ...succeeds because FUNCTION is 0 (read) and ALLOCATE is 24. --> in S55S002C, procedure INITIO says: << INITIALISE THE I/O SYSTEM. CALLED BY PROGEN AS SOON >> << AS IT TAKES CONTROL IN TWO PHASES >> << - FLAG = TRUE : INIT CONSOLE AND DISC >> << - FLAG = FALSE: INIT ALL OTHER DEVICES >> PROCEDURE TERMINIT in S62S002C (NRIO) sets up diagnostic channels. calls INITCHANNEL calls DSETCONTROL to "SET CD & SA; MONITOR CC = 1" ==> NEEDS ATCC TO WORK??? --> No, TCI is never accessed (i.e., no I/O to DRT 8). It's something else. (DITP is described in 32002-90003 MPE-IV tables manual p333) Procedure TERMIOM (IOTERM0) clears DIT bit 1 at line 2454 (01.022535 TRBC #1). Request flag is 040000 => goes to label DISCNCTL. Branch to DISCNCTL is line 1444 (01.017211 BR P+1,X line 4900). DRQSTP.DISCNCT := 1 is line 1968. Subroutine SETUPCHANNEL (IOTERM0) sets DIT bit 1 at line 1328 (01.016763 TSBC #1 line 68286). READ [module S70S002C, line 1014; calls ATTACHIO at line 1045] ATTACHIO call is 01.037525 PCAL 67 [line 10351] ATTACHIO return is 00.047003 EXIT 11 [line 13760] Return code is 04 = "INVALID REQUEST, FUNCTION OR PARAMETER" Call is TOS := ATTACHIO(CONSOLE,0,0,0,24,0,TYPE,SPEED,1) CONSOLE is 20 (OK), TYPE = 0 (OK), and SPEED = %300 (192.), which is wrong! SPEED is determined from INITIAL.SPEEDSENSE. Code is: PROCEDURE SPEEDSENSE (S00) lines 5846-5850: BEGIN << GOT A CARRIAGE RETURN >> X := IF UNIT = 0 THEN 0 ELSE UNIT-15; BAUDRATE := BRPARAM(X); CONSPEED := SPEEDS(X); END; Machine code: 01.112232 041402 LOAD Q+2 01.112233 022000 CMPI 0 01.112234 141503 BNE P+3 01.112235 000600 ZERO,NOP 01.112236 140003 BR P+3 01.112241 004300 STAX,NOP --> X is 000000 01.112242 040567 LOAD P-167 01.112053 001420 program read 00.156274 000001 stack register A 001420 01.112243 046025 LOAD P+25,I,X 01.112270 177336 program read 01.111626 000005 program read 00.156274 000002 stack register A 000005, B 001420 01.112244 003243 XCH,STAX 00.156274 000002 stack register A 001420, B 000005 01.112244 003243 ---,STAX 00.156274 000001 stack register A 000005 --> X is 001420 01.112245 020321 PSTA 00.001420 000005 absolute write 00.156274 000000 stack registers empty 01.112246 040572 LOAD P-172 01.112054 001423 program read 00.156274 000001 stack register A 001423 01.112247 046022 LOAD P+22,I,X 01.112271 177351 program read 01.113262 004300 program read 00.156274 000002 stack register A 004300, B 001423 >>>>>> X is still 001420, so the wrong index is used for the SPEEDS array! 01.112250 003243 XCH,STAX 00.156274 000002 stack register A 001423, B 004300 01.112250 003243 ---,STAX 00.156274 000001 stack register A 004300 --> X is 001423 01.112251 020321 PSTA 00.001423 004300 absolute write 00.156274 000000 stack registers empty >>> This is a BUG in INITIAL; either the compiler is wrong here, or SPL is being used incorrectly (i.e., using X as the index variable). (This appears to be an incorrect use of SPL. The G.40.00 SPL compiler generates the same code as above.) The code is executed only if the auto-speedsense fails, e.g., if the terminal is not an HP 26xx that responds to ENQ-ACK handshaking. (Was a 2640 required for the system console? Seems so for the Series II and later.) >>> WORKAROUND is to replace 01.112247 LOAD P+22,I,X above with 021360 LDI 360. This sets speed to 240 cps always, but speed sensing is faked on ATCD anyway. This can be done after cold load and before INITIAL is run. --> Bug is MPE V/R SSB KPR Number 5000187104, "Foreign devices as SIII system consoles do not work correctly on V/R." There was a patch for this: The patch information for this SR is : FRI, JAN 18, 1991, 10:23 AM Patch ID/Version W003 A O.S VUF: E.01.00 + MPE boot sits at PAUS with no interrupts scheduled. Starting with end of INITIAL, there are a bunch of ZERO / SCLK instructions (in READCHAR, to effect a timeout on getting characters). However, the one at 00.154622 appears to be S09S002C.txt (PROGEN) line 4245 just before the call to INITDATETIME which should prompt for the date. But it never seems to return from PCAL 13 at 00.154623. 00.154604 MVB PB,3 is line 4225 (MOVE PROCNAME:="INITNLS "). 00.154616 PCAL 0 is line 4231 (ASSEMBLE(PCAL 0); << CALL INITNLS(FALSE); >>). These instructions: 00.154621: ZERO,NOP 00.154622: SCLK 00.154623: PCAL 13 00.154624: PCN ...seem to be PROGEN 4245-4252: TOS := 0; <> ASSEMBLE( SCLK ); IF CALLDEBUG.(14:1) THEN DEBUG; IF M'Mouse THEN INITMMDATETIME << We can set date and time automatically. >> ELSE INITDATETIME; <> CPUNUM; ...except that the CALLDEBUG and IF M'Mouse don't seem to be present. So the PCAL 13 seems to be the call to INITDATETIME. And 00.156764 PCAL 52 should be GENMSG(SYSSET,DATEQUES,,,,,,,,,,,%100000). But it never seems to return from this call; GENMSG is in S59S002C. The next I/O after the call is a CIO of 003711 to DRT3. --> this isn't trying to use the old clock/console card, is it? Message text is read in from disc into 02.067452 (among others). 02.047222 MFDS 0 moves message to 00.150313 (among others). 02.046702 MVB 3 moves 13-character message. 02.044673 MTDS 0 moves the 7-word message. Then: 00.063610 WIO 1 (170405) [send config] 00.063621 CIO 1 (000602) [store param] 00.063610 WIO 1 (120405) [receive config] 00.063621 CIO 1 (000602) [store param] 00.063610 WIO 1 (047577) [output SYNC] ==> no IRQ after this? 00.071144 PAUS 0 >>> Problem is the SYNC output should do completion interrupt, but doesn't! >>> Another problem is that it's trying to do ENQ/ACK, so terminal type is wrong. + Install dies after printing BANK 0 DEPENDENT MEMORY USED - 13924 Simulator halts with: >>CPU instr: 00.171426 020174 MFDS 4 >>CPU data: 00.000002 004574 absolute read >>CPU data: 00.004574 000307 absolute read >>CPU data: 00.004664 100167 absolute read >>CPU data: 00.004664 120167 absolute write >>CPU instr: 00.171427 000000 absent data segment trap >>CPU instr: 00.171427 140002 simulation stop: Command not completed The absent data segment trap handler is not implemented (no CPU diag test for it). It appears that the absent bit is intentionally set: >>CPU instr: 07.126516 030002 LST 2 >>CPU data: 00.001002 003574 absolute read >>CPU data: 00.004664 000167 absolute read >>CPU stack: 06.176265 000003 stack register A 000167, B 153331, C 000006 >>CPU fetch: 07.126520 002000 instruction fetch >>CPU instr: 07.126517 040007 LOAD P+7 >>CPU data: 07.126526 100000 program read >>CPU stack: 06.176265 000004 stack register A 100000, B 000167, C 153331, D 000006 >>CPU fetch: 07.126521 030322 instruction fetch >>CPU instr: 07.126520 002000 ADD,NOP >>CPU stack: 06.176265 000003 stack register A 100167, B 153331, C 000006 >>CPU fetch: 07.126522 041401 instruction fetch >>CPU instr: 07.126521 030322 SST 2 >>CPU data: 00.001002 003574 absolute read >>CPU data: 00.004664 100167 absolute write >>CPU stack: 06.176265 000002 stack register A 153331, B 000006 >>CPU fetch: 07.126523 031050 instruction fetch This routine is present in INITIAL and called a number of times: <<----------------------------------->> << MAKE DATA SEGMENT NON-PRESENT >> <<----------------------------------->> PROCEDURE ABSENT(DSTN, MAXSIZE, DISCADR); ...so it all looks legit. The final line of INITIAL/MAINSEG4 is executed: ASSEMBLE( DISP ); << HELLO MPE >> This executes as: >>CPU instr: 07.176607 030062 DISP Failure is about 185,000 debug lines later. >>> Problem is that the absent data segment trap is not implemented. + Install dies after printing the load map with HALT 1 and RA = %21 (called STT'VIOL). The load map prints, and then the HALT occurs. STT violation trap occurs on 07.007426 PCAL 37. Label is 100423 (external, STT 1, segment 23). CST is 00.173714 (041666, ?, 000007, 170437). STT length word is 07.177766 (041605) --> this looks wrong; length word is 0U000000 for the top 8 bits! STT label word is 07.177765 (140003) -> external label -> STT violation STT for segment 23 is read from disc at 3/4/35 3/4/35 is written to disc (from 00.145461 for 3800 words) after reading from tape write 3800 words: >>SEL data: 00.145561 016055 dma read >>SEL data: 00.145562 020104 dma read >>SEL data: 00.145563 042523 dma read ... >>SEL data: 00.150756 141220 dma read >>SEL data: 00.150757 041437 dma read >>SEL data: 00.150760 023405 dma read Positions to 4/0/0 during transfer >>SEL data: 00.150761 022402 dma read >>SEL data: 00.150762 004300 dma read >>SEL data: 00.150763 047007 dma read ... >>SEL data: 00.155106 006176 dma read >>SEL data: 00.155107 000035 dma read >>SEL data: 00.155110 040046 dma read This looks OK from the tape read. It is also what is present in the disc image file. Later, 3/4/35 is read from disc to memory (to 07.170437 for 3800 words). read of 3800 words: >>SEL data: 07.170437 016055 dma write >>SEL data: 07.170440 020104 dma write >>SEL data: 07.170441 042523 dma write ... >>SEL data: 07.173634 141220 dma write >>SEL data: 07.173635 041437 dma write >>SEL data: 07.173636 023405 dma write Positions to 4/0/0 during transfer >>SEL data: 07.173637 000000 dma write [wrong!] >>SEL data: 07.173640 000000 dma write [wrong!] >>SEL data: 07.173641 000000 dma write [wrong!] ... >>SEL data: 07.177764 041605 dma write [wrong!] >>SEL data: 07.177765 140003 dma write [wrong!] >>SEL data: 07.177766 041605 dma write [wrong!] The load is OK until the auto-seek, after which the data read is incorrect. This leads to the bad STT values being present. >>> The problem is that the auto-seek on read does not refill the buffer! The issue is that start_read() is called TWICE in the 1000 implementation, because the auto-seek reschedules the start phase, which calls start_read() a second time when the seek completes. This reads the file data into the buffer. However, the 3000 implementation waits on the data phase (it shouldn't)! >>> SAME PROBLEM WITH AUTO-SEEKS ON WRITES (except that the problem is start_write). + Install dies after asking # OF SECTORS PER SPOOLFILE EXTENT = 384.? Execution calls segment 7. SIO 1 executed at 07.12260 Read of cylinder 3 head 2 sector 23 1872 words OK Next SIO tries to start on DRT 0! Section starts loading records from MS and writing to DS. Operates in split-stack mode (tested?). Stack markers seem to get overwritten, or maybe Q is reset? INITIAL is in file MPE_Source\MPE-UMIT\HP32002\S00S002C.txt INITIAL is moved from low core to last memory bank; does something not quite work here? (see MOVE'INITIAL) MABS 5 at 07.012750 moves 00.120112 [DL] - 00.157754 [Z] to 06.140125 [new DL] - 06.177767 [new Z] for 037643 words (appears to copy stack OK) SETR SBANK,DB,DL,Z,Q,S at 07.013405 is in subroutine LAUNCH that transfers control to the relocated INITIAL (from PBANK 1 to PBANK 7 and from SBANK 0 to SBANK 6). -> looks like this works OK. stack markers seems to be identical. Stack is split at 07.025143 SETR DB (DBANK := 0, DB := 173272). -> the 7905/06/20/25 disc driver (MH7905) switches DBANK/DB from its current value to 0/SIOPROG. splits at 07.025143 (SETR DB) (DBANK = 0, DB = 173272) ok to >>CPU instr: 07.026265 151414 LDD Q+14 (INITIAL source line 31730) ==> at 07.026266 XCHD; A 145073, B 000006 { mpedebug.log line 1,560,150 } after: A 173272, B 000000 BUT: 07.072703 LOAD DB+125 { mpedebug.log line 1,560,267 } loads from 00.145220 (should be 06.145220) !!! >>> problem is XCHD instruction exchanges RA and DB BUT DOES NOT EXCHANGE RB AND DBANK!!! + # OF SECTORS PER SPOOLFILE EXTENT = 384.? DISC WRITE ERR ON LDEV #1 STATUS=%11411 ADDR=%1701 Problem is disc write is overruning the sector buffer; index >>DS cmds: write strobe 1703 >>DS cmds: Controller (busy) received data 031016 with flags DTRDY >>DS cmds: Unit 0 Write data phase controller entry >>DS cmds: Controller (busy) returned data 000000 with functions IFOUT >>DS cmds: Controller (busy) received data 041271 with flags CMRDY | DTRDY | EOD CMRDY is only set by PCONTSTB, and PCONTSTB records a DEB_CMDS line. However, "CMRDY | DTRDY | EOD" is 000016, and 031016 was the last word received! This occurs 39 words into the first sector after an auto-seek during writing! >>> The buffer index is never reset to 0 if a write crosses a track boundary! The issue is that start_write() is called TWICE in the 1000 implementation, because the auto-seek reschedules the start phase, which calls start_write() a second time when the seek completes. This time, the index is set to 0. However, the 3000 implementation waits on the data phase (it shouldn't)! SEL gets Seek, then SET BANK 6, then WRITE for 2 words from address 176461, then does dma read for Seek params from 06.1576461 and 06.1576462 !!! Next, SEL gets Read, then SET BANK 6, then READ for 128 words from address 147736, then does dma write for Read data to 06.1547736, etc. !!! This overwrites code in bank 7 where the program is executing! >>> NO, THE PRINT ROUTINE IN PORT_READ_MEMORY IS WRONG (prints address instead of offset). + MPE boot attempt 4:15 PM 8/14/14: HP32002E.01.00 WHICH OPTION ? RELOAD WHICH OPTION ? SPREAD ANY CHANGES? NO STACK MARKER TRACE 1421 1727 100003 15 RESIDENT 7 1673 100403 15 RESIDENT 1262 1555 101003 15 RESIDENT 1262 3315 101004 103 MAINSEG1 0 13 100002 4 BOOTSTRAP 0 0 0 0 ERROR #1 NON-RESPONDING DEVICE CONTROLLER DRT 4 Programmed halt, CIR: 030364 (HALT 4), P: 177231 (RSW) >>> does a TIO on DEVNO 4, which fails because DS isn't implemented yet. ----------------------------------------------------------- CARTRIDGE DISC (HP 30129A) DIAGNOSTIC OFF-LINE (D419A.01.4) ----------------------------------------------------------- + Step 66 TOTAL INTERRUPTS 02 SHOULD BE 10 Writes a bad sector and then reads it; should cause a retry! Seek 1/1/42, rfs 138, adrec 1/1/42, wfs 138, fmsk 007600, adrec 1/1/42, read 128 Channel loaded IOCW 040000 (Control) from address 102757 Channel loaded IOAW 007600 (Set File Mask) from address 102760 Channel loaded IOCW 040000 (Control) from address 102761 Channel loaded IOAW 006000 (Address Record) from address 102762 Channel loaded IOCW 067776 Write from address 102763 Channel loaded IOAW 045765 from address 102764 Channel loaded IOCW 020000 (Interrupt) from address 102765 Channel loaded IOAW 177777 from address 102766 Channel loaded IOCW 040000 (Control) from address 102767 Channel loaded IOAW 002400 (Read) from address 102770 Channel loaded IOCW 077600 Read from address 102771 Channel loaded IOAW 046120 from address 102772 Channel loaded IOCW 004000 (Conditional Jump) from address 102773 Channel loaded IOAW 102761 from address 102774 Channel loaded IOCW 034000 (End with Interrupt) from address 102775 Channel loaded IOAW 177777 from address 102776 >>> Problem is that data errors never occur, so retries never occur. + Step 77 CONDITIONAL JUMP WAS NOT EXECUTED seek 4/0/37, adrec 4/0/37, rfs 138, adrec 4/0/37, wfs 138 sfmsk 007423 (retry 1), adrec 4/0/37, read 128 (corr data error) [SETJMP succeeds] sfmsk 007443 (retry 2), adrec 4/0/37, write 128 (normal completion) [SETJMP fails] DEVEND sets JMPMET FF DVEND signal for uncorrectable/correctable/overrun errors. Channel loaded IOCW 040000 (Control) from address 102757 Channel loaded IOAW 007423 (Set File Mask) from address 102760 Channel loaded IOCW 040000 (Control) from address 102761 Channel loaded IOAW 006000 (Address Record) from address 102762 Channel loaded IOCW 067776 Write from address 102763 Channel loaded IOAW 045765 from address 102764 Channel loaded IOCW 040000 (Control) from address 102765 Channel loaded IOAW 002400 (Read) from address 102766 Channel loaded IOCW 077600 Read from address 102767 Channel loaded IOAW 046120 from address 102770 Channel loaded IOCW 004000 (Conditional Jump) from address 102771 Channel loaded IOAW 102775 from address 102772 Channel loaded IOCW 050000 (Sense) from address 102773 Channel loaded IOAW 177777 from address 102774 Channel loaded IOCW 034000 (End with Interrupt) from address 102775 Channel loaded IOAW 177777 from address 102776 >>> Problem is that data errors never occur, so DEVEND never occurs. + Step 54 does six chained reads of one word and expects OVRUN at end of sector i.e., at end of chain. But it looks like PREADSTBs and IFINs are interleaved 1:1. [1140] Read from cylinder 16 head 1 sector 10 [1141] Read of 7 words (1 sector) Same step same problem with writes. [1155] Write to cylinder 16 head 1 sector 10 [1156] Write of 6 words (1 sector) Disc synchronous data rate = 2.133 usec/word SEL channel burst transfer rate = 700 nsec/word Memory cycle time = 700 nsec/word >>> Problem is IOCW and IOAW fetches do not take any cycles! With a fetch taking a read cycle time, OVRUN is asserted, but DVEND is not returned! + Step 20 Request Status after heads unloaded has drive attention bit set but expects it to be clear. Ditto on heads loaded. Should be Normal Completion on unload and Drive Attention on load. poll_drives would have been called on next interface call to detect this >>> Works with poll, but diag depends on slow user response to let controller time out from wait loop! Automated CR response enters before timeout, so controller in wait, so attention still set on drive!! Fix: wait for manual CR, or STEP 1500000 to wait for the timeout before automatically sending CR. Note: End command doesn't seem to be used on the 3000 (diag or MPE)! + Step 19 write with protect ends with Status 2 error and STINT on head 2 but expects Normal Completion because only PROTECT UPPER should be set. Also, read fails because READ ONLY bit in status 2 is set. >>> This requires separate PROTECT UPPER and PROTECT LOWER switches for 7905. + Step 19 write with protect ends with Status 2 error and STINT on Control order but expects abort to occur on Write order. Ditto for Initialize. Ditto for FORMAT on vs. PROTECT on. >>> Idle_Phase must do IFGTC with BUSY + work better if props.functions[] was array indexed by PHASE? wouldn't need to worry about incrementing, and end_command could always use End_Phase entry any case where same phase produces different functions (except for end_command)? + maybe eliminate end_command return and instead key off of cvptr->state != Busy_State to determine if a command ended. If so, then set functions to End_State, and if WRTIO is present, set data to status_1 (would have to alter Load_TIO_Register to set TIO in a separate state before end) - start_command must never call end_command directly; only via End_Phase - End_Phase just calls end_command with status already set + step 92 fails if section 0+6 executed alone (00.022354 LDI 134) >>> test mode must preset device_sr + step 60 read past EOC with no auto-seek ends with EOC in start_read but status not returned (is void) >>> actually is position_sector that returns EOC + step 59 auto-seek causes read/wait phase to be executed twice, incrementing fnindex twice cause is start_read doing an auto-seek (works OK if start_read in intersector phase auto-seeks) + step 47 read with auto-seek fails because intersector phase -> wait phase which steps fnindex and so sends end functions + illegal opcode reports: E21 37 SIO PROGRAM BEFORE - AFTER EXECUTION 00 040000 040000 EXIT ---> 01 013400 013400 SHOULD BE (03) 02 034000 034000 03 177777 177777 does STINT, which causes REQ to channel, so this seems to be correct >>> hardware IFGTC to STINT is 40 microinstructions (8 usec), so channel completes before abort is sent; must schedule error completion + request sector address uptr is null -> status-2 (offline) >> unit_access should be TRUE + Read command EOT is not handled correctly. EOD is not picked up at the end of the sector, leading to start of next sector. >>> PREADSTB with TOGGLEINXFER | EOT must set EOD flag after calling controller (PREADSTB causes DTRDY deny, which clocks eod_2) + illegal opcode does not set SR; causes SR timeout >>> should do IFGTC along with STINT | WRTIO | FREE, but doesn't (IFGTC does SR) + Diag does: 040000 Control 001000 Seek 067776 Write 045765 2 parameters (CHS) 040000 Control 003400 Verify 067777 Write 041316 1 parameter (sector count = 48) 034000 End with Interrupt 177777 Channel program ends while Verify is still active. Diag then does: 040001 Control 001400 Request Status 077776 Read 041746 2 parameters (status 1/2) 034000 End with Interrupt 177777 ...and expects command to execute after Verify completes. But Verify response of RQSRV | STDFL | WRTIO | FREE replies to PCONTSTB, so Request Status response of BUSY | IFGTC replies to TOGGLEINXFER, so IFIN isn't done before PREADSTB, leading to the wrong values being retrieved. NOTE: IFIN and IFOUT set device SR unless EOD2 is set. Does this prevent SR on last transfer? Seems as though it should, given that RQSRV follows, which should not be needed if last IFIN/OUT set SR. Commands so far would work, and Verify would then work properly (channel program doesn't continue until verify complete, so a succeeding conditional jump would also work)! Additionally, paragraph 3-78 in the disc interface manual says: At the end of the...transfer, an EOT signal from the selector channel along with a TOGGLEOUTXFER accompanies the last PWRITESTB. The EOT signal again sets the EOD1 flip-flop directly, the TOGGLEOUTXFER signal clocks the EOD2 flip-flop. --> Shortly thereafter, a data out pulse will again return the Data Ready flip-flop to its cleared state. This transition sets the EOD3 flip-flop, which provides the EOD flag to the disc controller. As IFIN/INOUT are inhibited from setting device SR if EOD2 is set, this implies that the channel program waits until the following RQSRV before continuing! + Anything that calls end_command must propagate a CNTLR_IFN_IBUS return value to pick up the WRTIO performed within end_command. These are: - start_command - continue_command - t_stat start_read - void end_read - void start_write - t_stat end_write - t_bool position_sector - t_bool start_seek + an error return from continue_command should output the functions specified by end_command IN LIEU OF the next expected functions! + "Error entry into poll in idle state" after seek completion + initialize 6144 words is done as chained write 4096 + 2048 PWRITESTB | EOT occurs at end of 1st write ...but that sets EOD, and write ends! >>> EOD should set only if EOT occurs with TOGGLEnXFER! + Seek sets controller to Idle_State after initiation. When dl_controller is entered to process seek completion, continue_command isn't called. Instead, drives are polled for attention, but attention hasn't been set because that's done in continue_command (end_phase). -------------------------------------------- HP 30115A 9-TRACK MAGNETIC TAPE (D433A.01.4) -------------------------------------------- + (MTB) E274 STEP-0424 COMP. AND READ CRCC ARE DIFFER. E116 STEP-0424 EXPECT.- OBTAIN. CRCC 000200 100354 >>> Problem is that RDC does not return the correct CRCC (value is arbitrary). + (MTB) E111 STEP-0444 BEFORE - AFTER SIO PROGRAM: 056774 00 040000 040000 01 000015 000015 02 040000 040000 03 000014 000014 04 067770 067770 05 052461 052461 06 050000 050000 07 177777 000456 08 034000 034000 09 177777 020456 ---> EXIT(05) E223 ACT. DS 1 000 000 100 101 110 STEP 0444 EXPEC.DS D DD0 0DD DDD DD1 010 >>> This is a new error, introduced at some point between 11-2-14 and 9-25-15. Diag: "Issues WRM and WRZ command to generate a report of 16 zero characters and checks for 'tape error' obtained in DS." Had removed error check for a write of a null record; fixed by restoring. + (MTE12) P029 LOAD TAPE(RING), PUSH OFF,RESET, ON-LINE AND P056 TYPE SELECTED DRIVE ? sim> detach ms3 sim> attach ms0 scratch.tape sim> set ms0 offline 0 P038 PUSH DRIVE 0 BUTTON & TYPE RESPONSE 'CR' sim> set ms0 online sim> go E276 STEP-1232 ELAPSED TIME IS OVER LIMIT AND TAPE (DRIVE 0) IS NOT READY pressing "0" should generate an offline->online interrupt >>> Problem is that SET MSn ONLINE does not set UNIT INTERRUPT if unit was offline (this doesn't occur on 1000, so it should be handled in the 3000 controller) (but SET ONLINE calls ml_set_online, so 3000 controller never knows this!) (note: attach should also gen IRQ, as it goes not ready->ready) (3000 intercepts attach -- but not online -- via DEVICE, so we can handle this) Fixed by walking the modifier table in ms_reset (power on) to change validation routine for UNIT_OFFLINE to ms_set_online + (MTE12) P039 CHECK LIGHT RESET, PUSH ON-LINE AND RESPOND sim> set ms0 online sim> go E276 STEP-1232 ELAPSED TIME IS OVER LIMIT AND TAPE (DRIVE 0) SIO NOT OK offline->online interrupt request is not occurring >>> Problem is Rewind Offline is not asserting SR, so channel stalls. Fixed by having Rewind Offline assert SR at start phase. + (MTE12) P030 CHECK LIGHT WRITE-ENABLE, PUSH DRIVE 1 AND TYPE RESPONSE 'CR' sim> detach ms0 sim> attach ms1 scratch.tape sim> go E276 STEP-1220 ELAPSED TIME IS OVER LIMIT AND TAPE (DRIVE 1) IS NOT READY attaching is equivalent to loading a reel, pressing LOAD, and then pressing ONLINE going from offline to online generates a unit interrupt >>> Problem is attach is not doing IRQ; fixed by adding IRQ code. + E348 STEP-0532 (0533) TIMING ERROR AFTER < 12 SENSE diag issues Read (Write) Record and then chained 64-word reads (writes) for a 4288-word record after 18 reads (writes) of 64 words (1152 words), issues chained 12 Sense commands and then 4 more reads (writes) problem is Senses should not result in a Timing Error because data is pulled fast enough from drive for 7970B, data delay is 22 instructions, and 12 Sense commands take 36 instructions (for mpx states C-A-B), so next word is ready before they complete but data reg is empty (full) when tape service entered 1st time, so that's OK It's only if it's entered a second time (as with 7970E test) that overrun should occur 45 ips * 800 bpi = 36000 bytes/second = 27.8 usec/byte = 55.6 usec/word typ. 2.5 usec/instruction = 11.1 instructions/byte = 22.2 instructions/word for write, does: - last chained WRITESTB to set data flag - 7 chained sense (21 states) - service, clears data flag - 5 chained sense (15 states) - next chained write (3 states) - C : ACKSR | DEVNODB - A : (no signals) - B : TOGGLESR - SR denied, so everything stops until service, which sets Underrun and SR TOGGLESR denies channel_sr, but device_sr should still be set from data flag clear above! --> it is, but ACKSR in state C of each chained sense resets device_sr!!! (ACKSR aysnc clears device_sr_ff and dev_end_ff) - ACKSR = Address_Transfer * ((State_A + State_C) * (Read_Order + Write_Order) * ~Chained_Order + (State_B + State_C) * Control_Order + State_D * (Read_Order + Write_Order)) --> SO no ACKSR on SENSE in State C!!! But debug.log shows it is! mpx: for ACKSR, state A is wrong, state B is OK, state C is wrong, state D is OK. >>> Fixed ACKSR generation, and diag problem is fixed. + (MT12) diag loops infinitely in step 1230 diag writes gaps from BOT to EOT then rewinds to 0 then does FSR and a master reset after a delay (about 18 seconds real time) old version start phase delay = 10, so start phase entered, so not at BOT when reset new version start phase delay = 252800, so start phase not entered, so still at BOT when reset bot time = 632 msec; too long? set to 51 msec (correct), but start phase still not entered reset occurs with 44692 of 51112 ticks remaining 6420 ticks elapsed = 16 msec, so should be in IDB! diag retries forever because load point is still set! > at LP, diag does FSR with MR after a few msec. > expects tape position to move continually forward until EOT > then does BSR with MR after a few msec. > expects tape position to move continually backward until BOT + (MT12) with tape consisting of 600 feet of gap, sim_tape_sprecr takes 8 seconds because there's no runaway stop, so it has to backspace through 12 MB of gaps! meaning...it's going to take FOREVER to backspace to BOT 400 bytes at a time! (approx. 57 hours!!!) for 7970E: > with original code, 1 MB takes 10 minutes > with 32-element buffered reverse gap, 1 MB takes 45 seconds > with 32-element buffered forward and reverse gap, 1 MB takes 31 seconds > with original code, 600 feet takes 120:00 > with 32-element buffered forward and reverse gap, 600 feet takes 13:06 > with 128-element buffered forward and reverse gap, 600 feet takes 3:57 > with 256-element buffered forward and reverse gap, 600 feet takes 3:23 (3:04 with no debugging) > with 512-element buffered forward and reverse gap, 600 feet takes 3:06 > with 1024-element buffered forward and reverse gap, 600 feet takes 2:51 + (XMTE) E222 STEP-0231 NO INTERRUPT AFTER SIO issues control (Read) then control (Backspace Record) and checks for reject for no read order + (XMTE) E276 STEP-0256 ELAPSED TIME IS OVER LIMIT AND TAPE (DRIVE 0) IS NOT READY issues Read Record, then end with interrupt (i.e., with no read) then issues Select Unit 0, which causes Command Reject then issues Select Unit 0, which causes Command Reject (but should not!) note: after 10 occurrences of 100000 (master reset) old: >>MS cmds: Unit 0 Write File Mark tape command started at position 5628 new: >>MS cmds: Unit 0 Write File Mark tape command started at position 5638 [+10] + D111 STEP-0267 BEFORE - AFTER SIO PROGRAM: 062047 00 040000 040000 01 000000 000000 02 050000 050000 03 177777 000516 ---> EXIT(03) E223 ACT. DS 0 000 000 001 001 110 STEP 0267 EXPEC.DS D D00 0DD 10D DD1 110 position is 554,400 = 866250 old does 800,878 delay new does 10+17556+858,460+17556 = 893,582 >>> Problem is that we lose "ready" status as soon as rewind init scheduled but we shouldn't until the start phase initiates movement. note: missing "writing" status in a number of places (new) because SELn now clears it. + E223 ACT. DS 1 111 111 111 111 111 STEP 0515 EXPEC.DS D DD0 0DD DDD DD1 110 after DEVEND, issues read which causes reject then issues select unit 0, which causes reject (but should not!) someone sets PHASE to End_Phase after it's been set to Idle_Phase at the end of Read Record! + D111 STEP-0640 BEFORE - AFTER SIO PROGRAM: 056774 00 040000 040000 01 000000 000000 02 050000 050000 03 177777 002516 04 034000 034000 05 177777 022516 ---> EXIT(05) E223 ACT. DS 0 010 010 101 001 110 STEP 0640 EXPEC.DS D DD0 00D DDD DD1 110 >>> Problem is reset occurs before backspace record motion starts issue is IR time is too long; should only be 9.7 msec clock ticks: 88 (wfm)+88 (bsf)+25 (bsr) 200 msec is TOTAL time for WFM+BSF+BSR note: we're not eliminating end_phase motion time for overlap! that's because SR isn't set until end phase complete but it should be set (by ms) when end phase is STARTED! (drive will stop between WFM [fwd] and BSF [rev] but not between BSF [rev] and BSR [rev]) + E245 STEP-0642 NO EOT APPEARED E223 ACT. DS 1 000 000 101 011 110 STEP 0642 EXPEC.DS D DD0 01D DDD D11 110 >>> Problem is reposition detection is not working. + E111 STEP-0640 BEFORE - AFTER SIO PROGRAM: 056774 00 040000 040000 01 000015 000015 02 040000 040000 03 000013 000013 04 040000 040000 05 000012 000012 06 034000 034000 07 020516 020516 ---> EXIT(05) E245 STEP-0642 NO EOT APPEARED D111 STEP-0642 BEFORE - AFTER SIO PROGRAM: 056774 00 040000 040000 01 000017 000017 02 050000 050000 03 000000 000536 04 040000 040000 05 000013 000013 06 040000 040000 07 000013 000013 08 040000 040000 09 000017 000017 10 050000 050000 11 177777 000536 12 034000 034000 13 177777 020536 ---> EXIT(13) E223 ACT. DS 1 000 000 101 011 110 STEP 0642 EXPEC.DS D DD0 01D DDD D11 110 P012 STEP-0642 PAUSE ON ERROR Diag writes gaps to EOT, then does WFM, BSF, and BSR. During BSR, waits 200 msec, then does master reset to stop tape motion and does WFM. Then does FSF and BSF and checks that BOT is seen on forward motion. (Setup is a long gap across BOT that is bounded on either end by a tape mark.) >>> Problem is that BSR over full-tape gap takes one instruction, so 200 msec. abort is too late to prevent tape from going all the way back to start of record at beginning of tape (well, at pos 13004). Tape mark here corrupts the record. This used to pass because we used to do tape runaway during backspace record but now we don't! + (MTB) running sections 1 and 2 together w/7970B causes failure in step 270 >>> Problem is backspace over erase gap that contains a 1/2-gap shift in the middle thus: FE FF FF FF ( FF FF ) FE FF FF FF ... reading backward from 2nd gap toward 1st reads FF FF FF FF, which is EOM! Gap overwrites some gap and then some records, including one of 126 bytes. That requires the gap to lengthen by 2 bytes, which dictates starting the gap with an 0xFFFF half-gap. Problem occurs when a gap starts at the end of a previous gap, AND the space to overlay requires the addition of a half-gap to sync. The resulting long gap can be spaced through forward but not reversed (as it sees the 0xFFFFFFFF as an EOM marker). Fixed by treating 0xFFFFFFFF in reverse as half-gap (can't be EOM because library won't position beyond an EOM). + E223 ACT. DS 1 000 000 100 101 110 STEP 0150 EXPEC.DS 1 0D0 0DD 1DD 1D1 010 >>> Problem is that Write Record Without Parity should result in a Tape Error from the read-after-write parity check, but it doesn't. Fixed by setting Tape Error in the end phase of Write Record Without Parity + E223 ACT. DS 1 000 000 100 001 110 STEP 0211 EXPEC.DS D 1D0 0DD 1DD DD1 010 diag is expecting Odd Count and Tape Error >>> Problem is Write Record Without Parity does not mark record as bad. Fixed by adding MTR_ERF to record length for Write Record Without Parity. diag does WFM, WRZ (1 word: BC 00), BSR, RDR (n.b., NOT RDC) and expects Odd Count to be set because a zero byte will not be seen by the read circuits (with zero parity too, there will be no NRZI flux change) >>> fixed by omitting all zeros bytes from tape and byte count + E222 STEP-0245 NO INTERRUPT AFTER SIO diag does RDC of 66 words on 64-word (+ 1 for CRCC) record >>> Problem is device_end FF is set, but last word has ACKSR | TOGGLEINXFER | PREADSTB | EOT and TOGGLEINXFER clears device_end FF before PREADSTB can return DEVEND. Fixed by rearranging INBOUND_SIGNALS so that TOGGLEnXFER come after PnSTBs and before READNEXTWD + E223 ACT. DS 1 000 000 100 011 110 STEP 0275 EXPEC.DS D DD0 0DD DDD DD0 110 >>>> Problem is bpi is currently hardcoded at 1600 in sim_tape.c. Fixed by adding a new sim_tape_set_dens function. + E223 ACT. DS 1 100 000 100 001 010 STEP 0415 EXPEC.DS D 0D0 0DD DDD DD1 010 >>> Problem is that Odd Length is set even when read is terminated early. Fixed by testing for short reads in End Phase. + E223 ACT. DS 1 000 000 110 001 110 STEP 0142 EXPEC.DS 0 D10 0DD 1DD DD1 110 >>MPX pio: Channel SR 3 loaded IOCW 034000 (End with Interrupt) from address 056776 >>MPX state: Channel SR 3 entered State B >>MS iobus: Received data 000000 with signals TOGGLESR | TOGGLESIOOK | PSTATSTB | SETINT >>IOP irq: Device number 6 asserted INTREQ at priority 11 >>MPX sr: Device number 6 denied SR 3 >>MS iobus: Returned data 100616 with signals JMPMET >>> Problem is that TOGGLESIOOK is processed before PSTATSTB, so SIO OK is still set when the status is returned. Same problem occurs with SETINT, except that it occurs after PSTATSTB. + E222 STEP-0144 NO INTERRUPT AFTER SIO >>> Problem is missing SR assertion after WFM completes. + E223 ACT. DS 1 000 000 110 011 110 STEP 0144 EXPEC.DS 1 DD0 0DD 1DD 111 110 >>> Problem is "write status" is missing because ML_WRSTAT is not set in the tape status by the tape library in ml_start_command. SW = SOLB * write_permit (SOLB is "selected and online") write_permit := WSW write_permit.clock = FWD write_permit.reset = HSFWD + HSREV + REV + ~READY + ~WREN (WREN is write ring in place) + E222 STEP-0147 NO INTERRUPT AFTER SIO >>> Problem is missing SR assertion after GAP completes. + (step 150) Internal error, P: 041763 (----,DCMP) >>MS serv: Unit 0 Write Record Without Parity start phase service entered >>MS cmds: Unit 0 Write Record Without Parity tape command started at position 11360 >>MS serv: Unit 0 Write Record Without Parity start phase service not handled >>> Fixed by treating WRZ as WRR. + E222 STEP-0154 NO INTERRUPT AFTER SIO >>> Problem is missing SR assertions after commands complete. + E222 STEP-0156 NO INTERRUPT AFTER SIO >>MS cmds: Unit 0 Write File Mark tape command started at position 11388 >>MS cmds: Unit 0 Write File Mark tape command completed at position 11392 >>MS cmds: Unit 0 Backspace File tape command started at position 11392 >>MS xfer: Unit 0 spaced over 0-word file >>MS cmds: Unit 0 Backspace File tape command completed at position 11392 >>> Problem is BSF is not backspacing over file mark due to incorrect tapelib BSF test. + (step 157) Internal error, P: 041766 (LDI 1) >>MS serv: Unit 0 Read Record With CRCC start phase service entered >>MS serv: Unit 0 Read Record With CRCC start phase service not handled >>> Fixed by treating RDC as RDR. + E223 ACT. DS 1 111 111 111 010 001 STEP 0166 EXPEC.DS 1 DD0 0DD 1DD DD1 110 after SEL 0 / write / BSR / read: >>MS iobus: Received data 000000 with signals DSTATSTB >>MS iobus: Returned data 177721 with signals JMPMET problem is with REW: >>MS iobus: Received data 000000 with signals DSTATSTB >>MS iobus: Returned data 100636 with signals JMPMET >>MS serv: Unit 0 Rewind end phase service entered >>MS cmds: Unit 0 Rewind tape command completed at position 0 >>MS iobus: Received data 000000 with signals DSTATSTB >>MS iobus: Returned data 177761 with signals JMPMET >>> Problem is tapelib status was set to ~REWIND, not masked to ~REWIND. + E223 ACT. DS 0 010 000 110 011 110 STEP 0170 EXPEC.DS D D10 0DD 0D1 DD1 110 (ready is 1, density is 0) does a bunch of GAPs, then a SEL 0 / REW (from position 162432; delay = 1000), then End >>MS iobus: Received data 000000 with signals TOGGLESR | SETINT | PSTATSTB | TOGGLESIOOK >>MS iobus: Returned data 020636 with signals JMPMET >>MS iobus: Received data 000000 with signals DSTATSTB >>MS iobus: Returned data 100636 with signals JMPMET >>MS serv: Unit 0 Rewind end phase service entered >>MS cmds: Unit 0 Rewind tape command completed at position 0 >>IOP irq: Device number 6 asserted INTREQ at priority 11 >>MS iobus: Received data 000000 with signals INTPOLLIN >>MS iobus: Returned data 000006 with signals INTACK | JMPMET >>MS iobus: Received data 000000 with signals DSTATSTB >>MS iobus: Returned data 100636 with signals JMPMET >>> Problem is that ML_DEN1600 never set and ML_UNITRDY not cleared during rewind. + E223 ACT. DS 0 010 000 111 101 110 STEP 0213 EXPEC.DS D DD0 00D D0D DD1 110 >>> Problem is that load point status is not cleared when pos > 0. + diag appears to hang in step 213 >>> Problem is EOT never sets with tape set for unlimited capacity. + E276 STEP-0213 ELAPSED TIME IS OVER LIMIT AND TAPE (DRIVE 0) NO INTERRUPT >>> 1st problem is timing is not correct for gap and rewind to EOT. 2nd problem is rewind compl not setting unit interrupt in status return (status is 100716, should be 100700). + E222 STEP-0230 NO INTERRUPT AFTER SIO P012 STEP-0230 PAUSE ON ERROR >>MPX pio: Channel SR 3 loaded IOCW 040000 (Control) from address 056774 >>MPX pio: Channel SR 3 loaded IOAW 000000 from address 056775 >>MS cmds: Select Unit 0 tape command completed >>MPX pio: Channel SR 3 loaded IOCW 040000 (Control) from address 056776 >>MPX pio: Channel SR 3 loaded IOAW 000015 from address 056777 >>MS cmds: Unit 0 Write File Mark tape command started at position 5612 >>MS cmds: Unit 0 Write File Mark tape command completed at position 5616 >>MPX pio: Channel SR 3 loaded IOCW 067766 (Write) from address 057000 >>MPX pio: Channel SR 3 loaded IOAW 056314 from address 057001 >>MS iobus: Received data 056314 with signals TOGGLESR | TOGGLEOUTXFER >>MS serv: Unit 0 delay 10 service scheduled >>MPX sr: Device number 6 denied SR 3 >>MS iobus: Returned data 000000 with signals JMPMET >>MS serv: Unit 0 Write File Mark end phase service entered >>MS xfer: Unit 0 wrote file mark >>MS cmds: Unit 0 Write File Mark tape command completed at position 5620 >>> Problem is Write without Control should be rejected but is not because WFM is seen as a write_command, so is allowed to pass at TOGGLEOUTXFER. + D223 ACT. DS 1 000 000 101 010 100 STEP 0231 EXPEC.DS D DD0 0DD DDD DD0 100 E236 SAME STEP -COMMAND REJECTED- IN READ DS E276 STEP-0233 ELAPSED TIME IS OVER LIMIT AND TAPE (DRIVE 0) IS NOT READY test does a Control (Read) followed by a Control (Backspace Record) it is rejected correctly test then does a Control (Select Unit 0), which is rejected! from then on, every program is rejected! >>> Problem is that the controller is left in the start phase when command is cancelled; must be reset to the end phase when cancelling. + E223 ACT. DS 0 000 000 101 011 110 STEP 0252 EXPEC.DS D DD0 0DD DDD DD0 100 expecting Command Reject, receiving No Error >>CPU instr: 00.042132 030161 SIO 1 >>CPU data: 00.070076 000006 stack read >>IOP dio: Test I/O order sent to device number 6 >>MS iobus: Received data 000000 with signals DSTATSTB >>MS iobus: Returned data 100536 with signals JMPMET >>CPU data: 00.000030 056774 absolute write >>IOP dio: Start I/O order sent to device number 6 >>MS iobus: Received data 000000 with signals DSTARTIO >>MPX sr: Device number 6 requested channel initialization >>MPX sr: Device number 6 asserted SR 3 >>MS iobus: Returned data 000000 with signals JMPMET >>MPX state: Channel SR 3 entered State C >>MS iobus: Received data 000000 with signals ACKSR | DEVNODB >>MS iobus: Returned data 000030 with signals JMPMET >>CPU stack: 00.070076 000000 stack registers empty >>CPU fetch: 00.042134 030240 instruction fetch >>CPU instr: 00.042133 141513 BNE P+13 >>MPX state: Channel SR 3 entered State A >>MPX pio: Channel SR 3 loaded IOCW 067766 (Write) from address 056774 >>CPU stack: 00.070076 000000 stack registers empty >>CPU fetch: 00.042135 141504 instruction fetch >>CPU instr: 00.042134 030240 TIO 0 >>CPU data: 00.070076 000006 stack read >>IOP dio: Test I/O order sent to device number 6 >>MS iobus: Received data 000000 with signals DSTATSTB >>MS iobus: Returned data 000536 with signals JMPMET >>MPX state: Channel SR 3 entered State B >>MPX pio: Channel SR 3 loaded IOAW 050200 from address 056775 >>MS iobus: Received data 050200 with signals TOGGLESR | TOGGLEOUTXFER >>IOP irq: Device number 6 asserted INTREQ at priority 11 >>MPX sr: Device number 6 denied SR 3 >>MS iobus: Returned data 000000 with signals JMPMET >>> Problem is TIO status return is beating reject status set in state B. this step does: 00.042132: SIO 1 00.042133: BNE P+13 00.042134: TIO 0 <-- reject status must occur here! 00.042135: BNE P+4 00.042136: STOR DB+65 00.042137: DEL,NOP 00.042140: BR P+12 ...so reject status must occur immediately! tape uP cycle time is 434 ns. in wait loop for TOGGLEnXFER, worst case time from flag to reject status set is 4 instructions = 1.74 usec >>> Problem is that MPX isn't running fast enough (currently is one state per instruction; maybe we need two?). gen info manual says MPX transfers ~500,000 words per second (this is presumably the state D execution time; do state exec times vary?) >>> Fixed by calling mux_poll() from SIO instruction to get one more cycle in before TIO executes. + E223 ACT. DS 1 000 000 101 110 100 STEP 0260 EXPEC.DS 0 0DD DDD DDD DD0 100 E236 SAME STEP -COMMAND REJECTED- IN READ DS expects command reject but SIO still in operation >>CPU instr: 00.042132 030161 SIO 1 >>CPU data: 00.070076 000006 stack read >>IOP dio: Test I/O order sent to device number 6 >>MS iobus: Received data 000000 with signals DSTATSTB >>MS iobus: Returned data 100576 with signals JMPMET >>CPU data: 00.000030 056774 absolute write >>IOP dio: Start I/O order sent to device number 6 >>MS iobus: Received data 000000 with signals DSTARTIO >>MPX sr: Device number 6 requested channel initialization >>MPX sr: Device number 6 asserted SR 3 >>MS iobus: Returned data 000000 with signals JMPMET >>MPX state: Channel SR 3 entered State C >>MS iobus: Received data 000000 with signals ACKSR | DEVNODB >>MS iobus: Returned data 000030 with signals JMPMET >>MPX state: Channel SR 3 entered State A >>MPX pio: Channel SR 3 loaded IOCW 077766 (Read) from address 056774 >>CPU stack: 00.070076 000000 stack registers empty >>CPU fetch: 00.042134 030240 instruction fetch >>CPU instr: 00.042133 141513 BNE P+13 >>MPX state: Channel SR 3 entered State B >>MPX pio: Channel SR 3 loaded IOAW 050200 from address 056775 >>MS iobus: Received data 050200 with signals TOGGLESR | TOGGLEINXFER | READNEXTWD >>IOP irq: Device number 6 asserted INTREQ at priority 11 >>MPX sr: Device number 6 denied SR 3 >>MS iobus: Returned data 000000 with signals JMPMET >>MS iobus: Received data 000000 with signals INTPOLLIN >>MS iobus: Returned data 000006 with signals INTACK | JMPMET >>IOP irq: Device number 6 acknowledged interrupt request at priority 11 >>CPU instr: 00.042134 000006 external interrupt >>> Same problem as step 252 above, except that diag expects reject status to be set but SIO to still be active, i.e., interrupt and termination has not occurred. this step does: 00.042132: SIO 1 00.042133: BNE P+13 00.042134: TIO 0 <-- reject status + SIO active must occur here! 00.042135: BNE P+4 00.042136: STOR DB+65 >>> Problem is that uP sets reject status but then delays 8.716 msec (tape stop delay) before asserting IRQ so diag expects to see reject status and SIO still active when TIO is done. Fixed by adding controller unit and appropriate delay. + E223 ACT. DS 0 000 000 001 001 110 STEP 0267 EXPEC.DS D D00 0DD 10D DD1 110 ready is not set when expected step does: >>CPU instr: 00.041433 030161 SIO 1 >>CPU instr: 00.041434 004000 DEL,NOP >>CPU instr: 00.041435 141514 BNE P+14 >>CPU instr: 00.041436 031251 PCAL 251 >>CPU instr: 00.040513 041003 LOAD DB+3 >>CPU instr: 00.040514 030240 TIO 0 <-- expects drive to be still READY here >>MS iobus: Received data 000000 with signals DSTATSTB >>MS iobus: Returned data 000116 with signals JMPMET >>CPU instr: 00.040515 141611 BGE P+11 >>CPU instr: 00.040526 051007 STOR DB+7 >>CPU instr: 00.040527 031400 EXIT 0 >>> Problem is that drive is not ready as soon as command starts instead of after the controller delay. + E222 STEP-0270 NO INTERRUPT AFTER SIO P012 STEP-0270 PAUSE ON ERROR >>MS serv: Unit 0 Read Record start phase service entered >>MS cmds: Unit 0 Read Record tape command started at position 0 >>MS xfer: Unit 0 read 2180048-word record >>MS serv: Unit 0 delay 17556 service scheduled >>MS serv: Unit 0 Read Record end phase service entered >>MS cmds: Unit 0 Read Record tape command completed at position 554400 >>> Problem is that "tape runaway" status isn't signalled after 25 feet. Instead, Read command skips entire gap, regardless of length. Fixed by changing sim_tape_rdlntf and sim_tape_rdlntr to detect runaway and return MTSE_RUNAWAY. + E276 STEP-0270 ELAPSED TIME IS OVER LIMIT AND TAPE (DRIVE 0) SIO NOT OK >>> Problem is MTSE_RUNAWAY causes immediate abort in start phase; fix is to abort in end phase after movement delay. + E223 ACT. DS 1 000 000 101 110 100 STEP 0152 EXPEC.DS 1 D0D 0DD 1DD D11 110 E236 SAME STEP -COMMAND REJECTED- IN READ DS >>> Poblem is TOGGLEOUTXFER check was only checking write; fixed to check for class_write and data_xfer. + E223 ACT. DS 0 010 000 111 011 110 STEP 0514 EXPEC.DS D DD0 0DD D1D D01 110 tape mark is reported but not expected >>> Problem is rewind doesn't clear tape mark status! Fixed by clearing controller status in ml_start_command. + E276 STEP-0525 ELAPSED TIME IS OVER LIMIT AND TAPE (DRIVE 0) NO INTERRUPT previous GAP overwrites part of 4000-word record so read will be short >>> Problem is DEVEND occurs, but we don't exit State D to State A; fixed by adding abort condition to next_state(). >>> Problem is DEVEND on chained read should result in command reject and IRQ at READNEXTWD (by INXFER FF set at command scan); fixed by adding a command_expected flag. + E347 STEP-0532 NO TIMING ERROR AFTER 12 SENSE E347 STEP-0533 NO TIMING ERROR AFTER 12 SENSE diag issues Read (Write) Record and then chained 64-word reads (writes) for a 4288-word record after 18 reads (writes) of 64 words (1152 words), issues chained 12 Sense commands and then 4 more reads (writes) >>> Problem is Senses should result in Timing Error because data isn't pulled (sent) fast enough from drive. data delay is 44 instructions, but 12 Sense commands take 36 instructions, so they complete before next word is ready 45 ips * 1600 bpi = 72000 bytes/second = 13.9 usec / byte typ. 2.5 usec/instruction = 5.6 instructions/byte = 11.1 instructions/word (not 44) Fix by changing tapelib data time to 22 or 11 instructions/word (800/1600 bpi) and changing Timing Error from a unit to a controller status value (so interrupt will occur). + E222 STEP-0614 NO INTERRUPT AFTER SIO P012 STEP-0614 PAUSE ON ERROR >>MS cmds: Unit 0 Read Record tape command started at position 5600 >>MS xfer: Unit 0 read 250-word record >>MS serv: Unit 0 delay 17556 service scheduled >>MS serv: Unit 0 Read Record end phase service entered >>MS cmds: Unit 0 Read Record tape command completed at position 5604 >>MPX sr: Device number 6 asserted SR 3 >>MPX state: Channel SR 3 entered State D >>MS iobus: Received data 000000 with signals ACKSR | PREADSTB | READNEXTWD >>MPX sr: Device number 6 denied SR 3 >>MS iobus: Returned data 000000 with signals JMPMET >>> Problem is read sees tape mark but does not set return signal DEVEND; fix by setting DEVEND if read aborts in the start phase. + E223 ACT. DS 1 000 000 101 000 100 STEP 0724 EXPEC.DS D DD0 0DD 1DD DD1 110 E236 SAME STEP -COMMAND REJECTED- IN READ DS diag does a 1097-word read on a 9-word record, followed by a Return Residue program returns DEVEND but then rejects the Residue command diag expects No Error completion and correct residue values inbound_signals=(ACKSR | PREADSTB | READNEXTWD) >>> Problem is that PREADSTB sets DEVEND and command_expected and then READNEXTWD tests command_expected and sets REJECT. Fixed by eliminating READNEXTWD signal on cycle where DEVEND is returned. ----------------------------------------- 30036A/B MPX CHANNEL TEST (HP D422A.01.2) ----------------------------------------- + E51 ILLEGAL STATE DETECTED IN SR(00) >>> Problem is TIO E-bit is not set unless state error occurs during CHANSO, whereas RIO E-bit is set at any time that more than one state is set. State error clocks E + current RAM address into register that supplies bits to TIO. Maybe have illegal state in mux_poll() set state_error_register (cleared by IORESET) and then merge state_error_register with DIO_OK for TIO. + E55 ADDR PAR SHOULD BE 00; IS 01 >>> Problem is that WRRAMS writes ALL RAMs at the same time! So writing 020000 to the state RAM to set State_A also writes it to the address RAM, producing "address parity" of 0. The next time in loop, WIO writes 000000 (to allow next_state to be set) so "address parity" should be 1 the second time! + E56 STATE PAR SHOULD BE 00; IS 01 >>> Problem is that WRRAMS writes ALL RAMs at the same time! so state RAM is loaded with next state even with order RAM is written. + E16 STATE FOR RR IS B SHOULD BE D for return residue, diag expects states A, D, C, A, but program does A, B, C, A (if expected state is changed to B, then section passes) 30035 manual says A, B, C, A note: set bank does ADCA, and the A->D vs. A->B decision is based on Set_Bank_Order_Next Return Residue is 0001 0, whereas Set Bank is 0001 1. but: "The order tested in State A is the order that will be written to the RAM at the end of the state." Next_State_B = ~REQ * State_A * ~Set_Bank_Order_Next Set_Bank_Order_Next = ~CWBUS1 * ~CWBUS2 * CWBUS3 * CWBUS4 CWBUSn is input to RAMs, while RR is IOCWn output of RAMs + E18 STATE FOR RD IS D SHOULD BE C CIO for order + load WIO 7777 into word count ram RIO 7777 from ram into register CIO for order + increment RIO increments register to 0 CIO for state + load RIO 00 into state register (BUT ALSO LOADS 7777 from word count ram into register!) >>> Neglected to set EOT when counter rolled over. ---------------------------- TERMINAL DATA PD427A 01.01 ---------------------------- + 11/03 RIO after send expects %1777 not previous char read maybe because shift out of send data leaves stop bits? + 03/10 TIO after master reset expects 040000 not 046000 >>> Problem is reset does not clear the status register. + 30/26 operation outside time limit section 2, step 22, send/receive test param 121405: char size 3 (= 12 bits), baud rate 5 (= 2400) ==> 200 cps/5 msec char delay = 2000, which corresponds to 5 msec at 2.5 usec per instruction DB+15 (%63, 51.) must be between %64-%67 (52.-55.) for 171120 size 2 (= 11 bits), baud 177, time (%1163, 627.) must be between %1150-%1162 (616.-626.) >>> corrected service time setting. + 35/35 wrong data after receive aux channel or echo section 3, speed-sense test chan 17 interrupts when only chan 16 has the correct baud rate >>> corrected to receive only when same baud rate as connected channel. + 46/46 count of echoed characters received is wrong section 4, echo test >>> problem is echo is not performed in loopback mode. + 50/56 (282E) timeout before 10 characters sent (halts twice) test section 5 allows 750 msec to transmit 10 sync chars param 162660: char size 101 (= 6 bits), baud rate 176. (81.4 baud) = 13.559 cps = 74 msec/char service is 30769; exact calc would be 29500 or 738 msec diag allows 750 msec for 10 chars >>> corrected service time setting. + 51/56 (292E) number of chars on receive channel <> 5 chars with sync = 1 are not detected (no start bit) but this is detected only for terminal mode, not diag mode! actually, sync prevents transmission, whereas it should prevent reception (a send with sync must generate transmission interrupt) in practice, we cannot know whether receive channel receives sync, except in loopback case see line_service ---------------- CPU/MS Cold Load ---------------- - NOTE: Cold load executes this SIO program: 00.001000: SETBNK 00 00.001002: CONTRL 000000,000006 00.001004: READ #256,007200 00.001006: CONTRL 000000,000007 00.001010: ENDINT 000000 + DEVEND doesn't schedule end phase for read >>> Problem is TOGGLESR continues with program without waiting for end of Read. Fix is if data phase and > length, sched end phase, set devend, do NOT assert SR. Then when end phase, set SR; devend will be returned on following PREADSTB. + RTNRES operand is positive but should be negative value >>> fix = show RTNRES operands as the negative of the stored value. + return residue value is not sign-extended >>> fix = sign-extend the mux_cntr_ram value. + mux goes off into weeds after DEVEND >>> fix = send correct signals to controller and do proper DRT fetch. + attempting 256-word read on 32-word record doesn't assert DEVEND >>> fix = add device_end flip-flop to ms controller. + doing "br 2.3" give SCPE_ARG because "br" has to add -P for pbank default, but banked address with -P is an error. Really want "sim_vm_parse_addr" to have a default, based on command executing (PBANK for BREAK/NOBREAK/RUN/GO, DBANK for EX/DEP). >>> fix = add a configuration set to each of the commands that call parse_addr. + "e -p 23" with PBANK = 2 prints "00.000023" but gets value from 2.23. >>> Problem is fprint_addr should use global sim_switches to print PBANK if -p (maybe?). What about "e -p 1.23"? Should give SCPE_ARG. Fix is to change parse_addr to return linear address, rather than interpret in examine. + after MPE cold load succeeds, P instruction is wrong. >>> Problem is fprint_stopped_gen is not using PBANK to obtain value; fixed by changing cpu_ex to use PBANK if SIM_SW_STOP is passed. + loading MPE V/R results in a syshalt. addr 0 (cstp) = 173600. The CST looks OK: 173600: 000037 173601: 000004 173602: 000000 173603: 000000 173604: 060057 173605: 000000 173606: 000001 173607: 177454 PB = 177454, PL = 177747, but memory at PL doesn't look like an STT. In fact, it's 030370, i.e., initialized (actually, whole block is 030370). >>> fix = must use PBANK to access STT! + State C sioCNTL toggles SR at the end of the command, which stops the channel. Why? Because separate device_sr FF is supposed to be set by PCMD1, PCONSTB. >>> fix = add device_sr. + JUMP order doesn't jump. Address RAM is overwritten in DRT fetch. Conditional JUMP does DRT store in state C; should unconditional JUMP also? >>> fix = DRT store for both JUMP and JUMPC-if-JMPMET. + bool mux_is_idle => int mux_is_busy (reference count from zero) + SETBNK does state_c -> state_a -> state_d, but then poll stops. mux_poll clears mux_request_set and sets dibptr->service_state = srIDLE. MS sr ff remains set, but because controller is not called in state_a, mux_service isn't called. >>> Fix is do not change mux_request_set in mux_poll; mux_service now sets OR CLEARS as directed by controller.