HP 3000 Simulator Debugging In Progress
=======================================

Listed below are problems in the HP 3000 simulator that are currently being debugged.


------------------------
Extended Instruction Set
------------------------

Note this SSB from the CD-ROM:

  HP3000 MPE-V Software Status Bulletin

  KPR Number: 5000465195
  Product Name: MPE V/E
  Product Number: 32033G
  One Line Description: On 70 and 6x, the CVBD instruction does not trap on Decimal Overflow.
  Update Date: 19910830
  Submit Date: March 1989

  *** PROBLEM TEXT ***
  On the HP3000 series 6x and 70, decimal overflow is not trapped.

  *** CAUSE TEXT ***
  Problem is caused by the microcode routine that outputs the converted numbers to memory. This routine is shared amoung several decimal instructions and does not work correctly for CVBD.

  *** TEMPORARY SOLUTION TEXT ***
  Assemble a 207X4 instead of a 206X4 or CVBD. This instruction calls up a firmware emulation routine that does work correctly. This routine does not reside in microcode.
  .
  A patch is availiable that permanetly causes CVBD and CVDB to trap to unimplemented instruction, which in turn causes ININ to call up the firmware simulation module FIRMSIM. This module correctly implements CVBD and CVDB. Using this patch will cause the instruction execution to take longer and the system will suffer a performance penalty. MAKE SURE THE CUSTOMER UNDERSTANDS THIS.!!!!

  *** FIX TEXT ***
  There is no permanet fix.


----------------------
CST Expansion Firmware
----------------------

Regarding tests, the changes appear to be OK by inspection, but I haven't written any tests because I don't know how to control the M bit for a segment, nor the U bit for the STT header (using old firmware), nor how to arrange to have a procedure start at PB + 0 (the target of an STT 0 call). The U-bit in the STT header seems to be set by the compiler always, but maybe I'm missing some command.
What I need is a program that tests the following inter-segment transfer conditions starting from user mode: 1. XBR to a PM segment (should abort with a Privilege Violation trap with both old and new firmware). 2. PCAL 0 with an external label specifying STT 0 (should abort with an STT Uncallable trap with new firmware). 3. PCAL 0 with an external label specifying STT 0 with the U-bit set (should abort with an STT Uncallable trap with old firmware). 4. PCAL 0 to a PM segment with an external label specifying STT 0 with the U-bit set (should abort with an STT Uncallable trap with old firmware). 5. PCAL 0 with an external label specifying STT 0 with the U-bit clear (should succeed with old firmware). 6. PCAL 0 to a PM segment with an external label specifying STT 0 with the U-bit clear (should succeed with old firmware). 7. PCAL > 0 to an uncallable procedure (should abort with an STT Uncallable trap with both old and new firmware). ------- MPE V/E ------- --------------------------- GICDIAG 2.11 (S28S231C.SPL) --------------------------- --------------------------- GICDIAG 1.26 (S28S231C.SPL) --------------------------- --- DUS --- -------- Colossus -------- Running the COLOSSUS program disc test on a system with three CS/80 drives fails. The drives are located on GIC channel 11 address 0 (LDEV 1), channel 11 address 5 (LDEV 2), and channel 6 address 6 (LDEV 3). The symptom is the "Brief System Summary" is partially printed, and then the simulator hangs in an infinite loop (i.e., is unresponsive to CTRL+E). BUG: The clear_fifos routine in CPP clears the inbound FIFO by reading register 2 and, if inbound data is present, reading register 0 to remove it. This action is repeated in a loop until the data present status denies. The problem is that if a CS/80 Reporting Phase returns qstat 0, it appears on the bus as EOI + 00H, which is loaded into the inbound FIFO as 140000 (tag + data). 
Unfortunately, this is the same encoding as an uncounted transfer enable, which the fifo_unload routine will leave in the FIFO to satisfy the diagnostic. This causes clear_fifos to loop forever. FIX: Perform the uncounted transfer test only for the outbound FIFO. The next problem is that a GIC timeout occurs. The console response is: DISC LDEV #1 NOT RESPONDING TO I/O DISC LDEV #2 NOT RESPONDING TO I/O MPE Table SBUF has overflowed!!! ...and then MPE hangs. The sequence is: - bus 0 does SIOP - bus 0 does Locate and Read - during xfer, bus 5 does HIOP; this is deferred - during xfer, bus 5 does several more HIOPs; these are ignored, as status is "stopping" - when DMA ends, CPP does poll, which gets DMA completion (but DMA is still busy until OBSI) - bus 0 does WAIT, which does poll; now DMA is idle, which sets New Status CSRQ, but PHI IRQ gets precedence - bus 0 does DSJ - bus 0 gets talk 0, secondary 10; this should deny NSEN, i.e., clear Status_Interrupt, because poll stopped - CPP sets reg 3 mask to IRQ | data - CPP reschedules for deferred HIOP bus 5 (new status recognized) ==> HIOP should not be recognized while GIC is waiting for DSJ data!! - CPP sets reg 3 mask to IRQ | status change | poll response --> so now incoming qstat byte won't cause CSRQ! - bus 0 sends 0 byte + EOI - bus 5 does SIOP - CPP sets bus 5 to "starting" but GIC SIOP does not assert CSRQ --> now, neither channel program is running - bus 0 DSJ times out BUG: NSEN is not calculated correctly. Should be: [R1] The PHI is the controller-in-charge [RB] and CSRQ is not disabled [RB] and DMA is inactive and a parallel poll is in progress (ATN and EOI are asserted) and the PHI is not interrupting for a PPR when an OBSI is received and the PHI is not requesting a DMA cycle or: [R1] The PHI is not the controller-in-charge [RB] and CSRQ is not disabled [RB] and DMA is inactive. 
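
The corrected NSEN condition above can be sketched as a predicate. This is a minimal illustration in Python, not the simulator's code; the class name, the field names, and the idea of collecting the signals into one snapshot are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class PhiState:
    """Hypothetical snapshot of the signals named in the NSEN condition."""
    controller_in_charge: bool        # [R1] PHI is the controller-in-charge
    csrq_disabled: bool               # [RB] CSRQ disable bit
    dma_active: bool                  # [RB] DMA busy
    poll_in_progress: bool = False    # ATN and EOI are both asserted
    ppr_irq_on_obsi: bool = False     # interrupting for a PPR when an OBSI is received
    dma_cycle_request: bool = False   # PHI is requesting a DMA cycle


def nsen(s: PhiState) -> bool:
    """Return True if New Status Enable would be asserted for this state."""
    # Both branches of the condition require these two terms.
    if s.csrq_disabled or s.dma_active:
        return False
    if s.controller_in_charge:
        # As controller, a parallel poll must actually be in progress.
        return (s.poll_in_progress
                and not s.ppr_irq_on_obsi
                and not s.dma_cycle_request)
    # As non-controller, the two common terms are sufficient.
    return True
```

Under these assumptions, the failing DSJ case (PHI is controller, DMA idle, poll stopped by the talk 0, secondary 10) evaluates to False, so a deferred HIOP would not be recognized while the GIC is waiting for DSJ data.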
The following state changes affect the condition of the NSEN signal: - CIC changes for reset, TCT (acceptor), R1/R6 (IFC)/R7 (offline) write - CSRQ disable changes for RB/RF write - DMA busy changes for reset, RB (start) write, DMA state 4 - poll changes for acceptor, DCL/SPD/UNL/dataout (acceptor) The following state changes affect Status_Interrupt: - RB write denies - R1/R6/R7/RF writes test and set - DMA state 4 tests and sets - acceptor no-poll denies; poll tests and sets FIX: Correct NSEN so that an HIOP is not recognized in the middle of a transfer. BUG: Once the timeout bit in reg F sets, it is never cleared! FIX: Clear the timeout bit when reg B is written to start DMA. With these corrected, COLOSSUS runs properly for a full set on one disc. Specifying all tests for just the two discs on channel 11 results in: I/O OPCODE = %000001 DATA WORD1 = %000130 DATA WORD2 = %001020 DATA WORD3 = %022150 I/O STATUS = %000000 MAILBOX #5 = %030370 MAILBOX #6 = %030370 MAILBOX #7 = %030370 **** SYSTEM FAILURE #201 (HARDRES failure, non-responding device on SIOP) STATUS %100031 DELTA-P %000356 ...with all showing "in test 3". This failure also occurs if only the READ TEST is specified for both LDEV 1 and 2. This problem appears to be that issuing SIO to IMBA with IMB IRQ and CSRQ pending doesn't seem to service channel 1 (IMBA) before idling CPP. BUG: CPP IRQ service does WIOC to IMBA to assert INTREQ on the IOP bus. But IMBA does not assert CSRQ in return signals, so imb_cycle removes the CSRQ for channel 1. This causes pending SIO to be ignored, and next attempted SIO gets "not ready" because channel_request is still set, which causes the SF. FIX: Return CSRQ from "imba_imb_interface" if the "channel_request" is set. Specifying all tests for all three discs results in **** SYSTEM FAILURE #642 (ININ stack overflow while I/O frozen and disabled) STATUS %102001 DELTA-P %002047 ...with all showing "in test 4" or "in test 3". 
The failure does not occur when specifying all tests with any two discs.

BUG: HIOP is returning CCG when the program is in the wait state. It should be returning CCE. COLOSSUS appears to send HIOP repeatedly (as a result?). The HIOP logic is wrong. See PDF p.249 in ucode manual and p.149 in HP 300 manual. Currently, the routine is masking off the wait bit and then testing the same wait bit to determine CCE vs. CCG!

FIX: Correct HIOP CCG return.

With these fixes, the COLOSSUS disc test succeeds.


----------------------
CSRQ Set vs. Scheduled
----------------------

Attempting to use a gate to enter CPP service on each pass through the CPU instruction loop vs. scheduling for 2 (or 1) event ticks fails.

The first failure is during Identify, where the CPP sets up a PHI interrupt on data reception. This asserts CSRQ when the first byte arrives. The CPP code then reads both bytes because it assumes that the device is fast enough. But we get only the first ID byte; the second attempt gets DNV.

Next, Write Loopback fails. After the last byte is sent, DMA asserts CSRQ, and the CPP is entered. This occurs before the DC device can deny NRFD and complete the loopback.

The problem seems to be that the DC service is entered for the penultimate byte and denies NRFD, so the GIC responder calls transfer_data to source the final byte, and the DC responder accepts it, asserts NRFD, and schedules NRFD denial and loopback completion. transfer_data sees that the last byte is out and so asserts CSRQ. After unwinding from the DC service call, cpp_request is set, so CPP is immediately entered. The loopback completion event is still scheduled, but CPP assumes that the transfer is complete and sends an Unlisten, which aborts the loopback because the completion event has not occurred.

The problem is the execution path:

  DC final byte event -> CPP CSRQ entry -> DC command completion

If CSRQ assertion is scheduled, even for one tick, then the completion event is serviced before CPP entry, and everything works.
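
The ordering difference can be reproduced with a toy event queue. This is a sketch only: the handler names, the one-tick completion delay, and the first-in, first-out ordering of events scheduled for the same tick are assumptions made for illustration, not a description of the simulator's event system.

```python
import heapq
from itertools import count

COMPLETION_DELAY = 1  # assumed: loopback completion is one tick after the final byte


def run(csrq_delay):
    """Return the order in which the handlers run.

    csrq_delay is None for the "gate" approach (CPP is entered as soon as the
    current service routine returns) or a tick count for scheduled CSRQ.
    """
    order = []
    queue = []        # entries are (time, sequence, name)
    seq = count()     # tie-breaker: equal times run first-in, first-out
    cpp_request = False

    def schedule(delay, name, now):
        heapq.heappush(queue, (now + delay, next(seq), name))

    schedule(0, "dc_final_byte", 0)

    while queue:
        now, _, name = heapq.heappop(queue)
        order.append(name)
        if name == "dc_final_byte":
            # The DC accepts the last byte and schedules loopback completion.
            schedule(COMPLETION_DELAY, "dc_completion", now)
            # transfer_data sees that the last byte is out and asserts CSRQ.
            if csrq_delay is None:
                cpp_request = True
            else:
                schedule(csrq_delay, "cpp_entry", now)
        if cpp_request:
            # Gate approach: CPP runs right after the service call unwinds.
            cpp_request = False
            order.append("cpp_entry")
    return order
```

With the gate, `run(None)` yields the failing path (final byte, CPP entry, completion); with `run(1)` the completion event is serviced before CPP entry.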
----------------------------- CS80DIAG Step 69 (Burst Mode) ----------------------------- Step 69 does a Set Burst (Last) with a burst size of 1 block. Then it does: 00.140551 Write secondary 05 count 14 burst 1 address 00140446 chain 0 | record mode | left byte 00.140556 Wait | response 0400 00.140560 Write secondary 16 count 2048 burst 256 address 00132035 chain 0 | burst mode | left byte | no EOI 00.140565 Relative Jump 140571 00.140567 Relative Jump 140556 00.140571 Wait | response 0000 00.140573 Device Specified Jump 140620, 140577 00.140577 Write secondary 05 count 3 burst 1 address 00131204 chain 0 | record mode | left byte 00.140604 Wait | response 0601 00.140606 Read secondary 16 count 20 burst 1 address 00124621 chain 0 termination 140613 | record mode | left byte 00.140613 Wait | response 2005 00.140615 Device Specified Jump 140620 00.140620 Interrupt/Halt 0001 | CPVA 1 --------------------- GIC Diagnostic Status --------------------- Test sections 1-17 and 19-25 pass. Section 18 tests memory parity error detection, which creates bad parity in memory via commands to the Fault Logging Interface (FLI), which we do not simulate. ------------------------------------ GIC Diagnostic Step 105 (GIC-to-GIC) ------------------------------------ Two problems are present so far: 1. DMA data reception state 20 does FIFO unload of first byte, which denies NRFD by calling hpib_control. That calls gic_hpib_respond, which sees the denial and calls transfer_data to resume the transfer. But that reenters state 20 recursively, which does another FIFO unload. This time, NRFD is already denied, so it skips the hpib_control call, and continues on to states 22, 10, 11, which unloads another FIFO byte (the third) before continuing into state 15, which writes the last two bytes as the first memory word. The first byte is lost. The cycle continues until all bytes are unloaded from the FIFO. After all bytes are received, DMA exits in state 5 to wait for CSRQ. 
But then the recursive call unwinds, and DMA resumes in state 24 (22 -> 24), going through 10, 18, 19 (memory read), 23, 21 (memory write of "final" byte), and back to 5. The received data is corrupted as a result.

2. Because DMA executes as long as it can, the two GICs transfer the entire data block within the WIOC to register B that starts DMA. They also execute both channel programs to completion in response to the SIOP that starts the second program. The diagnostic expects to be able to check the CPVA while the channel program is running. Because the channel program completes before the next instruction executes, the diagnostic reports "Error in step 105, UUT CPVA word 2 is !8001 expected !0000". Looks like DMA has to be paced with an event timer, so that the program can execute concurrently with DMA.

Controller program (bus address 30, channel 12, started as device 3) is:

  00.161000: Write secondary 05 count 255 burst 0 address 00170000 chain 0 | record mode | left byte | no update
  00.161005: Relative Jump 161011
  00.161007: Interrupt/Run 0001 | CPVA 2
  00.161011: Interrupt/Run 0001 | CPVA 1
  00.161013: Wait | response 0000
             --> this is waiting for PPOLL
  00.161015: Read secondary 12 count 255 burst 0 address 00170000 chain 0 termination 161026 | record mode | left byte | no update
  00.161022: Interrupt/Halt 0002 | CPVA 2
  00.161024: Interrupt/Halt 0002 | CPVA 2
  00.161026: Interrupt/Halt 0002 | CPVA 3
  00.161030: (invalid)177777

Device program (bus address 3, channel 11, started as device 5) is:

  00.160000: Write Register F value 000005
  00.160002: Write Register 3 value 100004
  00.160004: Wait | response 0000
  00.160006: Write Register 2 value 177777
  00.160010: Read Register 0 | response 040005
  00.160012: Execute DMA read count 255 burst 0 address 00170400 termination 160023 | record mode | left byte | no update
  00.160017: Relative Jump 160025
  00.160021: Interrupt/Run 0001 | CPVA 2
  00.160023: Interrupt/Run 0001 | CPVA 2
  00.160025: Interrupt/Run 0001 | CPVA 1
  00.160027: Write Register 6 value 000010
--> this generates a PPOLL response 00.160031: Write Register 3 value 100004 00.160033: Wait | response 0000 00.160035: Write Register 2 value 177777 00.160037: Read Register 0 | response 000000 00.160041: Write Register 6 value 000001 00.160043: Write Register 1 value 000001 00.160045: Execute DMA write count 511 burst 255 address 00171000 termination 160052 | burst mode | left byte | no EOI | no update 00.160052: Interrupt/Halt 0002 | CPVA 2 00.160054: Interrupt/Halt 0002 | CPVA 3 00.160056: (invalid)177777 - fail occurs because poll is already active when WIOC register 6 "poll response" should assert PPR 3, but device calls hpib_control to "redo poll", but poll is calculated in hpib_control (device call) and not in cross-call hpib_respond (controller), so it's never seen. it's seen in the reverse case because the poll becomes active after "poll response" is set, so controller sees it when it conducts the poll. ---------------------------- GIC Diagnostic Step 38 (DMA) ---------------------------- Test 8 step 38 says that it, "Verifies that the DMA EN bit, register 8, is not set when this section of the diagnostic is begun." However, the diagnostic issues a code %12 to the IMBA, which does not appear to be defined. If the CPP returns an error code, the diagnostic faults. However, if it returns success, a succession of "HP-IB mailbox timeout" messages are printed, followed by: Error in step 38 INIT did not clear DMA ENABLE nor set DMY4-0=0 The timeouts occur because the data word at location 00.053253 is set to 1 instead of 0. This in turn causes the BRE P+4 at 00.056221 to fail, allowing the BR P+17 following it to execute, which skips around the SIO that executes the IMBA instruction. 
The target of the BR P+17 is a PCAL to the procedure at 00.056004 that tests location %774 for CPP instruction completion: 00.056004: ADDS 4 00.056005: LRA Q+3 00.056006: LRA P-4 00.056007: LDI 2 00.056010: MOVE PB,3 00.056011: LDX Q+4 (= %774) 00.056012: PLDA (loads the completion code from %774) 00.056013: DUP,NOP 00.056014: STOR Q+2 00.056015: LSR #15 00.056016: BRE P+4 (tests bit 0 for completion) 00.056017: BR P+30 (branch if complete) 00.056020: NOP,DDEL 00.056021: NOP,STBX 00.056022: LOAD P+7 00.056023: STOR Q+1 00.056024: INCM Q+1 00.056025: LOAD Q+1 00.056026: CMPI 0 00.056027: BGE P+16 (branch if counter has expired) 00.056030: BR P+3 00.056031: 142550 (timeout counter initial value) 00.056032: 000013 00.056033: LDX Q+4 (= %774) 00.056034: PLDA (reload the completion code to see if it changed) 00.056035: DUP,NOP 00.056036: STOR Q+2 00.056037: LSR #15 00.056040: BRE P+4 (test bit 0) 00.056041: BR P+6 (now is complete) 00.056042: NOP,DDEL 00.056043: NOP,INCX 00.056044: BR P-20 (continue to wait) 00.056045: LDI 312 (timeout error routine is called here) 00.056046: PCAL 77 00.056047: LOAD Q+2 (reload the completion code) 00.056050: EXF #1:#15 00.056051: LDI 0 00.056052: CMP,NOP 00.056053: BNE P+3 (verify that the other 15 bits are zero) 00.056054: EXIT 0 00.056055: NOP,DELB 00.056056: LOAD Q+2 (error; reload the completion code) 00.056057: LSR #14 00.056060: ANDI 1 00.056061: BRE P+3 (test bit 1) 00.056062: LDI 313 (bit 1 -> "IMB parity error detected by HP-IB interface") 00.056063: PCAL 77 00.056064: LOAD Q+2 (reload) 00.056065: LSR #13 00.056066: ANDI 1 00.056067: BRE P+3 (test bit 2) 00.056070: LDI 314 (bit 2 -> "Invalid timer interrupt detected by HP-IB interface") 00.056071: PCAL 77 00.056072: LOAD Q+2 (reload) 00.056073: LSR #12 00.056074: ANDI 1 00.056075: BRE P+3 (test bit 3) 00.056076: LDI 315 (bit 3 -> "Non-responding module timeout detected by HP-IB interface") 00.056077: PCAL 77 00.056100: LOAD Q+2 (reload) 00.056101: LSR #11 00.056102: ANDI 1 
00.056103: BRE P+3 (test bit 4) 00.056104: LDI 316 (bit 4 -> "Invalid mailbox opcode detected by HP-IB interface") 00.056105: PCAL 77 00.056106: LOAD Q+2 (reload) 00.056107: LSR #10 00.056110: ANDI 1 00.056111: BRE P+3 (test bit 5) 00.056112: LDI 317 (bit 5 -> "SIO disabled flag detected by HP-IB interface during SIOP,RIOC or WIOC") 00.056113: PCAL 77 00.056114: LOAD Q+2 (reload) 00.056115: LSR #9 00.056116: ANDI 1 00.056117: BRE P+3 (test bit 6) 00.056120: LDI 320 (bit 6 -> "SIOP failed because previous channel program is not halted") 00.056121: PCAL 77 00.056122: LDX Q+3 (= %770) 00.056123: PLDA (load the opcode from location %770) 00.056124: LDI 1 00.056125: LCMP,NOP 00.056126: BE P+7 (opcode 1 = HIOP skips test for bit 7) 00.056127: LOAD Q+2 (reload) 00.056130: LSR #8 00.056131: ANDI 1 00.056132: BRE P+3 (test bit 7) 00.056133: LDI 321 (bit 7 -> "SIOP or HIOP failure - halt pending but not in WAIT") 00.056134: PCAL 77 00.056135: LOAD Q+2 (reload) 00.056136: LSR #7 00.056137: ANDI 1 00.056140: BRE P+3 (test bit 8) 00.056141: LDI 322 (bit 8 -> "INIT failed - unable to bring system controller on-line") 00.056142: PCAL 77 00.056143: LOAD Q+2 (reload) 00.056144: LSR #6 00.056145: ANDI 1 00.056146: BRE P+3 (test bit 9) 00.056147: LDI 323 (bit 9 -> "INIT failed - GIC not system controller") 00.056150: PCAL 77 00.056151: LOAD Q+2 (reload) 00.056152: LSR #5 00.056153: ANDI 1 00.056154: BRE P+3 (test bit 10) 00.056155: LDI 324 (bit 10 -> "Data not valid detected by HP-IB interface") 00.056156: PCAL 77 00.056157: LOAD Q+2 (reload) 00.056160: LSR #4 00.056161: ANDI 1 00.056162: BRE P+3 (test bit 11) 00.056163: LDI 325 (bit 11 -> "IMBA status bit 11 set") 00.056164: PCAL 77 00.056165: LOAD Q+2 (reload) 00.056166: LSR #3 00.056167: ANDI 1 00.056170: BRE P+3 (test bit 12) 00.056171: LDI 326 (bit 12 -> "IMBA status bit 12 set") 00.056172: PCAL 77 00.056173: LOAD Q+2 (reload) 00.056174: LSR #2 00.056175: ANDI 1 00.056176: BRE P+3 (test bit 13) 00.056177: LDI 327 (bit 13 -> 
"IMBA status bit 13 set") 00.056200: PCAL 77 00.056201: LOAD Q+2 (reload) 00.056202: LSR #1 00.056203: ANDI 1 00.056204: BRE P+3 (test bit 14) 00.056205: LDI 330 (bit 14 -> "IMBA status bit 14 set") 00.056206: PCAL 77 00.056207: EXIT 0 [to test error printer, break at 00.056013, set RA to the completion code, continue] This routine loops until bit 0 is set, indicating completion, or until a counter expires. Because the SIO was never done, the bit never sets, which leads to counter expiration and the timeout error. (P+20 would be the procedure exit, and it seems to be an error that the completion checker is called when the SIO is never done. But this isn't clear from reading the code at this point. Changing 00.056222 from 140017 to 140020 produces a BR P+20.) Step 38 appears to begin at 00.024172, which does: 00.024172: LDI 46 00.024173: STOR DB+24 00.024174: PCAL 67 ...which stores "38" in location 00.101760. It then calls a procedure that begins at location 00.075476. It does: 00.075476: ADDS 1 00.075477: LRA Q+1 00.075500: LRA P-3 00.075501: LDI 1 00.075502: MOVE PB,3 00.075503: ZERO,NOP 00.075504: LDI 145 00.075505: PCAL 41 (--> 00.057522) 00.075506: BRE P+3 (this tests 00.053253, which is 0) 00.075507: EXIT 0 00.075510: NOP,DELB 00.075511: LDI 12 \ 00.075512: LDX Q+1 | (this stores %12 in %770) 00.075513: PSTA / 00.075514: PCAL 36 (--> 00.056211, which eventually does SIO 1) 00.075515: LDI 1 <<== SETTING THIS TO 1 INSTEAD OF 0 EVENTUALLY CAUSES THE FAILURE! 
  00.075516: LDI 145
  00.075517: PCAL 42 (--> 00.057552)
  00.075520: EXIT 0

The routine at 00.057552 does:

  00.057552: ADDS 1
  00.057553: LOAD P+6
  00.057554: STOR Q+1
  00.057555: LOAD Q-4 (= %145 from push at 00.075516)
  00.057556: CMPI 1
  00.057557: BNE P+10
  00.057560: BR P+3
  00.057561: 001214
  00.057562: 000005
  00.057563: LOAD Q-5
  00.057564: LDX Q-4
  00.057565: PSTA
  00.057566: BR P+13
  00.057567: BR P+2
  00.057570: NOP,DADD
  00.057571: ZERO,NOP
  00.057572: PCAL 62
  00.057573: LOAD Q+1 (= %1214)
  00.057574: LOAD Q-4 (= %145 from push at 00.075516)
  00.057575: LADD,NOP
  00.057576: LADD,STAX
  00.057577: LOAD Q-5 (= %1 from push at 00.075515)
  00.057600: PSTA (stores 1 in 00.053253) <<== THIS IS THE PROBLEM!
  00.057601: EXIT 2

This returns to:

  00.024175: PCAL 25 (--> 00.036300)

...which appears to eventually set up an INIT instruction:

  00.000770 000003 (--> WIOC)
  00.000771 020130 (INIT, channel 11)
  00.000772 020130 (WIOC command to write)
  00.000774 000000

But because 00.053253 is set to 1 instead of 0, the SIO that starts the CPP is not executed, so this (and all subsequent CPP commands) time out.

BOTTOM LINES:

1. Unknown opcode %12 is sent to the CPP, for unknown reasons. This should result in bit 4 set in the status return ("Invalid mailbox opcode detected by HP-IB interface"), but apparently it does not and is accepted by the hardware, with unknown effect.

2. Because the code at 00.075515 eventually sets data word 00.053253 to 1 instead of 0 for some unknown reason, the next (and all future) SIO instructions are bypassed, so no additional commands are sent to the CPP. This results in continual HP-IB timeouts until the diagnostic gives up.

A workaround for #1 is to ignore and return successful completion for opcode %12. A workaround for #2 is to change the code at 00.075515 from 021001 (LDI 1) to 000600 (ZERO,NOP).

These workarounds appear to allow the diagnostic to continue properly. Why they are needed is a mystery.
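
For reference when checking status returns such as the bit 4 case in bottom line #1, the bit tests performed by the completion checker at 00.056004 can be summarized as a small decoder. This is a sketch in Python; the function and table names are hypothetical, and the messages are the ones annotated in the listing. HP 3000 bit numbering is assumed, with bit 0 as the most significant bit of the 16-bit word.

```python
# Messages annotated in the completion checker listing, keyed by bit number.
MESSAGES = {
    1: "IMB parity error detected by HP-IB interface",
    2: "Invalid timer interrupt detected by HP-IB interface",
    3: "Non-responding module timeout detected by HP-IB interface",
    4: "Invalid mailbox opcode detected by HP-IB interface",
    5: "SIO disabled flag detected by HP-IB interface during SIOP,RIOC or WIOC",
    6: "SIOP failed because previous channel program is not halted",
    7: "SIOP or HIOP failure - halt pending but not in WAIT",
    8: "INIT failed - unable to bring system controller on-line",
    9: "INIT failed - GIC not system controller",
    10: "Data not valid detected by HP-IB interface",
    11: "IMBA status bit 11 set",
    12: "IMBA status bit 12 set",
    13: "IMBA status bit 13 set",
    14: "IMBA status bit 14 set",
}


def bit_mask(bit):
    """Mask for HP 3000 bit numbering (bit 0 is the most significant bit)."""
    return 1 << (15 - bit)


def decode_completion(code, opcode=None):
    """Return (complete, errors) for the completion word read from %774.

    opcode is the mailbox opcode from %770; the listing skips the bit 7
    test when the opcode is 1 (HIOP).
    """
    complete = bool(code & bit_mask(0))
    errors = []
    for bit in sorted(MESSAGES):
        if bit == 7 and opcode == 1:
            continue
        if code & bit_mask(bit):
            errors.append(MESSAGES[bit])
    return complete, errors
```

For example, `decode_completion(0o104000)` reports completion with the "Invalid mailbox opcode" message, which is the return that opcode %12 would be expected to produce.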
----------------------------
GIC Diagnostic Step 21 (PHI)
----------------------------

Closest thing we have is S28S231A.SPL, which is the PIC diagnostic for the Series 37. The PHI test comment says, "This algorithm is from GICDIAG 1.26, essentially unchanged."

GIC diagnostic step 21 corresponds to PIC diagnostic step 17, which begins at source line 7458 -- search for "BEGIN'STEP(17)". Errors during the test are accumulated as bits in the PHIERR (7) array. Unfortunately, this is not dumped when an error occurs. However, it looks like PHIERR (0:6) are locations 00.103217-00.103225. Examining the bits in these words should indicate where they are being set, which in turn should indicate the causes of the errors.

The instructions that alter PHIERR bits are:

  >>CPU reg: 00.103446 000003 A 000001, B 131427, C 001266, X 000003, M i t r o C CCG
  >>CPU instr: 00.021413 027341 DPF #14:#1
  >>CPU reg: 00.103446 000002 A 131427, B 001266, X 000003, M i t r o C CCL
  >>CPU instr: 00.021414 053701 STOR S-1,I
  >>CPU data: 00.103447 001266 stack read
  >>CPU data: 00.103222 131427 data write

...where the DPF can be between 0 and 15. But note that the memory addresses containing the instructions vary.

Note that the log file is about 570 MB, and WnBrowse must be set for a large text file setting to ensure that the entire file is displayed properly.


Register 2 Status Change Bit Tests
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

One of the GICDIAG tests within Step 21 checks the response of the Status Change bit in PHI register 2. The description of the bit in the 264x HP-IB interface manual says:

  This bit becomes set whenever there is a change in the value of the REMOTE bit in Register [1] while the PHI is a non-controller, or whenever there is a change in the value of the HP-IB CONTROLLER bit in Register [1]. It is cleared when the host processor writes a "1" into its bit position.

There are two tests for the presence of this bit.
The first sets bits 5 and 6 in PHIERR (2) if the Interrupt Pending and Status Change bits are not set. The second sets bit 0 of PHIERR (5) if the Status Change bit is not set. The second test is straightforward: <> DO'WIOC(!0010,6); <> DODELAY(STARTIME,2); << WAIT FOR IFC>> DO'WIOC(!0000,6); <> DO'RIOC(2); <> PHIERR(5).(0:1) := NOT TOS'.(8:1); An earlier INIT had set system controller status and cleared controller status. So the IFC, which sets controller status, is expected to set the Status Change bit also. However, the first test is peculiar: <> DO'WIOC(!0020,6); <> DO'WIOC(!8080,3); <> DO'WIOC(!403E,0); <> DO'RIOC(1); <> <> DO'RIOC(2); <> <> The first part of the test asserts REN and then sends a Listen 30 command. This is the proper bus sequence for placing the listener into remote mode. The read of Register 1 and the test of bit 10 (REMOTE) would verify that the command succeeded. However, the check is commented out in PICDIAG, and a code trace of GICDIAG execution shows that the bit is never tested there either. The test then proceeds by reading Register 2. The code trace shows that it checks bit 0 (Interrupt Pending) and bit 8 (Status Change) and sets PHIERR (2) bits 5 and 6, respectively, if the Register 2 bits are set. This differs from the PICDIAG code, which has commented out the Status Change bit test. The apparent intent of the GICDIAG test is to verify that Status Change does not occur when the REMOTE bit changes while the PHI is the controller. The problem is that the Status Change bit has been set as a result of an earlier INIT and IFC. The INIT is done after testing registers 3, 4, 5, and 7. It clears Register 2, sets system controller status, and clears controller status. Then an SRQ test is performed, followed by an IFC and REN test. The IFC sets controller status, and that sets the Status Change bit. A FIFO test follows, and then another IFC. Lastly, a test is performed on the DEVICE CLEAR bit. 
A Universal Device Clear command is sent, and the bit is checked. Because the PHI is the controller, the DC bit is verified as clear. Then the "REMOTE-LOCAL" test listed above is performed.

From the manual description, the Status Change bit must be set here, but the first test fails if it is. The second test specifically checks that an IFC that changes from non-controller (i.e., after an INIT) to controller sets the bit. Yet the first test demands that the bit be clear, even though an INIT and IFC have occurred with no intervening reset of the status bit before the test.

An exhaustive search has turned up no additional PHI information. The PHI is described in the early 12009A HP-IB Interface for the 1000 A-Series machines, but the description is verbatim from the 264x manual. The fact that the status bit test has been commented out in the PICDIAG suggests that the results confused the programmer.

In thinking about when the host processor would want a PHI Status Change interrupt, there appear to be five cases (excluding INIT and PON, which clear Register 2) in which a change of controller state occurs:

1. IFC is asserted by the PHI as system controller.

2. A Take Control command is sent by the PHI to another controller.

3. The PHI changes its state from offline to online.

4. IFC is asserted by another system controller.

5. A Take Control command is received from another controller.

Of these, the first three are actions initiated by the host processor, so notification of the action by an interrupt would be redundant. The latter two are actions initiated by another controller on the bus, and so an interrupt would be useful, as otherwise the host processor would be unaware of the change.

So the PHI might set the change bit only if the change arrived from an outside source. However, there is a later test that verifies that the change bit is set when the PHI sends a Take Control command to another (non-existent) bus address.
And, as noted originally, the second test above verifies the change bit is set when the PHI asserts IFC. So that contention is invalid. That leaves only two possibilities. Either it's a bug in GICDIAG 1.26, or the preceding Universal Device Clear resets the change bit. The former would mean that the diagnostic would fail on a good GIC. It's an outside possibility, though, as the version might be specific to the Starfish and so would have very low visibility. Still, it seems unlikely that it would have been shipped with MPE V/R without any testing. DCL seems not an unreasonable possibility, except that the PHI description of the DCL command says, "Does not clear the current controller." It's not apparent whether that means "does not clear controller status" or "does not clear the status of the controller." If it means the former, and DCL does clear some of the internal status, it's a question of what status is cleared (other than the change bit, which must be cleared for the diagnostic to succeed). ------------------------------- GIC Diagnostic Response to INIT ------------------------------- The GIC diagnostic says: Set 'SYS CTRL' switch on GIC under test to 'OFF' (out) Respond 'GO' >GO Error in step 05 Expected CCG after INIT > Prior to the first GO, the diagnostic does a WIOC to send an INIT, then does a WIOC to register 7 to set the PHI online, then does a RIOC of register 1 to check that the system controller status bit is set. After the GO it does a RIOC of register 1 to check that the system controller status bit is no longer set. Then it executes an INIT instruction, which sends an INIT IMB command. This is decoded on the GIC to assert -SRST to the PHI. According to the PHI documentation, this clears all registers except register 1 (PHI 3), the status register. So it should clear register 7 (PHI 5), which should take the PHI offline. The PIC diag has a comment that says: << The SYSTCTRL bit is always true if the PHI/ABI is not ONLINE. 
>> << Thus to check the SYSTCTRL switch setting, the PHI/ABI must >> << be placed ONLINE. >>

So after INIT, the PHI should report itself as the system controller.

The INIT instruction does:

  IMB (INIT)
  if reg14 (15) = 1 then      -- not GIC (ucode says bit 3 ?!?)
    CCG
  elsif reg1 (12) = 0 then    -- not system controller
    CCG
  else
    reg6 := %000010           -- set parallel poll response
    reg7 := %100200           -- set PHI online (bit 0 not used, says PHI)
    reg6 := %000060           -- assert REN and IFC
    delay 100 usec
    reg6 := %000040           -- deny IFC
    reg2 := %177777           -- clear any interrupt conditions
    if reg1 (11) = 0 then     -- PHI is not the controller
      reg7 := 0               -- set PHI offline
      CCG
    else                      -- PHI is the controller (REN asserted)
      CCE
    end if
  end if

So the diag is expecting CCG, but INIT causes it to be the system controller.

--> Is it the "not controller" that causes it to do CCG???

If IMB INIT set system controller but not controller, then the above test passes. But then I get:

  Set 'SYS CTRL' switch on GIC under test to 'ON' ( in)
  Respond 'GO'
  >
  scp> set gic sys
  GO
  Error in step 05
  Expected CCE after INIT
  >

...because the INIT after setting system controller on still fails with CCG when the controller bit is off.


RESOLUTION
~~~~~~~~~~

System controller status is determined by the state of the SYS CTRL switch when the PHI is online. When it is offline, the PHI is always the system controller. Also, controller status is cleared on initialization and when the PHI goes from offline to online. It is set when IFC is asserted and the PHI is the system controller.

The INIT instruction asserts the IMB INIT signal to pulse the PHI -SRST line, which sets the PHI offline, clears controller status, and sets system controller status. The test of bit 12 succeeds. Then the PHI is set online. This changes bit 12 to reflect the condition of the SCTRL line and so the SYS CTRL switch.
If the switch is OFF, then the PHI is not the system controller, and so it cannot control REN and IFC, so the inhibited assertion does not set controller status. The bit 11 test fails, and the INIT instruction returns CCG.

When the SYS CTRL switch is on, the same sequence occurs up to IFC assertion. This time, however, system controller status is set, so IFC and REN are asserted, and the PHI becomes the controller. The bit 11 test now succeeds, and INIT returns CCE.


-------------------------------------------
Detach following attach halts with an error
-------------------------------------------

  HP3000 / MPE V E.01.00 (BASE E.01.00). FRI, FEB 1, 1991, 5:46 PM
  :
  Simulation stopped, P: 071144 (PAUS 0)
  mpe> go
  sim> att ms0 a.tape
  sim> det ms0
  sim> c

  Unit not attached, P: 163771 (STOR Q-12)
  sim> c
  17:46/10/LDEV 7 I/O ERROR IGNORED DURING AVR. I/O STATUS % 73
  17:46/10/Vol (unlabelled) mounted on LDEV# 7
  :

MTSE_UNATT status results in a data error reported to the caller (%73 is error code 101 = 5 = "tape error"). Is this all OK?


-----------------------------------------------------
Interrupt received for non-configured device on DRT 6
-----------------------------------------------------

4:05 PM 1/28/19 reported by Robert Mills.

Sometimes I get this message when I attach a tape:

  hh:mm/3/Interrupt received for non-configured device on DRT 6.  Check I/O configuration.

For example:

  :
  Simulation stopped, P: 072770 (PAUS 0)
  sim> do mount 7 r TapeLibrary/CSL_Release_F0.tape
  /HP3000_III/mount-42> attach -r ms0 TapeLibrary/CSL_Release_F0.tape
  MS: unit is read only
  MS0    attached to TapeLibrary/CSL_Release_F0.tape, read only, 7970E, unlimited capacity
         online, SIMH format
  15:02/3/Interrupt received for non-configured device on DRT 6.  Check I/O configuration.
  :

There is no problem with accessing the TAPE I have just mounted or the other 2 TAPES or JOBTAPE (LDEV 10).

I might attach several tapes in a session and it doesn't happen.
Then on another session the first time I attach a TAPE it will.

Thinking it might have something to do with a specific TAPE I attached the same TAPE 20 times. Nothing. I then shut down the simulator, rebooted (cool start), and started attaching the same TAPE. The second attach generated the message. Did it all again but this time the message appeared on the seventh attach. Next time the fourth, then sixth, then first. The last time I attached the same TAPE 40 times. Nothing. As you see there is no obvious pattern.

The config entries for DRT 6 show no problems.

  LIST I/O DEVICES? Y

  LOG DRT U  C T  SUB              REC   OUTPUT  MODE  DRIVER   DEVICE
  DEV  #  N  H Y  TYPE  TERMINAL   WIDTH  DEV          NAME     CLASSES
   #      I  A P       TYPE SPEED
          T  N E

   7   6  0  0 24 0                128    0            IOTAPE0  TAPE
   8   6  1  0 24 0                128    0            IOTAPE0  TAPE
   9   6  2  0 24 0                128    0            IOTAPE0  TAPE
  10   6  3  0 24 0                128    LP     JA    IOTAPE0  JOBTAPE

----------

The message comes from procedure EXTGHOST in module 10 (ININ), which does:

PROCEDURE EXTGHOST;                                                  01068000
   OPTION PRIVILEGED,UNCALLABLE,INTERRUPT;                  <<03665>>01070000
BEGIN                                                                01072000
   INTEGER DRTN = Q+3; << LOC OF DRT NUMBER ON INTERRUPT >> <<03665>>01074000
                                                            <<03665>>01076000
   EQUATE UNKNOWN'INT'MSG = 410, << MSG CATALOG MSG # >>    <<03665>>01078000
          OPCONSOLE = 0; << SYSTEM CONS CODE FOR GENMSG >>  <<03665>>01080000
                                                            <<03665>>01082000
   DISABLE; << ISSUE A CLEAR INTERFACE >>                            01084000
   IOMESSAGE(1,UNKNOWN'INT'MSG,%10000,DRTN,,,,,OPCONSOLE);  <<03665>>01086000
   TOS := %100000;                                                   01088000
   ASMB(CIO 1);                                                      01090000
   IF >= THEN                                               <<03665>>01092000
      BEGIN << MASTER RESET WORKED - RESET INTERRUPTS >>    <<03665>>01094000
      TOS := %040000;                                       <<03665>>01096000
      ASMB(CIO 1);                                          <<03665>>01098000
      END;                                                  <<03665>>01100000
                                                            <<03665>>01102000
END;                                                                 01104000

Procedure EXTGHOST is called from the outer block of ININ and corresponds to STT #13 (octal), which is marked as
"unused". Interestingly, all STT handlers EXCEPT this one (and STT #10, which calls "CALLHELP") call procedure GHOST, which does a SUDDENDEATH (15).

The message also occurs in procedure GIP in module 55 (HARDRES), which does:

ASMB(TIO 0); << GET STATUS FROM DEVICE >>                            02888000
IF < THEN IOFAILURE(DRTN, 0 ); << CONTROLLER FAILURE >>              02890000
DUPLICATE;                                                           02892000
TOS := DBIUNIT; << UNIT EXTRACT INSTRUCTION, DB IS AT BASE OF ILT >> 02894000
IF <> THEN ASMB(XCH; XEQ 1); << EXTRACT UNIT # FROM STATUS >>        02896000
ASMB(DELB); << Q+5 - UNIT >>                                <<00148>>02898000
TOS := 0;                                                            02900000
TOS := SYSDB;                                                        02902000
ASMB(XCHD); << SET DB TO SYSDB >>                                    02904000
TOS := SYSDB;                                                        02906000
ASMB(SUB,DELB); << Q+6 - ILT POINTER >>                     <<00148>>02908000
TOS := ILTP(ISIOP); << Q+7 - SIOP >>                        <<00148>>02910000
TOS := ILTP(IFLAG).HCUNIT; << HIGHEST CONFIGURED UNIT >>    <<01300>>02912000
IF TOS < UNIT OR (TOS := ILTP(UNIT+IDITP)) <= 0 THEN        <<00148>>02914000
   BEGIN                                                    <<00148>>02916000
   << Print message here >>                                 <<03663>>02918000
   IOMESSAGE(1,UNKNOWN'INT'MSG,%10000,DRTN,,,,,OPCONSOLE);  <<03663>>02920000
   ASMB( IXIT );                                            <<00148>>02922000
   END;                                                     <<00148>>02924000

Tracing the execution, the reported unit number is correctly extracted from the status return. Moreover, unit 3 (shown as JOBTAPE but actually a pseudo device used for streaming) is configured, and attaching to unit 3 succeeds.

Maybe this is a compilation bug? No one else has reported it, and both conditions that produce the message appear to be impossible.

----

After you've made a debug.log that captures the error, start a new simulator session (without debug) and see if you can get the error to occur after doing:

  sim> set ms3 disabled

...after bootup and before your first tape attach.

----


----------------------------
HP32002E.01.00 MPE Operation
----------------------------

- Attempting to RESTORE U00U232A.USL.SYS from tu.tape causes a SF 206 and leaves the USL subdirectory with a file that cannot be PURGEd.


-----------------------------------------------------------
CARTRIDGE DISC (HP 30129A) DIAGNOSTIC OFF-LINE (D419A.01.4)
-----------------------------------------------------------

- NOTE: The diagnostic seems to malfunction if Section Register bits 13 or 15 are turned on!!!

- The diagnostic fails step 66 (retry counter test) if REALTIME is set. If FASTTIME, the diagnostic passes. Setting the STIME (full sector time) greater than about 5300 causes the failure; the REALTIME value is 138, corresponding to 347.22 usec, but it is multiplied by the sector delta (47), giving an equivalent value of 6527.

  The reported failure is E66 00 TOTAL INTERRUPTS 09 SHOULD BE 10. However, a trace shows 10 interrupts occurring.

  The diagnostic writes a bad sector at 1/1/42 and then reads it:

    Channel loaded IOCW 040000 (Control) from address 102757
    Channel loaded IOAW 007600 Set File Mask from address 102760
    Channel loaded IOCW 040000 (Control) from address 102761
    Channel loaded IOAW 006000 Address Record from address 102762
    Channel loaded IOCW 067776 (Write) from address 102763
    Channel loaded IOAW 045765 from address 102764
    Channel loaded IOCW 020000 (Interrupt) from address 102765
    Channel loaded IOAW 177777 from address 102766
    Channel loaded IOCW 040000 (Control) from address 102767
    Channel loaded IOAW 002400 Read from address 102770
    Channel loaded IOCW 077600 (Read) from address 102771
    Channel loaded IOAW 046120 from address 102772
    Channel loaded IOCW 004000 (Conditional Jump) from address 102773
    Channel loaded IOAW 102761 from address 102774
    Channel loaded IOCW 034000 (End with Interrupt) from address 102775
    Channel loaded IOAW 177777 from address 102776

  The diagnostic expects one original try, then eight retries, then a final interrupt when the retry counter expires, for a total of 10.

  >>> The diagnostic sits in a timed loop, and if it expires before the final interrupt, it prints the error message with the count as of that point!
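
  The race between the diagnostic's timed loop and the retry sequence can be checked with simple arithmetic. The following is a minimal sketch in Python using the figures quoted in this entry (347.22 usec sector time, sector delta of 47, eight retries, a timed loop of about 48,000 instructions, 2.5 usec average instruction time). The constant and function names are illustrative only, and scaling the loop length linearly with the outer loop count is an assumption.

```python
USEC_PER_INSTRUCTION = 2.5    # average instruction time used in these notes
SECTOR_TIME_USEC = 347.22     # REALTIME full-sector time
SECTOR_DELTA = 47             # sectors of rotation before a reread
RETRIES = 8
LOOP_INSTRUCTIONS = 48_000    # approximate length of the timed loop (outer count 300)


def rotation_instructions():
    """Rotation delay per retry, expressed in average instruction times."""
    return int(SECTOR_DELTA * SECTOR_TIME_USEC / USEC_PER_INSTRUCTION)


def retry_time_msec(retries=RETRIES):
    """Total rotation delay accumulated over the retries."""
    return retries * SECTOR_DELTA * SECTOR_TIME_USEC / 1000.0


def loop_time_msec(outer_count=300):
    """Timed-loop duration; linear scaling with outer_count is assumed."""
    instructions = LOOP_INSTRUCTIONS * outer_count / 300
    return instructions * USEC_PER_INSTRUCTION / 1000.0


# The loop expires first when it is shorter than the retry sequence.
loop_expires_first = loop_time_msec() < retry_time_msec()
# With the outer count patched from 300 to 400 the loop outlasts the retries.
patched_loop_outlasts = loop_time_msec(400) > retry_time_msec()
```

  With these numbers the retries need about 130.55 msec against a 120 msec loop, so the loop expires first; under the linear-scaling assumption, an outer count of 400 stretches the loop to about 160 msec.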
  The loop consists of an outer loop of 300 executions of an inner loop of 150 executions of the MTBA P+0 instruction. This produces a time of ~48,000 instructions. At 2.5 usec/instruction, this represents 120.0 msec. BUT...each failed sector read is followed by a 47-sector rotation before the sector may be reread, which requires 16.32 msec each (6527 instructions). Eight retries therefore take 130.55 msec, so the timed loop expires.

  >>> The problem is that MTBA P+0 takes longer than the average 2.5 usec. It takes a minimum of 23 microinstructions (4.0 usec) from LUT entry, not including the EA time.

  Changing 00.020124 from 000454 -> 000620 (300. -> 400.) works around the issue.

  Note that simply decrementing sim_interval twice for the MTBA instruction works for the disc diagnostic but fails Step 532 of the tape diagnostic. This is because the tape data transfer service event time counts down twice as fast, while the multiplexer channel data transfer polls occur at the usual one per instruction, leading to a data overrun.


--------------------------------------------
HP 30115A 9-TRACK MAGNETIC TAPE (D433A.01.4)
--------------------------------------------

- (MTB)

    calculated crc word = 141400
    E274 STEP-0434 COMP. AND READ CRCC ARE DIFFER.
    E116 STEP-0434 EXPECT.- OBTAIN. CRCC 120200 032400

  WRZ writes 16 bytes of data, then 6 bytes of zeros, then 2 bytes (240, 200), then 4 bytes of zeros, then 3 bytes (357, 354, 156), then 1 byte of zero. More importantly, the CRC is erroneous, and diag expects the RDC to return the same erroneous CRC (meaning that calculating it won't work!)

- (MTB)

    calculated crc word = 140000
    E274 STEP-0437 COMP. AND READ CRCC ARE DIFFER.
    E116 STEP-0437 EXPECT.- OBTAIN. CRCC 000310 032400

  Ditto.

- (MT11)

    P020 HEAD ADJUSTING: HP 9162-0027 (YES/NO)? NO
    P022 STEP1114 WRITE/READ TEST (%177777) (YES/NO)? YES
    P056 TYPE SELECTED DRIVE ?
    0
    P025 LOAD TAPE(RING) IN DRIVE 0 AND RESPOND 'CR'
    sim> attach ms0 scratch.tape
    sim> go
    Q042 TYPE 'YES' (OPERATOR STOPS RUN BY 'CR')
    Q043 'NO' (TAPE WILL RUN UNTILL END) NO
    E238 SAME STEP - TAPE ERROR - IN READ DS
    E238 SAME STEP - TAPE ERROR - IN READ DS
    ...

    00.047603: 100552

  >>> Problem is diag attempts to write a record the size of the full tape and the tape library aborts the write after 32768 words.


----------------------------
TERMINAL DATA PD427A 01.01
----------------------------

- NOTE: Test section 6 seems to pass, but there are no "receive overrun" debug messages! Is this correct?