                     HP 3000 Simulator Debugging In Progress
                     =======================================


Listed below are problems in the HP 3000 simulator that are currently being
debugged.


------------------------
Extended Instruction Set
------------------------

Note this SSB from the CD-ROM:

 HP3000 MPE-V Software Status Bulletin

           KPR Number: 5000465195

        Product Name: MPE V/E             

      Product Number: 32033G          

One Line Description: 
On 70 and 6x, the CVBD instruction does not trap on Decimal Overflow.                                                         

        Update Date: 19910830

Submit Date: March 1989


*** PROBLEM TEXT ***
On the HP3000 series 6x and 70, decimal overflow is not trapped.

*** CAUSE TEXT ***
Problem is caused by the microcode routine that outputs the converted
numbers to memory.  This routine is shared among several decimal
instructions and does not work correctly for CVBD.

*** TEMPORARY SOLUTION TEXT ***
Assemble a 207X4 instead of a 206X4 or CVBD.  This instruction calls
up a firmware emulation routine that does work correctly.  This
routine does not reside in microcode.
.
A patch is available that permanently causes CVBD and CVDB to trap
to unimplemented instruction, which in turn causes ININ to call
up the firmware simulation module FIRMSIM.  This module correctly
implements CVBD and CVDB.  Using this patch will cause the instruction
execution to take longer and the system will suffer a performance
penalty.  MAKE SURE THE CUSTOMER UNDERSTANDS THIS.!!!!

*** FIX TEXT ***
There is no permanent fix.


----------------------
CST Expansion Firmware
----------------------

Regarding tests, the changes appear to be OK by inspection, but I haven't 
written any tests because I don't know how to control the M bit for a 
segment, nor the U bit for the STT header (using old firmware), nor how to 
arrange to have a procedure start at PB + 0 (the target of an STT 0 call).  
The U-bit in the STT header seems to be set by the compiler always, but 
maybe I'm missing some command.

What I need is a program that tests the following inter-segment transfer 
conditions starting from user mode:

 1. XBR to a PM segment (should abort with a Privilege Violation trap with
    both old and new firmware).

 2. PCAL 0 with an external label specifying STT 0 (should abort with an
    STT Uncallable trap with new firmware).

 3. PCAL 0 with an external label specifying STT 0 with the U-bit set
    (should abort with an STT Uncallable trap with old firmware).

 4. PCAL 0 to a PM segment with an external label specifying STT 0 with the
    U-bit set (should abort with an STT Uncallable trap with old firmware).

 5. PCAL 0 with an external label specifying STT 0 with the U-bit clear
    (should succeed with old firmware).

 6. PCAL 0 to a PM segment with an external label specifying STT 0 with the
    U-bit clear (should succeed with old firmware).

 7. PCAL > 0 to an uncallable procedure (should abort with an STT
    Uncallable trap with both old and new firmware).


-------
MPE V/E
-------


---------------------------
GICDIAG 2.11 (S28S231C.SPL)
---------------------------


---------------------------
GICDIAG 1.26 (S28S231C.SPL)
---------------------------


---
DUS
---


--------
Colossus
--------

Running the COLOSSUS program disc test on a system with three CS/80 drives
fails.  The drives are located on GIC channel 11 address 0 (LDEV 1), channel 11
address 5 (LDEV 2), and channel 6 address 6 (LDEV 3).  The symptom is that the
"Brief System Summary" is partially printed, and then the simulator hangs in an
infinite loop (i.e., is unresponsive to CTRL+E).

BUG: The clear_fifos routine in CPP clears the inbound FIFO by reading register
     2 and, if inbound data is present, reading register 0 to remove it.  This
     action is repeated in a loop until the data present status denies.  The
     problem is that if a CS/80 Reporting Phase returns qstat 0, it appears on
     the bus as EOI + 00H, which is loaded into the inbound FIFO as 140000 (tag
     + data).  Unfortunately, this is the same encoding as an uncounted transfer
     enable, which the fifo_unload routine will leave in the FIFO to satisfy the
     diagnostic.  This causes clear_fifos to loop forever.

FIX: Perform the uncounted transfer test only for the outbound FIFO.
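
A minimal sketch of the hang and the fix, in Python rather than the simulator's
C.  The routine names follow the note above, but the signatures, the
"outbound_only" switch, and the pass limit (which stands in for the infinite
loop) are assumptions made for illustration only.

```python
from collections import deque

UNCOUNTED_ENABLE = 0o140000   # tag + data encoding named in the note above


def fifo_unload(fifo, outbound, outbound_only):
    """Remove and return the head word, or None if it is left in place."""
    if not fifo:
        return None
    # The uncounted transfer enable is left in the FIFO for the diagnostic.
    # With the fix (outbound_only=True), that test applies to outbound only.
    keep = fifo[0] == UNCOUNTED_ENABLE and (outbound or not outbound_only)
    if keep:
        return None
    return fifo.popleft()


def clear_inbound(fifo, outbound_only, max_passes=100):
    """Drain the inbound FIFO; return the pass count, or None on a hang."""
    passes = 0
    while fifo:                       # "data present" status is asserted
        if passes == max_passes:      # stand-in for looping forever
            return None
        fifo_unload(fifo, outbound=False, outbound_only=outbound_only)
        passes += 1
    return passes
```

With the test applied to both FIFOs, a qstat 0 byte with EOI never drains; with
the test restricted to the outbound FIFO, it is removed on the first pass.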


The next problem is that a GIC timeout occurs.  The console response is:

  DISC LDEV #1   NOT RESPONDING TO I/O
  DISC LDEV #2   NOT RESPONDING TO I/O
  MPE Table SBUF has overflowed!!!

...and then MPE hangs.  The sequence is:

 - bus 0 does SIOP
 - bus 0 does Locate and Read
 - during xfer, bus 5 does HIOP; this is deferred
 - during xfer, bus 5 does several more HIOPs; these are ignored, as status is
    "stopping"
 - when DMA ends, CPP does poll, which gets DMA completion (but DMA is still
     busy until OBSI)
 - bus 0 does WAIT, which does poll; now DMA is idle, which sets New Status
     CSRQ, but PHI IRQ gets precedence
 - bus 0 does DSJ
 - bus 0 gets talk 0, secondary 10; this should deny NSEN, i.e., clear
     Status_Interrupt, because poll stopped
 - CPP sets reg 3 mask to IRQ | data
 - CPP reschedules for deferred HIOP bus 5 (new status recognized)
     ==> HIOP should not be recognized while GIC is waiting for DSJ data!!
 - CPP sets reg 3 mask to IRQ | status change | poll response
     --> so now incoming qstat byte won't cause CSRQ!
 - bus 0 sends 0 byte + EOI
 - bus 5 does SIOP
 - CPP sets bus 5 to "starting" but GIC SIOP does not assert CSRQ
     --> now, neither channel program is running
 - bus 0 DSJ times out

BUG: NSEN is not calculated correctly.  Should be:

     [R1] The PHI is the controller-in-charge
     [RB] and CSRQ is not disabled
     [RB] and DMA is inactive
     and a parallel poll is in progress (ATN and EOI are asserted)
     and the PHI is not interrupting for a PPR when an OBSI is received
     and the PHI is not requesting a DMA cycle
     
     or:

     [R1] The PHI is not the controller-in-charge
     [RB] and CSRQ is not disabled
     [RB] and DMA is inactive.

     The following state changes affect the condition of the NSEN signal:

     - CIC changes for reset, TCT (acceptor), R1/R6 (IFC)/R7 (offline) write
     - CSRQ disable changes for RB/RF write
     - DMA busy changes for reset, RB (start) write, DMA state 4
     - poll changes for acceptor, DCL/SPD/UNL/dataout (acceptor)

     The following state changes affect Status_Interrupt:

     - RB write denies
     - R1/R6/R7/RF writes test and set
     - DMA state 4 tests and sets
     - acceptor no-poll denies; poll tests and sets

FIX: Correct NSEN so that an HIOP is not recognized in the middle of a transfer.
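
The NSEN conditions listed above can be written as a single predicate.  This is
a sketch in Python, not the simulator code; the parameter names are invented
here, and each one stands for the corresponding condition in the list.

```python
def nsen(cic, csrq_disabled, dma_active,
         poll_active=False, ppr_interrupt=False, dma_request=False):
    """Return True if NSEN should be asserted.

    cic            PHI is the controller-in-charge [R1]
    csrq_disabled  CSRQ is disabled [RB]
    dma_active     DMA is active [RB]
    poll_active    a parallel poll is in progress (ATN and EOI asserted)
    ppr_interrupt  PHI is interrupting for a PPR when an OBSI is received
    dma_request    PHI is requesting a DMA cycle
    """
    common = not csrq_disabled and not dma_active
    if cic:
        return (common and poll_active
                and not ppr_interrupt and not dma_request)
    return common
```

In the failing sequence, the poll has stopped while the GIC waits for the DSJ
byte, so nsen (cic=True, ..., poll_active=False) is False and the deferred HIOP
should not be recognized at that point.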

BUG: Once the timeout bit in reg F sets, it is never cleared!

FIX: Clear the timeout bit when reg B is written to start DMA.

With these corrected, COLOSSUS runs properly for a full set on one disc.


Specifying all tests for just the two discs on channel 11 results in:

  I/O OPCODE = %000001
  DATA WORD1 = %000130
  DATA WORD2 = %001020
  DATA WORD3 = %022150
  I/O STATUS = %000000
  MAILBOX #5 = %030370
  MAILBOX #6 = %030370
  MAILBOX #7 = %030370

  **** SYSTEM FAILURE #201    (HARDRES failure, non-responding device on SIOP)
  STATUS %100031
  DELTA-P %000356

...with all showing "in test 3".  This failure also occurs if only the READ TEST
is specified for both LDEV 1 and 2.

The problem appears to be that issuing an SIO to the IMBA with IMB IRQ and CSRQ
pending does not service channel 1 (IMBA) before idling the CPP.

BUG: CPP IRQ service does WIOC to IMBA to assert INTREQ on the IOP bus.  But
     IMBA does not assert CSRQ in return signals, so imb_cycle removes the CSRQ
     for channel 1.  This causes pending SIO to be ignored, and next attempted
     SIO gets "not ready" because channel_request is still set, which causes the
     SF.
     
FIX: Return CSRQ from "imba_imb_interface" if the "channel_request" is set.


Specifying all tests for all three discs results in:

  **** SYSTEM FAILURE #642    (ININ stack overflow while I/O frozen and disabled)
  STATUS %102001
  DELTA-P %002047

...with all showing "in test 4" or "in test 3".  The failure does not occur when
all tests are specified for any two discs.

BUG: HIOP is returning CCG when program is in the wait state.  Should be
     returning CCE.  COLOSSUS appears to send HIOP repeatedly (as a result?).
     The HIOP logic is wrong.  See PDF p.249 in ucode manual and p.149 in HP 300
     manual.  Currently, the routine is masking off the wait bit and then
     testing the same wait bit to determine CCE vs. CCG!

FIX: Correct HIOP CCG return.
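
The mask-then-test mistake is easy to show in isolation.  In this Python sketch
the position of the wait bit is hypothetical, and returning CCG for the
non-waiting case is an assumption drawn from the note above; only the ordering
of the mask and the test is the point.

```python
WAIT_BIT = 0o000001          # hypothetical position of the wait flag
CCE, CCG = "CCE", "CCG"


def hiop_buggy(state):
    state &= ~WAIT_BIT                          # wait bit is masked off first...
    return CCE if state & WAIT_BIT else CCG     # ...so this test never succeeds


def hiop_fixed(state):
    waiting = bool(state & WAIT_BIT)            # sample the bit before masking
    state &= ~WAIT_BIT
    return CCE if waiting else CCG
```

The buggy form returns CCG regardless of the wait state, which matches the
observed behavior.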

With these fixes, the COLOSSUS disc test succeeds.


----------------------
CSRQ Set vs. Scheduled
----------------------

Attempting to use a gate to enter CPP service on each pass through the CPU
instruction loop vs. scheduling for 2 (or 1) event ticks fails.  The first
failure is during Identify, where the CPP sets up a PHI interrupt on data
reception.
This asserts CSRQ when the first byte arrives.  The CPP code then reads both
bytes because it assumes that the device is fast enough.  But we get only the
first ID byte; the second attempt gets DNV.

Next, Write Loopback fails.  After the last byte is sent, DMA asserts CSRQ,
and the CPP is entered.  This occurs before the DC device can deny NRFD and
complete the loopback.  The problem seems to be that the DC service is entered
for the penultimate byte, denies NRFD, so the GIC responder calls transfer_data
to source the final byte, and the DC responder accepts it, asserts NRFD, and
schedules NRFD denial and loopback completion.  transfer_data sees that the
last byte is out and so asserts CSRQ.  After unwinding from the DC service call,
cpp_request is set, so CPP is immediately entered.  The loopback completion
event is still scheduled, but CPP assumes that the transfer is complete and
sends an Unlisten, which aborts the loopback because the completion event has
not occurred.

The problem is the execution path:

  DC final byte event -> CPP CSRQ entry -> DC command completion

If CSRQ assertion is scheduled, even for one tick, then the completion event is
serviced before CPP entry, and everything works.
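
The ordering can be reproduced with a toy event queue.  This Python sketch is
not the simulator's scheduler: the completion delay of one tick, the
first-scheduled-first-served tie-break, and the function name are all
assumptions chosen to mirror the path described above.

```python
import heapq


def run(csrq_delay, completion_delay=1):
    """Return the order in which CPP entry and DC completion occur.

    csrq_delay=None models the gate: the CPP is entered as soon as the DC
    service call unwinds, ahead of anything still in the event queue.
    """
    order = []
    queue = []                      # entries are (time, sequence, name)

    # DC service for the final byte schedules the loopback completion event.
    heapq.heappush(queue, (completion_delay, 0, "DC completion"))

    # transfer_data then sees that the last byte is out and asserts CSRQ.
    if csrq_delay is None:
        order.append("CPP entry")   # gate: entered immediately
    else:
        heapq.heappush(queue, (csrq_delay, 1, "CPP entry"))

    while queue:
        order.append(heapq.heappop(queue)[2])
    return order
```

Under these assumptions, run (None) gives CPP entry before DC completion (the
failing path), while run (1) and run (2) give DC completion first.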


-----------------------------
CS80DIAG Step 69 (Burst Mode)
-----------------------------

Step 69 does a Set Burst (Last) with a burst size of 1 block.  Then it does:

  00.140551  Write secondary 05 count 14 burst 1 address 00140446 chain 0 | record mode | left byte
  00.140556  Wait | response 0400
  00.140560  Write secondary 16 count 2048 burst 256 address 00132035 chain 0 | burst mode | left byte | no EOI
  00.140565  Relative Jump 140571
  00.140567  Relative Jump 140556
  00.140571  Wait | response 0000
  00.140573  Device Specified Jump 140620, 140577
  00.140577  Write secondary 05 count 3 burst 1 address 00131204 chain 0 | record mode | left byte
  00.140604  Wait | response 0601
  00.140606  Read secondary 16 count 20 burst 1 address 00124621 chain 0 termination 140613 | record mode | left byte
  00.140613  Wait | response 2005
  00.140615  Device Specified Jump 140620
  00.140620  Interrupt/Halt 0001 | CPVA 1


---------------------
GIC Diagnostic Status
---------------------

Test sections 1-17 and 19-25 pass.

Section 18 tests memory parity error detection, which creates bad parity in
memory via commands to the Fault Logging Interface (FLI), which we do not
simulate.


------------------------------------
GIC Diagnostic Step 105 (GIC-to-GIC)
------------------------------------

Two problems are present so far:

 1. DMA data reception state 20 does FIFO unload of first byte, which denies NRFD
    by calling hpib_control.  That calls gic_hpib_respond, which sees the denial
    and calls transfer_data to resume the transfer.  But that reenters state 20
    recursively, which does another FIFO unload.  This time, NRFD is already
    denied, so it skips the hpib_control call, and continues on to states 22, 10,
    11, which unloads another FIFO byte (the third) before continuing into state
    15, which writes the last two bytes as the first memory word.  The first byte
    is lost.  The cycle continues until all bytes are unloaded from the FIFO.
    After all bytes are received, DMA exits in state 5 to wait for CSRQ.  But then
    the recursive call unwinds, and DMA resumes in state 24 (22 -> 24), going
    through 10, 18, 19 (memory read), 23, 21 (memory write of "final" byte), and
    back to 5.  Received data is screwed up as a result.

 2. Because DMA executes as long as it can, the two GICs transfer the entire data
    block within the WIOC to register B that starts DMA.  They also execute both
    channel programs through in response to the SIOP that starts the second program.
    The diagnostic expects to be able to check the CPVA while it is running.
    Because the channel program completes before the next instruction executes, the
    diagnostic reports "Error in step 105, UUT CPVA word 2 is !8001 expected !0000".

Looks like DMA has to be paced with an event timer, so that the program can execute
concurrently with DMA.

Controller program (bus address 30, channel 12, started as device 3) is:

  00.161000:      Write secondary 05 count 255 burst 0 address 00170000 chain 0 | record mode | left byte | no update
  00.161005:      Relative Jump 161011
  00.161007:      Interrupt/Run 0001 | CPVA 2
  00.161011:      Interrupt/Run 0001 | CPVA 1
  00.161013:      Wait | response 0000		--> this is waiting for PPOLL
  00.161015:      Read secondary 12 count 255 burst 0 address 00170000 chain 0 termination 161026 | record mode | left byte | no update
  00.161022:      Interrupt/Halt 0002 | CPVA 2
  00.161024:      Interrupt/Halt 0002 | CPVA 2
  00.161026:      Interrupt/Halt 0002 | CPVA 3
  00.161030:      (invalid)177777

Device program (bus address 3, channel 11, started as device 5) is:

  00.160000:      Write Register F value 000005
  00.160002:      Write Register 3 value 100004
  00.160004:      Wait | response 0000
  00.160006:      Write Register 2 value 177777
  00.160010:      Read Register 0 | response 040005
  00.160012:      Execute DMA read count 255 burst 0 address 00170400 termination 160023 | record mode | left byte | no update
  00.160017:      Relative Jump 160025
  00.160021:      Interrupt/Run 0001 | CPVA 2
  00.160023:      Interrupt/Run 0001 | CPVA 2
  00.160025:      Interrupt/Run 0001 | CPVA 1
  00.160027:      Write Register 6 value 000010	--> this generates a PPOLL response
  00.160031:      Write Register 3 value 100004
  00.160033:      Wait | response 0000
  00.160035:      Write Register 2 value 177777
  00.160037:      Read Register 0 | response 000000
  00.160041:      Write Register 6 value 000001
  00.160043:      Write Register 1 value 000001
  00.160045:      Execute DMA write count 511 burst 255 address 00171000 termination 160052 | burst mode | left byte | no EOI | no update
  00.160052:      Interrupt/Halt 0002 | CPVA 2
  00.160054:      Interrupt/Halt 0002 | CPVA 3
  00.160056:      (invalid)177777

- The failure occurs because the poll is already active when the WIOC to
  register 6 ("poll response") should assert PPR 3.  The device calls
  hpib_control to "redo poll", but the poll is calculated in hpib_control (the
  device call) and not in the cross-call to hpib_respond (the controller), so
  the controller never sees it.

  It is seen in the reverse case because the poll becomes active after "poll
  response" is set, so the controller sees it when it conducts the poll.


----------------------------
GIC Diagnostic Step 38 (DMA)
----------------------------

Test 8 step 38 says that it, "Verifies that the DMA EN bit, register 8, is not
set when this section of the diagnostic is begun."  However, the diagnostic
issues a code %12 to the IMBA, which does not appear to be defined.  If the CPP
returns an error code, the diagnostic faults.  However, if it returns success, a
succession of "HP-IB mailbox timeout" messages is printed, followed by:

  Error in step 38
  INIT did not clear DMA ENABLE nor set DMY4-0=0

The timeouts occur because the data word at location 00.053253 is set to 1
instead of 0.  This in turn causes the BRE P+4 at 00.056221 to fail, allowing
the BR P+17 following it to execute, which skips around the SIO that executes
the IMBA instruction.

The target of the BR P+17 is a PCAL to the procedure at 00.056004 that tests
location %774 for CPP instruction completion:

  00.056004:      ADDS 4
  00.056005:      LRA Q+3
  00.056006:      LRA P-4
  00.056007:      LDI 2
  00.056010:      MOVE PB,3
  00.056011:      LDX Q+4   (= %774)
  00.056012:      PLDA      (loads the completion code from %774)
  00.056013:      DUP,NOP
  00.056014:      STOR Q+2
  00.056015:      LSR #15
  00.056016:      BRE P+4   (tests bit 0 for completion)
  00.056017:      BR P+30   (branch if complete)
  00.056020:      NOP,DDEL
  00.056021:      NOP,STBX
  00.056022:      LOAD P+7
  00.056023:      STOR Q+1
  00.056024:      INCM Q+1
  00.056025:      LOAD Q+1
  00.056026:      CMPI 0
  00.056027:      BGE P+16  (branch if counter has expired)
  00.056030:      BR P+3
  00.056031:      142550    (timeout counter initial value)
  00.056032:      000013
  00.056033:      LDX Q+4   (= %774)
  00.056034:      PLDA      (reload the completion code to see if it changed)
  00.056035:      DUP,NOP
  00.056036:      STOR Q+2
  00.056037:      LSR #15
  00.056040:      BRE P+4   (test bit 0)
  00.056041:      BR P+6    (now is complete)
  00.056042:      NOP,DDEL
  00.056043:      NOP,INCX
  00.056044:      BR P-20   (continue to wait)
  00.056045:      LDI 312   (timeout error routine is called here)
  00.056046:      PCAL 77
  00.056047:      LOAD Q+2  (reload the completion code)
  00.056050:      EXF #1:#15
  00.056051:      LDI 0
  00.056052:      CMP,NOP
  00.056053:      BNE P+3   (verify that the other 15 bits are zero)
  00.056054:      EXIT 0
  00.056055:      NOP,DELB
  00.056056:      LOAD Q+2  (error; reload the completion code)
  00.056057:      LSR #14
  00.056060:      ANDI 1
  00.056061:      BRE P+3   (test bit 1)
  00.056062:      LDI 313   (bit 1 -> "IMB parity error detected by HP-IB interface")
  00.056063:      PCAL 77
  00.056064:      LOAD Q+2  (reload)
  00.056065:      LSR #13
  00.056066:      ANDI 1
  00.056067:      BRE P+3   (test bit 2)
  00.056070:      LDI 314   (bit 2 -> "Invalid timer interrupt detected by HP-IB interface")
  00.056071:      PCAL 77
  00.056072:      LOAD Q+2  (reload)
  00.056073:      LSR #12
  00.056074:      ANDI 1
  00.056075:      BRE P+3   (test bit 3)
  00.056076:      LDI 315   (bit 3 -> "Non-responding module timeout detected by HP-IB interface")
  00.056077:      PCAL 77
  00.056100:      LOAD Q+2  (reload)
  00.056101:      LSR #11
  00.056102:      ANDI 1
  00.056103:      BRE P+3   (test bit 4)
  00.056104:      LDI 316   (bit 4 -> "Invalid mailbox opcode detected by HP-IB interface")
  00.056105:      PCAL 77
  00.056106:      LOAD Q+2  (reload)
  00.056107:      LSR #10
  00.056110:      ANDI 1
  00.056111:      BRE P+3   (test bit 5)
  00.056112:      LDI 317   (bit 5 -> "SIO disabled flag detected by HP-IB interface during SIOP,RIOC or WIOC")
  00.056113:      PCAL 77
  00.056114:      LOAD Q+2  (reload)
  00.056115:      LSR #9
  00.056116:      ANDI 1
  00.056117:      BRE P+3   (test bit 6)
  00.056120:      LDI 320   (bit 6 -> "SIOP failed because previous channel program is not halted")
  00.056121:      PCAL 77
  00.056122:      LDX Q+3   (= %770)
  00.056123:      PLDA      (load the opcode from location %770)
  00.056124:      LDI 1
  00.056125:      LCMP,NOP
  00.056126:      BE P+7    (opcode 1 = HIOP skips test for bit 7)
  00.056127:      LOAD Q+2  (reload)
  00.056130:      LSR #8
  00.056131:      ANDI 1
  00.056132:      BRE P+3   (test bit 7)
  00.056133:      LDI 321   (bit 7 -> "SIOP or HIOP failure - halt pending but not in WAIT")
  00.056134:      PCAL 77
  00.056135:      LOAD Q+2  (reload)
  00.056136:      LSR #7
  00.056137:      ANDI 1
  00.056140:      BRE P+3   (test bit 8)
  00.056141:      LDI 322   (bit 8 -> "INIT failed - unable to bring system controller on-line")
  00.056142:      PCAL 77
  00.056143:      LOAD Q+2  (reload)
  00.056144:      LSR #6
  00.056145:      ANDI 1
  00.056146:      BRE P+3   (test bit 9)
  00.056147:      LDI 323   (bit 9 -> "INIT failed - GIC not system controller")
  00.056150:      PCAL 77
  00.056151:      LOAD Q+2  (reload)
  00.056152:      LSR #5
  00.056153:      ANDI 1
  00.056154:      BRE P+3   (test bit 10)
  00.056155:      LDI 324   (bit 10 -> "Data not valid detected by HP-IB interface")
  00.056156:      PCAL 77
  00.056157:      LOAD Q+2  (reload)
  00.056160:      LSR #4
  00.056161:      ANDI 1
  00.056162:      BRE P+3   (test bit 11)
  00.056163:      LDI 325   (bit 11 -> "IMBA status bit 11 set")
  00.056164:      PCAL 77
  00.056165:      LOAD Q+2  (reload)
  00.056166:      LSR #3
  00.056167:      ANDI 1
  00.056170:      BRE P+3   (test bit 12)
  00.056171:      LDI 326   (bit 12 -> "IMBA status bit 12 set")
  00.056172:      PCAL 77
  00.056173:      LOAD Q+2  (reload)
  00.056174:      LSR #2
  00.056175:      ANDI 1
  00.056176:      BRE P+3   (test bit 13)
  00.056177:      LDI 327   (bit 13 -> "IMBA status bit 13 set")
  00.056200:      PCAL 77
  00.056201:      LOAD Q+2  (reload)
  00.056202:      LSR #1
  00.056203:      ANDI 1
  00.056204:      BRE P+3   (test bit 14)
  00.056205:      LDI 330   (bit 14 -> "IMBA status bit 14 set")
  00.056206:      PCAL 77
  00.056207:      EXIT 0

[to test error printer, break at 00.056013, set RA to the completion code, continue]

This routine loops until bit 0 is set, indicating completion, or until a counter
expires.  Because the SIO was never done, the bit never sets, which leads to
counter expiration and the timeout error.
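
For reference when using the break-and-set-RA trick, the bit tests in the
listing above can be restated as a decoder.  This is a Python sketch written
from the disassembly comments, not code from the diagnostic; the function names
are invented.  Bits use HP 3000 numbering (bit 0 is the most significant bit of
the 16-bit word), which is why the listing tests bit n with LSR #(15-n).

```python
MESSAGES = {
    1: "IMB parity error detected by HP-IB interface",
    2: "Invalid timer interrupt detected by HP-IB interface",
    3: "Non-responding module timeout detected by HP-IB interface",
    4: "Invalid mailbox opcode detected by HP-IB interface",
    5: "SIO disabled flag detected by HP-IB interface during SIOP,RIOC or WIOC",
    6: "SIOP failed because previous channel program is not halted",
    7: "SIOP or HIOP failure - halt pending but not in WAIT",
    8: "INIT failed - unable to bring system controller on-line",
    9: "INIT failed - GIC not system controller",
    10: "Data not valid detected by HP-IB interface",
    11: "IMBA status bit 11 set",
    12: "IMBA status bit 12 set",
    13: "IMBA status bit 13 set",
    14: "IMBA status bit 14 set",
}


def bit(word, n):
    """Return HP 3000 bit n of a 16-bit word (bit 0 is the MSB)."""
    return (word >> (15 - n)) & 1


def is_complete(code):
    """Bit 0 of the word at %774 indicates completion."""
    return bit(code, 0) == 1


def decode_errors(code, opcode):
    """List the messages the routine would report for this completion code."""
    errors = []
    for n, text in MESSAGES.items():
        if n == 7 and opcode == 1:      # opcode 1 = HIOP skips the bit 7 test
            continue
        if bit(code, n):
            errors.append(text)
    return errors
```

For example, a completion code of %104000 with opcode 3 decodes to the single
"Invalid mailbox opcode" message, which is the response that opcode %12 would
be expected to produce.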

(P+20 would be the procedure exit, and it seems to be an error that the
completion checker is called when the SIO is never done.  But this isn't clear
from reading the code at this point.  Changing 00.056222 from 140017 to 140020
produces a BR P+20.)

Step 38 appears to begin at 00.024172, which does:

  00.024172:      LDI 46
  00.024173:      STOR DB+24
  00.024174:      PCAL 67

...which stores "38" in location 00.101760.  It then calls a procedure that
begins at location 00.075476.  It does:

  00.075476:      ADDS 1
  00.075477:      LRA Q+1
  00.075500:      LRA P-3
  00.075501:      LDI 1
  00.075502:      MOVE PB,3
  00.075503:      ZERO,NOP
  00.075504:      LDI 145
  00.075505:      PCAL 41   (--> 00.057522)
  00.075506:      BRE P+3   (this tests 00.053253, which is 0)
  00.075507:      EXIT 0
  00.075510:      NOP,DELB
  00.075511:      LDI 12    \
  00.075512:      LDX Q+1    | (this stores %12 in %770)
  00.075513:      PSTA      /
  00.075514:      PCAL 36   (--> 00.056211, which eventually does SIO 1)
  00.075515:      LDI 1     <<== SETTING THIS TO 1 INSTEAD OF 0 EVENTUALLY CAUSES THE FAILURE!
  00.075516:      LDI 145
  00.075517:      PCAL 42   (--> 00.057552)
  00.075520:      EXIT 0

The routine at 00.057552 does:

  00.057552:      ADDS 1
  00.057553:      LOAD P+6
  00.057554:      STOR Q+1
  00.057555:      LOAD Q-4  (= %145 from push at 00.075516)
  00.057556:      CMPI 1
  00.057557:      BNE P+10
  00.057560:      BR P+3
  00.057561:      001214
  00.057562:      000005
  00.057563:      LOAD Q-5
  00.057564:      LDX Q-4
  00.057565:      PSTA
  00.057566:      BR P+13
  00.057567:      BR P+2
  00.057570:      NOP,DADD
  00.057571:      ZERO,NOP
  00.057572:      PCAL 62
  00.057573:      LOAD Q+1  (= %1214)
  00.057574:      LOAD Q-4  (= %145 from push at 00.075516)
  00.057575:      LADD,NOP
  00.057576:      LADD,STAX
  00.057577:      LOAD Q-5  (= %1 from push at 00.075515)
  00.057600:      PSTA      (stores 1 in 00.053253)         <<== THIS IS THE PROBLEM!
  00.057601:      EXIT 2

This returns to:

  00.024175:      PCAL 25   (--> 00.036300)

...which appears to eventually set up an INIT instruction:

  00.000770  000003     (--> WIOC)
  00.000771  020130     (INIT, channel 11)
  00.000772  020130     (WIOC command to write)
  00.000774  000000

But because 00.053253 is set to 1 instead of 0, the SIO that starts the CPP is
not executed, so this (and all subsequent CPP commands) time out.

BOTTOM LINES:

 1. Unknown opcode %12 is sent to the CPP, for unknown reasons.  This should
    result in bit 4 set in the status return ("Invalid mailbox opcode detected
    by HP-IB interface"), but apparently it does not and is accepted by the
    hardware, with unknown effect.

 2. Because the code at 00.075515 eventually sets data word 00.053253 to 1
    instead of 0 for some unknown reason, the next (and all future) SIO
    instructions are bypassed, so no additional commands are sent to the CPP.
    This results in continual HP-IB timeouts until the diagnostic gives up.

A workaround for #1 is to ignore and return successful completion for opcode
%12.  A workaround for #2 is to change the code at 00.075515 from 021001 (LDI 1)
to 000600 (ZERO,NOP).

These workarounds appear to allow the diagnostic to continue properly.  Why they
are needed is a mystery.


----------------------------
GIC Diagnostic Step 21 (PHI)
----------------------------

Closest thing we have is S28S231A.SPL, which is the PIC diagnostic for the
Series 37.  The PHI test comment says, "This algorithm is from GICDIAG 1.26,
essentially unchanged."

GIC diagnostic step 21 corresponds to PIC diagnostic step 17, which begins at
source line 7458 -- search for "BEGIN'STEP(17)".

Errors during the test are accumulated as bits in the PHIERR (7) array.
Unfortunately, this is not dumped when an error occurs.

However, it looks like PHIERR (0:6) are locations 00.103217-00.103225.
Examining the bits in these words should indicate where they are being set,
which in turn should indicate the causes of the errors.

The instructions that alter PHIERR bits are:

  >>CPU   reg: 00.103446  000003    A 000001, B 131427, C 001266, X 000003, M i t r o C CCG
  >>CPU instr: 00.021413  027341  DPF #14:#1
  >>CPU   reg: 00.103446  000002    A 131427, B 001266, X 000003, M i t r o C CCL
  >>CPU instr: 00.021414  053701  STOR S-1,I
  >>CPU  data: 00.103447  001266    stack read
  >>CPU  data: 00.103222  131427    data write

...where the DPF can be between 0 and 15.  But note that the memory addresses
containing the instructions vary.

Note that the log file is about 570 MB, so WnBrowse must be configured for large
text files to ensure that the entire file is displayed properly.
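
Rather than browsing the whole log, the PHIERR writes can be filtered out by
address.  The Python sketch below assumes that every trace line has the same
layout as the excerpt above and that PHIERR (0:6) really does occupy
00.103217-00.103225; the function name and the log file name in the usage note
are hypothetical.

```python
import re

PHIERR_FIRST = 0o103217
PHIERR_LAST = 0o103225

INSTR = re.compile(r"^\s*>>CPU\s+instr:\s+(\d+\.[0-7]{6})")
WRITE = re.compile(
    r"^\s*>>CPU\s+data:\s+(\d+)\.([0-7]{6})\s+([0-7]{6})\s+data write")


def phierr_writes(lines):
    """Yield (index, value, instruction address) for writes into PHIERR."""
    last_instr = None
    for line in lines:
        m = INSTR.match(line)
        if m:
            last_instr = m.group(1)     # remember the most recent instruction
            continue
        m = WRITE.match(line)
        if not m:
            continue
        bank = m.group(1)
        offset = int(m.group(2), 8)
        value = int(m.group(3), 8)
        if bank == "00" and PHIERR_FIRST <= offset <= PHIERR_LAST:
            yield offset - PHIERR_FIRST, value, last_instr
```

Passing an open file object (e.g., open ("gicdiag.log")) as "lines" reads the
log one line at a time, so the full 570 MB is never held in memory.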


Register 2 Status Change Bit Tests
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

One of the GICDIAG tests within Step 21 checks the response of the Status Change
bit in PHI register 2.  The description of the bit in the 264x HP-IB interface
manual says:

  This bit becomes set whenever there is a change in the value of the REMOTE bit
  in Register [1] while the PHI is a non-controller, or whenever there is a
  change in the value of the HP-IB CONTROLLER bit in Register [1].  It is
  cleared when the host processor writes a "1" into its bit position.

There are two tests for the presence of this bit.  The first sets bits 5 and 6
in PHIERR (2) if the Interrupt Pending and Status Change bits are not set.  The
second sets bit 0 of PHIERR (5) if the Status Change bit is not set.  The second
test is straightforward:

  <<SYSTEM CONTROLLER REGAINS CONTROL OF THE HP-IB BY ASSERTING IFC>>

  DO'WIOC(!0010,6); <<IFC=1>>
  DODELAY(STARTIME,2); << WAIT FOR IFC>>
  DO'WIOC(!0000,6); <<RESET IFC>>
  DO'RIOC(2); <<Read and test for status change>>
  PHIERR(5).(0:1) := NOT TOS'.(8:1);

An earlier INIT had set system controller status and cleared controller status.
So the IFC, which sets controller status, is expected to set the Status Change
bit also.

However, the first test is peculiar:

  <<THIS TESTS REMOTE-LOCAL>>

  DO'WIOC(!0020,6); <<ASSERT  REN>>
  DO'WIOC(!8080,3); <<SET STATUS CHANGE MASK >>
  DO'WIOC(!403E,0); <<PLACE !403E IN FIFO>>
  DO'RIOC(1); <<NOW READ BACK>>
  <<PHIERR(2).(4:1):=IF NOT TOS'.(10:1) THEN 1 ELSE 0;>>

  DO'RIOC(2); <<INTERRUPT PENDING>>
  <<PHIERR(2).(6:1):=IF NOT TOS'.(8:1) THEN 1 ELSE 0;  10-4-83>>

The first part of the test asserts REN and then sends a Listen 30 command.  This
is the proper bus sequence for placing the listener into remote mode.  The read
of Register 1 and the test of bit 10 (REMOTE) would verify that the command
succeeded.  However, the check is commented out in PICDIAG, and a code trace of
GICDIAG execution shows that the bit is never tested there either.

The test then proceeds by reading Register 2.  The code trace shows that it
checks bit 0 (Interrupt Pending) and bit 8 (Status Change) and sets PHIERR (2)
bits 5 and 6, respectively, if the Register 2 bits are set.  This differs from
the PICDIAG code, which has commented out the Status Change bit test.  The
apparent intent of the GICDIAG test is to verify that Status Change does not
occur when the REMOTE bit changes while the PHI is the controller.

The problem is that the Status Change bit has been set as a result of an earlier
INIT and IFC.  The INIT is done after testing registers 3, 4, 5, and 7.  It
clears Register 2, sets system controller status, and clears controller status.
Then an SRQ test is performed, followed by an IFC and REN test.  The IFC sets
controller status, and that sets the Status Change bit.  A FIFO test follows,
and then another IFC.

Lastly, a test is performed on the DEVICE CLEAR bit.  A Universal Device Clear
command is sent, and the bit is checked.  Because the PHI is the controller, the
DC bit is verified as clear.  Then the "REMOTE-LOCAL" test listed above is
performed.

From the manual description, the Status Change bit must be set here, but the
first test fails if it is.  The second test specifically checks that an IFC that
changes from non-controller (i.e., after an INIT) to controller sets the bit.
Yet the first test demands that the bit be clear, even though an INIT and IFC
have occurred with no intervening reset of the status bit before the test.
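
The contradiction can be checked against a small model of the manual's
description.  This Python sketch encodes only the quoted rule (set on a REMOTE
change while a non-controller, set on any CONTROLLER change, cleared by writing
a 1) plus the INIT behavior described above; the class and method names are
invented, and nothing here is taken from PHI internals.

```python
class PhiStatus:
    """Status Change behavior as the 264x manual text describes it."""

    def __init__(self):
        self.controller = False
        self.remote = False
        self.status_change = False

    def init(self):
        # INIT clears Register 2 and clears controller status.
        self.controller = False
        self.status_change = False

    def set_controller(self, value):
        if value != self.controller:
            self.controller = value
            self.status_change = True    # any CONTROLLER change sets the bit

    def set_remote(self, value):
        if value != self.remote:
            self.remote = value
            if not self.controller:      # REMOTE counts only as non-controller
                self.status_change = True

    def write_one_to_status_change(self):
        self.status_change = False
```

Running init, then set_controller (True) for the IFC, then set_remote (True)
for the REN and Listen leaves status_change set, so a test that demands a clear
bit at that point cannot pass under this reading of the manual.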

An exhaustive search has turned up no additional PHI information.  It's
described in the early 12009A HP-IB Interface for the 1000 A-Series machines,
but the description is verbatim from the 264x manual.  The fact that the status
bit test has been commented out in the PICDIAG suggests that the results
confused the programmer.

In thinking about when the host processor would want a PHI Status Change
interrupt, there appear to be five cases (excluding INIT and PON, which clear
Register 2) in which a change of controller state occurs:

  1. IFC is asserted by the PHI as system controller.

  2. A Take Control command is sent by the PHI to another controller.

  3. The PHI changes its state from offline to online.

  4. IFC is asserted by another system controller.

  5. A Take Control command is received from another controller.

Of these, the first three are actions initiated by the host processor, so
notification of the action by an interrupt would be redundant.  The latter two
are actions initiated by another controller on the bus, and so an interrupt
would be useful, as otherwise the host processor would be unaware of the change.

The PHI might set the change bit only if the change arrived from an outside
source.  However, there is a later test that verifies that the change bit is set
when the PHI sends a Take Control command to another (non-existent) bus address.
And, as noted originally, the second test above verifies the change bit is set
when the PHI asserts IFC.  So that contention is invalid.

That leaves only two possibilities.  Either it's a bug in GICDIAG 1.26, or the
preceding Universal Device Clear resets the change bit.  The former would mean
that the diagnostic would fail on a good GIC.  It's an outside possibility,
though, as the version might be specific to the Starfish and so would have very
low visibility.  Still, it seems unlikely that it would have been shipped with
MPE V/R without any testing.

DCL seems not an unreasonable possibility, except that the PHI description of
the DCL command says, "Does not clear the current controller."  It's not
apparent whether that means "does not clear controller status" or "does not
clear the status of the controller."  If it means the former, and DCL does clear
some of the internal status, it's a question of what status is cleared (other
than the change bit, which must be cleared for the diagnostic to succeed).


-------------------------------
GIC Diagnostic Response to INIT
-------------------------------

The GIC diagnostic says:

  Set 'SYS CTRL' switch on GIC under test to 'OFF' (out)
  Respond 'GO'
  >GO

  Error in step 05
  Expected CCG after INIT
  >

Prior to the first GO, the diagnostic does a WIOC to send an INIT, then does a
WIOC to register 7 to set the PHI online, then does a RIOC of register 1 to
check that the system controller status bit is set.

After the GO it does a RIOC of register 1 to check that the system controller
status bit is no longer set.

Then it executes an INIT instruction, which sends an INIT IMB command.  This is
decoded on the GIC to assert -SRST to the PHI.  According to the PHI
documentation, this clears all registers except register 1 (PHI 3), the status
register.  So it should clear register 7 (PHI 5), which should take the PHI
offline.

The PIC diag has a comment that says:

  << The SYSTCTRL bit is always true if the PHI/ABI is not ONLINE. >>
  << Thus to check the SYSTCTRL switch setting, the PHI/ABI must   >>
  << be placed ONLINE.                                             >>

So after INIT, the PHI should report itself as the system controller.

The INIT instruction does:

  IMB (INIT)

  if reg14 (15) = 1 then         -- not GIC (ucode says bit 3 ?!?)
    CCG
  elsif reg1 (12) = 0 then       -- not system controller
    CCG
  else
    reg6 := %000010              -- set parallel poll response
    reg7 := %100200              -- set PHI online (bit 0 not used, says PHI)
    reg6 := %000060              -- assert REN and IFC
    delay 100 usec
    reg6 := %000040              -- deny IFC
    reg2 := %177777              -- clear any interrupt conditions
    if reg1 (11) = 0 then        -- PHI is not the controller
      reg7 := 0                  -- set PHI offline
      CCG
    else                         -- PHI is the controller (REN asserted)
      CCE
    end if
  end if

So the diag is expecting CCG, but INIT causes it to be the system controller.

--> Is it the "not controller" that causes it to do CCG???

If IMB INIT set system controller but not controller, then the above test
passes.  But then I get:

  Set 'SYS CTRL' switch on GIC under test to 'ON'  ( in)
  Respond 'GO'
  >
  scp> set gic sys
  GO

  Error in step 05
  Expected CCE after INIT
  >

...because the INIT after setting system controller on still fails with CCG when
the controller bit is off.


RESOLUTION
~~~~~~~~~~

System controller status is determined by the state of the SYS CTRL switch when
the PHI is online.  When it is offline, the PHI is always the system controller.
Also, controller status is cleared on initialization and when the PHI goes from
offline to online.  It is set when IFC is asserted and the PHI is the system
controller.

The INIT instruction asserts the IMB INIT signal to pulse the PHI -SRST line,
which sets the PHI offline, clears controller status, and sets system controller
status.  The test of bit 12 succeeds.  Then the PHI is set online.  This changes
bit 12 to reflect the condition of the SCTRL line and so the SYS CTRL switch.
If the switch is OFF, then the PHI is not the system controller, and so it
cannot control REN and IFC, so the inhibited assertion does not set controller
status.  The bit 11 test fails, and the INIT instruction returns CCG.

When the SYS CTRL switch is on, the same sequence occurs up to IFC assertion.
This time, however, system controller status is set, so IFC and REN are
asserted, and the PHI becomes the controller.  The bit 11 test now succeeds, and
INIT returns CCE.
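
The rules above can be checked with a small model.  This is only a sketch of the
behavior as described in this note, not the simulator's code: the class and
function names are made up for illustration, the register 14 "not GIC" test is
omitted, and the 100 usec delay and the register writes are reduced to their
effect on the two status bits.

```python
class PhiModel:
    """Toy model of the PHI status rules given in the resolution (hypothetical)."""

    def __init__(self, sys_ctrl_switch):
        self.sys_ctrl_switch = sys_ctrl_switch  # SYS CTRL switch: True = ON (in)
        self.online = False
        self.controller = False                 # register 1 bit 11 in the note

    @property
    def system_controller(self):
        # Register 1 bit 12: always set while offline, follows the switch online.
        return self.sys_ctrl_switch if self.online else True

    def reset(self):
        # -SRST: PHI goes offline and controller status is cleared.
        self.online = False
        self.controller = False

    def set_online(self, online):
        # Controller status is cleared on the offline-to-online transition.
        if online and not self.online:
            self.controller = False
        self.online = online

    def assert_ifc(self):
        # IFC (and REN) can be driven only by the system controller.
        if self.system_controller:
            self.controller = True


def init_instruction(phi):
    """Return the condition code the INIT sequence would produce in this model."""
    phi.reset()                       # IMB (INIT)
    if not phi.system_controller:     # bit 12 test; always passes while offline
        return "CCG"
    phi.set_online(True)              # bit 12 now reflects the switch
    phi.assert_ifc()                  # inhibited if not system controller
    if not phi.controller:            # bit 11 test
        phi.set_online(False)
        return "CCG"
    return "CCE"


print(init_instruction(PhiModel(sys_ctrl_switch=False)))
print(init_instruction(PhiModel(sys_ctrl_switch=True)))
```

With the switch OFF the model yields CCG and leaves the PHI offline; with the
switch ON it yields CCE, matching what the diagnostic expects in both cases.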


-------------------------------------------
Detach following attach halts with an error
-------------------------------------------

  HP3000 / MPE V  E.01.00 (BASE E.01.00).  FRI, FEB  1, 1991,  5:46 PM
  :
  Simulation stopped, P: 071144 (PAUS 0)
  mpe> go
  sim> att ms0 a.tape
  sim> det ms0
  sim> c

  Unit not attached, P: 163771 (STOR Q-12)
  sim> c

  17:46/10/LDEV 7 I/O ERROR IGNORED DURING AVR.  I/O STATUS % 73
  17:46/10/Vol (unlabelled) mounted on LDEV# 7

  :

MTSE_UNATT status results in a data error reported to the caller (%73 is
error code 101 = 5 = "tape error").  Is this all OK?


-----------------------------------------------------
Interrupt received for non-configured device on DRT 6
-----------------------------------------------------

4:05 PM 1/28/19 reported by Robert Mills.

  Sometimes I get this message when I attach a tape:

  hh:mm/3/Interrupt received for non-configured device on DRT 6.  Check I/O configuration.

  For example:

  :<Ctrl+E>
  Simulation stopped, P: 072770 (PAUS 0)
  sim> do mount 7 r TapeLibrary/CSL_Release_F0.tape
  /HP3000_III/mount-42> attach -r ms0 TapeLibrary/CSL_Release_F0.tape
  MS: unit is read only
  MS0    attached to TapeLibrary/CSL_Release_F0.tape, read only, 7970E,
  unlimited capacity
      online, SIMH format

  15:02/3/Interrupt received for non-configured device on DRT 6.  Check I/O configuration.
  :

  There is no problem with accessing the TAPE I have just mounted or the
  other 2 TAPES or JOBTAPE (LDEV 10).

  I might attach several tapes in a session and it doesn't happen. Then on
  another session the first time i attach a TAPE it will.

  Thinking it might have something to do with a specific TAPE I attached
  the same TAPE 20 times. Nothing.

  I then shutdown the simulator, rebooted (cool start), and started
  attaching the same TAPE. The second attach generated the message.

  Did it all again but this time the message appeared on the seventh attach.

  Next time the fourth, then sixth, then first.

  The last time I attached the same TAPE 40 times. Nothing.

  As you see there is no obvious pattern.

  The config entries for DRT 6 show no problems.

    LIST I/O DEVICES? Y
    LOG DRT U  C T SUB              REC   OUTPUT   MODE   DRIVER   DEVICE
    DEV  #  N  H Y TYPE  TERMINAL   WIDTH  DEV             NAME    CLASSES
     #      I  A P      TYPE SPEED
            T  N E
    7   6   0  0 24 0                128    0             IOTAPE0  TAPE
    8   6   1  0 24 0                128    0             IOTAPE0  TAPE
    9   6   2  0 24 0                128    0             IOTAPE0  TAPE
    10  6   3  0 24 0                128  LP       JA     IOTAPE0  JOBTAPE

----------

The message comes from procedure EXTGHOST in module 10 (ININ), which does:

  PROCEDURE EXTGHOST;                                                     01068000
  OPTION PRIVILEGED,UNCALLABLE,INTERRUPT;                        <<03665>>01070000
  BEGIN                                                                   01072000
  INTEGER DRTN = Q+3;    << LOC OF DRT NUMBER ON INTERRUPT >>    <<03665>>01074000
                                                                 <<03665>>01076000
  EQUATE UNKNOWN'INT'MSG = 410, << MSG CATALOG MSG # >>          <<03665>>01078000
         OPCONSOLE       = 0;   << SYSTEM CONS CODE FOR GENMSG >><<03665>>01080000
                                                                 <<03665>>01082000
     DISABLE; << ISSUE A CLEAR INTERFACE >>                               01084000
     IOMESSAGE(1,UNKNOWN'INT'MSG,%10000,DRTN,,,,,OPCONSOLE);     <<03665>>01086000
     TOS := %100000;                                                      01088000
     ASMB(CIO 1);                                                         01090000
     IF >= THEN                                                  <<03665>>01092000
        BEGIN     << MASTER RESET WORKED - RESET INTERRUPTS >>   <<03665>>01094000
        TOS := %040000;                                          <<03665>>01096000
        ASMB(CIO 1);                                             <<03665>>01098000
        END;                                                     <<03665>>01100000
                                                                 <<03665>>01102000
  END;                                                                    01104000

Procedure EXTGHOST is called from the outer block of ININ and corresponds to
STT #13 (octal), which is marked as "unused".  Interestingly, all STT handlers
EXCEPT this one (and STT #10, which calls "CALLHELP") call procedure GHOST,
which does a SUDDENDEATH (15).

The message also occurs in procedure GIP in module 55 (HARDRES), which does:

   ASMB(TIO 0); << GET STATUS FROM DEVICE >>                            02888000
   IF < THEN IOFAILURE(DRTN, 0 ); << CONTROLLER FAILURE >>              02890000
   DUPLICATE;                                                           02892000
   TOS := DBIUNIT; << UNIT EXTRACT INSTRUCTION, DB IS AT BASE OF ILT >> 02894000
   IF <> THEN ASMB(XCH; XEQ 1); << EXTRACT UNIT # FROM STATUS >>        02896000
   ASMB(DELB);                 << Q+5 - UNIT >>                <<00148>>02898000
   TOS := 0;                                                            02900000
   TOS := SYSDB;                                                        02902000
   ASMB(XCHD); << SET DB TO SYSDB >>                                    02904000
   TOS := SYSDB;                                                        02906000
   ASMB(SUB,DELB);             << Q+6 - ILT POINTER >>         <<00148>>02908000
   TOS := ILTP(ISIOP);         << Q+7 - SIOP >>                <<00148>>02910000
   TOS := ILTP(IFLAG).HCUNIT;  << HIGHEST CONFIGURED UNIT >>   <<01300>>02912000
   IF TOS < UNIT OR (TOS := ILTP(UNIT+IDITP)) <= 0 THEN        <<00148>>02914000
      BEGIN                                                    <<00148>>02916000
      << Print message here >>                                 <<03663>>02918000
      IOMESSAGE(1,UNKNOWN'INT'MSG,%10000,DRTN,,,,,OPCONSOLE);  <<03663>>02920000
        ASMB( IXIT );                                          <<00148>>02922000
      END;                                                     <<00148>>02924000

Tracing the execution, the reported unit number is correctly extracted from the
status return.  Moreover, unit 3 (shown as JOBTAPE but actually a pseudo device
used for streaming) is configured, and attaching to unit 3 succeeds.

Maybe this is a compilation bug?  No one else has reported it, and both
conditions that produce the message appear to be impossible.
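
To make the two GIP conditions concrete, here is a rough sketch of the test as I
read it from the listing above.  The function name, the dictionary standing in
for the ILT, and the DIT pointer values are all invented for illustration; only
the shape of the comparison comes from the SPL source, and the left-to-right
evaluation order is an assumption of the sketch.

```python
def gip_reports_unconfigured(unit, highest_configured_unit, dit_pointers):
    """Return True if this model of the GIP check would print message 410.

    unit                     -- unit number extracted from the TIO status
    highest_configured_unit  -- stand-in for ILTP(IFLAG).HCUNIT
    dit_pointers             -- stand-in for ILTP(UNIT+IDITP), keyed by unit
    """
    # First condition: the interrupting unit is above the highest configured one.
    if highest_configured_unit < unit:
        return True
    # Second condition: the unit has no DIT pointer (zero or negative entry).
    return dit_pointers.get(unit, 0) <= 0


# Hypothetical ILT contents for DRT 6 with units 0-3 configured, as in the
# I/O listing above.  The pointer values are placeholders, not real addresses.
example_dit_pointers = {0: 0o1000, 1: 0o1020, 2: 0o1040, 3: 0o1060}

for unit in range(4):
    print(unit, gip_reports_unconfigured(unit, 3, example_dit_pointers))
```

With all four units configured, none of units 0 through 3 trips either
condition in this model, which is why the message looks impossible on paper.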

----
After you've made a debug.log that captures the error, start a new simulator session (without debug) and see if you can get the error to occur after doing:

  sim> set ms3 disabled

...after bootup and before your first tape attach.
----


----------------------------
HP32002E.01.00 MPE Operation
----------------------------

 - Attempting to RESTORE U00U232A.USL.SYS from tu.tape causes a SF 206 and
   leaves the USL subdirectory with a file that cannot be PURGEd.


-----------------------------------------------------------
CARTRIDGE DISC (HP 30129A) DIAGNOSTIC OFF-LINE (D419A.01.4)
-----------------------------------------------------------

 - NOTE: The diagnostic seems to malfunction if Section Register bits 13 or 15
   are turned on!!!

 - The diagnostic fails step 66 (retry counter test) if REALTIME is set.  If
   FASTTIME, the diagnostic passes.  Setting the STIME (full sector time)
   greater than about 5300 causes the failure; the REALTIME value is 138,
   corresponding to 347.22 usec, but it is multiplied by the sector delta (47),
   giving an equivalent value of 6527.

   The reported failure is E66 00 TOTAL INTERRUPTS 09 SHOULD BE 10.  However,
   a trace shows 10 interrupts occurring.

   The diagnostic writes a bad sector at 1/1/42 and then reads it:

     Channel loaded IOCW 040000 (Control) from address 102757
     Channel loaded IOAW 007600 Set File Mask from address 102760
     Channel loaded IOCW 040000 (Control) from address 102761
     Channel loaded IOAW 006000 Address Record from address 102762
     Channel loaded IOCW 067776 (Write) from address 102763
     Channel loaded IOAW 045765 from address 102764
     Channel loaded IOCW 020000 (Interrupt) from address 102765
     Channel loaded IOAW 177777 from address 102766
     Channel loaded IOCW 040000 (Control) from address 102767
     Channel loaded IOAW 002400 Read from address 102770
     Channel loaded IOCW 077600 (Read) from address 102771
     Channel loaded IOAW 046120 from address 102772
     Channel loaded IOCW 004000 (Conditional Jump) from address 102773
     Channel loaded IOAW 102761 from address 102774
     Channel loaded IOCW 034000 (End with Interrupt) from address 102775
     Channel loaded IOAW 177777 from address 102776

   The diagnostic expects one original try, then eight retries, then a final
   interrupt when the retry counter expires, for a total of 10.

   >>> The diagnostic sits in a timed loop, and if it expires before the final
       interrupt, it prints the error message with the count as of that point!

   The loop consists of an outer loop of 300 executions of an inner loop of 150
   executions of the MTBA P+0 instruction.  This produces a time of ~48,000
   instructions.  At 2.5 usec/instruction, this represents 120.0 msec.

   BUT...each failed sector read is followed by a 47-sector rotation before the
   sector may be reread, which requires 16.32 msec each (6527 instructions).
   Eight retries therefore take 130.55 msec, so the timed loop expires.

   >>> The problem is that MTBA P+0 takes longer than the average 2.5 usec.  It
       takes a minimum of 23 microinstructions (4.0 usec) from LUT entry, not
       including the EA time.

   Changing 00.020124 from 000454 -> 000620 (300. -> 400.) works around the issue.

   Note that simply decrementing sim_interval twice for the MTBA instruction
   works for the disc diagnostic but fails Step 532 of the tape diagnostic.
   This is because the tape data transfer service event time counts down twice
   as fast, while the multiplexer channel data transfer polls occur at the usual
   one per instruction, leading to a data overrun.
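
The timing arithmetic for step 66 can be rechecked with the short sketch below.
The figures are the ones quoted above; the only added assumption is a cost of
160 instructions per outer-loop pass, which is simply ~48,000 / 300 and so
folds in whatever loop overhead the diagnostic really has.

```python
USEC_PER_INSTRUCTION = 2.5          # average instruction time quoted above
SECTOR_TIME_USEC = 347.22           # REALTIME full sector time
SECTOR_DELTA = 47                   # sectors rotated before a reread
RETRIES = 8
INSTRUCTIONS_PER_OUTER_PASS = 160   # assumption: ~48,000 instructions / 300 passes


def loop_time_msec(outer_count):
    """Nominal duration of the diagnostic's timed loop for a given outer count."""
    return outer_count * INSTRUCTIONS_PER_OUTER_PASS * USEC_PER_INSTRUCTION / 1000.0


def retry_time_msec():
    """Time spent rotating back to the bad sector across all retries."""
    return RETRIES * SECTOR_DELTA * SECTOR_TIME_USEC / 1000.0


original_count = int("000454", 8)   # outer loop count before the patch
patched_count = int("000620", 8)    # outer loop count after the patch

print(original_count, loop_time_msec(original_count))
print(patched_count, loop_time_msec(patched_count))
print(round(retry_time_msec(), 2))
```

The original count gives a window shorter than the retry time, and the patched
count gives a longer one, which is consistent with the workaround succeeding.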


--------------------------------------------
HP 30115A 9-TRACK MAGNETIC TAPE (D433A.01.4)
--------------------------------------------

 - (MTB)
   calculated crc word = 141400
   E274 STEP-0434 COMP. AND READ CRCC ARE DIFFER.
   E116 STEP-0434  EXPECT.- OBTAIN. CRCC
                   120200   032400
   WRZ writes 16 bytes of data, then 6 bytes of zeros, then 2 bytes (240, 200),
     then 4 bytes of zeros, then 3 bytes (357, 354, 156), then 1 byte of zero.
   More importantly, the CRC is erroneous, and diag expects the RDC to return
     the same erroneous CRC (meaning that calculating it won't work!)


 - (MTB)
   calculated crc word = 140000
   E274 STEP-0437 COMP. AND READ CRCC ARE DIFFER.
   E116 STEP-0437  EXPECT.- OBTAIN. CRCC
                   000310   032400
   Ditto.


 - (MT11)
   P020 HEAD ADJUSTING:  HP 9162-0027  (YES/NO)? NO
   P022 STEP1114 WRITE/READ TEST  (%177777)  (YES/NO)? YES
   P056 TYPE SELECTED DRIVE ?  0
   P025 LOAD TAPE(RING) IN DRIVE 0 AND RESPOND   'CR'
   sim> attach ms0 scratch.tape
   sim> go

   Q042 TYPE 'YES' (OPERATOR STOPS RUN BY 'CR')
   Q043      'NO'  (TAPE WILL RUN UNTILL END)    NO
   E238 SAME STEP -  TAPE  ERROR   - IN READ DS
   E238 SAME STEP -  TAPE  ERROR   - IN READ DS
   ...
   00.047603:      100552

   >>> Problem is diag attempts to write a record the size of the full tape and
       the tape library aborts the write after 32768 words.


----------------------------
TERMINAL DATA  PD427A  01.01
----------------------------

 - NOTE: Test section 6 seems to pass, but there are no "receive overrun" debug
   messages!  Is this correct?