Some Private Remarks on the Boyer-Moore Prover (BM for short) and on Induction Heuristics in General. Author: Claus-Peter Wirth. Date: Jan 10, 1992. Unattributed page references always refer to Boyer, Robert S.; Moore, J Strother: A Computational Logic, Academic Press, Menlo Park, California, 1979. Page references to the `new' book refer to Boyer, Robert S.; Moore, J Strother: A Computational Logic Handbook, Academic Press, 1988. Further references: [Stevens] Andrew Stevens: A Rational Reconstruction of Boyer and Moore's Technique for Constructing Induction Formulas. ECAI 88, pp. 565-570. [Walther1] Christoph Walther: Computing Induction Axioms. CADE-11 (?). [Walther2] Christoph Walther: Computing Induction Axioms by Machine. LICS-92 (?). I. REMARKS ON CHAPTER XIV 1. INTERNAL CLAUSAL FORM The process of deductive proving is chiefly driven by exhaustive contextual rewriting on the basis of the internal clause form and by the rippling-out of IFs via the meta-rewrite rule (f ... (IF test pos neg) ...) = (IF test (f ... pos ...) (f ... neg ...)), which becomes necessary because of the form of the BM definitions (with conditional rewrite rules this rippling would no longer be needed). It would therefore be more elegant to show these clauses to the user and to allow him to state his theorems and definitions in a similar form. While the clauses are presented to the user in the form (IMPLIES (AND ...) concl), BM say (page 220 of the new book) that the user must not think in terms of this expression but has to think in clausal form. Their example for this is that a rewrite rule with left-hand side (AND ...) would never apply to the clause above. (Note, however, that such a rule may apply during simplification of the original theorem term, or in case of a real AND function symbol that has survived due to the special treatment explained in `Answers of Ulrich, C.b' below.) We have had three open questions: A. 
How does the BM-prover translate the internal clausal form into the external implicational form for presentation to the user? B. Does this translation function have any influence on the behaviour of the BM-prover, i.e. is the distinguished `concl' a literal of major interest to the BM-prover? C. Does the BM-prover make any use of how the user chooses and structures his logical connectives when expressing a theorem? Answers of Ulrich: A. Clauses are represented internally as lists. The last element of such a list is the conclusion `concl'. B. The list representing a clause is simplified in the order of the list; hence `concl' is simplified last (see (A)). This becomes important for cross-fertilization, where the literals of the form (NOT (EQUAL s t)) are tried for replacement in the order of the internal list until one applies (see page 149). C. The structure of the logical connectives has two influences: a. The ordering of the internal lists that represent the clauses may be affected. b. (page 90, footnote 14) If more than one of the clauses obtained by clausifying cannot be shown by simplification and it comes to an induction, then the BM-prover prefers to induct upon the single input clause rather than upon several ones. This is achieved by not splitting up normalized AND-connectives. The reason behind this is: if the user states an inductive theorem that yields several clauses requiring inductive proof, he has probably done so because it is impossible to show these clauses separately. 2. THE MACHINE OF A DEFINED FUNCTION The so-called `machine' (see page 165) of a function of the BM-prover is conceptually nothing but the set of those conditional rewrite rules corresponding to the definition scheme of the function that are recursive. (Actually, they differ by the flaw described in (3.) below and by the omission of the non-logical superterms of the recursive calls. 
Note: The twofold use of `machines' in BM (for termination on the one hand and for the suggestion of appropriate induction schemes on the other) mirrors the similar uses of conditional equations. For the termination purpose the BM approach allows one to throw away the non-logical superterms because there is never more than one function in the process of definition (henceforth called the `one per time fashion' of definition). For the suggestion of appropriate induction schemes the non-logical superterms may be of no use --- so one could argue that the machines are more appropriate here than conditional equations. However, the whole induction heuristics of BM have little justification (except: if there is an easy induction (and all functions are defined in the one per time fashion), then it is likely to go the BM way), and a more complicated induction scheme (which need not be precomputed) might also make use of the non-logical superterms.) Therefore we think it would be nicer to have the user specify the function by means of conditional equations instead. 3. MACHINES FOR TERMINATION PROOFS, PARTIALLY DEFINED FUNCTIONS The BM-prover has no concepts for partially defined functions. This property becomes relevant even for totally defined functions during the definition process. Take a rule of the form f t = g f s <-- goodnaturedp t = true, f u = 0. Now suppose the call `f s' to be smaller than `f t' if `f u = 0', but not necessarily otherwise. Additionally suppose `f u' to be smaller than `f t' (if `goodnaturedp t = true'). While the definition is fine now (because it is left-right-compatible), the BM-prover is unable to recognize this, because it is forced to show `f s' to be smaller than `f t' under `goodnaturedp t = true' alone, without the necessary additional assumption `f u = 0'. (See page 165.) 4. REVISED MACHINES (i.e. machines with weakened cases) BM say that they do the induction not according to the `machine' but according to a modified version of it: They weaken the cases of the machine (i.e. 
the conditions of the equations) by taking instead the implied conditions of the so-called `induction lemmas' (which are used for proving termination). This weakening seems to be a rather stupid thing to do. Of course, it might save some proof computation that has already been done for the termination proof before. But weaker is weaker. BM give two reasons for their procedure: A. It eliminates weak base subgoals. B. Because the same induction lemmas (which tend to be very few) are used, it arranges for different terms to suggest identical inductions, by allowing subsumptions and mergings that would not be possible otherwise. Now, (A) seems to be nonsense because it is not the base subgoals that make the real trouble. However, (A) is said to be important in [Stevens] (p. 566). [Walther1], contrariwise, recognize (correctly!) that the revision of the machines makes the proofs harder because (in the absence of merging or subsumption) the thrown-away assertions must be regenerated (by case distinction) for the proof to succeed. [Walther1] even show that this can cause the induction to fail (see: Separated Relation Descriptions). Most interesting is our following discovery: While the book of 1979 says the weak base subgoal to be bad, the current version of the BM-prover proves the very same example of the book of 1979 exactly via the weak base subgoal. However, the base subgoal is not proved by induction (as claimed in the book of 1979), but by a simple case distinction. For a commented version of this proof see the file `arithconj.boot' for the definitions and `arithconj.session' for the proof. Note: The functions ZEROP, LESSP, PLUS were redefined to MY-ZEROP, MY-LESSP, PLUS-1 in order to avoid built-in arithmetic. The second redefinition of PLUS-1, from a function definition to two defining rewrite rules, became necessary for the BM-prover to allow tracing of the rewriting of (PLUS I J). 
The only difference this last redefinition makes for the proof of lemma P177 seems to be the non-detection of (NUMBERP (PLUS I J)), which would avoid the last two steps in the proof. For the complete tracing session see the rather large file `arithconj.session.total'. (B) was not understood by us until we read [Walther1], who give a straightforward explanation using their concept of relational descriptions. They call the revision process `range generalization'. In the new book there are no induction lemmas anymore, but it is not described whether the machines are still revised. We suppose that BM have also discovered that (A) is nonsense while (B) is fine, and therefore use their revised machines only for subsumption and merging of the induction schemes, whereas they use the unrevised machine when no subsumption or merging took place on the chosen induction scheme. 5. UNMEASURED POSITIONS BM speak of `a measured subset of variables' instead of measured positions, but we do not follow them here because the term `variable' is misleading. What is the meaning of unmeasured positions? A. The unmeasured positions of a function f are irrelevant for the definition of f to be compatible with the reduction ordering. B. The unmeasured positions of a function f are those that are not put to pieces by this function f in its recursive cases. By `putting to pieces' we mean the application of a destructor that has a somehow inverse constructor. Such destructors are sometimes called selectors. (E.g.: CAR has the constructor CONS, SUB1 has the constructor ADD1; but DELETE has no inverse constructor.) 
(Note that this misshapen double use of the same concept is typical for BM's AI approach to the problem, according to the motto: ``Better a misshapen concept than one that one does not know how to work with.'' A rather nice example on which the two concepts (A) and (B) differ is given by BM on page 176: There the function LESSP allows either its first or its second position to be a measured subset. This is according to (A), because (A) is BM's means to get the concept working. However, for the induction proof it is more important to know which positions are taken to pieces in the sense of (B). Hence, they are actually misled to do an induction proof on the second position because it forms a measured subset on its own. Luckily, this does not turn out to be harmful, because the suggestion is right although the argumentation is wrong. Their lemma is (NOT (LESSP (PLUS I J) I)). Induction on I is right here because it is suggested by PLUS, which recurses on its first argument. However, consider (NOT (LESSP (PLUS J I) I)). This needs induction on J, and the BM-prover really discovers this (see the file `exampleforfive'), because the induction on I is `flawed'. (Note: This is a very nice example of the BM-prover finding a necessary lemma by generalization.) So it is not so easy to fool it, even after some effort (file `secondexampleforfive'). Of course one can fool it (file `modifiedexampleforfive'), but only by mean tricks. Conclusion: The conceptual difference between (A) and (B) does not seem to be very important for simple examples in practice, but it should not be ignored. An example where the attempt to realize (B) by (A) fails is very likely to be corrected by `flaw' detection.) (A) is of importance for the guidance of inductive proofs: Instead of just trying to ripple out the wave-fronts, one is advised to store the wave-fronts into the unmeasured positions. 
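The two lemmas discussed above can be spot-checked with a small Python model of the recursive definitions. The function bodies below mirror the usual BM-style definitions of PLUS and LESSP on the naturals, but they are our own sketch, not BM code:

```python
def plus(i, j):
    # (PLUS I J): recursion on the first argument only
    return j if i == 0 else 1 + plus(i - 1, j)

def lessp(i, j):
    # (LESSP I J): recursion takes both arguments to pieces simultaneously
    if j == 0:
        return False
    if i == 0:
        return True
    return lessp(i - 1, j - 1)

# (NOT (LESSP (PLUS I J) I)) -- induction on I, as suggested by PLUS
assert all(not lessp(plus(i, j), i) for i in range(20) for j in range(20))
# (NOT (LESSP (PLUS J I) I)) -- needs induction on J instead
assert all(not lessp(plus(j, i), i) for i in range(20) for j in range(20))
```

Of course such a finite check says nothing about provability; it merely confirms that both variants of the lemma are plausible before the induction heuristics are asked to find a proof.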
Compared to the more general approach of an enlargeable reduction ordering this is a rather primitive way of arguing; however, it has the advantage of giving a hint that can lead to an operationalization of the argumentation, which is not provided by the more general approach. (B) is most clearly exhibited when one uses the constructor fashion of definition instead of the destructor fashion. So if we use the constructor fashion, we get the information most easily without the concept of measured positions. 6. MEASURED POSITIONS: CHANGING OR UNCHANGING The measured positions, i.e. the members of a set of measured positions, are separated into `changing' and `unchanging' ones. Position i of an n-ary function f with $fx_{0}\ldots x_{n-1}$ as the left-hand side of its definition is `unchanging' :iff for all subterms of the right-hand side of the definition of f that have the form $ft_{0}\ldots t_{n-1}$ we have $t_{i}=x_{i}$ syntactically. What is the use of this separation? The technical aim is to allow additional induction schemes by weakening the restrictions on the unchanging variables. In terms of the two separated concepts (A) and (B) of (5.) the reason can be described in the following way: The (measured) unchanging positions are relevant for the induction ordering. Hence they do not belong to (A). However, they are not put to pieces in the function's recursive cases and therefore do belong to (B). Therefore this distinction can be thought of as a step from the operationalizable complement of (A) towards the intended complement of (B). However, unchanging measured positions are not the only kind of positions that are needed for the induction ordering (i.e. do not belong to (A)) but are not put to pieces in the recursive cases of the function (i.e. do belong to (B)). 
Therefore, speaking in terms of my DA, one really should use (B) for choosing covering sets of substitutions, and separately use (A) for guiding the proof towards induction hypotheses that are smaller than the conclusion. II. REMARKS ON CHAPTER XV (FROM INDUCTION TEMPLATES TO INDUCTION SCHEMES) What was computed for each defined function symbol f in the previous chapter, BM and we will now call a set of `induction templates' (page 183). By an induction template we mean a pair consisting of i. something that partitions the positions of f into unmeasured, unchanging measured, and changing measured ones; ii. the revised machine for the measure leading to (i.). Now the general idea of generating induction schemes from these sets of induction templates is a three-step process explained in this chapter: 1. For each subterm of the conjecture and for each induction template of this subterm's top symbol, generate an induction scheme if possible. 2. Remove each induction scheme `subsumed' by another, and replace each pair of induction schemes of (1.) by their `merge' if possible. 3. Heuristically choose one of the induction schemes of (2.) for the proof attempt for the conjecture. The conceptual approach of BM is again rather ugly (but there does not seem to be a more beautiful way): Instead of dealing with induction over variables, which is the most natural and theoretically necessary way to tackle the problem, BM actually think in terms of positions instead of variables and afterwards correct all the discrepancies between the two concepts. (In this correction process they overlook one discrepancy, exhibited by the funny example of the file `bmmishit'.) HOW BM TACKLE (1.) (GENERATION OF INDUCTION SCHEMES) An induction scheme for a fixed subterm TERM of a conjecture CONJ and a fixed induction template of this subterm's top symbol f is obtained in the following way: Instantiate the formals of the function f according to the actuals (i.e. 
the direct subterms of the fixed subterm TERM of the conjecture CONJ) by a substitution $\gamma$. For each case CASE of the revised machine of the fixed induction template generate a proof subgoal of the form ( ( CASE$\gamma$ $\wedge$ CONJ$\sigma_{0}$ $\wedge$ $\ldots$ $\wedge$ CONJ$\sigma_{m}$ ) --> CONJ ). CONJ is the conjecture one wants to prove. $\gamma$ is the substitution that maps the (naturally distinct) formals of the function f to its actuals in the current subterm. Let the case CASE guard (m+1) calls of the function f, namely: $ft_{0,0}\ldots t_{n-1,0}$; ... ; $ft_{0,m}\ldots t_{n-1,m}$. For each j < (m+1): $\sigma_{j}$ is a substitution whose domain consists of those actuals which are variables. How to construct $\sigma_{j}$ from $\gamma$ and $ft_{0,j}\ldots t_{n-1,j}$ is the central problem of the whole process of (1.). In the sequel we describe BM's procedure for a fixed j of a fixed CASE (of a fixed induction template for a fixed subterm TERM of the conjecture CONJ). Note, however, that the induction scheme consists of all the inductive subgoals of the template plus the base subgoals, which say that the conjunction of the negated cases [CASE] of the inductive subgoals implies the conjecture. Let $\mu$ be the substitution that maps each of the formals $x_{0},\ldots,x_{n-1}$ of $fx_{0}\ldots x_{n-1}$ to the term that occupies its position in $ft_{0,j}\ldots t_{n-1,j}$; i.e. $x_{i}\mu := t_{i,j}$. Now $\sigma_{j}$ is given by mapping (for each formal $x_{i}$ for which $x_{i}\gamma$ is a variable) $x_{i}\gamma$ to $x_{i}\mu\gamma$. If all actuals are distinct variables, everything is fine. But two problems can occur: A. The actual $x_{i}\gamma$ of a formal $x_{i}$ is not a variable. B. The actual is a variable which is the actual of another formal, too. TO A: Three cases: a. i is a changing measured position: BM say that the induction template does not apply to the subterm TERM. 
This is reasonable because we do not know how to put $x_{i}\gamma$ to pieces in this case, and that is exactly what we should do when the position i is in the complement of (B) of (5.) of (I.). So instead of pushing down a selector onto something, it is usually better to have this something send up the constructor. For non-selector destructors (e.g. DELETE), however, it might be necessary to send the destructor down, which the BM-prover cannot do. Hence assume in the sequel, for each changing measured i, $x_{i}\gamma$ to be a variable. b. i is an unchanging measured position: Two cases: alpha. $x_{i}\gamma$ contains a variable of the form $x_{k}\gamma$ where k is a changing measured position: BM say that the induction template does not apply to the subterm TERM. This is forced by their way of founding the induction, because the changing measured k must change in some case CASE' of the revised machine of the induction template: In this case CASE' the foundation of the induction would be destroyed. Even when using a more general means of founding the induction, it is not a bad heuristic not to try an induction here: One should generalize on the occurrence of $x_{k}\gamma$ in $x_{i}\gamma$ before an induction attempt, or try to prove a lemma stating that $x_{i}\gamma$ is insensitive to the change of $x_{k}\gamma$ to $x_{k}\mu\gamma$. beta. $x_{i}\gamma$ contains no variable of the form $x_{k}\gamma$ where k is a changing measured position: This is allowed by BM ($\sigma_{j}$ is then undefined on $x_{i}\gamma$), but it possibly changes $\sigma_{j}$ in the following way: If k is an unmeasured position and $x_{k}\gamma$ is a variable that occurs in $x_{i}\gamma$, then $x_{k}\gamma\sigma_{j}$ is $x_{k}\gamma$ instead of $x_{k}\mu\gamma$. This is heuristically good, but one is tempted not to precompute an induction scheme here (and in other cases). c. i is an unmeasured position: This is allowed by BM. (But $\sigma_{j}$ is, of course, undefined on $x_{i}\gamma$.) 
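The core construction of $\sigma_{j}$ from $\gamma$ and $\mu$, including the rejection in case A.a, can be sketched in Python. The term encoding (strings for variables, tuples for applications) and the function name are our own; cases B.c/B.d are resolved crudely by last-one-wins, which is a simplification of BM's treatment:

```python
def build_sigma(formals, actuals, rec_args, changing_measured):
    """Sketch of the construction of one induction substitution sigma_j.

    formals, actuals, rec_args are parallel lists; a term is a string
    (variable) or a nested tuple like ('SUB1', 'X0').  Returns None when
    the template does not apply, i.e. when a changing measured actual is
    not a variable (case A.a)."""
    gamma = dict(zip(formals, actuals))      # formal -> actual in TERM
    mu = dict(zip(formals, rec_args))        # formal -> term in the recursive call

    def apply_gamma(t):                      # instantiate a term by gamma
        if isinstance(t, tuple):
            return (t[0],) + tuple(apply_gamma(s) for s in t[1:])
        return gamma.get(t, t)

    sigma = {}
    for i, x in enumerate(formals):
        if not isinstance(gamma[x], str):    # actual is not a variable
            if i in changing_measured:
                return None                  # case A.a: template rejected
            continue                         # cases A.b/A.c: sigma undefined here
        sigma[gamma[x]] = apply_gamma(mu[x])  # x_i gamma  ->  x_i mu gamma
    return sigma

# (PLUS I J): PLUS recurses with (SUB1 X0), X1; position 0 is changing measured
print(build_sigma(['X0', 'X1'], ['I', 'J'], [('SUB1', 'X0'), 'X1'], {0}))
# -> {'I': ('SUB1', 'I'), 'J': 'J'}
# (PLUS (PLUS A B) J): the actual at the changing measured position is no variable
print(build_sigma(['X0', 'X1'], [('PLUS', 'A', 'B'), 'J'], [('SUB1', 'X0'), 'X1'], {0}))
# -> None
```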
TO B: Let i and k be positions of f with $x_{i}\gamma = x_{k}\gamma$ a variable. Six cases: a. i and k are changing measured positions: BM say the induction template is not appropriate. This is stupid, as exhibited by the example of the file `bmmishit'. The conceptually fitting way is to say the induction template is appropriate here iff $x_{i}\mu\gamma = x_{k}\mu\gamma$. b. i and k are unchanging measured positions: Because of $x_{i}\mu\gamma = x_{i}\gamma = x_{k}\gamma = x_{k}\mu\gamma$ this is no problem at all. c. i and k are unmeasured positions: BM arbitrarily choose one of $x_{i}\mu\gamma$, $x_{k}\mu\gamma$ as the value of $x_{i}\gamma$ under $\sigma_{j}$. A better idea would be to define two substitutions instead of the single $\sigma_{j}$, one for each suggestion. d. unmeasured / changing measured: The value of $x_{i}\gamma$ under $\sigma_{j}$ is determined by the changing measured position only. e. unchanging measured / changing measured: In this case BM say that the template does not apply. The problem is exactly the one of A.b.alpha. f. unchanging measured / unmeasured: Treated analogously to A.b.beta. HOW BM TACKLE (2.) (SUBSUMPTION AND MERGING OF SCHEMES) BM start with the introduction of further notation: For an induction scheme $s$ generated as described above the following sets are defined: ${\rm CHANGING}(s)$ is the union over all inductive subgoals of the union over all j of the domains of the $\sigma_{j}$. ${\rm UNCHANGING}(s) := {\rm Var}({\rm TERM})\setminus{\rm CHANGING}(s)$. 1. SUBSUMPTION OF SCHEMES If an induction scheme is `subsumed' by another, it is thrown away and this other scheme is given additional credit. 
BM try to explain what they mean by $s_{1}$ being `scheme-subsumed' by $s_{2}$, but we are not sure that we understand it; it is something like the conjunction of: (1) ${\rm CHANGING}(s_{1}) \ \subseteq\ {\rm CHANGING}(s_{2})$ (2) ${\rm UNCHANGING}(s_{1})\ \subseteq\ {\rm UNCHANGING}(s_{2})$ (3) Each inductive subgoal of $s_{1}$ is subgoal-subsumed by an inductive subgoal of $s_{2}$. (4) No inductive subgoal of $s_{2}$ subgoal-subsumes more than one inductive subgoal of $s_{1}$. (Aside: We cannot see any heuristic reason for (4)! Do you? Furthermore, (4) can only occur after revising the machine, which we do not think should be done.) An inductive subgoal $g_{1}$ is subgoal-subsumed by an inductive subgoal $g_{2}$ :iff (1) The terms in the conjunction CASE$\gamma$ of $g_{1}$ are a subset of those of $g_{2}$. (2) Each substitution $\sigma_{j}$ of $g_{1}$ is substitution-subsumed by some such substitution of $g_{2}$. (3) No substitution $\sigma_{j}$ of $g_{2}$ substitution-subsumes more than one such substitution of $g_{1}$. (Asides: 1. We cannot see any heuristic reason for (3)! Do you? 2. Funny enough, [Walther1] treat (1) just the other way round, and what is more, have very good reasons for this!!!) Now what is `substitution-subsumption'? In our opinion it should be something like equality on the induction variables of the subsumer. BM and [Stevens] see this differently, and if you want to know their opinion you have to read the following three complicated paragraphs: The following definition is wrong because it thinks in terms of constructors substituted into the conclusion instead of destructors substituted into the hypotheses, but it is easier to state it this way: A substitution $\sigma$ is substitution-subsumed by a substitution $\sigma'$ :(bugged!)iff there is a substitution $\kappa$ with $\sigma\kappa = \sigma'$. Note that this last definition is just the reverse of what one usually expects of subsumption. 
A proper definition is: A substitution $\sigma$ is substitution-subsumed by a substitution $\sigma'$ :iff for each term $t'$ in the range of $\sigma'$ there is a set of positions P such that (a) for each variable position v in $t'$ there are p, q with v = pq and $p\in P$; (b) for each $p\in P$ there is a term $t$ in the range of $\sigma$ such that $t'/p = t$ syntactically. [Stevens] understand BM in a different way: They require $\sigma=\sigma'$ here but allow `repeated' schemes on the subsumee's side: A repeated scheme for a function f is either an induction scheme for the function f or the result of expanding a machine M for the function f on one of the subgoals of a repeated scheme R for the function f. Expanding a machine M on a subgoal $g$ is done by instantiating the formals of M according to some ${\rm TERM}\sigma_{k}$ of $g$ and then replacing $g$ by a new subgoal for each case CASE of M, which adds this CASE to that of $g$ and takes over all $\sigma_{j}$ from $g$ except for $\sigma_{k}$, which is changed according to the recursive calls of CASE. [Walther1] (Related Work) understand BM's explanation differently again (all $t$ in the range of $\sigma$ occur as subterms in the range of $\sigma'$). In their own work on the subject [Walther1], however, require $\sigma=\sigma'$ on the induction variables of $s_{2}$. Their treatment allows the conclusion that the subsumption by `repeated machines' is rubbish anyway (see chapter III). So do not be sorry for not having understood the last paragraphs. 2. MERGING OF SCHEMES Because BM precompute the induction scheme, they must be very sophisticated in guessing the right scheme. One step towards intelligent guessing is the `merging' of schemes, which replaces two schemes (one mergeable into the other) in the set of schemes under choice by their merge. 
Roughly speaking, two induction schemes merge if their CHANGINGs overlap in a compatible way, in the sense that the suggested changes in the intersection of the CHANGINGs are identical and the remaining CHANGING variables are not among the other scheme's UNCHANGINGs. Scheme $s_{1}$ is mergeable into scheme $s_{2}$ :iff (1) ${\rm CHANGING} (s_{1})\cap{\rm CHANGING} (s_{2})\not=\emptyset$ (2) ${\rm UNCHANGING}(s_{1})\cap{\rm CHANGING} (s_{2}) =\emptyset$ (3) ${\rm CHANGING} (s_{1})\cap{\rm UNCHANGING}(s_{2}) =\emptyset$ (4) There is an injective mapping INJG from the inductive subgoals of $s_{1}$ into the inductive subgoals of $s_{2}$ such that each inductive subgoal $g$ of $s_{1}$ is subgoal-mergeable into ${\rm INJG}(g)$. (Aside: We do not see a reason for the injectivity in (4)! Do you?) Inductive subgoal $g$ is subgoal-mergeable into inductive subgoal $g'$ :iff there is some injective mapping INJS from the substitutions $\sigma_{j}$ of $g$ into those substitutions of $g'$ such that each substitution $\sigma_{j}$ of $g$ and ${\rm INJS}(\sigma_{j})$ are substitution-mergeable. (Aside: We see no reason for the injectivity! Do you? Note, however: Even if injectivity is not needed at all, BM cannot arrive at a symmetric kind of merging, where the unmerged subgoals of both induction schemes are taken over unchanged into the merged induction scheme (and so are the unmerged substitutions). This is because one of the old schemes must lay the foundation for the induction by its measure, and it is not enough to decrease w.r.t. one measure or the other.) Two substitutions are substitution-mergeable :iff (1) Their restrictions to the intersection of the variable sets of their terms TERM are identical. (2) The intersection of their domains is not empty. 
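On the level of the CHANGING/UNCHANGING sets and of single substitutions, the conditions above can be sketched in Python. Sets and dicts stand in for the variable sets and substitutions, and in the substitution test the common domain stands in for the intersection of the variable sets of the two TERMs; the encoding is ours, not BM's:

```python
def schemes_mergeable(ch1, un1, ch2, un2):
    # conditions (1)-(3): the CHANGINGs overlap, and neither scheme's
    # CHANGING variables fall into the other scheme's UNCHANGINGs
    return bool(ch1 & ch2) and not (un1 & ch2) and not (ch1 & un2)

def substitutions_mergeable(s1, s2):
    # the two substitutions agree on their common domain,
    # and that common domain is not empty
    common = set(s1) & set(s2)
    return bool(common) and all(s1[v] == s2[v] for v in common)

print(schemes_mergeable({'I'}, set(), {'I', 'J'}, set()))              # CHANGINGs overlap
print(substitutions_mergeable({'I': '(SUB1 I)'},
                              {'I': '(SUB1 I)', 'J': '(SUB1 J)'}))     # agree on I
```

The subgoal-level condition (the injective mapping INJS) is omitted here; it would be a straightforward matching on top of these two tests.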
The result $s''$ of the merging is similar to $s_{2}$, but with the following exceptions: (1) ${\rm CHANGING} (s'') := {\rm CHANGING} (s_{1}) \cup {\rm CHANGING} (s_{2})$ (2) ${\rm UNCHANGING}(s'') := {\rm UNCHANGING}(s_{1}) \cup {\rm UNCHANGING}(s_{2})$ Note that CHANGING and UNCHANGING still stay disjoint. (3) Each (i.e. not only the ones suggested by INJG) inductive subgoal $g$ of $s_{1}$ that is subgoal-mergeable into a subgoal $g'$ of $s_{2}$ is merged into $g'$, yielding a new subgoal $g''$ which differs from $g'$ by: (a) ${\rm CASE}(g'') := {\rm CASE}(g)\ \wedge\ {\rm CASE}(g')$ (b) Each substitution $\sigma$ of $g$ is merged into each substitution $\sigma'$ of $g'$ into which it is substitution-mergeable, yielding (for each merge) a substitution $\sigma''$ of $g''$ which is the union of $\sigma$ and $\sigma'$. (4) (Surprisingly found on page 196.) If now the CASEs of two subgoals consist of the same conjuncted terms, the two subgoals are merged into one whose premise conjuncts all conjuncted terms of the premises of the two subgoals. This indeed becomes necessary. For an example and a more generally applicable solution of this problem see [Walther1] (Separated Relation Descriptions). (5) The base subgoals are reconstructed from the new inductive subgoals. Now $s''$ is sound for the same reason as $s_{2}$: - Completeness of the case distinction is given because of the reconstruction of the base subgoals. - Making CASE stronger cannot do any harm to the foundation of the induction. - The possibly additional substitutions operate only outside of the TERM of $s_{2}$, because of (1) of the definition of substitution-mergeable and (2), (3) of the definition of mergeable. [Stevens] recognized BM's approach to be too simple in not allowing repeated schemes in place of the scheme $s_{1}$. They give a rather nice example for this. 
Furthermore, their approach allows subsumption and merging to be treated in a uniform way (for which they have to throw away the injectivities (as suggested by us)). Note: Having read this, one might assume that the merging of schemes is of no use when one does not want to precompute the induction scheme. However, the next section about the FLAWED heuristic makes sense even when not precomputing induction schemes, and it is strongly influenced by the assumption that all subsumption has been removed and all merging been done. HOW BM TACKLE (3.) (HEURISTIC SELECTION OF AN INDUCTION SCHEME) 1. FLAWED INDUCTION SCHEMES Some schemes are said to be `flawed'. Unflawed schemes are preferred to flawed ones. Roughly speaking, we guess a flawed scheme should be one that overlaps with its CHANGING variables into another scheme's CHANGING or UNCHANGING variables in a mode that does not allow merging, because of a clash on the CHANGING variables or an overlapping of the CHANGING variables into the other's UNCHANGING variables; whereas BM's definition is a little different and a lot obscure. A scheme $s$ is flawed :iff there is another scheme $s'$ with $({\rm CHANGING}(s') \cup {\rm UNCHANGING}(s')) \cap {\rm IND-VAR}(s) \not= \emptyset$; where the set of `induction variables' of $s$ is defined by ${\rm IND-VAR}(s) := ???$. ??? seems to be just the set of actuals of the changing measured positions that have led to the generation of $s$. However, we do not know whether we understand this correctly, nor how to find the induction variables of merged schemes. For this to make sense it is necessary that the merging has been completed before, i.e. that $s$ and $s'$ are not mergeable. [Stevens] try to explain what is good about BM's flawing concept and what is bad. Their treatment, however, is not enlightening. We keep thinking our initial (roughly speaking) formulation of the problem to be the adequate one: On the one hand the more sophisticated treatments are not convincing. On the other there are shortcomings (i.e. 
certain possibilities for induction are not considered at all) in the concept, which should be removed before introducing further sophistication. 2. SCORES If telling apart the flawed from the unflawed schemes does not settle the score, then among the remaining induction schemes one is chosen by an attribute called SCORE, which is constructed along with the scheme construction (in a manner we do not understand) and manipulated when merging (the new score is the sum of the old ones) and when subsuming (the subsumer's score is increased by the subsumee's). The highest score wins. If there are several winners: Do the induction on the scheme derived from the template whose term TERM contains the biggest number of `nasty' functions of the conjecture. This tends to remove many nasty functions after simplification of the inductive subgoal: Because something like TERM tends to appear in premise and conclusion, it is apt to be generalized or cross-fertilized away. A function is said to be `non-nasty' :iff in its recursive cases it does nothing to its positions or applies selectors only. If all this does not yield a single winner, BM claim the conjecture to be most likely symmetric (whatever this means) and choose arbitrarily. III. SOME REMARKS ON [Walther1] AND [Walther2] [Walther1, Walther2] also try to tackle the problems described here. They describe the inductive subgoals of induction schemes by so-called `atomic relational descriptions' and the whole induction scheme by a `(compound) relational description', which is the set of its inductive subgoals; these are not required to have disjoint cases CASE$\gamma$. However, for the following reduction to orderings described by these compound relational descriptions they have to require disjointness, and they achieve it by what they call `separation', which has to deal with (4) in the description of BM's merging process. Their general idea is to reduce the problems of subsumption and merging to problems of an ordering defined by the compound relational description. 
This reduction is a good idea for the simplification of the problems --- even though it throws away the following kind of information: For an induction to succeed it is not only necessary to have an induction scheme that allows sufficiently strong induction hypotheses (the strength being measured by all constructor ground instantiations of the scheme) (note: this means that their strength is something inductive instead of deductive), but also to supply syntactic (deductive) means to use this strength in the induction proof. Therefore they had better have defined their strength on terms instead of on the objects of their model. This, however, does not fit into their world, and their approach may be sufficient for practical purposes because the means to compare the strengths is the theorem prover again. (This means: If one induction scheme being stronger than another does not allow the conclusion that it is more appropriate for induction, then this being stronger is most unlikely to be detected.) As this last flaw seems to be difficult to understand, we restate it now: Aim: Throw out of a set of induction schemes those induction schemes for which there is another induction scheme in this set which is guaranteed to work at least as well. s `working at least as well' as s' means: If s' is deductively valid, then s is deductively valid, too. Maybe [Walther1] see this a little differently, with `deductively valid' replaced by `provable by our theorem prover'. But this theorem prover is a first-order theorem prover (at least on the first level, until it decides to induct again). [Walther1] differ from BM by doing the subsumption and the merging on induction schemes that do not determine special values for the unmeasured variables but give don't-care variables instead, which are considered to be renamed for each appearance. By this they correct an up to now undetected flaw in BM's recursion analysis. The insight into this flaw comes from their problem reduction process. 
So while [Walther1] have the distinction between measured and unmeasured positions, they do not need the distinction between changing and unchanging variables of the induction scheme. This approach makes their subsumption a very straightforward heuristic, compared to which the approach of BM is really obscure. Ulrich, however, drew our attention to the following problem: [Walther] have their unmeasured variables universally quantified in the premise of the induction scheme. Pulling these quantifiers out, they turn into existential quantifiers. Now, while INKA of [Walther] allows existential quantifiers, BM do not. So before BM can prove the formula chosen by the heuristic of [Walther], each of the existentially quantified variables must be replaced by some term with globally universally quantified variables. These terms must be chosen the way BM choose the terms for the unmeasured variables when generating their induction schemes from their induction templates. But even when limited to the choice of induction schemes, [Walther] means a possible improvement over BM: if one must throw away information one is really interested in keeping, one should do it as late as possible. So first choose (using the more adequate version of [Walther], including existential quantifiers) and then throw away some information to fit the schemes to the BM-prover, which does not know existential quantifiers. A prover for inductive truths, however, had better be able to include this very restricted form of existential quantifiers --- but it is not clear how to do this. Furthermore, [Walther1] give a straightforward explanation (under the headline `Range Generalization') for the revised machines of BM. [Walther1] do not treat the concept of repeated machines of BM or [Stevens]. E.g. on (NOT (LESSP I (HALF I))) they cannot tell
(NOT (ZEROP I)) {{I-->(SUB1 I)}}
to be worse than
((NOT (ZEROP I)) (NOT (ZEROP (SUB1 I)))) {{I-->(SUB1 (SUB1 I))}}
by subsumption, as BM and [Stevens] can.
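For concreteness, the running conjecture (NOT (LESSP I (HALF I))) can be checked over an initial segment of the naturals. The following is only our own Python modelling of the usual BM shell functions (HALF recursing by two SUB1 steps, LESSP the ordering on naturals), not part of either system:

```python
def half(i):
    # Our modelling of (HALF I): (HALF 0) = (HALF 1) = 0,
    # and HALF of n+2 is 1 + HALF of n.
    return 0 if i < 2 else 1 + half(i - 2)

def lessp(i, j):
    # Our modelling of (LESSP I J) on the naturals.
    return i < j

# The conjecture (NOT (LESSP I (HALF I))) holds on an initial segment:
assert all(not lessp(i, half(i)) for i in range(500))
```

Of course such testing proves nothing; it merely fixes the intended meaning of the example before we discuss which induction scheme proves it.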
Knowing that the second one works well and that the first does not, one might be disappointed by this result. Contrariwise, we are happy with it, because we always thought that the two above should be merged into
((NOT (ZEROP I)) (NOT (ZEROP (SUB1 I)))) {{I-->(SUB1 (SUB1 I))},{I-->(SUB1 I)}}
which is surely better than both of the above. Hence we would like it to subsume the former ones. So let us see what the other authors' subsumption concepts achieve on this: BM's approach recognizes both subsumptions, which is not surprising because BM (while not knowing any reason for it) tend to work well with simple examples. [Walther1], however, fail on the subsumption of the first. That is because their subsumption concept (which has much more reason behind it than BM's) is not strong enough: it lacks a complete case distinction over the subsumee's case. Instead of being sorry that the case of the first induction scheme does not imply the case of the last, [Walther1] should use the case distinction:
(NOT (ZEROP (SUB1 I)))
(NOT (NOT (ZEROP (SUB1 I)))).
For the first case they can then establish their subsumption condition at once. For the second case [Walther1] get (EQUAL I 1), which means that this is no inductive subgoal anymore and need not be tested for. Further problems not explained in [Walther1] are:
the distinction changing/unchanging for measured positions (not to be confused with the omission of changing/unchanging variables of induction schemes, which is done on purpose),
the merging of induction schemes,
flawed induction schemes.
In [Walther2] there is a kind of merging given, called `separated union'.
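That the second case of this case distinction collapses to (EQUAL I 1) is easy to check mechanically over the naturals. The Python modelling of ZEROP and SUB1 below is our own sketch, with SUB1 of 0 being 0 as usual:

```python
def zerop(i):
    # Our modelling of (ZEROP I): true iff I is zero.
    return i == 0

def sub1(i):
    # Our modelling of (SUB1 I): predecessor on the naturals, (SUB1 0) = 0.
    return max(i - 1, 0)

# The second case: (AND (NOT (ZEROP I)) (NOT (NOT (ZEROP (SUB1 I))))),
# i.e. I is non-zero but its predecessor is zero.
satisfying = [i for i in range(1000) if not zerop(i) and zerop(sub1(i))]
# Only I = 1 satisfies it, so the case amounts to (EQUAL I 1).
```

So the case is no inductive subgoal, but a single base value.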
For our two example schemes
(NOT (ZEROP I)) {{I-->(SUB1 I)}}
((NOT (ZEROP I)) (NOT (ZEROP (SUB1 I)))) {{I-->(SUB1 (SUB1 I))}}
this separated union is
((NOT (ZEROP I)) (NOT (ZEROP (SUB1 I)))) {{I-->(SUB1 (SUB1 I))},{I-->(SUB1 I)}}
((NOT (ZEROP I)) (ZEROP (SUB1 I))) {{I-->(SUB1 I)}}
If we reconsider our subsumption problem now, we see that the case distinction still must be added to the subsumption of [Walther1], whereas the second (artificially) inductive subgoal frees us from the problem of recognizing that (AND (NOT (ZEROP I)) (ZEROP (SUB1 I))) is no inductive case anymore. For this kind of merging, however, we have to do a termination proof, which is not difficult here because the one relation is in the transitive closure of the other. In general this termination proof is difficult. [Walther2] still leaves open the concepts of unchanging measured positions, which we believe can easily be included in their theory, and of flawed induction schemes. The property of being flawed serves two purposes in BM: firstly, it corrects the misleading hints of their bad kind of merging and subsumption; this is not needed anymore. Secondly, it establishes a kind of measure of how troublesome an induction will get by instantiating variables of terms not in the focus of this recursion analysis. This heuristically important measure has yet to be established by [Walther]. We suppose they have not done it yet because it has nothing to do with the ordering aspect of inductive proof. But now that the fog has cleared in front of the flaw-heuristic that prevented understanding and improvement of the BM heuristics for more than a decade, one can hope to get further on this last subject.
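That the two inductive subgoals of this separated union really form a disjoint and complete case analysis of the non-base values of I can again be checked by brute force over the naturals. As before, the Python modelling of the BM functions is our own sketch:

```python
def zerop(i):
    # Our modelling of (ZEROP I).
    return i == 0

def sub1(i):
    # Our modelling of (SUB1 I), with (SUB1 0) = 0.
    return max(i - 1, 0)

def case1(i):
    # ((NOT (ZEROP I)) (NOT (ZEROP (SUB1 I)))),
    # with hypotheses I-->(SUB1 (SUB1 I)) and I-->(SUB1 I).
    return not zerop(i) and not zerop(sub1(i))

def case2(i):
    # ((NOT (ZEROP I)) (ZEROP (SUB1 I))), with hypothesis I-->(SUB1 I).
    return not zerop(i) and zerop(sub1(i))

# On every non-zero natural exactly one of the two cases applies,
# and neither applies to the base value 0:
assert all(case1(i) != case2(i) for i in range(1, 1000))
assert not case1(0) and not case2(0)
```

This is exactly the disjointness that `separation' is meant to guarantee; the price, as said above, is the extra termination proof for the combined substitutions.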