Inductive logic programming

Inductive logic programming (ILP) is a subfield of symbolic artificial intelligence which uses logic programming as a uniform representation for examples, background knowledge and hypotheses. Given an encoding of the known background knowledge and a set of examples represented as a logical database of facts, an ILP system will derive a hypothesised logic program which entails all the positive and none of the negative examples.

Schema: positive examples + negative examples + background knowledge ⇒ hypothesis.

Inductive logic programming is particularly useful in bioinformatics and natural language processing. Gordon Plotkin and Ehud Shapiro laid the initial theoretical foundation for inductive machine learning in a logical setting.^[1]^[2]^[3] Shapiro built the first implementation (Model Inference System) in 1981:^[4] a Prolog program that inductively inferred logic programs from positive and negative examples. The first full first-order implementation of inductive logic programming was Theorist in 1986.^[5]^[6][citation needed] The term inductive logic programming was first introduced^[7] in a paper by Stephen Muggleton in 1991.^[8] Muggleton also founded the annual international conference on Inductive Logic Programming and introduced the theoretical ideas of predicate invention, inverse resolution,^[9] and inverse entailment.^[10] He first implemented inverse entailment in the Progol system. The term "inductive" here refers to philosophical induction (i.e. suggesting a theory to explain observed facts) rather than mathematical induction (i.e. proving a property for all members of a well-ordered set).

Formal definition

The background knowledge is given as a logic theory $B$, commonly in the form of Horn clauses as used in logic programming. The positive and negative examples are given as conjunctions $E^{+}$ and $E^{-}$ of unnegated and negated ground literals, respectively. A correct hypothesis $h$ is a logic proposition satisfying the following requirements.^[11]

$$\begin{array}{llll}\text{Necessity:} & B & \not\models & E^{+}\\ \text{Sufficiency:} & B\land h & \models & E^{+}\\ \text{Weak consistency:} & B\land h & \not\models & \textit{false}\\ \text{Strong consistency:} & B\land h\land E^{-} & \not\models & \textit{false}\end{array}$$

"Necessity" does not restrict $h$, but forbids generating any hypothesis as long as the positive facts are explainable without it. "Sufficiency" requires any generated hypothesis $h$ to explain all positive examples $E^{+}$. "Weak consistency" forbids generating any hypothesis $h$ that contradicts the background knowledge $B$. "Strong consistency" also forbids generating any hypothesis $h$ that is inconsistent with the negative examples $E^{-}$, given the background knowledge $B$; it implies "Weak consistency", and if no negative examples are given, the two requirements coincide. Džeroski^[12] requires only "Sufficiency" (called "Completeness" there) and "Strong consistency".

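For the special case where $B$ and $h$ consist of ground definite (Horn) clauses, each requirement reduces to a membership test in a least Herbrand model, so the conditions can be checked mechanically. The following Python sketch assumes that restriction; the atom encoding and the names `least_model` and `check_requirements` are illustrative and not taken from any ILP system. Weak consistency is hard-coded to hold because a set of definite clauses always has a model; a general first-order checker would need a theorem prover instead.

```python
# Atoms are tuples such as ("dau", "m", "h"); a ground definite clause is a pair
# (head, body) where body is a frozenset of atoms (empty for a fact).


def least_model(clauses):
    """Forward-chain to the least Herbrand model of a ground definite program."""
    model = set()
    changed = True
    while changed:
        changed = False
        for head, body in clauses:
            if head not in model and body <= model:
                model.add(head)
                changed = True
    return model


def check_requirements(B, h, pos, neg):
    """Check the four requirements, assuming B and h are lists of ground definite clauses.

    pos holds the atoms of E+; neg holds the atoms whose negations make up E-.
    """
    m_b = least_model(B)
    m_bh = least_model(B + h)
    return {
        # B alone must not already entail every positive example.
        "necessity": not all(e in m_b for e in pos),
        # B and h together must entail every positive example.
        "sufficiency": all(e in m_bh for e in pos),
        # A set of definite clauses always has a model, so this holds trivially here.
        "weak_consistency": True,
        # Adding the negation of e is inconsistent exactly when B and h entail e.
        "strong_consistency": not any(e in m_bh for e in neg),
    }


B = [(("par", "h", "m"), frozenset()), (("fem", "m"), frozenset())]
hyp = [(("dau", "m", "h"), frozenset({("par", "h", "m"), ("fem", "m")}))]
# Every value in the printed dictionary is True for this input.
print(check_requirements(B, hyp, pos=[("dau", "m", "h")], neg=[("dau", "h", "m")]))
```

Strong consistency fails exactly when one of the atoms listed in `neg` becomes derivable from $B$ and $h$.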
Example

The following well-known example about learning definitions of family relations uses the abbreviations par: parent, fem: female, dau: daughter, g: George, h: Helen, m: Mary, t: Tom, n: Nancy, and e: Eve.

It starts from the background knowledge (cf. picture)

$${\textit {par}}(h,m)\land {\textit {par}}(h,t)\land {\textit {par}}(g,m)\land {\textit {par}}(t,e)\land {\textit {par}}(n,e)\land {\textit {fem}}(h)\land {\textit {fem}}(m)\land {\textit {fem}}(n)\land {\textit {fem}}(e),$$

the positive examples

$${\textit {dau}}(m,h)\land {\textit {dau}}(e,t),$$

and the trivial proposition $\textit{true}$ to denote the absence of negative examples.

Plotkin's^[13]^[14] "relative least general generalization (rlgg)" approach to inductive logic programming is used here to obtain a suggestion for how to formally define the daughter relation $\textit{dau}$.

This approach uses the following steps.

Relativize each positive example literal with the complete background knowledge:
${\begin{aligned}{\textit {dau}}(m,h)\leftarrow {\textit {par}}(h,m)\land {\textit {par}}(h,t)\land {\textit {par}}(g,m)\land {\textit {par}}(t,e)\land {\textit {par}}(n,e)\land {\textit {fem}}(h)\land {\textit {fem}}(m)\land {\textit {fem}}(n)\land {\textit {fem}}(e)\\{\textit {dau}}(e,t)\leftarrow {\textit {par}}(h,m)\land {\textit {par}}(h,t)\land {\textit {par}}(g,m)\land {\textit {par}}(t,e)\land {\textit {par}}(n,e)\land {\textit {fem}}(h)\land {\textit {fem}}(m)\land {\textit {fem}}(n)\land {\textit {fem}}(e)\end{aligned}}$ ,
Convert into clause normal form:
${\begin{aligned}{\textit {dau}}(m,h)\lor \lnot {\textit {par}}(h,m)\lor \lnot {\textit {par}}(h,t)\lor \lnot {\textit {par}}(g,m)\lor \lnot {\textit {par}}(t,e)\lor \lnot {\textit {par}}(n,e)\lor \lnot {\textit {fem}}(h)\lor \lnot {\textit {fem}}(m)\lor \lnot {\textit {fem}}(n)\lor \lnot {\textit {fem}}(e)\\{\textit {dau}}(e,t)\lor \lnot {\textit {par}}(h,m)\lor \lnot {\textit {par}}(h,t)\lor \lnot {\textit {par}}(g,m)\lor \lnot {\textit {par}}(t,e)\lor \lnot {\textit {par}}(n,e)\lor \lnot {\textit {fem}}(h)\lor \lnot {\textit {fem}}(m)\lor \lnot {\textit {fem}}(n)\lor \lnot {\textit {fem}}(e)\end{aligned}}$ ,
Anti-unify each compatible^[15] pair^[16] of literals:
- ${\textit {dau}}(x_{me},x_{ht})$ from ${\textit {dau}}(m,h)$ and ${\textit {dau}}(e,t)$ ,
- $\lnot {\textit {par}}(x_{ht},x_{me})$ from $\lnot {\textit {par}}(h,m)$ and $\lnot {\textit {par}}(t,e)$ ,
- $\lnot {\textit {fem}}(x_{me})$ from $\lnot {\textit {fem}}(m)$ and $\lnot {\textit {fem}}(e)$ ,
- $\lnot {\textit {par}}(g,m)$ from $\lnot {\textit {par}}(g,m)$ and $\lnot {\textit {par}}(g,m)$, and similarly for all other background-knowledge literals
- $\lnot {\textit {par}}(x_{gt},x_{me})$ from $\lnot {\textit {par}}(g,m)$ and $\lnot {\textit {par}}(t,e)$ , and many more negated literals
Delete all negated literals containing variables that do not occur in a positive literal:
- after deleting all negated literals containing variables other than $x_{me},x_{ht}$, only ${\textit {dau}}(x_{me},x_{ht})\lor \lnot {\textit {par}}(x_{ht},x_{me})\lor \lnot {\textit {fem}}(x_{me})$ remains, together with all ground literals from the background knowledge
Convert clauses back to Horn form:
- ${\textit {dau}}(x_{me},x_{ht})\leftarrow {\textit {par}}(x_{ht},x_{me})\land {\textit {fem}}(x_{me})\land ({\text{all background knowledge facts}})$
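The steps above can be sketched in Python for the two examples, as a minimal illustration rather than GOLEM's actual implementation. Clauses are kept in Horn form throughout (a head atom plus a set of body atoms, the body atoms playing the role of the negated literals), so the conversions to and from clause normal form are only changes of notation and are skipped. As an assumption of this sketch, a variable is a string starting with `x_` and named after the anti-unified constant pair, mirroring $x_{me}$ and $x_{ht}$ in the text; the function names are hypothetical.

```python
from itertools import product

# Background knowledge of the family example; an atom is (predicate, arg1, arg2, ...).
BK = {
    ("par", "h", "m"), ("par", "h", "t"), ("par", "g", "m"),
    ("par", "t", "e"), ("par", "n", "e"),
    ("fem", "h"), ("fem", "m"), ("fem", "n"), ("fem", "e"),
}


def lgg_term(s, t, table):
    """Anti-unify two constants: equal ones are kept, a differing pair becomes a variable."""
    if s == t:
        return s
    # The same pair of constants always maps to the same variable, e.g. (m, e) -> x_me.
    return table.setdefault((s, t), f"x_{s}{t}")


def lgg_atom(a, b, table):
    """Anti-unify two compatible atoms (same predicate symbol and arity)."""
    return (a[0],) + tuple(lgg_term(s, t, table) for s, t in zip(a[1:], b[1:]))


def is_var(term):
    return term.startswith("x_")


def rlgg(example1, example2, bk):
    """Relative lgg of two ground examples, each relativized to the clause  example <- bk."""
    table = {}  # shared by head and body so that variables agree across the clause
    head = lgg_atom(example1, example2, table)
    # Anti-unify every compatible pair of body literals, one from each relativized clause.
    body = {
        lgg_atom(a, b, table)
        for a, b in product(bk, bk)
        if a[0] == b[0] and len(a) == len(b)
    }
    # Delete body literals containing a variable that does not occur in the head.
    head_vars = {t for t in head[1:] if is_var(t)}
    body = {lit for lit in body if all(not is_var(t) or t in head_vars for t in lit[1:])}
    return head, body


head, body = rlgg(("dau", "m", "h"), ("dau", "e", "t"), BK)
print(head)               # ('dau', 'x_me', 'x_ht')
print(sorted(body - BK))  # [('fem', 'x_me'), ('par', 'x_ht', 'x_me')]
```

All nine background facts also survive in `body`, because anti-unifying a ground literal with itself returns it unchanged; this matches the "(all background knowledge facts)" part of the clause.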

The resulting Horn clause is the hypothesis $h$ obtained by the rlgg approach. Ignoring the background knowledge facts, the clause informally reads " $x_{me}$ is called a daughter of $x_{ht}$ if $x_{ht}$ is the parent of $x_{me}$ and $x_{me}$ is female", which is a commonly accepted definition.

Concerning the above requirements, "Necessity" is satisfied because the predicate $\textit{dau}$ does not appear in the background knowledge, which hence cannot imply any property containing this predicate, such as the positive examples. "Sufficiency" is satisfied by the computed hypothesis $h$, since it, together with ${\textit {par}}(h,m)\land {\textit {fem}}(m)$ from the background knowledge, implies the first positive example ${\textit {dau}}(m,h)$, and similarly $h$ and ${\textit {par}}(t,e)\land {\textit {fem}}(e)$ from the background knowledge imply the second positive example ${\textit {dau}}(e,t)$. "Weak consistency" is satisfied by $h$, since $h$ holds in the (finite) Herbrand structure described by the background knowledge; the same holds for "Strong consistency".
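Because $\textit{dau}$ does not occur in any clause body, a single forward-chaining pass of the learned clause over the background facts computes everything it adds. The Python sketch below assumes the background-knowledge conjuncts are dropped from the body of $h$, which changes nothing here since they all hold in $B$; `apply_hypothesis` is a hypothetical helper. Running it also derives ${\textit {dau}}(m,g)$ and ${\textit {dau}}(e,n)$, which are not among the examples, so the clause generalizes beyond them.

```python
# Background knowledge of the family example as ground atoms (predicate, arg1, ...).
BK = {
    ("par", "h", "m"), ("par", "h", "t"), ("par", "g", "m"),
    ("par", "t", "e"), ("par", "n", "e"),
    ("fem", "h"), ("fem", "m"), ("fem", "n"), ("fem", "e"),
}
POSITIVES = {("dau", "m", "h"), ("dau", "e", "t")}


def apply_hypothesis(facts):
    """Derive every dau atom from  dau(X, Y) <- par(Y, X), fem(X)  over ground facts.

    One pass suffices because dau does not occur in the clause body.
    """
    derived = set()
    for fact in facts:
        if fact[0] == "par":
            _, parent, child = fact
            if ("fem", child) in facts:
                derived.add(("dau", child, parent))
    return derived


model = BK | apply_hypothesis(BK)
# Necessity: the background knowledge alone contains no dau atom at all.
assert not any(fact[0] == "dau" for fact in BK)
# Sufficiency: both positive examples are derived from B and h.
assert POSITIVES <= model
print(sorted(model - BK))
# [('dau', 'e', 'n'), ('dau', 'e', 't'), ('dau', 'm', 'g'), ('dau', 'm', 'h')]
```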

The common definition of the grandmother relation, viz. ${\textit {gra}}(x,z)\leftarrow {\textit {fem}}(x)\land {\textit {par}}(x,y)\land {\textit {par}}(y,z)$, cannot be learned using the above approach, since the variable $y$ occurs only in the clause body; the corresponding literals would have been deleted in the 4th step of the approach. To overcome this flaw, that step has to be modified so that it can be parametrized with different literal post-selection heuristics. Historically, the Golem implementation is based on the rlgg approach.
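One way to parametrize the deletion step is to pass the literal post-selection heuristic in as a function. The Python sketch below compares the head-variables-only rule of the rlgg approach with a hypothetical "linked" heuristic that keeps any body literal reachable from the head variables through a chain of shared variables. Both the encoding (upper-case strings are variables, so $x,y,z$ become `X`, `Y`, `Z`) and the linked heuristic are illustrative assumptions, not the heuristic of any particular system. Applied to the grandmother clause, the first rule keeps only ${\textit {fem}}(x)$, while the second keeps all three literals.

```python
# Literals are tuples (predicate, arg1, ...); as an assumption of this sketch,
# variables are upper-case strings (as in Prolog) and constants are lower-case.


def variables(lit):
    return {t for t in lit[1:] if t[0].isupper()}


def head_vars_only(head, body):
    """The rlgg deletion step: keep literals whose variables all occur in the head."""
    hv = variables(head)
    return frozenset(lit for lit in body if variables(lit) <= hv)


def linked(head, body):
    """Hypothetical alternative: keep literals connected to the head via shared variables."""
    reached = set(variables(head))
    kept = {lit for lit in body if not variables(lit)}  # ground literals are kept, as in rlgg
    changed = True
    while changed:
        changed = False
        for lit in body:
            if lit not in kept and variables(lit) & reached:
                kept.add(lit)
                reached |= variables(lit)
                changed = True
    return frozenset(kept)


def post_select(head, body, heuristic):
    """Parametrized deletion step: the heuristic decides which body literals survive."""
    return heuristic(head, frozenset(body))


head = ("gra", "X", "Z")
body = {("fem", "X"), ("par", "X", "Y"), ("par", "Y", "Z")}
print(sorted(post_select(head, body, head_vars_only)))  # [('fem', 'X')]
print(sorted(post_select(head, body, linked)))
# [('fem', 'X'), ('par', 'X', 'Y'), ('par', 'Y', 'Z')]
```

The linked heuristic reaches ${\textit {par}}(y,z)$ through $z$ and ${\textit {par}}(x,y)$ through $x$, so the body-only variable $y$ no longer causes a deletion.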

Inductive Logic Programming system

An inductive logic programming system is a program that takes as input logic theories $B,E^{+},E^{-}$ and outputs a correct hypothesis $H$ with respect to theories $B,E^{+},E^{-}$. An algorithm of an ILP system consists of two parts: hypothesis search and hypothesis selection. First, hypotheses are searched for with an inductive logic programming procedure; then a subset of the found hypotheses (in most systems, a single hypothesis) is chosen by a selection algorithm. A selection algorithm scores each of the found hypotheses and returns the ones with the highest score. An example of a score function is minimal compression length, where the hypothesis with the lowest Kolmogorov complexity has the highest score and is returned. An ILP system is complete if and only if, for any input logic theories $B,E^{+},E^{-}$, any correct hypothesis $H$ with respect to these input theories can be found with its hypothesis search procedure.
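The selection step can be pictured as a scoring function applied to every found hypothesis. Kolmogorov complexity is not computable, so a practical system has to approximate it; the Python sketch below uses a crude, hypothetical stand-in (the number of predicate and argument symbols in a hypothesis) and returns every hypothesis with the best score. The two candidates are the daughter clause and the same clause with all background facts still in its body, as produced by the rlgg approach; the names `size` and `select` are illustrative.

```python
# A hypothesis is a list of clauses (head, body); atoms are tuples (predicate, args...).


def size(hypothesis):
    """Hypothetical stand-in for description length: count all symbols in all clauses."""
    return sum(len(head) + sum(len(lit) for lit in body) for head, body in hypothesis)


def select(hypotheses, score=lambda h: -size(h)):
    """Score each found hypothesis and return all those with the highest score."""
    if not hypotheses:
        return []
    scores = [score(h) for h in hypotheses]
    best = max(scores)
    return [h for h, s in zip(hypotheses, scores) if s == best]


# Two correct hypotheses for the daughter example: the short clause, and the rlgg clause
# that still carries all nine background-knowledge facts in its body.
BK_FACTS = [("par", "h", "m"), ("par", "h", "t"), ("par", "g", "m"), ("par", "t", "e"),
            ("par", "n", "e"), ("fem", "h"), ("fem", "m"), ("fem", "n"), ("fem", "e")]
short = [(("dau", "X", "Y"), [("par", "Y", "X"), ("fem", "X")])]
full = [(("dau", "X", "Y"), [("par", "Y", "X"), ("fem", "X")] + BK_FACTS)]
print(size(short), size(full))           # 8 31
print(select([short, full]) == [short])  # True
```

Passing a different `score` function swaps the selection criterion without touching the search.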

Hypothesis search

Modern ILP systems like Progol,^[8] Hail^[17] and Imparo^[18] find a hypothesis $H$ using the principle of inverse entailment^[8] for theories $B$, $E$, $H$: $B\land H\models E\iff B\land \neg E\models \neg H$. First they construct an intermediate theory $F$, called a bridge theory, satisfying the conditions $B\land \neg E\models F$ and $F\models \neg H$. Then, since $H\models \neg F$, they generalize the negation of the bridge theory $F$ with anti-entailment.^[19] However, because anti-entailment is highly non-deterministic, it is computationally expensive. Therefore, the hypothesis search can alternatively be conducted using inverse subsumption (anti-subsumption), which is less non-deterministic than anti-entailment.
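Written out under classical semantics, the inverse-entailment equivalence and the role of the bridge theory follow in a few steps:

$$\begin{aligned}B\land H\models E &\iff B\land H\land \neg E\ \text{has no model}\\ &\iff \text{every model of } B\land \neg E \text{ falsifies } H\\ &\iff B\land \neg E\models \neg H\end{aligned}$$

$$\bigl(B\land \neg E\models F\ \text{and}\ F\models \neg H\bigr)\implies B\land \neg E\models \neg H\implies B\land H\models E$$

$$F\models \neg H\iff H\models \neg F$$

The second line shows that any hypothesis $H$ with $F\models \neg H$ is sufficient; the third line (contraposition) is why the search generalizes $\neg F$.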

The question arises whether the hypothesis search procedure of a specific ILP system is complete. For example, Progol's hypothesis search procedure, based on the inverse entailment inference rule, is not complete, as shown by Yamamoto's example.^[20] On the other hand, Imparo is complete both by its anti-entailment procedure^[21] and by its extended inverse subsumption^[22] procedure.

Implementations

1BC and 1BC2: first-order naive Bayesian classifiers
ACE (A Combined Engine)
Aleph
Atom
Claudien
DL-Learner
DMax
FastLAS (Fast Learning from Answer Sets)
FOIL (First Order Inductive Learner)
Golem
ILASP (Inductive Learning of Answer Set Programs)
Imparo^[21]
Inthelex (INcremental THEory Learner from EXamples)
Lime
Metagol
Mio
MIS (Model Inference System) by Ehud Shapiro
PROGOL
RSD
Warmr (now included in ACE)
ProGolem ^[23]^[24]

References

^ Plotkin, G.D. (1970). Automatic Methods of Inductive Inference (PDF) (PhD). University of Edinburgh. hdl:1842/6656.
^ Shapiro, Ehud Y. (1981). Inductive inference of theories from facts (PDF) (Technical report). Department of Computer Science, Yale University. 192. Reprinted in Lassez, J.-L.; Plotkin, G., eds. (1991). Computational logic : essays in honor of Alan Robinson. MIT Press. pp. 199–254. ISBN 978-0-262-12156-9.
^ Shapiro, Ehud Y. (1983). Algorithmic program debugging. MIT Press. ISBN 0-262-19218-7.
^ Shapiro, Ehud Y. (1981). "The model inference system" (PDF). Proceedings of the 7th international joint conference on Artificial intelligence. Vol. 2. Morgan Kaufmann. p. 1064.
^ Poole, David; Goebel, Randy; Aleliunas, Romas (Feb 1986). Theorist: A Logical Reasoning System for Defaults and Diagnosis (PDF) (Research Report). Univ. Waterloo.
^ Poole, David; Goebel, Randy; Aleliunas, Romas (1987). "Theorist: A Logical Reasoning System for Defaults and Diagnosis". In Nick J. Cercone; Gordon McCalla (eds.). The Knowledge Frontier – Essays in the Representation of Knowledge. Symbolic Computation (1st ed.). New York, NY: Springer. pp. 331–352. doi:10.1007/978-1-4612-4792-0. ISBN 978-1-4612-9158-9. S2CID 38209923.
^ De Raedt, Luc (2012) [1999]. "A Perspective on Inductive Logic Programming". The Logic Programming Paradigm: A 25-Year Perspective. Springer. pp. 335–346. CiteSeerX 10.1.1.56.1790. ISBN 978-3-642-60085-2.
^ ^a ^b ^c Muggleton, S.H. (1991). "Inductive logic programming". New Generation Computing. 8 (4): 295–318. CiteSeerX 10.1.1.329.5312. doi:10.1007/BF03037089. S2CID 5462416.
^ Muggleton, S.H.; Buntine, W. (1988). "Machine invention of first-order predicate by inverting resolution". Proceedings of the 5th International Conference on Machine Learning. pp. 339–352. doi:10.1016/B978-0-934613-64-4.50040-2. ISBN 978-0-934613-64-4.
^ Muggleton, S.H. (1995). "Inverting entailment and Progol". New Generation Computing. 13 (3–4): 245–286. CiteSeerX 10.1.1.31.1630. doi:10.1007/bf03037227. S2CID 12643399.
^ Muggleton, Stephen (1999). "Inductive Logic Programming: Issues, Results and the Challenge of Learning Language in Logic". Artificial Intelligence. 114 (1–2): 283–296. doi:10.1016/s0004-3702(99)00067-3. Here: Sect. 2.1.
^ Džeroski, Sašo (1996). "Inductive Logic Programming and Knowledge Discovery in Databases" (PDF). In Fayyad, U.M.; Piatetsky-Shapiro, G.; Smyth, P.; Uthurusamy, R. (eds.). Advances in Knowledge Discovery and Data Mining. MIT Press. pp. 117–152. See §5.2.4.
^ Plotkin, Gordon D. (1970). Meltzer, B.; Michie, D. (eds.). "A Note on Inductive Generalization". Machine Intelligence. 5: 153–163. ISBN 978-0-444-19688-0.
^ Plotkin, Gordon D. (1971). Meltzer, B.; Michie, D. (eds.). "A Further Note on Inductive Generalization". Machine Intelligence. 6. Edinburgh University Press: 101–124. ISBN 978-0-85224-195-0.
^ i.e. sharing the same predicate symbol and negated/unnegated status
^ in general: $n$ -tuple when $n$ positive example literals are given
^ Ray, O.; Broda, K.; Russo, A.M. (2003). "Hybrid abductive inductive learning". Proceedings of the 13th international conference on inductive logic programming. LNCS. Vol. 2835. Springer. pp. 311–328. CiteSeerX 10.1.1.212.6602. doi:10.1007/978-3-540-39917-9_21. ISBN 978-3-540-39917-9.
^ Kimber, T.; Broda, K.; Russo, A. (2009). "Induction on failure: learning connected Horn theories". Proceedings of the 10th international conference on logic programming and nonmonotonic reasoning. LNCS. Vol. 575. Springer. pp. 169–181. doi:10.1007/978-3-642-04238-6_16. ISBN 978-3-642-04238-6.
^ Yamamoto, Yoshitaka; Inoue, Katsumi; Iwanuma, Koji (2012). "Inverse subsumption for complete explanatory induction" (PDF). Machine Learning. 86: 115–139. doi:10.1007/s10994-011-5250-y. S2CID 11347607.
^ Yamamoto, Akihiro (1997). "Which hypotheses can be found with inverse entailment?". International Conference on Inductive Logic Programming. Lecture Notes in Computer Science. Vol. 1297. Springer. pp. 296–308. CiteSeerX 10.1.1.54.2975. doi:10.1007/3540635149_58. ISBN 978-3-540-69587-5.
^ ^a ^b Kimber, Timothy (2012). Learning definite and normal logic programs by induction on failure (PhD). Imperial College London. ethos 560694.
^ Toth, David (2014). "Imparo is complete by inverse subsumption". arXiv:1407.3836 [cs.AI].
^ Muggleton, Stephen; Santos, Jose; Tamaddoni-Nezhad, Alireza (2009). "ProGolem: a system based on relative minimal generalization". International Conference on Inductive Logic Programming. Springer. pp. 131–148. CiteSeerX 10.1.1.297.7992. doi:10.1007/978-3-642-13840-9_13. ISBN 978-3-642-13840-9.
^ Santos, Jose; Nassif, Houssam; Page, David; Muggleton, Stephen; Sternberg, Mike (2012). "Automated identification of features of protein-ligand interactions using Inductive Logic Programming: a hexose binding case study". BMC Bioinformatics. 13: 162. doi:10.1186/1471-2105-13-162. PMC 3458898. PMID 22783946.