Inductive logic programming: Difference between revisions

From Wikipedia, the free encyclopedia
Revision as of 22:05, 9 December 2023

Inductive logic programming (ILP) is a subfield of symbolic artificial intelligence which uses logic programming as a uniform representation for examples, background knowledge and hypotheses. The term "inductive" here refers to philosophical (i.e. suggesting a theory to explain observed facts) rather than mathematical (i.e. proving a property for all members of a well-ordered set) induction. Given an encoding of the known background knowledge and a set of examples represented as a logical database of facts, an ILP system will derive a hypothesised logic program which entails all the positive and none of the negative examples.

  • Schema: positive examples + negative examples + background knowledge ⇒ hypothesis.

Inductive logic programming is particularly useful in bioinformatics and natural language processing.

History

Building on earlier work on inductive inference, Gordon Plotkin was the first to formalise induction in a clausal setting around 1970, adopting an approach of generalising from examples.[1][2] In 1981, Ehud Shapiro introduced several ideas that would shape the field in his new approach of model inference, an algorithm employing refinement and backtracing to search for a complete axiomatisation of given examples.[1][3] His first implementation was the Model Inference System in 1981:[4][5] a Prolog program that inductively inferred Horn clause logic programs from positive and negative examples.[1] The term Inductive Logic Programming was first introduced in a paper by Stephen Muggleton in 1990, defined as the intersection of machine learning and logic programming.[1] Muggleton and Wray Buntine had introduced predicate invention and inverse resolution in 1988.[1][6]

Several inductive logic programming systems that proved influential appeared in the early 1990s. FOIL, introduced by Ross Quinlan in 1990,[7] was based on upgrading the propositional learning algorithms AQ and ID3.[8] Golem, introduced by Muggleton and Feng in 1990, went back to a restricted form of Plotkin's least generalisation algorithm.[8][9] The Progol system, introduced by Muggleton in 1995, first implemented inverse entailment, and inspired many later systems.[8][10][11] Aleph, a descendant of Progol introduced by Ashwin Srinivasan in 2001, is still one of the most widely used systems as of 2022.[10]

At around the same time, the first practical applications emerged, particularly in bioinformatics, where by 2000 inductive logic programming had been successfully applied to drug design, carcinogenicity and mutagenicity prediction, and elucidation of the structure and function of proteins.[12] Unlike the focus on automatic programming inherent in the early work, these fields used inductive logic programming techniques from a viewpoint of relational data mining. The success of those initial applications and the lack of progress in recovering larger traditional logic programs shaped the focus of the field.[13]

Recently, classical tasks from automated programming have moved back into focus, as the introduction of meta-interpretative learning makes predicate invention and learning recursive programs more feasible. This technique was pioneered with the Metagol system introduced by Muggleton, Dianhuan Lin, Niels Pahlavi and Alireza Tamaddoni-Nezhad in 2014.[14] This allows ILP systems to work with fewer examples, and brought successes in learning string transformation programs, answer set grammars and general algorithms.[15]

Setting

Inductive logic programming has adopted several different learning settings, the most common of which are learning from entailment and learning from interpretations.[16] In both cases, the input is provided in the form of background knowledge B, a logical theory (commonly in the form of clauses used in logic programming), as well as positive and negative examples, denoted E⁺ and E⁻ respectively. The output is given as a hypothesis H, itself a logical theory that typically consists of one or more clauses.

The two settings differ in the format of examples presented.

Learning from entailment

As of 2022, learning from entailment is by far the most popular setting for inductive logic programming.[16] In this setting, the positive and negative examples are given as finite sets E⁺ and E⁻ of positive and negated ground literals, respectively. A correct hypothesis H is a set of clauses satisfying the following requirements, where the turnstile symbol ⊨ stands for logical entailment:[16][17][18] Completeness requires any generated hypothesis h to explain all positive examples (B ∪ h ⊨ E⁺), and consistency forbids generation of any hypothesis h that is inconsistent with the negative examples (B ∪ h ∪ E⁻ ⊭ false), both given the background knowledge B.

In Muggleton's setting of concept learning,[19] "completeness" is referred to as "sufficiency", and "consistency" as "strong consistency". Two further conditions are added: "Necessity", which postulates that B does not entail E⁺, does not impose a restriction on h, but forbids generation of a hypothesis as long as the positive facts are explainable without it. "Weak consistency", which states that no contradiction can be derived from B ∧ h, forbids generation of any hypothesis h that contradicts the background knowledge B. Weak consistency is implied by strong consistency; if no negative examples are given, both requirements coincide. Weak consistency is particularly important in the case of noisy data, where completeness and strong consistency cannot be guaranteed.[19]
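Restricted to ground Horn clauses, the completeness and consistency conditions above can be checked by forward chaining. The following sketch is illustrative only (toy string atoms, hypothetical helper names `forward_chain` and `is_correct`), not a real ILP system:

```python
# Sketch: check completeness/consistency of a hypothesis over ground Horn
# clauses. Atoms are plain strings; a rule is (head, [body atoms]).

def forward_chain(facts, rules):
    """Derive all atoms entailed by the facts together with the rules."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            if head not in derived and all(b in derived for b in body):
                derived.add(head)
                changed = True
    return derived

def is_correct(background, hypothesis, e_pos, e_neg):
    """Completeness: B together with h entails every positive example.
    Consistency: B together with h entails no negative example."""
    derived = forward_chain(background, hypothesis)
    complete = all(e in derived for e in e_pos)
    consistent = not any(e in derived for e in e_neg)
    return complete and consistent

B = {"par(h,m)", "fem(m)"}
h = [("dau(m,h)", ["par(h,m)", "fem(m)"])]
# h explains dau(m,h) and derives nothing spurious:
assert is_correct(B, h, e_pos={"dau(m,h)"}, e_neg={"dau(h,m)"})
```

In the general first-order setting, the entailment check requires full resolution rather than propositional forward chaining; this toy version only illustrates the shape of the two conditions.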

Learning from interpretations

In learning from interpretations, the positive and negative examples are given as a set of complete or partial Herbrand structures, each of which is itself a finite set of ground literals. Such a structure e is said to be a model of the set of clauses B ∪ H if for any substitution θ and any clause head ← body in B ∪ H such that body·θ ⊆ e, head·θ ∈ e also holds. The goal is then to output a hypothesis that is complete, meaning every positive example is a model of B ∪ H, and consistent, meaning that no negative example is a model of B ∪ H.[16]
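Restricted to ground clauses (the general setting quantifies over substitutions θ), the model check can be sketched as follows; the helper names are hypothetical:

```python
# Sketch: check whether a Herbrand interpretation (a finite set of ground
# atoms) is a model of a set of ground Horn clauses (head, [body atoms]).

def is_model(interpretation, clauses):
    # A clause is satisfied unless its whole body holds but its head does not.
    return all(head in interpretation
               for head, body in clauses
               if all(b in interpretation for b in body))

def correct_hypothesis(clauses, pos_interps, neg_interps):
    # Complete: every positive example is a model of the clauses;
    # consistent: no negative example is.
    return (all(is_model(e, clauses) for e in pos_interps)
            and not any(is_model(e, clauses) for e in neg_interps))

H = [("dau(m,h)", ["par(h,m)", "fem(m)"])]
e1 = {"par(h,m)", "fem(m)", "dau(m,h)"}   # a model of H
e2 = {"par(h,m)", "fem(m)"}               # not a model: body holds, head missing
assert correct_hypothesis(H, [e1], [e2])
```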


Example

Assumed family relations in section "Example"

The following well-known example about learning definitions of family relations uses the abbreviations

par: parent, fem: female, dau: daughter, g: George, h: Helen, m: Mary, t: Tom, n: Nancy, and e: Eve.

It starts from the background knowledge (cf. picture)

par(g,m) ∧ par(h,m) ∧ par(g,t) ∧ par(h,t) ∧ par(t,e) ∧ par(n,e) ∧ fem(h) ∧ fem(m) ∧ fem(n) ∧ fem(e),

the positive examples

dau(m,h) ∧ dau(e,t),

and the trivial proposition true to denote the absence of negative examples.

Plotkin's [20][21] "relative least general generalization (rlgg)" approach to inductive logic programming shall be used to obtain a suggestion about how to formally define the daughter relation dau.

This approach uses the following steps.

  • Relativize each positive example literal with the complete background knowledge:
    dau(m,h) ← B and dau(e,t) ← B, where B abbreviates the conjunction of all background-knowledge facts,
  • Convert into clause normal form:
    dau(m,h) ∨ ¬par(g,m) ∨ ... ∨ ¬fem(e) and dau(e,t) ∨ ¬par(g,m) ∨ ... ∨ ¬fem(e),
  • Anti-unify each compatible [22] pair [23] of literals:
    • dau(x_{m,e}, x_{h,t}) from dau(m,h) and dau(e,t),
    • ¬par(x_{h,t}, x_{m,e}) from ¬par(h,m) and ¬par(t,e),
    • ¬fem(x_{m,e}) from ¬fem(m) and ¬fem(e),
    • ¬par(x_{g,t}, x_{m,e}) from ¬par(g,m) and ¬par(t,e), similar for all other background-knowledge literals
    • ¬fem(x_{h,e}) from ¬fem(h) and ¬fem(e), and many more negated literals
  • Delete all negated literals containing variables that don't occur in a positive literal:
    • after deleting all negated literals containing other variables than x_{m,e}, x_{h,t}, only ¬par(x_{h,t}, x_{m,e}) ∨ ¬fem(x_{m,e}) remains, together with all ground literals from the background knowledge
  • Convert clauses back to Horn form:
    dau(x_{m,e}, x_{h,t}) ← par(x_{h,t}, x_{m,e}) ∧ fem(x_{m,e}) ∧ B

The resulting Horn clause is the hypothesis h obtained by the rlgg approach. Ignoring the background knowledge facts, the clause informally reads "x_{m,e} is called a daughter of x_{h,t} if x_{h,t} is the parent of x_{m,e} and x_{m,e} is female", which is a commonly accepted definition.

Concerning the above requirements, "Necessity" is satisfied because the predicate dau doesn't appear in the background knowledge, which hence cannot imply any property containing this predicate, such as the positive examples are. "Sufficiency" is satisfied by the computed hypothesis h, since it, together with par(h,m) ∧ fem(m) from the background knowledge, implies the first positive example dau(m,h), and similarly h and par(t,e) ∧ fem(e) from the background knowledge imply the second positive example dau(e,t). "Weak consistency" is satisfied by h, since h holds in the (finite) Herbrand structure described by the background knowledge; similarly for "Strong consistency".

The common definition of the grandmother relation, viz. gra(x,z) ← fem(x) ∧ par(x,y) ∧ par(y,z), cannot be learned using the above approach, since the variable y occurs in the clause body only; the corresponding literals would have been deleted in the fourth step of the approach. To overcome this flaw, that step has to be modified so that it can be parametrized with different literal post-selection heuristics. Historically, the GOLEM implementation is based on the rlgg approach.
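The anti-unification step at the heart of the rlgg approach can be sketched for flat (function-free) atoms. The predicate names follow the example above, while the `anti_unify` helper and the variable-naming scheme are illustrative:

```python
# Sketch of Plotkin-style anti-unification of two compatible atoms:
# differing argument pairs map to shared variables. Toy version for flat
# terms only (no nested function symbols).

def anti_unify(atom1, atom2, variables):
    """atom = (predicate, args). `variables` caches one variable per pair of
    differing constants, so the same pair reuses the same variable."""
    pred1, args1 = atom1
    pred2, args2 = atom2
    if pred1 != pred2 or len(args1) != len(args2):
        return None  # incompatible pair
    out = []
    for a, b in zip(args1, args2):
        if a == b:
            out.append(a)                                  # identical: keep the constant
        else:
            out.append(variables.setdefault((a, b), f"x_{a}{b}"))
    return (pred1, tuple(out))

vars_ = {}
# dau(m,h) and dau(e,t) generalise to dau(x_me, x_ht):
assert anti_unify(("dau", ("m", "h")), ("dau", ("e", "t")), vars_) == ("dau", ("x_me", "x_ht"))
# par(h,m) and par(t,e) reuse the same variables: par(x_ht, x_me)
assert anti_unify(("par", ("h", "m")), ("par", ("t", "e")), vars_) == ("par", ("x_ht", "x_me"))
```

Sharing the `variables` table across all literal pairs is what makes the body literal par(x_ht, x_me) line up with the head dau(x_me, x_ht) in the resulting clause.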

Inductive Logic Programming system

An inductive logic programming system is a program that takes as input logic theories B, E⁺, E⁻ and outputs a correct hypothesis H with respect to those theories. Search-based ILP systems consist of two parts: hypothesis search and hypothesis selection. First, hypotheses are searched with an inductive logic programming procedure, then a subset of the found hypotheses (in most systems one hypothesis) is chosen by a selection algorithm. A selection algorithm scores each of the found hypotheses and returns the ones with the highest score. An example of a score function is minimal compression length, where a hypothesis with the lowest Kolmogorov complexity has the highest score and is returned. An ILP system is complete if and only if, for any input logic theories, any correct hypothesis H with respect to these input theories can be found with its hypothesis search procedure.

The ILP systems Progol,[11] Hail [24] and Imparo [25] find a hypothesis H using the principle of inverse entailment[11] for theories B, E, H: B ∧ H ⊨ E ⟺ B ∧ ¬E ⊨ ¬H. First they construct an intermediate theory F, called a bridge theory, satisfying the conditions B ∧ ¬E ⊨ F and F ⊨ ¬H. Then, as H ⊨ ¬F, they generalize the negation of the bridge theory F with anti-entailment.[26] However, the operation of anti-entailment is computationally expensive, since it is highly nondeterministic. Therefore, an alternative hypothesis search can be conducted using the operation of inverse subsumption (anti-subsumption) instead, which is less nondeterministic than anti-entailment.

Questions of completeness of the hypothesis search procedure of a specific ILP system arise. For example, Progol's hypothesis search procedure, based on the inverse entailment inference rule, is not complete, as shown by Yamamoto's example.[27] On the other hand, Imparo is complete both under its anti-entailment procedure [28] and under its extended inverse subsumption [29] procedure.
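For propositional theories, the inverse-entailment equivalence B ∧ H ⊨ E ⟺ B ∧ ¬E ⊨ ¬H can be verified by enumerating truth assignments. The atoms and the `entails` helper below are illustrative:

```python
from itertools import product

# Sketch: check the inverse-entailment equivalence on propositional formulas
# by brute-force model enumeration. Formulas are predicates over an
# assignment dict; the atom names are a hypothetical toy theory.

ATOMS = ["bird", "penguin", "flies"]

def entails(premises, conclusion):
    """True iff every assignment satisfying all premises satisfies conclusion."""
    assignments = (dict(zip(ATOMS, bits))
                   for bits in product([False, True], repeat=len(ATOMS)))
    return all(conclusion(v) for v in assignments
               if all(p(v) for p in premises))

B = lambda v: (not v["penguin"]) or v["bird"]      # penguin -> bird
H = lambda v: (not v["bird"]) or v["flies"]        # bird -> flies
E = lambda v: (not v["penguin"]) or v["flies"]     # penguin -> flies

lhs = entails([B, H], E)                                    # B ∧ H ⊨ E
rhs = entails([B, lambda v: not E(v)], lambda v: not H(v))  # B ∧ ¬E ⊨ ¬H
assert lhs == rhs
```

This is why systems like Progol can search for H by reasoning forward from B ∧ ¬E: anything entailed by B ∧ ¬E constrains the negation of every correct hypothesis.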

Implementations

  • Warmr (now included in ACE)
  • ProGolem[30][31]

Probabilistic inductive logic programming

Probabilistic inductive logic programming adapts the setting of inductive logic programming to learning probabilistic logic programs. It can be considered as statistical relational learning within the formalism of probabilistic logic programming.[32][1]

Given

  1. background knowledge as a probabilistic logic program B, and
  2. a set of positive and negative examples E⁺ and E⁻

the goal of probabilistic inductive logic programming is to find a probabilistic logic program H such that the probability of the positive examples according to H ∪ B is maximized and the probability of the negative examples is minimized.[1]

This problem has two variants: parameter learning and structure learning. In the first, one is given the structure (the clauses) of H, and the goal is to infer the probability annotations of the given clauses; in the second, the goal is to infer both the structure and the probability parameters of H. Just as in classical inductive logic programming, the examples can be given as examples or as (partial) interpretations.[1]

Parameter Learning

Parameter learning for languages following the distribution semantics has been performed by using an expectation-maximisation algorithm or by gradient descent. An expectation-maximisation algorithm consists of a cycle in which the steps of expectation and maximization are repeatedly performed. In the expectation step, the distribution of the hidden variables is computed according to the current values of the probability parameters, while in the maximisation step, the new values of the parameters are computed. Gradient descent methods compute the gradient of the target function and iteratively modify the parameters moving in the direction of the gradient.[1]
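The gradient-based variant can be illustrated on a minimal case. The sketch below (a hypothetical toy program with a single probabilistic fact, not tied to any particular PILP system) performs gradient ascent on the log-likelihood of the examples:

```python
# Sketch: parameter learning by gradient ascent for a toy probabilistic
# program with one probabilistic fact "p :: target." Given n_pos positive
# and n_neg negative examples of `target`, the log-likelihood is
# n_pos*log(p) + n_neg*log(1 - p).

def learn_parameter(n_pos, n_neg, lr=0.01, steps=2000):
    p = 0.5  # initial guess for the probability annotation
    for _ in range(steps):
        grad = n_pos / p - n_neg / (1 - p)  # derivative of the log-likelihood
        p += lr * grad                       # move in the direction of the gradient
        p = min(max(p, 1e-6), 1 - 1e-6)      # keep p a valid probability
    return p

# With 7 positive and 3 negative examples, p converges to the
# maximum-likelihood value 0.7:
p = learn_parameter(7, 3)
```

Real systems optimize many clause probabilities at once and must compute the likelihood by probabilistic inference over the program, but the loop structure — compute the gradient of the target function, step along it — is the same.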

Structure Learning

Structure learning was pioneered by Daphne Koller and Avi Pfeffer in 1997,[33] where the authors learn the structure of first-order rules with associated probabilistic uncertainty parameters. Their approach involves generating the underlying graphical model in a preliminary step and then applying expectation-maximisation.[1]

In 2008, De Raedt et al. presented an algorithm for performing theory compression on ProbLog programs, where theory compression means removing as many clauses as possible from the theory in order to maximize the probability. No new clause can be added to the theory.[1]

In the same year, Meert, W. et al. introduced a method for learning parameters and structure of ground probabilistic logic programs by considering the Bayesian networks equivalent to them and applying techniques for learning Bayesian networks.[34][1]

ProbFOIL, introduced by De Raedt and Ingo Thon in 2010, combined the inductive logic programming system FOIL with ProbLog. Logical rules are learned from probabilistic data in the sense that both the examples themselves and their classifications can be probabilistic. The set of rules must make it possible to predict the probability of the examples from their description. In this setting, the parameters (the probability values) are fixed and the structure has to be learned.[35][1]

In 2011, Elena Bellodi and Fabrizio Riguzzi introduced SLIPCASE, which performs a beam search among probabilistic logic programs by iteratively refining probabilistic theories and optimizing the parameters of each theory using expectation-maximisation.[36] Its extension SLIPCOVER, proposed in 2014, uses bottom clauses generated as in Progol to guide the refinement process, thus reducing the number of revisions and exploring more effectively the search space. Moreover, SLIPCOVER separates the search for promising clauses from that of the theory: the space of clauses is explored with a beam search, while the space of theories is searched greedily.[37][1]

See also

References

  1. ^ a b c d e f g h i j k l m n Nienhuys-Cheng, Shan-hwei; Wolf, Ronald de (1997). Foundations of inductive logic programming. Lecture notes in computer science Lecture notes in artificial intelligence. Berlin Heidelberg: Springer. pp. 174–177. ISBN 978-3-540-62927-6.
  2. ^ Plotkin, G.D. (1970). Automatic Methods of Inductive Inference (PDF) (PhD). University of Edinburgh. hdl:1842/6656.
  3. ^ Shapiro, Ehud Y. (1981). Inductive inference of theories from facts (PDF) (Technical report). Department of Computer Science, Yale University. 192. Reprinted in Lassez, J.-L.; Plotkin, G., eds. (1991). Computational logic : essays in honor of Alan Robinson. MIT Press. pp. 199–254. ISBN 978-0-262-12156-9.
  4. ^ Shapiro, Ehud Y. (1981). "The model inference system" (PDF). Proceedings of the 7th international joint conference on Artificial intelligence. Vol. 2. Morgan Kaufmann. p. 1064.
  5. ^ Shapiro, Ehud Y. (1983). Algorithmic program debugging. MIT Press. ISBN 0-262-19218-7.
  6. ^ Muggleton, S.H.; Buntine, W. (1988). "Machine invention of first-order predicate by inverting resolution". Proceedings of the 5th International Conference on Machine Learning. pp. 339–352. doi:10.1016/B978-0-934613-64-4.50040-2. ISBN 978-0-934613-64-4.
  7. ^ Quinlan, J. R. (August 1990). "Learning logical definitions from relations". Machine Learning. 5 (3): 239–266. doi:10.1007/bf00117105. ISSN 0885-6125.
  8. ^ a b c Nienhuys-Cheng, Shan-hwei; Wolf, Ronald de (1997). Foundations of inductive logic programming. Lecture notes in computer science Lecture notes in artificial intelligence. Berlin Heidelberg: Springer. pp. 354–358. ISBN 978-3-540-62927-6.
  9. ^ Muggleton, Stephen H.; Feng, Cao (1990). Arikawa, Setsuo; Goto, Shigeki; Ohsuga, Setsuo; Yokomori, Takashi (eds.). "Efficient Induction of Logic Programs". Algorithmic Learning Theory, First International Workshop, ALT '90, Tokyo, Japan, October 8-10, 1990, Proceedings. Springer/Ohmsha: 368–381.
  10. ^ a b Cropper, Andrew; Dumančić, Sebastijan (2022-06-15). "Inductive Logic Programming At 30: A New Introduction". Journal of Artificial Intelligence Research. 74: 808. doi:10.1613/jair.1.13507. ISSN 1076-9757.
  11. ^ a b c Muggleton, S.H. (1995). "Inverting entailment and Progol". New Generation Computing. 13 (3–4): 245–286. CiteSeerX 10.1.1.31.1630. doi:10.1007/bf03037227. S2CID 12643399.
  12. ^ Džeroski, Sašo (2001), Džeroski, Sašo; Lavrač, Nada (eds.), "Relational Data Mining Applications: An Overview", Relational Data Mining, Berlin, Heidelberg: Springer Berlin Heidelberg, pp. 339–364, doi:10.1007/978-3-662-04599-2_14, ISBN 978-3-642-07604-6, retrieved 2023-11-27
  13. ^ De Raedt, Luc (2008), Logical and Relational Learning, Cognitive Technologies, Berlin, Heidelberg: Springer, p. 14, Bibcode:2008lrl..book.....D, doi:10.1007/978-3-540-68856-3, ISBN 978-3-540-20040-6
  14. ^ Muggleton, Stephen H.; Lin, Dianhuan; Pahlavi, Niels; Tamaddoni-Nezhad, Alireza (2013-05-01). "Meta-interpretive learning: application to grammatical inference". Machine Learning. 94 (1): 25–49. doi:10.1007/s10994-013-5358-3. ISSN 0885-6125.
  15. ^ Cropper, Andrew; Dumančić, Sebastijan; Evans, Richard; Muggleton, Stephen (2022). "Inductive logic programming at 30". Machine Learning. 111 (1): 147–172. doi:10.1007/s10994-021-06089-1. ISSN 0885-6125.
  16. ^ a b c d Cropper, Andrew; Dumančić, Sebastijan (2022-06-15). "Inductive Logic Programming At 30: A New Introduction". Journal of Artificial Intelligence Research. 74: 779–782. doi:10.1613/jair.1.13507. ISSN 1076-9757.
  17. ^ Džeroski, Sašo (1996). "Inductive Logic Programming and Knowledge Discovery in Databases" (PDF). In Fayyad, U.M.; Piatetsky-Shapiro, G.; Smith, P.; Uthurusamy, R. (eds.). Advances in Knowledge Discovery and Data Mining. MIT Press. pp. 117–152 See §5.2.4. Archived from the original (PDF) on 2021-09-27. Retrieved 2021-09-27.
  18. ^ De Raedt, Luc (1997). "Logical settings for concept-learning". Artificial Intelligence. 95 (1): 187–201. doi:10.1016/S0004-3702(97)00041-6.
  19. ^ a b Muggleton, Stephen (1999). "Inductive Logic Programming: Issues, Results and the Challenge of Learning Language in Logic". Artificial Intelligence. 114 (1–2): 283–296. doi:10.1016/s0004-3702(99)00067-3.; here: Sect.2.1
  20. ^ Plotkin, Gordon D. (1970). Meltzer, B.; Michie, D. (eds.). "A Note on Inductive Generalization". Machine Intelligence. 5: 153–163. ISBN 978-0-444-19688-0.
  21. ^ Plotkin, Gordon D. (1971). Meltzer, B.; Michie, D. (eds.). "A Further Note on Inductive Generalization". Machine Intelligence. 6. Edinburgh University Press: 101–124. ISBN 978-0-85224-195-0.
  22. ^ i.e. sharing the same predicate symbol and negated/unnegated status
  23. ^ in general: n-tuple when n positive example literals are given
  24. ^ Ray, O.; Broda, K.; Russo, A.M. (2003). "Hybrid abductive inductive learning". Proceedings of the 13th international conference on inductive logic programming. LNCS. Vol. 2835. Springer. pp. 311–328. CiteSeerX 10.1.1.212.6602. doi:10.1007/978-3-540-39917-9_21. ISBN 978-3-540-39917-9.
  25. ^ Kimber, T.; Broda, K.; Russo, A. (2009). "Induction on failure: learning connected Horn theories". Proceedings of the 10th international conference on logic programing and nonmonotonic reasoning. LNCS. Vol. 575. Springer. pp. 169–181. doi:10.1007/978-3-642-04238-6_16. ISBN 978-3-642-04238-6.
  26. ^ Yamamoto, Yoshitaka; Inoue, Katsumi; Iwanuma, Koji (2012). "Inverse subsumption for complete explanatory induction" (PDF). Machine Learning. 86: 115–139. doi:10.1007/s10994-011-5250-y. S2CID 11347607.
  27. ^ Yamamoto, Akihiro (1997). "Which hypotheses can be found with inverse entailment?". International Conference on Inductive Logic Programming. Lecture Notes in Computer Science. Vol. 1297. Springer. pp. 296–308. CiteSeerX 10.1.1.54.2975. doi:10.1007/3540635149_58. ISBN 978-3-540-69587-5.
  28. ^ a b Kimber, Timothy (2012). Learning definite and normal logic programs by induction on failure (PhD). Imperial College London. ethos 560694. Archived from the original on 2022-10-21. Retrieved 2022-10-21.
  29. ^ Toth, David (2014). "Imparo is complete by inverse subsumption". arXiv:1407.3836 [cs.AI].
  30. ^ Muggleton, Stephen; Santos, Jose; Tamaddoni-Nezhad, Alireza (2009). "ProGolem: a system based on relative minimal generalization". International Conference on Inductive Logic Programming. Springer. pp. 131–148. CiteSeerX 10.1.1.297.7992. doi:10.1007/978-3-642-13840-9_13. ISBN 978-3-642-13840-9.
  31. ^ Santos, Jose; Nassif, Houssam; Page, David; Muggleton, Stephen; Sternberg, Mike (2012). "Automated identification of features of protein-ligand interactions using Inductive Logic Programming: a hexose binding case study". BMC Bioinformatics. 13: 162. doi:10.1186/1471-2105-13-162. PMC 3458898. PMID 22783946.
  32. ^ De Raedt, Luc; Kersting, Kristian (2008), "Probabilistic Inductive Logic Programming", Probabilistic Inductive Logic Programming, Berlin, Heidelberg: Springer Berlin Heidelberg, pp. 1–27, ISBN 978-3-540-78651-1, retrieved 2023-12-09
  33. ^ Koller, Daphne; Pfeffer, Avi (August 1997). Learning probabilities for noisy first-order rules (PDF). IJCAI.
  34. ^ Blockeel, Hendrik; Meert, Wannes, "Towards Learning Non-recursive LPADs by Transforming Them into Bayesian Networks", Inductive Logic Programming, Berlin, Heidelberg: Springer Berlin Heidelberg, pp. 94–108, ISBN 978-3-540-73846-6, retrieved 2023-12-09
  35. ^ De Raedt, Luc; Thon, Ingo (2011), Frasconi, Paolo; Lisi, Francesca A. (eds.), "Probabilistic Rule Learning", Inductive Logic Programming, vol. 6489, Berlin, Heidelberg: Springer Berlin Heidelberg, pp. 47–58, doi:10.1007/978-3-642-21295-6_9, ISBN 978-3-642-21294-9, retrieved 2023-12-09
  36. ^ Bellodi, Elena; Riguzzi, Fabrizio (2012), "Learning the Structure of Probabilistic Logic Programs", Inductive Logic Programming, Berlin, Heidelberg: Springer Berlin Heidelberg, pp. 61–75, ISBN 978-3-642-31950-1, retrieved 2023-12-09
  37. ^ Bellodi, Elena; Riguzzi, Fabrizio (2014-01-15). "Structure learning of probabilistic logic programs by searching the clause space". Theory and Practice of Logic Programming. 15 (2): 169–212. doi:10.1017/s1471068413000689. ISSN 1471-0684.

 This article incorporates text from a free content work. Licensed under CC-BY 4.0 (license statement/permission). Text taken from A History of Probabilistic Inductive Logic Programming, Fabrizio Riguzzi, Elena Bellodi and Riccardo Zese, Frontiers Media.

Further reading