Narrowing of algebraic value sets: Difference between revisions

Content deleted Content added

Inline

Revision as of 08:18, 5 May 2014

Like logic programming, narrowing^[1] ^[2] of algebraic value sets gives a method of reasoning about the values in unsolved or partially solved equations. Where logic programming relies on resolution, the algebra of value sets relies on narrowing rules. Narrowing rules allow the elimination of values from a solution set which are inconsistent with the equations being solved.

Unlike logic programming, narrowing of algebraic value sets makes no use of backtracking. Instead all values are contained in value sets, and are considered in parallel.

The approach is also similar to the use of constraints^[3] in constraint logic programming, but without the logic processing basis.

Probabilistic value sets is a natural extension of value sets to classical probability. The value set construct holds the information required to calculate probabilities of calculated values based on probabilities of initial values.

Introduction

In mathematics an expression represents a single value. A function maps one or more values to one unique value.

Inverses of functions are not always well defined as functions. Sometimes extra conditions are required to make an inverse of a function fit the definition of a function.

Some Boolean operations, in particular do not have inverses that may be defined as functions. In particular the disjunction "or"" has inverses that allow two values. In natural language "or" represents alternate possibilities.

Resolution in logic provides a mechanism for dealing with the problems with Boolean operators. It is the basis for the Logic programming languages. However it is a specific solution, related to logic. It is not a general solution to the problem of inverses of functions that are not functions.

Constraint logic programming extends logic programming, by supporting constraints. Constraint logic programming languages such as ECLiPSe provide the ability to solve complex logic problems. However ECLiPSe is not lazy.

Lazy evaluation is an essential feature of modern functional programming languages such as Haskell. Haskell is the latest in a series of languages based on lambda calculus and let expressions. These languages provide rich functionality through lazy evaluation, and a polymorphic type system using type inference. Functional programming languages also naturally support higher-order functions.

Logic programming languages lack the flexibility and power of functional languages, and are not readily extended in the same ways that functional programming languages have, because of their dependence on resolution.

Multiple values

A general solution to the problem of inverses of functions that are not functions is needed. What is required is a representation of a value that is constrained to be a member of a set of values. A let expression may be used to represent a value that is a member of a set,

\operatorname {let} x\in X\operatorname {in} x

In this expression $x\in X$ is a constraint. A constraint is a Boolean expression that a variable must satisfy. The let expression allows the constraint be represented in an expression. If there was a general rule for function application of constraint expressions, then a constraint could be treated like a value.

Under function application, of one let expression to another,

(\operatorname {let} x\in X\operatorname {in} x)\ (\operatorname {let} y\in Y\operatorname {in} y)

=\operatorname {let} x\in X\land y\in Y\operatorname {in} x\ y

=\operatorname {let} (x,y)\in X\times Y\operatorname {in} x\ y

But a different rule applies for applying the let expression to itself.

(\operatorname {let} x\in X\operatorname {in} x)\ (\operatorname {let} x\in X\operatorname {in} x)

=\operatorname {let} x\in X\operatorname {in} x\ x

There appears no simple rule for combining let expressions. What is required is a general form of expression that represents a variable whose value is a member of a set of values. The expression should be based on the variable and the set.

Function application applied to this form should give another expression in the same form. In this way any expression on functions of multiple values may be treated as if it had one value.

It is not sufficient for the form to represent only the set of values. Each value must have a condition that determines when the expression takes the value. The resulting construct is a set of pairs of conditions and values, called a "value set".

Value sets

A "value set" K is defined as a set of pairs, each pair consisting of a value and a set of dependent conditions. The set of dependent conditions is used by the "condition function", to determine if the value set takes that value.

The condition function is defined by 3 axioms,

Each pair $(v,l)$ means that the value of the value set $V(K)$ is v if the condition function applied to the list, $C(l)$ , is true.
One of the conditions is true.
Only one of the conditions is true.

The condition is represented as a function applied to a set of dependent conditions, to allow the structure of the condition to be controlled. Also the set of conditions is used in narrowing by exclusion of dependent values. However for most purposes the value set may be thought of as a set of value, condition pairs. The condition function translates the set into the condition.

Formally,

Name	Definition
Condition function	$C(l)=({\bigwedge _{(r,z,u)\in l}z=u}))=(\forall r\forall z\forall u(r,z,u)\in l\to z=u)$
Value condition	$\forall v\forall l((v,l)\in K\land C(l)\to v=V(K)$
Complete set	$\exists v\exists l(v,l)\in K\land C(l)$
Exclusion	$\forall v_{1}\forall l_{1}\forall v_{2}\forall l_{2}((v_{1},l_{1})\in K\land (v_{2},l_{2})\in K\land (v_{1},l_{1})\neq (v_{2},l_{2}))\to \neg (C(l_{1})\land C(l_{2}))$

Value function

Using the value condition and complete set axioms,

\exists v\exists l(v,l)\in K\land C(l)\land v=V(K)

As a let expression this becomes,

V(K)=\operatorname {let} (v,l)\in K\land C(l)\operatorname {in} v

Single value

The value set to represent a single value is,

k=V(\{(k,\{\})\})

The derivation is,

V(\{(k,\{\})\})

=\operatorname {let} (v,l)\in \{(k,\{\})\}\land C(l)\operatorname {in} v

=\operatorname {let} v=k\land C(\{\})\operatorname {in} v

=\operatorname {let} v=k\operatorname {in} v

=k

Element of a set

The value set to represent an element of a set is,

\forall x\forall X(\operatorname {let} x\in X\operatorname {in} x)=\operatorname {let} R=V(\{(w,\{(R,x,w)\}):w\in X\})\operatorname {in} R

This rather strange definition adds the value set in as part of the dependent condition. This is used in narrowing by exclusion of dependent values.

Note also that,

x=V(R)

is the value of the expression. Both R and x must be included in the dependent condition, because R identifies the value set to which the dependent condition belongs, and x provides the variable used to carry the value in the let expression.

If the addition of R to the dependent condition is ignored the expression takes on a simpler and more understandable form,

\forall x\forall X(\operatorname {let} x\in X\operatorname {in} x)=V(\{(w,\{(\_,x,w)\}):w\in X\})

The derivation is,

V(\{(w,\{(r,x,w):w\in X\})

=\operatorname {let} (v,l)\in \{(w,\{(r,x,w)\}):w\in X\}\land C(l)\operatorname {in} v

=\operatorname {let} v\in X\land ({\bigwedge _{(r,z,u)\in \{(\_,x,v)\}}z=u})\operatorname {in} v

=\operatorname {let} v\in X\land (\forall r\forall z\forall u(r,z,u)\in \{(\_,x,v)\}\to z=u)\operatorname {in} v

=\operatorname {let} v\in X\land x=v\operatorname {in} v

=\operatorname {let} x\in X\operatorname {in} x

Application of functions

Function application of value sets is given by,

V(M)\ V(N)=V(\{(m_{v}\ n_{v},m_{l}\cup n_{l}):(m_{v},m_{l})\in M\land (n_{v},n_{l})\in N\})

Derivation,

V(M)\ V(N)

=\operatorname {let} (m_{v},m_{l})\in M\land C(m_{l})\operatorname {in} m_{v})\ (\operatorname {let} (n_{v},n_{l})\in N\land C(n_{l})\operatorname {in} n_{v})

=\operatorname {let} (m_{v},m_{l})\in M\land (n_{v},n_{l})\in N\land C(n_{l})\land C(n_{l})\operatorname {in} m_{v}\ n_{v})

Then using,

C(m_{l})\land C(n_{l})

=({\bigwedge _{(z,u)\in m_{l}}z=u})\land ({\bigwedge _{(z,u)\in n_{l}}z=u})

=({\bigwedge _{(z,u)\in m_{l}\cup n_{l}}z=u})

=C(m_{l}\cup n_{l})

get,

=\operatorname {let} (m_{v},m_{l})\in M\land (n_{v},n_{l})\in N\land C(n_{l}\cup n_{l})\operatorname {in} m_{v}\ n_{v})

=\operatorname {let} (v,l)\in \{(m_{v}\ n_{v},m_{l}\cup n_{l}):(m_{v},m_{l})\in M\land (n_{v},n_{l})\in N\}\land C(l)\operatorname {in} v

=V(\{(m_{v}\ n_{v},m_{l}\cup n_{l}):(m_{v},m_{l})\in M\land (n_{v},n_{l})\in N\})

Exclusion

The exclusion is a rule that determines when conditions must be false,

V(M)\in s\iff (\forall v\forall l((v,l)\in M\land v\not \in s)\to \neg C(l))

This may be derived from,

V(M)\in s

\to \forall v\forall l((v,l)\in M\land C(l))\to (v=V[M]\land V(M)\in s)

\to \forall v\forall l((v,l)\in M\land C(l))\to v\in S

\to \forall v\forall l((v,l)\in M\land v\not \in S)\to \neg C(l)

Simplification

The simplification rule allows values whose condition is false to be dropped.

V(K)=V(\{(v,l):(v,l)\in K\land C(l)\}

Derivation

V(\{(v,l):(v,l)\in K\land C(l)\})

=\operatorname {let} (v,l)\in \{(v,l):(v,l)\in K\land C(l)\}\land C(l)\operatorname {in} v

=\operatorname {let} (v,l)\in K\land C(l)\land C(l)\operatorname {in} v

=\operatorname {let} (v,l)\in K\land C(l)\operatorname {in} v

=V(K)

Summary of results

Name	Rule
Value function	$V(K)=\operatorname {let} (v,l)\in K\land C(l)\operatorname {in} v$
Single value	$k=V(\{(k,\{\})\})$
Set element	$\forall x\forall X(\operatorname {let} x\in X\operatorname {in} x)=\operatorname {let} R=V(\{(w,\{(R,x,w)\}):w\in X\})\operatorname {in} R$
Function application	$V(M)\ V(N)=V(\{(m_{v}\ n_{v},m_{l}\cup n_{l}):(m_{v},m_{l})\in M\land (n_{v},n_{l})\in N\})$
Exclusion	$V(M)\in s\iff (\forall v\forall l((v,l)\in M\land v\not \in s)\to \neg C(l))$
Simplification	$V(K)=V(\{(v,l):(v,l)\in K\land C(l)\})$
Assert equal	$V(M)=V(N)\to (\forall v\forall l((v,l)\in N\land v\not \in S(M))\to \neg C(l))$

A value sets identity

By defining the application of functions to value sets the definition of equality of value sets has also been redefined. The old definition of equality still exists, because value sets are constructed as a set of pairs. Two sets are equal if they contain the same elements. This definition of equality for value sets is at best misleading.

What is needed is to use the name, or identity of the variable from which the value set is constructed as part of the structure of the value set. This would make value sets distinct, unless they are based on the same variable.

In mathematics, quantification is over values, not formulas. To proceed further with the exact definition of value sets, quantification over formulas is needed, in a way that allows the comparison of the identity of formulas. The distinction between the formula representing a value and the identity of the formula is the use–mention distinction. The notation,

\forall x\#u

is introduced to mean quantification over formula x where x refers to the value, as a use, and u refers to the identity of the formula as represented or mentioned

Using this notation the element of a set definition would be,

\forall x\#u\forall X(\operatorname {let} x\in X\operatorname {in} x)=\operatorname {let} R=V((u,\{(w,\{(R,x,w)\}):w\in X\}))\operatorname {in} R

Every reference to a value set would then need to be changed to take account of the extra level of structure in the value set, which would make the description harder to read. For the sake of readability this extra level of structure has been omitted from the definition of value sets.

Narrowing

"Narrowing" is determining when conditions for values must be false. Narrowing starts when the value of two value sets is asserted equal.

Narrowing by asserting equal

Assertion that two value sets are equal gives the narrowing rule, $\forall m_{v}\forall m_{l}\forall n_{v}\forall n_{l}((m_{v},m_{l})\in M\land (n_{v},n_{l})\in N)\to (m_{v}\neq n_{v}\to \neg (C(m_{l})\land C(n_{l})))$

For the derivation, start with,

V(M)=V(N)

The value condition gives,

$(\forall m_{v}\forall m_{l}((m_{v},m_{l})\in M\land C(m_{l})\to v=V(M))\land (\forall n_{v}\forall n_{l}((n_{v},k)\in N\land C(n_{l}))\to n_{v}=V(N))\land V(M)=V(N)$

$\forall m_{v}\forall m_{l}\forall n_{v}\forall n_{l}(((m_{v},m_{l})\in M\land C(m_{l}))\to m_{v}=V(M))\land (((n_{v},n_{l})\in N\land C(n_{l}))\to n_{v}=V(N))\land V(M)=V(N)$

$\forall m_{v}\forall m_{l}\forall n_{v}\forall n_{l}((m_{v},m_{l})\in M\land (n_{v},n_{l})\in N)\to (C(j)\land C(k)\to v=u)$

$\forall m_{v}\forall m_{l}\forall n_{v}\forall n_{l}((m_{v},m_{l})\in M\land (n_{v},n_{l})\in N)\to (m_{v}\neq n_{v}\to \neg (C(m_{l})\land C(n_{l})))$

Narrowing by conjunction

If any base condition is false, all the conditions obtained from it are false.

This comes from the definition of the Condition function,

C(l)=({\bigwedge _{(r,z,u)\in l}z=u}))

The base condition for (r, z, u) is,

C(\{(r,z,u)\}=(z=u)

So if this is false $C(l)$ is false.

Narrowing by crossed conditions

If a dependent condition list has two different base conditions from the same value set it must be false.

To derive this, start with the exclusion rule is,

\forall v_{1}\forall l_{1}\forall v_{2}\forall l_{2}((v_{1},l_{1})\in K\land (v_{2},l_{2})\in K\land (v_{1},l_{1})\neq (v_{2},l_{2}))\implies \neg (C(l_{1})\land C(l_{2}))

Then for any set of dependent conditions l,

((K,x,v_{1})\in l\ \land (K,x,v_{2})\in l\land v_{1}\neq v_{2}

\implies (v_{1},\{(K,x,v_{1})\})\in K\land (v_{2},\{(K,x,v_{2})\})\in K\land (v_{1},l_{1})\neq (v_{2},l_{2}))

\implies \neg (C(\{(K,x,v_{1})\})\land C(\{(K,x,v_{2})\}))

\implies \neg C(l)

So if a dependent condition list is based on two conditions from the same value set, the condition value of that dependent condition list is false.

Narrowing by exclusion of dependent values

Each value set puts a constraint on the base value set from which it is constructed. If a base values set includes values that are not present as dependent values in the value set, the conditions for these values must be false.

To derive this, start with the complete set rule,

\exists v\exists \ l(v,l)\in K\to C(l)

The condition function is,

C(l)=({\bigwedge _{(r,z,u)\in l}z=u}))

A particular dependent condition may be chosen, as being implied by the whole condition,

\forall L\ C(l)\to (L,z,u)\in l\to z=u

So

\forall L\exists v\exists l\ (v,l)\in K\to C(l)\to (L,z,u)\in l\to z=u

Here $z=V(L)$ . The expression may be rearranged to define the set of values that $V(L)$ might take,

E(K,L)=\{w:(v,l)\in K\land C(l)\land (L,z,w)\in l\}

and so,

V(L)\in E(K,L)

Then using the exclusion rule,

V(M)\in s\iff (\forall v\forall l((v,l)\in M\land v\not \in s)\to \neg C(l))

Gives,

(\forall K\forall L\forall v\forall l((v,l)\in L\land v\not \in E(K,L))\to \neg C(l))

This is the narrowing exclusion rule. $E(K,L)$ is the set of values in the base value L set which are represented in the value set K. Conditions for other values must be false.

Relational evaluation

An assertion is an expression that is deemed true. So an expression like,

x=5

is a relationship between the values x, 5, and true. Because x is unknown, the relationship defines the value of x as 5. For another example,

y+3=5

This may be seen as two relationships; a relationship between y, 3 and another value which we may call x, and a relationship between x, 5, and true. The second relationship tells us that x is 5, so the first relationship between y, 3 and 5, which gives y is 2.

y is 2 is used to represent that y has already been found to have the value 2, but $y=2$ is a relationship, that implies that y is 2.

A relationship has a function, and one or more inverse functions. This set of functions may be used to calculate the unknown value from the known values. What values may be calculated from given inputs depends on the relation.

If the value of a particular value is required, or requested, then a relationship including the variable may be found. That relationship may have more unknowns. These unknowns may then be requested, chaining back eventually to known values.

There are limitations to this strategy. If a variable for a function is given twice, perhaps a recursive function is defined. But in general where a variable appears multiple times, the strategy fails, and the equation may need to be solved by other means.

Notation

Moving away from the technical details of the derivations above, value sets lend themselves to some simple intuitions and rules when applied as a method for evaluating logical problems.

Consider,

x^{2}=4

This is a relationship between x, 2 and 4. If x is requested then the relationship defines the values for x. However in this case there are two values. By using a value set, these two values may be handled as one value. This is written as,

x=V(\{(-2,x_{1}),(2,x_{2})\})

$x_{1}$ and $x_{2}$ are base conditions. They are written this way instead of as $x=2$ and $x=-2$ , because there details do not matter. Only the identity, and that they belong to the variable x matters. So x is sub-scripted with unique indexes to represent this.

Also for readability the operator :: is introduced to represent the condition, value pair. Define,

(v,l)=v::l

The list is represented as a conjunction of base conditions then,

x=V(\{-2::x_{1},-2::x_{2}\})

Consider now,

x+x=V(\{2::x_{1},-2::x_{2}\})+V(\{2::x_{1},-2::x_{2}\})

Function application may then be interpreted as,

=V(\{-2+-2::x_{1}\land x_{1},-2+2::x_{1}\land x_{2},2+-2::x_{2}\land x_{1},2+2::x_{2}\})

=V(\{-4::x_{1},0::\operatorname {false} ,0::\operatorname {false} ,4::x_{2}\})

Here the condition list is represented by the base conditions combined with the "and" operator $\land$ .

Elements of the value set where the condition is false may be dropped using simplification giving,

=V(\{-4::x_{1},4::x_{2}\})

Application to Booleans

a\land b

Is a relationship between a, b and true that implies that both a and b must be true.

a\lor b

Allows multiple values for a and b. If a is,,

a=V(\{\operatorname {false} ::a_{1},\operatorname {true} ::a_{2}\})

then for b

b=V(\{\operatorname {true} ::a_{1},\operatorname {false} ::a_{2},\operatorname {true} ::a_{2}\})

This means that if a is false then b must be true.

Now consider,

x=2\lor x=-2

gives,

x=V(\{\_::a_{1},2::a_{2}\})

and

x=V(\{-2::a_{1},\_::a_{2},-2::a_{2}\})

unifying these two value sets gives,

x=V(\{-2::a_{1},2::a_{2}\})

The pair $-2::a_{2}$ was dropped because of the application of "assert equal" narrowing. Its value did not match with $2::a_{2}$ .

not is its own inverse function, and is a 1 to 1 mapping. But the negation of the and operator will introduce value sets, because of De Morgan's laws, turning and into or.

In logic programming Boolean values are given special treatment by the logic engine. Using value sets, this is unnecessary, as where inverses have multiple values, they may be contained within a value set. Relational evaluation treats all evaluation in the same manner and always attempts to evaluate unknown values from known values.

Evaluation strategy

Combinatorial explosion occurs when value sets are combined. This must be minimized. The product of the sizes of the value sets in the relation, other than the unknown value, gives an estimate of the size of the value set created for the unknown value.

As evaluating any relation may contribute to reduce the size of other value sets by narrowing, the best strategy is to evaluate expressions with the smallest value set size first, as this may lead to narrowing that reduces the size of other vale sets.

The evaluation should proceed on a list of relations for which values have been requested, ordered by the predicted size of the value set to be created. This predicted size will change over time leading to the need for periodic re-ordering. For value sets with one value, this strategy is lazy. For values sets with multiple values the strategy minimizes the combinatorial explosion.

Unlike constraint logic programming there is no backtracking required for this algorithm. All possible values are contained in value sets, and are considered at the same time. The lack of backtracking may of benefit, or a disadvantage, depending on the problem. The use of value sets forces the consideration of all values, which allows the full benefits of narrowing in reducing the size of value sets. However it forces the solving of the whole problem at once, which if a problem may be naturally divided up, may be a disadvantage.

Accessing the value set

Narrowing allows the elimination of values that do not satisfy a variables constraints. Considered as the basis for an algorithm for solving equations, this narrowing gives a set of values consistent with the constraints on a variable. However in mathematics there is no way to access this set of values.

If $E(x)$ is an expression constraining a variable x then the set of values that the variable may take is,

\{z:E(z)\}

Define the gset of x to be the set of values that satisfy the constraints on x. Consider defining gset as,

\operatorname {gset} (x)=\{z:E(z)\}

This definition depends on knowing the expression E, which is the condition giving all the constraints on x. Within mathematics E may not be obtained from x. So there is no mathematical function that may be applied to a variable to request the set of values. So may the gset function be added to mathematics?

Meta math definition

A meta-mathematical definition of gset may be possible. Imagine that what we know of as mathematics is actually implemented by a meta function called math. math takes an abstract syntax tree and gives meaning to the variables and mathematical structures and adds existential quantifiers for variables not explicitly quantified.

math would be an expression in a meta mathematical environment with its own variables. To distinguish these meta-variables from math variables represent them by capital letters and the mathematical variables by lower case letters.

Now suppose there is an extended implementation of mathematics implemented by the xmath function, defined as,

\operatorname {xmath} [E]=\forall M,\operatorname {let} T[M]=\operatorname {math} [E]\operatorname {in} T[\_]

Using xmath, gset may be defined by,

\forall x\#u,\exists N,\operatorname {gset} (x)=\operatorname {let} M[u]=x\operatorname {in} \{z:z=N[u]\land T[N]\}

Here again the notation,

\forall x\#u

is used to mean quantification over variables x where x refers to the value, and u refers to the unique identity of the variable.

Example

For example take the constraint expression $x^{2}=4$ . Then,

x^{2}=4

\land s=\operatorname {gset} (x)

\land (\forall x\#u,\exists N\operatorname {gset} (x)=\operatorname {let} M[u]=x\operatorname {in} \{z:z=N[u]\land T[N]\})

Then the xmath expression is,

\forall M,\operatorname {let} T[M]=

\exists x,\exists s,(x^{2}=4

\land s=\operatorname {gset} (x)

\land (\forall x\#u,\exists N,\operatorname {gset} (x)=\operatorname {let} M[u]=x\operatorname {in} \{z:z=N[u]\land T[N]\})

\operatorname {in} T[\_]

Then where u is the unique identity of the variable x, represented here as the number 1 (for the first variable used in a call to gset),

M[1]=x\land s=\{z:z=N[1]\land T[N]\}

Here $T[N]$ invokes T with M as N.

s=\{z:z=N[1]\land x^{2}=4\land x=N[1]\}

s=\{z:z^{2}=4\}

Probabilistic value sets

The value set records the dependent conditions that the condition function may be applied to in order to deduce the truth of the proposition that the value set has a particular value. The same structure may be used to give the probability of a value set being equal to a particular value. The condition function is,

C(l)=({\bigwedge _{(r,z,u)\in l}z=u}))

The probability function is,

P(l)=({\prod _{(r,z,u)\in l}P(z=u)}))

This is the probability of each base case holding the particular value, if the events are independent.

The probability function is defined by 3 axioms,

Each pair $(v,l)$ means that the probability of the value set $V(K)$ is v is the probability function applied to the list, $P(l)$ .
The sum of the probabilities over the whole value set is 1.
The probability of any two pairs in the value set is zero.

The probability function gives probabilities for results based on initial probabilities given by Boolean inductive inference.

Formally,

Name	Definition
Probability function	$P(l)=({\prod _{(r,z,u)\in l}P(z=u)}))$
Value condition	$\forall vP(v=V(K))=\sum _{(v,l)\in K}P(l)$
Complete set	$\sum _{(v,l)\in K}P(l)=1$
Allowed values	$\sum _{(v,l)\in K\land v\in \operatorname {gset} (V(K))}P(l)=1$
Exclusion	$\forall v_{1}\forall l_{1}\forall v_{2}\forall l_{2}((v_{1},l_{1})\in K\land (v_{2},l_{2})\in K\land (v_{1},l_{1})\neq (v_{2},l_{2}))\to P(C(l_{1})\land C(l_{2}))=0$

Probabilities for each value in a value set may be calculated from probabilities in base value sets using the probability function and the value condition. Base value sets are either for a single value, or multiple value value set.

Probability for a single value

The value set to represent a single value is,

k=V(\{(k,\{\})\})

The complete set rule is,

\sum _{(v,l)\in K}P(l)

=\sum _{(v,l)\in \{(k,\{\})\}}P(l)

=P(\{\})

=1

Which is consistent with the axiom.

Probabilities for multiple values

The value set to represent multiple values is,

\forall x\forall X(\operatorname {let} x\in X\operatorname {in} x)=V(\{(w,\{(\_,x,w)\}):w\in X\})

The probability is given by the allowed values rule,

\sum _{(v,l)\in K\land v\in \operatorname {gset} (V(K))}P(l)=1

which simplifies to,

\sum _{v\in \operatorname {gset} (V(K))}P(x=v)=1

If prior estimates of probabilities for values are given then they will be proportional to the posterior probabilities, if the value is in the value set.

\forall v\in \operatorname {gset} (V(K)),P(x=v)=w*P_{i}(x=v)

If the value is not in the value set the probabilities will be zero,

\forall v\not \in \operatorname {gset} (V(K)),P(x=v)=0

So,

\sum _{v\in \operatorname {gset} (V(K))}w*P_{i}(x=v)=1

w={\frac {1}{\sum _{v\in \operatorname {gset} (V(K))}P_{i}(x=v)}}

\forall v\in \operatorname {gset} (V(K)),P(x=v)={\frac {P_{i}(x=v)}{\sum _{v\in \operatorname {gset} (V(K))}P_{i}(x=v)}}

\forall v\not \in \operatorname {gset} (V(K)),P(x=v)=0

If the prior probabilities are all the same the probabilities are,

\forall v\in \operatorname {gset} (V(K)),P(x=v)={\frac {1}{|\operatorname {gset} (V(K))|}}

Probabilities of general value sets

A general value set is created out of the application of base value sets. The value condition rule and the probability function may be combined to give,

\forall vP(v=V(K))=\sum _{(v,l)\in K\land v\in \operatorname {gset} (V(K))}({\prod _{(r,z,u)\in l}P(z=u)}))

Boolean inductive inference

Boolean induction^[4] starts with a target condition C which is a Boolean expression. Inductive inference is of the form,

A theory T implies the statement C. As the theory T is simpler than C, induction says that there is a probability that the theory T is implied by C.

The method used is based on Solomonoff's theory of inductive inference. ^[5] ^[6] ^[7]

Results

The expression obtained for the probability that the theory T is implied by the condion C is,

\forall t\in T(C),P(t\mid C)={\frac {\sum _{n:R(n)\equiv t}2^{-L(n)}}{\sum _{j\in T(C)}\sum _{m:R(m)\equiv j}2^{-L(m)}}}

A second version of the result derived from the first, by eliminating theories without predictive power is,

\forall t\in T(C),P(t\mid C)={\frac {P(F(t,c))}{P(F(C,c))+\sum _{j:j\in T(C)\land P(F(j,c))>P(F(C,c))}P(E(j,c))}}

P(\operatorname {random} (C)\mid C)={\frac {P(F(t,c))}{P(F(C,c))+\sum _{j:j\in T(C)\land P(F(j,c))>P(F(C,c))}P(F(j,c))}}

where,

\forall t,P(F(t,c))=\sum _{n:R(n)\equiv t\land L(n)<L(c)}2^{-L(n)}

Implication and condition probability

The following theorem will be required in the derivation of Boolean induction. Implication is related to conditional probability by the following law,

A\to B\iff P(B\mid A)=1

Derivation,

A\to B

\iff P(A\to B)=1

\iff P(A\land B\lor \neg A)=1

\iff P(A\land B)+P(\neg A)=1

\iff P(A\land B)=P(A)

\iff P(A)\cdot P(B\mid A)=P(A)

\iff P(B\mid A)=1

Probability of a bit string

The principle of indifference is not valid in the real world. Dice may be loaded, and coins not fair. Knowledge of the physical world must be used to determine if the coin is fair.

However in the mathematical world indifference may be used. One bit of information has equal probability of being 1 or 0. Then each bit string has a probability of occurring,

P(a)=2^{-L(a)}

where L(a) is the length of the bit string a.

Parser for mathematics

The parser for mathematics, represented by R is a function that takes a bit string and converts it into a logical condition.

The way the bit string is encoded will determine the probability of Boolean expressions. The encoding should be chosen to give the shortest representation of an expression. Consideration of other encodings is possible. For the purposes of Boolean induction, the encoding given here is considered the best, and no other encoding is necessary. Theoretically this is an approximation.

Each expression is the application of a function to its parameters. The number of bits used to describe a function call is,

The size of the encoding of the function id.
The sum of the sizes of the expressions for each parameter.

A constant is encoded as a function with no parameters.

The function id is encoded with the prefix code based on the number of uses of the function id in the expression.

Distribution of natural numbers

No explicit representation of natural numbers is given. However natural numbers may be constructed by applying the successor function to 0, and then applying other arithmetic functions. A distribution of natural numbers is implied by this, based on the complexity of each number.

Model for constructing worlds

A model of how worlds are constructed is used in determining the probabilities of theories,

A random bit string is selected.
A condition is constructed from the bit string, using the mathematical parser.
A world is constructed that is consistent with the condition.

If w is the bit string then the world is created such that $R(w)$ is true. An intelligent agent has some facts about the word, represented by the bit string c, which gives the condition,

C=R(c)

The set of bit strings consistent with any condition x is $E(x)$ .

\forall x,E(x)=\{w:R(w)\equiv x\}

A theory is a simpler condition that explains (or implies) C. The set of all such theories is called T,

T(C)=\{t:t\to C\}

Applying Bayes' theorem

The next step is to apply the extended form of Bayes' theorem

P(A_{i}\mid B)={\frac {P(B\mid A_{i})\,P(A_{i})}{\sum \limits _{j}P(B\mid A_{j})\,P(A_{j})}}\cdot

where,

B=E(C)

A_{i}=E(t)

To apply Bayes' theorem the following must hold,

$A_{i}$ is a partition of the event space.

For $T(C)$ to be a partition, no bit string n may belong to two theories. To prove this assume they can and derive a contradiction,

N\in T\land N\in M\land N\neq M\land n\in E(N)\land n\in E(M)

\implies N\neq M\land R(n)\equiv N\land R(n)\equiv M

\implies \operatorname {false}

Secondly prove that T includes all outcomes consistent with the condition. As all theories consistent with C are included then $R(w)$ must be in this set.

So Bayes theorem may be applied as specified giving,

\forall t\in T(C),P(E(t)\mid E(C))={\frac {P(E(t))\cdot P(E(C)\mid E(t))}{\sum _{j\in T(C)}P(E(j))\cdot P(E(C)\mid E(j))}}

Using the implication and condition probability law, the definition of $T(C)$ implies,

\forall t\in T(C),P(E(C)\mid E(t))=1

The probability of each theory in T is given by,

\forall t\in T(C),P(E(t))=\sum _{n:R(n)\equiv t}2^{-L(n)}

so,

\forall t\in T(C),P(E(t)\mid E(C))={\frac {\sum _{n:R(n)\equiv t}2^{-L(n)}}{\sum _{j\in T(C)}\sum _{m:R(m)\equiv j}2^{-L(m)})}}

Finally the probabilities of the events may be identified with the probabilities of the condition which the outcomes in the event satisfy,

\forall t\in T(C),P(E(t)\mid E(C))=P(t\mid C)

giving

\forall t\in T(C),P(t\mid C)={\frac {\sum _{n:R(n)\equiv t}2^{-L(n)}}{\sum _{j\in T(C)}\sum _{m:R(m)\equiv j}2^{-L(m)}}}

This is the probability of the theory t after observing that the condition C holds.

Removing theories without predictive power

Theories that are less probable than the condition C have no predictive power. Separate them out giving,

\forall t\in T(C),P(t\mid C)={\frac {P(E(t))}{(\sum _{j:j\in T(C)\land P(E(j))>P(E(C))}P(E(j)))+(\sum _{j:j\in T(C)\land P(E(j))\leq P(E(C))}P(j))}}

The probability of the theories without predictive power on C is the same as the probability of C. So,

P(E(C))=\sum _{j:j\in T(C)\land P(E(j))\leq P(E(C))}P(j)

So the probability

\forall t\in T(C),P(t\mid C)={\frac {P(E(t))}{P(E(C))+\sum _{j:j\in T(C)\land P(E(j))>P(E(C))}P(E(j))}}

and the probability of no prediction for C, written as $\operatorname {random} (C)$ ,

P(\operatorname {random} (C)\mid C)={\frac {P(E(C))}{P(E(C))+\sum _{j:j\in T(C)\land P(E(j))>P(E(C))}P(E(j))}}

The probability of a condition was given as,

\forall t,P(E(t))=\sum _{n:R(n)\equiv t}2^{-L(n)}

Bit strings for theories that are more complex than the bit string given to the agent as input have no predictive power. There probabilities are better included in the random case. To implement this a new definition is given as F in,

\forall t,P(F(t,c))=\sum _{n:R(n)\equiv t\land L(n)<L(c)}2^{-L(n)}

Using F, an improved version of the inductive probabilities is,

\forall t\in T(C),P(t\mid C)={\frac {P(F(t,c))}{P(F(C,c))+\sum _{j:j\in T(C)\land P(F(j,c))>P(F(C,c))}P(E(j,c))}}

P(\operatorname {random} (C)\mid C)={\frac {P(F(C,c))}{P(F(C,c))+\sum _{j:j\in T(C)\land P(F(j,c))>P(F(C,c))}P(F(j,c))}}

The relational model of state

Imperative programs are characterized by having implicit state. On the surface imperative programs are then very different from functional programs. However monads provide a way of introducing imperative commands into a functional language, by hiding the state in the monad object.

But monads do not give a completely natural way of representing imperative programming. They do not provide a general model for translating an imperative program into a functional language.

Transformations on imperative programs

Mathematical expressions may be transformed using mathematical laws to give equivalent expressions. The same ability is needed for imperative programs. A mathematical model for an imperative program would allow a change in the execution order of the program, without changing the overall effect of the program.

Other transformations of imperative programs are needed

Lambda Lifting
Transforming class functions from member functions to static functions.
De-entanglement of objects so that may run efficiently across different machines (using a message protocol).

Web development is based on a "stateless" model. In this model, each response by the web server is based on the web pages the server is serving and the response from client. The "state" of the users interaction with the web page must be handled by the web page in the browser.

For efficiency the server software does not want to construct all the objects in the object model, as this would require sending too much data in the client response. This leads to a need for static functions, that may work without constructing the object.

This is an implementation strategy, required for scalability, and interaction with standard web security protocols. But it may not be the best way describing the program so that it is literate or understandable by the people. Also there are legacy programs and fat client applications that may need to be transformed into web applications.

These problems, and many more, could be addressed if there was a model for transforming imperative programs using mathematical methods.

State holder

Using the relational evaluation model, a simple model of state transfer may be given. A statement in an imperative program has,

Input state
Input parameters
return value
Output state

It is natural in a functional to consider the input state as another parameter. But using the relational evaluation model this is not necessary. the relation model allows the output to be used in determining the input. For example in,

x\cdot 3=9

x is determined to be 3, when it is only input to the function.

Let the "state holder" be a triple consisting of,

Input state
return value
Output state

Because a state holder takes its input from its output, it does not behave in the same way as other mathematical objects. But with its own set of laws it may be included in a mathematical framework.

The advantage of using a state holder is that state may be transferred through any parameter, providing a full model of imperative programming. The state transfer is completely hidden, and there is a direct translation between imperative programs and state holder programs. State only needs to be transferred where needed.

In comparison monads provide only a linear transfer through primary parameters, but not through every parameter.

A disadvantage is that translating an imperative program to a a state holder form does not make it obey the fundamental axioms of equality. Different equality axioms are followed by state holders. Separate transformations must be applied to put the code in a functional form.

Expressions

For the purposes of describing state, lower case letters will represent symbols and upper case letters will represent variables.

Assignment

An assignment statement, represented by :=, is defined by,

\forall X\#U,\forall I,\forall R,\forall O,(X:=(I,R,O))=(I,R,O[U:=R]])

where,

X is a left expression.

I and O are states.

R is an expression.

O[U:=R]])

Is the state O with the value identified by the symbol X replaced by the value R.

The notation $\forall X\#U$ gives access to the identity of the variable X, in U. This identity is then used to look up the state.

State lookup

State lookup is represented by,

\forall X\#U,\forall I,[X]=(I,I[U],I])

This definition uses the identity of the variable X (as U) to lookup the state, and return the value for that identity.

In a typical imperative language, state lookup is implicit. But that would cause some problems in describing state in mathematics.

State transfer

The following definitions give a version of any function that manages state,

\forall I,\forall F,\forall X,\forall O,F\ (I,X,O)=(I,F\ X,O)

\forall I,\forall F,\forall X,\forall O,(I,F,O)\ X=(I,F\ X,O)

\forall I,\forall F,\forall X,\forall O,(I,F,M)\ (M,X,O)=(I,F\ X,O)

To avoid contradiction the standard definitions of functions would not apply to state holders. That is,

((I,X,M)=(M,X,O))\neq (I=M\land M=O)

instead it only is defined by,

((I,X,M)=(M,X,O))=(I,X=X,O)

This may be achieved given a class hierarchy. Standard equality may be defined for standard objects, but not for state holders.

State holder objects do not obey the substitution rule,

A=B\to f(A)=f(B)

This is a fundamental principle of mathematics. This rule must be suspended for state holders.

Example

Take for example,

#include <stdio.h>
long x;

long f()
{
  return (x = x + 5)+(x = x + 5);
}

void main()
{
  x = 2;
  long r = f();
  printf("x = %ld, r = %ld\n", x, r);
}

The function f may be represented as,

f=(x:=5+[x])+(x:=5+[x])

[x]=(I_{1},I_{1}[x],I_{1}])

[x]=(I_{2},I_{2}[x],I_{2}])

f=(x:=5+(I_{1},I_{1}[x],I_{1}]))+(x:=5+(I_{2},I_{2}[x],I_{2}]))

F(I,X,O)=(I,FX,O)

f=(x:=(I_{1},5+I_{1}[x],I_{1}]))+(x:=(I_{2},5+I_{2}[x],I_{2}]))

(X:=(I,R,O))=(I,R,O[X:=R]])

f=((I_{1},5+I_{1}[x],I_{1}[x:=5+I_{1}[x]]))+(I_{2},5+I_{2}[x],I_{2}[x:=5+I_{2}[x]])

(I,F,M)\ (M,X,O)=(I,F\ X,O)

f=((I_{1},5+I_{1}[x]+5+I_{2}[x],I_{2}[x:=5+I_{2}[x]])

where,

I_{2}=I_{1}[x:=5+I_{1}[x]]

Suppose in the initial state $I_{1}[x]=2$ . Then,

f=((I_{1},5+2+5+I_{1}[x:=5+2][x],I_{2}[x:=5+I_{2}[x]])

f=((I_{1},5+2+5+5+2,I_{2}[x:=5+7])

f=((I_{1},19,I_{2}[x:=12])

where, $I_{2}=I_{1}[x:=5+2]$

So the C++ program should return,

 x = 12, r = 19

Control statements

Statement composition

Two statements are composed together, by feeding the output state from one statement into the input state of the next.

\forall I,\forall R_{1},\forall M,\forall R_{2},\forall O,(I,R_{1},M);(M,R_{2},O)\ =(I,R_{2},O)

If statement

An if statement may be regarded as a function, with 3 parameters. Currying may be applied. The condition function may be state-full or stateless. Stateless is defined by,

\forall A,\forall B,(\operatorname {if} \operatorname {true} \operatorname {then} A\operatorname {else} B)=A

\forall A,\forall B,(\operatorname {if} \operatorname {false} \operatorname {then} A\operatorname {else} B)=B

where A and B may be state holders, or other types.

State-full is defined by

\forall I,\forall M,\forall A,\forall B,\forall C,\operatorname {if} (I,C,M)\operatorname {then} A\operatorname {else} B=(I,C,M);\operatorname {if} C\operatorname {then} A\operatorname {else} B

For example,

State-full if statement.
$x:=6;\operatorname {if} [x]=6\operatorname {then} x:=[x]/2\operatorname {else} x:=[x]/3$
$(I_{1},6,I_{1}[x:=6]);\operatorname {if} (I_{2},I_{2}[x]=6,I_{2})\operatorname {then} x:=[x]/2\operatorname {else} x:=[x]/3$
$(I_{1},6,I_{1}[x:=6]);(I_{2},I_{2}[x]=6,I_{2});\operatorname {if} I_{2}[x]=6\operatorname {then} x:=[x]/2\operatorname {else} x:=[x]/3$
$(I_{1},6,I_{1}[x:=6]);(I_{1}[x:=6],I_{1}[x:=6][x]=6,I_{1}[x:=6]);\operatorname {if} I_{1}[x:=6][x]=6\operatorname {then} x:=[x]/2\operatorname {else} x:=[x]/3$
$(I_{1},6=6,I_{1}[x:=6]);\operatorname {if} 6=6\operatorname {then} x:=[x]/2\operatorname {else} x:=[x]/3$
$(I_{1},\operatorname {true} ,I_{1}[x:=6]);x:=[x]/2$
$(I_{1},\operatorname {true} ,I_{1}[x:=6]);(I_{1}[x:=6],I_{1}[x:=6][x]/2,I_{1}[x:=6][x:=I_{1}[x:=6][x]/2])$
$(I_{1},I_{1}[x:=6][x]=6,I_{1}[x:=6]);(I_{1}[x:=6],6/2,I_{1}[x:=6][x:=6/2])$
$(I_{1},I_{1}[x:=6][x]=6,I_{1}[x:=6]);(I_{1}[x:=6],3,I_{1}[x:=3])$
$(I_{1},3,I_{1}[x:=3])$

While statement

The while statement is defined by,

\forall C,\forall B,\operatorname {while} C\operatorname {do} B=\operatorname {if} C\operatorname {then} (B;\operatorname {while} C\operatorname {do} B)

If C is not a state holder this statement either does nothing or expands recursively forever. If C is a state holder, the value represented by C does not need to be the same each time, even though the expression is the same.

Functions and return values

Many imperative languages use a fairly unnatural structure for defining functions. The natural structure is that the body of the function is an expression. The return value of the function is the value of the expression. The function call is then equal to the body of the function, with formal parameters replaced by actual parameters. The return value may be stateless, or it may be a state holder.

However imperative languages use statement composition which composes a state holder, which has no sensible return value. The return statement is then needed to capture a value from a state. An extra structure called the return frame is needed to get the value from a return statement.

Return frame

The return frame accepts the return value. The value may be stateless, or a state holder.

\forall X,\operatorname {returnframe} (\operatorname {return} X)=X

Declaration frame

The declaration frame is required to insure the calling of destructors. It is defined by,

\forall I,\forall R,\forall O,\forall X,\forall Q,\operatorname {declareframe} (X,\operatorname {return} (I,R,O))=(I,\_,O);destroy(X);(Q,R,Q)

Return statement

The return statement is defined by,

\forall I,\forall D,\forall M,\forall R,\forall O,(I,D,M);\operatorname {return} (M,R,O)=\operatorname {return} (I,R,O)

\forall X,\forall U,(\operatorname {return} X;U)=\operatorname {return} X

References

^ Kirchner, Hélène; Ringeissen, Christophe (1994). "Constraint Solving by Narrowing in Combined Algebraic Domains". Proc. 11th International Conference on Logic Programming. The MIT press. pp. 617–31.
^ Arenas, Puri; Artalejo, Mario Rodríguez (1997). "A Lazy Narrowing Calculus for Functional Logic Programming with Algebraic Polymorphic Types.". Proc. of the International Symposium on Logic Programming (ILPS'97). The MIT Press,. pp. 53–67.{{cite book}}: CS1 maint: extra punctuation (link)
^ Marriot, Kim; Stuckey, Peter J. (1998). Programming with constraints: An introduction. MIT Press.
^ Pfeifer, Niki; Kleiter, Gernot D. (2006). "INFERENCE IN CONDITIONAL PROBABILITY LOGIC". KYBERNETIKA. 42 (4): 391– 404. doi:10.1.1.85.7361. {{cite journal}}: Check |doi= value (help)
^ Solomonoff, R., "A Preliminary Report on a General Theory of Inductive Inference", Report V-131, Zator Co., Cambridge, Ma. Feb 4, 1960, revision, Nov., 1960.
^ Solomonoff, R., "A Formal Theory of Inductive Inference, Part I" Information and Control, Vol 7, No. 1 pp 1–22, March 1964.
^ Solomonoff, R., "A Formal Theory of Inductive Inference, Part II" Information and Control, Vol 7, No. 2 pp 224–254, June 1964.

[1] Kirchner, Hélène; Ringeissen, Christophe (1994). "Constraint Solving by Narrowing in Combined Algebraic Domains". Proc. 11th International Conference on Logic Programming. The MIT press. pp. 617–31.

[2] Arenas, Puri; Artalejo, Mario Rodríguez (1997). "A Lazy Narrowing Calculus for Functional Logic Programming with Algebraic Polymorphic Types.". Proc. of the International Symposium on Logic Programming (ILPS'97). The MIT Press,. pp. 53–67.{{cite book}}: CS1 maint: extra punctuation (link)

[3] Marriot, Kim; Stuckey, Peter J. (1998). Programming with constraints: An introduction. MIT Press.

[4] Pfeifer, Niki; Kleiter, Gernot D. (2006). "INFERENCE IN CONDITIONAL PROBABILITY LOGIC". KYBERNETIKA. 42 (4): 391– 404. doi:10.1.1.85.7361. {{cite journal}}: Check |doi= value (help)

[5] Solomonoff, R., "A Preliminary Report on a General Theory of Inductive Inference", Report V-131, Zator Co., Cambridge, Ma. Feb 4, 1960, revision, Nov., 1960.

[6] Solomonoff, R., "A Formal Theory of Inductive Inference, Part I" Information and Control, Vol 7, No. 1 pp 1–22, March 1964.

[7] Solomonoff, R., "A Formal Theory of Inductive Inference, Part II" Information and Control, Vol 7, No. 2 pp 224–254, June 1964.

[1]

[2]

[3]

[4]

[5]

[6]

[7]

@@ Line 534: / Line 534: @@
 ==Boolean inductive inference==
+Boolean induction<ref>
-Boolean induction starts with a target condition ''C'' which is a Boolean expression.  [[Inductive inference]] is of the form,
+{{cite journal
+|first1=Niki
+|last1=Pfeifer
+|first2=Gernot D.
+|last2=Kleiter
+|title=INFERENCE IN CONDITIONAL PROBABILITY LOGIC
+|journal=KYBERNETIKA
+|date=2006
+|volume=42
+|issue=4
+|page=391– 404
+|doi=10.1.1.85.7361
+}}
+</ref> starts with a target condition ''C'' which is a Boolean expression.  [[Inductive inference]] is of the form,
 :''A theory T implies the statement C.  As the theory T is simpler than C, induction says that there is a probability that the theory T is implied by C''.

Revision as of 08:18, 5 May 2014

Introduction

Multiple values

Value sets

Value function

Single value

Element of a set

Application of functions

Exclusion

Simplification

Summary of results

A value sets identity

Narrowing

Narrowing by asserting equal

Narrowing by conjunction

Narrowing by crossed conditions

Narrowing by exclusion of dependent values

Relational evaluation

Notation

Application to Booleans

Evaluation strategy

Accessing the value set

Meta math definition

Example

Probabilistic value sets

Probability for a single value

Probabilities for multiple values

Probabilities of general value sets

Boolean inductive inference

Results

Implication and condition probability

Probability of a bit string

Parser for mathematics

Distribution of natural numbers

Model for constructing worlds

Applying Bayes' theorem

Removing theories without predictive power

The relational model of state

Transformations on imperative programs

State holder

Expressions

Assignment

State lookup

State transfer

Example

Control statements

Statement composition

If statement

While statement

Functions and return values

Return frame

Declaration frame

Return statement

See also

References