User:UniversalExplanation

From Wikipedia, the free encyclopedia

A Universal Analytical Model of Explanation Itself

by Richard D. Stafford

The subject of this paper is the creation of an abstract general model of absolutely any explanation of anything. Everything presented will be presented as an absolute truth in the sense of the philosophic concept of an analytical truth, as per Immanuel Kant:

"Analytic truth is a priori, necessarily true, independent of reality. It is what is true by how we understand and communicate. For example, a=a is analytic. Analytic truth is closely tied to equivalence, contradiction, definition, logic, and mathematics. Analytic truths are exact because they are about concepts. Since concepts are created, they can satisfy an exact relationship."

Establishing an Exact Definition of the concept I call "an Explanation":

The first issue is to define exactly what is meant by the phrase "an explanation". I will begin by pointing out that all explanations require something which is to be explained. Whatever it is that is to be explained, it can be thought of as information. It thus follows that "an explanation" is something which is done to (or for) information. The question then is, if we are to model "an explanation" in general, we must lay down exactly what an explanation does to (or for) information.

Many people have put forth the idea that an explanation makes information understandable. This idea requires one to clarify exactly how the existence of understanding is to be determined. This is a problem faced by every teacher in the history of the world. They attempt to discover the answer to that question by testing their students. The tests can easily be seen as an interaction where the teacher provides some information and then examines the student's response to that information. If the student's response is consistent with the possible responses the teacher would give to the same information, then the teacher will presume the student understands the information. So the central issue of understanding anything is obtaining results in "agreement with expectations".

In accordance with that observation, I will define “what an explanation must do for information” as “it must provide expectations of subsets of that information”. That is, it seems to me that if all the information is known, then any questions about the information can be answered (in fact, that could be regarded as the definition of "knowing"). On the other hand, if the information is understood (explainable), then questions about the information can be answered given only limited or incomplete knowledge of the underlying information: i.e., limited subsets of the information. What I am saying is that understanding implies it is possible to predict expectations for information which is not known and that an explanation of that information constitutes a method which provides one with those rational expectations for unknown information consistent with what is known.


Thus I define "An explanation" to be a method of obtaining expectations from given known information.

Of course, there are many different definitions of explanation in the philosophic literature, as summarized at this link: http://www.iep.utm.edu/e/explanat.htm; and at this link for definitions of 'scientific explanation': http://plato.stanford.edu/entries/scientific-explanation/. The actual issue here is to fabricate a mathematical model of "an explanation" (not a theory as such). Thus it would be of great interest to the author if any "theory" actually asserted that there exists an explanation which does not provide any method of obtaining some expectations of some kind from what is presumed known. It would be very interesting to understand exactly what such an explanation explained.

Until such a thing is brought up, I will continue with my presentation.

It follows that a model of an explanation must possess two fundamental components: the information to be explained and the mechanism used to generate expectations for possible additional information. The first fundamental component is "what is to be explained"; thus our first problem is to find an abstract way of representing any body of information.

Let "A" be what is to be explained and proceed with the following primitive definitions:

  1. A is a set. "What is to be explained."
  2. B(tk) is a finite unordered collection of elements of A. "A hypothetical collection of information obtained from A."
  3. C is a finite collection of sets B(tk). "What is known about A: i.e., our given known information."

Note that in mathematics, a set can be thought of as any collection of distinct things considered as a whole and thus can be anything. Furthermore, I have used tk as a stand-in for a sort of pseudo time. The sole purpose of tk is to denote the order in which the sets B were obtained from A; it should not be thought of as implying any other common aspect of what is thought of as time. Lastly, if any information is conveyed by order in the elements of A, that information will be in C via different B's. This allows B to be defined as an unordered collection of elements of A.
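The three primitive definitions above can be sketched in code. This is only an illustration under stated assumptions: the labels "red", "green", and "blue" and the helper name `make_B` are hypothetical, a `Counter` (multiset) stands in for an unordered collection, and the list index k plays the role of tk.

```python
from collections import Counter

# A: "what is to be explained". The three labels are hypothetical placeholders.
A = {"red", "green", "blue"}

def make_B(elements):
    """Build one B(t_k): a finite, unordered collection of elements of A."""
    for e in elements:
        if e not in A:
            raise ValueError(f"{e!r} is not an element of A")
    # A Counter ignores order but keeps multiplicity.
    return Counter(elements)

# C: a finite collection of sets B(t_k). The list position k only records
# the order in which each B was obtained; it carries no other notion of time.
C = [
    make_B(["red", "green"]),
    make_B(["green", "red"]),         # same B as above: order inside B is ignored
    make_B(["blue", "blue", "red"]),
]
```

Any information carried by order has to show up as a difference between successive entries of `C`, not inside a single `B`.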

The second fundamental component is the representation of an arbitrary method of obtaining expectations: i.e., the representation of the "explanation" itself. I will define the expectations to be the probability that a particular B(tk) will become a member of C, written as P(B(tk)). It should be clear that, in order to analyze an actual explanation (a process of no interest here), we need a way of referring to the specific elements of A which define B(tk), so that we may know and discuss what we are dealing with. Note that actually creating an explanation requires a way of referring to those elements, so it is definitely a requirement of the model of a general explanation but is not a process being modeled (i.e., the actual process of identifying those elements is part of the specific explanation: information embedded in what will be called "assumed information").

Construction of a model:

From here forth, since the purpose is to create an absolutely general model, there are only two issues of significance: first, is the model totally general (that is, does there exist an explanation which cannot be represented via the model) and, second, are the explicit logical deductions made from the model valid. No other issues are significant, as any other issue resides within the explanation itself.

Modeling the known information:

Since B is finite, its elements may be labeled:

  1. Let labeli be the label of a particular element of B(tk)
  2. Let all labeli be mapped into a set of real numbers xi.
  3. Let all numbers xi be mapped into points on the real x axis.

Thus it is seen that any set of labels for the elements of A available to the explanation (i.e., appearing in any set B) may be mapped into points on the real x axis; however a minor problem exists in any attempt to use points on a geometric axis as a general model of arbitrary information.

Sub Problem number 1:

Since all explanations must be modeled, B may contain the same element of A more than once: i.e., the points xi need not be unique. There is a problem with using points on the real x axis to display the information contained in B: points with the same location cannot represent multiple occurrences of the mapped label, so information contained in B is lost in step three as put forth.

Solution to Sub Problem number 1:

Add to the model a real τ axis orthogonal to the real x axis. Attach to every xi an arbitrary τi such that every pair of identical xi points have different τi attached. The model can then display the fact of multiple occurrences of identical xi via this orthogonal axis.

The "known information" of any explanation is now modeled by a set of points (one set for each B(tk) making up C). That is, any collection of elements taken from A is now seen as points mapped into an (x, τ) plane.
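The labeling steps and the fix for repeated points can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `to_plane`, the example labels, and the numeric values in `label_to_x` are hypothetical, and the second coordinate (written `tau` in the code) is simply an occurrence counter, which is one of many possible ways to keep identical x values apart.

```python
def to_plane(labels, label_to_x):
    """Map a collection of labels to points (x, tau) in a plane.

    Repeated labels share the same x, so each repeat gets a different tau.
    """
    seen = {}      # x value -> how many times it has appeared so far
    points = []
    for label in labels:
        x = label_to_x[label]
        n = seen.get(x, 0)
        points.append((x, float(n)))   # tau = occurrence index (arbitrary choice)
        seen[x] = n + 1
    return points

# Hypothetical mapping of labels to real numbers; any mapping would do.
label_to_x = {"red": 1.0, "green": 2.5, "blue": 4.0}

points = to_plane(["blue", "red", "blue"], label_to_x)
# points is [(4.0, 0.0), (1.0, 0.0), (4.0, 1.0)]
```

With the extra coordinate, both occurrences of "blue" survive as distinct points, whereas on the x axis alone they would collapse into one.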

We have now established a specific way of modeling all possible references to the elements in B in the set C; however, there still exists a difficulty.

Sub Problem number 2:

The model must be able to model all possible references to elements in A, not just to the elements found in the sets B contained in C. We must allow for the fact that there can exist elements of A which are yet to be known.

Solution to Sub Problem number 2:

Since the elements of C consist of a finite collection of sets Bk, references to the elements of C may be mapped into any ordered set of real numbers tk. We may associate the specific tk with all (x, τ)k elements in Bk. If we then set up a real t axis perpendicular to the (x, τ) plane, every element in the collection of Bk can be mapped to a point in the real (x, τ, t) space. Since only a finite number of (x, τ) planes are consumed in displaying C by means of this procedure, we have an uncountable number of planes remaining to model the set of all possible collections B. Providing for all possible sets B is absolutely equivalent to providing for all possible references to the elements of A, as B is defined to be a collection of elements from A.

We now have constructed a model capable of modeling absolutely any collection of known information which also specifically models any and all possible changes in that information. The model consists of points in a real (x, τ, t) space, where "real" means that x, τ, and t are taken from the set of real numbers.
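The construction can be sketched as a small function that stacks the planes along a t axis. The name `embed`, the sample points, and the t values are hypothetical, and the second coordinate of each point is the orthogonal-axis value introduced for Sub Problem number 1.

```python
def embed(planes, times):
    """Attach the value t_k to every point of the k-th plane.

    planes: list of lists of (x, tau) points, one list per B(t_k)
    times:  strictly increasing real numbers t_k, one per plane
    """
    if len(planes) != len(times):
        raise ValueError("need exactly one t value per B")
    if any(t1 >= t2 for t1, t2 in zip(times, times[1:])):
        raise ValueError("t values must be strictly increasing")
    return [(x, tau, t) for plane, t in zip(planes, times) for (x, tau) in plane]

# Two hypothetical sets B, already mapped to points in a plane.
planes = [
    [(1.0, 0.0), (2.5, 0.0)],
    [(4.0, 0.0), (4.0, 1.0)],
]
space = embed(planes, [0.0, 1.0])
```

Only the planes at t = 0.0 and t = 1.0 are used here; every other real value of t remains free for sets B that are not yet known.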

Modeling the explanation itself:

In order to complete the problem, it is necessary to model a general mechanism capable of yielding the probability of any specific set B(tk) derived from the elements of A. This general mechanism must transform the distribution of elements defined by B(tk), a set of points in a real (x, τ, t) space, into a probability, a real number bounded by zero and one. This is exactly the concept of a mathematical algorithm.

The first requirement of the appropriate algorithm is that the result is a probability, as our expectations, as defined here, can only be expressed as a probability. It may appear that only algorithms which yield a real number bounded by zero and one are of interest; however, that is not the case. Absolutely any mathematical algorithm can be seen as an operation which transforms a given set of numbers into another set of numbers. There exists a specific mathematical procedure which can transform any function of x, τ, and t into a form which can be interpreted as a probability. The desired probability can be seen as the "normalized" sum of the squares of the set of numbers produced by that algorithm. If the set of numbers produced by an arbitrary algorithm is seen as a vector in an abstract space, the specific procedure is exactly what is ordinarily referred to as an "inner product" of the associated vector function.

It follows that absolutely any algorithm can be seen as defining a probability, that probability being represented by the expression:

$$P(\vec{x}, t) = \vec{\Psi}^\dagger(\vec{x}, t) \cdot \vec{\Psi}(\vec{x}, t)\, dv$$

(where $dv$ is the differential volume element of the arguments) without introducing any constraints whatsoever on the nature of the explanation being modeled. As indicated, x with an arrow above stands for the complete collection of x and τ defining the specific B of interest in the (x, τ, t) space. The "dot" indicates a scalar product, and $\vec{\Psi}$ is to be properly "normalized". Since no constraint whatsoever has been placed on the problem by this notation, it follows that absolutely any explanation may be modeled by a function of the form $\vec{\Psi}(\vec{x}, t)$, where the argument $\vec{x}$ is the collection of points which are mapped from the elements of the appropriate B.
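The claim that any algorithm can be turned into a probability can be illustrated with a small sketch. It makes a simplifying assumption that is not in the text: normalization is done over a finite list of candidate collections rather than over a continuous space. The names `squared_norm`, `probabilities`, and `arbitrary_algorithm`, and the sample candidates, are hypothetical.

```python
def squared_norm(vec):
    """Sum of squared magnitudes; also valid for complex components."""
    return sum(abs(v) ** 2 for v in vec)

def probabilities(algorithm, candidates):
    """Turn the output of any number-producing algorithm into probabilities.

    Each candidate gets the squared norm of the algorithm's output vector,
    divided by the total over all candidates (the normalization step).
    """
    weights = [squared_norm(algorithm(c)) for c in candidates]
    total = sum(weights)
    if total == 0:
        raise ValueError("algorithm is zero on every candidate; cannot normalize")
    return [w / total for w in weights]

def arbitrary_algorithm(points):
    """A deliberately arbitrary rule: any function returning numbers would do."""
    s = sum(x for x, tau in points)
    return [s, complex(len(points), -1.0)]

# Hypothetical candidate collections, each a list of points in the plane.
candidates = [
    [(1.0, 0.0)],
    [(1.0, 0.0), (2.0, 0.0)],
    [(4.0, 0.0), (4.0, 1.0)],
]
probs = probabilities(arbitrary_algorithm, candidates)
```

Nothing about `arbitrary_algorithm` was chosen to produce values between zero and one; the squaring and normalization alone guarantee that the results are non-negative and sum to one.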

The first important consequences of this model:

There are some subtle aspects embedded in the model as so far described which need to be carefully examined. Of very great significance is the fact that the goal was to create a model which will model any explanation of A obtained from C. The specific mapping of the labels for the elements of C is part of the model and not at all an aspect of the phenomena to be modeled. It follows that the probability P yielded by the model cannot be a function of that mapping procedure: i.e., all possible mappings must end up yielding the same probability distribution when seen as a function of the actual sets Bk. That is to say, the $\vec{\Psi}$ representing the explanation must yield results consistent with the expected distributions of the elements of Bk in C independent of the chosen mappings. This single fact can be used to prove that $\vec{\Psi}$ must satisfy the following three orthogonal differential constraints (for details, see appendix 1):

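$$\sum_i \frac{\partial}{\partial x_i}\vec{\Psi} = i\kappa_x \vec{\Psi}, \quad \sum_i \frac{\partial}{\partial \tau_i}\vec{\Psi} = i\kappa_\tau \vec{\Psi}, \quad \text{and} \quad \frac{\partial}{\partial t}\vec{\Psi} = i m \vec{\Psi}$$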
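One way to see how independence from the mapping leads to a differential constraint is the following sketch for the x axis alone. It is an illustration under stated assumptions, not the argument of appendix 1: the shift parameter $a$ and the real constant $\kappa$ are symbols introduced here, and the probability is taken to be the inner product of the function representing the explanation with itself.

```latex
% Assumption: a relabeling shifts every x_i by the same amount a,
% and the probability must not change.
\begin{align*}
P(x_1 + a, \dots, x_n + a, t) &= P(x_1, \dots, x_n, t) \quad \text{for all } a \\
% Differentiate with respect to a and set a = 0:
\sum_{i=1}^{n} \frac{\partial P}{\partial x_i} &= 0 \\
% With P = \Psi^\dagger \cdot \Psi, a sufficient condition is
\sum_{i=1}^{n} \frac{\partial \vec{\Psi}}{\partial x_i} &= i\kappa\,\vec{\Psi},
  \qquad \kappa \in \mathbb{R} \\
% because the two terms of the product rule then cancel:
\sum_{i=1}^{n} \frac{\partial P}{\partial x_i}
  &= (-i\kappa)\,\vec{\Psi}^\dagger \cdot \vec{\Psi}
   + \vec{\Psi}^\dagger \cdot (i\kappa\,\vec{\Psi}) = 0
\end{align*}
```

The same reasoning applied to a uniform shift along each of the other two axes gives one constraint per axis, which is consistent with there being three of them.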

Adding one last component:

There is one very fundamental flaw in the model as so far presented. That flaw is the presumption that C is available to us. Everything in this presentation was supposed to be analytically true; however, in actual fact the only thing we know for sure is that we know nothing for sure. It follows that we cannot possibly know what elements of A are actually available to us. Thus it is that the only explanations available for analysis are based on a set consisting of the combination of two sets: C, the "given information" discussed in the design of this model, and a set I will call D which consists of information we presume is valid. It is very important to realize that the set D is not part of A but is rather part of the explanation itself: that is, information presumed to be true and, without which, the explanation is incomplete.

What is important here is the realization that any explanation is actually a combination of two very different components. There are things presumed to exist (the set D) and the rules of the explanation (which are modeled in this presentation by the function $\vec{\Psi}$). Since there exists absolutely no way of establishing whether or not a given piece of information is actually derived from C and not D, the B(tk) must itself be a combination. It follows that there is a trade-off here. The rules are quite dependent upon what is presumed to exist and, likewise, what must exist is quite dependent upon what the rules are. Certainly, if one steps back and looks at the problem of creating explanations, that fact should not at all be unexpected. Actually, it is a rather mundane and obvious observation.

This suggests that the final aspect of a rational model is to actually design a universal rule which is capable of yielding the distributions of the elements of B in C for every possibility, as the elements of D must obey exactly the same rule or the explanation fails immediately. In this regard, it is quite easy to prove that, for any B (any distribution of points in the (x, τ) plane), there exists a corresponding set D (a second distribution of points in the (x, τ) plane) which, under the simple constraint that no two points can be the same, will constrain the distribution of B in C to exactly that distribution, no matter what that distribution might be (see appendix 2). The constraint that no two points can be the same is easily enforced by requiring:

$$F = \sum_{i \neq j} \delta(\vec{x}_i - \vec{x}_j) = 0,$$

where $\vec{x}_k$ is defined to be the vector in the (x, τ) space defined by (xk, τk) and δ represents the Dirac delta function. This express constraint on the elements of B can be converted into an express constraint on $\vec{\Psi}$ by noting that the proper constraint on $\vec{\Psi}$ is that $\vec{\Psi}$ must vanish whenever the above constraint on the elements is violated; i.e., when F is not equal to zero, $\vec{\Psi}$ must be zero. Thus the product of the two must always be zero.

$$F\,\vec{\Psi} = \sum_{i \neq j} \delta(\vec{x}_i - \vec{x}_j)\,\vec{\Psi} = 0.$$

The four independent constraints on $\vec{\Psi}$ developed above may be expressed in a very succinct form through the use of some well-known mathematical tricks.

If one defines a set of matrices as follows:

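$$[\alpha_{ix}, \alpha_{jx}] \equiv \alpha_{ix}\alpha_{jx} + \alpha_{jx}\alpha_{ix} = \delta_{ij}, \quad [\alpha_{i\tau}, \alpha_{j\tau}] = \delta_{ij}, \quad [\beta_{ij}, \beta_{kl}] = \delta_{ik}\delta_{jl}, \quad [\alpha_{ix}, \beta_{jk}] = [\alpha_{i\tau}, \beta_{jk}] = 0,$$

where $\delta_{ij}$ equals one if $i = j$ and zero otherwise,

and defines the two expressions $\vec{\alpha}_i = \alpha_{ix}\hat{x} + \alpha_{i\tau}\hat{\tau}$ and $\vec{\nabla}_i = \frac{\partial}{\partial x_i}\hat{x} + \frac{\partial}{\partial \tau_i}\hat{\tau}$, a small shift in perspective will allow the four constraints on $\vec{\Psi}$ to be written in a single equation as follows (see appendix 3):

$$\left\{ \sum_i \vec{\alpha}_i \cdot \vec{\nabla}_i + \sum_{i \neq j} \beta_{ij}\, \delta(\vec{x}_i - \vec{x}_j) \right\} \vec{\Psi} = K \frac{\partial}{\partial t}\vec{\Psi} = iKm\vec{\Psi}$$

constrained by the requirement that $\sum_i \vec{\alpha}_i \vec{\Psi} = \sum_{i \neq j} \beta_{ij} \vec{\Psi} = 0$.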

It follows that all explanations of anything may be directly modeled by a set of points in an (x, τ) space moving through a t dimension and required to obey the fundamental equation given above. The probability of any particular set of elements in B is given by

$$P(\vec{x}, t) = \vec{\Psi}^\dagger(\vec{x}, t) \cdot \vec{\Psi}(\vec{x}, t)\, dv.$$

Thus I have successfully created a valid model of any possible explanation of any A consistent with C plus D (what is presumed known). In doing so, I have accomplished two very significant things: first, I have deduced quantum mechanics from fundamental concepts and, second, I have established a fundamental means of communication, as anything conceivable is representable as an object (a collection of information) in the (x, τ, t) space which is required to obey the above relations.

I mention communications because what I have asserted is an analytical truth and applies to absolutely any collection of concepts. The significant factor often omitted in any discussion is that a translation from one language to another (including translation of invented machine language representations used by computers) actually constitutes an explanation of what is being said in the original language.

"An Explanation" is the primary concept underlying all other concepts!