Multiset

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by 98.122.177.25 (talk) at 00:12, 29 December 2017. The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

In mathematics, a multiset (aka bag or mset) is a generalization of the concept of a set that, unlike a set, allows multiple instances of the multiset's elements. For example, {a, a, b} and {a, b} are different multisets although they are the same set. However, order does not matter, so {a, a, b} and {a, b, a} are the same multiset.

The multiplicity of an element is the number of instances of the element in a specific multiset. For example, an infinite number of multisets exist which contain only elements a and b, varying only by multiplicity:

  • The unique set {a, b} contains only elements a and b, each having multiplicity 1
  • In multiset {a, a, b}, a has multiplicity 2 and b has multiplicity 1
  • In multiset {a, a, a, b, b, b}, a and b both have multiplicity 3

Nicolaas Govert de Bruijn coined the word multiset in the 1970s, according to Donald Knuth.[1]: 694  However, the use of multisets predates the word multiset by many centuries. Knuth attributes the first study of multisets to the Indian mathematician Bhāskarāchārya, who described permutations of multisets around 1150. Knuth also lists other names that were proposed or used for multisets, including list, bunch, bag, heap, sample, weighted set, collection, and suite.[1]: 694  The word "visa" is also used.[citation needed]

Overview

The number of times an element belongs to the multiset is the multiplicity of that member. The total number of elements in a multiset, including repeated memberships, is the cardinality of the multiset. For example, in the multiset {a, a, b, b, b, c} the multiplicities of the members a, b, and c are respectively 2, 3, and 1, and the cardinality of the multiset is 6. To distinguish between sets and multisets, a notation that incorporates square brackets is sometimes used: the multiset {2, 2, 3} can be represented as [2, 2, 3].[2] In multisets, as in sets and in contrast to tuples, the order of elements is irrelevant: The multisets {a, a, b} and {a, b, a} are equal.

History

Wayne Blizard traced multisets back to the very origin of numbers, arguing that “in ancient times, the number n was often represented by a collection of n strokes, tally marks, or units.”[3] These and similar collections of objects are multisets as strokes, tally marks, or units are considered indistinguishable. This shows that people implicitly used multisets even before mathematics emerged.

This shows that necessity in this structure has been always so urgent that multisets have been several times rediscovered and appeared in literature under different names.[4] For instance, they were referred to as bags by James Lyle Peterson in 1981,[5] who attributed this term to a 1971 paper by Cerf et al.[6] A multiset has been also called an aggregate, heap, bunch, sample, weighted set, occurrence set, and fireset (finitely repeated element set).[4][7]

Although multisets were implicitly utilized from ancient times, their explicit exploration happened much later. The first known study of multisets is attributed to the Indian mathematician Bhāskarāchārya circa 1150, who described permutations of multisets.[1]: 694  The work of Marius Nizolius (1498–1576) contains another early reference to the concept of multisets.[8] Athanasius Kircher found the number of multiset permutations when one element can be repeated.[9] Jean Prestet published a general rule for multiset permutations in 1675.[10] John Wallis explained this rule in more detail in 1685.[11]

In the explicit form, multisets appeared in the work of Richard Dedekind.[12][13] Other mathematicians formalized multisets and began to study them as a precise mathematical object (structure) in the 20th century.

Example multisets

One of the simplest and most natural examples is the multiset of prime factors of a number n. Here the underlying set of elements is the set of prime divisors of n. For example, the number 120 has the prime factorization

which gives the multiset {2, 2, 2, 3, 5}.

A related example is the multiset of solutions of an algebraic equation. A quadratic equation, for example, has two solutions. However, in some cases they are both the same number. Thus the multiset of solutions of the equation could be {3, 5}, or it could be {4, 4}. In the latter case it has a solution of multiplicity 2.

A special case of the above are the eigenvalues of a matrix, if these are defined as the multiset of roots of its characteristic polynomial. However a choice is made here: the (usual) definition of eigenvalues does not refer to the characteristic polynomial, and other possibilities give rise to different notions of multiplicity, and therefore to different multisets. The geometric multiplicity of λ as eigenvalue of a matrix A is the dimension of the kernel of A − λI, which is often smaller than its multiplicity as root of the characteristic polynomial (the algebraic multiplicity) when the latter is larger than 1. The set of eigenvalues of A is also the set of roots of its minimal polynomial, but the multiset of those roots need not be the same either as the one defined using algebraic multiplicity, or using the geometric multiplicity.

Formal definition

Axiomatic way

Basically there are two different ways appoaching to the multiset concept. Just like for sets, an axiomatic definition may be used, as it has been performed by Wayne Blizard 1989.[3]. The complete list of multiset axioms will contain folowing subsections:

  1. axioms for crispt sets (e. g. Zermelo–Fraenkel (ZFC).
  2. axioms for the natural numbers (i. e. Peano axioms) - however the set of natural numbers may be constructed by (crisp) set theoretical means as given in item 1.
  3. multiset specific axioms.

Constructive way

However, a more common way is the other one: Constructing multisets using a given set theory (e. g. ZFC), and a concept of natural numbers - no matter if axiomatic (Peano) or constructed by use of the set theory. According to this, a constructive definition will be given in the following.

Within set theory, a multiset may be formally defined as a 2-tuple (A, m) where A is some set and is a function from A to the set of positive integers. The set A is called the underlying set of elements. For each a in A the multiplicity (that is, number of occurrences) of a is the number m(a). Like any function, the function m may be defined as its graph, i. e. the set of ordered pairs . With these definitions the multiset written as {a, a, b} is defined as ({a, b}, {(a, 2), (b, 1)}), and the multiset {a, b} is defined as ({a, b}, {(a, 1), (b, 1)}).

Given a universe U in which the elements of any set A must live, the definition can be simplified by extending the multiplicity to the whole universe. We then have a multiplicity function from U to the set of nonnegative integers, obtained by extending m to U with values 0 for each element not in A.

The concept of a multiset is a generalization of the concept of a set. A multiset corresponds to an ordinary set if the multiplicity of every element is one (as opposed to some larger natural number). An indexed family, (ai), where i is in some index-set, may define a multiset, sometimes written {ai}, in which the multiplicity of any element x is the number of indices i such that . The condition for this to be possible is that no element occurs infinitely many times in the family: even in an infinite multiset, the multiplicities must be finite numbers.

It is possible to extend the definition of a multiset by allowing multiplicities of individual elements to be infinite cardinals instead of natural numbers. Not all properties carry over to this generalization however, and this article uses the definition above, with finite multiplicities.

Multiplicity function

The set indicator function of a normal set is generalized to the multiplicity function for multisets.

For normal sets the scenario is as follows: The set indicator function of a subset A of a set X is the function

defined by

The set indicator function of the intersection of sets is the minimum function of the indicator functions

The set indicator function of the union of sets is the maximum function of the indicator functions

The set indicator function of the set difference of sets is the zero-truncated subtraction of the indicator functions

The set indicator function of the symmetric difference of sets is the absolute of the difference of the indocator functions

The set indicator function of a subset is smaller than or equal to that of the superset

The set indicator function of a cartesian product is the product of the indicator functions of the cartesian factors

The cardinality of a (finite) set is the sum of the indicator function values

Now let us consider multisets: The concept of set indicator function is generalized by releasing the constraint that the values must be 0 and 1 only and allow values of 2, 3, 4 and so on. The resulting function is called a multiplicity function and such a function defines a multiset.

The support of a multiset in a universe is the set of elements of with non-zero multiplicity, that is:

.

Please note that index now denotes the multiset either given by axioms or by definition as above, rather than a crisp set. The normal set in the above notation now turns into the multiset's support . denotes the multiplicity function (as defined on the whole universe ).

The concepts of intersection, union, difference, symmetric difference, subset, cartesian product and cardinality of multisets are defined by the above formulas. Some of these definitions are very similar to those defined for fuzzy sets, e. g. support. Other parameters of multisets may be defined similar to those of fuzzy sets as well: relative cardiniality RelCard(A|G), height Hgt(A), and width Width(A). Further operations are as follows:

The multiplicity function of a multiset sum, is the sum of the multiplicity functions

The scalar multiplication of a multiset by a natural number n may be defined as:

Example multiset operations

These examples demonstrate these operations of multiplicity functions.

Disjoint multisets

Two multisets are disjoint iff

which is equivalent to

and also equivalent to

Multisets are disjoint, iff their supports are disjoint according to the standard definition for sets.

For disjoint multisets intersection will give ∅ as well as well as scalar multiplication; and union will give the same result as multiset sum, which is denoted as

with its multiplicity function given by

Note that only one of both summands may be greater than zero.

Example: Symmetric difference:

For disjoint multisets the following holds true:

The above can be generalized to finite families of multiets as follows: Given a family of multisets with Index set I (e.g. I = {1,2,3,...n}). This family is (pairwise) disjoint iff

A family of multisets is disjoint, iff the family of underlying supports is disjoint in the standard sense for families sets.

Independend of the t/s-norm pair, intersection of a disjoint family will give ∅ again as well as mutiplication (in case of finite index set I), while union and (finite) multiset sum give the same result:

with its multipicity function given by

Again only one of the summands is greater than zero.

For disjoint families of multisets the following holds true:

Free commutative monoids

The free commutative monoid on a set X (see free object) can be taken to be the set of finite multisets with elements drawn from X, with the monoid operation being multiset sum and the empty multiset as identity element. Such monoids are also known as (finite) formal sums of elements of X with natural coefficients. The free commutative semigroup is the subset of the free commutative monoid which contains all multisets with elements drawn from X except the empty multiset.

Free abelian groups are formal sums (i.e. linear combinations) of elements of X with integer coefficients. Equivalently, they may be seen as signed finite multisets with elements drawn from X.

Counting multisets

Bijection between 3-subsets of a 7-set (left)
and 3-multisets with elements from a 5-set (right)
So this illustrates that .

The number of multisets of cardinality k, with elements taken from a finite set of cardinality n, is called the multiset coefficient or multiset number. This number is written by some authors as , a notation that is meant to resemble that of binomial coefficients; it is used for instance in (Stanley, 1997), and could be pronounced "n multichoose k" to resemble "n choose k" for . Unlike for binomial coefficients, there is no "multiset theorem" in which multiset coefficients would occur, and they should not be confused with the unrelated multinomial coefficients that occur in the multinomial theorem.

The value of multiset coefficients can be given explicitly as

where the second expression is as a binomial coefficient; many authors in fact avoid separate notation and just write binomial coefficients. So, the number of such multisets is the same as the number of subsets of cardinality k in a set of cardinality n + k − 1. The analogy with binomial coefficients can be stressed by writing the numerator in the above expression as a rising factorial power

to match the expression of binomial coefficients using a falling factorial power:

There are for example 4 multisets of cardinality 3 with elements taken from the set {1, 2} of cardinality 2 (n = 2, k = 3), namely {1, 1, 1}, {1, 1, 2}, {1, 2, 2}, {2, 2, 2}. There are also 4 subsets of cardinality 3 in the set {1, 2, 3, 4} of cardinality 4 (n + k − 1), namely {1, 2, 3}, {1, 2, 4}, {1, 3, 4}, {2, 3, 4}.

One simple way to prove the equality of multiset coefficients and binomial coefficients given above, involves representing multisets in the following way. First, consider the notation for multisets that would represent {a, a, a, a, a, a, b, b, c, c, c, d, d, d, d, d, d, d} (6 as, 2 bs, 3 cs, 7 ds) in this form:

This is a multiset of cardinality k = 18 made of elements of a set of cardinality n = 4. The number of characters including both dots and vertical lines used in this notation is 18 + 4 − 1. The number of vertical lines is 4 − 1. The number of multisets of cardinality 18 is then the number of ways to arrange the 4 − 1 vertical lines among the 18 + 4 − 1 characters, and is thus the number of subsets of cardinality 4 − 1 in a set of cardinality 18 + 4 − 1. Equivalently, it is the number of ways to arrange the 18 dots among the 18 + 4 − 1 characters, which is the number of subsets of cardinality 18 of a set of cardinality 18 + 4 − 1. This is

thus is the value of the multiset coefficient and its equivalencies:

One may define a generalized binomial coefficient

in which n is not required to be a nonnegative integer, but may be negative or a non-integer, or a non-real complex number. (If k = 0, then the value of this coefficient is 1 because it is the empty product.) Then the number of multisets of cardinality k in a set of cardinality n is

Recurrence relation

A recurrence relation for multiset coefficients may be given as

with

The above recurrence may be interpreted as follows. Let [n] :=  be the source set. There is always exactly one (empty) multiset of size 0, and if n = 0 there are no larger multisets, which gives the initial conditions.

Now, consider the case in which n,k > 0. A multiset of cardinality k with elements from [n] might or might not contain any instance of the final element n. If it does appear, then by removing n once, one is left with a multiset of cardinality k − 1 of elements from [n], and every such multiset can arise, which gives a total of

possibilities.

If n does not appear, then our original multiset is equal to a multiset of cardinality k with elements from [n − 1], of which there are

Thus,

Generalization and connection to the negative binomial series

The multiplicative formula allows the definition of multiset coefficients to be extended by replacing n by an arbitrary number α (negative, real, complex):

With this definition one has a generalization of the negative binomial formula (with one of the variables set to 1), which justifies calling the negative binomial coefficients:

This Taylor series formula is valid for all complex numbers α and X with |X| < 1. It can also be interpreted as an identity of formal power series in X, where it actually can serve as definition of arbitrary powers of series with constant coefficient equal to 1; the point is that with this definition all identities hold that one expects for exponentiation, notably

,

and formulas such as these can be used to prove identities for the multiset coefficients.

If α is a nonpositive integer n, then all terms with k > −n are zero, and the infinite series becomes a finite sum. However, for other values of α, including positive integers and rational numbers, the series is infinite.

Polynomial notation

The set {x} may be represented by the monomial x. Its power set, {{}, {x}}, is represented by the binomial 1 + x.

The set {x, y} may be represented by the monomial xy. Its power set, {{}, {x}, {y}, {x, y}}, is represented by the polynomial

The multiset {x, x} may be represented by the monomial x2. The multiset of submultisets of {x, x}, {{}, {x}, {x}, {x, x}}, is represented by the polynomial

The multiset of submultisets of the multiset represented by the polynomial xn is

That is why the binomial coefficient counts the number of k-combinations of an n-set.

The multiset represented by the polynomial xKyNK, containing N elements, K of which are hits, is called a statistical population, and a submultiset is called a statistical sample. The set of samples is

which by the binomial theorem equals

So the number of n-samples with k hits is

See hypergeometric distribution and inferential statistics for further on the distribution of hits.

The infinite set of finite multisets of elements taken from the set {x}:

may be represented by the formal power series

The formal solution,

makes sense as a set of multisets, but the intermediate result, 1 − x, does not make sense as a set of multisets.

The infinite set of finite multisets of elements taken from the set represented by the monomial xy is

The infinite multiset of finite multisets of elements taken from the multiset represented by the monomial x2 may be taken as a special case where x = y:

The infinite multiset of finite multisets of elements taken from the multiset represented by the monomial xn is

This explains why "multisets are negative sets." The negative binomial coefficients count the number of k-multisets of elements taken from an n-set.

Cumulant generating function

A non-negative integer n, can be represented by the monomial xn. A finite multiset of non-negative integers can likewise be represented by a polynomial f(x).

It is convenient to consider the cumulant generating function g(t) = log f(et).

  • The cardinality of the multiset is eg(0) = f(1).
  • The derivative of the cumulant generating function is .
    • The mean value of the multiset is .
    • The variance of the multiset is .

For example, the multiset {2, 2, 2, 3, 5} can be represented by the polynomial f(x) = 3x2 + x3 + x5, with cumulant generating function g(t) = log(3e2t + e3t + e5t), cardinality 3 + 1 + 1 = 5, derivative , and mean value μ = (3 + 1 + 1)−1(6 + 3 + 5) = 2.8.

The numbers are called cumulants.

The infinite set of non-negative integers {0, 1, 2, ⋯} is represented by the formal power series 1 + x + x2 + ⋯ = (1 − x)−1. The mean value and standard deviation are undefined. Nevertheless, it has a cumulant-generating function, g(t) = −log(1 − et). The derivative of this cumulant-generating function is .

A finite multiset of real numbers, A = {Ai}, is represented by the cumulant generating function

This representation is unique: different multisets have different cumulant generating functions. See partition function for the case where the numbers in question are the energy levels of a physical system.

The cumulant-generating function of a multiset of n real numbers having mean μ and standard deviation σ is g(t) = log n + μt + 2−1(σt)2 + ⋯ and the derivative is .

The cumulant-generating function of a set {k} consisting of a single real number k is g(t) = kt. The derivative is the number itself: . So the concept of the derivative of the cumulant generating function of a multiset of real numbers is a generalization of the concept of a real number.

The cumulant-generating function of a constant multiset, {k, k, k, k, ⋯, k} of n elements all equal to the same real number k, is g(t) = log n + kt, and the derivative is the number itself: , irrespective of n.

The cumulant-generating function of the multiset of sums of elements of two multisets of numbers is the sum of the two cumulant-generating functions:

There is no general formula for computing the cumulant generating function of a product

but the special case of a constant times a multiset of numbers is:

The multiset 2⋅A = {2Ai} is not the same multiset as 2 × A = A + A = {Ai + Aj}. For example, while 2 × {1, −1} = {1, −1} + {1, −1} = {1 + 1, 1 − 1, −1 + 1, −1 −1} = {2, 0, 0, −2}. The cumulant generating function of k × A is

The standard normal distribution is like a limit of large multisets of numbers.

This limit does not make sense as a multiset of numbers, but the derivative of the cumulant generating functions of the multisets in question makes sense, and the limit is well defined.

The constant term k2log(2) vanishes by differentiation. The terms ⋯ vanish in the limit. So for the standard normal distribution, having mean 0 and standard deviation 1, the derivative of the cumulant generating function is . For the normal distribution having mean μ and standard deviation σ, the derivative of the cumulant generating function is .

Applications

Multisets have various applications.[7] They are becoming the main structure of combinatorics because in its search for higher rigorousness, modern combinatorics has been developed not for sets but for multisets.[14][15][16][17] Multisets have become an important tool in databases.[18][19][20] For instance, multisets are often used to implement relations in database systems. Multisets also play an important role in computer science.[1]

There are also other applications. For instance, Richard Rado used multisets as a device to investigate the properties of families of sets. He wrote, "The notion of a set takes no account of multiple occurrence of any one of its members, and yet it is just this kind of information which is frequently of importance. We need only think of the set of roots of a polynomial f(x) or the spectrum of a linear operator."[4]

Generalizations

Different generalizations of multisets have been introduced, studied and applied to solving problems.

  • Real-valued multisets (in which multiplicity of an element can be any real number)[21][22]
This looks simple and straightforward as many definitions for fuzzy sets and multisets are very similar and can be taken over for real-valued multisets by just replacing the value range of the characteristic function ([0, 1] or ℕ0 = {0, 1, 2, 3, ...} respectively) by ℝ0+ = [0, ∞[. However this approach cannot be easily extended for genealized fuzzy sets where a poset or lattice is used instead of a simple degree of membership. Several other approachs for fuzzy multisets have been developped that don't show this restriction.
  • Fuzzy multisets[23]
  • Rough multisets[24]
  • Hybrid sets[25]
  • where multiplicity is any real-valued step function[26]
  • Soft multisets[27]
  • Soft fuzzy multisets[28]
  • Named set (unification of all generalizations of sets)[29][30][31][32]

See also

References

  1. ^ a b c d Knuth, Donald E. (1998). Seminumerical Algorithms. The Art of Computer Programming. Vol. 2 (3rd ed.). Addison Wesley. ISBN 0-201-89684-2.
  2. ^ Hein, James L. (2003). Discrete mathematics. Jones & Bartlett Publishers. pp. 29–30. ISBN 0-7637-2210-3.
  3. ^ a b Blizard, Wayne D (1989). "Multiset theory". Notre Dame Journal of Formal Logic. 30 (1): 36–66. doi:10.1305/ndjfl/1093634995.
  4. ^ a b c Blizard, Wayne D. (1991). "The Development of Multiset Theory". Modern Logic. 1 (4): 319–352.
  5. ^ Peterson, James L. (1981). Petri Net Theory and the Modelling of Systems. Englewood Cliffs, New Jersey: Prentice-Hall.
  6. ^ Cerf, Vint; Fernandez, E.; Gostelow, K.; Volansky, S. (December 1971). Formal Control Flow Properties of a Model of Computation (Report). Los Angeles, California: Computer Science Department, University of California. ENG-7178.
  7. ^ a b Singh, D.; Ibrahim, A. M.; Yohanna, T.; Singh, J. N. (2007). "An overview of the applications of multisets". Novi Sad Journal of Mathematics. 37 (2): 73–92.
  8. ^ Angelelli, I. (1965). "Leibniz's misunderstanding of Nizolius' notion of 'multitudo'". Notre Dame Journal of Formal Logic (6): 319–322.
  9. ^ Kircher, Athanasius (1650). Musurgia Universalis. Rome: Corbelletti.
  10. ^ Prestet, Jean (1675). Elemens des Mathematiques. Paris: André Pralard.
  11. ^ Wallis, John (1685). A treatise of algebra. London: John Playford.
  12. ^ Dedekind, Richard (1888). Was sind und was sollen die Zahlen?. Braunschweig: Vieweg.
  13. ^ Syropoulos, Apostolos (2001). "Mathematics of Multisets". In Calude, C. S.; et al. (eds.). Multiset processing: Mathematical, computer science, and molecular computing points of view. Springer-Verlag. pp. 347–358.
  14. ^ Aigner, M. (1979). Combinatorial Theory. New York/Berlin: Springer Verlag.
  15. ^ Anderson, I. (1987). Combinatorics of Finite Sets. Oxford: Clarendon Press.
  16. ^ Stanley, Richard P. (1997). Enumerative Combinatorics. Vol. 1. Cambridge University Press. ISBN 0-521-55309-1.
  17. ^ Stanley, Richard P. (1999). Enumerative Combinatorics. Vol. 2. Cambridge University Press. ISBN 0-521-56069-1.
  18. ^ Grumbach, S.; Milo, T (1996). "Towards tractable algebras for bags". Journal of Computer and System Sciences. 52 (3): 570–588. doi:10.1006/jcss.1996.0042.
  19. ^ Libkin, L.; Wong, L. (1994). "Some properties of query languages for bags". Proceedings of the Workshop on Database Programming Languages. Springer Verlag. pp. 97–114.
  20. ^ Libkin, L.; Wong, L. (1995). "On representation and querying incomplete information in databases with bags". Information Processing Letters. 56 (4): 209–214. doi:10.1016/0020-0190(95)00154-5.
  21. ^ Blizard, Wayne D. (1989). "Real-valued Multisets and Fuzzy Sets". Fuzzy Sets and Systems. 33: 77–97. doi:10.1016/0165-0114(89)90218-2.
  22. ^ Blizard, Wayne D. (1990). "Negative Membership". Notre Dame Journal of Formal Logic. 31 (1): 346–368.
  23. ^ Yager, R. R. (1986). "On the Theory of Bags". International Journal of General Systems. 13: 23–37. doi:10.1080/03081078608934952.
  24. ^ Grzymala-Busse, J. (1987). "Learning from examples based on rough multisets". Proceedings of the 2nd International Symposium on Methodologies for Intelligent Systems. Charlotte, North Carolina. pp. 325–332.{{cite book}}: CS1 maint: location missing publisher (link)
  25. ^ Loeb, D. (1992). "Sets with a negative numbers of elements". Advances in Mathematics. 91: 64–74. doi:10.1016/0001-8708(92)90011-9.
  26. ^ Miyamoto, S. (2001). "Fuzzy Multisets and their Generalizations". Multiset Processing. 2235: 225–235.
  27. ^ Alkhazaleh, S.; Salleh, A. R.; Hassan, N. (2011). "Soft Multisets Theory". Applied Mathematical Sciences. 5 (72): 3561–3573.
  28. ^ Alkhazaleh, S.; Salleh, A. R. (2012). "Fuzzy Soft Multiset Theory". Abstract and Applied Analysis.
  29. ^ Burgin, Mark (1990). "Theory of Named Sets as a Foundational Basis for Mathematics". Structures in Mathematical Theories. San Sebastian. pp. 417–420.
  30. ^ Burgin, Mark (1992). "On the concept of a multiset in cybernetics". Cybernetics and System Analysis. 3: 165–167.
  31. ^ Burgin, Mark (2004). "Unified Foundations of Mathematics". arXiv:math/0403186.
  32. ^ Burgin, Mark (2011). Theory of Named Sets. Mathematics Research Developments. Nova Science Pub Inc. ISBN 978-1-61122-788-8.