Talk:Geostatistics: Difference between revisions
Line 360:
'''Going through the current TOC:'''
1. Background
## Role of statistics in geography
* As said, geography is not the background of geostatistics, even though Tobler's law of geography is often cited in this context. The typical background is (mineral resource) geology.
2. Spatial data and descriptive statistics
## Boundary delineation (that's spatial statistics)
## Modifiable areal units (that's spatial statistics)
## Spatial aggregation/scale problem (The upscaling problem is a topic in geostatistics, not as aggregation of political units, but as mean values over (e.g. mining) blocks)
3. Descriptive spatial statistics (point pattern analysis is a part of spatial statistics but not of geostatistics, see e.g. Cressie 1993 as cited above)
## Spatial measures of central tendency (that's related to districts and not random fields)
## Spatial measures of dispersion (this part is completely uninformative)
4. Topology (topology is not statistics, not even geostatistics, it's merely a geospatial problem)
## Topology rules (still not statistics)
5. Computational Geometry (computational geometry is not even statistics)
6. Topography (might be interpolated with geostatistics, but that is not mentioned here)
7. Sampling methodology (geospatial sampling seems to be the first word after the first chapter actually related to the subject of geostatistics)
8. Criticism (This is clearly a paragraph criticizing kriging, which is a major subject of geostatistics. Although I disagree with the cited work by Merks, it is the first part actually related to geostatistics. Strictly speaking, however, this chapter belongs to Kriging and not here.)
9. Related software (these are all real geostatistical software packages and should stay here)
10. Notes (Should go to references, and it is fully sufficient to cite Mr. Merks once)
11. References (Most of them are really good references on geostatistics, but most of them are not cited above)
12. See also (Only the kriging link seems appropriate)
13. External links (quite ok)
[[User:Bohunk|SCmurky]] ([[User talk:Bohunk|talk]]) 19:20, 12 August 2009 (UTC)
Revision as of 19:25, 12 August 2009
Statistics: Unassessed
Mathematics: Stub-class, Low-priority
Please start new discussions at the bottom.
"Geostatistics is a fundamentally flawed …"
Geostatistics is a fundamentally flawed variant of classical statistics because it violates the requirement of functional independence and ignores the concept of degrees of freedom. It is a scientific fact that each distance-weighted average has its own variance. Changing its name to "kriged estimate" does not alter the fact that each kriged estimate, too, has its own variance. In addition, Fisher's F-test for spatial dependence cannot be applied because kriging variances and covariances of sets of kriged estimates are simply voodoo variances. —Preceding unsigned comment added by 66.183.14.73 (talk) 22:12, December 14, 2003
- Could the person who wrote these comments identify himself or herself? Michael Hardy 00:12, 15 Dec 2003 (UTC)
Geostatistics is a fundamentally flawed variant of mathematical statistics because it violates the requirement of functional independence and ignores the concept of degrees of freedom. It is a scientific fact that each distance-weighted average had its own variance before it became a kriged estimate. Its rebirth as an honorific kriged estimate does not make its variance vanish without a trace. In addition, Fisher's F-test for spatial dependence cannot be applied not only because kriging variances and covariances of sets of kriged estimates are pseudo variances and covariances but even more so because sets of kriged estimates give zero degrees of freedom. The preceding is a revision of the first comment above by User:JanWMerks: JanWMerks on 22:23, March 26, 2006 — Paul August ☎ 17:39, 1 July 2006 (UTC)
- I think more needs to be said than what is said above, before it can be taken seriously by anyone except those knowledgeable in the theory of geostatistics. Say enough, for example, so that anyone with a PhD in statistics can understand it. Then I might help rephrase it for a broader audience than that and incorporate it into the article. Michael Hardy 22:27, 29 March 2006 (UTC)
The comment at the beginning of this discussion is flawed, in particular because of its outright assumption that geostatistics ignores the "requirements" of statistical methodology. I would only state that this user's comments in the article do not provide any discussion of views or perspective, and only serve to damage the goal for which they are stated. Geostatistics is not perfect, but remains a viable route for the study of complex phenomena over vast areas.
In BC we use geostatistics to predict the spread of the pine beetle infestation. While results may exhibit a range of predictions, it is understood that a tremendous number of variables are introduced into every equation; I might also state that any failure simply improves our methods of survey and analysis. This field is important because it serves to quantify attributes and behaviour that are extremely difficult to predict without an unimaginable amount of resources dedicated to surveys and other forms of data collection.
Perhaps those who inquire about geostatistics do not want a diatribe, provided by a disgruntled statistician, as to the flaws behind the science. For those individuals who do wish to read these issues, we should provide a subheading. Finally, I state that geostatistics is a branch of mathematical statistics which will increase our understanding of all sorts of spatially varied phenomena, from weather prediction to urban systems development; so perhaps it is the concept of functional independence that is flawed, and not any minor infraction of that rule.
Bohunk 20:47, 1 April 2006 (UTC)
- Geostatistics is a fundamentally flawed variant of mathematical statistics because it violates the requirement of functional independence and ignores the concept of degrees of freedom. It is a scientific fact that each distance-weighted average had its own variance before it became a kriged estimate. Its rebirth as an honorific kriged estimate does not make its variance vanish without a trace. As a result, Fisher's F-test for spatial dependence cannot be applied because the variance of a subset of some infinite set of distance-weighted average is as invalid a measure for variability, precision and risk as its covariance is for spatial dependence. Not surprisingly because an infinite set of distance-weighted averages gives exactly zero degrees of freedom.--Iconoclast 20:42, 16 April 2006 (UTC) The preceding is a revision of the first comment above by User:JanWMerks: JanWMerks on 20:43, April 16, 2006 — Paul August ☎ 18:10, 1 July 2006 (UTC)
- Yeah, this is completely wrong. I just completed a Quantitative Methods in Geography course and there definitely was like 2 chapters about degrees of freedom. Sorry. —Preceding unsigned comment added by 130.85.149.221 (talk) 20:50, June 7, 2006
===========================================================
The person who made the rude remarks may not have expressed himself very well, and I agree with all of you about the lack of references. A lack of references always invalidates criticism. There was also a bit of confusion about the difference between empirical measurement of spatial patterns (what most geostatistics is designed for) and statistical tests of the patterns. Geostatistics are perfectly valid for empirical measurement, it is when we want to make statistical inferences about some of them that we run into a few problems.
I have just begun to worry about this myself (I study geographic variation in animals and genes). However I can explain the main problem and give some references to get around them. The critical references (for how to avoid it!) are:
Benjamini, Y. and D. Yekutieli. 2001. The control of the false discovery rate in multiple testing under dependency. Annals of Statistics 29, 1165-1188.
Efron, B. 2007. Size, power and false discovery rates. Annals of Statistics 35, 1351-1377.
Romano, J.P. and M. Wolf. 2007. Control of generalized error rates in multiple testing. Annals of Statistics 35, 1378-1408.
Li, Y., N. Wang, M. Hong, N.D. Turner, J.R. Lupton and R.J. Carroll. 2007. Nonparametric estimation of correlation functions in longitudinal and spatial data, with application to colon carcinogenesis experiments. Annals of Statistics 35, 1608-1643.
Storey, J.D. 2002. A direct approach to false discovery rates. Journal of the Royal Statistical Society 64, 479-498.
Briefly, unlike other kinds of data, geographical (or other spatial) data are not independent because what happens at one location is likely to happen at adjacent locations. This can happen for a variety of reasons including causes which operate over a larger spatial scale than a single sampling unit (e.g. operating over several of your sample locations), and effects which spread in space. This results in a high correlation between sample values close together, but a decreasing correlation with distance. There is a large literature on measuring this, under the term 'spatial autocorrelation' as well as under the term 'geostatistics'. Statistically it is a nuisance because most statistical tests assume each sample is independent, which is manifestly not true for spatial data. The references I gave above get at this problem. I also like the Bayesian approach that one of you suggested, and this may be able to avoid the spatial autocorrelation problem with suitable design. We need to find a good statistician to describe for us the exact way to do this! As I said, I'm just learning this myself, but I thought I would share the references with you anyway.
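The decay of similarity with distance described above is commonly summarised with an empirical semivariogram. The following is only a minimal sketch, assuming an evenly spaced one-dimensional transect; the function name `empirical_semivariogram` and the sample values are made up for illustration and do not come from any of the references cited in this thread.

```python
def empirical_semivariogram(values, max_lag):
    """Semivariance for integer lags 1..max_lag on an evenly spaced transect.

    gamma(h) = sum((z[i+h] - z[i])**2) / (2 * number_of_pairs_at_lag_h)
    """
    gamma = {}
    for h in range(1, max_lag + 1):
        # Squared differences between all pairs of samples that are h steps apart
        sq_diffs = [(values[i + h] - values[i]) ** 2
                    for i in range(len(values) - h)]
        gamma[h] = sum(sq_diffs) / (2 * len(sq_diffs))
    return gamma


# Illustrative (invented) measurements along a transect
transect = [1.0, 1.2, 1.5, 1.9, 2.4, 2.6, 2.5, 2.1, 1.6, 1.3]
for lag, g in empirical_semivariogram(transect, 3).items():
    print(lag, round(g, 4))
```

For spatially autocorrelated data the semivariance is small at short lags (nearby samples are alike) and grows with distance, which is the same pattern as "high correlation close together, decreasing correlation with distance".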
Oddly enough the first people to attack this seem to be medical researchers because phenomena in one body or tissue location can affect adjacent cells and tissues. So don't be put off by the non-standard applications in the above references! Chlamydera 17:48, 30 November 2007 (UTC)
"To user(s) who have been editing this page"
To user(s) who have been editing this page recently: Please be civil in the editing of wikis. If you don't agree with the principles of geostats that's fine, but wikipedia is not the place for direct attacks. Thank you.
S lyster 02:35, 29 March 2006 (UTC)
"POV"
The first three paragraphs of this article were highly POV, most likely Original research, and I removed them. Geostatistical methods are not only widely accepted by the scientific community, they are required by regulatory agencies, at least in the U.S. We use them every day where I work. Antandrus (talk) 03:57, 1 April 2006 (UTC)
- I edited the argument against geostatistics; now the article is accessible without having to read through a thousand words about how shitty it is. Could someone please add some info regarding the history of this discipline, and possibly expand on its socio-political and geoscience roles, as these are increasingly evident in a wide range of applications and scientific disciplines?
- Potential areas of discussion:
- Weather analysis, urban planning, invasive species analysis, geological surveys, water sustainability applications, economic development and planning, military strategy, psychological and criminal profiling, socio-economic disparities across local/regional/global scales, epidemiology, anthropology, historical applications. There are a lot of applications, so you think of some more.
- if confused remember the golden rule: KISS - Keep It Simple Stupid.
I also removed the POV paragraphs that were re-inserted. Basic kriging is a simple Bayesian technique, with a Gaussian Process prior (encapsulated in the kernel function) and a Gaussian posterior. See the elementary tutorial at [1]. I don't understand the objections at all --- the estimate of the posterior does have a covariance, see equation (21) of the tutorial.
Now, I don't understand a lot of the elaborations in geostatistics. I don't know what pseudo-kriging is, it may be wrong. But, it seems excessive to dismiss an entire field because one algorithm may be incorrect.
hike395 21:28, 2 April 2006 (UTC)
- I want to keep it simple because I like to understand my question. Did or didn't each distance-weighted average have its own variance before it metamorphosed into a kriged estimate or kriged estimator in the 1960s, when geostatistics was hailed as a new science? The rest are details! --Iconoclast 16:14, 3 April 2006 (UTC)
I believe that none of us understand your objection or your question. Let's break it down into little steps for us non-geostatisticians, without the jargon, OK? Let's assume that we're estimating some quantity (say, a mineral concentration), as a function of spatial location on the surface of the Earth. As I understand the steps of kriging (cast into a Bayesian framework):
- Start with a Gaussian Process prior. This means that, for any N points on the surface, there is a multivariate normal distribution that is the prior distribution of the mineral concentration. The covariance of this distribution is the kernel function that is used in the kriging. In geostats terms, the kernel function is the semivariogram, although empirical semivariograms are not guaranteed to be positive semi-definite.
- Now, we measure the concentration at N-1 of these points. These measurements themselves are assumed to have a normal distribution. The standard kriging setup seems to be homoscedastic (equal variance for each measurement), but that isn't required.
- Finally, we can compute the posterior estimate of the concentration at the Nth point on the surface. This point can have arbitrary position. The posterior estimate is in the form of a normal distribution, also. The mean and variance of the posterior at any point can be computed from von Mises' formula --- it's simply an (N-1)x(N-1) inversion of the kernel matrix evaluated at the N-1 measurement points.
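The three steps above can be sketched in code. This is only an illustration under stated assumptions, not the method of any particular geostatistics package: it assumes a zero prior mean, a squared-exponential kernel (`sq_exp_kernel`, with hypothetical `length` and `sill` parameters), and homoscedastic measurement noise `noise_var`. All function names are invented for this sketch, and the linear solve stands in for the matrix inversion mentioned in the last step.

```python
import math


def sq_exp_kernel(a, b, length=1.0, sill=1.0):
    """Assumed prior covariance between two 2-D locations."""
    d2 = (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2
    return sill * math.exp(-0.5 * d2 / length ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def gp_posterior(points, values, target, noise_var=0.0, kernel=sq_exp_kernel):
    """Posterior mean and variance at `target`, given measurements at `points`."""
    # Step 1: prior covariance (kernel matrix) of the measured points,
    # Step 2: plus the measurement noise on the diagonal
    K = [[kernel(p, q) + (noise_var if i == j else 0.0)
          for j, q in enumerate(points)] for i, p in enumerate(points)]
    k_star = [kernel(p, target) for p in points]
    # Step 3: weights = K^{-1} k_star (K is symmetric)
    weights = solve(K, k_star)
    mean = sum(w * v for w, v in zip(weights, values))
    var = kernel(target, target) - sum(w * k for w, k in zip(weights, k_star))
    return mean, var


pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
obs = [1.0, 2.0, 3.0]
print(gp_posterior(pts, obs, (0.5, 0.5)))
```

Note that in this formulation the posterior variance at the new point depends only on the locations and the kernel, not on the measured values. Far from all measurements it returns to the prior variance, and at a noise-free measured point it drops to zero.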
Please explain: which of these steps do you think is incorrect? These are the steps outlined in the Gaussian Process tutorial, corresponding to basic kriging. Or, are you objecting to some other geostatistical method, other than basic kriging?
hike395 04:50, 4 April 2006 (UTC)
- I think I finally understand the controversy here --- I should have known, many heated controversies in statistics often boil down to Bayesian vs. frequentist assumptions. I tried to clarify this in a paragraph at kriging --- please check to see if I got this right. -- hike395 16:02, 7 April 2006 (UTC)
"I have temporarily attributed …"
I have temporarily attributed Dr. Merks' edits explicitly to himself. However, I believe that paragraph (and the citations to Merks' papers) now probably violate the Wikipedia:Vanity guidelines. What do other editors think? -- hike395 02:09, 9 April 2006 (UTC)
- I agree completely. Even if it were not an issue of vanity, I still would remove it due to the fact that the sentence is written in jargon and has a sentence structure that is unintelligible to me. There is also an issue of WP:VERIFY. --mav 03:09, 9 April 2006 (UTC)
- In addition, the recent edit calling geostatistics "an invalid variant of mathematical statistics" is a violation of NPOV and I support keeping this out of the article. Until a significant portion of the geostatistical community agrees that it is "an invalid variant of mathematical statistics", and this can be reliably referenced, this is both a violation of WP:NPOV and WP:NOR. Thanks, Antandrus (talk) 17:43, 12 April 2006 (UTC)
The problem is that calculated (functionally dependent) values do not give degrees of freedom whereas n measured values with equal weighting factors give n-1 degrees of freedom for the set and 2(n-1) for the ordered set. The variance of a subset of some infinite set of distance-weighted averages-cum-kriged estimates gives zero degrees of freedom. Figure out where the zero kriging variance and the unity kriging covariance come from and why oversmoothing is forbidden. Geostatistics was based on a human error (Krige, Matheron and their students didn't know that each distance-weighted average had its own variance) but progressed into a scientific hoax when geostatistics converted Bre-X's bogus grades and Busang's barren rock into phantom gold resource whereas analysis of variance proved that Bre-X's Busang was a salting scam. Read my reviews of the first three geostatistics textbooks. And read what Philip and Watson's Matheronian Geostatistics; Quo Vadis? wrote many years ago. Somebody should read Shurtz's original thoughts. Start to think in statistical terms and stop that mindless talk about a violation of WP's NPOV. Please tell me how to NPOV about a voodoo variant of mathematical statistics? --Iconoclast 23:57, 12 April 2006 (UTC)
- OK, now we're getting somewhere, because we have a reference [2]
- This is an unreviewed position paper, sadly. I don't have access to this paper online -- I'll have to go to the library and see what it says.
- Do you have a more complete reference for Shurtz?
- By definition, you cannot have an NPOV discussion of a "voodoo variant of mathematical statistics". You have to write it like a Martian would, if he/she were looking down on the controversy from millions of miles away. A NPOV statement would be something like
- A critical assumption of geostatistics is that spatial dependency can be treated as a stochastic process. Some people in the field disagree that this assumption accurately models real geological data.[1] Other practitioners recommend using frequentist statistical tests to test the assumption of spatial dependency.[2][3][4]
- Unfortunately, you cannot talk about the analysis of variance to detect the Bre-X scam, because that is your own work (as I understand our vanity guidelines).
My only thought regarding these comments is that hike395 has the right idea. If we are to establish an objective article regarding geostatistics, then these issues should be discussed in a subtopic; it is not that geostatistics is without controversy, but that all statistics are invariably capable of distorting the facts. In the end it is up to the individual, and their own integrity and ethics, to determine the truth based on the best statistical method available. Geostatistics is a valid research tool, but it is not independent of statistical methodology; in fact, every method of statistical analysis may be applied to spatial problems. Geostatistics is simply the recognition that natural phenomena always have a spatial dimension, and it seeks to accurately represent that aspect of the statistical information gathered.
For instance, you could not accurately determine, representing on a map, the distribution of AIDS in Africa without testing every individual on the continent for the disease, then connecting them to a GPS so that their position could be properly verified, and finally by giving each individual on the map a single dot (making the map unreadable).
Geostatistics would collect the surveyed information (collected by doctors or clinics), and use that to infer the total population with AIDS in that city or country. Depending on the area that we wish to cover, we will need to establish our survey methodology, keeping in mind that the more areas that we test, and the more thoroughly we test, the more accurate our information will be. In the case of Bre-X, it just goes to show that if it is in an individual's interests, and it is within their power, then information can be manipulated or changed to their benefit. I'm sure that if tobacco companies controlled the US health statistical data on tobacco, then we would all be smoking for our health.
Bohunk 16:08, 22 April 2006 (PDT)
- Critical thinkers will realize soon enough that infinite sets of distance-weighted averages-cum-kriged estimates are the equivalent of perpetual motion in data acquisition, and that functional independence and degrees of freedom are the equivalent of the laws of thermodynamics. Visit [3], play Clark and the Kriging Game by entering coordinates beyond her sample space, and figure out how many distance-weighted averages fail to converge on the arithmetic mean and how many variances of distance-weighted averages-cum-kriged estimates fail to converge on the Central Limit Theorem. Let me know which coordinates fail this simple heuristic test. --Iconoclast 20:06, 23 April 2006 (UTC)
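For readers trying to follow the "distance-weighted average and its variance" argument, here is a minimal sketch of the quantity being discussed. It assumes inverse-distance weights with a hypothetical exponent `power`, and, importantly, it assumes the measurements are independent with a common variance `sigma2`, which is exactly the assumption that is disputed for spatial data elsewhere on this page. The function names are invented for illustration and are not taken from the linked Kriging Game.

```python
import math


def idw_weights(points, target, power=2.0):
    """Inverse-distance weights, normalised to sum to one."""
    dists = [math.dist(p, target) for p in points]
    if 0.0 in dists:
        # Target coincides with a sample: all weight goes to that sample
        hit = dists.index(0.0)
        return [1.0 if i == hit else 0.0 for i in range(len(points))]
    raw = [1.0 / d ** power for d in dists]
    total = sum(raw)
    return [r / total for r in raw]


def idw_average_and_variance(points, values, target, sigma2, power=2.0):
    """Weighted average and its variance, ASSUMING independent measurements
    that all share the same variance sigma2."""
    w = idw_weights(points, target, power)
    average = sum(wi * zi for wi, zi in zip(w, values))
    variance = sigma2 * sum(wi ** 2 for wi in w)  # Var(sum w_i z_i) under independence
    return average, variance


pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
vals = [1.0, 2.0, 3.0, 4.0]
print(idw_average_and_variance(pts, vals, (0.2, 0.3), sigma2=1.0))
print(idw_average_and_variance(pts, vals, (1e6, 1e6), sigma2=1.0))
```

Under these assumptions, a target far outside the sample space gives nearly equal weights, so the average approaches the arithmetic mean and the variance approaches sigma2/n. That is the convergence behaviour the comment above refers to. Whether the independence assumption is appropriate for spatially correlated data is the point under dispute.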
References
- ^ Philip, G. M. (1986). "Matheronian Statistics --- Quo vadis?". Mathematical Geology. 18 (1): 93–117.
- ^ Fortin, Marie-Josee (2005). Spatial Analysis: A Guide for Ecologists.
- ^ Ullah, Ullah (1998). Handbook of Applied Economic Statistics. p. 265.
- ^ Schabenberger, Oliver (2001). Contemporary Statistical Models for the Plant and Soil Sciences. p. 653.
More references
- Armstrong, M and Champigny, N, 1988, A Study on Kriging Small Blocks, CIM Bulletin, Vol 82, No 923
- Clark, I, 1979, Practical Geostatistics, Applied Science Publishers, London
- David, M, 1977, Geostatistical Ore Reserve Estimation, Elsevier Scientific Publishing Company, Amsterdam
- Huff, D, 1954, How to lie with statistics, Penguin Books, New York
- Journel, A G and Huijbregts, C J, 1978, Mining Geostatistics, Academic Press Inc, London
- Lipschutz, S, 1961, Theory and Problems of Probability, McGraw-Hill Book Company, New York
- Moore, D S, 1979, Statistics: Concepts and Controversies, W H Freeman and Company, San Francisco
- Reichmann, W J, 1961, Use and abuse of statistics, Penguin Books, Middlesex, England
- Volk, W, 1980, Applied Statistics for Engineers, Krieger Publishing Company, Huntington, New York
All Wikipedians and all Krige's men cannot put the distance-weighted average and its variance together again.
Bringing knowledge of the pseudo science of assuming, kriging, smoothing and rigging the rules of mathematical statistics to the world.--Iconoclast 20:37, 16 April 2006 (UTC)
"I have to say that JanWMerks arguments are extremely convoluted"
I have to say that JanWMerks' arguments are extremely convoluted. Two main points: (1) Redundancy - as I understand it, there is one basic argument repeated several times; (2) Grammar - sentence structure lacks sophistication, and if someone who didn't understand the topic were to read this they would learn nothing. If you get the opportunity check his link: [4]. I quote: "mineral sampling expert, consultant, lecturer, author, scambuster, whistleblower, 'a bit of a rebel', 'gadfly', 'iconoclast', 'pariah'" - obviously a little over the top. SCmurky 04:38, 16 June 2006 (UTC)
Mr JanWMerks, you should discuss this topic with someone as opposed to continually skewing it. I am more than willing to discuss a compromise, and am open to your pov, with good references there is no reason that your thoughts should be excluded. However to simplify this topic into an argument over Kriging and Bre-X is to completely disregard significant contributions to a variety of fields.
You may feel that Geostats does not deserve any regard, but I do not believe this is the place to make that statement; instead all positions must be presented accurately, leaving the reader to make up their own mind. SCmurky 02:15, 21 June 2006 (UTC)
- Hi, Bohunk|SCmurky, Most monikers were given by geostatisticians whose only compromise is to violate the requirement of functional independence a little but not a lot!!! Take a look at Armstrong and Champigny's A Study on Kriging Small Blocks and find out what smoothing is all about. I've asked Krige himself what happened to the variance of the single distance-weighted average before it metamorphosed into a kriged estimate. It's such a simple question but Krige and the geostatocracy rather assume, krige and smooth than respond. By the way, it was IAMG's current President who called me an iconoclast. I kinda like that! By the way, let's try to make progress in solving this problem by asking some authority to respond. Here's the first problem! In geostatistics, the distance-weighted average lost its variance even before it was reborn as an honorific kriged estimate because ...? Perhaps solving one problem at a time is the Wikipedian way to bring scientific integrity and knowledge to the world.--Iconoclast 16:22, 21 June 2006 (UTC)
Thanks, I agree with that. I did contact some people, from a few different universities, statisticians and otherwise, including my own; only a few expressed interest, although I did get some references and one or two potential contributors.
[[5]] - This seems like a good reference with plenty of material.
To conclude, and please argue if you disagree or add if I miss something, the remaining issues are:
- The inclusion of the formulae for statistical and geostatistical theory, in a context which defines their differences and explains this divergence in order to highlight the concepts behind geostatistics (why it was invented).
- Improve article structure, we should determine the context of arguments (however valid) throughout the article, in particular to avoid their repetition.
- Kriging should remain a small portion of this article, that is not all that geostatistics consists of; a more elaborate explanation of that topic should remain within the primary kriging article.
I simply feel that any argument should come after the objective explanation of the subject.
SCmurky 03:37, 24 June 2006 (UTC)
- I've a simple mind! First, I really want to know is why the variance of the single-distance-weighted average didn't make it into the geostatistical body of knowledge. My second question is why the true variance of the single distance-weighted average is replaced with the false variance of a set of degrees-of-freedom and variance-deprived distance-weighted averages. Is that too much to ask?--Iconoclast 16:07, 24 June 2006 (UTC)
Ok, I figured that out already; I am saying that there is more to geostatistics than the distance-weighted average (do you mean inverse distance weighting?). Geostatistics follows mathematical modelling, in particular linear interpolation. Perhaps you could state in the article, with support from mathematical/statistical formulas, what you are stating above. We need to include the math to discuss it further.
I don't think that this article will satisfy your "why" question, at least not without an objective inclusion of historical and formula-based information. SCmurky 03:03, 25 June 2006 (UTC)
- Geostatistics is a unique case of junk science by consensus. When Professor D G Krige was working at the Witwatersrand gold reef complex in South Africa in the early 1950s, he discovered that two or more gold assays, determined in samples selected at positions with different coordinates in a finite sample space, define an infinite set of distance-weighted averages. Professor Dr G Matheron was so taken with Krige’s aptitude in augmenting measured data with distance-weighted averages that he conferred on him the ubiquitous krige eponym.
- Krige and Matheron were unaware that one-to-one correspondence between distance-weighted averages and variances is inviolable in mathematical statistics. That is why geostatisticians replaced the true variance of the distance-weighted average with the false kriging variance of a set of kriged estimates. In time, kriging variances and kriging covariances of sets of kriged estimates became the cornerstones of geostatistics. Incredibly, the kriging variance of a set of kriged estimates is as invalid a measure for variability, precision and risk as the kriging covariance is for spatial dependence simply because geostatistics violates the requirement of functional independence and ignores the concept of degrees of freedom. Blatantly biased, shamelessly self-serving peer review turned geostatistics into a junk science by consensus.
- All it takes to reconcile irreconcilable differences is the ingenuity of the geostatistical mind!--Iconoclast 23:42, 25 June 2006 (UTC)
Do you have a reference, say within a mathematical or statistical journal perhaps? If you check the links below, you will see that pattern analysis is a mathematical discipline which has been studied in far more applications than kriging. I am curious as to what you think geostatistics is exactly. I see it as nothing more than the study of the mathematical/statistical properties of spatial data (eg measured tree sizes or a geological survey). SCmurky 08:18, 26 June 2006 (UTC)
Wiki Links
- Atmospheric dispersion modeling
- Support vector machine
- Numerical weather prediction
- Kernel methods
- Global climate model - read flux correction
- Linear interpolation
- Polynomial interpolation
- Spline interpolation
- Bilinear interpolation
From RfC
This business is not totally quack. Not all geostats depend on Gaussian or Bayesian analyses - one can screw up stats and make a mess of it in any field. Yes, Geostatistics as a reasonably new field has its share of screw-ups I am sure. But that is not to say it is all pseudo-science. Criminology for one makes great, successful use of it (Crime mapping). The argument here could apply to medicine - "some doctors do stupid things, therefore Western medicine is pseudoscience". That "History" piece needs total rewrite, though - very unencyclopedic the way it looks right now; I would suggest pulling it out altogether until it's tweaked.Bridesmill 05:05, 30 June 2006 (UTC)
Page refactor
I found the previous version of this page very difficult to follow, so for the sake of readability, I've refactored the page. In particular I have organized the discussions chronologically as is the established WP talk page practice. I also broke separate discussions into sections with section headings, indented comments to indicate a change of writer and signed unsigned comments. If you think I have somehow misrepresented anyones comments —I apologize — please correct as you see fit, but please try to follow standard WP practice for talkpage discussions. Thanks — Paul August ☎ 17:50, 1 July 2006 (UTC)
Just one simple question
Hello SCmurky and other interested Wikipedians, Here’s a simple question that would get geostatistics going again. To have or not to have a variance, that’s the question! The rest are details!! Does or doesn’t every distance-weighted average have its own variance? Mutually exclusive responses are yes or no but geostatisticians may have conditional responses.
If the answer is YES, complete: Yes but ... and fill in the blanks. If the answer is NO, complete: No because ... and fill in the blanks. If the answer is conditional, study the properties of variances in Volk’s "Applied Statistics for Engineers" before responding. For example, "the distance-weighted average does have a variance but the kriged estimate doesn't" is a cute but scientifically nonsensical response. IAMG’s brass and JMG’s brains should be requested not only to respond to the same question but also to assign the most gifted geostatistical mind to complete geostatistics. JWM--Iconoclast 18:42, 1 July 2006 (UTC)
Whether or not every distance-weighted average-turned-honorific kriged estimate has its own variance has been troubling the geostatistical fraternity since the early 1990s. Here are a few more possible answers : Who knows? Who wants to know? Who cares? What's wrong with a little kriging? Why does this ... (fill in the blanks!) not assume, krige and smooth like the rest of us? JWM. --Iconoclast 23:02, 7 July 2006 (UTC)
Cut from article
The question under dispute is whether every distance-weighted average had its own variance before it was reborn as an honorific but variance-deprived kriged estimate. The seminal work that led to the loss of the variance of the distance-weighted average can be condensed as follows: Professor D G Krige discovered at the Witwatersrand gold reef complex in South Africa in the early 1950s that two or more gold assays, determined in samples selected at positions with different coordinates in a finite sample space, define an infinite set of distance-weighted averages. Professor Dr G Matheron was so taken with Krige’s aptitude in augmenting small sets of measured data in large sample spaces with subsets of infinite sets of distance-weighted averages that he conferred on Krige the ubiquitous krige eponym. Krige and Matheron did not know that one-to-one correspondence between distance-weighted averages and variances is inviolable in mathematical statistics, which explains why pioneering geostatisticians replaced the true variance of the distance-weighted average with the false variance of a set of distance-weighted averages. In time, kriging variances and kriging covariances of sets of kriged estimates became the cornerstones of geostatistics. However, the false variance of a set of kriged estimates is as meaningless a measure for variability, precision and risk as its false covariance is for spatial dependence. Geostatistics is an invalid variant of mathematical statistics because it violates the requirement of functional independence and ignores the concept of degrees of freedom. JanWMerks --Iconoclast 21:33, 3 July 2006 (UTC)
I removed the above from the article space - given that it was signed, I am assuming that JWM erroneously placed a talk item on the article page; accidents happen. I am assuming also that none of this was intended as article content, based on the OR & reasonably vitriolic slam of Matheron & Krige. I'm also starting to see that the argument seems to focus on one set of geostats applications, in particular on assay methods, which is not what the article is about - 'undue weight(ing)' applies to WP articles as well as stats... Bridesmill 21:55, 3 July 2006 (UTC)
You're right. I'm new to the Wikipedian way. I've noticed Wikipedian soft- and back-pedaling because significant examples of junk science are lacking and scientific fraud has mutated into scientific misconduct. Matheron's new science of geostatistics is behind Bre-X where bogus grades and barren rock created a phantom gold resource. Most investors lost money, many lost all their savings and a few committed suicide. Read what Krige wrote in the early 1990s when Bre-X was to come and I was polite. Much more is posted on my own website. Vitriolic slam? You've seen nothing yet until you read my retro-reviews!! Some Bre-X evidence is posted on [6]. Who posted it? Why? JWM --Iconoclast 22:56, 3 July 2006 (UTC)
Sorry, but I'm not interested in Bre-X conspiracy theories. Bre-X scammed people because they were a bunch of crooks, and there was a lot more to it than just the fact they used (if they did) geostats - salting the samples for one thing. Yes, geostats are very susceptible to abuse; but that does not make them useless. The concept is used in crime-mapping, where it works to catch crooks. So to slam-dunk it as junk science because some people a. don't know how to use it or b. abuse the hell out of it (as happens with many disciplines) is just flawed logic. There needs to be a piece here on the weaknesses and susceptibilities, but let's not throw the baby out with the bath-water. Bridesmill 23:22, 3 July 2006 (UTC)
Neither am I. The point is that mathematical statistics proved that Bre-X was a salting scam several months before the boss salter vanished, and could have done so based on duplicate bogus assays for the first three to five boreholes. My point of contention remains that the true variance of the single distance-weighted average cannot possibly be replaced with the false variance of a set of distance-weighted averages. Calling it an honorific kriged estimate does not make it functionally independent. JWM. --Iconoclast 00:35, 4 July 2006 (UTC)
Clark and the Kriging Game
Visit ai-geostats and open Clark and the Kriging Game. Clark’s hypothetical uranium data can be found in her 1979 Practical Geostatistics, Chapter 4. Estimation, Figure 4.1 and Table 4.1. Clark has posted this third geostatistics textbook on her website, and a retro-review is posted on mine.
Step 1. Go to Cells E10 and E11 on Sheet Kriging game and replace Clark’s coordinates of 1,265 m Easting and 713 m Northing within her sample space with 4,500 m Easting and 4,500 m Northing beyond this sample space. Observe that the distance-weighted average and arithmetic mean are numerically identical at 366 ppm, and that the variance of the weighted average and the variance of the arithmetic mean are numerically identical at 896 ppm^2. Go to Sheet xy=0-5000 and look at the distance between Clark’s sample space and the distance-weighted average of 366 ppm for the entered coordinates. Cells G42.46 show that all weighting factors converge on 1/n=0.2 whereas Cells E33 and F33 show that the degrees of freedom converge on df=n-1.
Step 2. Go to Cells E10 and E11 on Sheet Kriging game and re-enter Clark’s coordinates of 1,265 m Easting and 713 m Northing. Go to Sheet xy=0-5000 and observe the distance-weighted average of 371 ppm for this point within Clark’s sample space! Go to Sheet x=725 y=1300 and take another look! The distance-weighted average still has its own variance and confidence limits!!
Step 3. It’s easy to enter more coordinates because there’re infinite pairs to work with! Those beyond Clark’s sample space give distance-weighted averages numerically identical to the arithmetic mean of 366 ppm, and variances numerically identical to the Central Limit Theorem (896 ppm^2). Yet, didn't the variance of the distance-weighted average disappear somewhere in Clark’s sample space? It did so in Krige’s Witwatersrand complex, didn’t it? What’s wrong with this spreadsheet?
Step 4. Go to Fisher’s F-test for spatial dependence and look at Cells E5.E17. This test is applied to the variance of a random walk and the variance of a systematic walk, both of which visit each hypothetically sampled position only once. Given that the calculated value of F=4,480/2,161=2.07 is below the tabulated value of F0.05;4;8=3.84 at 5% probability, it follows that Clark’s hypothetical uranium data do not display a significant degree of spatial dependence in her sample space. Hence, the distance-weighted average of 371 ppm at Clark’s coordinates is not necessarily an unbiased estimate. Therefore, interpolation within Clark's sample space is impermissible and extrapolation beyond it is a scientific fraud.
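For readers without the spreadsheet, the arithmetic in Step 4 can be checked with a minimal Python sketch. It only reproduces the ratio and the comparison quoted above; the two variances and the tabulated value 3.84 are copied from the text, the function name `f_ratio` is made up for illustration, and the sketch takes no position on whether this test is a suitable check for spatial dependence.

```python
# Variance estimates (ppm^2) as quoted in Step 4: F = 4,480 / 2,161.
var_first = 4480.0
var_second = 2161.0

# Tabulated F0.05;4;8 as quoted in Step 4 (copied, not recomputed here).
f_critical = 3.84


def f_ratio(var_a, var_b):
    """Ratio of two variance estimates, larger over smaller."""
    return max(var_a, var_b) / min(var_a, var_b)


f_observed = f_ratio(var_first, var_second)

# The ratio counts as significant at the 5% level only if it exceeds
# the tabulated value.
significant = f_observed > f_critical

print(round(f_observed, 2), significant)
```

With the quoted figures the ratio rounds to 2.07 and stays below 3.84, which matches the comparison made in Step 4.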
I’ve worked with area-, count-, density-, distance-, length-, mass- and volume-weighted averages. Each of these weighted averages has its own variance. In the early 1990s, we found out that the true variance of the distance-weighted average was replaced with the false kriging variance of some set of kriged estimates. The formula for the variance of a set of measured values with variable weights is also posted on the above website. It is simple to show that this formula converges on the Central Limit Theorem as all weighting factors converge on 1/n. Print out the readme file and do read it because my case against geostatistics is ongoing! Start thinking about unbiased statistical inferences such as confidence limits.
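The convergence claim above can be illustrated with a small Python sketch. It is not the spreadsheet and not necessarily the formula posted on the website: it assumes independent measurements with a common variance s^2, so that the variance of a weighted average is s^2 times the sum of squared weights. The sample positions, the function names and the inverse-distance power of 1 are hypothetical; only n=5 and s^2=4,480 ppm^2 (5 x 896 ppm^2) are taken from the figures quoted in the steps above.

```python
import math


def idw_weights(samples, target, power=1.0):
    """Inverse-distance weights, normalised to sum to 1.

    Assumes the target does not coincide with any sample position.
    """
    inverse = [
        1.0 / math.hypot(x - target[0], y - target[1]) ** power
        for x, y in samples
    ]
    total = sum(inverse)
    return [value / total for value in inverse]


def variance_of_weighted_average(weights, var_single):
    """var(sum w_i * x_i) = s^2 * sum(w_i^2), assuming independent x_i
    with a common variance s^2 (an assumption of this sketch)."""
    return var_single * sum(w * w for w in weights)


# Hypothetical sample positions in metres (NOT Clark's data).
samples = [(1000.0, 500.0), (1200.0, 800.0), (1400.0, 600.0),
           (1100.0, 900.0), (1300.0, 700.0)]
var_single = 4480.0  # ppm^2, i.e. 5 * 896 from the quoted figures

near = idw_weights(samples, (1265.0, 713.0))  # inside the sample space
far = idw_weights(samples, (1.0e9, 1.0e9))    # very far outside it

var_near = variance_of_weighted_average(near, var_single)
var_far = variance_of_weighted_average(far, var_single)

print([round(w, 3) for w in far], round(var_far, 1))
```

For a point very far from all samples the weights approach 1/n=0.2 and the variance approaches s^2/n=896 ppm^2; for a point inside the sample space the weights are unequal, so under the same assumptions the variance cannot be smaller than s^2/n.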
Geostatistics with its pseudo kriging variances and covariances is beyond salvation. Give each distance-weighted average its own variance and mathematical statistics is back. JWM --Iconoclast 22:48, 8 July 2006 (UTC)
Look, I've done all that. You don't need to argue with me over this, I believe you. I don't think it's a conspiracy, I do think you should calm down. Let's add your information in a logical, respectful way. Start by adding a degrees of freedom subcategory, explain the formula, discuss its implications, inform as to its absence in geostats. SCmurky 00:00, 9 July 2006 (UTC)
Why didn't you tell me? I'm trying to explain my case against geostatistics in a clear and concise manner and nobody gives meaningful feedback. JWM. --Iconoclast 02:09, 9 July 2006 (UTC)
I believe you've received plenty of feedback but you chose to ignore it. Your case against this subject is, in terms of WP policies (Soapbox and WP:OR in particular), irrelevant. Unfortunately, we're a bit preoccupied right now working on useful content & dealing with meatspace issues, but this article has not been forgotten. Bridesmill 02:31, 23 July 2006 (UTC)
I was talking about meaningful feedback such as why does it make sense to replace the genuine variance of a SINGLE distance-weighted average with the pseudo variance of a SET of degrees-of-freedom and variance-deprived functionally dependent distance-weighted averages. JWM. --Iconoclast 04:38, 23 July 2006 (UTC)
I'm Over It
Y'know, I've tried to be accommodating, but I find that this article is going nowhere. I have said before that geostatistics is a broader category than kriging, and I've also attempted to clarify your comments so that they may be comprehended by readers, but you persist with singular, argumentative, and non-referenced POV statements. In any case, I shall remove any reference to kriging within this article. I believe that this argument will be better placed in the Kriging page than here, and I will not interfere with that page; but as for geostatistics, I will begin to explore statistical problem solving in geography. SCmurky 02:55, 13 July 2006 (UTC)
I'll be pleased to peruse your argument that the variance of the distance-weighted average is not replaced with the kriging variance of a set of kriged estimates when you're exploring non-geostatistical but statistical problem solving in geography. JWM. --Iconoclast 04:31, 13 July 2006 (UTC)
Start over
What about Krige? Did he really not know (i.e. implying that he was ignorant), or did he choose to ignore this rule? Given the reserved tone of your recent edits, I won't delete this outright, but I still need clarification to make this statement NPOV. Keep in mind, however, that the majority of this controversy section should be in the kriging article, with only a short summary in the geostats article. SCmurky 00:16, 27 July 2006 (UTC)
I was not so much surprised that Krige didn’t know that each distance-weighted average gold grade had its own variance. After all, the world shunned South Africa in the early 1950s and Sichelt didn’t teach Krige the right statistical stuff. What really surprised me was that Matheron and his students didn’t know either that each functionally dependent value (of a set of independently measured values) has its own variance. Krige was drafted twice to shut up this geostatistical agnostic but to no avail. Peruse Matheron’s ponderous Foreword to Journel and Huijbregts’ 1978 Mining Geostatistics. One cannot help but feel sorry for Krige! My retro-review of this textbook explains why mathematical statistics didn’t stand a chance under Matheron’s tutelage. I subscribe to ai-geostats as part of my efforts to put more mathematical statistics into geostatistics. You know much more than I do about where to post this sort of information. Thanks for your message! JWM.--Iconoclast 16:07, 27 July 2006 (UTC)
Turning original research into criticism that observes Wikipedia policies
The criticism by Merks in the section "Criticism" was original research, and as such, cannot be part of Wikipedia. HOWEVER, the fact that Merks criticizes geostatistics *is* verifiable, notable, and as such can be reported by Wikipedia. I wrote the text so that it conveys (as well as I could) the criticism, without violating WP:V, WP:OR, and WP:NPOV. Renato (talk) 14:35, 17 August 2008 (UTC)
- In the criticism section, I removed claims against or in favor of the criticism, because it is controversial, and no suitable references could be found. The only references against the criticism (and claiming that the scientific community refutes it) are a few excerpts from internet chats, which is not strong enough. There are no references for a broad support of the criticism -- only one more example. If anyone has suitable references for either side, please add them. Renato (talk) 09:01, 28 November 2008 (UTC)
Geostatistics is not "Statistics in Geography" and not "Spatial Statistics"
As of now, 2 July 2009, the article gives the impression that geostatistics is a vague collection of statistical methods in geography. Hart (1954), Central tendency in areal distributions, Economic Geography, 30, 48-59, might have coined the term in that way. However, to my understanding the modern meaning of the word is something like "statistical theory and application for processes with continuous spatial index" (Cressie 1993, Statistics for Spatial Data, Wiley), and the entries in the article's citation list that I know and that contain geostatistics in the title:
4. Clark I, 1979, Practical Geostatistics, Applied Science Publishers, London
5. David, M, 1977, Geostatistical Ore Reserve Estimation, Elsevier Scientific Publishing Company, Amsterdam
7. Chilès, J.P., Delfiner, P. 1999. Geostatistics: modelling spatial uncertainty, Wiley Series in Probability and Mathematical Statistics, 695 pp.
8. Deutsch, C.V., Journel, A.G, 1997. GSLIB: Geostatistical Software Library and User's Guide (Applied Geostatistics Series), Second Edition, Oxford University Press, 369 pp., http://www.gslib.com/
9. Deutsch, C.V., 2002. Geostatistical Reservoir Modeling, Oxford University Press, 384 pp., http://www.statios.com/WinGslib/index.html
10. Isaaks, E.H., Srivastava R.M.: Applied Geostatistics. 1989.
12. Journel, A G and Huijbregts, 1978, Mining Geostatistics, Academic Press
14. Lantuéjoul, C. 2002. Geostatistical simulation: models and algorithms. Springer, 256 pp.
16. Matheron, G. 1962. Traité de géostatistique appliquée. Tome 1, Editions Technip, Paris, 334 pp.
19. Merks, J W, 1992, Geostatistics or voodoo science, The Northern Miner, May 18
21. Myers, Donald E.; "What Is Geostatistics?"
22. Philip, G M and Watson, D F, 1986, Matheronian Geostatistics; Quo Vadis?, Mathematical Geology, Vol 18, No 1
26. Wackernagel, H. 2003. Multivariate geostatistics, Third edition, Springer-Verlag, Berlin, 387 pp.
all understand geostatistics as something that solves, or is related to, the problem of interpolating and (to a minor degree) analysing a random function (process) with a continuous index observed at some locations, following Matheron's geological definition of geostatistics, which focuses on inferring ore reserves. The same holds true of
Modern Spatiotemporal Geostatistics by G. Christakos
Nonparametric Geostatistics by S. Henley
Applied Geostatistics with SGeMS: A User's Guide by Nicolas Remy, Alexandre Boucher, and Jianbing Wu, Cambridge University Press (hardcover, 22 January 2009)
An Introduction to Applied Geostatistics by Edward H. Isaaks and R. Mohan Srivastava, Oxford University Press (paperback, 11 January 1990)
Basic Linear Geostatistics by Margaret Armstrong, Springer, Berlin (paperback, November 1997)
In the same way, spatial statistics is not a part of geostatistics; it is the other way round: geostatistics is a special case of spatial statistics. However, the article totally misinterprets the term:
Going through the current TOC:
1. Background
- Role of statistics in geography (As said, geography is not the background of geostatistics, even though Tobler's law of geography is often cited in this context. The typical background is (mineral resource) geology.)
2. Spatial data and descriptive statistics
- Boundary delineation (that's spatial statistics)
- Modifiable areal units (that's spatial statistics)
- Spatial aggregation/scale problem (The upscaling problem is a topic in geostatistics, but not as aggregation of political units, but as mean values over (e.g. mining) blocks)
3. Descriptive spatial statistics (point pattern analysis is a part of spatial statistics but not of geostatistics, see e.g. Cressie 1993 as cited above)
- Spatial measures of central tendency (that's related to districts and not random fields)
- Spatial measures of dispersion (this part is completely uninformative)
4. Topology (topology is not statistics, not even geostatistics, it's merely a geospatial problem)
- Topology rules (still not statistics)
5. Computational Geometry (computational geometry is not even statistics)
6. Topography (might be interpolated with geostatistics, but that is not mentioned here)
7. Sampling methodology (geospatial sampling seems to be the first topic after the first chapter actually related to the subject of geostatistics)
8. Criticism (This is clearly a paragraph criticizing kriging, which is a major subject of geostatistics. Although I disagree with the cited work by Merks, it is the first part actually related to geostatistics. Strictly speaking, however, this chapter belongs to Kriging and not here.)
9. Related software (these are all real geostatistical software packages and should stay here)
10. Notes (Should go to references, and it is fully sufficient to cite Mr. Merks once)
11. References (Most of them are really good references on geostatistics, but most of them are not cited above)
12. See also (Only the kriging link seems appropriate)
13. External links (quite ok)
In conclusion, the article is talking about the wrong thing, although in a previous age somebody who knew the subject seems to have added references, software and web-sites.
In my view a current article must explain the main purpose: interpolation, conditional simulation; the main methodology: analysing spatial dependence; and the main methods of geostatistics: Variograms, Kriging (Matheronian Geostatistics), Bayesian Maximum Entropy (following the works of George Christakos, e.g. http://www.unc.edu/depts/case/BMElab/), Multiple point methods like SNESIM and FILTERSIM (following the works of Journel, Strebelle and Caers, e.g. http://pangea.stanford.edu/groups/iamg/workshop1.pdf ).
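To make "analysing spatial dependence" with a variogram a little more concrete, here is a minimal Python sketch of the classical empirical semivariogram estimator, gamma(h) = sum of squared differences over all pairs at lag h, divided by twice the number of such pairs. The function name, the lag binning and the four-point data set are hypothetical and chosen only for illustration; this is not the implementation of any of the packages cited above.

```python
import math
from collections import defaultdict


def empirical_semivariogram(points, values, lag_width):
    """Classical semivariogram estimate per distance bin.

    Bin k collects all pairs whose separation h satisfies
    k * lag_width <= h < (k + 1) * lag_width, and returns
    gamma_k = sum((z_i - z_j)^2) / (2 * N_k).
    """
    squared_sums = defaultdict(float)
    pair_counts = defaultdict(int)
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            h = math.hypot(points[i][0] - points[j][0],
                           points[i][1] - points[j][1])
            k = int(h // lag_width)
            squared_sums[k] += (values[i] - values[j]) ** 2
            pair_counts[k] += 1
    return {k: squared_sums[k] / (2 * pair_counts[k])
            for k in sorted(pair_counts)}


# Hypothetical observations along a line, one unit apart.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
values = [1.0, 2.0, 4.0, 7.0]

gamma = empirical_semivariogram(points, values, lag_width=1.0)
for lag_bin, semivariance in gamma.items():
    print(lag_bin, round(semivariance, 3))
```

In this toy data set the semivariance grows with the lag, i.e. nearby values are more alike than distant ones; a variogram model fitted to such estimates is what kriging then uses to derive its weights.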
I propose to remove chapters 1-7 and to rewrite the whole article in that sense.
Any comments?
Boostat (talk) 15:58, 2 July 2009 (UTC)
- I am much in favor of someone who is knowledgeable in this area revamping the article to provide a verifiable and neutral description of Geostatistics. So, Boostat, I completely support you if you are willing to do this. Renato (talk) —Preceding undated comment added 08:47, 3 July 2009 (UTC).
- It looks like the best course would be to move much of what is already here into the existing (but only six-line) article statistical geography. A brief mention of earlier terminology would be a suitable way to point readers to that article if necessary. Melcombe (talk) 09:07, 3 July 2009 (UTC)
- I also agree that this article needs a rewrite, with much of the content more appropriate for other articles. Tdslk (talk) 20:49, 7 August 2009 (UTC)
- Right on!!! About time that some new people got involved. I agree that this article could use a rewrite. One distinction that I would like to understand is that between Geostatistics, Spatial Statistics, and Statistical Geography. Also, is Spatial Statistics the same thing as Spatial Analysis? As both of these topics are in the Spatial Analysis article. Thanks for the comments, I would like to get this article up to shape; just don't go on a mass deleting frenzy, as this article used to be nothing more than a rant against kriging. If we can nail down the appropriate topics that each section should be listed under, then I am all for moving information around. Seeing how I created much of the ToC, I will reply to Boostat's ToC.
Going through the current TOC:
1. Background
- Role of statistics in geography
- As said, geography is not the background of geostatistics, even though Tobler's law of geography is often cited in this context. The typical background is (mineral resource) geology.
2. Spatial data and descriptive statistics
- Boundary delineation (that's spatial statistics)
- Modifiable areal units (that's spatial statistics)
- Spatial aggregation/scale problem (The upscaling problem is a topic in geostatistics, but not as aggregation of political units, but as mean values over (e.g. mining) blocks)
3. Descriptive spatial statistics (point pattern analysis is a part of spatial statistics but not of geostatistics, see e.g. Cressie 1993 as cited above)
- Spatial measures of central tendency (that's related to districts and not random fields)
- Spatial measures of dispersion (this part is completely uninformative)
4. Topology (topology is not statistics, not even geostatistics, it's merely a geospatial problem)
- Topology rules (still not statistics)
5. Computational Geometry (computational geometry is not even statistics)
6. Topography (might be interpolated with geostatistics, but that is not mentioned here)
7. Sampling methodology (geospatial sampling seems to be the first topic after the first chapter actually related to the subject of geostatistics)
8. Criticism (This is clearly a paragraph criticizing kriging, which is a major subject of geostatistics. Although I disagree with the cited work by Merks, it is the first part actually related to geostatistics. Strictly speaking, however, this chapter belongs to Kriging and not here.)
9. Related software (these are all real geostatistical software packages and should stay here)
10. Notes (Should go to references, and it is fully sufficient to cite Mr. Merks once)
11. References (Most of them are really good references on geostatistics, but most of them are not cited above)
12. See also (Only the kriging link seems appropriate)
13. External links (quite ok)