Talk:Data mining

Data mining in Econometrics

Is it possible/desirable to include the rather different approach/attitude to data mining that econometricians tend to have? I was expecting to find this here but found a rather different approach. The material I have in mind might be found here: Lovell, Michael C. (1983) ‘Data mining’, The Review of Economics and Statistics, 65: 1–12; here: Hoover, Kevin D. (1995) ‘In defense of data mining: some preliminary thoughts’, in Kevin D. Hoover and Steven M. Sheffrin (eds) Monetarism and the Methodology of Economics: Essays in Honour of Thomas Mayer. Aldershot: Edward Elgar; or here: Kevin D. Hoover and Stephen J. Perez (2000) ‘Three attitudes towards data mining’, Journal of Economic Methodology 7:2, 195–210. In the last of these articles they offer a definition of data mining:

"‘Data mining’ refers to a broad class of activities that have in common a search over different ways to process or package data statistically or econometrically with the purpose of making the final presentation meet certain design criteria."
And they list three attitudes towards it. "Data mining is"
  1. "to be avoided and, if it is engaged in, we must adjust our statistical inferences to account for it"
  2. "inevitable and that the only results of any interest are those that transcend the variety of alternative data mined specifications."
  3. "essential and that the only hope that we have of using econometrics to uncover true economic relationships is to be found in the intelligent mining of data."

This material does seem to be using data mining more like a synonym for some aspects of data dredging. Anyway, I was expecting to find something on this sense of data mining here but didn't, and I think it ought to be somewhere. Best wishes (Msrasnw (talk) 10:38, 11 September 2012 (UTC))

Yes, this clearly refers to the "old" use of the term data mining with respect to generating hypotheses, which is covered by the article data dredging. The term "data mining" is way too broad to cover everything, and it is not used consistently (or correctly) throughout the literature. The much more clearly defined term is "knowledge discovery". I do not think we need to cover all abuses of the term in the article; instead we should focus on the "knowledge discovery" sense of the term, and others should be referenced as "maybe you are looking for: data dredging". --Chire (talk) 11:08, 15 September 2012 (UTC)
I think this alternative negative use of data mining needs to be acknowledged more in the article. It is what is commonly meant by the term among many scientists. --Pengortm (talk) 22:13, 11 August 2014 (UTC)

Temporal data mining

This page needs to include info on temporal data mining. An attempt to do so was undone by user Chire in order to remove a valid reference. Can someone please add the temporal section again and use a more relevant reference if they can find it? — Preceding unsigned comment added by (talk) 02:56, 31 August 2014 (UTC)

Not a misnomer

I don't want to create OR, but the term data mining is clearly not a misnomer as described in the lead. "To mine" when used with a direct object can mean "to avail oneself of or draw useful or valuable material from", so the expression "data mining" is apt. — Preceding unsigned comment added by 2601:141:300:49e0:6d4f:e9b2:8f3c:671c (talk) 04:39, 24 July 2015 (UTC)

Well, the source given (high reputation textbook) says so, and we don't do OR.
Furthermore, gold mining, coal mining, iron mining, etc. clearly demonstrate the criticized pattern, which is why it should probably have been called "knowledge mining". --Chire (talk) 09:28, 24 July 2015 (UTC)
Agreed that it is not a misnomer. The source, as listed here, has a misspelling in it, which is not promising. The comparison with iron mining is facile too. It's like saying, "It should not be called 'iron mining,' it should be called 'bridge mining from iron' because bridges are made out of iron." I am deleting the piece about misnomer unless someone can find an additional independent quote to support it as a misnomer. --GoldCoastPrior (talk) 13:55, 24 July 2015 (UTC)
@GoldCoastPrior: Seriously? That source has 25239 citations on Google Scholar: [1]. That typo is not in the book. How about just looking it up and fixing the typo? --Chire (talk) 18:54, 24 July 2015 (UTC)
I agree that it is not a misnomer. The sentence describing it as a misnomer seems to be implying that data mining should mean the creation of data. I would say that data creation should be called research. It would really help, I think, to have a section which draws a distinction between data mining and plain statistical analysis. Also, clarify data mining versus big data. I disagree that cluster analysis is an example of data mining or anything else where the point is to look at the big picture rather than any specific datum. That would be big data. As the introduction to this article states, "The overall goal of the data mining process is to extract information from a data set and transform it into an understandable structure for further use." The key word is "extract." It's called data mining because you are extracting specific desired data from a greater source. It's like extracting a specific mineral from a mine. — Preceding unsigned comment added by (talk) 15:48, 14 July 2016 (UTC)
Well, it is not much about our opinion. One of the most important textbooks discusses this - it's a sourced, notable opinion. The argumentation is along the lines that "coal mining" refers to finding coal (in rock), "gold mining" refers to finding gold (in rock). But in "data mining", most "data" is just rock, not nuggets. So it's not about finding "data from bits", but about finding "insights from data". That is a valid point of view; but most importantly it's a sourced point of view. It does not matter if we agree... Chire (talk) 13:03, 16 July 2016 (UTC)

External links modified

Hello fellow Wikipedians,

I have just modified 2 external links on Data mining. Please take a moment to review my edit. If you have any questions, or need the bot to ignore the links, or the page altogether, please visit this simple FaQ for additional information. I made the following changes:

When you have finished reviewing my changes, please set the checked parameter below to true or failed to let others know (documentation at {{Sourcecheck}}).

You may set the |checked=, on this template, to true or failed to let other editors know you reviewed the change. If you find any errors, please use the tools below to fix them or call an editor by setting |needhelp= to your help request.

  • If you have discovered URLs which were erroneously considered dead by the bot, you can report them with this tool.
  • If you found an error with any archives or the URLs themselves, you can fix them with this tool.

If you are unable to use these tools, you may set |needhelp=<your help request> on this template to request help from an experienced user. Please include details about your problem, to help other editors.

Cheers.—InternetArchiveBot (Report bug) 08:04, 7 December 2016 (UTC)

Wikipedia:Data mining Wikipedia

I just created Wikipedia:Data mining Wikipedia, which attempts to inform about data mining projects that have made use of Wikipedia and to help those who intend to do so.

Maybe you can help out with it?

--Fixuture (talk) 15:21, 29 January 2017 (UTC)

Subheading 7 and 8 Question

Hello everyone, I have to ask a question on this article for a class assignment. I was wondering why subheading 7 (Privacy concerns and ethics) and subheading 8 (Copyright laws) only discuss the situation in Europe and the United States. What about the situation regarding these issues in Asia, for example? Lbeteta28 (talk) 19:15, 14 March 2017 (UTC)

Obviously nobody wrote it yet. Do you have reliable sources? (talk) 20:30, 14 March 2017 (UTC)
@Lbeteta28: Please avoid overly generic statements such as "China has a great deal of influence in the Asian region". You do not need to tell a reader that China is important! Try to deliver the facts. Never copy and paste. Right now, your contribution adds effectively nothing to the article besides "there is no privacy law". For references (and this probably is a good reference, but you should try to distill the essence out of it), do not just give a URL, but give authors, title, journal, etc. HelpUsStopSpam (talk) 19:28, 16 March 2017 (UTC)
@HelpUsStopSpam: I'll keep all those suggestions in mind if I make any more contributions to a Wikipedia page. Regarding my question, obviously nobody has written it yet, but why? I feel like a Wikipedia article on something as broad as data mining should be complete. Nevertheless, thanks for your input.
Lbeteta28 (talk) 19:41, 16 March 2017 (UTC)
@N2W20ST8CHMP: I guess there are rather few experts on Chinese privacy regulations here. I certainly know too little to write anything here. It should be well researched, with appropriate references, not just some random things for fake completeness. HelpUsStopSpam (talk) 19:15, 17 March 2017 (UTC)