Talk:Data mining

This is the talk page for discussing improvements to the Data mining article.
This is not a forum for general discussion of the article's subject.

Mass surveillance

Data mining is within the scope of WikiProject Mass surveillance, which aims to improve Wikipedia's coverage of mass surveillance and mass surveillance-related topics. If you would like to participate, visit the project page, or contribute to the discussion.
This article has not yet received a rating on the project's importance scale.

Computing High‑importance

This article is within the scope of WikiProject Computing, a collaborative effort to improve the coverage of computers, computing, and information technology on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
High	This article has been rated as High-importance on the project's importance scale.
	An editor has requested that an image or photograph be added to this article.

Computer science High‑importance

This article is within the scope of WikiProject Computer science, a collaborative effort to improve the coverage of Computer science related articles on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.

High

This article has been rated as High-importance on the project's importance scale.

Databases (inactive)

This article is within the scope of WikiProject Databases, a project which is currently considered to be inactive.

Statistics High‑importance

This article is within the scope of WikiProject Statistics, a collaborative effort to improve the coverage of statistics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
High	This article has been rated as High-importance on the importance scale.

Archives

1, 2, 3

2009, 2010, 2011, 2012, 2013, 2014, 2016, 2017, 2018, 2022

This page has archives. Sections older than 31 days may be automatically archived when more than 4 sections are present.

Merge Analytics into Data mining

The following discussion is closed. Please do not modify it. Subsequent comments should be made in a new section. A summary of the conclusions reached follows.

To not merge, given that these are different topics with different sets of readers; readers are best served by keeping the pages separate. Klbrain (talk) 17:58, 9 November 2022 (UTC)[reply]

The jargon "analytics", which could mean any sort of analysis whatsoever, appears to be trying to monopolise a generic English word for a very specific topic, hiding the real meaning. I think that any useful content there could be merged either into this article, data mining, or maybe rather Examples of data mining, since the content seems to be more about specific examples of data mining, under the name "analytics", rather than about the methods of data mining themselves. Another possible target for a merger would be Data analysis. In any case, Analytics is clearly quite poor quality currently, full of businessy jargon trying to pretend that as long as enough people follow the fashion, you can pretend that there's some new meaning there. Please state support or oppose in bold; if support, then please state which target article (data mining, examples of data mining, or data analysis) you recommend for the merger of analytics (we should in principle add {{merge from}} templates to those target articles too...) Boud (talk) 15:03, 30 June 2022 (UTC)[reply]

Support merging the Analytics#Applications section with Examples of data mining into a more encyclopedic Applications of data mining. I think the subsections there would complement the disjointed lists of examples making up Examples of data mining currently.

Then I would redirect Analytics to Data analysis. I think that the Analytics buzzword is intended to have a broader scope than Data mining (at least according to my reading of the convoluted Analytics#Analytics vs analysis section), which makes Data analysis a better redirect target. Felix QW (talk) 09:56, 2 July 2022 (UTC)[reply]

Oppose It sounds like you are POV-pushing against the use of business jargon. It's better to remain neutral and just summarize the sources. Analytics is a broad term according to Gartner[1], but it is almost always used in a business context. My guess (I haven't performed a proper WP:BEFORE search) is that there is probably enough sourcing out there for a standalone article on the topic. But if editors come to consensus for a merge, then analytics is one aspect of Business intelligence and is better merged there. --{{u|Mark viking}} {Talk} 23:08, 3 July 2022 (UTC)[reply]

I am certainly pushing in favour of Wikipedia being an encyclopaedia, with entries of knowledge about the real world. Words that are common but meaningless make more sense in the Wiktionary. Boud (talk) 20:33, 3 July 2022 (UTC)[reply]

It seems you have contempt for the business world and its jargon, and that is compromising your objectivity with respect to this topic. The encyclopedia is better served by editors summarizing reliable sources, not injecting their personal opinions into article content. Show some reliable sources that say analytics is meaningless and those that coined the term are nefariously trying to monopolize an English word, and we could add that criticism to the article. But it's also clear that there is a population of business folk, such as business analysts and consultants, who use this term and find it useful for characterizing various forms of business intelligence. Summarizing their approach using RS is the best approach to developing this article. --{{u|Mark viking}} {Talk} 23:08, 3 July 2022 (UTC)[reply]

Hostility towards business sources does make sense to me (at least hostility towards ONLY using business sources, which is currently the case in the data analytics article). Data analysis is a broad scientific field and should not be defined solely by people who have a clear incentive to generate novel definitions at the cost of sensibility. 98.43.49.101 (talk) 20:02, 21 October 2022 (UTC)[reply]

Oppose as the term "analytics" has evolved into a catch-all term for analysis of information. It seems that any person who applies information analysis to their domain can be considered to be doing "analytics." Although analytics has increasingly moved towards a role of describing statistical information in the business sphere, [2] it would not make sense as an "example of data mining" as all processes of data mining would include techniques in analytics. Meanwhile, IT and business users share interest in analytics departments, which might best make sense as an aspect of Business Intelligence, as @Mark viking mentioned. ZacharyWalkerPinto (talk) 15:58, 4 July 2022 (UTC)[reply]

Oppose, the article "data mining" is more on the academic use; analytics is business jargon. Or to put it differently, one is about the methodology, the other about the business purpose, and the third is on example applications. One of the reasons to create examples of data mining in the first place was to make the article less crowded with a rather useless list of examples (that attracts a lot of spam). Maybe move the more concrete applications from Analytics there, too. Chire (talk) 23:48, 6 July 2022 (UTC)[reply]

The discussion above is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.

New Data Mining Process models

I have noticed that someone keeps deleting a newly cited work on a process model that was published by IEEE. However, citations of other papers were allowed, including ones that simply compare process models to each other. In addition, the article is full of citations to less significant work, journal articles, and other types of research papers.

Somebody suggested building a consensus about this issue using the talk page. Please refer to the cited reference using the following link: https://ieeexplore.ieee.org/iel7/6287639/8948470/09263253.pdf

— Preceding unsigned comment added by 176.29.83.94 (talk) 16:05, 11 July 2022 (UTC)[reply]

This is not a place for you to self promote. Please stop spamming us. MrOllie (talk) 12:15, 1 August 2022 (UTC)[reply]

India Education Program course assignment

This article was the subject of an educational assignment supported by Wikipedia Ambassadors through the India Education Program.

The above message was substituted from {{IEP assignment}} by PrimeBOT (talk) on 19:55, 1 February 2023 (UTC)[reply]

Data Mining versus Factor Analysis

Data mining is a large-scale effort to increase the chance of finding something that an investigator doesn’t already know; a better procedure is Factor Analysis, a statistical analysis process developed by Dr Benjamin Fructer. This process uses both repeated linear and nonlinear regression to determine factors within the data. Factor Analysis can be used to test hypotheses or to investigate a database for variables that are related. It is preferable to large-scale data snooping because it provides statistical significance estimates. DoctorDuncan (talk) 01:42, 5 May 2023 (UTC)[reply]

Assuming you mean the author of Introduction to Factor Analysis (1954), I think his name is spelled "Benjamin Fruchter". Factor analysis is much older than that, but my understanding is that the term "data snooping" is typically used pejoratively for the misuse of tools to "provide" statistical significance, not for the specific tools themselves. Perhaps I'm wrong about this, but regardless, you will need a reliable source to add any of this to the article. Wikipedia doesn't publish original research. Grayfell (talk) 05:35, 5 May 2023 (UTC)[reply]

Wiki Education assignment: IFS213-Hacking and Open Source Culture

This article was the subject of a Wiki Education Foundation-supported course assignment, between 5 September 2023 and 19 December 2023. Further details are available on the course page. Student editor(s): Hacksasaur (article contribs). Peer reviewers: Yaman Shqeirat.

— Assignment last updated by T57fd (talk) 00:23, 1 December 2023 (UTC)[reply]

Definitions of data mining in IEEE: at least 8 definitions, also mentioning the book's name, writer's name, year of publication, etc.

Definitions of data mining in IEEE: at least 8 definitions, also mentioning the book's name, writer's name, year of publication, etc. 203.215.178.62 (talk) 12:50, 2 November 2023 (UTC)[reply]

Data Mining Downsides

I am adding this section to the Data Mining main page because I think it is important to note that there are cons to Data Mining. I feel that a section such as this will give users a better idea of what Data Mining is and how big an undertaking it can be, as well as why it can be very difficult for independent persons or small businesses to data mine. Users who are not very experienced with technology may feel overwhelmed by Data Mining and might need something laid out for them that shows plainly what to be wary of when it comes to Data Mining. Wkobrien2 (talk) 18:17, 21 February 2024 (UTC)[reply]

I reverted it - advertising materials such as vendor blogs are not considered reliable sources on Wikipedia. MrOllie (talk) 18:30, 21 February 2024 (UTC)[reply]