Wikipedia talk:Wikipedia Signpost/2024-04-25/Recent research: Difference between revisions

reply
:Regards, [[User:HaeB|HaeB]] ([[User talk:HaeB|talk]]) 19:25, 25 April 2024 (UTC) (Tilman)
::Hi, Mr./Dr. Bayer, thank you for your enthusiastic defense. Your sample size is admirable. Maybe our difficulty is in defining terms. I use the term convenience to describe samples created at the convenience of the researcher, to include self-selected participants. The latter is the problem here. I have no knowledge of statistics to share, only the admonition from a former professor that convenience surveys are the weakest sort. It's pretty simple: I never do surveys. My sister always does. The same caveat applied when Elon Musk [https://twitter.com/elonmusk/status/1604617643973124097 asked] whether he should step down as head of Twitter. His answer looks legitimate and scientific all the way down to one decimal point. I promise to read your article and all of its sources in detail (which I have not had a chance to do) after my editing chores are done. -[[User:SusanLesch|SusanLesch]] ([[User talk:SusanLesch|talk]]) 13:55, 26 April 2024 (UTC)
:::I still sense a lot of confusion here.
:::{{tq|Your sample size is admirable.}} - Not sure what you mean by the possessive pronoun here, I was not involved at all with this survey.
:::{{tq|Maybe our difficulty is in defining terms.}} - If you were using the term "[[convenience sampling]]" in a different meaning than the established one, it would have been good to clarify that from the beginning.
:::{{tq|to include self-selected participants}} - It sounds like you are referring to the mundane fact that participation in the survey was voluntary, which is the case for almost all large-scale social science surveys (and even legally compulsory surveys like the US census have great trouble achieving a 100% [[response rate (survey)|response rate]] and avoiding undercounting). Again, while this might cause [[participation bias]]es, these can be examined and to some extent handled (see above). It's not a valid reason for dismissing such empirical results out of hand.
:::I am also very unclear about the relevance of your sister and Elon Musk to this conversation, except perhaps that the latter's social media use illustrates the dangers of shooting off snarky one-sentence remarks based on a very incomplete understanding of the topic being discussed. In any case, I appreciate your intention to now actually read the Signpost story that you have been commenting on.
:::Regards, [[User:HaeB|HaeB]] ([[User talk:HaeB|talk]]) 21:00, 26 April 2024 (UTC)
:It is great that we have some new good survey data about the community. It is ridiculous they are not available under open licence as open data, and that such a big survey was done without WMF cooperating with this and/or ensuring the data will be available. This is something for the mentioned white paper on best research practices to consider, actually. --<sub style="border:1px solid #228B22;padding:1px;">[[User:Piotrus|Piotr Konieczny aka Prokonsul Piotrus]]&#124;[[User talk:Piotrus|<span style="color:#7CFC00;background:#006400;"> reply here</span>]]</sub> 00:57, 26 April 2024 (UTC)
::I am a bit confused about what you are referring to.

Revision as of 21:00, 26 April 2024

Discuss this story

Wikipedians are more careful than to believe in the results of convenience sampling. -SusanLesch (talk) 14:21, 25 April 2024 (UTC)[reply]

Huh, can you explain in more detail why you characterize the sampling method used by this survey as "convenience sampling"? That term is most often used for methods that rely on a grossly unrepresentative population (say, surveying a class of US college students to draw conclusions about all humans). But "people who access the Wikipedia website within a given timespan" is a pretty reasonable proxy for "Wikipedia users" (in the general sense).
For context: Recruitment of survey participants via banners or other kinds of messages on the Wikipedia website itself is kind of the state of the art in this area. (It has also been used in numerous editor and reader surveys conducted by the Wikimedia Foundation.) It forms the basis of many of the most-cited results, e.g. on the gender gap among Wikipedia editors. Yes, it comes with various biases (which, as already indicated in the review, one can try to correct after the fact using various means, see e.g. our earlier coverage here of an important 2012 paper which did this regarding editors: "Survey participation bias analysis: More Wikipedia editors are female, married or parents than previously assumed", and the WMF's "Global Gender Differences in Wikipedia Readership" paper also listed in this issue). But so does any other method (door-knocking, cold-calling landline telephones, etc. - and regarding phone surveys, these biases have become much worse in the last decade or so, at least in the US, as political pollsters have found out).
In sum, it's fine to call out specific potential biases in such surveys (e.g. I have been reminding people for over a decade now that - per the aforementioned 2012 paper - one of the best available estimates for the share of women editors in the US is 22.7% as of 2008, considerably higher than various other numbers floating around). But dismissing their results entirely strikes me as a nirvana fallacy.
Regards, HaeB (talk) 19:25, 25 April 2024 (UTC) (Tilman)[reply]
Hi, Mr./Dr. Bayer, thank you for your enthusiastic defense. Your sample size is admirable. Maybe our difficulty is in defining terms. I use the term convenience to describe samples created at the convenience of the researcher, to include self-selected participants. The latter is the problem here. I have no knowledge of statistics to share, only the admonition from a former professor that convenience surveys are the weakest sort. It's pretty simple: I never do surveys. My sister always does. The same caveat applied when Elon Musk asked whether he should step down as head of Twitter. His answer looks legitimate and scientific all the way down to one decimal point. I promise to read your article and all of its sources in detail (which I have not had a chance to do) after my editing chores are done. -SusanLesch (talk) 13:55, 26 April 2024 (UTC)[reply]
I still sense a lot of confusion here.
"Your sample size is admirable." - Not sure what you mean by the possessive pronoun here, I was not involved at all with this survey.
"Maybe our difficulty is in defining terms." - If you were using the term "convenience sampling" in a different meaning than the established one, it would have been good to clarify that from the beginning.
"to include self-selected participants" - It sounds like you are referring to the mundane fact that participation in the survey was voluntary, which is the case for almost all large-scale social science surveys (and even legally compulsory surveys like the US census have great trouble achieving a 100% response rate and avoiding undercounting). Again, while this might cause participation biases, these can be examined and to some extent handled (see above). It's not a valid reason for dismissing such empirical results out of hand.
I am also very unclear about the relevance of your sister and Elon Musk to this conversation, except perhaps that the latter's social media use illustrates the dangers of shooting off snarky one-sentence remarks based on a very incomplete understanding of the topic being discussed. In any case, I appreciate your intention to now actually read the Signpost story that you have been commenting on.
Regards, HaeB (talk) 21:00, 26 April 2024 (UTC)[reply]
It is great that we have some new good survey data about the community. It is ridiculous they are not available under open licence as open data, and that such a big survey was done without WMF cooperating with this and/or ensuring the data will be available. This is something for the mentioned white paper on best research practices to consider, actually. --Piotr Konieczny aka Prokonsul Piotrus| reply here 00:57, 26 April 2024 (UTC)[reply]
I am a bit confused about what you are referring to.
"It is ridiculous they are not available under open licence as open data" - the dataset is available (it's how I was able to create the graphs for this review, after all), and licensed under CC-BY SA 4.0.
"such a big survey was done without WMF cooperating with this" - judging from the project's page on Meta-wiki, the team extensively cooperated with the Wikipedia communities where the survey was to be run (and also invited feedback from some WMF staff who had previously run related surveys). Plus they followed best practices by creating this public project page on Meta-wiki in the first place (actually at your own suggestion, it seems?), something that even some WMF researchers unfortunately forget on occasion. What's more, the team also notified the research community in advance on the Wiki-research-l mailing list.
Regards, HaeB (talk) 03:46, 26 April 2024 (UTC)[reply]
PS: Also keep in mind that the Wikimedia Foundation has so far not been releasing any datasets from its somewhat comparable "Community Insights" editor surveys. (At least that is my conclusion based on a cursory search and this FAQ item; CCing TAndic and KCVelaga to confirm.) So I am unsure why you are confident that a collaboration with WMF would have been "ensuring the data will be available".
PPS: To clarify just in case, I entirely agree with you on the principle that (sanitized) replication data for such surveys should be made available as open data.
Regards, HaeB (talk) 04:08, 26 April 2024 (UTC)[reply]