User:The Anome: Difference between revisions
Revision as of 14:47, 10 May 2015
The Anome is a second-wave Wikipedian.
The Anome abides.
Interesting reading
To do
- Perhaps an article on solar influence on radioactive decay, if other reports emerge to confirm the initial tantalizing findings? It's so easy to set up a reasonably well-controlled experiment, even with the most minimal scientific resources, that I would imagine that science labs worldwide are rushing to set up their own.
- An article on signal processing delay / signal processing latency
- Liaise with User:PhotoCatBot's operator?
Geodata to-do
- Why do Republic of Dagestan etc. articles escape the {{coord missing}} sorter?
- Possible low-hanging fruit for geocoding: the following categories have thousands of non-geocoded articles that are not getting matched by my current software, and may benefit from special-purpose matching heuristics:
- Category:Brazil articles missing geocoordinate data (was 3000+ articles, now 2,478 as of 2015-03-25) -- ??
- Note: most of these appear to be rivers -- just matched 500+ of these by translating GNS names
- Category:Iran articles missing geocoordinate data (13,000+ articles) -- transliteration problems, presumably
- It looks like a lot of this might be repetition of the same location in multiple places: the bot's code gets 7000+ multi-matches for Iran
- See also this paper: "Cross linguistic name matching in English and Arabic: a "one to many mapping" extension of the Levenshtein edit distance algorithm", doi:10.3115/1220835.1220895
- And this: A verified Arabic-IPA mapping for Arabic transcription ... http://eprints.whiterose.ac.uk/79653/1/brierley14jss.pdf
- And this: http://geonames.nga.mil/gns/html/romanization.html
- Category:Pakistan articles missing geocoordinate data (~3500 articles) -- transliteration problems, presumably
- Note: 700+ multimatches from bot code
- Category:Philippines articles missing geocoordinate data (2000+ articles) -- ??
- Note: mostly universities, schools, other locatable organizations, very little here looks bot-matchable.
- Category:South Korea articles missing geocoordinate data (1700+ articles) -- not sure what's going on here: fixed my FIPS 10-4 mapping, but that doesn't go very far towards fixing the problem
- Note: insignificant number (< 100) of multimatches
- This may be a matter of transliteration: McCune–Reischauer vs. Revised Romanization
- Category:Turkey articles missing geocoordinate data (5000+ articles) -- lots of places with the same names but in different regions (e.g. 17 villages all called "Akpınar"), the same problem as was found with Polish placenames. (Also: why is Akçakoca failing to be caught?) The bot code finds 3000+ multi-matches for Turkey.
- This is also due to non-standard naming conventions for the hierarchy of Turkish article categories: see, for example, Category:Ankara Province.
- I've now used spatial disambiguation to resolve some 2000+ of these.
Total is over 27,000 possibles: even doing a fraction of these would make a big dent in the backlog.
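The list above mentions resolving multi-matches by spatial disambiguation but does not say how it was done. One plausible approach, sketched here purely as an illustration, is to keep a gazetteer candidate only if it is the single same-named entry lying within some radius of a reference point for the article (for example, the centroid of the province named in its category). The function names, the `(id, lat, lon)` tuple layout, the 50 km default radius and the coordinates in the usage note are all assumptions made for this sketch, not details of the bot's actual code.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def disambiguate(candidates, ref_lat, ref_lon, max_km=50.0):
    """Pick the one same-named candidate near the reference point.

    candidates: list of (id, lat, lon) tuples sharing a place name (hypothetical layout).
    Returns the candidate tuple if exactly one lies within max_km of the
    reference point, otherwise None (no match, or still ambiguous).
    """
    near = [
        c for c in candidates
        if haversine_km(c[1], c[2], ref_lat, ref_lon) <= max_km
    ]
    return near[0] if len(near) == 1 else None
```

For instance, with three made-up same-named candidates `[("A", 40.0, 33.0), ("B", 38.0, 30.0), ("C", 41.0, 36.0)]` and a reference point at `(39.9, 32.9)`, only `"A"` falls within the default radius and is returned. Returning `None` when several candidates remain keeps the matcher conservative: an ambiguous article stays in the backlog instead of getting the wrong coordinates.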
AI scenarios
From http://plato.stanford.edu/entries/logic-ai/ , the following list of AI/logic problem scenarios:
- The Baby Scenario, the Bus Ride Scenario, the Chess Board Scenario, the Ferryboat Connection Scenario, the Furniture Assembly Scenario, the Hiding Turkey Scenario, the Kitchen Sink Scenario, the Russian Turkey Scenario, the Stanford Murder Mystery, the Stockholm Delivery Scenario, the Stolen Car Scenario, the Stuffy Room Scenario, the Ticketed Car Scenario, the Walking Turkey Scenario, and the Yale Shooting Anomaly.
We should have articles on all of these that meet the notability criteria. -- The Anome (talk) 15:10, 1 March 2013 (UTC)
To do
- An article on Trust ports. See https://www.gov.uk/government/organisations/trust-ports
- p-hacking, replication crisis/replicability crisis?