User:Mill 1



This editor is a Veteran Editor III and is entitled to display this Silver Editor Star.


Wiki statistics

As of Wednesday, 19 September 2018, 22:02 (UTC), the English Wikipedia has 34,518,818 registered users, 126,006 active editors, and 1,202 administrators. We made 855,525,222 edits, created 45,897,321 pages and created 5,719,388 articles. – Wikipedia Statistics

About Me

Babel user information
nl-N This user has Dutch as a native language.
en-3 This user has advanced knowledge of English.
de-2 This user has an advanced-level command of German.
es-1 This user has basic knowledge of Spanish.
fr-1 This user has basic knowledge of French.
Users by language
This user has been on Wikipedia for 7 years, 10 months and 15 days.
This editor is a WikiGnome.
This user qualifies as a nerd extraordinaire.
This user is a fan of Metallica.
This user is a fan of Tiesto.
This user has completed a 5K.
This user loves his 107.
This user has created 25 articles on the English-language Wikipedia.
This user is interested in World War II.
This user is interested in the Vietnam War.
This user drinks single malt.
This user believes that Facts Matter.
This user has a zero tolerance policy on vandalism.
This user contributes using Chromium.
This user knows the difference between a dash and a hyphen, and follows MOS:DASH.
Quality, not quantity. This user believes that a user's edit count does not necessarily reflect on the value of their contributions to Wikipedia.
This user opposes religion as a whole.
This user is a member of Generation X.
This user's smartphone is powered by Android.
This user is a history buff.
This editor has been awarded the Tireless Contributor Barnstar.
This user has set foot in 6 continents of the world.

Wikipedia focus areas

Interesting links

Wiki links

Articles started by me

  1. KeyFilm
  2. Hanneke Niens
  3. Hans de Wolf
  4. Anton Smit
  5. Hans de Weers
  6. Swanehilde of Saxony
  7. Adalbert of Babenberg
  8. Eric of Lorraine
  9. Henny Kroeze
  10. Ursul Philip Boissevain
  11. Clayton Townsend
  12. Bert Röling
  13. Zhaoxing, Guizhou
  14. Aldrich Bowker
  15. Cees Gielis
  16. Heok Hee Ng
  17. Heok Hui Tan
  18. Hiroshi Inoue (entomologist)
  19. Lauri Kaila
  20. Ronald Fricke
  21. Lothar Seegers
  22. Donald R. Davis (entomologist) (backup)
  23. Yang Jun-Xing

Galleries started by me

  1. Colón, Cuba
  2. Cárdenas, Cuba
  3. Zhaoxing Town
  4. Hsipaw
  5. Kyaukme, Shan State
  6. Bofarreira
  7. João Galego
  8. Palaung people
  9. Tena, Ecuador
  10. Ahuano, Ecuador
  11. Baños de Agua Santa

Notable talks

  1. Days of the year-articles: guidelines for additions in births- and deaths sections
  2. Cynesige: Dates, the sequel
  3. Possible incorrect date of death Ismail I
  4. Peters TY
  5. Helpdesk: regex search
  6. Update on initiatives on WikiProject DOY
  7. Proposal for DOY-trimming tool
  8. Proposal WikiProject Biography
  9. Confusion regarding changed guideline
  10. New guideline: not giving up/in.

Project Missing Medieval Link

Motivation

As a history buff I've been visiting Wikipedia for years. Before long I developed an interest in the Timeline-articles, especially the Year pages (like 1492) and the Days of the year pages (DOY pages for short) like December 24.
Days of the year pages all contain a Births- and a Deaths-section, listing links to persons' biography pages in chronological order. I noticed, however, that (pre-)medieval entries (before 1500 AD) were vastly underrepresented in DOY-pages. It seemed as if no one was born or died on a specific date before the 16th century:


Days of the year page September 2

I also noticed that, compared to the DOY-pages, the Year pages listed far more persons in their Births and Deaths sections, many of them stating the exact date of their birth/demise.
A quick scan confirmed this: a lot of persons with a biography stated in a year page are not present in the corresponding Days of the year page.

Displaying Year page 1261

The reason for this is probably that it is less obvious for biographers to add an entry to the corresponding DOY-page than to the Year-page [1]. Once an entry has been omitted by its author, chances are small that someone else will add the link to a DOY page manually, since doing so is horrendously tedious.

Solution: the wiki-client app

Towards December 2016 I got an idea: shouldn't it be possible to (semi-)automate cross-referencing these timeline types? An added advantage would be that, whenever a missing link was spotted, the text to insert into the DOY page could be generated from the one in the Year page (whose format differs only slightly from that of the DOY page).
A potentially big issue, however, is that the text within the Births- and Deaths-sections is unstructured. Luckily I noticed that the level of standardization within the pages and sections is quite high; I ran an automated check on the Years 500 BC - 1550 AD and only had to change a few dozen pages, fixing missing sections or re-applying template standardization. I was quite astonished that I encountered so few text structure errors, given that editing wiki-pages is open to anybody. Unfortunately this proved not to be the case regarding the actual content (see results and statistics).

In the weeks that followed I created and improved a VBA-powered MS Excel-application that implemented the envisioned functionality.
When clicking the button 'Check year' in the Excel-file, a specific Year page is analysed and used as a starting point to look for missing entries in matching DOY pages. Per section (Births and Deaths) the general algorithm looks like this:

  • Get the raw response text of a Year page regarding a specific section
  • Per person encountered do the following:
    • Store information (Display-/link-name, date of birth/death, link text)
    • Check if a bio page exists based on the name of the entry
    • In case an exact date of birth/death is known: check existence of a link to the biography in the matching date of the year page
  • Write the results to the project-sheet of the Excel-application.
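The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the VBA code of the Excel application: the entry layout matched by ENTRY_RE, the names parse_year_section, doy_line and find_missing, and the idea of passing the DOY pages in as a dictionary of raw text are all hypothetical. The check whether a bio page exists is left out because it needs a web request.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Assumed, simplified Year-page entry layout (not taken from the original tool):
#   * [[March 27]] – [[Some Person]], short description
ENTRY_RE = re.compile(
    r"^\*\s*\[\[(?P<date>[A-Z][a-z]+ \d{1,2})\]\]\s*[–-]\s*"
    r"\[\[(?P<link>[^\]|]+)(?:\|(?P<label>[^\]]+))?\]\](?P<rest>.*)$"
)


@dataclass
class Entry:
    date: str     # exact date as written on the Year page, e.g. "March 27"
    link: str     # title of the linked biography
    display: str  # display name (equals link when no label is given)
    rest: str     # remainder of the line (description)


def parse_year_section(section_text: str) -> List[Entry]:
    """Collect the entries of a Births or Deaths section that state an exact date."""
    entries = []
    for line in section_text.splitlines():
        m = ENTRY_RE.match(line.strip())
        if m is None:
            continue  # no exact date (or an unexpected layout): skip
        entries.append(Entry(m["date"], m["link"], m["label"] or m["link"], m["rest"]))
    return entries


def doy_line(year: int, entry: Entry) -> str:
    """Generate the text for the DOY page from the Year-page entry."""
    if entry.display == entry.link:
        link = f"[[{entry.link}]]"
    else:
        link = f"[[{entry.link}|{entry.display}]]"
    return f"*[[{year}]] – {link}{entry.rest}"


def find_missing(year: int, section_text: str,
                 doy_pages: Dict[str, str]) -> List[Tuple[str, str, bool, str]]:
    """Return (name, date, name exists?, text to add) per dated entry."""
    results = []
    for e in parse_year_section(section_text):
        doy_text = doy_pages.get(e.date, "")
        exists = f"[[{e.link}]]" in doy_text or f"[[{e.link}|" in doy_text
        results.append((e.display, e.date, exists, "-" if exists else doy_line(year, e)))
    return results
```

Each tuple corresponds to one row of the project sheet: the text to add is '-' when the person is already linked from the matching DOY page.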

Results

Handling a single DOY

After a specific year is processed the results may look like this:
Excel-app displaying processed year 1492

The results are shown in the sheet per section: Births on the left and Deaths on the right.
Each section lists all persons for whom the Year page states an exact date and for whom a bio exists.
Per person the following information is displayed:

  • Name person: the wiki display name (hyperlink to the biography)
  • Exact date (of birth or death, hyperlink to the corresponding Day of the year page)
  • Name exists?: Does the entry exist in the corresponding DOY page?
  • Text to add to section: The generated link-text to insert into the DOY page. This text is manually copied and pasted.

When the application has determined that the date in the Year page and the date in the bio page are identical, the "Text to add to section"-cell gets a green back color. Also, when a person is already listed in the matching day-page, 'Name exists?' is TRUE and the text to add is '-'. Otherwise the text to copy into the matching DOY page is generated from the one in the Year page.
Quite a few notable links are missing: in the case of Year page 1492, section Births, 21 persons with exact birth dates are listed, but only 8 of them are present in the matching DOY pages[2].
Another thing that stands out is the number of erroneous entries: not all "Text to add"-cells are green. If a discrepancy is identified between the date stated in the Year page and the one in the matching biography, the cell is marked with a red back color instead.
As it turned out, quite a few types of errors existed that needed to be fixed before I could add missing entries:

  • Mismatch between date in Year page and in bio page (either date or year).
  • In the bio page no exact date was stated
  • In the Year page or biography incorrect date-formatting was used
  • In the Year page the link to the bio page was incorrect
  • Etcetera...

If such an error is detected, further investigation is required: what is the source of the error, the Year page or the bio? Or do I need to tweak my VBA-code?
For instance, take a look at Births, person Adam Ries. The Year page states March 27 as the date of birth, whereas the matching bio states January 17. Further investigation has to make clear which correction must be made to which page. As a consequence I had to correct numerous Year- and bio-pages (which I don't mind, being a WikiGnome).
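The green/red marking described above boils down to a date comparison. A minimal Python sketch, assuming that dates appear either as "March 27" or as "27 March" and that a bio without an exact date also counts as a discrepancy (the text does not say which colour that case gets); the names normalise and cell_colour are made up for this illustration:

```python
from typing import Optional, Tuple

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]


def normalise(date_text: str) -> Optional[Tuple[int, int]]:
    """Return (month number, day) for 'March 27' or '27 March', else None."""
    parts = date_text.replace(",", " ").split()
    if len(parts) != 2:
        return None
    a, b = parts
    if a in MONTHS and b.isdigit():
        return (MONTHS.index(a) + 1, int(b))
    if b in MONTHS and a.isdigit():
        return (MONTHS.index(b) + 1, int(a))
    return None  # incorrect date formatting


def cell_colour(year_page_date: str, bio_date: Optional[str]) -> str:
    """'green' when both pages state the same date, 'red' for any discrepancy."""
    y = normalise(year_page_date)
    b = normalise(bio_date) if bio_date else None  # assumption: no bio date is an error too
    return "green" if y is not None and y == b else "red"
```

For the Adam Ries example, cell_colour("March 27", "January 17") gives "red", so the entry is flagged for manual investigation.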

Adding the entries

After addressing the errors of a specific year I could finally do what was the initial goal of the project: insert links to bios that are missing in WP:DAYS. Per year the generated text of the entries is copied manually from Excel to the correct location within the section of the DOY page.
So far (today is 26 June 2017) I have checked all the years between 500 BC and 1625 AD this way. I added a few thousand entries during the process, in some cases adding 10+ (pre-)medieval entries to a DOY.

Wikipedia Days of the year page, section Deaths

However, not all persons should be added to a DOY, because entries have to be 'sufficiently globally notable'. All my insertions are swiftly validated by the Wiki-community, especially by Rms125a@hotmail.com. Again, thanks for all your hard work!

Going through the centuries I noted a significant increase in data from the 14th century onwards. Until 1350 every missing link to a referenced bio was added. From 1350 onwards the missing entries detected by the Excel-application were subject to more scrutiny. Apparently there's a lot of crap being added. Apart from notability I decided to also look at the size of the bio involved as an indication for possible insertion into DOY. I am fully aware of all the ongoing discussions around entry notability. However, because of the sheer number of missing entries I had to come up with some criterion to quickly sift through the missing links found. I found that article size is a good indication of notability. The following table shows, per century, the minimum number of characters (based on the raw http request content) usually required for an article to be eligible for insertion:

Century               Minimum nr of chars
Before 14th (<1300)   0
14th (1300-1399)      2000
15th (1400-1499)      5000
After 16th (>1500)    8000
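The table can be written down as a small lookup. A minimal Python sketch, assuming that the last row is meant to cover every year from 1500 onwards (the row label itself is ambiguous) and that the size is simply the length of the raw article text; min_article_chars and passes_size_check are hypothetical names:

```python
def min_article_chars(year: int) -> int:
    """Minimum raw article size for a person born or deceased in the given year."""
    if year < 1300:
        return 0
    if year < 1400:
        return 2000
    if year < 1500:
        return 5000
    return 8000  # assumption: applies from 1500 onwards


def passes_size_check(year: int, raw_text: str) -> bool:
    """True when the raw article text reaches the minimum size for its century."""
    return len(raw_text) >= min_article_chars(year)
```

The outcome is only a first filter; the limits are applied flexibly.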

Keep in mind that these limits are quite flexible. For instance: some articles on medieval poets quote entire poems, which makes the article size in characters a much weaker indication of relevance and notability. Other bios contain all kinds of invisible information, like an infobox with a lot of empty properties and/or numerous wiki categories, with the same result. On the other hand I learned that, based on article size, the minimum number of characters needed to compose a relevant wiki bio, regardless of its occurrence in history, is around 8,000.
Of course there are some other criteria for a bio to be notable or not:[ongoing]

  • European nobility is always notable.
  • Roman Catholics and German theologians are never reverted.
  • Renaissance painters, no matter how obscure, are always accepted.
  • Anyone who had to do anything with the Mayflower is acceptable.
  • If > 15th century: a picture of a painting/drawing/bust of the subject is required.

In the end bio size doesn't seem to matter in order to be admitted to a DOY-page. Just take a look at this table (more summaries can be found in the archive at the top).

Milestones

Agreed, 'milestone' is a terrible term concocted by management ;). Anyway:

  • By now I've added between 5 and 20 (pre-)medieval entries to every Day of the Year page (to January 1 chk), average: ± 10.
  • Regarding these pages I've pushed back the latest year of the first item significantly. Births: 1495 (21 November), Deaths: 997 (23 July[3]).
  • Both the Births and Deaths section in all DOY pages now show at least one link from the 16th and 17th century[4][5].
  • Every Deaths section in a DOY page now contains at least 5 (pre-)medieval entries, stating at least 4 different centuries (yes, I went a bit overboard on that one..[6][7])


Revision history of Days of the year page June 29

Outlook

The initial plan was to process the years up to 1550 AD. Due to the effectiveness and accuracy of the tool I am now considering extending that period to 1700. It will be a lot more work though, especially correcting discrepancies.
Based on the results of this project I initiated Project All Who Are Born Must Die to add the missing entries to the Deaths section of the Year-pages.
With a little additional programming I could also process the Events-section the same way as the Births and Deaths-sections. I have to look into that more since it may pose specific issues. I also have other plans, for instance creating a specific Excel-tool for the Dutch wiki. These plans are very preliminary though.
Since the 'approved' guideline change it appears to be impossible to semi-automatically add missing entries. With regret I therefore decided to stop this project. Mill 1 (talk)

Statistics

The charts below, one per section, clearly show the progress that's been made since the start of this endeavour[8]. Also note that many existing erroneous links were removed from the DOY pages during the period December 2016 – June 2017.

The number of links across all DOY-pages per century (Section Births)

The number of links across all DOY-pages per century (Section  Deaths)


During this project I also compiled some other statistics. The following excerpt of the output is self-explanatory. Pay special attention to the number/fraction of discrepancies per processed year. Luckily the fraction of erroneous entries dropped sharply after 1600. I never found out why.

Displaying the project statistics

Project All Who Are Born Must Die

Motivation

During Project Missing Medieval Link I noticed something odd. Until the 15th century the entries in the Deaths section of a Year page generally outnumbered those in the Births section. The reason seemed evident: during their lifetime a person would become important enough to have their date of death archived for posterity, although in many cases no records existed regarding the date of their birth. But from 1420 onwards the quality of administration regarding clergy, nobility and other privileged groups apparently reached a point where the number of birth dates starts to overtake the number of death dates in the Year-pages. Within a century the dates in the Births section greatly outnumber those in the Deaths section, sometimes by a factor of 5 or more. That is strange, since if the date of birth is known, in most cases the date of death would be known as well. I suspect that Wikipedians often omit to add the corresponding date of death after stating a person's date of birth on a Year page. The following chart shows the stated links per DOY page aggregated per (pre-)medieval century (the numbers shown are outdated):
The number of links per DOY page aggregated per century

Solution

I realised that it should be possible to check for these omitted entries via the WIKI-client application. This turned out to be true. The general algorithm looks like this:

  • Get the raw response text of the Births section of the Year page.
  • Per person encountered do the following:
    • Store information (link-name, date of birth, link text)
    • If an exact date of birth is stated, get the raw text of the person's biography, based on the name of the entry.
    • In the bio page look for the date of death, if present.
  • Write the results to the project sheet of the Excel-application.
  • If not present in the Deaths section of the corresponding Year-page, generate the link text.
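The last step, generating the link text for the Deaths section, can be sketched as follows. This is a Python illustration only, not the application's VBA code; the function name death_entry_text and the entry layout it produces (date link, biography link, description, year of birth) are assumptions about what a Deaths-section line on a Year page looks like.

```python
from typing import Optional, Tuple

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]


def death_entry_text(link: str, description: str, birth_year: int,
                     death: Tuple[int, int, int],
                     deaths_section_text: str) -> Optional[str]:
    """Return a line for the Deaths section of the death year,
    or None when the person is already linked there."""
    if f"[[{link}]]" in deaths_section_text or f"[[{link}|" in deaths_section_text:
        return None  # already present: nothing to add
    _year, month, day = death  # (year, month, day) as found in the biography
    # Assumed entry layout; adjust to the convention of the Year pages.
    return (f"* [[{MONTHS[month - 1]} {day}]] – [[{link}]], "
            f"{description} (b. [[{birth_year}]])")
```

The death date itself has to come from the biography, which is the hard part.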

Since the groundwork of the wikiclient-application was already done I suspected it would take me a couple of hours of programming to implement the new functionality. This proved not to be the case: it cost me an extra 30 hours to realize the solution. Retrieving the date of death from the article proved especially complex, because many different ways/formats exist to state a date. I had to resort to this kind of code a lot:
    sDayOfDeath = CStr(Val(Mid(Replace(sSearchText, "|", " "), lPos - IIf(lPos = 3, 2, 3), 2)))
The next picture shows the fruits of my labor.

When a year is processed the text in column J is ready to be copied into the Deaths section of the Year-page stated in column I.
I also ran some checks to spot errors within/between pages. The most notable being:

  • Many biographies contain an infobox. If present it is the first place the programme looks for the death date. After that the programme will look for the matching date in the opening sentence of the article. If the two dates differ a warning is displayed after the entry-text.
  • If no infobox is present in the bio, the date of birth should be stated in the opening sentence. It is the starting point of the search for the death date. A warning is generated in the link-text if the birth date is not encountered.
  • In the source Year pages, section Births, per entry the year of death is often stated. If this year is different from the year of death found in the matching article the application also shows a warning.
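The first and third check can be illustrated with a Python sketch based on regular expressions. It is a deliberately simplified stand-in for the VBA string handling and assumes just three notations: an infobox parameter death_date filled with a death date template, the same parameter filled with a plain "30 March 1559" style date, and an opening sentence with a life span between parentheses. All function names and the warning texts are made up for this example, and real articles use many more formats.

```python
import re
from typing import List, Optional, Tuple

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]
Date = Tuple[int, int, int]  # (year, month, day)

# | death_date = {{death date|1559|3|30}} or {{death date and age|1559|3|30|...}}
TEMPLATE_RE = re.compile(
    r"\|\s*death_date\s*=\s*\{\{\s*death date[^|}]*\|(\d{1,4})\|(\d{1,2})\|(\d{1,2})",
    re.IGNORECASE)
# | death_date = 30 March 1559
PLAIN_RE = re.compile(r"\|\s*death_date\s*=\s*(\d{1,2}) ([A-Z][a-z]+) (\d{1,4})")
# ... (27 March 1492 – 30 March 1559) ...
LEAD_RE = re.compile(r"[–-]\s*(\d{1,2}) ([A-Z][a-z]+) (\d{1,4})\s*\)")


def _from_dmy(m: "re.Match") -> Optional[Date]:
    """Convert a day / month name / year match into (year, month, day)."""
    if m[2] not in MONTHS:
        return None
    return (int(m[3]), MONTHS.index(m[2]) + 1, int(m[1]))


def infobox_death_date(text: str) -> Optional[Date]:
    m = TEMPLATE_RE.search(text)
    if m:
        return (int(m[1]), int(m[2]), int(m[3]))
    m = PLAIN_RE.search(text)
    return _from_dmy(m) if m else None


def lead_death_date(text: str) -> Optional[Date]:
    m = LEAD_RE.search(text)
    return _from_dmy(m) if m else None


def death_date_with_warnings(
        text: str, year_page_death_year: Optional[int] = None
) -> Tuple[Optional[Date], List[str]]:
    """Look in the infobox first, then in the opening sentence, and collect warnings."""
    warnings = []
    box = infobox_death_date(text)
    lead = lead_death_date(text)
    if box and lead and box != lead:
        warnings.append("infobox and opening sentence disagree")
    found = box or lead
    if found and year_page_death_year is not None and found[0] != year_page_death_year:
        warnings.append("year of death differs from Year page")
    return found, warnings
```

The warnings would be appended to the generated entry text, so that the discrepancy can be investigated before anything is copied into a Year page.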

Screendump of Excel application wiki-client, tab Project All Who Are Born Must Die

Results

Since this project is W.I.P. I cannot state anything conclusive. What I can say is that this additional function yields a lot of missing entries. For instance, consider year 1475 whose results are displayed in the image above:

  • Section Births of the year 1475 shows 25 persons.
  • Of these 25 persons, only 14 entries state the exact date of birth.
  • In the matching bios two dates of death are unclear. Two others show discrepancies (marked with orange back color) that need correcting.
  • Of the remaining ten persons, four are already mentioned in the Deaths sections of their death year.
  • Six death-links are missing and will be added to the corresponding year-page.

Update:
By 1623 I got so fed up with adding obscure and badly sourced death-links of bios that I defined the following two conditions for insertion:

  1. Minimum nr of chars of linked article: 3000
  2. Article should contain at least one valid inline citation or general reference
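Both conditions can be approximated in code. A rough Python sketch, under the assumptions that an inline citation shows up as a ref tag in the raw wikitext and that a general reference is a bulleted item directly under a References, Sources or Bibliography heading; whether a citation is actually valid still needs a human eye. The name qualifies_for_insertion is hypothetical.

```python
import re

INLINE_REF_RE = re.compile(r"<ref[\s>]", re.IGNORECASE)
GENERAL_REF_RE = re.compile(
    r"==\s*(References|Sources|Bibliography)\s*==\s*\n+\s*\*", re.IGNORECASE)


def qualifies_for_insertion(raw_text: str, min_chars: int = 3000) -> bool:
    """Apply the two conditions: minimum size and at least one citation or reference."""
    has_inline = INLINE_REF_RE.search(raw_text) is not None
    has_general = GENERAL_REF_RE.search(raw_text) is not None
    return len(raw_text) >= min_chars and (has_inline or has_general)
```

An article that only contains an empty references list fails the second condition in this sketch.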

Project Illustrate the Years

Motivation

Working on a private project always seems to provoke a new one; another thing I noticed while working on project All Who Are Born Must Die was the clumsy use of images in the Year-pages. Some years would have pictures and some not. If present, pictures were sometimes scattered all over the place. Thumbnails linking to a person's bio would have different sizes between Years and even between sections. It was a mess.

Although the 5th pillar of Wikipedia is that it has no firm rules I deemed it necessary to introduce some kind of standardization to the use of images in Year-pages.

As a preliminary starting point I chose 1500 AD. Before that Year-pages very often just do not have sufficient entries to warrant decoration.

Solution

Based on how images were mostly used I established a set of simple rules to make the look & feel of the Year-pages more consistent and appealing. As I was progressing I discovered one final thing: when you insert a thumbnail at the top of the article (above the first section) it will be displayed better in the Wikipedia app on a mobile device/tablet. The image is shown as you are searching for a specific year and it is also displayed nicely at the top of the article. The requirement seems to be that the thumbnail is not resized (for instance to 110px); if the size is omitted the default thumbnail width is used (tested on Android only).
Here are the rules I apply when illustrating a year:

  • Every section should have at least one thumbnail
  • Within a section the thumbnails should be ordered based on date of Event/Birth/Death
  • Section Events:
    • Per image use this template: [[File:A|thumb|right|[[B]]: [[C]]]]
      • A = filename, B = date (if known), C = linked article
    • The first thumbnail only should be inserted directly after {{C1* year in topic}}
      • Note that the size is not explicitly set
    • If more Event images are added, insert them directly after the section header == Events ==. For instance, see 1631.
  • Section Births and Deaths:
    • Use this template: [[File:A|thumb|right|110px|[[B]]]]
      • A = filename, B = linked bio
    • Add thumbnails directly after the section header, for instance == Births ==. From 1800 onwards this rule does not apply.


Bio-images should be placed at the top of the Births- and Deaths-(sub)sections. Exception: the period 1700 – 1789. Reason: this period already has a lot of Births- and Deaths-entries but does not yet contain any subsections like January–March.
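The two image templates from the rules can be generated with a couple of helper functions. A small Python sketch; the names event_thumbnail and bio_thumbnail are hypothetical, and leaving out the date link when no date is known is an assumption about how "B = date (if known)" is handled.

```python
from typing import Optional


def event_thumbnail(filename: str, article: str, date: Optional[str] = None) -> str:
    """Events section: default thumbnail width, caption 'date: article'."""
    caption = f"[[{date}]]: [[{article}]]" if date else f"[[{article}]]"
    return f"[[File:{filename}|thumb|right|{caption}]]"


def bio_thumbnail(filename: str, bio: str) -> str:
    """Births and Deaths sections: fixed width of 110px, caption links to the bio."""
    return f"[[File:{filename}|thumb|right|110px|[[{bio}]]]]"
```

For example, bio_thumbnail("Portrait.jpg", "Person A") yields [[File:Portrait.jpg|thumb|right|110px|[[Person A]]]], ready to paste directly after the section header.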

Results

The following picture shows my progress.

Number of image files per Year page per section (excerpt)

Chaining back the Years (of death dates)

todo Holding Back the Years

Project statuses

From a sanity point of view I found it healthier to work on my projects simultaneously. To keep track of what I am doing, the following table shows the status per project.

Project name                        From   To         Current
Project Missing Medieval Link       101    1700       1425[9]
Project All Who Are Born Must Die   1250   1664[10]   1700
Project Illustrate the Years        1500   1800       1825
Peters01                            101    2000       979


BTW: on 26 July 2017 I made my 10,000th edit.
Stay tuned for updates!


  1. ^ I found out later another reason why there are far more Year-entries than DOY-entries: there are only 366 DOY-pages. As a result, members of WikiProject DOY devote a lot of time to 'DOY-trimming': the removal of non-notable persons in order to control the page sizes. There are far more Year-pages. Because of that, 'Year-page trimming' is not required, which explains the difference in quantity.
  2. ^ As already explained a major reason why missing entries exist is because of DOY-trimming. Because of that you would expect that many of the pre-medieval entries that I had added would also be reverted. However, this turned out not to be the case: almost all of them were accepted. It was not until the 1400s that my entries were occasionally deemed not notable enough.
  3. ^ With the creation of page Adalbert of Babenberg on 15 May 2017 the last 1000+ first date of death was finally pushed back to the first millennium (to 23 July 997) for the first time. Because of his erroneous date of death, Muhammad ibn Tughj al-Ikhshid had to be moved from June 24 to July 24, creating a new first 1000+ date. After extensive googling I found a new death date, so on 2 June 2017 the limit was pushed back again after updating the biography of Abu Isa al-Warraq.
  4. ^ I had to devise some nifty search methods in order to complete this task. Although one might expect that there would be sufficient 17th century bios around, it was quite a struggle to add the last missing ones. Completing the 16th century proved even harder, but on 7 June 2017 the last one was added, although I was not happy with its notability.
  5. ^ It also took quite some effort before I could fill in the last missing 16th century entry. It seemed that no one notable was born on 14 March between 1500 and 1599. However, I did find an article on the French wiki. So just for this purpose I used Google Translate to add a bio to the English wp. At last on 9 June 2017 00:55 CET the gap was finally closed with the addition of Eric of Lorraine, whose page was created just shortly before.
  6. ^ I also created this article to reach this goal. By the way: the exception is the leap day, 29 February.
  7. ^ The same proved to be impossible for births. Even after considerable effort there are still 28 days with only one pre-16th century entry :(.
  8. ^ After some research I was able to programme the benchmark data for 5 December 2016, the day I started adding entries.
  9. ^ I actually already checked the years up to 1651 (including checking for errors/discrepancies). I noticed, however, that many new notable missing links had emerged since I last checked (partly because of all the new entries originating from project AWABMD). That's why I decided to start over, this time from 1200 AD onwards. On 11 September 2017 I started over yet again from 600 AD to perform another check. Regarding bios with exact DoB/DoY the existence of the following categories is checked: [[Category:0000 births and [[Category:0000 deaths
  10. ^ From 1664 AD onwards the number of entries in the Births section violently decreases. I don't know why. Perhaps another private project.