Wikipedia:India Education Program/Analysis/Quantitative Analysis


This page outlines the quantitative post-mortem analysis of the India Education Program. It supplements Tory Read's work, which will be completed soon.

Executive Summary

Of the roughly 1000 students enrolled in the India Education Program, about 66% made edits to the article namespace. On average, these students added 1057 words each, compared with the roughly 1800 words added per student in the Public Policy Initiative. However, much of the content was of poor quality or contained copyright violations, so only about 21% of it survived cleanup by the community. More content may be removed, as cleanup is still ongoing.

There were some important differences between groups of students. First, although their numbers were still rather poor, students from the Symbiosis School of Economics did considerably better than those at the College of Engineering Pune on several of the measures outlined below. Second, students with non-zero content survival (about 40% of those who made edits) also added much more content in the first place, which may indicate that they took the assignment more seriously.


The process followed to perform this analysis is as follows:

  1. Using the tables at Wikipedia:India_Education_Program/Students, we created a master list of all students that were part of the India Education Program.
  2. Next, we attached information about what course and university they were enrolled with. In some cases, we had to group courses together, because students were enrolled in multiple courses.
  3. Using the Wikipedia API and the WikiTrust API, we looked up the following for every student and every edit they made to any namespace:
    1. Page title
    2. Namespace
    3. Words added by the student
    4. Words deleted by the student
    5. Words left on the current revision (only for article namespace edits)
  4. We used the data above to look at program-level, university-level and course-level trends.
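As a rough illustration of step 4, the per-edit fields listed above could be rolled up into per-student totals along these lines. This is a minimal sketch, not the code used for the analysis: the `Edit` record, the `summarize` helper and the sample rows are all hypothetical, and the only outside fact assumed is that MediaWiki numbers the article namespace 0.

```python
from collections import defaultdict
from dataclasses import dataclass

ARTICLE_NAMESPACE = 0  # MediaWiki's main (article) namespace


@dataclass
class Edit:
    """One edit record; field names are hypothetical, mirroring the list above."""
    student: str
    page_title: str
    namespace: int
    words_added: int
    words_deleted: int
    words_surviving: int  # only meaningful for article-namespace edits


def summarize(edits):
    """Roll per-edit records up into per-student article-namespace totals."""
    totals = defaultdict(lambda: {"articles": set(), "added": 0, "surviving": 0})
    for e in edits:
        if e.namespace != ARTICLE_NAMESPACE:
            continue  # sandbox, talk and user-page edits are not counted here
        t = totals[e.student]
        t["articles"].add(e.page_title)
        t["added"] += e.words_added
        t["surviving"] += e.words_surviving
    return {
        s: {"articles": len(t["articles"]), "added": t["added"], "surviving": t["surviving"]}
        for s, t in totals.items()
    }


# Made-up sample data, for illustration only.
sample = [
    Edit("StudentA", "Binary tree", 0, 300, 0, 120),
    Edit("StudentA", "Heap (data structure)", 0, 200, 10, 0),
    Edit("StudentA", "User:StudentA/sandbox", 2, 500, 0, 0),
    Edit("StudentB", "Binary tree", 0, 150, 0, 0),
]
summary = summarize(sample)
```

University-level and course-level figures would then follow by grouping these per-student totals by the enrollment information attached in step 2.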

Analysis Summary

Program-level analysis


  • Registered students: 1014
  • Registered students who made edits in the article namespace: 665 (66%)[1]

Content Survival

  • Gross content added by all students: 702961 words
    • Per student: 1057 words
  • Net content added by students that survived cleanup: 149978 words.[2] As a basis of comparison, students of the Public Policy Initiative added 1.5 million words over two terms.
    • Per student: 226 words (roughly 40% of the average Wikipedia article[3]). Students of the Public Policy Initiative added 1,838 words each (roughly 3 articles) over two semesters.[4]
    • Only 21% of total content has survived cleanup
    • For 40% of the students (i.e. 266 students), some content has survived cleanup
    • About 12% of the removal was performed by the students themselves, possibly after the copyvio issue was addressed in the classroom
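The headline figures above can be re-derived from the reported totals. The short sketch below uses only numbers stated on this page. The variable names are illustrative, and the choice of denominator is an inference: the per-student figures match division by the 665 students who edited articles, not by all 1014 registered students.

```python
registered = 1014      # registered students
editors = 665          # students with article-namespace edits
gross_words = 702961   # words added before cleanup
net_words = 149978     # words surviving cleanup
survivors = 266        # students with non-zero surviving content

engagement = editors / registered           # share of students who edited articles
gross_per_student = gross_words / editors   # words added per editing student
net_per_student = net_words / editors       # surviving words per editing student
survival_rate = net_words / gross_words     # share of added content that survived
survivor_share = survivors / editors        # share of editors with surviving content

print(f"engagement:        {engagement:.0%}")
print(f"gross per student: {gross_per_student:.0f} words")
print(f"net per student:   {net_per_student:.0f} words")
print(f"content survival:  {survival_rate:.0%}")
print(f"survivor share:    {survivor_share:.0%}")
```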

Survivors vs. the rest

There is an interesting difference between students with zero and non-zero content survival.

  • The zero content survival group added 573 words per student (initially, before deletion) on average.
  • The non-zero group added an average of 1770 words, roughly three times as much as the other group.
    • About 564 words (roughly one article length) stayed after deletion for this group.
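As a consistency check, the two group averages can be recombined into program-wide figures. The sketch below assumes the group sizes implied by the program-level numbers (266 students with surviving content out of 665 editors); the variable names are illustrative. Small gaps against the reported totals are expected, since the group means are rounded.

```python
editors = 665
survivors = 266                        # students with non-zero content survival
non_survivors = editors - survivors    # remaining editing students

mean_added_survivors = 1770            # words added per student, non-zero group
mean_added_non_survivors = 573         # words added per student, zero group
mean_surviving_survivors = 564         # words left per student, non-zero group

# Weighted mean of words added across both groups (reported overall: 1057).
combined_mean_added = (
    survivors * mean_added_survivors + non_survivors * mean_added_non_survivors
) / editors

# Total surviving words implied by the non-zero group (reported: 149978).
implied_net_total = survivors * mean_surviving_survivors

# How much more the non-zero group added in the first place.
ratio = mean_added_survivors / mean_added_non_survivors
```

Under these assumptions the recombined average lands within a few words of the reported 1057, and the ratio between the two groups comes out slightly above three.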

This suggests that the students who put in more work also achieved much better results.

University-level analysis

Overall, the program worked considerably better at the Symbiosis School of Economics (SSE) than at the College of Engineering Pune (COEP) on several measures:

  • Better student engagement: 77% of SSE students made edits to the article namespace, vs. 62% for COEP.
  • More articles edited: SSE students edited 2.87 articles each vs. 2.16 for COEP (though one should note that 10-15% of SSE students were enrolled in multiple classes).
  • More words added: On average, SSE students added 1824 words each initially (i.e. before deletion) vs. 735 each for COEP (about 2.5 times as many).
  • More words stayed:
    • For SSE students, about 535 words each survived cleanup vs. only 96 for COEP.
    • For SSE, 29% of total content survived cleanup, vs. 13% for COEP.
    • For SSE, 49% (almost half) of the students ended up with non-zero content that survived cleanup vs. only 36% for COEP.
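The per-student word counts and the survival percentages above can be cross-checked against each other. This is a minimal sketch using only the figures reported in this section; the dictionary layout and names are illustrative.

```python
# Per-student averages reported above (words).
universities = {
    "SSE": {"added": 1824, "surviving": 535},
    "COEP": {"added": 735, "surviving": 96},
}

# Content survival: surviving words as a share of words initially added.
survival = {name: u["surviving"] / u["added"] for name, u in universities.items()}

# How much more SSE students added, and retained, than COEP students.
added_ratio = universities["SSE"]["added"] / universities["COEP"]["added"]
surviving_ratio = universities["SSE"]["surviving"] / universities["COEP"]["surviving"]

for name, rate in survival.items():
    print(f"{name}: {rate:.0%} of added content survived")
```

The survival rates reproduce the 29% and 13% quoted above. The gap in surviving words per student is wider than the gap in words added, because SSE students both added more and had more of it retained.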

These findings are consistent with the WMF India consultants' assessment, which rates SSE favorably with regard to addressing copyright violations as well as Campus Ambassador and professor engagement.

A 10-student Master's course at SNDT Women's University, MSc (Communication Media for Children), was also part of the program. Only 12% of the content its students added survived cleanup, and the amount of content added was about half of the program average.

Course-level analysis

[Figure: IEP Analysis, mean article count per student, by course]
[Figure: IEP Analysis, percentage of content survival, by course. Note that the content survival percentage is off for COEP Y1 MDCG (Machine Drawing and Computer Graphics) due to student count estimations; see note 6.]


College of Engineering Pune (COEP)

1. Computational Methods in Engineering was the worst-performing course in relative terms.

  • Only 20 words per student survived cleanup, i.e. 7% of what was added.
  • Due to small class size (18, with only 10 editing articles), the community workload was minimal.
  • Interestingly, this Master's-level course performed much worse than the other two Master's courses in the program.

2. Digital Signal Processing was the only course with first year students.

  • Student engagement was very low: only 3 of 36 students made article edits.
  • There was very little activity: these students added about 400 words total.

3. Data Structures and Algorithms was a second year course, with 140+ students. The professor had edited Wikipedia before joining the program.

  • Engagement was very good: 85% of students made edits to the article namespace.
  • Students edited 4.52 articles on average, almost twice the figure for the program overall. Words added per student were similar to the overall average, which means their edits were smaller and more dispersed.
  • Only 15% of the content survived. The surviving content came from about one-fourth of the students.[5]
  • Due to the sheer volume of content, this course may have caused the most community workload.

4. Machine Drawing and Computer Graphics was a second year course with about 180 students.[6]

  • Engagement levels were low; only about half the class made article edits.
  • The amount of content added was about 700 words per student (30% less than average), and edits were spread across roughly 1.2 articles per student (about half the average).
  • Content survival was 12%.

5. Solid State Devices and Linear Circuits Laboratory was a second year course with about 90 students.

  • Engagement was low, with only 45% of students making article space edits and about 420 words per student added before deletion.
  • Content survival was 12%.
  • 53% of the students had non-zero content that survived cleanup, which is the best ratio amongst all COEP courses.

6. Computer Organization and Advanced Microprocessors was a third year course with about 90 students.

  • Most figures were about on par with the COEP average: engagement was 64%, content survival was 12%.

7. Year 4 courses: Object Oriented Modeling and Design (16 students) & Software Testing and Quality Assurance (80 students)[7]

  • Initial content added per student, at 1412 words, was much higher than in any other COEP course.
  • Content survival was still poor at 12%.


Symbiosis School of Economics (SSE)

1. All undergraduate courses and the Corporate Social Responsibility Certificate course[8]

  • Student engagement was high, with 77% of students making article edits. They added about 1923 words each.
  • Students edited about 3.21 articles each (though some of this can be attributed to 15-20% of SSE students being enrolled in multiple classes).
  • Only 46% of students ended up with non-zero surviving content, but 29% of total content survived.
  • The high amount of content added also meant that community workload would have been fairly high.

2. Macroeconomics was a Master's-level course with 49 students.

  • Overall, it was perhaps the best performing course of the program.
  • Engagement was high, with 77% of students making article edits and adding 1398 words each.
  • 29% of the content survived and, most importantly, 62% of the students ended up with non-zero content that survived cleanup.


  1. ^ Some students worked only in sandboxes, and were instructed not to move their content to Wikipedia due to the program being halted.
  2. ^ It's important to note that further cleanup may still be needed. If that is the case, this figure (and several other figures represented here) will change.
  3. ^ The average Wikipedia article is 590 words (Wikipedia:Size_comparisons)
  4. ^ It's important to highlight some key differences: the Public Policy Initiative targeted a specific kind of coursework and was active in the US, where English is the primary language for most students.
  5. ^ Some contributions were moved to WikiBooks, and are not accounted for.
  6. ^ For the Machine Drawing and Computer Graphics course, a significant number of edits (88) came from a single IP address. To account for these, we assumed one IP = one student. This resulted in an incorrect student count for the class, but the ratios should still be fairly accurate.
  7. ^ Due to overlap of students, both of the Year 4 courses at COEP had to be combined for the analysis.
  8. ^ Because 10-15% of students had multiple course registrations, about 200 undergraduate students at SSE were grouped together.

Future Work

  1. A survey of students involved with the program, both in India and in the US/Canada, was recently concluded. Findings from the survey will also inform the analysis here.
  2. We will be working on a similar quantitative analysis for the US/Canada students who were part of the Wikipedia Education Program in Fall 2011. This will also help identify differences between the two programs.