User:Sj/wikiserve

wikiserver, n.: a proposed server hosting archives of publicly available wiki projects.

This would include:

Wikimedia
- dumps over time
- a [mostly-live] mirror of the current projects
Other wiki projects (Wikia, Wikitravel)
- dumps over time
- a mirror of the current projects
Statistics and indices
- a copy of WMF statistics
- subsets of data, randomized or otherwise, that could be reused
- scripts and small services [available to copy and run on your own site; not necessarily run in place]
Computational clusters
- for running large-scale (not necessarily interactive) scripts over large datasets, to generate secondary datasets
Bandwidth
- primarily for uploading new data and downloading named datasets
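
As a rough illustration of the "subsets of data, randomized or otherwise" item above, here is a minimal sketch of drawing a fixed-size random subset from a stream too large to hold in memory. It uses reservoir sampling; the function name `reservoir_sample` and the fixed default seed are assumptions made for this example, not part of any existing tool.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def reservoir_sample(items: Iterable[T], k: int, seed: int = 0) -> List[T]:
    """Return up to k items chosen uniformly at random from a stream.

    Hypothetical helper for illustration. The seed is fixed by default so
    that a published subset can be regenerated from the same input.
    """
    rng = random.Random(seed)
    sample: List[T] = []
    for i, item in enumerate(items):
        if i < k:
            # Fill the reservoir with the first k items.
            sample.append(item)
        else:
            # Replace an existing entry with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = item
    return sample
```

Because the stream is read once and only k items are kept, the same approach would work on a list of page titles piped out of a full dump.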

Specific projects that could use this:

HistoryFlow - running HF regularly to generate full visualizations of article history, and publishing thumbnails of these at a guessable URL so readers can see them alongside article titles
Dumpslicing - generating specific subsets, such as for DVDs, offline readers, mobile clients, or stats generators; also producing indices and other summaries for use in aggregate research
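
To make the dumpslicing idea concrete, here is a minimal sketch that streams a MediaWiki-style XML export and keeps only the pages whose titles pass a filter. The function names `slice_dump` and `local_name` and the tiny `SAMPLE` document are assumptions for illustration; a real dump carries an XML namespace and many more fields, which the sketch handles only by ignoring namespace prefixes.

```python
import io
import xml.etree.ElementTree as ET
from typing import BinaryIO, Callable, Iterator, Tuple


def local_name(tag: str) -> str:
    """Strip an optional '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def slice_dump(
    stream: BinaryIO, keep: Callable[[str], bool]
) -> Iterator[Tuple[str, str]]:
    """Yield (title, text) for each <page> whose title satisfies keep().

    Hypothetical helper: if a page holds several revisions, the text of
    the last one in document order is returned.
    """
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        title, text = "", ""
        for child in elem.iter():
            name = local_name(child.tag)
            if name == "title":
                title = child.text or ""
            elif name == "text":
                text = child.text or ""
        if keep(title):
            yield title, text
        # Drop the page's children so memory use stays small on large dumps.
        elem.clear()


# Made-up two-page sample, standing in for a real dump file.
SAMPLE = b"""<mediawiki>
  <page><title>Apache HTTP Server</title>
    <revision><text>A web server.</text></revision></page>
  <page><title>Talk:Apache HTTP Server</title>
    <revision><text>Discussion.</text></revision></page>
</mediawiki>"""

if __name__ == "__main__":
    # Keep only titles without a namespace prefix such as "Talk:".
    for title, text in slice_dump(io.BytesIO(SAMPLE), lambda t: ":" not in t):
        print(title, len(text))
```

The filter is passed in as a function so the same loop could serve a DVD selection, a mobile subset, or a stats job by swapping the predicate.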

Research that could use this:

What sorts of open-source / SourceForge projects get an article? What sorts of corporations do?
... (add yours here!)

Toolservers could include a wikiserver, though the ideal wikiserver, in my eyes, would hold all publicly available versioned and collaborative data. It would want some 15-20 TB of disk and a few dozen CPUs to generate the known secondary datasets that researchers are interested in.

Interested people and groups?

- manal, cci, eric vh, mako
- yochai, judith, aaron
- large DB analyses: BC... who else?

Other thoughts: design levers, classifying cases, biology v genetics and memetics, amt-semiauto.