User:ProteinBoxBot/Ideas

NOTE: This page is effectively read-only, except by the bot organizers. Please post any ideas and suggestions on the discussion page.

Development plan and future ideas

NOTE: The items below are thoughts for the future and are not included in the initial proposed specs.

See also: User:ProteinBoxBot/Project_proposals

Next up for implementation

  • per discussion on Commons, add PDB infobox to all PDB images (Example [1])
  • Run bot update
  • pilot project for {{SWL}}
    • find some well-known facts
    • encode them in Gene Wiki article using {{SWL}}
    • figure out synchronization with wikidraft.org/SMW, converting SWLs to real semantic links
    • OUTPUT: demonstrate real inline queries on wikidraft.org
    • OUTPUT: export from SMW to RDF
  • pilot collaboration with MODs (specifically ZFIN)
    • scan through all Gene Wiki pages for inline citations
    • retrieve MeSH terms and identify matching species (human, mouse, zebrafish, fly, rat, yeast)
    • generate four-column output file:
      1. WP article name
      2. cited PubMed ID
      3. matching organisms by MeSH
      4. sentence(s) referencing the publication
    • Notes
      • is there a MeSH-to-taxonomy mapping, or should we do free-text matching?
      • for pubs that reference multiple species, one line per species
      • for articles that reference a pub multiple times, concatenate sentences
  • redesign infobox to better handle linking to MODs (MGD, RGD, ZFIN, FlyBase, WormBase, etc.)
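
A minimal sketch of how the four-column output for the ZFIN pilot might be assembled, assuming the inline citations and their MeSH terms have already been extracted. The names `MESH_TO_SPECIES`, `build_rows`, and `to_tsv` are hypothetical, and the mapping is a hand-written guess at which MeSH descriptors mark each of the six species; it does not answer the open question of whether a proper MeSH-to-taxonomy mapping exists.

```python
import csv
import io
from collections import OrderedDict

# Assumption: a hand-written mapping from MeSH descriptor names to the six
# species of interest. A real MeSH-to-taxonomy mapping could replace this.
MESH_TO_SPECIES = {
    "Humans": "human",
    "Mice": "mouse",
    "Zebrafish": "zebrafish",
    "Drosophila": "fly",
    "Drosophila melanogaster": "fly",
    "Rats": "rat",
    "Yeasts": "yeast",
    "Saccharomyces cerevisiae": "yeast",
}


def build_rows(citations, mesh_by_pmid):
    """Build (article, pmid, organism, sentences) rows.

    citations: iterable of (article, pmid, sentence) tuples.
    mesh_by_pmid: dict mapping a PubMed ID to its list of MeSH terms.
    """
    # Concatenate sentences when an article cites the same pub several times.
    sentences = OrderedDict()
    for article, pmid, sentence in citations:
        sentences.setdefault((article, pmid), []).append(sentence)

    rows = []
    for (article, pmid), sents in sentences.items():
        # Collect matching species in order of appearance, without duplicates.
        species = []
        for term in mesh_by_pmid.get(pmid, []):
            name = MESH_TO_SPECIES.get(term)
            if name and name not in species:
                species.append(name)
        # One output line per species for pubs that reference several.
        for name in species:
            rows.append((article, pmid, name, " ".join(sents)))
    return rows


def to_tsv(rows):
    """Serialize rows as tab-separated text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["article", "pmid", "organism", "sentences"])
    writer.writerows(rows)
    return buf.getvalue()
```

In this sketch, a publication whose MeSH terms match none of the six species produces no output line; whether such pubs should instead be listed with an empty organism column is left open.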

Add additional links

  • GeneCards
  • nextbio.com?
  • wikiprofessional
  • wikigenes
  • WikiPathways.org
  • KEGG (also add wikilinks to other gene pages in the same KEGG pathways)
  • HPRD
  • link to Bioinformatic Harvester? -- would need community consensus...

Add/improve stub data (gene-specific)

  • change format of the references section to make it small-screen friendly ([2])
  • Add GeneRIFs and references from UniProt
  • import and display EC number
  • import and display protein domain information (through UniProt/Pfam/COGs). See previous discussion.
  • UniProt fields: PFAM, "Protein name", "Synonyms", FUNCTION, DOMAIN, SUBCELLULAR LOCATION, CATALYTIC ACTIVITY, COFACTOR, SUBUNIT, and WEB RESOURCE
  • Need to fix the db links for genome locations: the default for mouse has moved to mm9 (see User_talk:ProteinBoxBot#Mouse_location_links_lack_db_name_parameter). Either change the default in the template, or do a second-pass run on all infoboxes to add the parameter.
  • Load PPI from Entrez Gene (see User_talk:ProteinBoxBot/Archives/Archive1#Interaction_partners)
  • Add a note in infobox showing last-updated date
  • for GO section, add a small note with the evidence code and a link to the PubMed reference, if available
  • add image maps to thumbnail expression images so that tissues can be identified
  • add a banner from gene talk pages to portal page ([3])

Add/improve stub data (structure)

Technical bot stuff

Parallel efforts

  • upload all PDB images to Flickr? allows browsing of entire SCOP sub-trees. maybe geotag by location?
  • create a WP category for every GO category? (Piggy back with Enzyme class effort?)
  • expand to create pages for each disease using {{Infobox_Disease}}
  • second bot to wikilink common biology concepts, specifically on pages with PBB_Controls
  • change {{Gene}} templates to internal wikilinks
  • systematic creation of articles around protein domains (e.g., SMART database)
  • Mass autogeneration of high-quality PDB images

Other

  • look into HSPA1A and HSPA1B [7]
  • automated way to create this table
  • create a mac dashboard widget for the Gene Wiki?
  • charting library to combine bar chart with background histogram... (not really Gene Wiki related...)

Completed tasks

  • Upload snapshots of all PDB images -- create a gallery? Done!
  • get structure image from RCSB Done!
    • not sure yet how to get links from genes to PDB entries
    • RCSB public domain license is here or here.
  • modify orthologs box to automatically adjust rows and columns based on data Done! (I think)...
  • possibly add a comment to the protein box area saying that changes (to the protein box only) will be overwritten by the next bot update; this may save us from having to worry about manual edits -- AND/OR -- allow users to manually enter a comment in the protein box to prevent the bot from overwriting Done! through the PBB_Controls template.
  • use "Category: Human proteins" instead of simply "Proteins" Done!
  • add "Category: Gene from chromosome N" Done!
  • change spacing pattern (e.g., [8]) Fixed when infoboxes moved to template pages

Obsolete tasks

  • second bot to create redirects from gene aliases Removed! better for a human to do
  • add a comment <!--Add additional text here--> to make it clear where people can/should edit... Removed! better constrain areas for PBB edits
  • changing redirects so that primary title is HGNC name
    • maybe just flag these for manual inspection Removed! A human should handle anything with regard to page moves.
  • adding links to page (e.g., "ITK") from alternate symbols (e.g., EMT; LYK; PSCTK2; MGC126257; MGC126258) and full gene name (e.g., IL2-inducible T-cell kinase)
    • is redirecting from alternate symbols really a good idea? How would one list ITK on the EMT disambiguation page? Removed! Better that a human does this.
  • add an "update_PDB_image" tag in PBB_Controls so that people can turn off automated edits for that part of the infobox specifically -- or, don't make any change to an existing PDB image, only add one if an image didn't previously exist Removed! Already default behavior