User talk:CheMoBot/Data
General
- We could build the bot so that it
- reports changes on-wiki.
- autoreverts changes of values (this may meet resistance; we are the encyclopedia that anyone can edit).
- autorepairs changes (e.g. checks the fields 5 minutes after the 'offending' edit has been performed, and resets changed fields back to the verified value).
--Dirk Beetstra T C 18:15, 1 July 2008 (UTC)
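The autorepair option above could be sketched roughly as follows. This is only an illustration: the function name `repair_fields` and the plain dictionaries of field values are hypothetical, and fetching or saving the wiki page is not shown.

```python
def repair_fields(verified, current):
    """Reset changed fields to their verified values.

    Both arguments are dicts of field name -> value (a hypothetical format).
    Returns (repaired, changed), where `changed` maps each repaired field
    to a (found_value, verified_value) pair. Fields that have no verified
    value are left untouched.
    """
    repaired = dict(current)
    changed = {}
    for field, good_value in verified.items():
        found = current.get(field)
        if found != good_value:
            changed[field] = (found, good_value)
            repaired[field] = good_value
    return repaired, changed


# Illustrative values only.
verified = {"MeltingPt": "0", "BoilingPt": "100"}
current = {"MeltingPt": "0", "BoilingPt": "90", "Appearance": "colourless liquid"}
repaired, changed = repair_fields(verified, current)
```

The `changed` mapping could equally feed the first option (reporting on-wiki) without touching the page at all.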
Database format
I started this to have some fields to work with, but this is not a 'handy format'. Some suggestions to discuss:
csv
'Comma separated', which is similar to what it is now. A line would look like:
Water=Water,0,100
Points:
- Easy to read when there are not too many fields (but we have 50 fields).
- Page would be huge in the end (for 4000+ compounds).
- Not too sensitive to errors, one missing field on a compound would render only that line useless.
- Relatively easy to update: many database programs can provide this output, and a simple find and replace can produce the proper format.
- Only one (or a few) page(s) to render.
--Dirk Beetstra T C 18:15, 1 July 2008 (UTC)
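To illustrate the point about error sensitivity, here is a minimal sketch of how a bot might read such lines. The column order in `FIELDS` is an assumption (the field names are borrowed from the other examples on this page), and `parse_line` and `parse_database` are hypothetical names.

```python
import csv

# Assumed column order; the real database would have about 50 fields.
FIELDS = ["IUPACName", "MeltingPt", "BoilingPt"]


def parse_line(line):
    """Parse 'Page=value1,value2,...' into (page, {field: value})."""
    page, sep, rest = line.partition("=")
    if not sep:
        raise ValueError("missing '=' in line")
    values = next(csv.reader([rest]))
    if len(values) != len(FIELDS):
        raise ValueError(f"{page}: expected {len(FIELDS)} fields, got {len(values)}")
    return page, dict(zip(FIELDS, values))


def parse_database(text):
    """Parse every line; a broken line is set aside, the rest stay usable."""
    good, bad = {}, []
    for line in text.splitlines():
        if not line.strip():
            continue
        try:
            page, record = parse_line(line)
        except ValueError:
            bad.append(line)
        else:
            good[page] = record
    return good, bad


# The second line is deliberately one field short.
good, bad = parse_database("Water=Water,0,100\nEthanol=Ethanol,-114\n")
```

Here the incomplete second line ends up in `bad`, while the Water record is still parsed normally.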
xml
XML is another format that is easy for a computer to read:
<?xml version="1.0" encoding="utf-8"?>
<compounds>
  <compound IUPACName="Water" MeltingPt="0" BoilingPt="100"/>
</compounds>
Points:
- Easy to read, even with many fields as every one is named
- MUCH bigger than the csv above.
- Per compound not sensitive to errors, though some typos (especially in the tags) may render the WHOLE database useless
- Easy to update, many database programs can provide this output
- Only one (or a few) page(s) to render.
--Dirk Beetstra T C 18:15, 1 July 2008 (UTC)
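The point about typos in the tags can be shown with a short sketch using Python's standard `xml.etree.ElementTree` parser. The helper names `load_compounds` and `try_load` are hypothetical, and the broken document is constructed here purely for illustration.

```python
import xml.etree.ElementTree as ET

GOOD = (
    b'<?xml version="1.0" encoding="utf-8"?>'
    b'<compounds>'
    b'<compound IUPACName="Water" MeltingPt="0" BoilingPt="100"/>'
    b'</compounds>'
)
# One mistyped closing tag somewhere in the file.
BROKEN = GOOD.replace(b"</compounds>", b"</compound>")


def load_compounds(xml_bytes):
    """Return {IUPACName: {attribute: value}} for every <compound> element."""
    root = ET.fromstring(xml_bytes)
    return {c.get("IUPACName"): dict(c.attrib) for c in root.iter("compound")}


def try_load(xml_bytes):
    """Return the parsed compounds, or None if the document is not well-formed."""
    try:
        return load_compounds(xml_bytes)
    except ET.ParseError:
        return None


compounds = load_compounds(GOOD)
broken_result = try_load(BROKEN)  # None: no compound at all can be read
```

With a strict parser like this one, a single bad tag means no record can be read, whereas a wrong attribute value would only affect its own compound.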
data-sub-page
Create a sub-page for each compound in some easy to read/edit format, and use that as the base for data. So the subpage for water (molecule) could be water (molecule)/Verified, which could contain:
IUPACName=Water
MeltingPt=0
BoilingPt=100
Points:
- Easy to read, even when many fields are there
- Small: if a field in Water (molecule) gets edited, the bot only needs to read this data and check it.
- Really low sensitivity to errors
- Difficult to update: a bot would have to go and update every subpage, which would be impossible when the pages get 'protected'.
- Bot has to render on every edit, but the data throughput would be small, so that can be done quickly.
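A rough sketch of the per-edit check with this layout, assuming the subpage holds whitespace-separated name=value pairs whose values contain no spaces. The names `verified_title`, `parse_verified` and `changed_fields` are hypothetical, and reading the pages from the wiki is left out.

```python
def verified_title(page_title):
    """Title of the data subpage for a compound page."""
    return page_title + "/Verified"


def parse_verified(text):
    """Parse whitespace-separated name=value pairs into a dict."""
    fields = {}
    for token in text.split():
        name, sep, value = token.partition("=")
        if sep and name:
            fields[name] = value
    return fields


def changed_fields(verified_text, current):
    """Return {field: verified_value} for fields that differ in the article."""
    verified = parse_verified(verified_text)
    return {n: v for n, v in verified.items() if current.get(n) != v}


title = verified_title("Water (molecule)")
diff = changed_fields(
    "IUPACName=Water MeltingPt=0 BoilingPt=100",
    {"IUPACName": "Water", "MeltingPt": "0", "BoilingPt": "99"},
)
```

Only the one small subpage is read per edit, which is where the low data throughput comes from.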