Talk:DBpedia
| This page was nominated for deletion on 15 November 2009. The result of the discussion was Withdrawn. |
| WikiProject Computing | (Rated Start-class, Low-importance) |
| WikiProject Wikipedia | (Rated Start-class, Low-importance) |
Members of the DBpedia Team
To clear the WP:COI tag, here is a list of people who work directly on the DBpedia project. We are scientists and value a neutral and scientific point of view, but we will not do any more editing on the main article.
- Judging from henrik's notes it seems acceptable if we fix small errors and update things such as release numbers and item / link counts.
- Soeren1611 (talk)
- ChrisBizer (talk)
- Chrisahn (talk)
- SebastianHellmann (talk)
- Jens_Lehmann (talk)
- KingsleyIdehen (talk)
- Beckr (talk)
- Echera (talk)
- maybe more, please add
public sparql endpoint
The "public sparql endpoint" link (http://dbpedia.org/sparql) is 404. Espertus (talk) 18:14, 7 June 2008 (UTC)
- Usually works, sometimes it doesn't. Please send such requests to dbpedia-discussion. Chrisahn (talk) 15:33, 22 July 2009 (UTC)
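For anyone who wants to check the endpoint themselves, here is a minimal sketch in Python. It assumes the endpoint follows the standard SPARQL protocol (a `query` parameter over HTTP GET, JSON results requested via the `Accept` header). The helper names and the smoke-test query are made up for illustration and are not taken from the DBpedia documentation.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "http://dbpedia.org/sparql"  # URL quoted in the report above

def build_query_url(query, endpoint=ENDPOINT):
    """Encode a SPARQL query as an HTTP GET URL (SPARQL protocol 'query' parameter)."""
    return endpoint + "?" + urllib.parse.urlencode({"query": query})

def run_query(query, endpoint=ENDPOINT, timeout=10):
    """Send the query and return parsed JSON; raises urllib.error.HTTPError on e.g. a 404."""
    request = urllib.request.Request(
        build_query_url(query, endpoint),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)

# Hypothetical smoke-test query: ask for a single arbitrary triple.
SMOKE_TEST = "SELECT * WHERE { ?s ?p ?o } LIMIT 1"
```

Calling `run_query(SMOKE_TEST)` needs network access; an exception there would reproduce the kind of outage reported above.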
Deleted references
A user has deleted these references from the External Links section with the edit summary "clean up - any relevant to article should be used as references IN the article - most are papers/talks by DBpedia's own founders", and didn't copy them here. I shall do so. I've replaced the other 2 links as obviously relevant. -- Quiddity (talk) 19:11, 7 November 2009 (UTC)
- Christian Bizer et al.: Interlinking Open Data on the Web (Poster). Poster at ESWC 2007.
- Christian Bizer et al.: DBpedia - Querying Wikipedia like a Database. Developers track presentation at WWW2007.
- Christian Becker, Christian Bizer: DBpedia Mobile – A Location-Aware Semantic Web Client. Semantic Web Challenge at ISWC 2008, Karlsruhe, Germany, October 2008.
- Sören Auer, Jens Lehmann: What have Innsbruck and Leipzig in common? Extracting Semantics from Wiki Content. Paper at ESWC 2007
- Sören Auer, Christian Bizer, Georgi Kobilarov, Jens Lehmann, Richard Cyganiak, Zachary Ives: DBpedia: A Nucleus for a Web of Open Data. 6th International Semantic Web Conference (ISWC 2007), Busan, Korea, November 2007.
- Fabian Martin Suchanek, Gjergji Kasneci, Gerhard Weikum: Yago: A Core of Semantic Knowledge - Unifying WordNet and Wikipedia. Paper at WWW2007.
"What have Innsbruck and Leipzig in common?" ranks 40th among the most cited computer science papers of 2007, according to [citeseer]. The last one, by Suchanek, is a paper at WWW, a very good conference with an acceptance rate of 10-15%. It has therefore been peer reviewed. SebastianHellmann (talk) 14:18, 12 November 2009 (UTC)
Notability
'article needs references' tag
Merge with Semantic Web
Improvements
This article is in desperate need of a cleanup. But right now the article accomplishes little else but hammering home how notable it is. 90% of the article is a list of numbers with no context, mentions of other "interlinked" datasets with no mention of who is involved (an uneducated reader might think the CIA and US Census Bureau are using the DBpedia data), plus TBL's comment which doesn't really tell me anything (is DBpedia famous the same way TBL is famous?). Since you guys claim to know something about DBpedia, can you improve the article by answering some of these questions?
- Who started the DBpedia project, and who maintains it?
- How often is the dataset updated, and who does it?
- What is the process through which the dataset is built (algorithms & software used, etc.)
- How is DBpedia actually used? (Beyond "NYT includes links" and "BBC uses it to organize stuff".. I have no idea what that even means!)
- In fact, how is OpenCalais even based on the NYT? I scanned the references and don't see any connection.--Jonovision (talk) 11:14, 16 November 2009 (UTC) (Copied from Wikipedia:Articles for deletion/DBpedia by Chrisahn (talk) 17:28, 16 November 2009 (UTC))
- I guess much of that is a result of AnmaFinotera's attempt to get this article deleted for lacking notability. Hopefully it can be edited into a much better article (and we'd love you guys who are actually involved to help) now that that's out of the way. I think that a simple paragraph or two of description and background would be a good start. henrik•talk 17:48, 16 November 2009 (UTC)
Dear Henrik, thank you for moving the discussion towards improving the article by raising these questions. Please find below pointers and initial answers to your questions. Please also note that I'm involved in the DBpedia project and my comments might therefore be biased (as AnmaFinotera has pointed out).
1. Who started the DBpedia project, and who maintains it?
The project was started by researchers and students at Freie Universität Berlin and Universität Leipzig who implement the code that extracts the data from Wikipedia. The extracted data is hosted by OpenLink, a company that develops RDF databases. Please see http://wiki.dbpedia.org/Team for the complete list of participants.
2. How often is the dataset updated, and who does it?
The dataset is currently updated about every six months. The updates are done by project members from Freie Universität Berlin and Universität Leipzig. Please refer to http://wiki.dbpedia.org/ChangeLog for the history of releases.
3. What is the process through which the dataset is built (algorithms & software used, etc.)
The dataset is built using a Wikipedia-specific data extraction framework that has been developed by the DBpedia project. The framework extracts different types of structured information from Wikipedia and represents the extraction results using the RDF data model. Please refer to http://wiki.dbpedia.org/Documentation for more information about the DBpedia information extraction framework. The process is as follows: 1. We load the Wikipedia dumps into a local database. 2. We run the extraction against this database. 3. We send the extracted data to OpenLink for hosting.
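To give readers a feel for step 2, here is a toy sketch in Python. It is not the DBpedia extraction framework: the regular expression, the function names and the sample wikitext are invented for illustration, and the `http://dbpedia.org/property/` namespace is an assumption of this sketch. It only shows the general idea of turning infobox fields into RDF triples (serialised as N-Triples).

```python
import re

RESOURCE_NS = "http://dbpedia.org/resource/"
PROPERTY_NS = "http://dbpedia.org/property/"  # assumed namespace for this sketch

# Matches lines such as "| country = Germany" inside an infobox template.
INFOBOX_FIELD = re.compile(r"^\s*\|\s*([A-Za-z_][\w ]*?)\s*=\s*(.+?)\s*$", re.MULTILINE)

def extract_infobox(wikitext):
    """Return the key/value pairs found in infobox-style template lines."""
    return {key.replace(" ", "_"): value for key, value in INFOBOX_FIELD.findall(wikitext)}

def to_ntriples(title, fields):
    """Serialise the fields as N-Triples lines with plain string literals."""
    subject = "<" + RESOURCE_NS + title.replace(" ", "_") + ">"
    lines = []
    for key, value in sorted(fields.items()):
        # Escape backslashes and double quotes so the literal stays well-formed.
        literal = value.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'{subject} <{PROPERTY_NS}{key}> "{literal}" .')
    return lines

# Hypothetical, heavily simplified article source.
SAMPLE = """{{Infobox settlement
| name = Leipzig
| country = Germany
}}"""

triples = to_ntriples("Leipzig", extract_infobox(SAMPLE))
```

Steps 1 and 3 (loading the dumps and handing the output over for hosting) are deliberately left out of the sketch.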
4. How is DBpedia actually used? (Beyond "NYT includes links" and "BBC uses it to organize stuff".. I have no idea what that even means!)
There are three main uses of DBpedia:
4.1. Alternative Wikipedia search interfaces. As DBpedia makes structured information within Wikipedia articles more easily accessible, the DBpedia dataset can be used as a basis for implementing alternative Wikipedia search interfaces that allow users to ask complex queries against Wikipedia content. Examples of such applications that have been developed by members of the DBpedia community are found at http://wiki.dbpedia.org/Applications.
4.2. Knowledge base. Various research projects within the Knowledge Representation and Semantic Web research community use the DBpedia dataset as a knowledge base for experimentation and demonstration http://wiki.dbpedia.org/UseCases#h19-6. We don't have a list of all these projects, but regularly hear of DBpedia being used at conferences like the World Wide Web conference or the Semantic Web conference.
4.3. Interlinking Hub for the emerging Web of Data. This use case might appear a bit strange to people not involved in Linked Data and the Semantic Web. The idea of the Semantic Web is that different parties publish structured data on the Web and set data links between records in the different datasets describing the same entity or related entities. These data links can then be used by client applications like Linked Data browsers or search engines to retrieve data from various sources about an entity and integrate the data afterwards. As DBpedia offers data about a wide range of topics, many other data providers have started to set data links pointing at DBpedia. This makes DBpedia an interlinking hub of the emerging Web of Data, as client applications can follow a link to DBpedia and then navigate into various other datasets about the same topic. Please refer to http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData for more information about Linked Data and the Web of Data.
5. In fact, how is OpenCalais even based on the NYT?
OpenCalais is not based on the New York Times. The connection between both projects is that both sets of data have links pointing at DBpedia. This means for example that the NYT says that some news articles are about a specific politician (identified with a DBpedia URI) and OpenCalais says that some other articles are about the same politician by also using the DBpedia URI for annotating the articles. As both data sources use the same URI to identify the politician, a client application can now retrieve articles from both sources and know that all articles are about the same guy.
ChrisBizer (talk) 10:13, 17 November 2009 (CET)
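The point in answer 5 can be sketched in a few lines of Python. The records, article URLs and the politician URI below are hypothetical placeholders, not real NYT or OpenCalais data. The sketch only illustrates that two independent sources become joinable once they annotate their articles with the same DBpedia URI.

```python
from collections import defaultdict

# Hypothetical annotation records: (source, article URL, subject URI).
NYT = [
    ("NYT", "http://example.org/nyt/article-1", "http://dbpedia.org/resource/Example_Politician"),
]
CALAIS = [
    ("OpenCalais", "http://example.org/calais/doc-7", "http://dbpedia.org/resource/Example_Politician"),
    ("OpenCalais", "http://example.org/calais/doc-8", "http://dbpedia.org/resource/Another_Person"),
]

def group_by_subject(*datasets):
    """Merge annotations from several sources, keyed by the shared subject URI."""
    grouped = defaultdict(list)
    for dataset in datasets:
        for source, article, subject in dataset:
            grouped[subject].append((source, article))
    return dict(grouped)

merged = group_by_subject(NYT, CALAIS)
```

No string matching on names is needed: the shared URI is the join key, which is why a common identifier scheme matters here.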
6. Re. "BBC uses it to organize stuff". As described in the ESWC2009 conference paper (see http://derivadow.files.wordpress.com/2009/06/eswc2009-bbc-dbpedia-2.pdf - apologies for linking to a file on my blog, but I can only find the abstract on the conference site), the BBC is starting to use DBpedia URIs as a controlled vocabulary, i.e. 'tagging' BBC content with DBpedia URIs (the Wikipedia text provides the evidence set for what the tag means) and then aggregating content around those tags.
You can see an example of this at bbc.co.uk/wildlifefinder (e.g. http://www.bbc.co.uk/nature/species/Polar_bear - apologies for those outside of the UK, the video is GeoIP restricted). Note the URL slug - it's the same as used on Wikipedia/DBpedia. The reason the news stories, clips etc. are there is because they've been tagged with this URI http://dbpedia.org/page/Polar_bear. ( Actually, they've been tagged with http://dbpedia.org/resource/Polar_bear, which refers to the entity/resource. The "page" URI is an HTML representation of an RDF description of that entity. :-) MacTed (talk) 20:48, 24 November 2009 (UTC) )
Derivadow (talk) —Preceding undated comment added 11:13, 17 November 2009 (UTC).
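MacTed's distinction between the "resource" URI (the entity) and the "page" URI (its HTML description) can be shown with a small Python sketch. The two prefixes are taken from the URLs quoted above; the function names are hypothetical and only illustrate the mapping.

```python
RESOURCE_PREFIX = "http://dbpedia.org/resource/"  # identifies the entity itself
PAGE_PREFIX = "http://dbpedia.org/page/"          # HTML rendering of its description

def local_name(uri):
    """Return the part of the URI after the /resource/ or /page/ prefix."""
    for prefix in (RESOURCE_PREFIX, PAGE_PREFIX):
        if uri.startswith(prefix):
            return uri[len(prefix):]
    raise ValueError(f"not a DBpedia resource or page URI: {uri}")

def resource_uri(uri):
    """Normalise either form to the entity URI, the one to use when tagging content."""
    return RESOURCE_PREFIX + local_name(uri)

def page_uri(uri):
    """Return the human-readable page for either form."""
    return PAGE_PREFIX + local_name(uri)
```

A tagging tool could run `resource_uri` over whatever a user pastes in, so that content ends up tagged with the entity rather than with a document about it.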
A small note on COI
Just to be clear, the conflict of interest guidelines are not an absolute prohibition on editing an article where you have an outside interest. What is not allowed is to advance the interest of the outside organization at the expense of Wikipedia's goal of a neutral, verifiable encyclopedia. It is a subtle difference, and most people have a hard time knowing where to draw the line (it's hard to step out of your work and view it with impartial eyes), so editing articles you have a close connection to is often said to be "strongly discouraged". But I would be most unhappy if you let errors stand in articles because of that guideline. :-) Feel free to fix any minor mistakes yourselves, and perhaps propose larger changes here in the talk page for a bit before doing them.
It is my feeling that some other people you've interacted with here have been overly hostile and we could perhaps have done a better job of explaining our culture - something I'd sincerely like to apologize for. We most definitely need people who know what they are talking about. A partial explanation is that we have tons of people who try daily to use Wikipedia for marketing and promotional purposes fairly unrepentantly. A certain battleground mentality can be understood, if not condoned.
(Becoming more active participants here will likely make your work of integrating DBpedia and Wikipedia much easier too, both by you understanding the cultural norms of the site and for the community here to get to know you). Again, my apologies for the rough start. henrik•talk 13:20, 17 November 2009 (UTC)
Demos
First of all, thanks to everybody for the constructive feedback. May I suggest the inclusion of some demos, such as the DBpedia Faceted Browser and DBpedia Relationship Finder? I think they do a great job demonstrating what can be done with DBpedia. It should be noted that the Faceted Browser was developed in collaboration with, and is hosted by the company Neofonie, who included a "powered by" tagline that can be seen as advertising. However I think they did a great job and other than being a DBpedia team member, I have no affiliation with them that would lead me to think that I'm biased regarding these statements.
- DBpedia Faceted Browser: http://dbpedia.neofonie.de/browse/
- DBpedia Relationship Finder: http://relfinder.dbpedia.org/
On a sidenote, the DBpedia Mobile demo is currently broken and I hope to have it fixed soon.
--Beckr (talk) 14:30, 17 November 2009 (UTC)
- The facet based browser has been selected as one of the 365 most innovative ideas in Germany. I agree, although I'm biased, that it (and the RelFinder) might be worth mentioning in the article. --Jens Lehmann (talk) 10:01, 24 November 2009 (UTC)
How's this for an Introduction?
DBpedia is a project started by researchers and students[1] at Freie Universität Berlin and Universität Leipzig with the objective of extracting structured data from Wikipedia so that it can be published on the Web[2] as RDF, a central data model of the Semantic Web.
DBpedia describes 2.9 million things, including over 282,000 persons, 339,000 places, 88,000 music albums, 44,000 films and 130,000 species, including abstracts in multiple languages[3]. It has been described by Tim Berners-Lee as one of the more famous parts of the Linked Data project.[4]
Because the structured information is extracted from Wikipedia and published as RDF, the underlying data can be accessed using SPARQL, an SQL-like query language for RDF.
In addition to structured data, DBpedia provides URIs for the underlying concepts described in Wikipedia. The number and breadth of these concepts mean that DBpedia now interlinks a very large number of additional datasets within the Linked Data cloud. It has also been used by some content providers, notably the NYT and the BBC, as a controlled vocabulary[5], i.e. 'tagging' content with DBpedia URIs and then aggregating content around those tags.
--Derivadow (talk) —Preceding undated comment added 23:26, 17 November 2009 (UTC).
History
(Hi, I've tried to write up a bit of history about the project - who was involved, what happened when etc. Apologies for all the people I've not referenced and the mistakes I've made, and in particular the relationship between Leipzig and Berlin, which I've never really grokked. I'm also aware that this probably lacks sufficient references to pass Wikipedia's quality standards. Although that might not be true. Does anyone else?
I'm not sure where to go from here - I guess if others would like to contribute to what I've written we should be able to get the article into a publishable state? In particular I think fact checking on what I've written would be helpful + a bit of info on the technology stack would be handy, anyone fancy writing that?)
---
DBpedia was first proposed by Chris Bizer in early December 2006 [6]. In mid December 2006 Georgi Kobilarov and Richard Cyganiak joined the team, and the name dbpedia.org was agreed on the 20th December 2006; on the 21st December Richard Cyganiak registered the domain name dbpedia.org.
Over Christmas 2006 and early January 2007 Georgi Kobilarov developed the code to import Wikipedia dumps and on the 23rd January 2007 Sören Auer from the Universität Leipzig (who also developed the infobox extraction code) announced the first release of dbpedia.org[7]. The first release featured:
- two large extracted datasets
- a SPARQL endpoint and a data browser
- a visual query builder
This initial release featured structured data about people and cities and used D2R to publish the data.
After this initial release discussions began with OpenLink Software, who offered to host the triple store, which took place in late February 2007. The migration to OpenLink servers also coincided with a SPARQL endpoint thanks to OpenLink's Virtuoso server.
By mid May 2007 dereferenceable URIs were available, published on top of the Virtuoso SPARQL endpoint. This initial "linked data frontend" built on top of the SPARQL endpoint was a hack based on the original D2R Server code. In June, after some additional work, Richard Cyganiak released it as "Pubby". The frontend ran on servers in Berlin, while the SPARQL endpoint ran on OpenLink-hosted machines - this architecture resulted in poor response times early on.
In July 2007 a new data extraction framework had been built with a unified codebase from Leipzig and Berlin. By this time DBpedia contained 1,950,000 "things", including at least 80,000 people, 70,000 places, 35,000 music albums, 12,000 films, 1,600,000 links to relevant external web pages and 440,000 external links into other RDF datasets. Altogether, the DBpedia dataset consisted of around 103 million RDF triples.[8]
--Derivadow (talk) 13:28, 18 November 2009 (UTC)
- Hi - just as a general note: In what is supposed to be a first overview of a topic, brevity is a virtue. Some of the things in the above are a bit too detailed and probably better to put on the dbpedia.org site itself (some examples, just so you can know what I'm thinking of: "on the 21st December Richard Cygniak registers the domain name dbpedia.org", "and used D2R to publish the data", the second-to-last paragraph).
- Wikipedia is not paper of course, but one strength of WP is to give our readers an overview of a topic fairly quickly. I'm sure dbpedia will become larger and more successful and well known as time progresses, and as that happens a more detailed article will be constructed, but I think today the current article length is roughly appropriate. henrik•talk 20:16, 18 November 2009 (UTC)
- user:henrik Hi, thanks for the advice -- I'll certainly cut the text back before I copy it across to the article. I hope that others will help with the article, so I'll wait to see what comes in before I make any further edits. Thank you.
--Derivadow (talk) 22:35, 18 November 2009 (UTC)
Not only do we need brevity, we also need accuracy re. the description of the DBpedia project. For starters, DBpedia isn't a Java + PHP application. It's a Live SPARQL compliant Quad Store (Virtuoso_Universal_Server) populated with data sets produced by the Java + PHP extraction code. If you don't have a Live Web Accessible Quad Store, you don't have DBpedia, period. In a nutshell, read my description (via blog post) of the DBpedia project which has been written to address many of the unfortunate, historic, and perpetual inaccuracies surrounding this project at: <http://www.openlinksw.com/dataspace/kidehen@openlinksw.com/weblog/kidehen@openlinksw.com%27s%20BLOG%20%5B127%5D/1592>. --Kingsley Idehen 98.229.179.218 (talk) 19:04, 22 November 2009 (UTC)
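For readers unfamiliar with the term, a "quad" is a subject-predicate-object triple plus the name of the graph it belongs to. The following Python sketch is a toy in-memory illustration of that data structure only. It says nothing about how Virtuoso is implemented, and the class, the graph names and the sample data are all hypothetical.

```python
class QuadStore:
    """A tiny in-memory quad store: (subject, predicate, object, graph)."""

    def __init__(self):
        self._quads = set()

    def add(self, subject, predicate, obj, graph):
        self._quads.add((subject, predicate, obj, graph))

    def match(self, subject=None, predicate=None, obj=None, graph=None):
        """Return quads matching the pattern; None acts as a wildcard."""
        pattern = (subject, predicate, obj, graph)
        return sorted(
            quad for quad in self._quads
            if all(want is None or want == have for want, have in zip(pattern, quad))
        )

# Hypothetical data: the same subject described in two named graphs.
store = QuadStore()
store.add("ex:Leipzig", "ex:country", "ex:Germany", "graph:infoboxes")
store.add("ex:Leipzig", "ex:label", "Leipzig", "graph:labels")

from_infoboxes = store.match(graph="graph:infoboxes")
```

The fourth element is what lets a store keep separately loaded data sets apart while still answering queries across all of them.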
Part of the present problem appears to be a blurring of what "DBpedia" is. Minimally, there are 3 components that probably deserve separate coverage. First, there's the website which, taken broadly, includes the project documentation, the data repository, and the SPARQL and other interfaces to the data. Second, there are the extractors which, though they did come first and are integral to the project, aren't its public-face and should generally be invisible to people who aren't involved in this or similar projects. Third, there's the RDF engine which powers the quad store, services the SPARQL queries, etc.
The page we've got now is mostly focused on documenting the first of these, while the infobox appears to have (at least initially) been focused on the second. The third is largely ignored, which doesn't seem appropriate given that no available alternative was able to deliver the needed functionality.
I suggest that the existing page be reworked to focus on the project as a whole, and that it treat that whole as a Website (albeit largely a non-traditional one, being Data vs Documents). I suggest that a section within this page be created for the DBpedia extractors; if justified, this section could be broken out to a new page. There should also be a section that addresses the technologies brought to bear for providing the DBpedia lookup service(s) and interface(s), including but not limited to the underlying engine. MacTed (talk) 21:32, 24 November 2009 (UTC)
COI tags removed
The COI tags were removed from this talk page last month.[6] Ikip (talk) 05:45, 19 December 2009 (UTC)
Open data for UK geo-tagged articles
[I can't find a "DBPedia liaison project page" where I can raise this; if there is one, please point me to it]
MySociety's MaPit service now provides open data for any point in the UK, from its coordinates (or postcode), returning the containing administrative areas such as ward and constituency. So, if a Wikipedia article about any UK subject is geotagged, the relevant containing areas (and thus authorities) for the location can be determined.
For example see Electric Cinema, Birmingham, then select the coordinates (shown by default in the top-right), and on the page of links to mapping services, the MaPit link - http://mapit.mysociety.org/point/4326/-1.8987,52.4766.html - is the final item under the "Great Britain" heading. Dropping the ".html" suffix returns JSON: http://mapit.mysociety.org/point/4326/-1.8987,52.4766
Can this data be used/ referenced by DBPedia? Andy Mabbett (User:Pigsonthewing); Andy's talk; Andy's edits 14:52, 8 October 2010 (UTC)
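As a rough illustration of the URL scheme described above, here is a small Python sketch. The path layout (`/point/<SRID>/<longitude>,<latitude>`, with an optional `.html` suffix) is read off the two example links in this section. The function name and parameters are hypothetical, and nothing is assumed about the structure of the JSON that comes back.

```python
def mapit_point_url(longitude, latitude, srid=4326, html=False):
    """Build a MaPit point-lookup URL; 4326 is the SRID used in the example links (WGS 84)."""
    base = f"http://mapit.mysociety.org/point/{srid}/{longitude},{latitude}"
    # Without the ".html" suffix the service is described above as returning JSON.
    return base + ".html" if html else base

# Coordinates taken from the Electric Cinema, Birmingham example.
json_url = mapit_point_url(-1.8987, 52.4766)
html_url = mapit_point_url(-1.8987, 52.4766, html=True)
```

A consumer could fetch `json_url` with any HTTP client and attach the returned areas to the geotagged article's record.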
- There is a mailing list, dbpedia-discussion@lists.sourceforge.net; more details can be found on the SourceForge support page [7]. Emailing the mailing list will be the best way to get in contact. The Wikipedia page is not connected to the project.--Salix (talk): 15:59, 8 October 2010 (UTC)
Example
Could the example be a bit less obscure? As it is, it does not even help explain anything.--Aa2-2004 (talk) 16:02, 17 May 2011 (UTC)
Lack of license conformance resulting in copyright violation
DBpedia mentions that all content is dual licensed under GFDL and Creative Commons BY-SA[9] but fails to observe either license. The individual content pages like http://dbpedia.org/page/Washington,_D.C. have no license on them (required by ShareAlike) and the authors of the copyrightable text extracts are not attributed. The majority of DBpedia content presently seems to be therefore a (most likely unintentional) copyright violation.
- The above statement was added by Vigilius to the article page. I moved it here to discuss it. The main content of DBpedia is downloadable as dumps and a clear copyright statement is included there: http://wiki.dbpedia.org/Downloads . The page mentioned above http://dbpedia.org/page/Washington,_D.C. is a human readable rendering of the data. The data pages are http://dbpedia.org/resource/Washington,_D.C. or http://dbpedia.org/data/Washington,_D.C..ntriples . The main content of DBpedia is clearly made from Wikipedia articles and infoboxes. The source page is linked to with the foaf:page property, which seems to be the current best-practice when reusing content from Wikipedia. This is done in a machine readable way. We already had the discussion how to correctly attribute Wikipedia, but there has not been a solution yet. DBpedia could include one more link to a Wikipedia author page, but such a page does not exist. I removed the above text, because the term "content page" is used in a weird manner and the last sentence contains a "seems". We should discuss how DBpedia can fix this the best way. SebastianHellmann (talk) 16:32, 29 December 2011 (UTC)
- Thanks, Sebastian. AFAIK Copyright law is primarily about what humans see, so I will not discuss the technical formats you mention. Basically if you copy something from Wikipedia it first is a copyright violation, just like copying Sony Music property. But Wikipedia offers you the option of the BY-SA license, which, provided you are in full compliance with it, makes your copying legal. What counts is thus: http://creativecommons.org/licenses/by-sa/3.0/legalcode. BY-SA section 4 c clearly states that authors must be appropriately mentioned. Although authors ELSEWHERE may waive some rights and agree to appear only on the cover, without attribution of individual contributions, they don't on Wikipedia. The level of "work" is the page, here the authors are attributed on Wikipedia. -- You write "DBpedia could include one more link to a Wikipedia author page, but such a page does not exist.": You seem to be looking for a single author list of the entire Wikipedia. However, this would not be relevant. Imagine a poetry collection from several authors: If you re-use one poem, you cannot simply attribute the poem to the collection as a whole, not mentioning who wrote THIS poem (here the comparison ends, Wikipedia articles are certainly not poetic...). Thus, there are as many author attribution pages as there are works in the Collection of individual articles that Wikipedia is composed of. -- I see two things DBPedia might need to do: 1. Add a license statement and link on EACH page, clearly stating the license and perhaps license mix (as soon as images/files come into play). DBpedia can be a source for others. The licensing rules are meant to safeguard against re-users overlooking their obligations. You may also be creating a derivative work, so you must license your own work under the same license. 2. Link all content coming from a specific page to the history of that page on Wikipedia, clearly labeled as "attribution to be found here".
There may be more: Wikipedia has better copyright and license experts than me, try to contact them on their talk pages. See especially Wikipedia:Mirrors and forks and Wikipedia:Copyrights#Reusers.27_rights_and_obligations - users contribution to those pages can probably advise you better than I. --Vigilius (talk) 19:31, 29 December 2011 (UTC)
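Suggestion 2 above (link each extract to the history of the page it came from, per language) could look roughly like the following Python sketch. The `index.php?title=...&action=history` pattern is the usual MediaWiki form for a page's revision history. The function names are hypothetical, the per-language titles are supplied by the caller, and whether such a link satisfies the license is exactly the open question in this thread.

```python
import urllib.parse

def history_url(language, title):
    """Return the revision-history URL of a Wikipedia article in one language edition."""
    query = urllib.parse.urlencode({"title": title.replace(" ", "_"), "action": "history"})
    return f"https://{language}.wikipedia.org/w/index.php?{query}"

def attribution_links(titles_by_language):
    """Map each language code to the history page of the article used for that language."""
    return {
        language: history_url(language, title)
        for language, title in sorted(titles_by_language.items())
    }

# Illustrative input: article titles are assumed, one entry per language actually used.
links = attribution_links({"en": "Washington, D.C.", "es": "Washington D. C."})
```

One link per language edition keeps the attribution tied to the text that was actually copied, instead of pointing every abstract at the English article.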
I've filed a bug here. Just to explain the context for Wikipedians, the only thing that might be slightly complicated is that DBpedia is a composite of numerous Wikipedia pages from multiple projects. Each DBpedia entry might pull from multiple language versions of Wikipedia, and there is also deductive inference of relationships between articles. Consider the article on a person: the infoboxes often contain "influenced by" and "influenced" sections that get turned into relationships. DBpedia may draw that stuff from multiple articles. So, Wikipedia might say in the Michael Jackson article that Jackson influenced Prince, but not say so in the Prince article; because DBpedia has indexed all of Wikipedia, it might then say that Prince was influenced by Jackson, because the article on Jackson says he influenced Prince. Providing full attribution for that kind of stuff may be difficult, especially if there are complex deductive inference rules. I think the most that might be practical is modifying it to say that material has been taken from a particular enwiki article and also perhaps noting that material was taken from articles in other languages on the same topic. This would be for the longer descriptions in various languages: do remember though that the facts contained in the infoboxes might not be copyrightable. —Tom Morris (talk) 18:55, 29 December 2011 (UTC)
- Since facts are not copyrightable, you probably don't need to worry about the deductions. Copyright protects creative text. But then you are copying plenty of that on the dbpedia pages. As said above, I believe you must attribute the authors of these works. Per page, per language. --Vigilius (talk) 19:31, 29 December 2011 (UTC)
- Just an aside, I'm not doing any copying. I know about the Semantic Web, but am not directly involved in DBpedia. —Tom Morris (talk) 21:44, 29 December 2011 (UTC)
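The inverse-relationship case discussed above can be sketched in Python as follows. This is only an illustration of the idea, not DBpedia's actual inference rules: the property names and the `INVERSE_OF` table are hypothetical. Each statement carries the name of the article it was extracted from, so that an inferred statement can still be traced back to its source for attribution.

```python
# Hypothetical pair of mutually inverse properties.
INVERSE_OF = {"influenced": "influencedBy", "influencedBy": "influenced"}

def with_inverses(statements):
    """Return the statements plus inferred inverses, each tagged with its source article."""
    result = set()
    for subject, predicate, obj, source_article in statements:
        result.add((subject, predicate, obj, source_article))
        inverse = INVERSE_OF.get(predicate)
        if inverse is not None:
            # The inferred statement keeps the provenance of the statement it came from.
            result.add((obj, inverse, subject, source_article))
    return result

# Extracted from the (hypothetical) infobox of the Michael Jackson article.
extracted = [("Michael_Jackson", "influenced", "Prince", "Michael_Jackson")]
expanded = with_inverses(extracted)
```

Here the statement about Prince is derived entirely from the Jackson article, which is why its attribution would have to point there rather than at the Prince article.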
In order to resolve this issue, we have now added attribution and licence information to the bottom of each page. --Soeren1611 (talk) 21:23, 29 December 2011 (UTC)
- The footnote does resolve the share-alike license-citation issue, but I believe not the attribution issue. The link refers only to the English page on Washington, D.C., which has different authors than, e.g., the copyrighted German or Spanish texts starting with "Washington, D.C. ] ist die Hauptstadt..." and "Washington D. C. es la capital ..." -- I also believe that it would be better if the footnote text expressed that the attribution requirement is meant to be fulfilled by the link. "Extracted from" would probably not make it clear to further users that they need to cite this attribution when re-using content or deriving from it. --Vigilius (talk) 12:38, 30 December 2011 (UTC)
- The problem DBpedia has is that there is no attribution page on Wikipedia we can link to. To which page exactly should we link for attribution? Are the main article page or the history page sufficient? They do not look like attribution pages to me. SebastianHellmann (talk) 17:24, 30 December 2011 (UTC)