bioguid.info


I've put together a web site called http://bioguid.info which, rather grandly, is an attempt to bootstrap the biodiversity Semantic Web by providing resolvable URIs for biological objects, such as publications, taxonomic names, nucleotide sequences, and specimens. These URIs (or "GUIDs") can be resolved by a web browser to display HTML, but under the hood are resolved to RDF (which you can see by viewing the source of the web page you get for a URI).

The web interface is really window dressing; I just wanted a way to display RDF that wouldn't frighten people (me included). For some URIs all I do is grab XML and reformat it (e.g., DOIs). For GenBank records, all manner of agony is involved in trying to extract specimen and publication links (e.g., DOIs for papers that don't have PubMed identifiers).

A good place to get a sense of what bioguid.info is about is to start with this PubMed record: http://bioguid.info/pmid:17079492. Among the sequences is http://bioguid.info/gi:117651452, which is from specimen http://bioguid.info/casent:0106123 (you'll need Firefox 1.5, Camino, WebKit, or a browser with an SVG plugin for the full effect).

Under the hood it tries to play nice with Semantic Web browsers by returning 303 (See Other) redirects when resolving a URI for an object. The Disco - Hyperdata Browser, for example, displays the RDF nicely (although its timeout of 2 seconds means it might not load bioguid.info RDF first time round).
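To make the 303 idea concrete, here is a minimal sketch in Python of how a resolver might answer a request for an object URI. It is not the code behind bioguid.info: the function names (split_guid, resolve) and the target URL layout (namespace/identifier.rdf or .html) are assumptions made up for illustration, and the Accept check is a deliberately crude substring test rather than real content negotiation.

```python
from typing import Tuple

# Assumption: this URL layout is hypothetical. The post does not say where
# the documents describing an object actually live.
BASE = "http://bioguid.info"


def split_guid(path: str) -> Tuple[str, str]:
    """Split a path such as '/pmid:17079492' into ('pmid', '17079492')."""
    namespace, sep, local_id = path.lstrip("/").partition(":")
    if not sep or not namespace or not local_id:
        raise ValueError(f"not a namespace:identifier path: {path!r}")
    return namespace, local_id


def resolve(path: str, accept: str = "*/*") -> Tuple[int, str]:
    """Return (status, Location) for a request to an object URI.

    The object itself is not a document, so the answer is never 200.
    Instead the client is sent, via 303 See Other, to a document that
    describes the object: RDF if it asked for RDF, HTML otherwise.
    """
    namespace, local_id = split_guid(path)
    # Crude check; a real server would parse q-values in the Accept header.
    wants_rdf = "application/rdf+xml" in accept
    suffix = "rdf" if wants_rdf else "html"
    return 303, f"{BASE}/{namespace}/{local_id}.{suffix}"
```

A Semantic Web browser sending Accept: application/rdf+xml for /pmid:17079492 would, under these assumptions, be redirected to an RDF document, while an ordinary web browser would be redirected to an HTML page about the same record.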

bioguid.info is a bit of a toy, but I'd welcome any comments. Some related background is on my SemAnt blog. In part bioguid.info is motivated by the fact that LSIDs (as discussed previously on nodalpoint) don't fit the Semantic Web model of HTTP URIs (see doi:10.1109/mis.2006.62).


Comments

relationship to LSID?

Hi Rod, I hope this isn't a stupid question, but can you explain a little more of the relationship between BioGUIDs and LSID? This isn't clear from browsing the site. Are you re-using any of the LSID infrastructure? Or is this a completely separate effort?

On a related note, I remember you gave a talk here in Manchester about a year ago on biological identifiers, a "wisdom of the crowds" type presentation which had some nice examples of different types of identifiers in bioinformatics and problems with what they identify (NCBI taxonomy etc). Do you have the slides of this, or subsequent talks along the same lines?


LSIDs, talk, and database inconsistencies

Duncan, it's not a stupid question at all, especially as I glossed over this in the blurb about the site. Basically, I think LSIDs are probably a dead end, in that they don't play nice with current Semantic Web tools, which expect HTTP URIs. Hence, cool tools like the Disco - Hyperdata Browser can't use LSIDs directly. Now, you could argue that if we use an HTTP proxy (like my own lsidres.org) to resolve the LSID, then we can use Semantic Web tools, but this is a hack. Furthermore, it doesn't work with tools like Disco. There are also issues with representing concepts and things with URIs (the infamous httpRange-14 problem, see my del.icio.us bookmarks tagged with 303 for some background). If we use HTTP URIs, then we can easily use 303 redirects to handle URIs that don't point to documents.

The other issue for me is that LSIDs are non-trivial to set up (you need to fuss with SRV records in the DNS, or convince your systems admin to do this). The fact that so few LSIDs are "real", in the sense that you can resolve them with the LSID protocol, speaks volumes. None of the LSIDs in the UniProt RDF dumps are real, for example. If the only way to make LSIDs semi-usable is to put an HTTP proxy in front of them, then why go to all the hassle of supporting the LSID resolution protocol in the first place? As much as I get the arguments that URNs have advantages, and HTTP may disappear one day, I just don't see that LSIDs give us what we want right now. Having said that, they did a huge service by getting people in my community to think about RDF, and then the Semantic Web, which is where the real action is.
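For anyone wondering what the DNS fuss amounts to, here is a rough Python sketch that splits an LSID into its parts and works out the SRV record name a resolver would have to look up for the authority. The example LSID (with authority example.org) and the names LSID, parse_lsid and srv_name are made up for illustration, and the sketch stops before any actual DNS query, which would need a third-party resolver library.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str]


def parse_lsid(text: str) -> LSID:
    """Parse 'urn:lsid:authority:namespace:object[:revision]'."""
    parts = text.split(":")
    if len(parts) not in (5, 6):
        raise ValueError(f"wrong number of components: {text!r}")
    # The 'urn:lsid' prefix is case-insensitive.
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    authority, namespace, object_id = parts[2:5]
    if not (authority and namespace and object_id):
        raise ValueError(f"empty component in: {text!r}")
    # The revision is optional; treat a trailing empty one as absent.
    revision = parts[5] if len(parts) == 6 and parts[5] else None
    return LSID(authority, namespace, object_id, revision)


def srv_name(lsid: LSID) -> str:
    """DNS name whose SRV record should point at the authority's resolver."""
    return f"_lsid._tcp.{lsid.authority}"


# Hypothetical LSID, used only to show the shape of the identifier.
example = parse_lsid("urn:lsid:example.org:names:12345")
print(srv_name(example))
```

If nobody has created that SRV record for the authority, the LSID cannot be resolved with the LSID protocol at all, which is the sense in which many LSIDs are not "real".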

bioguid.info will soon start supporting LSIDs. I just need to bolt on my resolver code from lsidres.org, and write some XSL to transform the RDF into the form that I'm using (I'm trying to use a consistent vocabulary so that I can meaningfully integrate RDF from multiple sources). So, soon you'll be able to resolve an LSID in the same way as a DOI, etc.

A version of the talk I gave at Manchester is on my iSpecies blog here. For some related ideas more specifically on identifiers and what happens when you link stuff together, see my posts on inconsistencies within GenBank, and discovering that GenBank has sequences for species but doesn't know it (near the end of this post).


Taxonomy is dead...long live taxonomy!

Thanks for the links. The whole HTTP URIs issue is a bit of a hot potato...and a philosophical quagmire.

If you're looking for more feedback on bioguid, there are lots of people interested in and using various biological identifiers over at http://lists.w3.org/Archives/Public/public-semweb-lifesci/