Pubmed module is back

I doubt many will have noticed, but I have made a few minor changes to the nodalpoint pubmed integration module. Specifically, I have upgraded to the new EUtilities interfaces. There are a few new abstracts in the submission queue waiting for moderation.

The nodalpoint pubmed module enables nodalpoint users to moderate and comment on scientific abstracts retrieved from the pubmed database. At the moment the module only serves as a proof of concept; a lot of work needs to be done to make it more user friendly.

Recently I have noticed that others have picked up on this concept. For details see Clay Shirky's weblog post about the original Nature article.

The pubmed module is still broken and needs updating to the latest Drupal. I have been quietly working on a new version in my spare time, although progress has been slow due to other commitments (PhD thesis). A few details follow for anyone still reading...

So far I have completed the EUtils client but I haven't started working on the Drupal integration module. Drupal has gone through quite a few changes so I face a further learning curve.

However, in the meantime I have a question: does a list of freely available online full text journals exist?

I am aware of pubmedcentral and various others. I need the list because I would like to limit pubmed searches made through nodalpoint to journals that offer free online access. The reason is that an online community centered around the discussion of scientific literature is rather pointless without access to that literature. So far I can filter at either of two stages: retrieval of the pubmed ids or retrieval of the actual article summaries. Each requires a different approach.

In the first case, filtering would happen on the server side by using a search restricted to free full text journals, e.g. "bioinformatics AND Proc Natl Acad Sci U S A[ta]", where ta is the field tag for "Journal title or abbreviation". In the second case, the search would be unrestricted and filtering would be done by matching the journal title field against a list of journals. Either way I need a list of journals to filter by. I think the list will just have to be compiled as we go along...
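To make the two options concrete, here is a minimal Python sketch. It is only an illustration, not the module's code: the function names, the `journal` key on each summary and the one-entry journal list are all made up for the example, and the ESearch address is the current public NCBI endpoint, which I am assuming rather than quoting from the module.

```python
from urllib.parse import urlencode

# Assumption: the public NCBI ESearch endpoint; the module itself may use another address.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Placeholder list; the real list would have to be compiled by hand.
FREE_JOURNALS = ["Proc Natl Acad Sci U S A"]


def restricted_term(query, journals):
    """Option 1: let the server filter by adding a [ta] clause per journal."""
    clause = " OR ".join('"%s"[ta]' % name for name in journals)
    return "(%s) AND (%s)" % (query, clause)


def esearch_url(term, retmax=20):
    """Build the ESearch request URL for a pubmed query (no request is sent)."""
    params = {"db": "pubmed", "term": term, "retmax": retmax}
    return ESEARCH_URL + "?" + urlencode(params)


def filter_summaries(summaries, journals):
    """Option 2: search unrestricted, then keep summaries from listed journals.

    Each summary is assumed to be a dict with a 'journal' key (hypothetical shape).
    """
    allowed = {name.lower() for name in journals}
    return [s for s in summaries if s.get("journal", "").lower() in allowed]
```

The first option keeps the result counts honest because the server only returns matching ids, but the query grows with every journal added. The second keeps the query short at the cost of fetching summaries that are then thrown away.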

I would genuinely appreciate any thoughts or comments on this matter.

Further issues: filtering articles with no abstracts, future integration of the taxonomy (MeSH terms?) and automatic annotation of gene names etc. found in the abstracts.


Comments

Title field length

Is this tweakable? It seems that some of the longer titles are being truncated during import. Would it be possible to increase the field size -- it'll look neater :-)


Maybe I'm just a bit tired,

Maybe I'm just a bit tired, but do you have an example to point to of this?


Sure

Check this one out. It's still in moderation atm, so hopefully it won't fall out by the time you look at it.


Trackback?

Great ideas - I forgot to mention nodalpoint the other day, so I'm glad you picked up the thread. If you can find a way to send Trackbacks from nodalpoint nodes to the appropriate HubMed references, that would be ideal.

As for open access, I think working out a list of free journals is virtually impossible. PM2mail had a list for a while of journals that were freely available, to complement the list of journals available to third world countries through HINARI, but it changed too often to be worthwhile. It would be better to try to find a way to make these papers available. There has been a suggestion that, as the means of distribution most allowed by publishers is via the author's personal website, that would be the best place to make papers available: http://johnvu.net/blog/archives/000049.php Perhaps when a paper looks worth reviewing, someone could write to the authors and ask them to put the PDF on their website, and post the link on here so everyone can read it...

Regarding setting up MESH headings as a Drupal taxonomy, I've been looking for a way to do this for a while. If you have any ideas, I'd love to hear them.


Trackback problems

Great ideas - I forgot to mention nodalpoint the other day, so I'm glad you picked up the thread.

No problem, the recent thread gave me an incentive to go back and update to Eutils.

If you can find a way to send Trackbacks from nodalpoint nodes to the appropriate HubMed references, that would be ideal.

This is cool :)

I've installed the Drupal module for Trackback support, so theoretically this should be possible now. I haven't investigated thoroughly, but at the moment the ping is manual. Registered users should be able to click on the trackback links for any node and add a ping url. I've tried to ping the hubmed trackback urls but I was having some problems (I'll send you some email about this).
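For anyone curious what a ping involves, here is a rough Python sketch based on my reading of the TrackBack specification: the ping is an HTTP POST of form-encoded fields (url, title, excerpt, blog_name) and the reply is a small XML document whose error element is 0 on success. This is not the Drupal module's code; the function names and the default blog name are made up for illustration, and nothing is actually sent.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET


def build_ping(node_url, title, excerpt, blog_name="nodalpoint"):
    """Return the (body, headers) pair for a TrackBack ping POST.

    Field names follow the TrackBack spec; the helper itself is hypothetical.
    """
    body = urlencode({
        "url": node_url,
        "title": title,
        "excerpt": excerpt,
        "blog_name": blog_name,
    })
    headers = {"Content-Type": "application/x-www-form-urlencoded; charset=utf-8"}
    return body.encode("utf-8"), headers


def ping_succeeded(response_xml):
    """Parse the XML reply; <error>0</error> means the ping was accepted."""
    root = ET.fromstring(response_xml)
    return root.findtext("error", default="1").strip() == "0"
```

Automating the ping would then mean POSTing that body to the target's trackback url whenever a node is published, and checking the reply with the second function.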

There has been a suggestion that, as the means of distribution most allowed by publishers is via the author's personal website

I like this idea and this list (via John Vu's blog) is encouraging: http://www.lboro.ac.uk/departments/ls/disresearch/romeo/Romeo%20Publisher%20Policies.htm. According to their figures, roughly 50 percent of journals allow self-archiving. I've noticed Citeseer links out to PDFs on authors' websites. I have no idea how they compile this information. Could something like this be automated?

Regarding setting up MESH headings as a Drupal taxonomy, I've been looking for a way to do this for a while. If you have any ideas, I'd love to hear them.

This was just an "idea" so nothing specific at the moment.


Active access

Citeseer has a spider that crawls through the web, picking up interesting papers and parsing information out of them as it goes along. This is out of our league, really, so I was thinking of something a little more personal. It would be analogous to writing to an author and asking them for a reprint of their paper, but in this case they'd put the paper on their personal website instead of sending it by post. Of course, if they didn't have a personal website, you could set up a personal blog for them and post the reprint here:)


Here is a list...

Just a quick post to point out a list of free online full-text articles.

It's nice to see "mainstream" interest in new forms of peer-reviewing. This problem of free content is really complicated. A lot of journals are only free after some time. Even if there was a way to delay putting the articles up in Nodalpoint until they are free, we would be discussing papers that are 6 months or 1 year old...

Could we take some more radical action ? :) Use a temporary FTP site and share resources ? Share login/passwords ? (Probably illegal) Create institutional accounts in Nodalpoint for some journals ? (Probably not a good idea to start thinking money)

The Nodalpoint users could post a list of journals they have access to and we could add the common journals (if any) to the free list.

Do you have a list of to do's ? We could help a bit with some code maybe :)


Thanks

Thanks for that list of journals. I'll have to format the list by hand, an XML feed of free full text journals would be a bit much to ask I guess :)

Even if there was a way to delay putting the articles up in Nodalpoint until they are free, we would be discussing 6 months,1 year old papers.

I would argue that discussion of research that is 6 months to a year old is probably pointless for an online community. PDFs for all abstracts need to be available to all members as soon as they are published.

Could we take some more radical action ? :) Use a temporary FTP site and share resources ? Share login/passwords ? (Probably illegal) Create institutional accounts in Nodalpoint for some journals ? (Probably not a good idea to start thinking money)

While the idea of radical action definitely appeals to me (and I'm sure Neil too), all of these solutions probably violate copyright or terms of use in one way or another. So these solutions would only last while the site remains relatively obscure.

Do you have a list of to do's ? We could help a bit with some code maybe :)

I'm reluctant to put my hand up and say "let's start a miniproject" around these ideas; I just don't have the time. I'll try to keep poking at the module code in the meantime, and anyone who would like a copy of the code is of course welcome to it (don't laugh, it was meant to be "proof of principle").


Giving this some thought

Good to see this back - it's a really valuable resource, if not a priority atm.
Access to free journals is tricky - I've been thinking about this for my own weekly searches. Ideally, you'd be able to grab a PDF automatically for each reference that had one and link its name to the reference, maybe via a database or something like RefDB.
There are actually very few truly freely available journals - the list at PubMed Central is rather small. I think it might be better to use the resources we have at UNSW to access more titles.
I will give this more thought, but I'm thinking along the lines of constructing a query using something like LWP (or the non-Perl equivalent) to an address whose output specifies a link to a PDF (or null if not).


PDFs

The directory of open access journals is here, by the way: http://www.doaj.org/.

I wouldn't even begin to try automatically retrieving PDFs with Perl - I've spent ages trying to get that to work, but some publishers go to great lengths to make it difficult, and of course they all have their own page layouts and ways of linking. In an ideal world, the PubMed link would go straight to the abstract page on the publisher's site, which would contain a machine-readable tag pointing to the PDF. Unfortunately I don't think the chances of that happening would fit on a scale of 1 to extremely unlikely.
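To show how little code the ideal case would need, here is a small Python sketch. It assumes a convention that, as far as I know, publishers do not follow: a `<link type="application/pdf" href="...">` tag in the abstract page. The tag and the function names are invented for the example, and the page is passed in as a string, so fetching it (with LWP or anything else) is left out.

```python
from html.parser import HTMLParser


class PdfLinkFinder(HTMLParser):
    """Remember the href of the first <link type="application/pdf"> tag seen."""

    def __init__(self):
        super().__init__()
        self.pdf_url = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.pdf_url is not None:
            return
        attributes = dict(attrs)
        # Hypothetical convention: the PDF is advertised by its MIME type.
        if attributes.get("type") == "application/pdf" and attributes.get("href"):
            self.pdf_url = attributes["href"]


def find_pdf_link(html):
    """Return the advertised PDF URL, or None if the page has no such tag."""
    finder = PdfLinkFinder()
    finder.feed(html)
    return finder.pdf_url
```

A page without the tag simply gives back None, which is the "link or null" output suggested earlier in the thread.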


Hard, impossible?

Correct, this is dead hard. I gave up on publishers' sites and took a look at the system used by our library, which presents a more standard interface to online journals. There are layers of authentication, cookies and javascript popups to contend with, and you finish with an immensely long URL which is virtually impossible to construct from scratch. All in all, pretty unsatisfactory from the "I want the PDF" point of view.