Posted by on Jul 30, 2009

Data, the currency of academic jargon, and a corpus of 14 billion words

So let me start with the following caveat:

I work for the Stor of J. There, I said it and I am proud of it.

That being said, the Data for Research service described below is pretty amazing, especially if you love

  • academic literature
  • nerdy librarian things
  • data/statistics/factoids to impress people at parties
  • key terms/word counts/academic jargon/currency

JSTOR is offering a beta service called “Data for Research”. The original intention of the Data for Research tool was to make it easier to fulfill requests for data sets and support data mining needs. However, the DfR beta also makes it possible to search and browse across all JSTOR collections, using a type of faceted search interface. The journal content on this beta site is updated 1-2 weeks after each content release on the main site.

With DfR, researchers can

  • conduct full-text and fielded searching of the entire JSTOR archive using a powerful faceted search interface. Using this interface, one can quickly and easily define content of interest through an iterative process of searching and filtering results.
  • view document-level data including word frequencies, citations, and key terms.
  • request and download datasets associated with the content selected.

You can also automate dataset requests and downloads with our API. Curious to know when a term fell in and out of favor in academic circles? DfR lets you track that across the more than 14 billion words, 4.8 million+ articles, and 350 years' worth of academic research found in JSTOR.
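To give a feel for what tracking a term over time involves, here is a minimal Python sketch. It is not DfR's API or its actual dataset format: the record layout (a `year` plus a `word_counts` dictionary per document), the function names, and the sample numbers are all invented for illustration. It simply computes a term's share of all words published each year and picks the year where that share is highest.

```python
from collections import defaultdict


def term_share_by_year(documents, term):
    """Return {year: fraction of that year's words that are `term`}.

    `documents` is a hypothetical list of dicts, each with a "year" and a
    "word_counts" mapping of word -> count. This layout is an assumption,
    not the format DfR delivers.
    """
    term_counts = defaultdict(int)
    total_counts = defaultdict(int)
    for doc in documents:
        year = doc["year"]
        counts = doc["word_counts"]
        term_counts[year] += counts.get(term, 0)
        total_counts[year] += sum(counts.values())
    # Skip years with no words at all to avoid dividing by zero.
    return {
        year: term_counts[year] / total_counts[year]
        for year in sorted(total_counts)
        if total_counts[year] > 0
    }


def peak_year(shares):
    """Return the year in which the term's share of words is highest."""
    return max(shares, key=shares.get)


# Invented sample data, purely to exercise the functions above.
sample_documents = [
    {"year": 1985, "word_counts": {"perestroika": 1, "economy": 99}},
    {"year": 1991, "word_counts": {"perestroika": 6, "economy": 94}},
    {"year": 1991, "word_counts": {"perestroika": 4, "economy": 96}},
    {"year": 2005, "word_counts": {"perestroika": 1, "economy": 199}},
]

shares = term_share_by_year(sample_documents, "perestroika")
print(peak_year(shares))
```

Dividing by the total words per year matters: the archive holds far more text from some years than others, so raw counts alone would mostly reflect how much was published rather than how popular the term was.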

Personally, I love the fact that the term perestroika peaked in academic literature, perhaps not surprisingly, in the early 1990s, while tuberculosis seemed to gain some use as an academic term at the turn of the last century. Librarians should take note as well, since DfR automatically pulls the key terms for each discipline across the entire corpus. If you are struggling to find synonymous search terms for an intricate advanced search statement, this is the place to go. Click on any of the 50 disciplines to see all the key terms associated with it.

A special feedback form has been established for this project (http://dfr.jstor.org/requests/contact/) and is linked to all of the pages of the DfR site. If you can think of any ways you want to mine the JSTOR data not supported in this beta, let JSTOR know and they will try to incorporate it into the next instance of DfR.
