Sci-Hub's remarkable coverage of scholarly literature & the future of publishing

Centre for Genomic Regulation

Aula Room, Barcelona

15:00 Friday, November 7, 2017

Online at slides.com/dhimmel/barcelona

Greene Lab

I'm a data scientist

http://www.greenelab.com/

Event details:

The website Sci-Hub provides access to scholarly literature via fulltext PDF downloads. The site enables users to access articles that would otherwise be paywalled. In March, Sci-Hub tweeted the identifiers (DOIs) for all articles in their repository. By integrating this dataset with a catalog of scholarly literature, we assessed Sci-Hub's coverage and found that Sci-Hub contains 85% of articles in toll-access journals. This number rises to 96% for recently-cited articles.

We suggest the ubiquity of Sci-Hub will disrupt scholarly publishing. Specifically, toll access publishing will no longer be a viable business model. We provide evidence that the transition is already underway and urge the community to adopt libre open access as an alternative. This study was performed openly on GitHub at https://github.com/greenelab/scihub. A preprint is available at https://doi.org/b9s5.

Himmelstein DS, Romero AR, McLaughlin SR, Greshake Tzovaras B, Greene CS. (2017) Sci-Hub provides access to nearly all scholarly literature. PeerJ Preprints DOI: 10.7287/peerj.preprints.3100

Sci-Hub is available at:

  • https://sci-hub.cc
    Territory of Cocos (Keeling) Islands
  • https://sci-hub.io
    British Indian Ocean Territory
  • https://sci-hub.ac
    Saint Helena, Ascension and Tristan da Cunha
  • https://sci-hub.bz
    Belize
  • scihub22266oqcxt.onion
    Tor Hidden Service (dark web)

Ⓐ 2011-09-05: created by Alexandra Elbakyan, the Sci-Hub website goes live

🔒

2013-03-20: Sci-Hub switches to using LibGen as a repository to cache articles.

Ⓑ 2015-01-04: LibGen domain name registrations expire after site administrator dies from cancer.

Ⓒ 2015-06-03: Elsevier files a civil suit against Sci-Hub and LibGen in the U.S. District Court for the Southern District of New York.

2015-10-30: Elsevier is granted a preliminary injunction to suspend domain names. Bye sci-hub.org

2016-02-10: “Meet the Robin Hood of Science” by Simon Oxenham

The New York Times:

Should All Research Papers Be Free?

Alexandra Elbakyan

Ⓕ 2016-04-29: “Who’s downloading pirated papers? Everyone” by John Bohannon in Science

https://doi.org/bf37

Ⓗ 2016-04-29: Elsevier wins a default judgement ordering defendants to pay Elsevier $15 million.

Representative work #28

Ⓘ 2016-06-23: The American Chemical Society files suit against Sci-Hub in the Eastern District of Virginia.

Ⓚ 2017-09-05: Sci-Hub blocks access to Russian IP addresses due to disputes with the scientific establishment.

Idiogramma elbakyanae

2017-11-03: ACS wins suit against Sci-Hub

  • Ordered that any person or entity in active concert or participation with Defendant Sci-Hub and with notice of the injunction, including any Internet search engines, web hosting and Internet service providers, domain name registrars, and domain name registries, cease facilitating access to any or all domain names and websites through which Sci-Hub engages in unlawful access to, use, reproduction, and distribution of ACS’s trademarks or copyrighted works.
  • Computer and Communications Industry Association (CCIA) filed an amicus brief (rejected) regarding the suit's targeting of "Neutral Service Providers"
  • ACS Mission: To advance the broader chemistry enterprise and its practitioners for the benefit of Earth and its people.
  • https://github.com/greenelab/scihub
  • https://github.com/greenelab/scihub-manuscript
  • https://github.com/greenelab/crossref
  • https://github.com/dhimmel/scopus
  • https://github.com/greenelab/scihub-browser-data

But what scholarly articles are not in Sci-Hub?

  • There are 10 DOI Registration Agencies
  • Crossref has registered 67% of all DOIs in existence
  • In March 2015, 99.9% of English Wikipedia DOI links were registered via Crossref
  • 90% of newly published articles in the sciences have DOIs
  • Catalog of 87,542,370 DOIs
  • cAsE InSENSITive
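Because DOIs are case insensitive, matching Sci-Hub's DOI list against a catalog requires normalizing case first. Below is a minimal Python sketch of that coverage calculation, assuming both inputs are plain collections of DOI strings. The function names and the toy DOIs are hypothetical and are not taken from the study's code.

```python
def normalize_doi(doi: str) -> str:
    """Lowercase and trim a DOI so that case variants compare equal."""
    return doi.strip().lower()


def coverage(catalog_dois, scihub_dois) -> float:
    """Fraction of catalog DOIs that also appear in the Sci-Hub list."""
    catalog = {normalize_doi(d) for d in catalog_dois}
    scihub = {normalize_doi(d) for d in scihub_dois}
    if not catalog:
        raise ValueError("catalog is empty")
    return len(catalog & scihub) / len(catalog)


# Toy data for illustration only (not real DOIs from the study)
catalog = ["10.1000/ABC", "10.1000/def", "10.1000/ghi", "10.1000/jkl"]
scihub = ["10.1000/abc", "10.1000/DEF", "10.1000/GHI", "10.9999/xyz"]
print(f"{coverage(catalog, scihub):.0%}")
```

Using sets keeps the comparison independent of duplicates and ordering, which matters when the two lists come from different sources.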

Metadata for porn from the Entertainment Identifier Registry

Study at https://doi.org/b9s5

49% of 2.8 million articles

85% of 54 million articles

Currently, the Sci-Hub does not store books, for books users are redirected to LibGen, but not for research papers. In future, I also want to expand the Sci-Hub repository and add books too.

Elbakyan (2017)

Data from "The State of OA" Study https://doi.org/gbqtxd


  • Extracted DOI citations from OpenCitations
  • Recent studies (since 2015) had 6,252,279 outgoing citations to articles in toll access journals
  • 96.2% in Sci-Hub

Coverage of cited articles

https://github.com/greenelab/library-access

How do oaDOI & Sci-Hub compare to the University of Pennsylvania's library access?

Jacob Levernier

Monthly Bitcoin Donations

As of September 26, 2017:

  • Three known bitcoin addresses
  • received 1,128 donations, totaling 93.94 bitcoins
  • $64,455 US at time of donation
  • 68.48 donated bitcoins that remain unspent are now worth €444,000
    In October, much of the bitcoin was withdrawn
  • Sci-Hub tweeted: “the information on donations … is not very accurate, but I cannot correct it: that is confidential.”

While this study had a number of interesting aspects, its virtual lack of success as a tool for reducing the library's journal budget was largely due to the fact that the overall problem was seen by everyone concerned as a library problem. As such, the only solution available to the library in 1981 was to use monograph and binding funds to help offset the shortfall in the serials and journals budget. While the biology and chemistry libraries were spared drastic cuts because of very generous support from divisional funds, Caltech's engineering libraries were extremely hard hit, and only now after nearly seven years have they recovered (just in time for the current crisis). It should be pointed out here that from 1974 to 1983 the materials budgets for the departmental libraries were the responsibility of appropriate divisions.

Serials Crisis

Dana Roth (1990) "The Serials Crisis Revisited"

The Serials Librarian. https://doi.org/dvwb7f


Source: Association of Research Libraries. Expenditure Trends in ARL Libraries, 1986–2015

Prices 1986–2015

  1. Inflation: 118%
  2. Library expenditures: 197%
  3. Journal subscriptions: 521%
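To put these figures on a common footing, the increases can be converted to growth factors and deflated by inflation. Below is a rough Python sketch, assuming each percentage is the cumulative increase over 1986–2015. The helper name is hypothetical.

```python
def real_increase(nominal_pct: float, inflation_pct: float) -> float:
    """Inflation-adjusted cumulative increase, in percent.

    Assumes both inputs are cumulative percentage increases
    over the same period.
    """
    nominal_factor = 1 + nominal_pct / 100
    inflation_factor = 1 + inflation_pct / 100
    return (nominal_factor / inflation_factor - 1) * 100


inflation = 118
for label, pct in [("Library expenditures", 197), ("Journal subscriptions", 521)]:
    print(f"{label}: {real_increase(pct, inflation):.0f}% above inflation")
```

Under that assumption, journal subscription costs rose roughly 185% in real terms, while library expenditures rose about 36%.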

Libre Open Access

Headlines:

  • Science: Sci-Hub’s cache of pirated papers is so big, subscription journals are doomed, data analyst suggests
  • Inside Higher Ed: Inevitably Open
  • Quartz: A pirating service for academic journal articles could bring down the whole establishment

https://doi.org/b9s5

Sci-Hub  ⇒ open scholarly literature?

What library will continue to subscribe if a growing proportion of articles is available for free elsewhere?
Tom Reller (2013) Vice President, Elsevier

Defendants’ actions also threaten imminent irreparable harm to Elsevier because it appears that the Library Genesis Project repository may be approaching (or will eventually approach) a level of “completeness” where it can serve as a functionally equivalent, although patently illegal, replacement for ScienceDirect.

DeMarco, Hirschberg & Sen (2015) Attorneys for Elsevier

100 articles every ecologist should read

Courchamp & Bradshaw (2017) Nature Ecology & Evolution https://doi.org/cf8f

Changing times? Thus far in 2017

  • University of Montreal cut 2,231 journal subscriptions from Taylor & Francis (93%)
  • Universities in the Netherlands dropped their Oxford University Press subscription
  • Germany, Peru, and Taiwan entered 2017 without Elsevier deals after negotiations reached impasses
  • Preprint growth

Libre Open Access

https://greenelab.github.io/scihub-manuscript

Manubot

powering the next generation of scholarly manuscripts

Get started at tiny.cc/manubot

 

https://github.com/greenelab/manubot-rootstock

The Manubot project began with the [Deep Review](https://github.com/greenelab/deep-review),
where it was used to compose a highly-collaborative review article [@doi:10.1101/142760].
Other manuscripts that were created with Manubot include:

+ The Sci-Hub Coverage Study
  ([GitHub](https://github.com/greenelab/scihub-manuscript), [HTML manuscript](https://greenelab.github.io/scihub-manuscript/)) 
  [@doi:10.7287/peerj.preprints.3100]
+ Michael Zietz's Report for the Vagelos Scholars Program
  ([GitHub](https://github.com/zietzm/Vagelos2017), [HTML manuscript](https://zietzm.github.io/Vagelos2017/)) 
  [@doi:10.6084/m9.figshare.5346577]

The Manubot project began with the Deep Review, where it was used to compose a highly-collaborative review article [1]. Other manuscripts that were created with Manubot include:

1. Opportunities And Obstacles For Deep Learning In Biology And Medicine
Travers Ching, Daniel S. Himmelstein, Brett K. Beaulieu-Jones, Alexandr A. Kalinin, Brian T. Do, Gregory P. Way, Enrico Ferrero, Paul-Michael Agapow, Wei Xie, Gail L. Rosen, … Casey S. Greene
Cold Spring Harbor Laboratory (2017-05-28) https://doi.org/10.1101/142760

2. Sci-Hub provides access to nearly all scholarly literature
Daniel S Himmelstein, Ariel R Romero, Stephen R McLaughlin, Bastian Greshake Tzovaras, Casey S Greene
PeerJ Preprints (2017-07-20) https://doi.org/10.7287/peerj.preprints.3100

3. Vagelos Report Summer 2017
Michael Zietz
Figshare (2017) https://doi.org/10.6084/m9.figshare.5346577

1. Write markdown

Automatically converted to rich text

Automatic bibliographic metadata

[@doi:10.7287/peerj.preprints.3100]
[@arxiv:1407.3561v1]
[@pmid:24159271]
[@url:http://blog.dhimmel.com/biorxiv-licenses/]
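The citation keys above follow a `source:identifier` pattern inside `[@...]`. Below is a minimal Python sketch of how such keys could be pulled out of markdown text. The regular expression and function name are illustrative assumptions, not Manubot's actual parser.

```python
import re

# Matches [@source:identifier]; a hypothetical pattern for illustration only.
CITATION = re.compile(r"\[@([a-z]+):([^\]\s]+)\]")


def extract_citations(markdown: str):
    """Return (source, identifier) pairs in order of appearance."""
    return CITATION.findall(markdown)


text = """
[@doi:10.7287/peerj.preprints.3100]
[@arxiv:1407.3561v1]
[@pmid:24159271]
[@url:http://blog.dhimmel.com/biorxiv-licenses/]
"""

for source, identifier in extract_citations(text):
    print(source, identifier)
```

Each pair identifies which metadata service would need to be queried (DOI, arXiv, PubMed, or a plain URL) to build the reference list automatically.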

2. Continuous integration rebuilds the manuscript

Timestamped on the Bitcoin blockchain via OpenTimestamps

3. Continuous deployment back to GitHub

Pull requests for manuscript collaboration

the future: living but versioned

“Finally, we estimate that over a six-month period in 2015–2016, Sci-Hub provided access for 99.3% of valid incoming requests.”

— DOI: 10.7287/peerj.preprints.3100v1

“In the first version of this study, we mistakenly treated the log events as requests rather than downloads. Fortunately, Sci-Hub reviewed the preprint in a series of tweets, and pointed out the error…”

— DOI: 10.7287/peerj.preprints.3100v2

The Deep Review

  • review article on deep learning in precision medicine
  • 27 authors from 20 different institutions
  • readers appreciate the breadth of perspectives