Sci-Hub and the future of publishing

2018 Library Science Talks, Switzerland

  • 16:15 Monday, June 25 at HEPIA in Geneva
  • 16:15 Tuesday, June 26 at Zentralbibliothek Zürich

Online at slides.com/dhimmel/switzerland

Greene Lab

I'm a data scientist

http://www.greenelab.com/

Event details:

The website Sci-Hub provides access to scholarly literature via full-text PDF downloads. The site enables users to access articles that would otherwise be paywalled. In March 2017, Sci-Hub tweeted the identifiers (DOIs) for all articles in its repository. By integrating this dataset with a catalog of scholarly literature, we assessed Sci-Hub's coverage and found that Sci-Hub contained 86% of articles in toll access journals. This number rose to 96% for recently cited articles. In fact, Sci-Hub contained more toll access articles than were electronically available from the University of Pennsylvania's libraries, despite Penn's annual subscription expenditures of $13 million US.

Lawsuits by publishers have been unable to curb Sci-Hub's adoption. We suggest that the ubiquity of Sci-Hub will disrupt scholarly publishing. Specifically, toll access publishing will no longer be a viable business model. We provide evidence that the transition is already underway and urge the community to adopt libre open access as an alternative. This study was performed openly on GitHub at https://github.com/greenelab/scihub and is published at https://doi.org/ckcj.

Biography:

Daniel Himmelstein is a postdoctoral fellow in the Greene Lab at the University of Pennsylvania. His research integrates public data to help understand human disease.

Daniel is also a proponent of open science and has undertaken several projects to improve openness and communication in science. These projects include investigating publishing delays at journals, legal issues regarding data reuse, the license choices of preprint authors, and, most recently, the coverage of Sci-Hub.

Previously, Daniel received his PhD in Biological & Medical Informatics from the University of California, San Francisco. This is Daniel's first visit to Switzerland.

Himmelstein DS, Romero AR, Levernier JG, Munro TA, McLaughlin SR, Greshake Tzovaras B, Greene CS. (2018) Sci-Hub provides access to nearly all scholarly literature. eLife DOI: 10.7554/eLife.32822

Sci-Hub was available at:

  • https://sci-hub.cc
    Territory of Cocos (Keeling) Islands
  • https://sci-hub.io
    British Indian Ocean Territory
  • https://sci-hub.ac
    Saint Helena, Ascension and Tristan da Cunha
  • https://sci-hub.bz
    Belize
  • scihub22266oqcxt.onion
    Tor Hidden Service (dark web)
  • https://sci-hub.hk
    Hong Kong
  • https://sci-hub.la
    Laos
  • https://sci-hub.mn
    Mongolia
  • https://sci-hub.name
    Generic
  • https://sci-hub.tv
    Polynesian island nation of Tuvalu
  • https://sci-hub.tw
    Taiwan

Sci-Hub is available at:

  • https://sci-hub.nu
    Island state of Niue
  • https://sci-hub.tw
    Taiwan

Current domains listed at:

  • https://sci-hub.app
  • Sci-Hub page on Wikipedia

Ⓐ 2011-09-05: the Sci-Hub website, created by Alexandra Elbakyan, goes live

2013-03-20: Sci-Hub switches to using LibGen as a repository to cache articles.

Ⓑ 2015-01-04: LibGen domain name registrations expire after the site administrator dies of cancer.

Ⓒ 2015-06-03: Elsevier files a civil suit against Sci-Hub and LibGen in the U.S. District Court for the Southern District of New York.

2015-10-30: Elsevier is granted a preliminary injunction to suspend domain names. Bye, sci-hub.org.

2016-02-10: “Meet the Robin Hood of Science” by Simon Oxenham

The New York Times:

Should All Research Papers Be Free?

Alexandra Elbakyan

Ⓕ 2016-04-29: “Who’s downloading pirated papers? Everyone” by John Bohannon in Science

https://doi.org/bf37

Ⓗ 2017-06-21: Elsevier wins a default judgment ordering defendants to pay Elsevier $15 million.

Representative work #28

Ⓘ 2017-06-23: The American Chemical Society files suit against Sci-Hub in the Eastern District of Virginia.

From the Washington Times Legal Classifieds on 2017-07-27. ACS paid $305.55.

Ⓚ 2017-09-05: Sci-Hub blocks access to Russian IP addresses due to disputes with the scientific establishment.

Idiogramma elbakyanae

2017-11-03: ACS wins suit against Sci-Hub

  • Ordered that any person or entity in active concert or participation with Defendant Sci-Hub and with notice of the injunction, including any Internet search engines, web hosting and Internet service providers, domain name registrars, and domain name registries, cease facilitating access to any or all domain names and websites through which Sci-Hub engages in unlawful access to, use, reproduction, and distribution of ACS’s trademarks or copyrighted works.
  • The Computer and Communications Industry Association (CCIA) filed an amicus brief (rejected) regarding the suit's targeting of "Neutral Service Providers"
  • ACS Mission: To advance the broader chemistry enterprise and its practitioners for the benefit of Earth and its people.

Ⓛ December 2017: Search interest spikes as domains are suspended after ACS judgement.

  • https://github.com/greenelab/scihub
  • https://github.com/greenelab/scihub-manuscript
  • https://github.com/greenelab/crossref
  • https://github.com/dhimmel/scopus
  • https://github.com/greenelab/scihub-browser-data
  • https://github.com/greenelab/library-access

But what scholarly articles are not in Sci-Hub?

  • There are 10 DOI Registration Agencies
  • Crossref has registered 67% of all DOIs in existence
  • In March 2015, 99.9% of English Wikipedia DOI links were registered via Crossref
  • 90% of newly published articles in the sciences have DOIs
  • Catalog of 87,542,370 DOIs
  • DOIs are cAsE InSENSITive
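Because DOIs are case-insensitive, matching Sci-Hub's list of DOIs against a catalog such as Crossref's only works after normalizing both sides. A minimal sketch under stated assumptions: both inputs are plain lists of DOI strings, the prefix list is illustrative rather than exhaustive, and `normalize_doi` and `coverage` are hypothetical helper names, not functions from the study's repositories.

```python
from typing import Iterable

# Resolver prefixes that sometimes wrap DOIs (assumed list, not exhaustive)
PREFIXES = ("https://doi.org/", "http://doi.org/",
            "https://dx.doi.org/", "http://dx.doi.org/", "doi:")


def normalize_doi(doi: str) -> str:
    """Lowercase a DOI and strip a resolver prefix so comparisons ignore case."""
    doi = doi.strip().lower()
    for prefix in PREFIXES:
        if doi.startswith(prefix):
            return doi[len(prefix):]
    return doi


def coverage(catalog: Iterable[str], repository: Iterable[str]) -> float:
    """Fraction of distinct catalog DOIs that also appear in the repository."""
    catalog_set = {normalize_doi(d) for d in catalog}
    repository_set = {normalize_doi(d) for d in repository}
    if not catalog_set:
        raise ValueError("catalog is empty")
    return len(catalog_set & repository_set) / len(catalog_set)


# Toy inputs, not real Sci-Hub data
catalog = ["10.7554/eLife.32822", "10.1101/142760", "10.7287/peerj.preprints.3100"]
scihub = ["10.7554/ELIFE.32822", "https://doi.org/10.1101/142760"]
print(f"{coverage(catalog, scihub):.1%}")  # 2 of 3 catalog DOIs found
```

Using sets also deduplicates DOIs that differ only in case, so each article is counted once.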

Metadata for porn from the Entertainment Identifier Registry

Preprint at https://doi.org/b9s5

Study at https://doi.org/ckcj

eLife podcast #46

49% of 2.8 million articles

85% of 54 million articles

Currently, the Sci-Hub does not store books, for books users are redirected to LibGen, but not for research papers. In future, I also want to expand the Sci-Hub repository and add books too.

Elbakyan (2017)

  • Extracted DOI citations from OpenCitations
  • Recent studies (since 2015) had 6,252,279 outgoing citations to articles in toll access journals
  • 96.2% in Sci-Hub
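Coverage of cited articles can be weighted by citation, so an article cited many times counts once per citation. A minimal sketch, assuming citations arrive as hypothetical (citing DOI, cited DOI) pairs and that the toll access and Sci-Hub DOI sets are already lowercased; this illustrates the weighting idea and is not the study's actual pipeline.

```python
from typing import Iterable, Set, Tuple


def citation_coverage(
    citations: Iterable[Tuple[str, str]],
    toll_access: Set[str],
    repository: Set[str],
) -> float:
    """Share of citations to toll access articles whose target is in the repository.

    Each (citing, cited) pair counts once, so heavily cited articles weigh more.
    `toll_access` and `repository` are assumed to hold lowercase DOIs.
    """
    total = covered = 0
    for _citing, cited in citations:
        cited = cited.lower()  # DOIs are case-insensitive
        if cited not in toll_access:
            continue  # only citations to toll access articles count
        total += 1
        if cited in repository:
            covered += 1
    if total == 0:
        raise ValueError("no citations to toll access articles")
    return covered / total


# Toy example: three citations to toll access articles, two of them covered
citations = [
    ("10.1000/a", "10.2000/X"),
    ("10.1000/b", "10.2000/x"),
    ("10.1000/c", "10.2000/y"),
    ("10.1000/d", "10.3000/oa"),
]
print(citation_coverage(citations, {"10.2000/x", "10.2000/y"}, {"10.2000/x"}))
```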

Coverage of cited articles

Data from "The State of OA" Study https://doi.org/gbqtxd

Sci-Hub's coverage by category of article access

University of Pennsylvania Libraries

  • Founded by open science pioneer Benjamin Franklin in 1749
  • Endowment of $10 billion
  • $1.29 billion spent on research in 2016
  • Penn Libraries spent $13.13 million on electronic resources in 2017
    • 7.3 million articles
    • 860 thousand ebook chapters
  • Average per-download cost of $1.61
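Reading the article and chapter counts as 2017 downloads (an assumption, since the slide does not label them), the average per-download cost is electronic-resource spending divided by total downloads:

```python
spend_usd = 13.13e6          # Penn Libraries electronic-resource spend, 2017
article_downloads = 7.3e6    # assumed to be article downloads
chapter_downloads = 0.86e6   # assumed to be ebook chapter downloads

per_download = spend_usd / (article_downloads + chapter_downloads)
print(f"${per_download:.2f} per download")  # prints "$1.61 per download"
```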

PennText

  • Alma library resource management system from Ex Libris
  • PennText correctly identified access status for 88% of articles
  • Of the articles PennText reported as inaccessible, half were in fact accessible
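Both PennText figures come from comparing automated access status against a manual check. A minimal sketch of the two calculations, assuming each article is recorded as a hypothetical (automated says access, manual check found access) pair; the records below are toy values, not the study's data.

```python
import math
from typing import List, Tuple

# Each record: (automated system says access, manual check found access)
Record = Tuple[bool, bool]


def evaluate(records: List[Record]) -> Tuple[float, float]:
    """Return (accuracy, share of claimed-inaccessible articles that were accessible)."""
    accuracy = sum(auto == manual for auto, manual in records) / len(records)
    claimed_no = [manual for auto, manual in records if not auto]
    wrong_no = sum(claimed_no) / len(claimed_no) if claimed_no else math.nan
    return accuracy, wrong_no


# Toy records, not the study's data
records = [(True, True), (True, True), (True, False),
           (False, True), (False, False), (True, True)]
accuracy, wrong_no = evaluate(records)
print(f"accuracy {accuracy:.0%}, wrong 'no access' claims {wrong_no:.0%}")
```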

https://github.com/greenelab/library-access

Sci-Hub versus Penn Libraries

326 toll access articles (manually checked)

  • Penn's access: 80.7%
  • Sci-Hub's database: 94.2%

https://github.com/greenelab/library-access

  • Tools such as oaDOI are limited due to the low prevalence of Green OA articles.

Sci-Hub download logs

2017 estimates are missing an average of 120,000 downloads per day. https://git.io/f4900

Monthly Bitcoin Donations

As of December 31, 2017:

  • Three known bitcoin addresses
  • received 1,232 donations, totaling ₿94.494
  • worth $69,224 US at the time of donation
  • worth $421,272 US at the time of withdrawal, with ₿9.027 remaining
  • Sci-Hub tweeted: “the information on donations … is not very accurate, but I cannot correct it: that is confidential.”
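The two US dollar totals differ because bitcoin's price changed between donation and withdrawal. A minimal sketch of the valuation, assuming transfers are (day, BTC amount) pairs and a daily BTC/USD price lookup; the dates, amounts, and prices below are placeholders, not historical data or Sci-Hub's actual transactions.

```python
from datetime import date
from decimal import Decimal
from typing import Dict, List, Tuple

Transfer = Tuple[date, Decimal]  # (day of transfer, amount in BTC)


def btc_total(transfers: List[Transfer]) -> Decimal:
    """Total BTC moved by a list of transfers."""
    return sum((btc for _day, btc in transfers), Decimal("0"))


def usd_value(transfers: List[Transfer], daily_price: Dict[date, Decimal]) -> Decimal:
    """Value each transfer at the BTC/USD price on its own day, then sum."""
    return sum((btc * daily_price[day] for day, btc in transfers), Decimal("0"))


# Placeholder dates, amounts, and prices for illustration only
prices = {date(2016, 1, 1): Decimal("400"),
          date(2017, 12, 1): Decimal("10000"),
          date(2017, 12, 15): Decimal("17000")}
donations = [(date(2016, 1, 1), Decimal("0.5")), (date(2017, 12, 1), Decimal("0.1"))]
withdrawals = [(date(2017, 12, 15), Decimal("0.4"))]

print("USD at donation:", usd_value(donations, prices))
print("USD at withdrawal:", usd_value(withdrawals, prices))
print("BTC remaining:", btc_total(donations) - btc_total(withdrawals))
```

`Decimal` avoids binary floating-point rounding when summing currency amounts.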

While this study had a number of interesting aspects, its virtual lack of success as a tool for reducing the library's journal budget was largely due to the fact that the overall problem was seen by everyone concerned as a library problem. As such, the only solution available to the library in 1981 was to use monograph and binding funds to help offset the shortfall in the serials and journals budget. While the biology and chemistry libraries were spared drastic cuts because of very generous support from divisional funds, Caltech's engineering libraries were extremely hard hit, and only now after nearly seven years have they recovered (just in time for the current crisis). It should be pointed out here that from 1974 to 1983 the materials budgets for the departmental libraries were the responsibility of appropriate divisions.

Serials Crisis

Dana Roth (1990) "The Serials Crisis Revisited"

The Serials Librarian. https://doi.org/dvwb7f

Source: Association of Research Libraries. Expenditure Trends in ARL Libraries, 1986–2015

Cumulative increases, 1986–2015

  1. Inflation: 118%
  2. Library expenditures: 197%
  3. Journal subscriptions: 521%
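To compare these cumulative increases on a yearly scale, each can be converted to a compound annual rate, r = (1 + p/100)^(1/n) - 1, over n = 29 years. This sketch assumes the percentages are total increases from 1986 to 2015; under that assumption the rates come to roughly 2.7%, 3.8%, and 6.5% per year.

```python
def annual_rate(total_increase_pct: float, years: int) -> float:
    """Compound annual growth rate implied by a cumulative percent increase."""
    return (1 + total_increase_pct / 100) ** (1 / years) - 1


YEARS = 2015 - 1986  # 29 years
increases = {
    "Inflation": 118,
    "Library expenditures": 197,
    "Journal subscriptions": 521,
}
for label, pct in increases.items():
    print(f"{label}: {annual_rate(pct, YEARS):.1%} per year")
```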

Libre Open Access

Headlines:

  • Science: Sci-Hub’s cache of pirated papers is so big, subscription journals are doomed, data analyst suggests
  • Inside Higher Ed: Inevitably Open
  • Quartz: A pirating service for academic journal articles could bring down the whole establishment

https://doi.org/b9s5

Sci-Hub ⇒ open scholarly literature?

Librarians will never drop subscription access & recommend illicit alternatives … ?

feedback loop

What library will continue to subscribe if a growing proportion of articles is available for free elsewhere?
Tom Reller (2013) Vice President, Elsevier

Defendants’ actions also threaten imminent irreparable harm to Elsevier because it appears that the Library Genesis Project repository may be approaching (or will eventually approach) a level of “completeness” where it can serve as a functionally equivalent, although patently illegal, replacement for ScienceDirect.

DeMarco, Hirschberg & Sen (2015) Attorneys for Elsevier

100 articles every ecologist should read

Courchamp & Bradshaw (2017) Nature Ecology & Evolution https://doi.org/cf8f

Changing times? Developments since 2017

  • 'read & publish' deals.
    (see https://doi.org/gdhp83)
  • Sweden canceled Elsevier subscription (see also Germany, Peru, and Taiwan)
  • University of Montreal cut 2,231 journal subscriptions from Taylor & Francis (93%)
  • Universities in the Netherlands dropped OUP subscription
  • More at SPARC's cancellation tracker https://sparcopen.org/our-work/big-deal-cancellation-tracking/
  • Preprint growth
  • Funder policies (e.g. Gates Foundation mandates CC BY)

Libre Open Access

https://greenelab.github.io/scihub-manuscript

Manubot

powering the next generation of scholarly manuscripts

Get started at tiny.cc/manubot

https://github.com/greenelab/manubot-rootstock

The Manubot project began with the [Deep Review](https://github.com/greenelab/deep-review),
where it was used to compose a highly-collaborative review article [@doi:10.1101/142760].
Other manuscripts that were created with Manubot include:

+ The Sci-Hub Coverage Study
  ([GitHub](https://github.com/greenelab/scihub-manuscript), [HTML manuscript](https://greenelab.github.io/scihub-manuscript/)) 
  [@doi:10.7287/peerj.preprints.3100]
+ Michael Zietz's Report for the Vagelos Scholars Program
  ([GitHub](https://github.com/zietzm/Vagelos2017), [HTML manuscript](https://zietzm.github.io/Vagelos2017/)) 
  [@doi:10.6084/m9.figshare.5346577]

The Manubot project began with the Deep Review, where it was used to compose a highly-collaborative review article [1]. Other manuscripts that were created with Manubot include:

1. Opportunities And Obstacles For Deep Learning In Biology And Medicine
Travers Ching, Daniel S. Himmelstein, Brett K. Beaulieu-Jones, Alexandr A. Kalinin, Brian T. Do, Gregory P. Way, Enrico Ferrero, Paul-Michael Agapow, Wei Xie, Gail L. Rosen, … Casey S. Greene
Cold Spring Harbor Laboratory (2017-05-28) https://doi.org/10.1101/142760

2. Sci-Hub provides access to nearly all scholarly literature
Daniel S Himmelstein, Ariel R Romero, Stephen R McLaughlin, Bastian Greshake Tzovaras, Casey S Greene
PeerJ Preprints (2017-07-20) https://doi.org/10.7287/peerj.preprints.3100

3. Vagelos Report Summer 2017
Michael Zietz
Figshare (2017) https://doi.org/10.6084/m9.figshare.5346577

1. Write markdown

Automatically converted to rich text

Automatic bibliographic metadata

[@doi:10.7287/peerj.preprints.3100]
[@arxiv:1407.3561v1]
[@pmid:24159271]
[@url:http://blog.dhimmel.com/biorxiv-licenses/]
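Each citation key pairs a source (`doi`, `arxiv`, `pmid`, `url`) with an identifier. A minimal sketch of extracting those pairs with a regular expression; the pattern is an assumption for illustration, not Manubot's own citation parser. It splits at the first colon only, so URL identifiers keep their own colons.

```python
import re
from typing import List, Tuple

# Hypothetical pattern: "[@" + lowercase source + ":" + identifier + "]"
CITATION = re.compile(r"\[@([a-z]+):([^\]\s]+)\]")


def citation_keys(markdown: str) -> List[Tuple[str, str]]:
    """Return (source, identifier) for each citation key in the text."""
    return [m.groups() for m in CITATION.finditer(markdown)]


text = """
[@doi:10.7287/peerj.preprints.3100]
[@arxiv:1407.3561v1]
[@pmid:24159271]
[@url:http://blog.dhimmel.com/biorxiv-licenses/]
"""
for source, identifier in citation_keys(text):
    print(source, identifier)
```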

2. Continuous integration rebuilds the manuscript

Timestamped on the Bitcoin blockchain via OpenTimestamps

3. Continuous deployment back to GitHub

Pull requests for manuscript collaboration
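A blockchain timestamp of this kind commits to a hash of the built manuscript rather than to the file itself. A minimal sketch of computing a SHA-256 digest with Python's standard library, assuming SHA-256 as the hash; submitting the digest to OpenTimestamps calendar servers is left to the OpenTimestamps client and is not shown.

```python
import hashlib
import sys
from pathlib import Path


def sha256_digest(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hex SHA-256 digest of a file, read in chunks to bound memory use."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


if __name__ == "__main__":
    # Print a digest for each file named on the command line
    for name in sys.argv[1:]:
        print(name, sha256_digest(Path(name)))
```

For example, `python digest.py output/manuscript.pdf` (hypothetical script and file names) prints the digest that a timestamp would attest to.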

the future: living but versioned

“Finally, we estimate that over a six-month period in 2015–2016, Sci-Hub provided access for 99.3% of valid incoming requests.”

— DOI: 10.7287/peerj.preprints.3100v1

“In the first version of this study, we mistakenly treated the log events as requests rather than downloads. Fortunately, Sci-Hub reviewed the preprint in a series of tweets, and pointed out the error…”

— DOI: 10.7287/peerj.preprints.3100v2

The Deep Review

  • review article on deep learning in precision medicine
  • 27 authors from 20 different institutions
  • readers appreciate the breadth of perspectives

Questions?

@dhimmel

0000-0002-3012-7446

Packing List
https://lighterpack.com/r/8pklim
Slides
https://slides.com/dhimmel/switzerland

Tullio Basaglia

CERN