Daniel Himmelstein
Head of Data Integration at Related Sciences. Digital craftsman of the biodata revolution.
Daniel Himmelstein (@dhimmel)
HighWire’s Lunch & Learn
IET London: Savoy Place
June 14th, 2019 11:15 AM
slides released under CC BY 4.0
Abstract
Scholarly publishing is far from perfect, but nonetheless plays a crucial role in the dissemination of knowledge. How can we modernise publishing to increase its benefits while decreasing its inefficiency?
Daniel will discuss how publishing can be automated, reducing the imperfections and delays that manual steps inevitably cause. In addition, we’ll discuss machine-readability, a precursor to effective voice search, as well as living literature that users can interact with and improve upon in real time.
What is the ideal system for scholarly publication in the future? How can publishers use automation and standards to ensure machine-readability and increased interaction with scholarly literature?
About the Speaker
Daniel is a data scientist at the University of Pennsylvania. He performs large-scale data analysis to uncover trends in scholarly publishing. For example, Daniel has investigated the time from submission to publication at thousands of journals, what percent of the literature is in Sci-Hub, how preprints are licensed, and what bibliographic styles journals have applied over time. Currently, Daniel researches human disease and leads development of Manubot, a tool for open scholarly writing on GitHub.
Illustration by Matt Murphy.
© Nature News doi.org/f3mn4t
submission
↓
acceptance
acceptance
↓
publication
https://blog.dhimmel.com/history-of-delays/
https://blog.dhimmel.com/plos-and-publishing-delays
when authors submit a manuscript, can they immediately be shown a fully rendered proof?
formatting problems: let the authors fix them. suggest changes from automated checks.
Reproducible Document Stack: towards a scalable solution for reproducible articles
Giuliano Maciocci, Emmy Tsang, Nokome Bentley and Michael Aufreiter
eLife Labs (2019-05-22)
As a first step, eLife aims to publish reproducible articles as companions of already accepted papers. We will endeavour to accept submissions of reproducible manuscripts in the form of DAR files by the end of 2019.
The Deep Review
most viewed bioRxiv preprint of 2017
a long-lasting, standardized reference to a citable work
The only manual bibliographic step in the publication workflow, from authoring to production, is when an author chooses which work to cite.
This is a sentence with 5 citations [ @doi:10.1038/nbt.3780; @pmid:29424689; @pmcid:PMC5938574; @arxiv:1407.3561; @url:https://greenelab.github.io/meta-review/ ].
This is a sentence with 5 citations [1,2,3,4,5].
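The two sentences above show citation keys before and after processing. A minimal sketch of that numbering step follows. It is not Manubot's actual implementation: the regular expressions and the `number_citations` helper are assumptions made for illustration, and no bibliographic metadata is retrieved.

```python
import re

# Matches citation keys of the form @prefix:identifier.
# Assumption: an identifier runs until whitespace, ';' or ']'.
CITATION_KEY = re.compile(r"@([a-z]+):([^\s;\]]+)")

# Assumption: a citation group is a bracketed span containing at least one '@'.
CITATION_GROUP = re.compile(r"\[[^\[\]]*@[^\[\]]*\]")


def number_citations(text):
    """Replace bracketed citation groups with numeric references.

    Returns the rewritten text and the unique citation keys,
    ordered by first appearance.
    """
    order = []

    def render_group(match):
        numbers = []
        for prefix, identifier in CITATION_KEY.findall(match.group(0)):
            key = f"{prefix}:{identifier}"
            if key not in order:
                order.append(key)
            numbers.append(str(order.index(key) + 1))
        return "[" + ",".join(numbers) + "]"

    return CITATION_GROUP.sub(render_group, text), order


if __name__ == "__main__":
    sentence = (
        "This is a sentence with 5 citations [ @doi:10.1038/nbt.3780; "
        "@pmid:29424689; @pmcid:PMC5938574; @arxiv:1407.3561; "
        "@url:https://greenelab.github.io/meta-review/ ]."
    )
    rewritten, keys = number_citations(sentence)
    print(rewritten)
    print(keys)
```

Because each key carries its identifier type as a prefix, a later step could look up the full reference for every entry in `keys` without further input from the author.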
Erratum:
After publication of this article [1], it has been noticed that Figs. 1 and 3 (Figs. 1 and 2 respectively here) had been incorrectly reverted in the original article [1].
“Finally, we estimate that over a six-month period in 2015–2016, Sci-Hub provided access for 99.3% of valid incoming requests.”
— DOI: 10.7287/peerj.preprints.3100v1
“In the first version of this study, we mistakenly treated the log events as requests rather than downloads. Fortunately, Sci-Hub reviewed the preprint in a series of tweets, and pointed out the error…”
— DOI: 10.7287/peerj.preprints.3100v2
Timestamped on the Bitcoin blockchain via OpenTimestamps
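Timestamping of this kind commits to a cryptographic digest of a file rather than to the file itself. The sketch below only computes such a digest, assuming SHA-256. It does not call OpenTimestamps or contact any blockchain, and the helper names are hypothetical.

```python
import hashlib


def sha256_of_chunks(chunks):
    """Return the hex SHA-256 digest of an iterable of byte chunks."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def read_chunks(path, chunk_size=65536):
    """Yield the contents of a file in fixed-size chunks."""
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            yield chunk


def sha256_of_file(path):
    """Return the hex SHA-256 digest of the file at `path`."""
    return sha256_of_chunks(read_chunks(path))
```

Any change to the manuscript, even a single byte, yields a different digest, which is what lets a timestamp attest that a specific version existed at a given time.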
Beyond the PDF First Day Notes
By De Jongens van de Tekeningen
Licensed under CC BY 3.0
Modified to invert colors
<meta name="DC.Format" content="text/html" />
<meta name="DC.Language" content="en" />
<meta name="DC.Title" content="Tracking the popularity and outcomes of all bioRxiv preprints" />
<meta name="DC.Identifier" content="10.1101/515643" />
<meta name="DC.Date" content="2019-04-02" />
<meta name="DC.Publisher" content="Cold Spring Harbor Laboratory" />
<meta name="DC.Rights" content="© 2019, Posted by Cold Spring Harbor Laboratory. This pre-print is available under a Creative Commons License (Attribution 4.0 International), CC BY 4.0, as described at http://creativecommons.org/licenses/by/4.0/" />
<meta name="DC.AccessRights" content="restricted" />
<meta name="DC.Description" content="Researchers in the life sciences are posting work to preprint servers at an unprecedented and increasing rate, sharing papers online before (or instead of) publication in peer-reviewed journals. Though the increasing acceptance of preprints is driving policy changes for journals and funders, there is little information about their usage. Here, we collected and analyzed data on all 37,648 preprints uploaded to bioRxiv.org, the largest biology-focused preprint server, in its first five years. We find preprints are being downloaded more than ever before (1.1 million tallied in October 2018 alone) and that the rate of preprints being posted has increased to a recent high of 2,100 per month. We also find that two-thirds of preprints posted before 2017 were later published in peer-reviewed journals, and find a relationship between journal impact factor and preprint downloads. Lastly, we developed Rxivist.org, a web application providing multiple ways of interacting with preprint metadata." />
<meta name="DC.Contributor" content="Richard J. Abdill" />
<meta name="DC.Contributor" content="Ran Blekhman" />
<meta name="article:published_time" content="2019-04-02" />
<meta name="article:section" content="New Results" />
<meta name="citation_title" content="Tracking the popularity and outcomes of all bioRxiv preprints" />
<meta name="citation_abstract" lang="en" content="<h3>Abstract</h3>
<p>Researchers in the life sciences are posting work to preprint servers at an unprecedented and increasing rate, sharing papers online before (or instead of) publication in peer-reviewed journals. Though the increasing acceptance of preprints is driving policy changes for journals and funders, there is little information about their usage. Here, we collected and analyzed data on all 37,648 preprints uploaded to bioRxiv.org, the largest biology-focused preprint server, in its first five years. We find preprints are being downloaded more than ever before (1.1 million tallied in October 2018 alone) and that the rate of preprints being posted has increased to a recent high of 2,100 per month. We also find that two-thirds of preprints posted before 2017 were later published in peer-reviewed journals, and find a relationship between journal impact factor and preprint downloads. Lastly, we developed Rxivist.org, a web application providing multiple ways of interacting with preprint metadata.</p>" />
<meta name="citation_journal_title" content="bioRxiv" />
<meta name="citation_publisher" content="Cold Spring Harbor Laboratory" />
<meta name="citation_publication_date" content="2019/01/01" />
<meta name="citation_mjid" content="biorxiv;515643v2" />
<meta name="citation_id" content="515643v2" />
<meta name="citation_public_url" content="https://www.biorxiv.org/content/10.1101/515643v2" />
<meta name="citation_abstract_html_url" content="https://www.biorxiv.org/content/10.1101/515643v2.abstract" />
<meta name="citation_full_html_url" content="https://www.biorxiv.org/content/10.1101/515643v2.full" />
<meta name="citation_pdf_url" content="https://www.biorxiv.org/content/biorxiv/early/2019/04/02/515643.full.pdf" />
<meta name="citation_doi" content="10.1101/515643" />
<meta name="citation_num_pages" content="65" />
<meta name="citation_article_type" content="Article" />
<meta name="citation_section" content="New Results" />
<meta name="citation_firstpage" content="515643" />
<meta name="citation_author" content="Richard J. Abdill" />
<meta name="citation_author_institution" content="Department of Genetics, Cell Biology, and Development, University of Minnesota" />
<meta name="citation_author_orcid" content="http://orcid.org/0000-0001-9565-5832" />
<meta name="citation_author" content="Ran Blekhman" />
<meta name="citation_author_institution" content="Department of Genetics, Cell Biology, and Development, University of Minnesota" />
<meta name="citation_author_institution" content="Department of Ecology, Evolution, and Behavior, University of Minnesota" />
<meta name="citation_author_email" content="blekhman@umn.edu" />
<meta name="citation_author_orcid" content="http://orcid.org/0000-0003-3218-613X" />
metadata in the HTML <head> of a bioRxiv preprint
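A minimal sketch of how a machine could read this kind of metadata, using only Python's standard-library `html.parser`. The `MetaTagParser` class and `extract_metadata` function are names introduced here for illustration, and the sample markup in the usage example is a shortened copy of the tags shown above.

```python
from collections import defaultdict
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Collect name/content pairs from <meta> tags."""

    def __init__(self):
        super().__init__()
        # A name such as citation_author can repeat, so keep a list per name.
        self.metadata = defaultdict(list)

    def handle_starttag(self, tag, attrs):
        # Self-closing tags (<meta ... />) are also routed here by default.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name and content is not None:
            self.metadata[name].append(content)


def extract_metadata(html):
    """Return a dict mapping each meta name to its list of content values."""
    parser = MetaTagParser()
    parser.feed(html)
    parser.close()
    return dict(parser.metadata)


if __name__ == "__main__":
    sample = """
    <head>
    <meta name="citation_doi" content="10.1101/515643" />
    <meta name="citation_author" content="Richard J. Abdill" />
    <meta name="citation_author" content="Ran Blekhman" />
    </head>
    """
    for name, values in extract_metadata(sample).items():
        print(name, values)
```

Values are kept as lists because author, institution, and contributor tags legitimately occur more than once on the same page.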
wanted since Y2K
access status data from
Time from submission to acceptance for 3,330,333 articles since 1965
https://blog.dhimmel.com/history-of-delays/
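The delay measured here is the number of days between two dates in an article's history. A small sketch of that arithmetic follows. The function names and the three date pairs in the usage example are made up for illustration and are not taken from the analysis linked above.

```python
from datetime import date
from statistics import median


def acceptance_delay_days(submitted, accepted):
    """Days between submission and acceptance, given ISO dates (YYYY-MM-DD)."""
    return (date.fromisoformat(accepted) - date.fromisoformat(submitted)).days


def median_delay(records):
    """Median acceptance delay in days over (submitted, accepted) pairs."""
    return median(acceptance_delay_days(s, a) for s, a in records)


if __name__ == "__main__":
    # Hypothetical submission and acceptance dates.
    records = [
        ("2019-01-01", "2019-01-31"),
        ("2019-02-01", "2019-04-02"),
        ("2019-03-01", "2019-03-11"),
    ]
    print(median_delay(records))
```

The median is used in this sketch because a few very slow articles would otherwise dominate an average.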
In addition, a member of Reviewer #2's lab reviewed the manuscript for the life sciences overlay biOverlay. The most important comments in that review are included in the points below. If there are additional comments from biOverlay that you wish to address in the revision please highlight these in your author response.
Source: PrePubMed, released under MIT License
@dhimmel
0000-0002-3012-7446
Slides
https://slides.com/dhimmel/highwire
Packing List
https://lighterpack.com/r/pzft6
input
output
manubot process
Submitted to journal:
GEO refers to the Gene Expression Omnibus [165,166].
Published as:
GEO refers to the Gene Expression Omnibus (Edgar et al., 2002; Barrett et al., 20122013)
By Daniel Himmelstein
Presentation by Daniel Himmelstein at HighWire's London Lunch & Learn session on 2019-06-14. This presentation is released under a CC BY 4.0 License. A recording is available at https://vimeo.com/344994584.