Things that make and break Clojure code analysis

Opinions

  • We mostly don't write code, we read it
  • We mostly don't read code, we browse it and search it
    • usually not for holistic understanding, but for incremental investigations into causality
    • clarity > aesthetics, although both tend to be correlated
      • often overrated: prettiness, brevity, stylistic consistency
      • often underrated: contrast, redundancy

Things that help with code analysis

  • Simplicity
  • Clear names
  • Tests
  • Comments
    • In particular Rich Comments: (comment ...)
  • Stacktraces and other traceability information
    • Who called me?
  • Obvious types
  • Explicit bounded context
    • in particular, an explicit distinction between what comes from our own code and what comes from our libs
    • ns aliases help signal that
  • Reproducible and observable execution
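
A minimal sketch of a Rich Comment, assuming a hypothetical total-price function and :order/... keys (neither comes from a real codebase). The (comment ...) form is never evaluated on load, so it can keep a reproducible example call right next to the code it explains:

```clojure
;; Hypothetical function, for illustration only.
(defn total-price
  "Sums unit price times quantity over the line items."
  [line-items]
  (reduce + 0 (map (fn [{:order/keys [unit-price quantity]}]
                     (* unit-price quantity))
                   line-items)))

(comment
  ;; Evaluate this form from the editor; nothing in here runs on load.
  (total-price [{:order/unit-price 10 :order/quantity 2}
                {:order/unit-price 5  :order/quantity 1}])
  ;; => 25
  )
```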

Names

  • Must-reads:
  • Names:
    • narrow, consistent,
    • natural / synthetic
  • Fear not the weird characters (contrast!), e.g.:
    • =my-channel=
    • *my-atom, *my-future
    • my+tuple
    • my-key->my-value
    • <my-react-component>
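
A short sketch of some of these conventions in use. All names are hypothetical, and the sigils are only naming conventions, not language features:

```clojure
;; Leading * : a reference that must be deref'ed to get a value.
(def *request-count (atom 0))

;; k->v : a lookup from one kind of thing to another.
(def country-code->country-name
  {"FR" "France"
   "JP" "Japan"})

(defn visit!
  "Counts the visit and returns a [code name] tuple."
  [country-code]
  (swap! *request-count inc)
  ;; a+b : a tuple holding an a and a b.
  (let [code+name [country-code (country-code->country-name country-code)]]
    code+name))
```

At a glance, @*request-count reads as "deref of a mutable thing", and code+name reads as "a pair", without looking up either definition.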

:name.spaced/keywords

  • The unsung heroes of Clojure codebases
  • Context-free, non-colliding names
  • Easy to search and follow, trivial to refer to
  • :this_namespacing_convention also works
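
For instance, assuming a hypothetical myapp billing domain, two different ids can live in the same map without colliding, and a plain text search for :myapp.customer/id finds every place that reads or writes it:

```clojure
;; Hypothetical keys, for illustration only.
(def invoice
  {:myapp.billing/invoice-id "inv-42"
   :myapp.customer/id        "cust-7"
   :myapp.customer/name      "Ada"})

(defn customer-label
  "Needs no knowledge of what kind of map it receives:
  the keys carry their own context."
  [m]
  (str (:myapp.customer/name m) " (" (:myapp.customer/id m) ")"))

;; The underscore convention keeps the same idea in a form that
;; also fits places like JSON fields or SQL column names.
(def invoice-row
  {:myapp_customer_id "cust-7"})
```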

Don't: transform keywords

  • By trimming the namespace
  • By changing the syntax
    • (I'm looking at you, camel->kebab!)
  • By string-building them
    • Keywords are NOT composites
    • Don't complect identification and data structure!
  • Dynamically transforming keywords makes you lose track of them.
  • Keywords should be trivial to follow and rename - don't ruin that.
    • the difference between "trivial" and "not impossible" is a big deal.
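
A sketch of the contrast, with hypothetical names (field-key, parse-user, the :user/... keys). In the first version the keyword is string-built, so it appears nowhere in the source. In the second, an explicit table at the boundary keeps every keyword literal and searchable:

```clojure
(require '[clojure.set :as set])

;; Hard to follow: :user/first-name is never written out anywhere.
(defn field-key [entity field]
  (keyword (name entity) (str (name field) "-name")))

(field-key :user :first)
;; => :user/first-name, which a text search for that keyword cannot find

;; Easier to follow: one explicit mapping at the edge of the system.
(def json-key->user-key
  {"firstName" :user/first-name
   "lastName"  :user/last-name})

(defn parse-user
  "Renames incoming JSON keys using the explicit table above."
  [json-map]
  (set/rename-keys json-map json-key->user-key))
```

The table is redundant on purpose: renaming :user/first-name becomes a search-and-replace instead of an investigation.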

Namespaced keywords: don't

  • don't use the same keyword for different types;
    • clojure.spec will punish you!
  • don't use the same keyword for different semantics;
  • don't bike-shed all day over the structure of the keyword (e.g. :my.namespaced/keyword-convention vs :my.name.spaced.keyword/convention)
    • the existential crisis is not worth the time
    • what matters is that the keyword is clear, non-colliding and searchable.
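
How clojure.spec punishes keyword reuse, as a small sketch over a hypothetical shop domain. The spec registry is global, with one spec per keyword, and s/keys also validates every namespaced key in the map that has a registered spec:

```clojure
(require '[clojure.spec.alpha :as s])

;; Hypothetical specs, for illustration only.
(s/def :shop/id uuid?)                 ; written with products in mind
(s/def :shop.order/total number?)
(s/def :shop/order (s/keys :req [:shop.order/total]))

;; Someone reuses :shop/id for an integer order number:
(def reused-key-order
  {:shop.order/total 10
   :shop/id          12})

(s/valid? :shop/order reused-key-order)
;; => false, although :shop/order never mentions :shop/id

;; One keyword per type and meaning avoids the clash:
(s/def :shop.product/id uuid?)
(s/def :shop.order/number pos-int?)

(def distinct-keys-order
  {:shop.order/total  10
   :shop.order/number 12
   :shop.product/id   (java.util.UUID/randomUUID)})

(s/valid? :shop/order distinct-keys-order)
;; => true
```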

Keywords: portability for ubiquity 

Things that make us lose track of execution

  • Asynchronous/Event-Driven flow control
    • incl. core.async channels
  • Colliding keywords
    • Worst offenders: :id, :name, :type, :n, :value, :key, ...
  • Dynamically transforming keywords
  • Silencing stacktraces
  • Network calls
  • Positional semantics (e.g. (nth my-csv-row 8))
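
On the last point, one possible remedy is to name the positions once, at the boundary. A sketch assuming a hypothetical three-column CSV of trades:

```clojure
;; Hypothetical column layout, declared in exactly one place.
(def csv-header
  [:trade/id :trade/symbol :trade/price])

(defn row->trade
  "Turns a positional CSV row into a map with searchable keys."
  [row]
  (zipmap csv-header row))

(:trade/price (row->trade ["t1" "ACME" "12.5"]))
;; => "12.5"
```

Downstream code then reads (:trade/price trade) rather than (nth row 2), and a search for :trade/price shows where the column is used.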

It's just data

  • a.k.a. Lost In Interpretation
  • a.k.a. the call-by-fn/call-by-data Dilemma

[:defn :fibonacci [:n]
 [:cond
  [:= :n 0] 0
  [:= :n 1] 1
  [:+
   [:fibonacci [:- :n 1]]
   [:fibonacci [:- :n 2]]]]]

Data-orientation is great, until...

  1. the semantics of the data become less trivial...
    • i.e. the language of the data becomes more expressive
  2. ...then the interpreters we write for the data become more sophisticated...
  3. ... and we find ourselves with a new programming language.
  4. But several affordances remain to be reinvented:
    1. Debugging and monitoring utilities
    2. Source mapping
    3. Editor support
    4. Linters and other static analyzers
    5. Documentation (usually), coding conventions, familiarity to newcomers
    6. Language maturity
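
To make the progression concrete, here is a minimal sketch of an interpreter for the data-encoded fibonacci shown earlier. The semantics are assumptions of this sketch, not something the data itself specifies: keywords are variable lookups, a trailing odd clause in :cond is the default branch, and any other operator is looked up in a table of user-defined functions.

```clojure
(defn interpret
  "Evaluates form. fns maps a fn name to {:params ... :body ...};
  env maps a variable keyword to its value."
  [fns env form]
  (cond
    (keyword? form) (get env form)
    (vector? form)
    (let [[op & args] form
          eval-arg    #(interpret fns env %)]
      (case op
        :cond (loop [clauses args]
                (cond
                  (empty? clauses)      nil
                  (= 1 (count clauses)) (eval-arg (first clauses)) ; default branch
                  (eval-arg (first clauses)) (eval-arg (second clauses))
                  :else (recur (drop 2 clauses))))
        :=    (apply = (map eval-arg args))
        :+    (apply + (map eval-arg args))
        :-    (apply - (map eval-arg args))
        ;; anything else is treated as a call to a user-defined function
        (if-let [{:keys [params body]} (get fns op)]
          (interpret fns (zipmap params (map eval-arg args)) body)
          (throw (ex-info "Unknown operator" {:op op :form form})))))
    :else form)) ; numbers and other literals evaluate to themselves

(defn load-defn
  "Registers a [:defn name params body] form in the fns table."
  [fns [_defn fn-name params body]]
  (assoc fns fn-name {:params params :body body}))

(def fibonacci-def
  [:defn :fibonacci [:n]
   [:cond
    [:= :n 0] 0
    [:= :n 1] 1
    [:+
     [:fibonacci [:- :n 1]]
     [:fibonacci [:- :n 2]]]]])

(def fns (load-defn {} fibonacci-def))

(interpret fns {} [:fibonacci 10])
;; => 55
```

About thirty lines in, this already has variables, conditionals and recursion, and none of the affordances listed above: an error inside [:fibonacci ...] produces a stacktrace that points into interpret, not at the data that caused it.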

Symptom

  • re-frame-10x is awesome...
  • ... by addressing a self-inflicted problem *ducks*

Any sufficiently complicated C or Fortran program contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of Common Lisp.

- Greenspun's 10th Rule

Any sufficiently data-oriented Clojure program contains an ad hoc, informally-specified, bug-ridden, slow interpretation of half of Clojure.

The special case of side-effects

  • "Side-effects at the edges" compels us to schedule side-effects by building data structures (commands)
  • The delayed interpretation of multi-effect commands can make it hard to trace which part of the system requested a given side-effect.
    • again, namespaced keywords help
    • carrying causality-encoding metadata seems like a promising and under-used approach
      • "source-mapping"
      • "higher-level stacktraces"
      • correlation IDs
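
One way this could look, as a sketch only. The :effect/... keys, place-order and run-effects! are hypothetical names, not an established library API. Each command records who requested it and under which correlation ID, and the interpreter puts that information back into any error it raises:

```clojure
(defn effect
  "Builds a command (plain data) that remembers where it came from."
  [requested-by correlation-id effect-type payload]
  {:effect/type           effect-type
   :effect/payload        payload
   :effect/requested-by   requested-by
   :effect/correlation-id correlation-id})

(defn place-order
  "Pure: returns the commands to run instead of running them."
  [order correlation-id]
  [(effect :shop.checkout/place-order correlation-id
           :shop.email/send {:to (:shop.customer/email order)})
   (effect :shop.checkout/place-order correlation-id
           :shop.stock/reserve {:sku (:shop.product/sku order)})])

(defn run-effects!
  "Runs each command with its handler. On failure, rethrows with the
  causality data attached: a higher-level stacktrace of sorts."
  [handlers commands]
  (doseq [cmd commands]
    (let [handler (get handlers (:effect/type cmd))]
      (try
        (handler cmd)
        (catch Exception e
          (throw (ex-info "Effect failed"
                          (select-keys cmd [:effect/type
                                            :effect/requested-by
                                            :effect/correlation-id])
                          e)))))))
```

When a handler throws, the ex-data says which part of the system scheduled the effect, which the JVM stacktrace alone cannot: by then the requesting function has long returned.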

The curious case of synchronous read-only data-fetching

  • (say, HTTP GET)
  • Is fetching (read-only) data across the network really a side-effect?
    • I mean, so is moving data from RAM to CPU registers, right?
    • Those read-only side-effects might be harmless enough that we can feel free to interleave them in execution.
      • (Datomic does that.)

Data-orientation: in summary

  • A valuable tool in the toolbox
  • With pitfalls and limitations
    • "it's data-oriented therefore it must be good" is naive
  • It's insightful to recognize that data-orientation consists of:
    • inventing data-encoded domain-specific languages
    • implementing ad hoc interpreters for them.
  • If those DSLs are highly expressive, or if their "code" is non-trivial to analyze, is it still data?
  • Preserving the traceability of execution is a challenge.