Personal Knowledge Management Powered by Apache Solr
Ruda Zhang
PhD student, Civil Engineering
University of Southern California
2015/08/28
Evolution of personal knowledge management
Pre-digital era
PC era
Web era
AI era
Pre-digital era
notebooks,
library classification;
PC era
digital documents,
"file" system;
Web era
webpage,
"personal wiki": gollum, DokuWiki, wikidPad
AI era
search bar,
NLP stacks (Solr): search engine, clustering.
Knowledge Management with Apache Solr
Taking enterprise search software for personal use.
Functionality needs
powerful search engine (Lucene)
rich document indexing (Tika)
auto-completion, spell check
entry suggestion
dynamic clustering and labeling by document topics
integrating note snippets into specific "views", which facilitates long-form writing.
visualization: treemap (FoamTree), network
navigation: facets, breadcrumbs
cross-referencing / hyperlinking
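As a rough illustration of the search and facet-navigation items above, the sketch below builds a Solr `/select` URL using the standard `q`, `rows`, `wt`, `facet`, and `facet.field` parameters. The base URL, the core name `notes`, and the field names `tags` and `doc_type` are assumptions made for the example, not part of the original setup.

```python
from urllib.parse import urlencode


def build_search_url(query, facet_fields=(), rows=10,
                     base_url="http://localhost:8983/solr", core="notes"):
    """Return a Solr /select URL for a keyword query with optional facets.

    base_url and core are hypothetical defaults; adjust to the local install.
    """
    params = [("q", query), ("rows", rows), ("wt", "json")]
    if facet_fields:
        params.append(("facet", "true"))
        # facet.field is repeated once per field to facet on
        params.extend(("facet.field", field) for field in facet_fields)
    return f"{base_url}/{core}/select?{urlencode(params)}"
```

Calling `build_search_url("traffic flow", facet_fields=["tags", "doc_type"])` yields a URL whose JSON response could drive both the result list and the facet sidebar.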
Tailoring for personal knowledge pool
Data models (mostly schema-less natural language): .txt, .md, .docx, .pptx, .xlsx, .pdf, .html
minimal tailoring of legacy document markups and schemas.
users should be editing note contents, not HTML files.
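One way to leave existing documents untouched is to scan the note folders as they are and pick up files by extension. The sketch below is an assumption about how that could look; the extension set is copied from the list above, and `collect_documents` is a hypothetical helper name.

```python
from pathlib import Path

# Extensions taken from the data-model list above.
SUPPORTED_EXTENSIONS = {".txt", ".md", ".docx", ".pptx", ".xlsx", ".pdf", ".html"}


def collect_documents(root):
    """Return sorted paths under root whose extension is in the supported set."""
    root = Path(root)
    return sorted(
        path for path in root.rglob("*")
        if path.is_file() and path.suffix.lower() in SUPPORTED_EXTENSIONS
    )
```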
PKM use cases (against the global knowledge space)
logs: Snippets of notes "in sack" for in-depth documenting.
notes: Personally organized notes, beyond one-step reach of Google search.
literature: References for attribution. [Significant for critical writing.]
Apache Solr
the most popular enterprise search platform
Netflix, digg, NASA, Slack ...
Many libraries are also using Solr
built on Apache Lucene, an information retrieval software library
Indexing
Indexing rich documents with Apache Tika.
Apache Tika has a wide spectrum of supported formats.
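A minimal sketch of sending a rich document to Solr for Tika extraction, assuming the ExtractingRequestHandler (Solr Cell) is enabled at `/update/extract`. The parameters `literal.id`, `resource.name`, and `commit` come from Solr Cell; the base URL, the core name `notes`, and both function names are hypothetical.

```python
from pathlib import Path
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_extract_request(path, doc_id,
                          base_url="http://localhost:8983/solr", core="notes"):
    """Build a POST request that hands one file to Solr Cell for extraction."""
    path = Path(path)
    params = {
        "literal.id": doc_id,        # unique key stored with the document
        "resource.name": path.name,  # file name hint for MIME type detection
        "commit": "true",            # make the document searchable right away
    }
    url = f"{base_url}/{core}/update/extract?{urlencode(params)}"
    return Request(url, data=path.read_bytes(),
                   headers={"Content-Type": "application/octet-stream"},
                   method="POST")


def index_file(path, doc_id, **kwargs):
    """Send the request; requires a running Solr instance (assumed)."""
    with urlopen(build_extract_request(path, doc_id, **kwargs)) as response:
        return response.status
```

Building the request separately from sending it keeps the URL logic easy to inspect without a running server.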
Clustering
Carrot² is a document thematic clustering engine, written in Java and distributed under the BSD license.
Clustering algorithms: Lingo, STC, Lingo3G™.
Visualization: FoamTree, Circles
Search feeds: Lucene, Solr, other search APIs.
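When Carrot² runs inside Solr through the clustering component, clusters can be requested alongside ordinary search results. The sketch below assumes the component is configured with an engine named `lingo`, that notes have `title` and `content` fields, and that the JSON response carries a `clusters` list whose entries have `labels` and `docs`. All of these depend on the local Solr configuration and version, so treat them as assumptions.

```python
def clustering_params(query, engine="lingo",
                      title_field="title", snippet_field="content"):
    """Query parameters asking Solr to cluster the search results."""
    return {
        "q": query,
        "wt": "json",
        "clustering": "true",
        "clustering.engine": engine,     # engine name as set in solrconfig.xml
        "carrot.title": title_field,     # field used as the document title
        "carrot.snippet": snippet_field, # field used as the document body
    }


def cluster_labels(response):
    """Map each cluster's first label to the document ids it contains."""
    result = {}
    for cluster in response.get("clusters", []):
        labels = cluster.get("labels") or ["(unlabeled)"]
        result[labels[0]] = list(cluster.get("docs", []))
    return result
```

The label-to-ids mapping is the shape a treemap such as FoamTree would need as input: one group per label, sized by the number of documents.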
Front-ends
Carrot² Web Application
Carrot² Document Clustering Workbench
Web UI
Velocity Search UI [built-in]
Project Blacklight, a Ruby on Rails Engine plugin.
Lucidworks Fusion, a platform for building enterprise search applications.
Shredder: current status
figuring out file type support (.md, .jpg, etc.)
building UI
stay tuned on my GitHub repo