Gensim
Unsupervised embeddings
Vector + Similarity Is All You Need
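The slogan boils down to two steps: represent items as vectors, then rank them by similarity. Gensim's `KeyedVectors.most_similar` ranks words by cosine similarity. The sketch below reproduces only that core idea in plain Python. The words and 3-dimensional vectors are hypothetical toy values, not the output of a trained model. It also skips everything gensim adds on top, such as pre-normalised matrices, vectorised math, and positive/negative word lists.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(word, vectors, topn=3):
    """Rank every other word by cosine similarity to `word` (highest first)."""
    query = vectors[word]
    scores = [
        (other, cosine_similarity(query, vec))
        for other, vec in vectors.items()
        if other != word  # skip the query word itself, as gensim does
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:topn]


# Hypothetical toy embeddings; a real model would learn these from a corpus.
vectors = {
    "king": [0.9, 0.1, 0.4],
    "queen": [0.85, 0.15, 0.5],
    "apple": [0.1, 0.9, 0.2],
    "pear": [0.15, 0.85, 0.25],
}

print(most_similar("king", vectors, topn=2))
```

Here "queen" ranks first for "king" because their toy vectors point in almost the same direction. With a trained gensim model, the equivalent call would be along the lines of `model.wv.most_similar("king", topn=2)`.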
Why use it?
Online
Fast
Robust
Production-ready
When to use it?
Text classification
Sentiment analysis
Search engines
NER
Topic modeling
...
Open-source
Maintainers
Radim Řehůřek: ∞ - XX 2016
Lev Konstantinovskiy: XX 2016 - May 2017
Ivan Menshikh: May 2017 - Feb 2019
Michael Penkov: Feb 2019 - ∞
Why am I here?
- Used gensim at work
- Feature requests / bug fixes to gensim
- "Lev, I want to change a job"
- ....
- PROFIT: you are fired... hired!
Open-source
Core developers
Students
NMF
FastText (cython)
gensim-data
FastText (python)
TM Viz
Corpusfile
ATM
Doc
Random contributors
@544895340
@AMR
@AadityaJ
@Alexjmsherman
@AustenLamacraft
@CLearERR
@Cheukting
@DennisChen0307
@ELind77
@ElSaico
@Fil
@HodorTheCoder
@IrinaGoloshchapova
@Jayantj
@JonathanHourany
@Karamax
@KenjiOhtsuka
@KiddoZhu
@KokuKUSIAKU
@Kreiswolke
@LShostenko
@Laubeee
@MridulS
@MritunjayMohitesh
@PeteBleackley
@PeterHamilton
@RishabGoel
@RunHorst
@SamriddhiJain
@Shiki
@Stamenov
@Stigjb
@TheFlash10
@Utkarsh
@VorontsovIE
@Witiko
@Xinyi2016
@Zohaggie
@abhinavchawla
@accraze
@ajkl
@akarazeev
@akutuzov
@alantian
@alexgarel
@allenyllee
@andrewjlm
@aneesh
@anmol01gulat
@anmolgulati
@anujkhare
@aquatiko
@arlenk
@arttii
@bahbbc
and many others
Project structure
- RaRe-Technologies/gensim
  - Code & documentation
  - CI (tests, docs, code style)
- RaRe-Technologies/gensim-data
  - Pre-trained models
  - Datasets
- RaRe-Technologies/smart_open
  - Universal reader/writer
  - (De)compression
  - S3 / HTTP / HDFS
- MacPython/gensim-wheels
  - CI (tests + wheels for PyPI)
- conda-forge/gensim-feedstock
  - CI (tests + Conda packages)
Community
- Google Groups
  - Any topic
  - Support
- GitHub
  - Feature requests
  - Bug reports
  - Holy wars
- Twitter
  - Announcements
  - Short discussions
- Gitter
  - Chat
  - Awful, really
  - Infinite context switching
Maintainer?
Expectation
Reality
Maintainer?
Goal: improve project
- Support
- Code-review & Merge
- Releases
- Roadmap
- Anything that nobody wants to do
- Setting up the environment, CI, checkers, etc.
- Guides, documentation
- Coordinating contributors
- Sometimes (never) adding new features
What's most important?
- Keep the project in good shape (backward compatibility, no useless stuff)
- Documentation (always a problem)
- Communicate. No, you don't get it: COMMUNICATE
- Attract new contributors
- Love open-source
What's next with gensim?
- Project in "slow maintenance mode" until ¯\_(ツ)_/¯
- Bugfixes / documentation improvements
- Code cleanup
- No roadmap
- No GSoC 2019
- No student incubator
- No awesome features planned
When will you implement model X?
ULMFiT, BERT, LASER, etc.
Most likely never
How can I help?
Thanks!
Take a free sticker here
↓
gensim-oss-mlekb
By Ivan Menshikh