Group Study Seminar
JUL 1, 2024
UpStage AI Lab, Cohort 3
Study Session 1 - Team 3
Team 3 : Digital Treasure Quest (디지털 보물찾기)
박석
백경탁
한아름
이승현
이한국
E-mail : parksurk@gmail.com
Blog : http://parksurk.github.io
Linkedin: https://www.linkedin.com/in/parksurk
GitHub : https://github.com/parksurk
Be able to explain the major trends driving the rise of deep learning, and understand where and how it is applied today
The stuff promised in this video - still not really around.
Perceptron
by Frank Rosenblatt
1957
Adaline
by Bernard Widrow and Ted Hoff
1960
“The Navy revealed the embryo of an electronic computer today that it expects will be able to walk, talk, see, write, reproduce itself and be conscious of its existence … Dr. Frank Rosenblatt, a research psychologist at the Cornell Aeronautical Laboratory, Buffalo, said Perceptrons might be fired to the planets as mechanical space explorers”
New York Times
July 08, 1958
Linearly separable?
Perceptrons
by Marvin Minsky (co-founder of the MIT AI Lab) and Seymour Papert
1969
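The "linearly separable?" question above can be made concrete with a small sketch. It assumes the classic perceptron learning rule on two binary inputs; the function name `train_perceptron`, the learning rate, and the epoch limit are illustrative choices, not taken from the slides. The rule settles on AND, which a single line can separate, but never settles on XOR, which it cannot.

```python
def train_perceptron(samples, epochs=50, lr=1.0):
    """Perceptron learning rule for 2-input binary data.

    Returns (weights, bias, converged). `converged` is True if one full
    pass over the data produced no classification errors.
    """
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for (x1, x2), target in samples:
            # Threshold unit: output 1 only if the weighted sum is positive.
            pred = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
            delta = target - pred
            if delta != 0:
                # Move the separating line toward the misclassified point.
                w[0] += lr * delta * x1
                w[1] += lr * delta * x2
                b += lr * delta
                errors += 1
        if errors == 0:
            return w, b, True
    return w, b, False


AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

_, _, and_converged = train_perceptron(AND)
_, _, xor_converged = train_perceptron(XOR)
print("AND converged:", and_converged)
print("XOR converged:", xor_converged)
```

No single line splits the XOR points, so raising `epochs` does not change the XOR result.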
(1974, 1982 by Paul Werbos, 1986 by Hinton)
(by Hubel & Wiesel, 1959)
motivated by biological insights
(LeNet-5, Yann LeCun, 1998)
"At some point in the late 1990s, one of these systems was reading 10 to 20% of all the checks in the US."
"NavLab, 1984 ~ 1994 : ALVINN"
Autoencoder
Boltzmann Machine
Belief Nets
Reinforcement Learning
Double pendulum
control problem
Recurrent Neural Network
Bengio wrote the 1993
1995 Paper
"Comparison of Learning Algorithm For Handwritten Digit Recognition"
"New Machine Learning approach worked better"
The Canadian Institute for Advanced Research (CIFAR), which encourages basic research without direct application, was what motivated Hinton to move to Canada in 1987, and it funded his work afterward.
http://www.andreykurenkov.com/writing/a-brief-history-of-neural-nets-and-deep-learning-part-4/
by Hinton and Bengio
Neural networks with many layers really could be trained well, if the weights are initialized in a clever way rather than randomly. (By Hinton)
Deep machine learning methods (that is, methods with many processing steps, or equivalently with hierarchical feature representations of the data) are more efficient for difficult problems than shallow methods (of which two-layer ANNs and support vector machines are examples). (By Bengio)
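A neural network can be described as a structure that repeatedly draws lines that separate the data well, warps the resulting spaces, and combines them.
Draw lines, crumple, combine; draw lines, crumple, combine; draw lines, crumple, combine ...
By what rule does it draw the lines and warp the space?
It learns the rule from a very large amount of data and a very long optimization run.
When many neurons (linear fit + nonlinear transformation) are combined, even functions with complex shapes can be approximated.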
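A minimal sketch of the "draw lines, crumple, combine" idea, assuming NumPy and hand-picked weights rather than learned ones (the names `W1`, `b1`, `W2`, `b2`, and `forward` are illustrative only). Two hidden neurons each draw one line, a step function supplies the nonlinear transformation, and an output neuron combines them into XOR, a function no single line can represent.

```python
import numpy as np


def step(z):
    """Nonlinear transformation: 1.0 where z > 0, else 0.0."""
    return (z > 0).astype(float)


# Hidden layer: two "lines" over the inputs (x1, x2).
# Column 0 fires for x1 OR x2, column 1 fires for x1 AND x2.
W1 = np.array([[1.0, 1.0],
               [1.0, 1.0]])
b1 = np.array([-0.5, -1.5])

# Output neuron: fires when OR is on and AND is off, which is XOR.
W2 = np.array([1.0, -1.0])
b2 = -0.5


def forward(X):
    H = step(X @ W1 + b1)     # linear fit + nonlinear transformation
    return step(H @ W2 + b2)  # combine the hidden neurons


X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = forward(X)
print(y)
```

In a trained network the weights come from optimization over data; they are fixed by hand here only to keep the example short.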
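What happens if the optimization algorithm reaches a wrong optimum instead of the true one?
The problem can be resolved by first conditioning each layer of the network with unsupervised learning, then stacking the layers prepared this way and optimizing the whole network.
Deep learning techniques then swept machine learning competitions with overwhelming performance, establishing themselves as unrivaled, and researchers who had been working on image processing, speech recognition, and similar problems with other machine learning methods are now returning to deep learning...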
refer to "Deep Learning Tutorial" by Yann LeCun and others
Wikipedia says:
“Deep learning is a branch of machine learning based on a set of algorithms that attempt to model high-level abstractions in data by using multiple processing layers, with complex structures or otherwise, composed of multiple non-linear transformations.”
Neural Nets : McCulloch & Pitts, 1943
Perceptron : Rosenblatt, 1958
RNN : Grossberg, 1973
CNN : Fukushima, 1979
RBM : Hinton, 1999
DBN : Hinton, 2006
D-AE : Vincent, 2008
AlexNet : Alex, 2012
GoogLeNet : Szegedy, 2015
The researchers say even they weren’t sure this new approach (152 layers!) was going to be successful – until it was.
“We even didn’t believe this single idea could be so significant,”
said Jian Sun, a principal research manager at Microsoft Research
A Style-Based Generator Architecture for Generative Adversarial Networks
Few-Shot Adversarial Learning of Realistic Neural Talking Head Models http://arxiv.org/abs/1905.08233
(LSTM, RNN)
(End to End Speech to Speech Translation)
Deep Generative Models for Speech by Heiga Zen (Google Brain)
Neosapience https://typecast.ai
Naver Clova AI https://clova.ai/ko/events/celeb_voice/
MoneyBrain http://www.moneybrain.ai
DeepMind https://deepmind.com
Boston Dynamics https://www.bostondynamics.com
- #AICAT : generates music using the #Transformer from NLP research
- Google #Magenta
- #Jukedeck : progressed from generating sheet music to producing music
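- Professor Feynman: “What I cannot create, I do not understand.”
- Deep Dream : 29 works drawn by AI sold for about 100 million won
- Gemini : an ink wash painting drawn by a robot arm sold for $13,000
(TeamVoid)
- With financial support from Gentle Monster, staged the robot-arm play “Malfunction”... this was the start of their first business
- Worked with Samsung Galaxy... the two members exhibited simultaneously in 10 countries
- The invention of electricity let humanity experience simultaneity
- On days or in regions with poor air quality, the quality of the picture is better
(Pulse9)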
The Era of Deep Learning !!!
- For 2019, 4 of the 7 most-cited papers in Nature's ranking are on deep learning.
- 3 of those 4 deep learning papers are on computer vision: ResNet, GoogLeNet, and FCN.
[Figure: diagram of the Python scientific stack; surviving labels: PyMC, NIPY]
- Performance : Numba, Weave, Numexpr, Theano ...
- Visualization : Bokeh, Seaborn, Plotly, Chaco, mpld3, ggplot, MayaVi, vincent, toyplot, HoloViews ...
- Data Structures & Computation : Blaze, Dask, DistArray, xray, GraphLab, SciDB-py, PySpark ...
- Packaging & Distribution : pip/wheels, conda, EPD, Canopy, Anaconda ...
1. Foundation
- Python 3
2. Visualization
- Matplotlib 1.4 , 2.0
- Seaborn = Matplotlib + Pandas + statistical visualization
- Bokeh = Powerful Interactive Visualization, HTML5, Javascript lib
3. Arrays & Data Structures
- Xray = NumPy + Pandas
- Dask = lightweight tool for general parallelized array storage and computation
4. Computation & Performance
- Numba = with a simple decorator, JIT-compiles Python to LLVM and executes at near C/Fortran speed
5. Distribution & Packaging
- Anaconda
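The "simple decorator" mentioned for Numba in the list above can be sketched as follows. This assumes Numba may or may not be installed: if the import fails, a stand-in `njit` that does nothing is defined so the example still runs as plain Python (the stand-in and the function `sum_of_roots` are illustrative, not part of the slides).

```python
import math

try:
    from numba import njit  # real Numba decorator, if available
except ImportError:
    def njit(func):
        # Assumption: fallback stand-in, no JIT, plain Python execution.
        return func


@njit
def sum_of_roots(n):
    """Loop-heavy numeric code, the kind of function a JIT speeds up."""
    total = 0.0
    for i in range(n):
        total += math.sqrt(i)
    return total


result = sum_of_roots(10)
print(result)
```

With Numba present, the first call pays a one-time compilation cost and later calls run the compiled version; the function body itself is unchanged.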
So much happening ...
- The IPython/Jupyter split
- Widgets = awesome
- Docker-based backends
- Jupyter Hub
- new $6M grant in the first week of July 2015
- Python was created in the 1980s as a teaching language, and to bridge the gap between the shell and C.
- Guido van Rossum: "I thought we'd write small Python programs, maybe 10 lines, maybe 5, maybe 500 lines - that would be a big one"
- Python is not a scientific programming language
: Why did a "toy language" become the core of a scientific stack?
- Python is a glue
- Python glues together this hodge-podge of scientific tools.
- High-level syntax wraps low-level C/Fortran libraries, which is (mostly) where the computation happens.
- It is speed of development, not necessarily speed of execution, that has driven Python's popularity.
- "Why don't you use C instead of Python? it's so much faster!"
: "Why don't you commute by airplane instead of by car? it's so much faster!"
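The "glue" point can be illustrated with a short sketch, assuming NumPy is installed (the helper `dot_pure_python` is illustrative). Both versions compute the same dot product; the first loops in the Python interpreter, while the second is one high-level call that hands the loop to NumPy's compiled code.

```python
import numpy as np


def dot_pure_python(a, b):
    """Dot product with an explicit interpreter-level loop."""
    total = 0.0
    for x, y in zip(a, b):
        total += x * y
    return total


a = np.arange(1000, dtype=np.float64)
b = np.ones(1000, dtype=np.float64)

slow = dot_pure_python(a.tolist(), b.tolist())  # loop runs in Python
fast = float(np.dot(a, b))                      # loop runs in compiled code

print(slow, fast)
```

Timing the two with the standard `timeit` module on larger arrays is one way to see where the computation actually happens.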
[Timeline figure, 1995 to 2015; events listed in slide order]
1995 : "Numeric" was an early Python scientific array library, largely written by Jim Hugunin. Numeric -> NumPy
1998 : "Multipack", built on Numeric, was a set of wrappers of Fortran packages written by Travis Oliphant. Multipack -> SciPy
2002 : "Numarray" was created by Perry Greenfield, Paul Dubois, and others to address fundamental deficiencies in Numeric for larger datasets.
2006 : In a herculean effort to head off this split in the community, Travis Oliphant incorporated the best parts of Numeric + Numarray into "NumPy".
2000 : Eric Jones, Travis Oliphant, Pearu Peterson, and others spun Multipack into the "SciPy" package, aiming for a full Python MATLAB replacement.
2001 : Fernando Perez started the "IPython" project, aiming for a Mathematica-style environment for scientific Python.
2002 : John Hunter wanted an open MATLAB replacement, and started "matplotlib" as an effort at MATLAB-style visualization.
2012 : The IPython team released the "IPython Notebook", and the world has never been the same.
2009 : Wes McKinney began "Pandas", eventually drawing in a much larger Python user base, especially in industry data science.
2009 : With SciPy's sheer size making fast development difficult, the community decided to promote "scikits" as an avenue for more specialized algorithms.
2012 : Continuum releases "conda", a package manager for scientific computing.
1. No centralized leadership! What is "core" in the ecosystem evolves and is up to the community
- Evolving computational core : Numba?
: Just as Cython matured to become a core piece, perhaps Numba might as well? How might a JIT-enabled SciPy, scikit-learn, pandas, etc. look?
2. To be most useful as an ecosystem, we must be willing for packages to adapt to the changing landscape.
- Evolving computational core : Pandas?
: Modern data is sparse, heterogeneous, and labeled, and NumPy arrays don't measure up : let's make Pandas a core dependency!
- Evolving computational core : pandas, Seaborn --> matplotlib
: With Pandas as a core dependency, what elements of Seaborn & Pandas could be moved into matplotlib?
- Evolving the core : SciPy
: SciPy's monolithic design was driven by packaging & distribution difficulties.
[Figure: diagram of the Python scientific stack, repeated across slides; surviving labels: PyMC, NIPY, Seaborn]
3. Interoperability with core pieces of other languages has been key to the success of the SciPy stack (e.g. C/Fortran libraries, new Jupyter framework)
- Universal Plotting Serialization?
: Much of modern interactive plotting (d3, HTML5, Bokeh, ggvis, mpld3, etc) involves generating & processing plot serializations
: matplotlib -> {JSON} -> javascript --> plotting on the web
: Doing this natively in matplotlib would open up extensibility!
- Universal DataFrames?
: underneath, R, Python, and Julia all use C/Fortran memory blocks
: today, R, Python, and Julia use R data.frame, Pandas, and DataFrames.jl respectively
: in the future, R, Python, and Julia might share a so-called "Uber DataFrame"?
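The plot-serialization idea above (plot -> {JSON} -> JavaScript) can be sketched with the standard library alone. The schema below, with keys such as `type`, `axes`, and `data`, and the function name `serialize_line_plot` are hypothetical, chosen for illustration; this is not matplotlib's, mpld3's, or Bokeh's actual format.

```python
import json


def serialize_line_plot(x, y, title="", xlabel="", ylabel=""):
    """Turn a line plot description into a JSON string.

    Assumption: a made-up, minimal schema that a JavaScript
    front end could read and draw.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    spec = {
        "type": "line",
        "title": title,
        "axes": {"x": {"label": xlabel}, "y": {"label": ylabel}},
        "data": [{"x": float(xi), "y": float(yi)} for xi, yi in zip(x, y)],
    }
    return json.dumps(spec)


payload = serialize_line_plot([0, 1, 2], [0, 1, 4], title="squares",
                              xlabel="n", ylabel="n^2")

# Any consumer, in any language, can rebuild the plot from the payload.
restored = json.loads(payload)
print(restored["title"], len(restored["data"]))
```

The point of such a format is that the plotting library and the renderer only need to agree on the serialized description, not share a runtime.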
4. The stack was built from both continuity (e.g. Numeric/Numarray -> NumPy) and brand-new efforts (e.g. matplotlib, Pandas). Don't discount either approach!
- Considering the Future of Matplotlib (Usual complaints about Matplotlib)
: Non-optimal stylistic defaults -> matplotlib 2.0
: Non-optimal API -> Seaborn, ggplot
: Difficulty exporting interactive plots -> Serialization to mpld3/Bokeh
: Difficulty with large datasets ->???
- Lesson from Numeric/Numarray, etc.
: Stick with matplotlib & modify it (e.g. serialization to VisPy? Numba-driven backend? new backend architecture? etc.)
- Lesson from Pandas & Matplotlib, etc.
: Start something from scratch; features will draw users! (e.g. VisPy, Bokeh, something new?)
Deep learning is taking off because of the large amount of data made available by the digitization of society, faster computation, and innovation in neural network algorithms.
1. Being able to train a big enough neural network
2. A huge amount of labeled data
Faster computation helps researchers iterate on and improve new algorithms.
Refer to
<To students>
<To working professionals>
Logic
Symbolic Expression
Reasoning
Big Vector