![](https://s3.amazonaws.com/media-p.slid.es/uploads/331850/images/1413436/esrf_logo_full.png)
Data Analysis Working Group
![](https://s3.amazonaws.com/media-p.slid.es/uploads/331850/images/1413434/esrf_watermark.png)
presented by Matias & Wout, Software Group
23/06/2021
![](https://s3.amazonaws.com/media-p.slid.es/uploads/331850/images/8698657/swissknife.png)
BLISS: a Swiss knife to open Data Analysis
Data flow
![](https://s3.amazonaws.com/media-p.slid.es/uploads/331850/images/3628032/silx_small.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/331850/images/4206534/TANGO_controls_logo.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/331850/images/4942099/redis.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/331850/images/4940794/HDF_logo.svg.png)
[Diagram: acquisition chain and channels, with three data paths numbered 1, 2 and 3]
Redis
1
What is Redis ?
- { key: value, } store
- binary safe string
- max 512 MB for a string value
- a value can also have a structured data type: List, Set, Sorted Set, Hash (containing multiple string values)
- in memory
- can also persist on disk, but we do not do that for data
- since Redis 5: streams
Redis streams
[Diagram: several producers append to streams with XADD; several consumers read them with XREAD]
- data publishing using redis streams
- 1 stream per data channel
individual string values
we arbitrarily limit streams to 2000 string values max
1 string value can correspond to multiple data events
BLISS client API: knows the scan structure, reads data
Note: Redis data is only transient !
Consumers need to continuously read from it, to get complete data
(Like Nexus Writer for example)
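The consequence of transient, capped streams can be illustrated with a toy model. This is a pure-Python stand-in, not the Redis or BLISS API: `ToyStream`, its `xadd`/`xread` methods, the integer IDs and `consume` are hypothetical names that only mimic the "append" and "read everything after this ID" semantics of XADD and XREAD, together with the 2000-value cap mentioned above.

```python
from collections import deque


class ToyStream:
    """In-memory stand-in for one capped stream (illustration only)."""

    def __init__(self, maxlen=2000):
        # deque(maxlen=...) silently drops the oldest entry when full,
        # which is what a late reader of a capped stream experiences.
        self._entries = deque(maxlen=maxlen)
        self._next_id = 1

    def xadd(self, value):
        """Append one value and return its ID (a plain counter here)."""
        entry_id = self._next_id
        self._next_id += 1
        self._entries.append((entry_id, value))
        return entry_id

    def xread(self, last_id=0):
        """Return all entries still in the stream with ID > last_id."""
        return [(i, v) for i, v in self._entries if i > last_id]


def consume(stream, last_id, sink):
    """Read what is new since last_id, store it, return the new last_id."""
    for entry_id, value in stream.xread(last_id):
        sink.append(value)
        last_id = entry_id
    return last_id


if __name__ == "__main__":
    stream = ToyStream(maxlen=2000)
    steady, late = [], []
    last = 0
    for point in range(5000):
        stream.xadd(point)
        if point % 100 == 99:  # the steady consumer polls regularly
            last = consume(stream, last, steady)
    consume(stream, 0, late)  # the late consumer reads only once, at the end
    print(len(steady), len(late))
```

In this sketch the consumer that polls regularly collects every value, while the one that reads only once at the end can see no more than what the cap has kept.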
Beware of "scan.get_data()"
- method of Scan object
- can be called from BLISS shell, or from a script
- currently gets scan data from Redis only
- Is it ok to read from HDF5 while scan is running ?
![](https://s3.amazonaws.com/media-p.slid.es/uploads/331850/images/2793638/warning.png)
Only useful for small scans with current implementation (alignment...)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/331850/images/5534316/screaming_fear.png)
References
2
- Heavy data (== 2D data, up to now) is not stored in redis streams
- References are stored instead
- When receiving a reference, the BLISS client API resolves the reference
  - get data from Tango server memory (if still available)
  - fallback: reading from disk
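The "memory first, disk as fallback" resolution described above can be sketched as follows. Everything here is hypothetical and for illustration only: `resolve_reference`, `DataNotAvailable`, the dict-shaped reference and the two in-memory dictionaries standing in for the Tango server memory and the files on disk are assumptions, not the BLISS client API.

```python
class DataNotAvailable(Exception):
    """Raised by a source that does not hold the requested frame."""


def resolve_reference(ref, sources):
    """Try each source in order and return the first frame obtained.

    `ref` is an opaque reference (a dict in this sketch); `sources` is an
    ordered list of callables, fastest first (server memory, then disk).
    """
    errors = []
    for fetch in sources:
        try:
            return fetch(ref)
        except DataNotAvailable as exc:
            errors.append(str(exc))
    raise DataNotAvailable(f"could not resolve {ref!r}: {errors}")


# Hypothetical stand-ins for the two places a frame can live.
SERVER_MEMORY = {7: "frame-7-from-memory"}  # only recent frames are kept
DISK = {5: "frame-5-from-disk", 7: "frame-7-from-disk"}


def from_server_memory(ref):
    try:
        return SERVER_MEMORY[ref["index"]]
    except KeyError:
        raise DataNotAvailable("frame no longer in server memory") from None


def from_disk(ref):
    try:
        return DISK[ref["index"]]
    except KeyError:
        raise DataNotAvailable("frame not found on disk") from None


if __name__ == "__main__":
    sources = [from_server_memory, from_disk]
    print(resolve_reference({"index": 7}, sources))  # served from memory
    print(resolve_reference({"index": 5}, sources))  # falls back to disk
```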
Online Data Analysis
3
ODA: definition and ESRF use cases ?
![](https://s3.amazonaws.com/media-p.slid.es/uploads/331850/images/4566851/singe-qui-se-gratte-la-tete.jpg)
Real-time analysis while acquisition is running ?
Opening scan file just at the end of scans ?
Running in BLISS shell process ?
Running on another computer/cluster ?
Stopping scan if data is not good ?
Live feedback and scan re-orientation in real time ?
Automatic analysis ? How to feedback results ?
ODA data flow today
[Diagram: Lima, Flint, Writer, Analysis and BLISS exchange data through Redis and HDF5 files (with links between files); the costs annotated on the paths are network I/O, serialization, deserialization and file I/O; one path is marked "unused path ?"]
Leveraging BLISS API to perform Online Data Analysis
BLISS API for Online Data Analysis
- low-level API
  - used by Nexus Writer, PyMCA, BM29
- high-level API: ScanWatcher class
  - made for Flint use case
  - easier to use for most use cases
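The idea behind a high-level, callback-based watcher can be sketched as below. This is not the ScanWatcher API: `ScanObserver`, its `on_scan_*` methods, `RunningMean` and `watch` are hypothetical names, and the events are a plain Python list rather than something read from Redis. The sketch only shows how analysis code can stay small when a dispatcher takes care of the event stream.

```python
class ScanObserver:
    """Callbacks invoked by the dispatcher; override the ones you need."""

    def on_scan_start(self, scan_info):
        pass

    def on_scan_data(self, channel, values):
        pass

    def on_scan_end(self, scan_info):
        pass


class RunningMean(ScanObserver):
    """Example analysis: incremental mean of one channel during a scan."""

    def __init__(self, channel):
        self.channel = channel
        self.count = 0
        self.total = 0.0
        self.finished = False

    def on_scan_start(self, scan_info):
        self.count, self.total, self.finished = 0, 0.0, False

    def on_scan_data(self, channel, values):
        if channel == self.channel:  # ignore the other channels
            self.count += len(values)
            self.total += sum(values)

    def on_scan_end(self, scan_info):
        self.finished = True

    @property
    def mean(self):
        return self.total / self.count if self.count else None


def watch(events, observer):
    """Dispatch (kind, payload) events to the observer callbacks."""
    for kind, payload in events:
        if kind == "scan start":
            observer.on_scan_start(payload)
        elif kind == "scan data":
            observer.on_scan_data(*payload)
        elif kind == "scan end":
            observer.on_scan_end(payload)


if __name__ == "__main__":
    events = [
        ("scan start", {"title": "toy scan"}),
        ("scan data", ("diode", [1.0, 2.0])),
        ("scan data", ("diode", [3.0])),
        ("scan end", {"title": "toy scan"}),
    ]
    analysis = RunningMean("diode")
    watch(events, analysis)
    print(analysis.mean, analysis.finished)
```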
Scan Watching
Example from Wout, running in a Jupyter Notebook; see also on-the-fly FFT from scan data
Example from BM29:
[Sequence diagram: the BLISS shell issues the scan command; the Scan publishes "scan start", "scan data" and "scan end" through Redis; the BM29 watcher, already listening, executes commands and sends relevant data to the processing (DAHU server, via Tango); the scan command returns]
Questions so far
- Data API is within BLISS itself: need to install the full BLISS
  - should we provide a separate bliss_data package ?
- BLISS relies on gevent, imposing the asynchronous I/O model on API users
  - should we provide a threaded (or asyncio) version of the data API ?
- BLISS API has nothing to do with file API
  - different code for offline and online analysis (for the "data access" part), is it fine ?
- should we provide data API for multiple languages ?
  - Julia, R, C++, Golang, Rust ...
More complicated ODA...
Feeding back BLISS shell with ODA results
[Sequence diagram: the BLISS shell starts an experiment script, which issues the scan command; the Scan publishes "scan start", "scan data" and "scan end" through Redis to the processing with Scan Watcher, which is already listening; the scan command returns and the script waits for the result on a Beacon channel; the processing sets its result value on the Beacon channel; "result received"]
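The "set result / wait result" handshake of this slide can be sketched with a toy channel. This is only an illustration under loud assumptions: `ToyChannel` is a hypothetical in-process stand-in built on `threading.Event`, not a Beacon channel, and the "processing" runs in a thread here whereas in the scenario above it is a separate, already listening process.

```python
import threading


class ToyChannel:
    """Stand-in for a shared channel: one side sets, the other waits."""

    def __init__(self):
        self._event = threading.Event()
        self._value = None

    def set(self, value):
        self._value = value
        self._event.set()

    def wait(self, timeout=None):
        """Block until a value is set; raise TimeoutError otherwise."""
        if not self._event.wait(timeout):
            raise TimeoutError("no processing result received")
        self._event.clear()
        return self._value


def processing(scan_data, result_channel):
    """Analysis side: runs once the scan data is in, publishes its result."""
    result_channel.set(max(scan_data))


def experiment_script(scan_data, timeout=5.0):
    """Acquisition side: run the scan, then wait for the analysis result."""
    result = ToyChannel()
    worker = threading.Thread(target=processing, args=(scan_data, result))
    worker.start()
    value = result.wait(timeout)  # "wait result" step of the diagram
    worker.join()
    return value


if __name__ == "__main__":
    print(experiment_script([1, 5, 3]))
```

Waiting with a timeout keeps the experiment script from hanging forever if the analysis side is not running.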
Driving acquisition from ODA (1)
Problems with running multiple BLISS sessions in multiple processes
![](https://s3.amazonaws.com/media-p.slid.es/uploads/331850/images/2793638/warning.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/331850/images/2793638/warning.png)
- no clean, automatic sharing of local state of BLISS objects within other processes
- inter-process locking (need Distributed Lock)
- hardware communication
  - devices accepting only 1 connection...
Driving acquisition from ODA (2)
Another way is to add "remote control" capabilities to a running BLISS process, like the SPEC remote feature
in BLISS setup script...
from bliss.setup_globals import *
from bliss.common import standard
from xmlrpc.server import SimpleXMLRPCServer
import inspect
import gevent
# allow_none=True lets the server marshal the None returned by many standard functions
xmlrpc_server = SimpleXMLRPCServer(("", 8000), allow_none=True)
xmlrpc_server.register_introspection_functions()
# register all standard functions to make them available via xml-rpc server
for name, func in inspect.getmembers(standard, inspect.isfunction):
    xmlrpc_server.register_function(func)
# start xml-rpc server in background
gevent.spawn(xmlrpc_server.serve_forever)
from remote Python process...
Python 3.7.10 (default, Feb 26 2021, 18:47:35)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from xmlrpc.client import ServerProxy
>>> xmlrpc_client = ServerProxy("http://localhost:8000", allow_none=True)
>>> xmlrpc_client.system.listMethods()
['Group', 'SoftAxis', '__move', '_lsmot', '_lsobj', 'a2scan', 'a3mesh', 'a3scan', 'a4scan', 'a5scan', 'amesh', 'anmesh', 'anscan', 'ascan', 'cleanup', 'clear_cache', 'ct', 'd2scan', 'd3mesh', 'd3scan', 'd4scan', 'd5scan', 'dmesh', 'dnscan', 'dscan', 'error_cleanup', 'info', 'interlock_state', 'iter_axes_position', 'iter_axes_position_all', 'iter_axes_state', 'iter_axes_state_all', 'iter_counters', 'lineup', 'lookupscan', 'loopscan', 'move', 'mv', 'mvd', 'mvdr', 'mvr', 'namedtuple', 'plot', 'pointscan', 'reset_equipment', 'rockit', 'safe_get', 'sct', 'sync', 'system.listMethods', 'system.methodHelp', 'system.methodSignature', 'timescan', 'wid']
>>> xmlrpc_client.mv("roby", 5)
>>>
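The same remote-control mechanism can be tried without BLISS, using only the Python standard library. In this sketch the `mv` function and the `positions` dictionary are hypothetical stand-ins for real axes, the server runs in a thread instead of a gevent greenlet, and port 0 asks the operating system for a free port.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer

positions = {"roby": 0.0}  # hypothetical stand-in for real axes


def mv(axis_name, target):
    """Pretend to move an axis; returns None."""
    positions[axis_name] = float(target)


def run_demo():
    # allow_none=True is needed on the server side to marshal the None
    # returned by mv(); logRequests=False keeps the output quiet.
    server = SimpleXMLRPCServer(
        ("127.0.0.1", 0), allow_none=True, logRequests=False
    )
    server.register_introspection_functions()
    server.register_function(mv)
    port = server.server_address[1]
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        url = f"http://127.0.0.1:{port}"
        with ServerProxy(url, allow_none=True) as client:
            methods = client.system.listMethods()
            result = client.mv("roby", 5)
    finally:
        server.shutdown()
        server.server_close()
        thread.join()
    return methods, result, positions["roby"]


if __name__ == "__main__":
    print(run_demo())
```

Note that XML-RPC only transports simple values (numbers, strings, lists, dicts), which is why the axis is designated by name in this sketch.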
Perspectives
A future objective is to turn BLISS into a server, which would host sessions and would offer a web shell
This will come with a REST interface for remote operation
What about a BLISS Data Analysis object ?
da.load_module("id31.analysis.whatever")
Proposal to have a common way to deal with analysis from BLISS scripts
BLISS process, computer C1
generic analysis server compatible with Data Analysis object, computer C1 (or C2)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/331850/images/4201602/file.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/331850/images/4205743/10-2-arrow-png.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/331850/images/1417084/redis-white.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/331850/images/6620209/python_logo_bw.png)
BLISS Scan Watcher
![](https://s3.amazonaws.com/media-p.slid.es/uploads/331850/images/8701905/gui_cogs_icon_158494.png)
da.execute_while_scanning(scan, "function_name")
da.execute_at_scan_end(scan, "function_name")
da.wait_result()
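To make the proposed calls more concrete, here is a local, in-process toy with the same method names. It is only a sketch of the interface, not a design: `ToyDataAnalysis` is hypothetical, it runs the function in a thread instead of on a separate analysis server, it takes a plain list where the proposal takes a scan, and `execute_while_scanning` is left out.

```python
import importlib
from concurrent.futures import ThreadPoolExecutor


class ToyDataAnalysis:
    """Local, in-process sketch of the proposed interface (illustration only)."""

    def __init__(self):
        self._module = None
        self._pool = ThreadPoolExecutor(max_workers=1)
        self._pending = None

    def load_module(self, module_name):
        """Import the module that contains the analysis functions."""
        self._module = importlib.import_module(module_name)

    def execute_at_scan_end(self, scan_data, function_name):
        """Schedule function_name(scan_data) and return immediately."""
        func = getattr(self._module, function_name)
        self._pending = self._pool.submit(func, scan_data)

    def wait_result(self, timeout=None):
        """Block until the scheduled function finishes; return its value."""
        if self._pending is None:
            raise RuntimeError("nothing was scheduled")
        return self._pending.result(timeout=timeout)


if __name__ == "__main__":
    da = ToyDataAnalysis()
    da.load_module("statistics")  # any importable module works here
    da.execute_at_scan_end([1.0, 2.0, 6.0], "mean")
    print(da.wait_result())
```

Passing the function by name, as in the proposal, keeps the calling script independent of where the analysis code actually runs.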
Beyond Redis (?)
In-memory store seems a good choice
How to scale up ? Distributed memory across several computers ?
Which level of performance do we need ?
What are bottlenecks ?
(serialization/deserialization and copying ?)
Introducing a dedicated data manager
[Diagram: Lima, Flint, Writer, Analysis and BLISS around Redis (indexing), HDF5 files and an object store in shared memory]
1) Redis for event streaming and indexing of acquisition data
2) Immutable object store provides access to a shared memory space. Possible technology: vineyard. Needs infrastructure (Kubernetes ?)
3) Lima informs BLISS about acquisition progress and object IDs in store
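The "object store plus index" idea can be sketched on a single machine with the standard library (Python 3.8 or later). This is not vineyard and not a proposal for the final design: `put_frame`, `get_frame` and the `index` dictionary (standing in for the Redis index) are hypothetical. The point is only that the index carries a small object ID while the frame itself stays in shared memory.

```python
from multiprocessing import shared_memory


def put_frame(pixels):
    """Copy one frame into a new shared-memory block.

    Returns the block (to be closed and unlinked by its owner) and its
    name, which plays the role of the object ID.
    """
    block = shared_memory.SharedMemory(create=True, size=len(pixels))
    block.buf[:len(pixels)] = pixels
    return block, block.name


def get_frame(object_id, nbytes):
    """Attach to an existing block by ID and return a copy of its content.

    nbytes is needed because the block may be larger than requested
    (the size can be rounded up by the operating system).
    """
    block = shared_memory.SharedMemory(name=object_id)
    try:
        return bytes(block.buf[:nbytes])
    finally:
        block.close()


if __name__ == "__main__":
    index = {}                    # stand-in for the index kept in Redis
    frame = bytes(range(16))      # a tiny fake detector frame
    block, object_id = put_frame(frame)
    index["scan_1/frame_0"] = (object_id, len(frame))
    try:
        print(get_frame(*index["scan_1/frame_0"]) == frame)
    finally:
        block.close()
        block.unlink()            # free the shared memory
```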
Conclusion
BLISS API can provide data for online data analysis
There is room for improvement, though - final components still to be defined
It is unclear what our needs are : how to address use cases efficiently ?
![](https://s3.amazonaws.com/media-p.slid.es/uploads/331850/images/6346485/smiley_PNG179.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/331850/images/8485601/collaboration.jpg)
Depending on use cases, a solution for current and future needs might require opting for other technologies and infrastructure
BLISS and ODA
By Matias Guijarro