Data Analysis Working Group
presented by Matias & Wout, Software Group
23/06/2021
BLISS: a Swiss knife to open Data Analysis
Data flow
[Figure: acquisition chain and channels, marked 1, 2 and 3]
Redis
What is Redis?
- key-value store
- values are binary-safe strings
- max 512 MB for a string value
- a key can also hold a data type: List, Set, Sorted Set, Hash (containing multiple string values)
- in memory
- can also persist on disk, but we do not do that for data
- since Redis 5: streams
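Because values are binary-safe strings, a batch of numeric samples can be stored as raw bytes rather than as text. A minimal sketch, assuming native-endian float64 samples and a producer and consumer sharing the same byte order; pack_values and unpack_values are hypothetical helpers, not part of BLISS:

```python
import array

# Redis caps a single string value at 512 MB.
REDIS_MAX_STRING_BYTES = 512 * 1024 * 1024


def pack_values(values):
    """Pack a sequence of floats into a binary-safe bytes value (native float64)."""
    payload = array.array("d", values).tobytes()
    if len(payload) > REDIS_MAX_STRING_BYTES:
        raise ValueError(f"{len(payload)} bytes exceeds the 512 MB Redis string limit")
    return payload


def unpack_values(payload):
    """Rebuild the list of floats from a value produced by pack_values()."""
    samples = array.array("d")
    samples.frombytes(payload)
    return samples.tolist()


payload = pack_values([0.5, 1.5, 2.5])
print(len(payload))             # 24: 3 samples x 8 bytes
print(unpack_values(payload))   # [0.5, 1.5, 2.5]
```

With the redis-py client, such a value could then be stored with client.set(key, payload).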
Redis streams
[Figure: several producers append data with XADD; several consumers read it with XREAD]
- data is published using Redis streams
- 1 stream per data channel
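With the redis-py client, the producer and consumer sides could look like the sketch below. It is not BLISS code: the "data" field name and the approximate 2000-entry trimming are assumptions, and client is expected to be a redis.Redis instance created without decode_responses (so field names come back as bytes):

```python
def publish(client, channel_name, payload, maxlen=2000):
    """XADD one entry to the channel's stream, trimming it to roughly maxlen entries."""
    return client.xadd(channel_name, {"data": payload}, maxlen=maxlen, approximate=True)


def read_new_entries(client, channel_name, last_id="0-0", block_ms=1000):
    """XREAD the entries newer than last_id; return (payloads, id to resume from)."""
    # xread returns [[stream_name, [(entry_id, {field: value}), ...]], ...]
    # ("or []" covers a block timeout with nothing new)
    response = client.xread({channel_name: last_id}, block=block_ms) or []
    payloads = []
    for _stream_name, entries in response:
        for entry_id, fields in entries:
            payloads.append(fields[b"data"])
            last_id = entry_id
    return payloads, last_id
```

A consumer calls read_new_entries in a loop, passing back the returned ID each time, so that every entry is read exactly once.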
[Figure: a Redis stream made of individual string values]
- we arbitrarily limit streams to 2000 string values max
- 1 string value can correspond to multiple data events
- BLISS client API: knows the scan structure, reads data
Note: Redis data is only transient!
Consumers need to read from it continuously to get complete data
(like the Nexus Writer, for example)
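To see why transient data forces consumers to keep reading, here is a toy in-memory model of a stream capped at 2000 entries. It does not talk to Redis, and CappedStream is a hypothetical helper. A consumer that only reads once the producer has finished finds the oldest entries already gone:

```python
from collections import deque


class CappedStream:
    """Toy model of a stream keeping only the newest maxlen entries (no Redis involved)."""

    def __init__(self, maxlen=2000):
        self._entries = deque(maxlen=maxlen)  # deque drops the oldest entry when full
        self._next_id = 1

    def add(self, value):
        """Append a value and return its integer ID (like XADD)."""
        entry_id = self._next_id
        self._next_id += 1
        self._entries.append((entry_id, value))
        return entry_id

    def read_after(self, last_id):
        """Return (id, value) pairs with an ID greater than last_id (like XREAD)."""
        return [(i, v) for i, v in self._entries if i > last_id]


# A producer publishes 5000 values...
stream = CappedStream(maxlen=2000)
for n in range(5000):
    stream.add(n)

# ...and a consumer that only reads at the end gets the last 2000 of them.
late = stream.read_after(0)
print(len(late), late[0])  # 2000 (3001, 3000): values 0..2999 are lost
```

A consumer that reads every 1000 entries, passing back the last ID it saw, receives all 5000 values.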
Beware of "scan.get_data()"
- method of the Scan object
- can be called from the BLISS shell or from a script
- currently gets scan data from Redis only
- Is it OK to read from HDF5 while the scan is running?
With the current implementation, it is only useful for small scans (alignment...)
References
- Heavy data (so far, 2D data) is not stored in Redis streams
- References are stored instead
- When receiving a reference, the BLISS client API resolves it:
  - get the data from Tango server memory (if still available)
  - fallback: read it from disk
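The resolution order (server memory first, disk as fallback) can be sketched as follows. The reference layout, the file name and the two reader functions are hypothetical placeholders for the Tango and file access code, which this sketch does not reproduce:

```python
def resolve_reference(ref, read_from_server, read_from_disk):
    """Return the data a reference points to, preferring server memory over disk.

    read_from_server(ref) must raise LookupError (e.g. KeyError) when the data
    is no longer held in memory; read_from_disk(ref) is then used instead.
    """
    try:
        return read_from_server(ref)
    except LookupError:
        return read_from_disk(ref)


# Hypothetical example: the server only keeps the two most recent frames in memory.
server_buffer = {8: "frame 8 (memory)", 9: "frame 9 (memory)"}


def from_server(ref):
    return server_buffer[ref["frame"]]  # KeyError if the frame was evicted


def from_disk(ref):
    return f"frame {ref['frame']} (read from {ref['file']})"


print(resolve_reference({"frame": 9, "file": "scan0001.h5"}, from_server, from_disk))
print(resolve_reference({"frame": 2, "file": "scan0001.h5"}, from_server, from_disk))
```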
Online Data Analysis
ODA: definition and ESRF use cases?
Real-time analysis while acquisition is running?
Opening the scan file just at the end of scans?
Running in the BLISS shell process?
Running on another computer/cluster?
Stopping the scan if data is not good?
Live feedback and scan re-orientation in real time?
Automatic analysis? How to feed back results?
ODA data flow today
[Figure: data flow between Lima, Flint, Writer, Analysis and BLISS; links annotated with network I/O, serialization, deserialization and file I/O; one path marked "unused path?"]
Leveraging the BLISS API to perform Online Data Analysis
BLISS API for Online Data Analysis
- low-level API
  - used by the Nexus Writer, PyMCA, BM29
- high-level API: ScanWatcher class
  - made for the Flint use case
  - easier to use for most use cases
Scan Watching
Example from Wout, running in a Jupyter notebook; see also the on-the-fly FFT of scan data
Example from BM29:
[Sequence diagram with BLISS shell, Scan, BM29 watcher (already listening) and Processing (DAHU server): the scan command starts the scan, which publishes "scan start", "scan data" and "scan end"; the BM29 watcher executes commands and sends the relevant data to processing; the scan command returns to the shell]
Questions so far
- The data API lives within BLISS itself: users need to install the full BLISS
  - should we provide a separate bliss_data package?
- BLISS relies on gevent, imposing its asynchronous I/O model on API users
  - should we provide a threaded (or asyncio) version of the data API?
- The BLISS API is unrelated to the file API
  - different code for offline and online analysis (for the "data access" part): is that fine?
- should we provide the data API for multiple languages?
  - Julia, R, C++, Golang, Rust...
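One possible answer to the threaded/asyncio question is to wrap a blocking reader so that an asyncio application can consume it. A sketch, assuming read_batch is a hypothetical blocking function (returning a list of values, or None at scan end) used in a process without gevent monkey-patching:

```python
import asyncio


async def aiter_blocking(read_batch):
    """Expose a blocking read_batch() function as an async generator.

    Each call runs in the default thread pool, so the event loop stays
    responsive while read_batch() waits for new data.
    """
    loop = asyncio.get_running_loop()
    while True:
        batch = await loop.run_in_executor(None, read_batch)
        if batch is None:  # end of scan
            return
        for value in batch:
            yield value


def make_reader(batches):
    """Hypothetical stand-in for a blocking data reader."""
    it = iter(batches)
    return lambda: next(it, None)


async def main():
    return [v async for v in aiter_blocking(make_reader([[1, 2], [3]]))]


print(asyncio.run(main()))  # [1, 2, 3]
```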
More complicated ODA...
Feeding back the BLISS shell with ODA results
[Sequence diagram with BLISS shell, Scan and Processing with Scan Watcher (already listening): the shell starts an experiment script, which runs a scan command; the scan publishes "scan start", "scan data" and "scan end", and the scan command returns; the processing sets its result value on a Beacon channel, while the script waits for the result on that channel ("result received")]
Driving acquisition from ODA (1)
Problems with running multiple BLISS sessions in multiple processes
- no clean, automatic sharing of the local state of BLISS objects with other processes
- inter-process locking (requires a distributed lock)
- hardware communication
  - devices accepting only 1 connection...
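A distributed lock can be built on the Redis instance already in use. A minimal single-instance sketch with redis-py, assuming client is a redis.Redis instance; the TTL value is arbitrary. redis-py also ships a ready-made client.lock() helper based on a similar approach:

```python
import uuid

# Atomically delete the key only if it still holds our token.
RELEASE_SCRIPT = """
if redis.call("get", KEYS[1]) == ARGV[1] then
    return redis.call("del", KEYS[1])
end
return 0
"""


def acquire_lock(client, name, ttl_ms=10000):
    """Try once to take the lock; return our token, or None if it is already held."""
    token = uuid.uuid4().hex
    # SET name token NX PX ttl: succeeds only if the key does not exist yet;
    # the TTL frees the lock if the owning process dies
    if client.set(name, token, nx=True, px=ttl_ms):
        return token
    return None


def release_lock(client, name, token):
    """Release the lock if we still own it; return True if it was released."""
    return bool(client.eval(RELEASE_SCRIPT, 1, name, token))
```

Checking the token before deleting prevents a process from releasing a lock that expired and was taken over by another process.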
Driving acquisition from ODA (2)
Another option is to add "remote control" capabilities to a running BLISS process, like the SPEC remote feature
In the BLISS setup script:
from bliss.setup_globals import *
from bliss.common import standard
from xmlrpc.server import SimpleXMLRPCServer
import inspect
import gevent

# listen on port 8000 on all network interfaces;
# allow_none=True lets functions returning None (like mv) be called remotely
xmlrpc_server = SimpleXMLRPCServer(("", 8000), allow_none=True)
xmlrpc_server.register_introspection_functions()
# register all standard functions to make them available via the XML-RPC server
for name, func in inspect.getmembers(standard, inspect.isfunction):
    xmlrpc_server.register_function(func)
# serve requests in a greenlet so that the BLISS shell stays usable
# (assumes the BLISS process is gevent monkey-patched, so socket I/O yields)
gevent.spawn(xmlrpc_server.serve_forever)
From a remote Python process:
Python 3.7.10 (default, Feb 26 2021, 18:47:35)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from xmlrpc.client import ServerProxy
>>> xmlrpc_client = ServerProxy("http://localhost:8000", allow_none=True)
>>> xmlrpc_client.system.listMethods()
['Group', 'SoftAxis', '__move', '_lsmot', '_lsobj', 'a2scan', 'a3mesh', 'a3scan', 'a4scan', 'a5scan', 'amesh', 'anmesh', 'anscan', 'ascan', 'cleanup', 'clear_cache', 'ct', 'd2scan', 'd3mesh', 'd3scan', 'd4scan', 'd5scan', 'dmesh', 'dnscan', 'dscan', 'error_cleanup', 'info', 'interlock_state', 'iter_axes_position', 'iter_axes_position_all', 'iter_axes_state', 'iter_axes_state_all', 'iter_counters', 'lineup', 'lookupscan', 'loopscan', 'move', 'mv', 'mvd', 'mvdr', 'mvr', 'namedtuple', 'plot', 'pointscan', 'reset_equipment', 'rockit', 'safe_get', 'sct', 'sync', 'system.listMethods', 'system.methodHelp', 'system.methodSignature', 'timescan', 'wid']
>>> xmlrpc_client.mv("roby", 5)
>>>
Perspectives
A future objective is to turn BLISS into a server, which would host sessions and would offer a web shell
This will come with a REST interface for remote operation
What about a BLISS Data Analysis object?
Proposal: a common way to deal with analysis from BLISS scripts
[Figure: BLISS process on computer C1; generic analysis server compatible with the Data Analysis object, on computer C1 (or C2); BLISS Scan Watcher]
da.load_module("id31.analysis.whatever")
da.execute_while_scanning(scan, "function_name")
da.execute_at_scan_end(scan, "function_name")
da.wait_result()
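A purely local sketch of what the proposed da object could do, with analysis functions running in a worker thread. Everything here is hypothetical: the scan argument is assumed to provide wait_end() and get_data(), execute_while_scanning is left out, and the standard statistics module plays the role of the analysis module:

```python
import importlib
from concurrent.futures import ThreadPoolExecutor


class DataAnalysis:
    """Hypothetical, purely local sketch of the proposed Data Analysis object."""

    def __init__(self):
        self._module = None
        self._executor = ThreadPoolExecutor(max_workers=1)
        self._pending = []  # futures, oldest first

    def load_module(self, module_name):
        """Import the module that holds the analysis functions."""
        self._module = importlib.import_module(module_name)

    def execute_at_scan_end(self, scan, function_name):
        """Queue function_name(scan data), to run once the scan is over."""
        func = getattr(self._module, function_name)

        def job():
            scan.wait_end()  # assumed scan method: block until the scan ends
            return func(scan.get_data())  # assumed scan method

        self._pending.append(self._executor.submit(job))

    def wait_result(self, timeout=None):
        """Return the result of the oldest queued analysis."""
        return self._pending.pop(0).result(timeout)


# Usage with a stand-in scan object
class FakeScan:
    def wait_end(self):
        pass

    def get_data(self):
        return [1.0, 2.0, 3.0]


da = DataAnalysis()
da.load_module("statistics")
da.execute_at_scan_end(FakeScan(), "mean")
print(da.wait_result())  # 2.0
```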
Beyond Redis (?)
An in-memory store seems a good choice
- How to scale up? Distributed memory across several computers?
- Which level of performance do we need?
- What are the bottlenecks? (serialization/deserialization and copying?)
Introducing a dedicated data manager
[Figure: Lima, Flint, Writer, Analysis and BLISS around a shared-memory object store, with a separate indexing component]
1) Redis for event streaming and indexing of acquisition data
2) An immutable object store provides access to a shared memory space. Possible technology: Vineyard. This needs infrastructure (Kubernetes?)
3) Lima informs BLISS about acquisition progress and about object IDs in the store
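The shared-memory principle behind point 2 can be tried with the Python standard library (3.8+). In this sketch, the shared-memory block name plays the role of the object ID that Lima would report, and the consumer reads the frame through a memoryview, without copying or deserializing it. It is not Vineyard, only an illustration; put_frame and frame_checksum are hypothetical helpers:

```python
from multiprocessing import shared_memory


def put_frame(frame_bytes):
    """Copy a frame into a new shared-memory block; its name serves as the object ID."""
    shm = shared_memory.SharedMemory(create=True, size=len(frame_bytes))
    shm.buf[: len(frame_bytes)] = frame_bytes
    return shm  # the producer keeps the handle so that the block stays alive


def frame_checksum(object_id, size):
    """Attach to a block by name and compute on it in place (no copy)."""
    shm = shared_memory.SharedMemory(name=object_id)
    try:
        with shm.buf[:size] as view:  # memoryview directly on the shared block
            return sum(view)
    finally:
        shm.close()


frame = bytes(range(256)) * 4  # stand-in for a 1 KiB detector frame
producer_handle = put_frame(frame)
print(frame_checksum(producer_handle.name, len(frame)))  # 130560
producer_handle.close()
producer_handle.unlink()  # the producer frees the block when the object expires
```

The size is passed explicitly because some platforms round shared-memory blocks up to a page size.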
Conclusion
The BLISS API can provide data for online data analysis
There is room for improvement, though: the final components are still to be defined
It is unclear what our needs are: how can we address the use cases efficiently?
Depending on the use cases, meeting current and future needs might require other technologies and infrastructure
BLISS and ODA
By Matias Guijarro