Data Analysis Working Group

presented by Matias & Wout, Software Group

23/06/2021

BLISS: a Swiss knife to open Data Analysis

Data flow

(Diagram labels: acquisition chain, channels, Redis)

What is Redis?

{ key: value, } store
- binary-safe string values
- max 512 MB for a string value
- a key can also hold a data type: List, Set, Sorted Set, Hash (each containing multiple string values)
in memory
- can also persist on disk, but we do not do that for data
since Redis 5: streams

(Diagram: a producer appends to a stream with XADD; a consumer reads from it with XREAD)

Data publishing uses Redis streams
- 1 stream per data channel
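The producer/consumer pattern above can be sketched with a small pure-Python stand-in for a Redis stream. This is a toy model, not redis-py: `ToyStream`, its `xadd`/`xread` methods and the channel name "axis:roby" are hypothetical and only mimic the semantics (monotonically increasing entry IDs, a consumer asking for everything after the last ID it saw). With a real server, redis-py offers `xadd(name, fields)` and `xread({name: last_id})` for the same steps.

```python
import json
from itertools import count


class ToyStream:
    """Toy in-memory stand-in for one Redis stream (not the redis-py API)."""

    def __init__(self):
        self._entries = []      # list of (entry_id, fields) tuples
        self._ids = count(1)    # monotonically increasing entry IDs

    def xadd(self, fields):
        """Append one entry and return its ID, like XADD with an automatic ID."""
        entry_id = next(self._ids)
        self._entries.append((entry_id, fields))
        return entry_id

    def xread(self, last_id=0):
        """Return all entries with an ID greater than last_id, like XREAD."""
        return [(i, f) for i, f in self._entries if i > last_id]


# One stream per data channel (hypothetical channel name).
streams = {"axis:roby": ToyStream()}

# Producer: one stream entry can carry several data events (a block of points).
streams["axis:roby"].xadd({"data": json.dumps([0.0, 0.5, 1.0])})
streams["axis:roby"].xadd({"data": json.dumps([1.5, 2.0])})

# Consumer: remember the last ID seen and only ask for newer entries.
last_id = 0
points = []
for entry_id, fields in streams["axis:roby"].xread(last_id):
    points.extend(json.loads(fields["data"]))
    last_id = entry_id

print(points)   # [0.0, 0.5, 1.0, 1.5, 2.0]
print(last_id)  # 2
```

Keeping `last_id` on the consumer side is what lets several independent consumers read the same stream at their own pace.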

Redis streams

(Diagram: a stream made of individual string values)

We arbitrarily limit streams to 2000 string values max

1 string value can correspond to multiple data events

BLISS client API: knows the scan structure, reads the data

Note: Redis data is only transient!

Consumers need to read from it continuously to get complete data

(like the Nexus Writer, for example)
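Why consumers must keep up can be shown with another toy model: a stream capped at a fixed length (2000 entries, as above) silently drops its oldest entries, so a consumer that falls behind finds a gap between the last ID it saw and the oldest ID still available. `CappedStream` and `read_after` are hypothetical names for this illustration; in Redis the cap corresponds to the MAXLEN option of XADD.

```python
from collections import deque
from itertools import count

MAX_LEN = 2000  # stream cap used in the slides


class CappedStream:
    """Toy stream keeping only the newest entries (like XADD ... MAXLEN)."""

    def __init__(self, maxlen=MAX_LEN):
        self._entries = deque(maxlen=maxlen)  # oldest entries fall off the left
        self._ids = count(1)

    def xadd(self, value):
        entry_id = next(self._ids)
        self._entries.append((entry_id, value))
        return entry_id

    def read_after(self, last_id):
        """Return (entries newer than last_id, number of entries already lost)."""
        new = [(i, v) for i, v in self._entries if i > last_id]
        lost = new[0][0] - last_id - 1 if new else 0
        return new, lost


stream = CappedStream()
for point in range(5000):      # producer publishes 5000 entries
    stream.xadd(point)

# A consumer that has read nothing yet (last_id=0) comes back too late.
entries, lost = stream.read_after(0)
print(len(entries), lost)      # 2000 3000
```

The lost entries are not gone for good (the writer saved them to file), but a late consumer can only recover them from disk, not from Redis.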

Beware of "scan.get_data()"

method of the Scan object
- can be called from the BLISS shell, or from a script

currently gets scan data from Redis only
- Is it OK to read from HDF5 while the scan is running?

With the current implementation, only useful for small scans (alignment...)

References

Heavy data (so far, 2D data) is not stored in Redis streams
- References are stored instead
When receiving a reference, the BLISS client API resolves it:
- get data from Tango server memory (if still available)
- fallback: read from disk
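The resolution order described above (Tango server memory first, then disk) can be sketched as a function taking two loaders. `ImageRef`, `FrameNoLongerInMemory`, `resolve` and the fake loaders are hypothetical names, not the BLISS client API; the sketch only shows the fallback logic.

```python
from dataclasses import dataclass


@dataclass
class ImageRef:
    """Hypothetical reference published in the stream instead of the image."""
    server: str       # server keeping recent frames in memory
    frame_index: int
    file_path: str    # file where the frame was saved


class FrameNoLongerInMemory(Exception):
    """Raised by the (hypothetical) server loader when the frame was evicted."""


def resolve(ref, read_from_server, read_from_file):
    """Return frame data: server memory first, disk as the fallback."""
    try:
        return read_from_server(ref.server, ref.frame_index)
    except FrameNoLongerInMemory:
        return read_from_file(ref.file_path, ref.frame_index)


# Illustrative loaders: the server buffer only still holds frames 10 and above.
def fake_server(server, index):
    if index < 10:
        raise FrameNoLongerInMemory(index)
    return f"{server}:frame{index}"


def fake_file(path, index):
    return f"{path}:frame{index}"


print(resolve(ImageRef("lima_server", 12, "scan.h5"), fake_server, fake_file))
print(resolve(ImageRef("lima_server", 3, "scan.h5"), fake_server, fake_file))
```

The first call is served from memory ("lima_server:frame12"), the second falls back to the file ("scan.h5:frame3").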


Online Data Analysis

ODA: definition and ESRF use cases?

Real-time analysis while acquisition is running?

Opening the scan file only at the end of the scan?

Running in the BLISS shell process?

Running on another computer/cluster?

Stopping the scan if data is not good?

Live feedback and scan re-orientation in real time?

Automatic analysis? How to feed back results?

ODA data flow today

(Diagram: Lima, BLISS, Flint, the Writer and Analysis; connections annotated with network I/O, serialization, deserialization, file I/O and links; one path marked "unused path?")

Leveraging BLISS API to perform Online Data Analysis

BLISS API for Online Data Analysis

low-level API
- used by Nexus Writer, PyMCA, BM29

high-level API: ScanWatcher class
- made for the Flint use case
- easier to use for most use cases
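The difference between the two levels can be illustrated with a watcher whose user code only overrides callbacks, while a driver takes care of reading and dispatching events. The class and callback names (`Watcher`, `on_scan_started`, `on_data`, `on_scan_finished`, `replay`, `RunningMean`) are hypothetical and do not reproduce the actual `ScanWatcher` signatures; the event list stands in for what the low-level API would read from Redis.

```python
class Watcher:
    """Hypothetical high-level watcher: users override callbacks only."""

    def on_scan_started(self, scan_info):
        pass

    def on_data(self, channel, values):
        pass

    def on_scan_finished(self, scan_info):
        pass


def replay(events, watcher):
    """Dispatch (kind, payload) events to the watcher callbacks.

    Stands in for the low-level part that reads scan events from Redis.
    """
    for kind, payload in events:
        if kind == "start":
            watcher.on_scan_started(payload)
        elif kind == "data":
            watcher.on_data(*payload)
        elif kind == "end":
            watcher.on_scan_finished(payload)


class RunningMean(Watcher):
    """Example user code: running mean of one channel during the scan."""

    def __init__(self, channel):
        self.channel = channel
        self.total = 0.0
        self.n = 0

    def on_data(self, channel, values):
        if channel == self.channel:
            self.total += sum(values)
            self.n += len(values)

    def on_scan_finished(self, scan_info):
        print(f"{scan_info['name']}: mean {self.channel} = {self.total / self.n}")


events = [
    ("start", {"name": "ascan"}),
    ("data", ("diode", [1.0, 2.0])),
    ("data", ("roby", [0.0, 0.1])),
    ("data", ("diode", [3.0])),
    ("end", {"name": "ascan"}),
]
watcher = RunningMean("diode")
replay(events, watcher)  # prints "ascan: mean diode = 2.0"
```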

Scan Watching

Example from Wout, running in a Jupyter Notebook; see also on-the-fly FFT from scan data

Example from BM29:

(Sequence diagram. Participants: BLISS shell, Scan, BM29 watcher (already listening), processing on the DAHU server. The scan command emits "scan start", "scan data" and "scan end"; the watcher executes commands and sends the relevant data to DAHU; the scan command returns.)

Questions so far

The data API lives within BLISS itself: the full BLISS package has to be installed
- should we provide a separate bliss_data package?
BLISS relies on gevent, imposing its asynchronous I/O model on API users
- should we provide a threaded (or asyncio) version of the data API?
The BLISS data API has nothing to do with the file API
- different code for offline and online analysis (for the "data access" part): is that fine?
Should we provide a data API for multiple languages?
- Julia, R, C++, Golang, Rust ...

More complicated ODA...

Feeding ODA results back to the BLISS shell

(Sequence diagram. Participants: BLISS shell, Scan, processing with Scan Watcher (already listening). The BLISS shell starts the experiment script, which runs a scan command; the scan emits "scan start", "scan data" and "scan end", and the scan command returns; the script then waits for the result on a Beacon channel, where the processing sets its result value; "result received".)

Link to example from Wout
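The handshake above (the experiment script waits on a channel, the processing side sets the result) can be sketched with a thread-safe one-shot channel. `ResultChannel` is a hypothetical stand-in built on `threading.Event`, not the Beacon channel API, and the analysis runs in a thread here instead of a separate process.

```python
import threading


class ResultChannel:
    """Hypothetical one-shot channel: one side sets a value, the other waits."""

    def __init__(self):
        self._event = threading.Event()
        self._value = None

    def set_value(self, value):
        self._value = value
        self._event.set()

    def wait_value(self, timeout=None):
        if not self._event.wait(timeout):
            raise TimeoutError("no ODA result received")
        return self._value


channel = ResultChannel()


def processing(scan_data):
    """Plays the role of the Scan Watcher process: compute, then publish."""
    peak = max(range(len(scan_data)), key=scan_data.__getitem__)
    channel.set_value({"peak_index": peak})


# Experiment script side: hand over the scan data, then wait for the result.
data = [0.1, 0.4, 2.5, 0.7]
worker = threading.Thread(target=processing, args=(data,))
worker.start()
result = channel.wait_value(timeout=5.0)
worker.join()
print(result)  # {'peak_index': 2}
```

The timeout matters in practice: if the analysis crashes, the experiment script gets an error instead of hanging forever.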

Driving acquisition from ODA (1)

Last example from Wout, running a BLISS session with full control from a Jupyter Notebook

Problems with running multiple BLISS sessions in multiple processes:

no clean, automatic sharing of the local state of BLISS objects with other processes
inter-process locking (needs a distributed lock)
hardware communication
- devices accepting only 1 connection...
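Inter-process locking is usually built on an atomic "set if absent, with expiry" operation; the classic Redis recipe uses `SET key token NX PX ttl`. The toy lock below models that contract inside a single process: `LockStore`, `acquire` and `release` are hypothetical names, and a real distributed lock needs the store itself to be shared by all processes.

```python
import time
import uuid


class LockStore:
    """Toy key store with set-if-absent and expiry (single process only)."""

    def __init__(self):
        self._data = {}  # key -> (owner token, expiry timestamp)

    def set_nx(self, key, token, ttl):
        now = time.monotonic()
        current = self._data.get(key)
        if current is not None and current[1] > now:
            return False  # another owner holds a lock that has not expired
        self._data[key] = (token, now + ttl)
        return True

    def delete_if_owner(self, key, token):
        current = self._data.get(key)
        if current is not None and current[0] == token:
            del self._data[key]
            return True
        return False


def acquire(store, name, ttl=10.0):
    """Return an ownership token, or None if the lock is already held."""
    token = uuid.uuid4().hex
    return token if store.set_nx(name, token, ttl) else None


def release(store, name, token):
    return store.delete_if_owner(name, token)


store = LockStore()
t1 = acquire(store, "axis:roby")  # first process gets the lock
t2 = acquire(store, "axis:roby")  # second process is refused
print(t1 is not None, t2)         # True None
release(store, "axis:roby", t1)
print(acquire(store, "axis:roby") is not None)  # True
```

The owner token prevents one process from releasing another's lock, and the expiry prevents a crashed process from holding a device forever.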

Driving acquisition from ODA (2)

Another way is to add "remote control" capabilities to a running BLISS process, like the SPEC remote feature

In the BLISS setup script...

```python
from bliss.setup_globals import *
from bliss.common import standard

from xmlrpc.server import SimpleXMLRPCServer
import inspect
import gevent

# allow_none=True: commands such as mv() return None, which XML-RPC
# cannot marshal unless this option is enabled
xmlrpc_server = SimpleXMLRPCServer(("", 8000), allow_none=True)
xmlrpc_server.register_introspection_functions()

# register all standard functions to make them available via xml-rpc server
for name, func in inspect.getmembers(standard, inspect.isfunction):
    xmlrpc_server.register_function(func)

# start xml-rpc server in background; assumes the process is gevent
# monkey-patched, so the blocking server loop yields to other greenlets
gevent.spawn(xmlrpc_server.serve_forever)
```

From a remote Python process...

```
Python 3.7.10 (default, Feb 26 2021, 18:47:35)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.

>>> from xmlrpc.client import ServerProxy
>>> xmlrpc_client = ServerProxy("http://localhost:8000", allow_none=True)
>>> xmlrpc_client.system.listMethods()
['Group', 'SoftAxis', '__move', '_lsmot', '_lsobj', 'a2scan', 'a3mesh', 'a3scan', 'a4scan', 'a5scan', 'amesh', 'anmesh', 'anscan', 'ascan', 'cleanup', 'clear_cache', 'ct', 'd2scan', 'd3mesh', 'd3scan', 'd4scan', 'd5scan', 'dmesh', 'dnscan', 'dscan', 'error_cleanup', 'info', 'interlock_state', 'iter_axes_position', 'iter_axes_position_all', 'iter_axes_state', 'iter_axes_state_all', 'iter_counters', 'lineup', 'lookupscan', 'loopscan', 'move', 'mv', 'mvd', 'mvdr', 'mvr', 'namedtuple', 'plot', 'pointscan', 'reset_equipment', 'rockit', 'safe_get', 'sct', 'sync', 'system.listMethods', 'system.methodHelp', 'system.methodSignature', 'timescan', 'wid']
>>> xmlrpc_client.mv("roby", 5)
>>>
```
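The same mechanism can be tried without BLISS. The sketch below runs the server in a thread (instead of a gevent greenlet) on a free local port and registers a stand-in `mv` function, a hypothetical placeholder for the BLISS command; it shows why `allow_none=True` matters, since `mv` returns `None`.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer

positions = {}


def mv(axis_name, position):
    """Hypothetical stand-in for the BLISS mv(): record the target, return None."""
    positions[axis_name] = position


# Port 0 lets the OS pick a free port; allow_none=True lets mv() return None.
server = SimpleXMLRPCServer(("127.0.0.1", 0), allow_none=True, logRequests=False)
server.register_introspection_functions()
server.register_function(mv)
port = server.server_address[1]

# A daemon thread plays the role of the gevent greenlet used in BLISS.
threading.Thread(target=server.serve_forever, daemon=True).start()

with ServerProxy(f"http://127.0.0.1:{port}", allow_none=True) as client:
    methods = client.system.listMethods()
    client.mv("roby", 5)

print(methods)    # ['mv', 'system.listMethods', 'system.methodHelp', 'system.methodSignature']
print(positions)  # {'roby': 5}

server.shutdown()
server.server_close()
```

Without `allow_none=True` on the server, the call to `mv` would come back as an XML-RPC fault, because its `None` return value cannot be marshalled.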

Perspectives

A future objective is to turn BLISS into a server, which would host sessions and would offer a web shell

This will come with a REST interface for remote operation

What about a BLISS Data Analysis object?

da.load_module("id31.analysis.whatever")

Proposal to have a common way to deal with analysis from BLISS scripts

(Diagram: the BLISS process runs on computer C1; a generic analysis server, compatible with the Data Analysis object, runs on computer C1 (or C2) and uses a BLISS Scan Watcher. Calls made from the BLISS process:)

da.execute_while_scanning(scan, "function_name")

da.execute_at_scan_end(scan, "function_name")

da.wait_result()
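A minimal local sketch of what such an object could look like, assuming a thread pool in place of the remote analysis server and a list of data blocks in place of a BLISS scan. Only the method names come from the proposal; the `DataAnalysis` class, its constructor and the arguments passed to analysis functions are assumptions. `load_module` uses `importlib`, so the standard `statistics` module stands in for "id31.analysis.whatever".

```python
import importlib
from concurrent.futures import ThreadPoolExecutor


class DataAnalysis:
    """Hypothetical local sketch of the proposed BLISS Data Analysis object."""

    def __init__(self, max_workers=2):
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._module = None
        self._futures = []

    def load_module(self, name):
        # In the proposal, the module would be loaded on the analysis server.
        self._module = importlib.import_module(name)

    def execute_while_scanning(self, scan, function_name):
        # Apply the function to each data block (a real version would consume
        # blocks as the scan produces them).
        func = getattr(self._module, function_name)
        self._futures.append(self._pool.submit(lambda: [func(b) for b in scan]))

    def execute_at_scan_end(self, scan, function_name):
        # Apply the function once to the complete data set.
        func = getattr(self._module, function_name)
        self._futures.append(
            self._pool.submit(lambda: func([x for b in scan for x in b]))
        )

    def wait_result(self, timeout=None):
        # Block until all submitted analyses finish; results in submission order.
        results = [f.result(timeout) for f in self._futures]
        self._futures.clear()
        return results


scan = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # stand-in for a scan: blocks of points
da = DataAnalysis()
da.load_module("statistics")
da.execute_while_scanning(scan, "mean")
da.execute_at_scan_end(scan, "mean")
results = da.wait_result()
print(results)  # [[1.5, 3.5, 5.5], 3.5]
```

Passing function names as strings, as in the proposal, keeps the BLISS side free of analysis code: only the server needs to import the module.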

Beyond Redis (?)

In-memory store seems a good choice

How to scale up? Distributed memory across several computers?

What level of performance do we need?

What are the bottlenecks?

(serialization/deserialization and copying?)

(Diagram labels: Lima, Flint, Writer, Analysis, BLISS, object store, shared memory, indexing)

1) Redis for event streaming and indexing of acquisition data

2) An immutable object store provides access to a shared memory space. Possible technology: Vineyard. Needs infrastructure (Kubernetes?)

3) Lima informs BLISS about acquisition progress and object IDs in the store
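The idea in points 2) and 3) (the producer puts a frame once in shared memory and only publishes its object ID; consumers attach to it without copying or deserializing) can be sketched with Python's standard `multiprocessing.shared_memory` instead of Vineyard. The frame contents and the event dictionary are illustrative assumptions, and everything runs in one process here, whereas Lima and the consumers would be separate processes.

```python
from multiprocessing import shared_memory

# Producer side (the role Lima would play): write one frame into shared memory.
frame = bytes(range(16))  # stand-in for detector pixels
shm = shared_memory.SharedMemory(create=True, size=len(frame))
shm.buf[: len(frame)] = frame

# Only a small event carrying the object ID travels through the event stream.
event = {"frame_index": 0, "object_id": shm.name, "nbytes": len(frame)}

# Consumer side: attach by name; the memoryview reads the shared pages directly.
view = shared_memory.SharedMemory(name=event["object_id"])
pixels = view.buf[: event["nbytes"]]
total, last = sum(pixels), pixels[15]
print(total, last)  # 120 15

# Clean up: release the memoryview, close both handles, free the segment.
pixels.release()
view.close()
shm.close()
shm.unlink()
```

The event stays a few bytes regardless of frame size, which is the point of separating indexing (Redis) from storage (shared memory).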

Introducing a dedicated data manager

Conclusion

BLISS API can provide data for online data analysis

There is room for improvement, though: the final components are still to be defined

It is unclear what our needs are: how do we address the use cases efficiently?

Depending on the use cases, meeting current and future needs might mean opting for other technologies and infrastructure

BLISS and ODA

By Matias Guijarro
