Data Analysis Working Group

presented by Matias & Wout, Software Group

23/06/2021


BLISS: a Swiss knife to open Data Analysis

Data flow

[Diagram: acquisition chain and channels, with numbered callouts 1, 2 and 3]

Redis


What is Redis?

  • { key: value, } store
    • binary-safe strings
    • max 512 MB for a string value
    • a value can have a data type: List, Set, Sorted Set, Hash (containing multiple string values)
  • in memory
    • can also persist on disk, but we do not do that for data
  • since Redis 5: streams


Redis streams

  • data publishing using Redis streams
    • 1 stream per data channel

[Diagram: several producers write to a stream with XADD; several consumers read from it with XREAD]

individual string values

we arbitrarily limit streams to 2000 string values max

1 string value can correspond to multiple data events

BLISS client API: knows the scan structure and reads the data

Note: Redis data is only transient!

Consumers (the Nexus Writer, for example) need to read from it continuously to get the complete data
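To make the "transient data" point concrete, here is a minimal sketch of a capped stream. It is a toy in-memory model, not the Redis or BLISS API: the class name `CappedStream` and the helper `run` are hypothetical, and the method names only mirror the XADD/XREAD commands. The only figure taken from the slides is the 2000-value cap.

```python
from collections import deque

MAX_ENTRIES = 2000  # cap quoted on the slide


class CappedStream:
    """Toy in-memory model of a capped stream (not the Redis API)."""

    def __init__(self, maxlen=MAX_ENTRIES):
        # A bounded deque silently drops its oldest entry when full.
        self._entries = deque(maxlen=maxlen)
        self._next_id = 0

    def xadd(self, value):
        """Append a value and return its id."""
        entry_id = self._next_id
        self._entries.append((entry_id, value))
        self._next_id += 1
        return entry_id

    def xread(self, last_id=-1):
        """Return the entries still present with an id greater than last_id."""
        return [(i, v) for i, v in self._entries if i > last_id]


def run(n_points, read_every):
    """Publish n_points values; the consumer reads once every read_every values."""
    stream = CappedStream()
    received, last_id = [], -1
    for point in range(n_points):
        stream.xadd(point)
        if (point + 1) % read_every == 0:
            for entry_id, value in stream.xread(last_id):
                received.append(value)
                last_id = entry_id
    # Final read at "scan end".
    for entry_id, value in stream.xread(last_id):
        received.append(value)
        last_id = entry_id
    return received


fast = run(n_points=10_000, read_every=500)    # reads often enough
slow = run(n_points=10_000, read_every=5_000)  # reads too rarely
print(len(fast), len(slow))
```

In this model the frequent reader receives every point, while the infrequent one only sees what is left in the stream each time it reads. This is why a consumer such as a file writer has to keep up with the acquisition.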

Beware of "scan.get_data()"

  • method of the Scan object
    • can be called from the BLISS shell or from a script
  • currently gets scan data from Redis only
    • Is it OK to read from HDF5 while a scan is running?

With the current implementation, only useful for small scans (alignment...)

References

  • Heavy data (so far, 2D data only) is not stored in Redis streams
    • References are stored instead
  • When receiving a reference, the BLISS client API resolves it
    • gets the data from Tango server memory (if still available)
    • fallback: reads the data from disk
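The resolution order described above (server memory first, disk as fallback) can be sketched as follows. All names here are hypothetical: `DataReference`, `resolve` and the `server_memory` dictionary stand in for whatever the BLISS client API and the Tango server actually use, and `pickle` is only a convenient placeholder for the real file format.

```python
import os
import pickle
import tempfile


class DataReference:
    """Hypothetical reference published instead of the heavy data itself."""

    def __init__(self, frame_id, file_path):
        self.frame_id = frame_id
        self.file_path = file_path


def resolve(ref, server_memory):
    """Return (data, source): try server memory first, then fall back to disk."""
    if ref.frame_id in server_memory:
        return server_memory[ref.frame_id], "memory"
    with open(ref.file_path, "rb") as f:
        return pickle.load(f), "disk"


with tempfile.TemporaryDirectory() as tmpdir:
    frame = [[0, 1], [2, 3]]  # stand-in for a 2D image
    path = os.path.join(tmpdir, "frame_0.pkl")
    with open(path, "wb") as f:
        pickle.dump(frame, f)

    ref = DataReference(frame_id=0, file_path=path)

    server_memory = {0: frame}  # frame still buffered by the server
    data_hot, source_hot = resolve(ref, server_memory)

    server_memory.clear()       # buffer recycled: only the file is left
    data_cold, source_cold = resolve(ref, server_memory)

print(source_hot, source_cold)
```

The caller gets the same data in both cases; only the cost of the access differs.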


Online Data Analysis


ODA: definition and ESRF use cases?

  • Real-time analysis while the acquisition is running?
  • Opening the scan file only at the end of a scan?
  • Running in the BLISS shell process?
  • Running on another computer or cluster?
  • Stopping the scan if the data is not good?
  • Live feedback and scan re-orientation in real time?
  • Automatic analysis? How to feed the results back?

ODA data flow today

[Diagram: Lima, Flint, Writer, Analysis and BLISS. Legend: network I/O, serialization, deserialization; file I/O; links; unused path?]

Leveraging BLISS API to perform Online Data Analysis


BLISS API for Online Data Analysis


  • low-level API
    • used by Nexus Writer, PyMca, BM29

 

  • high-level API: ScanWatcher class
    • made for Flint use case
    • easier to use for most use cases
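As an illustration of the callback style that a high-level watcher makes possible, here is a minimal sketch. It does not use the BLISS ScanWatcher class: `ToyScanWatcher`, `RunningMean` and the event tuples are hypothetical, and the event names are simply the ones used in the diagrams of this presentation.

```python
from collections import defaultdict


class ToyScanWatcher:
    """Hypothetical callback dispatcher; not the BLISS ScanWatcher class."""

    def __init__(self):
        self._callbacks = defaultdict(list)

    def on(self, event_name, callback):
        """Register a callback for "scan start", "scan data" or "scan end"."""
        self._callbacks[event_name].append(callback)

    def watch(self, events):
        """Dispatch each (event_name, payload) pair to its callbacks, in order."""
        for event_name, payload in events:
            for callback in self._callbacks[event_name]:
                callback(payload)


class RunningMean:
    """Example analysis: a mean updated as data events arrive."""

    def __init__(self):
        self.count = 0
        self.total = 0.0
        self.finished = False

    def on_data(self, values):
        self.count += len(values)
        self.total += sum(values)

    def on_end(self, _payload):
        self.finished = True

    @property
    def mean(self):
        return self.total / self.count if self.count else None


# Simulated event stream; one "scan data" event may carry several points.
events = [
    ("scan start", {"name": "demo_scan"}),
    ("scan data", [1.0, 2.0]),
    ("scan data", [3.0]),
    ("scan data", [6.0]),
    ("scan end", None),
]

analysis = RunningMean()
watcher = ToyScanWatcher()
watcher.on("scan data", analysis.on_data)
watcher.on("scan end", analysis.on_end)
watcher.watch(events)
print(analysis.mean, analysis.finished)
```

The analysis code only implements callbacks; knowledge of the scan structure and of the event source stays inside the watcher.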

Scan Watching


Example from Wout, running in a Jupyter Notebook; see also on-the-fly FFT from scan data

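Example from BM29:

[Sequence diagram: BLISS shell, Scan, BM29 watcher (already listening), Processing (DAHU server). The scan command is issued in the BLISS shell; the watcher receives "scan start", "scan data" and "scan end"; it executes commands and sends the relevant data to the DAHU server; the scan command returns.]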

Questions so far


  • The data API is part of BLISS itself: the full BLISS must be installed
    • should we provide a separate bliss_data package?
  • BLISS relies on gevent, which imposes its asynchronous I/O model on API users
    • should we provide a threaded (or asyncio) version of the data API?
  • The BLISS API has nothing to do with the file API
    • different code for offline and online analysis (for the "data access" part): is that fine?
  • should we provide the data API for multiple languages?
    • Julia, R, C++, Golang, Rust ...

More complicated ODA...

Feeding ODA results back to the BLISS shell

[Sequence diagram: BLISS shell, Scan, Processing with Scan Watcher (already listening). The experiment script starts and issues the scan command; the watcher receives "scan start", "scan data" and "scan end"; the scan command returns; the script waits for the result on a Beacon channel; the processing sets the result value on the Beacon channel; "result received".]
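The handshake in this sequence (the script waits, the processing publishes a result) can be sketched with the standard library alone. This is a simulation under stated assumptions: a `queue.Queue` stands in for the Beacon channel, a thread stands in for the separate processing, and the function names are hypothetical. A real BLISS script would use a Beacon channel and gevent instead.

```python
import queue
import threading

# Stand-in for a Beacon channel shared by the script and the processing.
result_channel = queue.Queue(maxsize=1)


def processing(scan_data):
    """Pretend ODA step: compute a value and publish it on the channel."""
    result = sum(scan_data) / len(scan_data)
    result_channel.put(result)


def experiment_script():
    """Run a (fake) scan, then block until the processing result arrives."""
    scan_data = [1.0, 2.0, 3.0, 6.0]  # what the scan would have produced
    worker = threading.Thread(target=processing, args=(scan_data,))
    worker.start()
    # Wait for the result; the timeout avoids hanging if processing fails.
    result = result_channel.get(timeout=5)
    worker.join()
    return result


result = experiment_script()
print("result received:", result)
```

The timeout on the waiting side matters in practice: without it, a crashed analysis would block the experiment script forever.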

Driving acquisition from ODA (1)

Problems with running multiple BLISS sessions in multiple processes

  • no clean, automatic sharing of the local state of BLISS objects with other processes
  • inter-process locking (needs a distributed lock)
  • hardware communication
    • devices accepting only 1 connection...

Driving acquisition from ODA (2)

Another way is to add "remote control" capabilities to a running BLISS process, like the SPEC remote feature

In the BLISS setup script:

from bliss.setup_globals import *
from bliss.common import standard

from xmlrpc.server import SimpleXMLRPCServer
import inspect
import gevent

xmlrpc_server = SimpleXMLRPCServer(("", 8000))
xmlrpc_server.register_introspection_functions()

# register all standard functions to make them available via xml-rpc server
for name, func in inspect.getmembers(standard, inspect.isfunction):
    xmlrpc_server.register_function(func)

# start xml-rpc server in background
gevent.spawn(xmlrpc_server.serve_forever)

From a remote Python process:

Python 3.7.10 (default, Feb 26 2021, 18:47:35) 
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.

>>> from xmlrpc.client import ServerProxy
>>> xmlrpc_client = ServerProxy("http://localhost:8000", allow_none=True)
>>> xmlrpc_client.system.listMethods()
['Group', 'SoftAxis', '__move', '_lsmot', '_lsobj', 'a2scan', 'a3mesh', 'a3scan', 'a4scan', 'a5scan', 'amesh', 'anmesh', 'anscan', 'ascan', 'cleanup', 'clear_cache', 'ct', 'd2scan', 'd3mesh', 'd3scan', 'd4scan', 'd5scan', 'dmesh', 'dnscan', 'dscan', 'error_cleanup', 'info', 'interlock_state', 'iter_axes_position', 'iter_axes_position_all', 'iter_axes_state', 'iter_axes_state_all', 'iter_counters', 'lineup', 'lookupscan', 'loopscan', 'move', 'mv', 'mvd', 'mvdr', 'mvr', 'namedtuple', 'plot', 'pointscan', 'reset_equipment', 'rockit', 'safe_get', 'sct', 'sync', 'system.listMethods', 'system.methodHelp', 'system.methodSignature', 'timescan', 'wid']
>>> xmlrpc_client.mv("roby", 5)
>>>
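The same remote-control pattern can be tried without BLISS, using only the Python standard library. In the sketch below, `fake_mv` is a hypothetical stand-in for a motion command (it only echoes its arguments), the server runs in a background thread instead of a gevent greenlet, and port 0 lets the operating system pick a free port.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


def fake_mv(axis_name, position):
    """Hypothetical stand-in for a motion command; only echoes its arguments."""
    return f"{axis_name} moved to {position}"


# Server side: bind to localhost only, on a free port chosen by the OS.
server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False, allow_none=True)
server.register_introspection_functions()
server.register_function(fake_mv, "mv")  # exposed under the name "mv"
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# Client side: discover the exposed methods, then call one.
client = ServerProxy(f"http://127.0.0.1:{port}", allow_none=True)
methods = client.system.listMethods()
reply = client.mv("roby", 5)
print(methods)
print(reply)

server.shutdown()
server.server_close()
```

Only strings, numbers and other XML-RPC types cross the wire, so the client passes an axis name rather than an axis object. Binding to 127.0.0.1 also keeps the sketch from exposing commands to the network.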

Perspectives

A future objective is to turn BLISS into a server, which would host sessions and would offer a web shell

 

This will come with a REST interface for remote operation

What about a BLISS Data Analysis object?


da.load_module("id31.analysis.whatever")

Proposal to have a common way to deal with analysis from BLISS scripts

BLISS process, computer C1

generic analysis server compatible with Data Analysis object, computer C1 (or C2)

BLISS Scan Watcher

da.execute_while_scanning(scan, "function_name")

 

da.execute_at_scan_end(scan, "function_name")

 

da.wait_result()
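One possible shape for such an object is sketched below, purely as a discussion aid. Everything in it is an assumption: `ToyDataAnalysis` is hypothetical, the analysis runs in a local thread pool rather than on a separate analysis server, a plain list stands in for the scan, and the standard `statistics` module stands in for a beamline analysis module.

```python
import importlib
from concurrent.futures import ThreadPoolExecutor


class ToyDataAnalysis:
    """Hypothetical sketch of the proposed Data Analysis object."""

    def __init__(self):
        self._module = None
        self._future = None
        self._executor = ThreadPoolExecutor(max_workers=1)

    def load_module(self, module_name):
        """Import the module that holds the analysis functions."""
        self._module = importlib.import_module(module_name)

    def execute_at_scan_end(self, scan_data, function_name):
        """Schedule module.function_name(scan_data) without blocking the caller."""
        func = getattr(self._module, function_name)
        self._future = self._executor.submit(func, scan_data)

    def wait_result(self, timeout=None):
        """Block until the scheduled analysis has finished and return its result."""
        return self._future.result(timeout=timeout)

    def close(self):
        self._executor.shutdown()


da = ToyDataAnalysis()
da.load_module("statistics")                 # stand-in for an analysis module
scan_data = [1.0, 2.0, 3.0, 6.0]             # stand-in for a finished scan
da.execute_at_scan_end(scan_data, "mean")
result = da.wait_result(timeout=5)
da.close()
print(result)
```

Referring to functions by name, as in the proposal, keeps the calling script independent of where the analysis code actually runs.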

Beyond Redis (?)

An in-memory store seems a good choice

How to scale up? Distributed memory across several computers?

What level of performance do we need?

What are the bottlenecks?

(serialization/deserialization and copying?)

Introducing a dedicated data manager

[Diagram: Lima, Flint, Writer, Analysis and BLISS. Labels: object store, shared memory, (indexing)]

1) Redis for event streaming and indexing of acquisition data

2) An immutable object store provides access to a shared memory space. Possible technology: vineyard. Needs infrastructure (Kubernetes?)

3) Lima informs BLISS about acquisition progress and object IDs in the store
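The idea of handing over an object ID instead of the data can be illustrated on a single machine with Python's `multiprocessing.shared_memory` (Python 3.8+). This is not vineyard and not a distributed store; it is only a sketch of the principle, with the block name playing the role of the object ID and both "sides" running in one process for simplicity.

```python
from multiprocessing import shared_memory

frame = bytes(range(256)) * 4  # 1 KiB stand-in for detector data

# Producer side: create a shared block and copy the data into it once.
producer = shared_memory.SharedMemory(create=True, size=len(frame))
producer.buf[:len(frame)] = frame
object_id = producer.name  # this short string is all that needs publishing

# Consumer side: attach by id; the memoryview reads the shared block directly.
consumer = shared_memory.SharedMemory(name=object_id)
view = consumer.buf[:len(frame)]  # slice, since the block may be page-rounded
checksum = sum(view)
view.release()                    # views must be released before close()

consumer.close()
producer.close()
producer.unlink()                 # free the block once nobody needs it
print(object_id, checksum)
```

Only the identifier would travel through the event stream; the payload itself is neither serialized nor copied between the two sides.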

Conclusion

The BLISS API can provide data for online data analysis

There is room for improvement, though: the final components are still to be defined

It is unclear what our needs are: how can we address the use cases efficiently?

Depending on the use cases, a solution for current and future needs might mean opting for other technologies and infrastructure