Use Case #86
Data product replica prepared for compute on request, interactive session started
Aim: data is found in the Data Lake, transferred to a compute site, and access is given to the user via ESAP
SKA have delivered to WP3 OSSR a containerised workflow that takes simulated SKA data (images) and undertakes source detection and machine-learning classification. A good test case would be to integrate this workflow into ESAP and demonstrate that it can be run on alternative, on-demand, interactive compute resources.
Ideally we would test a user being granted compute access at a site that does not already hold the data (but that does have a Rucio RSE configured), triggering a Rucio-managed(?) data transfer to that site, and allowing the user to go via ESAP and start their JupyterHub session.
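The transfer step could plausibly be driven through the Rucio client API by creating a single-copy replication rule targeting the compute site's RSE. A minimal sketch, assuming placeholder scope, DID name, and RSE expression (the real identifiers would come from the ESAP-side integration):

```python
def build_rule_kwargs(scope, name, rse_expression, lifetime=86400):
    """Assemble arguments for a single-copy Rucio replication rule.

    All identifiers here are placeholders for illustration; ESAP would
    supply the scope/name the user selected and the compute site's RSE.
    """
    return {
        "dids": [{"scope": scope, "name": name}],
        "copies": 1,                       # one replica at the compute site
        "rse_expression": rse_expression,  # e.g. the compute site's cache RSE
        "lifetime": lifetime,              # seconds until the replica expires
    }


def stage_to_compute_site(scope, name, rse_expression):
    """Create the rule via the Rucio client (needs a configured Rucio setup)."""
    from rucio.client import Client  # lazy import: requires rucio installed

    client = Client()
    # add_replication_rule returns a list of rule IDs; one rule is created here
    return client.add_replication_rule(**build_rule_kwargs(scope, name, rse_expression))[0]
```

Once the rule is satisfied (the replica exists at the target RSE), the user could be redirected to the JupyterHub instance at that site.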
Data Products are already stored in the ESCAPE Data Lake, and the workflows are already in the OSSR.
Probably requires further ESAP/Rucio integration; not likely in time for DAC21?
- Identify Rucio data via ESAP, take the DID to a JupyterHub server running the rucio-jupyterlab extension, download the data, and verify its checksum
- As above, but with a custom Docker image for the user's environment (via BinderHub)
- Compute-to-data model: the interactive service (JupyterHub) is dynamically launched at the data's location
- Rucio data identified via ESAP that is not close to any JupyterHub service; ESAP creates a Rucio rule to move the data as a QoS transition, and the user is sent to the Jupyter server as before