Use Case #86

Data product replica prepared for compute on request, interactive session started

Added by Rosie Bolton 2 months ago. Updated 2 months ago.

Status: New
Priority: Normal
Assignee: -
Target version: -
% Done: 0%
Documentation:
Tags:
Description

Aim: data are found in the Data Lake, transferred to a compute site, and access is given to the user via ESAP.

SKA have delivered to WP3 (OSSR) a containerised workflow that takes simulated SKA data (images) and performs source detection and machine-learning classification. A good test case would be to integrate this workflow into ESAP and prove that it can be run on alternative, on-demand, interactive compute resources.

Ideally we'd want to test a user being given compute access at a site that does not already hold the data (but that does have a Rucio RSE configured), triggering a Rucio-managed(?) data transfer to that site and allowing the user to go via ESAP and start their JupyterHub session.
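The decision ESAP would have to make here can be sketched as a small dispatch: if the DID already has a replica at a site with an attached JupyterHub, send the user there; otherwise request a replication first. Everything below is illustrative — the replica map and `JUPYTERHUB_SITES` stand in for real Rucio calls (e.g. `list_replicas` and rule creation), which are not invoked here:

```python
# Illustrative sketch only: the replica catalogue and site list are
# stand-ins for Rucio API calls, not real ESAP/Rucio interfaces.

# RSEs that have a JupyterHub service attached (assumed names).
JUPYTERHUB_SITES = {"CERN-PROD", "SURF-SARA"}

def dispatch(did, replicas, hub_sites=JUPYTERHUB_SITES):
    """Decide how to serve a DID to an interactive user.

    replicas: dict mapping DID -> set of RSE names holding a replica
              (in reality this would come from Rucio's replica listing).
    Returns ("launch_jupyterhub", rse) if data is already at a hub site,
    or ("create_rule_then_launch", rse) if a transfer is needed first.
    """
    at_hub = replicas.get(did, set()) & hub_sites
    if at_hub:
        # Data already co-located with a JupyterHub: launch there.
        return ("launch_jupyterhub", sorted(at_hub)[0])
    # No replica near a hub: a Rucio replication rule would move the
    # data to a hub site before the session starts.
    target = sorted(hub_sites)[0]
    return ("create_rule_then_launch", target)
```

This is only the routing logic; the actual transfer would be a Rucio rule and the launch a JupyterHub spawn, both outside this sketch.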

Data Products are already stored in the ESCAPE Data Lake, and workflows are already in the OSSR.

Probably requires further ESAP/Rucio integration. Not likely in time for DAC21?


Checklist

  • Identify Rucio data via ESAP, take the DID to a JupyterHub server running the rucio-jupyterlab extension environment, download the data, and run a checksum
  • Do the above, but with a custom Docker image for the user's environment (via BinderHub)
  • Compute-to-data model: an interactive service (JupyterHub) is dynamically launched at the data location
  • Rucio data identified via ESAP that are not close to any JupyterHub service: ESAP creates a Rucio rule to move the data as a QoS transition, and the user is sent to a Jupyter server as before.
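The first checklist item ends with a checksum check. Rucio records an adler32 checksum for each replica, so a downloaded file can be verified locally with the standard library alone. A minimal sketch (the expected value would come from Rucio, e.g. the replica listing or the rucio-jupyterlab UI):

```python
import zlib

def adler32_checksum(path, chunk_size=1 << 20):
    """Compute a file's adler32 checksum as an 8-digit hex string,
    the format Rucio records for replicas."""
    value = 1  # adler32 seed value
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            value = zlib.adler32(chunk, value)
    return f"{value & 0xffffffff:08x}"

def verify(path, expected):
    """Compare the local checksum against the one Rucio reports."""
    return adler32_checksum(path) == expected
```

Reading in chunks keeps memory flat for large image products; `zlib.adler32` carries the running value between chunks.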


History

#1

Updated by Rosie Bolton 2 months ago

  • Start date deleted (07/13/2021)
  • Tracker changed from Integration to Use Case
