Use Case #137: ALMA

Added by John Swinbank 8 months ago. Updated 8 months ago.

Status: New
Priority: Normal
Assignee: -
Target version: -
% Done: 0%
Documentation:
Tags:

Description

(Use case provided by Felix Stoehr, transcribed by John Swinbank)

Synopsis

This use case is written from the perspective of an observatory — ALMA — that has infrastructure in place, but does not yet have a science platform.

User story

As an observatory which already has extensive search and download capabilities through web interfaces, as well as programmatically through the VO, plus processing infrastructure connected to a local storage system, I would like to:

  • Obtain a turnkey solution that can be deployed on our systems and provides a complete local 'Science Platform' similar to ESA DataLabs, SciServer, NOAO Datalabs, CANFAR, etc., including disk and CPU quota management, batch queue processing, etc.
  • Have the solution be based on well-used and well-maintained open-source standard building blocks so as not to run into obsolescence. Ideally, that same solution would be used by many institutes.
  • Be able to easily connect that solution to the observatory's user account system as well as to ESAP's account system.
  • Have the local Science Platform fully integrated with the observatory's existing tools (search/download, GUI/VO)
  • Have the local Science Platform fully integrated with ESAP, exposing all the services automatically and fully hooked up
  • Have single sign-on enabled, so that a user who is signed in to the observatory is automatically signed in to ESAP (and, if possible, even vice versa).
  • See that the users of the science platform (whether directly or through ESAP):
    • can process close to the data ('code to data') so that no network transfer is needed other than for the final results.
    • can access the data they are authorized to see directly from the storage system, without any staging. In particular, data-mining of the entire holdings should be possible without the need for data transfer, even inside the same building.
    • are enabled to invite selected local or ESAP users to their workspace to interactively collaborate with them and share data and code.
    • can use remote visualization (e.g. using cartavis.org) on their results without having to transfer the products anywhere
    • can push their final results directly to Zenodo, obtaining a DOI and persistence for publications in return
    • can push their data through ESAP to any other instance or tool in ESAP.
    • can push their data through Science Platform interoperability VO standards to Science Platforms outside of ESAP, e.g. in the US

History

#1

Updated by John Swinbank 8 months ago

  • Start date deleted (11/12/2021)
  • Tracker changed from Integration to Use Case
#2

Updated by John Swinbank 8 months ago

Thoughts on this from the ESAP/WP5 perspective —

Obtain a turnkey solution that can be deployed on our systems and provides a complete local 'Science Platform' similar to ESA DataLabs, SciServer, NOAO Datalabs, CANFAR, etc., including disk and CPU quota management, batch queue processing, etc.

Indeed, this is a compelling vision, but it goes beyond the scope of what is plausible within the ESCAPE project. I think much of the infrastructure delivered in ESCAPE can build towards this goal, and delivering such a solution would be a strong basis for future work.

It's worth noting that ESAP as conceived within ESCAPE is effectively an aggregator of existing services from the user point of view. For example, ESAP is not currently expected to directly provide a batch processing service, but rather to provide the interconnecting fabric between existing batch services, bulk storage, archive queries, etc. Because ESAP operates at the user point of view, its current goals do not include things like quota management.

However, combining services from across ESCAPE — in particular, ESAP and DIOS — goes a long way (albeit not the whole way) towards the vision presented here.

Have the solution be based on well-used and well-maintained open-source standard building blocks so as not to run into obsolescence. Ideally, that same solution would be used by many institutes.

💯

Be able to easily connect that solution to the observatory's user account system as well as to ESAP's account system.
Have single sign-on enabled, so that a user who is signed in to the observatory is automatically signed in to ESAP (and, if possible, even vice versa).

ESAP itself doesn't provide an account system. It currently integrates with the ESCAPE IAM system, but integration with other systems would (should...) be straightforward. Across existing services, SSO with IAM is the intention, and that should certainly apply if IAM is swapped out for something else.
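
To make the integration point concrete: for an OIDC-capable web service built on Django, wiring authentication to a provider such as ESCAPE IAM is mostly configuration. This is only a sketch, assuming the mozilla-django-oidc library; the endpoints and credentials are placeholders, not ESAP's actual settings.

```python
# settings.py — hypothetical OIDC configuration for a Django-based service,
# assuming the mozilla-django-oidc library; all values below are placeholders.

AUTHENTICATION_BACKENDS = [
    "mozilla_django_oidc.auth.OIDCAuthenticationBackend",
]

# Client credentials registered with the identity provider (e.g. ESCAPE IAM).
OIDC_RP_CLIENT_ID = "esap-local"
OIDC_RP_CLIENT_SECRET = "change-me"

# Provider endpoints; swapping these is, in principle, all it takes to point
# the same deployment at an observatory's own OIDC-capable account system.
OIDC_OP_AUTHORIZATION_ENDPOINT = "https://iam.example.org/authorize"
OIDC_OP_TOKEN_ENDPOINT = "https://iam.example.org/token"
OIDC_OP_USER_ENDPOINT = "https://iam.example.org/userinfo"
OIDC_OP_JWKS_ENDPOINT = "https://iam.example.org/jwks"
OIDC_RP_SIGN_ALGO = "RS256"
```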

Have the local Science Platform fully integrated with the observatory's existing tools (search/download, GUI/VO)
Have the local Science Platform fully integrated with ESAP, exposing all the services automatically and fully hooked up

ESAP is designed in a modular and extensible way so that (fairly) arbitrary external services can be integrated with it. WP5 intends to provide (indeed, is already providing) “service connectors” for a number of external services. Obviously, given the basically limitless number of potential external services and the limited ESCAPE project resources, we can't promise to provide “out of the box” support for arbitrary existing tools, but we should provide documented interfaces so that local developers can integrate services as required.
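
To give a flavour of what such a documented interface might look like, a local developer's connector could be a small adapter class behind a common interface. The class and method names below are hypothetical, for illustration only, not ESAP's actual API.

```python
# Hypothetical sketch of an ESAP-style "service connector"; the interface
# (ServiceConnector and its method names) is illustrative, not ESAP's API.
from abc import ABC, abstractmethod

import requests


class ServiceConnector(ABC):
    """Adapter between a platform's query layer and one external service."""

    @abstractmethod
    def construct_query(self, dataset: str, **params) -> str:
        """Translate a generic query into the service's native form."""

    @abstractmethod
    def run_query(self, query: str) -> list[dict]:
        """Execute the native query and return results as uniform records."""


class LocalArchiveConnector(ServiceConnector):
    """Example connector for an observatory's existing search/download API."""

    def __init__(self, base_url: str):
        self.base_url = base_url

    def construct_query(self, dataset: str, **params) -> str:
        clauses = "&".join(f"{k}={v}" for k, v in params.items())
        return f"{self.base_url}/search?dataset={dataset}&{clauses}"

    def run_query(self, query: str) -> list[dict]:
        # Plain HTTP GET against the archive's existing search endpoint.
        response = requests.get(query, timeout=30)
        response.raise_for_status()
        return response.json()
```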

can process close to the data ('code to data') so that no network transfer is needed other than for the final results

Within ESCAPE, this should be covered by (in progress) integration between ESAP and the data lake (in particular, DLaaS). (Of course, “no” network transfer is very broad, but certainly we would intend to minimize it where possible.)
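
As a sketch of the “code to data” pattern: Rucio (the technology underpinning the ESCAPE data lake) can report where replicas of a dataset live, and a scheduler can then send the job to co-located compute rather than pulling the data out. The `list_replicas` call is real Rucio client API; the DID below and the notion of a per-site compute endpoint are placeholder assumptions.

```python
# Sketch: locate data-lake replicas with Rucio and pick a site to compute at,
# so that only the job (and the final results) cross the network, not the data.
# The DID below and the co-located compute endpoint are assumptions.
from rucio.client import Client

rucio = Client()
did = {"scope": "alma", "name": "uid___A001_X123_Xabc.tar"}  # placeholder DID

for replica in rucio.list_replicas(dids=[did]):
    # 'rses' maps each storage element holding a copy to its access URLs.
    for rse, pfns in replica["rses"].items():
        print(f"replica at {rse}: {pfns[0]}")
        # A scheduler would submit the processing job to compute resources
        # co-located with one of these RSEs, instead of transferring the data.
```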

can access the data they are authorized to see directly from the storage system, without any staging. In particular, data-mining of the entire holdings should be possible without the need for data transfer, even inside the same building

I'm not sure exactly how to parse this one. Is the suggestion that data mining should take place on compute resources which are physically connected to the archive storage to avoid data transfer? If so, that seems to depend on the physical layout and capabilities of the existing services at the observatory, rather than on ESAP itself.

are enabled to invite selected local or ESAP users to their workspace to interactively collaborate with them and share data and code.

Within ESAP itself, I think sharing of “shopping baskets” is a great idea; I've added a ticket to the WP5 issue tracker to capture this (#82). Sharing workspaces within any particular service that the user has access to through ESAP would depend on the detailed capabilities of that service.
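
For concreteness, a shareable basket need be little more than a record with an owner, an access list, and the saved items. The schema below is purely hypothetical, sketching one shape the #82 work could take; none of the field names are ESAP's actual data model.

```python
# Purely hypothetical data model for a shareable "shopping basket";
# all field names are illustrative, not ESAP's actual schema.
shared_basket = {
    "owner": "fstoehr",
    "shared_with": ["jswinbank", "esap-user-42"],  # local and ESAP users
    "items": [
        {
            "dataset": "ALMA",
            "identifier": "uid://A001/X123/Xabc",      # placeholder member
            "access_url": "https://archive.example.org/retrieve/Xabc",
        }
    ],
}
```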

can use remote visualization (e.g. using cartavis.org) on their results without having to transfer the products anywhere

ESAP should enable this use case when those services are provided for integration.

can push their final results directly to Zenodo, obtaining a DOI and persistence for publications in return

DOI minting within ESAP is on the roadmap. I think it's more likely we'd want to mint DOIs that refer to persistent locations in the data lake rather than try to upload the data directly, but this would certainly be an interesting point to discuss.
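
For reference, Zenodo's REST deposition API supports exactly this kind of flow. Zenodo (at least historically) requires at least one file per record, so a lightweight manifest pointing at the data-lake location can stand in for the data itself. A sketch; the token and URLs are placeholders.

```python
# Sketch: mint a DOI by creating and publishing a Zenodo deposition whose
# payload is a small manifest pointing at a persistent data-lake location,
# instead of uploading the data itself. Token and URLs are placeholders.
import requests

ZENODO = "https://zenodo.org/api/deposit/depositions"
token = {"access_token": "REPLACE-ME"}  # placeholder personal access token

# 1. Create an empty deposition.
deposition = requests.post(ZENODO, params=token, json={}).json()

# 2. Zenodo requires at least one file, so upload a small manifest that
#    records where the data actually lives.
manifest = b'{"data": "https://datalake.example.org/alma/results"}'
requests.put(f"{deposition['links']['bucket']}/manifest.json",
             data=manifest, params=token)

# 3. Attach descriptive metadata.
metadata = {"metadata": {
    "upload_type": "dataset",
    "title": "Processed ALMA products (example)",
    "creators": [{"name": "Example, User"}],
    "description": "Results produced on the local science platform; "
                   "data remain at the referenced data-lake location.",
}}
requests.put(f"{ZENODO}/{deposition['id']}", params=token, json=metadata)

# 4. Publish, which mints the DOI.
published = requests.post(f"{ZENODO}/{deposition['id']}/actions/publish",
                          params=token).json()
print(published["doi"])
```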

can push their data through ESAP to any other instance or tool in ESAP

By staging through the shopping basket: ✔️.

can push their data through Science Platform interoperability VO standards to Science Platforms outside of ESAP, e.g. in the US

As currently envisioned, external data sharing would go through data lake (DIOS) interfaces, but these ideas haven't been fully fleshed out and I'd certainly be open to exploring additional options.
