Use Case #114

Use case ATLAS: ATLAS001

Added by Arturo Sanchez Pineda about 2 months ago. Updated about 2 months ago.

Status:
New
Priority:
Normal
Target version:
% Done:

0%


Description

Hi,

This issue describes the ATLAS use case defined as ATLAS001.

Name
ATLAS open data replication, augmentation, bookkeeping and validation

ID
ATLAS001

Goal/Aim
A series of exercises (data production, replication and documentation) before and during the DAC in November 2021. They include the creation of datasets for realistic final-user analysis examples using the current open-access resources at http://opendata.atlas.cern/

Workflow

  • The creation of instances to simulate usage by several users.

    • At CERN OpenStack, LAPP and personal computers
  • The usage of multiple sites (RSEs) in the datalake by adding artificially created data.

    • We will prepare several extra files, replicating them across several RSEs.
    • We will not replicate 100% of the data to every RSE; instead, we plan to keep two (2) replicas of each file, distributed across several RSEs.
    • All those files combined are meant to total ~10 TB.
  • We will adjust this data volume (up to ~10 TB) by creating and using the artificially multiplied data.

    • Use the current ATLAS Open Data (AOD) datasets.
    • This augmentation is done with ROOT's hadd tool, which merges several ROOT files into one.
  • We will use that augmented data to run the analysis examples (see ATLAS002 under "Related issues" below).

  • Create bigger ROOT files (the current largest files are ~2.5 GB).

  • We will need to:

    • Create and test a series of scripts that will automatically do the data augmentation, upload, and replication.
    • Create clear instructions for users/computers that can be part of the challenge.
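The augmentation/upload script mentioned above could be sketched along these lines. This is a minimal sketch only: the scope (`atlas-opendata`), RSE name (`EULAKE-1`), RSE expression, and file names are placeholders I am assuming for illustration, not the actual DIDs or sites of the ESCAPE-RUCIO instance. The sketch emits the `hadd` and `rucio` commands rather than running them, so the plan can be reviewed before execution.

```shell
#!/bin/sh
# Sketch of the augmentation + upload + replication steps.
# All names below (scope, RSE, file names, RSE expression) are hypothetical.

SCOPE="atlas-opendata"   # placeholder Rucio scope
RSE="EULAKE-1"           # placeholder source RSE
N_COPIES=4               # how many enlarged files to build

generate_plan() {
  i=1
  while [ "$i" -le "$N_COPIES" ]; do
    # hadd merges many ROOT files into one bigger file
    printf 'hadd -f augmented_%03d.root originals/*.root\n' "$i"
    # rucio upload registers the file and places it on the given RSE
    printf 'rucio upload --rse %s --scope %s augmented_%03d.root\n' "$RSE" "$SCOPE" "$i"
    # rucio add-rule requests 2 replicas matching the RSE expression
    printf "rucio add-rule %s:augmented_%03d.root 2 'spacetoken=ESCAPE'\n" "$SCOPE" "$i"
    i=$((i+1))
  done
}

generate_plan
```

Printing the command list first also gives a natural place to hook in the "clear instructions" for participating users/computers: the same plan can be pasted into the challenge documentation.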

Requirements

  • AOD public datasets

    • Original set of ~300 GB (in ~1000 files) already hosted in the Datalake.
    • The "multiplied data" is still to be created and integrated.
  • RUCIO CLI clients on several machines (e.g. 3 or 4).

  • Monitoring tools.
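As a back-of-the-envelope check of the volumes above — under the assumption (mine, not stated in the issue) that the two replicas count toward the ~10 TB total — the multiplication factor needed for the ~300 GB original set can be sketched as:

```shell
#!/bin/sh
# Rough bookkeeping for the data volumes quoted above.
# Assumption: the ~10 TB total includes both replicas of each file.

augment_factor() {
  original_gb=$1   # size of the original AOD set
  target_gb=$2     # total volume aimed for in the datalake
  replicas=$3      # planned number of replicas per file
  # unique data = target / replicas; factor = unique / original (integer math)
  echo $(( (target_gb / replicas) / original_gb ))
}

augment_factor 300 10000 2   # prints 16
```

That is, roughly sixteen hadd-merged copies of the original set; if instead the ~10 TB refers to unique data before replication, the factor doubles to about 33.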

People Involved
Arturo Sánchez Pineda

Work Packages
WP2


Success

  • Data is successfully stored.
  • Data is successfully transferred/replicated among several RSEs.
  • Basic metadata is stored.
  • Users can discover data using the ESCAPE-RUCIO instance.

Things to test

  • Feasibility of replicating the samples and transferring them between RSEs.
  • Localisation of the datasets using the RUCIO CLI and the Jupyter RUCIO extension.
  • Reporting failure or bottlenecks.
  • Procedure for cleaning the data out of the Datalake.
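The localisation and cleaning tests above could start from standard Rucio CLI calls along these lines. The scope and DID are placeholders I am assuming; the exact cleanup steps depend on the rule and lifetime policy of the ESCAPE-RUCIO instance. As before, the sketch prints the commands for review rather than executing them.

```shell
#!/bin/sh
# Sketch of the discovery and cleanup steps (placeholder scope/DID).

SCOPE="atlas-opendata"
DID="$SCOPE:augmented_001.root"

cleanup_plan() {
  # discover DIDs registered in the scope
  echo "rucio list-dids $SCOPE:*"
  # locate the replicas of one file across RSEs
  echo "rucio list-file-replicas $DID"
  # inspect the replication rules that pin the data
  echo "rucio list-rules $DID"
  # deleting the rules allows the replicas to be cleaned up
  echo "rucio delete-rule <rule-id>"
}

cleanup_plan
```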

Impact
Creation, replication, user analysis usage and deletion of datasets.



Related issues

Related to ESFRI: HL-LHC - Use Case #115: Use case ATLAS: ATLAS002 (New)

Related to ESFRI: HL-LHC - Product #123: Software: ATLAS Open Data analyses framework at 13 TeV (New)

History

#1

Updated by Arturo Sanchez Pineda about 2 months ago

#2

Updated by Arturo Sanchez Pineda about 2 months ago

  • Description updated (diff)
#3

Updated by Jutta Schnabel about 1 month ago

  • Related to Integration #122: Onboarding: ATLAS Open Data C++ analysis software at 13 TeV added
#4

Updated by Jutta Schnabel about 1 month ago

  • Related to Product #123: Software: ATLAS Open Data analyses framework at 13 TeV added
#5

Updated by Jutta Schnabel about 1 month ago

  • Related to deleted (Integration #122: Onboarding: ATLAS Open Data C++ analysis software at 13 TeV)
