PIC: Port d'Informació Científica

Gonzalo Merino


The Port d’Informació Científica (PIC) is a scientific-technological center maintained through a collaboration agreement between IFAE and CIEMAT, with the support of UAB. PIC uses distributed high throughput computing technologies including clusters, Grid, Cloud and Big Data, to support the scientific mission of several data-intensive research projects spanning a wide range of disciplines such as particle physics, astrophysics, biology or environmental sciences.

Highlight of the year: PIC upgrades its network connectivity

In June 2021 PIC upgraded its network connectivity by a factor of ten, to 200 Gbps. This was a major infrastructure milestone in preparation for the increased data volumes expected in the upcoming LHC run and in virtually every experiment PIC collaborates with. The improved connectivity was enabled by the deployment of a new dark fiber ring by the national and regional academic network providers, RedIRIS and Anella Científica. This new network infrastructure connects important research infrastructures in the Vallès area, such as PIC, the ALBA Synchrotron and the UAB, to the new national 100 Gbps infrastructure RedIRIS-NOVA100.

Data Center Infrastructure

PIC’s data center is located in the UAB’s IT services building and contains more than 40 racks in 200 m². PIC’s WAN connectivity was increased to 200 Gbps during 2021, in an upgrade that also improved the reliability and capacity of the local area network.

Data processing services at PIC include 56 PB of tiered disk/tape storage, 8000 CPU cores and 20 GPUs. New equipment acquired during 2021 will increase disk capacity by 5.7 PB and CPU capacity to 10000 cores.

Service Catalogue

  • Data Processing Services
  • Data handling and storage
    • Open Source software tools to manage 56 PB of scientific data: 10 PB on high performance disk and 46 PB on automated tape libraries for long term archive.
  • Data Transfer, data access and network connectivity
    • With annual I/O rates close to 60 PB in and out, PIC is the largest data mover in the Spanish academic network.
    • Dedicated network links for experiments generating large data volumes.
  • Front-End services and user support
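To put the annual I/O figure in context, the following minimal sketch converts a yearly data volume into an average sustained rate and compares it with the 200 Gbps WAN capacity. It assumes decimal petabytes (10^15 bytes), a 365-day year, and that the 60 PB is spread evenly over the year; real traffic is bursty, so peak rates will be considerably higher than this average. The function name `average_gbps` is purely illustrative.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # assumption: non-leap year


def average_gbps(petabytes_per_year: float) -> float:
    """Convert an annual volume in PB (10**15 bytes) to an average rate in Gbit/s."""
    bits_per_year = petabytes_per_year * 1e15 * 8
    return bits_per_year / SECONDS_PER_YEAR / 1e9


avg = average_gbps(60)          # annual I/O volume quoted in the catalogue
link_fraction = avg / 200       # compared with the 200 Gbit/s WAN capacity
print(f"{avg:.1f} Gbit/s on average, {link_fraction:.1%} of a 200 Gbit/s link")
# prints: 15.2 Gbit/s on average, 7.6% of a 200 Gbit/s link
```

The gap between the average rate and the link capacity is what leaves headroom for traffic bursts and for the growth in data volumes expected in the coming years.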

Digital Infrastructures research projects

ESCAPE - Connecting ESFRI and EOSC

PIC is actively contributing to the H2020 project European Science Cluster of Astronomy and Particle Physics ESFRI research infrastructures (ESCAPE). During 2021 PIC worked in close collaboration with other CTA partners within ESCAPE, proposing use cases and tests to validate the technologies and proposals for the development of the ESCAPE Data Lake prototype. PIC led the definition and validation of the long-haul data transfer use case, which mimics the real CTA scenario of transferring data from the CTA-North observatory to the PIC data center. Additional use cases, instrumental for development purposes, were proposed based on the experience of managing the MAGIC data transfers to PIC and of analyzing simulated MAGIC-DL3 data with the Gammapy package. PIC is also collaborating with IFAE gamma-ray experts to extend the functionality of CosmoHub with analysis capabilities for gamma-ray astronomy. After development and testing, spatial data selection using the Cone Search technique was officially released as a new feature of the CosmoHub web interface.
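As an illustration of what a cone search selects, the sketch below keeps the catalogue entries whose angular distance from a given sky position is within a given radius. It is not the CosmoHub implementation: the function names, the dictionary-based catalogue layout and the sample coordinates are all hypothetical, and the separation is computed with the standard haversine formula on coordinates in degrees.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    # min() guards against rounding slightly above 1 for antipodal points
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a))))


def cone_search(catalogue, ra0, dec0, radius_deg):
    """Return the rows lying within radius_deg of the position (ra0, dec0)."""
    return [row for row in catalogue
            if angular_separation_deg(row["ra"], row["dec"], ra0, dec0) <= radius_deg]


# Hypothetical toy catalogue (coordinates in degrees)
catalogue = [
    {"id": 1, "ra": 10.0, "dec": 20.0},
    {"id": 2, "ra": 10.3, "dec": 20.1},
    {"id": 3, "ra": 359.9, "dec": -0.1},
    {"id": 4, "ra": 0.1, "dec": 0.1},
]

print([row["id"] for row in cone_search(catalogue, 10.0, 20.0, 0.5)])  # [1, 2]
print([row["id"] for row in cone_search(catalogue, 0.0, 0.0, 0.5)])    # [3, 4]
```

The second query shows why a true angular distance is used instead of a simple box cut on the coordinates: sources on either side of the RA = 0°/360° boundary are selected together. A production service would typically run such a selection inside the database, with a spatial index, and not row by row in Python.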

ARCHIVER - Archival and preservation services for research

PIC has been participating in the ARCHIVER project since 2019; the project runs until June 2022. ARCHIVER is a Pre-Commercial Procurement whose objective is to create tools for the archival and long-term preservation of petabyte-scale scientific data sets in the cloud. It brings together a group of experts (the Buyers Group, of which PIC is one of four members and the leader of the first of the project's three phases) and a set of contractors, who work together to define and fulfill the requirements needed to achieve the desired result. During 2021 the second phase continued and three of the five initial contractors went through the evaluation process. Real-world testing of the initially proposed solutions began: moving data, using the developed tools (such as APIs, GUIs, CLIs and other proposed methods) and defining the final modifications required to progress to the third and last phase.


Particle Physics

LHC TIER-1 Data Center

The Worldwide LHC Computing Grid (WLCG) is a distributed computing infrastructure comprising resources from more than 170 centers in 42 countries. WLCG is used to analyze the unprecedented rate of hundreds of Petabytes (PB) of data generated each year by the LHC.

The Tier-1 center at PIC provides services to three of the LHC experiments, accounting for ~5% of the total Tier-1 capacity for ATLAS, CMS, and LHCb. Besides providing 24x7 data processing services that enable the LHC science, the team is engaged in an R&D effort to federate the Tier-1 and Tier-2 resources from IFAE and CIEMAT, in line with the WLCG strategy towards HL-LHC. In 2021, the PIC Tier-1 delivered 86 million CPU hours, 7.9 PB of disk storage space, and 27.2 PB of tape storage space, with measured reliability above 99%.

The infrastructure has grown and is ready for the restart of collisions in Run 3. A dedicated network and tape challenge was carried out in October 2021 to verify the readiness of the sites to cope with the expected Run 3 load. PIC successfully met the performance goals, showing that it is ready for Run 3 data. Extensive R&D work continues in preparation for the tenfold increase in data rates expected for the upcoming HL-LHC (2029). Tests aimed at federating disk and computing resources at the national level continued. Novel data cache techniques that enable efficient remote data streaming directly into the processing application are now used by CMS at PIC. They cache users' input data close to the processing resources and reduce data access latency by using read-ahead techniques.
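The read-ahead idea can be sketched as follows: on a cache miss, the cache fetches not only the block that was requested but also the next few blocks in a single remote request, so that a sequential reader finds most of its data already local. This is a toy illustration only, not the software deployed at PIC; the class `ReadAheadCache`, the `fetch_range(offset, length)` callable and the block sizes are hypothetical, and eviction of old blocks is omitted for brevity.

```python
class ReadAheadCache:
    """Toy block cache that prefetches `read_ahead` extra blocks on every miss."""

    def __init__(self, fetch_range, block_size=4, read_ahead=3):
        self.fetch_range = fetch_range  # callable(offset, length) -> bytes (remote read)
        self.block_size = block_size
        self.read_ahead = read_ahead
        self.blocks = {}                # block index -> cached bytes (no eviction here)
        self.round_trips = 0            # number of remote requests issued

    def _ensure(self, index):
        if index in self.blocks:
            return
        # Miss: fetch the requested block plus the following ones in one request.
        n_blocks = 1 + self.read_ahead
        data = self.fetch_range(index * self.block_size, n_blocks * self.block_size)
        self.round_trips += 1
        for k in range(n_blocks):
            chunk = data[k * self.block_size:(k + 1) * self.block_size]
            if chunk:
                self.blocks[index + k] = chunk

    def read(self, offset, length):
        out = bytearray()
        end = offset + length
        while offset < end:
            index, start = divmod(offset, self.block_size)
            self._ensure(index)
            piece = self.blocks.get(index, b"")[start:start + end - offset]
            if not piece:
                break  # reached the end of the remote data
            out += piece
            offset += len(piece)
        return bytes(out)


# Hypothetical "remote file" of 64 bytes, read sequentially 4 bytes at a time
remote = bytes(range(64))
cache = ReadAheadCache(lambda off, ln: remote[off:off + ln], block_size=4, read_ahead=3)
data = b"".join(cache.read(off, 4) for off in range(0, 64, 4))
print(data == remote, cache.round_trips)  # True 4
```

In this toy run, 16 sequential reads are served with only 4 remote requests. Since each remote request over a wide area network pays a full round-trip latency, reducing their number is what lowers the access latency seen by the application.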

PIC researchers continued the integration and exploitation of BSC resources for LHC computation. ATLAS, which has fully integrated these resources into its computing infrastructure, performed ~12 million hours of simulation on MareNostrum4. CMS researchers continued R&D activities to incorporate these resources into the CMS infrastructure and consumed ~18 million hours in 2021. CMS is now ready to run official simulation campaigns at the BSC. LHCb conducted R&D activities to integrate the BSC resources, an integration that is expected to be ready in early 2022.

ATLAS Tier-2 and the Tier-3 Interactive Analysis Facility

As a complement to the Tier-1 installation, PIC also provides resources to the ATLAS Spanish Tier-2 infrastructure, specializing in data analysis and simulation production. PIC’s contribution to the Spanish Tier-2 of ATLAS in 2021 was 8.5 million CPU hours and 1.3 PB of disk storage, delivered with measured reliability above 99%. IFAE ATLAS physicists have access to a dedicated Tier-3 analysis facility hosted at PIC and direct access to Tier-1 and Tier-2 data. PIC is not only the largest LHC computing center in Spain, but also the only one to offer the full range of LHC data services: from the long-term preservation of the detectors’ RAW data files to the final user analysis for publication.

DUNE

The Deep Underground Neutrino Experiment (DUNE) is an international experiment for neutrino science being built in the USA. Its research program targets fundamental questions about the nature of matter and the evolution of the universe. PIC has been part of the distributed infrastructure for DUNE data processing since 2019. In 2021, PIC delivered over 650,000 CPU hours to the DUNE collaboration to run simulation and analysis jobs.

Astrophysics and Cosmology

PIC supports research in astrophysics and cosmology through several activities. It provides the main Data Center for the MAGIC telescopes and the off-site data repository for the LST1, the first telescope of the future array CTA. It also hosts the Spanish Science Data Center for the Euclid Ground Segment and, therein, the development of image simulations for the Euclid space mission, the integral treatment of data from the Physics of the Accelerating Universe (PAU), data quality analysis of the Dark Energy Survey (DES) and support for the exploitation and distribution of the simulations of the Marenostrum Institut de Ciències de l’Espai (MICE) project. PIC also provides computing services to the VIRGO/LIGO gravitational waves experiments.

MAGIC

Two new reprocessing campaigns using the MaTaJu Image Cleaning technique took place during 2021: one in which more than six months of data were staged and reprocessed, and another in September to correct the August observation data. MAGIC operations were affected by the volcanic eruption in La Palma, which caused observations and data transfers to be stopped for four months.

In the context of the Data Management and Preservation Plan, a new metadata schema was proposed to the MAGIC Software Board with the aim of defining a preservation protocol for the reduced datasets based on the Dublin Core standard. The proposal was officially approved and implemented for the long-term preservation of the reduced datasets on tape storage. These activities were also part of the tests for the ARCHIVER project.

CTA/LST1

In 2021, PIC was recognized by CTAO as one of the four off-site data centers for the future Observatory. Efforts to improve and update the distributed computing services for the CTA-Grid simulation production began with increasing the resiliency of the core database of this system, which is hosted at PIC. Around 1 PB of CTA-LST1 data was transferred in 2021 from the Observatory in La Palma to the PIC archive. The LST1 observations were interrupted by the eruption of the Cumbre Vieja volcano: observations stopped for four months, during which no data was transferred to PIC. Plans are underway to extend the off-site data replication beyond the current copy at INFN-CNAF by adding HPCI-UTokyo to the replication scheme.

PAU

In 2021, PAU focussed on the reprocessing of RAW data and reaching the scientific objectives, in particular the determination of the photometric redshift by employing data analysis code developed in collaboration between PIC and the member groups of PAU. The PAU data archive underwent a process of restructuring in order to facilitate the access to RAW and reduced data to the scientific community.

Euclid

In 2021, PIC continued the activities derived from its role as the Spanish Science Data Center (SDC-ES) and as a member of the Organizational Unit for Simulations (OU-SIM), which is responsible for developing the Simulation Processing Function (SIM-PF). PIC participated in Scientific Challenge 8, a major milestone to assess the readiness of the Euclid Ground Segment to produce scientifically viable data within given constraints on time and computing resources. This activity is related to the preparation of the Readiness Review scheduled by ESA for 2022.

CosmoHub

CosmoHub is a web portal for the analysis and distribution of massive cosmology and astrophysics data sets, developed at PIC. It currently hosts over 40 TiB of catalogued data and delivers hundreds of custom catalogues each month to a broad community of researchers. In 2021, it was migrated to a new custom-designed cluster of commodity hardware that improved both the performance and the storage capacity of the platform. There are plans for a new version with increased functionality, and a PhD position has been opened to work on this development project.

VIRGO/LIGO

In 2021, PIC continued its participation in the LIGO/Virgo collaboration, both by supporting IFAE researchers in running their data analyses and by contributing to the Grid infrastructure where LIGO runs massive analyses of gravitational wave signals using large amounts of CPU and GPU resources. During 2021, PIC delivered 1.3 million CPU hours and 480,000 GPU hours to the LIGO/Virgo collaborations through the Grid.

DESI

In 2021, researchers from the IFAE cosmology group started the analysis of simulated data for the DESI collaboration. The simulations are stored on the mass storage system at PIC. The research is focussed on the analysis of hydrogen distributions and the resulting hydrogen forest lines in observed spectra. The computational work is done on the HTCondor cluster at PIC.

UAB

During 2021, PIC continued to provide support and computational resources to various UAB research groups. Examples are the collaborations with the URBAG ERC project on urban sustainability and with Eric Galbraith’s research groups at the Institut de Ciència i Tecnologia Ambientals (ICTA-UAB).