PIC: Port d'Informació Científica

Gonzalo Merino


The Port d’Informació Científica (PIC) is a scientific-technological center maintained through a collaboration agreement between IFAE and CIEMAT, with the support of UAB. PIC uses distributed high throughput computing technologies including clusters, Grid, Cloud and Big Data, to support the scientific mission of several data-intensive research projects spanning a wide range of disciplines such as particle physics, astrophysics, biology or environmental sciences.

Highlight of the year: PIC joins the Spanish Supercomputing Network as a Data Service node

In September 2020, PIC joined the Spanish Supercomputing Network (RES) as one of the nine nodes of the new Data Service. RES was created in March 2007 by the Spanish Ministry of Education and Science and has been recognized as a “Unique Scientific and Technical Infrastructure (ICTS)” since 2014, as it is a singular infrastructure in its field, publicly owned and open to competitive access. The first RES call for Data projects opened at the end of 2020 and further calls are expected annually.

Data Center Infrastructure

PIC’s data center is located in the UAB’s IT services building and contains more than 40 racks in 200 m². A major network upgrade started in 2020 will increase the external connectivity throughput by an order of magnitude, from 20 to 200 Gbps. The new infrastructure is expected to be operational by Spring 2021.

The core data processing services at PIC include over 40 PB of tiered disk/tape storage, 8500 CPU cores and 20 GPUs.

Service Catalogue

  • Data Processing Services
  • Data handling and storage
    • Open Source software tools to manage 43 PB of scientific data: 10 PB on high performance disk and 33 PB on automated tape libraries for long term archive.
  • Data Transfer, data access and network connectivity
    • With annual I/O rates close to 60 PB in and out, PIC is the largest data mover in the Spanish academic network.
    • Dedicated network links for experiments generating large data volumes.
  • Front-End services and user support
    • Customized data processing and analysis services.
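To put the annual I/O figure in perspective, the short calculation below converts it into an average sustained rate. It is only a rough sketch under stated assumptions: the 60 PB is read as the combined inbound and outbound volume, a petabyte is taken as 10^15 bytes, and the traffic is assumed to be spread evenly over the 366 days of 2020. The function name `average_gbps` is illustrative and does not come from any PIC tool.

```python
def average_gbps(petabytes: float, days: float) -> float:
    """Average rate in Gbps for a data volume moved evenly over `days` days.

    Assumes decimal petabytes (1 PB = 1e15 bytes) and a perfectly flat
    traffic profile, which real transfers do not have.
    """
    bits = petabytes * 1e15 * 8      # bytes -> bits
    seconds = days * 86400           # days -> seconds
    return bits / seconds / 1e9      # bits per second -> Gbps


# Roughly 60 PB moved during 2020 (a leap year, hence 366 days).
print(f"{average_gbps(60, 366):.1f} Gbps")
```

Under these assumptions the average comes out at about 15 Gbps. Actual traffic is bursty, so peak rates would be expected to sit well above this average.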
Figure 1:

Digital Infrastructures research projects

ESCAPE - Connecting ESFRI and EOSC

PIC participates in the H2020 project European Science Cluster of Astronomy and Particle Physics ESFRI research infrastructures (ESCAPE). Its main contributions focus on the development of a Data Infrastructure for Open Science and a Science Analysis Platform that are common to several large European research infrastructures. PIC is part of the ESCAPE Data Lake prototype and contributes to the testing of parts of the architecture, such as data orchestration and data caching services, in the context of gamma ray telescopes and particle physics experiments. PIC is also extending the functionality of CosmoHub to include analysis capabilities for gamma ray astronomy.

ARCHIVER - Archival and preservation services for research

PIC participates in the H2020 ARCHIVER project, which started in January 2019 and has a work program that extends until summer 2022.
The ARCHIVER project is a Pre-Commercial Procurement that aims to develop a powerful set of applications for archival and data preservation. It brings together a group of experts (known as the Buyers Group) and a group of contractors who work together to define the requirements and needs that shape the final result. The project will competitively procure R&D services from firms in three stages covering design, prototyping and pilot. In 2020 PIC led the first of these three phases, the design phase, coordinating all the meetings and taking an active part in the weekly meetings and daily work needed to complete the phase successfully.

Particle Physics

LHC TIER-1 Data Center

The Worldwide LHC Computing Grid (WLCG) is a distributed computing infrastructure comprising resources from more than 170 centres in 42 countries. WLCG is used to analyse the unprecedented rate of hundreds of Petabytes (PB) of data generated each year by the LHC.

The Tier-1 center at PIC provides services to three of the LHC experiments, accounting for ~5% of the total Tier-1 capacity for ATLAS, CMS, and LHCb. Besides providing 24x7 data processing services enabling the LHC science, the project is engaged in an R&D effort that aims to federate the Tier-1 and the Tier-2 resources from IFAE and CIEMAT, aligning with the WLCG strategies towards HL-LHC. At the end of 2020, the PIC resources delivered to the Tier-1 were 54 million CPU hours, 7.7 PB of disk storage space, and 22.6 PB of tape storage space.

Extensive R&D work has been carried out in preparation for the tenfold increase in data rates expected for the upcoming HL-LHC (2026). Tests aimed at federating disk and computing resources at the national level continued, with dynamic redirection of work execution and data reading between PIC and CIEMAT. Novel data cache techniques were introduced that enable efficient remote data streaming directly into the processing application. They temporarily cache relatively small amounts of input data close to the PIC processing resources and reduce data access latency through read-ahead.
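The caching idea described above can be illustrated with a toy block cache. This is only a minimal sketch and not the software deployed at PIC: the `fetch(offset, size)` callback is a hypothetical stand-in for a remote read, and the class name, block size, cache size and read-ahead depth are all assumptions chosen for illustration. On a cache miss, the requested block and the next few blocks are fetched in a single remote request, so a sequential reader finds most later blocks already cached locally.

```python
from collections import OrderedDict
from typing import Callable


class ReadAheadCache:
    """Toy block cache with LRU eviction and read-ahead on every miss.

    `fetch(offset, size)` is a hypothetical callback standing in for a
    remote read; it must return at most `size` bytes starting at `offset`.
    """

    def __init__(self, fetch: Callable[[int, int], bytes],
                 block_size: int = 4, max_blocks: int = 8, read_ahead: int = 2):
        if max_blocks < 1 + read_ahead:
            raise ValueError("cache must hold one block plus the read-ahead window")
        self.fetch = fetch
        self.block_size = block_size
        self.max_blocks = max_blocks
        self.read_ahead = read_ahead
        self.blocks: "OrderedDict[int, bytes]" = OrderedDict()
        self.remote_requests = 0  # counts round trips to the remote source

    def _load(self, first: int, count: int) -> None:
        """Fetch `count` consecutive blocks in one remote request."""
        bs = self.block_size
        data = self.fetch(first * bs, count * bs)
        self.remote_requests += 1
        for i in range(count):
            chunk = data[i * bs:(i + 1) * bs]
            if not chunk:  # reached the end of the remote data
                break
            self.blocks[first + i] = chunk
            self.blocks.move_to_end(first + i)
        while len(self.blocks) > self.max_blocks:
            self.blocks.popitem(last=False)  # evict least recently used block

    def read(self, offset: int, size: int) -> bytes:
        """Return up to `size` bytes starting at `offset`."""
        bs = self.block_size
        out = bytearray()
        pos, end = offset, offset + size
        while pos < end:
            idx = pos // bs
            if idx in self.blocks:
                self.blocks.move_to_end(idx)  # mark as recently used
            else:
                self._load(idx, 1 + self.read_ahead)
                if idx not in self.blocks:  # nothing there: past end of data
                    break
            start = pos - idx * bs
            piece = self.blocks[idx][start:start + (end - pos)]
            if not piece:  # short final block, nothing left to read
                break
            out += piece
            pos += len(piece)
        return bytes(out)


# Usage: a 32-byte "remote file" read sequentially in 4-byte pieces.
remote = bytes(range(32))
cache = ReadAheadCache(lambda off, size: remote[off:off + size])
result = b"".join(cache.read(off, 4) for off in range(0, 32, 4))
print(result == remote, cache.remote_requests)
```

In this toy run the eight sequential reads are served with three remote requests instead of eight, which is the latency saving that read-ahead is meant to provide. A production cache would typically prefetch asynchronously and handle errors and concurrent readers, none of which is attempted here.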

In 2020, an agreement was signed with BSC, and LHC computing became one of the selected strategic projects at the HPC facility. Extensive simulations required for analysis in the future could be allocated a dedicated share of the BSC compute resources. Researchers from PIC continued with the integration and exploitation of these resources. ATLAS, which fully integrated these resources into its computing infrastructure, ran 10.3 million hours of simulation at MareNostrum4. CMS researchers continued R&D activities to integrate these resources into the CMS infrastructure, and consumed 0.5 million hours in 2020.

ATLAS Tier-2 and the Tier-3 Interactive Analysis Facility

As a complement to the Tier-1 installation, PIC also provides resources to the ATLAS Spanish Tier-2 infrastructure, specialized in data analysis and simulation production. PIC’s contribution to the Spanish Tier-2 of ATLAS in 2020 was 8.3 million CPU hours and 830 TB of disk storage, delivered with measured reliability above 99%.

IFAE ATLAS physicists have access to a dedicated Tier-3 analysis facility hosted at PIC and direct access to Tier-1 and Tier-2 data. PIC is not only the largest LHC computing center in Spain, but it is also the only one to offer the full range of LHC data services: from the detector’s RAW data file to the final user analysis for publication.

DUNE

The Deep Underground Neutrino Experiment (DUNE) is an international experiment for neutrino science being built in the USA. Its research program targets fundamental questions about the nature of matter and the evolution of the universe. Since 2019, PIC has been part of the distributed infrastructure for DUNE data processing, which is currently focused on the analysis of the data from the first detector prototypes being tested at CERN.

Astrophysics and Cosmology

PIC supports research in astrophysics and cosmology through several activities. It provides the main Data Center for the MAGIC telescopes and the off-site data repository for the LST1, the first telescope of the future array CTA. It also hosts the Spanish Science Data Center for the Euclid Ground Segment and, therein, the development of image simulations for the Euclid space mission, the integral treatment of data from the Physics of the Accelerating Universe (PAU), data quality analysis of the Dark Energy Survey (DES) and support for the exploitation and distribution of the simulations of the Marenostrum Institut de Ciències de l’Espai (MICE) project. PIC also provides computing services to the VIRGO/LIGO gravitational waves experiments.

MAGIC

During 2020, a large data re-processing campaign was run that uses the MaTaJu Image Cleaning technique to improve the resolution of low energy events. The first complete cycle of the MAGIC Data Management and Preservation Plan was concluded and new archive policies were applied to preserve the low level datasets. Finally, important steps were taken towards the full automation of the MAGIC data processing flows at PIC.

CTA/LST1

PIC provides distributed computing services for the CTA simulation production. This task is developed in collaboration with the French IN2P3 Institute.

PIC provides data management and computing support for the first CTA telescope, LST1. In 2020, 500 TB of data were transferred from the Observatory in La Palma to the PIC archive. Data is currently being replicated from PIC to INFN-CNAF by means of a data management infrastructure that is flexible enough to include other data centers as the CTA computing model evolves.

PAU

In 2020, PAU completed its most recent observation period. The data was transferred to PIC, where it was processed and analysed. The raw data, the processed data and the derived catalogs are accessible to the PAU collaboration members. IFAE researchers at PIC have continued to use the GPU infrastructure to develop Machine Learning techniques to process PAU data more efficiently. Deep learning models have been developed that improve background light estimation and estimate the photometric redshifts of galaxies.

Euclid

In 2020, PIC continued the activities derived from its role as the Spanish Science Data Center (SDC-ES) and member of the Organizational Unit for Simulations (OU-SIM), responsible for developing the Simulation Processing Function (SIM-PF). A major milestone for the SDC and OU-SIM has been the preparation for the Ground Segment Implementation Review (GSIR). This includes the preparation of simulated data for the Science Challenge 8, which will be finished in 2021. The simulation includes data for a 500 deg² area for the three Euclid instruments.

CosmoHub

CosmoHub is a web portal, developed at PIC, for the analysis and distribution of massive cosmology and astrophysics data sets. It currently hosts over 40 TiB of catalogued data and delivers hundreds of custom catalogues each month to a broad community of researchers. In 2020, it was migrated to a new custom designed cluster of commodity hardware that improved both the performance and the storage capacity of the platform. There are plans for a new version with increased functionality, and a PhD position has been opened to work on this development project.

VIRGO/LIGO

In 2020, PIC has continued its participation in the LIGO/Virgo collaboration, both by supporting IFAE researchers in running their data analyses and by contributing to the Grid infrastructure where LIGO runs massive analysis of gravitational wave signals using large amounts of CPU and GPU resources.

MAGNESIA

MAGNESIA is an ERC project that started using PIC resources for its computing needs in 2020. MAGNESIA aims to produce 3D simulations of the evolution of astronomical structures such as galaxies and stars. The computation is supported by a new GPU server containing 8 NVIDIA V100 graphics cards and by the HTCondor cluster at PIC.

DESI

In 2020, researchers from the IFAE cosmology group started the analysis of simulated data for the DESI collaboration. The simulations are stored on the mass storage system at PIC. The research is focussed on the analysis of hydrogen distributions and the resulting hydrogen forest lines in observed spectra. The computational work is done on the HTCondor cluster at PIC.

UAB

During 2020, PIC has continued to provide support and computational resources for various UAB Research Groups. Examples are the collaboration with the URBAG ERC project on urban sustainability and Eric Galbraith’s research groups from Institut de Ciència i Tecnologia Ambientals (ICTA-UAB).