PIC: Port d'Informació Científica

Gonzalo Merino


The Port d’Informació Científica (PIC) is a scientific-technological center maintained through a collaboration agreement between IFAE and CIEMAT, with the support of UAB. PIC uses distributed high-throughput computing technologies, including clusters, Grid, Cloud and Big Data, to provide large-scale data management services to scientific collaborations whose researchers are spread all over the world.

Introduction

There have been a number of important events at PIC in 2019, but one that certainly deserves to be highlighted is the deployment of a new IBM TS4500 tape library. The long-term preservation of large volumes of experimental data is a flagship service at PIC, and the new tape library will play an important role in it in the years to come. With steadily increasing information density and throughput, tape technology is a key component of LHC data management, addressing the need for cost-effective deep archives.
Figure 1: Automated tape libraries at PIC for long-term preservation of valuable experimental data.

Data Center Infrastructure

PIC’s data center is located in UAB’s Building D and has two server rooms with different characteristics and energy-efficiency profiles:

  • 150 m2 air-cooled room, which hosts the storage and computing IT equipment
  • 25 m2 liquid-cooled room, which hosts only computing resources

PIC’s air-cooled room has 34 racks and 1,400 rack units for IT equipment, as well as the StorageTek SL8500 and the new IBM TS4500 tape libraries. PIC’s liquid-cooled room has four oil-immersion tanks and is the only liquid-immersion scientific data center in Spain.

Service Catalogue

PIC supports a variety of scientific projects by deploying computing and storage resources for the intensive processing and analysis of extremely large amounts of data.

  • Data Processing Services
    • 8,300-core CPU cluster managed by HTCondor. These resources are integrated in the world’s largest distributed scientific computing infrastructure - the Worldwide LHC Computing Grid (WLCG) - and are also available to local users (a job-submission sketch follows this list).
    • GPU compute nodes, available both through Jupyter notebooks and batch jobs.
    • Big Data platform to run Hadoop/HIVE and Spark workloads.
  • Data handling and storage
    • Open-source software tools to manage 39 PB of scientific data: 10 PB on high-performance disk and 29 PB on automated tape for long-term archiving.
  • Data Transfer, data access and network connectivity
    • With a current external connectivity of 20 Gbps and annual I/O rates close to 35 PB in and out, PIC is the largest data mover in the Spanish academic network. Network throughput requirements grow by about 30% each year.
    • During 2019, RedIRIS started operating a new dedicated 10 Gbps network link that carries the data from the MAGIC and LST (CTA) telescopes at the Observatorio Roque de los Muchachos (ORM) in La Palma to PIC.
  • Front-End services and user support
    • Customized access to data processing and analysis services.
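
As an illustration of how local users typically interact with a batch service like the one above, the following minimal sketch submits a job through the HTCondor Python bindings (assuming a recent version of the bindings); the executable, arguments and resource requests are hypothetical placeholders, not an actual PIC workflow.

```python
# Minimal sketch: submitting a batch job to an HTCondor pool through the
# official HTCondor Python bindings. All job parameters are placeholders.
import htcondor

submit_description = htcondor.Submit({
    "executable": "analyze_events.sh",   # hypothetical user script
    "arguments": "run2019.dat",
    "output": "job.out",
    "error": "job.err",
    "log": "job.log",
    "request_cpus": "1",
    "request_memory": "2GB",
})

schedd = htcondor.Schedd()               # connect to the local scheduler
result = schedd.submit(submit_description)
print("Submitted job cluster", result.cluster())
```

The keys in the submit description are the same keywords used in classic HTCondor submit files.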

Connecting ESFRI and EOSC - ESCAPE

PIC participates in the H2020 project European Science Cluster of Astronomy and Particle Physics ESFRI research infrastructures (ESCAPE). Its main contributions focus on the development of the Data Infrastructure for Open Science and on the implementation of the ESCAPE Science Analysis Platform. PIC operates a storage endpoint within the ESCAPE Data Lake prototype and will contribute to testing components of this architecture, such as data orchestration and data caching services, in the context of gamma-ray telescopes and particle physics experiments.
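
As a rough illustration of how a Data Lake storage endpoint can be inspected from Python, the sketch below uses the gfal2 bindings commonly deployed on WLCG-style infrastructure; the endpoint URL is a hypothetical placeholder, not the actual ESCAPE setup.

```python
# Minimal sketch: listing and stat-ing files on a grid storage endpoint
# with the gfal2 Python bindings. The URL below is a hypothetical placeholder.
import gfal2

ctx = gfal2.creat_context()

base_url = "https://storage.example.org:8443/escape/testbed"  # hypothetical
for name in ctx.listdir(base_url):
    info = ctx.stat(base_url + "/" + name)
    print(name, info.st_size, "bytes")
```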

PIC is also developing the GammaHub prototype, a gamma-ray multi-instrument Big Data platform for interactive data selection, exploration and analysis of billions of gamma-ray events.


The Worldwide LHC Computing Grid

The Worldwide LHC Computing Grid (WLCG) is a distributed computing infrastructure comprising resources from more than 170 centres in 42 countries. WLCG is used to analyse the unprecedented volume of data generated by the LHC, hundreds of petabytes (PB) each year.

LHC TIER-1 Data Center

The Tier-1 center at PIC provides services to three of the LHC experiments, accounting for ~5% of the total Tier-1 capacity for ATLAS, CMS and LHCb. Besides providing 24x7 data processing services that enable LHC science, the team is engaged in a significant R&D project that aims to federate the Tier-1 with the Tier-2 resources from IFAE and CIEMAT, in line with the WLCG strategy towards the HL-LHC. At the end of 2019, the resources delivered by the PIC Tier-1 amounted to 56M compute hours, 7.6 PB of disk storage and 21.3 PB of tape storage.

In 2019, the first steps towards federating disk and computing resources at the national level were taken, with dynamic redirection of job execution and data reads between PIC and CIEMAT. Data access studies have been started in order to understand how to improve data handling efficiency, in preparation for the tenfold increase in data rates expected for the upcoming HL-LHC (2026).

HPC systems could provide a significant fraction of the resources needed to execute the extensive simulations required for future analyses. ATLAS researchers at PIC demonstrated the successful integration of the MareNostrum4 supercomputer at BSC into the experiment infrastructure and ran 4.8 million hours of simulations. PIC researchers are actively engaged in an R&D project to integrate the CMS infrastructure with MareNostrum4. Integration with commercial clouds was also demonstrated, by extending PIC’s cluster with AWS EC2 nodes on demand, as sketched below.
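
For illustration only, the following sketch shows how a site can request on-demand worker nodes from Amazon EC2 with the boto3 library; the image ID, instance type, key pair and security group are hypothetical placeholders, and in practice the booted image would be preconfigured to join the site’s HTCondor pool.

```python
# Minimal sketch of on-demand cluster extension with Amazon EC2 via boto3.
# All identifiers below are hypothetical placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="eu-west-1")

response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",          # hypothetical worker-node image
    InstanceType="c5.2xlarge",
    MinCount=1,
    MaxCount=4,                               # burst with up to four nodes
    KeyName="pic-worker-key",                 # hypothetical key pair
    SecurityGroupIds=["sg-0123456789abcdef0"],
)

for instance in response["Instances"]:
    print("Launched", instance["InstanceId"])
```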

ATLAS Tier-2 and the Tier-3 Interactive Analysis Facility

As a complement to the Tier-1 installation, PIC also provides resources to the Spanish ATLAS Tier-2 infrastructure, which specializes in data analysis and simulation production. PIC’s contribution to the Spanish ATLAS Tier-2 in 2019 was 10 million processor hours and 996 TB of disk storage, delivered with a measured reliability above 99%.

IFAE ATLAS physicists have access to a dedicated Tier-3 analysis facility hosted at PIC. PIC is not only the largest LHC computing center in Spain, but also the only one providing the full range of LHC data services: from the archive of detector RAW data to the final user analysis for publication.

Astrophysics and Cosmology

PIC supports research in astrophysics and cosmology through several activities. It provides the main data center for the MAGIC telescopes and the off-site data repository for the LST1, the first telescope of the future CTA array. It also hosts the Spanish Science Data Center for the Euclid Ground Segment and, therein, the development of image simulations for the Euclid space mission, the integral treatment of data from the Physics of the Accelerating Universe (PAU) survey, the data quality analysis of the Dark Energy Survey (DES), and support for the exploitation and distribution of the simulations of the Marenostrum Institut de Ciències de l’Espai (MICE) project. In 2019, after IFAE joined the Virgo collaboration, PIC also started providing computing services to the Virgo/LIGO gravitational-wave experiments.

MAGIC

In 2019, PIC was actively involved in the large reprocessing campaigns that applied new cleaning techniques to the data and in the creation of the MAGIC legacy repository. PIC also developed and started implementing the Data Management and Preservation Plan (DMPP) and the conversion of MAGIC data into the Data Level 3 (DL3) format compatible with the CTA analysis.

CTA/LST1

PIC provides distributed computing services for the CTA simulation production. This task is carried out in collaboration with the French institute IN2P3.

PIC provides data management and computing support for the first CTA telescope, LST1. In 2019, a data transfer system based on the File Transfer Service (FTS) was deployed to automate data transfers from the ORM to PIC. This was one of the many pieces that were crucial for the first LST detection of the Crab Nebula (https://www.cta-observatory.org/lst1-detects-first-gamma-ray-signal/).
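
A minimal sketch of how such an automated transfer can be submitted through the FTS Python client (fts3-rest easy bindings) is shown below; the FTS endpoint and the source and destination URLs are hypothetical placeholders, not the actual ORM-to-PIC configuration.

```python
# Minimal sketch: submitting a single file transfer to an FTS server
# with the fts3-rest easy bindings. Endpoint and URLs are placeholders.
import fts3.rest.client.easy as fts3

context = fts3.Context("https://fts3.example.org:8446")

source = "gsiftp://orm-storage.example.org/lst1/raw/run01234.fits.fz"
destination = "srm://storage.pic.example.org/lst1/raw/run01234.fits.fz"

transfer = fts3.new_transfer(source, destination)
job = fts3.new_job([transfer], verify_checksum=True, retry=3)

job_id = fts3.submit(context, job)
print("Submitted FTS job", job_id)
```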

PAU

PIC is the main data center for the PAU survey, including data access services for external users. During 2019, the analysis pipelines developed at PIC in collaboration with ICE were run several times for optimization. As part of this optimization process, tests were carried out to integrate the PAU pipelines with the Apache Hadoop platform available at PIC. Additional disk storage was also deployed so that a copy of the RAW data is always kept on disk.

Following the work started last year, IFAE researchers at PIC have been using the GPU infrastructure to run machine learning algorithms for star-galaxy separation and background estimation, obtaining very promising results.
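
For illustration, the sketch below trains a simple star-galaxy classifier on synthetic data with scikit-learn; the features, the random forest model and the numbers are illustrative assumptions and only stand in for the GPU-based machine learning models actually used on PAU data.

```python
# Deliberately simplified sketch of a star-galaxy separation classifier.
# Synthetic data and a random forest stand in for the real PAU features
# and the GPU-based models; everything here is illustrative.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical features: a size-like and a color-like quantity per object.
size = np.concatenate([rng.normal(1.0, 0.1, n // 2),    # point-like "stars"
                       rng.normal(1.6, 0.4, n // 2)])   # extended "galaxies"
color = np.concatenate([rng.normal(0.4, 0.3, n // 2),
                        rng.normal(0.9, 0.3, n // 2)])
X = np.column_stack([size, color])
y = np.concatenate([np.zeros(n // 2), np.ones(n // 2)])  # 0 = star, 1 = galaxy

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

proba = clf.predict_proba(X_test)[:, 1]
print("ROC AUC:", round(roc_auc_score(y_test, proba), 3))
```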

Euclid

During 2019, PIC continued the activities derived from its role as the Spanish Science Data Center (SDC-ES) and as a member of the Organizational Unit for Simulations (OU-SIM), which is responsible for developing the Simulation Processing Function (SIM-PF).

At the SDC level, major changes were implemented in the infrastructure: the batch system was migrated from PBS to HTCondor, the container layer was moved from Docker to Singularity, and the temporary disk area for jobs was upgraded.
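
As a minimal illustration of running a processing step inside a Singularity container after the migration from Docker, the sketch below wraps the `singularity exec` command from Python; the image path, script and configuration file are hypothetical placeholders.

```python
# Minimal sketch: executing a pipeline step inside a Singularity container.
# The image path and the command-line arguments are hypothetical placeholders.
import subprocess

image = "/cvmfs/example.org/containers/euclid-sim.sif"  # hypothetical image
cmd = ["singularity", "exec", image,
       "python", "run_sim_step.py", "--config", "sim.cfg"]

result = subprocess.run(cmd, capture_output=True, text=True)
print(result.stdout)
if result.returncode != 0:
    raise RuntimeError(result.stderr)
```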

One of the main activities in 2019 was the preparation and execution of Scientific Challenges 4/5/6, which were meant to further test the scalability and integration of the Science Ground Segment infrastructure, significantly increasing the data volume and the number of Processing Functions involved.

CosmoHub

CosmoHub is a web portal, developed at PIC, for the analysis and distribution of massive cosmology and astrophysics data sets. Hosting over 6,000 catalogs generated by more than 500 users from all over the world, it is a very popular tool among researchers. Because of its success, a new version with increased functionality is planned. The new design will include several analysis pipelines from the separately developed SciPIC framework and make them available to the CosmoHub user community.
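
As an illustration of the kind of query such a Hadoop/Hive-backed platform serves, the sketch below runs a magnitude-limited galaxy count per redshift bin with PySpark; the database, table and column names are hypothetical placeholders, not the actual CosmoHub schema.

```python
# Minimal sketch: aggregating a cosmology catalog stored in Hive with PySpark.
# The catalog name and columns are hypothetical placeholders.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("catalog-sketch")
         .enableHiveSupport()
         .getOrCreate())

# Hypothetical query: magnitude-limited galaxy counts per redshift bin.
df = spark.sql("""
    SELECT floor(z / 0.1) AS z_bin, count(*) AS n_gal
    FROM cosmology.mock_galaxy_catalog
    WHERE mag_i < 22.5
    GROUP BY floor(z / 0.1)
    ORDER BY z_bin
""")
df.show()
```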

VIRGO/LIGO

In 2019, PIC joined the LIGO/Virgo collaboration. For Virgo, PIC primarily supports the computing efforts of the IFAE group, which include CPU and GPU computing as well as helping users run their pipelines in individual or collaborative environments.

PIC also joined the LIGO Grid infrastructure and is now one of the computing sites where the collaboration runs the massive analysis of gravitational-wave signals, using CPU and GPU resources through HTCondor. In the fall of 2019, PIC contributed 4.5% of the total LIGO-Virgo CPU accounting.