The PIC Big Data Common Service deployed a new and much improved platform for its storage and computing requirements in 2023. This new platform consists of a custom Hadoop distribution, developed in-house, that runs on top of a brand-new hardware cluster. The Hadoop distribution, code-named Shepherd, was developed to avoid vendor lock-in and to allow migrating its components to more recent versions; it uses Docker along with GitLab CI/CD to simplify and automate both testing and deployment. The new cluster is composed of 20 nodes that collectively provide 480 processing cores and 2 PB of net storage. An expansion with 10 additional nodes is planned for 2024. Both CosmoHub and the production of Mock Galaxy Catalogs for Euclid (among other surveys) were migrated to this new platform in February 2024.
CosmoHub also saw several improvements in 2023, most of them on the backend side, such as the addition of Parquet as a download format for custom catalogs. We have also developed and integrated two new sets of user-defined functions (UDFs): one implementing aggregation over array columns, such as spectra or probability density functions, and one implementing operations on spherical geometries, which are part of the ongoing effort to implement the ADQL standard. PIC participated in the meeting of the Red de Infraestructuras de Astronomía to present CosmoHub as a potential system to provide data hosting services for this community.
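To illustrate what aggregation over array columns means in practice, the following is a minimal Python sketch, not the CosmoHub implementation. It assumes each row stores a fixed-length array (for example, a binned probability density function) and combines the rows position by position. The function names `elementwise_sum` and `elementwise_mean` are hypothetical and chosen only for this example.

```python
from typing import Iterable, List, Optional, Sequence


def elementwise_sum(rows: Iterable[Sequence[float]]) -> List[float]:
    """Sum array-valued cells position by position across all rows.

    Illustrative only: assumes every row holds an array of the same length.
    """
    total: Optional[List[float]] = None
    for row in rows:
        if total is None:
            # The first row fixes the expected array length.
            total = [0.0] * len(row)
        elif len(row) != len(total):
            raise ValueError("all arrays must have the same length")
        for i, value in enumerate(row):
            total[i] += value
    return total if total is not None else []


def elementwise_mean(rows: Iterable[Sequence[float]]) -> List[float]:
    """Average array-valued cells position by position across all rows."""
    materialized = list(rows)
    if not materialized:
        raise ValueError("cannot average an empty set of rows")
    count = len(materialized)
    return [value / count for value in elementwise_sum(materialized)]


# Two hypothetical three-bin probability density functions.
pdfs = [
    [0.25, 0.5, 0.25],
    [0.75, 0.0, 0.25],
]
stacked = elementwise_mean(pdfs)  # one array summarizing both rows
```

In a query engine, the same idea would typically be exposed as an aggregate function applied to a grouped array column, so that a selection of objects yields a single stacked spectrum or probability density function.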
We have successfully reimplemented the deployment of the Rucio Data Management software using Helm, a package manager for Kubernetes. With a single YAML configuration file, we can fully deploy Rucio, including the server, daemons, PostgreSQL, and database schema. During 2023, we deployed a Rucio instance to automate the file transfer for the MAGIC telescopes, and we are planning to expand its use to several other experiments.
At the end of 2023, the resources deployed by PIC for LHC computing were ~115 kHS06 of computing power (which corresponds to about 9000 CPU cores), ~11.5 PB of disk storage, and about 27 PB of tape storage. One of the main characteristics of Tier-1 centres, beyond a very large storage and computing capacity, is that they must provide these resources through extremely reliable services; hence, the critical services in a Tier-1 operate in 24x7 mode. PIC Tier-1 was at the top of the stability and reliability rankings in WLCG for 2023.
In addition to contributing computing resources to WLCG, the team has also been actively involved in the R&D activities of the LHC experiments. These activities are necessary for the infrastructure to evolve and cope with the ever-increasing scale and complexity of the LHC scientific program, and to prepare for the HL-LHC phase. This work resulted in significant contributions to conferences and computing-specific publications. In particular, the group has been actively involved in integrating HPC resources, testing new services to deploy Analysis Facilities, and studying the benefits of including data caches in WLCG.
The computing centres within the WLCG are anticipated to handle wide area network (WAN) throughputs of tens of terabits per second during the HL-LHC era (2029+), prompting significant upgrades to the WAN infrastructure at major centres like the Spanish LHC Tier-1 at PIC. Initial evaluations indicate that PIC would need network upgrades in 2026 and 2029, with target speeds of 300 Gbps and 600 Gbps, respectively. This underscores the critical need for network upgrades at PIC and at the national service providers to ensure that scientific research benefits from seamless, enhanced connectivity. These needs have been communicated to both CSUC and RedIRIS.
Virtual galaxy catalogs - Throughout 2023, the Scientific Pipeline at PIC (SciPIC), dedicated to efficiently generating massive virtual galaxy catalogs, underwent several improvements. Various releases were deployed for use by the Euclid Organization Unit Simulation Data (OU-SIM) to generate simulated images for Science Performance Verification 3 (SPV3). SPV3 holds particular significance for the Euclid mission, as it aims to verify the expected performance of the Euclid project and provides valuable insights into critical aspects of the project.
In collaboration with the professional enterprise “cacaocinema”, we presented an outreach video during the internal Euclid annual meeting in Copenhagen. This video was later made public through the European Space Agency’s website and YouTube in August, garnering over 34K views.
The goal of this project is to design, install, commission and define the exploitation strategy of an infrastructure for correlative analysis of advanced materials for energy applications. The project is the Catalan branch of the Advanced Materials coordinated project within the Planes Complementarios, funded by the European Union – NextGenerationEU in the context of the “Recovery, Transformation and Resilience Plan” (PRTR) and by the Regional Government of Catalonia. During 2023, PIC continued working in collaboration with ALBA, ICN2 and ICMAB, collecting the requirements and prototyping the offline analysis and data preservation computing facility for the future research infrastructure that will be located at the ALBA synchrotron facility.