• IphtashuFitz@lemmy.world · 1 year ago

10 years ago I worked at a university that had a couple of people doing research on LHC data. I forget the specifics, but there is a global, tiered system for replicating data coming out of the LHC so that researchers all around the world can access it.

I probably don’t have it exactly right, but as I recall, raw data is replicated from the LHC to two or three other locations (tier 1). The raw data contains a lot of uninteresting data (think of a DVR/VCR recording a blank TV image), so those tier 1 locations analyze it and remove all of that unneeded data. This filtered version of the data is then replicated to a dozen or so tier 2 locations. Lots of researchers have access to HPC clusters at those tier 2 locations in order to analyze that data. I believe tier 2 could even request chunks of data from tier 1 that weren’t originally replicated, in case a researcher had a hunch there might actually be something interesting in the “blank” data that had originally been scrubbed.
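
If it helps, here’s a rough, purely illustrative sketch of that tiering in Python. None of these names are real LHC/WLCG tools or APIs; the chunk IDs, classes, and the filtering threshold are all made up just to show the filter-at-tier-1, fetch-on-demand-at-tier-2 idea:

```python
# Purely illustrative sketch of the tiering described above.
# None of these names are real LHC/WLCG tools; everything here is a placeholder.

def filter_raw_chunk(chunk: list[float], threshold: float = 0.0) -> list[float]:
    """Tier 1 pass: drop 'blank' readings so only interesting data moves downstream."""
    return [reading for reading in chunk if reading > threshold]

class Tier1Site:
    def __init__(self, raw_chunks: dict[str, list[float]]):
        self.raw_chunks = raw_chunks  # full copy of the raw data

    def filtered_chunk(self, chunk_id: str) -> list[float]:
        return filter_raw_chunk(self.raw_chunks[chunk_id])

    def raw_chunk(self, chunk_id: str) -> list[float]:
        # Tier 2 can still ask for the unfiltered data if a researcher
        # suspects something interesting was scrubbed out.
        return self.raw_chunks[chunk_id]

class Tier2Site:
    def __init__(self, upstream: Tier1Site):
        self.upstream = upstream
        self.cache: dict[str, list[float]] = {}

    def get(self, chunk_id: str, want_raw: bool = False) -> list[float]:
        key = f"{chunk_id}:{'raw' if want_raw else 'filtered'}"
        if key not in self.cache:
            fetch = self.upstream.raw_chunk if want_raw else self.upstream.filtered_chunk
            self.cache[key] = fetch(chunk_id)
        return self.cache[key]

tier1 = Tier1Site({"run-001": [0.0, 0.0, 2.5, 0.0, 3.1]})
tier2 = Tier2Site(tier1)
print(tier2.get("run-001"))                 # filtered copy: [2.5, 3.1]
print(tier2.get("run-001", want_raw=True))  # full raw chunk, fetched on demand
```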

The university where I worked had its own HPC cluster that was considered tier 3. It could replicate chunks of data from tier 2 on demand in order to analyze them locally. Mostly it was used like this: our researchers would do some high-level analysis at tier 2, and when they found something interesting they would use the tier 3 cluster to do more detailed analysis. That way they could throw a significant amount of our university’s HPC resources at targeted data rather than competing with hundreds of other researchers all trying to do the same thing on the tier 2 clusters.
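
And roughly how the tier 3 side got used, again with completely made-up function names and fake data standing in for whatever transfer tooling actually handled the on-demand copies:

```python
# Illustrative only: broad scan at tier 2, then pull just the interesting
# chunks down to the local (tier 3) cluster for the expensive analysis.

def fetch_from_tier2(chunk_id: str) -> list[float]:
    """Placeholder for an on-demand copy of one chunk from a tier 2 site."""
    return [1.2, 3.4, 0.7]  # fake data

def broad_scan(chunk_ids: list[str]) -> list[str]:
    """Cheap, high-level pass against tier 2: flag chunks worth a closer look."""
    return [cid for cid in chunk_ids if sum(fetch_from_tier2(cid)) > 5.0]

def detailed_analysis(chunk: list[float]) -> float:
    """Expensive pass that runs on the local tier 3 cluster."""
    return max(chunk) - min(chunk)

if __name__ == "__main__":
    interesting = broad_scan(["run-001", "run-002", "run-003"])
    for cid in interesting:
        local_copy = fetch_from_tier2(cid)  # replicate just this chunk locally
        print(cid, detailed_analysis(local_copy))
```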