# Data Challenges

Click on the links below to obtain more information about each data challenge and assignments. The assignments were based on preferences submitted by participants. Most challenges are meant to take ~4 days, but in some cases participants might finish earlier and move on to another challenge after working for 2-3 days on their initial assignment.

##### Tommaso Dorigo: Machine Learning for Muon Energy Reconstruction in a High-Granularity Calorimeter

**Co-organizers**: Lukas Layer and Giles Strong

**Assigned participants, with CMU grad students, faculty, and postdocs who can help with Zoom logistics marked with star**: Manfred Paulini(*), Aaron Owens, Sofia Guglielmini, Tommaso Dorigo, Giles Strong, Mikael Kuusela(*), Lukas Layer, Dimitri Bourilkov, Tyler Smith

**Material to be provided for participants**: https://github.com/llayer/cmu_challenge

**Brief description of challenge**:

The muon is a subnuclear particle, a heavier replica of the electron. It is an important messenger in searches for new physics at present and future colliders. The muon is charged, and its energy can be measured from the curvature of its trajectory in a magnetic field, as E=0.3BR, where B is the field in Tesla, R is the curvature radius of the trajectory in meters, and E is the energy in GeV (here we are assuming E>>m, so energy and momentum coincide). When muons are ultra-relativistic (e.g. E>1000 GeV), the energy measurement from curvature becomes imprecise because the radius is too large (e.g. in B=2 Tesla, the curvature radius for E=1000 GeV is R≈1667 meters).
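As a quick numerical check of the relation E=0.3BR, the sketch below computes the curvature radius and, as an illustration of why the measurement degrades, the sagitta of the track over a chord of given length using the standard small-angle approximation s ≈ L²/(8R). The function names and the 2-meter chord are assumptions made for this example only; they are not taken from the challenge material.

```python
def curvature_radius_m(energy_gev: float, b_tesla: float) -> float:
    """Curvature radius in meters from E = 0.3 * B * R (E in GeV, B in Tesla)."""
    return energy_gev / (0.3 * b_tesla)


def sagitta_m(radius_m: float, chord_m: float) -> float:
    """Small-angle sagitta s ~ L^2 / (8 R) of an arc with chord length L."""
    return chord_m ** 2 / (8.0 * radius_m)


# Example from the text: a 1000 GeV muon in a 2 Tesla field.
radius = curvature_radius_m(1000.0, 2.0)

# Hypothetical 2 m measurement chord, chosen only for illustration.
sagitta = sagitta_m(radius, 2.0)

print(f"R = {radius:.1f} m, sagitta over 2 m = {sagitta * 1e3:.2f} mm")
```

Under these assumptions the track deviates from a straight line by a fraction of a millimeter, which gives a feel for why curvature alone constrains the energy poorly at the TeV scale.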

We want to see how well we can measure the muon energy from the small, stochastic deposits of energy that muons leave behind as they radiate soft photons while traversing a dense material. Hence we simulate a 2-meter-thick calorimeter, segmented longitudinally into 50 layers of 4 cm and transversely into 3-mm-wide cells in x,y. Each cell records the energy of photons and ionization left behind by the muon. The idea is that by using the pattern of energy deposits, rather than just the sum of deposited energy, we get a better handle on regressing to the true muon energy.

A set of O(100,000) events will be provided, with energy in the range 50-8000 GeV. For each event the files contain a set of readouts of N calorimeter cells, given as x_i, y_i, z_i, E_i, followed by the true energy of the muon, which we want to regress to. A separate set of events used for testing has the same information but with the true energy set to 0. Participants will submit answers in the form of a file with the predicted energies. The evaluation metric is explained in the slides provided with the material.
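To make the event format concrete, here is a minimal sketch of turning one event's cell readouts into a few summary features. It assumes each event is held as a NumPy array of shape (N, 4) with columns x_i, y_i, z_i, E_i; that layout, the function name `summarize_event`, the threshold value, and the toy numbers are all assumptions for illustration and do not describe the actual file format in the challenge repository.

```python
import numpy as np


def summarize_event(hits: np.ndarray, threshold: float = 0.1) -> dict:
    """Summarize one event given hits of shape (N, 4): columns x, y, z, E."""
    energy = hits[:, 3]
    total = float(energy.sum())
    # Number of cells whose deposit exceeds a (hypothetical) threshold.
    n_above = int((energy > threshold).sum())
    # Energy-weighted mean depth of the deposits along z.
    mean_z = float((hits[:, 2] * energy).sum() / total) if total > 0 else 0.0
    return {"total_energy": total, "n_above": n_above, "mean_z": mean_z}


# Toy event with three cells (made-up numbers, not challenge data).
toy_hits = np.array([
    [0.000, 0.000, 0.02, 0.05],
    [0.003, 0.000, 0.50, 1.20],
    [0.000, 0.003, 1.10, 0.30],
])
features = summarize_event(toy_hits)
print(features)
```

Features like these only capture coarse aspects of the deposit pattern; the point of the challenge is that models using the full spatial pattern may do better than the summed energy alone.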

More information can be found in the accompanying article.

**Slack channel for the challenge in “Quarks to Cosmos workspace”**: #challenge-dorigo

**Dedicated Zoom line to use for this challenge**: https://cmu.zoom.us/j/92378379265

(Password was shared with participants via email/Slack. All Zoom sessions will have a CMU-based host and some alt hosts who are working on the challenge.)

##### Francois Lanusse: Solving Deep Inverse Problems in Cosmology

**Co-organizers**:

**Assigned participants, with CMU grad students, faculty, and postdocs who can help with Zoom logistics marked with star**: Marc Huertas-Company, Sam Ward, Junik Sengupta, Supranta Sarma Boruah, Ilsang Yoon, Karthik Reddy Solipuram, Zhaozhou An(*), Dimitrios Tanoglidis, Vahid Nikoofard, Harold Erbin, Nesar Ramachandra, Dinesh Shetty, Renbo Tu(*), Husni Almoubayyed(*), Ce Sui, Tanazza Khanam, Kangning Diao, Liting Xiao, Sangeon Park, Luca Masserano(*), Eyan Noronha, Yurii Kvasiuk, Francois Lanusse

**Brief description of challenge**: The goal of this data challenge will be to explore differentiable forward models, generative modeling, and variational inference to solve inverse problems arising in astronomical and cosmological data. A main “guided challenge” will allow participants to learn about these methodologies on inpainting/denoising/deblending problems on galaxy images from the Subaru Telescope, while a more “open challenge” will task participants interested in going deeper with applying these methodologies to the open problem of reconstructing maps of dark matter from weak gravitational lensing data from the HSC Survey.
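To illustrate what solving an inverse problem through a forward model can look like, the toy sketch below inpaints a 1-D signal by gradient descent on a data-fidelity term plus a simple smoothness penalty. This is a stand-in only: the challenge itself presumably relies on automatic differentiation and learned generative priors, whereas here the gradient is written by hand, the prior is a quadratic smoothness term, and the function name `inpaint_1d`, the signal, and all parameter values are assumptions for this example.

```python
import numpy as np


def inpaint_1d(y, mask, lam=0.1, step=0.5, n_iter=5000):
    """Minimize 0.5*||mask*x - y||^2 + 0.5*lam*||diff(x)||^2 by gradient descent."""
    x = y.copy()
    for _ in range(n_iter):
        # Gradient of the data term through the forward model x -> mask * x.
        grad_data = mask * (mask * x - y)
        # Gradient of the smoothness penalty on neighboring differences.
        d = np.diff(x)
        grad_smooth = np.zeros_like(x)
        grad_smooth[:-1] -= d
        grad_smooth[1:] += d
        x -= step * (grad_data + lam * grad_smooth)
    return x


# Toy signal with a block of missing samples (indices 20..29).
x_true = np.sin(np.linspace(0.0, np.pi, 50))
mask = np.ones_like(x_true)
mask[20:30] = 0.0
y = mask * x_true

x_hat = inpaint_1d(y, mask)
gap_error = np.max(np.abs(x_hat[20:30] - x_true[20:30]))
print(f"max error in the masked region: {gap_error:.3f}")
```

The smoothness penalty fills the gap with values close to its neighbors; a learned prior over galaxy images would play the same role in the image problems described above, with far more structure.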

**Material to be provided for participants**:

– GitHub repo with code and notebooks: https://github.com/EiffL/Quarks2CosmosDataChallenge

– Datasets will be hosted on Bridges-2

– Live tutorial/introduction at the beginning of each day.

**Slack channel for the challenge in “Quarks to Cosmos workspace”**: #challenge-lanusse

**Dedicated Zoom line to use for this challenge**: https://cmu.zoom.us/j/94617600621

(Password was shared with participants via email/Slack. All Zoom sessions will have a CMU-based host and some alt hosts who are working on the challenge.)

##### Alex Malz: Assessing the Accuracy of ML-based Uncertainties in the Context of Galaxy Photometric Redshifts

**Co-organizers**:

**Assigned participants, with CMU grad students, faculty, and postdocs who can help with Zoom logistics marked with star**: Mike Stanley(*), Siddharth Chaini, Giovanni Ferrami, Richard Camuccio, Andresa Rodrigues de Campos(*), Nitin Mishra, Matthew O’Callaghan, Biprateep Dey, Ann Lee(*), Alex Malz, Matthew Ho(*)

**Brief description of challenge**: This data challenge invites participants to investigate three open-ended aspects of how to quantify the accuracy of estimated posterior probabilities derived by AI/ML methods:

1. How do we generate “true” posteriors to compare with estimates?

2. How do we estimate posteriors using AI/ML methods?

3. How do we assess the accuracy of estimated posteriors given true posteriors?

This data challenge addresses these questions in the context of photometric redshift estimation.
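One common way to approach the third question is the probability integral transform (PIT): evaluate each estimated posterior CDF at the true value and check whether the results look uniform on [0, 1]. The sketch below does this for Gaussian posteriors on simulated data. It is offered as one possible diagnostic, not as the method used in the challenge notebook; the Gaussian form, the function names, and the simulated redshifts are assumptions for illustration.

```python
import numpy as np
from math import erf, sqrt


def gaussian_pit(z_true, mean, sigma):
    """PIT values: each Gaussian posterior CDF evaluated at the true redshift."""
    u = (np.asarray(z_true) - np.asarray(mean)) / (np.asarray(sigma) * sqrt(2.0))
    return 0.5 * (1.0 + np.vectorize(erf)(u))


def ks_distance_to_uniform(pit):
    """Kolmogorov-Smirnov distance between the PIT sample and Uniform(0, 1)."""
    p = np.sort(pit)
    n = p.size
    upper = np.arange(1, n + 1) / n - p
    lower = p - np.arange(0, n) / n
    return float(max(upper.max(), lower.max()))


# Simulated "truth": redshifts drawn from the same Gaussians used as posteriors.
rng = np.random.default_rng(0)
n = 5000
mean = rng.uniform(0.1, 1.5, n)
sigma = np.full(n, 0.05)
z_true = rng.normal(mean, sigma)

# Calibrated posteriors versus posteriors whose width is too small by half.
d_calibrated = ks_distance_to_uniform(gaussian_pit(z_true, mean, sigma))
d_overconfident = ks_distance_to_uniform(gaussian_pit(z_true, mean, 0.5 * sigma))
print(d_calibrated, d_overconfident)
```

A uniform PIT distribution is a necessary but not sufficient condition for accurate posteriors, which is part of why the question of how to assess accuracy remains open-ended.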

**Material to be provided for participants**: I’ll present a tutorial going through an annotated Jupyter notebook that introduces the context and questions this data challenge investigates and provides starter code showing at least one way to address each question. Participants are invited to build on those examples, either following the included suggestions or through self-guided experiments, to explore these questions in more depth. The notebook contains all the commands necessary to acquire starter data and run the demo code.

**Slack channel for the challenge in “Quarks to Cosmos workspace”**: #challenge-malz

**Dedicated Zoom line to use for this challenge**: https://cmu.zoom.us/j/95386322609

(Password was shared with participants via email/Slack. All Zoom sessions will have a CMU-based host and some alt hosts who are working on the challenge.)

##### Jennifer Ngadiuba: Finding New Physics with Anomaly Detection at the LHC

**Co-organizers**: Thea Aarrestad, Katya Govorkova, Maurizio Pierini, Ema Puljak, Kinga Wozniak

**Brief description of challenge**: The goal of the challenge is to develop an algorithm able to detect a general New Physics signature hidden in a cocktail of known Standard Model backgrounds. As the label of the signal is not known a priori, the participants will have to make use of unsupervised learning methods. Examples of such methods include autoencoders, although these are not the only option. The algorithm must take as input a particle-based representation of the event: four-vectors of the highest-momentum jets, electrons, and muons, together with the missing transverse energy. These inputs can either be fed directly into the algorithm or used to first build a higher-level representation of the event, generating new inputs if that is found more suitable. The participants are encouraged to do some research on the topic ahead of the event. Previous work in this direction includes, for instance, https://arxiv.org/abs/1811.10276 .
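The reconstruction-error idea behind autoencoder-based anomaly detection can be sketched with PCA, which acts as a linear stand-in for an autoencoder: fit a low-dimensional representation on background-like events only, then score each event by how poorly it is reconstructed. Everything in the example is hypothetical: the 8 toy features, the synthetic background and signal, and the function names are assumptions for illustration and are unrelated to the actual challenge dataset.

```python
import numpy as np


def fit_pca(x, n_components):
    """Fit a linear encoder/decoder: mean and top principal directions."""
    mean = x.mean(axis=0)
    _, _, vt = np.linalg.svd(x - mean, full_matrices=False)
    return mean, vt[:n_components]


def anomaly_score(x, mean, components):
    """Squared reconstruction error after projecting onto the learned subspace."""
    centered = x - mean
    reconstructed = centered @ components.T @ components
    return np.sum((centered - reconstructed) ** 2, axis=1)


rng = np.random.default_rng(1)

# Toy background: 8 features that mostly live in a 2-dimensional subspace.
mixing = rng.normal(size=(2, 8))
background = rng.normal(size=(2000, 2)) @ mixing + 0.1 * rng.normal(size=(2000, 8))

# Toy signal: events that do not respect the background correlations.
signal = rng.normal(size=(50, 8))

mean, components = fit_pca(background, n_components=2)
bkg_scores = anomaly_score(background, mean, components)
sig_scores = anomaly_score(signal, mean, components)

threshold = np.percentile(bkg_scores, 99)
print(f"fraction of signal above the 99% background threshold: "
      f"{np.mean(sig_scores > threshold):.2f}")
```

A nonlinear autoencoder replaces the linear projection with learned encoder and decoder networks, but the scoring logic (train on data dominated by background, flag poorly reconstructed events) stays the same.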

**Material to be provided for participants**: TBD

**Slack channel for the challenge in “Quarks to Cosmos workspace”**: #challenge-ngadiuba

**Dedicated Zoom line to use for this challenge**: https://cmu.zoom.us/j/94741627183

(Password was shared with participants via email/Slack. All Zoom sessions will have a CMU-based host and some alt hosts who are working on the challenge.)

##### Brian Nord: Achieving Interpretable Error Bars with Deep Learning in Simple Scenarios

**Co-organizers**: Chad Schafer

**Brief description of challenge**: Definition of ‘endemic deep learning uncertainties’: uncertainties that are measured directly from deep neural networks, e.g., Monte Carlo dropout or formal Bayesian neural networks.

This data challenge gives an opportunity to ask the following questions:

- Are endemically deep learning-derived uncertainties a) consistent, b) indicative of
- How physically interpretable (e.g., as statistical or systematic) are uncertainties derived endemically from deep neural networks?
- How do endemically derived uncertainties a) compare to each other and b) compare to more traditional Bayesian inference methods?

We recommend the following projects for this data challenge:

- Replicate the results in the Deeply Uncertain paper.
- Modify the algorithms used in the paper:
  - Can the existing algorithms be made to identify out-of-distribution data (Deeply Uncertain, figure 2)?
  - Can the existing algorithms be made to recover an aleatoric uncertainty that matches the statistical uncertainty derived from basic error propagation?
- Update the data set in the paper to a computer vision data set or another physical data set, like in Dr. Malz’s data challenge.
- Identify another deep learning endemic UQ algorithm and compare it to those in the paper.
- Perform simulation-based inference (aka, likelihood-free inference) on the pendulum data set. Compare these results to those in the paper. I recommend using the Macke Lab SBI package.
- Perform likelihood-based inference (i.e., with an analytic likelihood function) on the pendulum data set. Compare these results to those in the paper. I recommend using the PyMC3 package.

NB: The authors of Deeply Uncertain consider items 4, 5, and 6 in the list above to be interesting components of a follow-up paper. We welcome collaboration on this!
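As a reference point for the "basic error propagation" mentioned in the project list, the sketch below propagates Gaussian measurement errors through the simple-pendulum relation g = 4π²L/T² and checks the first-order formula against a Monte Carlo estimate. It assumes the pendulum data set amounts to estimating g from a length L and a period T with independent errors; that setup, the function names, and the numerical values are assumptions for illustration rather than a description of the paper's exact procedure.

```python
import numpy as np


def g_from_pendulum(length_m, period_s):
    """Gravitational acceleration from the simple-pendulum relation."""
    return 4.0 * np.pi ** 2 * length_m / period_s ** 2


def g_uncertainty(length_m, period_s, sigma_l, sigma_t):
    """First-order error propagation for independent errors on L and T."""
    g = g_from_pendulum(length_m, period_s)
    relative = np.sqrt((sigma_l / length_m) ** 2 + (2.0 * sigma_t / period_s) ** 2)
    return g * relative


# Hypothetical measurement: L = 1 m, T = 2.006 s, with small Gaussian errors.
length, period = 1.0, 2.006
sigma_l, sigma_t = 0.005, 0.01

sigma_g_analytic = g_uncertainty(length, period, sigma_l, sigma_t)

# Monte Carlo cross-check: sample L and T, then look at the spread in g.
rng = np.random.default_rng(2)
samples = g_from_pendulum(
    rng.normal(length, sigma_l, 100_000),
    rng.normal(period, sigma_t, 100_000),
)
sigma_g_mc = samples.std()

print(f"g = {g_from_pendulum(length, period):.3f} m/s^2, "
      f"sigma_g (analytic) = {sigma_g_analytic:.4f}, "
      f"sigma_g (Monte Carlo) = {sigma_g_mc:.4f}")
```

A number like `sigma_g_analytic` is the kind of statistical uncertainty against which a network's aleatoric estimate could be compared.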

**Material to be provided for participants**:

- Participants will be running code from this repository: https://github.com/deepskies/DeeplyUncertain-Public .
- How to get code running on Bridges: DeeplyUncertain_InstructionsForBridges
- Directly related paper: https://arxiv.org/abs/2004.10710

**Slack channel for the challenge in “Quarks to Cosmos workspace”**: #challenge-nord

**Dedicated Zoom line to use for this challenge**: https://cmu.zoom.us/j/96241442796

(Password was shared with participants via email/Slack. All Zoom sessions will have a CMU-based host and some alt hosts who are working on the challenge.)

##### Harrison Prosper: Deep Learning for Symbolic Mathematics

**Co-organizers**:

**Brief description of challenge**: After a brief review of the relationship between loss functions and machine learning models, I review a few approaches, including Bayesian ones, for quantifying the uncertainty in the outputs of these models and I assess the degree to which these approaches succeed.

**Material to be provided for participants**: TBD

**Slack channel for the challenge in “Quarks to Cosmos workspace”**: #challenge-prosper

**Dedicated Zoom line to use for this challenge**: https://cmu.zoom.us/j/7332611522

(Password was shared with participants via email/Slack. All Zoom sessions will have a CMU-based host and some alt hosts who are working on the challenge.)

##### Ben Wandelt: Field-level inference of cosmological parameters with Information Maximizing Neural Networks (IMNNs) and Density Estimation Likelihood-Free Inference (DELFI)

**Co-organizers**: Lucas Makinen

**Brief description of challenge**: The challenge will involve estimating cosmological parameters from a non-Gaussian density field.

**Material to be provided for participants**: The challenge materials will take the form of a skeleton python notebook (shared on Google Colab) to be filled in by participants that will contain the documentation and instructions for the challenge as well as the code to set up the environment. This will be distributed to participants. The notebook will contain solution code in hidden text boxes that participants can open and copy-paste in case they get stuck. In addition, the challenge organizers will be available to answer technical or conceptual questions during the challenge.

**Slack channel for the challenge in “Quarks to Cosmos workspace”**: #challenge-wandelt

**Dedicated Zoom line to use for this challenge**: https://cmu.zoom.us/j/95482348893

(Password was shared with participants via email/Slack. All Zoom sessions will have a CMU-based host and some alt hosts who are working on the challenge.)