Building the Space Biology “Model Zoo”

Transfer learning is a machine learning technique in which a model is pretrained on a large, broad dataset to encode underlying features and relationships, and then refined using a smaller dataset for a specific problem space. This technique is relevant to space biology research, where datasets typically have limited sample size and the problem space is restricted.

Your challenge is: (1) to design a comprehensive database of publicly available biomedical datasets that could be used to pretrain different models for a “model zoo,” and (2) to determine relevant publicly available space biology datasets that could then be used to refine the models to investigate specific space biology questions.


BACKGROUND


Transfer learning is a machine learning technique in which a model is pretrained on a large, broad dataset to encode underlying features and relationships, and then refined using a smaller dataset for a specific problem space. This technique is relevant to space biology research, where datasets typically have limited sample size and the problem space is restricted.

A “model zoo” is a suite of pretrained models, each designed to help answer a different scientific question. Transfer learning is one way to build such a suite: each model is pretrained on broad data, and can then be refined using existing relevant datasets to solve a specific problem.

A model zoo captures broad information that can be refined to address specific questions. For example, there are neural network models that were pretrained on over a million images and can be refined for a specific image analysis task, such as classifying images of cats and dogs. A model zoo would be useful for the space biology community as well: its models could be pretrained on large biomedical datasets, and then refined using datasets from space biology experiments to investigate specific space biology questions.
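The pretrain-then-refine pattern described above can be sketched in a few lines. This is a minimal illustration rather than a recommended architecture: the data are synthetic, and every name (`train`, `freeze_encoder`, `make_data`, the network sizes and learning rate) is an assumption made for the example. A tiny one-hidden-layer network is pretrained on a large dataset, then its hidden layer is frozen and only the output head is refined on a small, related dataset.

```python
import math
import random

def forward(x, W, head):
    """Hidden features h = tanh(W x); prediction = head[:-1] . h + head[-1]."""
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W]
    y = sum(a * hj for a, hj in zip(head, h)) + head[-1]
    return h, y

def mse(data, W, head):
    """Mean squared error of the model over a list of (x, y) pairs."""
    return sum((forward(x, W, head)[1] - y) ** 2 for x, y in data) / len(data)

def train(data, W, head, lr, epochs, freeze_encoder=False):
    """Full-batch gradient descent on MSE; returns updated copies of (W, head)."""
    W = [row[:] for row in W]
    head = head[:]
    n = len(data)
    for _ in range(epochs):
        gW = [[0.0] * len(W[0]) for _ in W]
        gh = [0.0] * len(head)
        for x, y in data:
            h, pred = forward(x, W, head)
            err = 2.0 * (pred - y) / n
            for j, hj in enumerate(h):
                gh[j] += err * hj
                if not freeze_encoder:
                    back = err * head[j] * (1.0 - hj * hj)
                    for i, xi in enumerate(x):
                        gW[j][i] += back * xi
            gh[-1] += err  # gradient for the bias term of the head
        head = [a - lr * g for a, g in zip(head, gh)]
        if not freeze_encoder:
            W = [[w - lr * g for w, g in zip(row, grow)]
                 for row, grow in zip(W, gW)]
    return W, head

def make_data(n, scale, offset):
    """Synthetic stand-in data: y = scale * sin(x0) + 0.5 * x1 + offset."""
    data = []
    for _ in range(n):
        x = [random.uniform(-2, 2), random.uniform(-2, 2)]
        data.append((x, scale * math.sin(x[0]) + 0.5 * x[1] + offset))
    return data

random.seed(0)
large = make_data(300, 1.0, 0.0)   # stands in for a broad biomedical dataset
small = make_data(15, 2.0, -1.0)   # stands in for a small space biology dataset

# Pretraining: all weights are updated on the large dataset.
W0 = [[random.uniform(-0.5, 0.5) for _ in range(2)] for _ in range(4)]
W_pre, head_pre = train(large, W0, [0.0] * 5, lr=0.05, epochs=150)

# Refinement: the encoder is frozen and only the head is updated.
loss_before = mse(small, W_pre, head_pre)
W_ft, head_ft = train(small, W_pre, head_pre, lr=0.05, epochs=150,
                      freeze_encoder=True)
loss_after = mse(small, W_ft, head_ft)
print("small-dataset loss before/after refinement:", loss_before, loss_after)
```

Freezing the encoder is only one of several refinement strategies; updating all weights with a small learning rate is another, and which works better depends on how closely the small dataset resembles the pretraining data.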


OBJECTIVE


Your challenge is: (1) to design a comprehensive database of publicly available biomedical datasets that could be used to pretrain different models for a “model zoo,” and (2) to determine relevant publicly available space biology datasets that could then be used to refine the models to investigate specific space biology questions.

You are free to suggest multi-modal dataset combinations as well as single data types. If you like, you can expand the challenge and assemble a list of promising transfer learning model architectures that could be tested. If there is time remaining, you could also write code to preprocess a data type and train a model on that data to contribute the next pretrained model to the model zoo.
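One possible starting point for the database in part (1) is a single catalog table that records each dataset's role, data type, and size, so that pretraining and refinement datasets can be matched by data type. The sketch below uses Python's built-in `sqlite3` module; the table name, column names, and all three rows are hypothetical placeholders, not real datasets or a prescribed schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE dataset (
        id        INTEGER PRIMARY KEY,
        name      TEXT NOT NULL,
        role      TEXT NOT NULL CHECK (role IN ('pretrain', 'finetune')),
        data_type TEXT NOT NULL,
        organism  TEXT,
        n_samples INTEGER,
        url       TEXT
    )
""")

# Placeholder entries for illustration only (not real datasets or URLs).
rows = [
    ("placeholder RNA-seq compendium", "pretrain", "rna-seq",
     "Mus musculus", 10000, "https://example.org/rnaseq-compendium"),
    ("placeholder histology image archive", "pretrain", "imaging",
     "Homo sapiens", 50000, "https://example.org/histology-archive"),
    ("placeholder spaceflight RNA-seq study", "finetune", "rna-seq",
     "Mus musculus", 24, "https://example.org/spaceflight-study"),
]
conn.executemany(
    "INSERT INTO dataset (name, role, data_type, organism, n_samples, url) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    rows,
)

def candidate_pairs(connection):
    """Pair every pretraining dataset with refinement datasets of the same data type."""
    return connection.execute("""
        SELECT p.name, f.name, p.data_type
        FROM dataset AS p
        JOIN dataset AS f ON p.data_type = f.data_type
        WHERE p.role = 'pretrain' AND f.role = 'finetune'
        ORDER BY p.name, f.name
    """).fetchall()

pairs = candidate_pairs(conn)
for pretrain_name, finetune_name, data_type in pairs:
    print(f"{data_type}: {pretrain_name} -> {finetune_name}")
```

Matching on data type alone is a deliberate simplification; a fuller catalog might also match on organism, tissue, or assay platform, and would need a separate table to describe multi-modal combinations.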


POTENTIAL CONSIDERATIONS


You may (but are not required to) consider the following when creating your solution:

  • What is the best way to present this database to stakeholders?
  • What preprocessing steps will need to take place to convert the public biomedical datasets to be transfer-learning-ready? Some may require more work than others.
  • What multi-modal data combinations are most likely to result in biological knowledge gain?
  • Are there transfer learning architectures that should be avoided in this effort for any reason?
  • Are there existing pretrained biological models that could be used out-of-the-box for this effort?
  • As an example, consider a model in a space biology “model zoo” that uses an existing dataset of thousands of different RNA sequencing biomedical studies. Such a model could be leveraged for any combination of space biology RNA sequencing datasets in the NASA Open Science Data Repository (OSDR) (see Resources tab). However, OSDR has many different data types, and it would be useful for the public to have pretrained models for each data type, and possibly for multi-modal datasets, or models that help address different biological questions.
  • Additionally, it might be interesting to use different transfer learning model architectures to create the model zoo, and note the resulting effects.
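As an illustration of the preprocessing question in the list above, the sketch below shows one step that RNA sequencing data would plausibly need before transfer learning: mapping each sample onto the fixed gene order that a pretrained model expects. The function name `to_model_input`, the gene names, and the choice of a log2 counts-per-million transform are assumptions made for the example, not a prescribed pipeline.

```python
import math

def to_model_input(counts, gene_vocab):
    """Convert one sample's raw gene counts into a fixed-order feature vector.

    counts:     dict mapping gene name -> raw read count for one sample
    gene_vocab: list of gene names in the order the pretrained model expects

    Genes missing from the sample are filled with 0; genes outside the
    vocabulary are dropped from the vector but still count toward library size.
    """
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(gene_vocab)
    # log2(1 + counts per million) compresses the dynamic range of the counts.
    return [math.log2(1.0 + 1e6 * counts.get(gene, 0) / total)
            for gene in gene_vocab]

# Hypothetical vocabulary and sample, for illustration only.
gene_vocab = ["GeneA", "GeneB", "GeneC"]
sample = {"GeneB": 30, "GeneA": 10, "GeneZ": 60}
vec = to_model_input(sample, gene_vocab)
print(vec)
```

Other data types would need their own equivalents of this step, such as resizing and intensity normalization for images, which is part of why some datasets may require more preparation than others.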

For data and resources related to this challenge, refer to the Resources tab at the top of the page. More resources may be added before the hackathon begins.
