Tutorial

A guide to using the HoloFood Data Portal

This tutorial showcases the core uses of the Data Portal. There is a solution to each learning objective.

Objective 1: Finding samples using the website

Open the HoloFood Data Portal.

Find all Salmon Fatty Acids samples from fish that were treated with Fermented Algae.

Solution

Click Salmon samples in the navigation bar.
Type fermented algae into the Treatment Search filter next to the table, select fatty_acids in the Sample type filter, and press Apply.

Objective 2: Find animal metadata for a specific sample

Find the Chicken sample SAMEA7025251.
From the sample’s host-animal metadata, find the average daily feed intake of chicken’s in that pen in the first trial week.

Solution

Search for SAMEA7025251 in the main search box at the top of every page.
From the sample’s detail page, follow the link to the sample’s host animal SAMEA112905290
Open the Animal metadata section of the animal detail page.
Switch to the Pen tab of metadata.
Find the Average Daily Feed intake: Day 00 - 07 row (measurement = 18.59g).

Objective 3: Find functional information for a sample in MGnify

Find chicken sample SAMEA7817177 on the data portal
Follow the links to the sample’s metagenomics analyses on MGnify
What is SAMEA7817177’s top Pfam entry (from MGnify’s functional analysis of it)?

Solution

Type SAMEA7817177 into the Search data and docs search bar at the top of the data portal
The sample detail page for SAMEA7817177 will open directly
Open the Metagenomics section of the sample detail page.
Click the assembly analysis in the table
On MGnify, switch to the Functional analysis tab.
Find the Pfam subtab
Find the first row of the table (the biggest bar in the bar chart)
It is Reverse transcriptase PF07727

Objective 4: Find MAG metadata from MGnify

Find the HoloFood Chicken Gut v1 genome MGYG000308389
Using the genome’s species representative, find the genome’s most prevalent COG category with a known function.

Solution

Click Genomes in the navigation bar.
Select the HoloFood Chicken Gut v1 catalogue in the sub navigation
Type MGYG000308389 in the Accession contains filter.
- You could also jump straight to here by entering the accession in the global search box
Click View on MGnify in the single row of the table.
On the MGnify page that opens, switch to the COG analysis sub-tab.
Look at the bar chart. Ignoring cataegory S (Function unknown), category K (Transcription) is the most prevalent COG category in this genome.

Objective 5: Find viral fragments within a MAG

Data coming soon

Viral catalogues are not yet available for HoloFood, however viral annotations are available for the genomes in MGnify.

Objective 6: Compare metadata between samples, using TSV files in a Spreadsheet

Find chicken animal SAMEA112905066 on the data portal, and download the sample’s metadata TSV file
Do the same for animal SAMEA112904813
Notice that these samples are chickens in different pens, fed different diets (Control, and Probioticrespectively)
Pick a spreadsheet application like Google Sheets, Excel, or LibreOffice Calc, (or write some code)
Make a plot comparing the average weight gain of chickens in each of the two pens over time

Solution

Open the Chicken samples navigation item
Type SAMEA14099422 into the Accession contains filter and press Apply
Click View in the only row in the table
Scroll down and open the Sample metadata section
Press Download all as TSV
Rename the downloaded file (metadata.tsv) to something useful like control.tsv
Repeat the previous steps for SAMEA7025235, getting to a file called e.g. probiotic.tsv
Open/import both files in your spreadsheet application
Find the rows labelled Average body weight at day 00.....35
Copy and paste the rows from each sheet next to each other
In Google Sheets, press Insert > Chart (or similar in other applications)

Screenshot of spreadsheet chart comparing average weight of chickens in two pens fed Control and Probiotic diets respectively

Objective 7: Use Python to analyse data from the API

Coding required

This objective takes a lot longer than the others – it involves coding and the API.

Packages required

pip install requests pandas matplotlib

Use the API to fetch a list of HoloFood salmon samples as a Pandas dataframe (the API docs will be very helpful)
- Only fetch salmon samples which were taken for Iodine measurements
- Only fetch salmon samples from tanks SD02 and SD04, which are tanks with treatment of 0.0% and 2.0% concentration seaweed treatment
- Only fetch samples which have a non-empty value for metadata variable Iodine (that’s the measurement of iodine in the fish feces)
Use the API to fetch sample and animal metadata for each sample and add it to the Pandas dataframe
Make a histogram of the iodine measurements, grouped by the treatment concentration

Here’s a startpoint for the libraries and base API endpoint you need:

samples_endpoint_base = 'https://www.holofooddata.org/api/samples'
animals_endpoint_base = 'https://www.holofooddata.org/api/animals'
import requests
import pandas as pd
import matplotlib.pyplot as plt

A complete solution is below.

Code

samples_df = None

# Fetch samples
for tank in ['SD02', 'SD04']:
    page = 1
    while page:
        print(f'Fetching {page = } of {tank = }')
        samples_page = requests.get(
            f'{samples_endpoint_base}?{page=}&system=salmon&sample_type=iodine&require_metadata_value=Iodine&title={tank}'
        ).json()
        samples_page_df = pd.json_normalize(
            samples_page['items']
        )

        if samples_df is None:
            samples_df = samples_page_df
        else:
            samples_df = pd.concat(
                [
                    samples_df,
                    samples_page_df
                ]
            )

        page += 1
        if len(samples_df) >= samples_page['count']:
            page = False

def get_iodine_and_sample_time(sample):
    """Fetch metadata for the sample and its host animal."""
    sample_detail = requests.get(
        f'{samples_endpoint_base}/{sample.accession}'
    ).json()
    metadata = sample_detail['structured_metadata']
    iodine = next(
        metadatum 
        for metadatum in metadata 
        if metadatum['marker']['name'] == 'Iodine'
    )

    animal_detail = requests.get(
        f'{animals_endpoint_base}/{sample.animal}'
    ).json()
    metadata = animal_detail['structured_metadata']
    timepoint = next(
        metadatum
        for metadatum in metadata
        if metadatum['marker']['name'] == 'Sampling time'
    )
    treatment_concentration = next(
        metadatum
        for metadatum in metadata
        if metadatum['marker']['name'] == 'Treatment concentration'
    )
    
    iodine_measurement = iodine['measurement']
    iodine_clean = iodine_measurement.replace(',', '.')
    try:
        iodine_float = float(iodine_clean)
    except ValueError:
        iodine_float = pd.NA
    
    return iodine_float, timepoint['measurement'], treatment_concentration['measurement']

# Consolidate the metadata with the samples
metadata = samples_df.apply(
    get_iodine_and_sample_time, 
    axis='columns', 
    result_type='expand'
).rename(
    columns={
        0: 'iodine_mg_kg_ww', 
        1: 'sampling_timepoint',
        2: 'treatment_concentration'
    }
)
trial_samples = pd.concat(
    [
        samples_df,
        metadata
    ],
    axis=1
)

# Drop the Day 0 samples
not_pretrial_control = trial_samples[trial_samples.sampling_timepoint != 'Day 0']

# Plot iodine measurements
not_pretrial_control.groupby('treatment_concentration').iodine_mg_kg_ww.hist(legend=True, bins=5, alpha=0.5)
plt.xlabel('Iodine mg/kg ww');

Fetching page = 1 of tank = 'SD02'
Fetching page = 1 of tank = 'SD04'