import requests
import pandas as pd
import matplotlib.pyplot as plt

samples_endpoint_base = 'https://www.holofooddata.org/api/samples'
animals_endpoint_base = 'https://www.holofooddata.org/api/animals'
Tutorial
A guide to using the HoloFood Data Portal
This tutorial showcases the core uses of the Data Portal. There is a solution to each learning objective.
Objective 1: Finding samples using the website
Open the HoloFood Data Portal.
Find all Salmon Fatty Acids samples from fish that were treated with Fermented Algae.
Solution
- Click Salmon samples in the navigation bar.
- Type fermented algae into the Treatment Search filter next to the table, select fatty_acids in the Sample type filter, and press Apply.
Objective 2: Find animal metadata for a specific sample
- Find the Chicken sample SAMEA7025251.
- From the sample’s host-animal metadata, find the average daily feed intake of chickens in that pen in the first trial week.
Solution
- Search for SAMEA7025251 in the main search box at the top of every page.
- From the sample’s detail page, follow the link to the sample’s host animal, SAMEA112905290.
- Open the Animal metadata section of the animal detail page.
- Switch to the Pen tab of metadata.
- Find the Average Daily Feed intake: Day 00 - 07 row (measurement = 18.59g).
Objective 3: Find functional information for a sample in MGnify
- Find chicken sample SAMEA7817177 on the data portal.
- Follow the links to the sample’s metagenomics analyses on MGnify.
- What is SAMEA7817177’s top Pfam entry (from MGnify’s functional analysis of it)?
Solution
- Type SAMEA7817177 into the Search data and docs search bar at the top of the data portal.
- The sample detail page for SAMEA7817177 will open directly.
- Open the Metagenomics section of the sample detail page.
- Click the assembly analysis in the table.
- On MGnify, switch to the Functional analysis tab.
- Find the Pfam subtab.
- Find the first row of the table (the biggest bar in the bar chart).
- It is Reverse transcriptase, PF07727.
Objective 4: Find MAG metadata from MGnify
- Find the HoloFood Chicken Gut v1 genome MGYG000308389.
- Using the genome’s species representative, find the genome’s most prevalent COG category with a known function.
Solution
- Click Genomes in the navigation bar.
- Select the HoloFood Chicken Gut v1 catalogue in the sub navigation.
- Type MGYG000308389 in the Accession contains filter.
  - You could also jump straight here by entering the accession in the global search box.
- Click View on MGnify in the single row of the table.
- On the MGnify page that opens, switch to the COG analysis sub-tab.
- Look at the bar chart. Ignoring category S (Function unknown), category K (Transcription) is the most prevalent COG category in this genome.
Objective 6: Compare metadata between samples, using TSV files in a Spreadsheet
- Find chicken animal SAMEA112905066 on the data portal, and download the sample’s metadata TSV file.
- Do the same for animal SAMEA112904813.
- Notice that these samples are chickens in different pens, fed different diets (Control and Probiotic respectively).
- Pick a spreadsheet application like Google Sheets, Excel, or LibreOffice Calc (or write some code).
- Make a plot comparing the average weight gain of chickens in each of the two pens over time.
Solution
- Open the Chicken samples navigation item.
- Type SAMEA14099422 into the Accession contains filter and press Apply.
- Click View in the only row in the table.
- Scroll down and open the Sample metadata section.
- Press Download all as TSV.
- Rename the downloaded file (metadata.tsv) to something useful like control.tsv.
- Repeat the previous steps for SAMEA7025235, getting to a file called e.g. probiotic.tsv.
- Open/import both files in your spreadsheet application.
- Find the rows labelled Average body weight at day 00.....35.
- Copy and paste the rows from each sheet next to each other.
- In Google Sheets, press Insert > Chart (or similar in other applications).
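If you would rather write some code than use a spreadsheet, the comparison can be sketched in Python. This is only a sketch under stated assumptions: the column names marker and measurement, and the inline rows standing in for control.tsv and probiotic.tsv, are made-up placeholders, so check the header row of your downloaded files and adjust the names accordingly.

```python
import csv
import io

# Placeholder stand-ins for the downloaded files. The column names
# ("marker", "measurement") and every value are made up for illustration.
CONTROL_TSV = (
    "marker\tmeasurement\n"
    "Average body weight at day 00\t40\n"
    "Average body weight at day 07\t150\n"
    "Pen\tA\n"
)
PROBIOTIC_TSV = (
    "marker\tmeasurement\n"
    "Average body weight at day 00\t41\n"
    "Average body weight at day 07\t160\n"
    "Pen\tB\n"
)

PREFIX = "Average body weight at day "


def body_weights(tsv_file):
    """Return {day: weight} for the 'Average body weight at day NN' rows."""
    weights = {}
    for row in csv.DictReader(tsv_file, delimiter="\t"):
        if row["marker"].startswith(PREFIX):
            day = int(row["marker"][len(PREFIX):])
            weights[day] = float(row["measurement"])
    return weights


def compare(control_file, probiotic_file):
    """Pair the two pens' weights by day: [(day, control, probiotic), ...]."""
    control = body_weights(control_file)
    probiotic = body_weights(probiotic_file)
    shared_days = sorted(control.keys() & probiotic.keys())
    return [(day, control[day], probiotic[day]) for day in shared_days]


table = compare(io.StringIO(CONTROL_TSV), io.StringIO(PROBIOTIC_TSV))
for day, control_weight, probiotic_weight in table:
    print(f"day {day:02d}: control={control_weight} probiotic={probiotic_weight}")
```

With real downloads, pass open('control.tsv', newline='') and open('probiotic.tsv', newline='') instead of the io.StringIO objects. The resulting rows can then be plotted, one line per pen, against the day.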
Objective 7: Use Python to analyse data from the API
Coding required
This objective takes a lot longer than the others – it involves coding and the API.
Packages required
pip install requests pandas matplotlib
- Use the API to fetch a list of HoloFood salmon samples as a Pandas dataframe (the API docs will be very helpful)
- Only fetch salmon samples which were taken for Iodine measurements
- Only fetch salmon samples from tanks SD02 and SD04, the tanks treated with 0.0% and 2.0% seaweed concentrations
- Only fetch samples which have a non-empty value for metadata variable Iodine (that’s the measurement of iodine in the fish feces)
- Use the API to fetch sample and animal metadata for each sample and add it to the Pandas dataframe
- Make a histogram of the iodine measurements, grouped by the treatment concentration
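The sample filters listed above map onto query parameters of the samples endpoint. The sketch below builds such a URL using the parameter names that appear in the solution code further down (system, sample_type, require_metadata_value, title). build_samples_url is a hypothetical helper, not part of the API, and the API docs remain the authority on which filters exist.

```python
from urllib.parse import urlencode

samples_endpoint_base = 'https://www.holofooddata.org/api/samples'


def build_samples_url(tank, page=1):
    """Build the samples-list URL for one tank (hypothetical helper)."""
    query = {
        'page': page,
        'system': 'salmon',                  # salmon samples only
        'sample_type': 'iodine',             # taken for iodine measurements
        'require_metadata_value': 'Iodine',  # Iodine metadata must be non-empty
        'title': tank,                       # sample title filter, here the tank name
    }
    return f'{samples_endpoint_base}?{urlencode(query)}'


url = build_samples_url('SD02')
print(url)
```

Passing the result to requests.get(url).json() should then return one page of matching samples. Keeping the filters in a dictionary makes it easy to change a single one, such as the tank.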
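Here’s a starting point: you need the requests, pandas and matplotlib.pyplot libraries, plus the base API endpoints https://www.holofooddata.org/api/samples (for samples) and https://www.holofooddata.org/api/animals (for animals).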
A complete solution is below.
Code
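samples_df = None

# Fetch samples, one tank at a time, following the API's pagination
for tank in ['SD02', 'SD04']:
    page = 1
    tank_fetched = 0  # samples fetched so far for this tank's query
    while page:
        print(f'Fetching {page = } of {tank = }')
        samples_page = requests.get(
            f'{samples_endpoint_base}?{page=}&system=salmon&sample_type=iodine&require_metadata_value=Iodine&title={tank}'
        ).json()
        samples_page_df = pd.json_normalize(
            samples_page['items']
        )
        if samples_df is None:
            samples_df = samples_page_df
        else:
            samples_df = pd.concat(
                [
                    samples_df,
                    samples_page_df
                ],
                ignore_index=True  # keep row labels unique across pages
            )
        tank_fetched += len(samples_page_df)
        page += 1
        # 'count' is the total for this tank's query, so compare it with
        # this tank's running total rather than with len(samples_df)
        if tank_fetched >= samples_page['count'] or samples_page_df.empty:
            page = False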
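The loop above follows a common pagination pattern: request page 1, 2, 3 and so on until the number of items collected reaches the count reported in the response. Below is a minimal, network-free sketch of the same pattern. fetch_all and fake_fetch_page are hypothetical names, and the fake pages only mimic the items/count shape that the loop above reads.

```python
def fetch_all(fetch_page):
    """Collect the items from every page.

    fetch_page(page) must return a dict with 'items' (this page's
    records) and 'count' (total number of matches for the query).
    """
    items = []
    page = 1
    while True:
        response = fetch_page(page)
        items.extend(response['items'])
        # Stop once everything is collected, or if a page comes back empty.
        if len(items) >= response['count'] or not response['items']:
            return items
        page += 1


# Stand-in for requests.get(...).json(): five made-up records, two per page.
RECORDS = [{'accession': f'EXAMPLE{i}'} for i in range(5)]


def fake_fetch_page(page, page_size=2):
    start = (page - 1) * page_size
    return {'items': RECORDS[start:start + page_size], 'count': len(RECORDS)}


all_items = fetch_all(fake_fetch_page)
print(len(all_items))
```

Because count is the total for a single query, the items are counted per query here. Comparing it against a running total that spans several queries (for example, several tanks) could end a later query's pagination too early.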
def get_iodine_and_sample_time(sample):
    """Fetch metadata for the sample and its host animal."""
    sample_detail = requests.get(
        f'{samples_endpoint_base}/{sample.accession}'
    ).json()
    metadata = sample_detail['structured_metadata']
    iodine = next(
        metadatum for metadatum in metadata
        if metadatum['marker']['name'] == 'Iodine'
    )

    animal_detail = requests.get(
        f'{animals_endpoint_base}/{sample.animal}'
    ).json()
    metadata = animal_detail['structured_metadata']
    timepoint = next(
        metadatum for metadatum in metadata
        if metadatum['marker']['name'] == 'Sampling time'
    )
    treatment_concentration = next(
        metadatum for metadatum in metadata
        if metadatum['marker']['name'] == 'Treatment concentration'
    )

    iodine_measurement = iodine['measurement']
    # Normalise a decimal comma to a decimal point before converting
    iodine_clean = iodine_measurement.replace(',', '.')
    try:
        iodine_float = float(iodine_clean)
    except ValueError:
        iodine_float = pd.NA

    return iodine_float, timepoint['measurement'], treatment_concentration['measurement']
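Two small steps in the function above are worth isolating: looking up a metadatum by its marker name, and converting a measurement that may use a decimal comma into a float. The sketch below shows both. The helper names (find_metadatum, parse_measurement) are hypothetical, and the metadata records are made up in the shape the function indexes into.

```python
import math


def find_metadatum(metadata, marker_name):
    """Return the first metadatum whose marker name matches, else None."""
    return next(
        (m for m in metadata if m['marker']['name'] == marker_name),
        None,  # default avoids StopIteration when the marker is absent
    )


def parse_measurement(text):
    """Convert '1,5' or '1.5' to a float; return NaN if it is not numeric."""
    try:
        return float(text.replace(',', '.'))
    except ValueError:
        return math.nan


# Made-up records shaped like the 'structured_metadata' list used above.
metadata = [
    {'marker': {'name': 'Sampling time'}, 'measurement': 'Day 60'},
    {'marker': {'name': 'Iodine'}, 'measurement': '1,5'},
]

iodine = find_metadatum(metadata, 'Iodine')
print(parse_measurement(iodine['measurement']))
```

Supplying a default to next() is a design choice: a sample that lacks a marker then yields None, which the caller can handle, instead of raising StopIteration partway through a DataFrame.apply call.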
# Consolidate the metadata with the samples
metadata = samples_df.apply(
    get_iodine_and_sample_time,
    axis='columns',
    result_type='expand'
).rename(
    columns={
        0: 'iodine_mg_kg_ww',
        1: 'sampling_timepoint',
        2: 'treatment_concentration'
    }
)
trial_samples = pd.concat(
    [
        samples_df,
        metadata
    ],
    axis=1
)
# Drop the Day 0 samples
not_pretrial_control = trial_samples[trial_samples.sampling_timepoint != 'Day 0']
# Plot iodine measurements
not_pretrial_control.groupby('treatment_concentration').iodine_mg_kg_ww.hist(legend=True, bins=5, alpha=0.5)
plt.xlabel('Iodine mg/kg ww');
Fetching page = 1 of tank = 'SD02'
Fetching page = 1 of tank = 'SD04'
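A histogram shows the shape of each treatment group, and a per-group count and mean gives a quick numeric cross-check alongside it. The sketch below uses only the standard library. The (treatment concentration, iodine) pairs are made-up placeholders rather than portal data, and summarise is a hypothetical helper.

```python
from collections import defaultdict
from statistics import mean

# Made-up (treatment_concentration, iodine_mg_kg_ww) pairs for illustration.
rows = [('0.0', 0.5), ('0.0', 1.5), ('2.0', 2.0), ('2.0', 3.0)]


def summarise(rows):
    """Return {concentration: (number of samples, mean iodine)}."""
    groups = defaultdict(list)
    for concentration, iodine in rows:
        groups[concentration].append(iodine)
    return {
        concentration: (len(values), mean(values))
        for concentration, values in groups.items()
    }


summary = summarise(rows)
for concentration, (n, average) in summary.items():
    print(f'{concentration}%: n={n}, mean={average}')
```

On the real dataframe, the equivalent would be grouping not_pretrial_control by treatment_concentration and aggregating iodine_mg_kg_ww.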