Your next training dataset is already structured
Libre Biotech stores genomics research data using the ISA framework with ontology annotations, provenance chains, and reproducible CWL workflows. Every dataset exports as ML-ready JSON, CSV, ISA-JSON, or RO-Crate — ready for your pipeline, not manual wrangling.
Data Formats
| Format | Description | Use Case |
|---|---|---|
| ML-Ready JSON/CSV | Flat samples x features matrix with ontology CURIEs, factor values, measurements, and provenance summary | Direct loading into pandas, scikit-learn, PyTorch datasets |
| ISA-JSON | Full investigation metadata following ISA-JSON 1.0 specification — studies, processes, samples, annotations | Structured metadata parsing, repository submission |
| RO-Crate | Research Object package with workflow definitions, container images, inputs, outputs, and PROV-O provenance | Reproducible analysis, workflow re-execution |
| PROV-O | W3C provenance ontology export as JSON-LD, RDF, or Turtle — full activity/entity/agent graphs | Knowledge graphs, provenance-aware ML, data lineage validation |
| Data Cards | YAML frontmatter + Markdown body — aggregates metadata, scores, ontology stats, provenance, code examples | Dataset documentation, AI assistant context, Hugging Face-style cards |
| ML-Ready ZIP | CSV data file + README.md describing columns, types, and loading examples in a single archive | Self-documenting dataset sharing, offline analysis |
| CSV | Tabular exports of processes, samples, and measurements | Spreadsheet analysis, R/tidyverse workflows |
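Since a data card is described as YAML frontmatter followed by a Markdown body, a consumer will usually want to separate the two before parsing. The following is a minimal sketch using only the standard library. It assumes the frontmatter is delimited by `---` lines at the top of the file, which is the common convention but is not confirmed here. The function name `split_data_card` and the sample card fields are hypothetical.

```python
def split_data_card(card_text: str) -> tuple[str, str]:
    """Split a data card into (frontmatter, markdown_body).

    Assumes the frontmatter is delimited by '---' lines at the very top;
    this is the usual convention, not a documented guarantee.
    """
    lines = card_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return "", card_text  # no frontmatter found
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            frontmatter = "\n".join(lines[1:i])
            body = "\n".join(lines[i + 1:]).lstrip("\n")
            return frontmatter, body
    return "", card_text  # opening delimiter without a closing one


# Hypothetical card content, for illustration only
sample_card = """---
title: Example investigation
samples: 12
---
# Example investigation

Body text in Markdown.
"""

frontmatter, body = split_data_card(sample_card)
print(frontmatter)
print(body)
```

The frontmatter string can then be handed to a YAML parser such as PyYAML's `yaml.safe_load`, while the body is left as Markdown.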
API Quick Start
Authenticate with an API key via the X-API-Key header. Generate keys from your account settings.
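As a rough illustration of header-based authentication, the sketch below builds a GET request carrying the `X-API-Key` header with Python's standard library, so it runs without extra dependencies. The helper name `build_request` and the constant `API_BASE` are hypothetical, and the key is a placeholder.

```python
import urllib.parse
import urllib.request

API_BASE = "https://librebiotech.org/api.php/v1"


def build_request(path, api_key, params=None):
    """Return a GET request for `path` with the X-API-Key header set."""
    url = API_BASE + path
    if params:
        url += "?" + urllib.parse.urlencode(params)
    return urllib.request.Request(url, headers={"X-API-Key": api_key})


req = build_request("/investigations/3/ml-export", "your-api-key", {"format": "csv"})
print(req.full_url)

# Sending the request needs a valid key and network access:
# with urllib.request.urlopen(req) as resp:
#     csv_text = resp.read().decode("utf-8")
```

Keeping the key in one helper avoids repeating the header in every call. In real use the key would normally be read from an environment variable instead of being hard-coded.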
Endpoints
| Method | Endpoint | Description |
|---|---|---|
| GET | /api.php/v1/investigations | List investigations |
| GET | /api.php/v1/investigations/{id} | Investigation details |
| GET | /api.php/v1/investigations/{id}/ml-export | ML-ready export (JSON) |
| GET | /api.php/v1/investigations/{id}/ml-export?format=csv | ML-ready export (CSV) |
| GET | /api.php/v1/investigations/{id}/ml-export?format=zip | ML-ready ZIP bundle (CSV + README) |
| GET | /api.php/v1/investigations/{id}/export?format=isajson | ISA-JSON export |
| GET | /api.php/v1/investigations/{id}/provenance | Provenance graph |
| GET | /api.php/v1/investigations/{id}/prov-o | PROV-O export |
| GET | /api.php/v1/investigations/{id}/card | Investigation data card (Markdown or ?format=json) |
| GET | /api.php/v1/samples/{id} | Sample details |
| GET | /api.php/v1/samples/{id}/lineage | Sample lineage graph |
| GET | /api.php/v1/platform-card | Platform skill file (JSON) |
| GET | /CLAUDE.md | Platform skill file (Markdown) — for AI coding assistants |
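The ZIP bundle returned by `ml-export?format=zip` is described as a CSV data file plus a README.md. One possible way to unpack it in memory is sketched below. The member names inside the archive are not documented here, so the sketch locates them by file extension. The helper name `read_ml_zip` and the stand-in archive contents are hypothetical.

```python
import csv
import io
import zipfile


def read_ml_zip(zip_bytes):
    """Return (rows, readme_text) from an ML-ready ZIP bundle.

    Assumes the archive holds one .csv file and optionally a README.md;
    members are found by extension because their names are not documented.
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        names = zf.namelist()
        csv_name = next(n for n in names if n.lower().endswith(".csv"))
        readme_name = next(
            (n for n in names if n.lower().endswith("readme.md")), None
        )
        with zf.open(csv_name) as fh:
            text = io.TextIOWrapper(fh, encoding="utf-8", newline="")
            rows = list(csv.DictReader(text))
        readme = zf.read(readme_name).decode("utf-8") if readme_name else ""
    return rows, readme


# Build a tiny stand-in archive so the sketch runs offline (made-up contents)
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("data.csv", "sample_id,measurement\nS1,0.5\n")
    zf.writestr("README.md", "# Columns\n")

rows, readme = read_ml_zip(buf.getvalue())
print(rows[0])
print(readme)
```

With a live response, the bytes passed to `read_ml_zip` would be the raw body of the ZIP request, for example the `.content` attribute of a `requests` response.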
Code Examples
```python
from io import StringIO

import pandas as pd
import requests

API = "https://librebiotech.org/api.php/v1"
KEY = "your-api-key"
headers = {"X-API-Key": KEY}

# Fetch ML-ready data as JSON
resp = requests.get(f"{API}/investigations/3/ml-export", headers=headers)
resp.raise_for_status()  # stop early on an authentication or HTTP error
data = resp.json()

# Load into DataFrame
df = pd.DataFrame(data["rows"], columns=[c["name"] for c in data["columns"]])
print(df.head())

# Or fetch as CSV directly
resp = requests.get(f"{API}/investigations/3/ml-export?format=csv", headers=headers)
resp.raise_for_status()
df = pd.read_csv(StringIO(resp.text))
```
```r
library(httr)
library(jsonlite)
library(tibble)

api <- "https://librebiotech.org/api.php/v1"
key <- "your-api-key"

# Fetch ML-ready data
resp <- GET(paste0(api, "/investigations/3/ml-export"),
            add_headers("X-API-Key" = key))
stop_for_status(resp)  # stop early on an authentication or HTTP error
data <- fromJSON(content(resp, "text", encoding = "UTF-8"))

# Convert to tibble. fromJSON() simplifies `columns` (an array of objects)
# into a data frame, so the names are read from its `name` column.
df <- as_tibble(as.data.frame(data$rows))
colnames(df) <- data$columns$name
print(df)
```
```shell
# JSON format
curl -H "X-API-Key: your-api-key" \
  "https://librebiotech.org/api.php/v1/investigations/3/ml-export"

# CSV format
curl -H "X-API-Key: your-api-key" \
  "https://librebiotech.org/api.php/v1/investigations/3/ml-export?format=csv"

# ISA-JSON
curl -H "X-API-Key: your-api-key" \
  "https://librebiotech.org/api.php/v1/investigations/3/export?format=isajson"

# Data card (no auth for public investigations)
curl "https://librebiotech.org/api.php/v1/investigations/3/card"

# Platform skill file
curl "https://librebiotech.org/CLAUDE.md"
```
Provenance & Lineage
Every sample tracks its provenance through process_input_samples links. The API returns full lineage chains with process categories, enabling you to trace any derived result back to its original tissue or source material.
```
# Example: get sample lineage
GET /api.php/v1/samples/42/lineage

# Response includes ancestor chain:
{
  "sample_id": 42,
  "label": "RNA-EXTRACT-001",
  "ancestors": [
    {"sample_id": 15, "label": "TISSUE-001", "process_category": "extraction"},
    {"sample_id": 3, "label": "MOUSE-DBA-01", "process_category": null}
  ]
}
```
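To show how a lineage response might be consumed, the sketch below turns the example response into a path from the original source material to the requested sample. It assumes `ancestors` is ordered nearest-first, as the example suggests (the documentation does not state this as a guarantee). The helper name `lineage_path` is hypothetical.

```python
import json

# Response body copied from the lineage example in this section
response_text = """
{
  "sample_id": 42,
  "label": "RNA-EXTRACT-001",
  "ancestors": [
    {"sample_id": 15, "label": "TISSUE-001", "process_category": "extraction"},
    {"sample_id": 3, "label": "MOUSE-DBA-01", "process_category": null}
  ]
}
"""


def lineage_path(lineage):
    """Return labels ordered from the original source to the requested sample.

    Assumes `ancestors` lists the nearest ancestor first.
    """
    labels = [a["label"] for a in reversed(lineage["ancestors"])]
    labels.append(lineage["label"])
    return labels


lineage = json.loads(response_text)
print(" -> ".join(lineage_path(lineage)))
# prints: MOUSE-DBA-01 -> TISSUE-001 -> RNA-EXTRACT-001
```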
Ontology Coverage
The platform indexes 3,085,313 terms from 13 ontologies, ensuring standardised vocabulary across all metadata.
| Prefix | Name |
|---|---|
| CHEBI | Chemical Entities of Biological Interest |
| ECO | Evidence and Conclusion Ontology |
| ENVO | Environment Ontology |
| GO | Gene Ontology |
| mgi | Mouse Genome Informatics |
| NCBITAXON | NCBI Taxonomy |
| OBA | Ontology of Biological Attributes |
| OBCS | Ontology of Biological and Clinical Statistics |
| OBI | Ontology for Biomedical Investigations |
| PATO | Phenotype And Trait Ontology |
| PO | Plant Ontology |
| UBERON | Uberon Anatomy Ontology |
| UO | Units of Measurement |
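Because exported columns carry ontology CURIEs, a pipeline may want to check which ontology a value belongs to before using it as a feature. The sketch below splits a CURIE into prefix and local identifier and checks the prefix against the 13 prefixes listed above. The case-insensitive comparison is an assumption (the table lists `mgi` in lowercase), and the names `SUPPORTED_PREFIXES` and `parse_curie` are hypothetical.

```python
# Prefixes from the ontology table, upper-cased for comparison
SUPPORTED_PREFIXES = {
    "CHEBI", "ECO", "ENVO", "GO", "MGI", "NCBITAXON", "OBA",
    "OBCS", "OBI", "PATO", "PO", "UBERON", "UO",
}


def parse_curie(curie):
    """Split 'PREFIX:local_id' and report whether the prefix is in the table.

    The prefix check ignores case, which is an assumption about the platform.
    """
    prefix, sep, local_id = curie.partition(":")
    if not sep or not prefix or not local_id:
        raise ValueError(f"not a CURIE: {curie!r}")
    return prefix, local_id, prefix.upper() in SUPPORTED_PREFIXES


print(parse_curie("GO:0008150"))  # ('GO', '0008150', True)
print(parse_curie("FOO:123"))     # ('FOO', '123', False)
```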
Getting Started
Create an account, generate an API key, and start pulling structured genomics data into your ML pipeline in minutes.
Using an AI coding assistant? Point it at https://librebiotech.org/CLAUDE.md for a machine-readable overview of the entire platform.