Your next training dataset is already structured

Libre Biotech stores genomics research data using the ISA framework with ontology annotations, provenance chains, and reproducible CWL workflows. Every dataset exports as ML-ready JSON, CSV, ISA-JSON, or RO-Crate — ready for your pipeline, not manual wrangling.


Metric	Count
Samples	609
Annotations	525
Ontologies	23
Ontology Terms	3,128,941

Data Formats

Format	Description	Use Case
ML-Ready JSON/CSV	Flat samples x features matrix with ontology CURIEs, factor values, measurements, and provenance summary	Direct loading into pandas, scikit-learn, PyTorch datasets
ISA-JSON	Full investigation metadata following ISA-JSON 1.0 specification — studies, processes, samples, annotations	Structured metadata parsing, repository submission
RO-Crate	Research Object package with workflow definitions, container images, inputs, outputs, and PROV-O provenance	Reproducible analysis, workflow re-execution
PROV-O	W3C provenance ontology export as JSON-LD, RDF, or Turtle — full activity/entity/agent graphs	Knowledge graphs, provenance-aware ML, data lineage validation
Data Cards	YAML frontmatter + Markdown body — aggregates metadata, scores, ontology stats, provenance, code examples	Dataset documentation, AI assistant context, Hugging Face-style cards
ML-Ready ZIP	CSV data file + README.md describing columns, types, and loading examples in a single archive	Self-documenting dataset sharing, offline analysis
CSV	Tabular exports of processes, samples, and measurements	Spreadsheet analysis, R/tidyverse workflows
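
As a rough sketch of working with the ML-Ready ZIP bundle offline, the snippet below opens an archive held in memory and pulls out the CSV rows and the README text. The member names inside the archive are not documented on this page, so the code locates the first `.csv` file and the first file whose name starts with `README`. `read_ml_zip` is a hypothetical helper, not part of the platform.

```python
import csv
import io
import zipfile


def read_ml_zip(payload: bytes):
    """Return (rows, readme_text) from an ML-ready ZIP held in memory.

    Assumption: the archive holds one CSV file and one README file;
    their exact names are unknown, so they are located by pattern.
    """
    with zipfile.ZipFile(io.BytesIO(payload)) as archive:
        names = archive.namelist()
        csv_name = next(n for n in names if n.lower().endswith(".csv"))
        readme_name = next(
            (n for n in names if n.rsplit("/", 1)[-1].lower().startswith("readme")),
            None,
        )
        # Decode the CSV member as UTF-8 text and read it row by row.
        with archive.open(csv_name) as handle:
            text = io.TextIOWrapper(handle, encoding="utf-8", newline="")
            rows = list(csv.DictReader(text))
        readme = archive.read(readme_name).decode("utf-8") if readme_name else ""
    return rows, readme
```

The `payload` argument would be the raw response body of the `?format=zip` request. The same CSV member could equally be handed to `pandas.read_csv` if a DataFrame is preferred over a list of dictionaries.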

API Quick Start

Authenticate with an API key via the X-API-Key header. Generate keys from your account settings.
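
A minimal sketch of attaching the key, using only the Python standard library. `build_request` is a hypothetical helper name; the base URL and header name are the ones given on this page. The request is built but not sent, so the snippet runs without a real key.

```python
import urllib.request

API = "https://librebiotech.org/api.php/v1"


def build_request(path: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a GET request carrying the API key header."""
    request = urllib.request.Request(f"{API}{path}", method="GET")
    request.add_header("X-API-Key", api_key)
    return request


if __name__ == "__main__":
    req = build_request("/investigations", "your-api-key")
    # urllib stores header names capitalised ("X-api-key"); HTTP header
    # names are case-insensitive, so the server receives the same header.
    print(req.full_url, req.header_items())
    # To send it: urllib.request.urlopen(req, timeout=30)
```

Keeping the key out of source code (for example, reading it from an environment variable) is usually the safer choice in shared pipelines.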

Endpoints

Method	Endpoint	Description
GET	`/api.php/v1/investigations`	List investigations
GET	`/api.php/v1/investigations/{id}`	Investigation details
GET	`/api.php/v1/investigations/{id}/ml-export`	ML-ready export (JSON)
GET	`/api.php/v1/investigations/{id}/ml-export?format=csv`	ML-ready export (CSV)
GET	`/api.php/v1/investigations/{id}/export?format=isajson`	ISA-JSON export
GET	`/api.php/v1/investigations/{id}/provenance`	Provenance graph
GET	`/api.php/v1/investigations/{id}/prov-o`	PROV-O export
GET	`/api.php/v1/samples/{id}`	Sample details
GET	`/api.php/v1/samples/{id}/lineage`	Sample lineage graph
GET	`/api.php/v1/investigations/{id}/card`	Investigation data card (Markdown or `?format=json`)
GET	`/api.php/v1/investigations/{id}/ml-export?format=zip`	ML-ready ZIP bundle (CSV + README)
GET	`/api.php/v1/platform-card`	Platform skill file (JSON)
GET	`/CLAUDE.md`	Platform skill file (Markdown) — for AI coding assistants
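
The per-investigation endpoints above differ only in a path suffix and an optional `format` parameter, so they can be captured in a small lookup. The sketch below is one way to do that; the short export names (`ml-csv`, `isa-json`, and so on) and the `export_url` helper are made up for illustration, while the paths and query values are copied from the table.

```python
from urllib.parse import urlencode

API = "https://librebiotech.org/api.php/v1"

# Hypothetical export name -> (path suffix, query parameters) from the table.
EXPORTS = {
    "ml-json": ("ml-export", {}),
    "ml-csv": ("ml-export", {"format": "csv"}),
    "ml-zip": ("ml-export", {"format": "zip"}),
    "isa-json": ("export", {"format": "isajson"}),
    "provenance": ("provenance", {}),
    "prov-o": ("prov-o", {}),
    "card": ("card", {}),
    "card-json": ("card", {"format": "json"}),
}


def export_url(investigation_id: int, export: str) -> str:
    """Return the full URL for one export of one investigation."""
    suffix, params = EXPORTS[export]
    url = f"{API}/investigations/{investigation_id}/{suffix}"
    return f"{url}?{urlencode(params)}" if params else url


print(export_url(3, "ml-csv"))
```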

Code Examples

# Python: fetch the ML-ready export and load it with pandas
from io import StringIO

import pandas as pd
import requests

API = "https://librebiotech.org/api.php/v1"
KEY = "your-api-key"
headers = {"X-API-Key": KEY}

# Fetch ML-ready data as JSON
resp = requests.get(f"{API}/investigations/3/ml-export", headers=headers, timeout=30)
resp.raise_for_status()  # fail early on 4xx/5xx responses
data = resp.json()

# Load into a DataFrame: "rows" holds the values, "columns" the column metadata
df = pd.DataFrame(data["rows"], columns=[c["name"] for c in data["columns"]])
print(df.head())

# Or fetch as CSV directly
resp = requests.get(
    f"{API}/investigations/3/ml-export",
    params={"format": "csv"},
    headers=headers,
    timeout=30,
)
resp.raise_for_status()
df = pd.read_csv(StringIO(resp.text))

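# R: fetch the ML-ready export and convert it to a tibble
library(httr)
library(jsonlite)
library(tibble)

api <- "https://librebiotech.org/api.php/v1"
key <- "your-api-key"

# Fetch ML-ready data
resp <- GET(paste0(api, "/investigations/3/ml-export"),
            add_headers("X-API-Key" = key))
stop_for_status(resp)  # fail early on HTTP errors
data <- fromJSON(content(resp, "text", encoding = "UTF-8"))

# Convert to tibble. fromJSON() simplifies the "columns" array of objects
# into a data frame, so the names are read from its "name" column.
df <- as.data.frame(data$rows, stringsAsFactors = FALSE)
colnames(df) <- data$columns$name
df <- as_tibble(df)
print(df)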

# JSON format
curl -H "X-API-Key: your-api-key" \
  "https://librebiotech.org/api.php/v1/investigations/3/ml-export"

# CSV format
curl -H "X-API-Key: your-api-key" \
  "https://librebiotech.org/api.php/v1/investigations/3/ml-export?format=csv"

# ISA-JSON
curl -H "X-API-Key: your-api-key" \
  "https://librebiotech.org/api.php/v1/investigations/3/export?format=isajson"

# Data card (no auth for public investigations)
curl "https://librebiotech.org/api.php/v1/investigations/3/card"

# Platform skill file
curl "https://librebiotech.org/CLAUDE.md"
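
Data cards are described above as YAML frontmatter plus a Markdown body. Assuming the frontmatter is delimited by `---` lines, as in Hugging Face-style cards (the exact layout is not shown on this page), a card could be split like this. `split_card` and the sample text are illustrative only.

```python
def split_card(card_text: str):
    """Split a data card into (frontmatter, body).

    Assumption: the frontmatter is delimited by lines containing only
    '---'. Returns ("", card_text) when no frontmatter is found.
    """
    lines = card_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return "", card_text
    for index in range(1, len(lines)):
        if lines[index].strip() == "---":
            frontmatter = "\n".join(lines[1:index])
            body = "\n".join(lines[index + 1:]).lstrip("\n")
            return frontmatter, body
    return "", card_text  # opening delimiter without a closing one


# Made-up card text, not a real platform response.
sample = "---\ntitle: Example card\n---\n\n# Example card\nBody text."
frontmatter, body = split_card(sample)
print(frontmatter)
print(body)
```

The frontmatter string can then be passed to a YAML parser such as PyYAML's `yaml.safe_load` if structured access to the metadata is needed.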

Provenance & Lineage

Every sample tracks its provenance through process_input_samples links. The API returns full lineage chains with process categories, enabling you to trace any derived result back to its original tissue or source material.

# Example: get sample lineage
GET /api.php/v1/samples/42/lineage

# Response includes ancestor chain:
{
  "sample_id": 42,
  "label": "RNA-EXTRACT-001",
  "ancestors": [
    {"sample_id": 15, "label": "TISSUE-001", "process_category": "extraction"},
    {"sample_id": 3, "label": "MOUSE-DBA-01", "process_category": null}
  ]
}
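
To illustrate tracing a derived sample back to its source, the sketch below walks the example response shown above. It assumes `ancestors` is ordered nearest-first, as in that example, and treats ancestors with a null `process_category` as source material. Both are guesses from the single sample response, and the helper names are hypothetical.

```python
# The example lineage response from above, as a Python dict.
lineage = {
    "sample_id": 42,
    "label": "RNA-EXTRACT-001",
    "ancestors": [
        {"sample_id": 15, "label": "TISSUE-001", "process_category": "extraction"},
        {"sample_id": 3, "label": "MOUSE-DBA-01", "process_category": None},
    ],
}


def describe_lineage(response: dict) -> str:
    """Render the chain from the most distant ancestor to the queried sample.

    Assumption: 'ancestors' is ordered nearest-first, so reversing it
    yields source -> derived order.
    """
    chain = [a["label"] for a in reversed(response["ancestors"])]
    chain.append(response["label"])
    return " -> ".join(chain)


def source_materials(response: dict) -> list:
    """Return labels of ancestors with no process category (assumed sources)."""
    return [a["label"] for a in response["ancestors"] if a["process_category"] is None]


print(describe_lineage(lineage))
print(source_materials(lineage))
```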

Ontology Coverage

The platform indexes 3,128,941 terms from 23 ontologies, ensuring standardised vocabulary across all metadata.

Prefix	Name
`CHEBI`	Chemical Entities of Biological Interest
`CHMO`	Chemical Methods Ontology
`CL`	Cell Ontology
`DOID`	Human Disease Ontology
`ECO`	Evidence and Conclusion Ontology
`EDAM`	EDAM
`ENVO`	Environment Ontology
`GENEPIO`	Genomic Epidemiology Ontology
`GO`	Gene Ontology
`MCO`	Microbial Conditions Ontology
`mgi`	Mouse Genome Informatics
`MMO`	Measurement Method Ontology
`NCBITAXON`	NCBI Taxonomy
`NCIT`	NCI Thesaurus
`OBA`	Ontology of Biological Attributes
`OBCS`	Ontology of Biological and Clinical Statistics
`OBI`	Ontology for Biomedical Investigations
`PATO`	Phenotype And Trait Ontology
`PO`	Plant Ontology
`PRIDE`	PRIDE Controlled Vocabulary
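
Since the ML-ready exports carry ontology CURIEs, a quick sanity check is to split each CURIE into prefix and local ID and compare the prefix against the table above. The sketch assumes the common `PREFIX:LOCAL_ID` form and compares prefixes case-insensitively (the table lists `mgi` in lower case). The exact casing the platform emits is not documented here, and the function names are hypothetical.

```python
# Prefixes listed in the table above, upper-cased for comparison.
KNOWN_PREFIXES = {
    "CHEBI", "CHMO", "CL", "DOID", "ECO", "EDAM", "ENVO", "GENEPIO", "GO",
    "MCO", "MGI", "MMO", "NCBITAXON", "NCIT", "OBA", "OBCS", "OBI", "PATO",
    "PO", "PRIDE",
}


def split_curie(curie: str):
    """Split 'PREFIX:LOCAL_ID' into (prefix, local_id), upper-casing the prefix."""
    prefix, sep, local_id = curie.partition(":")
    if not sep or not prefix or not local_id:
        raise ValueError(f"not a CURIE: {curie!r}")
    return prefix.upper(), local_id


def is_known(curie: str) -> bool:
    """True if the CURIE's prefix appears in the listed ontologies."""
    prefix, _ = split_curie(curie)
    return prefix in KNOWN_PREFIXES


print(split_curie("GO:0008150"), is_known("GO:0008150"))
```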

Getting Started

Create an account, generate an API key, and start pulling structured genomics data into your ML pipeline in minutes.

Using an AI coding assistant? Point it at https://librebiotech.org/CLAUDE.md for a machine-readable overview of the entire platform.
