Libre Biotech

Your next training dataset is already structured

Libre Biotech stores genomics research data using the ISA framework with ontology annotations, provenance chains, and reproducible CWL workflows. Every dataset exports as ML-ready JSON, CSV, ISA-JSON, or RO-Crate — ready for your pipeline, not manual wrangling.

Investigations: 4
Samples: 567
Annotations: 288
Succeeded Runs: 74
Ontologies: 13
Ontology Terms: 3,085,313

Data Formats

Format | Description | Use Case
ML-Ready JSON/CSV | Flat samples x features matrix with ontology CURIEs, factor values, measurements, and provenance summary | Direct loading into pandas, scikit-learn, PyTorch datasets
ISA-JSON | Full investigation metadata following the ISA-JSON 1.0 specification: studies, processes, samples, annotations | Structured metadata parsing, repository submission
RO-Crate | Research Object package with workflow definitions, container images, inputs, outputs, and PROV-O provenance | Reproducible analysis, workflow re-execution
PROV-O | W3C provenance ontology export as JSON-LD, RDF, or Turtle, with full activity/entity/agent graphs | Knowledge graphs, provenance-aware ML, data lineage validation
Data Cards | YAML frontmatter + Markdown body; aggregates metadata, scores, ontology stats, provenance, code examples | Dataset documentation, AI assistant context, Hugging Face-style cards
ML-Ready ZIP | CSV data file + README.md describing columns, types, and loading examples in a single archive | Self-documenting dataset sharing, offline analysis
CSV | Tabular exports of processes, samples, and measurements | Spreadsheet analysis, R/tidyverse workflows
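
As a rough sketch of how the ML-Ready ZIP bundle might be consumed offline, the snippet below reads the first CSV member of an archive using only the Python standard library. The archive layout is an assumption: the member names `data.csv` and `README.md` and the column names in the demo are hypothetical, and the function simply picks whichever member ends in `.csv`.

```python
import csv
import io
import zipfile


def read_bundle_csv(zip_bytes: bytes) -> list[dict[str, str]]:
    """Return the rows of the first .csv member of a ZIP archive as dicts."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        # Assumption: the bundle holds one CSV file; its name is not documented here.
        csv_names = [n for n in archive.namelist() if n.lower().endswith(".csv")]
        if not csv_names:
            raise ValueError("no CSV file found in archive")
        with archive.open(csv_names[0]) as handle:
            text = io.TextIOWrapper(handle, encoding="utf-8", newline="")
            return list(csv.DictReader(text))


def make_demo_bundle() -> bytes:
    """Build a tiny in-memory archive with hypothetical member names and columns."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("README.md", "# Demo bundle\n")
        archive.writestr("data.csv", "sample_id,organism\n1,NCBITAXON:10090\n")
    return buffer.getvalue()


rows = read_bundle_csv(make_demo_bundle())
print(rows)
```

With a real bundle, `zip_bytes` would be the raw response body returned by the `ml-export?format=zip` endpoint.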

API Quick Start

Authenticate with an API key via the X-API-Key header. Generate keys from your account settings.

Endpoints

Method | Endpoint | Description
GET | /api.php/v1/investigations | List investigations
GET | /api.php/v1/investigations/{id} | Investigation details
GET | /api.php/v1/investigations/{id}/ml-export | ML-ready export (JSON)
GET | /api.php/v1/investigations/{id}/ml-export?format=csv | ML-ready export (CSV)
GET | /api.php/v1/investigations/{id}/export?format=isajson | ISA-JSON export
GET | /api.php/v1/investigations/{id}/provenance | Provenance graph
GET | /api.php/v1/investigations/{id}/prov-o | PROV-O export
GET | /api.php/v1/samples/{id} | Sample details
GET | /api.php/v1/samples/{id}/lineage | Sample lineage graph
GET | /api.php/v1/investigations/{id}/card | Investigation data card (Markdown, or JSON with ?format=json)
GET | /api.php/v1/investigations/{id}/ml-export?format=zip | ML-ready ZIP bundle (CSV + README)
GET | /api.php/v1/platform-card | Platform skill file (JSON)
GET | /CLAUDE.md | Platform skill file (Markdown), for AI coding assistants

Code Examples

Python

from io import StringIO

import pandas as pd
import requests

API = "https://librebiotech.org/api.php/v1"
KEY = "your-api-key"
headers = {"X-API-Key": KEY}

# Fetch ML-ready data as JSON
resp = requests.get(f"{API}/investigations/3/ml-export", headers=headers)
resp.raise_for_status()  # stop early on authentication or server errors
data = resp.json()

# Load into a DataFrame, taking column names from the export's column metadata
df = pd.DataFrame(data["rows"], columns=[c["name"] for c in data["columns"]])
print(df.head())

# Or fetch as CSV directly
resp = requests.get(f"{API}/investigations/3/ml-export?format=csv", headers=headers)
resp.raise_for_status()
df = pd.read_csv(StringIO(resp.text))

R

library(httr)
library(jsonlite)
library(tibble)

api <- "https://librebiotech.org/api.php/v1"
key <- "your-api-key"

# Fetch ML-ready data
resp <- GET(paste0(api, "/investigations/3/ml-export"),
            add_headers("X-API-Key" = key))
stop_for_status(resp)  # stop early on authentication or server errors
data <- fromJSON(content(resp, "text", encoding = "UTF-8"))

# Convert to tibble. fromJSON() simplifies the array of column objects
# into a data frame, so the names are read from its "name" column.
df <- as_tibble(data$rows, .name_repair = "minimal")
colnames(df) <- data$columns$name
print(df)

curl

# JSON format
curl -H "X-API-Key: your-api-key" \
  "https://librebiotech.org/api.php/v1/investigations/3/ml-export"

# CSV format
curl -H "X-API-Key: your-api-key" \
  "https://librebiotech.org/api.php/v1/investigations/3/ml-export?format=csv"

# ISA-JSON
curl -H "X-API-Key: your-api-key" \
  "https://librebiotech.org/api.php/v1/investigations/3/export?format=isajson"

# Data card (no auth for public investigations)
curl "https://librebiotech.org/api.php/v1/investigations/3/card"

# Platform skill file
curl "https://librebiotech.org/CLAUDE.md"
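
The data card is described above as YAML frontmatter plus a Markdown body. A minimal sketch of separating the two is shown below. It assumes the frontmatter is fenced by lines containing only `---`, which is the usual convention but is not confirmed here, and the card content in the demo is hypothetical.

```python
def split_card(card_text: str) -> tuple[str, str]:
    """Split a data card into (frontmatter, body).

    Assumes the frontmatter is fenced by lines containing only '---';
    returns an empty frontmatter when no such fence is found.
    """
    lines = card_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return "", card_text
    for index in range(1, len(lines)):
        if lines[index].strip() == "---":
            frontmatter = "\n".join(lines[1:index])
            body = "\n".join(lines[index + 1:]).lstrip("\n")
            return frontmatter, body
    # Opening fence without a closing one: treat everything as body.
    return "", card_text


# Hypothetical card content, for illustration only.
demo_card = "---\ntitle: Demo investigation\nsamples: 12\n---\n\n# Overview\nBody text.\n"
frontmatter, body = split_card(demo_card)
print(frontmatter)
print(body)
```

The frontmatter string can then be handed to a YAML parser (for example `yaml.safe_load` from PyYAML), while the body stays as Markdown.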

Provenance & Lineage

Every sample tracks its provenance through process_input_samples links. The API returns full lineage chains with process categories, enabling you to trace any derived result back to its original tissue or source material.

# Example: get sample lineage
GET /api.php/v1/samples/42/lineage

# Response includes ancestor chain:
{
  "sample_id": 42,
  "label": "RNA-EXTRACT-001",
  "ancestors": [
    {"sample_id": 15, "label": "TISSUE-001", "process_category": "extraction"},
    {"sample_id": 3, "label": "MOUSE-DBA-01", "process_category": null}
  ]
}
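
To illustrate tracing a derived sample back to its source, the sketch below turns a lineage response of the shape shown above into an ordered path of labels. It assumes that `ancestors` is listed nearest ancestor first, as the example suggests; the helper name `lineage_path` is hypothetical.

```python
def lineage_path(lineage: dict) -> list[str]:
    """Return labels ordered from the original source down to the sample itself."""
    # Assumption: "ancestors" is ordered nearest ancestor first, as in the example.
    ancestors = lineage.get("ancestors", [])
    labels = [ancestor["label"] for ancestor in reversed(ancestors)]
    labels.append(lineage["label"])
    return labels


# The example response from above, as a Python dict.
example = {
    "sample_id": 42,
    "label": "RNA-EXTRACT-001",
    "ancestors": [
        {"sample_id": 15, "label": "TISSUE-001", "process_category": "extraction"},
        {"sample_id": 3, "label": "MOUSE-DBA-01", "process_category": None},
    ],
}

path = lineage_path(example)
print(" -> ".join(path))
```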

Ontology Coverage

The platform indexes 3,085,313 terms from 13 ontologies, ensuring standardised vocabulary across all metadata.

Prefix | Name
CHEBI | Chemical Entities of Biological Interest
ECO | Evidence and Conclusion Ontology
ENVO | Environment Ontology
GO | Gene Ontology
mgi | Mouse Genome Informatics
NCBITAXON | NCBI Taxonomy
OBA | Ontology of Biological Attributes
OBCS | Ontology of Biological and Clinical Statistics
OBI | Ontology for Biomedical Investigations
PATO | Phenotype And Trait Ontology
PO | Plant Ontology
UBERON | Uberon Anatomy Ontology
UO | Units of Measurement
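
Since the ML-ready exports carry ontology CURIEs, a quick sanity check is to confirm that each CURIE's prefix belongs to one of the ontologies listed above. The following is a sketch under two assumptions: that CURIEs take the common `PREFIX:LOCAL_ID` form, and that prefixes should be compared case-insensitively (the table lists `mgi` in lower case). The helper names and the sample CURIEs are illustrative only.

```python
# Prefixes from the table above, normalised to upper case for comparison.
INDEXED_PREFIXES = {
    "CHEBI", "ECO", "ENVO", "GO", "MGI", "NCBITAXON", "OBA",
    "OBCS", "OBI", "PATO", "PO", "UBERON", "UO",
}


def split_curie(curie: str) -> tuple[str, str]:
    """Split 'PREFIX:LOCAL_ID' into its two parts, rejecting malformed input."""
    prefix, separator, local_id = curie.partition(":")
    if not separator or not prefix or not local_id:
        raise ValueError(f"not a CURIE: {curie!r}")
    return prefix, local_id


def is_indexed(curie: str) -> bool:
    """Return True when the CURIE's prefix is one of the listed ontologies."""
    prefix, _ = split_curie(curie)
    # Assumption: prefix matching is case-insensitive.
    return prefix.upper() in INDEXED_PREFIXES


for curie in ["UBERON:0002107", "mgi:12345", "DOID:1234"]:
    print(curie, is_indexed(curie))
```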

Getting Started

Create an account, generate an API key, and start pulling structured genomics data into your ML pipeline in minutes.

Using an AI coding assistant? Point it at https://librebiotech.org/CLAUDE.md for a machine-readable overview of the entire platform.