Data Export & FAIR
Export your data in standard formats and understand how the platform supports FAIR principles.
Export formats
Libre Biotech supports multiple standard export formats to ensure your data is portable and interoperable:
| Format | What it includes | Use case |
|---|---|---|
| ISA-JSON | Full investigation metadata: studies, processes, samples, annotations, protocol references | Machine-readable exchange with ISA-compatible tools |
| ISA-Tab | Tab-separated investigation, study, and assay files in a ZIP archive | Submission to repositories (ENA, ArrayExpress, MetaboLights) |
| ML-Ready JSON | Flat samples × features matrix with ontology CURIEs, factor values, measurements, and provenance summary | Direct loading into pandas, scikit-learn, PyTorch datasets |
| ML-Ready CSV | Same matrix as ML-Ready JSON but as comma-separated values with a header row | Spreadsheet analysis, R/tidyverse workflows, quick inspection |
| RO-Crate | Research Object package with CWL workflow definitions, container images, inputs, outputs, and provenance | Reproducible analysis packaging and re-execution |
| PROV-O | W3C provenance ontology export as JSON-LD — full activity/entity/agent graphs | Knowledge graphs, provenance-aware ML, audit trails |
| Data Card | YAML frontmatter (machine-readable) + Markdown body — aggregates metadata, scores, ontology stats, provenance, and code examples | Dataset documentation (Hugging Face-style), AI assistant context |
| CSV | Tabular data (processes, samples, investigations) | Spreadsheet analysis, custom processing |
Exporting an investigation
From any investigation page, the toolbar provides export buttons:
- Navigate to the investigation page
- Click ISA-Tab to download a ZIP of tab-separated files
- Click ISA-JSON to download the full ISA-JSON metadata
- Click ML-Ready to download a flat CSV matrix for machine learning. Use the dropdown arrow to choose JSON or ZIP (with README) instead
- Click Data Card to download a Markdown data card (YAML frontmatter + body). Use the dropdown arrow for JSON format
Exports include all studies, processes, samples, and ontology annotations within the investigation. File attachments are referenced by path but not included in the metadata export. ISA-Tab ZIP archives also include a README.md and DATA_CARD.md describing the contents and investigation metadata.
ML-Ready export
The ML-Ready export flattens all investigation data into a single samples × features matrix designed for direct consumption by ML pipelines.
What's included
- Sample identifiers — ID and label for each sample
- Ontology annotations — Each annotation slot (e.g. organism, tissue, anatomy) becomes a column, prefixed with annotation:
- Study factors — Each factor (e.g. genotype, treatment) becomes a column, prefixed with factor:
- Quantitative measurements — Each measurement type (e.g. RQN, concentration) becomes a numeric column, prefixed with measurement:
- Provenance summary — Per-sample lineage depth, root sample, and process chain
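These prefixes make it straightforward to split a column list into feature groups before modelling. A minimal sketch (the helper name is ours, not part of the platform API):

```python
# Group ML-Ready column names into feature classes by their prefix.
def group_columns(column_names):
    groups = {"annotation": [], "factor": [], "measurement": [], "other": []}
    for name in column_names:
        prefix, _, rest = name.partition(":")
        if rest and prefix in groups:
            groups[prefix].append(rest)   # prefixed feature column
        else:
            groups["other"].append(name)  # identifier / label column
    return groups

cols = ["sample_id", "sample_label", "annotation:organism",
        "factor:genotype", "measurement:RQN"]
print(group_columns(cols))
# {'annotation': ['organism'], 'factor': ['genotype'],
#  'measurement': ['RQN'], 'other': ['sample_id', 'sample_label']}
```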
JSON structure
```json
{
  "metadata": {
    "investigation_id": 3,
    "investigation_title": "Mouse Transcriptomics",
    "export_date": "2026-03-14",
    "sample_count": 42,
    "feature_count": 8,
    "ontology_prefix_map": {
      "NCBITaxon": "http://purl.obolibrary.org/obo/NCBITaxon_",
      "UBERON": "http://purl.obolibrary.org/obo/UBERON_"
    }
  },
  "columns": [
    {"name": "sample_id", "type": "identifier"},
    {"name": "sample_label", "type": "string"},
    {"name": "annotation:organism", "type": "ontology", "ontology_curie": "NCBITaxon:10090"},
    {"name": "annotation:anatomy", "type": "ontology", "ontology_curie": "UBERON:0000955"},
    {"name": "factor:genotype", "type": "categorical"},
    {"name": "measurement:RQN", "type": "numeric", "unit": "UO:0000186"}
  ],
  "rows": [
    [1, "SAMPLE-001", "Mus musculus", "brain", "wild-type", 8.7],
    [2, "SAMPLE-002", "Mus musculus", "liver", "knockout", 7.2]
  ],
  "provenance_summary": {
    "1": {"depth": 3, "process_chain": ["extraction", "sample_prep", "sequencing"]},
    "2": {"depth": 2, "process_chain": ["extraction", "sequencing"]}
  }
}
```
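The columns/rows split keeps the payload compact; zipping the two back together gives per-sample records. A standard-library sketch using an abbreviated version of the payload above:

```python
import json

# Convert the ML-Ready "columns"/"rows" layout into per-sample dicts.
# `export` is an abbreviated stand-in for the payload shown above.
export = json.loads("""
{
  "columns": [
    {"name": "sample_id", "type": "identifier"},
    {"name": "sample_label", "type": "string"},
    {"name": "measurement:RQN", "type": "numeric", "unit": "UO:0000186"}
  ],
  "rows": [
    [1, "SAMPLE-001", 8.7],
    [2, "SAMPLE-002", 7.2]
  ]
}
""")

names = [c["name"] for c in export["columns"]]
records = [dict(zip(names, row)) for row in export["rows"]]
print(records[0])
# {'sample_id': 1, 'sample_label': 'SAMPLE-001', 'measurement:RQN': 8.7}
```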
CSV format
The CSV export uses the same column names as the JSON columns array. Null values are represented as empty strings. The first row is the header.
```
sample_id,sample_label,annotation:organism,annotation:anatomy,factor:genotype,measurement:RQN
1,SAMPLE-001,Mus musculus,brain,wild-type,8.7
2,SAMPLE-002,Mus musculus,liver,knockout,7.2
```
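Because nulls arrive as empty strings and everything else as text, a small normalisation pass is useful when not going through pandas. A standard-library sketch:

```python
import csv
from io import StringIO

# Parse an ML-Ready CSV, casting measurement: columns to float and
# treating empty strings as missing values (None).
text = """sample_id,sample_label,annotation:organism,factor:genotype,measurement:RQN
1,SAMPLE-001,Mus musculus,wild-type,8.7
2,SAMPLE-002,Mus musculus,knockout,
"""

rows = []
for rec in csv.DictReader(StringIO(text)):
    for key, value in rec.items():
        if value == "":
            rec[key] = None            # nulls are empty strings
        elif key.startswith("measurement:"):
            rec[key] = float(value)    # numeric measurement column
    rows.append(rec)

print(rows[1]["measurement:RQN"])  # None (missing value)
```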
Column types
| Type | Description | Example |
|---|---|---|
| identifier | Unique integer ID | sample_id |
| string | Free text | sample_label |
| ontology | Value from an ontology term, with CURIE in column metadata | annotation:organism |
| categorical | Discrete category value (from study factors) | factor:genotype |
| numeric | Numeric measurement value, with optional unit CURIE | measurement:RQN |
ZIP bundle
The format=zip option wraps the CSV data file and a README.md describing the columns, types, and loading examples into a single ZIP archive. This is ideal for sharing self-documenting datasets.
```shell
# Download ZIP bundle
curl -H "X-API-Key: YOUR_KEY" -o ml_export.zip \
  "https://librebiotech.org/api.php/v1/investigations/3/ml-export?format=zip"
```
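Once downloaded, the bundle can be inspected with the standard library. For a runnable sketch we build a stand-in archive in memory; with a real download you would pass the file path to zipfile.ZipFile instead, and the member names below are illustrative:

```python
import io
import zipfile

# Build a stand-in for a downloaded ZIP bundle (data file + README).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("ml_export.csv", "sample_id,measurement:RQN\n1,8.7\n")
    zf.writestr("README.md", "# ML-Ready export\nColumn descriptions...\n")

# List the members and read the README, as you would with a real bundle.
with zipfile.ZipFile(buf) as zf:
    names = zf.namelist()
    readme = zf.read("README.md").decode()

print(names)                   # ['ml_export.csv', 'README.md']
print(readme.splitlines()[0])  # '# ML-Ready export'
```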
Accessing via API
```shell
# JSON (default)
curl -H "X-API-Key: YOUR_KEY" \
  "https://librebiotech.org/api.php/v1/investigations/3/ml-export"

# CSV
curl -H "X-API-Key: YOUR_KEY" \
  "https://librebiotech.org/api.php/v1/investigations/3/ml-export?format=csv"

# ZIP (data + README)
curl -H "X-API-Key: YOUR_KEY" -o ml_export.zip \
  "https://librebiotech.org/api.php/v1/investigations/3/ml-export?format=zip"
```
Loading into Python
```python
import requests
import pandas as pd
from io import StringIO

API = "https://librebiotech.org/api.php/v1"
headers = {"X-API-Key": "YOUR_KEY"}

# Option 1: JSON → DataFrame
resp = requests.get(f"{API}/investigations/3/ml-export", headers=headers)
data = resp.json()
df = pd.DataFrame(data["rows"], columns=[c["name"] for c in data["columns"]])

# Option 2: CSV → DataFrame (simpler)
resp = requests.get(f"{API}/investigations/3/ml-export?format=csv", headers=headers)
df = pd.read_csv(StringIO(resp.text))
```
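The JSON export's column metadata can also drive type casting without pandas. A sketch (the caster mapping is our choice; the export only declares the type names):

```python
# Cast row values to Python types using the export's column metadata.
# identifier -> int, numeric -> float; everything else stays a string.
CASTERS = {"identifier": int, "numeric": float}

def typed_rows(columns, rows):
    casters = [CASTERS.get(c["type"], str) for c in columns]
    return [
        [None if v is None else cast(v) for cast, v in zip(casters, row)]
        for row in rows
    ]

columns = [{"name": "sample_id", "type": "identifier"},
           {"name": "factor:genotype", "type": "categorical"},
           {"name": "measurement:RQN", "type": "numeric"}]
rows = [["1", "wild-type", "8.7"]]
print(typed_rows(columns, rows))  # [[1, 'wild-type', 8.7]]
```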
Exporting analysis runs
Each completed analysis run can be downloaded as an RO-Crate or as a ZIP archive containing all output files. From the analysis run page:
- Download All — ZIP of all output files
- View RO-Crate — Inspect the JSON-LD metadata describing the workflow, inputs, outputs, container images, and parameters
- Individual files — Download specific outputs
RO-Crate metadata follows the RO-Crate 1.1 specification and includes:
- CWL workflow definition reference
- Container image (Apptainer/Singularity) used for execution
- All input parameters as JSON
- Output file manifests with checksums
- W3C PROV-O activity records
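An RO-Crate is described by a ro-crate-metadata.json document whose @graph lists every entity in the package (per the RO-Crate 1.1 specification). A standard-library sketch for listing the files it describes; the graph below is a minimal stand-in, not an actual platform export:

```python
import json

# Minimal stand-in for an RO-Crate metadata document.
metadata = json.loads("""
{
  "@context": "https://w3id.org/ro/crate/1.1/context",
  "@graph": [
    {"@id": "./", "@type": "Dataset", "name": "Analysis run"},
    {"@id": "results.vcf", "@type": "File", "contentSize": "10240"},
    {"@id": "workflow.cwl", "@type": ["File", "SoftwareSourceCode"]}
  ]
}
""")

def is_file(entity):
    # @type may be a single string or a list of strings in JSON-LD.
    types = entity.get("@type", [])
    return "File" in (types if isinstance(types, list) else [types])

files = [e["@id"] for e in metadata["@graph"] if is_file(e)]
print(files)  # ['results.vcf', 'workflow.cwl']
```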
PROV-O provenance export
The platform exports W3C PROV-O provenance as JSON-LD, available for investigations and individual samples:
```
# Investigation-level provenance
GET /api.php/v1/investigations/{id}/prov-o

# Sample-level provenance
GET /api.php/v1/samples/{id}/prov-o
```
The PROV-O export includes prov:Activity (processes), prov:Entity (samples, files), and prov:Agent (people) nodes with prov:wasGeneratedBy, prov:used, and prov:wasAssociatedWith relationships.
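With those relationships, a sample's process chain can be recovered by walking prov:wasGeneratedBy and prov:used edges backwards. A sketch over a flattened edge list (a stand-in: a real PROV-O JSON-LD export would first need JSON-LD parsing, and the identifiers here are illustrative):

```python
# Stand-in PROV graph: which entity each activity used, and which
# activity generated each entity.
activities = {
    "process/sequencing": {"prov:used": "sample/2"},
    "process/extraction": {"prov:used": "sample/1"},
}
generated_by = {  # entity -> activity that produced it (prov:wasGeneratedBy)
    "sample/3": "process/sequencing",
    "sample/2": "process/extraction",
}

def lineage(entity):
    """Walk backwards from an entity to its root, collecting activities."""
    chain = []
    while entity in generated_by:
        activity = generated_by[entity]
        chain.append(activity)
        entity = activities[activity]["prov:used"]
    return chain

print(lineage("sample/3"))  # ['process/sequencing', 'process/extraction']
```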
Data Cards
Every investigation can generate a data card — a self-describing document inspired by Hugging Face dataset cards. Cards combine YAML frontmatter (machine-readable) with a Markdown body (human-readable) in a single portable file.
What's included
- Identity — title, DOI, license, visibility, status
- Contacts — names, ORCID identifiers, affiliations, roles
- Scale — study, sample, process, and publication counts
- Biology — organisms and material types
- Ontology — sources used and annotation coverage percentage
- Provenance — max lineage depth and process categories
- Scores — FAIR score (per-principle) and AI-Ready score
- API endpoints — direct links to all available exports
- Code examples — Python and curl snippets for quick access
Accessing data cards
| Method | How |
|---|---|
| Web UI | Click the Data Card button on any investigation page. Use the dropdown for JSON format |
| REST API (Markdown) | GET /api.php/v1/investigations/{id}/card |
| REST API (JSON) | GET /api.php/v1/investigations/{id}/card?format=json |
Data cards for public investigations are accessible without authentication. Cards are also bundled as DATA_CARD.md inside ISA-Tab ZIP exports.
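Because the card is YAML frontmatter followed by a Markdown body, splitting it is a one-liner. A minimal sketch assuming the standard "---" delimiters (the card content here is illustrative; a real consumer could hand the frontmatter to a YAML library):

```python
# Split a data card into YAML frontmatter and Markdown body.
card = """---
title: Mouse Transcriptomics
license: CC-BY-4.0
---
# Mouse Transcriptomics

42 samples across 2 studies...
"""

_, frontmatter, body = card.split("---\n", 2)
# Naive key: value parse; real cards may need a proper YAML parser.
meta = dict(
    line.split(": ", 1) for line in frontmatter.strip().splitlines()
)
print(meta["license"])                # 'CC-BY-4.0'
print(body.lstrip().splitlines()[0])  # '# Mouse Transcriptomics'
```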
Platform skill file (CLAUDE.md)
A platform-level skill file is available at /CLAUDE.md — a dynamically generated Markdown document describing the entire platform for AI coding assistants. It includes live statistics, data model overview, all API endpoints, authentication details, export formats, ontology coverage, and code examples. Also available as JSON via /api.php/v1/platform-card.
CSV exports
Tabular CSV exports are available for bulk data download:
- Processes — /?action=export_processes — all processes with dates, categories, procedures, and study links
- Samples — /?action=export_samples — all samples with organism, material type, descriptions, and parent process
- Investigations — /?action=export_investigations — all investigations with study counts and status
API access
The REST API provides programmatic access to all platform data. Use API keys for authenticated access. See the AI Readiness page for ML-specific guidance and code examples.
FAIR compliance
The platform implements FAIR principles throughout:
Findable
- Rich, structured metadata using the ISA framework (Investigation → Study → Assay)
- Ontology annotations from 2.9M+ OBO Foundry terms across 13 ontologies
- Full-text search across 9 entity types with typeahead suggestions
- DOI support for persistent identification
- Public discovery pages for investigations and protocols
- FAIR Score card on every investigation (0-100, per-principle breakdown)
Accessible
- Open access to public content without authentication
- REST API with API key authentication for programmatic access
- Public API with CORS support for browser-based integrations
- Share links with optional password protection and expiry for controlled access
- Visibility controls: private, group, or public per investigation
Interoperable
- ISA-JSON and ISA-Tab export (international standards for life science metadata)
- ML-Ready JSON/CSV export for direct machine learning consumption
- CWL workflow definitions (portable across workflow engines)
- RO-Crate packaging (W3C PROV-based research objects)
- PROV-O provenance export (W3C standard, JSON-LD)
- Standard bioinformatics file formats (FASTQ, BAM, VCF, GFF3, BED)
- Ontology-annotated metadata using OBI, EFO, UBERON, NCBITaxon, and more
Reusable
- Complete sample provenance chains via process_input_samples lineage tracking
- Protocol versioning with changelogs and community forking
- Per-entity licensing metadata (CC-BY-4.0, CC0, etc.)
- Containerised CWL workflows with pinned tool versions
- AI-Ready Score card measuring ML consumability across 8 dimensions
FAIR Score
Every investigation displays a FAIR Score card in the sidebar, scoring 0-100 across four principles: Findable, Accessible, Interoperable, and Reusable. The score is calculated from:
- Findable — title, description length, DOI, contacts with ORCID, studies, samples, publications
- Accessible — visibility setting, submission/release dates, API availability
- Interoperable — studies, processes, protocols, ontology annotations, study factors, files
- Reusable — license, samples, processes, protocols, contacts, description, analysis runs, publications
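Illustrative only: a per-principle checklist can roll up into a 0-100 score as a simple completeness ratio. The checks and equal weighting below are our assumptions; this page does not document the platform's actual scoring rules:

```python
def principle_score(checks):
    """checks: dict of criterion -> bool. Returns a 0-100 score."""
    return round(100 * sum(checks.values()) / len(checks))

# Hypothetical Findable checklist for one investigation.
findable = {
    "has_title": True,
    "has_description": True,
    "has_doi": False,
    "contacts_with_orcid": True,
}
print(principle_score(findable))  # 75
```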
Actionable suggestions are shown below the score bars to help you improve. See also the AI Readiness page for the companion AI-Ready Score.