Data Export & FAIR
Export your data in standard formats and understand how the platform supports FAIR principles.
Export formats
Libre Biotech supports multiple standard export formats to ensure your data is portable and interoperable:
| Format | What it includes | Use case |
|---|---|---|
| ISA-JSON | Full investigation metadata: studies, processes, samples, annotations, protocol references | Machine-readable exchange with ISA-compatible tools |
| ISA-Tab | Tab-separated investigation, study, and assay files in a ZIP archive | Submission to repositories (ENA, ArrayExpress, MetaboLights) |
| ML-Ready JSON | Flat samples × features matrix with ontology CURIEs, factor values, measurements, and provenance summary | Direct loading into pandas, scikit-learn, PyTorch datasets |
| ML-Ready CSV | Same matrix as ML-Ready JSON but as comma-separated values with a header row | Spreadsheet analysis, R/tidyverse workflows, quick inspection |
| RO-Crate | Research Object package with CWL workflow definitions, container images, inputs, outputs, and provenance | Reproducible analysis packaging and re-execution |
| PROV-O | W3C provenance ontology export as JSON-LD — full activity/entity/agent graphs | Knowledge graphs, provenance-aware ML, audit trails |
| Data Card | YAML frontmatter (machine-readable) + Markdown body — aggregates metadata, scores, ontology stats, provenance, and code examples | Dataset documentation (Hugging Face-style), AI assistant context |
| CSV | Tabular data (processes, samples, investigations) | Spreadsheet analysis, custom processing |
Exporting an investigation
From any investigation page, the toolbar provides export buttons:
- Navigate to the investigation page
- Click ISA-Tab to download a ZIP of tab-separated files
- Click ISA-JSON to download the full ISA-JSON metadata
- Click ML-Ready to download a flat CSV matrix for machine learning. Use the dropdown arrow to choose JSON or ZIP (with README) instead
- Click Data Card to download a Markdown data card (YAML frontmatter + body). Use the dropdown arrow for JSON format
Exports include all studies, processes, samples, and ontology annotations within the investigation. File attachments are referenced by path but not included in the metadata export. ISA-Tab ZIP archives also include a README.md and DATA_CARD.md describing the contents and investigation metadata.
ML-Ready export
The ML-Ready export flattens all investigation data into a single samples × features matrix designed for direct consumption by ML pipelines.
What's included
- Sample identifiers — ID and label for each sample
- Ontology annotations — Each annotation slot (e.g. organism, tissue, anatomy) becomes a column, prefixed with annotation:
- Study factors — Each factor (e.g. genotype, treatment) becomes a column, prefixed with factor:
- Quantitative measurements — Each measurement type (e.g. RQN, concentration) becomes a numeric column, prefixed with measurement:
- Provenance summary — Per-sample lineage depth, root sample, and process chain
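These prefixes make it straightforward to split a column list into feature groups before modelling. A minimal sketch (the helper name is ours, not part of the platform API):

```python
# Group ML-Ready column names into feature classes by their prefix.
def group_columns(column_names):
    groups = {"annotation": [], "factor": [], "measurement": [], "other": []}
    for name in column_names:
        prefix, _, rest = name.partition(":")
        if rest and prefix in groups:
            groups[prefix].append(rest)   # prefixed feature column
        else:
            groups["other"].append(name)  # identifier / label column
    return groups

cols = ["sample_id", "sample_label", "annotation:organism",
        "factor:genotype", "measurement:RQN"]
print(group_columns(cols))
# {'annotation': ['organism'], 'factor': ['genotype'],
#  'measurement': ['RQN'], 'other': ['sample_id', 'sample_label']}
```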
JSON structure
```json
{
  "metadata": {
    "investigation_id": 3,
    "investigation_title": "Mouse Transcriptomics",
    "export_date": "2026-03-14",
    "sample_count": 42,
    "feature_count": 8,
    "ontology_prefix_map": {
      "NCBITaxon": "http://purl.obolibrary.org/obo/NCBITaxon_",
      "UBERON": "http://purl.obolibrary.org/obo/UBERON_"
    }
  },
  "columns": [
    {"name": "sample_id", "type": "identifier"},
    {"name": "sample_label", "type": "string"},
    {"name": "annotation:organism", "type": "ontology", "ontology_curie": "NCBITaxon:10090"},
    {"name": "annotation:anatomy", "type": "ontology", "ontology_curie": "UBERON:0000955"},
    {"name": "factor:genotype", "type": "categorical"},
    {"name": "measurement:RQN", "type": "numeric", "unit": "UO:0000186"}
  ],
  "rows": [
    [1, "SAMPLE-001", "Mus musculus", "brain", "wild-type", 8.7],
    [2, "SAMPLE-002", "Mus musculus", "liver", "knockout", 7.2]
  ],
  "provenance_summary": {
    "1": {"depth": 3, "process_chain": ["extraction", "sample_prep", "sequencing"]},
    "2": {"depth": 2, "process_chain": ["extraction", "sequencing"]}
  }
}
```
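The columns/rows split keeps the payload compact; zipping the two back together gives per-sample records. A standard-library sketch using an abbreviated version of the payload above:

```python
import json

# Convert the ML-Ready "columns"/"rows" layout into per-sample dicts.
# `export` is an abbreviated stand-in for the payload shown above.
export = json.loads("""
{
  "columns": [
    {"name": "sample_id", "type": "identifier"},
    {"name": "sample_label", "type": "string"},
    {"name": "measurement:RQN", "type": "numeric", "unit": "UO:0000186"}
  ],
  "rows": [
    [1, "SAMPLE-001", 8.7],
    [2, "SAMPLE-002", 7.2]
  ]
}
""")

names = [c["name"] for c in export["columns"]]
records = [dict(zip(names, row)) for row in export["rows"]]
print(records[0])
# {'sample_id': 1, 'sample_label': 'SAMPLE-001', 'measurement:RQN': 8.7}
```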
CSV format
The CSV export uses the same column names as the JSON columns array. Null values are represented as empty strings. The first row is the header.
```
sample_id,sample_label,annotation:organism,annotation:anatomy,factor:genotype,measurement:RQN
1,SAMPLE-001,Mus musculus,brain,wild-type,8.7
2,SAMPLE-002,Mus musculus,liver,knockout,7.2
```
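Because nulls arrive as empty strings and everything else as text, a small normalisation pass is useful when not going through pandas. A standard-library sketch:

```python
import csv
from io import StringIO

# Parse an ML-Ready CSV, casting measurement: columns to float and
# treating empty strings as missing values (None).
text = """sample_id,sample_label,annotation:organism,factor:genotype,measurement:RQN
1,SAMPLE-001,Mus musculus,wild-type,8.7
2,SAMPLE-002,Mus musculus,knockout,
"""

rows = []
for rec in csv.DictReader(StringIO(text)):
    for key, value in rec.items():
        if value == "":
            rec[key] = None            # nulls are empty strings
        elif key.startswith("measurement:"):
            rec[key] = float(value)    # numeric measurement column
    rows.append(rec)

print(rows[1]["measurement:RQN"])  # None (missing value)
```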
Column types
| Type | Description | Example |
|---|---|---|
| identifier | Unique integer ID | sample_id |
| string | Free text | sample_label |
| ontology | Value from an ontology term, with CURIE in column metadata | annotation:organism |
| categorical | Discrete category value (from study factors) | factor:genotype |
| numeric | Numeric measurement value, with optional unit CURIE | measurement:RQN |
ZIP bundle
The format=zip option wraps the CSV data file and a README.md describing the columns, types, and loading examples into a single ZIP archive. This is ideal for sharing self-documenting datasets.
```shell
# Download ZIP bundle
curl -H "X-API-Key: YOUR_KEY" -o ml_export.zip \
  "https://librebiotech.org/api.php/v1/investigations/3/ml-export?format=zip"
```
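Once downloaded, the bundle can be inspected with the standard library. For a runnable sketch we build a stand-in archive in memory; with a real download you would pass the file path to zipfile.ZipFile instead, and the member names below are illustrative:

```python
import io
import zipfile

# Build a stand-in for a downloaded ZIP bundle (data file + README).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("ml_export.csv", "sample_id,measurement:RQN\n1,8.7\n")
    zf.writestr("README.md", "# ML-Ready export\nColumn descriptions...\n")

# List the members and read the README, as you would with a real bundle.
with zipfile.ZipFile(buf) as zf:
    names = zf.namelist()
    readme = zf.read("README.md").decode()

print(names)                   # ['ml_export.csv', 'README.md']
print(readme.splitlines()[0])  # '# ML-Ready export'
```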
Accessing via API
```shell
# JSON (default)
curl -H "X-API-Key: YOUR_KEY" \
  "https://librebiotech.org/api.php/v1/investigations/3/ml-export"

# CSV
curl -H "X-API-Key: YOUR_KEY" \
  "https://librebiotech.org/api.php/v1/investigations/3/ml-export?format=csv"

# ZIP (data + README)
curl -H "X-API-Key: YOUR_KEY" -o ml_export.zip \
  "https://librebiotech.org/api.php/v1/investigations/3/ml-export?format=zip"
```
Loading into Python
```python
import requests
import pandas as pd
from io import StringIO

API = "https://librebiotech.org/api.php/v1"
headers = {"X-API-Key": "YOUR_KEY"}

# Option 1: JSON → DataFrame
resp = requests.get(f"{API}/investigations/3/ml-export", headers=headers)
data = resp.json()
df = pd.DataFrame(data["rows"], columns=[c["name"] for c in data["columns"]])

# Option 2: CSV → DataFrame (simpler)
resp = requests.get(f"{API}/investigations/3/ml-export?format=csv", headers=headers)
df = pd.read_csv(StringIO(resp.text))
```
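The JSON export's column metadata can also drive type casting without pandas. A sketch (the caster mapping is our choice; the export only declares the type names):

```python
# Cast row values to Python types using the export's column metadata.
# identifier -> int, numeric -> float; everything else stays a string.
CASTERS = {"identifier": int, "numeric": float}

def typed_rows(columns, rows):
    casters = [CASTERS.get(c["type"], str) for c in columns]
    return [
        [None if v is None else cast(v) for cast, v in zip(casters, row)]
        for row in rows
    ]

columns = [{"name": "sample_id", "type": "identifier"},
           {"name": "factor:genotype", "type": "categorical"},
           {"name": "measurement:RQN", "type": "numeric"}]
rows = [["1", "wild-type", "8.7"]]
print(typed_rows(columns, rows))  # [[1, 'wild-type', 8.7]]
```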
Exporting analysis runs
Each completed analysis run can be downloaded as an RO-Crate or as a ZIP archive containing all output files. From the analysis run page:
- Download All — ZIP of all output files
- View RO-Crate — Inspect the JSON-LD metadata describing the workflow, inputs, outputs, container images, and parameters
- Individual files — Download specific outputs
RO-Crate metadata follows the RO-Crate 1.1 specification and includes:
- CWL workflow definition reference
- Container image (Apptainer/Singularity) used for execution
- All input parameters as JSON
- Output file manifests with checksums
- W3C PROV-O activity records
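An RO-Crate is described by a ro-crate-metadata.json document whose @graph lists every entity in the package (per the RO-Crate 1.1 specification). A standard-library sketch for listing the files it describes; the graph below is a minimal stand-in, not an actual platform export:

```python
import json

# Minimal stand-in for an RO-Crate metadata document.
metadata = json.loads("""
{
  "@context": "https://w3id.org/ro/crate/1.1/context",
  "@graph": [
    {"@id": "./", "@type": "Dataset", "name": "Analysis run"},
    {"@id": "results.vcf", "@type": "File", "contentSize": "10240"},
    {"@id": "workflow.cwl", "@type": ["File", "SoftwareSourceCode"]}
  ]
}
""")

def is_file(entity):
    # @type may be a single string or a list of strings in JSON-LD.
    types = entity.get("@type", [])
    return "File" in (types if isinstance(types, list) else [types])

files = [e["@id"] for e in metadata["@graph"] if is_file(e)]
print(files)  # ['results.vcf', 'workflow.cwl']
```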
PROV-O provenance export
The platform exports W3C PROV-O provenance as JSON-LD, available for investigations and individual samples:
```
# Investigation-level provenance
GET /api.php/v1/investigations/{id}/prov-o

# Sample-level provenance
GET /api.php/v1/samples/{id}/prov-o
```
The PROV-O export includes prov:Activity (processes), prov:Entity (samples, files), and prov:Agent (people) nodes with prov:wasGeneratedBy, prov:used, and prov:wasAssociatedWith relationships.
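With those relationships, a sample's process chain can be recovered by walking prov:wasGeneratedBy and prov:used edges backwards. A sketch over a flattened edge list (a stand-in: a real PROV-O JSON-LD export would first need JSON-LD parsing, and the identifiers here are illustrative):

```python
# Stand-in PROV graph: which entity each activity used, and which
# activity generated each entity.
activities = {
    "process/sequencing": {"prov:used": "sample/2"},
    "process/extraction": {"prov:used": "sample/1"},
}
generated_by = {  # entity -> activity that produced it (prov:wasGeneratedBy)
    "sample/3": "process/sequencing",
    "sample/2": "process/extraction",
}

def lineage(entity):
    """Walk backwards from an entity to its root, collecting activities."""
    chain = []
    while entity in generated_by:
        activity = generated_by[entity]
        chain.append(activity)
        entity = activities[activity]["prov:used"]
    return chain

print(lineage("sample/3"))  # ['process/sequencing', 'process/extraction']
```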
Data Cards
Every investigation can generate a data card — a self-describing document inspired by Hugging Face dataset cards. Cards combine YAML frontmatter (machine-readable) with a Markdown body (human-readable) in a single portable file.
What's included
- Identity — title, DOI, license, visibility, status
- Contacts — names, ORCID identifiers, affiliations, roles
- Scale — study, sample, process, and publication counts
- Biology — organisms and material types
- Ontology — sources used and annotation coverage percentage
- Provenance — max lineage depth and process categories
- Scores — FAIR score (per-principle) and AI-Ready score
- API endpoints — direct links to all available exports
- Code examples — Python and curl snippets for quick access
Accessing data cards
| Method | How |
|---|---|
| Web UI | Click the Data Card button on any investigation page. Use the dropdown for JSON format |
| REST API (Markdown) | GET /api.php/v1/investigations/{id}/card |
| REST API (JSON) | GET /api.php/v1/investigations/{id}/card?format=json |
Data cards for public investigations are accessible without authentication. Cards are also bundled as DATA_CARD.md inside ISA-Tab ZIP exports.
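Because the card is YAML frontmatter followed by a Markdown body, splitting it is a one-liner. A minimal sketch assuming the standard "---" delimiters (the card content here is illustrative; a real consumer could hand the frontmatter to a YAML library):

```python
# Split a data card into YAML frontmatter and Markdown body.
card = """---
title: Mouse Transcriptomics
license: CC-BY-4.0
---
# Mouse Transcriptomics

42 samples across 2 studies...
"""

_, frontmatter, body = card.split("---\n", 2)
# Naive key: value parse; real cards may need a proper YAML parser.
meta = dict(
    line.split(": ", 1) for line in frontmatter.strip().splitlines()
)
print(meta["license"])                # 'CC-BY-4.0'
print(body.lstrip().splitlines()[0])  # '# Mouse Transcriptomics'
```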
Platform skill file (CLAUDE.md)
A platform-level skill file is available at /CLAUDE.md — a dynamically generated Markdown document describing the entire platform for AI coding assistants. It includes live statistics, data model overview, all API endpoints, authentication details, export formats, ontology coverage, and code examples. Also available as JSON via /api.php/v1/platform-card.
CSV exports
Tabular CSV exports are available for bulk data download:
- Processes — /?action=export_processes — all processes with dates, categories, procedures, and study links
- Samples — /?action=export_samples — all samples with organism, material type, descriptions, and parent process
- Investigations — /?action=export_investigations — all investigations with study counts and status
API access
The REST API provides programmatic access to all platform data. Use API keys for authenticated access. See the AI Readiness page for ML-specific guidance and code examples.
FAIR compliance
The platform implements FAIR principles throughout:
Findable
- Rich, structured metadata using the ISA framework (Investigation → Study → Assay)
- Ontology annotations from 2.9M+ OBO Foundry terms across 13 ontologies
- Full-text search across 9 entity types with typeahead suggestions
- DOI support for persistent identification
- Public discovery pages for investigations and protocols
- FAIR Score card on every investigation (0-100, per-principle breakdown)
Accessible
- Open access to public content without authentication
- REST API with API key authentication for programmatic access
- Public API with CORS support for browser-based integrations
- Share links with optional password protection and expiry for controlled access
- Visibility controls: private, group, or public per investigation
Interoperable
- ISA-JSON and ISA-Tab export (international standards for life science metadata)
- ML-Ready JSON/CSV export for direct machine learning consumption
- CWL workflow definitions (portable across workflow engines)
- RO-Crate packaging (W3C PROV-based research objects)
- PROV-O provenance export (W3C standard, JSON-LD)
- Standard bioinformatics file formats (FASTQ, BAM, VCF, GFF3, BED)
- Ontology-annotated metadata using OBI, EFO, UBERON, NCBITaxon, and more
Reusable
- Complete sample provenance chains via process_input_samples lineage tracking
- Protocol versioning with changelogs and community forking
- Per-entity licensing metadata (CC-BY-4.0, CC0, etc.)
- Containerised CWL workflows with pinned tool versions
- AI-Ready Score card measuring ML consumability across 8 dimensions
FAIR Score
Every investigation displays a FAIR Score card in the sidebar, scoring 0-100 across four principles: Findable, Accessible, Interoperable, and Reusable. The score is calculated from:
- Findable — title, description length, DOI, contacts with ORCID, studies, samples, publications
- Accessible — visibility setting, submission/release dates, API availability
- Interoperable — studies, processes, protocols, ontology annotations, study factors, files
- Reusable — license, samples, processes, protocols, contacts, description, analysis runs, publications
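Illustrative only: a per-principle checklist can roll up into a 0-100 score as a simple completeness ratio. The checks and equal weighting below are our assumptions; this page does not document the platform's actual scoring rules:

```python
def principle_score(checks):
    """checks: dict of criterion -> bool. Returns a 0-100 score."""
    return round(100 * sum(checks.values()) / len(checks))

# Hypothetical Findable checklist for one investigation.
findable = {
    "has_title": True,
    "has_description": True,
    "has_doi": False,
    "contacts_with_orcid": True,
}
print(principle_score(findable))  # 75
```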
Actionable suggestions are shown below the score bars to help you improve. See also the AI Readiness page for the companion AI-Ready Score.