Libre Biotech

Data Export & FAIR

Export your data in standard formats and understand how the platform supports FAIR principles.

Export formats

Libre Biotech supports multiple standard export formats to ensure your data is portable and interoperable:

| Format | What it includes | Use case |
| --- | --- | --- |
| ISA-JSON | Full investigation metadata: studies, processes, samples, annotations, protocol references | Machine-readable exchange with ISA-compatible tools |
| ISA-Tab | Tab-separated investigation, study, and assay files in a ZIP archive | Submission to repositories (ENA, ArrayExpress, MetaboLights) |
| ML-Ready JSON | Flat samples × features matrix with ontology CURIEs, factor values, measurements, and provenance summary | Direct loading into pandas, scikit-learn, PyTorch datasets |
| ML-Ready CSV | Same matrix as ML-Ready JSON but as comma-separated values with a header row | Spreadsheet analysis, R/tidyverse workflows, quick inspection |
| RO-Crate | Research Object package with CWL workflow definitions, container images, inputs, outputs, and provenance | Reproducible analysis packaging and re-execution |
| PROV-O | W3C provenance ontology export as JSON-LD with full activity/entity/agent graphs | Knowledge graphs, provenance-aware ML, audit trails |
| Data Card | YAML frontmatter (machine-readable) + Markdown body; aggregates metadata, scores, ontology stats, provenance, and code examples | Dataset documentation (Hugging Face-style), AI assistant context |
| CSV | Tabular data (processes, samples, investigations) | Spreadsheet analysis, custom processing |

Exporting an investigation

From any investigation page, the toolbar provides export buttons:

  1. Navigate to the investigation page
  2. Click ISA-Tab to download a ZIP of tab-separated files
  3. Click ISA-JSON to download the full ISA-JSON metadata
  4. Click ML-Ready to download a flat CSV matrix for machine learning. Use the dropdown arrow to choose JSON or ZIP (with README) instead.
  5. Click Data Card to download a Markdown data card (YAML frontmatter + body). Use the dropdown arrow for JSON format.

Exports include all studies, processes, samples, and ontology annotations within the investigation. File attachments are referenced by path but not included in the metadata export. ISA-Tab ZIP archives also include a README.md and DATA_CARD.md describing the contents and investigation metadata.
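
To get a feel for what the ISA-Tab archive contains, the sketch below groups ZIP members by role. It assumes the archive follows the usual ISA-Tab naming convention (i_*.txt for the investigation file, s_*.txt for studies, a_*.txt for assays); the exact file names Libre Biotech emits are not documented here, so the helper `classify_isatab_members` and the member names in the demo archive are illustrative only.

```python
import io
import zipfile
from pathlib import PurePosixPath


def classify_isatab_members(zip_bytes):
    """Group the members of an ISA-Tab ZIP by their role in the ISA model."""
    groups = {"investigation": [], "study": [], "assay": [], "docs": [], "other": []}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if name.endswith("/"):
                continue  # skip directory entries
            base = PurePosixPath(name).name
            if base.startswith("i_") and base.endswith(".txt"):
                groups["investigation"].append(name)
            elif base.startswith("s_") and base.endswith(".txt"):
                groups["study"].append(name)
            elif base.startswith("a_") and base.endswith(".txt"):
                groups["assay"].append(name)
            elif base in ("README.md", "DATA_CARD.md"):
                groups["docs"].append(name)
            else:
                groups["other"].append(name)
    return groups


# Build a tiny stand-in archive (hypothetical member names) so the
# sketch runs without network access.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as demo:
    for member in ("i_investigation.txt", "s_study.txt", "a_assay.txt",
                   "README.md", "DATA_CARD.md"):
        demo.writestr(member, "")

groups = classify_isatab_members(buffer.getvalue())
print(groups)
```

With a real download, pass the response body of the ISA-Tab export to `classify_isatab_members` instead of the demo buffer.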

ML-Ready export

The ML-Ready export flattens all investigation data into a single samples × features matrix designed for direct consumption by ML pipelines.

What's included

  • Sample identifiers — ID and label for each sample
  • Ontology annotations — Each annotation slot (e.g. organism, tissue, anatomy) becomes a column, prefixed with annotation:
  • Study factors — Each factor (e.g. genotype, treatment) becomes a column, prefixed with factor:
  • Quantitative measurements — Each measurement type (e.g. RQN, concentration) becomes a numeric column, prefixed with measurement:
  • Provenance summary — Per-sample lineage depth, root sample, and process chain
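
Because each column group carries a fixed prefix, separating annotations, factors, and measurements is a matter of string handling. The following is a minimal sketch; `group_columns` is a hypothetical helper, and columns without one of the three documented prefixes are collected under "unprefixed".

```python
from collections import defaultdict

# Prefixes described in the list above; identifier columns carry no prefix.
PREFIXES = ("annotation", "factor", "measurement")


def group_columns(column_names):
    """Bucket ML-Ready column names by prefix and strip the prefix."""
    groups = defaultdict(list)
    for name in column_names:
        prefix, sep, rest = name.partition(":")
        if sep and prefix in PREFIXES:
            groups[prefix].append(rest)
        else:
            groups["unprefixed"].append(name)
    return dict(groups)


columns = [
    "sample_id", "sample_label",
    "annotation:organism", "annotation:anatomy",
    "factor:genotype", "measurement:RQN",
]
grouped = group_columns(columns)
print(grouped)
```

A typical use is to treat `grouped["measurement"]` as numeric features and `grouped["factor"]` as candidate labels or covariates.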

JSON structure

{
  "metadata": {
    "investigation_id": 3,
    "investigation_title": "Mouse Transcriptomics",
    "export_date": "2026-03-14",
    "sample_count": 42,
    "feature_count": 8,
    "ontology_prefix_map": {
      "NCBITaxon": "http://purl.obolibrary.org/obo/NCBITaxon_",
      "UBERON": "http://purl.obolibrary.org/obo/UBERON_"
    }
  },
  "columns": [
    {"name": "sample_id", "type": "identifier"},
    {"name": "sample_label", "type": "string"},
    {"name": "annotation:organism", "type": "ontology", "ontology_curie": "NCBITaxon:10090"},
    {"name": "annotation:anatomy", "type": "ontology", "ontology_curie": "UBERON:0000955"},
    {"name": "factor:genotype", "type": "categorical"},
    {"name": "measurement:RQN", "type": "numeric", "unit": "UO:0000186"}
  ],
  "rows": [
    [1, "SAMPLE-001", "Mus musculus", "brain", "wild-type", 8.7],
    [2, "SAMPLE-002", "Mus musculus", "liver", "knockout", 7.2]
  ],
  "provenance_summary": {
    "1": {"depth": 3, "process_chain": ["extraction", "sample_prep", "sequencing"]},
    "2": {"depth": 2, "process_chain": ["extraction", "sequencing"]}
  }
}
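
The structure above can be turned into one dictionary per sample with the standard library alone. This sketch uses a trimmed copy of the example payload and assumes, based on that example, that the keys of provenance_summary are sample_id values serialised as strings; `to_records` is a hypothetical helper.

```python
import json

# Trimmed copy of the example payload shown above.
export = json.loads("""
{
  "columns": [
    {"name": "sample_id", "type": "identifier"},
    {"name": "sample_label", "type": "string"},
    {"name": "factor:genotype", "type": "categorical"},
    {"name": "measurement:RQN", "type": "numeric", "unit": "UO:0000186"}
  ],
  "rows": [
    [1, "SAMPLE-001", "wild-type", 8.7],
    [2, "SAMPLE-002", "knockout", 7.2]
  ],
  "provenance_summary": {
    "1": {"depth": 3, "process_chain": ["extraction", "sample_prep", "sequencing"]},
    "2": {"depth": 2, "process_chain": ["extraction", "sequencing"]}
  }
}
""")


def to_records(export):
    """Zip each row with the column names and attach its provenance entry."""
    names = [col["name"] for col in export["columns"]]
    provenance = export.get("provenance_summary", {})
    records = []
    for row in export["rows"]:
        record = dict(zip(names, row))
        # JSON object keys are strings, so cast the integer sample_id
        # (assumption: the summary is keyed by sample_id).
        record["provenance"] = provenance.get(str(record["sample_id"]))
        records.append(record)
    return records


records = to_records(export)
print(records[0])
```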

CSV format

The CSV export uses the same column names as the JSON columns array. The first row is the header, and null values are represented as empty strings.

sample_id,sample_label,annotation:organism,annotation:anatomy,factor:genotype,measurement:RQN
1,SAMPLE-001,Mus musculus,brain,wild-type,8.7
2,SAMPLE-002,Mus musculus,liver,knockout,7.2
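
Since nulls arrive as empty strings, a reader that does not use pandas has to convert them explicitly. The sketch below parses the CSV with the standard library, maps empty strings to None, and casts measurement: columns to float. The sample text is the example above with the second RQN value blanked to show the null case; `parse_ml_csv` is a hypothetical helper.

```python
import csv
import io

CSV_TEXT = """\
sample_id,sample_label,annotation:organism,annotation:anatomy,factor:genotype,measurement:RQN
1,SAMPLE-001,Mus musculus,brain,wild-type,8.7
2,SAMPLE-002,Mus musculus,liver,knockout,
"""


def parse_ml_csv(text):
    """Parse ML-Ready CSV text into dictionaries with typed values."""
    parsed = []
    for raw in csv.DictReader(io.StringIO(text)):
        row = {}
        for name, value in raw.items():
            if value == "":
                row[name] = None  # empty string means null in this export
            elif name.startswith("measurement:"):
                row[name] = float(value)
            elif name == "sample_id":
                row[name] = int(value)
            else:
                row[name] = value
        parsed.append(row)
    return parsed


rows = parse_ml_csv(CSV_TEXT)
print(rows[1])
```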

Column types

| Type | Description | Example |
| --- | --- | --- |
| identifier | Unique integer ID | sample_id |
| string | Free text | sample_label |
| ontology | Value from an ontology term, with CURIE in column metadata | annotation:organism |
| categorical | Discrete category value (from study factors) | factor:genotype |
| numeric | Numeric measurement value, with optional unit CURIE | measurement:RQN |
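
The declared column types can drive feature encoding directly. The following sketch is one possible approach rather than anything the platform prescribes: it one-hot encodes ontology and categorical columns, passes numeric columns through as floats, and drops identifier and string columns. `encode_features` is a hypothetical helper.

```python
def encode_features(columns, rows):
    """Return (feature_names, matrix) using the declared column types."""
    # First pass: collect the observed categories for each discrete column.
    categories = {}
    for idx, col in enumerate(columns):
        if col["type"] in ("categorical", "ontology"):
            categories[idx] = sorted({row[idx] for row in rows if row[idx] is not None})

    feature_names = []
    for idx, col in enumerate(columns):
        if col["type"] == "numeric":
            feature_names.append(col["name"])
        elif idx in categories:
            feature_names.extend(f"{col['name']}={value}" for value in categories[idx])

    matrix = []
    for row in rows:
        encoded = []
        for idx, col in enumerate(columns):
            if col["type"] == "numeric":
                encoded.append(float("nan") if row[idx] is None else float(row[idx]))
            elif idx in categories:
                encoded.extend(1.0 if row[idx] == value else 0.0
                               for value in categories[idx])
        matrix.append(encoded)
    return feature_names, matrix


columns = [
    {"name": "sample_id", "type": "identifier"},
    {"name": "sample_label", "type": "string"},
    {"name": "annotation:organism", "type": "ontology"},
    {"name": "annotation:anatomy", "type": "ontology"},
    {"name": "factor:genotype", "type": "categorical"},
    {"name": "measurement:RQN", "type": "numeric"},
]
rows = [
    [1, "SAMPLE-001", "Mus musculus", "brain", "wild-type", 8.7],
    [2, "SAMPLE-002", "Mus musculus", "liver", "knockout", 7.2],
]
feature_names, matrix = encode_features(columns, rows)
print(feature_names)
print(matrix)
```

Missing numeric values become NaN here so that a downstream imputer can handle them; whether that is the right policy depends on the model.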

ZIP bundle

The format=zip option bundles the CSV data file with a README.md that documents the columns and their types and includes loading examples. The result is a single ZIP archive, which is ideal for sharing self-documenting datasets.

# Download ZIP bundle
curl -H "X-API-Key: YOUR_KEY" -o ml_export.zip \
  "https://librebiotech.org/api.php/v1/investigations/3/ml-export?format=zip"

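A downloaded bundle can be read in memory without unpacking it to disk. The sketch below assumes only what is stated above (one CSV file plus a README.md); the member names inside the archive are not documented, so it looks them up by extension, and the demo archive uses the hypothetical name data.csv. `read_bundle` is likewise a hypothetical helper.

```python
import csv
import io
import zipfile


def read_bundle(zip_bytes):
    """Return (readme_text, csv_rows) from an ML-Ready ZIP bundle."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        names = archive.namelist()
        # Member names are an assumption: take the first .csv and any README.md.
        csv_name = next(n for n in names if n.lower().endswith(".csv"))
        readme_name = next((n for n in names if n.lower().endswith("readme.md")), None)
        readme = archive.read(readme_name).decode("utf-8") if readme_name else ""
        with archive.open(csv_name) as handle:
            text = io.TextIOWrapper(handle, encoding="utf-8", newline="")
            rows = list(csv.DictReader(text))
    return readme, rows


# Stand-in archive so the sketch runs offline.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as demo:
    demo.writestr("README.md", "# ML-Ready export\n")
    demo.writestr("data.csv", "sample_id,sample_label,measurement:RQN\n1,SAMPLE-001,8.7\n")

readme, rows = read_bundle(buffer.getvalue())
print(readme.strip())
print(rows)
```

Values come back as strings from csv.DictReader, so cast measurement columns before numeric use.
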
Accessing via API

# JSON (default)
curl -H "X-API-Key: YOUR_KEY" \
  "https://librebiotech.org/api.php/v1/investigations/3/ml-export"

# CSV
curl -H "X-API-Key: YOUR_KEY" \
  "https://librebiotech.org/api.php/v1/investigations/3/ml-export?format=csv"

# ZIP (data + README)
curl -H "X-API-Key: YOUR_KEY" -o ml_export.zip \
  "https://librebiotech.org/api.php/v1/investigations/3/ml-export?format=zip"

Loading into Python

from io import StringIO

import pandas as pd
import requests

API = "https://librebiotech.org/api.php/v1"
headers = {"X-API-Key": "YOUR_KEY"}

# Option 1: JSON → DataFrame
resp = requests.get(f"{API}/investigations/3/ml-export", headers=headers, timeout=30)
resp.raise_for_status()  # fail early on 4xx/5xx instead of parsing an error body
data = resp.json()
df = pd.DataFrame(data["rows"], columns=[c["name"] for c in data["columns"]])

# Option 2: CSV → DataFrame (simpler)
resp = requests.get(
    f"{API}/investigations/3/ml-export",
    params={"format": "csv"},
    headers=headers,
    timeout=30,
)
resp.raise_for_status()
df = pd.read_csv(StringIO(resp.text))

Exporting analysis runs

Each completed analysis run can be downloaded as an RO-Crate or as a ZIP archive containing all output files. From the analysis run page:

  • Download All — ZIP of all output files
  • View RO-Crate — Inspect the JSON-LD metadata describing the workflow, inputs, outputs, container images, and parameters
  • Individual files — Download specific outputs

RO-Crate metadata follows the RO-Crate 1.1 specification and includes:

  • CWL workflow definition reference
  • Container image (Apptainer/Singularity) used for execution
  • All input parameters as JSON
  • Output file manifests with checksums
  • W3C PROV-O activity records
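
For orientation, this is roughly how such a crate can be inspected in code. In RO-Crate 1.1 the metadata descriptor has the @id ro-crate-metadata.json and points to the root dataset through its about property. The sample document, including the file names workflow.cwl and outputs/counts.tsv, is invented for illustration and is not actual platform output; `root_dataset` and `parts_of_type` are hypothetical helpers.

```python
import json

# Hand-written example crate, not real platform output.
crate = json.loads("""
{
  "@context": "https://w3id.org/ro/crate/1.1/context",
  "@graph": [
    {"@id": "ro-crate-metadata.json", "@type": "CreativeWork",
     "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
     "about": {"@id": "./"}},
    {"@id": "./", "@type": "Dataset",
     "hasPart": [{"@id": "workflow.cwl"}, {"@id": "outputs/counts.tsv"}]},
    {"@id": "workflow.cwl",
     "@type": ["File", "SoftwareSourceCode", "ComputationalWorkflow"]},
    {"@id": "outputs/counts.tsv", "@type": "File"}
  ]
}
""")


def root_dataset(crate):
    """Follow the metadata descriptor's 'about' link to the root dataset."""
    by_id = {entity["@id"]: entity for entity in crate["@graph"]}
    descriptor = by_id["ro-crate-metadata.json"]
    return by_id[descriptor["about"]["@id"]]


def parts_of_type(crate, wanted):
    """List the @id of every hasPart entry whose @type includes `wanted`."""
    by_id = {entity["@id"]: entity for entity in crate["@graph"]}
    matches = []
    for ref in root_dataset(crate).get("hasPart", []):
        types = by_id.get(ref["@id"], {}).get("@type", [])
        if isinstance(types, str):
            types = [types]  # @type may be a single string or a list
        if wanted in types:
            matches.append(ref["@id"])
    return matches


workflows = parts_of_type(crate, "ComputationalWorkflow")
files = parts_of_type(crate, "File")
print(workflows, files)
```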

PROV-O provenance export

The platform exports W3C PROV-O provenance as JSON-LD, available for investigations and individual samples:

# Investigation-level provenance
GET /api.php/v1/investigations/{id}/prov-o

# Sample-level provenance
GET /api.php/v1/samples/{id}/prov-o

The PROV-O export includes prov:Activity (processes), prov:Entity (samples, files), and prov:Agent (people) nodes with prov:wasGeneratedBy, prov:used, and prov:wasAssociatedWith relationships.
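
These relationships are enough to reconstruct a sample's lineage: follow prov:wasGeneratedBy from an entity to its activity, then prov:used from that activity to its inputs, and repeat. The sketch below assumes a flat list of JSON-LD nodes with compact prov: keys and made-up identifiers such as sample:3; the real export may use a different context or nesting, so treat the key names as assumptions.

```python
def as_ids(value):
    """Normalise a JSON-LD property value to a list of @id strings."""
    if value is None:
        return []
    items = value if isinstance(value, list) else [value]
    return [item["@id"] if isinstance(item, dict) else item for item in items]


def upstream_entities(graph, entity_id):
    """Collect every entity reachable via wasGeneratedBy followed by used."""
    by_id = {node["@id"]: node for node in graph}
    seen, stack = [], [entity_id]
    while stack:
        current = stack.pop()
        for activity_id in as_ids(by_id.get(current, {}).get("prov:wasGeneratedBy")):
            for source_id in as_ids(by_id.get(activity_id, {}).get("prov:used")):
                if source_id not in seen:
                    seen.append(source_id)
                    stack.append(source_id)
    return seen


# Hypothetical graph: sample:1 -> process:10 -> sample:2 -> process:11 -> sample:3
graph = [
    {"@id": "sample:1", "@type": "prov:Entity"},
    {"@id": "process:10", "@type": "prov:Activity", "prov:used": {"@id": "sample:1"}},
    {"@id": "sample:2", "@type": "prov:Entity",
     "prov:wasGeneratedBy": {"@id": "process:10"}},
    {"@id": "process:11", "@type": "prov:Activity", "prov:used": [{"@id": "sample:2"}]},
    {"@id": "sample:3", "@type": "prov:Entity",
     "prov:wasGeneratedBy": {"@id": "process:11"}},
]

lineage = upstream_entities(graph, "sample:3")
print(lineage)
```

The `seen` list doubles as a cycle guard, so malformed graphs with loops still terminate.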

Data Cards

Every investigation can generate a data card — a self-describing document inspired by Hugging Face dataset cards. Cards combine YAML frontmatter (machine-readable) with a Markdown body (human-readable) in a single portable file.

What's included

  • Identity — title, DOI, license, visibility, status
  • Contacts — names, ORCID identifiers, affiliations, roles
  • Scale — study, sample, process, and publication counts
  • Biology — organisms and material types
  • Ontology — sources used and annotation coverage percentage
  • Provenance — max lineage depth and process categories
  • Scores — FAIR score (per-principle) and AI-Ready score
  • API endpoints — direct links to all available exports
  • Code examples — Python and curl snippets for quick access

Accessing data cards

| Method | How |
| --- | --- |
| Web UI | Click the Data Card button on any investigation page. Use the dropdown for JSON format. |
| REST API (Markdown) | GET /api.php/v1/investigations/{id}/card |
| REST API (JSON) | GET /api.php/v1/investigations/{id}/card?format=json |

Data cards for public investigations are accessible without authentication. Cards are also bundled as DATA_CARD.md inside ISA-Tab ZIP exports.
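
Since a card is YAML frontmatter followed by Markdown, the two parts can be separated on the --- delimiters. The sketch below uses only the standard library and parses top-level key: value scalars; a real consumer would more likely hand the frontmatter to a YAML parser. The field names in the sample card (title, license) are assumptions based on the "Identity" items listed earlier, and both helpers are hypothetical.

```python
def split_frontmatter(card_text):
    """Split a data card into (frontmatter_text, markdown_body)."""
    lines = card_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return "", card_text
    for index in range(1, len(lines)):
        if lines[index].strip() == "---":
            frontmatter = "\n".join(lines[1:index])
            body = "\n".join(lines[index + 1:]).lstrip("\n")
            return frontmatter, body
    return "", card_text  # no closing delimiter: treat everything as body


def parse_flat_yaml(frontmatter):
    """Parse top-level 'key: value' scalars only; nested YAML is skipped."""
    result = {}
    for line in frontmatter.splitlines():
        if not line or line[0] in " \t#-" or ":" not in line:
            continue
        key, _, value = line.partition(":")
        value = value.strip()
        if value:
            result[key.strip()] = value.strip("'\"")
    return result


# Hand-written sample card with assumed field names.
card = """---
title: Mouse Transcriptomics
license: CC-BY-4.0
---
# Mouse Transcriptomics

Body text.
"""

frontmatter, body = split_frontmatter(card)
meta = parse_flat_yaml(frontmatter)
print(meta)
```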

Platform skill file (CLAUDE.md)

A platform-level skill file is available at /CLAUDE.md. It is a dynamically generated Markdown document that describes the entire platform for AI coding assistants, including live statistics, a data model overview, all API endpoints, authentication details, export formats, ontology coverage, and code examples. It is also available as JSON at /api.php/v1/platform-card.

CSV exports

Tabular CSV exports are available for bulk data download:

  • Processes (/?action=export_processes): all processes with dates, categories, procedures, and study links
  • Samples (/?action=export_samples): all samples with organism, material type, descriptions, and parent process
  • Investigations (/?action=export_investigations): all investigations with study counts and status

API access

The REST API provides programmatic access to all platform data. Use API keys for authenticated access. See the AI Readiness page for ML-specific guidance and code examples.

FAIR compliance

The platform implements FAIR principles throughout:

Findable

  • Rich, structured metadata using the ISA framework (Investigation → Study → Assay)
  • Ontology annotations from 2.9M+ OBO Foundry terms across 13 ontologies
  • Full-text search across 9 entity types with typeahead suggestions
  • DOI support for persistent identification
  • Public discovery pages for investigations and protocols
  • FAIR Score card on every investigation (0-100, per-principle breakdown)

Accessible

  • Open access to public content without authentication
  • REST API with API key authentication for programmatic access
  • Public API with CORS support for browser-based integrations
  • Share links with optional password protection and expiry for controlled access
  • Visibility controls: private, group, or public per investigation

Interoperable

  • ISA-JSON and ISA-Tab export (international standards for life science metadata)
  • ML-Ready JSON/CSV export for direct machine learning consumption
  • CWL workflow definitions (portable across workflow engines)
  • RO-Crate packaging (JSON-LD research objects that include W3C PROV-O activity records)
  • PROV-O provenance export (W3C standard, JSON-LD)
  • Standard bioinformatics file formats (FASTQ, BAM, VCF, GFF3, BED)
  • Ontology-annotated metadata using OBI, EFO, UBERON, NCBITaxon, and more

Reusable

  • Complete sample provenance chains via process_input_samples lineage tracking
  • Protocol versioning with changelogs and community forking
  • Per-entity licensing metadata (CC-BY-4.0, CC0, etc.)
  • Containerised CWL workflows with pinned tool versions
  • AI-Ready Score card measuring ML consumability across 8 dimensions

FAIR Score

Every investigation displays a FAIR Score card in the sidebar, scoring 0-100 across four principles: Findable, Accessible, Interoperable, and Reusable. The score is calculated from:

  • Findable — title, description length, DOI, contacts with ORCID, studies, samples, publications
  • Accessible — visibility setting, submission/release dates, API availability
  • Interoperable — studies, processes, protocols, ontology annotations, study factors, files
  • Reusable — license, samples, processes, protocols, contacts, description, analysis runs, publications

Actionable suggestions are shown below the score bars to help you improve. See also the AI Readiness page for the companion AI-Ready Score.

No lock-in guarantee: Every piece of data you create on Libre Biotech can be exported in standard, open formats. We use no proprietary schemas. See the Data Sovereignty Statement for the full commitment.