Libre Biotech

ISA Semantic Mapping

How Libre Biotech’s data model corresponds to the ISA framework (Investigation → Study → Assay). Read this page if you are importing or exporting ISA-Tab / ISA-JSON, auditing metadata, or integrating with another ISA-aware tool.

The ISA framework is a community standard for describing life-sciences experiments. Its three vocabulary words (Investigation, Study, Assay) carry precise meanings that don’t always line up 1-to-1 with everyday usage or with platform terminology. This page is the Rosetta Stone: for every platform concept that overlaps an ISA one, it states what each term means, where the two diverge, and how the exporter reconciles them on output.

At-a-glance summary

| Platform concept | ISA concept | Relationship |
|---|---|---|
| investigations | Investigation | Clean 1-to-1 mapping |
| studies | Study | Clean 1-to-1 mapping |
| experiments | | Platform-local grouping under a Study; no ISA counterpart, ignored at export |
| processes | Process Node (row in a Study or Assay Table) | Grain matches; one row per execution |
| procedure_versions | Protocol (row in STUDY PROTOCOLS) | The declaration; Protocol REF in the Tables resolves back to this |
| assays (the table) | Not ISA “Assay” | Naming collision; see below. ISA Assays are derived at export time |
| samples with material_type | Source Name / Sample Name / Extract Name / Labeled Extract Name | Stored as one table with an enum; exported into ISA’s separate columns |
| measurements | Raw Data File / Derived Data File column value | Inline scalars in platform; emitted as scalar + optional file-ref in ISA |
| study_factors | Study Factors / Factor Value[name] columns | Clean 1-to-1 mapping |
| annotations (slot = characteristic) | Characteristics[name] columns | Polymorphic in the platform; fully ISA-compliant on output |
| analysis_run | | Orthogonal to ISA core; exported separately as PROV-O artefacts |

The assays naming collision

The most load-bearing mapping on this page: the platform table called assays is not what ISA calls an Assay.

| | Platform assays | ISA Assay |
|---|---|---|
| Grain | One row per (process_id, sample_id, replicate_number): a single measurement event | A declaration (measurement_type, technology_type) applied at the Study level |
| Cardinality in a Study | Thousands of rows (one per measurement event) | A handful (one per measurement-and-technology pair the Study uses) |
| Files emitted | | One a_*.txt Assay Table file per declaration, with many rows inside |

The exporter reconciles this by deriving ISA Assays at export time: it takes the DISTINCT (measurement_type_term_id, technology_type_term_id) tuples across the procedure_versions that a Study’s processes reference, and emits one ISA Assay Table file per unique tuple. The platform’s assays rows then populate the rows inside each file.

Because ISA Assay classification is a property of the procedure’s design rather than of each individual measurement, it is declared on the procedure_versions row (measurement_type_term_id and technology_type_term_id) and projected onto exports; you never maintain ISA Assay declarations as a separate entity.
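
The derivation described above can be sketched roughly as follows. This is an illustrative in-memory version, not the exporter's actual code: the function name derive_isa_assays, the dictionary shapes, and the EX: term IDs are assumptions made for the example. Only the column names (process_id, procedure_version_id, measurement_type_term_id, technology_type_term_id) come from this page.

```python
from collections import defaultdict


def derive_isa_assays(procedure_versions, processes, assay_rows):
    """Group platform `assays` rows by their procedure version's ISA classification.

    Returns a dict keyed by (measurement_type_term_id, technology_type_term_id).
    Each key stands for one ISA Assay Table; each value holds that table's rows.
    """
    version_by_id = {pv["id"]: pv for pv in procedure_versions}
    process_by_id = {p["id"]: p for p in processes}

    tables = defaultdict(list)
    for row in assay_rows:
        process = process_by_id[row["process_id"]]
        version = version_by_id[process["procedure_version_id"]]
        key = (version["measurement_type_term_id"], version["technology_type_term_id"])
        tables[key].append(row)
    return dict(tables)


# Hypothetical data: two procedure versions, three executions, four measurement events.
procedure_versions = [
    {"id": 1, "measurement_type_term_id": "EX:0000001", "technology_type_term_id": "EX:0000010"},
    {"id": 2, "measurement_type_term_id": "EX:0000002", "technology_type_term_id": "EX:0000010"},
]
processes = [
    {"id": 10, "procedure_version_id": 1},
    {"id": 11, "procedure_version_id": 1},
    {"id": 12, "procedure_version_id": 2},
]
assay_rows = [
    {"process_id": 10, "sample_id": 100, "replicate_number": 1},
    {"process_id": 10, "sample_id": 100, "replicate_number": 2},
    {"process_id": 11, "sample_id": 101, "replicate_number": 1},
    {"process_id": 12, "sample_id": 100, "replicate_number": 1},
]

isa_assays = derive_isa_assays(procedure_versions, processes, assay_rows)
```

With this data, four measurement events collapse into two ISA Assay Tables, which mirrors the cardinality difference in the comparison table above.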

Processes, protocols, and Protocol REF

A Libre Biotech processes row is the closest equivalent of an ISA Process Node: a single execution of a declared procedure. The procedure version that was followed (the ISA Protocol) is linked via processes.procedure_version_id.

| | Platform | ISA |
|---|---|---|
| Protocol declaration | procedure_versions row | Row in the Investigation’s STUDY PROTOCOLS section |
| Execution record | processes row | Row in a Study or Assay Table, with a Protocol REF column |
| Protocol REF value on output | | {procedure.title} v{version_number} |

Protocol REF resolves to one unique label per procedure_versions.id. Twenty runs of the same procedure version in a Study produce twenty execution rows with identical Protocol REF values, and the STUDY PROTOCOLS section of the Investigation declares that protocol once, not twenty times.
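
A minimal sketch of that behaviour, assuming each procedure version is available as a dictionary that already carries its procedure's title. The helper names protocol_ref and study_protocol_labels and the sample title are hypothetical; only the label pattern {procedure.title} v{version_number} is taken from this page.

```python
def protocol_ref(title, version_number):
    """Build the Protocol REF label in the `{procedure.title} v{version_number}` shape."""
    return f"{title} v{version_number}"


def study_protocol_labels(processes, procedure_versions):
    """Return one label per referenced procedure version, in first-use order."""
    version_by_id = {pv["id"]: pv for pv in procedure_versions}
    labels = {}
    for process in processes:
        version = version_by_id[process["procedure_version_id"]]
        # setdefault keeps the first label and skips repeat executions.
        labels.setdefault(version["id"], protocol_ref(version["title"], version["version_number"]))
    return list(labels.values())


# Hypothetical data: twenty executions of the same procedure version.
procedure_versions = [{"id": 7, "title": "RNA extraction", "version_number": 3}]
processes = [{"id": n, "procedure_version_id": 7} for n in range(1, 21)]

row_refs = [protocol_ref("RNA extraction", 3) for _ in processes]
declared = study_protocol_labels(processes, procedure_versions)
```

Here row_refs holds twenty identical labels, while declared holds the single entry that would appear in STUDY PROTOCOLS.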

Samples and material types

ISA splits materials across four distinct columns in a Study Table — Source Name, Sample Name, Extract Name, Labeled Extract Name — with lineage implied by column adjacency. Libre Biotech stores them instead in a single samples table with a material_type enum (source_material, sample, extract, labeled_extract, and a few more).

The exporter walks the process_input_samples lineage graph at output time to emit ISA’s separate columns in the correct order. The storage-side conflation is purely an ergonomic choice; the exported ISA-Tab is structurally correct.
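
The idea of ordering columns by lineage rather than by the enum can be illustrated with a deliberately simplified sketch. It assumes the lineage has already been reduced to a child-to-parent map with at most one parent per sample, and that position in the chain decides the ISA column. The real process_input_samples graph may branch and its schema is not shown on this page, so the names parent_of, lineage_chain, and study_table_row are all hypothetical.

```python
ISA_COLUMNS = ["Source Name", "Sample Name", "Extract Name", "Labeled Extract Name"]


def lineage_chain(sample_id, parent_of):
    """Walk from a sample up to its root and return the chain root-first."""
    chain = []
    current = sample_id
    while current is not None:
        if current in chain:
            raise ValueError(f"cycle in lineage at sample {current}")
        chain.append(current)
        current = parent_of.get(current)
    chain.reverse()
    return chain


def study_table_row(sample_id, parent_of, names):
    """Assign each sample in the chain to an ISA column by its lineage position."""
    chain = lineage_chain(sample_id, parent_of)
    return {column: names[sid] for column, sid in zip(ISA_COLUMNS, chain)}


# Hypothetical lineage: sample 1 -> sample 2 -> sample 3.
parent_of = {3: 2, 2: 1}
names = {1: "plant-A", 2: "leaf-A1", 3: "rna-A1"}
row = study_table_row(3, parent_of, names)
```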

Measurements and data files

ISA-Tab typically expects measurement values as references to files (Raw Data File, Derived Data File columns). Libre Biotech uses a hybrid model: scalar values are stored inline in measurements.value (with a unit_term_id), alongside an optional source_file_id when the value is backed by an instrument output file.

The exporter emits both: the scalar goes into the Assay Table cell, and the file name (if any) goes into the Raw Data File column. This remains a valid ISA-Tab shape — downstream readers accustomed to file-only inputs may find it unusual, but the structure does not violate the specification.
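
As a rough sketch of the hybrid shape, the function below turns one measurements row into output cells. The "Value" and "Unit" keys, the file_names lookup, and the function name are assumptions for illustration; value, unit_term_id, source_file_id, and the Raw Data File column are the names used on this page.

```python
def measurement_cells(measurement, file_names):
    """Return the cells for one measurement: the inline scalar plus an optional file reference."""
    file_id = measurement.get("source_file_id")
    return {
        "Value": measurement["value"],
        "Unit": measurement["unit_term_id"],
        # Blank when the value is not backed by an instrument output file.
        "Raw Data File": file_names[file_id] if file_id is not None else "",
    }


# Hypothetical rows: one backed by a file, one scalar-only.
file_names = {55: "plate_reader_run_01.csv"}
with_file = measurement_cells(
    {"value": 0.42, "unit_term_id": "EX:0000100", "source_file_id": 55}, file_names
)
scalar_only = measurement_cells(
    {"value": 7.1, "unit_term_id": "EX:0000101", "source_file_id": None}, file_names
)
```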

Factors and characteristics

Factors map cleanly: each row in study_factors becomes a Factor Value[name] column on downstream Assay Tables, with CURIE-based term annotation on the factor type itself. No divergence.

Characteristics are stored polymorphically as rows in the annotations table (entity_type = 'sample', slot = 'characteristic') rather than as a dedicated table. On export, they emit as standard ISA Characteristics[name] columns, with term/value/unit slots preserving the full semantic annotation.
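
The pivot from polymorphic annotation rows to per-sample columns might look like the sketch below. The entity_id, name, and value keys are assumed field names, and the term and unit slots are left out to keep the example short; entity_type and slot are the filters named on this page.

```python
def characteristics_columns(annotations, sample_id):
    """Pivot a sample's characteristic annotations into Characteristics[name] cells."""
    columns = {}
    for annotation in annotations:
        if (
            annotation["entity_type"] == "sample"
            and annotation["slot"] == "characteristic"
            and annotation["entity_id"] == sample_id
        ):
            columns[f"Characteristics[{annotation['name']}]"] = annotation["value"]
    return columns


# Hypothetical annotation rows; the last two are skipped by the filter.
annotations = [
    {"entity_type": "sample", "slot": "characteristic", "entity_id": 1, "name": "organism", "value": "Oryza sativa"},
    {"entity_type": "sample", "slot": "characteristic", "entity_id": 1, "name": "tissue", "value": "leaf"},
    {"entity_type": "sample", "slot": "comment", "entity_id": 1, "name": "note", "value": "re-labelled"},
    {"entity_type": "process", "slot": "characteristic", "entity_id": 1, "name": "operator", "value": "n/a"},
]
cells = characteristics_columns(annotations, 1)
```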

What is not an ISA concept

  • experiments — a platform-local grouping under a Study (for everyday operational organisation). It has no ISA counterpart and is ignored at export time; processes are walked directly under their owning Study.
  • analysis_run — the platform’s compute-side record of a pipeline execution. Analysis runs sit outside the Investigation / Study / Assay hierarchy and are exported separately as PROV-O artefacts via the /prov-o endpoints. ISA-Tab exports do not include them.

What the exporter produces

When you export a Study, the ISA-Tab output has these properties:

  1. One Assay Table file per (measurement_type, technology_type) tuple used across the Study’s processes. Filename pattern: a_{study_slug}_{mt_curie}_{tt_curie}.txt (e.g. a_SushiTruthPilot_OBI_0002767_OBI_0000695.txt).
  2. Procedure versions with no ISA classification (sample-prep procedures, or measurement procedures whose classification has not been set) flow into a single a_{study_slug}_undeclared.txt bucket file. The exporter emits a warning naming each contributing procedure version so the omission is visible, not silent.
  3. Protocol REF values on Assay Table rows resolve to {procedure.title} v{version_number}, matching the STUDY PROTOCOLS declaration byte-for-byte.
  4. Parameter Value columns are the union of parameter definitions across every procedure version contributing to a given Assay Table, keyed by parameter name. A row whose procedure version doesn’t declare a particular parameter leaves that cell blank — so every row in an Assay Table shares the same column schema.
  5. Factor Value columns emit correctly for every factor declared in the Study, with CURIE-based term annotations preserved.
  6. Source / Sample / Extract columns in the Study Table are derived from sample lineage, not from the storage-side material_type enum directly.
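
Properties 1, 2, and 4 can be sketched as small helpers. This is a hedged illustration, not the exporter's code: the function names and the parameter_names key are hypothetical, and replacing the CURIE colon with an underscore is inferred from the example filename above rather than stated as a rule.

```python
def assay_filename(study_slug, mt_curie, tt_curie):
    """Build the Assay Table filename, falling back to the undeclared bucket."""
    if mt_curie is None or tt_curie is None:
        return f"a_{study_slug}_undeclared.txt"
    # Assumption: the colon in a CURIE becomes an underscore in the filename.
    mt = mt_curie.replace(":", "_")
    tt = tt_curie.replace(":", "_")
    return f"a_{study_slug}_{mt}_{tt}.txt"


def parameter_columns(procedure_versions):
    """Union of parameter names across contributing versions, in first-seen order."""
    names = []
    for version in procedure_versions:
        for name in version["parameter_names"]:
            if name not in names:
                names.append(name)
    return [f"Parameter Value[{name}]" for name in names]


def parameter_cells(columns, values):
    """One cell per column; blank where the row's procedure version lacks the parameter."""
    return [values.get(column, "") for column in columns]


# Hypothetical procedure versions contributing to the same Assay Table.
versions = [
    {"id": 1, "parameter_names": ["temperature", "duration"]},
    {"id": 2, "parameter_names": ["duration", "buffer"]},
]
columns = parameter_columns(versions)
row = parameter_cells(columns, {"Parameter Value[duration]": "30", "Parameter Value[buffer]": "PBS"})
```

Every row built this way has the same length as columns, which is what gives the Assay Table a single shared schema.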

Known gaps — fields not yet in ISA export

Some platform-side fields exist in the data model but are not yet surfaced in ISA-Tab or ISA-JSON exports. They remain visible in the web UI and REST API; only the canonical export is pending.

  • Sample submitter attribution (samples.submitter_person_id) — the person who generated or submitted each sample. Captured on the platform side (see Samples → Submitter attribution) but not yet emitted as a Comment[submitter] column in the Sample block of the ISA-Tab Study Table. Deferred to a follow-up release; if you need submitter info in downstream pipelines today, read it directly from the REST API (GET /api.php/v1/samples/{id} includes a nested submitter object).
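
A minimal sketch of that REST workaround, using only the Python standard library. The host in BASE_URL is a placeholder, the "submitter" key is assumed from the description of a nested submitter object, and authentication and any response envelope are not covered because this page does not describe them.

```python
import json
from urllib.request import urlopen

BASE_URL = "https://libre-biotech.example"  # hypothetical host; replace with your instance


def sample_url(sample_id, base_url=BASE_URL):
    """Build the sample endpoint URL from the path documented above."""
    return f"{base_url}/api.php/v1/samples/{sample_id}"


def submitter_from_payload(payload):
    """Return the nested submitter object, or None if absent (key name is assumed)."""
    return payload.get("submitter")


def fetch_submitter(sample_id, base_url=BASE_URL):
    """Fetch one sample and return its submitter; assumes an unauthenticated JSON response."""
    with urlopen(sample_url(sample_id, base_url)) as response:
        return submitter_from_payload(json.load(response))
```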

See also