ISA Semantic Mapping
How Libre Biotech’s data model corresponds to the ISA framework (Investigation → Study → Assay). Read this page if you are importing or exporting ISA-Tab / ISA-JSON, auditing metadata, or integrating with another ISA-aware tool.
At-a-glance summary
| Platform concept | ISA concept | Relationship |
|---|---|---|
| investigations | Investigation | Clean 1-to-1 mapping |
| studies | Study | Clean 1-to-1 mapping |
| experiments | — | Platform-local grouping under a Study; no ISA counterpart, ignored at export |
| processes | Process Node (row in a Study or Assay Table) | Grain matches; one row per execution |
| procedure_versions | Protocol (row in STUDY PROTOCOLS) | The declaration; Protocol REF in the Tables resolves back to this |
| assays (the table) | Not ISA “Assay” | Naming collision — see below. ISA Assays are derived at export time |
| samples with material_type | Source Name / Sample Name / Extract Name / Labeled Extract Name | Stored as one table with an enum; exported into ISA’s separate columns |
| measurements | Raw Data File / Derived Data File column value | Inline scalars in platform; emitted as scalar + optional file-ref in ISA |
| study_factors | Study Factors / Factor Value[name] columns | Clean 1-to-1 mapping |
| annotations (slot = characteristic) | Characteristics[name] columns | Polymorphic in the platform; fully ISA-compliant on output |
| analysis_run | — | Orthogonal to ISA core; exported separately as PROV-O artefacts |
The assays naming collision
The most load-bearing mapping on this page: the platform table called assays is not what ISA calls an Assay.
| | Platform assays | ISA Assay |
|---|---|---|
| Grain | One row per (process_id, sample_id, replicate_number) — a single measurement event | A declaration (measurement_type, technology_type) applied at the Study level |
| Cardinality in a Study | Thousands of rows (one per measurement event) | A handful (one per measurement-and-technology pair the Study uses) |
| Files emitted | — | One a_*.txt Assay Table file per declaration, with many rows inside |
The exporter reconciles this by deriving ISA Assays at export time: it takes the DISTINCT (measurement_type_term_id, technology_type_term_id) tuples across the procedure_versions that a Study’s processes reference, and emits one ISA Assay Table file per unique tuple. The platform’s assays rows then populate the rows inside each file.
Because ISA Assay classification is a property of the procedure’s design rather than of each individual measurement, it is declared on the procedure_versions row (measurement_type_term_id and technology_type_term_id) and projected onto exports; you never maintain ISA Assay declarations as a separate entity.
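The derivation described above can be sketched in a few lines of Python. This is a minimal illustration, not the exporter's actual code: the `ProcedureVersion` tuple, the `derive_isa_assays` function, and the use of CURIE strings as term ids are assumptions made for the example; only the two column names come from the platform schema. Collecting unclassified procedure versions under the key `None` is also an assumption about how the undeclared case could be modelled.

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class ProcedureVersion(NamedTuple):
    """Hypothetical stand-in for a procedure_versions row."""
    id: int
    measurement_type_term_id: Optional[str]
    technology_type_term_id: Optional[str]


def derive_isa_assays(processes, procedure_versions):
    """Group process ids by the classification of their procedure version.

    processes: iterable of (process_id, procedure_version_id) pairs.
    procedure_versions: dict mapping id -> ProcedureVersion.
    Returns a dict keyed by (measurement_type, technology_type); process ids
    whose procedure version lacks either term are collected under None.
    """
    groups = defaultdict(list)
    for process_id, pv_id in processes:
        pv = procedure_versions[pv_id]
        if pv.measurement_type_term_id and pv.technology_type_term_id:
            key = (pv.measurement_type_term_id, pv.technology_type_term_id)
        else:
            key = None  # assumed representation of "no ISA classification"
        groups[key].append(process_id)
    return dict(groups)


# Example: three classified executions share one tuple, one is unclassified.
versions = {
    1: ProcedureVersion(1, "OBI:0002767", "OBI:0000695"),
    2: ProcedureVersion(2, "OBI:0002767", "OBI:0000695"),
    3: ProcedureVersion(3, None, None),
}
assays = derive_isa_assays([(10, 1), (11, 1), (12, 2), (13, 3)], versions)
```

Each key in the result would correspond to one Assay Table file, and the process ids under it to the rows inside that file.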
Processes, protocols, and Protocol REF
A Libre Biotech process row is the closest equivalent of an ISA Process Node: a single execution of a declared procedure. The procedure version that was followed (the ISA Protocol) is linked via processes.procedure_version_id.
| | Platform | ISA |
|---|---|---|
| Protocol declaration | procedure_versions row | Row in the Investigation’s STUDY PROTOCOLS section |
| Execution record | processes row | Row in a Study or Assay Table, with a Protocol REF column |
| Protocol REF value on output | — | {procedure.title} v{version_number} |
Protocol REF resolves to one unique label per procedure_version.id. Twenty runs of the same procedure version in a Study produce twenty execution rows with identical Protocol REF values, and the STUDY PROTOCOLS section of the Investigation declares that protocol once, not twenty times.
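The relationship between execution rows and protocol declarations can be illustrated with a short Python sketch. The label format comes from the table above; the function names and the sample title are hypothetical, and the sketch does not claim to reflect how the exporter is implemented.

```python
def protocol_ref(title: str, version_number: int) -> str:
    """Build the Protocol REF label in the '{procedure.title} v{version_number}' form."""
    return f"{title} v{version_number}"


def protocol_refs_and_declarations(executions):
    """Return (one Protocol REF per execution, de-duplicated declarations).

    executions: iterable of (procedure_title, version_number) pairs,
    one pair per process row.
    """
    refs = [protocol_ref(title, version) for title, version in executions]
    # dict.fromkeys keeps first-seen order while dropping duplicates.
    declarations = list(dict.fromkeys(refs))
    return refs, declarations


# Twenty runs of the same (hypothetical) procedure version.
runs = [("DNA extraction", 2)] * 20
refs, declarations = protocol_refs_and_declarations(runs)
```

Here `refs` holds twenty identical labels, one per execution row, while `declarations` holds the single entry that would appear in STUDY PROTOCOLS.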
Samples and material types
ISA splits materials across four distinct columns in a Study Table — Source Name, Sample Name, Extract Name, Labeled Extract Name — with lineage implied by column adjacency. Libre Biotech stores them instead in a single samples table with a material_type enum (source_material, sample, extract, labeled_extract, and a few more).
The exporter walks the process_input_samples lineage graph at output time to emit ISA’s separate columns in the correct order. The storage-side conflation is purely an ergonomic choice; the exported ISA-Tab is structurally correct.
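As a rough illustration of the lineage walk, the sketch below follows parent links from a leaf sample back to its source and lays the names out in ISA column order. Several simplifications are assumed: `parent_of` is a hypothetical one-parent-per-sample mapping standing in for the process_input_samples graph (real lineage may branch or pool), the sample names are invented, and using material_type only to label each column is a guess at how the exporter reconciles the two models.

```python
# Column labels for the four material types named in the text.
ISA_COLUMN = {
    "source_material": "Source Name",
    "sample": "Sample Name",
    "extract": "Extract Name",
    "labeled_extract": "Labeled Extract Name",
}


def lineage_row(leaf_id, samples, parent_of):
    """Walk from a leaf sample to its root and return ISA columns in lineage order.

    samples: dict id -> (name, material_type).
    parent_of: dict child_id -> parent_id (simplified single-parent lineage).
    """
    chain = []
    seen = set()
    current = leaf_id
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle in sample lineage at id {current}")
        seen.add(current)
        chain.append(current)
        current = parent_of.get(current)
    chain.reverse()  # root first, leaf last

    row = {}
    for sample_id in chain:
        name, material_type = samples[sample_id]
        column = ISA_COLUMN.get(material_type)
        if column is not None:  # other material types are skipped in this sketch
            row[column] = name
    return row


samples = {
    1: ("plant-01", "source_material"),
    2: ("leaf-01", "sample"),
    3: ("dna-01", "extract"),
}
row = lineage_row(3, samples, parent_of={2: 1, 3: 2})
```

Because the dictionary preserves insertion order, the columns come out root-first, which matches the adjacency-based lineage that ISA-Tab expects.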
Measurements and data files
ISA-Tab typically expects measurement values as references to files (Raw Data File, Derived Data File columns). Libre Biotech uses a hybrid model: scalar values are stored inline in measurements.value (with a unit_term_id), alongside an optional source_file_id when the value is backed by an instrument output file.
The exporter emits both: the scalar goes into the Assay Table cell, and the file name (if any) goes into the Raw Data File column. This remains a valid ISA-Tab shape — downstream readers accustomed to file-only inputs may find it unusual, but the structure does not violate the specification.
Factors and characteristics
Factors map cleanly: each row in study_factors becomes a Factor Value[name] column on downstream Assay Tables, with CURIE-based term annotation on the factor type itself. No divergence.
Characteristics are stored polymorphically as rows in the annotations table (entity_type = 'sample', slot = 'characteristic') rather than as a dedicated table. On export, they emit as standard ISA Characteristics[name] columns, with term/value/unit slots preserving the full semantic annotation.
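The pivot from annotation rows to Characteristics columns might look like the following sketch. The entity_type and slot values come from the description above; the `entity_id`, `name`, and `value` keys are hypothetical field names chosen for the example, and the term and unit slots are left out for brevity.

```python
def characteristics_columns(annotations, sample_id):
    """Pivot a sample's characteristic annotations into ISA-style columns.

    annotations: iterable of dicts with (assumed) keys
    entity_type, entity_id, slot, name, value.
    """
    row = {}
    for annotation in annotations:
        if (
            annotation["entity_type"] == "sample"
            and annotation["slot"] == "characteristic"
            and annotation["entity_id"] == sample_id
        ):
            row[f"Characteristics[{annotation['name']}]"] = annotation["value"]
    return row


annotations = [
    {"entity_type": "sample", "entity_id": 7, "slot": "characteristic",
     "name": "organism", "value": "Thunnus thynnus"},
    {"entity_type": "sample", "entity_id": 7, "slot": "comment",
     "name": "note", "value": "ignored in this sketch"},
    {"entity_type": "sample", "entity_id": 8, "slot": "characteristic",
     "name": "organism", "value": "Salmo salar"},
]
columns = characteristics_columns(annotations, sample_id=7)
```

Only rows matching both the entity type and the characteristic slot contribute a column; everything else in the polymorphic table is ignored.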
What is not an ISA concept
- experiments: a platform-local grouping under a Study (for everyday operational organisation). It has no ISA counterpart and is ignored at export time; processes are walked directly under their owning Study.
- analysis_run: the platform’s compute-side record of a pipeline execution. Analysis runs sit outside the Investigation / Study / Assay hierarchy and are exported separately as PROV-O artefacts via the /prov-o endpoints. ISA-Tab exports do not include them.
What the exporter produces
When you export a Study, the ISA-Tab output has these properties:
- One Assay Table file per (measurement_type, technology_type) tuple used across the Study’s processes. Filename pattern: a_{study_slug}_{mt_curie}_{tt_curie}.txt (e.g. a_SushiTruthPilot_OBI_0002767_OBI_0000695.txt).
- Procedure versions with no ISA classification (sample-prep procedures, or measurement procedures whose classification has not been set) flow into a single a_{study_slug}_undeclared.txt bucket file. The exporter emits a warning naming each contributing procedure version so the omission is visible, not silent.
- Protocol REF values on Assay Table rows resolve to {procedure.title} v{version_number}, matching the STUDY PROTOCOLS declaration byte-for-byte.
- Parameter Value columns are the union of parameter definitions across every procedure version contributing to a given Assay Table, keyed by parameter name. A row whose procedure version doesn’t declare a particular parameter leaves that cell blank, so every row in an Assay Table shares the same column schema.
- Factor Value columns emit correctly for every factor declared in the Study, with CURIE-based term annotations preserved.
- Source / Sample / Extract columns in the Study Table are derived from sample lineage, not from the storage-side material_type enum directly.
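Two of the properties above, the filename pattern and the Parameter Value union, can be sketched in Python. The function names are hypothetical, replacing the colon in a CURIE with an underscore is inferred from the example filename rather than documented, and ordering columns by first appearance is an assumption.

```python
def _curie_to_token(curie: str) -> str:
    """Make a CURIE filename-safe, e.g. 'OBI:0002767' -> 'OBI_0002767' (inferred rule)."""
    return curie.replace(":", "_")


def assay_filename(study_slug, mt_curie=None, tt_curie=None):
    """Return the Assay Table filename, or the undeclared bucket when unclassified."""
    if not mt_curie or not tt_curie:
        return f"a_{study_slug}_undeclared.txt"
    return f"a_{study_slug}_{_curie_to_token(mt_curie)}_{_curie_to_token(tt_curie)}.txt"


def parameter_value_table(rows):
    """Build a shared Parameter Value header and blank-padded rows.

    rows: list of dicts mapping parameter name -> value, one per Assay Table row.
    """
    names = []
    for params in rows:
        for name in params:
            if name not in names:  # union of names, first-seen order
                names.append(name)
    header = [f"Parameter Value[{name}]" for name in names]
    body = [[str(params.get(name, "")) for name in names] for params in rows]
    return header, body


filename = assay_filename("SushiTruthPilot", "OBI:0002767", "OBI:0000695")
header, body = parameter_value_table([
    {"temperature": 37, "duration": "1 h"},
    {"temperature": 4},
    {"volume": "50 uL"},
])
```

Every row in `body` has the same length as `header`, with empty strings where a procedure version does not declare the parameter.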
Known gaps — fields not yet in ISA export
Some platform-side fields exist in the data model but are not yet surfaced in ISA-Tab or ISA-JSON exports. They remain visible in the web UI and REST API; only the canonical export is pending.
- Sample submitter attribution (samples.submitter_person_id): the person who generated or submitted each sample. Captured on the platform side (see Samples → Submitter attribution) but not yet emitted as a Comment[submitter] column in the Sample block of the ISA-Tab Study Table. Deferred to a follow-up release; if you need submitter info in downstream pipelines today, read it directly from the REST API (GET /api.php/v1/samples/{id} includes a nested submitter object).
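Until the export catches up, the REST workaround could be scripted roughly as follows. This sketch uses only the Python standard library; the base URL is a placeholder, authentication is not handled, and it assumes the `submitter` object sits at the top level of the JSON response, which may not match the real payload.

```python
import json
from urllib.request import urlopen


def sample_url(base_url: str, sample_id: int) -> str:
    """Build the sample endpoint URL from a deployment's base URL."""
    return f"{base_url.rstrip('/')}/api.php/v1/samples/{sample_id}"


def extract_submitter(payload: dict):
    """Return the nested submitter object, or None when it is absent."""
    return payload.get("submitter")


def fetch_submitter(base_url: str, sample_id: int):
    """Fetch one sample and return its submitter (performs a network request)."""
    with urlopen(sample_url(base_url, sample_id)) as response:
        return extract_submitter(json.load(response))
```

A call such as `fetch_submitter("https://example.org", 42)` would then return the submitter object for sample 42, where `https://example.org` stands in for your own deployment.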
See also
- Key Concepts — plain-language explanation of the ISA framework and FAIR principles.
- Data Export & FAIR — supported export formats (ISA-JSON, ISA-Tab, RO-Crate) and how to trigger them.
- Protocols → Schema reference — the procedure_versions columns that drive ISA Assay classification.
- Studies, Processes, Samples — platform-side documentation for each of the entities discussed above.