IsoSeq Clustering (Refine + Cluster)

Clustering: DBA combined (I-K2 + K7-50pM + K7)

Type

CWL

Status

succeeded

Engine

cwltool

Duration

0.3 h

Source Data

Study

Strain-specific cortex gene expression and isoform usage

Pipeline

PacBio CCS (Subreads → HiFi)

IsoSeq Clustering (Refine + Cluster)

Run #85 (this run)

succeeded 3 sources

IsoSeq Annotation (Map + Collapse + SQANTI3)

Run #95

succeeded 1 source

Functional Annotation (TransDecoder + Pfam + SwissProt)

Run #105

succeeded 1 source

Combined From

#75 — PacBio CCS (Subreads → HiFi) succeeded
#77 — PacBio CCS (Subreads → HiFi) succeeded
#78 — PacBio CCS (Subreads → HiFi) succeeded

Workflow

IsoSeq Clustering (Refine + Cluster)

#cwl

Software Tools

Tool	Version	URL
cwltool	-	https://github.com/common-workflow-language/cwltool

Results Summary

Input CCS Reads

460,795

FLNC Reads

450,020

Mean FLNC Length

0 nt

HQ Isoforms

LQ Isoforms

Clustering Ratio

0.0

FLNC / HQ isoforms

Mean FL Support

8.7

reads per isoform

Total FL Reads

341,779
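
The ratios in this summary can be recomputed from the raw counts. The sketch below is a minimal Python illustration, assuming the definition given by the label above (clustering ratio = FLNC reads / HQ isoforms). The helper `safe_ratio` and the "FLNC yield" metric are hypothetical and are not taken from summary_extractor.py. A missing or zero denominator gives `None` instead of a misleading 0.0.

```python
from typing import Optional


def safe_ratio(numerator: float, denominator: Optional[float]) -> Optional[float]:
    """Return numerator / denominator, or None when the denominator is missing or zero."""
    if not denominator:
        return None
    return numerator / denominator


# Counts copied from the Results Summary on this page.
input_ccs_reads = 460_795
flnc_reads = 450_020
hq_isoforms = None  # no value is reported on this page

# Hypothetical extra metric: fraction of input CCS reads retained as FLNC.
flnc_yield = safe_ratio(flnc_reads, input_ccs_reads)
# Clustering ratio as labelled on the page: FLNC reads per HQ isoform.
clustering_ratio = safe_ratio(flnc_reads, hq_isoforms)

print(f"FLNC yield: {flnc_yield:.1%}")
print(f"Clustering ratio: {clustering_ratio}")
```

With the counts above, the FLNC yield comes out at about 97.7%, and the clustering ratio stays undefined until an HQ isoform count is available.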

Output Files

File	Storage	Size
clustered.cluster	HPC	22.1 MB
clustered.cluster_report.csv	HPC	17.5 MB
clustered.hq.bam	HPC	33.3 MB
clustered.hq.bam.pbi	HPC	219.8 KB
clustered.lq.bam	HPC	8.2 KB
clustered.lq.bam.pbi	HPC	124 B
demux_primers.lima.summary	HPC	917 B
flnc.bam	HPC	897.5 MB
flnc.bam.pbi	HPC	4.6 MB
flnc.filter_summary.report.json	HPC	819 B
job.yml	HPC	270 B
results_summary.json	HPC	277 B

Provenance

Execution	Expression quantification summary
Completed	2026-03-04T13:11:26+00:00

RO-Crate 1.1 Workflow RO-Crate 1.0 FAIR

This analysis is packaged as a Research Object Crate with machine-readable provenance and FAIR metadata.
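
Because the provenance is plain JSON-LD, it can be read with a standard JSON parser. The following is a minimal sketch, assuming only the `@graph` layout visible in the metadata on this page. The excerpt is trimmed for illustration, and the helper names `has_type` and `index_by_id` are hypothetical rather than part of any RO-Crate library.

```python
import json

# Trimmed excerpt of an RO-Crate @graph, modelled on the metadata on this page.
CRATE_EXCERPT = """
{
  "@context": "https://w3id.org/ro/crate/1.1/context",
  "@graph": [
    {"@id": "./", "@type": "Dataset",
     "hasPart": [{"@id": "job.yml"}, {"@id": "flnc.bam"}]},
    {"@id": "job.yml", "@type": "File", "contentSize": "270 B"},
    {"@id": "flnc.bam", "@type": "File", "contentSize": "897.5 MB"},
    {"@id": "#execution", "@type": "CreateAction",
     "object": [{"@id": "job.yml"}], "result": [{"@id": "flnc.bam"}]}
  ]
}
"""


def has_type(entity: dict, type_name: str) -> bool:
    """True if the entity's @type (a string or a list of strings) includes type_name."""
    types = entity.get("@type", [])
    if isinstance(types, str):
        types = [types]
    return type_name in types


def index_by_id(crate: dict) -> dict:
    """Map each entity's @id to the entity itself for quick lookup."""
    return {entity["@id"]: entity for entity in crate["@graph"]}


crate = json.loads(CRATE_EXCERPT)
entities = index_by_id(crate)

# All data files described in the crate.
file_ids = [eid for eid, entity in entities.items() if has_type(entity, "File")]

# Inputs and outputs recorded for the execution step.
action = entities["#execution"]
inputs = [ref["@id"] for ref in action["object"]]
outputs = [ref["@id"] for ref in action["result"]]

print("files:", file_ids)
print("inputs:", inputs, "outputs:", outputs)
```

Indexing by `@id` first keeps later lookups simple, since every cross-reference in the graph is an `{"@id": ...}` pointer.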

RO-Crate Metadata (JSON-LD)

{
    "@context": "https://w3id.org/ro/crate/1.1/context",
    "@graph": [
        {
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "about": {
                "@id": "./"
            },
            "conformsTo": [
                {
                    "@id": "https://w3id.org/ro/crate/1.1"
                },
                {
                    "@id": "https://w3id.org/workflowhub/workflow-ro-crate/1.0"
                }
            ]
        },
        {
            "@id": "./",
            "@type": "Dataset",
            "name": "IsoSeq Clustering (Refine + Cluster) \u2014 Run #85",
            "description": "Generic IsoSeq3 clustering pipeline. Merges demultiplexed BAMs, runs primer removal + polyA filtering (refine), then clusters into HQ/LQ isoform consensus sequences. Compatible with Sequel I CCS (use_qvs=false) and Sequel II/IIe HiFi data.",
            "datePublished": "2026-03-04",
            "license": {
                "@id": "https://creativecommons.org/licenses/by/4.0/"
            },
            "mainEntity": {
                "@id": "isoseq_clustering.cwl"
            },
            "hasPart": [
                {
                    "@id": "isoseq_clustering.cwl"
                },
                {
                    "@id": "job.yml"
                },
                {
                    "@id": "clustered.hq.bam"
                },
                {
                    "@id": "clustered.lq.bam"
                },
                {
                    "@id": "clustered.cluster_report.csv"
                },
                {
                    "@id": "clustered.cluster"
                },
                {
                    "@id": "flnc.filter_summary.report.json"
                },
                {
                    "@id": "flnc.bam"
                },
                {
                    "@id": "clustered.hq.bam.pbi"
                },
                {
                    "@id": "demux_primers.lima.summary"
                },
                {
                    "@id": "flnc.bam.pbi"
                },
                {
                    "@id": "clustered.lq.bam.pbi"
                },
                {
                    "@id": "results_summary.json"
                },
                {
                    "@id": "summary_extractor.py"
                }
            ],
            "mentions": [
                {
                    "@id": "#execution"
                },
                {
                    "@id": "#summary-extraction"
                }
            ]
        },
        {
            "@id": "isoseq_clustering.cwl",
            "@type": [
                "File",
                "SoftwareSourceCode",
                "ComputationalWorkflow"
            ],
            "name": "IsoSeq Clustering (Refine + Cluster)",
            "description": "Generic IsoSeq3 clustering pipeline. Merges demultiplexed BAMs, runs primer removal + polyA filtering (refine), then clusters into HQ/LQ isoform consensus sequences. Compatible with Sequel I CCS (use_qvs=false) and Sequel II/IIe HiFi data.",
            "programmingLanguage": {
                "@id": "#cwl"
            },
            "contentSize": "2.9 KB",
            "sha256": "3cd8cfcc8caaf0fb4a964a16fc02edffbddb048e0c46b8c633c8fd3abf7efa08"
        },
        {
            "@id": "#cwl",
            "@type": "ComputerLanguage",
            "name": "Common Workflow Language",
            "url": {
                "@id": "https://www.commonwl.org/"
            },
            "version": "1.2"
        },
        {
            "@id": "#cwltool",
            "@type": "SoftwareApplication",
            "name": "cwltool",
            "url": {
                "@id": "https://github.com/common-workflow-language/cwltool"
            }
        },
        {
            "@id": "job.yml",
            "@type": "File",
            "name": "job.yml",
            "description": "CWL job input parameters",
            "encodingFormat": "text/yaml",
            "contentSize": "270 B",
            "sha256": "76d7a21ed9690097f98d6ada53112b95e72e78d016a2c57c0fb458a731b581be"
        },
        {
            "@id": "clustered.hq.bam",
            "@type": "File",
            "name": "clustered.hq.bam",
            "encodingFormat": "application/octet-stream",
            "contentSize": "33.3 MB",
            "sha256": "7ee7cc20937e273e37b5aa44855bb7f23557c5b7a660ed79362d5e5cd50e0ac6"
        },
        {
            "@id": "clustered.lq.bam",
            "@type": "File",
            "name": "clustered.lq.bam",
            "encodingFormat": "application/octet-stream",
            "contentSize": "8.2 KB",
            "sha256": "56a50ab2669f55a34a50db5d74928ce43bc3a3abe12514ec985295832dd3523a"
        },
        {
            "@id": "clustered.cluster_report.csv",
            "@type": "File",
            "name": "clustered.cluster_report.csv",
            "encodingFormat": "text/csv",
            "contentSize": "17.5 MB",
            "sha256": "6f97108c6a9c4045a00cddba14dd0136124916af8ac73bccc702217dcad5be54"
        },
        {
            "@id": "clustered.cluster",
            "@type": "File",
            "name": "clustered.cluster",
            "encodingFormat": "application/octet-stream",
            "contentSize": "22.1 MB",
            "sha256": "ab7c5fbe9d97cbcbcbacfade8bd48767c4dda8273da291a8c158ece9d0e6f7a2"
        },
        {
            "@id": "flnc.filter_summary.report.json",
            "@type": "File",
            "name": "flnc.filter_summary.report.json",
            "encodingFormat": "application/json",
            "contentSize": "819 B",
            "sha256": "465b493d20a02ac18c6266579fceea255a97a7495260206a3cf9eb45b65eb26e"
        },
        {
            "@id": "flnc.bam",
            "@type": "File",
            "name": "flnc.bam",
            "encodingFormat": "application/octet-stream",
            "contentSize": "897.5 MB",
            "sha256": "e00a2da203cd1ac92403887f07a79c70ae89fa760db5ba47f0564e355d1073d9"
        },
        {
            "@id": "clustered.hq.bam.pbi",
            "@type": "File",
            "name": "clustered.hq.bam.pbi",
            "encodingFormat": "application/octet-stream",
            "contentSize": "219.8 KB",
            "sha256": "0b224f3460d06513700ae4c08137ea38eff470c575c660e9d7c3d8c29e11d99e"
        },
        {
            "@id": "demux_primers.lima.summary",
            "@type": "File",
            "name": "demux_primers.lima.summary",
            "encodingFormat": "application/octet-stream",
            "contentSize": "917 B",
            "sha256": "fec501dbf398fb0f0e13818694f214688160faabcd47785db3482942178feb2e"
        },
        {
            "@id": "flnc.bam.pbi",
            "@type": "File",
            "name": "flnc.bam.pbi",
            "encodingFormat": "application/octet-stream",
            "contentSize": "4.6 MB",
            "sha256": "796a2aa5e781f908d15a9313a7f31c716496050ea5819852f7e8b6bf381d8e7e"
        },
        {
            "@id": "clustered.lq.bam.pbi",
            "@type": "File",
            "name": "clustered.lq.bam.pbi",
            "encodingFormat": "application/octet-stream",
            "contentSize": "124 B",
            "sha256": "6cd816389dd231416287fd5fdb8e335fea361dea355dc4e4509c1440cc21ae8a"
        },
        {
            "@id": "#execution",
            "@type": "CreateAction",
            "name": "IsoSeq Clustering (Refine + Cluster) execution",
            "instrument": {
                "@id": "isoseq_clustering.cwl"
            },
            "startTime": "2026-03-04T12:55:21+00:00",
            "endTime": "2026-03-04T13:11:12+00:00",
            "object": [
                {
                    "@id": "job.yml"
                }
            ],
            "result": [
                {
                    "@id": "clustered.hq.bam"
                },
                {
                    "@id": "clustered.lq.bam"
                },
                {
                    "@id": "clustered.cluster_report.csv"
                },
                {
                    "@id": "clustered.cluster"
                },
                {
                    "@id": "flnc.filter_summary.report.json"
                },
                {
                    "@id": "flnc.bam"
                },
                {
                    "@id": "clustered.hq.bam.pbi"
                },
                {
                    "@id": "demux_primers.lima.summary"
                },
                {
                    "@id": "flnc.bam.pbi"
                },
                {
                    "@id": "clustered.lq.bam.pbi"
                }
            ]
        },
        {
            "@id": "results_summary.json",
            "@type": "File",
            "name": "results_summary.json",
            "description": "Derived summary statistics from pipeline outputs (CPM >= 1, uniquely mapped reads)",
            "encodingFormat": "application/json",
            "contentSize": "277 B",
            "sha256": "fbc23310264bc9dc097281e23d5b1f68ed2a385831d5580b77ecbfc1a8a2f29b"
        },
        {
            "@id": "summary_extractor.py",
            "@type": [
                "File",
                "SoftwareSourceCode"
            ],
            "name": "Summary extraction script",
            "description": "Python script that computed results_summary.json from pipeline outputs",
            "programmingLanguage": {
                "@id": "#python3"
            }
        },
        {
            "@id": "#python3",
            "@type": "ComputerLanguage",
            "name": "Python",
            "url": {
                "@id": "https://www.python.org/"
            },
            "version": "3"
        },
        {
            "@id": "#summary-extraction",
            "@type": "CreateAction",
            "name": "Expression quantification summary",
            "instrument": {
                "@id": "summary_extractor.py"
            },
            "endTime": "2026-03-04T13:11:26+00:00",
            "object": [
                {
                    "@id": "OUT.read_assignments.tsv.gz"
                },
                {
                    "@id": "OUT.gene_counts.tsv"
                },
                {
                    "@id": "OUT.transcript_counts.tsv"
                },
                {
                    "@id": "OUT.extended_annotation.gtf"
                },
                {
                    "@id": "OUT.transcript_models.gtf"
                }
            ],
            "result": [
                {
                    "@id": "results_summary.json"
                }
            ]
        }
    ]
}
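
The `sha256` and timestamp fields in the metadata lend themselves to simple consistency checks after download. The sketch below is a hedged illustration, assuming entities shaped like the `File` and `CreateAction` entries above. The function names are hypothetical, and the demonstration runs on a throwaway file rather than on the real outputs.

```python
import hashlib
import tempfile
from datetime import datetime
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large BAM files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(entity: dict, crate_dir: Path) -> bool:
    """Compare the recorded sha256 of a File entity with the file on disk."""
    return sha256_of(crate_dir / entity["@id"]) == entity["sha256"]


def times_are_ordered(action: dict) -> bool:
    """Check that an action's startTime does not fall after its endTime."""
    start = datetime.fromisoformat(action["startTime"])
    end = datetime.fromisoformat(action["endTime"])
    return start <= end


# Demonstration on a temporary file standing in for a crate member.
with tempfile.TemporaryDirectory() as tmp:
    crate_dir = Path(tmp)
    (crate_dir / "demo.txt").write_bytes(b"hello\n")
    good = {"@id": "demo.txt", "sha256": hashlib.sha256(b"hello\n").hexdigest()}
    bad = {"@id": "demo.txt", "sha256": "0" * 64}
    demo_ok = checksum_matches(good, crate_dir)
    demo_bad = checksum_matches(bad, crate_dir)

print("checksum ok:", demo_ok, "| tampered detected:", not demo_bad)
```

Running `times_are_ordered` over every `CreateAction` in a crate is a cheap way to catch swapped or mistyped timestamps before the provenance is reused.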