IsoSeq Clustering (Refine + Cluster)

Clustering: C57 combined (I-K1 + K1-50pM)

Type

CWL

Status

succeeded

Engine

cwltool

Duration

0.2 h

Source Data

Study

Strain-specific cortex gene expression and isoform usage

Pipeline

PacBio CCS (Subreads → HiFi)

IsoSeq Clustering (Refine + Cluster)

Run #84 (this run)

succeeded 2 sources

IsoSeq Annotation (Map + Collapse + SQANTI3)

Run #94

succeeded 1 sources

Functional Annotation (TransDecoder + Pfam + SwissProt)

Run #104

succeeded 1 sources

Combined From

#74 — PacBio CCS (Subreads → HiFi) succeeded
#76 — PacBio CCS (Subreads → HiFi) succeeded

Workflow

IsoSeq Clustering (Refine + Cluster)

#cwl

Software Tools

Tool	Version	URL
cwltool	-	https://github.com/common-workflow-language/cwltool

Results Summary

Input CCS Reads

254,780

FLNC Reads

249,064

Mean FLNC Length

0 nt

HQ Isoforms

LQ Isoforms

Clustering Ratio

0.0

FLNC / HQ isoforms

Mean FL Support

7.6

reads per isoform

Total FL Reads

180,022
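
The counts above allow two quick sanity checks. The sketch below is illustrative only: the function names are hypothetical and not part of the pipeline, and it assumes that "Mean FL Support" is total FL reads divided by the number of clustered isoforms. Because 7.6 is a rounded figure, the implied isoform count is an approximation, not a reported result.

```python
def flnc_yield(ccs_reads: int, flnc_reads: int) -> float:
    """Fraction of input CCS reads retained as FLNC reads after refine."""
    if ccs_reads <= 0:
        raise ValueError("ccs_reads must be positive")
    return flnc_reads / ccs_reads


def implied_isoforms(total_fl_reads: int, mean_fl_support: float) -> float:
    """Isoform count implied by total FL reads / mean FL reads per isoform.

    Assumption: mean FL support is defined as this ratio.
    """
    if mean_fl_support <= 0:
        raise ValueError("mean_fl_support must be positive")
    return total_fl_reads / mean_fl_support


# Values copied from the Results Summary of this run.
ccs_reads = 254_780
flnc_reads = 249_064
total_fl_reads = 180_022
mean_fl_support = 7.6  # rounded on the page, so the estimate is approximate

print(f"FLNC yield: {flnc_yield(ccs_reads, flnc_reads):.1%}")
print(f"Implied isoform count: ~{implied_isoforms(total_fl_reads, mean_fl_support):,.0f}")
```

A yield close to 1 suggests that few reads were lost to primer removal and polyA filtering. The implied isoform count is only a cross-check against the HQ and LQ isoform fields, which are blank in this summary.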

Output Files

File	Location	Size
clustered.cluster	HPC	11.7 MB
clustered.cluster_report.csv	HPC	9.2 MB
clustered.hq.bam	HPC	20.4 MB
clustered.hq.bam.pbi	HPC	133.3 KB
clustered.lq.bam	HPC	2.9 KB
clustered.lq.bam.pbi	HPC	86 B
demux_primers.lima.summary	HPC	916 B
flnc.bam	HPC	579.6 MB
flnc.bam.pbi	HPC	2.6 MB
flnc.filter_summary.report.json	HPC	819 B
job.yml	HPC	270 B
results_summary.json	HPC	277 B

Provenance

Execution	Expression quantification summary
Completed	2026-03-04T13:07:15+00:00

RO-Crate 1.1, Workflow RO-Crate 1.0, FAIR

This analysis is packaged as a Research Object Crate with machine-readable provenance and FAIR metadata.
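
As one illustration of what "machine-readable provenance" allows, the sketch below reads the `CreateAction` entities from a crate's `@graph` and reports the elapsed time of each. It is a minimal example using only the Python standard library. The function names are hypothetical, and it assumes the timestamps are ISO 8601 strings with a UTC offset, as they are in this crate.

```python
import json
from datetime import datetime
from typing import Optional


def load_graph(path: str) -> list:
    """Return the @graph array of an RO-Crate metadata file."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)["@graph"]


def has_type(entity: dict, wanted: str) -> bool:
    """True if the entity's @type (a string or a list) includes `wanted`."""
    types = entity.get("@type", [])
    if isinstance(types, str):
        types = [types]
    return wanted in types


def action_hours(action: dict) -> Optional[float]:
    """Elapsed hours for an action, or None if either timestamp is missing."""
    start, end = action.get("startTime"), action.get("endTime")
    if start is None or end is None:
        return None
    delta = datetime.fromisoformat(end) - datetime.fromisoformat(start)
    return delta.total_seconds() / 3600


def summarize_actions(graph: list) -> list:
    """List (action @id, elapsed hours) for every CreateAction in the graph."""
    return [(e["@id"], action_hours(e)) for e in graph if has_type(e, "CreateAction")]
```

Calling `summarize_actions(load_graph("ro-crate-metadata.json"))` returns one pair per action. A negative value would mean that the start and end times were recorded out of order, and the result for the execution action can be compared with the Duration field shown at the top of the page.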

RO-Crate Metadata (JSON-LD)

{
    "@context": "https://w3id.org/ro/crate/1.1/context",
    "@graph": [
        {
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "about": {
                "@id": "./"
            },
            "conformsTo": [
                {
                    "@id": "https://w3id.org/ro/crate/1.1"
                },
                {
                    "@id": "https://w3id.org/workflowhub/workflow-ro-crate/1.0"
                }
            ]
        },
        {
            "@id": "./",
            "@type": "Dataset",
            "name": "IsoSeq Clustering (Refine + Cluster) \u2014 Run #84",
            "description": "Generic IsoSeq3 clustering pipeline. Merges demultiplexed BAMs, runs primer removal + polyA filtering (refine), then clusters into HQ/LQ isoform consensus sequences. Compatible with Sequel I CCS (use_qvs=false) and Sequel II/IIe HiFi data.",
            "datePublished": "2026-03-04",
            "license": {
                "@id": "https://creativecommons.org/licenses/by/4.0/"
            },
            "mainEntity": {
                "@id": "isoseq_clustering.cwl"
            },
            "hasPart": [
                {
                    "@id": "isoseq_clustering.cwl"
                },
                {
                    "@id": "job.yml"
                },
                {
                    "@id": "clustered.hq.bam"
                },
                {
                    "@id": "clustered.lq.bam"
                },
                {
                    "@id": "clustered.cluster_report.csv"
                },
                {
                    "@id": "clustered.cluster"
                },
                {
                    "@id": "flnc.filter_summary.report.json"
                },
                {
                    "@id": "flnc.bam"
                },
                {
                    "@id": "clustered.hq.bam.pbi"
                },
                {
                    "@id": "demux_primers.lima.summary"
                },
                {
                    "@id": "flnc.bam.pbi"
                },
                {
                    "@id": "clustered.lq.bam.pbi"
                },
                {
                    "@id": "results_summary.json"
                },
                {
                    "@id": "summary_extractor.py"
                }
            ],
            "mentions": [
                {
                    "@id": "#execution"
                },
                {
                    "@id": "#summary-extraction"
                }
            ]
        },
        {
            "@id": "isoseq_clustering.cwl",
            "@type": [
                "File",
                "SoftwareSourceCode",
                "ComputationalWorkflow"
            ],
            "name": "IsoSeq Clustering (Refine + Cluster)",
            "description": "Generic IsoSeq3 clustering pipeline. Merges demultiplexed BAMs, runs primer removal + polyA filtering (refine), then clusters into HQ/LQ isoform consensus sequences. Compatible with Sequel I CCS (use_qvs=false) and Sequel II/IIe HiFi data.",
            "programmingLanguage": {
                "@id": "#cwl"
            },
            "contentSize": "2.9 KB",
            "sha256": "3cd8cfcc8caaf0fb4a964a16fc02edffbddb048e0c46b8c633c8fd3abf7efa08"
        },
        {
            "@id": "#cwl",
            "@type": "ComputerLanguage",
            "name": "Common Workflow Language",
            "url": {
                "@id": "https://www.commonwl.org/"
            },
            "version": "1.2"
        },
        {
            "@id": "#cwltool",
            "@type": "SoftwareApplication",
            "name": "cwltool",
            "url": {
                "@id": "https://github.com/common-workflow-language/cwltool"
            }
        },
        {
            "@id": "job.yml",
            "@type": "File",
            "name": "job.yml",
            "description": "CWL job input parameters",
            "encodingFormat": "text/yaml",
            "contentSize": "270 B",
            "sha256": "fb7d76309a26f25c4a336c6ef4173992c70ce163f172c6c1185d35c0d87bf3f5"
        },
        {
            "@id": "clustered.hq.bam",
            "@type": "File",
            "name": "clustered.hq.bam",
            "encodingFormat": "application/octet-stream",
            "contentSize": "20.4 MB",
            "sha256": "2e20d860a7346db96360e8976b920205e339514de41d7d5fa11e7c140d673402"
        },
        {
            "@id": "clustered.lq.bam",
            "@type": "File",
            "name": "clustered.lq.bam",
            "encodingFormat": "application/octet-stream",
            "contentSize": "2.9 KB",
            "sha256": "030ef91820616d84e1cfda68fc46216f9043cdbe639057d01510e2a80d0428d0"
        },
        {
            "@id": "clustered.cluster_report.csv",
            "@type": "File",
            "name": "clustered.cluster_report.csv",
            "encodingFormat": "text/csv",
            "contentSize": "9.2 MB",
            "sha256": "98b2ea2cbdf7fd47094d284a937c2ec2885306c9a103951d3dd2b5cd02ee7bd6"
        },
        {
            "@id": "clustered.cluster",
            "@type": "File",
            "name": "clustered.cluster",
            "encodingFormat": "application/octet-stream",
            "contentSize": "11.7 MB",
            "sha256": "1640947064426c921892d98c6fabb765645a0b17b2d94e90f0d59c5973a32886"
        },
        {
            "@id": "flnc.filter_summary.report.json",
            "@type": "File",
            "name": "flnc.filter_summary.report.json",
            "encodingFormat": "application/json",
            "contentSize": "819 B",
            "sha256": "3771897fd1b6a17f06ff94ea3a57460e2efef1ee3a315bb23c361098c2a1fcfa"
        },
        {
            "@id": "flnc.bam",
            "@type": "File",
            "name": "flnc.bam",
            "encodingFormat": "application/octet-stream",
            "contentSize": "579.6 MB",
            "sha256": "6355e45f0d14c72ee359c2472fd7d82810fccdbf1b8dca1bece264cf0612868f"
        },
        {
            "@id": "clustered.hq.bam.pbi",
            "@type": "File",
            "name": "clustered.hq.bam.pbi",
            "encodingFormat": "application/octet-stream",
            "contentSize": "133.3 KB",
            "sha256": "bec323d159c55f6f6a237c290632961f4705661679b76059d5a1f390f86ca994"
        },
        {
            "@id": "demux_primers.lima.summary",
            "@type": "File",
            "name": "demux_primers.lima.summary",
            "encodingFormat": "application/octet-stream",
            "contentSize": "916 B",
            "sha256": "ef490092846b14b7dd5c228eb9163285bdb83595c08459cfc2cad25504649d73"
        },
        {
            "@id": "flnc.bam.pbi",
            "@type": "File",
            "name": "flnc.bam.pbi",
            "encodingFormat": "application/octet-stream",
            "contentSize": "2.6 MB",
            "sha256": "3ad7f2b3420c15ae17215de36760fe77e3754faa4cc7cc46d2fc60106a59d824"
        },
        {
            "@id": "clustered.lq.bam.pbi",
            "@type": "File",
            "name": "clustered.lq.bam.pbi",
            "encodingFormat": "application/octet-stream",
            "contentSize": "86 B",
            "sha256": "3603584e2f4b63b2da68d4c723ee2680a17025bf5668069e2d66fe5697b69cf0"
        },
        {
            "@id": "#execution",
            "@type": "CreateAction",
            "name": "IsoSeq Clustering (Refine + Cluster) execution",
            "instrument": {
                "@id": "isoseq_clustering.cwl"
            },
            "startTime": "2026-03-04T12:55:11+00:00",
            "endTime": "2026-03-04T13:07:00+00:00",
            "object": [
                {
                    "@id": "job.yml"
                }
            ],
            "result": [
                {
                    "@id": "clustered.hq.bam"
                },
                {
                    "@id": "clustered.lq.bam"
                },
                {
                    "@id": "clustered.cluster_report.csv"
                },
                {
                    "@id": "clustered.cluster"
                },
                {
                    "@id": "flnc.filter_summary.report.json"
                },
                {
                    "@id": "flnc.bam"
                },
                {
                    "@id": "clustered.hq.bam.pbi"
                },
                {
                    "@id": "demux_primers.lima.summary"
                },
                {
                    "@id": "flnc.bam.pbi"
                },
                {
                    "@id": "clustered.lq.bam.pbi"
                }
            ]
        },
        {
            "@id": "results_summary.json",
            "@type": "File",
            "name": "results_summary.json",
            "description": "Derived summary statistics from pipeline outputs (CPM >= 1, uniquely mapped reads)",
            "encodingFormat": "application/json",
            "contentSize": "277 B",
            "sha256": "8477917e7ef4c3a03e80d7085ac75daff4829587b00748edbf6eebb5b280f14e"
        },
        {
            "@id": "summary_extractor.py",
            "@type": [
                "File",
                "SoftwareSourceCode"
            ],
            "name": "Summary extraction script",
            "description": "Python script that computed results_summary.json from pipeline outputs",
            "programmingLanguage": {
                "@id": "#python3"
            }
        },
        {
            "@id": "#python3",
            "@type": "ComputerLanguage",
            "name": "Python",
            "url": {
                "@id": "https://www.python.org/"
            },
            "version": "3"
        },
        {
            "@id": "#summary-extraction",
            "@type": "CreateAction",
            "name": "Expression quantification summary",
            "instrument": {
                "@id": "summary_extractor.py"
            },
            "endTime": "2026-03-04T13:07:15+00:00",
            "object": [
                {
                    "@id": "OUT.read_assignments.tsv.gz"
                },
                {
                    "@id": "OUT.gene_counts.tsv"
                },
                {
                    "@id": "OUT.transcript_counts.tsv"
                },
                {
                    "@id": "OUT.extended_annotation.gtf"
                },
                {
                    "@id": "OUT.transcript_models.gtf"
                }
            ],
            "result": [
                {
                    "@id": "results_summary.json"
                }
            ]
        }
    ]
}
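
The `sha256` values recorded for the `File` entities above make it possible to check a downloaded copy of the crate for corruption. The following is a minimal sketch, not the platform's own tooling: `verify_crate` and `sha256_of` are hypothetical names, and it assumes the crate has been unpacked so that each `@id` is a path relative to the directory holding `ro-crate-metadata.json`. Entities without a `sha256` key (such as `summary_extractor.py` in this crate) are skipped.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 of a file, read in chunks so large BAMs fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_crate(crate_dir: Path) -> dict:
    """Map each File entity that has a sha256 to 'ok', 'mismatch', or 'missing'."""
    metadata_path = crate_dir / "ro-crate-metadata.json"
    metadata = json.loads(metadata_path.read_text(encoding="utf-8"))
    report = {}
    for entity in metadata["@graph"]:
        types = entity.get("@type", [])
        if isinstance(types, str):
            types = [types]
        expected = entity.get("sha256")
        if "File" not in types or expected is None:
            continue  # not a file, or no checksum recorded
        target = crate_dir / entity["@id"]
        if not target.is_file():
            report[entity["@id"]] = "missing"
        elif sha256_of(target) == expected.lower():
            report[entity["@id"]] = "ok"
        else:
            report[entity["@id"]] = "mismatch"
    return report
```

For example, `verify_crate(Path("run_84"))` would return a dictionary keyed by file name, where `run_84` is a placeholder for wherever the ZIP was extracted. Reading in fixed-size chunks is a deliberate choice here, since `flnc.bam` alone is listed at 579.6 MB.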