Changes in version 0.99.15

Documentation

- Rewrote the README for clarity and appeal: a stronger "why karioCaS" value proposition that highlights both data-driven thresholds (optimal CS and optimal minimum reads), a corrected folder-architecture example, and a Quick Example that lists every function's full set of flags (defaults shown as commented lines).

Changes in version 0.99.14

Changes

- retrieve_selected_taxa() now writes the mosaic files into two subfolders of 004_final_mosaic/: .mpa files in mpa/ and .tsv files in tsv/. The downstream readers (taxa_resolution(), group_upset()) look in tsv/ with a fallback to the flat folder for older projects.
- Renamed the package source files to match the new step order and function names (e.g. kariocas_000_import.R -> kariocas_001_import.R, kariocas_1000_reliable_mpa.R -> kariocas_004_retrieve_selected_taxa.R); test files renamed to match.

Changes in version 0.99.13

Changes

- Output directories reorganized and renamed to follow the analysis logic (harmonize -> quantify the optima -> decide -> describe -> interpret), with a consistent naming style: 001_imported_matrix, 002_taxa_retention, 003_reads_saturation, 004_final_mosaic, 005_taxa_intersections_across_CS, 006_relative_abundance_across_CS, 007_taxa_resolution, 008_taxa_intersections_across_samples. The user input folder 000_mpa_original is unchanged. Functions resolve cross-references (TSE, audits, mosaic) in the new locations with backward-compatible fallbacks to the previous folder names, so existing projects keep working.

Changes in version 0.99.12

New Features

- group_upset(): new analysis that, for each biological group (inferred from sample-name prefixes), draws a cross-sample UpSet plot at a chosen rank and writes a membership TSV (presence matrix + N_Samples and a Core/Shared/Unique Category).
  This separates the core taxa shared by all samples of a group from unique/rare taxa present in one or a few samples - the expected pattern for pathogens and false positives. Default source is the final mosaic from retrieve_selected_taxa(); pass CS= to compare at a single Confidence Score. Output: /008_taxa_intersections_across_samples/.

Changes in version 0.99.11

Changes

- reads_per_taxa() saturation cutoffs are now adaptive instead of a hard-coded grid. The x-axis always includes the low-end anchors 1, 2, 3, 4, 5, 7, 10 (even when the maximum read count is smaller) and adds log-spaced points (1, 2, 3, 5, 7 x 10^k) up to just past each sample/domain's actual maximum. This gives finer resolution where rare/background taxa are shed and adapts the upper range to the data.

Changes in version 0.99.10

Changes

- upset_kariocas() now takes a single tax_level argument (default "Species"), consistent with the other functions' rank flags, and draws one UpSet plot per sample and domain at that rank. Previously it produced a fixed set of three ranks (Species/Genus/Family) per sample/domain; the new behaviour reduces output clutter and lets you pick any rank.

Changes in version 0.99.9

Changes

- retrieve_selected_taxa() now always retains all taxonomic ranks in the mosaic (an MPA profile naturally has every level). The tax_level argument no longer filters the output; it now only selects which rank's optimization audit the "auto"/"secondary" thresholds are read from (SI_Audit_ / Reads_Audit_; NULL -> "Species"). This also makes taxa_resolution() on the final mosaic reliable without any special setup.

Changes in version 0.99.8

Changes

- Applied styler (4-space indentation, Bioconductor style) across the package for consistent formatting.
- Added funding (fnd) roles to Authors@R: Fiocruz, IOC-Fiocruz, and CAPES.

Changes in version 0.99.7

Changes

- taxa_resolution() no longer draws a plot for every Confidence Score.
  It gains a CS argument: by default (CS = NULL) it analyses the final mosaic from retrieve_selected_taxa() (004_final_mosaic/) - one figure per sample - importing and parsing the mosaic .tsv directly. Passing a numeric CS (fraction or percent) analyses the imported data at that single score instead of looping over all of them. Output filenames now end in _Final_Mosaic or _CS. For a meaningful mosaic resolution, build the mosaic with retrieve_selected_taxa(tax_level = NULL) so parent ranks are kept.

Changes in version 0.99.6

Changes

- retrieve_selected_taxa() can now choose the minimum reads automatically. The reads_min_* arguments accept "auto" / "secondary" (in addition to a manual number), pulling the optimal minimum reads from the Reads_Audit written by reads_per_taxa(), looked up at each domain's resolved Confidence Score. The final mosaic therefore combines both data-driven thresholds - the optimal CS and the optimal min-reads - per domain, closing the workflow loop.

Changes in version 0.99.5

Changes

- reads_per_taxa() now reports an optimal minimum-reads threshold. Using the same elbow engine as the optimal CS (default "kneedle", on the log read axis), it finds the knee of each domain's saturation curve - the read count above which the stable taxa core persists and below which the rare/background tail is shed. The group overlay marks each domain's median optimal reads with a dashed line, and per-sample values are written to Reads_Audit_.tsv/.rds. Together with the optimal CS (Step 001) this gives two quantitative thresholds for excluding background false positives.
- The "Rare_Taxa" view and the x_max_* arguments are removed. That linear zoom of the 1-10 read region duplicated the low-count end of the saturation curve, which already covers it on the log axis. reads_per_taxa() now produces a single saturation plot per CS and gains a method= argument.

Changes in version 0.99.4

Changes

- optimize_CS() has been merged into taxa_retention() and removed.
  The two functions produced a near-identical group overlay; taxa_retention() now computes the Stability Index in the same step, marks each domain's median optimal CS on its overlay, writes the SI_Audit_.tsv/.rds tables, and returns the audit data frame invisibly. It gains method= (default "kneedle") and manual_toll= arguments.
- The SI audit now lives in 002_taxa_retention/. retrieve_selected_taxa() reads it from there, with a backward-compatible fallback to the old 006_optimize_CS/ location for existing projects.

Changes in version 0.99.3

Changes

- optimize_CS() now defaults to a new "kneedle" method (parameter-free elbow detection) and adds a "postcliff" method. The previous default, "dynamic", picked the first CS whose step-wise taxa loss fell within tail noise; on domains whose retention curve starts with a plateau (Archaea, Eukaryota, Viruses in typical data) this stopped before the main drop, giving an inconsistent optimum (e.g. CS10 for those domains but CS60 for Bacteria). "kneedle" locates the inflection between the steep noise-removal phase and the stable signal floor consistently across domains. "dynamic", "segmented" and "manual" remain available via method =.
- Under "kneedle", the Secondary Stability Index is the more conservative post-cliff floor, so retrieve_selected_taxa(CS_* = "secondary") yields a stricter threshold than the Primary.

Changes in version 0.99.2

New Features

- Group overlay plots are now the default output of taxa_retention() (001), reads_per_taxa() (003) and optimize_CS() (006). Instead of one set of PDFs per sample, each function draws a single figure per biological group in which every sample is a faint line and the group mean (+/-SD band) is highlighted, faceted by Domain. optimize_CS() additionally marks each domain's median Primary Stability Index as a dashed reference line. This drastically reduces the number of generated PDFs and makes group-level trends obvious at a glance.
- Groups are inferred from sample names by stripping trailing digits (e.g. SAMPLE33, SAMPLE34 -> group SAMPLE; CONTROL01, TREATED01 -> CONTROL, TREATED).
- New detail_samples argument on those three functions restores detailed per-sample panels on demand: NULL (default) draws only the group overlay, "all" renders every sample, and a comma-separated string such as "SAMPLE33, SAMPLE45" renders just those. Detailed PDFs are written to a per_sample/ subfolder.

Bug Fixes

- Confidence Score = 1.0 is now handled correctly. Filenames using the natural CS10 (or CS100, or the decimal CS1.0) for maximum stringency are now parsed as 1.0 (100%) instead of being silently misread as 0.1. CS values are stored internally on a single canonical integer-percent scale (0–100).
- retrieve_selected_taxa() and heatmaps_karioCaS() now accept a CS supplied either as a Kraken fraction (e.g. 1.0) or a percentage (e.g. 40); a 1.0 request previously produced an empty result.
- Silenced spurious max()/-Inf warnings emitted when a domain has no taxa at a given Confidence Score (common at high stringency).
- Removed a leftover hard-coded genus name from the taxa_resolution() audit log.
- import_karioCaS() now aggregates duplicate taxonomy rows explicitly (values_fn = sum), avoiding the cryptic pivot_wider() "Can't convert fill to " error on repeated taxonomy entries.

Changes in version 0.99.1

New Features

- Initial Bioconductor submission.
- import_karioCaS(): Imports Kraken2 MPA-style reports into a TreeSummarizedExperiment object.
- taxa_retention(): Evaluates taxonomic retention across Confidence Scores.
- upset_kariocas(): Generates UpSet plots to identify persistent vs transient taxa.
- reads_per_taxa(): Saturation analysis of reads per taxon.
- taxa_resolution(): Parent-to-child taxonomic resolution analysis.
- heatmaps_karioCaS(): Relative abundance heatmaps with extinction patterns.
- optimize_CS(): Multi-strategy Stability Index engine to find optimal Confidence Scores.
- retrieve_selected_taxa(): Generates the final high-confidence biological mosaic.
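
Illustrative sketch (not package code). Two rules described in the entries above are easier to follow with a small example: the group inference from version 0.99.2 (strip trailing digits from sample names) and the adaptive saturation cutoffs from version 0.99.11 (fixed low-end anchors plus 1, 2, 3, 5, 7 x 10^k points). The helper names infer_group() and adaptive_cutoffs() are hypothetical and are not part of the karioCaS API, and reading "just past the maximum" as "the first grid point above the maximum" is an assumption; the package's internal implementation may differ.

```r
# Hypothetical helpers for illustration only; not exported by karioCaS.

# Infer the biological group by stripping trailing digits from sample names.
infer_group <- function(sample_names) {
    sub("[0-9]+$", "", sample_names)
}

# Build read-count cutoffs: the low-end anchors are always kept, and
# log-spaced points (1, 2, 3, 5, 7 x 10^k) are added up to the first
# grid point above the observed maximum (assumed meaning of "just past").
adaptive_cutoffs <- function(max_reads) {
    anchors <- c(1, 2, 3, 4, 5, 7, 10)
    # Largest exponent needed to reach the maximum; at least 0.
    k_max <- max(0, ceiling(log10(max(max_reads, 1))))
    log_points <- as.vector(outer(c(1, 2, 3, 5, 7), 10^(0:k_max)))
    grid <- sort(unique(c(anchors, log_points)))
    # Index of the first grid point strictly above the maximum.
    first_past <- which(grid > max_reads)[1]
    # Never return fewer points than the anchors themselves.
    n_keep <- max(first_past, length(anchors))
    grid[seq_len(n_keep)]
}

infer_group(c("SAMPLE33", "SAMPLE34", "CONTROL01", "TREATED01"))
# "SAMPLE" "SAMPLE" "CONTROL" "TREATED"

adaptive_cutoffs(250)
# 1 2 3 4 5 7 10 20 30 50 70 100 200 300

adaptive_cutoffs(3)
# 1 2 3 4 5 7 10
```

In this sketch the anchors are returned even when the maximum read count is below 10, matching the behaviour described for the x-axis in version 0.99.11.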