# Single-cell analysis (v0.0.1)

## Overview 🎯

This tool identifies **differentially expressed genes (DEGs)** in single-cell RNA-seq data by comparing experimental conditions. It applies **pseudobulk analysis** and the **Wilcoxon test** to detect DEGs separately for each cell type.

![](new_images/single_cell/overview.png)

---

## Inputs 📥

The method requires a metadata file and a folder with single-cell count data. A cell-typing model is also required if the metadata file does not provide cell types. **All files should be uploaded as generic files.**

* **Metadata File (CSV)**
  Sample metadata must be provided as a CSV file.
  The **obligatory columns** are:
    * **Sample identifiers**.
    * An experimental group column named `group` or `condition` with the values **'control'** or **'experiment'**.

  An **optional column** may also be included:
    * A `cell_type` column for annotations. If it is missing, the tool performs automatic cell typing.

  ![](new_images/single_cell/metadata_example.png)

* **Folder with Single-cell RNA Sequencing Data**
  The tool accepts two formats:
    * **10X Genomics MTX Format**: Requires three files per sample, sharing a common prefix.
      > `barcodes.tsv.gz`
      > `features.tsv.gz`
      > `matrix.mtx.gz`
    * **CSV Format**: A matrix with genes as rows and cells as columns (or vice versa). The first row and the first column must contain identifiers.

* **Threshold for up/down-regulated genes**
  A floating-point number used to filter up- and down-regulated genes.

* **Model for Cell-Typing**
  This input is required **only if** your metadata file lacks a `cell_type` column. Select a pre-trained **CellTypist** model to automatically annotate cell types.

---

## Workflow ⚙️

The tool follows a sequential workflow from data loading to analysis.

* **Data Loading**: Reads the input single-cell data (MTX or CSV) and the metadata file.
* **Data Preparation**: A quality control pipeline filters low-quality genes and cells, calculates QC metrics (mitochondrial and ribosomal percentages), performs doublet detection with **Scrublet**, normalizes counts, and log-transforms the data.
* **Cell Typing (Optional)**: If the `cell_type` column is missing, the tool uses the selected **CellTypist** model to predict cell types.
* **Pseudobulk Aggregation**: Gene counts are aggregated over all cells belonging to the same sample and cell type, creating **pseudobulk profiles**.
* **Differential Expression**: For each cell type, the tool performs DEG analysis.
    * **Primary Method**: Uses **PyDESeq2** on pseudobulk profiles to compare 'experiment' vs 'control'.
    * **Fallback Method**: If `PyDESeq2` fails (e.g., because there are too few samples), the tool falls back to **Scanpy's `rank_genes_groups`** (Wilcoxon test) on the original single-cell data.
* **Output Generation**: The final DEG results are saved into a separate CSV file for each cell type, together with a filtered CSV file and a **report.csv** summary.

---

## Outputs 📤

The tool returns a single **folder**, packaged as a zip file, containing the analysis results. Inside it there is one **CSV file per cell type**. The filenames indicate the cell type and the analysis method used (e.g., `T_cells_pseudobulk.csv` or `B_cells_rank_groups.csv`).

The tool also generates a filtered CSV file and a **report.csv** file, which lists the cell types found, the number of cells of each type, and the analysis method used for each.

![](new_images/single_cell/output_example.png)

Each per-cell-type CSV file includes:

* Gene names
* Log2 fold changes
* P-values
* Adjusted p-values
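
The pseudobulk aggregation step from the Workflow section can be illustrated with a minimal sketch. It is not the tool's actual implementation: it assumes that aggregation means summing raw counts per (sample, cell type) pair, and the function name `pseudobulk`, the dictionary keys, and the toy data are all hypothetical. Plain Python is used so the example runs without any dependencies.

```python
from collections import defaultdict


def pseudobulk(cells, genes):
    """Sum counts over all cells sharing a (sample, cell_type) pair.

    `cells` is a list of dicts with the hypothetical keys "sample",
    "cell_type", and "counts" (one count per gene, in the order of `genes`).
    Summation is an assumption; the tool may aggregate differently.
    """
    profiles = defaultdict(lambda: [0] * len(genes))
    n_cells = defaultdict(int)
    for cell in cells:
        key = (cell["sample"], cell["cell_type"])
        profile = profiles[key]
        for i, count in enumerate(cell["counts"]):
            profile[i] += count
        n_cells[key] += 1
    return dict(profiles), dict(n_cells)


# Toy data: three genes, four cells from two samples.
genes = ["CD3E", "MS4A1", "ACTB"]
cells = [
    {"sample": "s1", "cell_type": "T_cells", "counts": [5, 0, 10]},
    {"sample": "s1", "cell_type": "T_cells", "counts": [3, 0, 7]},
    {"sample": "s1", "cell_type": "B_cells", "counts": [0, 4, 9]},
    {"sample": "s2", "cell_type": "T_cells", "counts": [6, 1, 12]},
]

profiles, n_cells = pseudobulk(cells, genes)
for key in sorted(profiles):
    sample, cell_type = key
    print(sample, cell_type, dict(zip(genes, profiles[key])), "cells:", n_cells[key])
```

Each resulting profile behaves like one bulk RNA-seq sample for a given cell type, which is the kind of input a bulk method such as PyDESeq2 expects. The per-pair cell counts are tracked as well because a summary like `report.csv` lists the number of cells per cell type.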