# Cohort simulations

This repository contains all code files for our ROSALI/Residuals cohort study simulation project. To save disk space, data files are not stored in this repository; they are available on https://osf.io.

## File Structure

```
📦 simul_these
├─ catalogue.md           - List and description of scenarios
├─ 🗂️ Analysis               - ANALYSIS RESULTS
├─ 🗂️ Data                   - GENERATED DATASETS
│  ├─ 🗂️ DIF                 - DATASETS WITH DIF
│  └─ 🗂️ noDIF               - DATASETS WITHOUT DIF
├─ 🗂️ Modules                - R AND STATA MODULES
│  └─ 🗂️ rosali_custom       - CUSTOMIZED ROSALI MODULE
├─ 🗂️ RProject               - R SCRIPTS FOR VARIOUS TASKS
└─ 🗂️ Scripts                - R AND STATA SCRIPTS
   ├─ 🗂️ Analysis            - PCM ANALYSIS SCRIPTS
   └─ 🗂️ Data_generation     - SIMULATION SCENARIO SCRIPTS
      ├─ 🗂️ DIF
      └─ 🗂️ noDIF

```
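Because the data are hosted on OSF rather than in the repository, the `Data` and `Analysis` folders of the tree above may be missing from a fresh clone. The following is a minimal sketch for creating them, not the content of `prepare_file_structure.sh`. The function name `make_data_dirs` is hypothetical, and the `Analysis` sub-folder names are assumed from the naming conventions below.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not a repository script): create the data and result
# folders of the file tree under a root directory (default: current folder).
# Assumption: Analysis/ holds one sub-folder per analysis variant, named as in
# the "Naming conventions" section.
make_data_dirs() {
  local root="${1:-.}" d
  for d in Data/DIF Data/noDIF \
           Analysis/noDIF Analysis/DIF Analysis/noDIF_prop Analysis/DIF_prop \
           Analysis/ROSALI-DIF_prop Analysis/RESIDUALS_prop; do
    mkdir -p "$root/$d" || return   # -p: no error if the folder already exists
  done
}
```

Because `mkdir -p` is idempotent, calling `make_data_dirs` again on an existing layout is harmless.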

## Naming conventions

### Initial Datasets

**XXX_N** - Scenario XXX / N individuals per group
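As a sketch of this convention, the hypothetical shell function below splits a dataset name into its scenario code and group size. It assumes that N is the part after the last underscore and that scenario codes contain no dots; `1A_300` is a made-up example name.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): split a dataset name of
# the form XXX_N into the scenario code XXX and the group size N.
# Assumptions: N is everything after the LAST underscore (so a scenario code
# containing underscores is kept whole) and the scenario code has no dots.
parse_dataset_name() {
  local name scenario n
  name=${1##*/}        # drop any leading directories
  name=${name%.*}      # drop the file extension, if present
  scenario=${name%_*}  # everything before the last underscore
  n=${name##*_}        # everything after the last underscore
  # Reject names whose N part is empty or not made only of digits
  case $n in
    ''|*[!0-9]*) echo "not an XXX_N name: $1" >&2; return 1 ;;
  esac
  # Reject names without any underscore
  if [ "$scenario" = "$name" ]; then
    echo "not an XXX_N name: $1" >&2
    return 1
  fi
  printf '%s %s\n' "$scenario" "$n"
}
```

For example, `parse_dataset_name Data/noDIF/1A_300.csv` prints `1A 300`.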

### Analyzed Datasets

- **noDIF / XXX_N.csv** - Analysis of scenario XXX_N by PCM __without__ accounting for DIF, with confounding accounted for
- **DIF / XXX_N.xls** - Analysis of scenario XXX_N by PCM __with__ DIF and confounding accounted for
- **noDIF_prop / XXX_N.csv** - Analysis of scenario XXX_N by PCM __without__ accounting for DIF, with confounding accounted for by propensity score
- **DIF_prop / XXX_N.xls** - Analysis of scenario XXX_N by PCM __with__ DIF accounted for and confounding accounted for by propensity score
- **ROSALI-DIF_prop / XXX_N_original.xls** - Analysis of scenario XXX_N by PCM __with__ DIF accounted for after detection by ROSALI, and confounding accounted for by propensity score
- **RESIDUALS_prop / XXX_N_original.xls** - Analysis of scenario XXX_N by PCM __with__ DIF accounted for after detection by Andrich & Hagquist's residuals method, and confounding accounted for by propensity score
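The conventions above can be encoded in a small lookup, for instance to check which results already exist for a scenario. This is a sketch under assumptions: `result_path` and `check_results` are hypothetical helpers, not repository scripts, and the variant folders are assumed to live under `Analysis/` (the third argument of `result_path` overrides this root).

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not repository scripts) encoding the naming conventions
# for analyzed datasets. Assumption: the variant folders live under Analysis/.
result_path() {
  local variant="$1" scenario="$2" root="${3:-Analysis}"
  case $variant in
    noDIF|noDIF_prop)
      echo "$root/$variant/$scenario.csv" ;;
    DIF|DIF_prop)
      echo "$root/$variant/$scenario.xls" ;;
    ROSALI-DIF_prop|RESIDUALS_prop)
      echo "$root/$variant/${scenario}_original.xls" ;;
    *)
      echo "unknown analysis variant: $variant" >&2; return 1 ;;
  esac
}

# Report which of the six expected result files exist for one scenario
check_results() {
  local scenario="$1" v p
  for v in noDIF DIF noDIF_prop DIF_prop ROSALI-DIF_prop RESIDUALS_prop; do
    p=$(result_path "$v" "$scenario") || return
    if [ -f "$p" ]; then echo "found   $p"; else echo "missing $p"; fi
  done
}
```

For example, `result_path RESIDUALS_prop 1A_300` (a made-up scenario name) prints `Analysis/RESIDUALS_prop/1A_300_original.xls`.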


## Reproduction - TO BE MODIFIED

1. Run **/Scripts/Data_generation/noDIF/scenarios_noDIF_baseline.do** to simulate data without DIF
2. Run the files in 🗂️ **/Scripts/Data_generation/DIF/** to simulate data with DIF
3. Run **/RProject/Scripts/Analysis/pcm_nodif.R** to analyze the data without accounting for DIF
4. Run the files in 🗂️ **/Scripts/Analysis/DIF/** to analyze the data while accounting for DIF
5. Run **/Scripts/Analysis/DIF-ROSALI/pcm_dif_rosali.do** to analyze the data while accounting for DIF as detected by ROSALI
6. Run **/RProject/Scripts/Analysis/resali_analysis.R** to detect DIF with the residuals method and prepare the data for PCM analysis
7. Run **/Scripts/Analysis/DIF-RESIDUALS/pcm_dif_residus.do** to analyze the data while accounting for DIF as detected by the residuals method
8. Run **/RProject/Scripts/Analysis/aggregation.R** to compile and visualize results
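Steps 1 to 8 can be chained in a single driver. This is a minimal sketch, not the repository's `autorun.sh`. It assumes that it is run from the repository root, that Stata can be called in batch mode as `stata -b do <file>` (set `STATA` to `stata-se` or `stata-mp` if needed), that R scripts run with `Rscript`, and that the files within a folder can run in any order. The `noDIF` folder name follows the file tree, and `run_script`, `run_folder` and `run_pipeline` are hypothetical names. By default (`DRY_RUN=1`) the driver only prints the commands.

```bash
#!/usr/bin/env bash
# Minimal sketch of a driver for reproduction steps 1-8 (not autorun.sh).
# Assumptions: run from the repository root; Stata runs in batch mode as
# "$STATA -b do <file>"; R scripts run with Rscript; files inside a folder
# may run in any order (here: .do files, then .R files, alphabetically).
STATA=${STATA:-stata}      # e.g. stata-se or stata-mp, depending on the install
DRY_RUN=${DRY_RUN:-1}      # 1 = only print the commands, 0 = execute them

# Run (or print) one script, picking the interpreter from its extension
run_script() {
  case $1 in
    *.do) set -- "$STATA" -b do "$1" ;;
    *.R)  set -- Rscript "$1" ;;
    *)    echo "skipping unsupported file: $1" >&2; return 0 ;;
  esac
  if [ "$DRY_RUN" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Run every .do and .R file directly inside a folder
run_folder() {
  local f
  for f in "$1"/*.do "$1"/*.R; do
    [ -e "$f" ] || continue   # an unmatched glob stays literal: skip it
    run_script "$f" || return
  done
}

# Steps 1-8, stopping at the first command that returns a non-zero status
run_pipeline() {
  run_script Scripts/Data_generation/noDIF/scenarios_noDIF_baseline.do || return
  run_folder Scripts/Data_generation/DIF || return
  run_script RProject/Scripts/Analysis/pcm_nodif.R || return
  run_folder Scripts/Analysis/DIF || return
  run_script Scripts/Analysis/DIF-ROSALI/pcm_dif_rosali.do || return
  run_script RProject/Scripts/Analysis/resali_analysis.R || return
  run_script Scripts/Analysis/DIF-RESIDUALS/pcm_dif_residus.do || return
  run_script RProject/Scripts/Analysis/aggregation.R
}
```

Sourcing the file and calling `run_pipeline` prints the eight steps. Exporting `DRY_RUN=0` before sourcing makes the driver execute them instead.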

**OR**

1. [BASH ONLY] Run **prepare_file_structure.sh** (first run only), then **autorun.sh**. By default, **autorun.sh** takes several weeks to complete; modify it to run in parallel if necessary.
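If you parallelize the runs, one option is to feed independent commands to `xargs -P`. This is a sketch under assumptions: `run_parallel` is a hypothetical helper, and the `-P` and `-0` options are supported by GNU and BSD `xargs` but are not POSIX. Which jobs are truly independent (for example, different scenarios within the same step) must be checked against `autorun.sh`, since an analysis step needs its simulated data first.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read one shell command per line on stdin and run them
# with at most $1 (default 4) jobs at a time. Only independent commands should
# be passed together; run dependent steps in separate, successive calls.
run_parallel() {
  local jobs="${1:-4}"
  # Turn newlines into NUL separators so each whole line becomes one argument,
  # then hand each line to its own "sh -c" process (-n 1), $jobs at a time.
  tr '\n' '\0' | xargs -0 -n 1 -P "$jobs" sh -c
}
```

For example, `printf '%s\n' 'Rscript a.R' 'Rscript b.R' | run_parallel 2` would run two (hypothetical) R scripts side by side. Output from parallel jobs may interleave, so writing each job's output to its own log file is advisable.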