# Cohort simulations
This repository contains all code files for our ROSALI/Residuals cohort study simulation project. To save disk space, data files are not stored on this server and are instead available at https://osf.io.

## File Structure
```
📦 simul_these
├─ catalogue.md - List and description of scenarios
├─ 🗂️ Analysis - ANALYSIS RESULTS
├─ 🗂️ Data - GENERATED DATASETS
│  ├─ 🗂️ DIF - DATASETS WITH DIF
│  └─ 🗂️ noDIF - DATASETS WITHOUT DIF
├─ 🗂️ Modules - R AND STATA MODULES
│  ├─ 🗂️ rosali_custom - DATASETS WITH DIF
├─ 🗂️ RProject - R SCRIPTS FOR VARIOUS TASKS
└─ 🗂️ Scripts - R AND STATA SCRIPTS
   ├─ 🗂️ Analysis - PCM ANALYSIS SCRIPTS
   └─ 🗂️ Data_generation - SIMULATION SCENARIO SCRIPTS
      ├─ 🗂️ DIF
      └─ 🗂️ noDIF
```
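
Because the data files are not stored in the repository, the output folders from the tree above may need to exist before any script writes to them. The sketch below shows one way to create them. It is not the content of **prepare_file_structure.sh**, which is not shown here; the function name `make_output_dirs` and the choice of folders are assumptions based only on the tree.

```sh
# Hypothetical helper: create the result and data folders listed in the tree.
# The folder list is an assumption taken from the tree, not from the real setup script.
make_output_dirs() (
    root=${1:-.}                      # repository root, defaults to the current directory
    for d in Analysis Data/DIF Data/noDIF; do
        mkdir -p "$root/$d" || exit 1 # -p: no error if the folder already exists
    done
)

# Usage (from the repository root):
#   make_output_dirs .
```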
## Naming conventions
### Initial Datasets
**XXX_N** - Scenario XXX with N individuals per group

### Analyzed Datasets
**noDIF / XXX_N.csv** - Analysis of scenario XXX_N by PCM __without__ accounting for DIF, with confounding accounted for

**DIF / XXX_N.xls** - Analysis of scenario XXX_N by PCM __with__ DIF and confounding accounted for

**noDIF_prop / XXX_N.csv** - Analysis of scenario XXX_N by PCM __without__ accounting for DIF, with confounding accounted for by propensity score

**DIF_prop / XXX_N.xls** - Analysis of scenario XXX_N by PCM __with__ DIF accounted for and confounding accounted for by propensity score

**ROSALI-DIF_prop / XXX_N_original.xls** - Analysis of scenario XXX_N by PCM __with__ DIF accounted for after detection by ROSALI and confounding accounted for by propensity score

**RESIDUALS_prop / XXX_N_original.xls** - Analysis of scenario XXX_N by PCM __with__ DIF accounted for after detection by Andrich & Hagquist's residuals method and confounding accounted for by propensity score
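
The naming scheme above can be split mechanically into a scenario label and a group size. The following is a minimal sketch, assuming the scenario label is everything before the last underscore and N is everything after it. The function name `parse_dataset_name` and the example name `5A_100` are made up for illustration and do not come from the repository.

```sh
# Hypothetical helper: split a dataset name (XXX_N, optionally with a folder,
# an extension, or an _original suffix) into "scenario N".
parse_dataset_name() (
    base=$(basename "$1")       # drop any folder prefix such as DIF/
    base=${base%.*}             # drop the extension (.csv, .xls) if present
    base=${base%_original}      # drop the optional _original suffix
    scenario=${base%_*}         # everything before the last underscore
    n=${base##*_}               # everything after the last underscore
    printf '%s %s\n' "$scenario" "$n"
)

# Usage:
#   parse_dataset_name "5A_100"                   # prints: 5A 100
#   parse_dataset_name "DIF/5A_100_original.xls"  # prints: 5A 100
```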
## Reproduction - TO BE MODIFIED
1. Run **/Scripts/Data_generation/NoDIF/scenarios_noDIF_baseline.do** to simulate data without DIF
2. Run the files in 🗂️ **/Scripts/Data_generation/DIF/** to simulate data with DIF
3. Run **/RProject/Scripts/Analysis/pcm_nodif.R** to analyze the data without accounting for DIF
4. Run the files in 🗂️ **/Scripts/Analysis/DIF/** to analyze the data while accounting for DIF
5. Run **/Scripts/Analysis/DIF-ROSALI/pcm_dif_rosali.do** to analyze the data after accounting for DIF as detected by ROSALI
6. Run **/RProject/Scripts/Analysis/resali_analysis.R** to perform residuals-based DIF detection and prepare the data for PCM analysis
7. Run **/Scripts/Analysis/DIF-RESIDUALS/pcm_dif_residus.do** to analyze the data after accounting for DIF as detected by the residuals method
8. Run **/RProject/Scripts/Analysis/aggregation.R** to compile and visualize the results

**OR**

1. [BASH ONLY] Run **prepare_file_structure.sh** (on the first run only), then **autorun.sh** (by default, this takes multiple weeks to run; modify it to run in parallel if necessary)
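
As a starting point for the parallel run mentioned above, the sketch below runs every matching script in a folder with a fixed number of concurrent jobs. It is not taken from **autorun.sh**. The function name `run_parallel` is hypothetical, the scripts are assumed to be independent of each other, and `find -print0` with `xargs -0 -P` assumes GNU or BSD tools. The Stata call in the usage comment assumes a `stata` executable on the `PATH` (it may be named `stata-mp` or `stata-se` on your system).

```sh
# Hypothetical helper: run COMMAND once per matching file, JOBS files at a time.
# usage: run_parallel DIR PATTERN JOBS COMMAND [ARGS...]
run_parallel() (
    dir=$1
    pattern=$2
    jobs=$3
    shift 3                           # the remaining arguments form the command
    # -print0 / -0 keep file names with spaces intact;
    # -n 1 passes one file per invocation, -P sets the number of parallel jobs.
    find "$dir" -type f -name "$pattern" -print0 |
        xargs -0 -n 1 -P "$jobs" "$@"
)

# Usage (assumed Stata batch-mode call, 4 jobs at a time):
#   run_parallel Scripts/Data_generation/DIF '*.do' 4 stata -b do
```

Pick the job count according to the number of cores and Stata licences available. Running the steps in the order listed above still matters, since the analysis scripts depend on the generated datasets.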