# Cohort simulations
This repository contains all code files related to our ROSALI/Residuals cohort study simulation project. To save disk space, data files are not stored on this server; they are available on https://osf.io.
## File Structure
```
📦 simul_these
├─ catalogue.md - List and description of scenarios
├─ 🗂️ Analysis - ANALYSIS RESULTS
├─ 🗂️ Data - GENERATED DATASETS
│  ├─ 🗂️ DIF - DATASETS WITH DIF
│  └─ 🗂️ noDIF - DATASETS WITHOUT DIF
├─ 🗂️ Modules - R AND STATA MODULES
│  └─ 🗂️ rosali_custom - DATASETS WITH DIF
├─ 🗂️ RProject - R SCRIPTS FOR VARIOUS TASKS
└─ 🗂️ Scripts - R AND STATA SCRIPTS
   ├─ 🗂️ Analysis - PCM ANALYSIS SCRIPTS
   └─ 🗂️ Data_generation - SIMULATION SCENARIO SCRIPTS
      ├─ 🗂️ DIF
      └─ 🗂️ noDIF
```
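As an illustration of the layout above, the following minimal sketch creates an empty directory skeleton. It is not the project's own `prepare_file_structure.sh`, whose contents are not shown here. The function name `make_skeleton` is hypothetical, and the nesting of `Analysis` and `Data_generation` under `Scripts` is assumed from the paths used in the reproduction steps.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create the empty directory skeleton of the project.
# Assumption: Scripts/Analysis and Scripts/Data_generation are nested as
# suggested by the paths in the reproduction steps.
make_skeleton() {
    local root="${1:-simul_these}"   # root folder name taken from the tree
    mkdir -p \
        "$root/Analysis" \
        "$root/Data/DIF" \
        "$root/Data/noDIF" \
        "$root/Modules/rosali_custom" \
        "$root/RProject" \
        "$root/Scripts/Analysis" \
        "$root/Scripts/Data_generation/DIF" \
        "$root/Scripts/Data_generation/noDIF"
}
```

After sourcing the file, `make_skeleton /path/to/simul_these` creates the folders; existing folders are left untouched because `mkdir -p` does not fail on them.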
## Naming conventions
### Initial Datasets
**XXX_N** - Scenario XXX / N individuals per group
### Analyzed Datasets
**noDIF_noConf / XXX_N.csv** - Analysis for scenario XXX_N by PCM __without__ accounting for DIF or confounding
**DIF_noConf / XXX_N.xls** - Analysis for scenario XXX_N by PCM __with__ DIF accounted for but not confounding
**ROSALI-DIF_noConf / XXX_N_original.xls** - Analysis for scenario XXX_N by PCM __with__ DIF accounted for after detection by ROSALI but not accounting for confounding
**RESIDUALS_noConf / XXX_N_original.xls** - Analysis for scenario XXX_N by PCM __with__ DIF accounted for after detection by Andrich & Hagquist's residuals method but not accounting for confounding
**noDIF / XXX_N.csv** - Analysis for scenario XXX_N by PCM __without__ accounting for DIF but with confounding accounted for
**DIF / XXX_N.xls** - Analysis for scenario XXX_N by PCM __with__ both DIF and confounding accounted for
**noDIF_prop / XXX_N.csv** - Analysis for scenario XXX_N by PCM __without__ accounting for DIF but with confounding accounted for by propensity score
**DIF_prop / XXX_N.xls** - Analysis for scenario XXX_N by PCM __with__ DIF accounted for and confounding accounted for by propensity score
**ROSALI-DIF_prop / XXX_N_original.xls** - Analysis for scenario XXX_N by PCM __with__ DIF accounted for after detection by ROSALI and confounding accounted for by propensity score
**RESIDUALS_prop / XXX_N_original.xls** - Analysis for scenario XXX_N by PCM __with__ DIF accounted for after detection by Andrich & Hagquist's residuals method and confounding accounted for by propensity score
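
The naming scheme above can be decoded mechanically, which may help when collecting results across folders. The sketch below splits a result path into method folder, scenario, and group size. The function name `parse_result_path` and the example scenario code `1A` are hypothetical, and the sketch assumes the scenario code itself contains no underscore.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split a result path such as
# "ROSALI-DIF_prop/1A_100_original.xls" into its naming components.
# Assumption: the scenario code (XXX) contains no underscore.
parse_result_path() {
    local path="$1"
    local method file stem scenario n
    method="$(basename "$(dirname "$path")")"   # analysis folder, e.g. DIF_prop
    file="$(basename "$path")"
    stem="${file%.*}"           # drop the extension (.csv or .xls)
    stem="${stem%_original}"    # drop the optional "_original" suffix
    scenario="${stem%%_*}"      # text before the first underscore
    n="${stem#*_}"              # text after the first underscore
    printf '%s\t%s\t%s\n' "$method" "$scenario" "$n"
}
```

For example, `parse_result_path "Analysis/noDIF_noConf/1A_100.csv"` prints the three tab-separated fields `noDIF_noConf`, `1A`, and `100`.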
## Reproduction - TO BE MODIFIED
1. Run **/Scripts/Data_generation/NoDIF/scenarios_noDIF_baseline.do** to simulate data without DIF
2. Run files in 🗂️ **/Scripts/Data_generation/DIF/** to simulate DIF data
3. Run **/RProject/Scripts/Analysis/pcm_nodif.R** to analyze without accounting for DIF
4. Run files in 🗂️ **/Scripts/Analysis/DIF/** to analyze while accounting for DIF
5. Run **/Scripts/Analysis/DIF-ROSALI/pcm_dif_rosali.do** to analyze data after accounting for DIF as detected by ROSALI
6. Run **/RProject/Scripts/Analysis/resali_analysis.R** to perform residuals DIF detection and prepare data for PCM analysis
7. Run **/Scripts/Analysis/DIF-RESIDUALS/pcm_dif_residus.do** to analyze data after accounting for DIF as detected by the residuals method
8. Run **/RProject/Scripts/Analysis/aggregation.R** to compile and visualize results
**OR**
1. [BASH ONLY] Run **prepare_file_structure.sh** (on the first run only), then **autorun.sh**. By default, the full run takes multiple weeks; modify the script to run in parallel if necessary.
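
One possible way to parallelize a step that runs every do-file in a folder is sketched below with `xargs -P`. This is not taken from `autorun.sh`. It assumes that the do-files are independent of each other and that Stata can be called in batch mode as `stata -b do <file>`; the binary name varies by installation (for example `stata-mp`), so it is read from a hypothetical `STATA_BIN` variable. The function name `run_dofiles_parallel` is also hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run every Stata do-file in a directory, several at once.
# Assumptions: the do-files do not depend on each other, and Stata accepts
# batch mode as "<binary> -b do <file>". Set STATA_BIN to match your install.
run_dofiles_parallel() {
    local dir="$1"
    local jobs="${2:-4}"    # number of do-files to run at the same time
    find "$dir" -maxdepth 1 -name '*.do' -print0 |
        xargs -0 -n 1 -P "$jobs" "${STATA_BIN:-stata}" -b do
}
```

For example, `STATA_BIN=stata-mp run_dofiles_parallel Scripts/Data_generation/DIF 8` would launch up to eight do-files concurrently. Keep the job count within the number of available cores and Stata licences.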