
Code Projects & Repositories

Active development projects tracked via GitHub: my open back office for collaborative science.


EmotiView

Language: Nextflow
Last updated: 2026-04-02

View README

EmotiView: Neural-Autonomic Synchrony and Embodied Integration

This repository accompanies ongoing research investigating the dynamic interplay between neural activity and autonomic nervous system responses during emotional experiences. Here you'll find the research article, presentations, analysis pipeline, and results—updated in real-time as the project progresses.

Principal Investigator: Cagatay Özcan Jagiello Gutt

Platform | Role | Contents
OSF | Research output | Article, documentation
GitHub | Technical implementation | Analysis pipeline, results, presentations, proposal

Research Abstract

Emotional states are fundamentally embodied, emerging from the dynamic interplay between central neural processing and peripheral physiological adjustments orchestrated by the autonomic nervous system (ANS). While ANS outputs like heart rate variability (HRV) and electrodermal activity (EDA) reflect emotional arousal and valence, understanding the precise temporal coordination between brain activity and these peripheral signals is crucial for elucidating brain-body interactions. This study investigates neural-autonomic phase synchrony during the conscious processing of distinct emotional states (positive, negative, neutral) by quantifying the temporal alignment between cortical and physiological rhythms.

We employ a multimodal approach, simultaneously recording high-temporal-resolution electroencephalography (EEG), electrocardiography (ECG) for HRV analysis (specifically Root Mean Square of Successive Differences, RMSSD), EDA, and functional near-infrared spectroscopy (fNIRS) while participants view validated emotional video clips. Our primary analysis quantifies the Phase Locking Value (PLV) between frontal EEG oscillations (Alpha, Beta bands) and continuous signals derived from HRV (reflecting parasympathetic influence) and phasic EDA (reflecting sympathetic influence). EEG channel selection for PLV analysis is informed by task-related hemodynamic activity measured via fNIRS to focus on functionally relevant cortical areas.

We hypothesize that PLV, indicating brain-body temporal integration, will be significantly modulated by emotional content compared to neutral conditions. We further expect synchrony strength to correlate with subjective arousal ratings. By examining the phase synchrony between brain signals and ANS-mediated physiological outputs, this research provides novel insights into the dynamic, embodied mechanisms underlying emotional experience. Understanding this temporal binding is critical for models of psychophysiological function and may inform assessments of cognitive load or stress regulation capacity.
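As an illustration of the central metric, a minimal PLV computation looks like the sketch below (NumPy and SciPy assumed; this is an illustrative sketch, not the project's actual pipeline code):

```python
import numpy as np
from scipy.signal import hilbert

def phase_locking_value(x, y):
    """PLV between two equal-length signals: |mean(exp(i*(phi_x - phi_y)))|.
    1 = perfectly constant phase difference, 0 = no phase coupling."""
    phi_x = np.angle(hilbert(x))
    phi_y = np.angle(hilbert(y))
    return float(np.abs(np.mean(np.exp(1j * (phi_x - phi_y)))))

# Toy example: two noisy signals sharing a 10 Hz rhythm with a fixed phase lag
fs = 250                                  # sampling rate in Hz
t = np.arange(0, 10, 1 / fs)
rng = np.random.default_rng(0)
a = np.sin(2 * np.pi * 10 * t) + 0.3 * rng.standard_normal(t.size)
b = np.sin(2 * np.pi * 10 * t + 0.5) + 0.3 * rng.standard_normal(t.size)
```

In practice the EEG would first be band-pass filtered (e.g. to the Alpha or Beta band) and the HRV/EDA series resampled to a common rate before extracting phases.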

Core Research Aims & Hypotheses

This project seeks to understand how the brain and body coordinate during emotional processing, focusing on neural-autonomic phase synchrony. Key hypotheses include:

  1. Emotional Modulation of Synchrony: Neural-autonomic synchrony (Phase Locking Value - PLV) will be enhanced during the processing of positive and negative emotional stimuli compared to neutral stimuli, for both brain-heart (EEG-HRV) and brain-sudomotor (EEG-EDA) coupling.
  2. Synchrony and Subjective Arousal: The magnitude of neural-autonomic synchrony will positively correlate with subjective ratings of emotional arousal during emotional conditions.
  3. Baseline Vagal Tone and Task-Related Synchrony: Individual differences in baseline parasympathetic regulation (resting-state RMSSD) will be associated with the degree of EEG-HRV synchrony during negative emotional stimuli.
  4. Frontal Asymmetry and Branch-Specific Synchrony: The direction of prefrontal cortical asymmetry (Frontal Asymmetry Index - FAI) will be differentially associated with the strength of phase synchrony involving distinct autonomic branches (EEG-HRV vs. EEG-EDA).
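The RMSSD and FAI measures referenced in hypotheses 3 and 4 can be sketched as follows (NumPy assumed; these are the standard textbook formulas, offered as illustration rather than the pipeline's implementation):

```python
import numpy as np

def rmssd(rr_intervals_ms):
    """Root Mean Square of Successive Differences of RR intervals (ms),
    a time-domain HRV index of parasympathetic (vagal) influence."""
    diffs = np.diff(np.asarray(rr_intervals_ms, dtype=float))
    return float(np.sqrt(np.mean(diffs ** 2)))

def frontal_asymmetry_index(alpha_power_right, alpha_power_left):
    """FAI, commonly computed as ln(right alpha power) - ln(left alpha power).
    Alpha power is inversely related to cortical activity, so the sign
    indicates relatively greater left- vs right-frontal activation."""
    return float(np.log(alpha_power_right) - np.log(alpha_power_left))
```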

For a comprehensive understanding of the theoretical background, detailed methodology, and specific work packages, please refer to the full proposal document.

Methodology Overview

A multimodal experimental design is employed, involving:

  • Stimuli: Standardized, emotionally evocative video clips (positive, negative, neutral) from the E-MOVIE database.
  • Participants: Healthy young adults, screened for relevant criteria.
  • Data Acquisition: Simultaneous recording of:
    • Electroencephalography (EEG): To measure prefrontal neural dynamics.
    • Functional Near-Infrared Spectroscopy (fNIRS): To localize hemodynamic activity in prefrontal and parietal regions, informing EEG channel selection.
    • Electrocardiography (ECG): For Heart Rate Variability (HRV) analysis.
    • Electrodermal Activity (EDA): To measure sympathetic nervous system activity.
  • Subjective Measures: Self-Assessment Manikin (SAM) for valence and arousal, Positive and Negative Affect Schedule (PANAS), and Behavioural Inhibition/Approach System (BIS/BAS) scales.

Repository Contents

Research Output (OSF)

  • Article: (Coming soon) The research article summarizing findings and contributions.

Technical Implementation (GitHub)

  • EV_results/: Processed data, analysis metrics, and visualizations.
  • EV_analysis/: The Nextflow-based analysis pipeline with Python modules.
  • EV_presentation/: Slides and presentation materials.
  • EV_proposal/: The original research proposal with methodology and analysis plan.

Analysis Pipeline

The analysis pipeline is built on the AnalysisToolbox—a modular Nextflow framework for scalable, reproducible data processing with automatic result synchronization. The EmotiView-specific pipeline in EV_analysis/ extends this framework to:

  • Load and parse multi-modal raw data (EEG, fNIRS, ECG, EDA, questionnaires).
  • Perform standardized preprocessing steps specific to each physiological modality.
  • Extract key features and metrics (e.g., EEG power, FAI, RMSSD, fNIRS ROI activation, PLV).
  • Generate participant-level results and aggregated summaries.

Configuration is managed via EV_analysis/EV_parameters.config. See the AnalysisToolbox documentation for framework details.

Project Status

In progress: data collection and thesis writing.

Contributors

Name | Role | Contact
Cagatay Özcan Jagiello Gutt | Principal Investigator | ORCID
Ben Gopin | Technical Assistant | Email
Gerrit Jostler | Technical Assistant | Email

View on GitHub →


AnalysisToolbox

Language: Python
Last updated: 2026-04-02

View README

AnalysisToolbox

A modular framework for automated data processing and statistical analysis pipelines. Built on Nextflow for scalable, reproducible workflows with automatic result synchronization.

Overview

The AnalysisToolbox provides infrastructure for building data processing pipelines that:

  • Process multiple datasets in parallel with automatic discovery of new data
  • Handle diverse data types through a generic reader/processor/analyzer architecture
  • Track progress via per-dataset logging and automatic git synchronization of results
  • Recover gracefully from failures without losing completed work

The framework is domain-agnostic—modules follow simple input/output conventions (Parquet files) and can implement any processing logic. Some specialized modules exist for specific data types (e.g., fNIRS preprocessing) where domain knowledge is required.
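A module following this Parquet in/out convention might look like the sketch below (the file names, function names, and z-score transform are hypothetical examples, not actual AnalysisToolbox modules; pandas assumed):

```python
import pandas as pd

def zscore_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Example transformation: z-score every numeric column, leaving
    non-numeric columns untouched."""
    out = df.copy()
    num = out.select_dtypes("number").columns
    out[num] = (out[num] - out[num].mean()) / out[num].std(ddof=0)
    return out

def run_processor(in_path: str, out_path: str) -> None:
    """Hypothetical processor module: read the upstream stage's Parquet
    file, apply a transformation, write Parquet for the next stage."""
    df = pd.read_parquet(in_path)
    zscore_columns(df).to_parquet(out_path, index=False)
```

Because every module only agrees on "Parquet in, Parquet out", readers, processors, and analyzers can be chained in any order that makes sense for the data.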

Prerequisites

1. WSL (Windows Subsystem for Linux)

The pipeline runs in Linux. On Windows, install WSL:

wsl --install -d Ubuntu

2. Java Runtime (required by Nextflow)

sudo apt update
sudo apt install default-jre
java -version  # Verify installation

3. Nextflow

curl -s https://get.nextflow.io | bash
sudo mv nextflow /usr/local/bin/
nextflow -version  # Verify installation

4. Python Environment

Create a virtual environment with required packages:

python3 -m venv ~/analysis_venv
source ~/analysis_venv/bin/activate
pip install numpy pandas polars scipy matplotlib
# Add domain-specific packages as needed (e.g., mne for neuroimaging)

5. Git SSH Setup (for automatic result sync)

Generate an SSH key (press Enter for no passphrase):

ssh-keygen -t ed25519 -C "your_email@example.com"
cat ~/.ssh/id_ed25519.pub

Add the public key to GitHub: https://github.com/settings/keys → "New SSH key"

Test the connection:

ssh -T git@github.com

Project Structure

AnalysisToolbox/
├── Python/
│   ├── analyzers/       # Analysis modules (statistics, feature extraction)
│   ├── processors/      # Data transformation modules (filtering, epoching)
│   ├── readers/         # File format readers (XDF, TXT, etc.)
│   └── utils/           # Infrastructure (Nextflow wrapper, plotting)

Usage

Creating a Pipeline

Pipelines are defined in Nextflow DSL2. A typical pipeline:

  1. Discovers participants via workflow_wrapper (supports continuous monitoring)
  2. Chains processing steps using IOInterface (generic Python script runner)
  3. Tracks completion via watchdog threads that monitor terminal processes
  4. Syncs results to git when each participant completes

Configuration

Pipelines use a parameters.config file to define:

  • Input/output directories
  • Python environment path
  • Script paths
  • Processing parameters
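A hypothetical parameters.config fragment might look like this (every key name and path below is illustrative only; consult the AnalysisToolbox documentation for the real schema):

```groovy
// Hypothetical parameters.config sketch: illustrative keys only.
params {
    input_dir        = 'rawData'                               // input directory
    output_dir       = 'results'                               // output directory
    python_exe       = 'python3'                               // Python environment
    reader_script    = 'Python/readers/xdf_reader.py'          // script paths
    processor_script = 'Python/processors/filter_processor.py'
    filter_low_hz    = 0.5                                     // processing parameters
    filter_high_hz   = 40
}
```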

Running

cd /path/to/your/pipeline
nextflow run pipeline.nf -c parameters.config -with-trace

The -with-trace flag is required for the watchdog to monitor completion.

Key Components

workflow_wrapper

Discovers participant directories, creates output folders, and starts per-participant watchdog threads.

IOInterface

Generic process that runs any Python script with automatic logging to {id}_pipeline.log.

Watchdog

Background thread per participant that:

  • Monitors the trace file for terminal process completion
  • Appends completion summary to the log
  • Triggers git commit/push with results

Built With

  • Nextflow - Workflow orchestration
  • Python - Processing and analysis modules
  • Polars/Pandas - Data manipulation
  • NumPy/SciPy - Numerical computing
  • Matplotlib - Visualization

Authors

  • Cagatay Özcan Jagiello Gutt - Lead Developer ORCID: https://orcid.org/0000-0002-1774-532X

View on GitHub →


5ha99y

Language: JavaScript
Last updated: 2026-04-02

View README

Zola GitHub Pages Site - Scientific Hub

A static website built with Zola that automatically syncs content from your scientific profiles.

Website URL

https://cgutt-hub.github.io/5ha99y

How It Works

Automatic Updates

The site automatically pulls data from:

  • GitHub — Your repositories and code projects
  • ORCID — Publications and works
  • OSF — Research projects and data (if configured)

Deployment Flow

1. Push to main branch

2. GitHub Actions runs

3. Fetches data from APIs

4. Builds site with Zola

5. Deploys to gh-pages branch

6. Website updates automatically!

Repository Structure

Source Files (Edit These)

  • config.toml - Site configuration
  • content/ - Your content (CV, contact, welcome post, etc.)
  • templates/ - HTML templates
  • static/ - Static assets (CSS, images)
  • scripts/fetch_data.py - Fetches data from APIs

Auto-Generated (Don't Edit)

  • public/ - Built website (generated by Zola)
  • data/ - API data cache (generated by fetch_data.py)
  • content/projects.md - Generated from GitHub repos
  • content/publications.md - Generated from ORCID
  • content/blog/????-??-??-new-project-*.md - Auto-generated blog posts

These files are created automatically during deployment and are ignored by git.

Local Development

Preview Locally

# Install Zola first: https://www.getzola.org/documentation/getting-started/installation/

# Fetch latest data
pip install -r scripts/requirements.txt
python scripts/fetch_data.py

# Build and serve
zola serve
# Visit http://127.0.0.1:1111

Making Changes

  1. Edit content in content/ folder
  2. Modify templates in templates/
  3. Update styles in static/style.css
  4. Test with zola serve
  5. Commit and push to main branch
  6. GitHub Actions deploys automatically!

GitHub Pages Setup

First Time Setup

  1. Go to: Settings → Pages
  2. Source: Deploy from a branch
  3. Branch: gh-pages
  4. Folder: / (root)
  5. Click Save

Requirements

  • Repository must be public (for free GitHub accounts)
  • GitHub Pages must be enabled in Settings
  • Workflow runs successfully (check Actions tab)

Customization

Site Settings

Edit config.toml:

  • Site title and description
  • Base URL
  • Author information

Content

Edit files in content/:

  • _index.md - Home page
  • cv.md - CV page
  • contact.md - Contact page
  • blog/2026-02-11-welcome.md - Welcome blog post

Styling

  • static/style.css - Main stylesheet
  • templates/ - HTML templates

Troubleshooting

Website Not Updating?

  1. Check Actions tab - look for green checkmark ✓
  2. If build failed, check error logs
  3. Verify GitHub Pages is enabled in Settings
  4. Clear browser cache (Ctrl+Shift+R)

Build Failing?

  • Check Actions tab for error details
  • Most common: Zola syntax errors in content files
  • Fix the error and push again

Branch Structure

  • main - Source code (you edit here)
  • gh-pages - Deployed website (auto-generated, don't edit)
  • copilot/ - Development branches

Advanced

Custom Domain

  1. Add static/CNAME with your domain
  2. Update base_url in config.toml
  3. Configure DNS at your domain registrar
  4. Add custom domain in Settings → Pages

API Configuration

Edit scripts/fetch_data.py:

  • GITHUB_USERNAME - Your GitHub username
  • ORCID_ID - Your ORCID identifier
  • Add other data sources as needed
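A minimal sketch of this kind of fetch, using only the standard library (the endpoint shown is the public GitHub REST API; `summarize` is a hypothetical helper, not the actual fetch_data.py code):

```python
import json
from urllib.request import urlopen

GITHUB_USERNAME = "CGutt-hub"  # change to your own username

def fetch_repos(username: str) -> list:
    """Fetch public repositories via the GitHub REST API (no auth needed
    for public data, subject to rate limits)."""
    url = f"https://api.github.com/users/{username}/repos?sort=updated"
    with urlopen(url) as resp:
        return json.load(resp)

def summarize(repos: list) -> list:
    """Reduce the raw API payload to the fields a projects page needs."""
    return [
        {"name": r["name"], "language": r.get("language"),
         "updated": r["updated_at"], "url": r["html_url"]}
        for r in repos
    ]
```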

View on GitHub →


labourAIVolt

Language: Python
Last updated: 2026-03-18

View README

labourAIVolt: AI & Human Labour Displacement Analysis for Volt

LAV Analysis

This repository hosts a Nextflow + Python analysis pipeline that automatically fetches current labour-market data from the World Bank public API and quantifies AI-driven labour displacement across the six Volt EU countries with the most active chapters — Germany, France, the Netherlands, Belgium, Italy, and Spain.

The aim is to give Volt Europa and its national chapters an evidence base for labour-market and technology policy: which sectors are shedding jobs fastest as AI and automation accelerate, which countries are most exposed, and how does digital readiness moderate that exposure?

The pipeline is architecturally modelled after the EmotiView project and extends the AnalysisToolbox modular Nextflow framework for scalable, reproducible analysis with automatic result synchronisation.

Platform | Role | Contents
GitHub | Technical implementation | Pipeline, scripts, results
World Bank API | Data source | Live labour-market indicators

Research Background

The labour-market impact of AI and automation is one of the defining policy challenges of the 2020s. Early projections (Frey & Osborne, 2013) estimated that up to 47 % of US jobs faced high computerisation risk; subsequent analyses have moderated that figure while broadening it to task-level disruption rather than wholesale job destruction. What is clear is that the pace and sector distribution of displacement vary substantially across countries depending on industrial structure, education levels, and digital infrastructure.

For a pan-European political movement like Volt, the relevant questions are:

  1. Which EU sectors show the clearest employment decline correlated with automation?
  2. Are Volt's home countries converging toward or diverging from each other on displacement pressure?
  3. Does a country's digital readiness (internet penetration, high-tech exports) buffer it against displacement?
  4. Where should Volt's labour policy — reskilling funds, working-time reform, Universal Basic Income pilots — be concentrated first?

This pipeline operationalises those questions with reproducible, automatically updated data.


Core Research Questions & Hypotheses

  1. Sector displacement ordering: Industry and agriculture will show larger negative employment-share trends than services, consistent with higher Frey & Osborne automation risk for routine physical/cognitive tasks.

  2. Cross-country heterogeneity: Countries with larger manufacturing sectors (Germany, Italy) will exhibit higher AI Displacement Pressure Index (ADPI) than service-dominant economies (Netherlands, Belgium).

  3. Digitalization buffer: Countries with higher Digitalization Readiness Scores (internet penetration + high-tech export share) will show lower net Vulnerability Scores, suggesting that digital transformation simultaneously creates displacement and provides adaptive capacity.

  4. Temporal acceleration: Employment-share trends will steepen post-2018 as AI adoption accelerates across all three sectors, visible as a structural break in the time-series.


Analysis Pipeline

The pipeline is built on the AnalysisToolbox — a modular Nextflow framework for scalable, reproducible data processing with automatic result synchronisation. The LAV-specific pipeline in LAV_analysis/ extends this framework to:

  • Discover country datasets and create per-country output directories (L1).
  • Fetch and parse live labour-market time-series from the World Bank API.
  • Perform standardised normalisation and cleaning.
  • Extract displacement signals and automation-risk-weighted scores per sector.
  • Fit time-series trend models to all available indicators.
  • Aggregate across countries into cross-country rankings and Volt policy metrics (L2).

Configuration is managed via LAV_analysis/LAV_parameters.config. See the AnalysisToolbox documentation for framework details.

Pipeline steps

L1 (per country)
┌─────────────────────────────────────────────────────────────────────┐
│  api_reader              Fetch 13 World Bank indicators (2010–2023) │
│       ↓                                                             │
│  normalizing_processor   Pivot long→wide; sort; deduplicate         │
│       ↓              ↘                                              │
│  displacement_analyzer   trend_analyzer                             │
│  sector scores ×         OLS slope + p-value + R²                  │
│  Frey & Osborne risk     per indicator                              │
└─────────────────────────────────────────────────────────────────────┘
       ↓ collect all countries
L2 (cross-country)
┌─────────────────────────────────────────────────────────────────────┐
│  volt_report_analyzer    Displacement ranking · ADPI · DRS          │
│                          Vulnerability Score · policy metrics       │
└─────────────────────────────────────────────────────────────────────┘
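The per-indicator trend fit in trend_analyzer can be sketched with scipy.stats.linregress (a hedged approximation of the output columns listed for it, not its actual code):

```python
from scipy.stats import linregress

def indicator_trend(years, values, alpha=0.05):
    """OLS trend for one indicator: slope (indicator units per year),
    p-value, R-squared, and a significance flag."""
    fit = linregress(years, values)
    return {
        "trend_slope": fit.slope,
        "trend_p_value": fit.pvalue,
        "trend_r_squared": fit.rvalue ** 2,
        "trend_significant": fit.pvalue < alpha,
    }
```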

Displacement model

Each broad employment sector receives a displacement score:

displacement_score  =  displacement_signal  ×  automation_risk
Term | Definition
displacement_signal | Normalised negative employment-share trend: max(0, −slope / mean_level). A sector losing share faster relative to its baseline scores higher.
automation_risk | Sector-level probability of computerisation from Frey & Osborne (2013): agriculture 0.82, industry 0.79, services 0.63.
displacement_score | Composite: high score = fast employment decline and high intrinsic automation susceptibility.

Group-level metrics (L2)

Metric | Definition
ADPI (AI Displacement Pressure Index) | Mean displacement_score across all sectors for a country.
DRS (Digitalization Readiness Score) | Normalised mean of internet-user percentage and high-tech export share (latest year).
Vulnerability Score | ADPI / (DRS + ε) — high ADPI and low digital readiness = most vulnerable.
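The displacement model and group-level metrics can be sketched in plain Python (a hedged reading of the definitions above; the pipeline's exact normalisation, especially for DRS, may differ):

```python
# Sector-level automation risks from Frey & Osborne (2013)
AUTOMATION_RISK = {"agriculture": 0.82, "industry": 0.79, "services": 0.63}

def displacement_signal(slope, mean_level):
    """Normalised negative employment-share trend: max(0, -slope / mean_level)."""
    return max(0.0, -slope / mean_level)

def displacement_score(sector, slope, mean_level):
    """Composite score: decline speed weighted by intrinsic automation risk."""
    return displacement_signal(slope, mean_level) * AUTOMATION_RISK[sector]

def adpi(sector_scores):
    """AI Displacement Pressure Index: mean displacement_score across sectors."""
    return sum(sector_scores) / len(sector_scores)

def drs(internet_users_pct, high_tech_exports_pct):
    """Digitalization Readiness Score: here simply the mean of the two
    indicators rescaled to [0, 1]; the actual normalisation is an assumption."""
    return (internet_users_pct + high_tech_exports_pct) / 2.0 / 100.0

def vulnerability(adpi_value, drs_value, eps=1e-6):
    """High displacement pressure plus low digital readiness = most vulnerable."""
    return adpi_value / (drs_value + eps)
```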

Data indicators (World Bank API, no key required)

Column | World Bank code | Description
employment_agriculture_pct | SL.AGR.EMPL.ZS | Employment in agriculture (% total)
employment_industry_pct | SL.IND.EMPL.ZS | Employment in industry (% total)
employment_services_pct | SL.SRV.EMPL.ZS | Employment in services (% total)
unemployment_rate | SL.UEM.TOTL.ZS | Unemployment (% labour force)
youth_unemployment_rate | SL.UEM.1524.ZS | Youth unemployment (%)
employment_to_pop_ratio | SL.EMP.TOTL.SP.ZS | Employment-to-population ratio
wage_salary_workers_pct | SL.EMP.WORK.ZS | Wage & salaried workers (%)
internet_users_pct | IT.NET.USER.ZS | Internet users (% population)
gdp_per_capita_usd | NY.GDP.PCAP.CD | GDP per capita (current USD)
gdp_growth_annual_pct | NY.GDP.MKTP.KD.ZG | GDP growth (annual %)
high_tech_exports_pct_mfg | TX.VAL.TECH.MF.ZS | High-tech exports (% manufactured exports)
ict_goods_exports_pct | TX.VAL.ICTG.ZS.UN | ICT goods exports (% total goods exports)
labor_force_total | SL.TLF.TOTL.IN | Total labour force

Repository Structure

labourAIVolt/

├── LAV_analysis/                    Nextflow pipeline (mirrors EV_analysis/)
│   ├── LAV_pipeline.nf              Main workflow orchestration
│   ├── LAV_modules.nf               IOInterface alias declarations
│   └── LAV_parameters.config        All pipeline parameters & script paths

├── LAV_data/                        Per-country input configs (mirrors rawData/)
│   ├── LAV_001/  LAV_001_config.json    Germany        (Volt Deutschland)
│   ├── LAV_002/  LAV_002_config.json    France         (Volt France)
│   ├── LAV_003/  LAV_003_config.json    Netherlands    (Volt Nederland)
│   ├── LAV_004/  LAV_004_config.json    Belgium        (Volt Belgium)
│   ├── LAV_005/  LAV_005_config.json    Italy          (Volt Italia)
│   └── LAV_006/  LAV_006_config.json    Spain          (Volt España)

├── LAV_results/                     Pipeline outputs (mirrors EV_results/)
│   ├── .bin/                        Shared infrastructure (logs, HTML archive)
│   ├── LAV_l1/                      First-level: per-country results
│   │   ├── LAV_001/
│   │   │   ├── plots/               Parquet output copies for QC
│   │   │   ├── LAV_001_api_raw.parquet
│   │   │   ├── LAV_001_normalized.parquet
│   │   │   ├── LAV_001_displacement.parquet
│   │   │   ├── LAV_001_trends.parquet
│   │   │   └── LAV_001.log.parquet  Live execution log
│   │   └── LAV_002/ … LAV_006/
│   └── LAV_l2/                      Second-level: cross-country group results
│       ├── LAV_volt_report.parquet
│       ├── LAV_displacement_summary.parquet
│       └── LAV_trends_summary.parquet

├── Python/                          Analysis scripts (no Nextflow dependency)
│   ├── lav_run.py                   Standalone orchestrator (used by CI)
│   ├── requirements.txt
│   ├── readers/
│   │   └── api_reader.py            Fetches World Bank labour-market data
│   ├── processors/
│   │   └── normalizing_processor.py Long→wide pivot, clean, sort
│   └── analyzers/
│       ├── displacement_analyzer.py AI displacement scores (Frey & Osborne)
│       ├── trend_analyzer.py        OLS time-series trends per indicator
│       └── volt_report_analyzer.py  Cross-country Volt policy synthesis

└── .github/workflows/
    └── lav_analysis.yml             GitHub Actions CI (weekly + on push)

Running the Analysis

Option A — GitHub Actions (no local setup)

The workflow in .github/workflows/lav_analysis.yml runs automatically:

Trigger | When
Scheduled | Every Monday at 06:00 UTC (pulls the latest World Bank data)
On push | Any change to LAV_data/** or Python/** on main
Manual | Actions tab → LAV Labour-AI-Volt Analysis → Run workflow

Results are:

  1. Uploaded as a downloadable artifact (lav-results-<run-number>) for 90 days.
  2. Committed back to LAV_results/ in the repository so outputs are versioned alongside the code.

No API keys, secrets, or local software are required.


Option B — Standalone Python (local, no Nextflow)

Use this for quick local runs or debugging individual scripts.

# 1. Clone the repository
git clone https://github.com/CGutt-hub/labourAIVolt.git
cd labourAIVolt

# 2. Install Python dependencies
pip install -r Python/requirements.txt

# 3. Run the full pipeline
python Python/lav_run.py

# Optional: override data/output directories
python Python/lav_run.py --data-dir LAV_data --output-dir LAV_results

Results are written to LAV_results/LAV_l1/<id>/ (per country) and LAV_results/LAV_l2/ (group synthesis).


Option C — Full Nextflow Pipeline (local, requires AnalysisToolbox)

Use this for full pipeline tracing, parallel execution, and integration with the AnalysisToolbox interactive HTML archive.

Prerequisites: Java ≥ 11, Nextflow

# 1. Clone both repos as siblings
git clone https://github.com/CGutt-hub/labourAIVolt.git
git clone https://github.com/CGutt-hub/AnalysisToolbox.git

# Your directory should now look like:
#   parent/
#   ├── AnalysisToolbox/
#   └── labourAIVolt/

# 2. Install Python dependencies
cd labourAIVolt
pip install -r Python/requirements.txt

# 3. Adjust python_exe in LAV_parameters.config if needed
#    (default: 'python3')

# 4. Launch the pipeline from the LAV_analysis/ directory
cd LAV_analysis
nextflow run LAV_pipeline.nf -c LAV_parameters.config

The Nextflow pipeline adds on top of the standalone runner:

  • Parallel per-country execution
  • Full Nextflow trace (LAV_results/.bin/pipeline_trace.txt)
  • Interactive HTML result archive (via AnalysisToolbox interactive_plotter)
  • Automatic git commit + push of results after each country completes

Output Files Reference

Per-country (L1) — LAV_results/LAV_l1/LAV_XXX/

File | Description
LAV_XXX_api_raw.parquet | Raw long-format data as returned by the World Bank API. Columns: participant_id, country, iso3, source, indicator, indicator_code, year, value.
LAV_XXX_normalized.parquet | Wide-format time-series. One row per year, one column per indicator. Ready for analysis scripts.
LAV_XXX_displacement.parquet | Per-sector displacement scores. Key columns: sector, employment_mean_pct, trend_slope_pp_per_yr, trend_significant, automation_risk_frey_osborne, displacement_score.
LAV_XXX_trends.parquet | OLS trend results for every indicator. Key columns: indicator, trend_slope, trend_p_value, trend_r_squared, trend_significant, total_change_pct.
LAV_XXX.log.parquet | Live pipeline execution log (Nextflow mode only).

Group-level (L2) — LAV_results/LAV_l2/

File | Description
LAV_volt_report.parquet | Full combined table (displacement + policy metrics for all countries).
LAV_displacement_summary.parquet | Cross-country displacement ranking per sector, with EU-wide mean, std, and per-country rank.
LAV_trends_summary.parquet | EU-wide mean slope and significance counts for key indicators across all countries.

Adding a New Country

  1. Create a new directory: LAV_data/LAV_007/
  2. Add a config file LAV_data/LAV_007/LAV_007_config.json:
{
  "participant_id": "LAV_007",
  "country": "Portugal",
  "iso3": "PRT",
  "iso2": "PT",
  "year_start": 2010,
  "year_end": 2025,
  "volt_chapter": "Volt Portugal",
  "population_millions": 10.3,
  "eu_member": true,
  "notes": "Optional notes about the country context"
}
  3. Push the file — the GitHub Action will pick it up automatically on the next run.

Project Status

Active development. Data fetching, pipeline, and group analysis are operational. Planned additions: visualisation layer, structural-break detection (2018 AI inflection point), and integration with OECD employment-by-occupation microdata for finer-grained occupational risk scoring.


References

  • Frey, C. B., & Osborne, M. A. (2013). The Future of Employment: How Susceptible Are Jobs to Computerisation? Oxford Martin School Working Paper.
  • World Bank Open Data. https://data.worldbank.org
  • Acemoglu, D., & Restrepo, P. (2020). Robots and Employment: Evidence from Europe. American Economic Review, 110(6), 2188–2220.
  • Autor, D. (2015). Why Are There Still So Many Jobs? Journal of Economic Perspectives, 29(3), 3–30.

Contributors

Name | Role | Contact
Cagatay Özcan Jagiello Gutt | Principal Investigator | ORCID

View on GitHub →


surveyWorkbench

Language: Python
Last updated: 2026-02-18

View README

Survey Workbench v2.0

A comprehensive participant data management system for survey research with dynamic questionnaire configuration and batch processing capabilities.

Python PyQt5 License

Overview

Survey Workbench is a desktop application designed to streamline the management of participant folders and extraction of survey data from questionnaires. Built with PyQt5, it provides an intuitive graphical interface for researchers and data managers to efficiently organize and process survey data.

Key Features

  • 🔧 Dynamic Questionnaire Configuration: Support for unlimited questionnaire types per participant with flexible template management
  • 📦 Batch Processing: Generate and extract data for multiple participants simultaneously
  • 📥 Participant Import: Import participant lists from .txt or .csv files
  • 📋 Template Bundles: Create and reuse questionnaire configuration bundles across projects
  • 🔍 Duplicate Detection: Automatic masterfile checking (supports CSV and Excel formats) to prevent duplicate entries
  • ✅ Data Completeness Verification: Validate all required data before extraction
  • 👁️ Preview Dialog: Review extracted data before finalizing
  • 📊 Missing Data Report: Generate quality control reports for incomplete data
  • 💾 Configuration Management: Save, load, and manage multiple configurations with an intuitive submenu interface
  • ❓ Interactive Help System: Built-in tooltips and "What's This?" mode for user assistance
  • 📑 Auto-Format Detection: Automatically detect masterfile format (CSV, XLS, XLSX)

System Requirements

  • Operating System: Windows 10/11, macOS 10.14+, or Linux
  • Python: 3.8 or higher
  • Microsoft Excel: Required for Excel file operations (via xlwings)
  • Memory: 4GB RAM minimum (8GB recommended for large datasets)
  • Storage: 100MB free space minimum

Installation

Prerequisites

Ensure you have Python 3.8+ installed on your system. You can download it from python.org.

Install Dependencies

# Clone the repository
git clone https://github.com/CGutt-hub/surveyWorkbench.git
cd surveyWorkbench

# Install required Python packages
pip install PyQt5 xlwings  # configparser ships with Python 3's standard library

Additional Setup

For Excel integration (xlwings), you may need to install the Excel add-in:

xlwings addin install

Quick Start

Running the Application

python survey_workbench.py

Or run the compiled executable (if available):

./survey_workbench_v2.0  # On Linux/macOS
survey_workbench_v2.0.exe  # On Windows

Basic Workflow

  1. Configure Questionnaires: Set up your questionnaire templates and target folders
  2. Generate Participant Folders: Create participant-specific folders with questionnaire templates
  3. Fill Out Questionnaires: Have participants complete their questionnaires
  4. Extract Data: Collect and consolidate data from completed questionnaires into a masterfile

Usage

Generate Participant Folders

  1. Select template files for each questionnaire type
  2. Specify the target folder where participant folders will be created
  3. Enter participant IDs (manual entry or import from file)
  4. Click "Generate Participant Folder" to create the folder structure

Batch Mode: Enable batch mode to process multiple participants at once by importing a list from .txt or .csv files.

Extract Data

  1. Select the source folder containing participant folders
  2. Choose the masterfile (CSV or Excel) where data will be extracted
  3. Configure questionnaire-specific extraction settings:
    • Excel sheet names
    • Column filters
    • Multiple questionnaire copies
  4. Click "Extract Data" to consolidate participant data

Features:

  • Duplicate Detection: Automatically checks if participant data already exists in the masterfile
  • Data Completeness Check: Verifies all required questionnaires are present before extraction
  • Preview Dialog: Review data before final extraction
  • Missing Data Report: Generate reports for participants with incomplete data
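The duplicate-detection and completeness checks can be sketched as pure functions (illustrative only; the masterfile column layout and questionnaire names below are assumptions, not the application's actual code):

```python
import csv

def existing_ids(masterfile_path):
    """Collect participant IDs already present in a CSV masterfile
    (first column assumed to hold the ID; adjust for your layout)."""
    with open(masterfile_path, newline="", encoding="utf-8") as f:
        return {row[0] for row in csv.reader(f) if row}

def is_duplicate(participant_id, known_ids):
    """True if this participant's data is already in the masterfile."""
    return participant_id in known_ids

def missing_questionnaires(found, required):
    """Completeness check: which required questionnaires are absent."""
    return sorted(set(required) - set(found))
```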

Configuration Management

Save and load configurations to quickly switch between different project setups:

  • Save Configuration: Store your current questionnaire setup and settings
  • Load Configuration: Quickly restore a previously saved configuration
  • Delete Configuration: Remove outdated configurations
  • Recent Configurations: Access recently used configurations from the menu

Template Bundles

Create reusable template bundles for standardized project setups:

  1. Configure all questionnaires and settings
  2. Select "Save Template Bundle" from the menu
  3. Load the bundle in future projects to instantly apply the same configuration

File Structure

surveyWorkbench/
├── survey_workbench.py      # Main application source code
├── survey_workbench.spec    # PyInstaller build specification
├── config.ini               # Configuration storage file
├── USER_MANUAL.pdf          # Comprehensive user manual
├── USER_MANUAL.tex          # LaTeX source for user manual
└── README.md                # This file

Technology Stack

  • GUI Framework: PyQt5 - Cross-platform graphical user interface
  • Excel Integration: xlwings - Python library for Excel automation
  • Configuration: ConfigParser - INI file handling for settings persistence
  • Build Tool: PyInstaller - Executable packaging (see survey_workbench.spec)
  • Type Hints: Full type annotation support for better code maintainability

Documentation

For detailed documentation, including screenshots and step-by-step guides, please refer to the USER_MANUAL.pdf included in this repository.

Troubleshooting

Common Issues

  • Excel not found: Ensure Microsoft Excel is installed and xlwings is properly configured
  • Configuration not saving: Check write permissions for config.ini file
  • Import errors: Verify all dependencies are installed with pip list
  • Template files not copying: Ensure source template files exist and have read permissions

For more troubleshooting tips, consult the USER_MANUAL.pdf.

Version History

Version 2.0 (February 2026)

  • Dynamic questionnaire support with unlimited types
  • Enhanced batch processing capabilities
  • Template bundle system
  • Improved duplicate detection
  • Data completeness verification
  • Preview dialog for data extraction
  • Missing data reporting
  • Interactive help system

Version 1.0 (April 2024)

  • Initial release
  • Basic participant folder generation
  • Simple data extraction

Author

Cagatay Gutt

  • Created: April 15, 2024
  • Last Updated: February 4, 2026

License

This software is for internal use only. All rights reserved.

Support

For questions, issues, or feature requests, please contact the project maintainer or refer to the comprehensive USER_MANUAL.pdf for detailed guidance.


Survey Workbench - Streamlining survey data management for research excellence.

View on GitHub →


paperFinder

Language: Python
Last updated: 2026-02-18

View README

View on GitHub →


Development Philosophy

All code is developed with a commitment to open and transparent science. Tools, pipelines, and analysis code are made available to support reproducibility and collaborative advancement of knowledge.