Background
A Danish trade union has an online analysis tool where members can compare their terms with the market. Under the hood, it runs 255 statistical regressions of three kinds: quantile regression, logistic regression on employee benefits, and multinomial regression on contract clauses.
The code was written in R, had become difficult to maintain, and eventually stopped running altogether. The consultant maintaining it needed someone to take the old code and make it work again.
What I built
I migrated the entire codebase from R to Python. Not a line-by-line translation, but a refactoring that leveraged what Python does well.
Tech stack
- pandas for data handling and Excel import
- statsmodels for quantile regression and binomial GLM
- scikit-learn for multinomial logistic regression
- joblib for parallel computation across all CPU cores
I built it so a non-technical employee can run it: one script installs everything, another runs the calculation. Input files are detected automatically, and the output is CSV files imported directly into the analysis tool.
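One way the automatic input detection and CSV export could look is sketched below. The folder layout, the `*.xlsx` pattern and the function names `find_input_files`, `load_inputs` and `write_results` are assumptions for illustration, not the project's real interface.

```python
from pathlib import Path

import pandas as pd


def find_input_files(folder, pattern="*.xlsx"):
    """Return matching files sorted by name, skipping Excel lock files (~$...)."""
    return sorted(
        p for p in Path(folder).glob(pattern) if not p.name.startswith("~$")
    )


def load_inputs(folder):
    """Read every detected workbook into a DataFrame keyed by file stem."""
    files = find_input_files(folder)
    if not files:
        # Fail with a message a non-technical user can act on.
        raise FileNotFoundError(f"No .xlsx files found in {folder}")
    return {p.stem: pd.read_excel(p) for p in files}


def write_results(results, out_dir):
    """Write each result table to <out_dir>/<name>.csv and return the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for name, frame in results.items():
        path = out_dir / f"{name}.csv"
        frame.to_csv(path, index=False)
        paths.append(path)
    return paths
```

With this shape, the person running the script only has to drop the new year's files into one folder. Skipping `~$` files avoids tripping over the temporary lock files Excel creates while a workbook is open.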
Results
This is a classic modernization project: old code that works, until it does not. Nobody dared touch it, and nobody could debug it. Now it runs flawlessly, it is easy to maintain, and it can be used year after year with new data.