1M-Scale Quantile Regression in Approximately 0.1 Second Using ReHLine

Posted 2025-07-06

The main purpose of this post is to demonstrate the computational power of the ReHLine algorithm that Dr. Yixuan Qiu and I recently developed, using quantile regression as an illustrative example.

I would like to emphasize that ReHLine is a highly efficient solver for general (constrained) Empirical Risk Minimization (ERM) and quadratic programming problems. Here, we simply use quantile regression to showcase ReHLine’s capabilities (see details in our paper¹).

Introduction

Quantile regression is a fundamental statistical technique used to model the relationship between a dependent variable and one or more independent variables. It is widely applied across various fields, including finance, economics, and healthcare, to estimate the conditional quantiles of a response variable. However, traditional quantile regression methods can be computationally expensive and may not scale well to large datasets. In this blog post, we explore how ReHLine, our novel algorithm, can perform quantile regression on datasets with one million observations in approximately 0.1 seconds, making it a game-changer for fast and accurate quantile estimation.

We demonstrate this example on my standard MacBook Air M4.

First, let’s install ReHLine via pypi.

pip install rehline

Computation

We use the make_friedman1 dataset from sklearn to simulate the data with 1 million observations.

# Import required libraries
from sklearn.datasets import make_friedman1
import numpy as np

n_sample=1000000
X, y = make_friedman1(n_samples=n_sample, random_state=42)

We establish two quantile regression models, corresponding to the 5th and 95th percentiles, respectively.

# Import ReHLine's quantile regression implementation
from rehline import plqERM_Ridge

# Create models for lower (5th) and upper (95th) quantiles
# Note: C parameter is the inverse of regularization strength (1/lambda)
clf_l = plqERM_Ridge(loss={'name': 'QR', 'qt': 0.05}, C=1.0/n_sample)
clf_u = plqERM_Ridge(loss={'name': 'QR', 'qt': 0.95}, C=1.0/n_sample)

Next, we measure the computational time of the quantile regression models.

# Measure the time for fitting the lower quantile model
%%timeit
clf_l.fit(X=X, y=y)

134 ms ± 438 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)

# Measure the time for fitting the upper quantile model
%%timeit
clf_u.fit(X=X, y=y)

134 ms ± 409 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)

The fitting time of one quantile regression is approximately 0.1 seconds for one million data points. This is remarkably fast compared to traditional methods that would typically require minutes or even hours for the same dataset size.

Next, we monitor the computational time of the path solution of quantile regression, which calculates solutions for multiple regularization strengths efficiently using warm-starting:

# Import the path solution function from ReHLine
from rehline import plqERM_Ridge_path_sol

loss={'name': 'QR', 'qt': 0.05}
Cs = np.logspace(-3,3,10)
Cs, times, n_iters, losses, norms, coefs = plqERM_Ridge_path_sol(X, y, loss=loss, Cs=Cs, max_iter=50000, tol=1e-4, verbose=1, warm_start=True, return_time=True)

The path solution of quantile regression with 10 Cs takes approximately 55 seconds.

PLQ ERM Path Solution Results
==========================================================================================
C Value        Iterations     Time (s)            Loss                L2 Norm             
------------------------------------------------------------------------------------------
0.001          6              0.805814            267410.897200       10.314200           
0.004642       8              0.701051            259282.043800       11.575500           
0.02154        46             0.746822            258779.629900       11.912500           
0.1            463            1.344118            258752.334400       11.996500           
0.4642         3783           5.736946            258751.285200       12.014100           
2.154          12425          7.766684            258751.270200       12.017500           
10             23951          10.518842           258751.273800       12.017900           
46.42          40974          12.514367           258751.276600       12.018100           
215.4          37061          5.797936            258751.276800       12.018100           
1000           40136          9.676294            258751.277500       12.018200           
==========================================================================================
Total Time  55.629967 sec
Avg Time/Iter0.000350 sec
==========================================================================================

More information

Again, this is merely a simple example, as ReHLine can be applied to solve a much broader range of optimization problems. For instance, it can handle Ridge Composite Quantile Regression, Constrained ERM, and Constrained ERM - path solution.

If you have specific computational problem requirements, please feel free to email us. We are happy to check whether ReHLine can solve your problem. If ReHLine proves helpful to you, please star our GitHub repository.

Thank you for the support from our stargazers!

References

Dai, B., & Qiu, Y. (2023). ReHLine: Regularized composite ReLU-ReHU loss minimization with linear computation and linear convergence. Advances in Neural Information Processing Systems, 36, 14366-14386. ↩

machine-learning optimization quadratic-programming quantile-regression

Comments

You can comment on this post using your GitHub account.

Join the discussion for this article on this ticket. Comments appear on this page instantly.