1M-Scale Quantile Regression in Approximately 0.1 Second Using ReHLine
The main purpose of this post is to demonstrate the computational power of the ReHLine algorithm that Dr. Yixuan Qiu and I recently developed, using quantile regression as an illustrative example.
I would like to emphasize that ReHLine is a highly efficient solver for general (constrained) Empirical Risk Minimization (ERM) and quadratic programming problems. Here, we simply use quantile regression to showcase ReHLine’s capabilities (see details in our paper1).
Introduction
Quantile regression is a fundamental statistical technique used to model the relationship between a dependent variable and one or more independent variables. It is widely applied across various fields, including finance, economics, and healthcare, to estimate the conditional quantiles of a response variable. However, traditional quantile regression methods can be computationally expensive and may not scale well to large datasets. In this blog post, we explore how ReHLine, our novel algorithm, can perform quantile regression on datasets with one million observations in approximately 0.1 seconds, making it a game-changer for fast and accurate quantile estimation.
We demonstrate this example on my standard MacBook Air M4.
First, let’s install ReHLine via pypi.
pip install rehline
Computation
We use the make_friedman1 dataset from sklearn to simulate the data with 1 million observations.
# Import required libraries
from sklearn.datasets import make_friedman1
import numpy as np
n_sample=1000000
X, y = make_friedman1(n_samples=n_sample, random_state=42)
We establish two quantile regression models, corresponding to the 5th and 95th percentiles, respectively.
# Import ReHLine's quantile regression implementation
from rehline import plqERM_Ridge
# Create models for lower (5th) and upper (95th) quantiles
# Note: C parameter is the inverse of regularization strength (1/lambda)
clf_l = plqERM_Ridge(loss={'name': 'QR', 'qt': 0.05}, C=1.0/n_sample)
clf_u = plqERM_Ridge(loss={'name': 'QR', 'qt': 0.95}, C=1.0/n_sample)
Next, we measure the computational time of the quantile regression models.
# Measure the time for fitting the lower quantile model
%%timeit
clf_l.fit(X=X, y=y)
134 ms ± 438 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)
# Measure the time for fitting the upper quantile model
%%timeit
clf_u.fit(X=X, y=y)
134 ms ± 409 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)
The fitting time of one quantile regression is approximately 0.1 seconds for one million data points. This is remarkably fast compared to traditional methods that would typically require minutes or even hours for the same dataset size.
Next, we monitor the computational time of the path solution of quantile regression, which calculates solutions for multiple regularization strengths efficiently using warm-starting:
# Import the path solution function from ReHLine
from rehline import plqERM_Ridge_path_sol
loss={'name': 'QR', 'qt': 0.05}
Cs = np.logspace(-3,3,10)
Cs, times, n_iters, losses, norms, coefs = plqERM_Ridge_path_sol(X, y, loss=loss, Cs=Cs, max_iter=50000, tol=1e-4, verbose=1, warm_start=True, return_time=True)
The path solution of quantile regression with 10 Cs takes approximately 55 seconds.
PLQ ERM Path Solution Results
==========================================================================================
C Value Iterations Time (s) Loss L2 Norm
------------------------------------------------------------------------------------------
0.001 6 0.805814 267410.897200 10.314200
0.004642 8 0.701051 259282.043800 11.575500
0.02154 46 0.746822 258779.629900 11.912500
0.1 463 1.344118 258752.334400 11.996500
0.4642 3783 5.736946 258751.285200 12.014100
2.154 12425 7.766684 258751.270200 12.017500
10 23951 10.518842 258751.273800 12.017900
46.42 40974 12.514367 258751.276600 12.018100
215.4 37061 5.797936 258751.276800 12.018100
1000 40136 9.676294 258751.277500 12.018200
==========================================================================================
Total Time 55.629967 sec
Avg Time/Iter0.000350 sec
==========================================================================================
More information
Again, this is merely a simple example, as ReHLine can be applied to solve a much broader range of optimization problems. For instance, it can handle Ridge Composite Quantile Regression, Constrained ERM, and Constrained ERM - path solution.
If you have specific computational problem requirements, please feel free to email us. We are happy to check whether ReHLine can solve your problem. If ReHLine proves helpful to you, please star our GitHub repository.

Thank you for the support from our stargazers!
References
-
Dai, B., & Qiu, Y. (2023). ReHLine: Regularized composite ReLU-ReHU loss minimization with linear computation and linear convergence. Advances in Neural Information Processing Systems, 36, 14366-14386. ↩

Comments
You can comment on this post using your GitHub account.
Join the discussion for this article on this ticket. Comments appear on this page instantly.