Under the Hood

Complexity

The complexity classification for preference-graph acyclicity testing is due to Smeulders, Cherchye, De Rock & Spieksma (2014, ACM TEAC 2(1)). A comprehensive survey of the algorithmic landscape is provided by Smeulders, Crama & Spieksma (2019, EJOR 272(3)).

Problem	Complexity	Economic Significance
GARP / SARP / WARP	$O(T^2)$	Fundamental test of utility maximization.
CCEI (Afriat index)	$O(T^2 \log T)$	Measure of “near-rationality” via budget deflation.
MPI (Money Pump)	$O(T^3)$	Direct measure of welfare loss from inconsistency.
HARP (Homothetic)	$O(T^3)$	Test for homothetic (scale-invariant) preferences.
Houtman-Maks	NP-hard	Max subset of rational observations (Outlier detection).
VEI (Varian Index)	NP-hard	Observation-specific efficiency (Precision diagnostics).
Stochastic RUM	NP-hard	Population-level rationality (Random Utility Models).

Budget-Based Methods

GARP - SCC Algorithm

Definition: A dataset $\{(p_t, x_t)\}_{t=1}^T$ satisfies the Generalized Axiom of Revealed Preference (GARP) if for every sequence of observations $(t_1, t_2, \dots, t_k)$, the condition $p_{t_1}x_{t_1} \geq p_{t_1}x_{t_2}, \dots, p_{t_k}x_{t_k} \geq p_{t_k}x_{t_1}$ implies that all inequalities are actually equalities.

Intuition: If an agent selects bundle $A$ when $B$ was less expensive, this reveals $A \succeq B$. If the agent subsequently selects $B$ when $A$ was strictly less expensive, this produces a contradiction ($B \succ A$), implying no stable utility function can rationalize the observed behavior.

Traditional approach (pre-2015): Build the direct observation graph $G_{R_0}$, compute its transitive closure $R^*$ via Floyd-Warshall in $O(T^3)$, then check $\neg(i R^* j \wedge j P_0 i)$ for all pairs.

Floyd-Warshall builds the full transitive closure R* in O(T³)

Our approach: Talla Nobibon, Smeulders & Spieksma (2015, JOTA 166(3)) proved that transitive closure is unnecessary. Instead, we use Strongly Connected Components (SCCs).

Tarjan's SCC detects GARP violations in O(T²) - no closure needed

Theorem (Talla Nobibon et al., 2015)

GARP is violated if and only if some strongly connected component (SCC) of the direct weak preference graph $G_{R_0}$ contains a strict preference arc $P_0$.

Example: Suppose at $t=1$, you buy $x_1$ at prices $p_1$ and could have bought $x_2$ ($p_1 x_1 \geq p_1 x_2$). At $t=2$, you buy $x_2$ at prices $p_2$ and could have bought $x_1$ while it was strictly cheaper ($p_2 x_2 > p_2 x_1$). This forms a 2‑cycle $1 \xrightarrow{R_0} 2 \xrightarrow{P_0} 1$. Both observations are in the same SCC, and there is a strict preference arc $P_0$ between them. GARP fails.

Algorithm:

Build $R_0$ and $P_0$ from expenditure data - $O(T^2)$
Tarjan’s SCC decomposition on $R_0$ - $O(T + |A|) \leq O(T^2)$
For each arc $(i,j)$ where $\text{scc}[i] = \text{scc}[j]$, check $P_0[i,j]$ - $O(T^2)$

Total: $O(T^2)$ - provably tight. For $T = 10{,}000$, this is $1{,}000\times$ faster than Floyd-Warshall.

Implementation

Rust: rpt-core/src/garp.rs - garp_check() uses Tarjan’s SCC (no closure).
Batch dispatch: batch.rs auto-selects $O(T^2)$ when only GARP is requested. When downstream metrics need the full transitive closure $R^*$ (MPI, VEI), the engine runs Floyd-Warshall within each SCC and propagates reachability across the condensed DAG using bitwise operations, so the $O(T^3)$ cost is only paid once and shared.

CCEI (Afriat Efficiency Index)

Definition: The Critical Cost Efficiency Index (CCEI) is the supremum of all $e \in (0,1]$ such that the deflated data $\{(e \cdot p_t, x_t)\}_{t=1}^T$ satisfies GARP.

Intuition: If you fail GARP, how much would we have to “shrink” your budget to make your choices look rational? A CCEI of 0.95 means that if you had 5% less money at each step, your observed choices would no longer be seen as “wasteful” relative to other options, as those other options would have been outside your budget.

Example: Suppose $p_1 x_1 = 100$ and $p_1 x_2 = 105$. You bought $x_1$ even though $x_2$ was only slightly more expensive. If you also have a preference revealing $x_2 \succ x_1$, you have a violation. By setting $e = 100/105 \approx 0.952$, the cost of $x_2$ at $t=1$ becomes $0.952 \times 105 = 100$. Now $x_2$ is exactly as expensive as $x_1$, so choosing $x_1$ no longer reveals a strict preference over $x_2$.

Algorithm: The CCEI is found by a discrete binary search over the $T^2$ critical efficiency ratios $\{E_{ij} / E_{ii}\}$.

Collect all pairwise ratios $e_{ij} = p_i x_j / p_i x_i$ where $e_{ij} < 1$.
Sort and deduplicate these $\leq T^2$ values.
Binary search: for a candidate $e$, check GARP on the deflated data.

Each iteration of the binary search only needs a pass/fail GARP result, so PrefGraph uses the $O(T^2)$ SCC check rather than computing the full transitive closure. This keeps the total at $O(T^2 \log T)$ instead of $O(T^3 \log T)$.

Implementation

Rust: rpt-core/src/ccei.rs - ccei_search() performs the discrete binary search.

CCEI shrinks budget sets until preference cycles disappear

MPI (Money Pump Index) - Karp’s Algorithm

Definition: The Money Pump Index (MPI) measures the maximum average budget savings per step in a preference cycle.

Intuition: If your preferences are $A \succ B \succ C \succ A$, an arbitrageur could trade you $A$ for $B$ (and charge a small fee), then $B$ for $C$, then $C$ for $A$, ending up with their original goods plus your fees. The MPI quantifies how much “money” can be pumped out of you this way.

Example (Money Pump Cycle): 1. At $t=1$, you buy $x_1$ for $10; you could have bought $x_2$ for $8 (savings = 20%). 2. At $t=2$, you buy $x_2$ for $10; you could have bought $x_1$ for $8 (savings = 20%). By trading back and forth, 20% of the budget is “wasted” in each round of the cycle. The MPI for this cycle is 0.20.

Algorithm: We model this as finding the Maximum Mean-Weight Cycle in a directed graph where edge weights are relative savings $w_{ij} = (E_{ii} - E_{ij})/E_{ii}$.

PrefGraph uses Karp’s Algorithm, which uses dynamic programming to find the optimal cycle in $O(VE)$ time, which is $O(T^3)$ here.

\[\text{MPI} = \max_C \frac{1}{|C|} \sum_{(i,j) \in C} \frac{E_{ii} - E_{ij}}{E_{ii}}\]

Implementation

Rust: rpt-core/src/mpi.rs.

References: Karp (1978); Echenique, Lee & Shum (2011, JPE).

HARP (Homothetic Axiom) - Max-Product Paths

Definition: The Homothetic Axiom of Revealed Preference (HARP) tests if choices are consistent with a utility function $u(x)$ that is linearly homogeneous ($u(\alpha x) = \alpha u(x)$).

Intuition: Homothetic preferences imply that your relative choices between goods don’t change as your income increases; you just scale everything up. This imposes a much stricter requirement: not just “no cycles”, but “the product of expenditure ratios along any cycle cannot exceed 1.”

Algorithm: We use a log-transform to turn the product check into a sum check. 1. Define weights $W_{ij} = \log(E_{ii} / E_{ij})$. 2. Find the maximum-weight path between all pairs using a modified Floyd-Warshall. 3. HARP holds if no diagonal entry is positive (no positive-sum cycle).

Complexity: $O(T^3)$ due to the all-pairs shortest (longest) path requirement.

Implementation

Rust: rpt-core/src/harp.rs - harp_check().

Houtman-Maks Index - Greedy + ILP

Definition: The Houtman-Maks index is the size of the largest subset of observations that is consistent with GARP.

Intuition: If you have 100 shopping trips and 5 of them are completely unusual (e.g., buying for a large party), GARP might fail because of those 5 outliers. Houtman-Maks asks: “What is the maximum number of observations we can keep such that they are perfectly rational?”

Complexity: This is NP-hard. Formally, it is equivalent to the Maximum Weight Independent Set on a conflict graph, or more directly, the Minimum Directed Feedback Vertex Set (DFVS) on the preference graph.

Algorithm: 1. Greedy (Default): SCC‑aware heuristic per Heufer & Hjertstrand (2015). Decompose by SCC, then iteratively remove the node with highest violation participation; fast and usually within 1–2% of optimal. 2. Exact (ILP): Integer program with binary $z_t \in \{0,1\}$ indicating whether observation $t$ is kept; maximize $\sum z_t$ subject to GARP constraints.

NP-hard in general, but practical for $T \leq 500$ using SCC decomposition. The exact ILP uses the Demuynck & Rehbeck (2023) formulation with integer variables, which avoids the LP-relaxation errors documented by Mononen (2023).

Implementation

Rust: rpt-core/src/houtman_maks.rs (greedy + ILP dispatch), rpt-core/src/lp.rs (ILP solver).

References: Houtman & Maks (1985); Demuynck & Rehbeck (2023, Econ Theory).

Houtman-Maks removes the minimum observations to break all preference cycles

VEI (Varian Efficiency Index) - Exact MILP

Definition: Varian’s index assigns an individual efficiency level $e_t \in [0,1]$ to each observation. It is the least average adjustment that restores consistency,

\[I_{\mathrm{Var}} = \frac{1}{T} \inf_{(e_t)} \sum_{t=1}^{T} (1 - e_t) \quad \text{such that GARP holds at the adjusted budgets } e_t \, p_t x_t.\]

vei_exact_mean reports $1 - I_{\mathrm{Var}}$, which is unique. The per-observation vector behind it is generally not unique when several adjustment patterns achieve the same total, so PrefGraph reports a canonical vector. Among value-optimal solutions it first minimizes the largest single adjustment, then resolves remaining ties in observation order, giving earlier observations the benefit of the doubt. This selection is a reporting convention, not part of the index definition, and it makes the vector statistics deterministic and identical across both backends.

Intuition: Unlike CCEI, which applies a single “penalty” to every observation, VEI allows us to say: “Trip #14 was irrational (e=0.7), but Trip #1 was perfect (e=1.0).” This provides much higher diagnostic resolution for identifying when behavior became inconsistent.

Algorithm (Mononen, 2023): PrefGraph implements Theorem 1 of Mononen (2023) with row generation. 1. Reformulate the index as a binary linear program over strict revealed preferences. Lowering one budget removes the targeted preference and every cheaper preference at the same observation for free, so each cycle constraint is expanded to the full set of equally or more expensive preferences that would cover it. 2. Initialize with all 2-cycles (WARP violations). 3. Solve the MILP with the current constraint set. 4. Run the separation oracle from Mononen’s Algorithm 1, which keeps a preference alive only while its additional removal cost is positive and greedily breaks each cycle it finds. 5. If surviving cycles are found, add their expanded constraints and re-solve; otherwise the value is optimal. Two further constrained solves then pin the canonical vector.

This is NP-hard, but the reformulation converges orders of magnitude faster than earlier ILP approaches and sidesteps the LP-relaxation errors documented by Mononen (2023) in the Demuynck & Rehbeck formulation. A polynomial-time LP relaxation (compute_vei) is also available as a fast heuristic.

Implementation

Rust: rpt-core/src/vei.rs - compute_vei() (LP relaxation) and compute_vei_exact() (Theorem 1 MILP with row generation).
Python: prefgraph.algorithms.vei.compute_vei_exact is an exact mirror used by the Engine fallback; on integer data both backends return bit-identical results, verified by exhaustive-enumeration oracles in the test suite.

References: Varian (1990, J Econometrics); Mononen (2023).

GAPP (Generalized Axiom of Price Preference)

Definition: GAPP tests whether prices (not quantities) reveal consistent preferences. This is the dual of GARP.

Intuition: While GARP asks “is the bundle you chose better than other affordable bundles?”, GAPP asks “is the price you paid lower than other prices that would have made that bundle affordable?” It tests for utility maximization when consumers respond primarily to price signals rather than quantity constraints.

Algorithm: The price preference matrices are defined as:

\[R_p[s,t] = (p^s \cdot x^t \leq p^t \cdot x^t), \qquad P_p[s,t] = (p^s \cdot x^t < p^t \cdot x^t)\]

A violation occurs if : $R_p^*[s,t] \wedge P_p[t,s]$. This is the same structure as GARP but on the transposed price-expenditure graph.

Implementation

Rust: rpt-core/src/gapp.rs - gapp_check() uses SCC-optimized transitive closure on the price preference graph.

Reference: Deb, Kitamura, Quah & Stoye (2023, RES).

Stochastic Choice and RUM

Definition: Stochastic choice models analyze the probability $P(x|A)$ of choosing item $x$ from a menu $A$. A Random Utility Model (RUM) assumes the consumer has a utility $U_i = V_i + \epsilon_i$ where $\epsilon_i$ is a random error term.

Key Axioms: - Regularity: $P(x|A) \geq P(x|B)$ whenever $A \subseteq B$; removing options should not decrease the probability of choosing an existing item. - IIA (Independence of Irrelevant Alternatives): The ratio $P(x|A)/P(y|A)$ is constant across menus that contain both $x$ and $y$.

Algorithms: PrefGraph implements maximum likelihood estimation for: - Multinomial Logit (MNL): Type‑I Extreme Value (Gumbel) errors; satisfies IIA. - Luce choice rule: $P(x\mid A) = \dfrac{w_x}{\sum_{y\in A} w_y}$; equivalent to MNL under a log‑weight parameterization.

Implementation

Python: src/prefgraph/contrib/stochastic.py - fit_random_utility_model().

References: McFadden (1974); Chambers & Echenique (2016), Chapter 13.

Practical Usage: Code Examples

The examples below use built-in data generators so every snippet runs as-is.

Budget analysis (GARP, CCEI, MPI)

from prefgraph import load_demo, Engine

users = load_demo(n_users=100)  # 40% rational, 40% noisy, 20% irrational
results = Engine(metrics=["garp", "ccei", "mpi"]).analyze_arrays(users)

n_consistent = sum(r.is_garp for r in results)
mean_ccei    = sum(r.ccei    for r in results) / len(results)
mean_mpi     = sum(r.mpi     for r in results) / len(results)

print(f"GARP-consistent: {n_consistent}/100")   # GARP-consistent: 96/100
print(f"Mean CCEI:       {mean_ccei:.4f}")       # Mean CCEI:       0.9976
print(f"Mean MPI:        {mean_mpi:.4f}")        # Mean MPI:        0.0026

Menu analysis (SARP, WARP, Houtman-Maks)

from prefgraph import generate_random_menus, Engine

users = generate_random_menus(n_users=100)
results = Engine().analyze_menus(users)

n_sarp  = sum(r.is_sarp for r in results)
n_warp  = sum(r.is_warp for r in results)
mean_hm = sum(r.hm_consistent / r.hm_total
              for r in results if r.hm_total > 0) / len(results)

print(f"SARP-consistent: {n_sarp}/100")          # SARP-consistent: 7/100
print(f"WARP-consistent: {n_warp}/100")          # WARP-consistent: 7/100
print(f"Mean HM ratio:   {mean_hm:.4f}")         # Mean HM ratio:   0.7020

Solver Stack

Solver	Status	Notes
HiGHS	Default	MIT-licensed. Best open-source LP/MILP solver (Machado, 2024). Used by SciPy since v1.6.0. Competitive with Gurobi for LP; ~10× slower for MIP.
Gurobi	Optional	Commercial license required. Enable via `cargo build --features gurobi`. ~10× faster for Houtman-Maks ILP. Functions: `solve_afriat_lp_gurobi()`, `solve_hm_ilp_gurobi()` in `lp.rs`.

To build with Gurobi support:

# Requires GUROBI_HOME set and gurobi shared library available
cd rust && cargo build --release --features gurobi