Preference Graphs
==================

.. code-block:: bash

   pip install prefgraph                  # core library
   pip install "prefgraph[parquet]"       # + Parquet file support
   pip install "prefgraph[datasets]"      # + real-world dataset loaders

See the :doc:`install` page for all extras and workflow options.

.. raw:: html

   <div style="display: flex; gap: 24px; align-items: center; flex-wrap: wrap; margin: 1.5em 0;">
     <div style="flex: 1; min-width: 280px;">
       <p style="font-size: 1.05em; line-height: 1.6; margin: 0;">
         When users make choices, we can represent their decisions as a <strong>preference graph</strong>.
         If someone chooses A over B, B over C, and then C over A, they have formed a cycle.
         No single ranking of A, B, and C can explain all three choices, so a cycle signals an inconsistency in their decision making.
         PrefGraph uses graph algorithms (such as Tarjan's strongly connected components algorithm) to detect these cycles.
         By identifying and counting these violations, we can score a user's consistency.
       </p>
     </div>
     <div style="flex: 1; min-width: 320px;">
       <img src="_static/consistency.gif" style="width: 100%; border-radius: 8px; box-shadow: 0 4px 6px rgba(0,0,0,0.1);" alt="Animated preference graph showing consistency detection">
     </div>
   </div>
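
The idea can be sketched in plain Python. This is an illustration only, not PrefGraph's implementation: the ``find_cycles`` helper and its edge-list input are hypothetical names chosen for this example. An edge ``(a, b)`` means "a was chosen over b", and any strongly connected component with more than one node contains at least one preference cycle.

.. code-block:: python

   from collections import defaultdict


   def find_cycles(edges):
       """Return strongly connected components with more than one node (Tarjan)."""
       graph = defaultdict(list)
       nodes = set()
       for winner, loser in edges:
           graph[winner].append(loser)
           nodes.update((winner, loser))

       index = {}      # discovery order of each node
       lowlink = {}    # smallest discovery index reachable from the node
       stack, on_stack = [], set()
       components = []

       def visit(node):
           index[node] = lowlink[node] = len(index)
           stack.append(node)
           on_stack.add(node)
           for nxt in graph[node]:
               if nxt not in index:
                   visit(nxt)
                   lowlink[node] = min(lowlink[node], lowlink[nxt])
               elif nxt in on_stack:
                   lowlink[node] = min(lowlink[node], index[nxt])
           if lowlink[node] == index[node]:   # node is the root of a component
               component = []
               while True:
                   member = stack.pop()
                   on_stack.discard(member)
                   component.append(member)
                   if member == node:
                       break
               if len(component) > 1:
                   components.append(sorted(component))

       for node in sorted(nodes):
           if node not in index:
               visit(node)
       return components


   # A over B, B over C, C over A forms a cycle; D is only ever rejected.
   choices = [("A", "B"), ("B", "C"), ("C", "A"), ("A", "D")]
   print(find_cycles(choices))   # [['A', 'B', 'C']]

The recursive form keeps the sketch short. For long choice histories an iterative version would avoid Python's recursion limit.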

Budgets & Menu Choices
----------------------

PrefGraph handles two types of choice data. **Budgets** are purchased quantities at given prices, like retail shopping. **Menus** are discrete selections from a set of available items, like search clicks or AI agent prompting. Menus come in three flavors: deterministic (``MenuChoiceLog``), stochastic (``StochasticChoiceLog``), and risk-based lotteries (``RiskChoiceLog``).

PrefGraph accepts Polars or pandas DataFrames, Parquet files, or raw NumPy arrays. See the :doc:`Loading Data <quickstart>` guide for examples.

**Budget-choice example**

.. code-block:: python

   from prefgraph.datasets import load_demo
   from prefgraph.engine import Engine, results_to_dataframe

   # load_demo returns list[tuple[prices, quantities]] — synthetic shoppers
   users = load_demo(n_users=100_000)

   # Engine scores every user in parallel via Rust/Rayon
   engine = Engine(metrics=["garp", "ccei"])  # GARP = acyclicity test, CCEI = efficiency score
   results = engine.analyze_arrays(users)

   # Flatten to a DataFrame for analysis
   df = results_to_dataframe(results)
   print(df[["is_garp", "n_violations", "ccei"]].head())

.. code-block:: text

   Scored 100,000 users in 3.8s (26,165 users/sec)

      is_garp  n_violations      ccei
   0     True             0  1.000000
   1     True             0  1.000000
   2     True             0  1.000000
   3    False             4  0.972536
   4    False             2  0.978055
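
For readers who want to see what a GARP test amounts to, the following is a textbook-style NumPy sketch for a single user. It is not PrefGraph's Rust implementation: the ``satisfies_garp`` helper is a hypothetical name, and it assumes ``prices`` and ``quantities`` are arrays of shape ``(T, K)`` (observations by goods). Bundle *i* is directly revealed preferred to bundle *j* when *j* was affordable at observation *i*; GARP fails when that relation, extended transitively, runs against a strict direct preference.

.. code-block:: python

   import numpy as np


   def satisfies_garp(prices, quantities):
       """Check GARP for one user; inputs are (T, K) arrays."""
       prices = np.asarray(prices, dtype=float)
       quantities = np.asarray(quantities, dtype=float)

       # cost[i, j] = cost of bundle j at the prices of observation i
       cost = prices @ quantities.T
       own = np.diag(cost)[:, None]      # what was actually spent at observation i

       weak = own >= cost                # j was affordable when i was chosen
       strict = own > cost               # j was strictly cheaper than i

       # Transitive closure of the weak relation (Warshall's algorithm)
       closure = weak.copy()
       for k in range(len(closure)):
           closure |= closure[:, [k]] & closure[[k], :]

       # Violation: i is revealed preferred to j, yet j is strictly preferred to i
       return not np.any(closure & strict.T)


   prices = [[1, 2], [2, 1]]
   print(satisfies_garp(prices, [[4, 1], [1, 4]]))   # True: neither bundle was affordable at the other's prices
   print(satisfies_garp(prices, [[1, 4], [4, 1]]))   # False: each bundle was chosen while the other was cheaper

The closure loop is cubic in ``T``, which is fine for a sketch with short histories.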

**Menu-choice example**

.. code-block:: python

   from prefgraph import generate_random_menus
   from prefgraph.engine import Engine, results_to_dataframe

   # Generate discrete-choice data: each user picks one item from a menu
   menus_data = generate_random_menus(
       n_users=100_000, n_obs=10, n_items=5,
       menu_size=(2, 5),        # menus contain 2–5 items each
       choice_model="logit",    # logit model with some noise
       rationality=0.7, seed=42
   )

   # HM = Houtman-Maks: counts how many choices to discard for consistency
   engine = Engine(metrics=["hm"])
   results = engine.analyze_menus(menus_data)
   df = results_to_dataframe(results)
   print(df[["is_sarp", "n_sarp_violations", "hm_consistent", "hm_total"]].head())

.. code-block:: text

   Generated + scored 100,000 users in 2.6s (38,895 users/sec)

      is_sarp  n_sarp_violations  hm_consistent  hm_total
   0    False                  6              3         5
   1    False                  3              4         5
   2    False                  6              3         5
   3    False                  3              4         5
   4    False                  6              3         5
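
To make the Houtman-Maks idea concrete, here is a brute-force sketch in plain Python. It rests on assumptions that are not taken from PrefGraph: consistency is taken to mean an acyclic "chosen item beats every other item on the menu" relation, each observation is a hypothetical ``(menu, chosen)`` tuple, and the search is exponential, so it only suits a handful of observations. It returns the size of the largest consistent subset, which is the number of observations that do not need to be discarded.

.. code-block:: python

   from itertools import combinations


   def is_acyclic(observations):
       """True if 'chosen beats every other menu item' contains no cycle."""
       beats = {}
       for menu, chosen in observations:
           beats.setdefault(chosen, set()).update(i for i in menu if i != chosen)

       state = {}  # item -> "active" while on the DFS path, "done" afterwards

       def has_cycle(item):
           state[item] = "active"
           for other in beats.get(item, ()):
               if state.get(other) == "active":
                   return True
               if other not in state and has_cycle(other):
                   return True
           state[item] = "done"
           return False

       return not any(has_cycle(item) for item in list(beats) if item not in state)


   def houtman_maks(observations):
       """Size of the largest subset of observations that is cycle-free."""
       for size in range(len(observations), 0, -1):
           for subset in combinations(observations, size):
               if is_acyclic(subset):
                   return size
       return 0


   # 0 beats 1, 1 beats 2, 2 beats 0 is a cycle; dropping the third choice removes it.
   observations = [((0, 1), 0), ((1, 2), 1), ((0, 2), 2), ((0, 1, 2), 0)]
   print(houtman_maks(observations), "of", len(observations))   # 3 of 4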

Before You Trust the Scores
---------------------------

Consistency scores are only meaningful when the input data represents genuine, feasible choices. Before scoring, check that:

- Menus reflect what the user actually saw, not a retroactive reconstruction from purchase logs.
- Each session is a clean single choice in which the user picked exactly one item.
- The chosen item is present in the menu.
- Item IDs are remapped to contiguous ``0..N-1`` indices.
- For budget data, prices are positive and quantities are non-negative.

The Engine rejects NaN, Inf, negative prices, out-of-range item IDs, and duplicate menu items with clear error messages. The harder question is whether your menus and budgets approximate real choice sets at all. If they do not, the scores measure data artifacts, not behavior. See the :doc:`Loading Data <quickstart>` guide for worked examples of building clean inputs from raw event logs.
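
The mechanical part of these requirements can be checked before any data reaches the Engine. The sketch below is one possible pre-processing step in plain Python, not part of PrefGraph: ``prepare_menu_sessions`` and ``budget_is_valid`` are hypothetical helpers, and the ``(menu_ids, chosen_ids)`` session format is an assumption made for the example.

.. code-block:: python

   import math


   def prepare_menu_sessions(sessions):
       """Keep clean single-choice sessions and remap item IDs to 0..N-1.

       sessions: iterable of (menu_ids, chosen_ids) with arbitrary hashable IDs.
       """
       clean = []
       for menu, chosen in sessions:
           menu = list(dict.fromkeys(menu))      # drop duplicate menu items, keep order
           if len(chosen) != 1 or chosen[0] not in menu:
               continue                          # not a usable single-choice session
           clean.append((menu, chosen[0]))

       # Assign contiguous indices in order of first appearance
       index_of = {}
       for menu, _ in clean:
           for item in menu:
               index_of.setdefault(item, len(index_of))

       remapped = [([index_of[i] for i in menu], index_of[c]) for menu, c in clean]
       return remapped, index_of


   def budget_is_valid(prices, quantities):
       """One observation: finite positive prices, finite non-negative quantities."""
       return (
           len(prices) == len(quantities)
           and all(math.isfinite(p) and p > 0 for p in prices)
           and all(math.isfinite(q) and q >= 0 for q in quantities)
       )


   raw = [
       (["sku-9", "sku-3", "sku-9"], ["sku-3"]),   # duplicate menu item
       (["sku-3", "sku-7"], ["sku-7", "sku-3"]),   # two items chosen: dropped
       (["sku-7", "sku-1"], ["sku-4"]),            # choice not on the menu: dropped
       (["sku-1", "sku-7"], ["sku-1"]),
   ]
   sessions, index_of = prepare_menu_sessions(raw)
   print(sessions)   # [([0, 1], 1), ([2, 3], 2)]

Keeping the ``index_of`` mapping makes it possible to translate scored results back to the original item IDs. None of this addresses the harder question of whether the menus reflect what the user actually saw.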

Examples
--------

Inconsistency in AI Agents
~~~~~~~~~~~~~~~~~~~~~~~~~~

Do LLMs have stable action rankings, or does the ranking change when different
alternatives are shown? We queried GPT-4o-mini across 5 enterprise scenarios
(support triage, alert routing, content moderation, hiring, procurement), each
with 10 vignettes, 5 prompt frameworks, and 15 menus per vignette. The
deterministic stage collected 3,750 calls at temperature 0; the stochastic stage
sampled each menu 20 times at temperature 0.7, adding 75,000 calls, for a total
of 78,750 API calls over 15 hours. We built preference graphs from these
responses and tested for logical cycles. All vignettes are synthetic and results
come from a single model family, so these numbers are a diagnostic demo rather
than a general benchmark. Full results: :doc:`budget/app_llm_benchmark`.

.. list-table::
   :header-rows: 1
   :widths: 30 30 30

   * - Operational Scenario
     - Perfect Consistency (%)
     - Probabilistic Consistency (%)
   * - **Support Triage**
     - 88
     - 54
   * - **Alert Routing**
     - 92
     - 74
   * - **Content Moderation**
     - 82
     - 60
   * - **Hiring**
     - 74
     - 62
   * - **Procurement**
     - 84
     - 61

Predicting Customer Spend and Engagement
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

We tested whether revealed preference (RP) features improve user-level predictions across 11 datasets and 32 targets. Under 5-fold cross-validation, the median lift is zero. The one exception is Amazon churn prediction (the Spend Drop target in the table below), where AUC-PR rises from .226 to .248. Despite the near-zero predictive lift elsewhere, three revealed preference features rank in the top ten by model importance. Full results: :doc:`benchmarks_ecommerce`.

.. list-table::
   :header-rows: 1
   :widths: 18 8 15 12 12

   * - Dataset
     - N
     - Target
     - Base AUC-PR
     - +RP AUC-PR
   * - Amazon
     - 4,694
     - Spend Drop
     - .226
     - **.248**
   * - REES46
     - 8,832
     - Low Loyalty
     - .709
     - .715
   * - H&M
     - 46,757
     - High Spender
     - .683
     - .682
   * - FINN
     - 46,858
     - Low Loyalty
     - .780
     - .781

Performance
~~~~~~~~~~~

The Rust engine processes users in parallel via Rayon and streams them in fixed-size chunks, so memory stays flat regardless of population size. On a 10-core laptop, scoring 100,000 users across five metrics from a 110 MB Parquet file takes under two minutes. See the :doc:`Performance Benchmarks <performance>` page for details.
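
The fixed-size chunking pattern can be illustrated in plain Python. This is a conceptual sketch, not the Rust engine's code: ``iter_chunks``, ``score_stream``, the ``score_chunk`` callback, and the default ``chunk_size`` are all hypothetical. The point is that only one chunk of inputs is held at a time, so peak memory depends on the chunk size and not on the number of users.

.. code-block:: python

   from itertools import islice


   def iter_chunks(users, chunk_size):
       """Yield lists of at most chunk_size users from any iterable."""
       iterator = iter(users)
       while chunk := list(islice(iterator, chunk_size)):
           yield chunk


   def score_stream(users, score_chunk, chunk_size=10_000):
       """Score users chunk by chunk, yielding results as each chunk finishes."""
       for chunk in iter_chunks(users, chunk_size):
           yield from score_chunk(chunk)


   # Stand-in scorer: counts observations per user instead of running real metrics.
   users = ([0] * n for n in range(1, 6))        # a generator, never fully materialized
   scores = score_stream(users, lambda chunk: [len(u) for u in chunk], chunk_size=2)
   print(list(scores))   # [1, 2, 3, 4, 5]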

.. raw:: html

   <div style="max-width: 640px;">

.. list-table:: Throughput by Metric Configuration (T=20-100, K=5)
   :header-rows: 1
   :widths: 40 20 20

   * - Metrics
     - Throughput (users/sec)
     - Latency (per user)
   * - **GARP Only** (O(T²))
     - ~49,000
     - 20 μs
   * - **GARP + CCEI**
     - ~2,400
     - 420 μs
   * - **Comprehensive** (GARP, CCEI, MPI, HARP)
     - ~2,000
     - 500 μs

.. list-table:: Budget — Large-Scale
   :header-rows: 1
   :widths: 25 15 15 15

   * - Configuration
     - 10K users
     - 100K users
     - 1M users
   * - GARP (O(T²))
     - 0.1s
     - 2.0s
     - ~20s
   * - GARP + CCEI
     - 4.2s
     - 39.5s
     - ~6.6 min
   * - Comprehensive Suite
     - 6.8s
     - 67.1s
     - ~11 min

.. list-table:: Menu — Large-Scale
   :header-rows: 1
   :widths: 25 15 15 15

   * - Configuration
     - 10K users
     - 100K users
     - 1M users
   * - SARP + WARP + HM
     - 0.3s
     - 5.2s
     - **85.6s**

.. raw:: html

   </div>

Explore the :doc:`API Reference <api>` and :doc:`References <papers>` for more.

..
   Archived: homepage book blurb moved to docs/archive/homepage_extras.rst

.. toctree::
   :maxdepth: 2
   :hidden:

   install
   quickstart
   theory/index
   identification
   budget/index
   menu/index
   benchmarks
   algorithms
   api
   papers