Stop Guessing Perceptual Dimensionality: An Occam's Razor for Ordinal Embeddings
Based on our ICLR 2026 paper: LORE: Jointly Learning The Intrinsic Dimensionality and Relative Similarity Structure from Ordinal Data
Introduction
Measuring and mapping human perception quantitatively is hard. If you want to measure physical distance, you use a ruler. But if you want to measure how “sweet” a cake is, or how “aesthetic” a painting is, no absolute physical ruler exists.
Historically, researchers relied on absolute queries, like asking subjects to rate a stimulus on a 1-5 Likert scale. But absolute scales are fundamentally flawed: my “moderately sweet” might be your “cloyingly sweet”. Moreover, humans are quite inconsistent in their absolute judgments, often giving very different answers to the same question at different times [1]. To get around the lack of an absolute ruler, early psychophysics [2], [3] and, later, machine learning [4], [5], [6] leveraged relative queries. These ask subjects to compare stimuli against each other (e.g., “Is stimulus A more similar to B than to C?”) instead of rating them in a vacuum. (Check out Episode 1 of my blog series for more details on how relative queries allowed psychophysics to escape the absolute ruler.)
However, converting these relative judgments into a latent geometry is non-trivial. The field developed Multidimensional Scaling (MDS) algorithms [7], which yield exact solutions but require an exhaustive pairwise distance matrix, and Ordinal Embedding (OE) algorithms [5], [6], [8], which are highly data-efficient but non-convex. Despite their challenges, these methods allowed researchers to finally build multidimensional perceptual maps instead of flat rulers.
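For intuition, classical MDS can be sketched in a few lines of numpy, assuming the full pairwise distance matrix is available (which is exactly the requirement that makes it data-hungry compared to OE methods). This toy example is illustrative only, not code from the paper:

```python
import numpy as np

# Toy ground-truth 2-D points and their exact pairwise distances.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 2))
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

# Classical MDS: double-center the squared distances, then eigendecompose.
n = D.shape[0]
J = np.eye(n) - np.ones((n, n)) / n   # centering matrix
B = -0.5 * J @ (D ** 2) @ J           # Gram matrix of the centered points
vals, vecs = np.linalg.eigh(B)
idx = np.argsort(vals)[::-1][:2]      # keep the 2 largest eigenvalues
Z = vecs[:, idx] * np.sqrt(vals[idx]) # recovered 2-D embedding

# The recovered embedding reproduces the distances up to rotation/reflection.
D_hat = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1)
assert np.allclose(D, D_hat, atol=1e-6)
```

Note that the target dimension (here 2) must be supplied up front, which foreshadows the problem discussed next.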
Relative scales are more reliable than absolute scales. Human perception lacks an absolute baseline for abstract concepts like loudness or taste. However, we possess a highly sensitive comparison mechanism: in this example, it is much easier to say which sound is louder than to quantify exactly how loud each sound is.
But there’s a massive catch. Both MDS and OEs require you to specify the dimensionality of the map before you even start building it.
The Problem of Dimensionality
Consider a thought experiment. Imagine trying to map the perceptual geometry of food. How many dimensions do you think you need? Sweetness and sourness give us two. Texture adds a third. Maybe temperature adds a fourth?
Regardless of your intuition, manually choosing the dimensionality means you are making an arbitrary guess about the shape of the space. This is a major roadblock when the entire goal of your research is to discover the underlying structure of an unknown percept, which is the case for much of psychology!
Because existing OEs require you to specify this intrinsic rank upfront, you are forced into a blind guessing game that leads to two major problems:
Underfitting: If you guess too few dimensions, you crush complex relationships. The model might combine “spicy” and “hot temperature” into a single confusing axis.
Overparameterizing: If you play it safe and guess 15 dimensions, the algorithm will happily spread the data across all of them. A 15D model might satisfy all your triplet comparisons perfectly, but it fractures simple concepts across multiple axes, making the resulting map totally uninterpretable to scientists.
Scientific discovery demands Occam’s Razor: we want the simplest model that explains the data. A 2D taste map is infinitely more useful than a 10D one if the underlying percept only varies in two ways. Yet, historically, we had no scalable way to let the data tell us its own complexity.
LORE jointly learns both the intrinsic dimensionality and the relative similarities. Other methods require the embedding dimension to be chosen a priori, making them highly susceptible to underfitting or overparameterizing the latent space.
LORE: Letting the Data Tell You the Intrinsic Rank
To solve this, we introduce LORE (Low Rank Ordinal Embedding), a new algorithm that jointly learns the intrinsic rank and the relative similarities directly from the data.
The core intuition behind LORE is to apply Occam’s Razor mathematically. We want to balance fitting the similarity structure (which existing OEs do quite well) with penalizing unnecessary dimensions. Strictly minimizing the number of dimensions (the matrix rank) is computationally NP-Hard [9]. The standard workaround is to use a convex relaxation called the nuclear norm [10].
However, there is a problem: the nuclear norm uniformly shrinks all singular values [11], [12], not just the lower-order ones. While this works well empirically for standard matrix completion [10], it often fails to recover the true intrinsic rank of perceptual spaces because it over-penalizes the largest, most important dimensions that actually define the space [13].
Instead, LORE regularizes with the nonconvex Schatten-\(p\) quasi-norm (\(0 < p < 1\)) [13], [14]. This specific penalty is much more forgiving to the large, dominant singular values but aggressively crushes the smaller, noisy ones to zero.
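The difference is easy to see numerically. In this small sketch (the singular values are made-up toy numbers), the nuclear norm charges every singular value the same marginal price, while the Schatten-\(p\) quasi-norm's marginal penalty \(p\,\sigma^{p-1}\) collapses for large \(\sigma\) and explodes for small \(\sigma\):

```python
import numpy as np

# Toy spectrum: two dominant singular values plus two noisy ones.
sigma = np.array([10.0, 8.0, 0.3, 0.2])

nuclear = np.sum(sigma)              # Schatten p = 1 (nuclear norm)
p = 0.5
schatten_half = np.sum(sigma ** p)   # Schatten p = 0.5 quasi-norm

# Marginal penalty for growing each singular value: constant (= 1) under
# the nuclear norm, but p * sigma**(p - 1) under the quasi-norm, which
# shrinks for dominant sigmas and grows steeply for small ones.
marginal = p * sigma ** (p - 1)
# marginal[0] (for sigma = 10) is far smaller than marginal[-1] (for 0.2),
# so the dominant dimensions survive while the noisy ones are crushed.
```

This is why the quasi-norm tends to drive noisy directions to exactly zero without distorting the axes that carry the perceptual structure.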
By balancing a smoothed ordinal embedding loss with this Schatten-\(p\) quasi-norm penalty, LORE automatically prunes away unnecessary dimensions during training. \[\min_{\mathbf{Z}}\;\Psi(\mathbf{Z}) = \sum_{(a,i,j) \in T}\log\!\left(1 + \exp\!\left(1 + d\left( \mathbf{Z}_{a,:},\mathbf{Z}_{i,:} \right) - d\left( \mathbf{Z}_{a,:},\mathbf{Z}_{j,:} \right)\right)\right) + \lambda\sum_{i = 1}^{\min\left\{ N,d' \right\}}\sigma_{i}\left( \mathbf{Z} \right)^{p}.\]
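The objective is straightforward to evaluate for a toy embedding. The sketch below follows the formula term by term; `lam` and `p` are illustrative defaults, not values from the paper:

```python
import numpy as np

def lore_objective(Z, triplets, lam=0.1, p=0.5):
    """Smoothed triplet loss plus Schatten-p quasi-norm penalty (toy sketch).

    Each triplet (a, i, j) encodes the judgment "a is closer to i than to j".
    """
    def d(u, v):
        return np.linalg.norm(u - v)

    # Logistic smoothing of the margin constraint d(a, i) + 1 < d(a, j).
    loss = sum(
        np.log1p(np.exp(1 + d(Z[a], Z[i]) - d(Z[a], Z[j])))
        for (a, i, j) in triplets
    )
    # Schatten-p quasi-norm of the embedding matrix.
    penalty = lam * np.sum(np.linalg.svd(Z, compute_uv=False) ** p)
    return loss + penalty

rng = np.random.default_rng(0)
Z = rng.normal(size=(5, 3))
val = lore_objective(Z, [(0, 1, 2), (3, 4, 0)])
```

Satisfied triplets drive their loss terms toward zero, while the penalty term keeps pressure on the spectrum of \(\mathbf{Z}\).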
Because this objective is highly non-convex, standard optimization methods often fail. To solve this, we use an efficiently scaled, iteratively reweighted Singular Value Decomposition (SVD) algorithm [15]. Even with the inherent non-convexity, this guarantees convergence to a stationary point. And because stationary points in OE landscapes are generally known to be nearly as good as global optima [16], [17], LORE reliably yields robust, high quality embeddings.
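A heavily simplified sketch of one such iteration, under our own assumptions rather than the paper's exact algorithm: take a gradient step on the triplet loss, then shrink the singular values with weights \(p\,\sigma^{p-1}\), so small singular values are thresholded far more aggressively than dominant ones:

```python
import numpy as np

def reweighted_svd_step(Z, grad, lr=0.1, lam=0.1, p=0.5, eps=1e-6):
    """One proximal-style iteration (illustrative sketch only):
    gradient step, then weighted singular-value soft-thresholding
    approximating the Schatten-p prox."""
    Y = Z - lr * grad                             # gradient step on the loss
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    w = p * (s + eps) ** (p - 1)                  # smaller sigmas get larger weights
    s_new = np.maximum(s - lr * lam * w, 0.0)     # weighted soft-thresholding
    return U @ np.diag(s_new) @ Vt
```

Iterating this step lets the embedding fit the triplets while its spectrum is progressively pruned, which is how unnecessary dimensions collapse to zero during training.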
Does it work? Yes!
Evaluating LORE against existing methods is tricky because, for real human data, we don’t actually know the “true” intrinsic rank. Therefore, we first had to leverage synthetic data where the ground-truth rank is known.
We benchmarked LORE against state-of-the-art OEs across synthetic environments, as well as LLM-generated proxy perceptual spaces (using SBERT [18] as a stand-in for human judgments, since LLMs have been shown to capture human perceptual similarity [19]). As seen in the plot below, the results were stark: LORE was the only method that accurately tracked and recovered the true intrinsic rank while maintaining near-optimal test triplet accuracy.
Only LORE successfully learns the true intrinsic rank while maintaining high accuracy on LLM-simulated perceptual data.
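A proxy "oracle" of this kind can be sketched by answering triplet queries with distances in any embedding space. Here random vectors stand in for SBERT embeddings, and `simulate_triplets` is our illustrative helper, not code from the paper:

```python
import numpy as np

def simulate_triplets(E, n_triplets, rng):
    """Answer "is a closer to i or to j?" using distances in embedding E."""
    N = E.shape[0]
    out = []
    for _ in range(n_triplets):
        a, i, j = rng.choice(N, size=3, replace=False)
        if np.linalg.norm(E[a] - E[i]) > np.linalg.norm(E[a] - E[j]):
            i, j = j, i  # reorder so i is always the closer item
        out.append((a, i, j))
    return out

rng = np.random.default_rng(0)
E = rng.normal(size=(20, 8))          # stand-in for SBERT sentence embeddings
triplets = simulate_triplets(E, 50, rng)
```

Because the generating embedding is known, the ground-truth rank is known too, which is what makes this setup usable as a benchmark.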
But the most exciting results came from real, noisy human data, including the Food-100 dataset [20], which contains crowdsourced triplet ratings of 100 different food items based on perceived similarity of taste.
Given a maximum allowable dimension of 15, standard OEs blindly used all 15 dimensions, creating a completely tangled perceptual space. LORE, acting as an automated Occam’s Razor, seamlessly compressed the embedding down to roughly 3.3 dimensions without sacrificing accuracy.
| Method | Test Acc. | Rank | Time (s) |
|---|---|---|---|
| LORE | \(82.45 \pm 0.27\) | \(3.3 \pm 0.47\) | \(6.64 \pm 3.90\) |
| SOE | \(82.34 \pm 0.32\) | \(15 \pm 0.00\) | \(27.09 \pm 1.38\) |
| FORTE | \(81.73 \pm 0.46\) | \(15 \pm 0.00\) | \(6.34 \pm 0.52\) |
| t-STE | \(82.79 \pm 0.24\) | \(15 \pm 0.00\) | \(40.93 \pm 20.14\) |
| CKL | \(82.75 \pm 0.20\) | \(15 \pm 0.00\) | \(18.41 \pm 7.89\) |
| Dim-CV | \(77.67 \pm 0.02\) | \(1.47 \pm 0.51\) | \(1721.9 \pm 26.71\) |
The ultimate test of any perceptual map is whether its dimensions actually mean something. Remarkably, even though LORE was trained purely on relative similarities, the resulting axes organically aligned with human interpretable features. The first axis naturally separates sweet from savory; the second contrasts dense foods with light ones; and the third distinguishes carb-heavy items from proteins and vegetables.
LORE’s learned axes are semantically interpretable on the Food-100 dataset. [20]
The Takeaway
When we model human perception, we aren’t just trying to maximize predictive accuracy on a holdout set. Instead, we are trying to uncover the latent geometry of the mind. By jointly inferring relative similarities and intrinsic dimensionality, LORE bakes Occam’s Razor directly into the learning process. It removes the need to blindly guess the dimensionality, ensuring we neither underfit nor overparameterize the underlying perceptual space.
Code
Code to reproduce our results and to run LORE on your own ordinal datasets is available at https://github.com/vivek2000anand/lore_iclr. We are in the process of integrating LORE into cblearn, the open-source Python library for comparison-based machine learning, which will make calling LORE as easy as fitting a standard scikit-learn model. Stay tuned!
References
Citation
If you found this post helpful, please consider citing it:
@inproceedings{anand2026lore,
  title     = {LORE: Jointly Learning The Intrinsic Dimensionality and Relative Similarity Structure from Ordinal Data},
  author    = {Anand, Vivek and Helbling, Alec and Davenport, Mark A. and Berman, Gordon J. and Alagapan, Sankaraleengam and Rozell, Christopher J.},
  url       = {https://arxiv.org/abs/2602.04192},
  booktitle = {International Conference on Learning Representations (ICLR)},
  month     = {April},
  address   = {Rio de Janeiro, Brazil},
  year      = {2026}
}