Cospectral

Cospectral is an MIT spinoff developing new conceptual and computational architectures for learning systems that are minimal [1,2,3], online [4], continual [5,6], and capable of interacting with the physical world [7,8]. We believe that constructing larger offline statistical models will not lead to meaningful progress on this problem, however popular the approach may be. Existing models require ever-increasing amounts of hand-coded data, are only useful in settings where the past is a sufficient model of the future, and are inextricably bound by available compute. We develop forward-looking systems designed for continuously changing domains where unseen scenarios are common, including those for which training data will never exist.

Our researchers have backgrounds in robotics and reinforcement learning, with experience developing autonomous vehicles, personal robots, spacecraft, multimodal models, and safety‑critical systems. Drawing on algorithmic information theory, sparse reconstruction, approximation theory, harmonic analysis, and compressive sensing, our proprietary algorithms yield systems that are online [4,14], adaptive [9,15], interpretable [16,17], and minimal [1,13].

What does a model model?

Artificial Intelligence and Machine Learning are byproducts of Information Theory [18,19,20]. In this paradigm, signals are considered encrypted until a functioning representation for them is found. Once found, the representation can be used to reconstruct noisy or incomplete observations. If it is compositional, it can extend beyond what has already been seen and known [51,54]. The more robust the representation, the more efficiently the information can be compressed.

Revisiting the learning problem using its original information-theoretic framing leads to systems that are very different from those used today. They operate directly on continuous unstructured data streams, not on labeled batches of data. They use a universal framework to incrementally compose representations, not handcrafted encoders or embeddings [47]. They seek unexpected, orthogonal, and incoherent observations [21], not samples that lower prediction loss. They compile sparse [10] abstractions of causal relationships [11] and identified components [12], which are simplified over time [1,13]. Their ultimate goal is to minimize their computational requirements.
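One way to make "minimal" concrete is the minimum description length principle of Rissanen together with Kolmogorov's notion of complexity, both cited in this section. The block below is a standard textbook formulation, offered as a sketch and not as a description of Cospectral's proprietary algorithms. The symbols D (observed data), M (a candidate model), \mathcal{M} (the model class), L (code length in bits), U (a fixed universal machine), and p (a program) are notation assumed here for illustration.

```latex
% Two-part minimum description length: prefer the model that gives the
% shortest total description of the model plus the data encoded with it.
\hat{M} \;=\; \arg\min_{M \in \mathcal{M}} \bigl[\, L(M) + L(D \mid M) \,\bigr]

% Kolmogorov complexity of a string x relative to a universal machine U:
% the length of the shortest program that outputs x.
K_U(x) \;=\; \min \bigl\{\, |p| \;:\; U(p) = x \,\bigr\}
```

Under this reading, a representation that is simplified over time is one whose total description length keeps shrinking while it still accounts for the observations.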

Local minima

The concept of a system purposely designed to grow larger and demand more computation is antithetical to the entire history of computing [45,46]. Large generative models nonetheless follow that design, and in doing so have demonstrably become compressors [24,25]. They were built by brute force, using most of the available data and computational resources over two decades [26]. The resulting models top lossless compression benchmarks for several data modalities [22,23,24,25]. However, this capability does not enable them to generalize beyond the information stored in their weights [44] or to scale without limit [24]. Unsurprisingly, although they are widely believed to be uninterpretable and probabilistic, it has been shown repeatedly that one can extract their training data [27,28,29,30,31,41,42], architecture [32,33,34,35], logits [35,38,39], policies [41], inputs [36,37,38,39], alignment data [40], and more [42,43].
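The link between prediction and compression mentioned above can be illustrated with a small sketch. A predictive model that assigns probability p to the next symbol can, with an ideal arithmetic coder, encode that symbol in about -log2(p) bits, so better prediction means a shorter code. The example below is an assumption-laden toy, not any cited system: the function names are hypothetical, and the "model" is only an adaptive byte-frequency counter with add-one smoothing.

```python
import math
from collections import Counter


def adaptive_code_length_bits(data: bytes) -> float:
    """Ideal code length, in bits, of `data` under an adaptive order-0 model.

    The model predicts each byte from the counts of the bytes seen so far,
    using add-one (Laplace) smoothing over the 256 possible byte values.
    An ideal arithmetic coder would spend -log2(p) bits on a symbol that
    the model predicted with probability p.
    """
    counts = Counter()
    total_bits = 0.0
    for seen, byte in enumerate(data):
        # Probability of this byte given everything seen before it.
        p = (counts[byte] + 1) / (seen + 256)
        total_bits += -math.log2(p)
        counts[byte] += 1  # The model updates only after coding the symbol.
    return total_bits


def uniform_code_length_bits(data: bytes) -> float:
    """Code length with no model at all: a flat 8 bits per byte."""
    return 8.0 * len(data)


if __name__ == "__main__":
    repetitive = b"abracadabra " * 50
    print("uniform bits :", uniform_code_length_bits(repetitive))
    print("adaptive bits:", round(adaptive_code_length_bits(repetitive), 1))
```

On repetitive input the adaptive model needs fewer bits than the flat 8 bits per byte, while on input where every byte value appears exactly once it needs more, since there is no regularity for the model to exploit.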

Moving forward

Progress toward minimal, efficient systems has continued in other fields. Breakthroughs in applied mathematics and computational statistics led to compressive sensing algorithms able to exactly reconstruct sparse signals while sampling below the Nyquist rate [21]. Innovations in coding theory reincorporate Gallager codes [48] into new forward error correction techniques that closely approach the Shannon limit [49,50]. Discoveries in harmonic analysis [16,51,52] allow for alternative deep learning architectures that are deterministic and interpretable while using far fewer trainable parameters [17,53]. These are only a few examples, yet they cover a multitude of technologies currently powering our communication, computing, sensing, and even medical devices.
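The compressive sensing claim in the paragraph above can be stated compactly. The block below is the usual basis pursuit formulation associated with the work of Candès, Romberg, and Tao, written as a sketch under stated assumptions: x is a length-n signal with at most k nonzero entries, \Phi is the measurement matrix, y holds the m measurements, and C is an unspecified constant. All of these symbols are notation introduced here for illustration.

```latex
% Far fewer measurements than unknowns, but the signal is k-sparse.
y = \Phi x, \qquad \Phi \in \mathbb{C}^{m \times n}, \qquad m \ll n, \qquad \|x\|_0 \le k

% Recovery by l1 minimization (basis pursuit), a convex program.
\hat{x} \;=\; \arg\min_{z \in \mathbb{C}^{n}} \|z\|_1 \quad \text{subject to} \quad \Phi z = y

% For randomly chosen frequency samples, \hat{x} = x with high probability
% once the number of measurements is on the order of
m \;\ge\; C \, k \log n
```

The number of measurements scales with the sparsity k, not with the signal's bandwidth, which is why exact reconstruction is possible below the Nyquist rate for signals of this kind.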

Artificial intelligence has followed a turbulent and unpredictable trajectory that includes multiple directional changes along with several ‘winters’ [55]. Its history is full of examples of significant findings that challenge existing perspectives, often leading to redefined priorities. We see a path toward results that push our current theoretical limits and are committed to developing the technologies necessary to turn that possible future into our present reality.

References

1. Rissanen, Jorma. “Modeling by shortest data description.” Automatica 14, no. 5 (1978).
2. Solomonoff, Ray J. “A preliminary report on a general theory of inductive inference.” Zator Company, Cambridge, MA, 1960.
3. Solomonoff, Ray J. “A formal theory of inductive inference. Part I.” Information and Control 7, no. 1 (1964).
4. Rakhlin, Alexander, and Arthur Flajolet. “6.883: Online Methods in Machine Learning.” MIT (2016).
5. Thrun, Sebastian, and Tom M. Mitchell. “Lifelong Robot Learning.” In The Biology and Technology of Intelligent Autonomous Agents, NATO ASI Series, vol. 144 (1995).
6. Ring, Mark B. “Child: A first step towards continual learning.” Machine Learning 28, no. 1 (1997).
7. Schmidhuber, Jürgen. “Planning & Reinforcement Learning with Recurrent World Models and Artificial Curiosity.” (1990).
8. Ha, David, and Jürgen Schmidhuber. “World Models.” (2018).
9. Roy, Nicholas, and Andrew McCallum. “Toward optimal active learning through sampling estimation of error reduction.” In Proceedings of the 18th International Conference on Machine Learning, 2001.
10. Foucart, Simon, and Holger Rauhut. “Sparse solutions of underdetermined systems.” In A Mathematical Introduction to Compressive Sensing, 2013.
11. Good, Irving J. “A causal calculus (I).” The British Journal for the Philosophy of Science 11, no. 44 (1961).
12. Soderstrom, Torsten, and Petre Stoica. “System identification.” Prentice-Hall, 1989.
13. Kolmogorov, Andrei N. “Three approaches to the quantitative definition of information.” Problems of Information Transmission 1, no. 1 (1965).
14. Sutton, Richard S., and Steven D. Whitehead. “Online learning with random representations.” In ICML, 1993.
15. Li, Puheng, Tijana Zrnic, and Emmanuel Candes. “Robust sampling for active statistical inference.” Advances in Neural Information Processing Systems 38 (2026).
16. Behboodi, Arash, Holger Rauhut, and Ekkehard Schnoor. “Compressive sensing and neural networks from a statistical learning perspective.” In Compressed Sensing in Information Processing, 2022.
17. Zeger, Emi, Yifei Wang, Aaron Mishkin, Tolga Ergen, Emmanuel Candès, and Mert Pilanci. “A Library of Mirrors: Deep Neural Nets in Low Dimensions are Convex Lasso Models with Reflection Features.” (2024).
18. Turing, A. M. “Mathematical theory of enigma machine.” Public Record Office, London 3 (1940).
19. Shannon, Claude E. “A mathematical theory of cryptography.” Mathematical Theory of Cryptography (1945).
20. Shannon, Claude E. “Prediction and entropy of printed English.” Bell System Technical Journal 30, no. 1 (1951).
21. Candès, Emmanuel J., Justin Romberg, and Terence Tao. “Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information.” (2004).
22. Bellard, Fabrice. “Lossless data compression with neural networks.” Technical report (2019).
23. Bellard, Fabrice. “Nncp v2: Lossless data compression with transformer.” Technical report (2021).
24. Delétang, Grégoire, Anian Ruoss, Paul-Ambroise Duquenne, Elliot Catt, Tim Genewein, Christopher Mattern, Jordi Grau-Moya et al. “Language modeling is compression.” In International Conference on Learning Representations, 2024.
25. Li, Ziguang, Chao Huang, Xuliang Wang, Haibo Hu, Cole Wyeth, Dongbo Bu, Quan Yu, Wen Gao, Xingwu Liu, and Ming Li. “Lossless data compression by large models.” Nature Machine Intelligence 7, no. 5 (2025).
26. Brants, Thorsten, Ashok Popat, Peng Xu, Franz Josef Och, and Jeffrey Dean. “Large language models in machine translation.” In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, 2007.
27. Carlini, Nicholas, Florian Tramer, Eric Wallace, Matthew Jagielski, Ariel Herbert-Voss, Katherine Lee, Adam Roberts et al. “Extracting training data from large language models.” In 30th USENIX Security Symposium, 2021.
28. Carlini, Nicolas, Jamie Hayes, Milad Nasr, Matthew Jagielski, Vikash Sehwag, Florian Tramer, Borja Balle, Daphne Ippolito, and Eric Wallace. “Extracting training data from diffusion models.” In 32nd USENIX Security Symposium, 2023.
29. Morris, John, Volodymyr Kuleshov, Vitaly Shmatikov, and Alexander M. Rush. “Text embeddings reveal (almost) as much as text.” In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023.
30. Nasr, Milad, Javier Rando, Nicholas Carlini, Jonathan Hayase, Matthew Jagielski, A. Feder Cooper, Daphne Ippolito, Christopher Choquette-Choo, Florian Tramèr, and Katherine Lee. “Scalable extraction of training data from aligned, production language models.” In International Conference on Learning Representations, 2025.
31. Ahmed, Ahmed, A. Feder Cooper, Sanmi Koyejo, and Percy Liang. “Extracting books from production language models.” (2026).
32. Tramèr, Florian, Fan Zhang, Ari Juels, Michael K. Reiter, and Thomas Ristenpart. “Stealing machine learning models via prediction APIs.” In 25th USENIX Security Symposium, 2016.
33. Jagielski, Matthew, Nicholas Carlini, David Berthelot, Alex Kurakin, and Nicolas Papernot. “High accuracy and high fidelity extraction of neural networks.” In 29th USENIX Security Symposium, 2020.
34. Shamir, Adi, Isaac Canales-Martinez, Anna Hambitzer, Jorge Chavez-Saab, Francisco Rodrigez-Henriquez, and Nitin Satpute. “Polynomial time cryptanalytic extraction of neural network models.” (2023).
35. Carlini, Nicholas, Daniel Paleka, Krishnamurthy Dj Dvijotham, Thomas Steinke, Jonathan Hayase, A. Feder Cooper, Katherine Lee et al. “Stealing part of a production language model.” In International Conference on Machine Learning, PMLR, 2024.
36. Zhang, Yiming, Nicholas Carlini, and Daphne Ippolito. “Effective prompt extraction from language models.” (2023).
37. Zhang, Collin, John Xavier Morris, and Vitaly Shmatikov. “Extracting prompts by inverting llm outputs.” In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024.
38. Morris, John X., Wenting Zhao, Justin Chiu, Vitaly Shmatikov, and Alexander Rush. “Language model inversion.” In International Conference on Learning Representations, 2024.
39. Nazir, Murtaza, Matthew Finlayson, John Morris, Xiang Ren, and Swabha Swayamdipta. “Better language model inversion by compactly representing next-token distributions.” Advances in Neural Information Processing Systems 38 (2026).
40. Barbero, Federico, Xiangming Gu, Christopher A. Choquette-Choo, Chawin Sitawarin, Matthew Jagielski, Itay Yona, Petar Veličković, Ilia Shumailov, and Jamie Hayes. “Extracting alignment data in open models.” (2025).
41. He, Chengyang, Xu Liu, Gadiel Mark Sznaier Camps, Joseph Bruno, Guillaume Adrien Sartoretti, and Mac Schwager. “Demystifying robot diffusion policies: Action memorization and a simple lookup table alternative.” In The Fourteenth International Conference on Learning Representations, 2026.
42. Borkar, Jaydeep, Matthew Jagielski, Katherine Lee, Niloofar Mireshghallah, David A. Smith, and Christopher A. Choquette-Choo. “Privacy ripple effects from adding or removing personal information in language model training.” In Findings of the Association for Computational Linguistics, 2025.
43. Zhang, Tingwei, John X. Morris, and Vitaly Shmatikov. “How to Steal Reasoning Without Reasoning Traces.” (2026).
44. Mirzadeh, Iman, Keivan Alizadeh-Vahid, Hooman Shahrokhi, Oncel Tuzel, Samy Bengio, and Mehrdad Farajtabar. “Gsm-symbolic: Understanding the limitations of mathematical reasoning in large language models.” In International Conference on Learning Representations, 2025.
45. Denning, Peter J., Douglas E. Comer, David Gries, Michael C. Mulder, Allen Tucker, A. Joe Turner, and Paul R. Young. “Computing as a discipline.” Computer 22, no. 2 (1989).
46. Ridgway, Richard K. “Compiling routines.” In Proceedings of the 1952 ACM National Meeting, 1952.
47. Weller, Orion, Michael Boratko, Iftekhar Naim, and Jinhyuk Lee. “On the theoretical limitations of embedding-based retrieval.” (2025).
48. Gallager, Robert. “Low-density parity-check codes.” IRE Transactions on Information Theory 8, no. 1 (1962).
49. Berrou, Claude. “The ten-year-old turbo codes are entering into service.” IEEE Communications Magazine 41, no. 8 (2003).
50. Komoto, Daiki, and Kenta Kasai. “Quantum error correction near the coding theoretical bound.” npj Quantum Information 11, no. 1 (2025).
51. Tang, Gongguo, Badri Narayan Bhaskar, Parikshit Shah, and Benjamin Recht. “Compressed sensing off the grid.” IEEE Transactions on Information Theory 59, no. 11 (2013).
52. Hershey, John R., Jonathan Le Roux, and Felix Weninger. “Deep unfolding: Model-based inspiration of novel deep architectures.” (2014).
53. Kouni, Vicky, and Yannis Panagakis. “Generalization analysis of an unfolding network for analysis-based Compressed Sensing.” Applied and Computational Harmonic Analysis 79 (2025).
54. Feng, Ping, and Yoram Bresler. “Spectrum-blind minimum-rate sampling and reconstruction of multiband signals.” In IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings, vol. 3, 1996.
55. Crevier, D. “AI: The Tumultuous Search for Artificial Intelligence.” (1993).