Building Safe Models

This is still a very young field, but one that is becoming increasingly critical for the safe use of AI. We are committed to building AI systems that are safe, secure, and aligned with human values.

Our research focuses on developing techniques for ensuring that AI systems behave in a predictable and reliable manner, even in the face of uncertainty and change. We are also exploring ways to make AI systems more transparent and interpretable, so that their decisions can be understood and trusted by humans.

We draw inspiration from Anthropic's open-source circuit tracing tools, a step towards building safer AI systems. These tools allow researchers to analyse and understand the inner workings of AI models, which is crucial for identifying potential safety issues and developing strategies to mitigate them.

These tools can be used to trace the flow of information through a model, identify key decision points, and analyse the impact of different inputs on the model's output. This helps our research identify potential vulnerabilities and develop strategies to improve the safety and reliability of AI systems.
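To make the idea concrete, here is a minimal sketch of one way to analyse the impact of inputs on a model's output: perturb each input slightly and measure how the output moves. This is not Anthropic's circuit tracing method, which works on internal activations and features; the toy model, its weights, and the perturbation size below are all illustrative assumptions.

```python
def toy_model(x):
    """A stand-in 'model': a fixed weighted sum with a ReLU-style output.
    The weights are hypothetical, chosen only for illustration."""
    weights = [0.1, 2.0, 0.5]
    s = sum(w * xi for w, xi in zip(weights, x))
    return max(0.0, s)

def input_sensitivity(model, x, eps=1e-4):
    """Finite-difference estimate of how much each input affects the output."""
    base = model(x)
    sens = []
    for i in range(len(x)):
        bumped = list(x)
        bumped[i] += eps  # nudge one input, hold the rest fixed
        sens.append((model(bumped) - base) / eps)
    return sens

x = [1.0, 1.0, 1.0]
scores = input_sensitivity(toy_model, x)
# The second input dominates, matching its larger weight.
```

Even this crude perturbation analysis surfaces which inputs a decision hinges on; circuit tracing applies the same spirit of attribution at a much finer granularity inside the network.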

Neuronpedia interface

Academic Research That Deepens Our Work

The following academic papers have inspired and informed our work on building safe and reliable AI systems. They cover a range of topics, including neural network architectures, training techniques, and methods for improving model interpretability and robustness. Studying them has given us a deeper understanding of the challenges and opportunities in AI safety, and has helped us develop new approaches for building more trustworthy AI systems.

Liquid Time-constant Networks

Liquid Structural State-Space Models

Hyena Hierarchy: Towards Larger Convolutional Language Models

STAR: Synthesis of Tailored Architectures

Improving neural networks by preventing co-adaptation of feature detectors

Deep Researcher with Test-Time Diffusion

A Survey on Self-Evolution of Large Language Models

Small Language Models: Survey, Measurements, and Insights

AIOS: LLM Agent Operating System

LLM as OS, Agents as Apps: Envisioning AIOS, Agents and the AIOS-Agent Ecosystem