new

Get trending papers in your email inbox!

Subscribe

Daily Papers

byAK and the research community

May 11

Target-Oriented Pretraining Data Selection via Neuron-Activated Graph

Everyday tasks come with a target, and pretraining models around this target is what turns them into experts. In this paper, we study target-oriented language model (LM) pretraining by introducing Neuron-Activated Graph Ranking (NAG-based Ranking), a training-free and interpretable framework for target pretraining data selection. Rather than using black-box representations, our approach directly characterizes each target input by a sparse set of high-impact neurons in any off-the-shelf LLMs. Concretely, we quantify neuron impact and select the most influential neurons across layers into a compact Neuron-Activated Graph (NAG), and rank candidate data by NAG similarity to target examples. We conduct experiments across six benchmarks, where our NAG-based Ranking improves target-oriented pretraining by 4.9% on average over random sampling, and also outperforms state-of-the-art baselines by 5.3% accuracy on HellaSwag. It also remains effective under a more applicable multi-target setting, where our best setup surpasses two baselines by 1.1% and 4.1%, respectively. Furthermore, we provide a comprehensive analysis on why and how our NAG works, e.g., deactivating NAG-selected neurons (only 0.12% of all) causes a 23.5% performance collapse, and restricting NAG to the final layer incurs a 4.1% average drop, indicating that NAG captures a sparse "functional backbone" for learning target features. We release the code at https://github.com/asillycat/NAG.

UCSC-VLAA UCSC-VLAA
·
Apr 16 2

Neuron-Level Analysis of Cultural Understanding in Large Language Models

As large language models (LLMs) are increasingly deployed worldwide, ensuring their fair and comprehensive cultural understanding is important. However, LLMs exhibit cultural bias and limited awareness of underrepresented cultures, while the mechanisms underlying their cultural understanding remain underexplored. To fill this gap, we conduct a neuron-level analysis to identify neurons that drive cultural behavior, introducing a gradient-based scoring method with additional filtering for precise refinement. We identify both culture-general neurons contributing to cultural understanding regardless of cultures, and culture-specific neurons tied to an individual culture. These neurons account for less than 1% of all neurons and are concentrated in shallow to middle MLP layers. We validate their role by showing that suppressing them substantially degrades performance on cultural benchmarks (by up to 30%), while performance on general natural language understanding (NLU) benchmarks remains largely unaffected. Moreover, we show that culture-specific neurons support knowledge of not only the target culture, but also related cultures. Finally, we demonstrate that training on NLU benchmarks can diminish models' cultural understanding when we update modules containing many culture-general neurons. These findings provide insights into the internal mechanisms of LLMs and offer practical guidance for model training and engineering. Our code is available at https://github.com/ynklab/CULNIG

  • 4 authors
·
Oct 9, 2025

Discovering Influential Neuron Path in Vision Transformers

Vision Transformer models exhibit immense power yet remain opaque to human understanding, posing challenges and risks for practical applications. While prior research has attempted to demystify these models through input attribution and neuron role analysis, there's been a notable gap in considering layer-level information and the holistic path of information flow across layers. In this paper, we investigate the significance of influential neuron paths within vision Transformers, which is a path of neurons from the model input to output that impacts the model inference most significantly. We first propose a joint influence measure to assess the contribution of a set of neurons to the model outcome. And we further provide a layer-progressive neuron locating approach that efficiently selects the most influential neuron at each layer trying to discover the crucial neuron path from input to output within the target model. Our experiments demonstrate the superiority of our method finding the most influential neuron path along which the information flows, over the existing baseline solutions. Additionally, the neuron paths have illustrated that vision Transformers exhibit some specific inner working mechanism for processing the visual information within the same image category. We further analyze the key effects of these neurons on the image classification task, showcasing that the found neuron paths have already preserved the model capability on downstream tasks, which may also shed some lights on real-world applications like model pruning. The project website including implementation code is available at https://foundation-model-research.github.io/NeuronPath/.

  • 8 authors
·
Mar 12, 2025 2

Learnability and Robustness of Shallow Neural Networks Learned With a Performance-Driven BP and a Variant PSO For Edge Decision-Making

In many cases, the computing resources are limited without the benefit from GPU, especially in the edge devices of IoT enabled systems. It may not be easy to implement complex AI models in edge devices. The Universal Approximation Theorem states that a shallow neural network (SNN) can represent any nonlinear function. However, how fat is an SNN enough to solve a nonlinear decision-making problem in edge devices? In this paper, we focus on the learnability and robustness of SNNs, obtained by a greedy tight force heuristic algorithm (performance driven BP) and a loose force meta-heuristic algorithm (a variant of PSO). Two groups of experiments are conducted to examine the learnability and the robustness of SNNs with Sigmoid activation, learned/optimised by KPI-PDBPs and KPI-VPSOs, where, KPIs (key performance indicators: error (ERR), accuracy (ACC) and F_1 score) are the objectives, driving the searching process. An incremental approach is applied to examine the impact of hidden neuron numbers on the performance of SNNs, learned/optimised by KPI-PDBPs and KPI-VPSOs. From the engineering prospective, all sensors are well justified for a specific task. Hence, all sensor readings should be strongly correlated to the target. Therefore, the structure of an SNN should depend on the dimensions of a problem space. The experimental results show that the number of hidden neurons up to the dimension number of a problem space is enough; the learnability of SNNs, produced by KPI-PDBP, is better than that of SNNs, optimized by KPI-VPSO, regarding the performance and learning time on the training data sets; the robustness of SNNs learned by KPI-PDBPs and KPI-VPSOs depends on the data sets; and comparing with other classic machine learning models, ACC-PDBPs win for almost all tested data sets.

  • 5 authors
·
Aug 13, 2020

Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy

Sparsely activated Mixture-of-Experts (SMoE) has shown promise to scale up the learning capacity of neural networks, however, they have issues like (a) High Memory Usage, due to duplication of the network layers into multiple copies as experts; and (b) Redundancy in Experts, as common learning-based routing policies suffer from representational collapse. Therefore, vanilla SMoE models are memory inefficient and non-scalable, especially for resource-constrained downstream scenarios. In this paper, we ask: Can we craft a compact SMoE model by consolidating expert information? What is the best recipe to merge multiple experts into fewer but more knowledgeable experts? Our pilot investigation reveals that conventional model merging methods fail to be effective in such expert merging for SMoE. The potential reasons are: (1) redundant information overshadows critical experts; (2) appropriate neuron permutation for each expert is missing to bring all of them in alignment. To address this, we propose M-SMoE, which leverages routing statistics to guide expert merging. Specifically, it starts with neuron permutation alignment for experts; then, dominant experts and their "group members" are formed; lastly, every expert group is merged into a single expert by utilizing each expert's activation frequency as their weight for merging, thus diminishing the impact of insignificant experts. Moreover, we observed that our proposed merging promotes a low dimensionality in the merged expert's weight space, naturally paving the way for additional compression. Hence, our final method, MC-SMoE (i.e., Merge, then Compress SMoE), further decomposes the merged experts into low-rank and structural sparse alternatives. Extensive experiments across 8 benchmarks validate the effectiveness of MC-SMoE. For instance, our MC-SMoE achieves up to 80% memory and a 20% FLOPs reduction, with virtually no loss in performance.

  • 7 authors
·
Oct 2, 2023

Perforated Backpropagation: A Neuroscience Inspired Extension to Artificial Neural Networks

The neurons of artificial neural networks were originally invented when much less was known about biological neurons than is known today. Our work explores a modification to the core neuron unit to make it more parallel to a biological neuron. The modification is made with the knowledge that biological dendrites are not simply passive activation funnels, but also compute complex non-linear functions as they transmit activation to the cell body. The paper explores a novel system of "Perforated" backpropagation empowering the artificial neurons of deep neural networks to achieve better performance coding for the same features they coded for in the original architecture. After an initial network training phase, additional "Dendrite Nodes" are added to the network and separately trained with a different objective: to correlate their output with the remaining error of the original neurons. The trained Dendrite Nodes are then frozen, and the original neurons are further trained, now taking into account the additional error signals provided by the Dendrite Nodes. The cycle of training the original neurons and then adding and training Dendrite Nodes can be repeated several times until satisfactory performance is achieved. Our algorithm was successfully added to modern state-of-the-art PyTorch networks across multiple domains, improving upon original accuracies and allowing for significant model compression without a loss in accuracy.

  • 2 authors
·
Jan 29, 2025

Redefining Experts: Interpretable Decomposition of Language Models for Toxicity Mitigation

Large Language Models have demonstrated impressive fluency across diverse tasks, yet their tendency to produce toxic content remains a critical challenge for AI safety and public trust. Existing toxicity mitigation approaches primarily manipulate individual neuron activations, but these methods suffer from instability, context dependence, and often compromise the model's core language abilities. To address these shortcomings, we investigate three key questions: the stability of neuron-level toxicity indicators, the advantages of structural (layer-wise) representations, and the interpretability of mechanisms driving toxic generation. Through extensive experiments on Jigsaw and ToxiCN datasets, we show that aggregated layer-wise features provide more robust signals than single neurons. Moreover, we observe conceptual limitations in prior works that conflate toxicity detection experts and generation experts within neuron-based interventions. To mitigate this, we propose a novel principled intervention technique, EigenShift, based on eigen-decomposition of the language model's final output layer. This method selectively targets generation-aligned components, enabling precise toxicity suppression without impairing linguistic competence. Our method requires no additional training or fine-tuning, incurs minimal computational cost, and is grounded in rigorous theoretical analysis.

  • 4 authors
·
Sep 20, 2025

Canonical Cortical Field Theories

We characterise the dynamics of neuronal activity, in terms of field theory, using neural units placed on a 2D-lattice modelling the cortical surface. The electrical activity of neuronal units was analysed with the aim of deriving a neural field model with a simple functional form that still able to predict or reproduce empirical findings. Each neural unit was modelled using a neural mass and the accompanying field theory was derived in the continuum limit. The field theory comprised coupled (real) Klein-Gordon fields, where predictions of the model fall within the range of experimental findings. These predictions included the frequency spectrum of electric activity measured from the cortex, which was derived using an equipartition of energy over eigenfunctions of the neural fields. Moreover, the neural field model was invariant, within a set of parameters, to the dynamical system used to model each neuronal mass. Specifically, topologically equivalent dynamical systems resulted in the same neural field model when connected in a lattice; indicating that the fields derived could be read as a canonical cortical field theory. We specifically investigated non-dispersive fields that provide a structure for the coding (or representation) of afferent information. Further elaboration of the ensuing neural field theory, including the effect of dispersive forces, could be of importance in the understanding of the cortical processing of information.

  • 3 authors
·
Aug 20, 2023

A Comprehensive Overview and Comparative Analysis on Deep Learning Models: CNN, RNN, LSTM, GRU

Deep learning (DL) has emerged as a powerful subset of machine learning (ML) and artificial intelligence (AI), outperforming traditional ML methods, especially in handling unstructured and large datasets. Its impact spans across various domains, including speech recognition, healthcare, autonomous vehicles, cybersecurity, predictive analytics, and more. However, the complexity and dynamic nature of real-world problems present challenges in designing effective deep learning models. Consequently, several deep learning models have been developed to address different problems and applications. In this article, we conduct a comprehensive survey of various deep learning models, including Convolutional Neural Network (CNN), Recurrent Neural Network (RNN), Temporal Convolutional Networks (TCN), Transformer, Kolmogorov-Arnold networks (KAN), Generative Models, Deep Reinforcement Learning (DRL), and Deep Transfer Learning. We examine the structure, applications, benefits, and limitations of each model. Furthermore, we perform an analysis using three publicly available datasets: IMDB, ARAS, and Fruit-360. We compared the performance of six renowned deep learning models: CNN, RNN, Long Short-Term Memory (LSTM), Bidirectional LSTM, Gated Recurrent Unit (GRU), and Bidirectional GRU alongside two newer models, TCN and Transformer, using the IMDB and ARAS datasets. Additionally, we evaluated the performance of eight CNN-based models, including VGG (Visual Geometry Group), Inception, ResNet (Residual Network), InceptionResNet, Xception (Extreme Inception), MobileNet, DenseNet (Dense Convolutional Network), and NASNet (Neural Architecture Search Network), for image classification tasks using the Fruit-360 dataset.

  • 4 authors
·
May 27, 2023

Graph AI generates neurological hypotheses validated in molecular, organoid, and clinical systems

Neurological diseases are the leading global cause of disability, yet most lack disease-modifying treatments. We present PROTON, a heterogeneous graph transformer that generates testable hypotheses across molecular, organoid, and clinical systems. To evaluate PROTON, we apply it to Parkinson's disease (PD), bipolar disorder (BD), and Alzheimer's disease (AD). In PD, PROTON linked genetic risk loci to genes essential for dopaminergic neuron survival and predicted pesticides toxic to patient-derived neurons, including the insecticide endosulfan, which ranked within the top 1.29% of predictions. In silico screens performed by PROTON reproduced six genome-wide α-synuclein experiments, including a split-ubiquitin yeast two-hybrid system (normalized enrichment score [NES] = 2.30, FDR-adjusted p < 1 times 10^{-4}), an ascorbate peroxidase proximity labeling assay (NES = 2.16, FDR < 1 times 10^{-4}), and a high-depth targeted exome sequencing study in 496 synucleinopathy patients (NES = 2.13, FDR < 1 times 10^{-4}). In BD, PROTON predicted calcitriol as a candidate drug that reversed proteomic alterations observed in cortical organoids derived from BD patients. In AD, we evaluated PROTON predictions in health records from n = 610,524 patients at Mass General Brigham, confirming that five PROTON-predicted drugs were associated with reduced seven-year dementia risk (minimum hazard ratio = 0.63, 95% CI: 0.53-0.75, p < 1 times 10^{-7}). PROTON generated neurological hypotheses that were evaluated across molecular, organoid, and clinical systems, defining a path for AI-driven discovery in neurological disease.

  • 29 authors
·
Dec 13, 2025

Neural Circuit Architectural Priors for Embodied Control

Artificial neural networks for motor control usually adopt generic architectures like fully connected MLPs. While general, these tabula rasa architectures rely on large amounts of experience to learn, are not easily transferable to new bodies, and have internal dynamics that are difficult to interpret. In nature, animals are born with highly structured connectivity in their nervous systems shaped by evolution; this innate circuitry acts synergistically with learning mechanisms to provide inductive biases that enable most animals to function well soon after birth and learn efficiently. Convolutional networks inspired by visual circuitry have encoded useful biases for vision. However, it is unknown the extent to which ANN architectures inspired by neural circuitry can yield useful biases for other AI domains. In this work, we ask what advantages biologically inspired ANN architecture can provide in the domain of motor control. Specifically, we translate C. elegans locomotion circuits into an ANN model controlling a simulated Swimmer agent. On a locomotion task, our architecture achieves good initial performance and asymptotic performance comparable with MLPs, while dramatically improving data efficiency and requiring orders of magnitude fewer parameters. Our architecture is interpretable and transfers to new body designs. An ablation analysis shows that constrained excitation/inhibition is crucial for learning, while weight initialization contributes to good initial performance. Our work demonstrates several advantages of biologically inspired ANN architecture and encourages future work in more complex embodied control.

  • 3 authors
·
Jan 13, 2022

Astrocyte-Enabled Advancements in Spiking Neural Networks for Large Language Modeling

Within the complex neuroarchitecture of the brain, astrocytes play crucial roles in development, structure, and metabolism. These cells regulate neural activity through tripartite synapses, directly impacting cognitive processes such as learning and memory. Despite the growing recognition of astrocytes' significance, traditional Spiking Neural Network (SNN) models remain predominantly neuron-centric, overlooking the profound influence of astrocytes on neural dynamics. Inspired by these biological insights, we have developed an Astrocyte-Modulated Spiking Unit (AM-SU), an innovative framework that integrates neuron-astrocyte interactions into the computational paradigm, demonstrating wide applicability across various hardware platforms. Our Astrocyte-Modulated Spiking Neural Network (AstroSNN) exhibits exceptional performance in tasks involving memory retention and natural language generation, particularly in handling long-term dependencies and complex linguistic structures. The design of AstroSNN not only enhances its biological authenticity but also introduces novel computational dynamics, enabling more effective processing of complex temporal dependencies. Furthermore, AstroSNN shows low latency, high throughput, and reduced memory usage in practical applications, making it highly suitable for resource-constrained environments. By successfully integrating astrocytic dynamics into intelligent neural networks, our work narrows the gap between biological plausibility and neural modeling, laying the groundwork for future biologically-inspired neural computing research that includes both neurons and astrocytes.

  • 7 authors
·
Dec 12, 2023

Neuroformer: Multimodal and Multitask Generative Pretraining for Brain Data

State-of-the-art systems neuroscience experiments yield large-scale multimodal data, and these data sets require new tools for analysis. Inspired by the success of large pretrained models in vision and language domains, we reframe the analysis of large-scale, cellular-resolution neuronal spiking data into an autoregressive spatiotemporal generation problem. Neuroformer is a multimodal, multitask generative pretrained transformer (GPT) model that is specifically designed to handle the intricacies of data in systems neuroscience. It scales linearly with feature size, can process an arbitrary number of modalities, and is adaptable to downstream tasks, such as predicting behavior. We first trained Neuroformer on simulated datasets, and found that it both accurately predicted simulated neuronal circuit activity, and also intrinsically inferred the underlying neural circuit connectivity, including direction. When pretrained to decode neural responses, the model predicted the behavior of a mouse with only few-shot fine-tuning, suggesting that the model begins learning how to do so directly from the neural representations themselves, without any explicit supervision. We used an ablation study to show that joint training on neuronal responses and behavior boosted performance, highlighting the model's ability to associate behavioral and neural representations in an unsupervised manner. These findings show that Neuroformer can analyze neural datasets and their emergent properties, informing the development of models and hypotheses associated with the brain.

  • 5 authors
·
Oct 31, 2023

How connectivity structure shapes rich and lazy learning in neural circuits

In theoretical neuroscience, recent work leverages deep learning tools to explore how some network attributes critically influence its learning dynamics. Notably, initial weight distributions with small (resp. large) variance may yield a rich (resp. lazy) regime, where significant (resp. minor) changes to network states and representation are observed over the course of learning. However, in biology, neural circuit connectivity could exhibit a low-rank structure and therefore differs markedly from the random initializations generally used for these studies. As such, here we investigate how the structure of the initial weights -- in particular their effective rank -- influences the network learning regime. Through both empirical and theoretical analyses, we discover that high-rank initializations typically yield smaller network changes indicative of lazier learning, a finding we also confirm with experimentally-driven initial connectivity in recurrent neural networks. Conversely, low-rank initialization biases learning towards richer learning. Importantly, however, as an exception to this rule, we find lazier learning can still occur with a low-rank initialization that aligns with task and data statistics. Our research highlights the pivotal role of initial weight structures in shaping learning regimes, with implications for metabolic costs of plasticity and risks of catastrophic forgetting.

  • 6 authors
·
Oct 12, 2023

Learning dynamic representations of the functional connectome in neurobiological networks

The static synaptic connectivity of neuronal circuits stands in direct contrast to the dynamics of their function. As in changing community interactions, different neurons can participate actively in various combinations to effect behaviors at different times. We introduce an unsupervised approach to learn the dynamic affinities between neurons in live, behaving animals, and to reveal which communities form among neurons at different times. The inference occurs in two major steps. First, pairwise non-linear affinities between neuronal traces from brain-wide calcium activity are organized by non-negative tensor factorization (NTF). Each factor specifies which groups of neurons are most likely interacting for an inferred interval in time, and for which animals. Finally, a generative model that allows for weighted community detection is applied to the functional motifs produced by NTF to reveal a dynamic functional connectome. Since time codes the different experimental variables (e.g., application of chemical stimuli), this provides an atlas of neural motifs active during separate stages of an experiment (e.g., stimulus application or spontaneous behaviors). Results from our analysis are experimentally validated, confirming that our method is able to robustly predict causal interactions between neurons to generate behavior. Code is available at https://github.com/dyballa/dynamic-connectomes.

  • 5 authors
·
Feb 21, 2024

Your Brain on ChatGPT: Accumulation of Cognitive Debt when Using an AI Assistant for Essay Writing Task

This study explores the neural and behavioral consequences of LLM-assisted essay writing. Participants were divided into three groups: LLM, Search Engine, and Brain-only (no tools). Each completed three sessions under the same condition. In a fourth session, LLM users were reassigned to Brain-only group (LLM-to-Brain), and Brain-only users were reassigned to LLM condition (Brain-to-LLM). A total of 54 participants took part in Sessions 1-3, with 18 completing session 4. We used electroencephalography (EEG) to assess cognitive load during essay writing, and analyzed essays using NLP, as well as scoring essays with the help from human teachers and an AI judge. Across groups, NERs, n-gram patterns, and topic ontology showed within-group homogeneity. EEG revealed significant differences in brain connectivity: Brain-only participants exhibited the strongest, most distributed networks; Search Engine users showed moderate engagement; and LLM users displayed the weakest connectivity. Cognitive activity scaled down in relation to external tool use. In session 4, LLM-to-Brain participants showed reduced alpha and beta connectivity, indicating under-engagement. Brain-to-LLM users exhibited higher memory recall and activation of occipito-parietal and prefrontal areas, similar to Search Engine users. Self-reported ownership of essays was the lowest in the LLM group and the highest in the Brain-only group. LLM users also struggled to accurately quote their own work. While LLMs offer immediate convenience, our findings highlight potential cognitive costs. Over four months, LLM users consistently underperformed at neural, linguistic, and behavioral levels. These results raise concerns about the long-term educational implications of LLM reliance and underscore the need for deeper inquiry into AI's role in learning.

  • 8 authors
·
Jun 10, 2025 2

Decoding Neural Responses in Mouse Visual Cortex through a Deep Neural Network

Finding a code to unravel the population of neural responses that leads to a distinct animal behavior has been a long-standing question in the field of neuroscience. With the recent advances in machine learning, it is shown that the hierarchically Deep Neural Networks (DNNs) perform optimally in decoding unique features out of complex datasets. In this study, we utilize the power of a DNN to explore the computational principles in the mammalian brain by exploiting the Neuropixel data from Allen Brain Institute. We decode the neural responses from mouse visual cortex to predict the presented stimuli to the animal for natural (bear, trees, cheetah, etc.) and artificial (drifted gratings, orientated bars, etc.) classes. Our results indicate that neurons in mouse visual cortex encode the features of natural and artificial objects in a distinct manner, and such neural code is consistent across animals. We investigate this by applying transfer learning to train a DNN on the neural responses of a single animal and test its generalized performance across multiple animals. Within a single animal, DNN is able to decode the neural responses with as much as 100% classification accuracy. Across animals, this accuracy is reduced to 91%. This study demonstrates the potential of utilizing the DNN models as a computational framework to understand the neural coding principles in the mammalian brain.

  • 4 authors
·
Oct 26, 2019

Training Spiking Neural Networks Using Lessons From Deep Learning

The brain is the perfect place to look for inspiration to develop more efficient neural networks. The inner workings of our synapses and neurons provide a glimpse at what the future of deep learning might look like. This paper serves as a tutorial and perspective showing how to apply the lessons learnt from several decades of research in deep learning, gradient descent, backpropagation and neuroscience to biologically plausible spiking neural neural networks. We also explore the delicate interplay between encoding data as spikes and the learning process; the challenges and solutions of applying gradient-based learning to spiking neural networks (SNNs); the subtle link between temporal backpropagation and spike timing dependent plasticity, and how deep learning might move towards biologically plausible online learning. Some ideas are well accepted and commonly used amongst the neuromorphic engineering community, while others are presented or justified for the first time here. The fields of deep learning and spiking neural networks evolve very rapidly. We endeavour to treat this document as a 'dynamic' manuscript that will continue to be updated as the common practices in training SNNs also change. A series of companion interactive tutorials complementary to this paper using our Python package, snnTorch, are also made available. See https://snntorch.readthedocs.io/en/latest/tutorials/index.html .

  • 9 authors
·
Aug 12, 2023

FISBe: A real-world benchmark dataset for instance segmentation of long-range thin filamentous structures

Instance segmentation of neurons in volumetric light microscopy images of nervous systems enables groundbreaking research in neuroscience by facilitating joint functional and morphological analyses of neural circuits at cellular resolution. Yet said multi-neuron light microscopy data exhibits extremely challenging properties for the task of instance segmentation: Individual neurons have long-ranging, thin filamentous and widely branching morphologies, multiple neurons are tightly inter-weaved, and partial volume effects, uneven illumination and noise inherent to light microscopy severely impede local disentangling as well as long-range tracing of individual neurons. These properties reflect a current key challenge in machine learning research, namely to effectively capture long-range dependencies in the data. While respective methodological research is buzzing, to date methods are typically benchmarked on synthetic datasets. To address this gap, we release the FlyLight Instance Segmentation Benchmark (FISBe) dataset, the first publicly available multi-neuron light microscopy dataset with pixel-wise annotations. In addition, we define a set of instance segmentation metrics for benchmarking that we designed to be meaningful with regard to downstream analyses. Lastly, we provide three baselines to kick off a competition that we envision to both advance the field of machine learning regarding methodology for capturing long-range data dependencies, and facilitate scientific discovery in basic neuroscience.

  • 9 authors
·
Mar 29, 2024

The Expressive Leaky Memory Neuron: an Efficient and Expressive Phenomenological Neuron Model Can Solve Long-Horizon Tasks

Biological cortical neurons are remarkably sophisticated computational devices, temporally integrating their vast synaptic input over an intricate dendritic tree, subject to complex, nonlinearly interacting internal biological processes. A recent study proposed to characterize this complexity by fitting accurate surrogate models to replicate the input-output relationship of a detailed biophysical cortical pyramidal neuron model and discovered it needed temporal convolutional networks (TCN) with millions of parameters. Requiring these many parameters, however, could stem from a misalignment between the inductive biases of the TCN and cortical neuron's computations. In light of this, and to explore the computational implications of leaky memory units and nonlinear dendritic processing, we introduce the Expressive Leaky Memory (ELM) neuron model, a biologically inspired phenomenological model of a cortical neuron. Remarkably, by exploiting such slowly decaying memory-like hidden states and two-layered nonlinear integration of synaptic input, our ELM neuron can accurately match the aforementioned input-output relationship with under ten thousand trainable parameters. To further assess the computational ramifications of our neuron design, we evaluate it on various tasks with demanding temporal structures, including the Long Range Arena (LRA) datasets, as well as a novel neuromorphic dataset based on the Spiking Heidelberg Digits dataset (SHD-Adding). Leveraging a larger number of memory units with sufficiently long timescales, and correspondingly sophisticated synaptic integration, the ELM neuron displays substantial long-range processing capabilities, reliably outperforming the classic Transformer or Chrono-LSTM architectures on LRA, and even solving the Pathfinder-X task with over 70% accuracy (16k context length).

  • 5 authors
·
Jun 14, 2023

Comparison Against Task Driven Artificial Neural Networks Reveals Functional Organization of Mouse Visual Cortex

Partially inspired by features of computation in visual cortex, deep neural networks compute hierarchical representations of their inputs. While these networks have been highly successful in machine learning, it remains unclear to what extent they can aid our understanding of cortical function. Several groups have developed metrics that provide a quantitative comparison between representations computed by networks and representations measured in cortex. At the same time, neuroscience is well into an unprecedented phase of large-scale data collection, as evidenced by projects such as the Allen Brain Observatory. Despite the magnitude of these efforts, in a given experiment only a fraction of units are recorded, limiting the information available about the cortical representation. Moreover, only a finite number of stimuli can be shown to an animal over the course of a realistic experiment. These limitations raise the question of how and whether metrics that compare representations of deep networks are meaningful on these datasets. Here, we empirically quantify the capabilities and limitations of these metrics due to limited image presentations and neuron samples. We find that the comparison procedure is robust to different choices of stimuli set and the level of subsampling that one might expect in a large-scale brain survey with thousands of neurons. Using these results, we compare the representations measured in the Allen Brain Observatory in response to natural image presentations to deep neural network. We show that the visual cortical areas are relatively high order representations (in that they map to deeper layers of convolutional neural networks). Furthermore, we see evidence of a broad, more parallel organization rather than a sequential hierarchy, with the primary area VISp(V1) being lower order relative to the other areas.

  • 3 authors
·
Nov 18, 2019

Synchronization and Redundancy: Implications for Robustness of Neural Learning and Decision Making

Learning and decision making in the brain are key processes critical to survival, and yet are processes implemented by non-ideal biological building blocks which can impose significant error. We explore quantitatively how the brain might cope with this inherent source of error by taking advantage of two ubiquitous mechanisms, redundancy and synchronization. In particular we consider a neural process whose goal is to learn a decision function by implementing a nonlinear gradient dynamics. The dynamics, however, are assumed to be corrupted by perturbations modeling the error which might be incurred due to limitations of the biology, intrinsic neuronal noise, and imperfect measurements. We show that error, and the associated uncertainty surrounding a learned solution, can be controlled in large part by trading off synchronization strength among multiple redundant neural systems against the noise amplitude. The impact of the coupling between such redundant systems is quantified by the spectrum of the network Laplacian, and we discuss the role of network topology in synchronization and in reducing the effect of noise. A range of situations in which the mechanisms we model arise in brain science are discussed, and we draw attention to experimental evidence suggesting that cortical circuits capable of implementing the computations of interest here can be found on several scales. Finally, simulations comparing theoretical bounds to the relevant empirical quantities show that the theoretical estimates we derive can be tight.

  • 2 authors
·
Oct 21, 2010

Disentangling Recall and Reasoning in Transformer Models through Layer-wise Attention and Activation Analysis

Transformer-based language models excel at both recall (retrieving memorized facts) and reasoning (performing multi-step inference), but whether these abilities rely on distinct internal mechanisms remains unclear. Distinguishing recall from reasoning is crucial for predicting model generalization, designing targeted evaluations, and building safer interventions that affect one ability without disrupting the other.We approach this question through mechanistic interpretability, using controlled datasets of synthetic linguistic puzzles to probe transformer models at the layer, head, and neuron level. Our pipeline combines activation patching and structured ablations to causally measure component contributions to each task type. Across two model families (Qwen and LLaMA), we find that interventions on distinct layers and attention heads lead to selective impairments: disabling identified "recall circuits" reduces fact-retrieval accuracy by up to 15\% while leaving reasoning intact, whereas disabling "reasoning circuits" reduces multi-step inference by a comparable margin. At the neuron level, we observe task-specific firing patterns, though these effects are less robust, consistent with neuronal polysemanticity.Our results provide the first causal evidence that recall and reasoning rely on separable but interacting circuits in transformer models. These findings advance mechanistic interpretability by linking circuit-level structure to functional specialization and demonstrate how controlled datasets and causal interventions can yield mechanistic insights into model cognition, informing safer deployment of large language models.

  • 6 authors
·
Oct 3, 2025

Need is All You Need: Homeostatic Neural Networks Adapt to Concept Shift

In living organisms, homeostasis is the natural regulation of internal states aimed at maintaining conditions compatible with life. Typical artificial systems are not equipped with comparable regulatory features. Here, we introduce an artificial neural network that incorporates homeostatic features. Its own computing substrate is placed in a needful and vulnerable relation to the very objects over which it computes. For example, artificial neurons performing classification of MNIST digits or Fashion-MNIST articles of clothing may receive excitatory or inhibitory effects, which alter their own learning rate as a direct result of perceiving and classifying the digits. In this scenario, accurate recognition is desirable to the agent itself because it guides decisions to regulate its vulnerable internal states and functionality. Counterintuitively, the addition of vulnerability to a learner does not necessarily impair its performance. On the contrary, self-regulation in response to vulnerability confers benefits under certain conditions. We show that homeostatic design confers increased adaptability under concept shift, in which the relationships between labels and data change over time, and that the greatest advantages are obtained under the highest rates of shift. This necessitates the rapid un-learning of past associations and the re-learning of new ones. We also demonstrate the superior abilities of homeostatic learners in environments with dynamically changing rates of concept shift. Our homeostatic design exposes the artificial neural network's thinking machinery to the consequences of its own "thoughts", illustrating the advantage of putting one's own "skin in the game" to improve fluid intelligence.

  • 3 authors
·
May 17, 2022

Pre-Synaptic Pool Modification (PSPM): A Supervised Learning Procedure for Spiking Neural Networks

Learning synaptic weights of spiking neural network (SNN) models that can reproduce target spike trains from provided neural firing data is a central problem in computational neuroscience and spike-based computing. The discovery of the optimal weight values can be posed as a supervised learning task wherein the weights of the model network are chosen to maximize the similarity between the target spike trains and the model outputs. It is still largely unknown whether optimizing spike train similarity of highly recurrent SNNs produces weight matrices similar to those of the ground truth model. To this end, we propose flexible heuristic supervised learning rules, termed Pre-Synaptic Pool Modification (PSPM), that rely on stochastic weight updates in order to produce spikes within a short window of the desired times and eliminate spikes outside of this window. PSPM improves spike train similarity for all-to-all SNNs and makes no assumption about the post-synaptic potential of the neurons or the structure of the network since no gradients are required. We test whether optimizing for spike train similarity entails the discovery of accurate weights and explore the relative contributions of local and homeostatic weight updates. Although PSPM improves similarity between spike trains, the learned weights often differ from the weights of the ground truth model, implying that connectome inference from spike data may require additional constraints on connectivity statistics. We also find that spike train similarity is sensitive to local updates, but other measures of network activity such as avalanche distributions, can be learned through synaptic homeostasis.

  • 4 authors
·
Oct 7, 2018

Reddit-Impacts: A Named Entity Recognition Dataset for Analyzing Clinical and Social Effects of Substance Use Derived from Social Media

Substance use disorders (SUDs) are a growing concern globally, necessitating enhanced understanding of the problem and its trends through data-driven research. Social media are unique and important sources of information about SUDs, particularly since the data in such sources are often generated by people with lived experiences. In this paper, we introduce Reddit-Impacts, a challenging Named Entity Recognition (NER) dataset curated from subreddits dedicated to discussions on prescription and illicit opioids, as well as medications for opioid use disorder. The dataset specifically concentrates on the lesser-studied, yet critically important, aspects of substance use--its clinical and social impacts. We collected data from chosen subreddits using the publicly available Application Programming Interface for Reddit. We manually annotated text spans representing clinical and social impacts reported by people who also reported personal nonmedical use of substances including but not limited to opioids, stimulants and benzodiazepines. Our objective is to create a resource that can enable the development of systems that can automatically detect clinical and social impacts of substance use from text-based social media data. The successful development of such systems may enable us to better understand how nonmedical use of substances affects individual health and societal dynamics, aiding the development of effective public health strategies. In addition to creating the annotated data set, we applied several machine learning models to establish baseline performances. Specifically, we experimented with transformer models like BERT, and RoBERTa, one few-shot learning model DANN by leveraging the full training dataset, and GPT-3.5 by using one-shot learning, for automatic NER of clinical and social impacts. The dataset has been made available through the 2024 SMM4H shared tasks.

  • 6 authors
·
May 9, 2024

Deep neural networks as nested dynamical systems

There is an analogy that is often made between deep neural networks and actual brains, suggested by the nomenclature itself: the "neurons" in deep neural networks should correspond to neurons (or nerve cells, to avoid confusion) in the brain. We claim, however, that this analogy doesn't even type check: it is structurally flawed. In agreement with the slightly glib summary of Hebbian learning as "cells that fire together wire together", this article makes the case that the analogy should be different. Since the "neurons" in deep neural networks are managing the changing weights, they are more akin to the synapses in the brain; instead, it is the wires in deep neural networks that are more like nerve cells, in that they are what cause the information to flow. An intuition that nerve cells seem like more than mere wires is exactly right, and is justified by a precise category-theoretic analogy which we will explore in this article. Throughout, we will continue to highlight the error in equating artificial neurons with nerve cells by leaving "neuron" in quotes or by calling them artificial neurons. We will first explain how to view deep neural networks as nested dynamical systems with a very restricted sort of interaction pattern, and then explain a more general sort of interaction for dynamical systems that is useful throughout engineering, but which fails to adapt to changing circumstances. As mentioned, an analogy is then forced upon us by the mathematical formalism in which they are both embedded. We call the resulting encompassing generalization deeply interacting learning systems: they have complex interaction as in control theory, but adaptation to circumstances as in deep neural networks.

  • 2 authors
·
Nov 1, 2021

On Relation-Specific Neurons in Large Language Models

In large language models (LLMs), certain neurons can store distinct pieces of knowledge learned during pretraining. While knowledge typically appears as a combination of relations and entities, it remains unclear whether some neurons focus on a relation itself -- independent of any entity. We hypothesize such neurons detect a relation in the input text and guide generation involving such a relation. To investigate this, we study the Llama-2 family on a chosen set of relations with a statistics-based method. Our experiments demonstrate the existence of relation-specific neurons. We measure the effect of selectively deactivating candidate neurons specific to relation r on the LLM's ability to handle (1) facts whose relation is r and (2) facts whose relation is a different relation r' neq r. With respect to their capacity for encoding relation information, we give evidence for the following three properties of relation-specific neurons. (i) Neuron cumulativity. The neurons for r present a cumulative effect so that deactivating a larger portion of them results in the degradation of more facts in r. (ii) Neuron versatility. Neurons can be shared across multiple closely related as well as less related relations. Some relation neurons transfer across languages. (iii) Neuron interference. Deactivating neurons specific to one relation can improve LLM generation performance for facts of other relations. We will make our code publicly available at https://github.com/cisnlp/relation-specific-neurons.

cis-lmu CIS, LMU Munich
·
Feb 24, 2025 2

Homogenized C. elegans Neural Activity and Connectivity Data

There is renewed interest in modeling and understanding the nervous system of the nematode Caenorhabditis elegans (C. elegans), as this small model system provides a path to bridge the gap between nervous system structure (connectivity) and function (physiology). However, existing physiology datasets, whether involving passive recording or stimulation, are in distinct formats, and connectome datasets require preprocessing before analysis can commence. Here we compile and homogenize datasets of neural activity and connectivity. Our neural activity dataset is derived from 11 C. elegans neuroimaging experiments, while our connectivity dataset is compiled from 9 connectome annotations based on 3 primary electron microscopy studies and 1 signal propagation study. Physiology datasets, collected under varying protocols, measure calcium fluorescence in labeled subsets of the worm's 300 neurons. Our preprocessing pipeline standardizes these datasets by consistently ordering labeled neurons and resampling traces to a common sampling rate, yielding recordings from approximately 900 worms and 250 uniquely labeled neurons. The connectome datasets, collected from electron microscopy reconstructions, represent the entire nervous system as a graph of connections. Our collection is accessible on HuggingFace, facilitating analysis of the structure-function relationship in biology using modern neural network architectures and enabling cross-lab and cross-animal comparisons.

  • 4 authors
·
Nov 18, 2024

Decoding specialised feature neurons in LLMs with the final projection layer

Large Language Models (LLMs) typically have billions of parameters and are thus often difficult to interpret in their operation. Such black-box models can pose a significant risk to safety when trusted to make important decisions. The lack of interpretability of LLMs is more related to their sheer size, rather than the complexity of their individual components. The TARS method for knowledge removal (Davies et al 2024) provides strong evidence for the hypothesis that that linear layer weights which act directly on the residual stream may have high correlation with different concepts encoded in the residual stream. Building upon this, we attempt to decode neuron weights directly into token probabilities through the final projection layer of the model (the LM-head). Firstly, we show that with Llama 3.1 8B we can utilise the LM-head to decode specialised feature neurons that respond strongly to certain concepts, with examples such as "dog" and "California". This is then confirmed by demonstrating that these neurons can be clamped to affect the probability of the concept in the output. This extends to the fine-tuned assistant Llama 3.1 8B instruct model, where we find that over 75% of neurons in the up-projection layers have the same top associated token compared to the pretrained model. Finally, we demonstrate that clamping the "dog" neuron leads the instruct model to always discuss dogs when asked about its favourite animal. Through our method, it is possible to map the entirety of Llama 3.1 8B's up-projection neurons in less than 15 minutes with no parallelization.

  • 1 authors
·
Jan 5, 2025

Deep Learning in Spiking Neural Networks

In recent years, deep learning has been a revolution in the field of machine learning, for computer vision in particular. In this approach, a deep (multilayer) artificial neural network (ANN) is trained in a supervised manner using backpropagation. Huge amounts of labeled examples are required, but the resulting classification accuracy is truly impressive, sometimes outperforming humans. Neurons in an ANN are characterized by a single, static, continuous-valued activation. Yet biological neurons use discrete spikes to compute and transmit information, and the spike times, in addition to the spike rates, matter. Spiking neural networks (SNNs) are thus more biologically realistic than ANNs, and arguably the only viable option if one wants to understand how the brain computes. SNNs are also more hardware friendly and energy-efficient than ANNs, and are thus appealing for technology, especially for portable devices. However, training deep SNNs remains a challenge. Spiking neurons' transfer function is usually non-differentiable, which prevents using backpropagation. Here we review recent supervised and unsupervised methods to train deep SNNs, and compare them in terms of accuracy, but also computational cost and hardware friendliness. The emerging picture is that SNNs still lag behind ANNs in terms of accuracy, but the gap is decreasing, and can even vanish on some tasks, while the SNNs typically require much fewer operations.

  • 5 authors
·
Apr 22, 2018

A brain basis of dynamical intelligence for AI and computational neuroscience

The deep neural nets of modern artificial intelligence (AI) have not achieved defining features of biological intelligence, including abstraction, causal learning, and energy-efficiency. While scaling to larger models has delivered performance improvements for current applications, more brain-like capacities may demand new theories, models, and methods for designing artificial learning systems. Here, we argue that this opportunity to reassess insights from the brain should stimulate cooperation between AI research and theory-driven computational neuroscience (CN). To motivate a brain basis of neural computation, we present a dynamical view of intelligence from which we elaborate concepts of sparsity in network structure, temporal dynamics, and interactive learning. In particular, we suggest that temporal dynamics, as expressed through neural synchrony, nested oscillations, and flexible sequences, provide a rich computational layer for reading and updating hierarchical models distributed in long-term memory networks. Moreover, embracing agent-centered paradigms in AI and CN will accelerate our understanding of the complex dynamics and behaviors that build useful world models. A convergence of AI/CN theories and objectives will reveal dynamical principles of intelligence for brains and engineered learning systems. This article was inspired by our symposium on dynamical neuroscience and machine learning at the 6th Annual US/NIH BRAIN Initiative Investigators Meeting.

  • 3 authors
·
May 15, 2021

How do neurons operate on sparse distributed representations? A mathematical theory of sparsity, neurons and active dendrites

We propose a formal mathematical model for sparse representations and active dendrites in neocortex. Our model is inspired by recent experimental findings on active dendritic processing and NMDA spikes in pyramidal neurons. These experimental and modeling studies suggest that the basic unit of pattern memory in the neocortex is instantiated by small clusters of synapses operated on by localized non-linear dendritic processes. We derive a number of scaling laws that characterize the accuracy of such dendrites in detecting activation patterns in a neuronal population under adverse conditions. We introduce the union property which shows that synapses for multiple patterns can be randomly mixed together within a segment and still lead to highly accurate recognition. We describe simulation results that provide further insight into sparse representations as well as two primary results. First we show that pattern recognition by a neuron with active dendrites can be extremely accurate and robust with high dimensional sparse inputs even when using a tiny number of synapses to recognize large patterns. Second, equations representing recognition accuracy of a dendrite predict optimal NMDA spiking thresholds under a generous set of assumptions. The prediction tightly matches NMDA spiking thresholds measured in the literature. Our model matches many of the known properties of pyramidal neurons. As such the theory provides a mathematical framework for understanding the benefits and limits of sparse representations in cortical networks.

  • 2 authors
·
Jan 4, 2016

Critical Learning Periods Emerge Even in Deep Linear Networks

Critical learning periods are periods early in development where temporary sensory deficits can have a permanent effect on behavior and learned representations. Despite the radical differences between biological and artificial networks, critical learning periods have been empirically observed in both systems. This suggests that critical periods may be fundamental to learning and not an accident of biology. Yet, why exactly critical periods emerge in deep networks is still an open question, and in particular it is unclear whether the critical periods observed in both systems depend on particular architectural or optimization details. To isolate the key underlying factors, we focus on deep linear network models, and show that, surprisingly, such networks also display much of the behavior seen in biology and artificial networks, while being amenable to analytical treatment. We show that critical periods depend on the depth of the model and structure of the data distribution. We also show analytically and in simulations that the learning of features is tied to competition between sources. Finally, we extend our analysis to multi-task learning to show that pre-training on certain tasks can damage the transfer performance on new tasks, and show how this depends on the relationship between tasks and the duration of the pre-training stage. To the best of our knowledge, our work provides the first analytically tractable model that sheds light into why critical learning periods emerge in biological and artificial networks.

  • 3 authors
·
Aug 23, 2023

Recurrent Neural Network Learning of Performance and Intrinsic Population Dynamics from Sparse Neural Data

Recurrent Neural Networks (RNNs) are popular models of brain function. The typical training strategy is to adjust their input-output behavior so that it matches that of the biological circuit of interest. Even though this strategy ensures that the biological and artificial networks perform the same computational task, it does not guarantee that their internal activity dynamics match. This suggests that the trained RNNs might end up performing the task employing a different internal computational mechanism, which would make them a suboptimal model of the biological circuit. In this work, we introduce a novel training strategy that allows learning not only the input-output behavior of an RNN but also its internal network dynamics, based on sparse neural recordings. We test the proposed method by training an RNN to simultaneously reproduce internal dynamics and output signals of a physiologically-inspired neural model. Specifically, this model generates the multiphasic muscle-like activity patterns typically observed during the execution of reaching movements, based on the oscillatory activation patterns concurrently observed in the motor cortex. Remarkably, we show that the reproduction of the internal dynamics is successful even when the training algorithm relies on the activities of a small subset of neurons sampled from the biological network. Furthermore, we show that training the RNNs with this method significantly improves their generalization performance. Overall, our results suggest that the proposed method is suitable for building powerful functional RNN models, which automatically capture important computational properties of the biological circuit of interest from sparse neural recordings.

  • 2 authors
·
May 5, 2020

Learning Delays in Spiking Neural Networks using Dilated Convolutions with Learnable Spacings

Spiking Neural Networks (SNNs) are a promising research direction for building power-efficient information processing systems, especially for temporal tasks such as speech recognition. In SNNs, delays refer to the time needed for one spike to travel from one neuron to another. These delays matter because they influence the spike arrival times, and it is well-known that spiking neurons respond more strongly to coincident input spikes. More formally, it has been shown theoretically that plastic delays greatly increase the expressivity in SNNs. Yet, efficient algorithms to learn these delays have been lacking. Here, we propose a new discrete-time algorithm that addresses this issue in deep feedforward SNNs using backpropagation, in an offline manner. To simulate delays between consecutive layers, we use 1D convolutions across time. The kernels contain only a few non-zero weights - one per synapse - whose positions correspond to the delays. These positions are learned together with the weights using the recently proposed Dilated Convolution with Learnable Spacings (DCLS). We evaluated our method on three datasets: the Spiking Heidelberg Dataset (SHD), the Spiking Speech Commands (SSC) and its non-spiking version Google Speech Commands v0.02 (GSC) benchmarks, which require detecting temporal patterns. We used feedforward SNNs with two or three hidden fully connected layers, and vanilla leaky integrate-and-fire neurons. We showed that fixed random delays help and that learning them helps even more. Furthermore, our method outperformed the state-of-the-art in the three datasets without using recurrent connections and with substantially fewer parameters. Our work demonstrates the potential of delay learning in developing accurate and precise models for temporal data processing. Our code is based on PyTorch / SpikingJelly and available at: https://github.com/Thvnvtos/SNN-delays

  • 3 authors
·
Jun 30, 2023

Hebbian Learning based Orthogonal Projection for Continual Learning of Spiking Neural Networks

Neuromorphic computing with spiking neural networks is promising for energy-efficient artificial intelligence (AI) applications. However, different from humans who continually learn different tasks in a lifetime, neural network models suffer from catastrophic forgetting. How could neuronal operations solve this problem is an important question for AI and neuroscience. Many previous studies draw inspiration from observed neuroscience phenomena and propose episodic replay or synaptic metaplasticity, but they are not guaranteed to explicitly preserve knowledge for neuron populations. Other works focus on machine learning methods with more mathematical grounding, e.g., orthogonal projection on high dimensional spaces, but there is no neural correspondence for neuromorphic computing. In this work, we develop a new method with neuronal operations based on lateral connections and Hebbian learning, which can protect knowledge by projecting activity traces of neurons into an orthogonal subspace so that synaptic weight update will not interfere with old tasks. We show that Hebbian and anti-Hebbian learning on recurrent lateral connections can effectively extract the principal subspace of neural activities and enable orthogonal projection. This provides new insights into how neural circuits and Hebbian learning can help continual learning, and also how the concept of orthogonal projection can be realized in neuronal systems. Our method is also flexible to utilize arbitrary training methods based on presynaptic activities/traces. Experiments show that our method consistently solves forgetting for spiking neural networks with nearly zero forgetting under various supervised training methods with different error propagation approaches, and outperforms previous approaches under various settings. Our method can pave a solid path for building continual neuromorphic computing systems.

  • 5 authors
·
Feb 19, 2024

Sparse Training via Boosting Pruning Plasticity with Neuroregeneration

Works on lottery ticket hypothesis (LTH) and single-shot network pruning (SNIP) have raised a lot of attention currently on post-training pruning (iterative magnitude pruning), and before-training pruning (pruning at initialization). The former method suffers from an extremely large computation cost and the latter usually struggles with insufficient performance. In comparison, during-training pruning, a class of pruning methods that simultaneously enjoys the training/inference efficiency and the comparable performance, temporarily, has been less explored. To better understand during-training pruning, we quantitatively study the effect of pruning throughout training from the perspective of pruning plasticity (the ability of the pruned networks to recover the original performance). Pruning plasticity can help explain several other empirical observations about neural network pruning in literature. We further find that pruning plasticity can be substantially improved by injecting a brain-inspired mechanism called neuroregeneration, i.e., to regenerate the same number of connections as pruned. We design a novel gradual magnitude pruning (GMP) method, named gradual pruning with zero-cost neuroregeneration (GraNet), that advances state of the art. Perhaps most impressively, its sparse-to-sparse version for the first time boosts the sparse-to-sparse training performance over various dense-to-sparse methods with ResNet-50 on ImageNet without extending the training time. We release all codes in https://github.com/Shiweiliuiiiiiii/GraNet.

  • 10 authors
·
Jun 18, 2021

Stochastic Adaptive Activation Function

The simulation of human neurons and neurotransmission mechanisms has been realized in deep neural networks based on the theoretical implementations of activation functions. However, recent studies have reported that the threshold potential of neurons exhibits different values according to the locations and types of individual neurons, and that the activation functions have limitations in terms of representing this variability. Therefore, this study proposes a simple yet effective activation function that facilitates different thresholds and adaptive activations according to the positions of units and the contexts of inputs. Furthermore, the proposed activation function mathematically exhibits a more generalized form of Swish activation function, and thus we denoted it as Adaptive SwisH (ASH). ASH highlights informative features that exhibit large values in the top percentiles in an input, whereas it rectifies low values. Most importantly, ASH exhibits trainable, adaptive, and context-aware properties compared to other activation functions. Furthermore, ASH represents general formula of the previously studied activation function and provides a reasonable mathematical background for the superior performance. To validate the effectiveness and robustness of ASH, we implemented ASH into many deep learning models for various tasks, including classification, detection, segmentation, and image generation. Experimental analysis demonstrates that our activation function can provide the benefits of more accurate prediction and earlier convergence in many deep learning applications.

  • 4 authors
·
Oct 20, 2022

Wider and Deeper LLM Networks are Fairer LLM Evaluators

Measuring the quality of responses generated by LLMs is a challenging task, particularly when it comes to evaluating whether the response is aligned with human preference. A novel approach involves using the LLM itself to make evaluation and stabilizing the results through multiple independent evaluations, similar to a single-layer narrow LLM network. This network consists of a fixed number of neurons, with each neuron being the same LLM. In this paper, we draw upon the extensive research on deep neural networks to explore whether deeper and wider networks can lead to fairer evaluations. Specifically, inspired by the observation that different neurons in a neural network are responsible for detecting different concepts, we first adaptively generate as many neuron roles as possible for each evaluation sample. Each perspective corresponds to the role of a specific LLM neuron in the first layer. In subsequent layers, we follow the idea that higher layers in deep networks are responsible for more comprehensive features, each layer receives representations from all neurons in the previous layer, integrating the locally learned evaluation information to obtain a more comprehensive evaluation result. Interestingly, this network design resembles the process of academic paper reviewing. To validate the effectiveness of our method, we construct the largest and most diverse English evaluation benchmark LLMEval^2 for LLM evaluators, comprising 15 tasks, 8 abilities, and 2,553 samples. Experimental results demonstrate that a wider network (involving many reviewers) with 2 layers (one round of discussion) performs the best, improving kappa correlation coefficient from 0.28 to 0.34. We also leverage WideDeep to aid in the assessment of Chinese LLMs, which has accelerated the evaluation time by 4.6 times, resulting in a 60% cost saving. WideDeep achieves a remarkable 93% agreement level among humans.

  • 8 authors
·
Aug 3, 2023

Learning Internal Biological Neuron Parameters and Complexity-Based Encoding for Improved Spiking Neural Networks Performance

This study introduces a novel approach by replacing the traditional perceptron neuron model with a biologically inspired probabilistic meta neuron, where the internal neuron parameters are jointly learned, leading to improved classification accuracy of spiking neural networks (SNNs). To validate this innovation, we implement and compare two SNN architectures: one based on standard leaky integrate-and-fire (LIF) neurons and another utilizing the proposed probabilistic meta neuron model. As a second key contribution, we present a new biologically inspired classification framework that uniquely integrates SNNs with Lempel-Ziv complexity (LZC) a measure closely related to entropy rate. By combining the temporal precision and biological plausibility of SNNs with the capacity of LZC to capture structural regularity, the proposed approach enables efficient and interpretable classification of spatiotemporal neural data, an aspect not addressed in existing works. We consider learning algorithms such as backpropagation, spike-timing-dependent plasticity (STDP), and the Tempotron learning rule. To explore neural dynamics, we use Poisson processes to model neuronal spike trains, a well-established method for simulating the stochastic firing behavior of biological neurons. Our results reveal that depending on the training method, the classifier's efficiency can improve by up to 11.00%, highlighting the advantage of learning additional neuron parameters beyond the traditional focus on weighted inputs alone.

  • 3 authors
·
Aug 8, 2025

Natural Language Descriptions of Deep Visual Features

Some neurons in deep networks specialize in recognizing highly specific perceptual, structural, or semantic features of inputs. In computer vision, techniques exist for identifying neurons that respond to individual concept categories like colors, textures, and object classes. But these techniques are limited in scope, labeling only a small subset of neurons and behaviors in any network. Is a richer characterization of neuron-level computation possible? We introduce a procedure (called MILAN, for mutual-information-guided linguistic annotation of neurons) that automatically labels neurons with open-ended, compositional, natural language descriptions. Given a neuron, MILAN generates a description by searching for a natural language string that maximizes pointwise mutual information with the image regions in which the neuron is active. MILAN produces fine-grained descriptions that capture categorical, relational, and logical structure in learned features. These descriptions obtain high agreement with human-generated feature descriptions across a diverse set of model architectures and tasks, and can aid in understanding and controlling learned models. We highlight three applications of natural language neuron descriptions. First, we use MILAN for analysis, characterizing the distribution and importance of neurons selective for attribute, category, and relational information in vision models. Second, we use MILAN for auditing, surfacing neurons sensitive to human faces in datasets designed to obscure them. Finally, we use MILAN for editing, improving robustness in an image classifier by deleting neurons sensitive to text features spuriously correlated with class labels.

  • 6 authors
·
Jan 26, 2022

Catastrophic Interference is Mitigated in Naturalistic Power-Law Learning Environments

Neural networks often suffer from catastrophic interference (CI): performance on previously learned tasks drops off significantly when learning a new task. This contrasts strongly with humans, who can sequentially learn new tasks without appreciably forgetting previous tasks. Prior work has explored various techniques for mitigating CI such as regularization, rehearsal, generative replay, and distillation methods. The current work takes a different approach, one guided by cognitive science research showing that in naturalistic environments, the probability of encountering a task decreases as a power-law of the time since it was last performed. We argue that a realistic evaluation of techniques for the mitigation of CI should be performed in simulated naturalistic learning environments. Thus, we evaluate the extent of mitigation of CI when training simple rehearsal-based methods in power-law environments similar to the ones humans face. Our work explores this novel rehearsal-based approach for a domain-incremental task: learning permutations in the MNIST task. We compare our rehearsal environment with other baselines to show its efficacy in promoting continual learning. Additionally, we investigate whether this environment shows forward facilitation, i.e., faster learning of later tasks. Next, we explore the robustness of our learning environment to the number of tasks, model size, and amount of data rehearsed after each task. Notably, our results show that the performance is comparable or superior to that of models trained using popular regularization methods and also to rehearsals in non-power-law environments. The benefits of this training paradigm include simplicity and the lack of a need for extra neural circuitry. In addition, because our method is orthogonal to other methods, future research can combine training in power-law environments with other continual learning mechanisms.

  • 4 authors
·
Jan 18, 2024

Better Models, Faster Training: Sigmoid Attention for single-cell Foundation Models

Training stable biological foundation models requires rethinking attention mechanisms: we find that using sigmoid attention as a drop in replacement for softmax attention a) produces better learned representations: on six diverse single-cell datasets, sigmoid achieves 25% higher cell-type separation, better cell-type cohesion metrics, and lower validation loss, b) faster training, models with sigmoid attention train up to 10% faster than their softmax counterparts, and c) more stable training by eliminating inherent sources of instability in softmax attention. We establish that sigmoid attention has globally bounded derivatives (leq 0.25) as opposed to softmax, and a diagonal Jacobian structure in contrast with softmax's dense coupling, which together help alleviate training instabilities. In stress tests on 160M-parameter bidirectional attention models trained without gradient clipping on 8K-token sequences, softmax diverges catastrophically, with gradients exploding by four orders of magnitude, while sigmoid remains stable. Finally, we implement and open-source TritonSigmoid, an efficient GPU kernel that achieves 515 TFLOPS on H100 GPUs, outperforming both FlashAttention-2 and FlashSigmoid, with native padding support, which is essential for biological sequences. Our results establish sigmoid attention as both theoretically grounded and empirically superior for biological foundation models. Code is available at https://github.com/MSDLLCpapers/triton-sigmoid

Cortico-cerebellar networks as decoupling neural interfaces

The brain solves the credit assignment problem remarkably well. For credit to be assigned across neural networks they must, in principle, wait for specific neural computations to finish. How the brain deals with this inherent locking problem has remained unclear. Deep learning methods suffer from similar locking constraints both on the forward and feedback phase. Recently, decoupled neural interfaces (DNIs) were introduced as a solution to the forward and feedback locking problems in deep networks. Here we propose that a specialised brain region, the cerebellum, helps the cerebral cortex solve similar locking problems akin to DNIs. To demonstrate the potential of this framework we introduce a systems-level model in which a recurrent cortical network receives online temporal feedback predictions from a cerebellar module. We test this cortico-cerebellar recurrent neural network (ccRNN) model on a number of sensorimotor (line and digit drawing) and cognitive tasks (pattern recognition and caption generation) that have been shown to be cerebellar-dependent. In all tasks, we observe that ccRNNs facilitates learning while reducing ataxia-like behaviours, consistent with classical experimental observations. Moreover, our model also explains recent behavioural and neuronal observations while making several testable predictions across multiple levels. Overall, our work offers a novel perspective on the cerebellum as a brain-wide decoupling machine for efficient credit assignment and opens a new avenue between deep learning and neuroscience.

  • 4 authors
·
Oct 21, 2021

Learning heterogeneous delays in a layer of spiking neurons for fast motion detection

The precise timing of spikes emitted by neurons plays a crucial role in shaping the response of efferent biological neurons. This temporal dimension of neural activity holds significant importance in understanding information processing in neurobiology, especially for the performance of neuromorphic hardware, such as event-based cameras. Nonetheless, many artificial neural models disregard this critical temporal dimension of neural activity. In this study, we present a model designed to efficiently detect temporal spiking motifs using a layer of spiking neurons equipped with heterogeneous synaptic delays. Our model capitalizes on the diverse synaptic delays present on the dendritic tree, enabling specific arrangements of temporally precise synaptic inputs to synchronize upon reaching the basal dendritic tree. We formalize this process as a time-invariant logistic regression, which can be trained using labeled data. To demonstrate its practical efficacy, we apply the model to naturalistic videos transformed into event streams, simulating the output of the biological retina or event-based cameras. To evaluate the robustness of the model in detecting visual motion, we conduct experiments by selectively pruning weights and demonstrate that the model remains efficient even under significantly reduced workloads. In conclusion, by providing a comprehensive, event-driven computational building block, the incorporation of heterogeneous delays has the potential to greatly improve the performance of future spiking neural network algorithms, particularly in the context of neuromorphic chips.

  • 2 authors
·
Jul 26, 2023

Adaptive coding efficiency in recurrent cortical circuits via gain control

Sensory systems across all modalities and species exhibit adaptation to continuously changing input statistics. Individual neurons have been shown to modulate their response gains so as to maximize information transmission in different stimulus contexts. Experimental measurements have revealed additional, nuanced sensory adaptation effects including changes in response maxima and minima, tuning curve repulsion from the adapter stimulus, and stimulus-driven response decorrelation. Existing explanations of these phenomena rely on changes in inter-neuronal synaptic efficacy, which, while more flexible, are unlikely to operate as rapidly or reversibly as single neuron gain modulations. Using published V1 population adaptation data, we show that propagation of single neuron gain changes in a recurrent network is sufficient to capture the entire set of observed adaptation effects. We propose a novel adaptive efficient coding objective with which single neuron gains are modulated, maximizing the fidelity of the stimulus representation while minimizing overall activity in the network. From this objective, we analytically derive a set of gains that optimize the trade-off between preserving information about the stimulus and conserving metabolic resources. Our model generalizes well-established concepts of single neuron adaptive gain control to recurrent populations, and parsimoniously explains experimental adaptation data.

  • 4 authors
·
May 31, 2023