Link to paper

The full paper is available here.

You can also find the paper on PapersWithCode here.

Abstract

Recent research has increasingly focused on developing efficient neural network architectures
This work explores logic gate networks for machine learning tasks
The networks comprise logic gates such as “AND” and “XOR”, which allow for very fast execution
Conventional logic gate networks are non-differentiable, so they cannot be trained with gradient descent
Differentiable logic gate networks are proposed to allow for effective training
The discretized logic gate networks achieve fast inference speeds, e.g., beyond a million MNIST images per second on a single CPU core

Paper Content

Introduction

Considerable research has gone into making neural network computation faster and more efficient, particularly at inference time.
Techniques proposed for this purpose include binary and sparse neural networks.
This paper proposes an approach for gradient-based training of logic gate networks.
Logic gate networks are based on binary logic gates, such as “and” and “xor”.
Networks are continuously relaxed to differentiable logic gate networks to allow for gradient descent.
Probability distributions are learned for each neuron to decide which logic gate to use.
At inference time, the resulting logic gate networks are reported to be faster than fully connected ReLU neural networks by two orders of magnitude.
Training can be scaled up to 5 million parameters.

Related work

Differentiable logics and triangular norms are well-known in fuzzy logics and probabilistic metric spaces
Chatterjee explored a method for memorizing binary classification data sets with a network of binary lookup tables
Brudermueller et al. proposed a method to translate a neural network into random forests and then into networks of AND-Inverter logic gates
Continuous relaxations are used to make discrete structures differentiable
Zimmer et al. proposed differentiable logic machines for inductive logic programming
Chen proposed Gumbel-Max Equation Learner Networks to learn symbolic expressions from data
Mocanu et al. proposed training neural networks with sparse evolutionary training
Gaier et al. proposed learning networks of operators such as ReLU, sin, inverse, absolute, step, and tanh using evolutionary strategies
Zantedeschi et al. proposed to learn decision trees by quadratically relaxing the decision trees
Binary neural networks are conceptually different from logic gate networks
Sparse neural networks are neural networks where only a selected subset of connections is present

Logic gate networks

Logic gate networks are similar to neural networks, but each neuron applies a binary logic gate to its inputs instead of computing a weighted sum.
Logic gate networks are sparse because each neuron has only 2 inputs.
Logic gate networks do not need activation functions as they are intrinsically non-linear.
For classification, the binary outputs belonging to each class are summed, which grades predictions with as many levels as there are output neurons per class.
Logic gate networks are efficient to execute because every neuron is a single Boolean operation on two bits.
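
As a rough illustration of the points above, the following sketch evaluates a small (hard, non-differentiable) logic gate network with NumPy. The function names `apply_gate`, `random_layer`, and `forward`, as well as the way gates are numbered (the 4-bit gate index is read as a truth table), are assumptions made for this example and are not taken from the paper or its released code.

```python
import numpy as np

def apply_gate(gate_id, a, b):
    """Evaluate two-input Boolean gates on boolean arrays a and b.

    Assumed numbering (hypothetical): bit k of gate_id is the gate's output
    for the input pair (a, b) with k = 2*a + b, so 8 is AND, 14 is OR, 6 is XOR.
    """
    k = 2 * a.astype(np.int64) + b.astype(np.int64)  # truth-table row per element
    return ((gate_id >> k) & 1).astype(bool)

def random_layer(rng, n_in, n_out):
    """Each neuron gets two random input indices and one random gate (0..15)."""
    idx_a = rng.integers(0, n_in, size=n_out)
    idx_b = rng.integers(0, n_in, size=n_out)
    gates = rng.integers(0, 16, size=n_out)
    return idx_a, idx_b, gates

def forward(layers, x):
    """x: (batch, n_in) boolean array; returns the boolean output of the last layer."""
    for idx_a, idx_b, gates in layers:
        # No weights and no activation function: one gate per neuron, two inputs each.
        x = apply_gate(gates, x[:, idx_a], x[:, idx_b])
    return x
```

A network built this way stores only two indices and one gate id per neuron, which is where the sparsity and the low execution cost come from.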

Differentiable logic gate networks

Training binary logic gate networks is difficult because they are not differentiable.
The paper proposes relaxing logic gate networks to differentiable logic gate networks to allow for gradient-based training.
The relaxation replaces hard binary activations in {0, 1} with probabilistic activations in [0, 1].
Each logic gate is replaced by the expected value of its output given the probabilities of independent inputs, e.g., a · b for “and”.
The choice of logic gate for each neuron is represented by a categorical probability distribution over the 16 possible gates, parameterized via softmax.
Output neurons are aggregated by summation to produce graded outputs.
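
The relaxation described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `TT`, `relaxed_gates`, `softmax`, and `soft_neuron` are hypothetical, and the gate numbering (bit k of the gate index is the output for inputs (a, b) with k = 2a + b) is chosen for convenience. The expected value of each gate is computed directly from its truth table, which reproduces the familiar closed forms such as a · b for “and” and a + b - 2ab for “xor”.

```python
import numpy as np

# TT[g, k] is the output of gate g for the input pair (a, b) with k = 2*a + b.
# This numbering of the 16 gates is an assumption of the sketch.
TT = np.array([[(g >> k) & 1 for k in range(4)] for g in range(16)], dtype=float)

def relaxed_gates(a, b):
    """Expected output of all 16 gates for independent input probabilities a, b."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Probabilities of the input combinations (0,0), (0,1), (1,0), (1,1).
    p = np.stack([(1 - a) * (1 - b), (1 - a) * b, a * (1 - b), a * b], axis=-1)
    return p @ TT.T  # shape (..., 16)

def softmax(w):
    e = np.exp(w - np.max(w, axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def soft_neuron(a, b, w):
    """Relaxed neuron: mixture of the 16 relaxed gates weighted by softmax(w)."""
    return (relaxed_gates(a, b) * softmax(w)).sum(axis=-1)
```

Because every operation here is differentiable in `a`, `b`, and `w`, the same computation written in an autodiff framework can be trained with gradient descent; for inputs that are exactly 0 or 1 and a one-hot gate distribution it reduces to the hard gate.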

Training considerations

Randomly initialize the connections and the parameters of each neuron
Draw the elements of each neuron's parameter vector w (the logits of its gate distribution) from a standard normal distribution
Use the same number of neurons in each layer (except the input)
Use 4 to 8 layers
Train with the Adam optimizer at a constant learning rate of 0.01
After training, discretize each probability distribution by taking its mode, i.e., its most likely logic gate
Most neurons converge to one logic gate operation
Classification: group the outputs into k groups, one per class, and count the number of 1s in each group
Regression: use an affine transformation to map the range of predictions to the target range
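
The initialization, discretization, and classification steps listed above might look as follows in NumPy. This is only a sketch: `init_logits`, `discretize`, and `group_sum_predict` are hypothetical helper names, and assigning contiguous blocks of output neurons to each class is an assumption of the example. The regression case is not covered.

```python
import numpy as np

def init_logits(rng, n_neurons):
    """One 16-dimensional parameter vector w per neuron, drawn from N(0, 1)."""
    return rng.standard_normal((n_neurons, 16))

def discretize(w):
    """Mode of each neuron's gate distribution; w has shape (n_neurons, 16).

    Softmax is monotone, so the argmax of the logits equals the argmax of
    the probabilities.
    """
    return np.argmax(w, axis=-1)

def group_sum_predict(out_bits, k):
    """Classify from binary outputs by counting the 1s in each of k groups.

    out_bits: (batch, n_out) boolean array with n_out divisible by k.
    Assumption: outputs are split into k contiguous groups, one per class.
    """
    batch, n_out = out_bits.shape
    scores = out_bits.reshape(batch, k, n_out // k).sum(axis=-1)
    return scores.argmax(axis=-1), scores
```

With n_out / k output neurons per class, each class score takes one of n_out / k + 1 values, which is what lets a network of binary outputs produce graded predictions.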

Remarks

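Computational detail: use larger data types instead of Boolean, e.g., 64-bit integers, so that a single bitwise instruction processes many data points in parallel
Computational speedup on CPUs and GPUs
Memory considerations: only 4 bits need to be stored per neuron, which is enough to select one of the 16 logic gates
Pruning the model gives a further speedup, but requires storing the connections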
Use Adam optimizer for training
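
To make the remark about larger data types concrete, the sketch below packs a batch of 64 boolean samples into 64-bit words so that one bitwise operation evaluates a gate for all 64 samples at once. The helper names `pack_bits` and `unpack_bits` and the bit layout (bit i of a word holds sample i) are assumptions of this example, not details from the paper.

```python
import numpy as np

SHIFTS = np.arange(64, dtype=np.uint64)

def pack_bits(x):
    """Pack a (64, n) boolean batch into n uint64 words; bit i of a word is sample i."""
    return np.bitwise_or.reduce(x.astype(np.uint64) << SHIFTS[:, None], axis=0)

def unpack_bits(words):
    """Inverse of pack_bits: n uint64 words -> (64, n) boolean batch."""
    return ((words[None, :] >> SHIFTS[:, None]) & np.uint64(1)).astype(bool)

rng = np.random.default_rng(0)
x = rng.integers(0, 2, size=(64, 3)).astype(bool)  # 64 samples, 3 features
w = pack_bits(x)

# Each line below evaluates one gate for all 64 samples with a single bitwise op.
nand_out = ~(w[0:1] & w[1:2])
xor_out = w[1:2] ^ w[2:3]
```

Unpacking `nand_out` and `xor_out` with `unpack_bits` gives the same results as applying the gates sample by sample, while doing 64 times fewer gate evaluations.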

Current limitations and opportunities

Differentiable logic gate networks have higher training cost than conventional neural networks
Practical computational cost can be reduced
Asymptotically, differentiable logic gate networks are cheaper to train
Edge computing and embedded machine learning require small architectures
Training cost is not a concern for edge computing and embedded machine learning
Experiments performed on MONK, Adult Census, Breast Cancer, MNIST and CIFAR-10 data sets
Method outperforms logistic regression and larger neural network on MONK-3

Adult and breast cancer

Method performs similarly to neural networks and logistic regression on Adult Census data set
Method achieves best performance and fastest inference speed on Breast Cancer data set

MNIST

Our logic network achieves better performance than the fastest method (FINN) while using less than 10% of the number of binary operations.
Our model is 12x faster than FINN on an NVIDIA A6000 GPU, even though it only achieves 7% utilization.
Our model requires substantially fewer operations than each of the baselines.

CIFAR-10

Benchmarked method on MNIST and CIFAR-10
Reduced color-channel resolution of CIFAR-10 images
Employed binary embedding
No data augmentation or dropout
Outperforms neural networks by a small margin
Requires less than 0.1% of memory footprint
Does not achieve the same performance as the best fully-connected neural network baselines
Sparse neural networks more competitive with respect to speed

Distribution of logic gates

The constant 0/1 operators are rarely used and not present in the last layer
The first layer has more ‘and’, ‘nand’, ‘or’, and ‘nor’
The second and third layers have more ‘A’, ‘B’, ‘¬A’, and ‘¬B’
The last layer has more ‘xor’ and ‘xnor’
Implications (e.g., A ⇒ B) are rarely used

Conclusion

Presented novel approach to train logic gate networks
Allows for efficient neural networks with high accuracy
Source code released to foster future research
Output bits can be aggregated into binary number to reduce memory bandwidth
Training times and standard deviations provided
Trade-off between depth and accuracy similar to regular neural networks
Sparsity arises from binary logic gate operators and number of pairs of neurons covered
Randomly selected sparse connections work well
T-norms and T-conorms used for real-valued logics
Axioms for multi-valued logics defined
Results on MONK, Adult, Breast Cancer, MNIST, and CIFAR-10 data sets
Inference times per data point for 1 CPU thread
GPU and CPU used for inference times
