A failed experiment in compressing weights inside a neural network
Neural networks seem to need a lot of weights to express what we want, so I tried implementing the most basic operation, the matrix multiplication in a linear layer, in a compressed form: instead of just multiplying by the raw weight matrix, we compress the weight matrix first and multiply by the compressed version. For the compression I used a wavelet decomposition with thresholding. The approach did succeed in keeping the weight matrix 75% zeros during training.
I used something like:
import torch
import ptwt  # PyTorch Wavelet Toolbox

# Decompose the flattened weight, zero out small coefficients, reconstruct.
coeffs = ptwt.wavedec(w_flat, "haar", mode="zero")
coeffs_thresh = [
    torch.where(torch.abs(c) < threshold, torch.zeros_like(c), c)
    for c in coeffs
]
w_compressed = ptwt.waverec(coeffs_thresh, "haar")
The network itself was the following:
import torch.nn as nn
import torch.nn.functional as F

class NetFC(nn.Module):
    def __init__(self):
        super().__init__()
        # Each layer compresses its weight matrix (75% zeros) before multiplying.
        self.fc1 = CompressedLinear(784, 128, zero_fraction=0.75)
        self.fc2 = CompressedLinear(128, 64, zero_fraction=0.75)
        self.fc3 = CompressedLinear(64, 10, zero_fraction=0.75)

    def forward(self, x):
        x = x.view(-1, 784)
        x = self.fc1(x)
        x = F.relu(x)
        x = self.fc2(x)
        x = F.relu(x)
        x = self.fc3(x)
        return x
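The CompressedLinear layer itself isn't shown here. As a rough sketch, the thresholding snippet above can be wrapped into a layer roughly like the one below; note that turning zero_fraction into a quantile-based threshold is just one simple way to hit a target sparsity, not necessarily what my actual implementation did:

import torch
import torch.nn as nn
import torch.nn.functional as F
import ptwt


class CompressedLinear(nn.Module):
    """Linear layer whose weight matrix is wavelet-compressed on every forward pass.

    Sketch only: zero_fraction is mapped to a quantile threshold over the
    wavelet coefficients, one simple way to reach a target sparsity.
    """

    def __init__(self, in_features, out_features, zero_fraction=0.75):
        super().__init__()
        self.zero_fraction = zero_fraction
        self.weight = nn.Parameter(torch.empty(out_features, in_features))
        self.bias = nn.Parameter(torch.zeros(out_features))
        nn.init.kaiming_uniform_(self.weight, a=5 ** 0.5)

    def forward(self, x):
        # 1D Haar decomposition of the flattened weight matrix.
        w_flat = self.weight.reshape(1, -1)
        coeffs = ptwt.wavedec(w_flat, "haar", mode="zero")

        # Pick the threshold so that roughly `zero_fraction` of the coefficients
        # fall below it (assumption; the snippet above used a fixed threshold).
        all_coeffs = torch.cat([c.reshape(-1) for c in coeffs])
        threshold = torch.quantile(all_coeffs.abs(), self.zero_fraction)
        coeffs_thresh = [
            torch.where(c.abs() < threshold, torch.zeros_like(c), c)
            for c in coeffs
        ]

        # Reconstruct, trim possible padding, and multiply with the compressed weight.
        w_compressed = ptwt.waverec(coeffs_thresh, "haar")
        w_compressed = w_compressed.reshape(-1)[: self.weight.numel()].reshape_as(self.weight)
        return F.linear(x, w_compressed, self.bias)

In this sketch the dense weight is still the parameter being trained; only the value used in the multiplication is the compressed reconstruction.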
This network has around 100k parameters, of which about 75% end up being zero.
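As a quick sanity check on those numbers:

# Parameter count for the three layers above (weights + biases).
sizes = [(784, 128), (128, 64), (64, 10)]
weights = sum(i * o for i, o in sizes)   # 109,184 weight entries
biases = sum(o for _, o in sizes)        # 202 bias entries
print(weights + biases)                  # 109,386 parameters in total
print(int(0.75 * weights))               # 81,888 weights zeroed at 75%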
But the performance drop was too high: I could only get my MNIST loss down to 0.55, while an uncompressed network with only about a quarter as many parameters (~25k) easily reached 0.15.
So it was an interesting approach, but not successful.