Why Fundamentals Matter More Than Frameworks

Frameworks change. Principles persist. PyTorch replaced Theano. TensorFlow 2 adopted eager execution. JAX emerged. New abstractions will arrive. The learner who anchors to a framework will spend a career migrating. The learner who anchors to fundamentals migrates only syntax; the underlying ideas carry over. This post examines why fundamentals matter more than frameworks, and what that means in practice.

What Fundamentals Actually Mean

Fundamentals are the invariants. They do not depend on a particular library or release. They are the substrate on which everything else rests.

Linear algebra. Vectors, matrices, dot products, matrix multiplication. Eigenvalues and eigenvectors. The geometry of transformations. Neural networks are linear algebra with nonlinearities. Convolutions are matrix operations with structure. Attention is a weighted sum of value vectors. Without linear algebra, you are manipulating tensors by shape alone. You have no intuition for what the operations mean or why they work.
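
To make that concrete, here is a minimal NumPy sketch of scaled dot-product attention, with toy shapes chosen purely for illustration: the output is literally a softmax-weighted sum of the value vectors.

```python
import numpy as np

# Toy scaled dot-product attention: the weights are a softmax of query-key
# similarity, and the output is a weighted sum of the value vectors.
def attention(Q, K, V):
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                    # query-key similarity
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over keys
    return weights @ V                               # weighted sum of values

Q = np.random.randn(2, 4)  # 2 queries, dimension 4
K = np.random.randn(3, 4)  # 3 keys
V = np.random.randn(3, 4)  # 3 value vectors
print(attention(Q, K, V).shape)  # (2, 4): one combined value vector per query
```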

Probability. Distributions, expectations, Bayes' rule. Likelihood and maximum likelihood. Uncertainty quantification. Machine learning is probabilistic modeling. Loss functions have probabilistic interpretations. Regularization has a Bayesian view. Sampling, variational inference, and generative models assume probability fluency. Without it, you use tools without understanding what they optimize for or what the outputs represent.
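
One hedged illustration of that view, with made-up numbers: for a Bernoulli model, the binary cross-entropy loss is exactly the negative log-likelihood of the observed labels.

```python
import numpy as np

# For a Bernoulli model, the binary cross-entropy loss is the negative
# log-likelihood of the observed labels under the predicted probabilities.
y = np.array([1, 0, 1, 1])            # observed labels (made-up)
p = np.array([0.9, 0.2, 0.7, 0.6])    # predicted probabilities (made-up)

nll = -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
print(nll)  # minimizing this loss is maximizing the likelihood of the data
```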

Optimization. Gradients, gradient descent, the chain rule. Convex vs nonconvex. Learning rates, momentum, adaptive methods. Every trained model is the result of an optimization process. When training fails—divergence, plateau, instability—the cause is usually in the optimization. Without optimization fundamentals, you cannot reason about failure. You can only try different hyperparameters and hope.
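
A minimal sketch of that process, on a toy one-parameter least-squares problem with an arbitrary learning rate: the gradient comes straight from the chain rule, and the update is plain gradient descent.

```python
import numpy as np

# Gradient descent on a one-parameter least-squares fit, written out by hand.
x = np.array([1.0, 2.0, 3.0])
y = 2.0 * x                             # data generated with true slope 2
w, lr = 0.0, 0.1                        # initial weight and learning rate

for step in range(50):
    pred = w * x
    grad = np.mean(2 * (pred - y) * x)  # chain rule: d(loss)/d(pred) * d(pred)/dw
    w -= lr * grad                      # plain gradient descent update
print(round(w, 3))                      # converges to roughly 2.0
```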

Model assumptions. What does a linear model assume? What does a neural network assume? Inductive bias, generalization, overfitting. Every model encodes assumptions about the data. When the model fails, the assumptions are often wrong. Understanding assumptions lets you choose models and interpret results. Ignoring them turns modeling into trial and error.
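
A small illustration with synthetic data: fit a least-squares line to quadratic data, and the large residual reflects a wrong assumption, not a failed optimizer.

```python
import numpy as np

# A linear model assumes the target is (roughly) a linear function of the input.
# Fit a least-squares line to quadratic data and that assumption visibly fails.
x = np.linspace(-3, 3, 50)
y = x ** 2                            # data generated by a nonlinear rule
w, b = np.polyfit(x, y, deg=1)        # best-fit line
residual = np.mean((w * x + b - y) ** 2)
print(round(residual, 2))  # large error: the assumption is wrong, not the optimizer
```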

Framework Abstraction Layers

Frameworks provide layers of abstraction. Each layer hides complexity. Each layer also hides understanding.

Autograd. Automatic differentiation computes gradients for you. You write a forward pass; the framework traces it and builds the backward pass. This is powerful. It is also opaque. When gradients vanish or explode, when a parameter does not update, when the backward pass is wrong—do you know why? Autograd assumes you understand what a gradient is and what the chain rule does. Without that, you are debugging a black box.
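
A minimal PyTorch check, with arbitrary numbers: autograd's gradient for y = (w * x)^2 matches the chain-rule derivative written by hand.

```python
import torch

# Autograd builds the backward pass, but it computes the same chain rule you
# could write by hand. Check a simple case: y = (w * x) ** 2.
x = torch.tensor(3.0)
w = torch.tensor(2.0, requires_grad=True)
y = (w * x) ** 2
y.backward()

manual = 2 * (w.item() * x.item()) * x.item()   # dy/dw = 2 * (w*x) * x
print(w.grad.item(), manual)                    # both 36.0
```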

Pretrained models. Load a model. Fine-tune it. Deploy it. The barrier to a working system is low. The barrier to understanding is high. What architecture is it? What did pretraining optimize for? What does fine-tuning change? When the model misbehaves on your data, these questions matter. Pretrained models are leverage. Leverage without understanding is borrowed—and the debt comes due when something breaks.
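
A hedged sketch of what fine-tuning typically changes, assuming torchvision 0.13 or later and its ImageNet ResNet-18 weights (the weight name and the 10-class head are illustrative choices): freeze the pretrained backbone and train only a new classification head.

```python
import torch
import torchvision

# Load a pretrained backbone, freeze it, and replace the classification head.
model = torchvision.models.resnet18(weights="IMAGENET1K_V1")
for p in model.parameters():
    p.requires_grad = False                           # keep pretrained features fixed
model.fc = torch.nn.Linear(model.fc.in_features, 10)  # new head for a 10-class task

# Only the new head will be updated during fine-tuning.
print([n for n, p in model.named_parameters() if p.requires_grad])
# ['fc.weight', 'fc.bias']
```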

High-level APIs. One line to train. One line to evaluate. Abstractions that hide the training loop, the loss computation, the optimizer step. Convenient for standard cases. Useless for nonstandard ones. When you need to modify the training loop, add a custom loss, or debug a gradient issue, the high-level API retreats. You need the low-level primitives. You need the fundamentals that those primitives implement.
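
For contrast, here is the loop those one-liners wrap, sketched with a toy linear model and random data: the forward pass, the loss, the backward pass, and the optimizer step are all explicit.

```python
import torch

# The loop that one-line train/fit calls hide: forward pass, loss, backward
# pass, optimizer step. Toy model and random data for illustration.
model = torch.nn.Linear(4, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = torch.nn.MSELoss()
X, y = torch.randn(64, 4), torch.randn(64, 1)

for epoch in range(10):
    opt.zero_grad()                 # clear gradients from the previous step
    loss = loss_fn(model(X), y)     # forward pass and loss
    loss.backward()                 # autograd computes gradients
    opt.step()                      # optimizer updates parameters
```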

The Danger of Black-Box Learning

Learning at the abstraction layer alone has predictable failure modes.

Debugging blindness. When the model underperforms, the framework learner has limited tools. Check the data. Change the learning rate. Try a different architecture. The learner grounded in fundamentals can inspect gradients, verify the forward pass, reason about numerical stability, trace the loss landscape. Debugging is hypothesis testing. Hypotheses require understanding. Without it, you are guessing.
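
One example of such a tool, using a throwaway model and random inputs: print per-parameter gradient norms after backward() and treat unusually small or large values as hypotheses to test.

```python
import torch

# Inspect per-parameter gradient norms after backward(); unusually small or
# large values are hypotheses about vanishing or exploding gradients.
model = torch.nn.Sequential(
    torch.nn.Linear(8, 8), torch.nn.Tanh(), torch.nn.Linear(8, 1)
)
loss = model(torch.randn(32, 8)).pow(2).mean()
loss.backward()

for name, p in model.named_parameters():
    print(name, p.grad.norm().item())
```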

Inability to reason about failure. Why did the model diverge? Why does it overfit on small data? Why does it fail on out-of-distribution inputs? These questions have answers. The answers live in the math: optimization dynamics, generalization bounds, distribution shift. Framework fluency does not provide them. It provides APIs. When the API does not solve the problem, you need to go deeper. Deeper requires fundamentals.

Shallow pattern recognition. You learn that "X usually works" or "Y is the standard approach." You do not learn why. Shallow pattern recognition transfers poorly. A new domain, a new constraint, a new failure mode—and the patterns no longer apply. Fundamentals transfer. Linear algebra is the same in vision and language. Optimization is the same across architectures. The learner who has internalized principles can adapt. The learner who has memorized patterns cannot.

Historical Example: CNNs Before PyTorch

Consider convolutional neural networks. The architecture—convolutional layers, pooling, fully connected heads—was established by LeNet, refined by AlexNet, extended by VGG and ResNet. The ideas were stable. The frameworks were not.

In the mid-2010s, practitioners used Caffe, Theano, or early TensorFlow. Caffe defined networks in protobuf. Theano used symbolic computation. TensorFlow built static graphs. Each framework had different syntax, different conventions, different pain points. Migrating a model from one to another required rewriting code—but the model itself, the math of convolutions and backpropagation through them, did not change. A practitioner who understood convolutions—what they compute, why they work for spatial data, how backprop flows through them—could port knowledge across frameworks. A practitioner who only knew Caffe’s API could not. When PyTorch arrived with dynamic computation graphs and a more intuitive interface, practitioners grounded in fundamentals adopted it quickly. Practitioners locked into a single framework’s API had to start over.

The pattern repeats. New frameworks will replace current ones. The math of convolutions, attention, and optimization will remain. Fundamentals are the portable layer. Frameworks are the implementation detail.

A Balanced View

The argument is not to avoid frameworks. It is to use them from a position of understanding.

Use frameworks. Implementations matter. PyTorch, TensorFlow, JAX—they are well-tested, optimized, and widely used. Writing everything from scratch is inefficient. Production systems require framework-level tooling. The goal is not purity. It is competence.

But understand the math. Before relying on an abstraction, understand what it abstracts. Know what a gradient is before you use autograd. Know what a convolution computes before you use Conv2d. Know what cross-entropy is before you use it as a loss. The understanding does not have to be exhaustive. It has to be sufficient to reason about behavior and failure. Frameworks amplify understanding. They do not create it.
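
One concrete way to build that sufficiency, sketched with random logits: recompute PyTorch's cross-entropy by hand as the negative log-softmax of the true class and confirm the two agree.

```python
import torch
import torch.nn.functional as F

# Check the abstraction against the math: cross-entropy from logits is the
# negative log-softmax probability of the true class, averaged over the batch.
logits = torch.randn(5, 3)                 # 5 examples, 3 classes
targets = torch.tensor([0, 2, 1, 1, 0])

manual = -F.log_softmax(logits, dim=1)[torch.arange(5), targets].mean()
builtin = F.cross_entropy(logits, targets)
print(torch.allclose(manual, builtin))     # True
```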

The balanced approach: learn fundamentals first, then adopt frameworks. Use frameworks for productivity. Use fundamentals for debugging, adaptation, and depth. When a new framework arrives, you migrate the syntax. The understanding carries over.

Closing

Tools amplify understanding. They do not replace it. A practitioner with strong fundamentals and framework fluency can build, debug, and adapt. A practitioner with only framework fluency can build until something goes wrong—and then stalls. The investment in fundamentals pays compound interest. Each new model, each new framework, each new problem becomes easier because the substrate is solid. Frameworks change. Principles persist. The choice of what to prioritize is not academic. It determines how far you can go.