Programming as a Learning Tool in STEM

I think many students in math, physics and chemistry are probably familiar with the following scenario (I certainly am!) You're revising for an exam. You read a page of notes, and think something like this to yourself: "I know this stuff, it's easy, I don't need to look at it!" Then, when you get the exam questions in front of you, that thing you chose not to revise is asked about - and you have no idea how to solve the problem!

What happened? Just yesterday, you were sure that you can do it with your eyes closed while simultaneously riding a unicycle and juggling 5 beer cans. Now, you suddenly have to struggle to even get started.

While there are surely many reasons for these sorts of lapses in memory, I think one of them is that math can be deceptive. What I mean is that, if you see a problem solved once, and you remember the relevant equation, you may be fooled in to thinking that you really understand what is going on - when in fact your understanding is completely superficial.

Consider the (stationary) Schrödinger equation:

\begin{align} \hat{H}|\psi \rangle = E|\psi \rangle \end{align}

It's a simple equation. All you need to do is find the eigenvectors of the Hamiltonian H! That's what you'd do in an introductory linear algebra course, so quantum mechanics must surely be trivial, right? Innocent though it seems, countless books have been written just on the subject of extracting predictions from this very equation. Buried beneath the beautiful formalism is a great amount of complexity.

This may not always be evident to students, though. In an introductory course, you'd perhaps solve the Schrödinger equation for a couple of trivial systems (particle in a box, etc) which can be done on pen and paper, and then you move on. A real feeling for how the equation works is not necessarily developed until much later, if ever.

According to a pretty convincing amount of research (see here, for instance), active learning methods produce superior outcomes compared to, say, just a guy lecturing for 2 hours straight. "Active learning" means that the student does something other than just listening; traditional homework problems are in some sense a form of active learning. But homework problems in physics traditionally only include exercises doable on pen and paper, unless it's specifically a computational course.

This is where I think there is room to use programming as a learning tool. In order to implement a physics equation on a computer, you have to understand the solution step-by-step; otherwise, you won't be able to program it. (I've explained how to do this for Schrödinger equation in this post.) I have always found translating things in to a machine-understandable form to be an excellent learning strategy, as it forces you to be explicit.

For some time, I thought I was alone in these thoughts, as that's not how physics was taught to me. I then discovered the book Structure and Interpretation of Classical Mechanics (SICM), a sibling of the famous computer science book Structure and Interpretation of Computer Programs.

Here's what the authors of SICM say in their introduction:

Computation also enters into the presentation of the mathematical ideas underlying mechanics. We require that our mathematical notations be explicit and precise enough that they can be interpreted automatically, as by a computer. As a consequence of this requirement the formulas and equations that appear in the text stand on their own. They have clear meaning, independent of the informal context. [...] That the mathematics is precise enough to be interpreted automatically allows active exploration to be extended to it. The requirement that the computer be able to interpret any expression provides strict and immediate feedback as to whether the expression is correctly formulated. Experience demonstrates that interaction with the computer in this way uncovers and corrects many deficiencies in understanding.

In this spirit, I will now apply this learning technique to a typical quantum mechanics problem, minimization of a total energy functional. The functional we want to minimize is:

\begin{align} E[\rho] = \int _0^L \frac{\pi ^2}{6}\rho ^3(x) + \frac{\lambda}{4} \rho (x) - \mu \rho (x) \mathrm{d}x \end{align}

This functional roughly describes the interaction of 1-dimensional electron gas with electron density \( \rho (x) \). What we vary is precisely the electron density. Here, \( \lambda \) is the interaction strength and \( \mu \) is the Lagrange multiplier that ensures the number of electrons is correct. We'll suppose there are 10 electrons, though you could pick pretty much any number, and let's set \( \lambda = 1 \). As boundary conditions, let's suppose the density is zero at the edges of our simulation area.

Now, this is a calculus of variations problem, meaning we're looking for a function \( \rho (x) \) that minimizes our functional. But in this case, there is really no way to find the solution exactly (or at least you can very easily make this a functional where pen-and-paper doesn't work). How does one approach a problem when just plugging things in to the Euler-Lagrange equation doesn't work?

Well, we say we want to find the density \( \rho (x) \) which minimizes the energy functional. Since the density is a continuous function, it contains an infinite number of points, which the computer can't handle as an optimization problem. Further, we are constrained by the fact that the electron density must, of course, be positive everywhere.

How do you convert a continuous function to a finite number of parameters to minimize over? Easy: suppose your function is a combination of some already-known functions, like so:

\begin{align} \psi (x) = \sum _{i=1}^N c_i \phi_i (x),\\ \rho (x) = \psi(x) ^2 \end{align}

Two birds with one stone: since we set \( \rho (x) = \psi (x)^2 \), the density will always be positive. And since we already know the functions \( \phi _i(x) \), we only need to find the appropriate \( c_i \)s. In order to also comply with our boundary condition, we choose \( \phi _i (x) \) so that it's zero at the edges of our grid. Satisfying these constraints nicely is the set of functions that are the solutions of the particle in a box problem, which are in our case:

\begin{align} \phi _i (x) = \sqrt{\frac{2}{L}}\sin (\frac{ix\pi}{L} ) \end{align}

and which are taught to all undergraduates in physics.

So, let's implement this. I choose here to use the programming language Julia with the packages "Plots.jl" and "Optim.jl". First, let's implement the basis functions (we also set the number of basis functions to 60):

using Optim
using Plots
L = 8.0
x = range(0.01,stop=7.99,length=1000)
function basisfunc(i)
    p = sqrt(2.0/L)*sin.(i*x*pi/L)
    return p

Well, that was simple. And it is just as easy to implement \( \psi \) given a set of coefficients \( c_i \):

function psi(c)
    p = zeros(1000)
    for i = 1:nfuns
        p = p + sqrt(2.0/L)*c[i]*sin.(i*x*pi/L)
    return p

Now, we have to do some numerical integration to implement our energy functional. We will set \( \mu =10\) as a test case. Recall that the integral may be defined with Riemann sums of the type

\begin{align} \int _a^b f(x)\mathrm{d}x \approx \sum _{i=1}^N f(x_i) \Delta x, \ \ \Delta x = \frac{|b-a|}{N} \end{align}

the logic being that we divide our interval [a,b] to a finite set of points (amount N), and then take the limit as \( N\rightarrow \infty \). Therefore, we can do numerical integration easily by just taking a finite sum! Having grasped this, the implementation for the energy functonal is once again trivial:

function energy_tf(c)
    p = psi(c)
    en = sum(0.25*p.^4.0+pi^2.0/6.0 * p.^6.0-10.0*p.^2.0)*(x[2]-x[1])
    return en

(remember that \( \psi (x)\) was the square root of \( \rho(x) \)). Now, we have defined our energy functional, and the minimization is just a matter of finding the zeroes of the derivatives, a task that is, in principle, not much more complicated than finding the maximum of some function using a derivative.

We can leverage algorithms in the package "Optim.jl" to do this minimization for us. To do that, it is good to also provide these derivatives explicity. Hence, we take the derivative with respect to the coefficients \( c_i \). The implementation is easy:

function gradient_tf!(G,c)
    p = psi(c)
    for i=1:nfuns
        bs = basisfunc(i)
        G[i] = sum(bs .* p.^3.0 + pi^2.0 .* bs .* p.^5.0 - 20.0*bs .* p)*(x[2]-x[1])

Finally, we call the Optim.jl routine and plot the resulting density:

initial = 0.1*ones(nfuns)

res = optimize(energy_tf,gradient_tf!,initial,LBFGS(),Optim.Options(iterations=50))
finalrho = psi(Optim.minimizer(res)).^2.0


Here is a picture of the resulting density:

Result of the density minimization

Such an exercise, when explained by an instructor, serves to highlight several points, including:

Now, one could also give as an exercise to find the correct \( \mu \). We used \( \mu=10 \), but that results in a number of electrons that isn't exactly 10, as can be seen with:


In fact, it turns out that increasing \( \mu \) will increase the number of electrons, and lowering it will decrease the number of electrons - there's a lesson lurking there as well, relating to what this sort of Lagrange multiplier means physically ;)

Of course, to use this sort of thinking as a learning tool, it is not necessary to actually go as far as implementing these ideas on the computer - but that is a gratifying way to make sure you're right, and it also allows you to explore interesting problems that are beyond standard analytical tools taught to students.

In conclusion, I think there is room for more material like the Structure and Interpretation of Classical Mechanics. Perhaps this time in a language that is more likely to be used by STEM students, like FORTRAN 77 Julia or Python.