Reinventing Programming Wheels

Sam Ireland

When I was learning the Python programming language, it took me a very long time to really get that reading and making notes, and occasionally trying the odd example on your own computer, is not enough to learn a programming language - or most things, really. You need to be doing something with the stuff you are learning if you are going to learn anything - you need to have your own projects running in parallel rather than just consuming information.

This is partly because working on projects is just more fun than soaking up abstract knowledge, so you end up finding it easier to devote time to it. Mostly though, it’s because the act of thinking about how you would do something, and actually coming up with your own ideas, is what cements ideas and concepts in your mind. People just remember things better that way.

I sometimes look back, slightly frustrated, at how long it took me to really start doing that. I started learning Python in 2010, and made faltering, on-and-off progress until late 2014, when projects became the main way I learned Python. I probably made more progress in the following months than I had in the previous four years.

One of the main reasons I didn’t really attempt writing my own bits of software for so long is a problem that I think a lot of beginners have - pretty much everything has already been done. Or at least it can seem that way, especially when you limit yourself to ideas that you can feasibly do at that early stage of learning a programming language.

You might learn how to open, manipulate, and save files, and think ‘wouldn’t it be cool to write a script that can encrypt a file’ using this knowledge - and then you google it and find that there are hundreds of projects that do just that. Or you might see a piece of software that does your idea with much more sophisticated tools than you would have used, by people with a much better knowledge of the language than you have, and swiftly conclude that there would be no point starting. I know all of the above certainly happened to me.

So you decide to just wait until that magical idea comes along that is somehow both easy for a beginner to accomplish, and has never been done before. But it probably never will, because there’s basically no such thing, so this approach is useless.

Thankfully I managed to get out of this mindset, and have swung far towards the other end of the spectrum in the past few years - I no longer much care whether an idea I have has been done before. ‘Don’t reinvent the wheel’ can be solid programming advice in many cases, but in my hobby projects, it’s advice I heed less and less.

A casual glance at some of these projects reveals this starkly. atomium is a Python library for dealing with molecular structures - in much the same way that the well-established library BioPython does. inferi and points are mathematical libraries for dealing with statistics and linear algebra respectively, which the very widely used (and excellent) library NumPy does perfectly well and has done for a long time. Many of the web projects I am working on would be competitors to existing websites.

But I consider the time I have spent on these to be very well spent, for a few reasons.

Firstly, in some cases I just straightforwardly think my implementation has advantages over the established solution. This is certainly the case with atomium - BioPython has more features (for now) but I think atomium has a better, cleaner API, and is more rigorously tested. And it has the exact features that I need for my PhD work because, well, of course it does - I made it.

More importantly though, as someone who is still very much learning about Python and programming in general, I have learned vastly, vastly more by creating my own implementations of these things, in two ways:

  1. You learn more about the programming language itself. Books are helpful for teaching you how to write Python, but to learn how the language and community and everything else fit together - how software is distributed, why certain design choices are preferred over others, etc. - you need to write your own libraries, document them, write tests for them, publish them, and develop them over time.
  2. You also learn more about the subject matter itself. A lot of the work I do is in Structural Biology, and writing a PDB file parser was absolutely the best way to learn about PDB files. Writing a linear algebra software library is by far the best way to make sure you really understand linear algebra on a fundamental level. There’s a saying that you don’t fully understand something until you can explain it to someone else in your own words - I would add that you don’t fully understand a technical subject until you can code up your own implementation of it.
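
As a rough illustration of that second point, here is a minimal sketch of the kind of wheel a beginner might reinvent: a dot product and a matrix multiplication written in plain Python. It is not taken from points or any other library mentioned here; the function names `dot` and `matmul` and the lists-of-rows representation are assumptions made purely for the example.

```python
def dot(u, v):
    """Dot product of two equal-length vectors given as sequences of numbers."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(u, v))


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    # Every row of a needs as many entries as b has rows.
    if any(len(row) != len(b) for row in a):
        raise ValueError("columns of a must match rows of b")
    # zip(*b) walks b column by column.
    columns = list(zip(*b))
    # Entry (i, j) of the product is row i of a dotted with column j of b.
    return [[dot(row, col) for col in columns] for row in a]


m = [[1, 2], [3, 4]]
identity = [[1, 0], [0, 1]]

print(matmul(m, identity))  # multiplying by the identity gives m back
print(matmul(m, m))
```

Even something this small forces you to decide how a matrix is stored, what should happen when the shapes don’t match, and why each entry of the product is a row dotted with a column - the sort of detail that is easy to skim past when reading.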

Even if nothing else came of them, I think the time spent on these projects could easily be justified as time spent learning.

One day this might come to an end, and I might conclude that the learning benefits I get from this attitude to programming have diminishing returns, and maybe I’ll decide to spend more time on things that have never been done before. But for now, I think I’m getting a lot out of it as a policy.

So my advice to anyone learning a programming language as a beginner is: if you have an idea for a project or piece of software - don’t google it, or worry how it would compare with a similar project made by someone who knows more. Just make it. It feels fantastic.