George Watson-Hyde | Science

My new paper: How combining simulation and experiment sheds light on DNA–protein interactions

2021-08-10T22:04:00+01:00

The first paper from my PhD has finally been published! Using an exciting combination of advanced simulations and microscopy, the paper reveals the multiple ways in which a protein found in most bacteria bends DNA and demonstrates that the protein can hold together two separate DNA helices. This has some important consequences for our understanding of DNA organisation in bacteria and the stability of infectious bacterial colonies, and the tightly coupled combination of experiment and simulation presents a promising foundation for future studies into other important biological systems. Unfortunately, a scientific paper is by its very nature a relatively dry, technical document, but the fruits of science belong to us all and I think it’s important that we share our research outside the academic bubble. With that in mind, please do sit tight while I walk you through our work in terms I hope a scientifically curious layperson can understand.

For those who want to skip straight to the paper, it’s published (open-access) in Nucleic Acids Research—the full citation is

Yoshua S B, Watson G D, Howard J A L, Velasco-Berrelleza V, Leake M C, Noy A 2021 “Integration host factor bends and bridges DNA in a multiplicity of binding modes with varying specificity” Nucleic Acids Research gkab641 doi:10.1093/nar/gkab641

My role in the project revolved around the simulations—I set them up, ran them, fixed them when they went wrong, and analysed the results. Dr Sam Yoshua did the same for the experimental portion of the project, and we’re considered joint first authors, however little sense that phrase may make. My portion of the work is discussed in more detail, with more in-depth explanation of the methods, in my PhD thesis, and Sam’s in part of his. Meanwhile, Dr Agnes Noy and Prof. Mark Leake designed much of the project, secured funding, and provided invaluable support and expertise throughout.

A final note before we begin: While Sam and I created the vast majority of the images in the paper, the copyright is now held by Oxford University Press, publishers of Nucleic Acids Research, who have very kindly granted permission to use them under a Creative Commons Attribution (CC-BY) licence. It is under those terms that I share my own work here.¹

If you feel like you already have a good understanding of DNA, proteins, molecular dynamics, and atomic force microscopy, feel free to skip right to the results. Otherwise, I’ll do my best to give you the background knowledge you need.

Right, onto the good stuff!

Background

I’m sure you already have at least a vague grasp of what DNA is. It is perhaps the single most important and interesting molecule in the universe, for it contains all of the information that makes (most known) living things what they are. Composed of two very long strands held together by paired “bases” containing the genetic code, and neatly packaged into the iconic double helix, the storage and maintenance of DNA is a perpetual problem for all life.

The problem is, one must store a lot of information to make a functioning organism—the human genome consists of around 6.4 billion base pairs with a total length of around two metres. And there is a separate copy of that genome in every one of your cells,² each of which has a diameter of less than 100 micrometres. These numbers are not easy to comprehend, but the point is that genome packaging is a very important and difficult problem. How do you fit that much DNA into a cell while still ensuring you can access the bits you need when you need them?

Eukaryotes—complex organisms like plants and animals—have converged on a complicated solution to this problem, involving the wrapping of DNA around special proteins called histones. Bacteria have their own solution, involving a range of “histone-like” proteins collectively called “nucleoid-associated proteins” (NAPs), which bind to DNA and wrap it around themselves.

Of course, bending DNA like this affects how it behaves. Some DNA is easy to access, allowing it to be read, processed, and used to make proteins, while other DNA is inaccessible to the cellular machinery. Understanding how DNA is packaged, and the factors affecting this, means being able to predict and control which genes will be expressed and when. To understand this better, we studied one of these proteins, called integration host factor (IHF), which is common across a wide range of bacteria and has a lot of features that make it a good example of a general NAP. IHF creates sharp U-turns in DNA, bending it back on itself by about 160°, but (spoiler alert) I’ll soon reveal that there’s more to the story than that.

Another interesting thing about IHF is its presence in some bacterial biofilms—sticky colonies of bacteria surrounded by a protective matrix that makes them particularly pernicious pathogens, causes a lot of problems when they get into your body, and enhances their resistance to antibiotics. In some of these biofilms, large amounts of DNA and proteins are pumped out of the cells, forming a three-dimensional scaffold of DNA strands that supports the structure. IHF is often found at the points where these strands cross, and removing IHF causes the biofilms to collapse. How does IHF perform this important role in stabilising biofilms?

To answer these questions, I worked closely with Sam to develop a set of simulations and experiments that work in parallel to complement one another. My part of this was a simulation technique called molecular dynamics, which uses supercomputers to simulate the movements of each individual atom making up a biological system, while Sam used a technique called atomic force microscopy (AFM), which uses a tiny needle to trace the surface of an object and construct a detailed heightmap. If you just want the biology, feel free to skip the next couple of sections, but I think it’s valuable to understand the way in which these results came about, so I’ll try to give you a brief overview of (my understanding of) these techniques next.

How molecular dynamics works

Molecular dynamics (MD) simulations are a favourite technique of mine. I provide a detailed technical overview in chapter 2 of my thesis, but I’ll give a briefer overview in terms normal humans can understand here.

MD is a way of modelling the movements of atoms and molecules by simulating their movement over a series of very short time steps. After providing an initial structure, the basic steps are:

Calculate the force between every pair of atoms, and how much they should be accelerating.
Move the atoms by the amount they should move in our small time step.
Make sure everything stays in the box.
Repeat lots and lots³ of times.

That makes it sound easy, but each of these steps hides a lot of complexity. Lots of things can affect the force between two atoms, but the most important are:

The distance between them—too close and they’ll repel each other strongly; too far apart and they’ll be gently attracted to each other
Their charge—like charges repel each other, while opposites attract
Covalent bonds—these are what hold molecules together, and come with complicated restrictions on lengths and angles

There are a huge range of force fields that model these interactions,⁴ and people keep tweaking them to make the results even more accurate. There are a lot of other forces that you might be surprised to see excluded, like gravity; in reality, these are simply far too weak at these scales to make any difference to the system, and it would be a waste of precious computation time to calculate them.

The other big problem is that, in the words of father of science Thales of Miletus,⁵ “all things possess a moist nature”. That is, water is everywhere. DNA and proteins are not, much to the dismay of my fellow physicists, spherical objects floating in a vacuum; they are surrounded by salty water, and this is very annoying because of just how bloody much of it there is. A cube just big enough to contain just 10 base pairs of DNA would be 85% water, and this percentage only increases as the DNA gets longer.⁶

All this water has a big effect on the behaviour of the system, but I’m not interested in the water itself and really don’t want to simulate it if I can help it. It’s possible to approximate the effect of the water without considering all the individual water molecules using an “implicit solvent” model, allowing the simulation of bigger systems, and that’s what I did for my large-scale simulations, but some of the advanced techniques I’ll talk about later don’t work if you do that. You probably don’t need to know any of this, but I’ve written it now so there you go.

Anyway, even without all the water, interesting systems have a lot of atoms and we need to consider the interactions between every pair.⁷ That’s a lot of lots. Simulations the size of mine have to be run on a supercomputer, normally using software designed to misuse graphics processing units (GPUs). Even then, it normally takes about a week for a simulation to finish.

The experiments: Atomic force microscopy

I’m afraid we’ve reached the bit of this post I don’t know very much about, but it would be remiss of me to ignore completely the experiments that made this work possible. This wasn’t simply a case of experiments confirming simulations, or vice versa, but of the two working in parallel to fill in each other’s gaps and reveal things that would be inaccessible to any one technique on its own, so it’s important that I try to do justice to the work Sam did.

The technology is called atomic force microscopy (AFM), but if you’re thinking of a microscope like the ones you probably used at school you’re thinking of the wrong thing. Rather than using lenses and mirrors to make things look bigger, AFM works by running a very sharp tip over a surface to measure its height at every point; the elderly among you may find it helpful to imagine a record needle running over the tiny bumps that correspond to your favourite song.

It makes no sense to me, and I really can’t get my head around it, but somehow this manages to be up to a thousand times more precise than what is physically possible with a light-based microscope, allowing incredibly high-resolution imaging. The result is a two dimensional image where light and dark spots represent hight above the surface. By sticking a piece of DNA to a flat surface, it’s possible to obtain a high-resolution image of the overall shape of the DNA, allowing it to be measured very accurately—but you can only see it from above, like a map before Street View, and even this most impressive of techniques can’t resolve individual atoms.

But remember how my simulations include all the atoms? By cross-referencing between the two techniques, it’s possible to demonstrate that my simulations accurately reflect reality—because the overall structure looks like the AFM images—and use the simulation data to fill in the information the experiments can’t see.

Clever, isn’t it?

What we actually did: Observing the binding modes

This paper describes a set of simulations and AFM experiments we performed to investigate DNA–IHF binding.

First, we chose a DNA sequence of around 300 base pairs—big enough to see using AFM but just about small enough to simulate in a vaguely acceptable amount of time—containing a binding site for IHF. Then, Sam mixed them together in a test tube or something while I built a sensible initial structure using an experimentally-determined structure of IHF bound to DNA as a starting point. A close-up of that starting structure looks like this:

The pink and sort-of-cyan bits are the protein, with a compact body and two arms that give the DNA (the black thing, with some special sequences highlighted in red and blue) a nice hug. The DNA here starts off perfectly straight—it wouldn’t be like that in real life, but it works fine as a starting point for a simulation. The body has lots of positive charges on its sides, which are what all those arrows point to, and the negative charge of DNA means it will naturally be attracted to those positive charges (remember, opposites attract)—that’s how IHF bends DNA so sharply.

And here’s how it looks at the end of a simulation:

This is exactly what we expected to see—the sharp U-turn I described earlier, with the DNA fully bound to both sides of the protein. This looks a lot like the structures that have already been determined through experimental methods like X-ray crystallography. In fact, we can quantify just how similar they are using something called the root-mean-square deviation (RMSD), effectively the average distance of each atom from where we’d expect it to be. A normal value for a converged simulation might be around 5 Å (the Å symbol represents ångströms—one ångström is about the width of an atom). You’re unlikely to get much less than that purely because atoms are always in motion, jiggling and jostling with thermal energy.

The two lines in this graph represent the two different ways of modelling the water. The important thing is that they both converge on 5 Å, just as we hoped, indicating that the simulations do accurately reproduce the known structure. That means it’s probably safe to keep using them to draw some new conclusions.

Meanwhile, the real pieces of DNA, by now well mixed with IHF, were stuck to a surface and imaged using AFM. Just by looking at the pictures with our eyes, we spotted three different types of structure in both methods:

The top row, the slightly fuzzier images, are some of Sam’s AFM images. The bottom row are corresponding frames from my simulations. We both see what looks like the same three states: the fully wrapped one we already knew about (on the right), and two new ones that are a bit less bent. But confirmation bias is a problem: We might just be seeing what we want to see. How can we put a number on it?

Measuring bend angles

The first thing we did was to measure the angle by which the DNA is bent. To do this, we both traced a straight line of about the same length each side of the protein and measured the angle between them. Of course, the AFM image is basically two-dimensional, and the DNA is stuck to a surface, so I projected my simulated structures into a plane—a bit like looking at their shadows—to approximate the same effect.

We measured these angles for every DNA molecule visible in the AFM images and every time step of my simulations (I ran multiple simulations to make sure I sampled as much as possible). The distribution of bend angles in the AFM data looks like this:

If you’re not familiar with histograms, the horizontal axis represents possible bend angles, and a taller bar means angles in that range are more common. There’s a big mass of barely-bent DNA on the left corresponding to DNA with no IHF bound to it—that’s going to happen in experimental data and can be mostly ignored. To the right of that, however, we see three distinct peaks. That implies we’re seeing something real!

When I do the same analysis on my simulation data, I also get three peaks at very similar bend angles, but there’s a better way to work with atomistic data: hierarchical clustering. Rather than reducing states to a single measurement, like a bend angle, I can directly compare how similar two simulation frames are by measuring the RMSD between them. By doing this for all the frames across all my simulations, and merging the most similar pairs until I was left with a small number of clearly distinct states, I was able to look in detail at the structures of the three different states. These are them:

The mean bend angles of these states are pretty close to the values measured from the AFM images! That makes it pretty likely we’re looking at the same thing. These simulations give us the structures of the three states, and tell us exactly how the DNA and protein interact in each one. As well as the original “fully wrapped” state, we have a “half-wrapped” state in which the DNA only binds on one side of the protein—and it’s always the same side (which we’ll call the left)—and another state in which the DNA on each side binds only to the top part of the protein; we called this the “associated” state.

Sequence specificity

There’s something interesting about these binding modes. Well, there are lots of interesting things about them, but there’s one very interesting thing: Why does the left arm sometimes bind without the right arm, but never the other way round?

On the face of it, IHF is a pretty symmetrical protein, but this is decidedly asymmetrical behaviour. Even more interestingly, it has a strong preference for a particular DNA sequence, and that sequence is located on the right arm, which means it doesn’t even interact with the protein in the new states.

For that to make sense, we’d expect to see the associated and half-wrapped states—but not the fully wrapped state—even in pieces of DNA without that sequence. This is where we bring the AFM data back in. For a different DNA sequence without an IHF binding site, the angle distribution looks like this:

The peaks corresponding to the associated and half-wrapped states are there, clear as day, but the fully wrapped state is missing—just as we expected. This lets us say with a pretty high degree of certainty that these states involve nonspecific binding with no sequence preference. That is, IHF can bind to DNA even if that sequence is missing, but the strong bend for which it is famous is possible only at certain special DNA sites.

Investigating the asymmetry using free-energy calculations

This confirms that our interesting asymmetry is real, but it doesn’t tell us much about it. Just looking at static structures or trajectories doesn’t give us much insight into the binding dynamics. To learn about that, we need to think about free energy.

Free energy is one of the most fundamental concepts in physics and chemistry. The two driving forces of the evolution of physical systems over time are the desire of a system to minimise its internal energy (\(U\)) and maximise its entropy (\(S\)); the balance of these is captured by the (Helmholtz) free energy, \(A\):

\[A = U - T S,\]

where \(T\) is the system’s temperature. A transition between two states can occur spontaneously only if it results in a smaller value of \(A\).

This is very exciting because it tells us which states are stable and which are not. A state with a large free energy compared to those around it won’t occur in nature—and if it does, it won’t last very long. Meanwhile, states with a lower free energy than those around them will probably stay stable for a long time.

What do I mean by a “state”? We can think about all sorts of things that have multiple states: flexible molecules that can take on multiple shapes, maybe, or pairs of molecules that might want to stick together (or might not). In the more human-sized world, a ball placed on the side of a steep hill is in an unstable state, and will quickly roll down to a much more stable position at the bottom of the hill; once it’s there, it’s never going to roll itself back up to the top. A ball perched precisely at the top of the hill might be happy to stay there forever, but just the tiniest nudge will send it tumbling all the way down; a state like this is called “metastable”.

It’s actually very helpful to think in terms of “landscapes” of hills and valleys, where altitude represents free energy. You’ll see some lovely examples of these landscapes shortly.

In principle, I could just run a ridiculously long simulation and keep track of how long the system spends in each state. Since it should spend much more time in states with lower free energy, it’s simple to convert between the two. The problem with that is that the simulation could get stuck in a free-energy valley—it’s very unlikely to climb the sides of the valley in any reasonable amount of time. The probability of the necessary states is just too low. So we’ll sample this one valley really well and never head off to explore the rest of the landscape.

To get around this problem, we can force our simulation to explore the whole landscape by adding an artificial force to literally drag it through all the states. We know how it should behave under the influence of our new force alone; by looking at how the behaviour deviates from this, we can get information about the underlying free-energy landscape. This is called the weighted histogram analysis method, or WHAM.

What we’re interested in is the binding of the DNA to the protein, so it makes sense for our new force to act between a pair of atoms—one towards the bottom of the protein, and one on the DNA. Pulling these closer together forces the DNA and protein to come into contact. Of course, our system has two sides, so we have to do this on both sides. This gives us two dimensions to our free-energy landscape.

Let’s look at the left arm first—that’s the one that binds to the protein in both the fully and half-wrapped states (and partially bound in the associated state too):

The red line is what happens when we let the right arm bind too; the blue line is what happens when we don’t. As you can see, they’re very similar, which means the left arm’s behaviour doesn’t depend on what the right arm is doing. The other thing you might notice is that this is quite a deep valley—a short distance (that is, a bound state) is very strongly preferred. If the protein and the left DNA arm start about 50 Å apart, they’re going to be strongly attracted to each other and quickly roll into the bound state at the bottom of the valley.

What about the right arm? That looks more like this:

Now, this is interesting. The right arm doesn’t have the same kind of deep free-energy valley as the left one. The right half of that graph is mostly flat, with the only lumps and bumps being seemingly random fluctuations on the order of thermal noise. To the left of the graph, we see the expected sharp rise as the protein and DNA get pushed too close together—two objects can’t occupy the same space and will resist very strongly any attempts to make them. But we see that we reach this point much sooner along the blue line, when the left arm is held away from the protein, than along the red line, when the left arm is free to bind.

Astute readers might have figured out what’s going on here. We’ve explained why we never see the right arm binding without the left. Until the left arm binds, there’s a physical obstacle in the way. If we overlay the structures of the fully wrapped and associated states, we can see what’s going on here:

Here, the fully wrapped state is in blue and the associated state in red. In the associated state, the protein body is noticeably tilted; the right arm would have to bend significantly to bind any farther down the protein. In the fully wrapped state, things have straightened up and both arms can bind without bending. DNA is quite a rigid polymer, so this amounts to a rule preventing the right arm from binding first.

The scale of the asymmetry might be easier to grasp if we view the landscape in 3D, which really emphasises how steep the potential for the left arm is compared to the flat right arm:

We could also look at where our three states sit on this landscape. It turns out—as we’d hope—that they’re associated with valleys and flat regions:

They’re also really nice to look at!

DNA bridging by IHF

While we were doing all this, something else interesting turned up in the AFM images. You may recall that IHF stabilises biofilms by holding pieces of DNA together. Well, we started getting pictures like these:

On the left, you can see a few small clusters of DNA with bright spots showing IHF holding them together. On the right is data for a different sequence with not one but three IHF binding sites: a huge blob of DNA and IHF!

This seems to line up with what happens in biofilms. IHF is clearly holding multiple pieces of DNA together. So I ran some simulations to investigate.

First, I set up this initial bridged structure by sort of encouraging another piece of DNA to get close to the protein:

Then, I used WHAM again to pull the second piece of DNA away. The resulting free-energy landscape looks like this:

That’s a nice deep well again: IHF loves to form bridges! about 50 Å is going to be attracted and eventually bind to the bottom of the protein in a structure a lot like the one above. Understanding this could help us to understand how to disrupt IHF’s role in biofilms and treat bacterial infections better.

A complete model of IHF binding, bending, & bridging

We now have enough information to produce a complete model of IHF’s interactions with DNA. Here it is:

So, IHF first binds to straight DNA, which we’ve labelled the “intercalated” state in the diagram, because part of the protein is intercalated (inserted between DNA base pairs); this isn’t really a real state, because our energy landscapes predict that it shouldn’t last very long. If there’s some other DNA nearby, it will really want to form a bridge. Otherwise, it will progress to either the associated or the half wrapped state; it’s not entirely clear how it picks, and there’s almost certainly a random element, but the half-wrapped state seems to be preferred. It doesn’t look like it should be possible to move between these two states, but they can both eventually progress further into the fully wrapped state.

If you’ve made it this far, hopefully you agree it’s really cool that we can figure all of this out, especially by combining simulations and experiments in this manner. I think this is a really promising approach that could be super valuable for other studies in the future, and I’d encourage you to try and break down any disciplinary barriers you can because that’s where all the most interesting stuff is hidden.

I’m always happy to talk more about science, so do get in touch if you’re interested. I’ll be Tweeting about this so you’re welcome to reply there, and there’s a comment section below.

Thank you for your time.

Mad, innit? ↩
That’s not quite true—some of your cells, such as red blood cells, don’t contain any DNA, but that doesn’t really fix anything. ↩
and lots and lots… ↩
Unfortunately, these are nowhere near as exciting and useful as the ones that block phaser fire, and look more like this. ↩
Thales here makes his second appearance in as many posts, purely coïncidentally. ↩
Actually, we don’t usually use cubes, and a whole section of my thesis is devoted to the interesting properties of the truncated octahedron, but that’s not really important. ↩
Sometimes it’s okay to set a cutoff distance beyond which the interactions are basically zero so there’s no need to calculate them, but often not, and we still need to consider a lot of atoms either way. ↩

Why I’m leaving academia

2021-07-09T19:07:00+01:00

I love science. I love the academic environment and I love the work I do. I even think I’m quite good at it. But I’m leaving – I’m taking my PhD and heading to pastures new in the big, scary outside world, and I’d like to share my reasons.

I have wanted to be a scientist for as long as I can remember, apart from a brief Ally McBeal-fuelled teenage flirtation with the law, and over the last few years I have had the immense privilege of making that a reality. My PhD has been a genuine blast—I get to do really cool, exciting new work and boldly see things no man has seen before. I have been tremendously fortunate to work with a universally wonderful, impressive, and supportive group of people.¹ If I could stay forever, I probably would. But I can’t, and there’s the rub.

You see—and this is where I expect to lose any sympathy you may harbour for me—I am a tremendously privileged person. I have had the excellent fortune to have found the person I believe to be the love of my life, who—very selfishly—has a career of her own. Through several strokes of luck and the generosity of others, we own a home, of which we are quite fond. One day, I should like to have a dog, and maybe even become a parent. Most academics do all of these things, but they are stronger people than I am.

The academic trajectory

I say this because of the very nature of life as an early-career academic, or postdoc, which would represent my next step were I to follow an academic path. I applied for exactly one postdoc position, in a particularly impressive group doing some really exciting work in a city of which I am very fond, and was fortunate enough to get an interview. And I found myself, quite unexpectedly, hoping that I would not be offered the job. I surprised myself; many people report their postdoc years as among the happiest of their career—spending all day, every day immersed in the research about which they are most passionate, unburdened by the other legs of the academic tripos, the much-maligned teaching and administration.² I expect this is true.

The only downside is that, in the words of George Harrison MBE, All Things Must Pass.³ All good things come to an end, and employment contracts are no exception. The typical postdoctoral position in the UK lasts for around two to three years. After that, it’s time to pack your bags and move on to the next one, probably at a different university on the other side of the country—and probably do this more than once. (In my experience it is far from uncommon to hold multiple postdoctoral positions before finding a permanent job; the statistics on this are surprisingly sparse but I’ll present the best I could find in a few paragraphs’ time.)

And that’s why I didn’t want the job—because I knew I’d take it, but the idea of everything that would entail, of commuting long distances, of regularly spending nights away from home, of potentially relocating and leaving behind everything we have built here without any guarantee that I’d even be able to stay for more than a few years, made me sad and afraid. Thankfully, I was not put in that position and the job was undoubtedly offered to someone far more qualified than myself. But I realised then that this path—the one I’d always thought I wanted to walk—was not for me.

This may be unreasonable of me. Packing up your life, dragging loved ones behind you along with all your worldly possessions, may be troublesome, but people move for their careers all the time. At least there’s a permanent job at the end of it, right? Apparently not—it turns out that only 10% of postdocs will ever find a permanent academic position.

But at least it’s great experience for a career in industry, right? The consensus seems to be that no, it is not. In the words of a Stack Exchange answer:

the rule of thumb is that as soon as you are 100% [certain] that you won’t stay in academia, every further month spent as a postdoc is inefficient in terms of career development. Yes, some companies may count your years as postdoc as some sort of relevant leadership experience, but most won’t, and even those that do will consider a similar candidate with the same number of years working in industry to be much more attractive. [Emphasis added]

But at least you’re having fun… right? Studies seem to indicate that, once again, the answer is no:

Survey data indicate that the majority of university staff find their job stressful. Levels of burnout appear higher among university staff than in general working populations and are comparable to “high-risk” groups such as healthcare workers. The proportions of both university staff and postgraduate students with a risk of having or developing a mental health problem, based on self-reported evidence, were generally higher than for other working populations. [Emphasis added]

One lecturer, Dr Alexandre Afonso (then of King’s College London, now of Leiden University), went so far as to compare the academic job market to drug gangs as they were described in Prof. Levitt and Mr Dubner’s seminal Freakonomics:⁴

what you have is an increasing number of PhD graduates arriving every year into the market hoping to secure a permanent position as a professor and enjoying freedom and – reasonably – high salaries, a bit like the rank-and-file drug dealer hoping to become a drug lord. To achieve that, they are ready to forgo the income and security that they could have in other areas of employment by accepting insecure working conditions in the hope of securing jobs that are not expanding at the same rate. [Emphasis added]

I wanted to find statistics regarding the median age of UK academics upon receiving their first permanent contract. This was the best I could do: According to table 21 of a report by the Higher Education Statistics Agency, the median age of the inflow into the population employed on teaching and research contracts (94% of which are permanent, compared to just 33% of research-only contracts, according to chart 12 of the same report’s introduction) falls between 36 and 40, although admittedly much of this may be inflow of experienced academics from outside the UK. Conservatively, this suggests that even the exceptional 10% who do find a permanent contract do not typically do so until their mid-thirties, following around a decade of precarious post-PhD employment, but I would be interested in seeing some better statistics on this.

Of course, much of this is probably field-dependent—it is quite possible that I would have it easier, in the sciences, than do my colleagues in the humanities, or vice versa. But it is clear that there is a problem. Within academia, we hear, of course, only from those who did make it—Dr Afonso’s drug lords. To them, the whole experience was worthwhile. We don’t hear from the other 90%, who tried and failed—and it is considered a failure to leave academia, despite being by far the modal outcome. To them, the whole thing probably doesn’t feel so worthwhile. I’m not dragging my life—and my fiancée’s—across the country to gamble on an outcome that exists only in the tails of the probability distribution. I have pledged to take the road more travelled by.

The problem

Losing me is no great loss to the academy, but I am not alone. If we keep shutting people out of science by making the profession inconvenient and unpleasant, rather than merely difficult, it is only a matter of time before at least one great, potentially world-changing genius takes a job in the private sector instead. I reckon they already have.⁵ Besides, every mind we lose takes with it a unique set of skills and a unique perspective, leaving us poorer, and we all miss out by making ourselves, our colleagues and our friends miserable for the first decade of our careers. Is it worth it?

While I would not dare to suggest that I know the panacæa for modern academic woes, I wish for this piece to come across not as a hollow rant but as a contribution to a constructive discussion about the lifestyles and mental health of academics everywhere. To that end, I wish to put forth my uninformed hypotheses about where the problem (such as it is) may lie.

As with most situations involving human interactions, the natural place to begin would seem to be in the language of œconomics (in which I am far from an expert). For example, the number of PhD graduates far exceeds the supply of postdoc positions, which in turn far exceeds the supply of permanent academic jobs. This is a classic imbalance of supply and demand, which is not a problem per se, and should be more obvious than it perhaps seems, but does not align with everyone’s expectations.⁶ This imbalance creates an effective oligopsony—a buyers’ market in which the universities have the power to impose conditions according to their own incentives.

What are their incentives? Employees are expensive and might not work out. No employer wants to be stuck with a permanent employee who turns out to be a bit rubbish. Short-term, grant-funded employment is a much safer bet: The work gets done without staking a penny on it, and these employees are interchangeable and easily replaced when the money runs out. This also suits the funding bodies, who similarly want to minimise their own exposure to risk of waste; and academics who already have permanent jobs, who are free to select the best candidate on a per-project basis, which is beneficial since their own careers depend on a steady stream of grants and publications.

This is but a hasty example—I am sure somebody else far more intelligent and skilful than myself could continue, refine, and expand upon this kind of reasoning, and I would be keen to see the results. The gist, however, is that nobody involved is doing anything wrong—everyone is simply behaving as a rational œconomic agent—but these factors exert a certain selection pressure on the academic community. Those willing and able to accept their ordained rôle as itinerant scientists survive; those unwilling or unable do not.

The question we must ask is: Is this really what we want to select for? Do we have reason to believe that these are the best scientists? Perhaps they are the most dedicated (for a certain definition of dedication), and this may well be correlated with excellence in other metrics, but if excellence is what we want, we’re looking at the long tails of the distribution of human traits, and the tails come apart. Selecting scientists based on dedication is like selecting basketball players based on height—you’ll probably pick better than a lottery, but you’d be better off actually watching them play basketball.

The inevitable followup is: Could science of a similar standard, or higher, be done differently? Could we make the profession more accessible to those with families, other commitments, and mental or physical health issues? I think the answer may be yes. And while it’s not my place to do so, I’d like to note that family commitments and increased risk aversion are more common in underrepresented groups; perhaps making the profession more stable and accessible will do more good for representation in the academy than any fair or outreach effort ever could.

Counterpoint

This decision has consumed my consciousness for most of the last year, at the very least, and I do not take it lightly. Some people will inevitably think I am making the wrong decision. Many who have made the same decision have come to regret it—after all, the grass is always greener. I have been wrong before (like when I thought I wanted an academic career) and I will be wrong again.

Perhaps I would enjoy an academic career more than anything else. Perhaps this system results in the best science. Perhaps this is one case in which the tails don’t come apart, and the most dedicated scientists, those most willing to give their lives to the pursuit of knowledge, really are the best. Perhaps it is naïf or egocentric for me to assume that I am entitled to an academic job. Perhaps I am being too picky and the expectation that I willingly relocate my home and family for a temporary job is a perfectly reasonable one. Perhaps I am simply lashing out, blaming others for my own failure, and in truth I am simply not good enough. Perhaps all of these things are true.

There are certainly advantages to moving around. By exposing oneself to different working environments, one is exposed to new ways of thinking, new methods and approaches to problems, and discussions with a broader range of people. There is undoubtedly a great deal of value in that, and it is very important to see the world outside one’s bubble. This is possibly the most convincing argument in favour of the current system of itinerant research, and I don’t have a good response other than that I think the same ends can be achieved in other ways—perhaps the “loaning” of permanent employees to other research groups, or simply much stronger collaborative networks. All things considered, I think years of instability only weaken the positive exchange of ideas and shut out much of the diversity of minds from which we could otherwise benefit.

It may also be beneficial to be able to select the best candidate for a given project—if the tails come apart, the best postdoc for one grant may not also be the best for the next. I touched on this when discussing incentives in the previous section, and agree that this is rational. However, I am not sure the magnitude of this benefit outweighs the costs of the current system, considering that successive projects in the same research group are unlikely to suddenly require a completely different set of skills that can’t be obtained through collaboration. As well as the personal cost to researchers, there is also a cost associated with recruiting, onboarding, and integrating a new employee, whereäs a permanent employee is more likely to be able to hit the ground running on a new project.

There may always be a need for consulting-type arrangements and short-term contracts, just as there is in industry, but I dispute that there is likely to be any significant benefit from such arrangements as the norm. Notice that few other skilled jobs rely on short-term labour to the extent that academia does, despite arguably stronger incentives towards profitability.

I also acknowledge that there are probably more people alive today in a position to practise science than at any point in history. It is no longer necessary to possess great wealth, or the patronage of someone with wealth of their own, to conduct research. Not all that long ago, I might have been expected to go and join some royal court if I wished to do my simulations (although I may have needed to invent the computer first). Perhaps I have unreasonable expectations about a system that has made so much progress already.

But none of this makes my life any easier. Maybe my priorities are all muddled up, but they remain my priorities. I want to have a positive impact on the world, but I also desire stability, financial security, and a peaceful and rewarding home and family life. I don’t think I can get that in academia, but I think I can elsewhere. Your priorities may differ, but I have little expectation that I am alone in my desire for stability, given that the bulk of my twenties is now behind me. Accepting instability as a fresh-faced young graduate with nothing to tie them down is one thing—but can we really expect people to work this way well into their thirties?⁷

I may well come to regret leaving academia, but on the balance of probabilities I think I’d regret staying more.

Final thoughts

I wish to reïterate before closing that I have no regrets about the path that has led me here. My PhD has been the privilege of my life, and, while I could perhaps have made better use of the opportunities it presented to me, I would not undo one iota of it.

I truly, deeply admire every single person who manages to make academia work for them. They have a mental fortitude and flexibility that I can only dream of emulating. If you are among them, thank you for the work to which you have given your life—you are enriching the global commons and I salute you. You are part of a chain of scientists and philosophers reaching all the way back to Thales of Miletus⁸ that I hope remains forever unbroken.

But there is a systemic problem. The academy is losing some incredible minds (of which I can assure you I am not one), and making worse the lives of those it retains—the UK’s academics have spent much of the last few years on strike over a set of disputes about pay, pensions, and working conditions. Whatever your thoughts on those disputes, one thing is inarguable: They are unhappy and the system is failing. No, not everybody needs to be a scientist, and not everybody should, but we should make sure that the best people can, whatever their background and personal circumstances.

I have said nothing in this essay that I have not seen or heard myriad times from others at all stages of their careers, and know I am not alone. The academy is a global enterprise and its problems transcend national borders every bit as much as do its successes. The mental health of our colleagues and friends is suffering, and so is our science. Academic research offers a unique opportunity to observe the true beauty of creation in all its magnificent detail, but we seem to have forgotten this. Science will always have a place in my heart, but our relationship is not a healthy one and it’s time I moved on.

For the world is hollow and I have touched the sky.

I would wholeheartedly recommend the University of York, and especially the Noy and Leake groups, to anybody looking for a position in the physics of life, and am happy to talk to anybody interested in joining them. Please don’t let this essay put you off following your dreams. ↩
Some more jaded than myself might prefer to call these heads of the academic hydra. ↩
It may be unfair to credit this quote to Harrison, as he adapted it from a poem entitled “All Things Pass” by Dr Timothy Leary, who in turn translated it from an original by Chinese philosopher Lao Tzu, author of the Tao Te Ching, but it’s a good song so I’m doing it. ↩
This is an Amazon Associates link. If you make a purchase, I will earn a small commission. This doesn’t affect my decision to discuss or recommend products, but helps to very slightly offset my hosting costs, given that I display no adverts on this website. I can assure you that this is not a profitable enterprise. ↩
Of course, the narrative of the lone genius revolutionising their field is a well worn and overly simplistic one—science is a story of gradual, incremental refinement of knowledge by a global collective of minds—but I think the point holds. ↩
How many times have the average PhD student’s grandparents been shocked to discover that one does not simply walk into a permanent academic job? Perhaps too many sitcoms have given them the wrong impression. ↩
I actually think we expect too much mobility and poorly-calibrated “dedication” of graduates too, but I’m aware that a great many graduates seem willing to move for whatever graduate scheme will take them and my differing priorities may render me an outlier here. ↩
Thales of Miletus (Greek: Θάλῆς ὁ Μιλήσιος), born c. 626–623 BCE, was an ancient Greek philospher known as the “Father of Science” for his theory that everything in existence is made of the same primary substance: water. He was, of course, incorrect, but this did not preclude him from being a major influence on later thinkers including Plato and Aristotle. ↩

Object-oriented processing of PDB files in Python

2019-06-04T20:14:00+01:00

I recently encountered the surprisingly difficult task of processing a PDB file in Python. While reading fixed-width files is relatively trivial in certain old-fashioned languages with support for complex format statements, like my beloved Fortran, Python makes it less simple. Given the myriad record types that frequently appear in such files, doing this processing on an ad-hoc basis seems like a terrible idea. Thankfully, the fields in these records lend themselves nicely to implementation as objects, so I went ahead and did just that.

I’m by no means the first person to write a package for this purpose—searching PyPI for “pdb” turns up a lot of results, even after references to the similarly named debugger are discarded. These packages are uniformly impressive and provide powerful sets of functions for fetching, modifying, and converting PDB files. I would recommend many of them to friends and family. But that’s not what I was looking for.

I wanted a simple, lightweight package that converts PDB records into Python objects and back again, something I can drop into a script when I want to perform some arbitrary calculation of my own.

For example, I wanted to find the heavy atom closest to a known point. I’m sure many of the packages on PyPI can do that, and cpptraj almost certainly can. But the first thing that came to mind was

min(atoms,
    key=lambda a: a.distance(point))

Doesn’t that look beautiful? I don’t want to install a whole package or dig through the entire AMBER manual to do something I already know how to do. Reading the data in the first place was the only hard part.

Here’s what the function to read in an ATOM record looks like:

def read_atom(line):
    """
    Reads an ATOM or HETATM from a PDB file into an Atom object
    """
    return Atom(record_type=line[:6].strip(),
                num=maybe_int(line[6:11]),
                name=line[12:16].strip(),
                alt_location=line[16:17].strip(),
                residue=Residue(name=line[17:21].strip(),
                                chain=line[21:22].strip(),
                                resid=maybe_int(line[22:26]),
                                insertion=line[26:27].strip()),
                coords=Coords(x=maybe_float(line[30:38]),
                              y=maybe_float(line[38:46]),
                              z=maybe_float(line[46:54])),
                occupancy=maybe_float(line[54:60]),
                temp_factor=maybe_float(line[60:66]),
                segment=line[72:76].strip(),
                symbol=line[76:78].strip(),
                charge=line[78:80].strip())

While I’m sure there are neater ways, that’s roughly as good as it gets. And if I want to manipulate the structure and write out the result, I can’t really discard very many of those fields. It’s clearly better to outsource the ugliness to a friendly package than to do that every time I want to read a PDB.

The result is an Atom object, which looks a bit like this:

{'record_type': 'ATOM',
 'num': 1,
 'name': "O5'",
 'alt_location': '',
 'residue': {'name': 'DG',
             'chain': 'A',
             'resid': -117,
             'insertion': ''},
 'coords': {'x': 186.697,
            'y': 135.541,
            'z': 228.518},
 'occupancy': 1.0,
 'temp_factor': 757.65,
 'segment': '',
 'symbol': 'O',
 'charge': ''}

That’s much easier to work with. You may have noticed that residues and coördinates are objects too, so these elements can be easily shared across record types and a load of powerful and expressive methods can be exposed.

The package currently (v0.1.2) supports the following record types:

ATOM
HETATM
TER
HELIX
SHEET

A number of useful operators are intuitively defined. Since PDBs are inherently ordered, > and < do what you’d expect. Equality is defined using the unique (in a well-formed PDB) identifiers, so two objects representing the same atom/terminator/structural element are equal even if their other properties are not (or undefined). The __contains__ method is defined for residues, so that if atom in residue is a valid construct. Coördinates can be added and subtracted together, and multiplied or divided by scalars.

Printing any of the objects (or otherwise casting it to a string) results in a correctly formatted PDB record like this one:

ATOM      1 O5'   DG A-117     186.697 135.541 228.518  1.00757.65           O

So print(*records, sep='\n') gives you a PDB, as long as records is a list of objects like the one the handy read_pdb() function gives you.

This isn’t a fully-fledged PDB-manipulating package. It’s not trying to be. But it has made my life a bit easier, and maybe it will help you too.

The source code is available (under an MIT license, so you can do whatever you want with it) on GitHub, and the package is in PyPI, so you can just

$ pip3 install pdb-objects

then stick

import pdb_objects

at the top of your script and go get a coffee. You deserve it.

My poster at the IoP “Physics of Microorganisms II” conference, London, 8 April 2019

2019-04-05T13:35:13+01:00

I will be attending the Institute of Physics “Physics of Microorganisms II” conference in London on Monday, with my poster entitled “Atomistic simulations unveil the influence of DNA topology on IHF–DNA interaction”.

Abstract:

IHF is a nucleoid-associated DNA-binding protein that bends DNA by up to 160 degrees and is known to be vital to the stability of bacterial biofilms. Through atomistic molecular dynamics simulations of IHF bound to supercoiled DNA minicircles and linear DNA constructs, the DNA–IHF interaction is studied in unprecedented detail. Novel observations are made of key features of this interaction, including the existence of two clearly distinct binding modes and the formation of bridges between distal DNA sites by IHF, forming closed topological domains. Unlike canonical IHF binding, this bridging appears to be nonspecific, and may be the mechanism by which IHF stabilises crossing points in the extracellular DNA matrix in biofilms. Furthermore, IHF binding both modulates and is modulated by DNA topology, leading to a complex interplay that regulates the bacterial genome and allows regulatory information to be communicated over long distances. Further simulations involving IHF, related proteins, and DNA sequences with multiple binding sites, will soon converge with parallel single-molecule experiments to shed more light on the mechanisms underlying this biologically relevant process and the interactions between nucleoid-associated proteins and DNA.

Watson G D, Leake M C, Noy A

If you’re attending too, find me at board P23.

If you’re not, you don’t need to miss out: A PDF version of my poster is available online.