<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="4.4.1">Jekyll</generator><link href="https://georgewatson.me/feed/science.xml" rel="self" type="application/atom+xml" /><link href="https://georgewatson.me/" rel="alternate" type="text/html" /><updated>2026-03-04T09:31:55+00:00</updated><id>https://georgewatson.me/feed/science.xml</id><title type="html">George Watson-Hyde | Science</title><entry><title type="html">My new paper: How combining simulation and experiment sheds light on DNA–protein interactions</title><link href="https://georgewatson.me/blog/science/2021/08/10/my-new-paper-how-combining-simulations-and-experiments-sheds-light-on-dna-protein-interactions/" rel="alternate" type="text/html" title="My new paper: How combining simulation and experiment sheds light on DNA–protein interactions" /><published>2021-08-10T21:04:00+00:00</published><updated>2021-08-10T21:04:00+00:00</updated><id>https://georgewatson.me/blog/science/2021/08/10/my-new-paper-how-combining-simulations-and-experiments-sheds-light-on-dna-protein-interactions</id><content type="html" xml:base="https://georgewatson.me/blog/science/2021/08/10/my-new-paper-how-combining-simulations-and-experiments-sheds-light-on-dna-protein-interactions/"><![CDATA[<p>The first paper from my <abbr title="la: Philosophiae Doctor (en: Doctor of Philosophy)">PhD</abbr>
has finally been published!
Using an exciting combination of
advanced simulations
and microscopy,
the paper reveals
the multiple ways
in which a protein
found in most bacteria
bends <abbr title="deoxyribonucleic acid">DNA</abbr>
and demonstrates
that the protein can
hold together two separate <abbr title="deoxyribonucleic acid">DNA</abbr> helices.
This has some important consequences
for our understanding of
<abbr title="deoxyribonucleic acid">DNA</abbr> organisation in bacteria
and the stability of
infectious bacterial colonies,
and the tightly coupled combination
of experiment and simulation
presents a promising foundation
for future studies
into other important biological systems.
Unfortunately,
a scientific paper is
by its very nature
a relatively dry, technical document,
but the fruits of science belong to us all
and I think it’s important
that we share our research
outside the academic bubble.
With that in mind,
please do
sit tight while I
walk you through our work
in terms I hope
a scientifically curious layperson
can understand.</p>

<!--more-->

<hr />

<p>For those who want to skip straight to the paper,
it’s published
(open-access)
in <em>Nucleic Acids Research</em>—the full citation
is</p>

<blockquote>
  <p>Yoshua S B, Watson G D, Howard J A L, Velasco-Berrelleza V, Leake M C, Noy A
2021
“Integration host factor bends and bridges <abbr title="deoxyribonucleic acid">DNA</abbr> in a multiplicity of binding
modes with varying specificity”
<em>Nucleic Acids Research</em>
gkab641
<a href="https://doi.org/10.1093/nar/gkab641">doi:10.1093/nar/gkab641</a></p>
</blockquote>

<p>My role in the project
revolved around the simulations—I
set them up,
ran them,
fixed them when they went wrong,
and analysed the results.
Dr Sam Yoshua
did the same for the experimental portion of the project,
and we’re considered joint first authors,
however little sense that phrase may make.
My portion of the work
is discussed in more detail,
with more in-depth explanation of the methods,
in <a href="https://etheses.whiterose.ac.uk/28874/">my <abbr title="la: Philosophiae Doctor (en: Doctor of Philosophy)">PhD</abbr> thesis</a>,
and Sam’s in <a href="https://etheses.whiterose.ac.uk/27489/">part of his</a>.
Meanwhile,
<a href="https://agnesnoylab.wordpress.com/">Dr Agnes Noy</a>
and <a href="http://single-molecule-biophysics.org/">Prof. Mark Leake</a>
designed much of the project,
secured funding,
and provided invaluable support and expertise throughout.</p>

<p>A final note
before we begin:
While Sam and I created
the vast majority of
the images in the paper,
the copyright is now held by
Oxford University Press,
publishers of <em>Nucleic Acids Research</em>,
who have very kindly granted
permission to use them
under a
<a href="https://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution
(CC-BY)</a>
licence.
It is under those terms
that I share my own work here.<sup id="fnref:copyright"><a href="#fn:copyright" class="footnote" rel="footnote" role="doc-noteref">1</a></sup></p>

<p>If you feel like you already have a good understanding of
<abbr title="deoxyribonucleic acid">DNA</abbr>,
proteins,
molecular dynamics,
and atomic force microscopy,
feel free to
<a href="#what-we-actually-did-observing-the-binding-modes">skip right to the results</a>.
Otherwise,
I’ll do my best to give you the background knowledge you need.</p>

<p>Right, onto the good stuff!</p>

<h2 id="background">Background</h2>

<p>I’m sure you already have
at least a vague grasp of what <abbr title="deoxyribonucleic acid">DNA</abbr> is.
It is perhaps the single most important and interesting molecule
in the universe,
for it contains all of the information that
makes (most known) living things what they are.
The molecule is composed of two
very long
strands
held together by paired “bases”
containing the genetic code,
neatly packaged into
the iconic
<a href="https://en.wikipedia.org/wiki/Nucleic_acid_double_helix">double helix</a>,
and storing and maintaining it
is a perpetual problem for all life.</p>

<p>The problem is,
one must store a <em>lot</em> of information
to make a functioning organism—the human genome
consists of around 6.4 billion base pairs
with a total length of around two metres.
And there is a separate copy of that genome
in every one of your cells,<sup id="fnref:cells"><a href="#fn:cells" class="footnote" rel="footnote" role="doc-noteref">2</a></sup>
each of which has a diameter of
less than 100 micrometres.
These numbers are
not easy to comprehend,
but the point is
that genome packaging
is a very important
and difficult
problem.
How do you fit that much <abbr title="deoxyribonucleic acid">DNA</abbr> into a cell
while still ensuring you can access
the bits you need
when you need them?</p>
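
<p>As a rough sanity check on that length,
assuming the textbook figure of about 0.34 nanometres
of helix per base pair
(an assumption added for illustration, not a number from the paper):</p>

\[6.4 \times 10^{9}\ \text{base pairs} \times 0.34\ \text{nm per base pair} \approx 2.2\ \text{m},\]

<p>which is more than twenty thousand times
the 100-micrometre upper limit on cell diameter quoted above.</p>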

<p>Eukaryotes—complex organisms like plants and animals—have
converged on a complicated solution to this problem,
involving the wrapping of <abbr title="deoxyribonucleic acid">DNA</abbr>
around special proteins called
<a href="https://en.wikipedia.org/wiki/Histone">histones</a>.
Bacteria
have their own solution,
involving a range of “histone-like” proteins
collectively called
“<a href="https://en.wikipedia.org/wiki/Nucleoid#Nucleoid-associated_proteins_(NAPs)">nucleoid-associated proteins</a>”
(NAPs),
which bind to <abbr title="deoxyribonucleic acid">DNA</abbr>
and wrap it around themselves.</p>

<p>Of course,
bending <abbr title="deoxyribonucleic acid">DNA</abbr> like this
affects how it behaves.
Some <abbr title="deoxyribonucleic acid">DNA</abbr> is easy to access,
allowing it to be read,
processed,
and used to make proteins,
while other <abbr title="deoxyribonucleic acid">DNA</abbr> is inaccessible
to the cellular machinery.
Understanding how <abbr title="deoxyribonucleic acid">DNA</abbr> is packaged,
and the factors affecting this,
means being able to
predict and control
which genes will be expressed
and when.
To understand this better,
we studied
one of these proteins,
called integration host factor (<abbr title="integration host factor">IHF</abbr>),
which
is common across a wide range of bacteria and
has a lot of features
that make it a good example of
a general <abbr title="nucleoid-associated protein">NAP</abbr>.
<abbr title="integration host factor">IHF</abbr> creates sharp U-turns in <abbr title="deoxyribonucleic acid">DNA</abbr>,
bending it back on itself
by about 160°,
but
(spoiler alert)
I’ll soon reveal that there’s more to the story than that.</p>

<p>Another interesting thing about <abbr title="integration host factor">IHF</abbr>
is its presence in
some bacterial <a href="https://en.wikipedia.org/wiki/Biofilm">biofilms</a>—sticky
colonies of bacteria
surrounded by a protective matrix
that makes them particularly pernicious pathogens,
causes a lot of problems when they get into your body,
and enhances their resistance to antibiotics.
In some of these biofilms,
large amounts of <abbr title="deoxyribonucleic acid">DNA</abbr> and proteins
are pumped out of the cells,
forming a three-dimensional scaffold of <abbr title="deoxyribonucleic acid">DNA</abbr> strands
that supports the structure.
<abbr title="integration host factor">IHF</abbr> is often found at
the points where these strands cross,
and <a href="https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0067629">removing <abbr title="integration host factor">IHF</abbr> causes the biofilms to
collapse</a>.
How does <abbr title="integration host factor">IHF</abbr> perform this important role in stabilising biofilms?</p>

<p>To answer these questions,
I worked closely with Sam
to develop a set of simulations and experiments
that work in parallel to complement one another.
My part of this
was a simulation technique called molecular dynamics,
which uses supercomputers to
simulate the movements of each individual atom
making up a biological system,
while Sam used a technique called
atomic force microscopy (<abbr title="atomic force microscopy">AFM</abbr>),
which uses a tiny needle
to trace the surface of an object
and construct a detailed heightmap.
If you just want the biology,
feel free to skip the next couple of sections,
but I think it’s valuable to
understand the way in which these results came about,
so I’ll try to give you a brief overview of
(my understanding of)
these techniques next.</p>

<h2 id="how-molecular-dynamics-works">How molecular dynamics works</h2>

<p><a href="https://en.wikipedia.org/wiki/Molecular_dynamics">Molecular dynamics</a>
(<abbr title="molecular dynamics">MD</abbr>)
simulations
are a favourite technique of mine.
I provide a detailed technical overview
in chapter 2 of
<a href="https://etheses.whiterose.ac.uk/28874/">my thesis</a>,
but I’ll give a briefer summary
in terms normal humans can understand
here.</p>

<p><abbr title="molecular dynamics">MD</abbr> is a way of modelling
the movements of atoms and molecules
by advancing their positions
over a series of very short time steps.
After providing an initial structure,
the basic steps are:</p>

<ol>
  <li>Calculate the force between every pair of atoms,
and how much they should be accelerating.</li>
  <li>Move the atoms
by the amount they should move
in our small time step.</li>
  <li>Make sure everything stays in the box.</li>
  <li>Repeat lots and lots<sup id="fnref:lots"><a href="#fn:lots" class="footnote" rel="footnote" role="doc-noteref">3</a></sup> of times.</li>
</ol>

<p>That makes it sound easy,
but each of these steps hides a lot of complexity.
Lots of things can affect the force between two atoms,
but the most important are:</p>

<ul>
  <li>The distance between them—too
close and they’ll repel each other strongly;
too far apart and they’ll be gently attracted to each other</li>
  <li>Their charge—like charges repel each other,
while <a href="https://www.youtube.com/watch?v=xweiQukBM_k">opposites attract</a></li>
  <li>Covalent bonds—these are what hold molecules together,
and come with complicated restrictions
on lengths and angles</li>
</ul>
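
<p>As a rough sketch,
many force fields combine the ingredients listed above
into a potential energy of roughly the following shape.
This is a simplified, generic form for illustration
(torsion terms are left out),
and not necessarily the exact force field used for these simulations:</p>

\[U = \sum_{\text{bonds}} k_b (r - r_0)^2 + \sum_{\text{angles}} k_\theta (\theta - \theta_0)^2 + \sum_{i \lt j} \left[ 4 \varepsilon_{ij} \left( \left( \frac{\sigma_{ij}}{r_{ij}} \right)^{12} - \left( \frac{\sigma_{ij}}{r_{ij}} \right)^{6} \right) + \frac{q_i q_j}{4 \pi \varepsilon_0 r_{ij}} \right]\]

<p>Here \(r_{ij}\) is the distance between atoms \(i\) and \(j\),
\(q_i\) and \(q_j\) are their charges,
and \(r_0\) and \(\theta_0\) are the preferred bond lengths and angles.
The first two sums cover the covalent bonds,
the bracketed term covers distance and charge,
and the force on each atom is the negative gradient of \(U\)
with respect to that atom’s position.</p>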

<p>There is a huge range of force fields
that model these interactions,<sup id="fnref:forcefield"><a href="#fn:forcefield" class="footnote" rel="footnote" role="doc-noteref">4</a></sup>
and people keep tweaking them to make the results even more accurate.
There are a lot of other forces
that you might be surprised to see excluded,
like gravity;
in reality,
these are simply far too weak
at these scales
to make any difference to the system,
and it would be a waste of precious computation time
to calculate them.</p>
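
<p>To make the four steps listed earlier a little more concrete,
here is a minimal toy sketch in Python.
It is not the software used for the paper:
it assumes a handful of identical atoms on a one-dimensional line,
a simple Lennard-Jones force in made-up “reduced” units,
and a standard velocity-Verlet update.
All the function names and parameter values are hypothetical.</p>

```python
import math


def pair_force(r):
    """Lennard-Jones force magnitude at separation r (reduced units).

    Positive values are repulsive, negative values attractive.
    """
    inv6 = (1.0 / r) ** 6
    return 24.0 * (2.0 * inv6 * inv6 - inv6) / r


def forces(pos, box):
    """Step 1: sum the force on each atom from every other atom."""
    f = [0.0] * len(pos)
    for i in range(len(pos)):
        for j in range(i + 1, len(pos)):
            d = pos[i] - pos[j]
            d -= box * round(d / box)  # use the nearest periodic image
            fij = pair_force(abs(d)) * math.copysign(1.0, d)
            f[i] += fij  # equal and opposite forces on the pair
            f[j] -= fij
    return f


def step(pos, vel, box, dt, mass=1.0):
    """Advance the system by one velocity-Verlet time step."""
    f_old = forces(pos, box)
    # Step 2: move the atoms by the amount they should move in dt.
    pos = [x + v * dt + 0.5 * (f / mass) * dt * dt
           for x, v, f in zip(pos, vel, f_old)]
    # Step 3: make sure everything stays in the (periodic) box.
    pos = [x % box for x in pos]
    f_new = forces(pos, box)
    vel = [v + 0.5 * (fo + fn) / mass * dt
           for v, fo, fn in zip(vel, f_old, f_new)]
    return pos, vel


def run(pos, vel, box=10.0, dt=0.001, n_steps=1000):
    """Step 4: repeat lots and lots of times."""
    for _ in range(n_steps):
        pos, vel = step(pos, vel, box, dt)
    return pos, vel
```

<p>Calling <code>run([4.0, 5.2], [0.0, 0.0])</code>
lets two atoms that start slightly too far apart
attract each other and then bounce back and forth.
Real simulations do the same thing in three dimensions
with far more atoms and far more careful force calculations.</p>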

<p>The other big problem is that,
in the words of
father of science
<a href="https://en.wikipedia.org/wiki/Thales_of_Miletus">Thales of Miletus</a>,<sup id="fnref:thales"><a href="#fn:thales" class="footnote" rel="footnote" role="doc-noteref">5</a></sup>
“all things possess a moist nature”.
That is,
water is everywhere.
<abbr title="deoxyribonucleic acid">DNA</abbr> and proteins are not,
much to the dismay of my fellow physicists,
spherical objects
floating in a vacuum;
they are surrounded by
salty water,
and this is very annoying
because of just how bloody <em>much</em> of it
there is.
A cube
just big enough to contain
a mere 10 base pairs of <abbr title="deoxyribonucleic acid">DNA</abbr>
would be 85% water,
and this percentage
only increases
as the <abbr title="deoxyribonucleic acid">DNA</abbr> gets longer.<sup id="fnref:cube"><a href="#fn:cube" class="footnote" rel="footnote" role="doc-noteref">6</a></sup></p>

<p>All this water
has a big effect on the behaviour of the system,
but I’m not interested in the water itself
and really don’t want to simulate it
if I can help it.
It’s possible to
approximate the effect of the water
without considering all the individual water molecules
using an “implicit solvent” model,
allowing the simulation of bigger systems,
and that’s what I did for my large-scale simulations,
but some of the advanced techniques I’ll talk about later
don’t work if you do that.
You probably don’t need to know any of this,
but I’ve written it now
so there you go.</p>

<p>Anyway,
even without all the water,
interesting systems have a <em>lot</em> of atoms
and we need to consider the interactions between
every pair.<sup id="fnref:pairs"><a href="#fn:pairs" class="footnote" rel="footnote" role="doc-noteref">7</a></sup>
That’s a lot of lots.
Simulations the size of mine
have to be run on a supercomputer,
normally
using software designed to
misuse graphics processing units (GPUs).
Even then,
it normally takes about a week
for a simulation to finish.</p>

<h2 id="the-experiments-atomic-force-microscopy">The experiments: Atomic force microscopy</h2>

<p>I’m afraid
we’ve reached the bit of this post
I don’t know very much about,
but it would be remiss of me
to ignore completely
the experiments that made this work possible.
This wasn’t simply a case of
experiments confirming simulations,
or <em>vice versa</em>,
but of the two working in parallel
to fill in each other’s gaps
and reveal things
that would be inaccessible
to any one technique on its own,
so it’s important that
I try to do justice to
the work Sam did.</p>

<p>The technology is called
<a href="https://en.wikipedia.org/wiki/Atomic_force_microscopy">atomic force microscopy</a>
(<abbr title="atomic force microscopy">AFM</abbr>),
but if you’re thinking of
a microscope like the ones you probably used at school
you’re thinking of the wrong thing.
Rather than using lenses
and mirrors
to make things look bigger,
<abbr title="atomic force microscopy">AFM</abbr> works by
running a very sharp tip
over a surface
to measure its height at every point;
the elderly among you
may find it helpful to imagine
a record needle
running over the tiny bumps that correspond to
your favourite song.</p>

<p>It makes no sense to me,
and I really can’t get my head around it,
but somehow
this manages to be
up to a thousand times more precise than
what is physically possible
with a light-based microscope,
allowing incredibly high-resolution imaging.
The result is a two-dimensional image
where light and dark spots
represent height above the surface.
By sticking a piece of <abbr title="deoxyribonucleic acid">DNA</abbr>
to a flat surface,
it’s possible to obtain
a high-resolution image of
the overall shape of the <abbr title="deoxyribonucleic acid">DNA</abbr>,
allowing it to be measured very accurately—but you can
only see it from above,
like a map before Street View,
and even this most impressive of techniques
can’t resolve individual atoms.</p>

<p>But remember how my simulations
include all the atoms?
By cross-referencing between the two techniques,
it’s possible to demonstrate that
my simulations accurately reflect reality—because
the overall structure looks like the <abbr title="atomic force microscopy">AFM</abbr> images—and
use the simulation data
to fill in the information the experiments can’t see.</p>

<p>Clever, isn’t it?</p>

<h2 id="what-we-actually-did-observing-the-binding-modes">What we actually did: Observing the binding modes</h2>

<p>This paper describes a set of simulations and <abbr title="atomic force microscopy">AFM</abbr> experiments
we performed to investigate <abbr title="deoxyribonucleic acid">DNA</abbr>–<abbr title="integration host factor">IHF</abbr> binding.</p>

<p>First, we chose a <abbr title="deoxyribonucleic acid">DNA</abbr> sequence
of around 300 base pairs—big enough to see using <abbr title="atomic force microscopy">AFM</abbr>
but just about small enough to simulate in a vaguely acceptable
amount of time—containing a binding site for <abbr title="integration host factor">IHF</abbr>.
Then,
Sam mixed them together in a test tube or something
while I built a sensible initial structure
using
<a href="https://www.rcsb.org/structure/5J0N">an experimentally-determined structure</a>
of <abbr title="integration host factor">IHF</abbr> bound to <abbr title="deoxyribonucleic acid">DNA</abbr>
as a starting point.
A close-up of that starting structure
looks like this:
<img src="/assets/post_images/ihf/initial_structure.png" alt="A protein
with a body made of alpha helices
and two arms made of beta sheets
that are bound to a straight piece of
DNA" class="align-center" /></p>

<p>The pink and sort-of-cyan bits
are the protein,
with a compact body
and two arms
that give the <abbr title="deoxyribonucleic acid">DNA</abbr>
(the black thing,
with some special sequences
highlighted in red and blue)
a nice hug.
The <abbr title="deoxyribonucleic acid">DNA</abbr> here starts off perfectly straight—it
wouldn’t be like that in real life,
but it works fine as a starting point
for a simulation.
The body has lots of positive charges on its sides,
which are what all those arrows point to,
and the negative charge of <abbr title="deoxyribonucleic acid">DNA</abbr>
means it will naturally
be attracted to those positive charges
(remember, opposites attract)—that’s
how <abbr title="integration host factor">IHF</abbr> bends <abbr title="deoxyribonucleic acid">DNA</abbr> so sharply.</p>

<p>And here’s how it looks at the end of a simulation:
<img src="/assets/post_images/ihf/final_structure.png" alt="The same protein
and DNA
as before,
but the DNA is now tightly bent
around the protein
by around 160 degrees,
forming a sharp U-turn,
and in contact with the sides of the protein
all the way down." class="align-center" /></p>

<p>This is exactly what we expected to see—the
sharp U-turn I described earlier,
with the <abbr title="deoxyribonucleic acid">DNA</abbr> fully bound
to both sides of the protein.
This looks a lot like
<a href="https://www.rcsb.org/structure/1ihf">the structures</a>
that have already been determined
through experimental methods
like X-ray crystallography.
In fact,
we can quantify
just how similar they are
using something called
the root-mean-square deviation
(<abbr title="root-mean-square deviation">RMSD</abbr>),
effectively the average distance
of each atom
from where we’d expect it to be.
A normal value for a converged simulation
might be around 5 Å
(the Å symbol represents
ångströms—one ångström is
about the width of an atom).
You’re unlikely to get much less than that
purely because atoms are always in motion,
jiggling and jostling with thermal energy.
<img src="/assets/post_images/ihf/rmsd.png" alt="A graph of two lines
that both converge on
an RMSD of around 5 angstroms
within roughly 20 nanoseconds" class="align-center dark-invert" /></p>

<p>The two lines in this graph
represent the two different ways of modelling the water.
The important thing
is that they both converge on 5 Å,
just as we hoped,
indicating that the simulations
do accurately reproduce the known structure.
That means it’s probably safe to keep using them
to draw some new conclusions.</p>
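
<p>For the curious,
the <abbr title="root-mean-square deviation">RMSD</abbr> described above
can be sketched in a few lines of Python.
Strictly speaking it is the square root of the mean <em>squared</em> distance,
which is close to, but not quite, a plain average.
This sketch assumes the two structures have already been superimposed
(real analyses do that alignment first),
and the function name is hypothetical.</p>

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two sets of (x, y, z) positions.

    Both lists must hold the same atoms in the same order and are assumed
    to be superimposed already; no alignment is done here.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("structures must contain the same number of atoms")
    total = 0.0
    for a, b in zip(coords_a, coords_b):
        # Squared distance between matching atoms.
        total += sum((p - q) ** 2 for p, q in zip(a, b))
    return math.sqrt(total / len(coords_a))
```

<p>If every atom sits exactly 5 Å from its reference position,
this returns 5;
a mixture of larger and smaller displacements is weighted towards the larger ones.</p>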

<p>Meanwhile,
the real pieces of <abbr title="deoxyribonucleic acid">DNA</abbr>,
by now well mixed with <abbr title="integration host factor">IHF</abbr>,
were stuck to a surface
and imaged using <abbr title="atomic force microscopy">AFM</abbr>.
Just by looking at the pictures with our eyes,
we spotted three different types of structure
in both methods:
<img src="/assets/post_images/ihf/afm_vs_md.png" alt="Two sets of three images
showing the three observed states
across AFM and MD:
a state with a loose bend of less than 90 degrees,
one bent by approximately 90 degrees,
and one fully bent as in the above 
images" class="align-center" /></p>

<p>The top row,
with the slightly fuzzier images,
shows some of Sam’s <abbr title="atomic force microscopy">AFM</abbr> images.
The bottom row
shows corresponding frames from my simulations.
We both see
what looks like
the same three states:
the fully wrapped one we already knew about
(on the right),
and two new ones
that are a bit less bent.
But confirmation bias is a problem:
We might just be seeing what we want to see.
How can we put a number on it?</p>

<h2 id="measuring-bend-angles">Measuring bend angles</h2>

<p>The first thing we did
was to measure the angle
by which the <abbr title="deoxyribonucleic acid">DNA</abbr> is bent.
To do this,
we both traced a straight line
of about the same length
on each side of the protein
and measured the angle between them.
Of course,
the <abbr title="atomic force microscopy">AFM</abbr> image is basically two-dimensional,
and the <abbr title="deoxyribonucleic acid">DNA</abbr> is stuck to a surface,
so I projected my simulated structures
into a plane—a bit like
looking at their shadows—to
approximate the same effect.</p>
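
<p>The measurement just described can be sketched roughly as follows.
This is an illustration only:
it assumes the viewing plane is simply the x–y plane
(the post does not say how the plane was chosen),
that each arm is represented by the two end points of its traced line,
and that both arms are traced in the same direction of travel along the
<abbr title="deoxyribonucleic acid">DNA</abbr>.
The function names are hypothetical.</p>

```python
import math


def project_to_plane(point):
    """Drop the z coordinate, like looking at the shadow from above."""
    x, y, _z = point
    return (x, y)


def bend_angle(arm1_start, arm1_end, arm2_start, arm2_end):
    """Bend angle in degrees between two straight arms in 2D.

    0 means the DNA carries straight on; 180 means a complete U-turn.
    """
    v1 = (arm1_end[0] - arm1_start[0], arm1_end[1] - arm1_start[1])
    v2 = (arm2_end[0] - arm2_start[0], arm2_end[1] - arm2_start[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    cross = v1[0] * v2[1] - v1[1] * v2[0]
    # atan2 stays well behaved even for nearly straight or fully bent DNA.
    return math.degrees(math.atan2(abs(cross), dot))
```

<p>Simulated 3D coordinates would go through <code>project_to_plane</code> first;
points traced from an <abbr title="atomic force microscopy">AFM</abbr> image are already flat
and can go straight into <code>bend_angle</code>.</p>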

<p>We measured these angles for
every <abbr title="deoxyribonucleic acid">DNA</abbr> molecule visible in the <abbr title="atomic force microscopy">AFM</abbr> images
and every time step of my simulations
(I ran multiple simulations to make sure I sampled as much as possible).
The distribution of bend angles
in the <abbr title="atomic force microscopy">AFM</abbr> data
looks like this:
<img src="/assets/post_images/ihf/1l302_histogram.png" alt="A histogram
of bend angles,
with four peaks at
close to 0 degrees,
72.7 degrees,
107.5 degrees,
and 147.1 degrees" class="align-center dark-invert" /></p>

<p>If you’re not familiar with histograms,
the horizontal axis represents possible bend angles,
and a taller bar means
angles in that range
are more common.
There’s a big mass of barely-bent <abbr title="deoxyribonucleic acid">DNA</abbr>
on the left
corresponding to <abbr title="deoxyribonucleic acid">DNA</abbr> with no <abbr title="integration host factor">IHF</abbr> bound to it—that’s
going to happen in experimental data
and can be mostly ignored.
To the right of that,
however,
we see three distinct peaks.
That implies we’re seeing something real!</p>

<p>When I do the same analysis
on my simulation data,
I also get three peaks
at very similar bend angles,
but there’s a better way
to work with atomistic data:
<a href="https://en.wikipedia.org/wiki/Hierarchical_clustering">hierarchical clustering</a>.
Rather than reducing states to a single measurement,
like a bend angle,
I can directly compare how similar
two simulation frames
are
by measuring the <abbr title="root-mean-square deviation">RMSD</abbr> between them.
By doing this for all the frames
across all my simulations,
and merging the most similar pairs
until I was left with a small number of
clearly distinct states,
I was able to look
in detail
at the structures of the
three different states.
Here they are:
<img src="/assets/post_images/ihf/states.png" alt="Three different states
of the IHF-DNA complex:
one in which DNA binds to part of the protein
on each side
with a 66 degree bend,
one in which it binds fully on the left
and not at all on the right
with a 115 degree bend,
and the fully wrapped state
previously observed" class="align-center" /></p>

<p>The mean bend angles of these states
are pretty close to the values measured from the <abbr title="atomic force microscopy">AFM</abbr> images!
That makes it pretty likely
we’re looking at the same thing.
These simulations give us
the structures of the three states,
and tell us
exactly how the <abbr title="deoxyribonucleic acid">DNA</abbr> and protein
interact
in each one.
As well as
the original “fully wrapped” state,
we have a “half-wrapped” state
in which the <abbr title="deoxyribonucleic acid">DNA</abbr> only binds on one side of the protein—and
it’s always the same side
(which we’ll call the left)—and
another state in which
the <abbr title="deoxyribonucleic acid">DNA</abbr> on each side
binds only to the top part of the protein;
we called this the “associated” state.</p>
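
<p>The clustering procedure described above,
repeatedly merging the most similar pair,
can be sketched as follows.
This is a toy version, not the analysis code behind the paper:
it takes a ready-made matrix of pairwise distances
(in my case these would be
<abbr title="root-mean-square deviation">RMSD</abbr> values between frames),
and it assumes “average linkage”,
where the distance between two clusters is the average over all their members.
The post does not say which linkage rule was actually used,
and the function name is hypothetical.</p>

```python
def agglomerate(dist, n_clusters):
    """Merge the closest clusters until only n_clusters remain.

    dist is a square, symmetric list of lists: dist[i][j] is the distance
    (for example, the RMSD) between frame i and frame j.
    Returns a list of clusters, each a list of frame indices.
    """
    clusters = [[i] for i in range(len(dist))]

    def linkage(a, b):
        # Average distance between every member of a and every member of b.
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = linkage(clusters[x], clusters[y])
                if best is None or best[0] > d:
                    best = (d, x, y)
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]  # merge the closest pair
        del clusters[y]
    return clusters
```

<p>This brute-force version is far too slow for thousands of simulation frames,
but it shows the idea:
similar frames end up grouped together,
and each surviving group corresponds to one state.</p>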

<h2 id="sequence-specificity">Sequence specificity</h2>

<p>There’s something interesting
about these binding modes.
Well,
there are lots of interesting things about them,
but there’s one very interesting thing:
Why does the left arm sometimes bind
without the right arm,
but never the other way round?</p>

<p>On the face of it,
<abbr title="integration host factor">IHF</abbr> is a pretty symmetrical protein,
but this is decidedly asymmetrical behaviour.
Even more interestingly,
<abbr title="integration host factor">IHF</abbr> has a strong preference
for a particular <abbr title="deoxyribonucleic acid">DNA</abbr> sequence,
and that sequence is located
on the <em>right</em> arm,
which means that sequence doesn’t even interact with the protein
in the new states.</p>

<p>For that to make sense,
we’d expect to see
the associated and half-wrapped states—but
not the fully wrapped state—even
in pieces of <abbr title="deoxyribonucleic acid">DNA</abbr> without that sequence.
This is where we bring the <abbr title="atomic force microscopy">AFM</abbr> data back in.
For a different <abbr title="deoxyribonucleic acid">DNA</abbr> sequence
without an <abbr title="integration host factor">IHF</abbr> binding site,
the angle distribution looks like this:
<img src="/assets/post_images/ihf/0l361_histogram.png" alt="A histogram
of bend angles
similar to the one above
but with the fully wrapped state
no longer present" class="align-center dark-invert" /></p>

<p>The peaks
corresponding to
the associated and half-wrapped states
are there,
clear as day,
but the fully wrapped state
is missing—just
as we expected.
This lets us say
with a pretty high degree of certainty
that these states involve
nonspecific binding
with no sequence preference.
That is,
<abbr title="integration host factor">IHF</abbr> can bind to <abbr title="deoxyribonucleic acid">DNA</abbr>
even if that sequence is missing,
but the strong bend
for which it is famous
is possible
only at certain special <abbr title="deoxyribonucleic acid">DNA</abbr> sites.</p>

<h2 id="investigating-the-asymmetry-using-free-energy-calculations">Investigating the asymmetry using free-energy calculations</h2>

<p>This confirms that
our interesting asymmetry is real,
but it doesn’t tell us much about it.
Just looking at static structures
or trajectories
doesn’t give us
much insight into the binding dynamics.
To learn about that,
we need to think about free energy.</p>

<p>Free energy
is one of the most fundamental concepts
in physics
and chemistry.
The two driving forces
of the evolution of physical systems
over time
are the desire of a system
to minimise its internal energy (\(U\))
and maximise its entropy (\(S\));
the balance of these is captured by
the (Helmholtz) free energy, \(A\):</p>

\[A = U - T S,\]

<p>where \(T\) is the system’s temperature.
A transition between two states
can occur spontaneously
only if
it results in a smaller value of \(A\).</p>
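
<p>Written out for a transition from state 1 to state 2
at a fixed temperature,
that condition reads:</p>

\[\Delta A = A_2 - A_1 = \Delta U - T \Delta S \lt 0,\]

<p>so a change that raises the internal energy (\(\Delta U > 0\))
can still happen spontaneously
if it increases the entropy enough to compensate.</p>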

<p>This is very exciting
because it tells us
which states are stable
and which are not.
A state with a large free energy
compared to those around it
won’t occur in nature—and if it does,
it won’t last very long.
Meanwhile,
states with a lower free energy than those around them
will probably stay stable for a long time.</p>

<p>What do I mean by a “state”?
We can think about all sorts of things
that have multiple states:
flexible molecules that can take on multiple shapes,
maybe,
or
pairs of molecules that might want to stick together (or might not).
In the more human-sized world,
a ball placed on the side of a steep hill is in an unstable state,
and will quickly roll down to
a much more stable position at
the bottom of the hill;
once it’s there,
it’s never going to roll itself back up to the top.
A ball resting in a small hollow partway up the hill
might be happy to stay there for a long time,
but a big enough nudge will send it tumbling all the way down;
a state like this is called “metastable”.</p>

<p>It’s actually very helpful
to think in terms of “landscapes”
of hills and valleys,
where altitude represents free energy.
You’ll see some lovely examples of these landscapes shortly.</p>

<p>In principle,
I could just run
a ridiculously long simulation
and keep track of
how long the system spends in each state.
Since it should spend much more time
in states with lower free energy,
it’s simple to convert between the two.
The problem with that is
that the simulation
could get stuck
in a free-energy valley—it’s
very unlikely to
climb the sides of the valley
in any reasonable amount of time.
The probability of the necessary states
is just too low.
So we’ll sample this one valley
<em>really</em> well
and never head off to explore the rest of the landscape.</p>
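
<p>The conversion mentioned above,
from time spent in each state to free energy,
can be sketched like this.
It assumes the simulation has reached equilibrium,
so that the fraction of frames in a state \(i\)
is proportional to \(e^{-A_i / k_B T}\).
The temperature of 300 K,
the choice of kcal/mol as the unit,
and the function name
are all assumptions made for illustration.</p>

```python
import math

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol K)


def relative_free_energies(frame_counts, temperature=300.0):
    """Convert frame counts per state into relative free energies.

    frame_counts maps a state name to the number of frames spent in it.
    Returns free energies in kcal/mol relative to the most visited state.
    """
    most_visited = max(frame_counts.values())
    energies = {}
    for state, count in frame_counts.items():
        if count == 0:
            # A state that was never sampled looks infinitely unfavourable.
            energies[state] = math.inf
        else:
            energies[state] = -K_B * temperature * math.log(count / most_visited)
    return energies
```

<p>With made-up counts of 1000 frames in one state and 100 in another,
the second comes out about 1.4 kcal/mol higher.
A state that is never visited comes out as infinite,
which is exactly the sampling problem:
the simulation can’t tell “rare” apart from “impossible”.</p>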

<p>To get around this problem,
we can force our simulation to explore
the whole landscape
by adding an artificial force
to literally drag it through all the states.
We know how it should behave
under the influence of our new force alone;
by looking at how the behaviour deviates from this,
we can get information about the underlying
free-energy landscape.
The analysis that recovers the landscape this way is called the
weighted histogram analysis method,
or <a href="https://www.youtube.com/watch?v=pIgZ7gMze7A">WHAM</a>.</p>

<p>What we’re interested in
is the binding of the <abbr title="deoxyribonucleic acid">DNA</abbr>
to the protein,
so it makes sense for
our new force
to act between a pair of atoms—one
towards the bottom of the protein,
and one on the <abbr title="deoxyribonucleic acid">DNA</abbr>.
Pulling these closer together
forces the <abbr title="deoxyribonucleic acid">DNA</abbr> and protein to come into contact.
Of course,
our system has two sides,
so we have to do this on both sides.
This gives us two dimensions
to our free-energy landscape.</p>

<p>Let’s look at the left arm first—that’s
the one that binds to the protein
in both the fully and half-wrapped states
(and partially bound in the associated state too):
<img src="/assets/post_images/ihf/wham_left.png" alt="A graph showing
two lines,
both showing similar
potential wells
with a depth of around 6
angstroms" class="align-center dark-invert" /></p>

<p>The red line is what happens when we let the right arm bind too;
the blue line is what happens when we don’t.
As you can see,
they’re very similar,
which means the left arm’s behaviour
doesn’t depend on what the right arm is doing.
The other thing you might notice
is that this is quite a deep valley—a
short distance
(that is, a bound state)
is very strongly preferred.
If the protein and the left <abbr title="deoxyribonucleic acid">DNA</abbr> arm
start about 50 Å apart,
they’re going to be strongly attracted to each other
and quickly roll into the
bound state at the bottom of the valley.</p>

<p>What about the right arm?
That looks more like this:
<img src="/assets/post_images/ihf/wham_right.png" alt="A graph showing
two lines,
both showing
relatively flat potentials;
the blue line sharply increases
for values below 40 angstroms,
while the red line remains mostly flat
for all values above 30
angstroms" class="align-center dark-invert" /></p>

<p>Now, this is interesting.
The right arm doesn’t have the same kind of
deep free-energy valley
as the left one.
The right half of that graph
is mostly flat,
with the only lumps and bumps
being seemingly random fluctuations
on the order of thermal noise.
To the left of the graph,
we see the expected sharp rise
as the protein and <abbr title="deoxyribonucleic acid">DNA</abbr> get pushed
too close together—two
objects can’t occupy the same space
and will resist
very strongly
any attempts to make them.
But we see that
we reach this point
much sooner
along the blue line,
when the left arm is held away from the protein,
than along the red line,
when the left arm is free to bind.</p>

<p>Astute readers might have figured out what’s going on here.
We’ve explained why
we never see the right arm binding
without the left.
Until the left arm binds,
there’s a physical obstacle in the way.
If we overlay the structures
of the fully wrapped and associated states,
we can see why:
<img src="/assets/post_images/ihf/overlay.png" alt="An overlay of two binding states,
showing a significant movement
of the body of the protein
between the states" class="align-center" /></p>

<p>Here,
the fully wrapped state is in blue
and the associated state in red.
In the associated state,
the protein body is
noticeably tilted;
the right arm
would have to bend significantly
to bind any farther down the protein.
In the fully wrapped state,
things have straightened up
and both arms can bind without bending.
<abbr title="deoxyribonucleic acid">DNA</abbr> is quite a rigid polymer,
so this amounts to a rule
preventing the right arm from binding first.</p>

<p>The scale of the asymmetry
might be easier to grasp
if we view the landscape in 3D,
which really emphasises
how steep the potential for the left arm is
compared to the flat right arm:
<img src="/assets/post_images/ihf/3d_landscape.png" alt="3D free energy landscape
contrasting the steepness associated with the left arm
with the relative flatness associated with the right
arm" class="align-center" /></p>

<p>We could also look at
where our three states sit
on this landscape.
It turns out—as we’d hope—that
they’re associated with valleys
and flat regions:
<img src="/assets/post_images/ihf/2d_landscape.png" alt="2D free energy landscape
showing that the three states
correspond to valleys and
plateaux" class="align-center" /></p>

<p>They’re also really nice to look at!</p>

<h2 id="dna-bridging-by-ihf"><abbr title="deoxyribonucleic acid">DNA</abbr> bridging by <abbr title="integration host factor">IHF</abbr></h2>

<p>While we were doing all this,
something else interesting
turned up in the <abbr title="atomic force microscopy">AFM</abbr> images.
You may recall that
<abbr title="integration host factor">IHF</abbr> stabilises biofilms
by holding pieces of <abbr title="deoxyribonucleic acid">DNA</abbr> together.
Well, we started getting pictures like these:
<img src="/assets/post_images/ihf/aggregates.png" alt="AFM images
showing clusters of DNA and IHF
of various sizes" class="align-center" /></p>

<p>On the left,
you can see a few small clusters of <abbr title="deoxyribonucleic acid">DNA</abbr>
with bright spots showing <abbr title="integration host factor">IHF</abbr>
holding them together.
On the right
is data for a different sequence
with
not one but
three <abbr title="integration host factor">IHF</abbr> binding sites:
a huge blob of <abbr title="deoxyribonucleic acid">DNA</abbr> and <abbr title="integration host factor">IHF</abbr>!</p>

<p>This seems to line up with
what happens in biofilms.
<abbr title="integration host factor">IHF</abbr> is clearly holding
multiple pieces of <abbr title="deoxyribonucleic acid">DNA</abbr> together.
So I ran some simulations to investigate.</p>

<p>First,
I set up this initial bridged structure
by sort of encouraging another piece of <abbr title="deoxyribonucleic acid">DNA</abbr>
to get close to the protein:
<img src="/assets/post_images/ihf/bridge.png" alt="Structure of a DNA-IHF-DNA
bridge" class="align-center" /></p>

<p>Then,
I used WHAM again
to pull the second piece of <abbr title="deoxyribonucleic acid">DNA</abbr> away.
The resulting free-energy landscape
looks like this:
<img src="/assets/post_images/ihf/wham_bridge.png" alt="A free-energy well with a depth of around 14
angstroms" class="align-center dark-invert" /></p>

<p>That’s a nice deep well again:
<abbr title="integration host factor">IHF</abbr> loves to form bridges!
Another piece of <abbr title="deoxyribonucleic acid">DNA</abbr> that comes within
about 50 Å
is going to be attracted
and eventually bind to the bottom of the protein
in a structure a lot like the one above.
Understanding this
could help us to understand how to disrupt
<abbr title="integration host factor">IHF</abbr>’s role in biofilms
and treat bacterial infections better.</p>
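
<p>For the curious,
here is a rough sketch of what WHAM
(the weighted histogram analysis method)
does with the data from a set of restrained simulations.
This is a textbook one-dimensional version
run on synthetic, made-up histograms,
not the code or data used for the paper;
the function name <code>wham</code>,
the toy landscape,
and every number below are assumptions for illustration.
Energies are in units of kT,
and the sketch assumes every bin has been visited at least once.</p>

```python
import math

def wham(counts, bias, n_iter=2000):
    """Combine biased histograms into one free-energy profile (units of kT).

    counts[i][b] is the histogram count of window i in bin b;
    bias[i][b] is the restraint energy of window i at bin b, in kT.
    """
    n_win, n_bin = len(counts), len(counts[0])
    totals = [sum(row) for row in counts]
    shifts = [0.0] * n_win  # per-window free-energy offsets, found iteratively
    for _ in range(n_iter):
        # Best estimate of the unbiased probability given the current offsets.
        prob = []
        for b in range(n_bin):
            num = sum(counts[i][b] for i in range(n_win))
            den = sum(totals[i] * math.exp(shifts[i] - bias[i][b])
                      for i in range(n_win))
            prob.append(num / den)
        # Update each window's offset from that probability estimate.
        shifts = [-math.log(sum(prob[b] * math.exp(-bias[i][b])
                                for b in range(n_bin)))
                  for i in range(n_win)]
    free = [-math.log(p) for p in prob]
    lowest = min(free)
    return [value - lowest for value in free]

# Synthetic check: build ideal histograms from a known toy landscape
# and see whether WHAM recovers it.
bins = list(range(20))
true_free = [0.05 * (x - 10) ** 2 for x in bins]
centres = [5, 10, 15]  # hypothetical restraint centres
bias = [[0.05 * (x - c) ** 2 for x in bins] for c in centres]
counts = []
for window in bias:
    weights = [math.exp(-(true_free[b] + window[b])) for b in bins]
    norm = sum(weights)
    counts.append([1000.0 * w / norm for w in weights])

recovered = wham(counts, bias)
# Largest difference between recovered and true profiles (should be tiny).
print(max(abs(r - t) for r, t in zip(recovered, true_free)))
```

<p>Each restrained simulation only sees a small, distorted part of the landscape;
the iteration stitches those parts together
and removes the distortion introduced by the restraints.</p>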

<h2 id="a-complete-model-of-ihf-binding-bending--bridging">A complete model of <abbr title="integration host factor">IHF</abbr> binding, bending, &amp; bridging</h2>

<p>We now have enough information
to produce a complete model
of <abbr title="integration host factor">IHF</abbr>’s interactions with <abbr title="deoxyribonucleic acid">DNA</abbr>.
Here it is:
<img src="/assets/post_images/ihf/model.png" alt="Starting from a straight and intercalated state,
IHF prefers to form a bridge
if another DNA molecule is nearby.
Otherwise,
it can enter
either the half-wrapped or the associated state,
of which the half-wrapped state is slightly preferred.
Transitions between these states are not possible,
but either may progress to the fully wrapped
state." class="align-center" /></p>

<p>So,
<abbr title="integration host factor">IHF</abbr> first binds to straight <abbr title="deoxyribonucleic acid">DNA</abbr>,
which we’ve labelled the “intercalated” state
in the diagram,
because part of the protein is
intercalated
(inserted between <abbr title="deoxyribonucleic acid">DNA</abbr> base pairs);
this isn’t really a stable state,
because our energy landscapes
predict that it shouldn’t last very long.
If there’s some other <abbr title="deoxyribonucleic acid">DNA</abbr> nearby,
it will really want to form a bridge.
Otherwise,
it will progress to either
the associated or the half-wrapped state;
it’s not entirely clear
how it picks,
and there’s almost certainly a random element,
but the half-wrapped state
seems to be preferred.
It doesn’t look like it should be possible
to move between these two states,
but they can both
eventually
progress further
into the fully wrapped state.</p>
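
<p>If you prefer code to diagrams,
the model can be written down as a small table of allowed transitions.
This is only a sketch:
treating every arrow as one-way,
and treating the bridged and fully wrapped states as end points,
are simplifying assumptions of mine,
and the names <code>TRANSITIONS</code> and <code>can_reach</code> are made up for this example.</p>

```python
# Allowed transitions between binding states, following the model above.
TRANSITIONS = {
    "intercalated": ["bridged", "half-wrapped", "associated"],
    "half-wrapped": ["fully wrapped"],
    "associated": ["fully wrapped"],
    "bridged": [],
    "fully wrapped": [],
}

def can_reach(start, goal, transitions=TRANSITIONS):
    """Return True if goal can be reached from start via allowed transitions."""
    seen, stack = set(), [start]
    while stack:
        state = stack.pop()
        if state == goal:
            return True
        if state not in seen:
            seen.add(state)
            stack.extend(transitions[state])
    return False

print(can_reach("intercalated", "fully wrapped"))  # prints True
print(can_reach("half-wrapped", "associated"))     # prints False
print(can_reach("associated", "half-wrapped"))     # prints False
```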

<p>If you’ve made it this far,
hopefully you agree
it’s really cool
that we can figure all of this out,
especially by combining simulations and experiments
in this manner.
I think this is a really promising approach
that could be super valuable
for other studies in the future,
and I’d encourage you to
try and break down any disciplinary barriers you can
because that’s where all the most interesting stuff is hidden.</p>

<p>I’m always happy to talk more about science,
so do get in touch if you’re interested.
I’ll be
<a href="https://twitter.com/GeorgeDWatson">Tweeting</a>
about this
so you’re welcome to reply there,
and there’s a comment section below.</p>

<p>Thank you for your time.</p>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:copyright">
      <p>Mad, innit? <a href="#fnref:copyright" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:cells">
      <p>That’s not quite true—some of your cells,
such as red blood cells,
don’t contain any <abbr title="deoxyribonucleic acid">DNA</abbr>,
but that doesn’t really fix anything. <a href="#fnref:cells" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:lots">
      <p>and lots and lots… <a href="#fnref:lots" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:forcefield">
      <p>Unfortunately,
these are nowhere near as
exciting and useful
as the ones that block phaser fire,
and look
<a href="https://en.wikipedia.org/wiki/AMBER#Functional_form">more like this</a>. <a href="#fnref:forcefield" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:thales">
      <p>Thales here makes his second appearance in
<a href="https://www.georgewatson.me/blog/science/2021/07/09/why-i-m-leaving-academia/">as many posts</a>,
purely coïncidentally. <a href="#fnref:thales" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:cube">
      <p>Actually,
we don’t usually use cubes,
and a whole section of my thesis
is devoted to the interesting properties of
the truncated octahedron,
but that’s not really important. <a href="#fnref:cube" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:pairs">
      <p>Sometimes
it’s okay to set a cutoff distance
beyond which the interactions are basically zero
so there’s no need to calculate them,
but often not,
and we still need to consider a lot of atoms
either way. <a href="#fnref:pairs" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name></name></author><category term="science" /><category term="science" /><category term="papers" /><category term="dna" /><category term="ihf" /><category term="molecular-dynamics" /><category term="biophysics" /><category term="personal" /><summary type="html"><![CDATA[The first paper from my PhD has finally been published! Using an exciting combination of advanced simulations and microscopy, the paper reveals the multiple ways in which a protein found in most bacteria bends DNA and demonstrates that the protein can hold together two separate DNA helices. This has some important consequences for our understanding of DNA organisation in bacteria and the stability of infectious bacterial colonies, and the tightly coupled combination of experiment and simulation presents a promising foundation for future studies into other important biological systems. Unfortunately, a scientific paper is by its very nature a relatively dry, technical document, but the fruits of science belong to us all and I think it’s important that we share our research outside the academic bubble. 
With that in mind, please do sit tight while I walk you through our work in terms I hope a scientifically curious layperson can understand.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://georgewatson.me/assets/post_images/ihf_model.png" /><media:content medium="image" url="https://georgewatson.me/assets/post_images/ihf_model.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Why I’m leaving academia</title><link href="https://georgewatson.me/blog/science/2021/07/09/why-i-m-leaving-academia/" rel="alternate" type="text/html" title="Why I’m leaving academia" /><published>2021-07-09T18:07:00+00:00</published><updated>2021-07-09T18:07:00+00:00</updated><id>https://georgewatson.me/blog/science/2021/07/09/why-i-m-leaving-academia</id><content type="html" xml:base="https://georgewatson.me/blog/science/2021/07/09/why-i-m-leaving-academia/"><![CDATA[<p>I love science.
I love the academic environment
and I love the work I do.
I even think I’m quite good at it.
But I’m leaving –
I’m taking my <abbr title="la: Philosophiae Doctor (en: Doctor of Philosophy)">PhD</abbr>
and heading to pastures new
in the big, scary outside world,
and I’d like to share my reasons.</p>

<!--more-->

<hr />

<p>I have wanted to be a scientist
for as long as I can remember,
apart from a brief
<em>Ally McBeal</em>-fuelled
teenage
flirtation with
the law,
and over the last few years
I have had the immense privilege
of making that a reality.
My <abbr title="la: Philosophiae Doctor (en: Doctor of Philosophy)">PhD</abbr> has been a genuine blast—I get to
do really cool, exciting new work
and boldly see things no man has seen before.
I have been tremendously fortunate
to work with
a universally wonderful,
impressive,
and supportive group of people.<sup id="fnref:york"><a href="#fn:york" class="footnote" rel="footnote" role="doc-noteref">1</a></sup>
If I could stay forever,
I probably would.
But I can’t,
and there’s the rub.</p>

<p>You see—and this is where I expect to lose
any sympathy you may harbour for me—I
am a tremendously privileged person.
I have had the excellent fortune
to have found the person I believe to be the love of my life,
who—very selfishly—has a career of her own.
Through several strokes of luck
and the generosity of others,
we own a home,
of which we are quite fond.
One day,
I should like to have a dog,
and maybe even become a parent.
Most academics do all of these things,
but they are stronger people than I am.</p>

<h3 id="the-academic-trajectory">The academic trajectory</h3>

<p>I say this because
of the very nature of
life as an early-career academic,
or postdoc,
which would represent my next step
were I to follow an academic path.
I applied for exactly one postdoc position,
in a particularly impressive group
doing some really exciting work
in a city of which I am very fond,
and was fortunate enough to get an interview.
And I found myself,
quite unexpectedly,
hoping that I would not be offered the job.
I surprised myself;
many people report their postdoc years
as among the happiest of their career—spending
all day, every day
immersed in the research about which they are most passionate,
unburdened by the other legs of the academic tripos,
the much-maligned
teaching and administration.<sup id="fnref:tripos"><a href="#fn:tripos" class="footnote" rel="footnote" role="doc-noteref">2</a></sup>
I expect this is true.</p>

<p>The only downside is that,
in the words of
George Harrison <abbr title="Member of the Order of the British Empire">MBE</abbr>,
<a href="https://www.youtube.com/watch?v=QWV4pFV5nX4">All Things Must Pass</a>.<sup id="fnref:harrison"><a href="#fn:harrison" class="footnote" rel="footnote" role="doc-noteref">3</a></sup>
All good things come to an end,
and employment contracts are no exception.
The typical postdoctoral position
in the <abbr title="United Kingdom of Great Britain and Northern Ireland">UK</abbr>
lasts for around two to three years.
After that,
it’s time to pack your bags
and move on to the next one,
probably at a different university
on the other side of the country—and
probably do this more than once.
(In my experience
it is far from uncommon to
hold multiple postdoctoral positions
before finding a permanent job;
the statistics on this are surprisingly sparse
but I’ll present the best I could find in a few paragraphs’ time.)</p>

<aside>
  <blockquote class="pull-quote">
    I realised then
    that this path&nbsp;—
    the one I'd always thought I wanted to walk&nbsp;—
    was not for me.
  </blockquote>
</aside>

<p>And that’s why I didn’t want the job—because
I knew I’d take it,
but the idea of everything that would entail,
of commuting long distances,
of regularly spending nights away from home,
of potentially relocating
and leaving behind everything we have built here
without any guarantee that
I’d even be able to stay
for more than a few years,
made me
sad
and afraid.
Thankfully,
I was not put in that position
and the job was undoubtedly offered to
someone far more qualified than myself.
But I realised then
that this path—the one I’d always thought I wanted to walk—was
not for me.</p>

<p>This may be unreasonable of me.
Packing up your life,
dragging loved ones behind you
along with all your worldly possessions,
may be troublesome,
but
people move for their careers all the time.
At least there’s a permanent job at the end of it,
right?
Apparently not—it turns out that
<a href="https://www.imperial.ac.uk/news/179895/to-academic-question-postdocs/">only 10% of postdocs will ever find a permanent academic
position</a>.</p>

<p>But at least it’s great experience
for a career in industry,
right?
The consensus seems to be that no,
it is not.
In the words of
<a href="https://academia.stackexchange.com/questions/60274/is-postdoc-experience-valued-by-industry">a Stack Exchange
answer</a>:</p>

<blockquote>
  <p>the rule of thumb is that
as soon as you are 100%
[certain]
that you won’t stay in
academia,
<strong>every further month spent as a postdoc is inefficient in terms of
career development.</strong>
Yes, some companies may count your years as postdoc as
some sort of relevant leadership experience, but most won’t, and even those
that do will consider a similar candidate with the same number of years
working in industry to be much more attractive.
[Emphasis added]</p>
</blockquote>

<p>But at least you’re having fun…
right?
Studies seem to indicate that,
once again,
<a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5873519/">the answer is no</a>:</p>

<blockquote>
  <p>Survey data indicate that
<strong>the majority of university staff find their job stressful.</strong>
Levels of burnout appear higher among university staff than in
general working populations and are comparable to “high-risk” groups such as
healthcare workers. The proportions of both university staff and postgraduate
students with a risk of having or developing a mental health problem, based on
self-reported evidence, were generally higher than for other working
populations.
[Emphasis added]</p>
</blockquote>

<p>One lecturer,
Dr Alexandre Afonso
(then
of King’s College London,
now of Leiden University),
went so far as to
<a href="https://blogs.lse.ac.uk/impactofsocialsciences/2013/12/11/how-academia-resembles-a-drug-gang/">compare the academic job market to drug gangs</a>
as they were described in
Prof. Levitt and Mr Dubner’s
seminal
<em><a href="https://smile.amazon.co.uk/Freakonomics-Economist-Explores-Hidden-Everything-ebook/dp/B002RPCOH8?tag=georgewatson-21">Freakonomics</a></em>:<sup id="fnref:affiliate"><a href="#fn:affiliate" class="footnote" rel="footnote" role="doc-noteref">4</a></sup></p>

<blockquote>
  <p>what you have is
<strong>an increasing number of <abbr title="la: Philosophiae Doctor (en: Doctor of Philosophy)">PhD</abbr> graduates arriving every year</strong>
into the market hoping to secure a permanent position as a professor and
enjoying freedom and – reasonably – high salaries, a bit like the
rank-and-file drug dealer hoping to become a drug lord. To achieve that, they
are ready to forgo the income and security that they could have in other areas
of employment by
<strong>accepting insecure working conditions in the hope of securing
jobs that are not expanding at the same rate.</strong>
[Emphasis added]</p>
</blockquote>

<p>I wanted to find statistics
regarding the median age of <abbr title="United Kingdom of Great Britain and Northern Ireland">UK</abbr> academics
upon receiving their first permanent contract.
This was the best I could do:
According to
table 21 of
<a href="https://www.hesa.ac.uk/data-and-analysis/publications/staff-2016-17">a report by the Higher Education Statistics Agency</a>,
the median age of
the inflow into
the population employed on teaching and research contracts
(94% of which are permanent,
compared to just 33% of research-only contracts,
according to chart 12 of
<a href="https://www.hesa.ac.uk/data-and-analysis/publications/staff-2016-17/introduction">the same report’s introduction</a>)
falls between 36 and 40,
although admittedly much of this
may be inflow of
experienced academics
from outside the <abbr title="United Kingdom of Great Britain and Northern Ireland">UK</abbr>.
Conservatively,
this suggests that
even the exceptional 10% who <em>do</em> find a permanent contract
do not typically do so
until their mid-thirties,
following around a decade of
precarious
post-<abbr title="la: Philosophiae Doctor (en: Doctor of Philosophy)">PhD</abbr>
employment,
but I would be interested in seeing some better statistics on this.</p>

<p>Of course,
much of this is probably field-dependent—it
is quite possible that
I would have it easier,
in the sciences,
than do my colleagues in the humanities,
or <em>vice versa</em>.
But it is clear that there is a problem.
Within academia,
we hear,
of course,
only from those who did make it—Dr Afonso’s drug lords.
To them,
the whole experience was worthwhile.
We don’t hear from the other 90%,
who tried and failed—and
it <em>is</em> considered a failure
to leave academia,
despite this being by far the modal outcome.
To them,
the whole thing probably doesn’t feel so worthwhile.
I’m not dragging my life—and my fiancée’s—across the country
to gamble on an outcome
that exists only in the tails
of the probability distribution.
I have pledged to take the road more travelled by.</p>

<h3 id="the-problem">The problem</h3>

<p>Losing me
is no great loss to the academy,
but I am not alone.
If we keep shutting people out of science
by making the profession inconvenient
and unpleasant,
rather than merely difficult,
it is only a matter of time before
at least one great,
potentially world-changing genius
takes a job in the private sector instead.
I reckon they already have.<sup id="fnref:genius"><a href="#fn:genius" class="footnote" rel="footnote" role="doc-noteref">5</a></sup>
Besides,
every mind we lose
takes with it
a unique set of skills
and a unique perspective,
leaving us poorer,
and
we all miss out
by making ourselves,
our colleagues
and our friends
miserable for the first decade of our careers.
Is it worth it?</p>

<p>While
I would not dare to suggest that
I know the panacæa for modern academic woes,
I wish for this piece to come across
not as a hollow rant
but as a contribution to a constructive discussion
about the lifestyles
and mental health
of academics everywhere.
To that end,
I wish to put forth
my uninformed hypotheses
about where the problem
(such as it is)
may lie.</p>

<p>As with most situations
involving human interactions,
the natural place to begin
would seem to be in the language of œconomics
(in which I am far from an expert).
For example,
the number of <abbr title="la: Philosophiae Doctor (en: Doctor of Philosophy)">PhD</abbr> graduates
far exceeds the supply of postdoc positions,
which in turn far exceeds
the supply of permanent academic jobs.
This is a classic imbalance of supply and demand,
which is not a problem <em>per se</em>,
and should be more obvious
than it perhaps seems,
but does not align with everyone’s expectations.<sup id="fnref:expectations"><a href="#fn:expectations" class="footnote" rel="footnote" role="doc-noteref">6</a></sup>
This imbalance
creates an effective oligopsony—a buyers’ market
in which the universities have the power
to impose conditions according to their own incentives.</p>

<p>What are their incentives?
Employees are expensive
and might not work out.
No employer wants to be stuck with
a permanent employee
who turns out to be a bit rubbish.
Short-term,
grant-funded
employment
is a much safer bet:
The work gets done
without staking a penny on it,
and these employees are
interchangeable
and easily replaced
when the money runs out.
This also suits the funding bodies,
who similarly want to minimise
their own exposure to
risk of waste;
and academics who already have permanent jobs,
who are free to select the best candidate
on a per-project basis,
which is beneficial
since their own careers
depend on
a steady stream of
grants and publications.</p>

<aside>
  <blockquote class="pull-quote">
    Those willing and able to
    accept their ordained rôle
    as itinerant scientists
    survive;
    those unwilling or unable
    do not.
  </blockquote>
</aside>

<p>This is but a hasty example—I am sure somebody else
far more intelligent and skilful than myself
could
continue,
refine,
and expand upon this kind of reasoning,
and I would be keen to see the results.
The gist,
however,
is that
nobody involved is doing anything wrong—everyone
is simply behaving as a rational œconomic agent—but
these factors exert a certain selection pressure
on the academic community.
Those willing and able to
accept their ordained rôle
as itinerant scientists
survive;
those unwilling or unable
do not.</p>

<p>The question we must ask is:
Is this really what we want to select for?
Do we have reason to believe that these are the best scientists?
Perhaps they are the most dedicated
(for a certain definition of dedication),
and this may well be correlated with excellence in other metrics,
but if excellence is what we want,
we’re looking at the long tails of the distribution of human traits,
and <a href="https://medium.com/@gjlewis/why-the-tails-come-apart-a6817e92651c">the tails come apart</a>.
Selecting scientists based on dedication
is like selecting basketball players based on height—you’ll
probably pick better than a lottery,
but you’d be better off
actually watching them play basketball.</p>

<p>The inevitable followup is:
Could science of a similar standard,
or higher,
be done
differently?
Could we make the profession
more accessible
to those with families,
other commitments,
and mental or physical health issues?
I think the answer may be yes.
And while
it’s not my place to do so,
I’d like to note that
family commitments
and increased risk aversion
are more common
in underrepresented groups;
perhaps making the profession
more stable and accessible
will do more good
for representation in the academy
than any fair
or outreach effort
ever could.</p>

<h3 id="counterpoint">Counterpoint</h3>

<p>This decision has consumed my consciousness
for most of the last year,
at the very least,
and I do not take it lightly.
Some people will inevitably
think I am making the wrong decision.
Many who have made the same decision
<a href="http://www.marcelhaas.com/index.php/2020/12/16/i-regret-quitting-astrophysics/">have come to regret
it</a>—after all,
the grass is always greener.
I have been wrong before
(like when I thought I wanted an academic career)
and I will be wrong again.</p>

<p>Perhaps I would enjoy an academic career
more than anything else.
Perhaps this system results in the best science.
Perhaps this is one case in which the tails don’t come apart,
and the most dedicated scientists,
those most willing to give their lives to the pursuit of knowledge,
really are the best.
Perhaps it is naïf
or egocentric
for me to assume that I am <em>entitled</em> to an academic job.
Perhaps I am being too picky
and the expectation that I willingly relocate
my home and family
for a temporary job is a perfectly reasonable one.
Perhaps I am simply lashing out,
blaming others for my own failure,
and in truth I am simply not good enough.
Perhaps all of these things are true.</p>

<aside>
  <blockquote class="pull-quote">
    Years of instability
    only weaken the positive exchange of ideas
    and shut out much of the diversity of minds
    from which we could otherwise benefit.
  </blockquote>
</aside>

<p>There are certainly advantages to moving around.
By exposing oneself to different working environments,
one is exposed to new ways of thinking,
new methods and approaches to problems,
and discussions with a broader range of people.
There is undoubtedly a great deal of value in that,
and it is very important
to see the world outside one’s bubble.
This is possibly the most convincing argument
in favour of the current system
of itinerant research,
and I don’t have a good response
other than that I think the same ends
can be achieved in other ways—perhaps
the “loaning” of permanent employees
to other research groups,
or simply much stronger collaborative networks.
All things considered,
I think
years of instability
only weaken the positive exchange of ideas
and shut out much of the diversity of minds
from which we could otherwise benefit.</p>

<p>It may also be beneficial to
be able to select the best candidate
for a given project—if the tails come apart,
the best postdoc for one grant
may not also be the best for the next.
I touched on this
when discussing incentives
in the previous section,
and agree that this is rational.
However,
I am not sure the magnitude of this benefit
outweighs the costs of the current system,
considering that
successive projects in the same research group
are unlikely to suddenly require a completely different
set of skills
that can’t be obtained through collaboration.
As well as the personal cost to researchers,
there is also a cost associated with
recruiting,
onboarding,
and integrating a new employee,
whereäs a permanent employee
is more likely to
be able to
hit the ground running
on a new project.</p>

<p>There may always be a need for consulting-type arrangements
and short-term contracts,
just as there is in industry,
but I dispute that there is likely to be
any significant benefit
from such arrangements as the norm.
Notice that
few other skilled jobs
rely on short-term labour
to the extent that academia does,
despite arguably stronger
incentives towards profitability.</p>

<p>I also acknowledge that
there are probably more people alive today
in a position to practise science
than at any point in history.
It is no longer necessary
to possess great wealth,
or the patronage of someone with wealth of their own,
to conduct research.
Not all that long ago,
I might have been expected to go and join some royal court
if I wished to do my simulations
(although I may have needed to invent the computer first).
Perhaps I have unreasonable expectations
about a system that has made so much progress already.</p>

<p>But none of this makes my life any easier.
Maybe my priorities are all muddled up,
but they remain my priorities.
I want to have a positive impact on the world,
but I also desire
stability,
financial security,
and a peaceful and rewarding home and family life.
I don’t think I can get that in academia,
but I think I can elsewhere.
Your priorities may differ,
but I have little expectation
that I am alone in my desire for stability,
given that the bulk of my twenties is now behind me.
Accepting instability
as a fresh-faced young graduate
with nothing to tie them down
is one thing—but
can we really expect people
to work this way
well into their thirties?<sup id="fnref:graduates"><a href="#fn:graduates" class="footnote" rel="footnote" role="doc-noteref">7</a></sup></p>

<p>I may well come to regret leaving academia,
but
on the balance of probabilities
I think I’d regret staying more.</p>

<h3 id="final-thoughts">Final thoughts</h3>

<p>I wish to reïterate
before closing
that I have no regrets
about the path that has led me here.
My <abbr title="la: Philosophiae Doctor (en: Doctor of Philosophy)">PhD</abbr> has been the privilege of my life,
and,
while I could perhaps have made better use
of the opportunities it presented to me,
I would not undo one iota of it.</p>

<p>I truly,
deeply
admire every single person
who manages to make academia work for them.
They have a mental fortitude
and flexibility
that I can only dream of emulating.
If you are among them,
thank you for the work
to which you have given your life—you
are enriching the global commons
and I salute you.
You are part of a chain
of scientists and philosophers
reaching all the way back to
Thales of Miletus<sup id="fnref:thales"><a href="#fn:thales" class="footnote" rel="footnote" role="doc-noteref">8</a></sup>
that I hope remains forever unbroken.</p>

<p>But there is a systemic problem.
The academy is losing some incredible minds
(of which I can assure you I am not one),
and making worse the lives of those it retains—the
<abbr title="United Kingdom of Great Britain and Northern Ireland">UK</abbr>’s academics have spent
<a href="https://en.wikipedia.org/wiki/2018%E2%80%932020_UK_higher_education_strikes">much of the last few years on strike</a>
over a set of disputes
about pay,
pensions,
and working conditions.
Whatever your thoughts on those disputes,
one thing is inarguable:
They are unhappy and
the system is failing.
No,
not everybody needs to be a scientist,
and not everybody should,
but we should make sure that the best people can,
whatever their background and personal circumstances.</p>

<p>I have said nothing in this essay
that I have not seen or heard myriad times
from others at all stages of their careers,
and I know I am not alone.
The academy is a global enterprise
and its problems
transcend national borders
every bit as much as do its successes.
The mental health of
our colleagues and friends
is suffering,
and so is our science.
Academic research
offers a unique opportunity
to observe the true beauty of creation in all its magnificent detail,
but we seem to have forgotten this.
Science will always have a place in my heart,
but our relationship is not a healthy one
and it’s time I moved on.</p>

<p>For the world is hollow and I have touched the sky.</p>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:york">
      <p>I would wholeheartedly recommend
   the
   <a href="https://www.york.ac.uk/">University of York</a>,
   and especially the
   <a href="https://agnesnoylab.wordpress.com/">Noy</a>
   and
   <a href="http://www.single-molecule-biophysics.org/">Leake</a>
   groups,
   to anybody
   looking for a position in the
   <a href="https://www.york.ac.uk/physics/research/physics-of-life/">physics of life</a>,
   and am happy to talk to anybody interested in joining them.
   Please don’t let this essay
   put you off
   following your dreams. <a href="#fnref:york" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:tripos">
      <p>Some
more jaded than myself
might prefer to call these
heads of the academic hydra. <a href="#fnref:tripos" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:harrison">
      <p>It may be unfair to credit this quote to Harrison,
as he adapted it from a poem entitled
“<a href="https://www.poetrynook.com/poem/all-things-pass">All Things Pass</a>”
by Dr Timothy Leary,
who in turn translated it from
an original by
Chinese philosopher Lao Tzu,
author of the
<a href="https://en.wikisource.org/wiki/Translation:Tao_Te_Ching"><em>Tao Te Ching</em></a>,
but it’s a good song
so I’m doing it. <a href="#fnref:harrison" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:affiliate">
      <p>This is an Amazon Associates link.
If you make a purchase,
I will earn a small commission.
This doesn’t affect my decision to discuss or recommend products,
but helps to very slightly offset
my hosting costs,
given that I display no adverts
on this website.
I can assure you that
this is not a profitable enterprise. <a href="#fnref:affiliate" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:genius">
      <p>Of course,
the narrative of the lone genius
revolutionising their field
is a well-worn
and overly simplistic one—science
is a story
of gradual,
incremental
refinement of knowledge
by a global collective of minds—but
I think the point holds. <a href="#fnref:genius" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:expectations">
      <p>How many times have the average <abbr title="la: Philosophiae Doctor (en: Doctor of Philosophy)">PhD</abbr> student’s grandparents
been shocked to discover that
one does not simply walk into <!-- mordor -->
a permanent academic job?
Perhaps
<a href="https://en.wikipedia.org/wiki/Ross_Geller">too</a>
<a href="https://en.wikipedia.org/wiki/Ted_Mosby">many</a>
<a href="https://en.wikipedia.org/wiki/The_Big_Bang_Theory">sitcoms</a>
have given them the wrong impression. <a href="#fnref:expectations" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:graduates">
      <p>I actually think
we expect too much mobility
and poorly calibrated “dedication”
of graduates too,
but I’m aware that
a great many graduates
seem willing to move
for whatever graduate scheme will take them
and my differing priorities
may render me an outlier here. <a href="#fnref:graduates" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:thales">
      <p><a href="https://en.wikipedia.org/wiki/Thales_of_Miletus">Thales of Miletus</a>
       (Greek: Θαλῆς ὁ Μιλήσιος),
       born <abbr title="la: circa (en: approximately)">c.</abbr> 626–623 BCE,
       was an ancient Greek philosopher
       known as the “Father of Science”
       for his theory that everything in existence
       is made of the same primary substance:
       water.
       He was,
       of course,
       incorrect,
       but this did not preclude him
       from being a major influence
       on later thinkers
       including Plato and Aristotle. <a href="#fnref:thales" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name></name></author><category term="science" /><category term="science" /><category term="academia" /><category term="personal" /><category term="essays" /><category term="jobs" /><summary type="html"><![CDATA[I love science. I love the academic environment and I love the work I do. I even think I’m quite good at it. But I’m leaving – I’m taking my PhD and heading to pastures new in the big, scary outside world, and I’d like to share my reasons.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://georgewatson.me/assets/post_images/signpost.jpg" /><media:content medium="image" url="https://georgewatson.me/assets/post_images/signpost.jpg" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Object-oriented processing of PDB files in Python</title><link href="https://georgewatson.me/blog/science/2019/06/04/object-oriented-processing-of-pdb-files-in-python/" rel="alternate" type="text/html" title="Object-oriented processing of PDB files in Python" /><published>2019-06-04T19:14:00+00:00</published><updated>2019-06-04T19:14:00+00:00</updated><id>https://georgewatson.me/blog/science/2019/06/04/object-oriented-processing-of-pdb-files-in-python</id><content type="html" xml:base="https://georgewatson.me/blog/science/2019/06/04/object-oriented-processing-of-pdb-files-in-python/"><![CDATA[<p>I recently encountered the surprisingly difficult task of processing a <abbr title="Protein Data Bank">PDB</abbr> file
in Python.
While reading fixed-width files is relatively trivial in certain old-fashioned
languages with support for complex <code class="language-plaintext highlighter-rouge">format</code> statements,
like my beloved Fortran,
Python makes it
<a href="https://stackoverflow.com/questions/4914008/how-to-efficiently-parse-fixed-width-files">less simple</a>.
Given the <a href="http://www.wwpdb.org/documentation/file-format">myriad record types</a>
that frequently appear in such files,
doing this processing on an ad-hoc basis seems like a terrible idea.
Thankfully,
the fields in these records lend themselves nicely to implementation as objects,
<a href="https://github.com/georgewatson/pdb_objects">so I went ahead and did just that</a>.</p>

<!--more-->

<hr />

<p>I’m by no means the first person to write a package for this
purpose—<a href="https://pypi.org/search/?q=pdb">searching <abbr title="Python Package Index">PyPI</abbr> for “pdb”</a>
turns up a lot of results, even after references to the similarly named debugger
are discarded.
These packages are uniformly impressive and provide powerful sets of functions
for fetching, modifying, and converting <abbr title="Protein Data Bank">PDB</abbr> files.
I would recommend many of them to friends and family.
But that’s not what I was looking for.</p>

<p>I wanted a simple, lightweight package that converts <abbr title="Protein Data Bank">PDB</abbr> records into Python
objects and back again,
something I can drop into a script when I want to perform some arbitrary
calculation of my own.</p>

<p>For example,
I wanted to find the heavy atom closest to a known point.
I’m sure many of the packages on <abbr title="Python Package Index">PyPI</abbr> can do that,
and <a href="https://github.com/Amber-MD/cpptraj">cpptraj</a>
almost certainly can.
But the first thing that came to mind was</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nf">min</span><span class="p">(</span><span class="n">atoms</span><span class="p">,</span>
    <span class="n">key</span><span class="o">=</span><span class="k">lambda</span> <span class="n">a</span><span class="p">:</span> <span class="n">a</span><span class="p">.</span><span class="nf">distance</span><span class="p">(</span><span class="n">point</span><span class="p">))</span>
</code></pre></div></div>

<p>Doesn’t that look beautiful?
I don’t want to install a whole package
or dig through the entire <abbr title="Assisted Model Building with Energy Refinement">AMBER</abbr> manual
to do something I already know how to do.
Reading the data in the first place was the only hard part.</p>
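
<p>To make that one-liner concrete,
here is a minimal, self-contained sketch.
The <code>Atom</code> class below is a hypothetical stand-in,
not the one from the package:
it assumes only a <code>distance()</code> method
and an element <code>symbol</code>,
and it treats “heavy” as meaning “anything other than hydrogen”.</p>

```python
import math
from dataclasses import dataclass


@dataclass
class Atom:
    """Hypothetical stand-in for the package's Atom (illustration only)."""
    name: str
    symbol: str
    x: float
    y: float
    z: float

    def distance(self, point):
        """Euclidean distance from this atom to an (x, y, z) tuple."""
        return math.dist((self.x, self.y, self.z), point)


def closest_heavy_atom(atoms, point):
    """Return the non-hydrogen atom nearest to the given point."""
    heavy = [a for a in atoms if a.symbol != "H"]
    return min(heavy, key=lambda a: a.distance(point))


atoms = [
    Atom("H1", "H", 0.0, 0.0, 0.1),  # nearest overall, but a hydrogen
    Atom("CA", "C", 1.0, 0.0, 0.0),
    Atom("O", "O", 3.0, 0.0, 0.0),
]
print(closest_heavy_atom(atoms, (0.0, 0.0, 0.0)).name)
```

<p>The hydrogen is skipped even though it is nearest,
so the carbon is returned.</p>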

<p>Here’s what the function to read in an <code class="language-plaintext highlighter-rouge">ATOM</code> record looks like:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">read_atom</span><span class="p">(</span><span class="n">line</span><span class="p">):</span>
    <span class="sh">"""</span><span class="s">
    Reads an ATOM or HETATM from a PDB file into an Atom object
    </span><span class="sh">"""</span>
    <span class="k">return</span> <span class="nc">Atom</span><span class="p">(</span><span class="n">record_type</span><span class="o">=</span><span class="n">line</span><span class="p">[:</span><span class="mi">6</span><span class="p">].</span><span class="nf">strip</span><span class="p">(),</span>
                <span class="n">num</span><span class="o">=</span><span class="nf">maybe_int</span><span class="p">(</span><span class="n">line</span><span class="p">[</span><span class="mi">6</span><span class="p">:</span><span class="mi">11</span><span class="p">]),</span>
                <span class="n">name</span><span class="o">=</span><span class="n">line</span><span class="p">[</span><span class="mi">12</span><span class="p">:</span><span class="mi">16</span><span class="p">].</span><span class="nf">strip</span><span class="p">(),</span>
                <span class="n">alt_location</span><span class="o">=</span><span class="n">line</span><span class="p">[</span><span class="mi">16</span><span class="p">:</span><span class="mi">17</span><span class="p">].</span><span class="nf">strip</span><span class="p">(),</span>
                <span class="n">residue</span><span class="o">=</span><span class="nc">Residue</span><span class="p">(</span><span class="n">name</span><span class="o">=</span><span class="n">line</span><span class="p">[</span><span class="mi">17</span><span class="p">:</span><span class="mi">21</span><span class="p">].</span><span class="nf">strip</span><span class="p">(),</span>
                                <span class="n">chain</span><span class="o">=</span><span class="n">line</span><span class="p">[</span><span class="mi">21</span><span class="p">:</span><span class="mi">22</span><span class="p">].</span><span class="nf">strip</span><span class="p">(),</span>
                                <span class="n">resid</span><span class="o">=</span><span class="nf">maybe_int</span><span class="p">(</span><span class="n">line</span><span class="p">[</span><span class="mi">22</span><span class="p">:</span><span class="mi">26</span><span class="p">]),</span>
                                <span class="n">insertion</span><span class="o">=</span><span class="n">line</span><span class="p">[</span><span class="mi">26</span><span class="p">:</span><span class="mi">27</span><span class="p">].</span><span class="nf">strip</span><span class="p">()),</span>
                <span class="n">coords</span><span class="o">=</span><span class="nc">Coords</span><span class="p">(</span><span class="n">x</span><span class="o">=</span><span class="nf">maybe_float</span><span class="p">(</span><span class="n">line</span><span class="p">[</span><span class="mi">30</span><span class="p">:</span><span class="mi">38</span><span class="p">]),</span>
                              <span class="n">y</span><span class="o">=</span><span class="nf">maybe_float</span><span class="p">(</span><span class="n">line</span><span class="p">[</span><span class="mi">38</span><span class="p">:</span><span class="mi">46</span><span class="p">]),</span>
                              <span class="n">z</span><span class="o">=</span><span class="nf">maybe_float</span><span class="p">(</span><span class="n">line</span><span class="p">[</span><span class="mi">46</span><span class="p">:</span><span class="mi">54</span><span class="p">])),</span>
                <span class="n">occupancy</span><span class="o">=</span><span class="nf">maybe_float</span><span class="p">(</span><span class="n">line</span><span class="p">[</span><span class="mi">54</span><span class="p">:</span><span class="mi">60</span><span class="p">]),</span>
                <span class="n">temp_factor</span><span class="o">=</span><span class="nf">maybe_float</span><span class="p">(</span><span class="n">line</span><span class="p">[</span><span class="mi">60</span><span class="p">:</span><span class="mi">66</span><span class="p">]),</span>
                <span class="n">segment</span><span class="o">=</span><span class="n">line</span><span class="p">[</span><span class="mi">72</span><span class="p">:</span><span class="mi">76</span><span class="p">].</span><span class="nf">strip</span><span class="p">(),</span>
                <span class="n">symbol</span><span class="o">=</span><span class="n">line</span><span class="p">[</span><span class="mi">76</span><span class="p">:</span><span class="mi">78</span><span class="p">].</span><span class="nf">strip</span><span class="p">(),</span>
                <span class="n">charge</span><span class="o">=</span><span class="n">line</span><span class="p">[</span><span class="mi">78</span><span class="p">:</span><span class="mi">80</span><span class="p">].</span><span class="nf">strip</span><span class="p">())</span>
</code></pre></div></div>

<p>While I’m sure there are neater ways,
that’s roughly as good as it gets.
And if I want to manipulate the structure and write out the result,
I can’t really discard very many of those fields.
It’s clearly better to outsource the ugliness to a friendly package
than to do that every time I want to read a <abbr title="Protein Data Bank">PDB</abbr>.</p>
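
<p>The <code>maybe_int</code> and <code>maybe_float</code> helpers
are not shown above.
A plausible sketch,
assuming they simply return <code>None</code>
when a fixed-width field is blank or malformed
(the package’s real behaviour may differ),
might look like this:</p>

```python
def maybe_int(text):
    """Convert a fixed-width field to int, or None if blank or malformed.

    Assumed behaviour for illustration; not copied from the package.
    """
    try:
        return int(text)  # int() tolerates surrounding whitespace
    except ValueError:
        return None


def maybe_float(text):
    """Convert a fixed-width field to float, or None if blank or malformed."""
    try:
        return float(text)
    except ValueError:
        return None


print(maybe_int("    1"))       # a padded serial number
print(maybe_int("-117"))        # a negative residue number
print(maybe_float(" 186.697"))  # a padded coordinate
print(maybe_int("     "))       # a blank field
```

<p>Returning <code>None</code> rather than raising
lets a record with empty optional columns
still be read into an object.</p>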

<p>The result is an <code class="language-plaintext highlighter-rouge">Atom</code> object, which looks a bit like this:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="sh">'</span><span class="s">record_type</span><span class="sh">'</span><span class="p">:</span> <span class="sh">'</span><span class="s">ATOM</span><span class="sh">'</span><span class="p">,</span>
 <span class="sh">'</span><span class="s">num</span><span class="sh">'</span><span class="p">:</span> <span class="mi">1</span><span class="p">,</span>
 <span class="sh">'</span><span class="s">name</span><span class="sh">'</span><span class="p">:</span> <span class="sh">"</span><span class="s">O5</span><span class="sh">'"</span><span class="p">,</span>
 <span class="sh">'</span><span class="s">alt_location</span><span class="sh">'</span><span class="p">:</span> <span class="sh">''</span><span class="p">,</span>
 <span class="sh">'</span><span class="s">residue</span><span class="sh">'</span><span class="p">:</span> <span class="p">{</span><span class="sh">'</span><span class="s">name</span><span class="sh">'</span><span class="p">:</span> <span class="sh">'</span><span class="s">DG</span><span class="sh">'</span><span class="p">,</span>
             <span class="sh">'</span><span class="s">chain</span><span class="sh">'</span><span class="p">:</span> <span class="sh">'</span><span class="s">A</span><span class="sh">'</span><span class="p">,</span>
             <span class="sh">'</span><span class="s">resid</span><span class="sh">'</span><span class="p">:</span> <span class="o">-</span><span class="mi">117</span><span class="p">,</span>
             <span class="sh">'</span><span class="s">insertion</span><span class="sh">'</span><span class="p">:</span> <span class="sh">''</span><span class="p">},</span>
 <span class="sh">'</span><span class="s">coords</span><span class="sh">'</span><span class="p">:</span> <span class="p">{</span><span class="sh">'</span><span class="s">x</span><span class="sh">'</span><span class="p">:</span> <span class="mf">186.697</span><span class="p">,</span>
            <span class="sh">'</span><span class="s">y</span><span class="sh">'</span><span class="p">:</span> <span class="mf">135.541</span><span class="p">,</span>
            <span class="sh">'</span><span class="s">z</span><span class="sh">'</span><span class="p">:</span> <span class="mf">228.518</span><span class="p">},</span>
 <span class="sh">'</span><span class="s">occupancy</span><span class="sh">'</span><span class="p">:</span> <span class="mf">1.0</span><span class="p">,</span>
 <span class="sh">'</span><span class="s">temp_factor</span><span class="sh">'</span><span class="p">:</span> <span class="mf">757.65</span><span class="p">,</span>
 <span class="sh">'</span><span class="s">segment</span><span class="sh">'</span><span class="p">:</span> <span class="sh">''</span><span class="p">,</span>
 <span class="sh">'</span><span class="s">symbol</span><span class="sh">'</span><span class="p">:</span> <span class="sh">'</span><span class="s">O</span><span class="sh">'</span><span class="p">,</span>
 <span class="sh">'</span><span class="s">charge</span><span class="sh">'</span><span class="p">:</span> <span class="sh">''</span><span class="p">}</span>
</code></pre></div></div>

<p>That’s much easier to work with.
You may have noticed that residues and coördinates are objects too,
so these elements can be easily shared across record types
and a load of powerful and expressive methods can be exposed.</p>

<p>The package currently (v0.1.2) supports the following record types:</p>
<ul>
  <li><code class="language-plaintext highlighter-rouge">ATOM</code></li>
  <li><code class="language-plaintext highlighter-rouge">HETATM</code></li>
  <li><code class="language-plaintext highlighter-rouge">TER</code></li>
  <li><code class="language-plaintext highlighter-rouge">HELIX</code></li>
  <li><code class="language-plaintext highlighter-rouge">SHEET</code></li>
</ul>

<p>A number of useful operators are intuitively defined.
Since PDBs are inherently ordered,
<code class="language-plaintext highlighter-rouge">&gt;</code> and <code class="language-plaintext highlighter-rouge">&lt;</code> do what you’d expect.
Equality is defined using the unique (in a well-formed <abbr title="Protein Data Bank">PDB</abbr>) identifiers,
so two objects representing the same atom/terminator/structural element are
equal even if their other properties are not (or are undefined).
The <code class="language-plaintext highlighter-rouge">__contains__</code> method is defined for residues,
so that <code class="language-plaintext highlighter-rouge">if atom in residue</code> is a valid construct.
Coördinates can be added to or subtracted from one another,
and multiplied or divided by scalars.</p>
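
<p>As a rough illustration of how operators like these can be defined,
here is a sketch using dataclasses.
All three classes are hypothetical simplifications
rather than the package’s own code,
and the sketch assumes that atoms are ordered and compared
by serial number alone.</p>

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Coords:
    """Hypothetical coordinate triple supporting vector arithmetic."""
    x: float
    y: float
    z: float

    def __add__(self, other):
        return Coords(self.x + other.x, self.y + other.y, self.z + other.z)

    def __sub__(self, other):
        return Coords(self.x - other.x, self.y - other.y, self.z - other.z)

    def __mul__(self, scalar):
        return Coords(self.x * scalar, self.y * scalar, self.z * scalar)

    def __truediv__(self, scalar):
        return Coords(self.x / scalar, self.y / scalar, self.z / scalar)


@dataclass(frozen=True)
class Residue:
    """Hypothetical residue identified by name, chain and number."""
    name: str
    chain: str
    resid: int

    def __contains__(self, atom):
        # An atom is "in" a residue if its own residue record matches
        return atom.residue == self


@dataclass(order=True)
class Atom:
    """Hypothetical atom; only num takes part in equality and ordering."""
    num: int
    name: str = field(compare=False)
    residue: Residue = field(compare=False)
    coords: Coords = field(compare=False)


dg = Residue("DG", "A", -117)
a = Atom(1, "O5'", dg, Coords(1.0, 2.0, 3.0))
b = Atom(2, "C5'", dg, Coords(3.0, 2.0, 1.0))

midpoint = (a.coords + b.coords) / 2
print(midpoint)
print(a in dg)
print(sorted([b, a])[0].name)
print(a == Atom(1, "O5'", dg, Coords(0.0, 0.0, 0.0)))
```

<p>Marking every field except the serial number
with <code>compare=False</code>
is what makes two records for the same atom equal
even when their coördinates differ.</p>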

<p>Printing any of the objects (or otherwise casting it to a string) results in a
correctly formatted <abbr title="Protein Data Bank">PDB</abbr> record like this one:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ATOM      1 O5'   DG A-117     186.697 135.541 228.518  1.00757.65           O
</code></pre></div></div>

<p>So <code class="language-plaintext highlighter-rouge">print(*records, sep='\n')</code> gives you a <abbr title="Protein Data Bank">PDB</abbr>,
as long as <code class="language-plaintext highlighter-rouge">records</code> is a list of objects
like the one the handy <code class="language-plaintext highlighter-rouge">read_pdb()</code> function gives you.</p>
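
<p>For the curious,
fixed-width output of this kind can be produced
with Python’s format specifications.
The function below is only a sketch:
<code>format_atom</code> and its argument list are hypothetical,
and the package’s own string conversion may well be implemented differently.</p>

```python
def format_atom(record_type, num, name, alt_location, res_name, chain, resid,
                insertion, x, y, z, occupancy, temp_factor, segment, symbol,
                charge):
    """Lay out one ATOM/HETATM record across the 80 PDB columns."""
    return (
        f"{record_type:6}"            # columns 1-6, left-justified
        f"{num:5d} "                  # columns 7-11, then a blank
        f"{name:4}"                   # columns 13-16
        f"{alt_location:1}"           # column 17
        f"{res_name.rjust(3)} "       # columns 18-20, then a blank
        f"{chain:1}"                  # column 22
        f"{resid:4d}"                 # columns 23-26
        f"{insertion:1}   "           # column 27, then three blanks
        f"{x:8.3f}{y:8.3f}{z:8.3f}"   # columns 31-54
        f"{occupancy:6.2f}{temp_factor:6.2f}"  # columns 55-66
        "      "                      # columns 67-72, unused
        f"{segment:4}"                # columns 73-76
        f"{symbol.rjust(2)}"          # columns 77-78
        f"{charge:2}"                 # columns 79-80
    )


line = format_atom("ATOM", 1, "O5'", "", "DG", "A", -117, "",
                   186.697, 135.541, 228.518, 1.0, 757.65, "", "O", "")
print(line.rstrip())
```

<p>Strings are left-justified and numbers right-justified by default,
so most fields need nothing more than a width.</p>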

<p>This isn’t a fully-fledged <abbr title="Protein Data Bank">PDB</abbr>-manipulating package.
It’s not trying to be.
But it has made my life a bit easier,
and maybe it will help you too.</p>

<p>The source code is available
(under an <a href="https://github.com/georgewatson/pdb_objects/blob/master/LICENSE"><abbr title="Massachusetts Institute of Technology">MIT</abbr> license</a>,
so you can do whatever you want with it)
<a href="https://github.com/georgewatson/pdb_objects">on GitHub</a>,
and the package is <a href="https://pypi.org/project/pdb-objects/">on <abbr title="Python Package Index">PyPI</abbr></a>,
so you can just</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>pip3 <span class="nb">install </span>pdb-objects
</code></pre></div></div>

<p>then stick</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="n">pdb_objects</span>
</code></pre></div></div>

<p>at the top of your script and go get a coffee.
You deserve it.</p>]]></content><author><name></name></author><category term="science" /><category term="science" /><category term="technology" /><category term="python" /><category term="pdb" /><category term="biophysics" /><category term="object-oriented-programming" /><category term="projects" /><category term="python-packages" /><summary type="html"><![CDATA[I recently encountered the surprisingly difficult task of processing a PDB file in Python. While reading fixed-width files is relatively trivial in certain old-fashioned languages with support for complex format statements, like my beloved Fortran, Python makes it less simple. Given the myriad record types that frequently appear in such files, doing this processing on an ad-hoc basis seems like a terrible idea. Thankfully, the fields in these records lend themselves nicely to implementation as objects, so I went ahead and did just that.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://georgewatson.me/assets/post_images/object-oriented-processing-of-pdb-files-in-python.png" /><media:content medium="image" url="https://georgewatson.me/assets/post_images/object-oriented-processing-of-pdb-files-in-python.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">My poster at the IoP “Physics of Microorganisms II” conference, London, 8 April 2019</title><link href="https://georgewatson.me/blog/science/2019/04/05/my-poster-at-the-iop-physics-of-microorganisms-ii-conference-london-8-april-2019/" rel="alternate" type="text/html" title="My poster at the IoP “Physics of Microorganisms II” conference, London, 8 April 2019" /><published>2019-04-05T12:35:13+00:00</published><updated>2019-04-05T12:35:13+00:00</updated><id>https://georgewatson.me/blog/science/2019/04/05/my-poster-at-the-iop-physics-of-microorganisms-ii-conference-london-8-april-2019</id><content type="html" 
xml:base="https://georgewatson.me/blog/science/2019/04/05/my-poster-at-the-iop-physics-of-microorganisms-ii-conference-london-8-april-2019/"><![CDATA[<p>I will be attending the Institute of Physics “<a href="https://www.iopconferences.org/iop/frontend/reg/thome.csp?pageID=785982&amp;eventID=1271&amp;traceRedir=4">Physics of
Microorganisms
II</a>”
conference in London on Monday, with my poster entitled “<strong>Atomistic
simulations unveil the influence of DNA topology on IHF–DNA
interaction</strong>”.</p>

<!--more-->

<hr />

<p><strong>Abstract:</strong></p>

<blockquote>
  <p>IHF is a nucleoid-associated DNA-binding protein that bends DNA by up
to 160 degrees and is known to be vital to the stability of bacterial
biofilms. Through atomistic molecular dynamics simulations of IHF
bound to supercoiled DNA minicircles and linear DNA constructs, the
DNA–IHF interaction is studied in unprecedented detail. Novel
observations are made of key features of this interaction, including
the existence of two clearly distinct binding modes and the formation
of bridges between distal DNA sites by IHF, forming closed topological
domains. Unlike canonical IHF binding, this bridging appears to be
nonspecific, and may be the mechanism by which IHF stabilises crossing
points in the extracellular DNA matrix in biofilms. Furthermore, IHF
binding both modulates and is modulated by DNA topology, leading to a
complex interplay that regulates the bacterial genome and allows
regulatory information to be communicated over long distances. Further
simulations involving IHF, related proteins, and DNA sequences with
multiple binding sites, will soon converge with parallel
single-molecule experiments to shed more light on the mechanisms
underlying this biologically relevant process and the interactions
between nucleoid-associated proteins and DNA.</p>

  <p>Watson G D, Leake M C, Noy A</p>
</blockquote>

<p>If you’re attending too, find me at board <strong>P23</strong>.</p>

<p>If you’re not, you don’t need to miss out: <a href="https://georgewatson.me/dl/2019-04-08_poster.pdf">A PDF version of my poster
is available online</a>.</p>]]></content><author><name></name></author><category term="science" /><category term="conferences" /><category term="posters" /><category term="science" /><category term="biophysics" /><category term="personal" /><category term="ihf" /><summary type="html"><![CDATA[I will be attending the Institute of Physics “Physics of Microorganisms II” conference in London on Monday, with my poster entitled “Atomistic simulations unveil the influence of DNA topology on IHF–DNA interaction”.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://georgewatson.me/assets/post_images/2019-04-08_poster_cropped.png" /><media:content medium="image" url="https://georgewatson.me/assets/post_images/2019-04-08_poster_cropped.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry></feed>