1
00:00:00,000 --> 00:00:08,000
So, today's lecture, as you can
see up there, is molecular --

2
00:00:08,000 --> 00:00:16,000
evolution, and ecology.

3
00:00:16,000 --> 00:00:23,000
And what I mean by this,

4
00:00:23,000 --> 00:00:28,000
it's basically the study or what we
try to figure out in molecular

5
00:00:28,000 --> 00:00:34,000
evolution and ecology is what genes
or gene sequences can tell us about

6
00:00:34,000 --> 00:00:39,000
the evolution and ultimately also
the ecology of organisms

7
00:00:39,000 --> 00:00:44,000
in the environment.
And it's particularly relevant for

8
00:00:44,000 --> 00:00:48,000
thinking about microorganisms,
prokaryotes in the environment.

9
00:00:48,000 --> 00:00:53,000
And I hope I can actually convince
you today of that.

10
00:00:53,000 --> 00:00:57,000
This is interesting.
The topics that I want to cover

11
00:00:57,000 --> 00:01:02,000
today are, first of all,
I want to review a little bit what

12
00:01:02,000 --> 00:01:06,000
we know about life on Earth,
sort of give an overview of the

13
00:01:06,000 --> 00:01:11,000
evolution of life on Earth.
Then, I want to go into a specific

14
00:01:11,000 --> 00:01:15,000
topic that's of particular relevance
for the evolution of eukaryotes.

15
00:01:15,000 --> 00:01:19,000
That's the endosymbiosis theory.
And then I'll explain how we can

16
00:01:19,000 --> 00:01:23,000
use gene sequences to actually
reconstruct events that have

17
00:01:23,000 --> 00:01:27,000
happened a very, very
long time ago.

18
00:01:27,000 --> 00:01:31,000
OK, so we'll look at what we call
molecular phylogenies,

19
00:01:31,000 --> 00:01:35,000
that is, the use of gene sequences to
reconstruct the evolutionary history

20
00:01:35,000 --> 00:01:40,000
of organisms on Earth.
Derived from that, we'll look at

21
00:01:40,000 --> 00:01:44,000
what we call the tree of life.
That's sort of the big picture

22
00:01:44,000 --> 00:01:49,000
overview of the evolutionary
relationships of all organisms on

23
00:01:49,000 --> 00:01:53,000
the planet.  And then finally,
I'll introduce you to a topic called

24
00:01:53,000 --> 00:01:59,000
molecular ecology.
Again, that's how we can use gene

25
00:01:59,000 --> 00:02:05,000
sequences to learn something about
the diversity of microorganisms in

26
00:02:05,000 --> 00:02:11,000
the environment. That will lead us then,
next time, when I come back on

27
00:02:11,000 --> 00:02:18,000
Monday, into this big topic of
environmental genomics,

28
00:02:18,000 --> 00:02:24,000
how we can actually expand this
analysis to learn much more about

29
00:02:24,000 --> 00:02:30,000
organisms in the environment.
So, first of all, let's look at

30
00:02:30,000 --> 00:02:38,000
life on Earth.
Does anybody know how old we think

31
00:02:38,000 --> 00:02:48,000
Earth is?  Say again?
Yeah, 4.5 to 4.6. I have in my

32
00:02:48,000 --> 00:02:58,000
notes 4.6.  So,
Earth is thought to have originated

33
00:02:58,000 --> 00:03:08,000
about 4.6 billion years ago.
When did the first solid rocks

34
00:03:08,000 --> 00:03:19,000
appear on earth?
So, when was the surface kind of

35
00:03:19,000 --> 00:03:30,000
solidified?  Anybody know?
About 3.9 billion years ago, OK?

36
00:03:30,000 --> 00:03:40,000
And when do we think life started to
develop on the planet?

37
00:03:40,000 --> 00:03:50,000
Any ideas?  Take a guess.
Two?  One?  3.5 billion years ago,

38
00:03:50,000 --> 00:04:00,000
OK?  So, this is really
remarkable.

39
00:04:00,000 --> 00:04:04,000
We think it didn't,
I mean, of course it took a long

40
00:04:04,000 --> 00:04:09,000
time because we're talking about
millions of years and hundreds of

41
00:04:09,000 --> 00:04:13,000
millions of years,
but still, if you look at the big

42
00:04:13,000 --> 00:04:18,000
picture, it didn't actually take
life that long to evolve on the

43
00:04:18,000 --> 00:04:23,000
planet.  So, why do we think that is
the case?  What's the evidence for

44
00:04:23,000 --> 00:04:27,000
that?  Well, we look into
sedimentary rocks,

45
00:04:27,000 --> 00:04:32,000
so old rocks that arose from
sediments. What you find around this

46
00:04:32,000 --> 00:04:37,000
time is that chemicals start
to appear, organic molecules that

47
00:04:37,000 --> 00:04:42,000
really resemble organic molecules
in modern life.

48
00:04:42,000 --> 00:04:47,000
So, we have sort of chemical tracers,
or chemical fossils.

49
00:04:47,000 --> 00:05:01,000
So, tracers that indicate the

50
00:05:01,000 --> 00:05:09,000
presence of organisms.
But what we also find is so-called

51
00:05:09,000 --> 00:05:17,000
micro-fossils,
and I have a picture of that here

52
00:05:17,000 --> 00:05:25,000
where when you actually take rocks
and actually slice them into very,

53
00:05:25,000 --> 00:05:33,000
very thin slices, you can put them
under specific microscopes.

54
00:05:33,000 --> 00:05:37,000
And what you then find is that many
rocks that are very,

55
00:05:37,000 --> 00:05:42,000
very old, have those kinds of
inclusions in them.

56
00:05:42,000 --> 00:05:47,000
And these things really resemble
very much modern prokaryotic cells,

57
00:05:47,000 --> 00:05:52,000
modern bacterial cells, for example.
And so, those micro-fossils are

58
00:05:52,000 --> 00:05:57,000
generally taken as an indication,
also, that life was already present

59
00:05:57,000 --> 00:06:02,000
during those times.
Now, when we take a quick sort of

60
00:06:02,000 --> 00:06:08,000
overview of the evolution of life on
the planet, again this graph here

61
00:06:08,000 --> 00:06:13,000
summarizes sort of the last 4.
billion years or so when life

62
00:06:13,000 --> 00:06:19,000
originated.  We see that there was a
period of chemical evolution,

63
00:06:19,000 --> 00:06:24,000
and then somewhere here that region,
it's, of course, not really well

64
00:06:24,000 --> 00:06:30,000
understood when that exactly happens,
the origin of life is placed.

65
00:06:30,000 --> 00:06:34,000
But I want to alert you to a couple
of really, really critical steps

66
00:06:34,000 --> 00:06:39,000
here that are shown on this graph
which we'll actually talk more about.

67
00:06:39,000 --> 00:06:44,000
It is thought that life very early
on split into three major

68
00:06:44,000 --> 00:06:49,000
lineages: the bacteria,
the archaea, and what is called here

69
00:06:49,000 --> 00:06:54,000
the nuclear line.  And I'll come back to
that in a minute or so.

70
00:06:54,000 --> 00:06:59,000
Then, a further major event which
you may remember is oxygenic

71
00:06:59,000 --> 00:07:04,000
photosynthesis actually evolved --
-- which means that cyanobacteria

72
00:07:04,000 --> 00:07:08,000
evolved that started to produce
oxygen as a byproduct of

73
00:07:08,000 --> 00:07:12,000
photosynthesis.
And that really fundamentally

74
00:07:12,000 --> 00:07:16,000
changed the chemistry of the Earth.
It actually became an oxidizing

75
00:07:16,000 --> 00:07:20,000
atmosphere.  And what you see here
is, once the oxygen concentration

76
00:07:20,000 --> 00:07:24,000
goes over a certain level,
it allowed the development of an

77
00:07:24,000 --> 00:07:28,000
ozone shield.  Now, what
does that mean?

78
00:07:28,000 --> 00:07:33,000
What was the critical significance
of the presence of an ozone shield?

79
00:07:33,000 --> 00:07:38,000
Does anybody know?  What does it
block out?  Anybody remember that?

80
00:07:38,000 --> 00:07:43,000
What's the big significance of the
ozone hole over Antarctica for

81
00:07:43,000 --> 00:07:48,000
example?  It allows UV radiation to
hit the Earth's surface,

82
00:07:48,000 --> 00:07:53,000
and in fact if there were no ozone,
the UV radiation would be so strong

83
00:07:53,000 --> 00:07:59,000
that there would be no life
possible on land.

84
00:07:59,000 --> 00:08:03,000
So, once the ozone shield actually
developed, organisms could conquer,

85
00:08:03,000 --> 00:08:08,000
basically, the land surface and
settle on the land surface.

86
00:08:08,000 --> 00:08:13,000
And this, then, is thought to be at
least correlated with the

87
00:08:13,000 --> 00:08:18,000
development of endosymbiosis.
And I'll explain what I mean by

88
00:08:18,000 --> 00:08:22,000
that.  But it basically led to the
origin of modern eukaryotes,

89
00:08:22,000 --> 00:08:27,000
so your ancestors essentially.
But there was still a long time,

90
00:08:27,000 --> 00:08:33,000
obviously, until humans appeared.
We have here the origin of animals

91
00:08:33,000 --> 00:08:39,000
and metazoans,
and then the age of the dinosaurs is

92
00:08:39,000 --> 00:08:45,000
already a very small blip here on
this graph.  And humans don't even

93
00:08:45,000 --> 00:08:51,000
get featured on that because we are
so recent.  So,

94
00:08:51,000 --> 00:08:57,000
but what I want to show you here is
that three major lineages

95
00:08:57,000 --> 00:09:05,000
evolved early on.
These are the bacteria,

96
00:09:05,000 --> 00:09:15,000
archaea, and what we call a nuclear
lineage.  And the significance of

97
00:09:15,000 --> 00:09:25,000
this nuclear lineage is that it
basically combined with bacteria to

98
00:09:25,000 --> 00:09:35,000
form the modern eukaryotic cell.
So, the eukarya, or eukaryotes

99
00:09:35,000 --> 00:09:50,000
they're also called.
And it was this combination that we

100
00:09:50,000 --> 00:10:02,000
call the endosymbiosis event.
I want to explain this a little bit

101
00:10:02,000 --> 00:10:07,000
more, and then I'll show you finally
why we actually know that those

102
00:10:07,000 --> 00:10:12,000
things are very likely to have
occurred a long time ago.

103
00:10:12,000 --> 00:10:17,000
Yes?  It means the bacteria and the
nuclear lineages combined to form a

104
00:10:17,000 --> 00:10:22,000
eukaryote, OK?
And I'm actually going to explain

105
00:10:22,000 --> 00:10:27,000
this on the slide here.
So, if you have any more questions

106
00:10:27,000 --> 00:10:32,000
after that, please let me know.
So, again, this shows you this early

107
00:10:32,000 --> 00:10:38,000
evolution, this early split into
archaea, bacteria,

108
00:10:38,000 --> 00:10:44,000
and this sort of nuclear line.
It is thought that this nuclear

109
00:10:44,000 --> 00:10:50,000
line consisted of single-celled
organisms that increased in cell

110
00:10:50,000 --> 00:10:56,000
size, and then developed or
partitioned the DNA into a nucleus,

111
00:10:56,000 --> 00:11:02,000
basically.  So exactly how you find
it in modern eukaryotic cells.

112
00:11:02,000 --> 00:11:07,000
But then what happened is the cell
took up a bacterial cell,

113
00:11:07,000 --> 00:11:12,000
and over time this bacterial cell
became a symbiont.

114
00:11:12,000 --> 00:11:17,000
In fact, it became the mitochondrion.
And so what this mitochondrion now

115
00:11:17,000 --> 00:11:22,000
does in the modern eukaryotic cell
as you all know is it really took

116
00:11:22,000 --> 00:11:27,000
over the energy metabolism.
So, the proto-eukaryotic cell took

117
00:11:27,000 --> 00:11:33,000
up a heterotrophic bacterium that
formed the mitochondria.

118
00:11:33,000 --> 00:11:37,000
And this ultimately then gave rise
to protozoa and to modern-day

119
00:11:37,000 --> 00:11:42,000
animals.  But there was a secondary
symbiotic event.

120
00:11:42,000 --> 00:11:46,000
This cell, once it had taken up a
heterotrophic bacterium,

121
00:11:46,000 --> 00:11:51,000
it took up an autotrophic bacterium,
a cyanobacterium, an oxygenic

122
00:11:51,000 --> 00:11:55,000
photosynthesizer.
And this actually led to the

123
00:11:55,000 --> 00:12:00,000
development of modern algae
and modern plants.

124
00:12:00,000 --> 00:12:08,000
So what we can say is that
mitochondria are ancient

125
00:12:08,000 --> 00:12:24,000
heterotrophic bacteria --

126
00:12:24,000 --> 00:12:36,000
And the chloroplasts are ancient
cyanobacteria,

127
00:12:36,000 --> 00:12:48,000
so, oxygenic, photosynthetic
bacteria.  And these obviously have

128
00:12:48,000 --> 00:13:00,000
coevolved to then form animals and
finally your plants.

129
00:13:00,000 --> 00:13:06,000
So now, obviously we are talking
here about events that happened a

130
00:13:06,000 --> 00:13:13,000
very, very long time ago.
And so, the big question is really

131
00:13:13,000 --> 00:13:19,000
how do we really know this?
But this takes me to the third

132
00:13:19,000 --> 00:13:26,000
topic, which is that of molecular
evolution.  So, we can state

133
00:13:26,000 --> 00:13:34,000
the problem again,
and that is, very simply put,

134
00:13:34,000 --> 00:13:42,000
evolution is incredibly slow,
OK?  And therefore, its processes

135
00:13:42,000 --> 00:14:01,000
are not directly observable.

136
00:14:01,000 --> 00:14:05,000
And we need to actually use
inference techniques to reconstruct

137
00:14:05,000 --> 00:14:10,000
evolutionary processes.
Now, what do we use when we want to

138
00:14:10,000 --> 00:14:15,000
reconstruct the evolutionary history
of animals and plants usually?

139
00:14:15,000 --> 00:14:20,000
Anybody?  Fossils.  Exactly.  So
you take a shovel,

140
00:14:20,000 --> 00:14:25,000
essentially, and dig down into the
different layers.

141
00:14:25,000 --> 00:14:30,000
And there are different techniques
with which you can actually determine the

142
00:14:30,000 --> 00:14:34,000
age of different sedimentary rocks,
for example. And then you can

143
00:14:34,000 --> 00:14:38,000
construct, if you're lucky,
you'll find enough fossils of a

144
00:14:38,000 --> 00:14:42,000
particular lineage.
You can reconstruct the evolution

145
00:14:42,000 --> 00:14:45,000
of the lineage.
I'm sure you all have seen the

146
00:14:45,000 --> 00:14:49,000
example of the horse,
for example, where we have actually

147
00:14:49,000 --> 00:14:53,000
quite good evidence of what ancient
horses looked like.

148
00:14:53,000 --> 00:14:57,000
And we can reconstruct the sequence
of events that led to the evolution

149
00:14:57,000 --> 00:14:59,000
of modern-day horses.
Now, you can imagine,

150
00:14:59,000 --> 00:14:59,000
though, that when we talk about
ancient events like these, there
really is no fossil record.
OK, so what people have figured out,
and that was really a
stroke of genius that came about in
the late 60s, is that DNA molecules can
act as evolutionary chronometers.

151
00:15:00,000 --> 00:15:44,000
OK, now what do I mean by that?

152
00:15:44,000 --> 00:15:48,000
I mean that you can take DNA
sequences or gene sequences from

153
00:15:48,000 --> 00:15:53,000
different kinds of organisms.
Based on those gene sequences you

154
00:15:53,000 --> 00:15:58,000
can reconstruct the relationships to
each other.  You can determine

155
00:15:58,000 --> 00:16:02,000
whether two organisms are closely
related or whether they are only

156
00:16:02,000 --> 00:16:14,000
very distantly related.
And the underlying mechanism of that,

157
00:16:14,000 --> 00:16:33,000
is that mutations happen with a
certain probability all the time.

158
00:16:33,000 --> 00:16:41,000
So, the idea is that as time passes
on, DNA molecules will change.

159
00:16:41,000 --> 00:16:50,000
So they will accumulate, actually,
mutations, and so this will lead to,

160
00:16:50,000 --> 00:16:59,000
and the idea is that the amount
of change in a particular DNA

161
00:16:59,000 --> 00:17:08,000
sequence is proportional to the time
of separate evolution of two

162
00:17:08,000 --> 00:17:17,000
different lineages or two
different organisms.

163
00:17:17,000 --> 00:17:26,000
So, the amount is more or
less proportional --

164
00:17:26,000 --> 00:17:38,000
-- to time since the last

165
00:17:38,000 --> 00:17:54,000
common ancestor.
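The proportionality just described can be sketched in a few lines. This is a minimal illustration that assumes a constant substitution rate (a strict clock); the function name and the numbers are hypothetical and do not come from the lecture.

```python
def divergence_time(distance, rate_per_site_per_year):
    """Estimate the time since the last common ancestor.

    Assumes a strict molecular clock: both lineages accumulate
    changes independently at the same rate, so the distance between
    them corresponds to 2 * rate * time.
    """
    return distance / (2.0 * rate_per_site_per_year)


# Hypothetical values: 0.02 substitutions per site between two
# sequences, and an assumed rate of 1e-9 per site per year.
years = divergence_time(0.02, 1e-9)
print(f"{years:.2e} years")
```

The factor of two reflects that changes accumulate along both lineages after the split.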

166
00:17:54,000 --> 00:18:05,000
So, let me explain how this is
actually done.

167
00:18:05,000 --> 00:18:16,000
What you really need in order to do
this, is you need genes that are

168
00:18:16,000 --> 00:18:27,000
related to each other,
OK?  So, genes, they need to be

169
00:18:27,000 --> 00:18:34,000
universally distributed.
That means all organisms that you

170
00:18:34,000 --> 00:18:37,000
want to compare need to have this
type of gene.  And,

171
00:18:37,000 --> 00:18:41,000
those genes need to have conserved
function.

172
00:18:41,000 --> 00:18:52,000
And these genes,

173
00:18:52,000 --> 00:18:57,000
we can then compare to each other,
and I will explain how this is

174
00:18:57,000 --> 00:19:02,000
actually done.  Any
questions so far?

175
00:19:02,000 --> 00:19:06,000
OK, so the example that I actually
want to bring is the 16S

176
00:19:06,000 --> 00:19:26,000
ribosomal RNA genes.

177
00:19:26,000 --> 00:19:35,000
We oftentimes abbreviate this rRNA.
Now, does anybody remember what the

178
00:19:35,000 --> 00:19:44,000
ribosomal RNAs are and do?
What's the ribosome?  Yes?

179
00:19:44,000 --> 00:19:53,000
Right, and what does it do?
Exactly, it's the location where

180
00:19:53,000 --> 00:20:02,000
messenger RNA is translated
into protein.

181
00:20:02,000 --> 00:20:06,000
Now, the ribosomal RNAs are an
integral part of the ribosome.

182
00:20:06,000 --> 00:20:10,000
They play both a catalytic role as
well as a structural role in the

183
00:20:10,000 --> 00:20:14,000
ribosome.  And so,
fundamentally, because this is such

184
00:20:14,000 --> 00:20:18,000
a fundamental organelle,
all living organisms possess it.

185
00:20:18,000 --> 00:20:22,000
So, all organisms have it.  So this
allows us to use these genes to

186
00:20:22,000 --> 00:20:26,000
really compare all living organisms
to each other.

187
00:20:26,000 --> 00:20:30,000
OK, so this is a very important
point.

188
00:20:30,000 --> 00:20:34,000
I wanted to show you a,
OK, if it wakes up.  There we go.

189
00:20:34,000 --> 00:20:39,000
An example of these ribosomal RNA
genes, now this is actually,

190
00:20:39,000 --> 00:20:43,000
what you see here is a secondary
structure of the actual RNA,

191
00:20:43,000 --> 00:20:48,000
the ribosomal RNA.  Now, these
molecules have a secondary structure

192
00:20:48,000 --> 00:20:52,000
because they play a catalytic and
structural role.

193
00:20:52,000 --> 00:20:57,000
And so, the really amazing thing is
when you look at the structure,

194
00:20:57,000 --> 00:21:01,000
the structure determines really the
function of those molecules in

195
00:21:01,000 --> 00:21:06,000
different organisms.
And then look at this.

196
00:21:06,000 --> 00:21:10,000
We have here a bacterium,
and here an archaeon.  Now,

197
00:21:10,000 --> 00:21:14,000
if you think back to the first
couple of slides,

198
00:21:14,000 --> 00:21:18,000
what I showed you is that those
organisms have not shared a common

199
00:21:18,000 --> 00:21:22,000
evolutionary history for about four,
or so, billion years, or 3 billion

200
00:21:22,000 --> 00:21:26,000
years, excuse me.
But, if you just glance very

201
00:21:26,000 --> 00:21:30,000
quickly at the structures,
you see that they look very similar

202
00:21:30,000 --> 00:21:34,000
to each other.
So, there's an indication that the

203
00:21:34,000 --> 00:21:38,000
function is really very highly
conserved of those molecules.

204
00:21:38,000 --> 00:21:42,000
However, when you actually look at
the sequences in detail,

205
00:21:42,000 --> 00:21:46,000
what you'll find is that there's
different regions.

206
00:21:46,000 --> 00:21:50,000
And I've given some examples here
denoted by A, B,

207
00:21:50,000 --> 00:21:54,000
C in those molecules.
And these different regions of the

208
00:21:54,000 --> 00:21:58,000
molecules are really the key to its
usefulness in figuring out the

209
00:21:58,000 --> 00:22:02,000
evolution and ecology
of many organisms.

210
00:22:02,000 --> 00:22:06,000
The regions here
denoted by A are sequence

211
00:22:06,000 --> 00:22:10,000
stretches that are the same in all
living organisms.

212
00:22:10,000 --> 00:22:14,000
So they are universally conserved,
which means that if you get a

213
00:22:14,000 --> 00:22:19,000
mutation in a gene in that
particular region,

214
00:22:19,000 --> 00:22:23,000
you are dead.  OK, that's why it's
conserved essentially.

215
00:22:23,000 --> 00:22:27,000
Then we have those regions B where
the length is conserved,

216
00:22:27,000 --> 00:22:32,000
but the sequence is not.
So, there is sequence change

217
00:22:32,000 --> 00:22:36,000
allowed, but the length needs to be
conserved.  And then there's the

218
00:22:36,000 --> 00:22:40,000
region C where neither length nor
sequence is actually conserved,

219
00:22:40,000 --> 00:22:44,000
and where we get a lot of variation.
So, let me write this down.  We

220
00:22:44,000 --> 00:22:49,000
have three types of sequence
stretches.

221
00:22:49,000 --> 00:23:05,000
We have A, what I called the

222
00:23:05,000 --> 00:23:16,000
universally conserved sequences.
We have B where length, but not

223
00:23:16,000 --> 00:23:27,000
sequence is conserved.
And, we have C where neither length

224
00:23:27,000 --> 00:23:42,000
nor sequence is actually conserved.

225
00:23:42,000 --> 00:23:48,000
And the first two stretches,
the first two types of sequence

226
00:23:48,000 --> 00:23:55,000
stretches, are very important in
figuring out the phylogeny or the

227
00:23:55,000 --> 00:24:01,000
evolutionary relationships amongst
organisms.  Whereas the sequence

228
00:24:01,000 --> 00:24:08,000
stretches C, because they vary
so dramatically,

229
00:24:08,000 --> 00:24:15,000
are very important in identifying
organisms.

230
00:24:15,000 --> 00:24:19,000
And we'll talk more about this
actually next time.

231
00:24:19,000 --> 00:24:24,000
So what can we actually now do
with those sequences?

232
00:24:24,000 --> 00:24:29,000
Well, the first step is we need to
generate an alignment.

233
00:24:29,000 --> 00:24:51,000
OK, and this is actually shown here,

234
00:24:51,000 --> 00:24:55,000
where each row denotes a gene from a
particular organism.

235
00:24:55,000 --> 00:25:00,000
OK, so these are all abbreviated
here.

236
00:25:00,000 --> 00:25:04,000
These actually aren't ribosomal RNA
genes, but other genes.

237
00:25:04,000 --> 00:25:09,000
And what you will see here is
we can recognize those three

238
00:25:09,000 --> 00:25:13,000
different regions that I've pointed
out before.  You have the regions A

239
00:25:13,000 --> 00:25:18,000
which tell you which nucleotides
line up with each other,

240
00:25:18,000 --> 00:25:22,000
so you use this sort of as an anchor
because the sequences never vary

241
00:25:22,000 --> 00:25:27,000
amongst organisms.
And then there's the sequence region B,

242
00:25:27,000 --> 00:25:31,000
where you line up sequences that
vary or stretches that vary in

243
00:25:31,000 --> 00:25:36,000
sequence but not in length.
Now, why is this important?

244
00:25:36,000 --> 00:25:41,000
It's important because you have in
each column the nucleotides that

245
00:25:41,000 --> 00:25:47,000
have originated from a common
ancestral nucleotide,

246
00:25:47,000 --> 00:25:52,000
and whose variation over time you
can actually monitor.
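A small sketch of why aligned columns matter: once each column holds nucleotides descended from one ancestral position, you can simply ask which columns vary. The three sequences and organism names below are made up for illustration; they are not the genes on the slide.

```python
# Toy alignment: one row per organism, all rows the same length.
alignment = {
    "org1": "ACGTTGCA",
    "org2": "ACGATGCA",
    "org3": "ACGTTGGA",
}


def column_states(aln):
    """Return, for each column, the set of nucleotides observed in it."""
    rows = list(aln.values())
    return [set(column) for column in zip(*rows)]


def variable_columns(aln):
    """Return the indices of columns in which the organisms differ."""
    return [i for i, states in enumerate(column_states(aln)) if len(states) > 1]


print(variable_columns(alignment))
```

Columns with a single state behave like the conserved anchor regions; columns with several states carry the variation that is monitored over time.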

247
00:25:52,000 --> 00:25:58,000
Is everybody with that?
Any questions?  OK, great.

248
00:25:58,000 --> 00:26:02,000
The second step,
then, is the calculation of a

249
00:26:02,000 --> 00:26:16,000
similarity.

250
00:26:16,000 --> 00:26:20,000
And this is shown here.
Again, we have a very simplified

251
00:26:20,000 --> 00:26:24,000
alignment now of four different
organisms.  Here,

252
00:26:24,000 --> 00:26:29,000
we have the sequences that we want
to compare.  And what you'll see is

253
00:26:29,000 --> 00:26:33,000
that they're overall very similar,
but there are different sort of

254
00:26:33,000 --> 00:26:38,000
nucleotides.
And so, what we simply do is for

255
00:26:38,000 --> 00:26:43,000
each pair of sequence combinations,
we calculate the sequence similarity

256
00:26:43,000 --> 00:26:48,000
value.  So, what you see is that you
have 12 nucleotides,

257
00:26:48,000 --> 00:26:52,000
and the first pair differs in three
nucleotides.  OK,

258
00:26:52,000 --> 00:26:57,000
so that tells us, or it's called
actually a distance

259
00:26:57,000 --> 00:27:01,000
here, I'm sorry.
Let me write this down here.

260
00:27:01,000 --> 00:27:15,000
It's simply one minus the similarity,

261
00:27:15,000 --> 00:27:21,000
of course, but so basically a
quarter of the nucleotides differ

262
00:27:21,000 --> 00:27:27,000
whereas between A and C,
a third of the nucleotides

263
00:27:27,000 --> 00:27:33,000
differ, and so on.
OK, so you do this for each pair of

264
00:27:33,000 --> 00:27:40,000
sequences, excuse me.
The third step,

265
00:27:40,000 --> 00:27:49,000
then, is to calculate the correction
for multiple mutations affecting the

266
00:27:49,000 --> 00:28:08,000
same nucleotides.
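Before any correction, the raw pairwise distance from the second step is just the fraction of aligned positions that differ, that is, one minus the similarity. A minimal sketch follows; the two 12-nucleotide sequences are invented for illustration and are not the ones on the slide.

```python
def p_distance(seq_a, seq_b):
    """Fraction of aligned positions at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    mismatches = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return mismatches / len(seq_a)


# Two made-up aligned sequences of 12 nucleotides differing at 3 positions.
seq_a = "ACGTACGTACGT"
seq_b = "ACGTTCGAACGA"

distance = p_distance(seq_a, seq_b)
similarity = 1.0 - distance
print(distance, similarity)
```

Repeating this for every pair of sequences gives the table of observed distances.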

267
00:28:08,000 --> 00:28:12,000
Now, you can imagine that over time
there's a probability that a

268
00:28:12,000 --> 00:28:16,000
particular nucleotide mutates,
say, twice.  So, in the first

269
00:28:16,000 --> 00:28:20,000
instance it may change from an A to a G,
but then it changes to a C.

270
00:28:20,000 --> 00:28:24,000
But when you look at the modern-day
sequences, you don't know that this

271
00:28:24,000 --> 00:28:28,000
actually happened.
And so there's ways to

272
00:28:28,000 --> 00:28:32,000
statistically estimate what the
likelihood is that a sequence

273
00:28:32,000 --> 00:28:37,000
actually contains such
multiple events.

274
00:28:37,000 --> 00:28:41,000
OK, and this we call
a corrected evolutionary distance

275
00:28:41,000 --> 00:28:46,000
then.  And what you will note is
that the corrected evolutionary

276
00:28:46,000 --> 00:28:51,000
distance is invariably larger than
the actual observed one.
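One widely used correction of this kind is the Jukes-Cantor formula, d = -(3/4) ln(1 - 4p/3), where p is the observed fraction of differing sites. The lecture does not name the method behind the slide, so using Jukes-Cantor here is an assumption made for illustration.

```python
import math


def jukes_cantor(p):
    """Corrected distance from an observed distance p (Jukes-Cantor model).

    Assumes all substitutions are equally likely. The formula is only
    defined for 0 <= p < 0.75.
    """
    if not 0.0 <= p < 0.75:
        raise ValueError("observed distance must be in [0, 0.75)")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


observed = 0.25  # a quarter of the nucleotides differ
corrected = jukes_cantor(observed)
print(round(corrected, 3))  # about 0.304
```

The corrected value exceeds the observed one because some positions have changed more than once without leaving a visible trace.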

277
00:28:51,000 --> 00:28:56,000
Now, what can we do with those
distances?  We can constrain them

278
00:28:56,000 --> 00:29:01,000
into a best fit tree
of relationships.

279
00:29:01,000 --> 00:29:07,000
So, we can draw what we call a
best fit tree.

280
00:29:07,000 --> 00:29:14,000
That's shown here.
We have our four organisms,

281
00:29:14,000 --> 00:29:20,000
but when you look at those branches
of the tree what you'll see is that

282
00:29:20,000 --> 00:29:27,000
they add up roughly to the corrected
evolutionary distance here.

283
00:29:27,000 --> 00:29:32,000
So, between A and B we have 0.
3 and 0.08, which roughly gives you

284
00:29:32,000 --> 00:29:37,000
0.3 here, OK, whereas between A and
C the tree is constrained such that we

285
00:29:37,000 --> 00:29:42,000
have 0.31, and here 0.
5, and so overall you roughly get

286
00:29:42,000 --> 00:29:48,000
the distance here that we have
calculated.  And so what this means

287
00:29:48,000 --> 00:29:53,000
is that you have ordered the organisms by
their calculated evolutionary

288
00:29:53,000 --> 00:29:58,000
distance.  And so you have now
obtained, actually,

289
00:29:58,000 --> 00:30:04,000
a very intuitive picture of the
relationship of organisms to each

290
00:30:04,000 --> 00:30:09,000
other where A and B are obviously
the most closely related ones,

291
00:30:09,000 --> 00:30:15,000
and A and D are the most distantly
related.

292
00:30:15,000 --> 00:30:23,000
Is everybody with it?
Any questions?  OK, now,

293
00:30:23,000 --> 00:30:31,000
this best fit tree is what we call a
phylogeny.
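To see what it means for branch lengths to add up to the distances, here is a minimal sketch for an unrooted four-organism tree in which A and B sit on one side of a single internal branch and C and D on the other. The branch lengths are hypothetical, chosen only so that A and B come out closest and A and D most distant; they are not the values on the slide.

```python
from itertools import combinations

# Hypothetical branch lengths leading to each organism.
tip_branch = {"A": 0.15, "B": 0.10, "C": 0.15, "D": 0.30}
internal_branch = 0.05      # branch joining the (A, B) side to the (C, D) side
left_side = {"A", "B"}


def tree_distance(x, y):
    """Sum the branch lengths on the path between two organisms."""
    d = tip_branch[x] + tip_branch[y]
    if (x in left_side) != (y in left_side):
        d += internal_branch   # the path crosses the internal branch
    return d


pairs = {pair: tree_distance(*pair) for pair in combinations("ABCD", 2)}
closest = min(pairs, key=pairs.get)
farthest = max(pairs, key=pairs.get)
print(closest, farthest)
```

Fitting a real tree runs the other way: the branch lengths are adjusted until these path sums match the corrected distances as closely as possible.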

294
00:30:31,000 --> 00:30:52,000
Now, excuse me,

295
00:30:52,000 --> 00:31:00,000
these techniques really
revolutionized the study of

296
00:31:00,000 --> 00:31:08,000
evolutionary relationships,
and one of the things that it

297
00:31:08,000 --> 00:31:16,000
allowed us to do is to construct
universal phylogenetic trees or what

298
00:31:16,000 --> 00:31:23,000
we can also call the tree of life.
And I will show you this on the next

299
00:31:23,000 --> 00:31:30,000
slide, and then I want to make a few
general statements about this.

300
00:31:30,000 --> 00:31:37,000
So first of all,
when you analyze all known organisms,

301
00:31:37,000 --> 00:31:45,000
and obviously that would be a big
task, but representatives of all

302
00:31:45,000 --> 00:31:52,000
known organisms,
what you'll find is that,

303
00:31:52,000 --> 00:32:00,000
indeed, we have three major
lineages: the bacteria,

304
00:32:00,000 --> 00:32:07,000
the archaea, and the eukarya.
OK, so we have what we call three

305
00:32:07,000 --> 00:32:15,000
domains of life: the archaea,
bacteria, and the eukarya.

306
00:32:15,000 --> 00:32:20,000
So, this really is the evidence that
life really split very,

307
00:32:20,000 --> 00:32:26,000
very early on into those three
lineages that I showed you before.

308
00:32:26,000 --> 00:32:32,000
Interestingly,
two of those major domains here are

309
00:32:32,000 --> 00:32:39,000
prokaryotic, OK?
So, two of the domains are

310
00:32:39,000 --> 00:32:46,000
prokaryotes.  Moreover,
if you actually look at the types of

311
00:32:46,000 --> 00:32:53,000
organisms that are on here,
you'll notice that even on the

312
00:32:53,000 --> 00:33:00,000
eukaryotic side of the tree,
most of the organisms here are

313
00:33:00,000 --> 00:33:07,000
actually microbial.
So, single-celled organisms, and

314
00:33:07,000 --> 00:33:14,000
that means that most of the life on
the planet is microbial.

315
00:33:14,000 --> 00:33:21,000
The vast diversity of organisms on
the planet are microorganisms.

316
00:33:21,000 --> 00:33:29,000
So, we can say that most life is
microbial.

317
00:33:29,000 --> 00:33:34,000
And when you, then,
look at analysis of mitochondria,

318
00:33:34,000 --> 00:33:39,000
and chloroplasts which all have
their own genetic machinery,

319
00:33:39,000 --> 00:33:44,000
and therefore also their own
ribosomes you'll see that the

320
00:33:44,000 --> 00:33:49,000
mitochondrion,
OK, and the chloroplasts both tree

321
00:33:49,000 --> 00:33:54,000
within the bacteria.
So, we really have an amazing

322
00:33:54,000 --> 00:33:59,000
confirmation of this endosymbiont
theory, which was actually developed in

323
00:33:59,000 --> 00:34:04,000
the absence of gene sequences by
some Russian scientists in the early

324
00:34:04,000 --> 00:34:13,000
20th century.
So, we have that mitochondria and

325
00:34:13,000 --> 00:34:27,000
chloroplasts tree within bacteria,
and this really supports the

326
00:34:27,000 --> 00:34:36,000
endosymbiont theory.
So really, you could say eukaryotes

327
00:34:36,000 --> 00:34:42,000
are really just walking,
and swimming, and flying incubators

328
00:34:42,000 --> 00:34:48,000
for bacteria, right?
So, just hosts for microorganisms.

329
00:34:48,000 --> 00:34:54,000
OK, so basically you can, what you
should take home from this is the

330
00:34:54,000 --> 00:35:00,000
three domains of life.
Two are prokaryotic, and even more

331
00:35:00,000 --> 00:35:06,000
so most of the diversity that we
find is actually microbial,

332
00:35:06,000 --> 00:35:12,000
and then finally the endosymbiont
theory is actually confirmed by

333
00:35:12,000 --> 00:35:17,000
those phylogenies.
Now, what I want to cover in the

334
00:35:17,000 --> 00:35:22,000
remaining time,
is how we can actually use now those

335
00:35:22,000 --> 00:35:27,000
sequences to learn something about
organisms in the environment.

336
00:35:27,000 --> 00:35:32,000
That's the topic of molecular
ecology.

337
00:35:32,000 --> 00:35:43,000
To introduce this,

338
00:35:43,000 --> 00:35:47,000
I just want to show you a couple
slides that really sort of capture

339
00:35:47,000 --> 00:35:51,000
what the big problem is that we're
facing here.  Now,

340
00:35:51,000 --> 00:35:55,000
when we look at the abundance of
prokaryotic cells in different types

341
00:35:55,000 --> 00:35:59,000
of environments,
what we see is that there is an

342
00:35:59,000 --> 00:36:04,000
enormous number of different
prokaryotes out there.

343
00:36:04,000 --> 00:36:08,000
This summarizes,
here, different types of

344
00:36:08,000 --> 00:36:12,000
environments.  We have the marine
environment, freshwater environment,

345
00:36:12,000 --> 00:36:16,000
sediment and soils, subsurface
sediments and animal guts.

346
00:36:16,000 --> 00:36:20,000
And this number here gives you
the average number of prokaryotic

347
00:36:20,000 --> 00:36:24,000
cells either per milliliter or per
gram.  And here we have the total

348
00:36:24,000 --> 00:36:28,000
number of cells obtained by
multiplying the average number with

349
00:36:28,000 --> 00:36:33,000
the total volume of the particular
environment.

350
00:36:33,000 --> 00:36:37,000
So what you can see is that in the
marine environment,

351
00:36:37,000 --> 00:36:41,000
we have on average half a million
cells per milliliter of water,

352
00:36:41,000 --> 00:36:45,000
OK?  In freshwater, we have about a
million cells.

353
00:36:45,000 --> 00:36:49,000
What is that telling you?
There's a ton of prokaryotes out

354
00:36:49,000 --> 00:36:53,000
there.  When you go swimming,
you take a little gulp of water:

355
00:36:53,000 --> 00:36:57,000
you've probably eaten several
million prokaryotes,

356
00:36:57,000 --> 00:37:01,000
but it's nothing to worry about
because what this also tells us is

357
00:37:01,000 --> 00:37:05,000
that very, very few prokaryotes out
there are really pathogens because

358
00:37:05,000 --> 00:37:09,000
otherwise you'd be sick
all the time.

359
00:37:09,000 --> 00:37:15,000
Now, in sediments and soils,
in as little as a gram you have five

360
00:37:15,000 --> 00:37:22,000
times 10^9 prokaryotic cells.  Almost
5 billion prokaryotic cells are out

361
00:37:22,000 --> 00:37:29,000
there, and even in very,
very deep sediments that reach down

362
00:37:29,000 --> 00:37:36,000
to 3,000 m, you have a substantial
number of prokaryotic cells.

363
00:37:36,000 --> 00:37:40,000
Well, and here's your guts,
10^5 times 10^6 gives you 10^11 per

364
00:37:40,000 --> 00:37:45,000
gram.  So again,
you're just a walking incubator for

365
00:37:45,000 --> 00:37:50,000
a very complex microbial community.
Here's the global abundance.  You

366
00:37:50,000 --> 00:37:55,000
see that the deep subsurface sediments
and the marine environment are,

367
00:37:55,000 --> 00:38:00,000
probably in terms of numbers at
least, the most important

368
00:38:00,000 --> 00:38:05,000
microbial environments.
Now, faced with this enormous

369
00:38:05,000 --> 00:38:09,000
abundance of prokaryotes out there,
a very important question is how many

370
00:38:09,000 --> 00:38:14,000
of them are out there?
Or, how diverse are prokaryotes in

371
00:38:14,000 --> 00:38:18,000
the environment?
That's important if you want to

372
00:38:18,000 --> 00:38:23,000
figure out their function in the
environment, and want to understand

373
00:38:23,000 --> 00:38:27,000
also their evolution.
And what I want to show you here is

374
00:38:27,000 --> 00:38:32,000
that we've gone through an amazing
development in our understanding of

375
00:38:32,000 --> 00:38:36,000
prokaryotic diversity in the
environment over the last

376
00:38:36,000 --> 00:38:42,000
10 to 15 years or so.
Who knows about E.

377
00:38:42,000 --> 00:38:48,000
O. Wilson here?  One person?
So, he wrote a very famous book on

378
00:38:48,000 --> 00:38:54,000
biodiversity, which was published in
1988, where he tried to summarize,

379
00:38:54,000 --> 00:39:00,000
really, how diverse the known
organisms are on the planet.  He also

380
00:39:00,000 --> 00:39:06,000
tried to extrapolate to
the total diversity.

381
00:39:06,000 --> 00:39:10,000
And what you see is that he came up
with about 1.4 million different

382
00:39:10,000 --> 00:39:14,000
species here, mostly dominated by
insects.  That's the big section

383
00:39:14,000 --> 00:39:19,000
here on this pie chart.
The plants: very important.

384
00:39:19,000 --> 00:39:23,000
And if you look, the prokaryotes
feature with about 3,

385
00:39:23,000 --> 00:39:27,000
00 different species.  So,
in 1988 we thought there were very

386
00:39:27,000 --> 00:39:32,000
few prokaryotic species out there.
If you look about 10 years into the

387
00:39:32,000 --> 00:39:36,000
future and take the assessment here,
and this just exemplifies how the

388
00:39:36,000 --> 00:39:41,000
thinking has changed,
you see that we think now that there

389
00:39:41,000 --> 00:39:45,000
are about 11 million different
species out there,

390
00:39:45,000 --> 00:39:50,000
and that the vast majority of them
are prokaryotic,

391
00:39:50,000 --> 00:39:54,000
OK, 10 million.  So,
this big part of the pie chart is

392
00:39:54,000 --> 00:39:59,000
really the prokaryotic diversity.
Now, what really has changed is

393
00:39:59,000 --> 00:40:03,000
that we've actually started to use
molecular techniques to determine

394
00:40:03,000 --> 00:40:08,000
the diversity of prokaryotes
in the environment.

395
00:40:08,000 --> 00:40:18,000
So molecular ecology is really the
use of molecular gene sequences

396
00:40:18,000 --> 00:40:29,000
obtained directly from
the environment --

397
00:40:29,000 --> 00:40:42,000
-- to learn about the diversity

398
00:40:42,000 --> 00:40:54,000
prokaryotic --

399
00:40:54,000 --> 00:40:58,000
-- diversity out there.
Now, this slide just quickly

400
00:40:58,000 --> 00:41:03,000
summarizes this.
Basically, the idea is that you go

401
00:41:03,000 --> 00:41:08,000
out into the environment and collect
either water or soil samples that,

402
00:41:08,000 --> 00:41:13,000
as I just showed you, invariably
contain a lot of different

403
00:41:13,000 --> 00:41:17,000
prokaryotic cells.
You then lyse the cells and purify

404
00:41:17,000 --> 00:41:22,000
their DNA, so that you end up
with a mixture of DNA that

405
00:41:22,000 --> 00:41:27,000
represents the organisms out there,
and then you can use universal PCR

406
00:41:27,000 --> 00:41:32,000
primers to actually amplify
ribosomal RNA genes from all the

407
00:41:32,000 --> 00:41:37,000
organisms that are present
in your samples.

408
00:41:37,000 --> 00:41:42,000
Now, why can you use universal PCR
primers?  Well,

409
00:41:42,000 --> 00:41:48,000
they target the regions number A
that I showed you before.

410
00:41:48,000 --> 00:41:53,000
Those regions in the genes are
invariant amongst all organisms.

411
00:41:53,000 --> 00:41:59,000
You guys all remember how the PCR
works, right?  We covered this.

412
00:41:59,000 --> 00:42:04,000
OK?  Yes?  No?  Who doesn't?
You don't?  All right,

413
00:42:04,000 --> 00:42:09,000
come to the board.  Just kidding.
OK, you should look it up.  I don't

414
00:42:09,000 --> 00:42:15,000
have time to cover this,
unfortunately, but basically it's a

415
00:42:15,000 --> 00:42:20,000
technique that allows you to amplify
specific types of genes millions to

416
00:42:20,000 --> 00:42:25,000
billion fold.  And once you have
done this, what you can do is that

417
00:42:25,000 --> 00:42:31,000
you can purify the genes on gels,
and then separate them by cloning

418
00:42:31,000 --> 00:42:36,000
them into individual plasmids.
And those plasmids have been

419
00:42:36,000 --> 00:42:41,000
inserted into E.
coli cells, and the E.

420
00:42:41,000 --> 00:42:46,000
coli cells are then individually
grown up so that each culture

421
00:42:46,000 --> 00:42:50,000
contains only a single plasmid,
and you can then sequence these

422
00:42:50,000 --> 00:42:55,000
ribosomal DNAs or ribosomal RNA
genes from those clones.

423
00:42:55,000 --> 00:43:00,000
And so, you have obtained a library
of the ribosomal RNA genes

424
00:43:00,000 --> 00:43:08,000
from the environment.
So, we use environmental ribosomal

425
00:43:08,000 --> 00:43:18,000
RNA gene libraries from which we
then can actually compare how many

426
00:43:18,000 --> 00:43:28,000
different types of genes
are out there.

427
00:43:28,000 --> 00:43:32,000
So let me show you an example of
this.  What we have done recently,

428
00:43:32,000 --> 00:43:37,000
we've gone out in one of the first
really comprehensive samplings of

429
00:43:37,000 --> 00:43:42,000
coastal bacterioplankton,
which means the bacteria that are

430
00:43:42,000 --> 00:43:47,000
present free living in ocean water.
And so, we've done this, we've

431
00:43:47,000 --> 00:43:52,000
collected all those clones,
and then basically we constructed

432
00:43:52,000 --> 00:43:57,000
those phylogenetic trees that I
showed you before that really allow

433
00:43:57,000 --> 00:44:02,000
us to see how many different types are
out there, and how closely related

434
00:44:02,000 --> 00:44:07,000
they are to one another.
And what we found is that in this

435
00:44:07,000 --> 00:44:12,000
environment that you think might be
very simple because it's just the

436
00:44:12,000 --> 00:44:17,000
water column right?
No, not much structure in there.

437
00:44:17,000 --> 00:44:22,000
We found over 1500 bacterial 16S
ribosomal RNA sequences to occur,

438
00:44:22,000 --> 00:44:27,000
so an enormous diversity of
prokaryotes, of bacteria, in that

439
00:44:27,000 --> 00:44:32,000
particular environment.
And the important point is that when

440
00:44:32,000 --> 00:44:36,000
you actually look at a collection of
such studies that I just showed you,

441
00:44:36,000 --> 00:44:40,000
what you find is that the vast
majority of microorganisms in the

442
00:44:40,000 --> 00:44:44,000
environment have never been cultured.
So traditionally what we do of

443
00:44:44,000 --> 00:44:49,000
course to learn about microorganisms
when you grow E.

444
00:44:49,000 --> 00:44:53,000
coli, or so, you throw them onto
culture plates.

445
00:44:53,000 --> 00:44:57,000
You make lots of different cells,
and that allows you to study some of

446
00:44:57,000 --> 00:45:02,000
their properties.
But when you look,

447
00:45:02,000 --> 00:45:06,000
for example, at results from the
ocean, this summarizes now coastal

448
00:45:06,000 --> 00:45:10,000
and open ocean environments,
again, the bacterioplankton is

449
00:45:10,000 --> 00:45:15,000
those free-floating bacterial
cells in the water.

450
00:45:15,000 --> 00:45:19,000
And you compare this to what we've
actually been able to culture from

451
00:45:19,000 --> 00:45:23,000
those environments.
What you see is that you have some

452
00:45:23,000 --> 00:45:27,000
dominant groups here.
They all have funny names,

453
00:45:27,000 --> 00:45:32,000
most of them, because they're just
clones in clone libraries.

454
00:45:32,000 --> 00:45:36,000
But these are the dominant groups
that show up in clone libraries.

455
00:45:36,000 --> 00:45:40,000
Here's their relative
representation in different clone

456
00:45:40,000 --> 00:45:44,000
libraries from a variety of
environments.  And so here you have

457
00:45:44,000 --> 00:45:48,000
one very important one,
the SAR11 group, or this one,

458
00:45:48,000 --> 00:45:53,000
the SAR86, that always show up in
clone libraries.

459
00:45:53,000 --> 00:45:57,000
But we've never seen them in culture,
so the important point to realize

460
00:45:57,000 --> 00:46:01,000
here is that what is actually
happening is that whenever we go out,

461
00:46:01,000 --> 00:46:05,000
we find a great diversity of
bacteria out there,

462
00:46:05,000 --> 00:46:10,000
but we have no idea what they
actually do.

463
00:46:10,000 --> 00:46:14,000
And this is one of the big questions
that we need to answer to understand,

464
00:46:14,000 --> 00:46:18,000
really, how the planet actually
works.  What are those uncultured

465
00:46:18,000 --> 00:46:22,000
microorganisms out in the
environment really doing,

466
00:46:22,000 --> 00:46:26,000
and what is their importance?
And we'll talk about this next time.

467
00:46:26,000 --> 00:46:30,000
We're going to talk about
environmental genomics because

468
00:46:30,000 --> 00:46:34,000
essentially what we can do now,
is we have techniques available that

469
00:46:34,000 --> 00:46:38,000
allow us to isolate at least large
fragments of the genomes,

470
00:46:38,000 --> 00:46:42,000
sequence those, and look at what
kinds of genes they have present.

471
00:46:42,000 --> 00:46:46,000
And that allows us,
then, to infer some of their

472
00:46:46,000 --> 00:46:51,000
function in the biogeochemical
cycles in the environment.

473
00:46:51,000 --> 00:46:55,000
OK, so with this I'm going to close
today unless you have

474
00:46:55,000 --> 00:46:58,000
any more questions.