Principles and Techniques of Biochemistry and Molecular Biology

Seventh edition

EDITED BY KEITH WILSON AND JOHN WALKER

This new edition of the bestselling textbook integrates the theoretical principles and experimental techniques common to all undergraduate courses in the bio- and medical sciences. Three of the 16 chapters have new authors and have been totally rewritten. The others have been updated and extended to reflect developments in their field exemplified by a new section on stem cells. Two new chapters have been added. One on clinical biochemistry discusses the principles underlying the diagnosis and management of common biochemical disorders. The second one on drug discovery and development illustrates how the principles and techniques covered in the book are fundamental to the design and development of new drugs. In-text worked examples are again used to enhance student understanding of each topic and case studies are selectively used to illustrate important examples. Experimental design, quality assurance and the statistical analysis of quantitative data are emphasised throughout the book.

• Motivates students by including cutting-edge topics and techniques, such as drug discovery, as well as the methods they will encounter in their own lab classes
• Promotes problem solving by setting students a challenge and then guiding them through the solution
• Integrates theory and practice to ensure students understand why and how each technique is used

KEITH WILSON is Professor Emeritus of Pharmacological Biochemistry and former Head of the Department of Biosciences, Dean of the Faculty of Natural Sciences, and Director of Research at the University of Hertfordshire.

JOHN WALKER is Professor Emeritus and former Head of the School of Life Sciences at the University of Hertfordshire.

Cover illustration Main image Electrophoresis gel showing recombinant protein. Photographer: J.C. Revy. Courtesy of Science Photo Library. Top inset Transcription factor and DNA molecule. Courtesy of: Laguna Design/Science Photo Library. Second inset Microtubes, pipettor (pipette) tip & DNA sequence. Courtesy of Tek Image/Science Photo Library. Third inset Stem cell culture, light micrograph. Photographer: Philippe Plailly. Courtesy of Science Photo Library. Fourth inset Embryonic stem cells. Courtesy of Science Photo Library. Bottom inset Herceptin breast cancer drug, molecular model. Photographer: Tim Evans. Courtesy of Science Photo Library.


CAMBRIDGE UNIVERSITY PRESS
Cambridge, New York, Melbourne, Madrid, Cape Town, Singapore, São Paulo, Delhi, Dubai, Tokyo

Cambridge University Press
The Edinburgh Building, Cambridge CB2 8RU, UK

Published in the United States of America by Cambridge University Press, New York

www.cambridge.org
Information on this title: www.cambridge.org/9780521516358

First and second editions © Bryan Williams and Keith Wilson 1975, 1981
Third edition © Keith Wilson and Kenneth H. Goulding 1986
Fourth edition © Cambridge University Press 1993
Fifth edition © Cambridge University Press 2000
Sixth edition © Cambridge University Press 2005
Seventh edition © Cambridge University Press 2010

This publication is in copyright. Subject to statutory exception and to the provisions of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press.

First published by Edward Arnold 1975 as A Biologist’s Guide to Principles and Techniques of Practical Biochemistry
Second edition 1981; Third edition 1986
Third edition first published by Cambridge University Press 1992; Reprinted 1993
Fourth edition published by Cambridge University Press 1994 as Principles and Techniques of Practical Biochemistry; Reprinted 1995, 1997; Fifth edition 2000
Sixth edition first published by Cambridge University Press 2005 as Principles and Techniques of Biochemistry and Molecular Biology; Reprinted 2006, 2007
Seventh edition first published by Cambridge University Press 2010

Printed in the United Kingdom at the University Press, Cambridge

A catalogue record for this publication is available from the British Library

Library of Congress Cataloging-in-Publication Data
Principles and techniques of biochemistry and molecular biology / edited by Keith Wilson, John Walker. – 7th ed.
p. cm.
ISBN 978-0-521-51635-8 (hardback) – ISBN 978-0-521-73167-6 (pbk.)
1. Biochemistry–Textbooks. 2. Molecular biology–Textbooks. I. Wilson, Keith, 1936– II. Walker, John M., 1948– III. Title.
QP519.7.P75 2009
612′.015–dc22 2009043277

ISBN 978-0-521-51635-8 Hardback
ISBN 978-0-521-73167-6 Paperback

Cambridge University Press has no responsibility for the persistence or accuracy of URLs for external or third-party internet websites referred to in this publication, and does not guarantee that any content on such websites is, or will remain, accurate or appropriate.

CONTENTS

Preface to the seventh edition
List of contributors
List of abbreviations

1 Basic principles
K. WILSON
1.1 Biochemical and molecular biology studies
1.2 Units of measurement
1.3 Weak electrolytes
1.4 Quantitative biochemical measurements
1.5 Safety in the laboratory
1.6 Suggestions for further reading

2 Cell culture techniques
A. R. BAYDOUN
2.1 Introduction
2.2 The cell culture laboratory and equipment
2.3 Safety considerations in cell culture
2.4 Aseptic techniques and good cell culture practice
2.5 Types of animal cell, characteristics and maintenance in culture
2.6 Stem cell culture
2.7 Bacterial cell culture
2.8 Potential use of cell cultures
2.9 Suggestions for further reading

3 Centrifugation
K. OHLENDIECK
3.1 Introduction
3.2 Basic principles of sedimentation
3.3 Types, care and safety aspects of centrifuges
3.4 Preparative centrifugation
3.5 Analytical centrifugation
3.6 Suggestions for further reading

4 Microscopy
S. W. PADDOCK
4.1 Introduction
4.2 The light microscope
4.3 Optical sectioning
4.4 Imaging living cells and tissues
4.5 Measuring cellular dynamics
4.6 The electron microscope (EM)
4.7 Image archiving
4.8 Suggestions for further reading

5 Molecular biology, bioinformatics and basic techniques
R. RAPLEY
5.1 Introduction
5.2 Structure of nucleic acids
5.3 Genes and genome complexity
5.4 Location and packaging of nucleic acids
5.5 Functions of nucleic acids
5.6 The manipulation of nucleic acids – basic tools and techniques
5.7 Isolation and separation of nucleic acids
5.8 Molecular biology and bioinformatics
5.9 Molecular analysis of nucleic acid sequences
5.10 The polymerase chain reaction (PCR)
5.11 Nucleotide sequencing of DNA
5.12 Suggestions for further reading

6 Recombinant DNA and genetic analysis
R. RAPLEY
6.1 Introduction
6.2 Constructing gene libraries
6.3 Cloning vectors
6.4 Hybridisation and gene probes
6.5 Screening gene libraries
6.6 Applications of gene cloning
6.7 Expression of foreign genes
6.8 Analysing genes and gene expression
6.9 Analysing whole genomes
6.10 Pharmacogenomics
6.11 Molecular biotechnology and applications
6.12 Suggestions for further reading

7 Immunochemical techniques
R. BURNS
7.1 Introduction
7.2 Making antibodies
7.3 Immunoassay formats
7.4 Immuno microscopy
7.5 Lateral flow devices
7.6 Epitope mapping
7.7 Immunoblotting
7.8 Fluorescent activated cell sorting (FACS)
7.9 Cell and tissue staining techniques
7.10 Immunocapture polymerase chain reaction (PCR)
7.11 Immunoaffinity chromatography (IAC)
7.12 Antibody-based biosensors
7.13 Therapeutic antibodies
7.14 The future uses of antibody technology
7.15 Suggestions for further reading

8 Protein structure, purification, characterisation and function analysis
J. WALKER
8.1 Ionic properties of amino acids and proteins
8.2 Protein structure
8.3 Protein purification
8.4 Protein structure determination
8.5 Proteomics and protein function
8.6 Suggestions for further reading

9 Mass spectrometric techniques
A. AITKEN
9.1 Introduction
9.2 Ionisation
9.3 Mass analysers
9.4 Detectors
9.5 Structural information by tandem mass spectrometry
9.6 Analysing protein complexes
9.7 Computing and database analysis
9.8 Suggestions for further reading

10 Electrophoretic techniques
J. WALKER
10.1 General principles
10.2 Support media
10.3 Electrophoresis of proteins
10.4 Electrophoresis of nucleic acids
10.5 Capillary electrophoresis
10.6 Microchip electrophoresis
10.7 Suggestions for further reading

11 Chromatographic techniques
K. WILSON
11.1 Principles of chromatography
11.2 Chromatographic performance parameters
11.3 High-performance liquid chromatography
11.4 Adsorption chromatography
11.5 Partition chromatography
11.6 Ion-exchange chromatography
11.7 Molecular (size) exclusion chromatography
11.8 Affinity chromatography
11.9 Gas chromatography
11.10 Suggestions for further reading

12 Spectroscopic techniques: I Spectrophotometric techniques
A. HOFMANN
12.1 Introduction
12.2 Ultraviolet and visible light spectroscopy
12.3 Fluorescence spectroscopy
12.4 Luminometry
12.5 Circular dichroism spectroscopy
12.6 Light scattering
12.7 Atomic spectroscopy
12.8 Suggestions for further reading

13 Spectroscopic techniques: II Structure and interactions
A. HOFMANN
13.1 Introduction
13.2 Infrared and Raman spectroscopy
13.3 Surface plasmon resonance
13.4 Electron paramagnetic resonance
13.5 Nuclear magnetic resonance
13.6 X-ray diffraction
13.7 Small-angle scattering
13.8 Suggestions for further reading

14 Radioisotope techniques
R. J. SLATER
14.1 Why use a radioisotope?
14.2 The nature of radioactivity
14.3 Detection and measurement of radioactivity
14.4 Other practical aspects of counting of radioactivity and analysis of data
14.5 Safety aspects
14.6 Suggestions for further reading

15 Enzymes
K. WILSON
15.1 Characteristics and nomenclature
15.2 Enzyme steady-state kinetics
15.3 Analytical methods for the study of enzyme reactions
15.4 Enzyme active sites and catalytic mechanisms
15.5 Control of enzyme activity
15.6 Suggestions for further reading

16 Principles of clinical biochemistry
J. FYFFE AND K. WILSON
16.1 Principles of clinical biochemical analysis
16.2 Clinical measurements and quality control
16.3 Examples of biochemical aids to clinical diagnosis
16.4 Suggestions for further reading
16.5 Acknowledgements

17 Cell membrane receptors and cell signalling
K. WILSON
17.1 Receptors for cell signalling
17.2 Quantitative aspects of receptor–ligand binding
17.3 Ligand-binding and cell-signalling studies
17.4 Mechanisms of signal transduction
17.5 Receptor trafficking
17.6 Suggestions for further reading

18 Drug discovery and development
K. WILSON
18.1 Human disease and drug therapy
18.2 Drug discovery
18.3 Drug development
18.4 Suggestions for further reading

Index

The colour figure section is between pages 128 and 129

PREFACE TO THE SEVENTH EDITION

In designing the content of this latest edition we continued our previous policy of placing emphasis on the recommendations we have received from colleagues and academics outside our university. Above all, we have attempted to respond to the invaluable feedback from student users of our book both in the UK and abroad.

In this seventh edition we have retained all 16 chapters from the previous edition. All have been appropriately updated to reflect recent developments in their fields, as exemplified by the inclusion of a section on stem cells in the cell culture chapter. Three of these chapters have new authors and have been completely rewritten. Robert Burns, Scottish Agricultural Science Agency, Edinburgh has written the chapter on immunochemical techniques, and Andreas Hofmann, Eskitis Institute of Molecular Therapies, Griffith University, Brisbane, Australia has written the two chapters on spectroscopic techniques. We are delighted to welcome both authors to our team of contributors.

In addition to these changes of authors, two new chapters have been added to the book. Our decision taken for the sixth edition to include a section on the biochemical principles underlying clinical biochemistry has been well received and so we have extended our coverage of the subject and have devoted a whole chapter (16) to this subject. Written in collaboration with Dr John Fyffe, Consultant Biochemist, Royal Hospital for Sick Children, Yorkhill, Glasgow, new topics that are discussed in the chapter include the diagnosis and management of kidney disease, diabetes, endocrine disorders including thyroid dysfunction, conditions of the hypothalamus–pituitary–adrenal axis such as pregnancy, and pathologies of plasma proteins such as myeloma. Case studies are included to illustrate how the principles discussed apply to the diagnosis and treatment of individual patients with the conditions.
Our second major innovation for this new edition is the introduction of a new chapter on drug discovery and development. The strategic approaches to the discovery of new drugs have been revolutionised by developments in molecular biology. Pharmaceutical companies now rely on many of the principles and experimental techniques discussed in the chapters throughout the book to identify potential drug targets, screen chemical libraries and evaluate the safety and efficacy of selected candidate drugs. The new chapter illustrates the principles of target selection by reference to current drugs used in the treatment of atherosclerosis and HIV/AIDS, emphasises the strategic decisions to be taken during the various stages of drug discovery and development and discusses the issues involved in clinical trials and the registration of new drugs.

We continue to welcome constructive comments from all students who use our book as part of their studies and academics who adopt the book to complement their teaching. Finally, we wish to express our gratitude to the authors and publishers who have granted us permission to reproduce their copyright figures and our thanks to Katrina Halliday and her colleagues at Cambridge University Press who have been so supportive in the production of this new edition.

KEITH WILSON AND JOHN WALKER

CONTRIBUTORS

PROFESSOR A. AITKEN
Division of Biomedical & Clinical Laboratory Sciences
University of Edinburgh
George Square
Edinburgh EH8 9XD
Scotland, UK

DR A. R. BAYDOUN
School of Life Sciences
University of Hertfordshire
College Lane
Hatfield
Herts AL10 9AB, UK

DR R. BURNS
Scottish Agricultural Science Agency
1 Roddinglaw Road
Edinburgh EH12 9FJ
Scotland, UK

DR J. FYFFE
Consultant Clinical Biochemist
Department of Clinical Biochemistry
Royal Hospital for Sick Children
Yorkhill
Glasgow G3 8SF
Scotland, UK

PROFESSOR ANDREAS HOFMANN
Structural Chemistry
Eskitis Institute for Cell & Molecular Therapeutics
Griffith University
Nathan
Brisbane, Qld 4111
Australia

PROFESSOR K. OHLENDIECK
Department of Biology
National University of Ireland
Maynooth
Co. Kildare
Ireland

DR S. W. PADDOCK
Howard Hughes Medical Institute
Department of Molecular Biology
University of Wisconsin
1525 Linden Drive
Madison, WI 53706
USA

DR R. RAPLEY
School of Life Sciences
University of Hertfordshire
College Lane
Hatfield
Herts AL10 9AB, UK

PROFESSOR R. J. SLATER
School of Life Sciences
University of Hertfordshire
College Lane
Hatfield
Herts AL10 9AB, UK

PROFESSOR J. M. WALKER
School of Life Sciences
University of Hertfordshire
College Lane
Hatfield
Herts AL10 9AB, UK

PROFESSOR K. WILSON
Emeritus Professor of Pharmacological Biochemistry
School of Life Sciences
University of Hertfordshire
College Lane
Hatfield
Herts AL10 9AB, UK

ABBREVIATIONS

The following abbreviations have been used throughout this book.

AMP       adenosine 5′-monophosphate
ADP       adenosine 5′-diphosphate
ATP       adenosine 5′-triphosphate
bp        base-pairs
cAMP      cyclic AMP
CHAPS     3-[(3-chloroamidopropyl)dimethylamino]-1-propanesulphonic acid
c.p.m.    counts per minute
CTP       cytidine triphosphate
DDT       2,2-bis-(p-chlorophenyl)-1,1,1-trichloroethane
DMSO      dimethylsulphoxide
DNA       deoxyribonucleic acid
e⁻        electron
EDTA      ethylenediaminetetra-acetate
ELISA     enzyme-linked immunosorbent assay
FAD       flavin adenine dinucleotide (oxidised)
FADH₂     flavin adenine dinucleotide (reduced)
FMN       flavin mononucleotide (oxidised)
FMNH₂     flavin mononucleotide (reduced)
GC        gas chromatography
GTP       guanosine triphosphate
HAT       hypoxanthine, aminopterin, thymidine medium
Hepes     4(2-hydroxyethyl)-1-piperazine-ethanesulphonic acid
HPLC      high-performance liquid chromatography
kb        kilobase-pairs
Mr        relative molecular mass
min       minute
NAD⁺      nicotinamide adenine dinucleotide (oxidised)
NADH      nicotinamide adenine dinucleotide (reduced)
NADP⁺     nicotinamide adenine dinucleotide phosphate (oxidised)
NADPH     nicotinamide adenine dinucleotide phosphate (reduced)
Pipes     1,4-piperazinebis(ethanesulphonic acid)
Pi        inorganic phosphate
p.p.m.    parts per million
p.p.b.    parts per billion
PPi       inorganic pyrophosphate
RNA       ribonucleic acid
r.p.m.    revolutions per minute
SDS       sodium dodecyl sulphate
Tris      2-amino-2-hydroxymethylpropane-1,3-diol

1 Basic principles
K. WILSON

1.1 Biochemical and molecular biology studies
1.2 Units of measurement
1.3 Weak electrolytes
1.4 Quantitative biochemical measurements
1.5 Safety in the laboratory
1.6 Suggestions for further reading

1.1 BIOCHEMICAL AND MOLECULAR BIOLOGY STUDIES

1.1.1 Aims of laboratory investigations

Biochemistry involves the study of the chemical processes that occur in living organisms with the ultimate aim of understanding the nature of life in molecular terms. Biochemical studies rely on the availability of appropriate analytical techniques and on the application of these techniques to the advancement of knowledge of the nature of, and relationships between, biological molecules, especially proteins and nucleic acids, and cellular function. In recent years huge advances have been made in our understanding of gene structure and expression and in the application of techniques such as mass spectrometry to the study of protein structure and function. The Human Genome Project in particular has been the stimulus for major developments in our understanding of many human diseases, especially cancer, and for the identification of strategies that might be used to combat these diseases.

The discipline of molecular biology overlaps with that of biochemistry and in many respects the aims of the two disciplines complement each other. Molecular biology is focussed on the molecular understanding of the processes of replication, transcription and translation of genetic material whereas biochemistry exploits the techniques and findings of molecular biology to advance our understanding of such cellular processes as cell signalling and apoptosis. The result is that the two disciplines now have the opportunity to address issues such as:

• the structure and function of the total protein component of the cell (proteomics) and of all the small molecules in the cell (metabolomics);
• the mechanisms involved in the control of gene expression;
• the identification of genes associated with a wide range of human diseases;
• the development of gene therapy strategies for the treatment of human diseases;
• the characterisation of the large number of ‘orphan’ receptors, whose physiological role and natural agonist are currently unknown, present in the human genome and their exploitation for the development of new therapeutic agents;
• the identification of novel disease-specific markers for the improvement of clinical diagnosis;
• the engineering of cells, especially stem cells, to treat human diseases;
• the understanding of the functioning of the immune system in order to develop strategies for the protection against invading pathogens;
• the development of our knowledge of the molecular biology of plants in order to engineer crop improvements, pathogen resistance and stress tolerance;
• the application of molecular biology techniques to the nature and treatment of bacterial, fungal and viral diseases.

The remaining chapters in this book address the major experimental strategies and analytical techniques that are routinely used to address issues such as these.

1.1.2 Experimental design

Advances in biochemistry and molecular biology, as in all the sciences, are based on the careful design, execution and data analysis of experiments designed to address specific questions or hypotheses. Such experimental design involves a discrete number of compulsory stages:

• the identification of the subject for experimental investigation;
• the critical evaluation of the current state of knowledge (the ‘literature’) of the chosen subject area noting the strengths and weaknesses of the methodologies previously applied and the new hypotheses which emerged from the studies;
• the formulation of the question or hypothesis to be addressed by the planned experiment;
• the careful selection of the biological system (species, in vivo or in vitro) to be used for the study;
• the identification of the variable that is to be studied;
• the consideration of the other variables that will need to be controlled so that the selected variable is the only factor that will determine the experimental outcome;
• the design of the experiment including the statistical analysis of the results, careful evaluation of the materials and apparatus to be used and the consequential potential safety aspects of the study;
• the execution of the experiment including appropriate calibrations and controls, with a carefully written record of the outcomes;
• the replication of the experiment as necessary for the unambiguous analysis of the outcomes;
• the evaluation of the outcomes including the application of appropriate statistical tests to quantitative data where applicable;
• the formulation of the main conclusions that can be drawn from the results;
• the formulation of new hypotheses and of future experiments that emerge from the study.

The results of well-designed and analysed studies are finally published in the scientific literature after being subject to independent peer review, and one of the major challenges facing professional biochemists and molecular biologists is to keep abreast of current advances in the literature. Fortunately, the advent of the web has made access to the literature easier than it once was.

1.2 UNITS OF MEASUREMENT

1.2.1 SI units

The French Système International d'Unités (the SI system) is the accepted convention for all units of measurement. Table 1.1 lists basic and derived SI units. Table 1.2 lists numerical values for some physical constants in SI units. Table 1.3 lists the commonly used prefixes associated with quantitative terms. Table 1.4 gives the interconversion of non-SI units of volume.

Table 1.1 SI units – basic and derived units

Quantity                      SI unit                   Symbol (basic SI units)  Definition of SI unit  Equivalent in SI units
Basic units
Length                        metre                     m
Mass                          kilogram                  kg
Time                          second                    s
Electric current              ampere                    A
Temperature                   kelvin                    K
Luminous intensity            candela                   cd
Amount of substance           mole                      mol
Derived units
Force                         newton                    N                        kg m s⁻²               J m⁻¹
Energy, work, heat            joule                     J                        kg m² s⁻²              N m
Power, radiant flux           watt                      W                        kg m² s⁻³              J s⁻¹
Electric charge, quantity     coulomb                   C                        A s                    J V⁻¹
Electric potential difference volt                      V                        kg m² s⁻³ A⁻¹          J C⁻¹
Electric resistance           ohm                       Ω                        kg m² s⁻³ A⁻²          V A⁻¹
Pressure                      pascal                    Pa                       kg m⁻¹ s⁻²             N m⁻²
Frequency                     hertz                     Hz                       s⁻¹
Magnetic flux density         tesla                     T                        kg s⁻² A⁻¹             V s m⁻²
Other units based on SI
Area                          square metre              m²
Volume                        cubic metre               m³
Density                       kilogram per cubic metre  kg m⁻³
Concentration                 mole per cubic metre      mol m⁻³

Table 1.2 SI units – conversion factors for non-SI units

Unit                                    Symbol    SI equivalent
Avogadro constant                       L or NA   6.022 × 10²³ mol⁻¹
Faraday constant                        F         9.648 × 10⁴ C mol⁻¹
Planck constant                         h         6.626 × 10⁻³⁴ J s
Universal or molar gas constant         R         8.314 J K⁻¹ mol⁻¹
Molar volume of an ideal gas at s.t.p.            22.41 dm³ mol⁻¹
Velocity of light in a vacuum           c         2.998 × 10⁸ m s⁻¹
Energy
  calorie                               cal       4.184 J
  erg                                   erg       10⁻⁷ J
  electron volt                         eV        1.602 × 10⁻¹⁹ J
Pressure
  atmosphere                            atm       101 325 Pa
  bar                                   bar       10⁵ Pa
  millimetres of Hg                     mm Hg     133.322 Pa
Temperature
  centigrade                            °C        (t °C + 273.15) K
  Fahrenheit                            °F        ((t °F − 32) × 5/9 + 273.15) K
Length
  Ångström                              Å         10⁻¹⁰ m
  inch                                  in        0.0254 m
Mass
  pound                                 lb        0.4536 kg

Note: s.t.p., standard temperature and pressure.

1.2.2 Molarity – the expression of concentration

In practical terms one mole of a substance is equal to its molecular mass expressed in grams, where the molecular mass is the sum of the atomic masses of the constituent atoms. Note that the term molecular mass is preferred to the older term molecular weight. The SI unit of concentration is expressed in terms of moles per cubic metre (mol m⁻³) (see Table 1.1). In practice this is far too large for normal laboratory purposes and a unit based on a cubic decimetre (dm³, 10⁻³ m³) is preferred. However, some textbooks and journals, especially those of North American origin, tend to use the older unit of volume, namely the litre and its subunits (see Table 1.4) rather than cubic decimetres. In this book, volumes will be expressed in cubic decimetres or their smaller counterparts (Table 1.4). The molarity of a solution of a substance expresses the number of moles of the substance in one cubic decimetre of solution. It is expressed by the symbol M.

It should be noted that atomic and molecular masses are both expressed in daltons (Da) or kilodaltons (kDa), where one dalton is an atomic mass unit equal to one-twelfth of the mass of one atom of the ¹²C isotope. However, biochemists prefer to use the term relative molecular mass (Mr). This is defined as the molecular mass of a substance relative to one-twelfth of the atomic mass of the ¹²C isotope. Mr therefore has no units. Thus the relative molecular mass of sodium chloride is 23 (Na) plus 35.5 (Cl), i.e. 58.5, so that one mole is 58.5 grams. If this were dissolved in water and adjusted to a total volume of 1 dm³ the solution would be one molar (1 M).

Biological substances are most frequently found at relatively low concentrations and in in vitro model systems the volumes of stock solutions regularly used for experimental purposes are also small. The consequence is that experimental solutions are usually in the mM, µM and nM range rather than molar. Table 1.5 shows the interconversion of these units.
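The sodium chloride calculation above can be sketched in a few lines of Python. This is a minimal illustration rather than a laboratory tool: the function names are hypothetical, the atomic masses are the rounded values quoted in the text, and the second solution (5.85 g made up to 0.5 dm³) is an invented example.

```python
# Rounded atomic masses quoted in the text (g per mole).
ATOMIC_MASS = {"Na": 23.0, "Cl": 35.5}


def relative_molecular_mass(formula: dict) -> float:
    """Sum the atomic masses, weighted by the number of each atom in the formula."""
    return sum(ATOMIC_MASS[element] * count for element, count in formula.items())


def molarity(mass_g: float, molar_mass: float, volume_dm3: float) -> float:
    """Molarity (mol dm^-3) of a mass of solute made up to a final volume."""
    moles = mass_g / molar_mass
    return moles / volume_dm3


nacl_mr = relative_molecular_mass({"Na": 1, "Cl": 1})  # 23 + 35.5
one_molar = molarity(58.5, nacl_mr, 1.0)               # 58.5 g made up to 1 dm^3
dilute = molarity(5.85, nacl_mr, 0.5)                  # invented example: 5.85 g in 0.5 dm^3

print(f"Mr(NaCl) = {nacl_mr}")
print(f"58.5 g in 1 dm^3  -> {one_molar:.2f} M")
print(f"5.85 g in 0.5 dm^3 -> {dilute:.2f} M")
```

Note that the volume passed to `molarity` is the final volume of the solution, not the volume of water added, which matches the wording "adjusted to a total volume of 1 dm³".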

Table 1.3 Common unit prefixes associated with quantitative terms

Multiple  Prefix  Symbol    Multiple  Prefix  Symbol
10²⁴      yotta   Y         10⁻¹      deci    d
10²¹      zetta   Z         10⁻²      centi   c
10¹⁸      exa     E         10⁻³      milli   m
10¹⁵      peta    P         10⁻⁶      micro   µ
10¹²      tera    T         10⁻⁹      nano    n
10⁹       giga    G         10⁻¹²     pico    p
10⁶       mega    M         10⁻¹⁵     femto   f
10³       kilo    k         10⁻¹⁸     atto    a
10²       hecto   h         10⁻²¹     zepto   z
10¹       deca    da        10⁻²⁴     yocto   y

Table 1.4 Interconversion of non-SI and SI units of volume

Non-SI unit        Non-SI subunit  SI subunit    SI unit
1 litre (l)        10³ ml          = 1 dm³       = 10⁻³ m³
1 millilitre (ml)  1 ml            = 1 cm³       = 10⁻⁶ m³
1 microlitre (µl)  10⁻³ ml         = 1 mm³       = 10⁻⁹ m³
1 nanolitre (nl)   10⁻⁶ ml         = 10⁻³ mm³    = 10⁻¹² m³

Table 1.5 Interconversion of mol, mmol and µmol in different volumes to give different concentrations

Molar (M)      Millimolar (mM)  Micromolar (µM)
1 mol dm⁻³     1 mmol dm⁻³      1 µmol dm⁻³
1 mmol cm⁻³    1 µmol cm⁻³      1 nmol cm⁻³
1 µmol mm⁻³    1 nmol mm⁻³      1 pmol mm⁻³
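The equivalences in Tables 1.3 to 1.5 are all powers of ten, so they can be checked mechanically. The sketch below is one hedged way of doing so; the dictionary and function names are hypothetical, and "u" is used in place of the micro symbol purely to keep the keys in plain ASCII.

```python
# Powers of ten for the prefixes from Table 1.3 that are most common in biochemistry.
# "u" stands in for the micro symbol (an assumption made for plain-ASCII keys).
PREFIX = {"": 1.0, "m": 1e-3, "u": 1e-6, "n": 1e-9, "p": 1e-12}

# Volumes from Table 1.4 expressed in cubic decimetres.
VOLUME_DM3 = {"dm3": 1.0, "cm3": 1e-3, "mm3": 1e-6}


def convert_prefix(value: float, from_prefix: str, to_prefix: str) -> float:
    """Re-express a quantity with one prefix in terms of another, e.g. mM to uM."""
    return value * PREFIX[from_prefix] / PREFIX[to_prefix]


def concentration_molar(amount: float, amount_prefix: str, volume_unit: str) -> float:
    """Concentration in mol dm^-3 for `amount` (prefixed mol) per one unit of volume."""
    return amount * PREFIX[amount_prefix] / VOLUME_DM3[volume_unit]


# 2.5 mM expressed in micromolar.
print(convert_prefix(2.5, "m", "u"))

# Rows of Table 1.5: each pair in a column describes the same concentration.
print(concentration_molar(1, "m", "cm3"))  # 1 mmol per cm^3, the molar column
print(concentration_molar(1, "n", "mm3"))  # 1 nmol per mm^3, the millimolar column
print(concentration_molar(1, "p", "mm3"))  # 1 pmol per mm^3, the micromolar column
```

Working in a single base unit (mol and dm³) and converting only at the edges is a simple way to avoid the prefix slips that are common when dilutions are calculated by hand.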

1.3 WEAK ELECTROLYTES

1.3.1 The biochemical importance of weak electrolytes

Many molecules of biochemical importance are weak electrolytes in that they are acids or bases that are only partially ionised in aqueous solution. Examples include the amino acids, peptides, proteins, nucleosides, nucleotides and nucleic acids. They also include the reagents used in the preparation of buffers such as ethanoic (acetic) acid and phosphoric acid. The biochemical function of many of these molecules is dependent upon their precise state of ionisation at the prevailing cellular or extracellular pH. The catalytic sites of enzymes, for example, contain functional carboxyl and amino groups, from the side chains of constituent amino acids in the protein chain, which need to be in a specific ionised state to enable the catalytic function of the enzyme to be realised. Before the ionisation of these compounds is discussed in detail, it is necessary to appreciate the importance of the ionisation of water.

1.3.2 Ionisation of weak acids and bases

One of the most important weak electrolytes is water since it ionises to a small extent to give hydrogen ions and hydroxyl ions. In fact there is no such species as a free hydrogen ion in aqueous solution as it reacts with water to give a hydronium ion (H₃O⁺):

\[ \mathrm{H_2O \rightleftharpoons H^+ + OH^-} \]

\[ \mathrm{H^+ + H_2O \rightleftharpoons H_3O^+} \]

Even though free hydrogen ions do not exist it is conventional to refer to them rather than hydronium ions. The equilibrium constant (Keq) for the ionisation of water has a value of 1.8 × 10⁻¹⁶ at 24 °C:

\[ K_{\mathrm{eq}} = \frac{[\mathrm{H^+}][\mathrm{OH^-}]}{[\mathrm{H_2O}]} = 1.8 \times 10^{-16} \tag{1.1} \]

The molarity of pure water is 55.6 M. This can be incorporated into a new constant, Kw:

\[ 1.8 \times 10^{-16} \times 55.6 = [\mathrm{H^+}][\mathrm{OH^-}] = 1.0 \times 10^{-14} = K_{\mathrm{w}} \tag{1.2} \]
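The arithmetic behind equation 1.2 can be checked numerically. The following is a minimal sketch with hypothetical function names; it takes pH to be the negative logarithm (base 10) of the hydrogen ion concentration and assumes, from the 1 : 1 stoichiometry of the ionisation of water, that the hydrogen ion and hydroxyl ion concentrations are equal in pure water.

```python
import math

K_EQ = 1.8e-16          # equilibrium constant for water at 24 degrees C (equation 1.1)
WATER_MOLARITY = 55.6   # molarity of pure water, mol dm^-3


def autoprotolysis_constant(k_eq: float, water_molarity: float) -> float:
    """Kw = Keq x [H2O], as in equation 1.2."""
    return k_eq * water_molarity


def ph_of_pure_water(kw: float) -> float:
    """pH when [H+] = [OH-], so that [H+] is the square root of Kw."""
    hydrogen_ion = math.sqrt(kw)
    return -math.log10(hydrogen_ion)


kw = autoprotolysis_constant(K_EQ, WATER_MOLARITY)
print(f"Kw = {kw:.3e}")
print(f"pH of pure water = {ph_of_pure_water(kw):.2f}")
```

Because Kw varies with temperature, the same function applied to a Kw measured at another temperature gives a different neutral pH.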

Kw is known as the autoprotolysis constant of water and does not include an expression for the concentration of water. Its numerical value of exactly 10⁻¹⁴ relates specifically to 24 °C. At 0 °C Kw has a value of 1.14 × 10⁻¹⁵ and at 100 °C a value of 5.45 × 10⁻¹³. The stoichiometry in equation 1.2 shows that hydrogen ions and hydroxyl ions are produced in a 1 : 1 ratio, hence both of them must be present at a concentration of 1.0 × 10⁻⁷ M. Since the Sörensen definition of pH is that it is equal to the negative logarithm of the hydrogen ion concentration, it follows that the pH of pure water is 7.0. This is the definition of neutrality.

Ionisation of carboxylic acids and amines

As previously stressed, many biochemically important compounds contain a carboxyl group (–COOH) or a primary (RNH₂), secondary (R₂NH) or tertiary (R₃N) amine which can donate or accept a hydrogen ion on ionisation. The tendency of a weak acid, generically represented as HA, to ionise is expressed by the equilibrium reaction:

\[ \underset{\text{weak acid}}{\mathrm{HA}} \rightleftharpoons \mathrm{H^+} + \underset{\text{conjugate base (anion)}}{\mathrm{A^-}} \]

This reversible reaction can be represented by an equilibrium constant, Ka, known as the acid dissociation constant (equation 1.3). Numerically, it is very small.

\[ K_{\mathrm{a}} = \frac{[\mathrm{H^+}][\mathrm{A^-}]}{[\mathrm{HA}]} \tag{1.3} \]
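Equation 1.3 can be rearranged to show how the state of ionisation depends on pH. Writing the ionised fraction as [A⁻]/([HA] + [A⁻]) and substituting [A⁻]/[HA] = Ka/[H⁺] from equation 1.3 gives a fraction ionised of Ka/(Ka + [H⁺]). The sketch below evaluates this expression; the function name and the Ka value are hypothetical and chosen only for illustration.

```python
def fraction_ionised(ka: float, ph: float) -> float:
    """Fraction of a weak acid HA present as A-, from rearranging equation 1.3."""
    hydrogen_ion = 10.0 ** (-ph)   # [H+] in mol dm^-3
    return ka / (ka + hydrogen_ion)


KA_EXAMPLE = 1.0e-5   # hypothetical weak acid, not a value taken from the text

for ph in (3.0, 5.0, 7.0):
    print(f"pH {ph}: fraction ionised = {fraction_ionised(KA_EXAMPLE, ph):.3f}")
```

For this illustrative acid the ionised fraction is small two pH units below the point where [H⁺] equals Ka, one half at that point, and close to one two pH units above it.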

Note that the ionisation of a weak acid results in the release of a hydrogen ion and the conjugate base of the acid, both of which are ionic in nature. Similarly, amino groups (primary, secondary and tertiary) as weak bases can exist in ionised and unionised forms and the concomitant ionisation process is represented by an equilibrium constant, Kb (equation 1.4):  RNH2 þ H2 O Ð RNHþ 3 þ HO weak base conjugate acid ðprimary amineÞ ðsubstituted ammonium ionÞ

Kb ¼

 ½RNHþ 3 ½HO  ½RNH2 ½H2 O

ð1:4Þ

In this case, the non-ionised form of the base abstracts a hydrogen ion from water to produce the conjugate acid that is ionised. If this equation is viewed from the reverse direction it is of a similar format to that of equation 1.3. Equally, equation 1.3 viewed in reverse is similar in format to equation 1.4.

A specific and simple example of the ionisation of a weak acid is that of acetic (ethanoic) acid, CH3COOH:

$$\underset{\text{acetic acid}}{\mathrm{CH_3COOH}} \rightleftharpoons \underset{\text{acetate anion}}{\mathrm{CH_3COO^-}} + \mathrm{H^+}$$

Acetic acid and its conjugate base, the acetate anion, are known as a conjugate acid–base pair. The acid dissociation constant can be written in the following way:

$$K_a = \frac{[\mathrm{CH_3COO^-}][\mathrm{H^+}]}{[\mathrm{CH_3COOH}]} = \frac{[\text{conjugate base}][\mathrm{H^+}]}{[\text{weak acid}]} \qquad (1.5a)$$

Ka has a value of $1.75 \times 10^{-5}$ M. In practice it is far more common to express the Ka value in terms of its negative logarithm (i.e. −log Ka), referred to as pKa. Thus in this case pKa is equal to 4.75. It can be seen from equation 1.3 that pKa is numerically equal to the pH at which 50% of the acid is protonated (unionised) and 50% is deprotonated (ionised). It is possible to write an expression for the Kb of the acetate anion as a conjugate base:

$$\mathrm{CH_3COO^-} + \mathrm{H_2O} \rightleftharpoons \mathrm{CH_3COOH} + \mathrm{HO^-}$$

$$K_b = \frac{[\mathrm{CH_3COOH}][\mathrm{HO^-}]}{[\mathrm{CH_3COO^-}]} = \frac{[\text{weak acid}][\mathrm{OH^-}]}{[\text{conjugate base}]} \qquad (1.5b)$$

Kb has a value of about $5.7 \times 10^{-10}$ M, hence its pKb (i.e. −log Kb) is about 9.25. Multiplying these two expressions together results in the important relationship:

$$K_a \times K_b = [\mathrm{H^+}][\mathrm{OH^-}] = K_w = 1.0 \times 10^{-14} \ \text{at} \ 24\,^{\circ}\mathrm{C}$$


Table 1.6 pKa values of some acids and bases that are commonly used as buffer solutions

Acid or base        pKa
Acetic acid         4.75
Barbituric acid     3.98
Carbonic acid       6.10, 10.22
Citric acid         3.10, 4.76, 5.40
Glycylglycine       3.06, 8.13
Hepes^a             7.50
Phosphoric acid     1.96, 6.70, 12.30
Phthalic acid       2.90, 5.51
Pipes^a             6.80
Succinic acid       4.18, 5.56
Tartaric acid       2.96, 4.16
Tris^a              8.14

Note: ^a See list of abbreviations at the front of the book.

hence

$$pK_a + pK_b = pK_w = 14 \qquad (1.6)$$

This relationship holds for all acid–base pairs and enables the pKa or pKb to be calculated from knowledge of the other. Biologically important examples of conjugate acid–base pairs are lactic acid/lactate, pyruvic acid/pyruvate, carbonic acid/bicarbonate and ammonium/ammonia.

In the case of the ionisation of weak bases the most common convention is to quote the Ka or the pKa of the conjugate acid rather than the Kb or pKb of the weak base itself. Examples of the pKa values of some weak acids and bases are given in Table 1.6. Remember that the smaller the numerical value of pKa the stronger the acid (more ionised) and the weaker its conjugate base.

Weak acids will be predominantly unionised at low pH values and ionised at high values. In contrast, weak bases will be predominantly ionised at low pH values and unionised at high values. This sensitivity to pH of the state of ionisation of weak electrolytes is important both physiologically and in in vitro biochemical studies employing such analytical techniques as electrophoresis and ion-exchange chromatography.

Ionisation of polyprotic weak acids and bases

Polyprotic weak acids and bases are capable of donating or accepting more than one hydrogen ion. Each ionisation stage can be represented by a Ka value using the convention that Ka1 refers to the acid with the most ionisable hydrogen atoms and Kan to the acid with the fewest ionisable hydrogen atoms. One of the most important biochemical examples is phosphoric acid, H3PO4, as it is widely used as the basis of a buffer in the pH region of 6.70 (see below):

$$\begin{aligned}
\mathrm{H_3PO_4} &\rightleftharpoons \mathrm{H^+} + \mathrm{H_2PO_4^-} &\qquad pK_{a1} &= 1.96 \\
\mathrm{H_2PO_4^-} &\rightleftharpoons \mathrm{H^+} + \mathrm{HPO_4^{2-}} &\qquad pK_{a2} &= 6.70 \\
\mathrm{HPO_4^{2-}} &\rightleftharpoons \mathrm{H^+} + \mathrm{PO_4^{3-}} &\qquad pK_{a3} &= 12.30
\end{aligned}$$
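The three ionisation stages of phosphoric acid can be turned into a rough picture of which species dominate at a given pH. The following is a minimal sketch, assuming ideal behaviour (concentrations used in place of activities) and the three pKa values quoted above; the function name `phosphate_fractions` is hypothetical and introduced only for illustration.

```python
def phosphate_fractions(ph, pkas=(1.96, 6.70, 12.30)):
    """Return the mole fractions of H3PO4, H2PO4-, HPO4(2-) and PO4(3-) at a given pH.

    Assumes ideal behaviour: concentrations are used instead of activities.
    """
    h = 10.0 ** (-ph)                                  # hydrogen ion concentration (M)
    k1, k2, k3 = (10.0 ** (-pk) for pk in pkas)        # stepwise Ka values
    # Each term is proportional to the concentration of one species
    terms = [h ** 3, h ** 2 * k1, h * k1 * k2, k1 * k2 * k3]
    total = sum(terms)
    return [t / total for t in terms]


species = ["H3PO4", "H2PO4-", "HPO4(2-)", "PO4(3-)"]
fractions_at_pka2 = phosphate_fractions(6.70)
fractions_at_7_4 = phosphate_fractions(7.40)

for name, f1, f2 in zip(species, fractions_at_pka2, fractions_at_7_4):
    print(f"{name:10s} pH 6.70: {f1:.4f}   pH 7.40: {f2:.4f}")
```

At pH = pKa2 the two middle species are present in equal amounts, which is why the H2PO4-/HPO4(2-) pair is the one used for buffers near neutral pH.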

Example 1 CALCULATION OF pH AND THE EXTENT OF IONISATION OF A WEAK ELECTROLYTE

Question Calculate the pH of a 0.01 M solution of acetic acid and its fractional ionisation given that its Ka is $1.75 \times 10^{-5}$.

Answer To calculate the pH we can write:

$$K_a = \frac{[\text{acetate}^-][\mathrm{H^+}]}{[\text{acetic acid}]} = 1.75 \times 10^{-5}$$

Since acetate and hydrogen ions are produced in equal quantities, if x = the concentration of each then the concentration of unionised acetic acid remaining will be 0.01 − x. Hence:

$$1.75 \times 10^{-5} = \frac{(x)(x)}{0.01 - x}$$

$$1.75 \times 10^{-7} - 1.75 \times 10^{-5}\,x = x^2$$

This can now be solved either by use of the quadratic formula or, more easily, by neglecting the x term since it is so small. Adopting the latter alternative gives:

$$x^2 = 1.75 \times 10^{-7}$$

hence $x = 4.18 \times 10^{-4}$ M

hence pH = 3.38

The fractional ionisation (α) of the acetic acid is defined as the fraction of the acetic acid that is in the form of acetate and is therefore given by the equation:

$$\alpha = \frac{[\text{acetate}]}{[\text{acetate}] + [\text{acetic acid}]} = \frac{4.18 \times 10^{-4}}{4.18 \times 10^{-4} + 0.01 - 4.18 \times 10^{-4}} = \frac{4.18 \times 10^{-4}}{0.01} = 4.18 \times 10^{-2} \ \text{or} \ 4.18\%$$



Thus the majority of the acetic acid is present as the unionised form. If the pH is increased above 3.38 the proportion of acetate present will increase in accordance with the Henderson–Hasselbalch equation.
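The effect of neglecting the x term in Example 1 can be checked numerically. The sketch below, which assumes only the Ka and concentration given in the question, compares the approximate answer with the positive root of the full quadratic; the variable names are illustrative.

```python
import math

KA = 1.75e-5      # acid dissociation constant of acetic acid (M), from the question
C_TOTAL = 0.01    # total acetic acid concentration (M), from the question

# Approximation used in the worked answer: neglect x relative to 0.01
x_approx = math.sqrt(KA * C_TOTAL)

# Positive root of the full quadratic x^2 + KA*x - KA*C_TOTAL = 0
x_exact = (-KA + math.sqrt(KA ** 2 + 4 * KA * C_TOTAL)) / 2

ph_approx = -math.log10(x_approx)
ph_exact = -math.log10(x_exact)

# Fractional ionisation: acetate as a fraction of total acetic acid
alpha_approx = x_approx / C_TOTAL
alpha_exact = x_exact / C_TOTAL

print(f"approximate: x = {x_approx:.3e} M, pH = {ph_approx:.2f}, alpha = {alpha_approx:.2%}")
print(f"quadratic:   x = {x_exact:.3e} M, pH = {ph_exact:.2f}, alpha = {alpha_exact:.2%}")
```

The two routes differ by only about 0.01 pH unit here, which supports the use of the simpler approximation when the acid is only slightly ionised.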


1.3.3 Buffer solutions

A buffer solution is one that resists a change in pH on the addition of either acid or base. Such solutions are of enormous importance in practical biochemical work because so many biochemical molecules are weak electrolytes whose ionic status varies with pH, so there is a need to stabilise this ionic status during the course of a practical experiment. In practice, a buffer solution consists of an aqueous mixture of a weak acid and its conjugate base. The conjugate base component would neutralise any hydrogen ions generated during an experiment whilst the unionised acid would neutralise any base generated. The Henderson–Hasselbalch equation is of central importance in the preparation of buffer solutions. It can be expressed in a variety of forms. For a buffer based on a weak acid:

$$\mathrm{pH} = pK_a + \log \frac{[\text{conjugate base}]}{[\text{weak acid}]} \qquad (1.7)$$

or

$$\mathrm{pH} = pK_a + \log \frac{[\text{ionised form}]}{[\text{unionised form}]}$$

For a buffer based on the conjugate acid of a weak base:

$$\mathrm{pH} = pK_a + \log \frac{[\text{weak base}]}{[\text{conjugate acid}]} \qquad (1.8)$$

or

$$\mathrm{pH} = pK_a + \log \frac{[\text{unionised form}]}{[\text{ionised form}]}$$

Table 1.6 lists some weak acids and bases commonly used in the preparation of buffer solutions. Phosphate, Hepes and Pipes are commonly used because of their optimum pH being close to 7.4. The buffer action and pH of blood is illustrated in Example 2 and the preparation of a phosphate buffer is given in Example 3.

Buffer capacity

It can be seen from the Henderson–Hasselbalch equations that when the concentrations (or more strictly the activities) of the weak acid and base are equal, their ratio is one and its logarithm zero so that pH = pKa. The ability of a buffer solution to resist a change in pH on the addition of strong acid or alkali is expressed by its buffer capacity (β). This is defined as the amount (moles) of acid or base required to change the pH by one unit, i.e.

$$\beta = \frac{db}{d\mathrm{pH}} = -\frac{da}{d\mathrm{pH}} \qquad (1.9)$$

where db and da are the amount of base and acid respectively and dpH is the resulting change in pH. In practice, β is largest within the pH range pKa ± 1.
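The dependence of buffer capacity on the distance between pH and pKa can be illustrated with the Henderson–Hasselbalch equation. The following is a minimal sketch, assuming a hypothetical 0.1 M buffer with pKa 6.8 to which 0.001 mol dm^-3 of strong base is added, and ignoring activity effects and volume changes; the function name is illustrative.

```python
import math


def ph_shift_on_adding_base(ph_start, pka=6.8, total=0.1, base_added=0.001):
    """Return the rise in pH when strong base (mol per dm3) is added to a weak acid buffer."""
    ratio = 10.0 ** (ph_start - pka)        # [conjugate base]/[weak acid] before addition
    acid = total / (1.0 + ratio)
    base = total - acid
    # Strong base converts an equal amount of weak acid into conjugate base
    new_acid = acid - base_added
    new_base = base + base_added
    if new_acid <= 0:
        raise ValueError("added base exceeds the weak acid available")
    ph_end = pka + math.log10(new_base / new_acid)
    return ph_end - ph_start


# pH shift at pH = pKa, pKa + 1 and pKa + 1.5
shifts = {offset: ph_shift_on_adding_base(6.8 + offset) for offset in (0.0, 1.0, 1.5)}
for offset, shift in shifts.items():
    print(f"pH = pKa + {offset:.1f}: pH rises by {shift:.3f}")
```

The same addition of base produces the smallest pH change at pH = pKa and progressively larger changes further away, in line with the statement that β is largest close to the pKa.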


Example 2 BUFFER ACTION AND pH OF BLOOD

The normal pH of blood is 7.4 and is maintained at this value by buffer action, in particular by the action of $\mathrm{HCO_3^-}$ and CO2 resulting from gaseous CO2 dissolved in blood and the resulting ionisation of carbonic acid:

$$\mathrm{CO_2} + \mathrm{H_2O} \rightleftharpoons \mathrm{H_2CO_3}$$

$$\mathrm{H_2CO_3} \rightleftharpoons \mathrm{H^+} + \mathrm{HCO_3^-}$$

It is possible to calculate an overall equilibrium constant (Keq) for these two consecutive reactions and to incorporate the concentration of water (55.6 M) into the value:

$$K_{eq} = \frac{[\mathrm{H^+}][\mathrm{HCO_3^-}]}{[\mathrm{CO_2}]} = 7.95 \times 10^{-7}$$

hence pKeq = 6.1

Rearranging:

$$\mathrm{pH} = pK_{eq} + \log \frac{[\mathrm{HCO_3^-}]}{[\mathrm{CO_2}]}$$

When the pH of blood falls due to the metabolic production of $\mathrm{H^+}$, these equilibria shift in favour of increased production of H2CO3 that in turn breaks down to give increased CO2 that is then expired. When the pH of blood rises, more $\mathrm{HCO_3^-}$ is produced and breathing is adjusted to retain more CO2 in the blood, thus maintaining blood pH. Some disease states may change this pH, causing either acidosis or alkalosis, and this may cause serious problems and in extreme cases, death. For example, obstructive lung disease may cause acidosis and hyperventilation may cause alkalosis. Clinical biochemists routinely monitor patients' acid–base balance in blood, in particular the ratio of $\mathrm{HCO_3^-}$ and CO2. Reference ranges for these at pH 7.4 are $[\mathrm{HCO_3^-}]$ = 18.0–26.0 mM and pCO2 = 4.6–6.9 kPa, which gives $[\mathrm{CO_2}]$ of about 1.20 mM.

Question A patient suffering from acidosis had a blood pH of 7.15 and $[\mathrm{CO_2}]$ of 1.15 mM. What was the patient's $[\mathrm{HCO_3^-}]$ and what are the implications of its value to the buffer capacity of the blood?

Answer Applying the above equation we get:

$$\mathrm{pH} = pK_{eq} + \log \frac{[\mathrm{HCO_3^-}]}{[\mathrm{CO_2}]}$$

$$7.15 = 6.10 + \log \frac{[\mathrm{HCO_3^-}]}{1.15}$$

$$1.05 = \log \frac{[\mathrm{HCO_3^-}]}{1.15}$$

Taking the antilog of this equation we get $11.22 = [\mathrm{HCO_3^-}]/1.15$. Therefore $[\mathrm{HCO_3^-}]$ = 12.90 mM, indicating that the bicarbonate concentration in the patient's blood had decreased by 11.1 mM, i.e. 47%, thereby severely reducing the buffer capacity of the patient's blood so that any further significant production of acid would have serious implications for the patient.
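The arithmetic in Example 2 can be reproduced in a few lines. This is a minimal sketch, assuming the pKeq of 6.10 and the concentrations quoted in the example; the function name `bicarbonate_mM` is hypothetical.

```python
PK_EQ = 6.10  # overall pKeq for the CO2/HCO3- system, as quoted in the example


def bicarbonate_mM(ph, co2_mM, pk_eq=PK_EQ):
    """Rearranged Henderson-Hasselbalch equation: [HCO3-] = [CO2] * 10^(pH - pKeq)."""
    return co2_mM * 10.0 ** (ph - pk_eq)


# Patient with acidosis: pH 7.15, [CO2] = 1.15 mM
patient_hco3 = bicarbonate_mM(7.15, 1.15)

# Blood at the normal pH of 7.4 with [CO2] = 1.20 mM
normal_hco3 = bicarbonate_mM(7.40, 1.20)

print(f"patient [HCO3-] = {patient_hco3:.2f} mM")
print(f"normal  [HCO3-] = {normal_hco3:.2f} mM")
```

The value calculated for normal blood falls inside the quoted reference range of 18.0–26.0 mM, whereas the patient's value lies well below it.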


Example 3 PREPARATION OF A PHOSPHATE BUFFER

Question How would you prepare 1 dm^3 of 0.1 M phosphate buffer, pH 7.1, given that pKa2 for phosphoric acid is 6.8 and that the atomic masses for Na, P and O are 23, 31 and 16 daltons respectively?

Answer The buffer will be based on the ionisation:

$$\mathrm{H_2PO_4^-} \rightleftharpoons \mathrm{HPO_4^{2-}} + \mathrm{H^+} \qquad pK_{a2} = 6.8$$

and will therefore involve the use of solid sodium dihydrogen phosphate (NaH2PO4) and disodium hydrogen phosphate (Na2HPO4). Applying the appropriate Henderson–Hasselbalch equation (equation 1.7) gives:

$$7.1 = 6.8 + \log \frac{[\mathrm{HPO_4^{2-}}]}{[\mathrm{H_2PO_4^-}]}$$

$$0.3 = \log \frac{[\mathrm{HPO_4^{2-}}]}{[\mathrm{H_2PO_4^-}]}$$

$$2.0 = \frac{[\mathrm{HPO_4^{2-}}]}{[\mathrm{H_2PO_4^-}]}$$

Since the total concentration of the two species needs to be 0.1 M it follows that $[\mathrm{HPO_4^{2-}}]$ must be 0.067 M and $[\mathrm{H_2PO_4^-}]$ 0.033 M. Their molecular masses are 142 and 120 daltons respectively; hence the weight of each required is 0.0667 × 142 = 9.47 g (Na2HPO4) and 0.0333 × 120 = 4.00 g (NaH2PO4). These weights would be dissolved in approximately 800 cm^3 pure water, the pH measured and adjusted as necessary, and the volume finally made up to 1 dm^3.
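The calculation in Example 3 generalises to any target pH within the working range of the buffer. The sketch below is illustrative only: it assumes an atomic mass of 1 dalton for hydrogen (not stated in the question), and the function names are hypothetical. It uses the unrounded ratio $10^{0.3}$ rather than 2.0, so the masses may differ slightly in the last digit from a hand calculation.

```python
# Atomic masses in daltons; Na, P and O are from the question, H = 1 is an added assumption
ATOMIC_MASS = {"Na": 23, "P": 31, "O": 16, "H": 1}


def formula_mass(composition):
    """Sum atomic masses for a composition given as {element: count}."""
    return sum(ATOMIC_MASS[element] * count for element, count in composition.items())


NA2HPO4 = formula_mass({"Na": 2, "H": 1, "P": 1, "O": 4})   # disodium hydrogen phosphate
NAH2PO4 = formula_mass({"Na": 1, "H": 2, "P": 1, "O": 4})   # sodium dihydrogen phosphate


def phosphate_buffer_recipe(ph, pka=6.8, total_molar=0.1, volume_dm3=1.0):
    """Return grams of each anhydrous salt needed, using the Henderson-Hasselbalch equation."""
    ratio = 10.0 ** (ph - pka)                 # [HPO4(2-)]/[H2PO4-]
    acid_molar = total_molar / (1.0 + ratio)   # H2PO4-
    base_molar = total_molar - acid_molar      # HPO4(2-)
    return {
        "Na2HPO4_g": base_molar * volume_dm3 * NA2HPO4,
        "NaH2PO4_g": acid_molar * volume_dm3 * NAH2PO4,
    }


recipe = phosphate_buffer_recipe(7.1)
print(f"Na2HPO4: {recipe['Na2HPO4_g']:.2f} g, NaH2PO4: {recipe['NaH2PO4_g']:.2f} g")
```

As in the worked answer, the calculated masses are only a starting point: the pH of the solution should still be measured and adjusted before the volume is made up.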

Selection of a buffer

When selecting a buffer for a particular experimental study, several factors should be taken into account:

• select the one with a pKa as near as possible to the required experimental pH and within the range pKa ± 1, as outside this range there will be too little weak acid or weak base present to maintain an effective buffer capacity;
• select an appropriate concentration of buffer to have adequate buffer capacity for the particular experiment. Buffers are most commonly used in the range 0.05–0.5 M;
• ensure that the selected buffer does not form insoluble complexes with any anions or cations essential to the reaction being studied (phosphate buffers, for example, tend to precipitate polyvalent cations, and phosphate may be a metabolite or inhibitor of the reaction);
• ensure that the proposed buffer has other desirable properties such as being non-toxic, being able to penetrate membranes, and not absorbing in the visible or ultraviolet region.
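The first selection criterion above can be sketched as a simple lookup against the pKa values in Table 1.6. The code below is only an illustration of the pKa ± 1 rule; it does not address the other criteria (concentration, complex formation, toxicity and so on), and the function name `candidate_buffers` is hypothetical.

```python
# pKa values transcribed from Table 1.6
PKA_TABLE = {
    "Acetic acid": [4.75],
    "Barbituric acid": [3.98],
    "Carbonic acid": [6.10, 10.22],
    "Citric acid": [3.10, 4.76, 5.40],
    "Glycylglycine": [3.06, 8.13],
    "Hepes": [7.50],
    "Phosphoric acid": [1.96, 6.70, 12.30],
    "Phthalic acid": [2.90, 5.51],
    "Pipes": [6.80],
    "Succinic acid": [4.18, 5.56],
    "Tartaric acid": [2.96, 4.16],
    "Tris": [8.14],
}


def candidate_buffers(target_ph, window=1.0):
    """Return (name, pKa) pairs with |pKa - target pH| <= window, closest first."""
    hits = []
    for name, pkas in PKA_TABLE.items():
        for pka in pkas:
            distance = abs(pka - target_ph)
            if distance <= window:
                hits.append((distance, name, pka))
    return [(name, pka) for _, name, pka in sorted(hits)]


ranked = candidate_buffers(7.4)
for name, pka in ranked:
    print(f"{name}: pKa {pka:.2f}")
```

For a target pH of 7.4 the shortlist is headed by Hepes, Pipes and phosphate, consistent with their common use near physiological pH.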

1.3.4 Measurement of pH – the pH electrode

The pH electrode is an example of an ion-selective electrode (ISE) that responds to one specific ion in solution, in this case the hydrogen ion. The electrode consists of a thin glass porous membrane sealed at the end of a hard glass tube containing 0.1 M hydrochloric acid into which is immersed a silver wire coated with silver chloride. This silver/silver chloride electrode acts as an internal reference that generates a constant potential. The porous membrane is typically 0.1 mm thick, the outer and inner 10 nm consisting of a hydrated gel layer containing exchange-binding sites for hydrogen or sodium ions. On the inside of the membrane the exchange sites are predominantly occupied by hydrogen ions from the hydrochloric acid whilst on the outside the exchange sites are occupied by sodium and hydrogen ions. The bulk of the membrane is a dry silicate layer in which all exchange sites are occupied by sodium ions.

Most of the coordinated ions in both hydrated layers are free to diffuse into the surrounding solution whilst hydrogen ions in the test solution can diffuse in the opposite direction replacing bound sodium ions in a process called ion-exchange equilibrium. Any other types of cations present in the test solution are unable to bind to the exchange sites, thus ensuring the high specificity of the electrode. Note that hydrogen ions do not diffuse across the dry glass layer but sodium ions can. Thus effectively the membrane consists of two hydrated layers containing different hydrogen ion activities separated by a sodium ion transport system.

The principle of operation of the pH electrode is based upon the fact that if there is a gradient of hydrogen ion activity across the membrane this will generate a potential, the size of which is determined by the hydrogen ion gradient across the membrane. Moreover, since the hydrogen ion concentration on the inside is constant (due to the use of 0.1 M hydrochloric acid) the observed potential is directly dependent upon the hydrogen ion concentration of the test solution. In practice a small junction or asymmetry potential (E*) is also created, in part as a result of linking the glass electrode to a reference electrode. The observed potential across the membrane is therefore given by the equation:

$$E = E^{*} + 0.059\,\mathrm{pH}$$

Since the precise composition of the porous membrane varies with time so too does the asymmetry potential. This contributes to the need for the frequent recalibration of the electrode, commonly using two standard buffers of known pH. For each 10-fold change in the hydrogen ion concentration across the membrane (equivalent to a pH change of 1 in the test solution) there will be a potential difference change of 59.2 mV across the membrane. The sensitivity of pH measurements is influenced by the prevailing absolute temperature. The most common forms of pH electrode are the glass electrode (Fig. 1.1a) and the combination electrode (Fig. 1.1b) which contains an in-built calomel reference electrode.
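Two points in this description lend themselves to a short numerical sketch: the temperature dependence of the response per pH unit, and calibration with two standard buffers. The code below assumes ideal Nernstian behaviour, with the slope taken as 2.303RT/F (an assumption, since the text quotes only the value of about 59 mV); the calibration readings are hypothetical and the function names are illustrative.

```python
import math

R = 8.314462618    # gas constant, J mol^-1 K^-1
F = 96485.33212    # Faraday constant, C mol^-1


def nernst_slope_mV(temp_celsius):
    """Ideal electrode response in mV per pH unit, assumed equal to 2.303RT/F."""
    temp_kelvin = temp_celsius + 273.15
    return 1000.0 * math.log(10) * R * temp_kelvin / F


def ph_from_two_point_calibration(e_sample, e_std1, ph_std1, e_std2, ph_std2):
    """Interpolate linearly between two standard buffers of known pH."""
    slope = (e_std2 - e_std1) / (ph_std2 - ph_std1)   # observed mV per pH unit
    return ph_std1 + (e_sample - e_std1) / slope


slope_25 = nernst_slope_mV(25.0)
slope_37 = nernst_slope_mV(37.0)
print(f"ideal slope at 25 C: {slope_25:.2f} mV per pH unit")
print(f"ideal slope at 37 C: {slope_37:.2f} mV per pH unit")

# Hypothetical meter readings (mV) for pH 4.00 and pH 7.00 standards and a test sample
sample_ph = ph_from_two_point_calibration(
    e_sample=-23.6, e_std1=177.0, ph_std1=4.00, e_std2=0.0, ph_std2=7.00
)
print(f"sample pH = {sample_ph:.2f}")
```

Because the calibration uses the slope actually observed between the two standards, it absorbs both the asymmetry potential and any departure of the electrode from the ideal response.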


Fig. 1.1 Common pH electrodes: (a) glass electrode; (b) combination electrode. [Figure labels: shielded insulated cable; glass stem; Ag/AgCl internal electrode; inner electrode (Ag/AgCl wire); 'external' reference electrode; salt bridge solution (usually KCl); porous plug; HCl solution (0.1 M); thin-walled glass bulb; glass membrane.]

1.3.5 Other electrodes

Electrodes exist for the measurement of many other ions such as $\mathrm{Li^+}$, $\mathrm{K^+}$, $\mathrm{Na^+}$, $\mathrm{Ca^{2+}}$, $\mathrm{Cl^-}$ and $\mathrm{NO_3^-}$ in addition to $\mathrm{H^+}$. The principle of operation of these ion-selective electrodes (ISEs) is very similar to that of the pH electrode in that permeable membranes specific for the ion to be measured are used. They lack absolute specificity and their selectivity is expressed by a selectivity coefficient that expresses the ratio of the response to the competing ions relative to that for the desired ion. Most ISEs have a good linear response to the desired ion and a fast response time.

Biosensors are derived from ISEs by incorporating an immobilised enzyme onto the surface of the electrode. An important example is the glucose electrode that utilises glucose oxidase to oxidise glucose (Section 15.3.5) in the test sample to generate hydrogen peroxide that is oxidised at the anode, causing a current to flow that is then measured amperometrically. Micro sensor versions of these electrodes are of great importance in clinical biochemistry laboratories (Section 16.2.2).

The oxygen electrode measures molecular oxygen in solution rather than an ion. It works by reducing the oxygen at the platinum cathode that is separated from the test solution by an oxygen-permeable membrane. The electrons consumed in the process are compensated by the generation of electrons at the silver anode, hence the oxygen tension in the test sample is directly proportional to the current flow between the two electrodes. Optical sensors use the enzyme luciferase (Section 15.3.2) to measure ATP by generating light and detecting it with a photomultiplier.


1.4 QUANTITATIVE BIOCHEMICAL MEASUREMENTS

1.4.1 Analytical considerations and experimental error

Many biochemical investigations involve the quantitative determination of the concentration and/or amount of a particular component (the analyte) present in a test sample. For example, in studies of the mode of action of enzymes, trans-membrane transport and cell signalling, the measurement of a particular reactant or product is investigated as a function of a range of experimental conditions and the data used to calculate kinetic or thermodynamic constants. These in turn are used to deduce details of the mechanism of the biological process taking place. Irrespective of the experimental rationale for undertaking such quantitative studies, all quantitative experimental data must first be questioned and validated in order to give credibility to the derived data and the conclusions that can be drawn from them. This is particularly important in the field of clinical biochemistry in which quantitative measurements on a patient's blood and urine samples are used to aid a clinical diagnosis and monitor the patient's recovery from a particular disease. This requires that the experimental data be assessed and confirmed as an acceptable estimate of the 'true' values by the application of one or more standard statistical tests. Evidence of the validation of quantitative data by the application of such tests is required by the editors of refereed journals for the acceptance for publication of draft research papers. The following sections will address the theoretical and practical considerations behind these statistical tests.

Selecting an analytical method

The nature of the quantitative analysis to be carried out will require a decision to be taken on the analytical technique to be employed. A variety of methods may be capable of achieving the desired analysis and the decision to select one may depend on a variety of issues. These include:

• the availability of specific pieces of apparatus;
• the precision, accuracy and detection limits of the competing methods;
• the precision, accuracy and detection limit acceptable for the particular analysis;
• the number of other compounds present in the sample that may interfere with the analysis;
• the potential cost of the method (particularly important for repetitive analysis);
• the possible hazards inherent in the method and the appropriate precautions needed to minimise risk;
• the published literature method of choice;
• personal preference.

The most common biochemical quantitative analytical methods are visible, ultraviolet and fluorimetric spectrophotometry, chromatographic techniques such as HPLC and GC coupled to spectrophotometry or mass spectrometry, ion-selective electrodes and immunological methods such as ELISA. Once a method has been selected it must be developed and/or validated using the approaches discussed in the following sections. If it is to be used over a prolonged period of time, measures will need to be put in place to ensure that there is no drift in response. This normally entails an internal quality control approach using reference test samples covering the analytical range that are measured each time the method is applied to test samples. Any deviation from the known values for these reference samples will require the whole batch of test samples to be re-assayed.

The nature of experimental errors

Every quantitative measurement has some uncertainty associated with it. This uncertainty is referred to as the experimental error which is a measure of the difference between the 'true' value and the experimental value. The 'true' value normally remains unknown except in cases where a standard sample (i.e. one of known composition) is being analysed. In other cases it has to be estimated from the analytical data by the methods that will be discussed later. The consequence of the existence of experimental errors is that the measurements recorded can be accepted with a high, medium or low degree of confidence depending upon the sophistication of the technique employed, but seldom, if ever, with absolute certainty. Experimental error may be of two kinds: systematic error and random error.

Systematic error (also called determinate error)

Systematic errors are consistent errors that can be identified and either eliminated or reduced. They are most commonly caused by a fault or inherent limitation in the apparatus being used but may also be influenced by poor experimental design. Common causes include the misuse of manual or automatic pipettes, the incorrect preparation of stock solutions, and the incorrect calibration and use of pH meters. They may be constant (i.e. have a fixed value irrespective of the amount of test analyte present in the test sample under investigation) or proportional (i.e. the size of the error is dependent upon the amount of test analyte present). Thus the overall effect of the two types in a given experimental result will differ. Both of these types of systematic error have three common causes:

• Analyst error: This is best minimised by good training and/or by the automation of the method.
• Instrument error: This may not be eliminable and hence alternative methods should be considered. Instrument error may be electronic in origin or may be linked to the matrix of the sample.
• Method error: This can be identified by comparison of the experimental data with that obtained by the use of alternative methods.

Identification of systematic errors

Systematic errors are always reproducible and may be positive or negative, i.e. they increase or decrease the experimental value relative to the 'true' value. The crucial characteristic, however, is that their cause can be identified and corrected. There are four common means of identifying this type of error:



• Use of a 'blank' sample: This is a sample that you know contains none of the analyte under test so that if the method gives a non-zero answer then it must be responding in some unintended way. The use of blank samples is difficult in cases where the matrix of the test sample is complex, for example, serum.
• Use of a standard reference sample: This is a sample of the test analyte of known composition so the method under evaluation must reproduce the known answer.
• Use of an alternative method: If the test and alternative methods give different results for a given test sample then at least one of the methods must have an inbuilt flaw.
• Use of an external quality assessment sample: This is a standard reference sample that is analysed by other investigators based in different laboratories employing the same or different methods. Their results are compared and any differences in excess of random errors (see below) identify the systematic error for each analyst. The use of external quality assessment schemes is standard practice in clinical biochemistry laboratories (see Section 16.2.3).

Random error (also called indeterminate error)

Random errors are caused by unpredictable and often uncontrollable inaccuracies in the various manipulations involved in the method. Such errors may be variably positive or negative and are caused by such factors as difficulty in the process of sampling, random electrical 'noise' in an instrument or by the analyst being inconsistent in the operation of the instrument or in recording readings from it.

Standard operating procedures

The minimisation of both systematic and random errors is essential in cases where the analytical data are used as the basis for a crucial diagnostic or prognostic decision as is common, for example, in routine clinical biochemical investigations and in the development of new drugs. In such cases it is normal for the analyses to be conducted in accordance with standard operating procedures (SOPs) that define in full detail the quality of the reagents, the preparation of standard solutions, the calibration of instruments and the methodology of the actual analytical procedure which must be followed.

1.4.2 Assessment of the performance of an analytical method

All analytical methods can be characterised by a number of performance indicators that define how the selected method performs under specified conditions. Knowledge of these performance indicators allows the analyst to decide whether or not the method is acceptable for the particular application. The major performance indicators are:



• Precision (also called imprecision and variability): This is a measure of the reproducibility of a particular set of analytical measurements on the same sample of test analyte. If the replicated values agree closely with each other, the measurements are said to be of high precision (or low imprecision). In contrast, if the values diverge, the measurements are said to be of poor or low precision (or high imprecision). In analytical biochemical work the normal aim is to develop a method that has as high a precision as possible within the general objectives of the investigation. However, precision commonly varies over the analytical range (see below) and over periods of time. As a consequence, precision may be expressed as either within-batch or between-batch. Within-batch precision is the variability when the same test sample is analysed repeatedly during the same batch of analyses on the same day. Between-batch precision is the variability when the same test sample is analysed repeatedly during different batches of analyses over a period of time. Since there is more opportunity for the analytical conditions to change during the assessment of between-batch precision, between-batch imprecision is normally the higher of the two. Results that are of high precision may nevertheless be a poor estimate of the 'true' value (i.e. of low accuracy or high bias) because of the presence of unidentified errors. Methods for the assessment of precision of a data set are discussed below. The term imprecision is preferred in particular by clinical biochemists since they believe that it best describes the variability that occurs in replicated analyses.
• Accuracy (also called trueness, bias and inaccuracy): This is the difference between the mean of a set of analytical measurements on the same sample of test analyte and the 'true' value for the test sample. As previously pointed out, the 'true' value is normally unknown except in the case of standard measurements. In other cases accuracy has to be assessed indirectly by use of an internationally agreed reference method and/or by the use of external quality assessment schemes (see above) and/or by the use of population statistics that are discussed below.
• Detection limit (also called sensitivity): This is the smallest concentration of the test analyte that can be distinguished from zero with a defined degree of confidence. Concentrations below this limit should simply be reported as 'less than the detection limit'. All methods have their individual detection limits for a given analyte and this may be one of the factors that influence the choice of a specific analytical method for a given study. Thus the Bradford, Lowry and bicinchoninic acid methods for the measurement of proteins have detection limits of 20, 10 and 0.5 μg protein cm^-3 respectively. In clinical biochemical measurements, sensitivity is often defined as the ability of the method to detect the analyte without giving false negatives (see Section 16.1.2).
• Analytical range: This is the range of concentrations of the test analyte that can be measured reproducibly, the lower end of the range being the detection limit. In most cases the analytical range is defined by an appropriate calibration curve (see Section 1.4.6). As previously pointed out, the precision of the method may vary across the range.
• Analytical specificity (also called selectivity): This is a measure of the extent to which other substances that may be present in the sample of test analyte may interfere with the analysis and therefore lead to a falsely high or low value. A simple example is the ability of a method to measure glucose in the presence of other hexoses such as mannose and galactose. In clinical biochemical measurements, selectivity is an index of the ability of the method to give a consistent negative result for known negatives (see Section 16.1.2).
• Analytical sensitivity: This is a measure of the change in response of the method to a defined change in the quantity of analyte present. In many cases analytical sensitivity is expressed as the slope of a linear calibration curve.
• Robustness: This is a measure of the ability of the method to give a consistent result in spite of small changes in experimental parameters such as pH, temperature and amount of reagents added. For routine analysis, the robustness of a method is an important practical consideration.

These performance indicators are established by the use of well-characterised test and reference analyte samples. The order in which they are evaluated will depend on the immediate analytical priorities, but initially the three most important may be specificity, detection limit and analytical range. Once a method is in routine use, the question of assuring the quality of analytical data by the implementation of quality assessment procedures comes into play.
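The idea of analytical sensitivity as the slope of a linear calibration curve can be illustrated with an ordinary least-squares fit. The sketch below uses hypothetical calibration data (analyte concentration in mM against absorbance) chosen purely for illustration; the function name `linear_fit` is likewise hypothetical.

```python
def linear_fit(x, y):
    """Return (slope, intercept) of the least-squares straight line through the points."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical calibration standards (illustrative values only)
concentration_mM = [0.0, 2.0, 4.0, 6.0, 8.0]
absorbance = [0.02, 0.22, 0.42, 0.62, 0.82]

slope, intercept = linear_fit(concentration_mM, absorbance)
print(f"analytical sensitivity (slope) = {slope:.3f} absorbance units per mM")
print(f"intercept (blank response)     = {intercept:.3f}")

# Read an unknown off the calibration line; valid only within the analytical range
unknown_absorbance = 0.52
unknown_mM = (unknown_absorbance - intercept) / slope
print(f"unknown = {unknown_mM:.2f} mM")
```

The intercept here plays the role of the response given by a blank, and the unknown should only be interpolated within the range covered by the standards.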

1.4.3 Assessment of precision

After a quantitative study has been completed and an experimental value for the amount and/or concentration of the test analyte in the test sample obtained, the experimenter must ask the question 'How confident can I be that my result is an acceptable estimate of the 'true' value?' (i.e. is it accurate?). An additional question may be 'Is the quality of my analytical data comparable with that in the published scientific literature for the particular analytical method?' (i.e. is it precise?). Once the answers to such questions are known, a result that has a high probability of being correct can be accepted and used as a basis for the design of further studies whilst a result that is subject to unacceptable error can be rejected. Unfortunately it is not possible to assess the precision of a single quantitative determination. Rather, it is necessary to carry out analyses in replicate (i.e. the experiment is repeated several times on the same sample of test analyte) and to subject the resulting data set to some basic statistical tests.

If a particular experimental determination is repeated numerous times and a graph constructed of the number of times a particular result occurs against its value, it is normally bell-shaped with the results clustering symmetrically about a mean value. This type of distribution is called a Gaussian or normal distribution. In such cases the precision of the data set is a reflection of random error. However, if the plot is skewed to one side of the mean value, then systematic errors have not been eliminated. Assuming that the data set is of the normal distribution type, there are three statistical parameters that can be used to quantify precision.

Standard deviation, coefficient of variation and variance – measures of precision

These three statistical terms are alternative ways of expressing the scatter of the values within a data set about the mean, $\bar{x}$, calculated by summing their total value and dividing by the number of individual values. Each term has its individual merit. In all three cases the term is actually measuring the width of the normal distribution curve such that the narrower the curve the smaller the value of the term and the higher the precision of the analytical data set.

The standard deviation (s) of a data set is a measure of the variability of the population from which the data set was drawn. It is calculated by use of equation 1.10 or 1.11:

$$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}} \qquad (1.10)$$

$$s = \sqrt{\frac{\sum x_i^2 - \left(\sum x_i\right)^2 / n}{n - 1}} \qquad (1.11)$$

$(x_i - \bar{x})$ is the difference between an individual experimental value ($x_i$) and the calculated mean $\bar{x}$ of the individual values. Since these differences may be positive or negative, and since the distribution of experimental values about the mean is symmetrical, if they were simply added together they would cancel each other out. The differences are therefore squared to give consistent positive values. To compensate for this, the square root of the resulting calculation has to be taken to obtain the standard deviation. Standard deviation has the same units as the actual measurements and this is one of its attractions.

The mathematical nature of a normal distribution curve is such that 68.2% of the area under the curve (and hence 68.2% of the individual values within the data set) is within one standard deviation either side of the mean, 95.5% of the area under the curve is within two standard deviations and 99.7% within three standard deviations. Exactly 95% of the area under the curve falls within 1.96 standard deviations either side of the mean. The precision (or imprecision) of a data set is commonly expressed as ± 1 SD of the mean.

The term (n − 1) is called the degrees of freedom of the data set and is an important variable. The initial number of degrees of freedom possessed by a data set is equal to the number of results (n) in the set. However, when another quantity characterising the data set, such as the mean or standard deviation, is calculated, the number of degrees of freedom of the set is reduced by 1 and by 1 again for each new derivation made. Many modern calculators and computers include programs for the calculation of standard deviation. However, some use variants of equation 1.10 in that they use n as the denominator rather than n − 1 as the basis for the calculation. If n is large, greater than 30 for example, then the difference between the two calculations is small, but if n is small, and certainly if it is less than 10, the use of n rather than n − 1 will significantly underestimate the standard deviation. This may lead to false conclusions being drawn about the precision of the data set. Thus for most analytical biochemical studies it is imperative that the calculation of standard deviation is based on the use of n − 1.

The coefficient of variation (CV) (also known as relative standard deviation) of a data set is the standard deviation expressed as a percentage of the mean as shown in equation 1.12.

$$\mathrm{CV} = \frac{s}{\bar{x}} \times 100\% \tag{1.12}$$

Since the mean and standard deviation have the same units, coefficient of variation is simply a percentage. This independence of the unit of measurement allows methods based on different units to be compared.

The variance of a data set is the mean of the squares of the differences between each value and the mean of the values. It is also the square of the standard deviation, hence the symbol s². It has units that are the square of the original units, which makes it rather cumbersome and explains why standard deviation and coefficient of variation are the preferred ways of expressing the variability of data sets. The importance of variance will be evident in later discussions of the ways of making a statistical comparison of two data sets.

To appreciate the relative merits of standard deviation and coefficient of variation as measures of precision, consider the following scenario. Suppose that two serum samples, A and B, were each analysed 20 times for serum glucose by the glucose oxidase method (see Section 15.3.5) such that sample A gave a mean value of 2.00 mM with a standard deviation of 0.10 mM and sample B a mean of 8.00 mM and a standard deviation of 0.41 mM. On the basis of the standard deviation values it might be concluded that the method had given a better precision for sample A than for B. However, this ignores the absolute values of the two samples. If this is taken into account by calculating the coefficient of variation, the two values are 5.0% and 5.1% respectively, showing that the method had essentially the same precision for both samples. This illustrates the fact that standard deviation is an acceptable assessment of precision for a given data set but if it is necessary to compare the precision of two or more data sets, particularly ones with different mean values, then coefficient of variation should be used.
The majority of well-developed analytical methods have a coefficient of variation within the analytical range of less than 5% and many, especially automated methods, of less than 2%.
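The two numerical points made above can be checked with a short sketch. It assumes only that the n and n − 1 versions of equation 1.10 share the same sum of squared deviations, so that their ratio is √((n − 1)/n); the function names are illustrative and are not taken from any particular package.

```python
import math

def underestimate_factor(n):
    """Ratio of the SD computed with n to the SD computed with n - 1.

    Both versions share the same sum of squared deviations, so the
    ratio depends only on n: sqrt((n - 1) / n).
    """
    return math.sqrt((n - 1) / n)

def cv_percent(sd, mean):
    """Coefficient of variation (equation 1.12) as a percentage."""
    return 100.0 * sd / mean

# How much smaller the SD becomes if n is used instead of n - 1
for n in (5, 10, 30, 100):
    shortfall = 100.0 * (1.0 - underestimate_factor(n))
    print(f"n = {n:3d}: using n gives an SD {shortfall:.1f}% too small")

# Samples A and B from the serum glucose scenario in the text
cv_a = cv_percent(0.10, 2.00)
cv_b = cv_percent(0.41, 8.00)
print(f"CV of A = {cv_a:.1f}%, CV of B = {cv_b:.1f}%")
```

The shortfall is roughly a tenth of the standard deviation for five results but under 2% for 30, which is why the choice of denominator matters mainly for small data sets.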

1.4.4 Assessment of accuracy

Population statistics

Whilst standard deviation and coefficient of variation give a measure of the variability of the data set they do not quantify how well the mean of the data set approaches the ‘true’ value. To address this issue it is necessary to introduce the concepts of population statistics, confidence limits and confidence intervals. If a data set is made up of a very large number of individual values so that n is a large number, then the mean of the set would be equal to the population mean mu (μ) and the standard deviation would equal the population standard deviation sigma (σ). Note that Greek letters represent the population parameters and the common alphabet the sample parameters. These two population parameters are the best estimates of the ‘true’ values since they are based on the largest number of individual measurements so that the influence of random errors is minimised. In practice the population parameters are seldom measured for obvious practicality reasons and the sample parameters have


Example 4 ASSESSMENT OF THE PRECISION OF AN ANALYTICAL DATA SET

Question
Five measurements of the fasting serum glucose concentration were made on the same sample taken from a diabetic patient. The values obtained were 2.3, 2.5, 2.2, 2.6 and 2.5 mM. Calculate the precision of the data set.

Answer
Precision is normally expressed either as one standard deviation of the mean or as the coefficient of variation of the mean. These statistical parameters therefore need to be calculated.

Mean

$$\bar{x} = \frac{2.2 + 2.3 + 2.5 + 2.5 + 2.6}{5} = 2.42\ \text{mM}$$

Standard deviation
Using both equations (1.10) and (1.11) to calculate the value of s:

xᵢ        xᵢ − x̄     (xᵢ − x̄)²    xᵢ²
2.2       −0.22      0.0484       4.84
2.3       −0.12      0.0144       5.29
2.5       +0.08      0.0064       6.25
2.5       +0.08      0.0064       6.25
2.6       +0.18      0.0324       6.76
Σ 12.1    Σ 0.00     Σ 0.1080     Σ 29.39

Using equation 1.10

$$s = \sqrt{0.108 / 4} = 0.164\ \text{mM}$$

Using equation 1.11

$$s = \sqrt{\frac{29.39 - (12.1)^2 / 5}{4}} = \sqrt{\frac{29.39 - 29.28}{4}} = 0.166\ \text{mM}$$

The small difference between the two results arises from rounding (12.1)²/5 = 29.282 to 29.28; without this rounding both equations give 0.164 mM.

Coefficient of variation
Using equation 1.12, with s taken as 0.165 mM (midway between the two results):

$$\mathrm{CV} = \frac{0.165 \times 100\%}{2.42} = 6.82\%$$

Discussion
In this case it is easier to appreciate the precision of the data set by considering the coefficient of variation. The value 6.82% is moderately high for this type of analysis. Automation of the method would certainly reduce it by at least half. Note that it is legitimate to quote the answers to these calculations to one more digit than was present in the original data set. In practice, it is advisable to carry out the statistical analysis on a far larger data set than that presented in this example.
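The arithmetic of Example 4 can be checked without intermediate rounding. The sketch below uses Python's standard `statistics` module, whose `stdev` function uses the n − 1 denominator and whose `pstdev` function uses n; the variable names are illustrative only.

```python
import math
import statistics

glucose_mM = [2.3, 2.5, 2.2, 2.6, 2.5]
n = len(glucose_mM)
mean = statistics.mean(glucose_mM)

# Equation 1.10: squared deviations about the mean
s_eq_1_10 = math.sqrt(sum((x - mean) ** 2 for x in glucose_mM) / (n - 1))

# Equation 1.11: sums of x and x squared
sum_x = sum(glucose_mM)
sum_x2 = sum(x * x for x in glucose_mM)
s_eq_1_11 = math.sqrt((sum_x2 - sum_x ** 2 / n) / (n - 1))

s_library = statistics.stdev(glucose_mM)     # n - 1 denominator
s_using_n = statistics.pstdev(glucose_mM)    # n denominator
cv = 100.0 * s_library / mean                # equation 1.12

print(f"mean = {mean:.2f} mM")
print(f"s (1.10) = {s_eq_1_10:.3f} mM, s (1.11) = {s_eq_1_11:.3f} mM")
print(f"s with n as denominator = {s_using_n:.3f} mM")
print(f"CV = {cv:.2f}%")
```

Carried at full precision, both equations give 0.164 mM and a coefficient of variation of about 6.8%. The value of 0.166 mM in the worked example reflects rounding of the intermediate term to 29.28.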


a larger uncertainty associated with them. The uncertainty of the sample mean deviating from the population mean decreases in proportion to the reciprocal of the square root of the number of values in the data set, i.e. 1/√n. Thus to decrease the uncertainty by a factor of two the number of experimental values would have to be increased four-fold and for a factor of 10 the number of measurements would need to be increased 100-fold. The nature of this relationship again emphasises the importance of evaluating the acceptable degree of uncertainty of the experimental result before the design of the experiment is completed and the practical analysis begun. Modern automated analytical instruments recognise the importance of multiple results by facilitating repeat analyses at maximum speed. It is good practice to report the number of measurements on which the mean and standard deviation are based as this gives a clear indication of the quality of the calculated data.

Confidence intervals, confidence limits and the Student’s t factor

Accepting that the population mean is the best estimate of the ‘true’ value, the question arises ‘How can I relate my experimental sample mean to the population mean?’ The answer is by using the concept of confidence. Confidence level expresses the level of confidence, expressed as a percentage, that can be attached to the data. Its value has to be set by the experimenter to achieve the objectives of the study. Confidence interval is a mathematical statement relating the sample mean to the population mean. A confidence interval gives a range of values about the sample mean within which there is a given probability (determined by the confidence level) that the population mean lies. The relationship between the two means is expressed in terms of the standard deviation of the data set, the square root of the number of values in the data set and a factor known as Student’s t (equation 1.13):

$$\mu = \bar{x} \pm \frac{t s}{\sqrt{n}} \tag{1.13}$$

where x̄ is the measured mean, μ is the population mean, s is the measured standard deviation, n is the number of measurements and t is the Student’s t factor. The term s/√n is known as the standard error of the mean and is a measure of the precision of the sample mean. Unlike standard deviation, standard error depends on the sample size and will fall as the sample size increases. The two measurements are sometimes confused, but in essence, standard deviation should be used if we want to know how widely scattered the measurements are and standard error should be used if we want to indicate the uncertainty around a mean measurement. Confidence level can be set at any value up to 100%. For example, it may be that a confidence level of only 50% would be acceptable for a particular experiment. However, a 50% level means that there is a one in two chance that the sample mean is not an acceptable estimate of the population mean. In contrast, the choice of a 95% or 99% confidence level would mean that there was only a one in 20 or a one in 100 chance respectively that the best estimate had not been achieved. In practice, most analytical biochemists choose a confidence level in the range 90–99% and most commonly 95%. Student’s t is a way of linking probability with the size of the data set and is used in a number of statistical tests. Student’s t values for varying numbers in a data set


Table 1.7 Values of Student’s t

Degrees of     Confidence level (%)
freedom        50       90       95       98       99       99.9
2              0.816    2.920    4.303    6.965    9.925    31.598
3              0.765    2.353    3.182    4.541    5.841    12.924
4              0.741    2.132    2.776    3.747    4.604    8.610
5              0.727    2.015    2.571    3.365    4.032    6.869
6              0.718    1.943    2.447    3.143    3.707    5.959
7              0.711    1.895    2.365    2.998    3.500    5.408
8              0.706    1.860    2.306    2.896    3.355    5.041
9              0.703    1.833    2.262    2.821    3.250    4.798
10             0.700    1.812    2.228    2.764    3.169    4.587
15             0.691    1.753    2.131    2.602    2.947    4.073
20             0.687    1.725    2.086    2.528    2.845    3.850
30             0.683    1.697    2.042    2.457    2.750    3.646

(and hence with the varying degrees of freedom) at selected confidence levels are available in statistical tables. Some values are shown in Table 1.7. The numerical value of t is equal to the number of standard errors of the mean that must be added to and subtracted from the mean to give the confidence interval at a given confidence level. Note that as the sample size (and hence the degrees of freedom) increases, the values of t at the different confidence levels converge. When n is large and we wish to calculate the 95% confidence interval, the value of t approximates to 1.96 and some texts quote equation 1.13 in this form. The term Student’s t factor may give the impression that it was devised specifically with students’ needs in mind. In fact ‘Student’ was the pseudonym of a statistician, W. S. Gosset, who in 1908 first devised the term and who was not permitted by his employer to publish his work under his own name.

Criteria for the rejection of outlier experimental data – Q-test

A very common problem in quantitative biochemical analysis is the need to decide whether or not a particular result is an outlier and should therefore be rejected before the remainder of the data set is subjected to statistical analysis. It is important to identify such data as they have a disproportionate effect on the calculation of the mean and standard deviation of the data set. When faced with this problem, the first action should be to check that the suspected outlier is not due to a simple experimental or mathematical error. Once the suspect figure has been confirmed its validity is checked by application of Dixon’s Q-test. Like other tests to be described later, the


Example 5 ASSESSMENT OF THE ACCURACY OF AN ANALYTICAL DATA SET

Question
Calculate the confidence intervals at the 50%, 95% and 99% confidence levels of the fasting serum glucose concentrations given in the previous worked example.

Answer
Accuracy in this type of situation is expressed in terms of confidence intervals that express a range of values over which there is a given probability that the ‘true’ value lies. As previously calculated, x̄ = 2.42 mM and s = 0.16 mM. Inspection of Table 1.7 reveals that for four degrees of freedom (the number of experimental values minus one) and a confidence level of 50%, t = 0.741 so that the confidence interval for the population mean is given by:

$$\text{confidence interval} = 2.42 \pm \frac{(0.741)(0.16)}{\sqrt{5}} = 2.42 \pm 0.05\ \text{mM}$$

For the 95% confidence level and the same number of degrees of freedom, t = 2.776, hence the confidence interval for the population mean is given by:

$$\text{confidence interval} = 2.42 \pm \frac{(2.776)(0.16)}{\sqrt{5}} = 2.42 \pm 0.20\ \text{mM}$$

For the 99% confidence level and the same number of degrees of freedom, t = 4.604, hence the confidence interval for the population mean is given by:

$$\text{confidence interval} = 2.42 \pm \frac{(4.604)(0.16)}{\sqrt{5}} = 2.42 \pm 0.33\ \text{mM}$$

Discussion
These calculations show that there is a 50% chance that the population mean lies in the range 2.37 to 2.47 mM, a 95% chance that the population mean lies within the range 2.22 to 2.62 mM and a 99% chance that it lies in the range 2.09 to 2.75 mM. Note that as the confidence level increases the range of potential values for the population mean also increases. You can calculate for yourself that if the mean and standard deviation had been based on 20 measurements (i.e. a four-fold increase in the number of measurements) then the 50% and 95% confidence intervals would have been reduced to 2.42 ± 0.02 mM and 2.42 ± 0.07 mM respectively. This re-emphasises the beneficial impact of multiple experimental determinations but at the same time highlights the need to balance the value of multiple determinations against the accuracy with which the experimental mean is required within the objectives of the individual study.
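The confidence intervals in Example 5 follow directly from equation 1.13 and can be reproduced as below. This is a minimal sketch: the t values are copied from Table 1.7 for four degrees of freedom, and the function names are illustrative.

```python
import math

# Student's t for four degrees of freedom, copied from Table 1.7
T_FOUR_DF = {50: 0.741, 95: 2.776, 99: 4.604}

def standard_error(s, n):
    """Standard error of the mean, s / sqrt(n)."""
    return s / math.sqrt(n)

def confidence_interval(mean, s, n, t):
    """Lower and upper confidence limits from equation 1.13."""
    half_width = t * standard_error(s, n)
    return mean - half_width, mean + half_width

mean, s, n = 2.42, 0.16, 5
intervals = {
    level: confidence_interval(mean, s, n, t)
    for level, t in T_FOUR_DF.items()
}
for level, (low, high) in intervals.items():
    print(f"{level}% confidence: {low:.2f} to {high:.2f} mM")

# Quadrupling the number of measurements halves the standard error
sem_ratio = standard_error(s, n) / standard_error(s, 4 * n)
print(f"standard error ratio for n = 5 versus n = 20: {sem_ratio:.1f}")
```

The final line illustrates the 1/√n relationship only. The narrower intervals quoted in the discussion for 20 measurements also reflect the smaller t values that apply at 19 degrees of freedom, which are not listed in Table 1.7.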


Table 1.8 Values of Q for the rejection of outliers

Number of observations    Q (95% confidence)
4                         0.83
5                         0.72
6                         0.62
7                         0.57
8                         0.52

test is based on a null hypothesis, namely that there is no difference in the values being compared. If the hypothesis cannot be rejected then the suspect value must be retained. The suspect value is used to calculate an experimental rejection quotient, Qexp. Qexp is then compared with tabulated critical rejection quotients, Qtable, for a given confidence level and the number of experimental results (Table 1.8). If Qexp is less than Qtable the null hypothesis is retained and the suspect value should not be rejected, but if it is greater then the value can be rejected. The basis of the test is the fact that in a normal distribution 95.5% of the values are within the range of two standard deviations of the mean. In setting limits for the acceptability or rejection of data, a compromise has to be made on the confidence level chosen. If a high confidence level is chosen the limits of acceptability are set wide and therefore there is a risk of accepting values that are subject to error. If the confidence level is set too low, the acceptability limits will be too narrow and therefore there will be a risk of rejecting legitimate data. In practice a confidence level of 90% or 95% is most commonly applied. The Qtable values in Table 1.8 are based on a 95% confidence level. The calculation of Qexp is based upon equation 1.14, which requires the calculation of the separation of the questionable value from the nearest acceptable value (gap) coupled with knowledge of the range covered by the data set:

$$Q_{\text{exp}} = \frac{\text{gap}}{\text{range}} = \frac{x_n - x_{n-1}}{x_n - x_1} \tag{1.14}$$

where xₙ is the value under investigation in the series x₁, x₂, x₃, . . . xₙ₋₁, xₙ arranged in order of magnitude.

1.4.5 Validation of an analytical method – the use of t-tests

A t-test in general is used to address the question as to whether or not two data sets have the same mean. Both data sets need to have a normal distribution and equal variances. There are three types:

• Unpaired t-test: Used to test whether two data sets have the same mean.
• Paired t-test: Used to test whether two data sets have the same mean where each value in one set is paired with a value in the other set.
• One-sample t-test: Used to test whether the mean of a data set is equal to a particular value.


Example 6 IDENTIFICATION OF AN OUTLIER EXPERIMENTAL RESULT

Question
If the data set in Example 5 contained an additional value of 3.0 mM, could this value be regarded as an outlier point at the 95% confidence level?

Answer
From equation 1.14

$$Q_{\text{exp}} = \frac{3.0 - 2.6}{3.0 - 2.2} = \frac{0.4}{0.8} = 0.5$$

Using Table 1.8 for six data points Qtable is equal to 0.62. Since Qexp is smaller than Qtable the point should not be rejected as there is more than a 95% chance that it is part of the same data set as the other five values. It is easy to show that an additional data point of 3.3 rather than 3.0 mM would give a Qexp of 0.64 and could be rejected.
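The Q-test calculation in Example 6 can be sketched as follows. The critical values are copied from Table 1.8; the function names are illustrative, and the sketch assumes, as in equation 1.14, that the suspect value is the largest in the set.

```python
# Critical Q values at the 95% confidence level, copied from Table 1.8
Q_TABLE_95 = {4: 0.83, 5: 0.72, 6: 0.62, 7: 0.57, 8: 0.52}

def q_exp(values):
    """Experimental rejection quotient (equation 1.14) for the largest value."""
    ordered = sorted(values)
    gap = ordered[-1] - ordered[-2]      # suspect value minus its nearest neighbour
    data_range = ordered[-1] - ordered[0]
    return gap / data_range

def can_reject_largest(values):
    """True if Qexp exceeds the tabulated Q for this number of observations."""
    return q_exp(values) > Q_TABLE_95[len(values)]

glucose_mM = [2.3, 2.5, 2.2, 2.6, 2.5]

for extra in (3.0, 3.3):
    data = glucose_mM + [extra]
    print(f"extra value {extra} mM: Qexp = {q_exp(data):.2f}, "
          f"reject = {can_reject_largest(data)}")
```

A suspect value at the low end of the series would need the gap taken between the two smallest values instead; that case is not covered by this sketch.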

Each test is based on a null hypothesis, which is that there is no difference between the means of the two data sets. The tests measure how likely the hypothesis is to be true. The attraction of such tests is that they are easy to carry out and interpret.

Analysis of a standard solution – one-sample t-test

Once the choice of the analytical method to be used for a particular biochemical assay has been made, the normal first step is to carry out an evaluation of the method in the laboratory. This evaluation entails the replicated analysis of a known standard solution of the test analyte and the calculation of the mean and standard deviation of the resulting data set. The question is then asked ‘Does the mean of the analytical results agree with the known value of the standard solution within experimental error?’ To answer this question a t-test is applied. In the case of the analysis of a standard solution the calculated mean and standard deviation of the analytical results are used to calculate a value of the Student’s t (tcalc) using equation 1.15. It is then compared with table values of t (ttable) for the particular degrees of freedom of the data set and at the required confidence level (Table 1.7).

$$t_{\text{calc}} = \frac{(\text{known value} - \bar{x})\sqrt{n}}{s} \tag{1.15}$$

These table values of t represent critical values that separate the border between different probability levels. If tcalc is greater than ttable the analytical results are deemed not to be from the same data set as the known standard solution at the selected confidence level. In such cases the conclusion is therefore drawn that the analytical results do not agree with the standard solution and hence that there are unidentified errors in them. There would be no point in applying the analytical method to unknown test analyte samples until the problem has been resolved.


Example 7 VALIDATING AN ANALYTICAL METHOD

Question
A standard solution of glucose is known to be 5.05 mM. Samples of it were analysed by the glucose oxidase method (see Section 15.3.2 for details) that was being used in the laboratory for the first time. A calibration curve obtained using least mean square linear regression was used to calculate the concentration of glucose in the test sample. The following experimental values were obtained: 5.12, 4.96, 5.21, 5.18 and 5.26 mM. Does the experimental data set for the glucose solution agree with the known value within experimental error?

Answer
It is first necessary to calculate the mean and standard deviation for the set and then to use them to calculate a value for Student’s t. Applying equations 1.10 and 1.11 to the data set gives x̄ = 5.15 mM and s = 0.1 mM (to one significant figure). Now applying equation 1.15 to give tcalc:

$$t_{\text{calc}} = \frac{(5.05 - 5.15)\sqrt{5}}{0.1} = -2.236$$

Note that the negative sign of the difference between the two mean values in this calculation is ignored. From Table 1.7 at the 95% confidence level with four degrees of freedom, ttable = 2.776. tcalc is therefore less than ttable and the conclusion can be drawn that the measured mean value does agree with the known value. Using equation 1.12, the coefficient of variation for the measured values can be calculated to be 1.94%.
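The one-sample t-test of Example 7 can be repeated at full precision as a check. The sketch below applies equation 1.15 using the standard library; the critical value is copied from Table 1.7 and the variable names are illustrative.

```python
import math
import statistics

T_TABLE_95_FOUR_DF = 2.776   # Table 1.7, 95% confidence, four degrees of freedom

known_value = 5.05           # mM, the standard glucose solution
results = [5.12, 4.96, 5.21, 5.18, 5.26]

n = len(results)
mean = statistics.mean(results)
s = statistics.stdev(results)            # n - 1 denominator

# Equation 1.15, ignoring the sign of the difference
t_calc = abs(known_value - mean) * math.sqrt(n) / s
agrees = t_calc < T_TABLE_95_FOUR_DF

print(f"mean = {mean:.3f} mM, s = {s:.3f} mM")
print(f"tcalc = {t_calc:.2f}, agrees with known value: {agrees}")
```

Without rounding, the mean is 5.146 mM and s is about 0.116 mM, so tcalc comes out nearer 1.86 than the 2.236 obtained in the worked example from the rounded values. The conclusion is unchanged, since both are below 2.776.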

Comparing two competitive analytical methods – unpaired t-test

In quantitative biochemical analysis it is frequently helpful to compare the performance of two alternative methods of analysis in order to establish whether or not they give the same quantitative result within experimental error. To address this need, each method is used to analyse the same test sample using replicated analysis. The mean and standard deviation for each set of analytical data is then calculated and a Student’s t-test applied. In this case the t-test measures the overlap between the data sets such that the smaller the value of tcalc the greater the overlap between the two data sets. This is an example of an unpaired t-test. In using the tables of critical t values, the relevant degrees of freedom is the sum of the number of values in the two data sets (i.e. n₁ + n₂) minus 2. The larger the number of degrees of freedom the smaller the value of tcalc needs to be to exceed the critical value at a given confidence level. The formulae for calculating tcalc depend on whether or not the standard deviations of the two data sets are the same. This is often obvious by inspection, the two standard deviations being similar. However, if in doubt, an F-test, named after Fisher who introduced it, can be applied. An F-test is based on the null hypothesis that there is no difference between the two variances. The test calculates a value for F (Fcalc), which is the ratio of the larger of the two variances to the smaller variance. It is then compared with critical F values (Ftable) available in statistical tables


Table 1.9 Critical values of F at the 95% confidence level

Degrees of        Degrees of freedom for S₁
freedom for S₂    2       3       4       6       10      15      30      ∞
2                 19.0    19.2    19.2    19.3    19.4    19.4    19.5    19.5
3                 9.55    9.28    9.12    8.94    8.79    8.70    8.62    8.53
4                 6.94    6.59    6.39    6.16    5.96    5.86    5.75    5.63
5                 5.79    5.41    5.19    4.95    4.74    4.62    4.50    4.36
6                 5.14    4.76    4.53    4.28    4.06    3.94    3.81    3.67
7                 4.74    4.35    4.12    3.87    3.64    3.51    3.38    3.23
8                 4.46    4.07    3.84    3.58    3.35    3.22    3.08    2.93
9                 4.26    3.86    3.63    3.37    3.14    3.01    2.86    2.71
10                4.10    3.71    3.48    3.22    2.98    2.84    2.70    2.54
15                3.68    3.29    3.06    2.79    2.54    2.40    2.25    2.07
20                3.49    3.10    2.87    2.60    2.35    2.20    2.04    1.84
30                3.32    2.92    2.69    2.42    2.16    2.01    1.84    1.62
∞                 3.00    2.60    2.37    2.10    1.83    1.67    1.46    1.00

or computer packages (Table 1.9). If the calculated value of F is less than the table value, the null hypothesis is retained and the two standard deviations are considered to be similar. If the two variances are of the same order, then equations 1.16 and 1.17 are used to calculate tcalc for the two data sets. If not, equations 1.18 and 1.19 are used.

$$t_{\text{calc}} = \frac{\bar{x}_1 - \bar{x}_2}{s_{\text{pooled}}} \sqrt{\frac{n_1 n_2}{n_1 + n_2}} \tag{1.16}$$

$$s_{\text{pooled}} = \sqrt{\frac{s_1^2 (n_1 - 1) + s_2^2 (n_2 - 1)}{n_1 + n_2 - 2}} \tag{1.17}$$

$$t_{\text{calc}} = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_1^2 / n_1 + s_2^2 / n_2}} \tag{1.18}$$

$$\text{degrees of freedom} = \left\{ \frac{\left(s_1^2 / n_1 + s_2^2 / n_2\right)^2}{\left(s_1^2 / n_1\right)^2 / (n_1 + 1) + \left(s_2^2 / n_2\right)^2 / (n_2 + 1)} \right\} - 2 \tag{1.19}$$

where x̄₁ and x̄₂ are the calculated means of the two methods, s₁² and s₂² are the calculated variances of the two methods (s₁ and s₂ being the standard deviations) and n₁ and n₂ are the number of measurements in the two methods. At first sight these four equations may appear daunting, but closer inspection reveals that they are simply based on variance (s²), mean (x̄) and number of analytical measurements (n) and that the mathematical manipulation of the data is relatively easy.
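To show how little manipulation equations 1.18 and 1.19 involve, the following sketch applies them to a pair of summary statistics. The numbers are hypothetical and invented purely for illustration (they are not taken from any experiment in this chapter), and the function names are likewise illustrative.

```python
import math

def f_calc(var_a, var_b):
    """Ratio of the larger variance to the smaller one."""
    return max(var_a, var_b) / min(var_a, var_b)

def t_unequal_variances(mean1, s1, n1, mean2, s2, n2):
    """tcalc from equation 1.18 (sign of the difference ignored)."""
    return abs(mean1 - mean2) / math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)

def dof_unequal_variances(s1, n1, s2, n2):
    """Degrees of freedom from equation 1.19."""
    a = s1 ** 2 / n1
    b = s2 ** 2 / n2
    return (a + b) ** 2 / (a ** 2 / (n1 + 1) + b ** 2 / (n2 + 1)) - 2

# Hypothetical summary statistics (mM), invented for this sketch only
mean1, s1, n1 = 10.0, 0.2, 5
mean2, s2, n2 = 10.6, 0.8, 5

f_value = f_calc(s1 ** 2, s2 ** 2)
t_value = t_unequal_variances(mean1, s1, n1, mean2, s2, n2)
dof = dof_unequal_variances(s1, n1, s2, n2)

print(f"Fcalc = {f_value:.1f}")   # compare with Table 1.9 (6.39 for 4 and 4 df)
print(f"tcalc = {t_value:.2f}, degrees of freedom = {dof:.2f}")
```

Here Fcalc exceeds the tabulated 6.39, so the variances differ and the unequal-variance equations apply. The degrees of freedom from equation 1.19 are generally not a whole number and are usually rounded to the nearest integer before consulting Table 1.7.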


Example 8 COMPARISON OF TWO ANALYTICAL METHODS USING REPLICATED ANALYSIS OF A SINGLE TEST SAMPLE

Question
A sample of fasting serum was used to evaluate the performance of the glucose oxidase and hexokinase methods for the quantification of serum glucose concentrations (for details see Section 15.3.5). The following replicated values were obtained: for the glucose oxidase method 2.3, 2.5, 2.2, 2.6 and 2.5 mM and for the hexokinase method 2.1, 2.7, 2.4, 2.4 and 2.2 mM. Establish whether or not the two methods gave the same results at the 95% confidence level.

Answer
Using the standard formulae we can calculate the mean, standard deviation and variance for each data set.

Glucose oxidase method: x̄ = 2.42 mM; s = 0.16 mM; s² = 0.026 (mM)²
Hexokinase method: x̄ = 2.36 mM; s = 0.23 mM; s² = 0.053 (mM)²

We can then apply the F-test to the two variances to establish whether or not they are the same:

$$F_{\text{calc}} = \frac{0.053}{0.026} = 2.04$$

Ftable for the two sets of data each with four degrees of freedom and for the 95% confidence level is 6.39 (Table 1.9). Since Fcalc is less than Ftable we can conclude that the two variances are not significantly different. Therefore using equations 1.16 and 1.17 we can calculate that:

$$s_{\text{pooled}} = \sqrt{\frac{(0.16)^2 (4) + (0.23)^2 (4)}{8}} = \sqrt{\frac{0.102 + 0.212}{8}} = \sqrt{0.039} = 0.198$$

$$t_{\text{calc}} = \frac{2.42 - 2.36}{0.198} \sqrt{\frac{(5)(5)}{10}} = (0.303)(1.58) = 0.48$$

Using Table 1.7 at the 95% confidence level and for eight degrees of freedom ttable is 2.306. Thus tcalc is far less than ttable and so the two sets of data are not significantly different, i.e. the two methods have given the same result at the 95% confidence level.
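The steps of Example 8 (F-test, pooled standard deviation, unpaired t) can be chained together as in the sketch below. The critical values are copied from the tables in this chapter and the variable names are illustrative.

```python
import math
import statistics

F_TABLE = 6.39    # Table 1.9: four and four degrees of freedom, 95%
T_TABLE = 2.306   # Table 1.7: eight degrees of freedom, 95%

glucose_oxidase = [2.3, 2.5, 2.2, 2.6, 2.5]
hexokinase = [2.1, 2.7, 2.4, 2.4, 2.2]

n1, n2 = len(glucose_oxidase), len(hexokinase)
mean1, mean2 = statistics.mean(glucose_oxidase), statistics.mean(hexokinase)
var1 = statistics.variance(glucose_oxidase)   # n - 1 denominator
var2 = statistics.variance(hexokinase)

# F-test: larger variance divided by smaller variance
f_value = max(var1, var2) / min(var1, var2)
variances_similar = f_value < F_TABLE

# Equations 1.17 and 1.16
s_pooled = math.sqrt((var1 * (n1 - 1) + var2 * (n2 - 1)) / (n1 + n2 - 2))
t_value = abs(mean1 - mean2) / s_pooled * math.sqrt(n1 * n2 / (n1 + n2))
same_result = t_value < T_TABLE

print(f"Fcalc = {f_value:.2f}, variances similar: {variances_similar}")
print(f"s_pooled = {s_pooled:.3f}, tcalc = {t_value:.2f}, "
      f"methods agree: {same_result}")
```

Because no intermediate values are rounded, Fcalc is about 1.96 rather than 2.04 and tcalc about 0.47 rather than 0.48. Neither difference affects the conclusion.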

Comparison of two competitive analytical methods – paired t-test

A variant of the previous type of comparison of two analytical methods based upon the analysis of a common standard sample is the case in which a series of test samples is analysed once by the two different analytical methods. In this case there is no replication of analysis of any test sample by either method. The t-test is applied to the


differences between the results of each method for each test sample. This is an example of a paired t-test. The formulae for calculating tcalc in this case are given by equations 1.20 and 1.21:

$$t_{\text{calc}} = \frac{\bar{d} \sqrt{n}}{s_d} \tag{1.20}$$

$$s_d = \sqrt{\frac{\sum (d_i - \bar{d})^2}{n - 1}} \tag{1.21}$$

where dᵢ is the difference between the paired results, d̄ is the mean difference between the paired results, n is the number of paired results and s_d is the standard deviation of the differences between the pairs.

1.4.6 Calibration methods

Quantitative biochemical analyses often involve the use of a calibration curve produced by the use of known amounts of the analyte using the selected analytical procedure. A calibration curve is a record of the measurement (absorbance, peak area, etc.) produced by the analytical procedure in response to a range of known quantities of the standard analyte. It involves the preparation of a standard solution of the analyte and the use of a range of aliquots in the test analytical procedure. It is good practice to replicate each calibration point and to use the mean ± one standard deviation for the construction of the calibration plot. Inspection of the compiled data usually reveals a scatter of the points about a linear relationship but such that there are several options for the ‘best’ fit. The technique of fitting the best fit ‘by eye’ is not recommended, as it is highly subjective and irreproducible. The method of least mean squares linear regression (LMSLR) is the most common mathematical way of fitting a straight line to data but in applying the method, it is important to realise that the accuracy of the values for slope and intercept that it gives is determined by experimental error built into the x and y values. The mathematical basis of LMSLR is complex and will not be considered here, but the principles upon which it is based are simple. If the relationship between the two variables, such as the concentration or amount of analyte and response, is linear, then the ‘best’ straight line will have the general form y = mx + c where x and y are the two variables, m is the slope of the line and c is the intercept on the y-axis. It is assumed, correctly in most cases, that the errors in the measurement of y are much greater than those for x (it does not assume that there are no errors in the x values) and secondly that uncertainties (standard deviations) in the y values are all of the same magnitude. The method uses two criteria. The first is that the line will pass through the point (x̄, ȳ) where x̄ and ȳ are the means of the x and y values respectively. The second is that the slope (m) is based on the calculation of the optimum values of m and c that give minimum variation between individual experimental y values and their corresponding values as predicted by the ‘best’ straight line. Since these variations can be positive or negative (i.e. the experimental values can be greater or smaller than those predicted by the ‘best’ straight line), in the process of arriving at the best slope the method measures the deviations between the experimental and candidate straight line values, squares


Example 9 COMPARISON OF TWO ANALYTICAL METHODS USING DIFFERENT TEST SAMPLES

Question
Ten fasting serum samples were each analysed by the glucose oxidase and the hexokinase methods. The following results, in mM, were obtained:

Glucose oxidase    Hexokinase    Difference    Difference minus       (Difference minus
(mM)               (mM)          dᵢ            mean of difference     mean of difference)²
1.1                0.9           0.2           0.08                   0.0064
2.0                2.1           −0.1          −0.22                  0.0484
3.2                2.9           0.3           0.18                   0.0324
3.7                3.5           0.2           0.08                   0.0064
5.1                4.8           0.3           0.18                   0.0324
8.6                8.7           −0.1          −0.22                  0.0484
10.4               10.6          −0.2          −0.32                  0.1024
15.2               14.9          0.3           0.18                   0.0324
18.7               18.7          0.0           −0.12                  0.0144
25.3               25.0          0.3           0.18                   0.0324
                                 Mean (d̄) 0.12                        Σ 0.3560

Do the two methods give the same results at the 95% confidence level?

Answer
Before addressing the main question, note that the ten samples analysed by the two methods were chosen to cover the whole analytical range for the methods. To assess whether or not the two methods have given the same result at the chosen confidence level, it is necessary to calculate a value for tcalc and to compare it with ttable for the nine degrees of freedom in the study. To calculate tcalc, it is first necessary to calculate the value of s_d in equation 1.21. The appropriate calculations are shown in the table above.

$$s_d = \sqrt{\frac{\sum (d_i - \bar{d})^2}{n - 1}} = \sqrt{\frac{0.356}{9}} = 0.199$$

From equation 1.20

$$t_{\text{calc}} = \frac{\bar{d} \sqrt{n}}{s_d} = \frac{0.12 \sqrt{10}}{0.199} = 1.907$$

Using Table 1.7, ttable at the 95% confidence level and for nine degrees of freedom is 2.262. Since tcalc is smaller than ttable the two methods do give the same results at the 95% confidence level. Inspection of the two data sets shows that the glucose oxidase method gave a slightly high value for six of the ten samples analysed.
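The paired t-test of Example 9 reduces to a one-sample calculation on the differences, as the following sketch shows. The critical value is copied from Table 1.7 and the variable names are illustrative.

```python
import math
import statistics

T_TABLE = 2.262   # Table 1.7: nine degrees of freedom, 95% confidence

glucose_oxidase = [1.1, 2.0, 3.2, 3.7, 5.1, 8.6, 10.4, 15.2, 18.7, 25.3]
hexokinase = [0.9, 2.1, 2.9, 3.5, 4.8, 8.7, 10.6, 14.9, 18.7, 25.0]

# Difference for each serum sample, glucose oxidase minus hexokinase
differences = [g - h for g, h in zip(glucose_oxidase, hexokinase)]

n = len(differences)
d_mean = statistics.mean(differences)
s_d = statistics.stdev(differences)          # equation 1.21
t_value = abs(d_mean) * math.sqrt(n) / s_d   # equation 1.20
same_result = t_value < T_TABLE

print(f"mean difference = {d_mean:.2f} mM, s_d = {s_d:.3f} mM")
print(f"tcalc = {t_value:.2f}, methods agree: {same_result}")
```

Pairing matters here because the ten samples span a wide concentration range. An unpaired comparison would be dominated by the differences between samples rather than the differences between methods.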


Example 9 (cont.)
An alternative approach to the comparison of the two methods is to plot the two data sets as an x/y plot and to carry out a regression analysis of the data. If this is done using the glucose oxidase data as the y variable, the following results are obtained: Slope: 1.0016, intercept: 0.1057, correlation coefficient r: 0.9997. The slope of very nearly one confirms the similarity of the two data sets, whilst the small positive intercept on the y-axis confirms that the glucose oxidase method gives a slightly higher, but insignificantly different, value to that of the hexokinase method.

them (so they are all positive), sums them and then selects the values of m and c that give the minimum deviations. The end result of the regression analysis is the equation for the best-fit straight line for the experimental data set. This is then used to construct the calibration curve and subsequently to analyse the test analyte(s). Most modern calculators will carry out this type of analysis and will simultaneously report the 95% confidence limits for the m and c values and/or the standard deviation associated with the two values together with the ‘goodness-of-fit’ of the data as expressed by a correlation coefficient, r, or a coefficient of determination, r². The stronger the correlation between the two variables, the closer the value of r approaches +1 or −1. Values of r are quoted to four decimal places and for good correlations commonly exceed 0.99. Values of 0.98 and less should be considered with care since even slight curvature can give r-values of this order. In the routine construction of a calibration curve, a number of points have to be borne in mind:





• Selection of standard values: A range of standard analyte amounts/concentrations should be selected to cover the expected values for the test analyte(s) in such a way that the values are equally distributed along the calibration curve. Test samples should not be estimated outside this selected range, as there is no evidence that the regression analysis relationship holds outside the range. It is good practice to establish the analytical range and the limit of detection for the method. It is also advisable to determine the precision (standard deviation) of the method at different points across the analytical range and to present the values on the calibration curve. Such a plot is referred to as a precision profile. It is common for the precision to decrease (standard deviation to increase) at the two ends of the curve and this may have implications for the routine use of the curve. For example, the determination of testosterone in male and female serum requires the use of different methods since the two values (reference range 10–30 nM for males, <3 nM for females) cannot be accommodated with acceptable precision on one calibration curve.
• Use of a ‘blank’ sample: This is one in which no standard analyte is present. One should be included in the experimental design when possible (it will not be possible, for example, with analyses based on serum or plasma). Any experimental value, e.g. absorbance, obtained for it must be deducted from all other measurements.
This may be achieved automatically in spectrophotometric measurements by the use of a double-beam spectrophotometer in which the blank sample is placed in the reference cell.
• Shape of curve: It should not be assumed that all calibration curves are linear. They may be curved and best represented by a quadratic equation of the type $y = ax^2 + bx + c$, where a, b and c are constants, or they may be logarithmic.
• Recalibration: A new calibration curve should be constructed on a regular basis. It is not acceptable to rely on a calibration curve produced on a much earlier occasion.
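The points above on least-squares fitting, the correlation coefficient and the analytical range can be illustrated with a short script. This is a minimal sketch only: the standard concentrations and blank-corrected absorbances are hypothetical values invented for illustration, and the function names are not taken from the text.

```python
import math

def linear_fit(x, y):
    """Least-squares fit of y = m*x + c; returns (m, c, r)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    m = sxy / sxx                      # slope
    c = mean_y - m * mean_x            # intercept
    r = sxy / math.sqrt(sxx * syy)     # correlation coefficient
    return m, c, r

def estimate(response, m, c, x_min, x_max):
    """Read a test sample off the fitted line, refusing to extrapolate."""
    x = (response - c) / m
    if not (x_min <= x <= x_max):
        raise ValueError("result lies outside the calibrated range")
    return x

# Hypothetical standards, equally spaced across the analytical range
conc = [0.0, 2.0, 4.0, 6.0, 8.0, 10.0]
# Hypothetical absorbances, already corrected for the blank
absorbance = [0.002, 0.101, 0.198, 0.305, 0.399, 0.502]

m, c, r = linear_fit(conc, absorbance)
print(f"slope = {m:.4f}, intercept = {c:.4f}, r = {r:.4f}")

# A test sample whose response falls within the calibrated range
unknown = estimate(0.250, m, c, min(conc), max(conc))
print(f"estimated concentration = {unknown:.2f}")
```

The `estimate` function raises an error rather than extrapolating, reflecting the point that the regression relationship has not been shown to hold outside the range of the standards.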

1.4.7 Internal standards

An additional approach to the control of time-related minor changes in a calibration curve and the quantification of an analyte in a test sample is the use of an internal standard. An ideal internal standard is a compound that has a molecular structure and physical properties as similar as possible to the test analyte and which gives a similar response to the analytical method as the test analyte. This response, expressed on a unit quantity basis, may be different from that for the test analyte but, provided that the relative response of the two compounds is constant, the advantages of the use of the internal standard are not compromised. Quite commonly the internal standard is a structural or geometrical isomer of the test analyte. A known fixed quantity of the standard is added to each test sample and analysed alongside the test analyte by the standard analytical procedure. The responses obtained for the standard and for the range of amounts or concentrations of the test analyte are used to calculate a relative response for the test analyte, which is then used in the construction of the calibration curve. The curve therefore consists of a plot of the relative response to the test analyte against the range of quantities of the analyte.

Internal standards are commonly used in liquid and gas chromatography since they help to compensate for small temporal variations in the flow of liquid or gas through the chromatographic column. In such applications it is, of course, essential that the internal standard chromatographs near to, but distinct from, the test analyte. If the analytical procedure involves preliminary sampling procedures, such as solid-phase extraction, it is important that a known amount of the internal standard is introduced into the test sample at as early a stage as possible and is therefore taken through the preliminary procedures. This ensures that any loss of the test analyte during these preliminary stages will be compensated by identical losses of the internal standard so that the final relative response of the method to the two compounds is a true reflection of the quantity of the test analyte.
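The compensating effect of an internal standard can be shown numerically. The sketch below is illustrative only: the peak areas and amounts are hypothetical, and fitting the relative response with a straight line through the origin is an assumption made to keep the example short, not a recommendation from the text.

```python
def relative_response(analyte_signal, internal_std_signal):
    """Response of the analyte relative to the fixed amount of internal standard."""
    return analyte_signal / internal_std_signal

# Hypothetical calibration runs: (analyte amount, analyte peak area, internal standard peak area).
# The same fixed quantity of internal standard was added to every run.
standards = [
    (1.0, 520.0, 2600.0),
    (2.0, 1010.0, 2550.0),
    (4.0, 2120.0, 2650.0),
]

amounts = [amount for amount, _, _ in standards]
ratios = [relative_response(a, s) for _, a, s in standards]

# Assumed model: relative response = slope * amount (least-squares line through the origin)
slope = sum(x * y for x, y in zip(amounts, ratios)) / sum(x * x for x in amounts)

def quantify(analyte_signal, internal_std_signal):
    """Estimate the analyte amount from the relative response."""
    return relative_response(analyte_signal, internal_std_signal) / slope

# The same hypothetical test sample, measured with full recovery and with a 20% loss
# that affects the analyte and the internal standard equally.
full_recovery = quantify(1500.0, 2500.0)
with_loss = quantify(1200.0, 2000.0)

print(f"full recovery: {full_recovery:.2f}")
print(f"20% loss:      {with_loss:.2f}")
```

Because the loss scales both signals by the same factor, the relative response, and therefore the estimated amount, is unchanged. This holds only if the analyte and the internal standard really are lost to the same extent.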

1.5 SAFETY IN THE LABORATORY

Virtually all experiments conducted in a biochemistry laboratory present a potential risk to the well-being of the investigator. In planning any experiment it is essential
that careful thought be given to all aspects of safety before the experimental design is finalised. Health hazards come from a variety of sources:









• Chemical hazards: All chemicals are, to varying extents, capable of causing damage to the body. They may be irritants and cause a short-term effect on exposure. Alternatively they may be corrosive and cause severe and often irreversible damage to the skin; examples include strong acids and alkalis. Thirdly they may be toxic once they have gained access to the body by ingestion, inhalation or absorption across the skin. Once in the body their effect may range from slight to the extremes of being a poison (e.g. cyanide), a carcinogen (e.g. benzene and vinyl chloride) or a teratogen (e.g. thalidomide). Finally there is the special case of radioactive compounds, the use of which is discussed in detail in Chapter 14.
• Biological hazards: Examples include human body fluids that may carry infections such as HIV, laboratory animals that may cause allergic reactions or transmit certain diseases, pathogenic animal cell and tissue cultures, and all microorganisms including genetically engineered forms. In the UK, animal experiments must be conducted in accordance with Home Office regulations and guidelines. All experiments with tissue and cell cultures should be conducted in microbiological cabinets that are provided with a sterile airflow away from the operator (Section 2.2).
• Electrical and mechanical hazards: All electrical apparatus should be used and maintained in accordance with the manufacturers’ instructions. Electrophoresis equipment presents a particular potential for safety problems. Centrifuges, especially high-speed varieties, also need careful use, particularly in the correct use and balancing of the rotors.
• General laboratory hazards: Common examples include syringe needles, broken glassware and liquid nitrogen flasks.
Routine precautions that should be taken to minimise personal exposure to these hazards include the wearing of laboratory coats, which should be of the high-necked buttoned variety for work with microorganisms, safety spectacles and lightweight disposable gloves. It is also good practice not to work alone in a laboratory so that help is to hand if needed. In the UK, laboratory work is subject to legislation including the Health and Safety at Work Act 1974, the Control of Substances Hazardous to Health (COSHH) Regulations 2002 and the Management of Health and Safety at Work Regulations 1999. This legislation requires a risk assessment to be carried out prior to undertaking laboratory work. As the name implies, a risk assessment requires potential hazards to be identified and an assessment made of their potential severity and probability of occurrence. Action must be taken in cases where the potential severity and probability are medium to high. Such assessments require knowledge of the toxicity of all the chemicals used in the study. Toxicity data are widely available via computer packages and published handbooks and should be available for reference in all laboratories. Once the toxicity data are known, consideration may be given to the use of alternative and less toxic compounds or, if it is decided to proceed with the use of toxic compounds, precautions taken to minimise their risk and plans laid for dealing with an accident should one occur. These include arranging access to first-aiders and
other emergency services. It is normal for all laboratories to have a nominated Safety Officer whose responsibility it is to give advice on safety issues. To facilitate good practice, procedures for the disposal of organic solvents, radioactive residues, body fluids, tissue and cell cultures and microbiological cultures are posted in all laboratories.

1.6 SUGGESTIONS FOR FURTHER READING

Analytical methodology and quality assurance
Burns, M. (2004). Current practice in the assessment and control of measurement uncertainty in bio-analytical chemistry. Trends in Analytical Chemistry, 23, 393–397.
Carson, P. A. and Dent, N. (eds.) (2007). Good Clinical, Laboratory and Manufacturing Practices: Techniques for the QA Professional. London: RSC. (A comprehensive but easy-to-read book aimed at both newcomers and professionals involved in laboratory quality assurance issues.)
Festing, M. F. W. (2003). Principles: the need for better experimental design. Trends in Pharmacological Sciences, 24, 341–345.

Safety
Control of Substances Hazardous to Health Regulations 2002: Approved Code of Practice and Guidance. Kingston-upon-Thames: HSE Books. (A step-by-step approach to understanding the practical implications of COSHH.)

2 Cell culture techniques
A. R. BAYDOUN

2.1 Introduction
2.2 The cell culture laboratory and equipment
2.3 Safety considerations in cell culture
2.4 Aseptic techniques and good cell culture practice
2.5 Types of animal cell, characteristics and maintenance in culture
2.6 Stem cell culture
2.7 Bacterial cell culture
2.8 Potential use of cell cultures
2.9 Suggestions for further reading

2.1 INTRODUCTION

Cell culture is a technique that involves the isolation and maintenance in vitro of cells isolated from tissues or whole organs derived from animals, microbes or plants. In general, animal cells have more complex nutritional requirements and usually need more stringent conditions for growth and maintenance. By comparison, microbes and plants require less rigorous conditions and grow effectively with the minimum of needs. Regardless of the source of material used, practical cell culture is governed by the same general principles, requiring a sterile pure culture of cells, the need to adopt appropriate aseptic techniques and the utilisation of suitable conditions for optimal viable growth of cells.

Once established, cells in culture can be exploited in many different ways. For instance, they are ideal for studying intracellular processes including protein synthesis, signal transduction mechanisms and drug metabolism. They have also been widely used to understand the mechanisms of drug actions, cell–cell interaction and genetics. Additionally, cell culture technology has been adopted in medicine, where genetic abnormalities can be determined by chromosomal analysis of cells derived, for example, from expectant mothers. Similarly, viral infections can be assayed both qualitatively and quantitatively on isolated cells in culture. In industry, cultured cells are used routinely to test both the pharmacological and toxicological effects of pharmaceutical compounds. This technology thus provides a valuable tool to scientists, offering a user-friendly system that is relatively cheap to run and the
exploitation of which avoids the legal, moral and ethical questions generally associated with animal experimentation. More importantly, cell culture also presents a tremendous potential for future exploitation in disease treatment, where, for instance, defective or malfunctioning genes could be corrected in the host’s own cells and transplanted back into the host to treat a disease. Furthermore, successful development of culture techniques for stem cells will provide a much needed cell-based strategy for treating diseases where organ transplant is currently the only available option.

In this chapter, the fundamental information required for standard cell culture is discussed, together with a series of principles and outline protocols used routinely in growing animal and bacterial cells. Additionally, a section has been dedicated to human embryonic stem cell culture, an emerging field where protocols to be used routinely are still being developed. The discussion of stem cell culture is thus limited to techniques that are now becoming routine. The chapter should therefore provide the basic knowledge for those new to the field of cell culture and act as a revision aid for those with limited experience in the field. Throughout the chapter, particular attention is paid to the importance of the work environment, outlining safety considerations together with adequate description and hints on the essential techniques required for tissue culture work.

2.2 THE CELL CULTURE LABORATORY AND EQUIPMENT

2.2.1 The cell culture laboratory

The design and maintenance of the cell culture laboratory is perhaps the most important aspect of cell culture, since a sterile environment is critical for handling of cells and culture media, which should be free from contaminating microorganisms. Such organisms, if left unchecked, would outgrow the cells being cultured, eventually resulting in culture-cell demise owing to the release of toxins and/or depletion of nutrients from the culture medium. Where possible, a cell culture laboratory should be designed in such a way that it facilitates preparation of media and allows for the isolation, examination, evaluation and maintenance of cultures under controlled sterile conditions. In an ideal situation, there should be a room dedicated to each of the above tasks. However, many cell culture facilities, especially in academia, form part of an open-plan laboratory and as such are limited in space. It is not unusual therefore to find an open-plan area where places are designated for each of the above functions. This is not a serious problem as long as a few basic guidelines are adopted. For instance, good aseptic techniques (discussed below) should be used at all times. There should also be adequate facilities for media preparation and sterilisation, and all cell culture materials should be maintained under sterile conditions until used. In addition, all surfaces within the culture area should be non-porous to prevent adsorption of media and other materials that may provide a good breeding ground for microorganisms, resulting in the infection of the cultures. Surfaces should also be easy to clean and all waste
generated should be disposed of immediately. The disposal procedure may require prior autoclaving of the waste, which can be carried out using pressurised steam at 121 °C under 105 kPa for a defined period of time. These conditions are required to destroy microorganisms. For smooth running of the facilities, daily checks should be made of the temperature in incubators, and of the gas supply to the incubators by checking the CO₂ cylinder pressure. Water baths should be kept clean at all times and areas under the work surfaces of the flow cabinets cleaned of any spills.

2.2.2 Equipment for cell culture

Several pieces of equipment are essential. These include a tissue culture hood, incubator(s), autoclave and microscope. A brief description is given below of these and other essential items of equipment.

Cell culture hoods
The cell culture hood is the central piece of equipment where all the cell handling is carried out and is designed not only to protect the cultures from the operator but in some cases to protect the operator from the cultures. These hoods are generally referred to as laminar flow hoods as they generate a smooth uninterrupted streamlined flow (laminar flow) of sterile air which has been filtered through a high-efficiency particulate air (HEPA) filter. There are two types of laminar flow hood, classified as either vertical or horizontal. The horizontal hoods allow air to flow directly at the operator and as a result are generally used for media preparation or when one is working with non-infectious materials, including those derived from plants. The vertical hoods (also known as biological safety cabinets) are best for working with hazardous organisms, since air within the hood is filtered before it passes into the surrounding environment. Currently, there are at least three different classes of hood in use, which offer various levels of protection to the cultures, the operator or both; these are described below.

Class I hoods
These hoods, as with the class II type, have a screen at the front that provides a barrier between the operator and the cells yet allows access into the hood through an opening at the bottom of the screen (Fig. 2.1). This barrier prevents too much turbulence to air flow from the outside and, more importantly, provides good protection for the operator. Cultures are also protected, but to a lesser extent than in class II hoods, as the air drawn in from the outside is sucked through the inner cabinet to the top of the hood.
These hoods are suitable for use with low-risk organisms and when operator protection only is required.

Class II hoods
Class II hoods are the most common units found in tissue culture laboratories. These hoods offer good protection to both the operator and the cell culture. Unlike class I hoods, air drawn from the outside is passed through the grille in the front of the work area and filtered through the HEPA filter at the top of the hood before streaming down over the tissue culture (Fig. 2.1). This mechanism protects the operator and ensures that the air over the cultures is largely sterile. These hoods are adequate for animal cell culture, which involves low to moderate toxic or infectious agents, but are not suitable for use with high-risk pathogens, which may require a higher level of containment.

Class III hoods
Class III safety cabinets are used when the highest levels of operator and product protection are required. These hoods are completely sealed, providing two glove pockets through which the operator can work with material inside the cabinet (Fig. 2.1). Thus the operator is completely shielded, making class III hoods suitable for work with highly pathogenic organisms including tissue samples carrying known human pathogens.

Fig. 2.1 Schematic representation of tissue culture cabinets (class I, class II and class III), showing the flow of room air, contaminated air and clean air.

Practical hints and safety aspects of using cell culture hoods
All hoods must be maintained in a clutter-free and clean state at all times as too much clutter may affect air flow and contamination will introduce infections. Thus, as a rule of thumb, put only items that are required inside the cabinet and clean all work surfaces before and after use with industrial methylated spirit (IMS). The latter is used at an effective concentration of 70% v/v (prepared by mixing 70 volumes of IMS with 30 volumes of Milli-Q
water), which acts against bacteria and fungal spores by dehydrating and fixing cells, thus preventing contamination of cultures. Some cabinets may be equipped with a short-wave ultraviolet light that can be used to irradiate the interior of the hood to kill microorganisms. When present, switch on the ultraviolet light for at least 15 min to sterilise the inside of the cabinet, including the work area. Note, however, that ultraviolet radiation can damage the skin and eyes, and precautions should be taken at all times to ensure that the operator is not directly exposed to the ultraviolet light when using this option to sterilise the hood. Once finished, ensure that the front panel door (class I and II hoods) is replaced securely. In addition, always turn the hood on for at least 10 min before starting work to allow the flow of air to stabilise. During this period, monitor the air flow and check all dials in the control panel at the front of the hood to ensure that they are within the safe margin.

CO₂ incubators
Water-jacketed incubators are required to facilitate optimal cell growth under strictly maintained and regulated conditions, normally requiring a constant temperature of 37 °C and an atmosphere of 5–10% CO₂ plus air. The purpose of the CO₂ is to ensure that the culture medium is maintained at the required physiological pH (usually pH 7.2–7.4). This is achieved by the supply of CO₂ from a gas cylinder into the incubator through a valve that is triggered to draw in CO₂ whenever the level falls below the set value of 5% or 10%. The CO₂ that enters the inner chamber of the incubator dissolves into the culture medium containing bicarbonate. The latter reacts with H⁺ (generated from cellular metabolism), forming carbonic acid, which is in equilibrium with water and CO₂, thereby maintaining the pH in the medium at approximately pH 7.2.

$$\mathrm{HCO_3^- + H^+ \rightleftharpoons H_2CO_3 \rightleftharpoons CO_2 + H_2O}$$
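The dependence of medium pH on the bicarbonate concentration and the CO₂ level can be estimated from the equilibrium above using the Henderson–Hasselbalch relationship. The following is a rough sketch only: the apparent pKa (6.1), the CO₂ solubility coefficient (0.03 mmol per litre per mmHg) and the water vapour pressure at 37 °C (47 mmHg) are assumed textbook values for physiological fluids, not figures taken from this chapter, and the 20 mM bicarbonate concentration is hypothetical.

```python
import math

# Assumed constants (not from the chapter); real media may behave differently
PKA_CARBONIC = 6.1          # apparent pKa of the CO2/bicarbonate system at 37 °C
CO2_SOLUBILITY = 0.03       # dissolved CO2 in mmol per litre per mmHg
BAROMETRIC_MMHG = 760.0     # atmospheric pressure
WATER_VAPOUR_MMHG = 47.0    # saturated water vapour pressure at 37 °C

def medium_ph(bicarbonate_mM, co2_fraction):
    """Estimate the pH of a bicarbonate-buffered medium in a humidified incubator."""
    p_co2 = co2_fraction * (BAROMETRIC_MMHG - WATER_VAPOUR_MMHG)
    dissolved_co2_mM = CO2_SOLUBILITY * p_co2
    return PKA_CARBONIC + math.log10(bicarbonate_mM / dissolved_co2_mM)

# Hypothetical medium containing 20 mM bicarbonate
ph_5 = medium_ph(20.0, 0.05)    # incubator set to 5% CO2
ph_10 = medium_ph(20.0, 0.10)   # incubator set to 10% CO2

print(f"pH at 5% CO2:  {ph_5:.2f}")
print(f"pH at 10% CO2: {ph_10:.2f}")
```

Under these assumptions, doubling the CO₂ level at a fixed bicarbonate concentration lowers the estimated pH by log10(2), about 0.3 units, which illustrates why the CO₂ setting has to be matched to the bicarbonate content of the medium in use.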

These incubators are generally humidified by the inclusion of a tray of sterile water on the bottom deck. The evaporation of water creates a highly humidified atmosphere, which helps to prevent evaporation of medium from the cultures. An alternative to humidified incubators is the dry non-gassed unit, which is not humidified and relies on the use of alternative buffering systems such as 4-(2-hydroxyethyl)-1-piperazine-ethanesulphonic acid (Hepes) or morpholinopropane sulphonic acid (Mops) for maintaining a balanced pH within the culture medium. The advantage of this system is that it eliminates the risk of infections that can be posed by the tray of water in the humidified unit. The disadvantage, however, is that the culture medium will evaporate rapidly, thereby stressing the cells. One way round this problem is to place the cell culture plate in a sandwich box containing little pots of sterile water. With the sandwich box lid partially closed, evaporation of water from the pots will create a humidified atmosphere within the sandwich box, thus reducing the risk of evaporation of medium from the culture plate.

Practical hints and safety aspects of using cell culture incubators
The incubator should be maintained at 37 °C and supplied with 5% CO₂ at all times. The temperature can be monitored by keeping a thermometer in the incubator,
preferably on the inside of the inner glass door. This can then be checked on a regular basis and adjustments made as required. CO₂ levels inside the unit can be monitored and adjusted by using a gas analyser such as the Fyrite Reader. Regular checks should also be made on the levels of CO₂ in the gas cylinders that supply CO₂ to the incubators and these should be replaced when levels are very low. Most incubators are designed with an inbuilt alarm that sounds when the CO₂ level inside the chamber drops. At this point the gas cylinder must be replaced immediately to avoid stressing or killing the cultures. It is now possible to connect two gas cylinders to a cylinder changeover unit that switches automatically to the second source of gas supply when the first is empty. It is advisable therefore to use this device where possible. When one is using a humidified incubator, it is essential that the water tray is maintained and kept free from microorganisms. This can be achieved by adding various agents to the water such as the antimicrobial agent Roccal at a concentration of 1% (w/v). Other products such as Thimerosal or SigmaClean from Sigma-Aldrich can also be used. Proper care and maintenance of the incubator should, however, include regular cleaning of the interior of the unit using any of the above reagents, followed by swabbing with 70% IMS. More recently, copper-coated incubators have been introduced which, due to the antimicrobial properties of copper, are reported to reduce microbial contamination.

Microscopes
Inverted phase contrast microscopes (see Chapter 4) are routinely used for visualising cells in culture. These are expensive but easy to operate, with a light source located above and the objective lenses below the stage on which the cells are placed. Visualisation of cells by microscopy can provide useful information about the morphology and state of the cells. Early signs of cell stress may be easily identified and appropriate action taken to prevent loss of cultures.

Other general equipment
Several other pieces of equipment are required in cell culture. These include a centrifuge to spin down cells, a water bath for thawing frozen samples of cells and warming media to 37 °C before use, and a fridge and freezer for storage of media and other materials required for cell culture. Some cells need to attach onto a surface in order to grow and are therefore referred to as adherent. These cells are cultured in non-toxic polystyrene plastics that provide a biologically inert surface on which the cells attach and grow. Various types of plastics are available for this purpose and include Petri dishes, multi-well plates (with either 96, 24, 12 or 6 wells per plate) and screwcap flasks classified according to their surface areas: T-25, T-75, T-225 (cm² of surface area). A selection of these plastics is shown in Fig. 2.2.

Fig. 2.2 Tissue culture plastics used generally for cell culture. (A–C) T-flasks; (D–G) representative multi-well plates. (A) T-25 (25 cm²), (B) T-75 (75 cm²), (C) T-225 (225 cm²), (D) 96-well plate, (E) 24-well plate, (F) 12-well plate and (G) 6-well plate.

2.3 SAFETY CONSIDERATIONS IN CELL CULTURE

Because of the nature of the work, safety in the cell culture laboratory must be a major concern to the operator. This is particularly the case when one is working with pathogenic microbes or with fresh primate or human tissues or cells which may contain agents that use humans as hosts. One very good example of this would be working with fresh human lymphocytes, which may contain infectious agents such as the human immunodeficiency virus (HIV) and/or hepatitis B virus. Thus, when one is working with fresh human tissue, it is essential that the infection status of the donor is determined in advance of use and all necessary precautions taken to eliminate or limit the risks to which the operator is exposed. A recirculating class II cabinet would be a minimum requirement for this type of cell culture work and the operator should be provided with protective clothing including latex gloves and a face mask if required. Such work should also be carried out under the guidelines laid down by the UK Advisory Committee on Dangerous Pathogens (ACDP).

Apart from the risks posed by the biological material being used, the operator should also be aware of his or her work environment and be conversant with the equipment being used, as this may also pose a serious hazard. The culture cabinet should be serviced routinely and checked (approximately every 6 months) to ensure its safety to the operator. Additionally the operator can ensure his or her own safety by adopting some common precautionary measures such as refraining from eating or drinking whilst working in the cabinet and using a pipette aid as opposed to mouth pipetting to prevent ingestion of unwanted substances. Gloves and adequate protective clothing such as a clean laboratory coat should be worn at all times and gloves must be discarded after handling of non-sterile or contaminated material.

2.4 ASEPTIC TECHNIQUES AND GOOD CELL CULTURE PRACTICE

2.4.1 Good practice

In order to maintain a clean and safe culture environment, adequate aseptic or sterile technique should be adopted at all times. This simply involves working under
conditions that prevent contaminating microorganisms from the environment from entering the cultures. Part of the precaution taken involves washing hands with antiseptic soap and ensuring that all work surfaces are kept clean and sterile by swabbing with 70% IMS before starting work. Moreover, all procedures, including media preparation and cell handling, should be carried out in a cell culture cabinet that is maintained in a clean and sterile condition. Other essential precautions include avoiding talking, sneezing or coughing into the cabinet or over the cultures. A clean pipette should be used for each different procedure and under no circumstance should the same pipette be used between different bottles of media, as this will significantly increase the risk of cross-contamination. All spillages must be cleaned quickly to avoid contamination from microorganisms that may be present in the air. Failing to do so may result in infections of the cultures, which may be reduced by using antibiotics. However, the effectiveness of antibiotics is not guaranteed, and good aseptic techniques should eliminate the need for them. In the event of cultures becoming contaminated, these should be removed immediately from the laboratory, disinfected and autoclaved to prevent the contamination spreading. Under no circumstance should an infected culture be opened inside the cell culture cabinet or incubator. Moreover, all waste generated must be decontaminated and disposed of immediately after completing the work. This should be carried out in accordance with the national legislative requirements, which state that cell culture waste including media be inactivated using a disinfectant before disposal and that all contaminated materials and waste be autoclaved before being discarded or incinerated.

The risk from infections is the most common cause for concern in cell culture. Various factors can contribute to this, including poor work environment, poor aseptic techniques and indeed poor hygiene of the operator. The last of these is important, since most of the common sources of infections such as bacteria, yeasts and fungi originate from the worker. Maintaining a clean environment and adopting good laboratory practice and aseptic techniques should, therefore, help to reduce the risks of infection. However, should infections occur, it is advisable to address this immediately and eradicate the problem. To do this, it helps to know the types of infection to expect and what to look for. In animal cell cultures, bacterial and fungal infections are relatively easy to identify and isolate. The other most common contamination originates from mycoplasma. These are the smallest (approximately 0.3 µm in diameter) self-replicating prokaryotes in existence. They lack a rigid cell wall and generally infect the cytoplasm of mammalian cells. There are at least five species known to contaminate cells in culture: Mycoplasma hyorhinis, Mycoplasma arginini, Mycoplasma orale, Mycoplasma fermentans and Acholeplasma laidlawii. Infections caused by these organisms are more problematic and not easily identified or eliminated. Moreover, if left unchecked, mycoplasma contamination will cause subtle but adverse effects on cultures, including changes in metabolism, DNA, RNA and protein synthesis, morphology and growth. This can lead to non-reproducible, unreliable experimental results and unsafe biological products.


2.4.2 Identification and eradication of bacterial and fungal infections

Both bacterial and fungal contaminations are easily identified as the infective agents are readily visible to the naked eye even in the early stages. Contamination is usually made noticeable by the increase in turbidity and the change in colour of the culture medium owing to the change in pH caused by the infection. In addition, bacteria can be easily identified under microscopic examination as motile round bodies. Fungi, on the other hand, are distinguished by their long hyphal growth and by the fuzzy colonies they form in the medium. In most cases the simplest solution to these infections is to remove and dispose of the contaminated cultures. In the early stages of an infection, attempts can be made to eliminate the infecting microorganism using repeated washes and incubations with antibiotics or antifungal agents. This is, however, not advisable, as handling infected cultures in the sterile work environment increases the chances of the infection spreading.

As part of good laboratory practice, sterility testing of cultures should be carried out regularly to ensure that cultures are free from microbial organisms. This is particularly important when preparing cell culture products or generating cells for storage. In this way, the presence of these organisms can generally be detected much earlier and the necessary precautions taken to avoid a full-blown contamination crisis in the laboratory. The testing procedure usually involves culturing a suspension of cells or products in an appropriate medium such as tryptone soya broth (TSB) for bacterial or thioglycollate medium (TGM) for fungal detection. The mixture is incubated for up to 14 days but examined daily for turbidity, which is used as an indication of microbial growth. It is essential that both positive and negative controls are set up in parallel with the sample to be tested.
For the positive controls, a suspension of a known microorganism such as Bacillus subtilis or Clostridium sporogenes (an anaerobic bacterium) is used instead of the cells or product to be tested. Uninoculated flasks containing only the growth medium are used as negative controls. Any contamination in the cell cultures will result in the broth appearing turbid, as would the positive controls. The negative controls should remain clear. Infected cultures should be discarded, whilst clear cultures would be safe to use or keep.
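The way the controls govern the reading of a sterility test can be sketched in a few lines of Python. This is only an illustration of the logic described above; the function name and the returned wording are hypothetical and not part of any standard protocol.

```python
def interpret_sterility_test(sample_turbid, positive_turbid, negative_turbid):
    """Interpret a broth sterility test from three turbidity readings.

    Each argument is True if the corresponding flask is turbid.
    (Hypothetical helper: names and messages are illustrative only.)
    """
    # The test is only meaningful if the controls behave as expected:
    # the positive control must become turbid and the negative control
    # must remain clear.
    if not positive_turbid or negative_turbid:
        return "invalid: repeat test"
    # With valid controls, turbidity in the sample indicates contamination.
    if sample_turbid:
        return "contaminated: discard culture"
    return "clear: safe to use or keep"


if __name__ == "__main__":
    print(interpret_sterility_test(False, True, False))
    print(interpret_sterility_test(True, True, False))
    print(interpret_sterility_test(False, False, False))
```

Checking the controls first reflects the point made in the text: a clear sample flask means little if the positive control also failed to grow.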

2.4.3 Identification of mycoplasma infections
Mycoplasma contaminations are more prevalent in cell culture than many workers realise. The reason for this is that mycoplasma contaminations are not evident under light microscopy, nor do they result in turbid growth in culture. Instead the changes induced are more subtle and manifest themselves mainly as a slowdown in growth and in changes in cellular metabolism and functions. However, cells generally return to their native morphology and normal proliferation rates relatively rapidly after eradication of mycoplasma. The presence of mycoplasma contamination in cultures has, until recently, been difficult to determine and samples had to be analysed by specialist laboratories. There are, however, improved techniques now available for detection of mycoplasma in cell culture laboratories. These include microbiological cultures of infected cells, an


2.4 Aseptic techniques and good cell culture practice

Fig. 2.3 Photograph of mycoplasma, showing the characteristic opaque granular central zone surrounded by a translucent border, giving a ‘fried egg’ appearance.

indirect DNA staining technique using the fluorochrome dye Hoechst 33258, enzyme-linked immunosorbent assay (ELISA) or polymerase chain reaction (PCR). With the microbiological culture technique, cells in suspension are inoculated into liquid broth and then incubated under aerobic conditions at 37 °C for 14 days. A non-inoculated flask of broth is used as a negative control. Aliquots of broth are taken every 3 days and inoculated onto an agar plate, which is incubated anaerobically as above. All plates are then examined under an inverted microscope at a magnification of ×300 after 14 days of incubation. Positive cultures will show the typical mycoplasma colony formation, which has an opaque granular central zone surrounded by a translucent border, giving a ‘fried egg’ appearance (Fig. 2.3). It may be necessary to set up positive controls in parallel, in which case plates and broth should be inoculated with a known strain of mycoplasma such as Mycoplasma orale or Mycoplasma pneumoniae.
The DNA binding method offers a rapid alternative for detecting mycoplasma and works on the principle that Hoechst 33258 fluoresces under ultraviolet light once bound to DNA. Thus, in contaminated cells, the fluorescence will be fairly dispersed in the cytoplasm of the cells owing to the presence of mycoplasma. In contrast, uncontaminated cells will show localised fluorescence in their nucleus only. The Hoechst 33258 assay, although rapid, is relatively less sensitive when compared with the culture technique described above. For this assay, an aliquot of the culture to be tested is placed on a sterile coverslip in a 35-mm culture dish and incubated at 37 °C in a cell culture incubator to allow cells to adhere. The coverslip is then fixed by adding a fixative consisting of 1 part glacial acetic acid and 3 parts methanol, prepared fresh on the day.
A freshly prepared solution of Hoechst 33258 stain is added to the fixed coverslip, incubated in the dark at room temperature to allow the dye to bind to the DNA and then viewed under ultraviolet fluorescence at ×1000. All positive cultures will show fluorescence of mycoplasma DNA, which will appear as small cocci or filaments in the cytoplasm of the contaminated cells (Fig. 2.4b, see also colour section). Negative cultures will show only fluorescing nuclei of


Fig. 2.4 Hoechst 33258 staining of mycoplasma in cells. (a) A Hoechst-negative stain, with the dye staining cellular DNA in the nucleus and thus showing nuclear fluorescence. (b) A Hoechst-positive stain, showing staining of mycoplasma DNA in the cytoplasm of the cells. (See also colour plate.)

uncontaminated cells against a dark cytoplasmic background (Fig. 2.4a, see also colour section). However, this technique is prone to errors, including false-negative results. To avoid the latter, cells should be cultured in antibiotic-free medium for two to three passages before being used. A positive control using a strain of mycoplasma seeded onto a coverslip is essential. Such controls should be handled away from the cell culture laboratory to avoid contaminating clean cultures of cells. It is also important to ensure that the fluorescence detected is not due to the presence of bacterial contamination or debris embedded into the plastics during manufacture. The former normally appear larger than the fluorescing cocci or filaments of mycoplasma. Debris, on the other hand, would show a non-uniform fluorescence owing to the variation in size of the particles usually found in plastics.
ELISA detection of mycoplasma is now becoming more commonly used and can be carried out using specifically designed kits, following the manufacturer’s protocol and using the reagents supplied. In this assay, 96-well plates are coated with antibodies against different mycoplasma species. Each plate is incubated at 37 °C for 2 h with the required antibody or antibodies before blocking with the appropriate blocking solution and incubating with the test sample(s). A negative control, which is simply media with sample buffer, and a positive control, normally provided with the kits, should also be included in each assay. A detection antibody is subsequently added to the samples and incubated for a further 2 h at 37 °C before washing and incubating with a streptavidin solution for 1 h at 37 °C. Mycoplasma is then detected by adding the substrate solution and reading each plate on a plate reader at 405 nm after a further 30 min incubation at room temperature. This method is apparently suitable for detecting high levels of mycoplasma and could also be used to identify several species in one assay.


2.5 Types of animal cell, characteristics and maintenance in culture

As with the ELISAs, commercial kits are also available for PCR detection of mycoplasma which contain the required primers, internal control template, positive control template and all the relevant buffers. Samples are generated and set up in a reaction mix as instructed in the manufacturer’s protocol. The PCR is performed, again using the defined conditions outlined in the manufacturer’s protocol, and the products generated analysed by electrophoresis on a high-grade 2% agarose gel. Although sensitive, PCR detection of mycoplasma is not always the protocol of choice because it has been shown to be prone to false-negative results, presumably due to the presence of ingredients in the kit which may inhibit PCR amplification of the target gene. In addition, this method is time-consuming and expensive.

2.4.4 Eradication of mycoplasma
Until recently, the most common approach for eradicating mycoplasma has been the use of antibiotics such as gentamicin. This approach is, however, not always effective, as not all strains of mycoplasma are susceptible to this antibiotic. Moreover, antibiotic therapy does not always result in long-lasting successful elimination and most drugs can be cytotoxic to the cell culture. More recently, a new generation of bactericidal antibiotic preparation referred to as Plasmocin™ was introduced and has been shown to be effective against mycoplasma even at relatively low, non-cytotoxic concentrations. The antibiotics contained in this product are actively transported into cells, thus facilitating killing of intracellular mycoplasma but without any adverse effects on actual cellular metabolism. Apart from antibiotics, various products have also been introduced into the cell culture market that the manufacturers claim eradicate mycoplasma efficiently and quickly without causing any adverse effects to the cells. One such product is Mynox®, a biological agent that integrates into the membrane of mycoplasma, compromising its integrity and eventually initiating its disintegration. This process apparently occurs within an hour of applying Mynox® and may have the added advantage that it is not an antibiotic and as a result will not lead to the development of resistant strains. It is safe for cultures and is eliminated once the medium has been replaced. Moreover, this reagent is highly sensitive, detecting as little as 1–5 fg of mycoplasma DNA, which corresponds to two to five mycoplasma per sample and is effective against many of the common mycoplasma contaminations encountered in cell culture.

2.5 TYPES OF ANIMAL CELL, CHARACTERISTICS AND MAINTENANCE IN CULTURE
The cell types used in cell culture fall into two categories generally referred to as either a primary culture or a cell line.

2.5.1 Primary cell cultures
Primary cultures are cells derived directly from tissues following enzymatic dissociation or from tissue fragments referred to as explants. These are usually the cells of


preference, since it is argued that primary cultures retain their characteristics and reflect the true activity of the cell type in vivo. The disadvantage in using primary cultures, however, is that their isolation can be labour-intensive and may produce a heterogeneous population of cells. Moreover, primary cultures have a relatively limited lifespan and can be used over only a limited period of time in culture. Primary cultures can be obtained from many different tissues and the source of tissue used generally defines the cell type isolated. For instance, cells isolated from the endothelium of blood vessels are referred to as endothelial cells whilst those isolated from the medial layer of the blood vessels and other similar tissues are smooth muscle cells. Although both can be obtained from the same vessels, endothelial cells are different in morphology and function, generally growing as a single monolayer characterised by a cobble-stoned morphology. Smooth muscle cells on the other hand are elongated, with spindle-like projections at either end, and grow in layers even when maintained in culture. In addition to these cell types there are several other widely used primary cultures derived from a diverse range of tissues, including fibroblasts from connective tissue, lymphocytes from blood, neurons from nervous tissues and hepatocytes from liver tissue.

2.5.2 Continuous cell lines
Cell lines consist of a single cell type that has gained the ability for infinite growth. This usually occurs after transformation of cells by one of several means that include treatment with carcinogens or exposure to viruses such as the monkey simian virus 40 (SV40), Epstein–Barr virus (EBV) or Abelson murine leukaemia virus (A-MuLV) amongst others. These treatments cause the cells to lose their ability to regulate growth. As a result, transformed cells grow continuously and, unlike primary cultures, have an infinite lifespan (become ‘immortalised’). The drawback to this is that transformed cells generally lose some of their original in vivo characteristics. For instance, certain established cell lines do not express particular tissue-specific genes. One good example of this is the inability of liver cell lines to produce clotting factors. Continuous cell lines, however, have several advantages over primary cultures, not least because they are immortalised. In addition, they require less serum for growth, have a shorter doubling time and can grow without necessarily needing to attach or adhere to the surface of the flask. Many different cell lines are currently available from various cell banks, which makes it easier to obtain these cells without having to generate them. One of the largest organisations that supplies cell lines is the European Collection of Animal Cell Cultures (ECACC) based in Salisbury, UK. A selection of the different cell lines supplied by this organisation is listed in Table 2.1.

2.5.3 Cell culture media and growth requirements for animal cells
The cell culture medium used for animal cell growth is a complex mixture of nutrients (amino acids, a carbohydrate such as glucose, and vitamins), inorganic salts (e.g. containing magnesium, sodium, potassium, calcium, phosphate, chloride, sulphate,


Table 2.1 Examples of cell lines supplied by commercial sources

Cell line   Morphology    Species                Tissue origin
BAE-1       Endothelial   Bovine                 Aorta
BHK-21      Fibroblast    Syrian hamster         Kidney
CHO         Fibroblast    Chinese hamster        Ovary
COS-1/7     Fibroblast    African green monkey   Kidney
HeLa        Epithelial    Human                  Cervix
HEK-293     Epithelial    Human                  Kidney
HT-29       Epithelial    Human                  Colon
MRC-5       Fibroblast    Human                  Lung
NCI-H660    Epithelial    Human                  Lung
NIH/3T3     Fibroblast    Mouse                  Embryo
THP-1       Monocytic     Human                  Blood
V-79        Fibroblast    Chinese hamster        Lung
HEP1        Hepatocytes   Human                  Liver

and bicarbonate ions) and broad-spectrum antibiotics. In certain situations it may be essential to include a fungicide such as amphotericin B, although this may not always be necessary. For convenience and ease of monitoring the status of the medium, the pH indicator phenol red may also be included. This will change from red at pH 7.2–7.4 to yellow or fuchsia as the pH becomes either acidic or alkaline, respectively. The other key basic ingredient in the cell culture medium is serum, usually bovine or fetal calf. This is used to provide a buffer for the culture medium but, more importantly, it enhances cell attachment and provides additional nutrients and hormone-like growth factors that promote healthy growth of cells. Attempts to culture cells in the absence of serum do not usually result in successful or healthy cultures, even though cells can produce growth factors of their own. However, despite these benefits, the use of serum is increasingly being questioned, not least because of the many unknowns that can be introduced, including infectious agents such as viruses and mycoplasma. The recent resurgence of ‘mad cow disease’ (bovine spongiform encephalopathy) has introduced an additional drawback, posing a particular risk for the cell culturist, and has increased the need for alternative products. In this regard, several cell culture reagent manufacturers have now developed serum-free medium supplemented with various components including albumin, transferrin, insulin, growth factors and other essential elements required for optimal cell growth. This is proving very useful, particularly for the pharmaceutical and biotechnology companies involved in the manufacture of drugs or biological products for human and animal consumption.


2.5.4 Preparation of animal cell culture medium
Preparation of the culture medium is perhaps taken for granted as a simple straightforward procedure and is often not given due care and attention. As a result, most infections in cell culture laboratories originate from infected media. Following the simple yet effective procedures outlined in Section 2.4.1 should prevent or minimise the risk of infecting the media when they are being prepared. Preparation of the medium itself should also be carried out inside the culture cabinet and usually involves adding a required amount of serum together with antibiotics to a fixed volume of medium. The amount of serum used will depend on the cell type but usually varies between 10% and 20%. The most common antibiotics used are penicillin and streptomycin, which inhibit a wide spectrum of Gram-positive and Gram-negative bacteria. Penicillin acts by inhibiting the last step in bacterial cell wall synthesis whilst streptomycin blocks protein synthesis. Once prepared, the mixture, which is referred to as complete growth medium, should be kept at 4 °C until used. To minimise wastage and risk of contamination it is advisable to make just the required volume of medium and use this within a short period of time. As an added precaution it is also advisable always to check the clarity of the medium before use. Any infected medium, which will appear cloudy or turbid, should be discarded immediately. In addition to checking the clarity, a close eye should also be kept on the colour of the medium, which should be red at physiological pH owing to the presence of phenol red. Medium that looks acidic (yellow) or alkaline (fuchsia) should be discarded, as these extremes will affect the viability and thus growth of the cells.
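As a rough aid to making up complete growth medium, the serum volume can be worked out as follows. This is a minimal sketch that assumes the serum percentage is expressed as volume per final volume and that the antibiotic volume is negligible; neither assumption is stated in the text, and the function name is hypothetical.

```python
def serum_volume_cm3(basal_medium_cm3, serum_percent):
    """Volume of serum to add to a fixed volume of basal medium.

    Assumes serum_percent is % (v/v) of the FINAL mixture and that the
    volume contributed by antibiotics is negligible (illustrative only).
    """
    if not 0 <= serum_percent < 100:
        raise ValueError("serum_percent must be in the range 0 to <100")
    fraction = serum_percent / 100
    # serum / (basal + serum) = fraction  =>  serum = basal * f / (1 - f)
    return basal_medium_cm3 * fraction / (1 - fraction)


if __name__ == "__main__":
    for percent in (10, 20):
        volume = serum_volume_cm3(450, percent)
        print(f"{percent}% serum: add {volume:.1f} cm3 to 450 cm3 of medium")
```

If a laboratory instead defines the percentage relative to the basal medium volume alone, the calculation reduces to a simple multiplication, so the convention in use should be checked first.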

2.5.5 Subculture of cells
Subculturing is the process by which cells are harvested, diluted in fresh growth medium and replaced in a new culture flask to promote further growth. This process, also known as passaging, is essential if the cells are to be maintained in a healthy and viable state, otherwise they may die after a certain period in continuous culture. The reason for this is that adherent cells grow in a continuous layer that eventually occupies the whole surface of the culture dish and at this point they are said to be confluent. Once confluent, the cells stop dividing and go into a resting state where they stop growing (senesce) and eventually die. Thus, to keep cells viable and facilitate efficient transformation, they must be subcultured before they reach full contact inhibition. Ideally, cells should be harvested just before they reach a confluent state. Cells can be harvested and subcultured using one of several techniques. The precise method used is dependent to a large extent on whether the cells are adherent or in suspension.

Subculture of adherent cells
Adherent cells can be harvested either mechanically, using a rubber spatula (also referred to as a ‘rubber policeman’), or enzymatically, using proteolytic enzymes. Cells in suspension are simply diluted in fresh medium by taking a given volume of cell suspension and adding an equal volume of medium.


Fig. 2.5 Cell scrapers.

Harvesting of cells mechanically
This method is simple and easy. It involves gently scraping cells from the growth surface into the culture medium using a rubber spatula that has a rigid polystyrene handle with a soft polyethylene scraping blade (Fig. 2.5). This method is not suitable for all cell types as the scraping may result in membrane damage and significant cell death. Before adopting this approach it is important to carry out some test runs where cell viability and growth are monitored in a small sample of cells following harvesting.

Harvesting of cells using proteolytic enzymes
Several different proteolytic enzymes can be exploited, including trypsin, which destroys proteinaceous connections between cells and between cells and the surface of the flask in which they grow. As a result, harvesting of cells using this enzyme results in the release of single cells, which is ideal for subculturing as each cell will then divide and grow, thus enhancing the propagation of the cultures. Trypsin is commonly used in combination with EDTA, which enhances the action of the enzyme. EDTA alone can also be effective in detaching adherent cells as it chelates the Ca²⁺ required by some adhesion molecules that facilitate cell–cell or cell–matrix interactions. Although EDTA alone is much gentler on the cells than trypsin, some cell types may adhere strongly to the plastic, requiring trypsin to detach. The standard procedure for detaching adherent cells using trypsin and EDTA involves making a working solution of 0.1% trypsin plus 0.02% EDTA in Ca²⁺/Mg²⁺-free phosphate-buffered saline (PBS). The growth medium is aspirated from confluent cultures and the cells are washed at least twice with a serum-free solution such as Ca²⁺/Mg²⁺-free PBS to remove traces of serum that may inactivate the trypsin. The trypsin–EDTA solution (approximately 1 cm³ per 25 cm² of surface area) is then added to the cell monolayer and swirled around for a few seconds.
Excess trypsin–EDTA is aspirated, leaving just enough to form a thin film over the monolayer. The flask is then incubated at 37 °C in a cell culture incubator for 2–5 min but monitored under an inverted light microscope at intervals to detect when the cells


are beginning to round up and detach. This is to ensure that the cells are not overexposed to trypsin, as this may result in extensive damage to the cell surface, eventually resulting in cell death. It is important therefore that the proteolysis reaction is quickly terminated by the addition of complete medium containing serum that will inactivate the trypsin. The suspension of cells is collected into a sterile centrifuge tube and spun at 1000 r.p.m. for 10 min to pellet the cells, which are then resuspended in a known volume of fresh complete culture medium to give the required density of cells per cubic centimetre. As with all tissue culture procedures, aseptic techniques should be adopted at all times. This means that all the above procedures should be carried out in a tissue culture cabinet under sterile conditions. Other precautions worth noting include the handling of the trypsin stock. This should be stored frozen at −20 °C and, when needed, placed in a water bath just to the point where it thaws. Any additional time in the 37 °C water bath will inactivate the enzymatic activity of the trypsin. The working solution should be kept at 4 °C once made and can be stored for up to 3 months.

Subculture of cells in suspension
For cells in suspension it is important initially to examine an aliquot of cells under a microscope to establish whether cultures are growing as single cells or clumps. If cultures are growing as single cells, an aliquot is counted as described in Section 2.5.6 below and then reseeded at the desired seeding density in a new flask by simply diluting the cell suspension with fresh medium, provided the original medium in which the cells were growing is not spent. However, if the medium is spent and appears acidic, then the cells must be centrifuged at 1000 r.p.m. for 10 min, resuspended in fresh medium and transferred into a new flask.
Cells that grow in clumps should first be centrifuged and resuspended in fresh medium as single cells using a glass Pasteur or fine-bore pipette.

2.5.6 Cell quantification
It is essential that when cells are subcultured they are seeded at the appropriate seeding density that will facilitate optimum growth. If cells are seeded at a lower seeding density they may take longer to reach confluency and some may expire before getting to this point. On the other hand, if seeded at a high density, cells will reach confluency too quickly, resulting in irreproducible experimental results. This is because trypsin can digest surface proteins, including receptors for drugs, and these will need time (sometimes several days) to renew. Failure to allow these proteins to be regenerated on the cell surface may therefore result in variable responses to drugs specific for such receptors. Several techniques are now available for quantification of cells and of these the most common method involves the use of a haemocytometer. This has the added advantage of being simple and cheap to use. The haemocytometer itself is a thickened glass slide that has a small chamber of grids cut into the glass. The chamber has a fixed volume and is etched into nine large squares, of which the large corner squares


Fig. 2.6 Haemocytometer. (Dimensions labelled in the figure: 1 mm and 0.25 mm.)

contain 16 small squares each; each large square measures 1 mm × 1 mm and is 0.1 mm deep (see Fig. 2.6). Thus, with a coverslip in place, each square represents a volume of 0.1 mm³ (1.0 mm² area × 0.1 mm depth) or 10⁻⁴ cm³. Knowing this, the cell concentration (and the total number of cells) can therefore be determined and expressed per cubic centimetre. The general procedure involves loading approximately 10 µl of a cell suspension into a clean haemocytometer chamber and counting the cells within the four corner squares with the aid of a microscope set at ×20 magnification. The count is mathematically converted to the number of cells per cm³ of suspension. To ensure accuracy, the coverslip must be firmly in place and this can be achieved by moistening a coverslip with exhaled breath and gently sliding it over the haemocytometer chamber, pressing firmly until Newton’s rings (usually rainbow-like) appear under the coverslip. The total number of cells in each of the four 1-mm² corner squares should be counted, with the proviso that only cells touching the top or left borders but not those touching the bottom and right borders are counted. Moreover, cells outside the large squares, even if they are within the field of view, should not be counted. When present, clumps should be counted as one cell. Ideally at least 100 cells should be counted to ensure a high degree of accuracy in counting. If the total cell count is less than 100 or if more than 10% of the cells counted appear to be clustered, then the original cell suspension should be thoroughly mixed and the counting procedure repeated. Similarly, if the total cell count is greater than 400, the suspension should be diluted further to get counts of between 100 and 400 cells. Since some cells may not survive the trypsinisation procedure it is usually advisable to add an equal volume of the dye trypan blue to a small aliquot of the cell suspension before counting.
This dye is excluded by viable cells but taken up by dead cells. Thus, when viewed under the microscope, viable cells will appear as bright translucent


structures while dead cells will stain blue (see Section 2.5.12). The number of dead cells can therefore be excluded from the total cell count, ensuring that the seeding density accurately reflects viable cells.

Calculating cell number
Cell number is usually expressed per cm³ and is determined by multiplying the average number of cells counted per large square by a conversion factor that is constant for the haemocytometer. The conversion factor is 10⁴, the reciprocal of the volume of 10⁻⁴ cm³ represented by each large square counted. Thus:

$$\text{cells cm}^{-3} = \frac{\text{number of cells counted} \times \text{conversion factor}}{\text{number of squares counted}}$$

If the cells were diluted before counting then the dilution factor should also be taken into account. Therefore:

$$\text{cells cm}^{-3} = \frac{\text{number of cells counted} \times \text{conversion factor} \times \text{dilution factor}}{\text{number of squares counted}}$$

To get the total number of cells harvested, the number of cells determined per cm³ should be multiplied by the original volume of fluid from which the cell sample was removed, i.e.:

$$\text{total cells} = \text{cells cm}^{-3} \times \text{total volume of cell suspension}$$

Example 1 CALCULATION OF CELL NUMBER

Question Calculate the total number of cells suspended in a final volume of 5 cm³, taking into account that the cells were diluted 1 : 2 before counting and the number of cells counted with the haemocytometer in four large squares was 400.

Answer

$$\text{cells cm}^{-3} = \frac{\text{number of cells counted} \times \text{conversion factor}}{\text{large squares counted}} = \frac{400 \times 10^{4}}{4} = 1\,000\,000 \text{ cells cm}^{-3}$$

Because there is a dilution factor of 2, the correct number of cells cm⁻³ is given as:

$$1\,000\,000 \times 2 = 2\,000\,000 \text{ cells cm}^{-3}$$

Thus in a final volume of 5 cm³ the total number of cells present is:

$$2\,000\,000 \times 5 = 10\,000\,000 \text{ cells}$$
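The haemocytometer calculation can also be sketched in Python. This is an illustrative helper rather than a standard routine: the function names are hypothetical, and the default conversion factor is taken as the reciprocal of the 10⁻⁴ cm³ volume of one large square (that is, 10⁴), which is an assumption that should be changed if a chamber of different dimensions is used.

```python
def cells_per_cm3(cells_counted, squares_counted,
                  dilution_factor=1, conversion_factor=10_000):
    """Cell concentration from a haemocytometer count.

    conversion_factor defaults to 1 / (1e-4 cm3 per large square) = 1e4;
    this default is an assumption tied to the chamber dimensions.
    """
    if squares_counted <= 0:
        raise ValueError("at least one square must be counted")
    mean_per_square = cells_counted / squares_counted
    return mean_per_square * conversion_factor * dilution_factor


def total_cells(concentration_per_cm3, suspension_volume_cm3):
    """Total cells in the original suspension."""
    return concentration_per_cm3 * suspension_volume_cm3


def percent_viable(unstained_cells, blue_cells):
    """Viability from a trypan blue count (blue cells are dead)."""
    counted = unstained_cells + blue_cells
    if counted == 0:
        raise ValueError("no cells counted")
    return 100 * unstained_cells / counted


if __name__ == "__main__":
    # Same inputs as Example 1: 400 cells in 4 squares, 1 : 2 dilution, 5 cm3
    conc = cells_per_cm3(400, 4, dilution_factor=2)
    print(f"{conc:.0f} cells per cm3")
    print(f"{total_cells(conc, 5):.0f} cells in total")
    print(f"{percent_viable(180, 20):.0f}% viable")
```

Keeping the conversion factor as a parameter makes the dependence on chamber geometry explicit, and counting only unstained cells in `cells_counted` gives a viable-cell concentration directly.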


Fig. 2.7 Coulter counter. Cells entering the aperture create a pulse of resistance between the internal and external electrodes that is recorded on the oscilloscope. (Labelled components: pulse measurement system, connection to vacuum pump, internal electrode, external electrode, aperture, cell suspension.)

Alternative methods for determination of cell number
Several other methods are available for quantifying cells in culture, including direct measurement using an electronic Coulter counter. This is an automated method of counting and measuring the size of microscopic particles. The instrument itself consists of a glass probe with an electrode that is connected to an oscilloscope (Fig. 2.7). The probe has a small aperture of fixed diameter near its bottom end. When the probe is immersed in a cell suspension, cells are flushed through the aperture, causing a brief increase in resistance owing to a partial interruption of current flow. This will result in spikes being recorded on the oscilloscope and each spike is counted as a cell. One disadvantage of this method, however, is that it does not distinguish between viable and dead cells. Indirectly, cells can be counted by determining total cell protein and using a protein versus cell number standard curve to determine cell number in test samples. However, protein content per cell can vary during culture and may not give a true reflection of cell number. Alternatively, the DNA content of cells may be used as an indicator of cell number, since the DNA content of diploid cells is usually constant. However, the DNA content of cells may change during the cell cycle and therefore not give an accurate estimate of cell number.
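The principle of counting one cell per resistance spike can be illustrated with a short Python sketch. It is a simplified model only: the sampled signal, the threshold value and the function name are all hypothetical, and a real instrument performs this discrimination in hardware or in its own software.

```python
def count_pulses(signal, threshold):
    """Count spikes in a sampled resistance trace.

    A spike is registered each time the signal rises above the threshold
    after having been at or below it, so a spike spanning several samples
    is still counted once. (Illustrative model, not an instrument API.)
    """
    count = 0
    above = False
    for value in signal:
        if value > threshold and not above:
            count += 1          # rising edge: a new spike begins
        above = value > threshold
    return count


if __name__ == "__main__":
    trace = [0, 0, 5, 6, 0, 0, 7, 0, 8, 8, 8, 0]  # made-up sample values
    print(count_pulses(trace, threshold=3))
```

As the text notes, such a count cannot distinguish viable from dead cells, since both interrupt the current in the same way.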

2.5.7 Seeding cells onto culture plates
Once counted, cells should then be seeded at a density that promotes optimal cell growth. It is essential therefore that when cells are subcultured they are seeded at the


appropriate seeding density. If cells are seeded at a lower density they may take longer to reach confluency and some may die before getting to this point. On the other hand, if seeded at too high a density cells will reach confluency too quickly, resulting in irreproducible experimental results as already discussed above (see Section 2.5.6). The seeding density will vary depending on the cell type and on the surface area of the culture flask into which the cells will be placed. These factors should therefore be taken into account when deciding on the seeding density of any given cell type and the purpose of the experiments carried out.
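Once the concentration of the harvested suspension is known, the volume needed to seed a flask follows from simple arithmetic. The sketch below assumes that seeding density is expressed as cells per cm² of growth surface, which the text does not specify; the function name and the example figures are hypothetical.

```python
def seeding_volume_cm3(target_cells_per_cm2, surface_area_cm2,
                       stock_cells_per_cm3):
    """Volume of counted cell suspension needed to seed one flask.

    Assumes seeding density is defined per cm2 of growth surface
    (an illustrative convention; some protocols quote cells per cm3).
    """
    if stock_cells_per_cm3 <= 0:
        raise ValueError("stock concentration must be positive")
    cells_needed = target_cells_per_cm2 * surface_area_cm2
    return cells_needed / stock_cells_per_cm3


if __name__ == "__main__":
    # Made-up example: 10 000 cells per cm2 in a 25 cm2 flask,
    # from a stock of 2 000 000 cells per cm3
    volume = seeding_volume_cm3(10_000, 25, 2_000_000)
    print(f"add {volume:.3f} cm3 of suspension, then make up with medium")
```

The appropriate target density still has to be chosen for each cell type and experiment, as described above.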

2.5.8 Maintenance of cells in culture
It is important that after seeding, flasks are clearly labelled with the date, cell type and the number of times the cells have been subcultured or passaged. Moreover, a strict regime of feeding and subculturing should be established that permits cells to be fed at regular intervals without allowing the medium to be depleted of nutrients or the cells to overgrow or become superconfluent. This can be achieved by following a standard routine procedure for maintaining cells in a viable state under optimum growth conditions. In addition, cultures should be examined daily under an inverted microscope, looking particularly for changes in morphology and cell density. Cell shape can be an important guide when determining the status of growing cultures. Round or floating cells in subconfluent cultures are not usually a good sign and may indicate distressed or dying cells. The presence of abnormally large cells can also be useful in determining the well-being of the cells, since the number of such cells increases as a culture ages or becomes less viable. Extremes in pH should be avoided by regularly replacing spent medium with fresh medium. This may be carried out on alternate days until the cultures are approximately 90% confluent, at which point the cells are either used for experimentation or trypsinised and subcultured following the procedures outlined in Section 2.5.5. The volume of medium added to the cultures will depend on the confluency of the cells and the surface area of the flasks in which the cells are grown. As a guide, cells which are under 25% confluent may be cultured in approximately 1 cm³ of medium per 5 cm² and those between 25% and 40% or ≥45% confluency should be supplemented with 1.5 cm³ or 2 cm³ culture medium per 5 cm², respectively. When changing the medium it is advisable to pipette the latter on to the sides or the opposite surface of the flask from where the cells are attached.
This is to avoid making direct contact with the monolayers as this will damage or dislodge the cells.
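The volume guide above can be sketched as a small calculation. This is only an illustration: the function name is hypothetical, and because the text gives no figure for cultures between 40% and 45% confluent, the sketch assumes the higher volume for that range.

```python
def medium_volume_cm3(confluency_percent: float, flask_area_cm2: float) -> float:
    """Suggested medium volume (cm^3) using the guide figures quoted in the text."""
    if not 0 <= confluency_percent <= 100:
        raise ValueError("confluency must be between 0 and 100%")
    if flask_area_cm2 <= 0:
        raise ValueError("flask area must be positive")
    if confluency_percent < 25:
        volume_per_5_cm2 = 1.0   # under 25% confluent
    elif confluency_percent <= 40:
        volume_per_5_cm2 = 1.5   # between 25% and 40% confluent
    else:
        # The text quotes 2 cm^3 for 45% and above; 40-45% is not specified,
        # so using the higher volume there is an assumption of this sketch.
        volume_per_5_cm2 = 2.0
    return volume_per_5_cm2 * flask_area_cm2 / 5.0


# Example: a 25 cm^2 flask at about 30% confluency
print(medium_volume_cm3(30, 25))
```

The guide figures remain approximate, so the result is a starting point rather than a fixed requirement.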

2.5.9 Growth kinetics of animal cells in culture

When maintained under optimum culture conditions, cells follow a characteristic growth pattern (Fig. 2.8), exhibiting an initial lag phase in which there is enhanced cellular activity but no apparent increase in cell number. The duration of this phase depends on several factors, including the viability of the cells, the density at which the cells are plated and the composition of the medium.

Fig. 2.8 Growth curve showing the phases of cell growth in culture. Cell density (cells cm⁻³) is plotted against time (days); the phases, in order, are lag, log, stationary and decline.

The lag phase is followed by a log phase in which there is an exponential increase in cell number with high metabolic activity. These cells eventually reach a stationary phase where there is no further increase in growth due to depletion of nutrients in the medium, accumulation of toxic metabolic waste or a limitation in available growth space. If left unattended, cells in the stationary phase will eventually begin to die, resulting in the decline phase on the growth curve.
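As a rough illustration of the exponential increase described for the log phase, cell number can be related to elapsed time through a doubling time. The symbols and the numbers in the worked line are illustrative assumptions and are not taken from the text; the expression applies only while the cells remain in log phase.

```latex
% N(t): cell number at time t; N_0: cell number at the end of the lag phase
% t_lag: duration of the lag phase; t_d: population doubling time (assumed constant)
\[
N(t) = N_0 \, 2^{(t - t_{\mathrm{lag}})/t_d}, \qquad t \ge t_{\mathrm{lag}}
\]
% Worked example with hypothetical values:
% N_0 = 1 x 10^5 cells, t_d = 24 h, and 72 h spent in log phase
\[
N = 1 \times 10^{5} \times 2^{72/24} = 8 \times 10^{5}\ \text{cells}
\]
```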

2.5.10 Cryopreservation of cells

Cells can be preserved for later use by freezing stocks in liquid nitrogen. This process is referred to as cryopreservation and is an efficient way of sustaining stocks. Indeed, it is advisable that, when good cultures are available, aliquots of cells should be stored in the frozen state. This provides a renewable source of cells that can be used in future without necessarily having to culture new batches from tissues. Freezing can, however, result in several lethal changes within the cells, including formation of ice crystals and changes in the concentration of electrolytes and in pH. To minimise these risks a cryoprotective agent such as DMSO is usually added to the cells prior to freezing in order to lower the freezing point and prevent ice crystals from forming inside the cells. In addition, the freezing process is carried out in stages, allowing the cells initially to cool down slowly from room temperature to −80 °C at a rate of 1–3 °C min⁻¹. This initial stage can be carried out using a freezing chamber or alternatively a cryo freezing container (‘Mr Frosty’) filled with isopropanol, which provides the critical, repeatable 1 °C min⁻¹ cooling rate required for successful cell cryopreservation. When this process is complete, the cryogenic vials, which are polypropylene tubes that can withstand temperatures as low as −190 °C, are removed and immediately placed in a liquid nitrogen storage tank where they can remain for an indefinite period or until required.

The actual cryogenic procedure is itself relatively straightforward. It involves harvesting cells as described in Section 2.5.5 and resuspending them in 1 cm³ of freezing medium, which is basically culture medium containing 40% serum. The cell suspension is counted and appropriately diluted to give a final cell count of between 10⁶ and 10⁷ cells cm⁻³. A 0.9-cm³ aliquot is transferred into a cryogenic vial labelled with the cell type, passage number and date harvested. This is then made up to 1 cm³ by adding 100 mm³ of DMSO to give a final concentration of 10%. The cells should then be mixed gently by rotating or inverting the vial and placed in a ‘Mr Frosty’ cryo freezing container. The container and cells are placed in a −80 °C freezer and allowed to freeze overnight. The frozen vials may then be transferred into a liquid nitrogen storage container, where the cells can be stored until required for use. All procedures should be carried out under sterile conditions, since any contamination introduced at this stage will become apparent once the frozen stocks are recultured. As an added precaution it is advisable to replace the growth medium in the 24-h period prior to harvesting cells for freezing. Moreover, cells used for freezing should be in the log phase of growth and not too confluent, since confluent cells may already be in growth arrest.
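The arithmetic behind the dilution and DMSO steps can be sketched as follows. The function names and the example cell count are hypothetical; only the 0.9 cm³ aliquot, the 100 mm³ of DMSO and the 10⁶ to 10⁷ cells cm⁻³ target range come from the text.

```python
def medium_to_add_cm3(counted_cells_per_cm3: float,
                      target_cells_per_cm3: float,
                      suspension_volume_cm3: float) -> float:
    """Volume of freezing medium (cm^3) to add so the suspension reaches the target density."""
    if target_cells_per_cm3 <= 0 or suspension_volume_cm3 <= 0:
        raise ValueError("target density and volume must be positive")
    if target_cells_per_cm3 > counted_cells_per_cm3:
        raise ValueError("suspension is already below the target density")
    final_volume = counted_cells_per_cm3 * suspension_volume_cm3 / target_cells_per_cm3
    return final_volume - suspension_volume_cm3


def dmso_percent(dmso_volume_mm3: float, aliquot_volume_cm3: float) -> float:
    """Final DMSO concentration (% v/v) after adding DMSO to a cell aliquot."""
    dmso_cm3 = dmso_volume_mm3 / 1000.0  # 1 cm^3 = 1000 mm^3
    return 100.0 * dmso_cm3 / (aliquot_volume_cm3 + dmso_cm3)


# Hypothetical count of 5 x 10^7 cells cm^-3 in 1 cm^3, diluted to 5 x 10^6 cells cm^-3
print(medium_to_add_cm3(5e7, 5e6, 1.0))
# The volumes quoted in the text: 100 mm^3 DMSO added to a 0.9 cm^3 aliquot
print(dmso_percent(100, 0.9))
```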

2.5.11 Resuscitation of frozen cells

When required, frozen stocks of cells may be revived by removing the cryogenic vial from storage in liquid nitrogen and placing it in a water bath at 37 °C for 1–2 min or until the ice crystals melt. It is important that the vials are not allowed to warm up to 37 °C as this may cause the cells to die rapidly. The thawed cell suspension may then be transferred into a centrifuge tube, to which fresh medium is added, and centrifuged at 1000 r.p.m. for 10 min. The supernatant should be discarded to remove the DMSO used in the freezing process and the cell pellet resuspended in 1 cm³ of fresh medium, ensuring that clumps are dispersed into single cells or much smaller clusters using a glass Pasteur pipette. The required amount of fresh pre-warmed growth medium is placed in a culture flask and the cells pipetted into the flask, which is then placed in a cell culture incubator and the cells allowed to adhere and grow.

Practical hints and tips in resuscitation of frozen cells

It is important to handle resuscitated cells delicately after thawing as these may be fairly fragile and could degenerate quite readily if not treated correctly. In addition, it is important to dilute the freezing medium immediately after thawing to reduce the concentration of DMSO or freezing agent to which the cells are exposed.

2.5.12 Determination of cell viability

Determination of cell viability is extremely important, since the survival and growth of the cells may depend on the density at which they are seeded. The degree of viability is most commonly determined by differentiating living from dead cells using the dye exclusion method. Basically, living cells exclude certain dyes that are readily taken up by dead cells. As a result, dead cells stain the colour of the dye used whilst living cells remain refractile owing to the inability of the dye to penetrate into the cytoplasm. One of the most commonly used dyes in such assays is trypan blue. This is incubated at a concentration of 0.4% with cells in suspension and applied to a haemocytometer. The haemocytometer is then viewed under an inverted microscope set at 100× magnification and the cells counted as described in Section 2.5.6, keeping separate counts for viable and non-viable cells. The total number of cells is calculated using the following equation as described previously:

$$\text{cells cm}^{-3} = \frac{\text{number of cells counted} \times \text{conversion factor} \times \text{dilution factor}}{\text{number of squares counted}}$$

and the percentage of viable cells determined using the following formula:

$$\%\ \text{viability} = \frac{\text{number of unstained cells counted} \times 100}{\text{total number of cells counted}}$$

To avoid underestimating cell viability it is important that the cells are not exposed to the dye for more than 5 min before counting. This is because uptake of trypan blue is time sensitive and the dye may be taken up by viable cells during prolonged incubation periods. Additionally, trypan blue has a high affinity for serum proteins and as such may produce a high background staining. The cells should therefore be free from serum, which can be achieved by washing the cells with PBS before counting.
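The two formulae for cell density and viability can be sketched in code. The function names and the counts in the example are hypothetical; the conversion factor is defined in Section 2.5.6 and is passed in as a parameter, with 10⁴ used below purely as an assumed illustrative value.

```python
def cells_per_cm3(cells_counted: int, squares_counted: int,
                  conversion_factor: float, dilution_factor: float) -> float:
    """Cell density from a haemocytometer count, following the first formula."""
    if squares_counted <= 0:
        raise ValueError("at least one square must be counted")
    return cells_counted * conversion_factor * dilution_factor / squares_counted


def percent_viability(unstained_cells: int, stained_cells: int) -> float:
    """Percentage of cells excluding trypan blue, following the second formula."""
    total = unstained_cells + stained_cells
    if total == 0:
        raise ValueError("no cells were counted")
    return 100.0 * unstained_cells / total


# Hypothetical count: 180 unstained and 20 stained cells over 4 squares,
# sample diluted 1:2 with dye, conversion factor assumed to be 1e4
print(cells_per_cm3(180 + 20, 4, 1e4, 2))
print(percent_viability(180, 20))
```

Keeping the stained and unstained counts separate, as the text recommends, allows both quantities to be derived from a single count.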

2.6 STEM CELL CULTURE

Stem cells are unspecialised cells which have the ability to undergo self-renewal, replicating many times over prolonged periods, thereby generating new unspecialised cells. More importantly, stem cells have the potential to give rise to specialised cells with specific functions by the process of differentiation. Because of this property, stem cells are now being developed and exploited for cell-based therapies in various disease states. It has therefore become essential to be able to isolate, maintain and grow these cells in culture. This is, however, an emerging field where protocols to be used routinely are still being developed. This section of the chapter will focus on techniques that are now becoming routine for stem cell culture, concentrating on human embryonic stem cells (hESCs). The latter are cells derived from the inner cell mass of the blastocyst, which is a hollow microscopic ball made up of an outer layer of cells (the trophoblast), a fluid-filled cavity (the blastocoel) and the cluster of cells forming the inner cell mass. Culturing of hESCs can be carried out in a standard cell culture laboratory using equipment already described earlier in the chapter. As with normal cell culture, the important criteria are that good aseptic techniques are adopted together with good laboratory practice. Unlike normal specialised cells, however, culture of hESCs requires certain conditions specifically aimed at maintaining these cells in a viable undifferentiated state. Historically, hESCs, and indeed other stem cells, have been cultured on what are referred to as feeders, which act to sustain growth and maintain cells in the undifferentiated state without allowing them to lose their pluripotency (i.e. ability to differentiate, when needed, into specialised cell types of the three germ layers). The most common feeder cultures used are fibroblasts derived from embryos. The methodology for this, together with other techniques for successful maintenance and propagation of hESCs, is described below. Other protocols such as freezing and resuscitation of frozen cells are similar to those already described and the reader is therefore referred to the relevant sections above.

2.6.1 Preparation of embryonic fibroblasts

Typically, fibroblasts are isolated under sterile conditions in a tissue culture cabinet from embryos obtained from mice at 13.5 days of gestation. Each embryo is minced into very fine pieces using sterile scissors and incubated in a cell culture incubator at 37 °C with trypsin/EDTA (0.25% (w/v)/5 mM) for 20 min. The mixture is then pipetted vigorously using a fine-bore pipette until it develops a sludgy consistency. This process is repeated, returning the digest to the incubator if necessary, until the embryos have been virtually digested. The trypsin is subsequently neutralised with culture medium containing 10% serum, ensuring that the volume of medium is at least twice that of the trypsin used. The minced tissue is plated onto a tissue culture flask and incubated overnight at 37 °C in a tissue culture incubator. The medium is removed after 24 h and the cell monolayer washed to remove any tissue debris and non-adherent cells. Adherent cells are cultured to 80–90% confluency before being passaged using trypsin as described in Section 2.5.5. If needed, the trypsinised cells can be propagated; otherwise they should be frozen as described in Section 2.5.10 and used as stock. If the latter is preferred, ensure that cells are frozen at no higher than passage three.

Practical hints and tips in using fibroblast feeders

Mouse fibroblasts should be used as feeders for stem cell culture between passages three and five. This is to ensure that fibroblasts support the growth of undifferentiated cells. After passage five the cells may begin to senesce and could also potentially fail to maintain stem cells in the undifferentiated state. Each batch of feeders prepared should be tested for its ability to support cells in an undifferentiated state.

2.6.2 Inactivation of fibroblast cells for use as feeders

Fibroblasts isolated should be inactivated before they can be used as feeders in order to prevent their proliferation and expansion during culture. This can be achieved using one of two protocols: either irradiation or treatment with the antibiotic DNA cross-linker mitomycin C. With the former, cells in suspension are exposed to 80 Gy of irradiation using a caesium-source gamma irradiator. This is the dose of irradiation normally used for mouse fibroblasts; however, the radiation dose and exposure time may vary between batches of fibroblasts. As a result, a dose curve should be performed to determine the effective irradiation that is sufficient to stop cell division without cellular toxicity. Once irradiated, cells are spun at 1000 r.p.m. and the pellet is resuspended in the appropriate medium at the appropriate density for freezing or plating on gelatin-coated plates. With the mitomycin procedure, cells are normally incubated with the compound at a concentration of 10 µg cm⁻³ for 2–3 h at 37 °C in a cell culture incubator. After this, the mitomycin solution is aspirated and the cells washed several times with phosphate-buffered saline or serum-free culture medium to ensure that there are no trace amounts of mitomycin that could affect the stem cells. The cells are then trypsinised, neutralised with serum-containing medium, centrifuged and re-plated onto gelatin-coated dishes at the appropriate cell density.

Practical hints and tips with feeders

Of the two methods, exposure of cells to gamma irradiation is much preferred because it gives a more consistent and reliable inactivation of cells. More importantly, mitomycin can be harmful and toxic, with embryonic cells showing particular sensitivity to this compound. Use of mitomycin-inactivated fibroblasts should therefore generally be avoided if irradiated feeders can be obtained. If frozen stocks of inactivated feeders are required, these can be prepared as described in Section 2.5.10. It is, however, important to ensure that stocks are not kept frozen for periods exceeding 4 months to avoid degeneration of cells. In addition, once plated, feeders should be used for stem cell culture within 24 h or no longer than 5 days after plating.

2.6.3 Plating of feeder cells

As with standard cell culture, fibroblast feeders are plated on tissue culture grade plastics but usually in the presence of a substrate such as gelatin, to provide the extracellular matrix component needed for attachment of the inactivated fibroblasts. In brief, the plates or flasks are incubated for 1 h at room temperature or overnight at 4 °C with the appropriate volume of 0.1% sterile gelatin. Excess gelatin is subsequently removed and the feeder cells plated at the appropriate density for each cell line, e.g. 3.5 × 10⁵ cells per 25-cm² flask. Feeders should be ready for use after 5–6 h but are best left to establish overnight for better results.

Practical hints and tips in plating feeders

It is important to ensure that the seeding density is optimal for each cell line, otherwise feeders may fail to maintain the hESCs in the undifferentiated state. If frozen stocks of feeders are used for plating, these should be resuscitated, resuspended in fresh growth medium and plated on gelatin-coated plates as described in Section 2.5.11. Again, the density of post-thaw feeders required to support the cells in an undifferentiated state should be established for each batch of frozen feeders since there is cell loss during the freeze–thaw process.
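To illustrate how the example density might be carried over to a vessel of a different size, the sketch below scales the number of feeder cells in proportion to surface area. Proportional scaling is an assumption made here for illustration; as the text stresses, the optimal density has to be established for each cell line and each batch of feeders.

```python
# Example figure quoted in the text: 3.5 x 10^5 feeder cells per 25 cm^2 flask
REFERENCE_CELLS = 3.5e5
REFERENCE_AREA_CM2 = 25.0


def feeder_cells_needed(area_cm2: float,
                        reference_cells: float = REFERENCE_CELLS,
                        reference_area_cm2: float = REFERENCE_AREA_CM2) -> float:
    """Feeder cells for a vessel of the given area, assuming density scales with area."""
    if area_cm2 <= 0 or reference_area_cm2 <= 0:
        raise ValueError("areas must be positive")
    return reference_cells * area_cm2 / reference_area_cm2


# Hypothetical 75 cm^2 flask seeded at the same density per unit area
print(feeder_cells_needed(75))
```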

Fig. 2.9 Undifferentiated hESCs on mouse feeder cells.

2.6.4 Culture of human embryonic stem cells

Once the feeders are ready, hESCs can be plated directly by depositing the suspension of hESCs onto the feeder layer. The dishes are placed in a cell culture incubator and the cells allowed to attach and establish over a 24-h period. Any non-adherent cells are removed during the first culture medium change. The cells are monitored and fed on a daily basis until the colonies are ready to be passaged. Depending on the conditions of growth, this can take up to 6 days. As with the feeders, frozen stocks of hESCs should be resuscitated and diluted in fresh growth medium as described in Section 2.5.11.

Practical hints and tips in hESC culture

It is important to ensure that the colonies do not grow so large that adjacent colonies touch each other, as this will initiate their differentiation. Similarly, the seeding density should be high enough to sustain growth, otherwise sparsely plated colonies will grow very slowly and may never establish fully. Colonies should be plated on healthy feeders that are not more than 4 days old. More importantly, only tightly packed colonies containing cells with the typical hESC morphology should be passaged (see Fig. 2.9). Any colony that has a less defined border at the periphery (see Fig. 2.10), with loose cells spreading out or cells with atypical morphology, should not be passaged because these characteristics are evidence of cell differentiation. Any differentiated cells should be excised or aspirated before passaging the undifferentiated cells. Alternatively, if the majority of the colonies appear differentiated and no colonies display the characteristic morphology of undifferentiated cells, then it is advisable to discard the cultures and start with a new batch of undifferentiated hESCs.

Fig. 2.10 Partially differentiated hESCs on mouse feeder cells.

2.6.5 Enzymatic subculture of hESCs

As with standard cell culture, hESCs can be passaged using enzymes, but in this case an enzyme that does not disperse clusters of cells into single cells is preferred. This is because hESCs need to grow in colonies, since single cells may not adhere to the feeders and may differentiate easily. One of the most commonly used enzymes for subculturing hESCs is collagenase. When employed, hESC colonies are washed with phosphate-buffered saline and then incubated for 8–10 min with collagenase IV made up in serum-free medium at a concentration of 1 mg cm⁻³. Curled-up colonies can then be dislodged with gentle pipetting using a 5-ml pipette to break large clumps. Alternatively, colonies can be fragmented using glass beads. These are then washed with culture medium to remove the enzyme, which may otherwise impair the attachment and growth of the cells, thus reducing the plating efficiency. hESCs can be washed by allowing the colonies to sediment slowly over 5–10 min, leaving any residual feeder cells in the supernatant, which is removed by aspiration. The colonies are subsequently resuspended in growth medium and are usually plated at a ratio of between 1 : 3 and 1 : 6. Alternatively, fragmented colonies can be frozen as described in Section 2.5.10 and stored for later use.

2.6.6 Mechanical subculture of hESCs

An alternative to the enzymatic method of subculturing hESCs is to manually cut colonies into appropriately sized fragments using a fine-bore needle or a specially designed cutter such as the STEMPRO® EZPassage™ disposable stem cell passaging tool from Invitrogen. To do this, the dish of hESCs is placed under a dissecting microscope in a tissue culture hood. Undifferentiated colonies are identified by their morphology and then cut into grids (see Fig. 2.11) by scoring across and perpendicular to the first cut. Using a 1-ml pipette or pastette, the cut segments are transferred to dishes containing fresh feeders and culture medium. The colony fragments are placed evenly across the feeders (see Fig. 2.12) to avoid the colonies clumping together and attaching to the dish as one mass of cells. The dishes are then carefully transferred to a tissue culture incubator and left undisturbed for 1 day before replacing the spent medium with fresh. Established colonies are then fed every day until subcultured.

Fig. 2.11 Mechanically harvested hESCs.

Fig. 2.12 Plating of hESCs onto feeder layer.

2.6.7 Feeder-free culture of hESCs

Although culture of hESCs on feeders has been extensively used, there have been concerns over this procedure when stem cells are being considered for clinical use in humans. One of the main drawbacks of using feeders is the concern over potential transmission of animal pathogens to humans and the possibility of expression of immunogenic antigens. Feeders are also inconvenient, expensive, and time-consuming to generate and inactivate. As a result of these limitations, there has been a drive towards developing a feeder-free culture system using feeder-conditioned media or media supplemented with different growth factors and other signalling molecules essential for sustaining growth. The conditioned medium can be generated by incubating normal growth medium with feeder cells for 24 h before use. Feeder-free culture of hESCs is often carried out on tissue culture plastics coated with Matrigel, a substrate derived from mouse tumour and rich in extracellular matrix proteins such as laminin, collagen and heparan sulphate proteoglycan. It is also rich in growth factors such as basic fibroblast growth factor (bFGF), which can help to sustain and promote stem cell growth whilst maintaining the cells in an undifferentiated state. Practically, dishes are coated with 5% Matrigel made up in culture medium. Just prior to use, the Matrigel is removed and replaced with culture medium before plating cells. The hESCs, subcultured from feeders or obtained from frozen stocks, are resuspended in conditioned medium, often supplemented with bFGF at a concentration of 4 ng ml⁻¹, before seeding. Alternatively, normal growth medium can be used but this will require a much higher concentration of bFGF, usually around 100 ng ml⁻¹. Once established, hESCs are fed every day with fresh growth medium. Colonies on Matrigel tend to show a different morphology to those on feeders; they tend to be larger and initially less tightly packed.

Practical hints and tips in using Matrigel

All work with Matrigel, other than plating of the hESCs, should be carried out at 4 °C. Thus, when coating tissue culture plastics with Matrigel, all the plates and pipette tips should be kept on ice and used cold to prevent the Matrigel solidifying. Stock Matrigel is usually in the solid form and should be placed on ice or in the fridge at 4 °C overnight until it liquefies. Once liquefied, the Matrigel should be diluted in ice-cold culture medium to a final concentration of 5%. Each plate should have a smooth even layer of Matrigel and if this is not the case, the plates should be incubated at 4 °C until the Matrigel liquefies and settles as a uniform layer. Once coated, Matrigel plates should be used within 7 days of preparation.
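The dilutions mentioned in this section follow the usual C1V1 = C2V2 relationship and can be sketched as below. The function names, the working volumes and the bFGF stock concentration are hypothetical; only the 5% Matrigel and 4 ng ml⁻¹ bFGF targets come from the text.

```python
def stock_volume_for_percent(final_volume: float, final_percent: float) -> float:
    """Volume of undiluted stock needed for a given final % (v/v); same units as final_volume."""
    if not 0 <= final_percent <= 100:
        raise ValueError("percentage must be between 0 and 100")
    return final_volume * final_percent / 100.0


def stock_volume_for_concentration(final_volume_ml: float,
                                   final_ng_per_ml: float,
                                   stock_ng_per_ml: float) -> float:
    """Volume of stock (ml) giving the final concentration, from C1V1 = C2V2."""
    if stock_ng_per_ml <= 0 or final_ng_per_ml > stock_ng_per_ml:
        raise ValueError("stock must be positive and more concentrated than the target")
    return final_volume_ml * final_ng_per_ml / stock_ng_per_ml


# 10 ml of 5% Matrigel in ice-cold medium (10 ml is a hypothetical working volume)
print(stock_volume_for_percent(10, 5))
# 50 ml of conditioned medium at 4 ng/ml bFGF from a hypothetical 10 000 ng/ml stock
print(stock_volume_for_concentration(50, 4, 10_000))
```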

2.7 BACTERIAL CELL CULTURE

As with animal cells, pure bacterial cultures (cultures that contain only one species of organism) are cultivated routinely and maintained indefinitely using standard sterile techniques that are now well defined. However, since bacterial cells exhibit a much wider degree of diversity in terms of both their nutritional and environmental requirements, conditions for their cultivation are diverse and the precise requirements highly dependent on the species being cultivated. Outlined below are general procedures and precautions adopted in bacterial cell culture.

2.7.1 Safety considerations for bacterial cell culture

Culture of microbial cells, like that involving cells of animal origin, requires care and sterile techniques, not least to prevent accidental contamination of pure cultures with other organisms. More importantly, utmost care should be given towards protecting the operator, especially from potentially harmful organisms. Aseptic techniques and safety conditions described for animal cell culture should be adopted at all times. Additionally, instruments used during the culturing procedures should be sterilised before and after use by heating in a Bunsen burner flame. Moreover, to avoid spread of bacteria, areas of work must be decontaminated after use using germicidal sprays and/or ultraviolet radiation. This is to prevent airborne bacteria from spreading rapidly. In line with these precautions, all materials used in microbial cell culture work must be disposed of appropriately; for instance, autoclaving of all plastics and tissue culture waste before disposal is usually essential.

2.7.2 Nutritional requirements of bacteria

The growth of bacteria requires much simpler conditions than those described for animal cells. However, due to their diversity, the composition of the medium used may be variable and is largely determined by the nutritional classification of the organisms to be cultured. These generally fall into two main categories classified as either autotrophs (self-feeding organisms that synthesise their own food, for example sugars made using light energy from the sun) or heterotrophs (non-self-feeding organisms that derive chemical energy by breaking down organic molecules consumed). Each category is in turn subdivided into chemo- and photo- subgroups, giving chemoautotrophs, photoautotrophs, chemoheterotrophs and photoheterotrophs. Both chemo- and photoautotrophs rely on carbon dioxide as a source of carbon but derive energy from completely different sources, with the chemoautotrophs utilising inorganic substances whilst the photoautotrophs use light. Chemoheterotrophs and photoheterotrophs both use organic compounds as the main source of carbon, with the photoheterotrophs using light for energy and the chemo subgroup getting their energy from the metabolism of organic substances.
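The four nutritional groups can be summarised as a lookup keyed on energy source and carbon source. This is a minimal sketch of the classification as described in the text; the dictionary, the function name and the exact wording of the keys are illustrative choices.

```python
# Keys are (energy source, carbon source); values are the nutritional class.
NUTRITIONAL_CLASSES = {
    ("light", "carbon dioxide"): "photoautotroph",
    ("inorganic substances", "carbon dioxide"): "chemoautotroph",
    ("light", "organic compounds"): "photoheterotroph",
    ("organic substances", "organic compounds"): "chemoheterotroph",
}


def classify(energy_source: str, carbon_source: str) -> str:
    """Return the nutritional class for a given energy source and carbon source."""
    key = (energy_source.strip().lower(), carbon_source.strip().lower())
    try:
        return NUTRITIONAL_CLASSES[key]
    except KeyError:
        raise ValueError(f"combination not covered by the classification: {key}") from None


print(classify("light", "carbon dioxide"))
print(classify("organic substances", "organic compounds"))
```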

2.7.3 Culture media for bacterial cell culture

Several different types of medium are used to culture bacteria and these can be categorised as either complex or defined. The former usually consist of natural substances, including meat and yeast extract, and as a result are less well defined, since their precise composition is largely unknown. Such media are, however, rich in nutrients and therefore generally suitable for culturing fastidious organisms that require a mixture of nutrients for growth. Defined media, by contrast, are relatively simple. These are usually designed to the specific needs of the bacterial species to be cultivated and as a result are made up of known components put together in the required amounts. This flexibility is usually exploited to select or eliminate certain species by taking advantage of their distinguishing nutritional requirements. For instance, bile salts may be included in media when selective cultivation of enteric bacteria (rod-shaped Gram-negative bacteria such as Salmonella or Shigella) is required, since growth of most other Gram-positive and Gram-negative bacteria will be inhibited.

2.7.4 Culture procedures for bacterial cells

Bacteria can be cultured in the laboratory using either liquid or solid media. Liquid media are normally dispensed into flasks and inoculated with an aliquot of the organism to be grown. This is then agitated continuously on a shaker that rotates in an orbital manner, mixing and ensuring that cultures are kept in suspension. For such cultures, sufficient space should be allowed above the medium to facilitate adequate diffusion of oxygen into the solution. Thus, as a rule of thumb, the volume of medium added to the flasks should not exceed 20% of the total volume of the flask. This is particularly important for aerobic bacteria and less so for anaerobic microorganisms. In large-scale culture, fermenters or bioreactors equipped with stirring devices for improved mixing and gas exchange may be used. The device (Fig. 2.13) is usually fitted with probes that monitor changes in pH, oxygen concentration and temperature. In addition, most systems are surrounded by a water jacket with fast-flowing cold water to remove the heat generated during fermentation. Outlets are also included to release CO2 and other gases produced by cell metabolism. When fermenters are used, precautions should be taken to reduce potential contamination with airborne microorganisms when air is bubbled through the cultures. Sterilisation of the air may therefore be necessary and can be achieved by introducing a filter (pore size of approximately 0.2 µm) at the point of entry of the air flow into the chamber. Solid medium is usually prepared by solidifying the selected medium with 1–2% of the seaweed extract agar, which, although organic, is not degraded by most microbes, thereby providing an inert gelling medium on which bacteria can grow. Solid agar media are widely used to separate mixed cultures and form the basis for isolation of pure cultures of bacteria.
This is achieved by streaking diluted cultures of bacteria onto the surface of an agar plate by using a sterile inoculating loop. Cells streaked across the plate will eventually grow into a colony, each colony being the product of a single cell and thus of a single species. Once isolated, cells can be cultivated either in batch or continuous cultures. Of these, batch cultures are the most commonly used for routine liquid growth and entail inoculating an aliquot of cells into a sterile flask containing a finite amount of medium. Such systems are referred to as closed, since nutrient supply is limited to that provided at the start of culture. Under these conditions, growth will continue until the medium is depleted of nutrients or there is an excessive build-up of toxic waste products generated by the microbes. Thus, in this system, the cellular composition and physiological status of the cells will vary throughout the growth cycle. In continuous cultures (also referred to as open systems) the medium is refreshed regularly to replace that spent by the cells. The objective of this system is to maintain the cells in the exponential growth phase by enabling nutrients, biomass and waste products to be controlled through varying the dilution rate of the cultures. Continuous cultures, although more complex to set up, offer certain advantages over batch cultures in that they facilitate growth under steady-state conditions in which there is tight coupling between cell division and biosynthesis. As a result, the physiological status of the cultures is more clearly defined, with very little variation in the cellular composition of the cells during the growth cycle. The main concern with the open system is the high risk of contamination associated with the dilution of the cultures. However, applying strict aseptic techniques during feeding or harvesting cells may help to reduce the risk of such contaminations. In addition, the whole system can be automated by connecting the culture vessels to their reservoirs through solenoid valves that can be triggered to open when required. This minimises direct contact with the operator or outside environment and thus reduces the risk of contamination.

Fig. 2.13 Schematic representation of a fermenter. Labels: pressure release valve, nutrients, temperature and pH monitors, sample tube, motor, sterile air, cooling water, water jacket, stirrer, tap, products.
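Two of the quantities mentioned in this section lend themselves to a short calculation: the 20% rule of thumb for filling shake flasks and the dilution rate of a continuous culture. The sketch below assumes the usual definition of dilution rate as medium flow rate divided by culture volume, which is not spelled out in the text; the function names and example values are hypothetical.

```python
def max_medium_volume_cm3(flask_volume_cm3: float, max_fraction: float = 0.20) -> float:
    """Largest medium volume for a shake flask under the 20% rule of thumb."""
    if flask_volume_cm3 <= 0 or not 0 < max_fraction <= 1:
        raise ValueError("flask volume must be positive and fraction between 0 and 1")
    return flask_volume_cm3 * max_fraction


def dilution_rate_per_h(flow_rate_cm3_per_h: float, culture_volume_cm3: float) -> float:
    """Dilution rate D = F / V (per hour), assuming the conventional definition."""
    if culture_volume_cm3 <= 0 or flow_rate_cm3_per_h < 0:
        raise ValueError("culture volume must be positive and flow rate non-negative")
    return flow_rate_cm3_per_h / culture_volume_cm3


# Hypothetical 250 cm^3 flask
print(max_medium_volume_cm3(250))
# Hypothetical continuous culture: 100 cm^3 of fresh medium per hour into 1000 cm^3
print(dilution_rate_per_h(100, 1000))
```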

2.7.5 Determination of growth of bacterial cultures

Several methods are available for determining the growth of bacterial cells in culture, including directly counting cells using a haemocytometer as described (Section 2.5.6). This is, however, suitable only for cells in suspension. When cells are grown on solid agar plates, colony counting can be used instead to estimate growth. This method assumes that each colony is derived from a single cell, which may not always be the case, since errors in dilution and/or streaking may result in clumps rather than single cells producing colonies. In addition, suboptimal culture conditions may cause poor growth, thus leading to an underestimation of the true cell count. When cells are grown in suspension, changes in the turbidity of the growth medium can be determined using a spectrophotometer and the absorbance value converted to cell number using a standard curve of absorbance versus cell number. This should be constructed for each cell type by taking the readings of a series of known numbers of cells in suspension (see also Section 12.4.1).
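A standard curve of this kind can be sketched as a least-squares line through the readings, which is then inverted to estimate cell number from a new absorbance. The straight-line fit is an assumption of this sketch (turbidity is generally only proportional to cell number over a limited range), and the standards below are hypothetical values, not measured data.

```python
def fit_standard_curve(cell_numbers, absorbances):
    """Least-squares line: absorbance = slope * cell_number + intercept."""
    n = len(cell_numbers)
    if n < 2 or n != len(absorbances):
        raise ValueError("need at least two paired readings")
    mean_x = sum(cell_numbers) / n
    mean_y = sum(absorbances) / n
    sxx = sum((x - mean_x) ** 2 for x in cell_numbers)
    if sxx == 0:
        raise ValueError("standards must cover more than one cell number")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(cell_numbers, absorbances))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def cells_from_absorbance(absorbance, slope, intercept):
    """Invert the fitted line to estimate cell number from an absorbance reading."""
    return (absorbance - intercept) / slope


# Hypothetical standards: known cell numbers (cells cm^-3) and their absorbance readings
standard_cells = [1e8, 2e8, 4e8, 8e8]
standard_absorbance = [0.10, 0.20, 0.40, 0.80]

slope, intercept = fit_standard_curve(standard_cells, standard_absorbance)
estimate = cells_from_absorbance(0.30, slope, intercept)
print(f"{estimate:.3g} cells per cm^3")
```

Readings outside the range spanned by the standards should be treated with caution, since the fitted line says nothing about behaviour beyond it.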

2.8 POTENTIAL USE OF CELL CULTURES

Cell cultures of various sorts from animals and microbes are increasingly exploited not only by scientists for studying the activity of cells in isolation, but also by various biotechnology and pharmaceutical companies for the production of valuable biological products including viral vaccines (e.g. polio vaccine), antibodies (e.g. OKT3 used in suppressing immunological organ rejection in transplant surgery) and various recombinant proteins. The application of recombinant DNA techniques has led to an ever-expanding list of improved products, both from mammalian and bacterial cells, for therapeutic use in humans. These products include factor VIII for haemophilia, insulin for diabetes, interferon-α and -β for anticancer chemotherapy and erythropoietin for anaemia. Bacterial cultures have also been widely used for other industrial purposes including the large-scale production of cell proteins, growth regulators, organic acids, alcohols, solvents, sterols, surfactants, vitamins, amino acids and many more products. In addition, degradation of waste products, particularly those from the agricultural and food industries, is another important industrial application of microbial cells. They are also exploited in the bioconversion of waste to useful end products, and in toxicological studies where some of these organisms are rapidly replacing animals in preliminary toxicological testing of xenobiotics. The advent of stem cell culture now provides the possibility of treating diseases using cell-based therapy. This would be particularly important in regenerating diseased or damaged tissues by transplanting stem cells programmed to differentiate into a specific cell type specialised in carrying out a specific function.

ACKNOWLEDGEMENTS Images courtesy of Lesley Young and Paula M Timmons, UK Stem Cell Bank, NIBSC, United Kingdom. Thanks also to Lyn Healy, UK Stem Cell Bank, NIBSC, United Kingdom for valuable comments and advice on stem cell culture.


2.9 SUGGESTIONS FOR FURTHER READING Ball, A. S. (1997). Bacterial Cell Culture: Essential Data. John Wiley & Sons, Inc., New York (Gives an adequate background into bacterial cell culture and techniques.) Davis, J. M. (2002). Basic Cell Culture: A Practical Approach, 2nd edn. Oxford University Press, Oxford. (A comprehensive coverage of basic cell culture techniques.) Freshney, R. I. (2005). Culture of Animal Cells: A Manual of Basic Technique, (5th edition). John Wiley & Sons, Inc., New York. (A comprehensive coverage of animal cell culture techniques and applications.) Furr, A. K. (ed.) (2001). CRC Handbook of Laboratory Safety, 5th edn. CRC Press, Boca Raton, FL. (A complete guide to laboratory safety.) HSC advisory committee on dangerous pathogens (2001). The Management Design and Operation of Microbiological Containment Laboratories. HSE books, Sudbury. (Provides guidance, legal requirements and detailed technical information on the design, management and operation of containment laboratories.) Parekh, S. R. and Vinci, V. A. (2003). Handbook of Industrial Cell Culture: Mammalian, Microbial, and Plant Cells. Humana Press, Totowa, NJ. (Provides a good coverage of state-of-the-art techniques for industrial screening, cultivation and scale-up of mammalian, microbial, and plant cells.).

3 Centrifugation K. OHLENDIECK

3.1 Introduction
3.2 Basic principles of sedimentation
3.3 Types, care and safety aspects of centrifuges
3.4 Preparative centrifugation
3.5 Analytical centrifugation
3.6 Suggestions for further reading

3.1 INTRODUCTION Biological centrifugation is a process that uses centrifugal force to separate and purify mixtures of biological particles in a liquid medium. It is a key technique for isolating and analysing cells, subcellular fractions, supramolecular complexes and isolated macromolecules such as proteins or nucleic acids. The development of the first analytical ultracentrifuge by Svedberg in the late 1920s and the technical refinement of the preparative centrifugation technique by Claude and colleagues in the 1940s positioned centrifugation technology at the centre of biological and biomedical research for many decades. Today, centrifugation techniques represent a critical tool for modern biochemistry and are employed in almost all invasive subcellular studies.

While analytical centrifugation is mainly concerned with the study of purified macromolecules or isolated supramolecular assemblies, preparative centrifugation methodology is devoted to the actual separation of tissues, cells, subcellular structures, membrane vesicles and other particles of biochemical interest. Most undergraduate students will be exposed to preparative centrifugation protocols during practical classes and might also experience a demonstration of analytical centrifugation techniques. This chapter is accordingly divided into a short introduction to the theoretical background of sedimentation, an overview of practical aspects of using centrifuges in the biochemical laboratory, an outline of preparative centrifugation and a description of the usefulness of ultracentrifugation techniques in the biochemical characterisation of macromolecules. To aid in the understanding of the basic principles of centrifugation, the general design of various rotors and separation processes is diagrammatically represented.

Often the learning process of undergraduate students is hampered by the lack of a proper linkage between theoretical knowledge and practical applications. To overcome this problem, the description of preparative centrifugation techniques is accompanied by an explanatory flow chart and the detailed discussion of the subcellular fractionation protocol of a specific tissue preparation. Taking the isolation of fractions from skeletal muscle homogenates as an example, the rationale behind individual preparative steps is explained. Since affinity isolation methods not only represent an extremely powerful tool in purifying biomolecules (see Chapter 11), but can also be utilised to separate intact organelles and membrane vesicles by centrifugation, lectin affinity agglutination of highly purified plasmalemma vesicles from skeletal muscle is described.

Traditionally, marker enzyme activities are used to determine the overall yield and enrichment of particular structures within subcellular fractions following centrifugation. As an example, the distribution of key enzyme activities in mitochondrial subfractions from liver is given. However, most modern fractionation procedures are evaluated by more convenient methods, such as protein gel analysis in conjunction with immunoblot analysis. Miniature gel and blotting equipment can produce highly reliable results within a few hours, making it an ideal analytical tool for high-throughput testing. Since electrophoretic techniques are introduced in Chapter 10 and are used routinely in biochemical laboratories, the protein gel analysis of the distribution of typical marker proteins in affinity isolated plasmalemma fractions is graphically represented and discussed.

Although monomeric peptides and proteins are capable of performing complex biochemical reactions, many physiologically important elements do not exist in isolation under native conditions. Therefore, if one considers individual proteins as the basic units of the proteome (see Chapter 8), protein complexes actually form the functional units of cell biology.
This gives investigations into the supramolecular structure of protein complexes a central place in biochemical research. To illustrate this point, the sedimentation analysis of a high-molecular-mass membrane assembly, the dystrophin–glycoprotein complex of skeletal muscle, is shown and the use of sucrose gradient centrifugation explained.

3.2 BASIC PRINCIPLES OF SEDIMENTATION From everyday experience, the effect of sedimentation due to the influence of the Earth's gravitational field (g = 981 cm s⁻²) versus the increased rate of sedimentation in a centrifugal field (G > 981 cm s⁻²) is apparent. To give a simple but illustrative example, crude sand particles added to a bucket of water travel slowly to the bottom of the bucket by gravitation, but sediment much faster when the bucket is swung around in a circle. Similarly, biological structures exhibit a drastic increase in sedimentation when they undergo acceleration in a centrifugal field. The relative centrifugal field is usually expressed as a multiple of the acceleration due to gravity. Below is a short description of equations used in practical centrifugation classes. When designing a centrifugation protocol, it is important to keep in mind that:

• the denser a biological structure is, the faster it sediments in a centrifugal field;
• the more massive a biological particle is, the faster it moves in a centrifugal field;
• the denser the biological buffer system is, the slower the particle will move in a centrifugal field;
• the greater the frictional coefficient is, the slower a particle will move;
• the greater the centrifugal force is, the faster the particle sediments;
• the sedimentation rate of a given particle will be zero when the density of the particle and the surrounding medium are equal.

Biological particles moving through a viscous medium experience a frictional drag, whereby the frictional force acts in the opposite direction to sedimentation and equals the velocity of the particle multiplied by the frictional coefficient. The frictional coefficient depends on the size and shape of the biological particle. As the sample moves towards the bottom of a centrifuge tube in swing-out or fixed-angle rotors, its velocity will increase due to the increase in radial distance. At the same time the particles also encounter a frictional drag that is proportional to their velocity.

From equation (3.1) for the calculation of the centrifugal field it becomes apparent that when the conditions for the centrifugal separation of a biological particle are described, a detailed listing of rotor speed, radial dimensions and duration of centrifugation has to be provided. Essentially, the rate of sedimentation is dependent upon the applied centrifugal field, $G$ (in cm s⁻²), which is determined by the radial distance, $r$, of the particle from the axis of rotation (in cm) and the square of the angular velocity, $\omega$, of the rotor (in radians per second):

$$G = \omega^{2} r \qquad (3.1)$$

The average angular velocity of a rigid body that rotates about a fixed axis is defined as the ratio of the angular displacement to the time interval in which it occurs. One radian, usually abbreviated as 1 rad, represents the angle subtended at the centre of a circle by an arc with a length equal to the radius of the circle. Since 360° equals $2\pi$ radians, one revolution of the rotor can be expressed as $2\pi$ rad. Accordingly, the angular velocity of the rotor in rad s⁻¹ can be expressed in terms of the rotor speed $s$ (in revolutions per minute) as:

$$\omega = \frac{2\pi s}{60} \qquad (3.2)$$

Example 1 CALCULATION OF CENTRIFUGAL FIELD
Question What is the applied centrifugal field at a point equivalent to 5 cm from the centre of rotation and an angular velocity of 3000 rad s⁻¹?
Answer The centrifugal field, $G$, at a point 5 cm from the centre of rotation may be calculated using the equation $G = \omega^{2} r = (3000)^{2} \times 5$ cm s⁻² $= 4.5 \times 10^{7}$ cm s⁻²
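The arithmetic of Example 1 can be checked with a short Python sketch. The function name is chosen for this illustration only; the units follow the text (rad s⁻¹ and cm).

```python
def centrifugal_field(omega_rad_per_s, radius_cm):
    """Applied centrifugal field G = omega^2 * r, in cm s^-2 (equation 3.1)."""
    return omega_rad_per_s ** 2 * radius_cm


# Values from Example 1: 3000 rad s^-1 at 5 cm from the centre of rotation.
G = centrifugal_field(3000.0, 5.0)
print(f"G = {G:.2e} cm s^-2")

# The same field as a multiple of the gravitational field g = 981 cm s^-2.
print(f"G / g = {G / 981.0:.0f}")
```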


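Substituting equation (3.2) into equation (3.1), the centrifugal field can therefore be expressed as:

$$G = \frac{4\pi^{2}\,(\text{rev min}^{-1})^{2}\, r}{3600} = \frac{4\pi^{2} s^{2} r}{3600} \qquad (3.3)$$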

Example 2 CALCULATION OF ANGULAR VELOCITY
Question For the pelleting of the microsomal fraction from a liver homogenate, an ultracentrifuge is operated at a speed of 40 000 r.p.m. Calculate the angular velocity, $\omega$, in radians per second.
Answer The angular velocity, $\omega$, may be calculated using the equation:

$$\omega = \frac{2\pi\,(\text{rev min}^{-1})}{60}$$

$\omega = 2 \times 3.1416 \times 40\,000/60$ rad s⁻¹ $= 4188.8$ rad s⁻¹
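As a cross-check of Example 2, the conversion from rotor speed to angular velocity can be written as a small Python sketch. The function names and the 5 cm radius are illustrative assumptions; the sketch also confirms that equations (3.1) and (3.3) give the same field.

```python
import math


def angular_velocity(rev_per_min):
    """Angular velocity in rad s^-1 from rotor speed in r.p.m. (equation 3.2)."""
    return 2.0 * math.pi * rev_per_min / 60.0


def field_from_rpm(rev_per_min, radius_cm):
    """Centrifugal field in cm s^-2 directly from r.p.m. (equation 3.3)."""
    return 4.0 * math.pi ** 2 * rev_per_min ** 2 * radius_cm / 3600.0


omega = angular_velocity(40_000)
print(f"omega = {omega:.1f} rad s^-1")

# Equations (3.1) and (3.3) should agree for any radius; 5 cm is arbitrary.
radius_cm = 5.0
g_from_omega = omega ** 2 * radius_cm
g_from_rpm = field_from_rpm(40_000, radius_cm)
print(f"G = {g_from_omega:.3e} cm s^-2 (3.1), {g_from_rpm:.3e} cm s^-2 (3.3)")
```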

The centrifugal field is generally expressed in multiples of the gravitational field, $g$ (981 cm s⁻²). The relative centrifugal field, RCF, which is the ratio of the centrifugal acceleration at a specified radius and speed to the standard acceleration of gravity, can be calculated from the following equation:

$$\mathrm{RCF} = \frac{4\pi^{2}\,(\text{rev min}^{-1})^{2}\, r}{3600 \times 981} = \frac{G}{g} \qquad (3.4)$$

RCF units are therefore dimensionless (denoting multiples of $g$) and revolutions per minute are usually abbreviated as r.p.m.: $\mathrm{RCF} = 1.12 \times 10^{-5}\,(\text{r.p.m.})^{2}\, r$, with $r$ in cm. Although the relative centrifugal force can easily be calculated, centrifugation manuals usually contain a nomograph for the convenient conversion between relative centrifugal force and speed of the centrifuge at different radial distances, measured from the centrifuge spindle to a point along the centrifuge tube. A nomograph consists of three columns representing the radial distance (in mm), the relative centrifugal field and the rotor speed (in r.p.m.). For the conversion between relative centrifugal force and speed of the centrifuge spindle in r.p.m. at different radii, a straight-edge is aligned through known values in two columns, then the desired figure is read where the straight-edge intersects the third column. See Figure 3.1 for an illustration of the usage of a nomograph.

In a suspension of biological particles, the rate of sedimentation is dependent not only upon the applied centrifugal field, but also on the nature of the particle, i.e. its density and radius, and also the viscosity of the surrounding medium. Stokes' Law describes these relationships for the sedimentation of a rigid spherical particle:

$$\nu = \frac{2}{9}\,\frac{r^{2}\,(\rho_p - \rho_m)}{\eta}\; g \qquad (3.5)$$

[Figure 3.1: nomograph with three scales: radial distance (mm), relative centrifugal field and rotor speed (r.p.m.).]

Fig. 3.1 Nomograph for the determination of the relative centrifugal field for a given rotor speed and radius. The three columns represent the radial distance (in mm), the relative centrifugal field and the rotor speed (in r.p.m.). For the conversion between relative centrifugal force and speed of the centrifuge spindle in revolutions per minute at different radii, draw a straight-edge through known values in two columns. The desired figure can then be read where the straight-edge intersects the third column. (Courtesy of Beckman-Coulter.)
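The conversion that the nomograph performs graphically follows from equation (3.4), so it can also be sketched in Python. The function names and the example values (80 mm, 10 000 r.p.m.) are assumptions made for illustration; the radius is taken in mm, as on the nomograph, and converted to cm internally.

```python
import math

G_STANDARD = 981.0  # gravitational field in cm s^-2, as used in the text


def rcf_from_rpm(rpm, radius_mm):
    """Relative centrifugal field (multiples of g) from rotor speed and radius."""
    radius_cm = radius_mm / 10.0
    return 4.0 * math.pi ** 2 * rpm ** 2 * radius_cm / (3600.0 * G_STANDARD)


def rpm_from_rcf(rcf, radius_mm):
    """Rotor speed in r.p.m. needed to reach a given RCF at a given radius."""
    radius_cm = radius_mm / 10.0
    return math.sqrt(rcf * 3600.0 * G_STANDARD / (4.0 * math.pi ** 2 * radius_cm))


# Illustrative values only: a radial distance of 80 mm at 10 000 r.p.m.
rcf = rcf_from_rpm(10_000, 80.0)
print(f"RCF at 80 mm and 10 000 r.p.m.: {rcf:.0f} g")

# Reading the nomograph in the other direction recovers the rotor speed.
print(f"Speed for that RCF at 80 mm: {rpm_from_rcf(rcf, 80.0):.0f} r.p.m.")
```

Because the unrounded constant $4\pi^{2}/(3600 \times 981)$ is used, results differ slightly from those obtained with the rounded factor $1.12 \times 10^{-5}$.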


In equation (3.5), $\nu$ is the sedimentation rate of the sphere, 2/9 is the shape factor constant for a sphere, $r$ is the radius of the particle, $\rho_p$ is the density of the particle, $\rho_m$ is the density of the medium, $g$ is the gravitational acceleration and $\eta$ is the viscosity of the medium.
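Stokes' Law can be explored numerically with the following Python sketch. The function name, the particle dimensions and the densities are hypothetical, and c.g.s. units are assumed throughout (cm, g cm⁻³, poise, cm s⁻²) to match the value of $g$ used in the text; the viscosity of 0.01 poise is roughly that of water at 20 °C.

```python
def stokes_sedimentation_rate(radius_cm, rho_particle, rho_medium,
                              viscosity_poise, acceleration=981.0):
    """Sedimentation rate (cm s^-1) of a rigid sphere, equation (3.5).

    acceleration defaults to g = 981 cm s^-2; pass omega**2 * r instead
    to estimate the rate in a centrifugal field.
    """
    return ((2.0 / 9.0) * radius_cm ** 2 * (rho_particle - rho_medium)
            * acceleration / viscosity_poise)


# Hypothetical particle: radius 0.5 micrometre (5e-5 cm), density 1.10 g cm^-3,
# in a medium of density 1.00 g cm^-3 and viscosity 0.01 poise.
rate = stokes_sedimentation_rate(5e-5, 1.10, 1.00, 0.01)
print(f"Sedimentation rate at 1 g: {rate:.2e} cm s^-1")

# Doubling the radius increases the rate four-fold (rate is proportional to r^2).
print(stokes_sedimentation_rate(1e-4, 1.10, 1.00, 0.01) / rate)

# Equal densities give a zero rate, as stated in the list of principles.
print(stokes_sedimentation_rate(5e-5, 1.00, 1.00, 0.01))
```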

Example 3 CALCULATION OF RELATIVE CENTRIFUGAL FIELD
Question A fixed-angle rotor exhibits a minimum radius, $r_{min}$, at the top of the centrifuge tube of 3.5 cm, and a maximum radius, $r_{max}$, at the bottom of the tube of 7.0 cm. See Fig. 3.2a for a cross-sectional diagram of a fixed-angle rotor illustrating the position of the minimum and maximum radius. If the rotor is operated at a speed of 20 000 r.p.m., what is the relative centrifugal field, RCF, at the top and bottom of the centrifuge tube?
Answer The relative centrifugal field may be calculated using the equation: $\mathrm{RCF} = 1.12 \times 10^{-5}\,(\text{r.p.m.})^{2}\, r$
Top of centrifuge tube: $\mathrm{RCF} = 1.12 \times 10^{-5} \times (20\,000)^{2} \times 3.5 = 15\,680$
Bottom of centrifuge tube: $\mathrm{RCF} = 1.12 \times 10^{-5} \times (20\,000)^{2} \times 7.0 = 31\,360$
This calculation illustrates that with fixed-angle rotors the centrifugal field at the top and bottom of the centrifuge tube might differ considerably, in this case exactly two-fold.

Accordingly a mixture of biological particles exhibiting an approximately spherical shape can be separated in a centrifugal field based on their density and/or their size. The time of sedimentation (in seconds) for a spherical particle is:

$$t = \frac{9}{2}\,\frac{\eta}{\omega^{2}\, r_p^{2}\,(\rho_p - \rho_m)}\,\ln\frac{r_b}{r_t} \qquad (3.6)$$

where $t$ is the sedimentation time, $\eta$ is the viscosity of the medium, $r_p$ is the radius of the particle, $r_b$ is the radial distance from the centre of rotation to the bottom of the tube, $r_t$ is the radial distance from the centre of rotation to the liquid meniscus, $\rho_p$ is the density of the particle, $\rho_m$ is the density of the medium and $\omega$ is the angular velocity of the rotor. The sedimentation rate or velocity of a biological particle can also be expressed as its sedimentation coefficient ($s$), whereby:

$$s = \frac{\nu}{\omega^{2} r} \qquad (3.7)$$
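Equations (3.5) to (3.7) are linked by one short integration. The following is a sketch of that step, assuming a rigid sphere of radius $r_p$, a medium of uniform density and viscosity along the tube (that is, no gradient), and that the gravitational acceleration $g$ in Stokes' Law is replaced by the centrifugal field $\omega^{2} r$:

```latex
\begin{align*}
\nu = \frac{dr}{dt}
  &= \frac{2}{9}\,\frac{r_p^{2}\,(\rho_p - \rho_m)}{\eta}\;\omega^{2} r \\[4pt]
\int_{r_t}^{r_b} \frac{dr}{r}
  &= \frac{2}{9}\,\frac{r_p^{2}\,(\rho_p - \rho_m)\,\omega^{2}}{\eta}
     \int_{0}^{t} dt' \\[4pt]
t &= \frac{9}{2}\,\frac{\eta}{\omega^{2}\, r_p^{2}\,(\rho_p - \rho_m)}\,
     \ln\frac{r_b}{r_t} \\[4pt]
s = \frac{\nu}{\omega^{2} r}
  &= \frac{2}{9}\,\frac{r_p^{2}\,(\rho_p - \rho_m)}{\eta}
  \quad\Longrightarrow\quad
  t = \frac{1}{\omega^{2} s}\,\ln\frac{r_b}{r_t}
\end{align*}
```

Under these assumptions the sedimentation coefficient depends only on the particle and the medium, not on the rotor speed or the radial position, which is why it is a convenient way to characterise a particle.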

Since the sedimentation rate per unit centrifugal field can be determined at different temperatures and with various media, experimental values of the sedimentation coefficient are corrected to a sedimentation constant theoretically obtainable in water at 20 °C, yielding the $s_{20,w}$ value. The sedimentation coefficients of biological macromolecules are relatively small, and are usually expressed (see Section 3.5) as Svedberg units, S. One Svedberg unit equals $10^{-13}$ s.
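Equations (3.6) and (3.7) and the Svedberg unit can be combined in a short Python sketch. The function names and the particle are hypothetical (a sphere of radius 50 nm and density 1.20 g cm⁻³ in a medium of density 1.00 g cm⁻³ and viscosity 0.01 poise), c.g.s. units are assumed, and no correction to standard conditions is attempted.

```python
import math

SVEDBERG = 1e-13  # one Svedberg unit in seconds


def sedimentation_time(viscosity_poise, omega, radius_cm,
                       rho_particle, rho_medium, r_top_cm, r_bottom_cm):
    """Time in seconds for a sphere to travel from r_top to r_bottom (eq. 3.6)."""
    return (4.5 * viscosity_poise
            / (omega ** 2 * radius_cm ** 2 * (rho_particle - rho_medium))
            * math.log(r_bottom_cm / r_top_cm))


def sedimentation_coefficient(velocity_cm_per_s, omega, r_cm):
    """Sedimentation coefficient in seconds (equation 3.7)."""
    return velocity_cm_per_s / (omega ** 2 * r_cm)


def to_svedberg(s_seconds):
    """Express a sedimentation coefficient in Svedberg units."""
    return s_seconds / SVEDBERG


# Hypothetical run: 40 000 r.p.m., meniscus at 3.5 cm, tube bottom at 7.0 cm.
omega = 2.0 * math.pi * 40_000 / 60.0
t = sedimentation_time(0.01, omega, 5e-6, 1.20, 1.00, 3.5, 7.0)
print(f"Sedimentation time: {t:.0f} s")

# Stokes velocity of the same particle at r = 5 cm, with g replaced by omega^2 r.
r_cm = 5.0
velocity = (2.0 / 9.0) * (5e-6) ** 2 * (1.20 - 1.00) * omega ** 2 * r_cm / 0.01
s = sedimentation_coefficient(velocity, omega, r_cm)
print(f"Sedimentation coefficient: {to_svedberg(s):.0f} S")
```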

3.3 TYPES, CARE AND SAFETY ASPECTS OF CENTRIFUGES 3.3.1 Types of centrifuges Centrifugation techniques take a central position in modern biochemical, cellular and molecular biological studies. Depending on the particular application, centrifuges differ in their overall design and size. However, a common feature in all centrifuges is the central motor that spins a rotor containing the samples to be separated. Particles of biochemical interest are usually suspended in a liquid buffer system contained in specific tubes or separation chambers that are located in specialised rotors. The biological medium is chosen for the specific centrifugal application and may differ considerably between preparative and analytical approaches. As outlined below, the optimum pH value, salt concentration, stabilising cofactors and protective ingredients such as protease inhibitors have to be carefully evaluated in order to preserve biological function. The most obvious differences between centrifuges are:

• the maximum speed at which biological specimens are subjected to increased sedimentation;
• the presence or absence of a vacuum;
• the potential for refrigeration or general manipulation of the temperature during a centrifugation run; and
• the maximum volume of samples and capacity for individual centrifugation tubes.

Many different types of centrifuges are commercially available including:

• large-capacity low-speed preparative centrifuges;
• refrigerated high-speed preparative centrifuges;
• analytical ultracentrifuges;
• preparative ultracentrifuges;
• large-scale clinical centrifuges; and
• small-scale laboratory microfuges.

Some large-volume centrifuge models are quite demanding on space and also generate considerable amounts of heat and noise, and are therefore often centrally positioned in special instrument rooms in biochemistry departments. However, the development of small-capacity bench-top centrifuges for biochemical applications, even in the case of ultracentrifuges, has led to the introduction of these models in many individual research laboratories. The main types of centrifuge encountered by undergraduate students during introductory practicals may be divided into microfuges (so called because they centrifuge small volume samples in Eppendorf tubes), large-capacity preparative centrifuges, high-speed refrigerated centrifuges and ultracentrifuges. Simple bench-top centrifuges vary in design and are mainly used to collect small amounts of biological material, such as blood cells. To prevent denaturation of sensitive protein samples, refrigerated centrifuges should be employed. Modern refrigerated microfuges are equipped with adapters to accommodate standardised plastic tubes for the sedimentation of 0.5 to 1.5 cm³ volumes. They can provide centrifugal fields of approximately 10 000 g and sediment biological samples in minutes, making microfuges an indispensable separation tool for many biochemical methods. Microfuges can also be used to concentrate protein samples. For example, the dilution of protein samples, eluted by column chromatography, can often represent a challenge for subsequent analyses. Accelerated ultrafiltration with the help of plastic tube-associated filter units, spun at low g-forces in a microfuge, can overcome this problem. Depending on the proteins of interest, the biological buffers used and the molecular mass cut-off point of the particular filters, a 10- to 20-fold concentration of samples can be achieved within minutes.

Larger preparative bench-top centrifuges develop maximum centrifugal fields of 3000 to 7000 g and can be used for the spinning of various types of containers. Depending on the range of available adapters, considerable quantities of 5 to 250 cm³ plastic tubes or 96-well ELISA plates can be accommodated. This gives simple and relatively inexpensive bench centrifuges a central place in many high-throughput biochemical assays where the quick and efficient separation of coarse precipitates or whole cells is of importance.

High-speed refrigerated centrifuges are absolutely essential for the sedimentation of protein precipitates, large intact organelles, cellular debris derived from tissue homogenisation and microorganisms. As outlined in Section 3.4, the initial bulk separation of cellular elements prior to preparative ultracentrifugation is performed by these kinds of centrifuges.
They operate at maximum centrifugal fields of approximately 100 000 g. Such centrifugal force is not sufficient to sediment smaller microsomal vesicles or ribosomes, but can be employed to differentially separate nuclei, mitochondria or chloroplasts. In addition, bulky protein aggregates can be sedimented using high-speed refrigerated centrifuges. An example is the contractile apparatus released from muscle fibres by homogenisation, mostly consisting of myosin and actin macromolecules aggregated in filaments. In order to harvest yeast cells or bacteria from large volumes of culture media, high-speed centrifugation may also be used in a continuous flow mode with zonal rotors. This approach therefore does not use centrifuge tubes but a continuous flow of medium. As the medium enters the moving rotor, biological particles are sedimented against the rotor periphery and excess liquid is removed through a special outlet port.

Ultracentrifugation has decisively advanced the detailed biochemical analysis of subcellular structures and isolated biomolecules. Preparative ultracentrifugation can be operated at relative centrifugal fields of up to 900 000 g. In order to minimise excessive rotor temperatures generated by frictional resistance between the spinning rotor and air, the rotor chamber is sealed, evacuated and refrigerated. Depending on the type, age and condition of a particular ultracentrifuge, cooling to the required running temperature and the generation of a stable vacuum might take a considerable amount of time. To avoid delays during biochemical procedures involving ultracentrifugation, the cooling and evacuation system of older centrifuge models should be switched on at least an hour prior to the centrifugation run. On the other hand, modern ultracentrifuges can be started even without a fully established vacuum and will proceed in the evacuation of the rotor chamber during the initial acceleration process.

For safety reasons, heavy armour plating encapsulates the ultracentrifuge to prevent injury to the user in case of uncontrolled rotor movements or dangerous vibrations. A centrifugation run cannot be initiated without proper closing of the chamber system. To prevent unfavourable fluctuations in chamber temperature, excessive vibrations or operation of rotors above their maximum rated speed, newer models of ultracentrifuges contain sophisticated temperature regulation systems, flexible drive shafts and an over-speed control device. Although slight rotor imbalances can be absorbed by modern ultracentrifuges, a more severe imbalance of tubes will cause the centrifuge to switch off automatically. This is especially true for swinging-bucket rotors. The many safety features incorporated into modern ultracentrifuges make them a robust piece of equipment that tolerates a certain degree of misuse by an inexperienced operator (see Sections 3.3.2 and 3.3.4 for a more detailed discussion of safety and centrifugation).

In contrast to preparative ultracentrifuges, analytical ultracentrifuges contain a solid rotor which in its simplest form incorporates one analytical cell and one counterbalancing cell. An optical system enables the sedimenting material to be observed throughout the duration of centrifugation. Using a light absorption system, a Schlieren system or a Rayleigh interferometric system, concentration distributions in the biological sample are determined at any time during ultracentrifugation. The Rayleigh and Schlieren optical systems detect changes in the refractive index of the solution caused by concentration changes and can thus be used for sedimentation equilibrium analysis.
This makes analytical ultracentrifugation a relatively accurate tool for the determination of the molecular mass of an isolated macromolecule. It can also provide crucial information about the thermodynamic properties of a protein or other large biomolecules.

3.3.2 Types of rotors To illustrate the difference in design of fixed-angle rotors, vertical tube rotors and swinging-bucket rotors, Fig. 3.2 outlines cross-sectional diagrams of these three main types of rotors. Companies usually name rotors according to their type of design, the maximum allowable speed and sometimes the material composition. Depending on the use in a simple low-speed centrifuge, a high-speed centrifuge or an ultracentrifuge, different centrifugal forces are encountered by a spinning rotor. Accordingly different types of rotors are made from different materials. Low-speed rotors are usually made of steel or brass, while high-speed rotors consist of aluminium, titanium or fibre-reinforced composites. The exterior of specific rotors might be finished with protective paints. For example, rotors for ultracentrifugation made out of titanium alloy are covered with a polyurethane layer. Aluminium rotors are protected from corrosion by an electrochemically formed tough layer of aluminium oxide. In order to avoid damaging these protective layers, care should be taken during rotor handling.

[Figure 3.2, panels (a) to (c): each panel shows the axis of the rotor, the direction of the centrifugal field and the radii r_min, r_av and r_max; panel (a) also marks the tube angle of 14°–40°.]

Fig. 3.2 Design of the three main types of rotors used in routine biochemical centrifugation techniques. Shown is a cross-sectional diagram of a fixed-angle rotor (a), a vertical tube rotor (b), and a swinging-bucket rotor (c). A fourth type of rotor is represented by the class of near-vertical rotors (not shown).


Fixed-angle rotors are an ideal tool for pelleting during the differential separation of biological particles where sedimentation rates differ significantly, for example when separating nuclei, mitochondria and microsomes. In addition, isopycnic banding may also be routinely performed with fixed-angle rotors. For isopycnic separation, centrifugation is continued until the biological particles of interest have reached their isopycnic position in a gradient. This means that the particle has reached a position where the sedimentation rate is zero because the density of the biological particle and the surrounding medium are equal. Centrifugation tubes are held at a fixed angle of between 14° and 40° to the vertical in this class of rotors (Fig. 3.2a). Particles move radially outwards and since the centrifugal field is exerted at an angle, they only have to travel a short distance until they reach their isopycnic position in a gradient using an isodensity technique or before colliding with the outer wall of the centrifuge tube using a differential centrifugation method. Vertical rotors (Fig. 3.2b) may be divided into true vertical rotors and near-vertical rotors. Sealed centrifuge tubes are held parallel to the axis of rotation in vertical rotors and are restrained in the rotor cavities by screws, special washers and plugs. Since samples are not separated down the length of the centrifuge tube, but across the diameter of the tube, isopycnic separation time is significantly shorter as compared to swinging-bucket rotors. In contrast to fixed-angle rotors, near-vertical rotors exhibit a reduced tube angle of 7° to 10° and also employ quick-seal tubes. The reduced angle results in much shorter run times as compared to fixed-angle rotors. Near-vertical rotors are useful for gradient centrifugation of biological elements that do not properly participate in conventional gradients. Hinge pins or a crossbar are used to attach rotor buckets in swinging-bucket rotors (Fig. 3.2c).
They are loaded in a vertical position and during the initial acceleration phase, rotor buckets swing out horizontally and then position themselves at the rotor body for support. To illustrate the separation of particles in the three main types of rotors, Fig. 3.3 outlines the path of biological samples during the initial acceleration stage, the main centrifugal separation phase, deceleration and the final harvesting of separated particles in the rotor at rest. In the case of isopycnic centrifugation in a fixed-angle rotor, the centrifuge tubes are gradually filled with a suitable gradient, the sample is carefully loaded on top of this solution and the tubes are then placed at a specific fixed angle into the rotor cavities. During rotor acceleration, the sample solution and the gradient undergo reorientation in the centrifugal field, followed by the separation of particles with different sedimentation properties (Fig. 3.3a). The gradient returns to its original position during the deceleration phase and separated particle bands can be taken from the tubes once the rotor is at rest. By analogy, similar reorientation of gradients and banding of particles occurs in a vertical rotor system (Fig. 3.3b). Although run times are reduced and this kind of rotor can usually hold a large number of tubes, resolution of separated bands during isopycnic centrifugation is lower than with swinging-bucket applications. Since a greater variety of gradients exhibiting different steepness can be used with swinging-bucket rotors, they are the method of choice when maximum resolution of banding zones is required (Fig. 3.3c), such as in rate zonal studies based on the separation of biological particles as a function of sedimentation coefficient.

[Figure 3.3, panels (a) to (c): each panel indicates the direction of the centrifugal field.]

Fig. 3.3 Operation of the three main types of rotors used in routine biochemical centrifugation techniques. Shown is a cross-sectional diagram of a centrifuge tube positioned in a fixed-angle rotor (a), a vertical tube rotor (b), and a swinging-bucket rotor (c). The diagrams illustrate the movement of biological samples during the initial acceleration stage, the main centrifugal separation phase, deceleration and the final harvesting of separated particles in the rotor at rest. Using a fixed-angle rotor, the tubes are filled with a gradient, the sample loaded on top of this solution and then the tubes placed at a specific fixed angle into the rotor cavities. The sample and the gradient undergo reorientation in the centrifugal field during rotor acceleration, resulting in the separation of particles with different sedimentation properties. Similar reorientation of gradients and banding of particles occurs in a vertical rotor system. A great variety of gradients can be used with swinging-bucket rotors, making them the method of choice when maximum resolution of banding zones is required.

3.3.3 Care and maintenance of centrifuges Corrosion and degradation due to biological buffer systems used within rotors or contamination of the interior or exterior of the centrifuge via spillage may seriously affect the lifetime of this equipment. Another important point is the proper balancing of centrifuge tubes. This is important not only with respect to safety, as outlined below, but also because imbalance might cause vibration-induced damage to the rotor itself and the drive shaft of the centrifuge. Thus, proper handling and care, as well as regular maintenance of both centrifuges and rotors, is an important part of keeping this biochemical method available in the laboratory. In order to avoid damaging the protective layers of rotors, such as polyurethane paint or aluminium oxide, care should be taken in the cleaning of the rotor exterior. Coarse brushes that may scratch the finish should not be used and only non-corrosive detergents employed. Corrosion may be triggered by long-term exposure of rotors to alkaline solutions, acidic buffers, aggressive detergents or salt. Thus, rotors should be thoroughly washed with distilled or deionised water after every run. For overnight storage, rotors should be first left upside down to drain excess liquid and then positioned in a safe and dry place. To avoid damage to the hinge pins of swinging-bucket rotors, they should be dried with tissue paper following removal of biological buffers and washing with water. Centrifuge rotors are often not properly stored in a clean environment; this can quickly lead to the destruction of the protective rotor coating and should thus be avoided. It is advisable to keep rotors in a special clean room, physically separated from the actual centrifugation facility, with dedicated places for individual types of rotors. Some researchers might prefer to pre-cool their rotors prior to centrifugation by transferring them to a cold room. Although this is an acceptable practice and might keep proteolytic degradation to a minimum, rotors should not undergo long-term storage in a wet and cold environment. Regular maintenance of rotors and centrifuges by engineers is important for ensuring the safe operation of a centralised centrifugation facility. In order to judge properly the need for replacement of a rotor or parts of a centrifuge, it is essential that all users of core centrifuge equipment participate in proper book-keeping.
Accurate record-keeping of run times and centrifugal speeds is important, since cyclic acceleration and deceleration of rotors may lead to metal fatigue.

3.3.4 Safety and centrifugation

Modern centrifuges are not only highly sophisticated but also relatively sturdy pieces of biochemical equipment that incorporate many safety features. Rotor chambers of high-speed and ultracentrifuges are always enclosed in heavy armour plating. Most centrifuges are designed to buffer a certain degree of imbalance and are usually equipped with an automatic switch-off mode. However, even in a well-balanced rotor, tube cracking during a centrifugation run might cause severe imbalance resulting in dangerous vibrations. When the rotor can only be partially loaded, the order of tubes must be organised according to the manufacturer’s instructions, so that the load is correctly distributed. This is important not only for ultracentrifugation with enormous centrifugal fields, but also for both small- and large-capacity bench centrifuges where the rotors are usually mounted on a more rigid suspension. When using swinging-bucket rotors, it is important always to load all buckets with their caps properly screwed on. Even if only two tubes are loaded with solutions, the empty swinging buckets also have to be assembled since they form an integral part of the overall balance of the rotor system. In some swinging-bucket rotors, individual rotor buckets are numbered and should not be interchanged between their designated positions on similarly numbered hinge pins. Centrifugation runs using swinging-bucket rotors are usually set up with low acceleration and deceleration rates, so as to avoid any disturbance of delicate gradients and to reduce the risk of disturbing bucket attachment. This practice also helps to avoid sudden imbalances due to tube deformation or cracking and thus reduces the risk of potentially dangerous vibrations.

Generally, safety and good laboratory practice are important aspects of all research projects and the awareness of the exposure to potentially harmful substances should be a concern for every biochemist. If you use dangerous chemicals, potentially infectious material or radioactive substances during centrifugation protocols, refer to up-to-date safety manuals and the safety statement of your individual department. Perform mock runs of important experiments in order to avoid the loss of precious specimens or expensive chemicals. As with all other biochemical procedures, experiments should never be rushed, and protective clothing should be worn at all times. Centrifuge tubes should be handled slowly and carefully so as not to disturb pellets, bands of separated particles or unstable gradients. To help you choose the right kind of centrifuge tube for a particular application, the manufacturers of rotors usually give detailed recommendations of suitable materials. For safety reasons and to guarantee experimental success, it is important to make sure that individual centrifuge tubes are chemically resistant to the solvents used, have the right capacity for sample loading, can be used in the designated type of rotor and are able to withstand the maximum centrifugal forces and temperature range of a particular centrifuge. In fixed-angle rotors, large centrifugal forces tend to cause a collapse of centrifuge tubes, making thick-walled tubes the choice for these rotors. Filling these tubes to the correct volume and sealing them properly are very important for the integrity of the run and should be done according to the manufacturer’s instructions.
In contrast, swinging-bucket rotor tubes are better protected from deformation and usually thin-walled polyallomer tubes are used. An important safety aspect is the proper handling of separated biological particles following centrifugation. In order to perform post-centrifugation analysis of individual fractions, centrifugation tubes often have to be punctured or sliced. For example, separated vesicle bands can be harvested from the pierced bottom of the centrifuge tube or can be collected by slicing of the tube following quick-freezing. If samples have been pre-incubated with radioactive markers or toxic ligands, contamination of the centrifugation chamber and rotor cavities or buckets should be avoided. If centrifugal separation processes have to be performed routinely with a potentially harmful substance, it makes sense to dedicate a particular centrifuge and accompanying rotors for this work and thereby eliminate the potential of cross-contamination.

3.4 PREPARATIVE CENTRIFUGATION

3.4.1 Differential centrifugation

Cellular and subcellular fractionation techniques are indispensable methods used in biochemical research. Although the proper separation of many subcellular structures is absolutely dependent on preparative ultracentrifugation, the isolation of large cellular structures, the nuclear fraction, mitochondria, chloroplasts or large protein precipitates can be achieved by conventional high-speed refrigerated centrifugation. Differential centrifugation is based upon the differences in the sedimentation rate of biological particles of different size and density. Crude tissue homogenates containing organelles, membrane vesicles and other structural fragments are divided into different fractions by the stepwise increase of the applied centrifugal field. Following the initial sedimentation of the largest particles of a homogenate (such as cellular debris) by centrifugation, various biological structures or aggregates are separated into pellet and supernatant fractions, depending upon the speed and time of individual centrifugation steps and the density and relative size of the particles. To increase the yield of membrane structures and protein aggregates released, cellular debris pellets are often rehomogenised several times and then recentrifuged. This is especially important in the case of rigid biological structures such as muscular or connective tissues, or when only small tissue samples are available, as with human biopsy material or primary cell cultures.

The differential sedimentation of a particulate suspension in a centrifugal field is diagrammatically shown in Fig. 3.4a. Initially all particles of a homogenate are evenly distributed throughout the centrifuge tube and then move down the tube at their respective sedimentation rate during centrifugation. The largest class of particles forms a pellet on the bottom of the centrifuge tube, leaving smaller-sized structures within the supernatant. However, during the initial centrifugation step smaller particles also become entrapped in the pellet, causing a certain degree of contamination. At the end of each differential centrifugation step, the pellet and supernatant fraction are carefully separated from each other. To minimise cross-contamination, pellets are usually washed several times by resuspension in buffer and recentrifugation under the same conditions. However, repeated washing steps may considerably reduce the yield of the final pellet fraction, and are therefore omitted in preparations with limiting starting material. Resulting supernatant fractions are centrifuged at a higher speed and for a longer time to separate medium-sized and small-sized particles. With respect to the separation of organelles and membrane vesicles, crude differential centrifugation techniques can be conveniently employed to isolate intact mitochondria and microsomes.

[Figure 3.4 not reproduced. Panel (a): solvent with small-sized, medium-sized and large-sized particles, shown over the time of centrifugation in the direction of the centrifugal field. Panel (b): sample layered on a density gradient, with small-sized or low-density, medium-sized or medium-density and large-sized or high-density particles, shown over the time of centrifugation in the direction of the centrifugal field.]

Fig. 3.4 Diagram of particle behaviour during differential and isopycnic separation. During differential sedimentation (a) of a particulate suspension in a centrifugal field, the movement of particles is dependent upon their density, shape and size. For separation of biological particles using a density gradient (b), samples are carefully layered on top of a preformed density gradient prior to centrifugation. For isopycnic separation, centrifugation is continued until the desired particles have reached their isopycnic position in the liquid density gradient. In contrast, during rate separation, the required fraction does not reach its isopycnic position during the centrifugation run.

3.4.2 Density-gradient centrifugation

To further separate biological particles of similar size but differing density, ultracentrifugation with preformed or self-establishing density gradients is the method of choice. Both rate separation and equilibrium methods can be used. In Fig. 3.4b, the preparative ultracentrifugation of low- to high-density particles is shown. A mixture of particles, such as is present in a heterogeneous microsomal membrane preparation, is layered on top of a preformed liquid density gradient. Depending on the particular biological application, a great variety of gradient materials are available. Caesium chloride is widely used for the banding of DNA and the isolation of plasmids, nucleoproteins and viruses. Sodium bromide and sodium iodide are employed for the fractionation of lipoproteins and the banding of DNA or RNA molecules, respectively. Various companies offer a range of gradient material for the separation of whole cells and subcellular particles, e.g. Percoll, Ficoll, Dextran, Metrizamide and Nycodenz. For the separation of membrane vesicles derived from tissue homogenates, ultra-pure DNase-, RNase- and protease-free sucrose represents a suitable and widely employed medium for the preparation of stable gradients. If one wants to separate all membrane species spanning the whole range of particle densities, the maximum density of the gradient must exceed the density of the most dense vesicle species. Both step gradient and continuous gradient systems are employed to achieve this. If automated gradient makers are not available, which is probably the case in most undergraduate practical classes, the manual pouring of a stepwise gradient with the help of a pipette is neither particularly time-consuming nor difficult. In contrast, the formation of a stable continuous gradient is much more challenging and requires a commercially available gradient maker.
Following pouring, gradients are usually kept in a cold room for temperature equilibration and are moved extremely slowly in special holders so as to avoid mixing of different gradient layers. For rate separation of subcellular particles, the required fraction does not reach its isopycnic position within the gradient. For isopycnic separation, density centrifugation is continued until the buoyant density of the particle of interest and the density of the gradient are equal.
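The manual preparation of a step gradient comes down to simple dilution arithmetic. The following Python sketch is only an illustration and is not taken from a published protocol: it computes how much concentrated sucrose stock and buffer to mix for each layer. The 60% (w/v) stock, the six layers and the 2 cm3 layer volume are assumptions chosen for the example, and the function name is hypothetical.

```python
def step_gradient_recipe(stock_percent, layer_percents, layer_volume_cm3):
    """Return (percent, stock volume, buffer volume) for each gradient layer.

    Assumes concentrations are % (w/v), so that C1 * V1 = C2 * V2 holds
    when the stock solution is diluted with buffer.
    """
    recipe = []
    for percent in layer_percents:
        if not 0 <= percent <= stock_percent:
            raise ValueError("layer concentration must lie between 0 and the stock")
        stock_volume = layer_volume_cm3 * percent / stock_percent
        buffer_volume = layer_volume_cm3 - stock_volume
        recipe.append((percent, round(stock_volume, 3), round(buffer_volume, 3)))
    return recipe


# Hypothetical example: six 2 cm3 layers from 60% down to 10% sucrose.
# The densest layer is listed first because it is pipetted into the tube
# first and the lighter layers are then overlaid on top of it.
layers = step_gradient_recipe(60.0, [60, 50, 40, 30, 20, 10], 2.0)
for percent, stock, buffer in layers:
    print(f"{percent:>3}% sucrose: {stock:.2f} cm3 stock + {buffer:.2f} cm3 buffer")
```

For concentrations expressed as % (w/w) the mixing is no longer linear in volume, so the densities of the solutions would have to be taken into account.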


3.4.3 Practical applications of preparative centrifugation

To illustrate practical applications of differential centrifugation, density gradient ultracentrifugation and affinity methodology, the isolation of the microsomal fraction from muscle homogenates and subsequent separation of membrane vesicles with a differing density is described (Fig. 3.5), the isolation of highly purified sarcolemma vesicles outlined (Fig. 3.6), and the subfractionation of liver mitochondrial membrane systems shown (Fig. 3.7). Skeletal muscle fibres are highly specialised structures involved in contraction and the membrane systems that maintain the regulation of excitation–contraction coupling, energy metabolism and the stabilisation of the cell periphery are diagrammatically shown in Fig. 3.5a. The surface membrane consists of the sarcolemma and its invaginations, the transverse tubular membrane system. The transverse tubules may be subdivided into the non-junctional region and the triad part that forms contact zones with the terminal cisternae of the sarcoplasmic reticulum. Motor neuron-induced depolarisation of the sarcolemma travels into the transverse tubules and activates a voltage-sensing receptor complex that directly initiates the transient opening of a junctional calcium release channel. The membrane system that provides the luminal ion reservoir for the regulatory calcium cycling process is represented by the specialised endoplasmic reticulum. It forms membranous sheaths around the contractile apparatus whereby the longitudinal tubules are mainly involved in the uptake of calcium ions during muscle relaxation and the terminal cisternae provide the rapid calcium release mechanism that initiates muscle contraction. Mitochondria are the site of oxidative phosphorylation and exhibit a complex system of inner and outer membranes involved in energy metabolism.
For the optimum homogenisation of tissue specimens, mincing of tissue has to be performed in the presence of a biological buffer system that exhibits the right pH value, salt concentration, stabilising co-factors and chelating agents. The optimum ratio between the wet weight of tissue and buffer volume as well as the temperature (usually 4 °C) and presence of a protease inhibitor cocktail are also essential to minimise proteolytic degradation. Prior to the 1970s, researchers did not widely use protease inhibitors or chelating agents in their homogenisation buffers. This resulted in the degradation of many high-molecular-mass proteins. Since protective measures against endogenous enzymes have been routinely introduced into subcellular fractionation protocols, extremely large proteins have been isolated in their intact form, such as 427 kDa dystrophin, the 565 kDa ryanodine receptor, 800 kDa nebulin and the longest known polypeptide, of 2200 kDa, named titin. Commercially available protease inhibitor cocktails usually exhibit a broad specificity for the inhibition of cysteine-proteases, serine-proteases, aspartic-proteases, metallo-proteases and amino-peptidases. They are used in the micromolar concentration range and are best added to buffer systems just prior to the tissue homogenisation process. Depending on the half-life of specific protease inhibitors, the length of a subcellular fractionation protocol and the amount of endogenous enzymes present in individual fractions, tissue suspensions might have to be replenished with a fresh aliquot of a protease inhibitor cocktail. Protease inhibitor kits for the creation of individualised cocktails are also available and consist of substances such as trypsin inhibitor, E-64, aminoethyl-benzenesulfonylfluoride, antipain, aprotinin, benzamidine, bestatin, chymostatin, ε-aminocaproic acid, N-ethylmaleimide, leupeptin, phosphoramidon and pepstatin. The most commonly used chelators of divalent cations for the inhibition of degrading enzymes such as metallo-proteases are EDTA and EGTA.

[Figure 3.5 not reproduced. Panel (a), subcellular membrane systems that can be isolated by differential centrifugation: a skeletal muscle fibre with extracellular matrix, sarcolemma, cytosol, non-junctional transverse tubules, triad junction, sarcoplasmic reticulum (longitudinal tubules and terminal cisternae) and mitochondria. Panel (b), scheme of subcellular fractionation of membranes from muscle homogenates: muscle tissue is homogenised; differential centrifugation of the tissue homogenate yields nuclei and cell debris (10 min at 1000 g), the contractile apparatus (10 min at 10 000 g), mitochondria (20 min at 20 000 g) and crude microsomes plus cytosol (60 min at 100 000 g); gradient centrifugation of the crude microsomes on a 10–60% sucrose density gradient (360 min at 150 000 g) yields surface membranes, triads, a light sarcoplasmic reticulum fraction, a heavy sarcoplasmic reticulum fraction and debris.]

Fig. 3.5 Scheme of the fractionation of skeletal muscle homogenate into various subcellular fractions. Shown is a diagrammatic presentation of the subcellular membrane system from skeletal muscle fibres (a) and a flow chart of the fractionation protocol of these membranes from tissue homogenates using differential centrifugation and density gradient methodology (b).

3.4.4 Subcellular fractionation

A typical flow chart outlining a subcellular fractionation protocol is shown in Fig. 3.5b. Depending on the amount of starting material, which would usually range between 1 and 500 g in the case of skeletal muscle preparations, a particular type of rotor and size of centrifuge tubes is chosen for individual stages of the isolation procedure. The repeated centrifugation at progressively higher speeds and longer centrifugation periods will divide the muscle homogenate into distinct fractions. Typical values for centrifugation steps are 10 min at 1000 g to pellet nuclei and cellular debris, 10 min at 10 000 g to pellet the contractile apparatus, 20 min at 20 000 g to pellet a fraction enriched in mitochondria, and 1 h at 100 000 g to separate the microsomal and cytosolic fractions. Mild salt washes can be carried out to remove myosin contamination of membrane preparations. Sucrose gradient centrifugation is then used to further separate microsomal subfractions derived from different muscle membranes. Using a vertical rotor or swinging-bucket rotor system at a sufficiently high g-force, the crude surface membrane fraction, triad junctions, longitudinal tubules and terminal cisternae membrane vesicles can be separated. To collect bands of fractions, the careful removal of fractions from the top can be achieved manually with a pipette. Alternatively, in the case of relatively unstable gradients or tight banding patterns, membrane vesicles can be harvested from the bottom by an automated fraction collector. In this case, the centrifuge tube is pierced and fractions collected by gravity or slowly forced out of the tube by a replacing liquid of higher density. Another method for collecting fractions from unstable gradients is the slicing of the centrifuge tube after freezing. Although both of the latter methods destroy the centrifuge tubes, they are routinely used in research laboratories.
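Because the protocol is specified in multiples of g while many centrifuges are set in revolutions per minute, it can help to convert between the two. The short Python sketch below does this from first principles (centrifugal acceleration equals the square of the angular velocity times the radius) for the typical steps quoted above. The 8 cm radius is a purely hypothetical value and the function names are illustrative; in practice each stage uses its own rotor, and the radius must be taken from the rotor manual.

```python
import math

STANDARD_GRAVITY = 9.80665  # m s^-2

# (duration in min, relative centrifugal field in g, material pelleted),
# taken from the typical values quoted in the text.
PROTOCOL = [
    (10, 1_000, "nuclei and cellular debris"),
    (10, 10_000, "contractile apparatus"),
    (20, 20_000, "fraction enriched in mitochondria"),
    (60, 100_000, "crude microsomes (cytosol stays in the supernatant)"),
]


def rpm_for_rcf(rcf, radius_cm):
    """Rotor speed (rev/min) that produces the field `rcf` at `radius_cm`."""
    omega = math.sqrt(rcf * STANDARD_GRAVITY / (radius_cm / 100.0))  # rad s^-1
    return omega * 60.0 / (2.0 * math.pi)


def rcf_for_rpm(rpm, radius_cm):
    """Relative centrifugal field (multiples of g) at `radius_cm`."""
    omega = 2.0 * math.pi * rpm / 60.0  # rad s^-1
    return omega ** 2 * (radius_cm / 100.0) / STANDARD_GRAVITY


ASSUMED_RADIUS_CM = 8.0  # hypothetical; read the real value from the rotor manual

for minutes, rcf, pellet in PROTOCOL:
    rpm = rpm_for_rcf(rcf, ASSUMED_RADIUS_CM)
    print(f"{minutes:>3} min at {rcf:>7} g = about {rpm:,.0f} rev/min -> {pellet}")
```

Since the field rises with the square of the rotor speed, a tenfold increase in g requires only about a 3.2-fold increase in revolutions per minute at the same radius.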
Cross-contamination of vesicular membrane populations is an inevitable problem during subcellular fractionation procedures. The technical reason for this is the lack of adequate control in the formation of various types of membrane species during tissue homogenisation. Membrane domains originally derived from a similar subcellular location might form a variety of structures including inside-out vesicles, right-side-out vesicles, sealed structures, leaky vesicles and/or membrane sheets. In addition, smaller vesicles might become entrapped in larger vesicles. Different membrane systems might aggregate non-specifically or bind to or entrap abundant solubilised proteins. Hence, if highly purified membrane preparations are needed for sophisticated cell biological or biochemical studies, affinity separation methodology has to be employed. The flow chart and immunoblotting diagram in Fig. 3.6 illustrate both the preparative and analytical principles underlying such a biochemical approach. Modern preparative affinity techniques using centrifugation steps can be performed with various biological or chemical ligands. In the case of immunoaffinity purification, antibodies are used to specifically bind to their respective antigen.

[Figure 3.6 not reproduced. Panel (a), scheme of subcellular fractionation of muscle sarcolemma: crude surface membrane (a mixture of sarcolemma, transverse tubules and sarcoplasmic reticulum) is treated with purified WGA lectin and centrifuged for 2 min at 15 000 g to separate agglutinated sarcolemma (SL) vesicles from the supernatant (SN); treatment with 0.01% Triton X-100 and centrifugation for 2 min at 15 000 g separates agglutinated SL vesicles from the SN; treatment with 0.2 M NAG gives deagglutinated SL vesicles, and centrifugation for 20 min at 150 000 g separates highly enriched sarcolemma from the supernatant. Panel (b), diagram of immunoblot analysis of subcellular fractionation procedures: total protein, SL marker and non-SL marker, each with lanes 1 to 3, against relative molecular weight standards (× 10⁻³) of 600, 400, 100 and 30. Gel/blot lane 1: crude surface membrane; lane 2: lectin void fraction; lane 3: highly purified sarcolemma.]

Fig. 3.6 Affinity separation method using centrifugation of lectin-agglutinated surface membrane vesicles from skeletal muscle. Shown is a flow chart of the various preparative steps in the isolation of highly purified sarcolemma vesicles (a) and a diagram of the immunoblot analysis of this subcellular fractionation procedure (b). The sarcolemma (SL) and non-SL markers are surface-associated dystrophin of 427 kDa and the transverse-tubular α1S-subunit of the dihydropyridine receptor of 170 kDa, respectively.

3.4.5 Affinity purification of membrane vesicles

A widely employed lectin agglutination method is shown in Fig. 3.6a. Lectins are plant proteins that bind tightly to specific carbohydrate structures. The rationale behind using purified wheat germ agglutinin (WGA) lectin for the affinity purification of sarcolemma vesicles is the fact that the muscle plasmalemma forms mostly right-side-out vesicles following homogenisation. By contrast, vesicles derived from the transverse tubules are mostly inside out and thus do not expose their carbohydrates. Glycoproteins from the abundant sarcoplasmic reticulum do not exhibit carbohydrate moieties that are recognised by this particular lectin species. Therefore only sarcolemma vesicles are agglutinated by the wheat germ lectin and the aggregate can be separated from the transverse tubular fraction by centrifugation for 2 min at 15 000 g. The electron microscopical characterisation of agglutinated surface membranes revealed large smooth sarcolemma vesicles that had electron-dense entrapments. To remove these vesicular contaminants, originally derived from the sarcoplasmic reticulum, immobilised surface vesicles are treated with low concentrations of the non-ionic detergent Triton X-100. This procedure does not solubilise integral membrane proteins, but introduces openings in the sarcolemma vesicles for the release of the much smaller sarcoplasmic reticulum vesicles. Low g-force centrifugation is then used to separate the agglutinated sarcolemma vesicles and the contaminants. To remove the lectin from the purified vesicles, the fraction is incubated with the competitive sugar N-acetylglucosamine that eliminates the bonds between the surface glycoproteins and the lectin. A final centrifugation step for 20 min at 150 000 g results in a pellet of highly purified sarcolemma vesicles.
A quick and convenient analytical method of confirming whether this subcellular fractionation procedure has resulted in the isolation of the muscle plasmalemma is immunoblotting with a mini electrophoresis unit. Figure 3.6b shows a diagram of the protein and antigen banding pattern of crude surface membranes, the lectin void fraction and the highly purified sarcolemma fraction. Using antibodies to markers of the transverse tubules and the sarcolemma, such as the α1S-subunit of the dihydropyridine receptor of 170 kDa and dystrophin of 427 kDa, respectively, the separation of both membrane species can be monitored. This analytical method is especially useful for the characterisation of membrane vesicles, when no simple and fast assay systems for testing marker enzyme activities are available.

In the case of the separation of mitochondrial membranes, the distribution of enzyme activities rather than immunoblotting is routinely used for determining the distribution of the inner membrane, contact zones and the outer membrane in density gradients. Binding assays or enzyme testing represents the more traditional way of characterising subcellular fractions following centrifugation. Figure 3.7a outlines diagrammatically the microcompartments of liver mitochondria and the associated marker enzymes. While monoamine oxidase (MAO) is enriched in the outer membrane, the enzyme succinate dehydrogenase (SDH) is associated with the inner membrane system and a representative marker of contact sites between both membranes is glutathione transferase (GT). Membrane vesicles from intact mitochondria can be generated by consecutive swelling, shrinking and sonication of the suspended organelles. The vesicular mixture is then separated by sucrose density centrifugation into the three main types of mitochondrial membranes (Fig. 3.7b). The distribution of marker enzyme activities in the various fractions demonstrates that the outer membrane has a lower density compared to the inner membrane. The glutathione transferase-containing contact zones are positioned in a band between the inner and outer mitochondrial membrane and contain enzyme activities characteristic for both systems (Fig. 3.7c).

[Figure 3.7 not reproduced. Panel (a): mitochondrial microcompartments with their marker enzymes, showing the cytosol, outer membrane (pore, MAO), contact site (GT), inner membrane (SDH) and mitochondrial matrix. Panel (b): intact mitochondria are converted by swelling, shrinking and sonication into a vesicular mixture, which is separated on a 30–70% sucrose density gradient (20 h at 150 000 g) into outer membranes, contact sites and inner membranes. Panel (c): plot of enzyme activity (units per fraction) for MAO, GT and SDH and of % sucrose against gradient fraction.]

Fig. 3.7 Scheme of the fractionation of membranes derived from liver mitochondria. Shown is the distribution of marker enzymes in the microcompartments of liver mitochondria (MAO, monoamine oxidase; SDH, succinate dehydrogenase; GT, glutathione transferase) (a), the separation method to isolate fractions highly enriched in the inner cristae membrane, contact zones and the outer mitochondrial membrane (b), as well as the distribution of mitochondrial membranes after density gradient centrifugation (c).

Routinely used enzymes as subcellular markers would be the Na+/K+-ATPase for the plasmalemma, glucose-6-phosphatase for the endoplasmic reticulum, galactosyl transferase for the Golgi apparatus, succinate dehydrogenase for mitochondria, acid phosphatase for lysosomes, catalase for peroxisomes and lactate dehydrogenase for the cytosol.
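One common way of putting marker enzyme measurements to use is to compare the specific activity of a marker in a purified fraction with that in the starting homogenate. The Python sketch below records the marker enzymes listed above and computes such an enrichment factor. The enrichment calculation is a general convention and is not a procedure described in this chapter; the function names and the numbers in the example are hypothetical and serve only to exercise the code.

```python
# Compartment -> routinely used marker enzyme, as listed in the text.
MARKER_ENZYMES = {
    "plasmalemma": "Na+/K+-ATPase",
    "endoplasmic reticulum": "glucose-6-phosphatase",
    "Golgi apparatus": "galactosyl transferase",
    "mitochondria": "succinate dehydrogenase",
    "lysosomes": "acid phosphatase",
    "peroxisomes": "catalase",
    "cytosol": "lactate dehydrogenase",
}


def specific_activity(enzyme_units, protein_mg):
    """Enzyme activity per mg of protein."""
    if protein_mg <= 0:
        raise ValueError("protein amount must be positive")
    return enzyme_units / protein_mg


def enrichment(fraction_units, fraction_protein_mg,
               homogenate_units, homogenate_protein_mg):
    """Specific activity of the fraction relative to that of the homogenate."""
    return (specific_activity(fraction_units, fraction_protein_mg)
            / specific_activity(homogenate_units, homogenate_protein_mg))


# Hypothetical numbers: 50 units in 2 mg of a mitochondrial fraction,
# compared with 100 units in 40 mg of the starting homogenate.
factor = enrichment(50.0, 2.0, 100.0, 40.0)
print(f"{MARKER_ENZYMES['mitochondria']} enriched {factor:.1f}-fold")
```

A factor well above 1 for the expected marker, together with factors below 1 for markers of other compartments, would suggest that the fraction is enriched in the intended membrane system.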

3.5 ANALYTICAL CENTRIFUGATION

3.5.1 Applications of analytical ultracentrifugation

As biological macromolecules exhibit random thermal motion, their relatively uniform distribution in an aqueous environment is not significantly affected by the Earth’s gravitational field. Isolated biomolecules in solution only exhibit distinguishable sedimentation when they undergo immense accelerations, e.g. in an ultracentrifugal field. A typical analytical ultracentrifuge can generate a centrifugal field of 250 000 g in its analytical cell. Within these extremely high gravitational fields, the ultracentrifuge cell has to allow light passage through the biological particles for proper measurement of the concentration distribution. The schematic diagram of Fig. 3.8 outlines the optical system of a modern analytical ultracentrifuge. The availability of high-intensity xenon flash lamps and advances in instrumental sensitivity and wavelength range have made the accurate measurement of highly dilute protein samples below 230 nm possible. Analytical ultracentrifuges such as the Beckman Optima XL-A allow the use of wavelengths between 190 nm and 800 nm. Sedimentation of isolated proteins or nucleic acids can be useful in the determination of the relative molecular mass, purity and shape of these biomolecules. Analytical ultracentrifugation for the determination of the relative molecular mass of a macromolecule can be performed by a sedimentation velocity approach or sedimentation equilibrium methodology. The hydrodynamic properties of macromolecules are described by their sedimentation coefficients and can be determined from the rate at which a concentration boundary of the particular biomolecules moves in the gravitational field. Such studies on the solution behaviour of macromolecules can give detailed insight into the properties of large aggregates and thereby confirm results from biochemical analyses on complex formation.

The sedimentation coefficient can be used to characterise changes in the size and shape of macromolecules with changing experimental conditions. This allows for the detailed biophysical analysis of the effect of variations in the pH value, temperature or co-factors on molecular shape. Analytical ultracentrifugation is most often employed in

• the determination of the purity of macromolecules;
• the determination of the relative molecular mass of solutes in their native state;
• the examination of changes in the molecular mass of supramolecular complexes;
• the detection of conformational changes; and
• ligand-binding studies (Section 17.3.2).

[Figure 3.8 not reproduced. Labelled components: xenon flash lamp, aperture, slit (2 nm), toroidal diffraction grating, incident light detector, reflector, rotor with sample/reference cell assembly (top view showing the sample and reference positions), imaging system for radial scanning and photomultiplier tube.]

Fig. 3.8 Schematic diagram of the optical system of an analytical ultracentrifuge. The high-intensity xenon flash lamp of the Beckman Optima XL-A analytical ultracentrifuge shown here allows the use of wavelengths between 190 nm and 800 nm. The high sensitivity of the absorbance optics allows the measurement of highly dilute protein samples below 230 nm. (Courtesy of Beckman-Coulter.)

The sedimentation velocity method can be employed to estimate sample purity. Sedimentation patterns can be obtained using the Schlieren optical system. This method measures the refractive index gradient at each point in the ultracentrifugation cell at varying time intervals. During the entire duration of the sedimentation velocity analysis, a homogeneous preparation forms a single sharp symmetrical sedimenting boundary. Such a result demonstrates that the biological macromolecules analysed exhibit the same molecular mass, shape and size. However, one cannot assume that the analysed particles exhibit an identical electrical charge or biological activity. Only additional biochemical studies using electrophoretic techniques and enzyme/bioassays can differentiate between these minor subtypes of macromolecules with similar molecular mass. The great advantage of the sedimentation velocity method is that smaller or larger contaminants can be clearly recognised as shoulders on the main peak, asymmetry of the main peak and/or additional peaks. For a list of references outlining the applicability of ultracentrifugation to the characterisation of macromolecular behaviour in complex solution, please consult the review articles listed in Section 3.6. In addition, manufacturers of analytical ultracentrifuges make available a large range of excellent brochures on the theoretical background of this method and its specific applications. These introductory texts are usually written by research biochemists and are well worth reading to become familiar with this field.

3.5.2 Relative molecular mass determination

For the accurate determination of the molecular mass of solutes in their native state, analytical ultracentrifugation represents an unrivalled technique. The method requires only small sample sizes (20–120 mm³) and low particle concentrations (0.01–1 g dm⁻³) and biological molecules with a wide range of molecular masses can be characterised. In conjunction with electrophoretic, chromatographic, crystallographic and sequencing data, the biochemical properties of a biological particle of interest can be determined in great detail. As long as the absorbance of the biomolecules to be investigated (such as proteins, carbohydrates or nucleic acids) is different from that of the surrounding solvent, analytical ultracentrifugation can be applied. At the start of an experiment using the boundary sedimentation method, the biological particles are uniformly distributed throughout the solution in the analytical cell. The application of a centrifugal field then causes a migration of the randomly distributed biomolecules through the solvent radially outwards from the centre of rotation. The solvent that has been cleared of particles and the solvent still containing the sedimenting material form a sharp boundary. The movement of the boundary with time is a measure of the rate of sedimentation of the biomolecules. The sedimentation coefficient depends directly on the mass of the biological particle. The concentration distribution is dependent on the buoyant molecular mass. The distribution of biomolecules in a centrifugal field can be determined and, at sedimentation equilibrium, a plot of the natural logarithm of the solute concentration versus the squared radial distance from the centre of rotation (ln c vs. r²) yields a straight line with a slope proportional to the monomer molecular mass. Alternatively, the relative molecular mass of a biological macromolecule can be determined by the band sedimentation technique.
In this case, the sample is layered on top of a denser solvent. During centrifugation, the solvent forms its own density gradient and the migration of the particle band is followed in the analytical cell. Molecular mass determination by analytical ultracentrifugation is applicable to values from a few hundred to several millions. It is therefore used for the analysis of small carbohydrates, proteins, nucleic acid macromolecules, viruses and subcellular particles such as mitochondria.
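The relationship between ln c and the squared radial distance described above can be illustrated with a short numerical sketch. It assumes the standard sedimentation equilibrium expression for a single ideal species, in which the slope of ln c against the squared radius equals M(1 − vρ)ω²/(2RT), where v is the partial specific volume and ρ the solvent density. This expression, the function names and every sample value (rotor speed, temperature, partial specific volume, solvent density and the synthetic concentration profile) are illustrative assumptions and are not taken from the text.

```python
import math

R_GAS = 8.314  # molar gas constant, J mol^-1 K^-1


def least_squares_slope(xs, ys):
    """Slope of the best-fit straight line through the points (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    return numerator / denominator


def molar_mass_from_slope(slope, rpm, temp_k, v_bar, rho):
    """Molar mass (kg/mol) from the slope of ln c versus r^2 (slope in m^-2)."""
    omega = 2.0 * math.pi * rpm / 60.0   # angular velocity, rad/s
    buoyancy = 1.0 - v_bar * rho         # buoyancy term (1 - v_bar * rho)
    return 2.0 * R_GAS * temp_k * slope / (omega ** 2 * buoyancy)


# Hypothetical experiment: all values below are assumed for illustration.
RPM = 10_000        # rotor speed, rev/min
TEMP_K = 293.15     # temperature, K (20 degrees C)
V_BAR = 0.73e-3     # partial specific volume, m^3/kg (typical for a protein)
RHO = 1000.0        # solvent density, kg/m^3
TRUE_MASS = 66.0    # kg/mol, used only to synthesise the profile

# Build a synthetic, noise-free equilibrium profile from the assumed mass.
omega = 2.0 * math.pi * RPM / 60.0
true_slope = TRUE_MASS * (1.0 - V_BAR * RHO) * omega ** 2 / (2.0 * R_GAS * TEMP_K)
radii_m = [0.060 + 0.001 * i for i in range(11)]    # 6.0 cm to 7.0 cm
r_squared = [r * r for r in radii_m]
ln_c = [true_slope * x - 15.0 for x in r_squared]   # arbitrary offset

# Recover the molar mass from the fitted slope of ln c versus r^2.
fitted_slope = least_squares_slope(r_squared, ln_c)
estimated_mass = molar_mass_from_slope(fitted_slope, RPM, TEMP_K, V_BAR, RHO)
print(f"Estimated molar mass: {estimated_mass:.1f} kg/mol")
```

Because the profile is generated from the same expression that is then fitted, the sketch only demonstrates the arithmetic. Real data would contain noise and may deviate from a straight line if the sample is heterogeneous or self-associating.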

3.5.3 Sedimentation coefficient

Biochemical studies over the last few decades have clearly demonstrated that biological macromolecules do not perform their biochemical and physiological functions in isolation. Many proteins have been shown to be multifunctional and their activity is regulated by complex interactions within homogeneous and heterogeneous complexes. Co-operative kinetics and the influence of micro-domains have been recognised to play a major role in the regulation of biochemical processes. Since conformational changes in biological macromolecules may cause differences in their sedimentation rates, analytical ultracentrifugation represents an ideal experimental tool for the determination of such structural modifications. For example, a macromolecule that changes its conformation into a more compact structure decreases its frictional resistance in the solvent. In contrast, the frictional resistance increases when a molecular assembly becomes more disorganised. The binding of ligands (such as inhibitors, activators or substrates) or a change in temperature or buffering conditions may induce conformational changes in subunits of biomolecules that in turn can result in major changes in the supramolecular structure of complexes. Such modifications can be detected as distinct differences in the sedimentation velocity of the molecular species. Sedimentation equilibrium experiments can be used to determine the relative size of individual subunits participating in complex formation, the stoichiometry and size of a complex assembly under different physiological conditions and the strength of interactions between subunits. When a new protein species is identified that appears to exist under native conditions in a large complex, several biochemical techniques are available to evaluate the oligomeric status of such a macromolecule. Gel filtration analysis, blot overlay assays, affinity chromatography, differential immunoprecipitation and chemical crosslinking are typical examples of such techniques. With respect to centrifugation, sedimentation analysis using a density gradient is an ideal method to support such biochemical data.
For the initial determination of the size of a complex, the sedimentation of known marker proteins is compared to that of the novel protein complex. Biological particles with a different molecular mass, shape or size migrate with different velocities in a centrifugal field (Section 3.1). As can be seen in equation 3.7, the sedimentation coefficient has dimensions of seconds. Expressed in Svedberg units (1 S = 10⁻¹³ s), its value lies typically between 1 and 20 for many macromolecules of biochemical interest, and between 80 and several thousand for larger biological particles such as ribosomes, microsomes and mitochondria. The prototype of a soluble protein, serum albumin of apparent 66 kDa, has a sedimentation coefficient of 4.5 S. Figure 3.9 illustrates the sedimentation analysis of the dystrophin–glycoprotein complex (DGC) from skeletal muscle fibres. The size of this complex was estimated to be approximately 18 S by comparing its migration to that of the standards β-galactosidase (16 S) and thyroglobulin (19 S). When the membrane cytoskeletal element dystrophin was first identified, it was shown to bind to a lectin column, although it does not exhibit any carbohydrate chains. This suggested that dystrophin might exist in a complex with surface glycoproteins. Sedimentation analysis confirmed the existence of such a dystrophin–glycoprotein complex and centrifugation following various biochemical modifications of the protein assembly led to a detailed understanding of its composition. Alkaline extraction, acid treatment or incubation with different types of detergent causes the differential disintegration of the dystrophin–glycoprotein complex.

[Figure: plot of protein (mg per fraction) and % sucrose against gradient fraction (1–10), with the positions of the 16 S marker β-galactosidase, the DGC and the 19 S marker thyroglobulin indicated.]
Fig. 3.9 Sedimentation analysis of a supramolecular protein complex. Shown is the sedimentation of the dystrophin–glycoprotein complex (DGC). Its size was estimated to be approximately 18 S by comparing its migration to that of the standards β-galactosidase (16 S) and thyroglobulin (19 S). Since the sedimentation coefficients of biological macromolecules are relatively small, they are expressed as Svedberg units, S, whereby 1 Svedberg unit equals 10⁻¹³ s.

It is now known that dystrophin is tightly associated with at least 10 different surface proteins that are involved in membrane stabilisation, receptor anchoring and signal
transduction processes. The successful characterisation of the dystrophin–glycoprotein complex by sedimentation analysis is an excellent example of how centrifugation methodology can be exploited to gain biochemical knowledge of a newly discovered protein quickly.
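The marker comparison used to size the dystrophin–glycoprotein complex can be sketched as a simple interpolation. The example assumes that, over the part of the gradient spanned by the two markers, peak position varies approximately linearly with sedimentation coefficient. That is a simplification, and the peak fraction numbers below are hypothetical values chosen for illustration, not data read from Fig. 3.9. Only the marker values (16 S and 19 S) and the definition of the Svedberg unit come from the text.

```python
SVEDBERG_IN_SECONDS = 1e-13  # 1 S = 10^-13 s


def interpolate_s_value(sample_peak, marker_a, marker_b):
    """Estimate a sedimentation coefficient (in S) by linear interpolation.

    marker_a, marker_b: (peak_fraction, s_value) pairs for two standards.
    sample_peak: gradient fraction in which the unknown complex peaks.
    """
    (fraction_a, s_a), (fraction_b, s_b) = marker_a, marker_b
    if fraction_a == fraction_b:
        raise ValueError("the two markers must peak in different fractions")
    # Relative position of the sample between the two marker peaks.
    position = (sample_peak - fraction_a) / (fraction_b - fraction_a)
    return s_a + position * (s_b - s_a)


# Hypothetical peak fractions (assumed, not taken from the figure).
beta_galactosidase = (5.0, 16.0)   # (peak fraction, S value)
thyroglobulin = (8.0, 19.0)
dgc_peak_fraction = 7.0

dgc_s = interpolate_s_value(dgc_peak_fraction, beta_galactosidase, thyroglobulin)
print(f"DGC: about {dgc_s:.0f} S, i.e. {dgc_s * SVEDBERG_IN_SECONDS:.1e} s")
```

In practice more than two markers would normally be run, and a calibration line fitted through all of them gives a more robust estimate than interpolation between a single pair.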

3.6 SUGGESTIONS FOR FURTHER READING

Burgess, N. K., Stanley, A. M. and Fleming, K. G. (2008). Determination of membrane protein molecular weights and association equilibrium constants using sedimentation equilibrium and sedimentation velocity. Methods in Cell Biology, 84, 181–211. (Focuses on the centrifugal analysis of interactions between integral membrane proteins.)
Cole, J. L., Lary, J. W., Moody, T. P. and Laue, T. M. (2008). Analytical ultracentrifugation: sedimentation velocity and sedimentation equilibrium. Methods in Cell Biology, 84, 143–179. (Provides an excellent synopsis of the applicability of ultracentrifugation to the characterisation of macromolecular behaviour in complex solution.)
Cox, B. and Emili, A. (2006). Tissue subcellular fractionation and protein extraction for use in mass-spectrometry-based proteomics. Nature Protocols, 1, 1872–1878. (Outlines differential centrifugation protocols for the isolation of the nuclear, cytosolic, mitochondrial and microsomal fraction.)
Girard, M., Allaire, P. D., Blondeau, F. and McPherson, P. S. (2005). Isolation of clathrin-coated vesicles by differential and density gradient centrifugation. Current Protocols in Cell Biology, Chapter 3, Unit 3.13. (Describes a typical subcellular fractionation protocol used in modern biochemical applications.)
Klassen, R., Fricke, J., Pfeiffer, A. and Meinhardt, F. (2008). A modified DNA isolation protocol for obtaining pure RT-PCR grade RNA. Biotechnology Letters, 30, 1041–1044. (Describes a typical centrifugation protocol used for the isolation of DNA and RNA molecules.)

4 Microscopy
S. W. PADDOCK

4.1 Introduction
4.2 The light microscope
4.3 Optical sectioning
4.4 Imaging living cells and tissues
4.5 Measuring cellular dynamics
4.6 The electron microscope (EM)
4.7 Image archiving
4.8 Suggestions for further reading

4.1 INTRODUCTION

Biochemical analysis is frequently accompanied by microscopic examination of tissue, cell or organelle preparations. Such examinations are used in many different applications, for example: to evaluate the integrity of samples during an experiment; to map the fine details of the spatial distribution of macromolecules within cells; to directly measure biochemical events within living tissues.

There are two fundamentally different types of microscope: the light microscope and the electron microscope (Fig. 4.1). Light microscopes use a series of glass lenses to focus light in order to form an image whereas electron microscopes use electromagnetic lenses to focus a beam of electrons. Light microscopes are able to magnify to a maximum of approximately 1500 times whereas electron microscopes are capable of magnifying to a maximum of approximately 200 000 times. Magnification is not the best measure of a microscope, however. Rather, resolution, the ability to distinguish between two closely spaced points in a specimen, is a much more reliable estimate of a microscope’s utility. Standard light microscopes have a lateral resolution limit of about 0.5 micrometres (µm) for routine analysis. In contrast, electron microscopes have a lateral resolution of up to 1 nanometre (nm). Both living and dead specimens are viewed with a light microscope, and often in real colour, whereas only dead ones are viewed with an electron microscope, and never in real colour.

[Figure: side-by-side schematics. Light microscope: light source, condenser lens, specimen on a slide under a coverslip, objective lens, eyepiece, eye or digital camera (resolution limit 0.2 µm; live and dead cells). Transmission electron microscope: electron gun, condenser lens, specimen on an EM grid, objective lens, projector lens, viewing screen or digital camera (resolution limit 1 nm; dead cells only).]
Fig. 4.1 Light and electron microscopy. Schematic that compares the path of light through a compound light microscope (LM) with the path of electrons through a transmission electron microscope (TEM). Light from a lamp (LM) or a beam of electrons from an electron gun (TEM) is focussed at the specimen by a glass condenser lens (LM) or electromagnetic lenses (TEM). For the LM the specimen is mounted on a glass slide with a coverslip placed on top, and for the TEM the specimen is placed on a copper or gold electron microscope grid. The image is magnified with an objective lens, glass in the LM and electromagnetic lens in the TEM, and projected onto a detector with the eyepiece lens in the LM or the projector lens in the TEM. The detector can be the eye or a digital camera in the LM or a phosphorescent viewing screen or digital camera in the TEM. (Light and EM images courtesy of Tatyana Svitkina, University of Pennsylvania, USA.)

Computer enhancement methods have improved upon the 0.5 µm resolution limit of the light microscope down to 20 nm resolution in some
specialised applications, for example using total internal reflection fluorescence (TIRF) microscopy (Section 4.3.5).

Applications of the microscope in biomedical research may be relatively simple and routine; for example, a quick check of the status of a preparation or of the health of cells growing in a plastic dish in tissue culture. Here, a simple bench-top light microscope is perfectly adequate.

[Figure: logarithmic size scale from 0.1 nm to 1 mm showing small molecules, globular proteins, ribosomes, viruses, organelles, bacteria, yeast, animal cells, plant cells, embryos and C. elegans, with bars for the electron microscope, light microscope, MRI and human eye.]
Fig. 4.2 The relative sizes of a selection of biological specimens and some of the devices used to image them. The range of resolution for each instrument is included in the dark bars at the base of the figure. MRI, magnetic resonance imaging.

On the other hand, the application may be more involved, for example, measuring the concentration of calcium in a living embryo
over a millisecond timescale. Here a more advanced light microscope (often called an imaging system) is required. Some microscopes are more suited to specific applications than others. There may be constraints imposed by the specimen. Images may be required from specimens of vastly different sizes and at vastly different magnifications (Fig. 4.2): from whole animals (metres), through tissues and embryos (micrometres), down to cells, proteins and DNA (nanometres). The study of living cells may require time resolution ranging from days, for example when imaging neuronal development or disease processes, to milliseconds, for example when imaging cell signalling events. The field of microscopy has undergone a renaissance over the past 20 years with many technological improvements to the instruments. Most images produced by microscopes are now recorded electronically using digital imaging techniques – digital cameras, digital image acquisition software, digital printing and digital display methods. In addition, vast improvements have been made in the biological aspects of specimen preparation. These advancements on both fronts have fostered many more applications of the microscope in biomedical research.

4.2 THE LIGHT MICROSCOPE

4.2.1 Basic components of the light microscope

The simplest form of light microscope consists of a single glass lens mounted in a metal frame – a magnifying glass. Here the specimen requires very little preparation, and is usually held close to the eye in the hand. Focussing of the region of interest is achieved by moving the lens and the specimen relative to one another. The source of light is usually the Sun or ambient indoor light. The detector is the human eye. The recording device is a hand drawing or an anecdote.

Compound microscopes

All modern light microscopes are made up of more than one glass lens in combination. The major components are the condenser lens, the objective lens and the eyepiece lens, and such instruments are therefore called compound microscopes (Fig. 4.1). Each of these components is in turn made up of combinations of lenses, which are necessary to produce magnified images with reduced artifacts and aberrations. For example, chromatic aberration occurs when different wavelengths of light are separated and pass through a lens at different angles. This results in rainbow colours around the edges of objects in the image. This problem was encountered in the early microscopes of van Leeuwenhoek and Hooke, for example. All modern lenses are now corrected to some degree in order to avoid this problem.

The main components of the compound light microscope include a light source that is focussed at the specimen by a condenser lens. Light that either passes through the specimen (transmitted light) or is reflected back from the specimen (reflected light) is focussed by the objective lens into the eyepiece lens. The image is either viewed directly by eye in the eyepiece or, most often, projected onto a detector, for example photographic film or, more likely, a digital camera. The images are displayed on the screen of a computer imaging system, stored in a digital format and reproduced using digital methods.
The part of the microscope that holds all of the components firmly in position is called the stand. There are two basic types of compound light microscope stand – an upright or an inverted microscope (Fig. 4.3). The light source is below the condenser lens in the upright microscope and the objectives are above the specimen stage. This is the most commonly used format for viewing specimens. The inverted microscope is engineered so that the light source and the condenser lens are above the specimen stage, and the objective lenses are beneath it. Moreover, the condenser and light source can often be swung out of the light path. This allows additional room for manipulating the specimen directly on the stage, for example, for the microinjection of macromolecules into tissue culture cells, for in vitro fertilisation of eggs or for viewing developing embryos over time. The correct illumination of the specimen is critical for achieving high-quality images and photomicrographs. This is achieved using a light source. Typically light sources are mercury lamps, xenon lamps, lasers or light-emitting diodes (LEDs).

Fig. 4.3 Two basic types of compound light microscope. An upright light microscope (a) and an inverted light microscope (b). Note how there is more room available on the stage of the inverted microscope (b). This instrument is set up for microinjection with a needle holder to the left of the stage.

Light from the light source passes into the condenser lens, which is mounted beneath the microscope stage in an upright microscope (and above the stage in an inverted microscope) in a bracket that can be raised and lowered for focussing (Fig. 4.3). The condenser focusses light from the light source and illuminates the specimen with parallel beams of light. A correctly positioned condenser lens produces illumination that is uniformly bright and free from glare across the viewing area of the specimen (Koehler illumination). Condenser misalignment and an improperly adjusted condenser aperture diaphragm are major sources of poor images in the light microscope.

The specimen stage is a mechanical device that is finely engineered to hold the specimen firmly in place (Fig. 4.4). Any movement or vibration will be detrimental to the final image. The stage enables the specimen to be moved and positioned in fine and smooth increments in the X and the Y directions for locating a region of interest. The stage is moved vertically in the Z direction for focussing the specimen or, for inverted microscopes, the objectives themselves are moved and the stage remains fixed. There are usually coarse and fine focussing controls for low magnification and high magnification viewing respectively. The fine focus control can be moved in increments of 1 µm or better in the best research microscopes. The specimen stage can either be moved by hand or by a stepper motor attached to the fine focus control of the microscope, and controlled by a computer.

Fig. 4.4 The objective lens. A selection of objective lenses mounted on an upright research grade compound light microscope. From the inscription on the two lenses in focus they are relatively low magnification, 10× and 5×, of numerical aperture (NA) 0.3 and 0.16 respectively. Both lenses are Plan Neofluar, which means they are relatively well corrected. The 10× lens is directly above a specimen mounted on a slide and coverslip, and held in place on the specimen stage.

The objective lens is responsible for producing the magnified image, and can be the most expensive component of the light microscope (Fig. 4.4). Objectives are available
in many different varieties, and there is a wealth of information inscribed on each one. This may include the manufacturer, magnification (4×, 10×, 20×, 40×, 60×, 100×), immersion requirements (air, oil or water), coverslip thickness (usually 0.17 mm) and often more-specialised optical properties of the lens (Section 4.2.3). In addition, lens corrections for optical artifacts such as chromatic aberration and flatness of field may also be included in the lens description. For example, words such as fluorite, the least corrected (often shortened to ‘fluo’), or plan apochromat, the most highly corrected (often shortened to ‘plan’ or ‘plan apo’), may appear somewhere on the lens. Objective lenses can either be dry (glass/air/coverslip) or immersion lenses (glass/oil or water/coverslip). As a rule of thumb, most objectives below 40× are air (dry) objectives, and those of 40× and above are immersion (oil, glycerol or water). Should the objective be designed to operate in oil it will be labelled ‘OIL’ or ‘OEL’. Other immersion media include glycerol and water, and the lens will be marked to indicate this. Many lenses are colour-coded to a manufacturer’s specifications. Dipping lenses are specially designed to work without a coverslip, and are dipped directly into water or tissue culture medium. These are used for physiological experiments. The numerical aperture (NA) is always marked on the lens. This is a number usually between 0.04 and 1.4. The NA is a measure of the ability of a lens to collect light from the specimen. Lenses with a low NA collect less light than those with a high NA.
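The range of NA values quoted above follows from the usual definition of numerical aperture, NA = n sin θ, where n is the refractive index of the medium between the front lens and the coverslip and θ is the half-angle of the cone of light that the lens accepts. This definition is standard optics rather than something stated in this section, and the refractive indices and half-angles in the sketch below are typical illustrative values, not manufacturer data.

```python
import math

# Typical refractive indices of immersion media (illustrative values).
REFRACTIVE_INDEX = {"air": 1.000, "water": 1.333, "glycerol": 1.47, "oil": 1.515}


def numerical_aperture(medium, half_angle_deg):
    """Return NA = n * sin(theta) for a medium and acceptance half-angle."""
    if not 0.0 < half_angle_deg < 90.0:
        raise ValueError("half-angle must lie between 0 and 90 degrees")
    n = REFRACTIVE_INDEX[medium]
    return n * math.sin(math.radians(half_angle_deg))


# Assumed half-angles chosen to span low-power dry to high-power oil lenses.
examples = [("air", 17.5), ("air", 72.0), ("water", 64.0), ("oil", 67.5)]
for medium, half_angle in examples:
    na = numerical_aperture(medium, half_angle)
    print(f"{medium:>8s}, half-angle {half_angle:4.1f} degrees: NA = {na:.2f}")
```

Because sin θ cannot exceed 1, a dry lens cannot reach an NA of 1.0, which is why the highest values (around 1.4) are found only on oil immersion objectives.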

Table 4.1 Resolution in optical imaging

                            xy         z
Standard microscope         0.5 µm     1.6 µm
Confocal/multiple photon    0.25 µm    0.7 µm
TIRF – evanescent wave      0.5 µm     0.3 µm

[Figure: spectrum of ‘white’ light on a wavelength scale from 300 to 850 nanometres (1 nanometre = 0.001 micrometres), showing ultraviolet (UV, invisible), the visible wavelengths (violet, blue, green, yellow, red) and infrared (IR, invisible).]
Fig. 4.5 The visible spectrum – the spectrum of white light visible to the human eye. Our eyes are able to detect colour in the visible wavelengths of the spectrum, usually in the region between 400 nm (violet) and 750 nm (red). Most modern electronic detectors are sensitive beyond the visible spectrum of the human eye.

The smallest resolvable distance varies inversely with NA, which implies that higher NA objectives yield the best resolution. Generally speaking the higher-power objectives have a higher NA and better resolution than the lower-power lenses with lower NAs. For example, 0.2 µm resolution can only be achieved using a 100× plan-apochromat oil immersion lens with an NA of 1.4. Should there be a choice between two lenses of the same magnification, then it is usually best to choose the one of higher NA.

The objective lens is also the part of the microscope that can most easily be damaged by mishandling. Many lenses are coated with a protective coating but even so, one scratch on the front of the lens can result in serious image degradation. Therefore, great care should be taken when handling objective lenses. Objective lenses must be cleaned using a protocol recommended by the manufacturer, and only by a qualified person. A dirty objective lens is a major source of poor images.

The resolution achieved by a lens is a measure of its ability to distinguish between two objects in the specimen. The shorter the wavelengths of illuminating light the higher the resolving power of the microscope (Fig. 4.5). The limit of resolution for a microscope that uses visible light is about 300 nm with a dry lens (in air) and 200 nm with an oil immersion lens. By using ultraviolet light (UV) as a light source the resolution can be improved to 100 nm because of the shorter wavelength of the light (200–300 nm). These limits of resolution are often difficult to achieve practically because of aberrations in the lenses and the poor optical properties of many biological specimens. The lateral resolution is usually higher than the axial resolution for any given objective lens (Table 4.1).

Fig. 4.6 A research-grade stereomicroscope. Note the light source is from the side, which can give a shadow effect to the specimen; in this example a vial of fruit flies. The large objective lens above the specimen can be rotated to zoom the image.

The eyepiece (sometimes referred to as the ocular) works in combination with the objective lens to further magnify the image, and allows it to be detected by eye or more
usually to project the image into a digital camera for recording purposes. Eyepieces usually magnify by 10×, since an eyepiece of higher magnification merely enlarges the image with no improvement in resolution. There is an upper boundary to the useful magnification of the collection of lenses in a microscope. For each objective lens the magnification can be increased above a point where it is impossible to resolve any more detail in the specimen. Any magnification above this point is often called empty magnification. The best way to gain useful magnification is to use an objective lens of higher magnification and higher NA. Should sufficient resolution not be achieved using the light microscope, then it will be necessary to use the electron microscope (Section 4.6).

In addition to the human eye and photographic film there are two types of electronic detectors employed on modern light microscopes. The first are area detectors, which form an image directly, for example video cameras and charge-coupled devices (CCDs). The second are point detectors, which measure intensities in the image, for example photomultiplier tubes (PMTs) and photodiodes. Point detectors are capable of producing images in scanning microscopy (Section 4.3).

Stereomicroscopes

A second type of light microscope, the stereomicroscope, is used for the observation of the surfaces of large specimens (Fig. 4.6). The microscope is used when 3D information is required, for example for the routine observation of whole organisms when screening through vials of fruit flies. Stereomicroscopes are useful for micromanipulation and dissection, where the wide field of view and the ability to zoom in and out in magnification are invaluable. A wide range of objectives and eyepieces are available for different applications. The light sources can be from above, from below the specimen, encircling the specimen using a ring light or from the side giving a darkfield effect (Section 4.2.3). These different light angles serve to add contrast or shadow relief to the images.
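The resolution limits quoted in this section can be checked against the Rayleigh criterion, d = 0.61λ/NA, a standard approximation for the smallest resolvable lateral distance. The criterion itself is not given in the text, and the wavelengths and NA values in this sketch are illustrative choices rather than figures from the chapter.

```python
def rayleigh_limit_nm(wavelength_nm, numerical_aperture):
    """Smallest resolvable lateral distance (nm) by the Rayleigh criterion."""
    if wavelength_nm <= 0 or numerical_aperture <= 0:
        raise ValueError("wavelength and NA must both be positive")
    return 0.61 * wavelength_nm / numerical_aperture


# Assumed lens and illumination combinations (illustrative only).
examples = [
    ("dry lens, visible light", 500.0, 0.95),
    ("oil immersion lens, visible light", 500.0, 1.40),
    ("oil immersion lens, UV light", 250.0, 1.40),
]

for label, wavelength, na in examples:
    limit = rayleigh_limit_nm(wavelength, na)
    print(f"{label}: about {limit:.0f} nm")
```

With these assumed inputs the estimates come out at roughly 320 nm, 220 nm and 110 nm, close to the approximate limits of 300 nm, 200 nm and 100 nm given in the text. The calculation also shows why extra eyepiece magnification is empty: it changes neither the wavelength nor the NA.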

4.2.2 The specimen

The specimen (sometimes called the sample) can be the entire organism or a dissected organ (whole mount); an aliquot collected during a biochemical protocol for a quick check of the preparation; or a small part of an organism (biopsy) or smear of blood or spermatozoa. In order to collect images from it, the specimen must be in a form that is compatible with the microscope. This is achieved using a published protocol. The end product of a protocol is a relatively thin and somewhat transparent piece of tissue mounted on a piece of glass (slide) in a mounting medium (water, tissue culture medium or glycerol) with a thin square of glass mounted on top (coverslip).

Coverslips are graded by their thickness. The thinnest ones are labelled #1, which corresponds to a thickness of approximately 0.17 mm. The coverslip side of the specimen is always placed closest to the objective lens. It is essential to use a coverslip that is optically matched to the objective lens in order to achieve optimal resolution. This is critical for high-magnification imaging because if the coverslip is too thick it will be impossible to achieve an image.

The goal of a specimen preparation protocol is to render the tissue of interest into a form for optimal study in the microscope. This usually involves placing the specimen in a suitable medium on a glass slide with a coverslip over it. Such protocols can be relatively simple or they may involve a lengthy series of many steps that take several days to complete (Table 4.2). An example of a simple protocol would be taking an aliquot of a biological preparation, for example, isolating living spermatozoa into a balanced salt solution, placing an aliquot of it onto a slide and gently placing a clean coverslip onto the top. The entire protocol would take less than a minute. The coverslip is sealed to the glass slide in some way, for example, using nail polish for dead cells or perhaps a mixture of beeswax and Vaseline for living cells.
Shear forces from the movement of the coverslip over the glass slide can cause damage to the specimen or the objective lens. In order to keep cells alive on the stage of the microscope, they are usually mounted in some form of chamber, and if necessary heated. Many specimens are too thick to be mounted directly onto a slide, and these are cut into thin sections using a device called a microtome. The tissue is usually mounted in a block of wax and cut with the knife of the microtome into thin sections (between 100 µm and 500 µm in thickness). The sections are then placed onto a glass slide, stained and sealed with mounting medium with a coverslip.

Table 4.2 Generalised indirect immunofluorescence protocol
1. Fix in 1% formaldehyde for 30 min
2. Rinse in cold buffer
3. Block buffer
4. Incubate in primary antibody, e.g. mouse anti-tubulin
5. Wash 4× in buffer
6. Incubate in secondary antibody, e.g. fluorescein-labelled rabbit anti-mouse
7. Wash 4× in buffer
8. Incubate in anti-fade reagent, e.g. Vectashield
9. Mount on slide with a coverslip
10. View using epifluorescence microscopy

Some samples are frozen, and cut on a cryostat, which is basically a microtome that can
keep a specimen in the frozen state, and produce frozen sections more suitable for immunolabelling (Section 4.2.3). Prior to sectioning, the tissue is usually treated with a chemical agent called a fixative to preserve it. Popular fixatives include formaldehyde and glutaraldehyde, which act by cross-linking proteins, or alcohols, which act by precipitation. All of these fixatives are designed to maintain the structural integrity of the cell. After fixation the specimen is usually permeabilised in order to allow a stain to infiltrate the entire tissue. The amount of permeabilisation (time and severity) depends upon several factors; for example, the size of the stain or the density of the tissue. These parameters are found by trial and error for a new specimen, but are usually available in published protocols. The goal is to infiltrate the entire tissue with a uniform staining.

4.2.3 Contrast in the light microscope

Most cells and tissues are colourless and almost transparent, and lack contrast when viewed in a light microscope. Therefore to visualise any details of cellular components it is necessary to introduce contrast into the specimen. This is achieved either by optical means using a specific configuration of microscope components, or by staining the specimen with a dye or, more usually, using a combination of optical and staining methods. Different regions of the cell can be stained selectively with different stains.

Optical contrast

Contrast is achieved optically by introducing various elements into the light path of the microscope and using lenses and filters that change the pattern of light passing
through the specimen and the optical system. This can be as simple as adding a piece of coloured glass or a neutral density filter into the illuminating light path; by changing the light intensity; or by adjusting the diameter of a condenser aperture. Usually all of these operations are adjusted until an acceptable level of contrast is achieved for imaging. The most basic mode of the light microscope is called brightfield (bright background), which can be achieved with the minimum of optical elements. Contrast in brightfield images is usually produced by the colour of the specimen itself. Brightfield is therefore used most often to collect images from pigmented tissues or histological sections or tissue culture cells that have been stained with colourful dyes (Figs. 4.7a, 4.8b). Several configurations of the light microscope have been introduced over the years specifically to add contrast to the final image. Darkfield illumination produces images of brightly illuminated objects on a black background (Figs. 4.7b, 4.8a). This technique has traditionally been used for viewing the outlines of objects in liquid media such as living spermatozoa, microorganisms or cells growing in tissue culture, or for a quick check of the status of a biochemical preparation. For lower magnifications, a simple darkfield setting on the condenser will be sufficient. For more critical darkfield imaging at a higher magnification, a darkfield condenser with a darkfield objective lens will be required. Phase contrast is used for viewing unstained cells growing in tissue culture and for testing cell and organelle preparations for lysis (Fig. 4.7c,d). The method images differences in the refractive index of cellular structures. Light that passes through thicker parts of the cell is held up relative to the light that passes through thinner parts of the cytoplasm. It requires a specialised phase condenser and phase objective lenses (both labelled ‘ph’). 
Each phase setting of the condenser lens is matched with the phase setting of the objective lens. These are usually numbered as Phase 1, Phase 2 and Phase 3, and are found on both the condenser and the objective lens. Differential interference contrast (DIC) is a form of interference microscopy that produces images with a shadow relief (Fig. 4.7e, f ). It is used for viewing unstained cells in tissue culture, eggs and embryos, and in combination with some stains. Here the overall shape and relief of the structure is viewed using DIC and a subset of the structure is stained with a coloured dye (Fig. 4.8c). Fluorescence microscopy is currently the most widely used contrast technique since it gives superior signal-to-noise ratios (typically white on a black background) for many applications (Fig. 4.9). The most commonly used fluorescence technique is called epifluorescence light microscopy, where ‘epi’ simply means ‘from above’. Here the light source comes from above the sample, and the objective lens acts as both condenser and objective lens (Fig. 4.10). Fluorescence is popular because of the ability to achieve highly specific labelling of cellular compartments. The images usually consist of distinct regions of fluorescence (white) over large regions of no fluorescence (black), which gives excellent signal-to-noise ratios. The light source is usually a high-pressure mercury or xenon vapour lamp, and more recently lasers and LED sources, which emit from the UV into the red wavelengths (Fig. 4.5). A specific wavelength of light is used to excite a fluorescent


4.2 The light microscope

Fig. 4.7 Contrast methods in the light microscope. (a) and (b) A comparison of brightfield (a) and darkfield (b) images. Here the sensory bristles on the surface of the fly appear dark on a white background in the brightfield image (a) and white on a black background in the darkfield image (b). The dark colour in the larger bristles in (a) is produced by pigment. (c) and (d) Phase contrast view of cells growing in tissue culture. Two images extracted from a time-lapse video sequence (time between each frame is 5 min). The sequence shows the movement of a mouse 3T3 fibrosarcoma cell and a chick heart fibroblast. Note the bright ‘phase halo’ around the cells. (e) and (f) Differential interference contrast (DIC) image of two focal planes of the multicellular alga Volvox. (Images (e) and (f) courtesy of Michael Davidson, Florida State University, USA.)

Fig. 4.8 Examples of different preparations in the light microscope. (a) Darkfield image of rat sperm preparation. An aliquot was collected from an experimental protocol in order to assess the amount of damage incurred during sonication of a population of spermatozoa. Many sperm heads can be seen in the preparation, and the fibres of the tail are starting to fray (arrowed). (b) A brightfield image of total protein staining on a section of a fly eye cut on a microtome, and stained with Coomassie blue. (c) DIC image of a stained Drosophila embryo – the DIC image shows the outline of the embryo with darker regions of neuronal staining. The DIC image of the whole embryo provides structural landmarks for placing the specific neuronal staining in context of the anatomy.


Fig. 4.9 Fluorescence microscopy. Comparison of epifluorescence and confocal fluorescence imaging of a mitotic spindle labelled using indirect immunofluorescence labelling with anti-tubulin (primary antibody) and a fluorescently labelled secondary antibody. The specimen was imaged using (a) conventional epifluorescence light microscopy or (b) and (c) using laser scanning confocal microscopy. Note the improved resolution of microtubules in the two confocal images (b) and (c) as compared with the conventional image (a). (b) and (c) represent two different resolution settings of the confocal microscope. Image (b) was collected with the pinhole set to a wider aperture than (c). (Images kindly provided by Brad Amos, University of Cambridge, UK.)

Labels in Fig. 4.10 (diagram and inset): CCD array, emission filter, lens, light source, dichromatic mirror, excitation filter, objective lens, immersion medium, coverslip, specimen, slide.

Fig. 4.10 Epifluorescence microscopy. Light from a xenon or mercury arc lamp (Light source) passes through a lens and the excitation filter and reflects off the dichromatic mirror into the objective lens. The objective lens focusses the light at the specimen via the immersion medium (usually immersion oil) and the glass coverslip (see insert). Any light resulting from the fluorescence excitation in the specimen passes back through the objective lens, and since it is of longer wavelength than the excitation light, it passes through the dichromatic mirror. The emission filter only allows light of the specific emission wavelength of the fluorochrome of interest to pass through to the CCD array, where an image is formed.


molecule or fluorophore in the specimen (Fig. 4.10). Light of longer wavelength emitted by the excited fluorophore is then imaged. This is achieved in the fluorescence microscope using combinations of filters that are specific for the excitation and emission characteristics of the fluorophore of interest. There are usually three main components: an excitation filter, a dichromatic mirror (often called a dichroic) and a barrier (emission) filter, mounted in a single housing above the objective lens. For example, the commonly used fluorophore fluorescein is efficiently excited at a wavelength of 488 nm, close to its excitation maximum of 496 nm, and emits maximally at 518 nm (Table 4.3). A set of glass filters for viewing fluorescein requires that all wavelengths of light from the lamp be blocked except for the 488 nm light. The excitation (exciter) filter passes a maximum amount of 488 nm light, which is then directed to the specimen by the dichromatic mirror. Any fluorescein label in the specimen is excited by the 488 nm light, and the resulting 518 nm light that returns from the specimen passes through both the dichromatic mirror and the barrier filter to the detector. The barrier filter allows only light of about 518 nm to pass, which ensures that only the signal emitted from the fluorochrome of interest reaches the detector. Dichromatic mirrors and filters can also be designed to pass two or three specific wavelength bands for imaging specimens labelled with two or more fluorochromes (multiple labelling). The fluorescence emitted from the specimen is often too weak to be detected by the human eye, or it may lie outside the wavelength range that the eye can detect, for example in the far-red wavelengths (Fig. 4.6). A sensitive detector, for example a CCD camera or a photomultiplier tube (PMT), easily detects such signals.

Specimen stains Contrast can be introduced into the specimen using one or more coloured dyes or stains. These can be non-specific stains, for example a general protein stain such as Coomassie blue (Fig. 4.8), or stains that specifically label an organelle, for example the nucleus or the mitochondria. Combinations of such dyes may be used to stain different organelles in contrasting colours. Most of these histological stains are observed using brightfield imaging. Other light microscopy techniques may also be employed in order to view the entire tissue along with the stained tissue. For example, one can use DIC to view the entire morphology of an embryo and a coloured stain to image the spatial distribution of the protein of interest within the embryo (Fig. 4.8). More specific dyes are usually used in conjunction with fluorescence microscopy. Immunofluorescence microscopy is used to map the spatial distribution of macromolecules in cells and tissues. The method takes advantage of the highly specific binding of antibodies to proteins. Antibodies are raised to the protein of interest and labelled with a fluorescent probe. This probe is then used to label the protein of interest in the cell and can be imaged using fluorescence microscopy. In practice, cells are usually labelled using indirect immunofluorescence. Here the antibody to the protein of interest (primary antibody) is itself detected with a second antibody carrying the fluorescent tag (secondary antibody). Such a protocol gives a higher fluorescent signal than using a single fluorescently labelled antibody (Table 4.2).


Table 4.3 Table of fluorophores

Dye                                 Excitation max. (nm)    Emission max. (nm)

Commonly used fluorophores
Fluorescein (FITC)                  496                     518
Bodipy                              503                     511
CY3                                 554                     568
Tetramethylrhodamine                554                     576
Lissamine rhodamine                 572                     590
Texas red                           592                     610
CY5                                 652                     672

Nuclear dyes
Hoechst 33342                       346                     460
DAPI                                359                     461
Acridine orange                     502                     526
Propidium iodide                    536                     617
TOTO3                               642                     661
Ethidium bromide                    510                     595
Feulgen                             570                     625

Calcium indicators
Fluo-3                              506                     526
Calcium green                       506                     533

Reporter molecules
CFP (cyan fluorescent protein)      443/445                 475/503
GFP (green fluorescent protein)     395/489                 509
YFP (yellow fluorescent protein)    514                     527
DsRed                               558                     583

Mitochondria
JC-1                                514                     529
Rhodamine 123                       507                     529
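As a small illustration of how the excitation and emission maxima in Table 4.3 might be used when choosing filters or laser lines, the following Python sketch stores a few rows of the table, reports the Stokes shift of each dye (emission maximum minus excitation maximum) and picks the closest of the three krypton–argon laser lines quoted in the caption to Fig. 4.13. The function names and the selection of dyes are illustrative assumptions; this is not part of any microscope control software.

```python
# Excitation and emission maxima (nm) copied from Table 4.3.
FLUOROPHORES = {
    "Fluorescein (FITC)": (496, 518),
    "Lissamine rhodamine": (572, 590),
    "CY5": (652, 672),
    "DAPI": (359, 461),
}

# Krypton-argon laser lines (nm) quoted in the caption to Fig. 4.13.
LASER_LINES = (488, 568, 647)


def stokes_shift(dye):
    """Return emission maximum minus excitation maximum, in nm."""
    excitation, emission = FLUOROPHORES[dye]
    return emission - excitation


def nearest_laser_line(dye, lines=LASER_LINES):
    """Return the laser line closest to the dye's excitation maximum."""
    excitation, _ = FLUOROPHORES[dye]
    return min(lines, key=lambda line: abs(line - excitation))


for dye in FLUOROPHORES:
    print(dye, stokes_shift(dye), nearest_laser_line(dye))
```

Fluorescein, lissamine rhodamine and CY5 each fall within a few nanometres of one laser line, which is consistent with their use together in Fig. 4.13. The nearest line to DAPI is more than 100 nm from its excitation maximum, so this simple rule would flag it as poorly matched to such a laser.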


Additional methods are available for amplifying the fluorescence signal, either in the specimen, for example using the tyramide amplification method, or at the microscope, for example by using a more sensitive detector. A related technique, fluorescence in situ hybridisation (FISH), exploits the specificity of fluorescently labelled DNA or RNA sequences. The nucleic acid probes are hybridised to chromosomes, nuclei or cellular preparations, and regions that bind the probe are imaged using fluorescence microscopy. Many different probes can be labelled with different fluorochromes in the same preparation. Multiple-colour FISH is used extensively for the clinical diagnosis of inherited genetic diseases, and has been applied to the rapid screening of chromosomal and nuclear abnormalities, for example in Down’s syndrome. There are many different types of fluorescent molecules that can be attached to antibodies, DNA or RNA probes for fluorescence analysis (Table 4.3). All of these reagents, including primary antibodies, are available commercially or, often, from the laboratories that produced them. An active area of development is the production of brighter fluorescent probes that are excited by a narrow wavelength band and that resist damage by the excitation light (photobleaching). Traditional examples of such fluorescent probes include fluorescein, rhodamine, the Alexa range of dyes and the cyanine dyes. A recent addition to the extensive list of probes for imaging is the quantum dot. Quantum dots are not conventional dye molecules; rather, they are nanocrystals of different sizes that glow in different colours in laser light. The colours depend on the size of the dots, and they have the advantage that they are highly resistant to photobleaching.

4.3 OPTICAL SECTIONING Many images collected from relatively thick specimens produced using epifluorescence microscopy are not very clear. This is because the image is made up of the optical plane of interest together with contributions from fluorescence above and below the focal plane of interest. Since the conventional epifluorescence microscope collects all of the information from the specimen, it is often referred to as a wide field microscope. The ‘out-of-focus fluorescence’ can be removed using a variety of optical and electronic techniques to produce optical sections (Fig. 4.9). The term optical section refers to a microscope’s ability to produce sharper images of specimens than those produced using a standard wide field epifluorescence microscope by removing the contribution from out-of-focus light to the image, and in most cases, without resorting to physically sectioning the tissue. Such methods have revolutionised the ability to collect images from thick and fluorescently labelled specimens such as eggs, embryos and tissues. Optical sections can also be produced using high-resolution DIC optics (Fig. 4.7e, f), micro computerised tomography (CT) scanning or optical projection tomography. However, currently by far the most prevalent method is using some form of confocal or associated microscopical approach.


Fig. 4.11 Information flow in a generic LSCM. Light from the laser (A) passes through a neutral density filter (B) and an exciter filter (C) on its way to the scanning unit (D). The scanning unit produces a scanned beam at the back focal plane of the objective lens (E) which focusses the light at the specimen (F). The specimen is scanned in the X and the Y directions in a raster pattern and in the Z direction by fine focussing (arrows). Any fluorescence from the specimen passes back through the objective lens and the scanning unit and is directed via dichromatic mirrors (G) to three pinholes (H). The pinholes act as spatial filters to block any light from above or below the plane of focus in the specimen. The point of light in the specimen is confocal with the pinhole aperture. This means that only distinct regions of the specimen are sampled. Light that passes through the pinholes strikes the PMT detectors (I) and the signal from the PMT is built into an image in the computer (J). The image is displayed on the computer screen (K) often as three greyscale images (K1, K2 and K3) together with a merged colour image of the three greyscale images (K4 and Fig. 4.13a, see colour section). The computer synchronises the scanning mirrors with the build-up of the image in the computer framestore. The computer also controls a variety of peripheral devices. For example, the computer controls and correlates movement of a stepper motor connected to the fine focus of the microscope with image acquisition in order to produce a Z-series. Furthermore the computer controls the area of the specimen to be scanned by the scanning unit so that zooming is easily achieved by scanning a smaller region of the specimen. In this way, a range of magnifications is imparted to a single objective lens so that the specimen does not have to be moved when changing magnification. 
Images are written to the hard disk of the computer or exported to various devices for viewing, hardcopy production or archiving (L).

4.3.1 Laser scanning confocal microscopes (LSCM) Optical sections are produced in the laser scanning confocal microscope by scanning the specimen point by point with a laser beam focussed in the specimen, and using a spatial filter, usually a pinhole (or a slit), to remove unwanted fluorescence from above and below the focal plane of interest (Fig. 4.11). The power of the confocal


Fig. 4.12 Computer 3D reconstruction of confocal images. (a) Sixteen serial optical sections collected at 0.3 μm intervals through a mitotic spindle of a PtK1 cell stained with anti-tubulin and a second rhodamine-labelled antibody. Using the Z-series macro program a preset number of frames can be summed, and the images transferred into a file on the hard disk. The stepper motor moves the fine focus control of the microscope by a preset increment. (b) Three-dimensional reconstruction of the data set produced using computer 3D reconstruction software. Such software can be used to view the data set from any specified angle or to produce movies of the structure rotating in 3D.

approach lies in the ability to image structures at discrete levels within an intact biological specimen. There are two major advantages of using the LSCM in preference to conventional epifluorescence light microscopy. Glare from out-of-focus structures in the specimen is reduced, and resolution is increased both laterally in the X and the Y directions (0.14 μm) and axially in the Z direction (0.23 μm). Image quality of some relatively thin specimens, for example chromosome spreads and the leading lamellipodium of cells growing in tissue culture (<0.2 μm thick), is not dramatically improved by the LSCM, whereas thicker specimens such as fluorescently labelled multicellular embryos can only be imaged using the LSCM. For successful confocal imaging, a minimum number of photons should be used to efficiently excite each fluorescent probe labelling the specimen, and as many of the emitted photons from the fluorochromes as possible should make it through the light path of the instrument to the detector. The LSCM has found many different applications in biomedical imaging. Some of these applications have been made possible by the ability of the instrument to produce a series of optical sections at discrete steps through the specimen (Fig. 4.12). The optical sections of such a Z-series are all in register with each other, and can be merged together to form a single projection of the image (Z projection) or a 3D representation of the image (3D reconstruction).
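The idea of merging a registered Z-series into a single Z projection can be sketched in a few lines of Python. The text does not say how the sections are combined; the example below assumes a maximum-intensity projection, which is one common choice, and represents each optical section as a small nested list of grey levels. The function names and the toy data are illustrative only.

```python
def max_projection(z_series):
    """Collapse a Z-series into one image by keeping, for every (row, column)
    position, the brightest value found in any optical section."""
    rows = len(z_series[0])
    cols = len(z_series[0][0])
    return [
        [max(section[r][c] for section in z_series) for c in range(cols)]
        for r in range(rows)
    ]


def z_positions(n_sections, step_um):
    """Focus positions (in micrometres) for a Z-series collected in equal steps."""
    return [round(i * step_um, 3) for i in range(n_sections)]


# Three tiny 2 x 3 'optical sections' with arbitrary grey levels.
stack = [
    [[0, 5, 0], [1, 0, 0]],
    [[9, 0, 0], [0, 2, 0]],
    [[0, 0, 7], [0, 0, 3]],
]
projection = max_projection(stack)
print(projection)

# Sixteen sections at 0.3 um intervals, as in the caption to Fig. 4.12.
print(z_positions(16, 0.3))
```

Because the sections are in register, the projection needs no alignment step: each output pixel is computed from the same (row, column) position in every section.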


Fig. 4.13 Optical sectioning. Optical sections produced using laser scanning confocal microscopy. Comparison of alkaline phosphatase (a) and tyramide-amplified detection of mRNAs (b,c). Staining patterns obtained using DIG-labelled antisense probes directed against the CG14217 mRNAs, through conventional AP-based detection (a) or tyramide signal amplification (b), using tyramide–Alexa Fluor 488 (green fluorescence). Close-up images of tyramide-amplified samples are also shown (c). In (b) and (c), nuclei were labelled in red with propidium iodide. (d, e, f, g) Triple-labelled Drosophila embryo at the cellular blastoderm stage. The images were produced using an air-cooled 25 mW krypton argon laser which has three major lines at 488 nm (blue), 568 nm (yellow) and 647 nm (red). The three fluorochromes used were fluorescein (exc. 496 nm; em. 518 nm), lissamine rhodamine (exc. 572 nm; em. 590 nm) and cyanine 5 (exc. 649 nm; em. 666 nm). The images were collected simultaneously as single optical sections into the red, the green and the blue channels respectively, and merged as a three-colour (red/green/blue) image (Fig. 4.11). The image shows the expression of three genes: hairy (in red), Krüppel (in green) and giant (in blue). Regions of overlap of gene expression appear as an additive colour in the image, for example, the two yellow stripes of hairy expression in the Krüppel domain (g). (Images (a), (b) and (c) were kindly provided by Henry Krause, University of Toronto, Canada.) (See also colour plate.)

Multiple-label images can be collected from a specimen labelled with more than one fluorescent probe using multiple laser light sources for excitation (Fig. 4.13, see also colour section). Since all of the images collected at different excitation wavelengths are in register, it is relatively easy to combine them into a single multicoloured image. Here any overlap of staining is viewed as an additive colour change. Most confocal microscopes are able to routinely image three or four different wavelengths simultaneously. The scanning speed of most laser scanning systems is around one full frame per second. This is designed for collecting images from fixed and brightly labelled fluorescent specimens. Such scan speeds are not optimal for living specimens, and laser scanning instruments are available that scan at faster rates better suited to live cell imaging. In addition to point scanning, swept field scanning rapidly moves a μm-thin beam of light horizontally and vertically through the specimen.
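The additive colour change seen where labels overlap follows directly from the way registered greyscale channels are merged. The Python sketch below is a minimal illustration, assuming 8-bit grey levels and a simple assignment of one channel each to red, green and blue; the function name and pixel values are hypothetical.

```python
def merge_rgb(red, green, blue):
    """Combine three registered greyscale images (same size, values 0-255)
    into one image whose pixels are (R, G, B) tuples."""
    return [
        [(r, g, b) for r, g, b in zip(red_row, green_row, blue_row)]
        for red_row, green_row, blue_row in zip(red, green, blue)
    ]


# One row of three pixels per channel.
red = [[255, 255, 0]]
green = [[0, 255, 0]]
blue = [[0, 0, 255]]

merged = merge_rgb(red, green, blue)
print(merged)
```

In this toy example the first pixel carries only the red label, the third only the blue label, and the middle pixel carries both red and green, so it is displayed as yellow, as described for the stripes of overlapping expression in Fig. 4.13.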


Fig. 4.14 Time-lapse imaging of Caenorhabditis elegans development. Z-series were collected every 90 s of a developing C. elegans embryo genetically labelled with GFP-histone (nuclear material) and GFP-alpha tubulin (microtubules – cytoskeleton) and imaged with a spinning disk confocal microscope. Each column consists of six optical sections collected 2 μm apart, and the columns are separated by 90 s increments of time. (Image kindly provided by Dr Kevin O’Connell, National Institutes of Health, USA.)

4.3.2 Spinning disk confocal microscopes The spinning disk confocal microscope employs a different scanning system from the LSCM. Rather than scanning the specimen with a single beam, multiple beams scan the specimen simultaneously, and optical sections are viewed in real time. Modern spinning disk microscopes have been improved significantly by the addition of laser light sources and high-quality CCD detectors to the instrument. Spinning disk systems are generally used in experiments where high-resolution images are collected at a fast rate (high spatial and temporal resolution), and are used to follow the dynamics of fluorescently labelled proteins in living cells (Fig. 4.14).


Panels in Fig. 4.15: wide field; laser scanning; multiple photon.

Fig. 4.15 Illumination in a wide field, a confocal and a multiple photon microscope. The diagram shows a schematic of a side view of a fluorescently labelled cell on a coverslip. The shaded green areas in each cell represent the volume of fluorescent excitation produced by each of the different microscopes in the cell. Conventional epifluorescence microscopy illuminates throughout the cell. In the LSCM fluorescence illumination is throughout the cell but the pinhole in front of the detector excludes the out-of-focus light from the image. In the multiple photon microscope, excitation only occurs at the point of focus where the light flux is high enough.

4.3.3 Multiple photon microscopes The multiple photon microscope has evolved from the confocal microscope. In fact, many of the instruments use the same scanning system as the LSCM. The difference is that the light source is a high-energy pulsed laser with tunable wavelengths, and the fluorochromes are excited by multiple rather than single photons. Optical sections are produced simply by focussing the laser beam in the specimen since multiple photon excitation of a fluorophore only occurs where energy levels are high enough – statistically confined to the point of focus of the objective lens (Fig. 4.15). Since red light is used in multiple photon microscopes, optical sections can be collected from deeper within the specimen than those collected with the LSCM. Multiple photon imaging is generally chosen for imaging fluorescently labelled living cells because red light is less damaging to living cells than the shorter wavelengths usually employed by confocal microscopes. In addition, since the excitation of the fluorophore is restricted to the point of focus in the specimen, there is less chance of over exciting (photobleaching) the fluorescent probe and causing photodamage to the specimen itself (Fig. 4.15).
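The link between multiple photon excitation and the use of longer (redder) wavelengths can be illustrated with the photon energy relation E = hc/λ. The sketch below makes the idealised assumption that two photons of twice the wavelength together deliver the energy of one photon at the single-photon excitation wavelength; real two-photon excitation spectra often deviate from this simple doubling, so the numbers are a guide only. The function names are illustrative.

```python
PLANCK = 6.62607015e-34      # Planck constant, J s
LIGHT_SPEED = 2.99792458e8   # speed of light in vacuum, m/s


def photon_energy_joules(wavelength_nm):
    """Energy of a single photon of the given wavelength, E = h * c / lambda."""
    return PLANCK * LIGHT_SPEED / (wavelength_nm * 1e-9)


def two_photon_wavelength_nm(single_photon_nm):
    """Idealised wavelength at which two photons together carry the
    energy of one photon at single_photon_nm."""
    return 2 * single_photon_nm


single = 488                               # nm, single-photon excitation
double = two_photon_wavelength_nm(single)  # 976 nm under the idealised assumption
print(double)
print(photon_energy_joules(single))
print(2 * photon_energy_joules(double))
```

Under this assumption a dye excited by single photons at 488 nm would be excited by pairs of photons near 976 nm, which lies in the near-infrared.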

4.3.4 Deconvolution Optical sections can be produced using an image processing method called deconvolution to remove the out-of-focus information from the digital image. Such images are computed from conventional wide field microscope images. There are two basic types of deconvolution algorithm: deblurring and restoration. The approach relies upon knowledge of the point spread function of the imaging system. This is usually


Labels in Fig. 4.16: water; vesicles and microtubules outside evanescent field; evanescent field (~100 nm); vesicles and microtubules inside evanescent field; glass.

Fig. 4.16 Total internal reflection microscopy (TIRF). A 100-nm thick region of excitation is produced at the glass–water interface when illumination conditions are right for internal reflection. In this example only those vesicles and microtubules within the evanescent field will contribute to the fluorescence image at 100 nm Z-resolution.

measured by imaging a point source, for example a small sub-resolution fluorescent bead (0.1 μm), and recording how the point is spread out in the microscope. Since it is assumed that the real image of the bead should be a point, it is possible to calculate the amount of distortion in the image of the bead imposed by the imaging system. The actual image of the point can then be restored using a mathematical function, which can be applied to any subsequent images collected under identical settings of the microscope. Early versions of the deconvolution method were relatively slow; for example, it could take some algorithms in the order of hours to compute a single optical section. Deconvolution is now much faster using today’s fast computers and improved software, and the method compares favourably with the confocal approach for producing optical sections. Deconvolution is practical for multiple-label imaging of both fixed and living cells, and excels over the scanning methods for imaging relatively dim and thin specimens, for example yeast cells. The method can also be used to remove additional background from images that were collected with the LSCM, the spinning disk microscope or a multiple photon microscope.
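To make the restoration idea concrete, the following Python sketch blurs a one-dimensional point source with a known point spread function and then recovers it iteratively. The text does not name a particular algorithm; the example assumes the Richardson–Lucy method, which is one widely used restoration scheme, and uses a toy three-element point spread function. Real deconvolution software works on 2D or 3D images with a measured point spread function.

```python
def convolve_same(signal, kernel):
    """1D convolution with zero padding; output has the same length as
    signal. The kernel must have an odd number of elements."""
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        total = 0.0
        for k, weight in enumerate(kernel):
            j = i + half - k
            if 0 <= j < len(signal):
                total += weight * signal[j]
        out.append(total)
    return out


def richardson_lucy(observed, psf, iterations=50):
    """Iteratively estimate the unblurred signal from a blurred one."""
    flipped = psf[::-1]
    mean = sum(observed) / len(observed)
    estimate = [mean] * len(observed)  # flat, strictly positive start
    for _ in range(iterations):
        blurred = convolve_same(estimate, psf)
        ratio = [o / b if b > 0 else 0.0 for o, b in zip(observed, blurred)]
        correction = convolve_same(ratio, flipped)
        estimate = [e * c for e, c in zip(estimate, correction)]
    return estimate


# A single bright point (the 'bead') and a toy point spread function.
true_signal = [0, 0, 0, 0, 1, 0, 0, 0, 0]
psf = [0.25, 0.5, 0.25]

observed = convolve_same(true_signal, psf)   # what the microscope records
restored = richardson_lucy(observed, psf)    # estimate after deconvolution

print([round(v, 3) for v in observed])
print([round(v, 3) for v in restored])
```

The blurred point has a peak of 0.5 with light spilled into its neighbours; after the iterations most of that light has been reassigned to the central position, which is the sense in which deconvolution removes out-of-focus blur.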

4.3.5 Total internal reflection microscopy Another area of active research is the development of single molecule detection techniques. For example, total internal reflection fluorescence (TIRF) microscopy uses the properties of an evanescent wave close to the interface of two media (Fig. 4.16), for example the region between the specimen and the glass coverslip. The technique relies on the fact that the intensity of the evanescent field falls off rapidly with distance from the interface, so that the excitation of any fluorophore is confined to a region of just 100 nm above the glass interface. This is thinner than the optical section thickness achieved using confocal methods and allows the imaging of single molecules at the interface.
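The rapid fall-off of the evanescent field can be put into numbers with the standard expression for its penetration depth, d = λ / (4π √(n₁² sin²θ − n₂²)), where the intensity decays as exp(−z/d) with distance z from the interface. The Python sketch below evaluates this for assumed, illustrative values: 488 nm light, a glass refractive index of 1.518, an aqueous specimen of 1.33 and an incidence angle of 70°. None of these values comes from the text.

```python
import math


def penetration_depth_nm(wavelength_nm, n_glass, n_sample, angle_deg):
    """Depth at which the evanescent intensity falls to 1/e of its
    value at the interface."""
    theta = math.radians(angle_deg)
    term = (n_glass * math.sin(theta)) ** 2 - n_sample ** 2
    if term <= 0:
        raise ValueError("angle is below the critical angle: "
                         "no total internal reflection")
    return wavelength_nm / (4 * math.pi * math.sqrt(term))


def relative_intensity(z_nm, depth_nm):
    """Evanescent intensity at height z relative to the interface."""
    return math.exp(-z_nm / depth_nm)


depth = penetration_depth_nm(488, 1.518, 1.33, 70)
print(round(depth, 1))
print(round(relative_intensity(100, depth), 3))
print(round(relative_intensity(500, depth), 6))
```

With these assumed values the penetration depth is a little under 100 nm, consistent with the roughly 100 nm excitation layer described above, and the intensity 500 nm from the interface is negligible.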


4.4 IMAGING LIVING CELLS AND TISSUES There are two basically different approaches to imaging biochemical events over time. One strategy is to collect images from a series of fixed and stained tissues at different developmental ages. Each animal represents a single time point in the experiment. Alternatively, the same tissue can be imaged in the living state. Here the events of interest are captured directly. The second approach, imaging living cells and tissues, is technically more challenging than the first approach.

4.4.1 Avoidance of artifacts The only way to eliminate artifacts from specimen preparation is to view the specimen in the living state. Many living specimens are sensitive to light, and especially those labelled with fluorescent dyes. This is because the excitation of fluorophores can release cytotoxic free radicals into the cell. Moreover, some wavelengths are more deleterious than others. Generally, the shorter wavelengths are more harmful than the longer ones and near-infrared light rather than ultraviolet light is preferred for imaging (Fig. 4.5). The levels of light used for imaging must not compromise the cells. This is achieved using extremely low levels of light, using relatively bright fluorescent dyes and extremely sensitive photodetectors. Moreover, the viability of cells may also depend upon the cellular compartment that has been labelled with the fluorochrome. For example, imaging the nucleus with a dye that is excited with a short wavelength will cause more cellular damage than imaging in the cytoplasm with a dye that is excited in the far red. Great care has to be observed in order to maintain the tissue in the living state on the microscope stage. A live cell chamber is usually required for mounting the specimen on the microscope stage. This is basically a modified slide and coverslip arrangement that allows access to the specimen by the objective and condenser lenses. It also supports the cells in a constant environment, and depending on the cell type of interest, the chamber may have to provide a constant temperature, humidity, pH, carbon dioxide and/or oxygen levels. Many chambers have the facility for introducing fluids or perfusing the preparation with drugs for experimental treatments.

4.4.2 Time-lapse imaging Time-lapse imaging continues to be used for the study of cellular dynamics. Here images are collected at predetermined time intervals (Fig. 4.14). Usually a shutter arrangement is placed in the light path so that the shutter is only open when an image is collected, in order to reduce the amount of light energy impacting the cells. When the images are played back in rapid succession, a movie of the process of interest is produced, speeded up relative to real time. Time-lapse is used to study cell behaviour in tissues and embryos and the dynamics of macromolecules within single cells. The time interval used is governed by the event of interest and by the amount of light energy absorbed and tolerated by the cells. For example, a cell in tissue culture moves relatively slowly


and a time interval of 30 s between images might be used. Stability of the specimen and of the microscope is extremely important for successful time-lapse imaging. For example, the focus should not drift during the experiment. Phase contrast was the traditional choice for imaging cell movement and behaviour of cells growing in tissue culture. DIC or fluorescence microscopy is generally chosen for imaging the development of eggs and embryos. Computer imaging methods can be used in conjunction with DIC to improve resolution. Here a background image is subtracted from each time-lapse frame and the contrast of the images is enhanced electronically. In this way microtubules assembled in vitro from tubulin in the presence of microtubule associated proteins can be visualised on glass, even though individual microtubules are below the resolution limit of the light microscope. Such preparations have formed the basis of motility assays for motor proteins, for example kinesin and dynein.
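The two processing steps mentioned above, background subtraction followed by electronic contrast enhancement, can be sketched as follows. This is a minimal illustration on a tiny image held as nested lists; clamping negative differences to zero and stretching linearly to the 0 to 255 range are assumptions made for the example, not a description of any particular imaging system.

```python
def subtract_background(frame, background):
    """Subtract a background image from a frame, pixel by pixel,
    clamping negative results to zero."""
    return [
        [max(p - b, 0) for p, b in zip(frame_row, background_row)]
        for frame_row, background_row in zip(frame, background)
    ]


def stretch_contrast(image, out_max=255):
    """Linearly rescale grey levels so that they span 0..out_max."""
    flat = [p for row in image for p in row]
    low, high = min(flat), max(flat)
    if high == low:
        return [[0 for _ in row] for row in image]
    scale = out_max / (high - low)
    return [[round((p - low) * scale) for p in row] for row in image]


# A faint feature (a few grey levels) on a bright, uniform background.
frame = [[100, 102, 100], [101, 105, 100]]
background = [[100, 100, 100], [100, 100, 100]]

difference = subtract_background(frame, background)
enhanced = stretch_contrast(difference)
print(difference)
print(enhanced)
```

In the raw frame the feature differs from the background by at most 5 grey levels out of about 100; after subtraction and stretching it spans the full display range, which is why very low-contrast objects become visible.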

4.4.3 Fluorescent stains of living cells Relatively few cells possess any inherent fluorescence (autofluorescence), although some endogenous molecules are fluorescent and can be used for imaging, for example NAD(P)H. Small fluorescent molecules are loaded into living cells using many different methods including diffusion, microinjection, bead loading or electroporation. Larger fluorescently labelled proteins are usually injected into cells, and after time they are incorporated into the general protein pool of the cell for imaging. Many reporter molecules are now available for recording the expression of specific genes in living cells using fluorescence microscopy, including viewing whole transgenic animals using fluorescence stereomicroscopes (Table 4.3). The green fluorescent protein (GFP) is a very convenient reporter of gene expression because it is directly visible in the living cell using epifluorescence light microscopy with standard filter sets. The GFP gene can be linked to another gene of interest so that its expression is accompanied by GFP fluorescence in the living cell. No fixation, substrates or co-enzymes are required. The fluorescence of GFP is bright and relatively resistant to photobleaching. Spectral variants of GFP and additional reporters such as DsRed are now available for multiple labelling of living cells. These probes have revolutionised the ability to image living cells and tissues using light microscopy (Fig. 4.17, see also colour section).

4.4.4 Multidimensional imaging The collection of Z-series over time is called four-dimensional (4D) imaging where individual optical sections (X and Y dimensions) are collected at different depths in the specimen (Z dimension) at different times (the fourth dimension), i.e. one time and three space dimensions (Fig. 4.18). Moreover multiple wavelength images can also be collected over time. This approach has been called 5D imaging. Software is now available for the analysis and display of such 4D and 5D data sets. For example, the movement of a structure through the consecutive stacks of images can be traced, changes in volume of a structure can be measured, and the 4D data sets can be


Fig. 4.17 Multiple labelling in a living mouse brain using the ‘Brainbow’ technique. Unique colour combinations in individual neurons are achieved by the relative levels of three or more fluorescent proteins (XFPs). The images are collected using a multi-channel laser scanning confocal microscope. Up to 90 different colours (neurons) can be distinguished using this technique. Top image, hippocampus; bottom image, brainstem. (Image courtesy of Jeff Lichtman, Harvard University, USA.) (See also colour plate.)


Panel labels in Fig. 4.18: single wavelength Z series; single wavelength time-lapse; multiple wavelength. Axis labels: X, Y, Z (focus), time (t1 to t4) and wavelength.

Fig. 4.18 Multidimensional imaging. (a) Single wavelength excitation over time or time-lapse X,Y imaging; (b) Z-series or X,Y,Z imaging. The combination of (a) and (b) is 4D imaging. (c) Multiple wavelength imaging. The combination of (a) and (b) and (c) is 5D imaging.

displayed as a series of Z-projections or stereo movies. Multidimensional experiments can present problems for handling large amounts of data, since gigabytes of information can be collected from a single 4D imaging experiment.
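A quick calculation shows how easily a 4D or 5D experiment reaches gigabytes. The Python sketch below multiplies out the dimensions of a data set; the image size, number of sections, number of time points and 16-bit pixel depth are assumed values chosen for illustration.

```python
def dataset_bytes(width, height, z_sections, time_points,
                  channels=1, bytes_per_pixel=2):
    """Uncompressed size of a multidimensional image data set in bytes."""
    return (width * height * z_sections * time_points
            * channels * bytes_per_pixel)


# 4D: 512 x 512 pixel sections, 20 sections per Z-series, 200 time points.
size_4d = dataset_bytes(512, 512, 20, 200)

# 5D: the same experiment recorded at three wavelengths.
size_5d = dataset_bytes(512, 512, 20, 200, channels=3)

print(size_4d, round(size_4d / 1e9, 2), "GB")
print(size_5d, round(size_5d / 1e9, 2), "GB")
```

Under these assumptions a single-wavelength 4D run occupies about 2 GB and the three-wavelength 5D version about 6 GB, before any processed copies or projections are saved.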

4.5 MEASURING CELLULAR DYNAMICS Understanding the function of proteins within the context of the intact living cell is one of the main aims of contemporary biological research. The visualisation of specific cellular events has been greatly enhanced by modern microscopy. In addition to qualitatively viewing the images collected with a microscope, quantitative information can be gleaned from the images. The collection of meaningful measurements has been greatly facilitated by the advent of digital image processing. Subtle changes in intensity of probes of biochemical events can be detected with sensitive digital detectors. These technological advancements have allowed insight into the spatial aspects of molecular mechanisms. Relatively simple measurements include counting features within a 2D image or measuring areas and lengths. Measurements of depth and volume can be made in 3D, 4D and 5D data sets. Images can be calibrated by collecting an image of a calibration grid at the same settings of the microscope as were used for collecting the images during the experiment. Many image processing systems allow for a calibration factor to be added into the program, and all subsequent measurements will then be comparable.
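The calibration step can be sketched as a simple conversion factor. In the Python example below, an image of a calibration grid is assumed to show a known 100 μm distance spanning 400 pixels; the resulting factor is then applied to lengths and areas measured in pixels at the same microscope settings. The numbers and function names are illustrative assumptions.

```python
def calibration_factor(known_distance_um, measured_pixels):
    """Micrometres per pixel, from a feature of known size."""
    if measured_pixels <= 0:
        raise ValueError("measured_pixels must be positive")
    return known_distance_um / measured_pixels


def to_micrometres(length_pixels, factor):
    """Convert a length measured in pixels to micrometres."""
    return length_pixels * factor


def to_square_micrometres(area_pixels, factor):
    """Convert an area measured in pixels to square micrometres."""
    return area_pixels * factor ** 2


factor = calibration_factor(100, 400)          # 0.25 um per pixel
length = to_micrometres(48, factor)            # a 48-pixel structure
area = to_square_micrometres(1600, factor)     # a 1600-pixel region
print(factor, length, area)
```

Note that areas scale with the square of the factor. The factor is only valid for images collected with the same objective lens and zoom setting as the calibration image.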


Fig. 4.19 Calcium imaging in living cells. A fertilisation-induced calcium wave in the egg of the starfish. The egg was microinjected with the calcium-sensitive fluorescent dye fluo-3 and subsequently fertilised by the addition of sperm during observation using time-lapse confocal microscopy with a 40× water immersion lens and an LSCM. An optical section located near the egg equator was collected every 4 s using the normal scan mode accumulated for two frames, and afterwards the images were corrected for offset and ratioed by linearly dividing the initial pre-fertilisation image into each successive frame of the time-lapse run. The ratioed images were then prepared as a montage and output with a pseudocolour look-up table in which blue regions represent low ratios and low free calcium levels, and red areas depict high ratios and high free calcium levels. Note that the wave sweeps through the entire ooplasm, rather than being cortically restricted. (Image kindly provided by Steve Stricker, University of New Mexico, USA.) (See also colour plate.)
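The ratioing described in this caption, offset correction followed by division of each frame by the pre-fertilisation image, can be sketched as follows. This is an illustration only: images are represented as nested lists of pixel values, the offset is assumed to be a single constant, and the numbers are invented.

```python
def ratio_image(frame, baseline, offset=0.0):
    """Divide an offset-corrected frame, pixel by pixel, by the
    offset-corrected baseline (pre-stimulus) image."""
    ratios = []
    for frame_row, base_row in zip(frame, baseline):
        row = []
        for f, f0 in zip(frame_row, base_row):
            denominator = f0 - offset
            # Pixels with no baseline signal are set to 0 rather than dividing by zero.
            row.append((f - offset) / denominator if denominator > 0 else 0.0)
        ratios.append(row)
    return ratios


# Hypothetical 2 x 2 images with a detector offset of 10 grey levels.
baseline = [[110.0, 60.0], [210.0, 10.0]]
frame = [[210.0, 60.0], [610.0, 10.0]]

ratios = ratio_image(frame, baseline, offset=10.0)
print(ratios)
```

A ratio of 1 indicates no change from the baseline, while larger ratios would be mapped towards the red end of a pseudocolour look-up table.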

The rapid development of fluorescence microscopy together with digital imaging and, above all, the development of new fluorescent probes of biological activity have brought a new level of sophistication into quantitative imaging. Most of the measurements are based on the ability to measure accurately the brightness of, and the wavelength emitted from, a fluorescent probe within a sample using a digital imaging system. This is also the basis of flow cytometry, which measures the brightness of each cell in a population of cells as they pass through a laser beam. Cells can be sorted into different populations using a related technique, fluorescence-activated cell sorting. The brightness of the fluorescence from the probe can be calibrated to the amount of probe present at any given location in the cell at high resolution. For example, the concentration of calcium is measured in different regions of living embryos using calcium indicator dyes such as fluo-3, whose fluorescence increases in proportion to the amount of free calcium in the cell (Fig. 4.19, see also colour section). Many probes have been developed for making such measurements in living tissues. Controls are a necessary part of such measurements since photobleaching and various dye


artifacts during the experiment can obscure the true measurements. Such control can be achieved by staining the sample with two ion-sensitive dyes, and comparing their measured brightness during the experiment. These measurements are usually expressed as ratios (ratio imaging) and control for dye loading problems, photobleaching and instrument variation. Fluorescently labelled proteins can be injected into cells where they incorporate into macromolecular structures over time. This makes the structures accessible to time-lapse imaging using fluorescence microscopy. Such methods can lead to high backgrounds, and can be difficult to interpret. In addition to optical sectioning methods, several methods have been developed for avoiding high backgrounds for fluorescence measurements of biochemical events in cells. Fluorescence recovery after photobleaching (FRAP) uses the high light flux from a laser to locally destroy fluorophores labelling the macromolecules to create a bleached zone (photobleaching). The observation and recording of the subsequent movement of undamaged fluorophores into the bleached zone gives a measure of molecular mobility. This enables biochemical analysis within the living cell. A second technique related to FRAP, photoactivation, uses a probe whose fluorescence can be induced by a flash of short wavelength (UV) light. The method depends upon ‘caged’ fluorescent probes that are locally activated (uncaged) by a pulse of UV light. Alternatively, variants of GFP can be expressed in cells and selectively photoactivated. The activated probe is imaged using a longer wavelength of light. Here the signal-to-noise ratio of the images can be better than that for photobleaching experiments. A third method, fluorescence speckle microscopy, was discovered as a chance observation while microinjecting fluorescently labelled proteins into living cells. 
When a very low concentration of fluorescently labelled protein is injected into cells, the protein of interest is not fully labelled inside the cell. When viewed in the microscope, structures inside cells that have been labelled in this way have a speckled appearance. The dark regions act as fiducial marks for the observation of dynamics. Fluorescence resonance energy transfer (FRET) is a fluorescence-based method that can take fluorescence microscopy past the theoretical resolution limit of the light microscope, allowing the observation of protein–protein interactions in vivo (Fig. 4.20). FRET occurs between two fluorophores when the emission of the first one (the donor) serves as the excitation source for the second one (the acceptor). This will only occur when two fluorophore molecules are very close to one another, at a distance of 6 nm or less. An example of a FRET experiment would be to use spectral variants of GFP (Fig. 4.20). Here the excitation of a cyan fluorescent protein (CFP)-tagged protein is used to monitor the emission of a yellow fluorescent protein (YFP)-tagged protein. YFP fluorescence will only be observed under the excitation conditions of CFP if the proteins are close together. Since this can be monitored over time, FRET can be used to measure direct binding of proteins or protein complexes. A more complex technique, fluorescence lifetime imaging (FLIM), measures the amount of time a fluorophore is fluorescent after excitation with a 10 ns pulse of laser light. FLIM is a method used for detecting multiple fluorophores with different fluorescent lifetimes and overlapping emission spectra.
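The measure of molecular mobility obtained from a FRAP experiment is commonly summarised as a mobile fraction and a recovery half-time. The sketch below is one simple way of extracting these from a recovery curve; it assumes that the first sample is taken immediately after bleaching, that the last sample represents the plateau, and that no correction for acquisition photobleaching is needed. The intensity values are invented for illustration.

```python
def frap_summary(times, intensities, prebleach):
    """Estimate the mobile fraction and recovery half-time from a FRAP curve.

    times, intensities: post-bleach samples of the bleached zone; the first
    sample is assumed to be taken immediately after the bleach and the last
    sample is assumed to lie on the recovery plateau.
    prebleach: intensity of the same zone before bleaching.
    """
    f0 = intensities[0]
    f_plateau = intensities[-1]
    mobile_fraction = (f_plateau - f0) / (prebleach - f0)

    # Intensity halfway between the bleached level and the plateau.
    half_level = f0 + 0.5 * (f_plateau - f0)

    # Linear interpolation between the two samples that bracket half_level.
    for i in range(len(times) - 1):
        i1, i2 = intensities[i], intensities[i + 1]
        if i1 <= half_level <= i2 and i2 != i1:
            t1, t2 = times[i], times[i + 1]
            t_half = t1 + (half_level - i1) * (t2 - t1) / (i2 - i1)
            return mobile_fraction, t_half - times[0]
    return mobile_fraction, None


# Hypothetical recovery curve (time in seconds, intensity in arbitrary units).
times = [0, 2, 4, 6, 8, 10]
intensities = [20, 40, 55, 65, 68, 68]

mobile_fraction, t_half = frap_summary(times, intensities, prebleach=100)
print(mobile_fraction, t_half)
```

In this invented example 60% of the labelled molecules are mobile, and half of the recoverable fluorescence returns in roughly 2.5 s.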


[Fig. 4.20 panel labels: NO FRET (separated fluorochromes): 430 nm excitation (blue light) of CFP gives 490 nm emission (green fluorescence); no red fluorescence from YFP. FRET (adjacent fluorochromes): 430 nm excitation (blue light) of CFP gives 490 nm green fluorescence and, by energy transfer to YFP, 530 nm emission (red fluorescence).]

Fig. 4.20 Fluorescence resonance energy transfer (FRET). In the upper example (NO FRET) the cyan fluorescent protein (CFP) and the yellow fluorescent protein (YFP) are not close enough for FRET to occur (more than 60 nm separation). Here excitation with the 430 nm blue light results in the green 490 nm emission of the CFP only. In contrast, in the lower example (FRET), the CFP and YFP are close enough for ‘energy transfer’ or FRET to occur (closer than 6 nm). Here excitation with the 430 nm blue light results in fluorescence of the CFP (green) and of the YFP (red).
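The steep distance dependence illustrated in Fig. 4.20 can be made concrete with the standard Förster relation, E = 1 / (1 + (r/R0)^6), which is not given in the text above. In the sketch below the Förster distance R0 (the separation at which transfer is 50% efficient) is set to 5 nm as an assumed, typical order of magnitude for a CFP and YFP pair; it is not a value taken from this chapter.

```python
def fret_efficiency(distance_nm, forster_distance_nm=5.0):
    """Energy transfer efficiency from the Forster relation.

    forster_distance_nm (R0) is an assumed illustrative value, not a
    measured constant for any specific fluorophore pair.
    """
    if distance_nm < 0:
        raise ValueError("distance must be non-negative")
    return 1.0 / (1.0 + (distance_nm / forster_distance_nm) ** 6)


# Efficiency at a few donor-acceptor separations (nm).
for r in (2.5, 5.0, 6.0, 10.0, 60.0):
    print(f"r = {r:5.1f} nm  E = {fret_efficiency(r):.6f}")
```

Because the efficiency falls with the sixth power of the separation, it drops from about 0.98 at 2.5 nm to about 0.25 at 6 nm and is effectively zero at 60 nm, which is why an acceptor signal is taken as evidence that the two tagged proteins are in close proximity.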

Example 1 LOCATING AN UNKNOWN PROTEIN TO A SPECIFIC CELLULAR COMPARTMENT

Question You have isolated and purified a novel protein from a biochemical preparation. How might you determine its subcellular distribution and possible function in the cell?

Answer Many fluorescent probes are available that label specific cellular compartments. For example, ToTo3 labels the nucleus and fluorescent phalloidins label cell outlines. An antibody to your protein could be raised and used to immunofluorescently label cells. Using a multiple-labelling approach, and perhaps an optical sectioning technique such as laser scanning confocal microscopy, the distribution of the protein in the cell relative to known distributions can be ascertained. For higher resolution, immuno-EM or FRET studies could be performed.

4.6 THE ELECTRON MICROSCOPE (EM)

4.6.1 Principles

Electron microscopy is used when the greatest resolution is required, and when the living state can be ignored. The images produced in an electron microscope reveal the


ultrastructure of cells. There are two different types of electron microscope – the transmission electron microscope (TEM) and the scanning electron microscope (SEM). In the TEM, electrons that pass through the specimen are imaged. In the SEM, electrons that are reflected back from the specimen (secondary electrons) are collected, and the surfaces of specimens are imaged. The equivalent of the light source in an electron microscope is the electron gun. When a high voltage of between 40 000 and 100 000 volts (the accelerating voltage) is applied between the cathode and the anode, a tungsten filament emits electrons (Fig. 4.1). The negatively charged electrons pass through a hole in the anode, forming an electron beam. The beam of electrons passes through a stack of electromagnetic lenses (the column). Focussing of the electron beam is achieved by changing the voltage across the electromagnetic lenses. When the electron beam passes through the specimen some of the electrons are scattered while others are focussed by the projector lens onto a phosphorescent screen or recorded using photographic film or a digital camera. The electrons have limited penetration power, which means that specimens must be thin (50–100 nm) to allow them to pass through. Thicker specimens can be viewed by using a higher accelerating voltage, for example in the high-voltage electron microscope (HVEM), which uses 1 000 000 V accelerating voltage, or in the intermediate voltage electron microscope (IVEM), which uses an accelerating voltage of around 400 000 V. Here stereo images are made by collecting two images at 8–10° tilt angles. Such images are useful in assessing the 3D relationships of organelles within cells when viewed in a stereoscope or with a digital stereo projection system.

4.6.2 Preparation of specimens

Contrast in the EM depends on atomic number; the higher the atomic number the greater the scattering and the contrast. Thus heavy metals are used to add contrast in the EM, for example uranium, lead and osmium. Labelled structures appear black or electron dense in the image (Fig. 4.21). All of the water has to be removed from any biological specimen before it can be imaged in the EM. This is because the electron beam can only be produced and focussed in a vacuum. The major drawback of EM observation of biological specimens therefore is the non-physiological conditions necessary for their observation. Nevertheless, the improved resolution afforded by the EM has provided much information about biological structures and biochemical events within cells that could not have been collected using any other microscopical technique. Extensive specimen preparation is required for EM analysis, and for this reason there can be issues of interpreting the images because of artifacts from specimen preparation. For example, specimens have been traditionally prepared for the TEM by fixation in glutaraldehyde to cross-link proteins followed by osmium tetroxide to fix and stain lipid membranes. This is followed by dehydration in a series of alcohols to remove the water, and then embedding in a plastic such as Epon for thin sectioning (Fig. 4.21). Small pieces of the embedded tissue are mounted and sectioned on an ultramicrotome using either a glass or a diamond knife. Ultrathin sections are cut to a thickness


Fig. 4.21 Transmission electron microscopy (TEM). (a) and (c) Ultrathin Epon sections (60 nm thick) of developing rat sperm cells stained with uranyl acetate and lead citrate. (b) Carbon surface replica of a mouse sperm preparation.

of approximately 60 nm. The ribbons of sections are floated onto the surface of water and their interference colours are used to assess their thickness. The desired 60 nm section thickness has a silver/gold interference colour on the water surface. The sections are then mounted onto copper or gold EM grids, and are subsequently stained with heavy metals, for example uranyl acetate and lead citrate.


Fig. 4.22 Imaging surfaces using the light microscope (stereomicroscope) and the electron microscope (scanning electron microscope). Images produced using the stereomicroscope (a) and (b) and the scanning electron microscope (c) and (d). A stereomicroscope view of a fly (Drosophila melanogaster) on a butterfly wing (Precis coenia) (a) zoomed in to view the head region of the red-eyed fly (b). SEM image of a similar region of the fly’s head (c) and zoomed more to view the individual ommatidia of the eye (d). Note that the stereomicroscope images can be viewed in real colour whereas those produced using the SEM are in greyscale. Colour can only be added to EM images digitally (d). (Images (b), (c) and (d) kindly provided by Georg Halder, MD Anderson Medical Centre, Houston, USA.) (See also colour plate.)

For the SEM, samples are fixed in glutaraldehyde, dehydrated through a series of solvents and dried completely either in air or by critical point drying. Critical point drying removes all of the water from the specimen instantly and avoids surface tension in the drying process, thereby avoiding artifacts of drying. The specimens are then mounted onto a special metal holder or stub and coated with a thin layer of gold before viewing in the SEM (Fig. 4.22, see also colour section). Surfaces can also be viewed in the TEM using either negative stains or carbon replicas of air-dried specimens (Fig. 4.21). Immuno-EM methods allow the localisation of molecules within the cellular microenvironment for TEM and on the cell surface for SEM (Fig. 4.23). Cells are prepared in a similar way to indirect immunofluorescence, with the exception that rather than a fluorescent probe bound to the secondary antibody, electron dense colloidal gold particles (10 nm) are used. Multiple labelling can be achieved using different sizes of gold particles attached to antibodies to the proteins of interest. The method depends upon the binding of protein A to the gold particles since protein A binds in turn to antibody fragments. Certain resins, for example Lowicryl and LR White, have been formulated to allow antibodies and gold particles to be attached to ultrathin sections for immunolabelling.

4.6.3 Electron tomography

New methods of fixation continue to be developed in an attempt to avoid the artifacts of specimen preparation and to observe the specimen in a condition closer to its living state. Specimens are rapidly frozen in milliseconds by high-pressure freezing. Under these conditions the biochemical state of the cell is more likely to be preserved. Many of these frozen hydrated samples can be observed directly in the EM or they can be chemically


Fig. 4.23 Immunoelectron microscopy. Scanning electron microscope (SEM) image of the microbe Enterococcus faecalis labelled with 10 nm colloidal gold for the surface adhesion protein ‘aggregation substance’. This protein facilitates exchange of DNA during conjugation. The gold labels appear as white dots on the surface of the bacteria. (Image kindly provided by the late Stan Erlandsen, University of Minnesota, USA.)

fixed using freeze substitution methods. Here fixatives are infused into the preparation at low temperature, after which the specimen is slowly warmed to room temperature. Using cryo-electron tomography (Cryo-ET) the 3D structure of cells and macromolecules can be visualised at 5–8 nm resolution. Cells are typically rapidly frozen, fixed by freeze substitution and embedded in epoxy resin. Thick 200 nm sections are cut and imaged in the TEM equipped with a tilting stage. A typical tilt series of 100 or so images is collected in a digital form and exported to a computer reconstruction program for analysis. By using electron tomography, a series of 2D digital EM images is converted into a high-resolution 3D representation of the specimen (Fig. 4.24, see also colour section). The method is especially useful for imaging the fine connections within cells, particularly the cytoskeleton and nuclear pores, and for elucidating the surface structures of viruses.

4.6.4 Integrated microscopy

The same specimen can be viewed in the light microscope and subsequently in the EM. This approach is called integrated microscopy. The correlation of images of the same cell collected using the high temporal resolution of the light microscope and the high spatial resolution of the EM gives additional information to imaging using the two techniques separately (Fig. 4.25). The integrated approach also addresses the problem of artifacts. Probes are now available that are fluorescent in the light microscope and are electron dense in the EM.

4.7 IMAGE ARCHIVING

Most images produced by any kind of modern microscope are collected in a digital form. In addition to greatly speeding up the collection of the images (and experiment times), the use of digital imaging has allowed the use of digital image databases and


Fig. 4.24 Electron tomography revealing the interconnected nature of SARS–Coronavirus-induced double-membrane vesicles. Monkey kidney cells were infected with SARS–Coronavirus in a biosafety level-3 laboratory and pre-fixed using 3% paraformaldehyde at 7 h post-infection. Subsequently, the cells were rapidly frozen by plunge-freezing and freeze substitution was performed at low temperature, using osmium tetroxide and uranyl acetate in acetone to optimally preserve cellular ultrastructure and gain maximal contrast. After washing with pure acetone at room temperature, the samples were embedded in an epoxy resin and polymerised at 60 °C for 2 days. Using an ultramicrotome, 200-nm thick sections were cut, placed on a 100 mesh EM grid, and used for electron tomography. To facilitate the image alignment that is required for the final 3D reconstruction, a suspension of 10 nm gold particles was layered on top of the sections as fiducial markers (a). Scale bar represents 100 nm. Images were recorded with an FEI T12 transmission electron microscope operating at an acceleration voltage of 120 kV. A tilt series consisted of 131 images recorded using 1° tilt increments between −65° and +65°. For dual-axis tomography, which improves resolution in the X and Y directions, the specimen was rotated 90° around the Z-axis and a second tilt series was recorded. To compute the final electron tomogram, the dual-axis tilt series were aligned by means of the fiducial markers using the IMOD software package. A single tomogram slice through the 3D reconstruction with a digital thickness of 1.2 nm is shown in (b). The 3D surface-rendered reconstruction of viral structures and adjacent cellular features (c) was made by thresholding and subsequent surface rendering using the AMIRA Visualization Package (TGS Europe). 
The final 3D surface-rendered model (d) shows interconnected double-membrane vesicles (outer membrane, gold; inner membrane, silver) and their connection to an endoplasmic reticulum stack (depicted in bronze). (Images kindly provided by Kevin Knoops and Eric Snijder, Leiden University, The Netherlands.) (See also colour plate.)
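The size of the tilt series quoted in this caption follows directly from the tilt range and increment. The short sketch below reads the range as −65° to +65° in 1° steps, which is the reading consistent with the stated 131 images; the function name is hypothetical.

```python
def tilt_series_angles(min_angle, max_angle, increment):
    """Stage tilt angles (degrees) for one axis of a tomographic tilt series."""
    if increment <= 0:
        raise ValueError("increment must be positive")
    count = int(round((max_angle - min_angle) / increment)) + 1
    return [min_angle + i * increment for i in range(count)]


single_axis = tilt_series_angles(-65.0, 65.0, 1.0)

# Dual-axis tomography repeats the series after rotating the specimen by 90 degrees.
dual_axis_image_count = 2 * len(single_axis)

print(len(single_axis), dual_axis_image_count)
```

A dual-axis data set recorded in this way therefore contains 262 projection images before alignment and reconstruction.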

the rapid transfer of information between laboratories across the World Wide Web. Moreover there is no loss in resolution or colour balance from the images collected at the microscope as they pass between laboratories and journal web pages. International image databases are under development for the storage and access of microscope image data from many different locations. One such effort is the Open Microscopy Environment (OME). There is a trend for modern microscopes to produce more and more data, especially when multi-dimensional datasets are generated. This trend is continuing with the need to develop automated methods of image analysis for large scale screening of gene expression data from genomic screens.


Table 4.4 Websites of interest
http://www.microscopyu.com/
http://www.microscopy.fsu.edu/
http://www.microscopy-analysis.com/
http://www.msa.microscopy.org
http://www.rms.org.uk/index.shtml
http://www.peachpit.com/articles/article.aspx?p=1221827
http://www.openmicroscopy.org
http://swehsc.pharmacy.arizona.edu/exppath/micro/index.html
http://www.itg.uiuc.edu/
http://rsb.info.nih.gov/ij/


Fig. 4.25 Integrated microscopy. (a) Epifluorescence image and (b) and (c) whole mount TEM at different magnifications of the same cell. The fluorescence image is labelled with rhodamine phalloidin, which stains polymerised actin. A stress fibre at the periphery of the cell appears as a white line in the fluorescence image (a), and when viewed in the TEM the stress fibres appear as aligned densities of actin filaments. The TEM whole mount was prepared using detergent extraction, chemical fixation, critical point drying and platinum/carbon coating. (Image kindly provided by Tatyana Svitkina, University of Pennsylvania, USA.)


More detailed information on any of the microscopes and their applications in biochemistry and molecular biology can be accessed on the World Wide Web. Several websites have been included as starting points for further study (Table 4.4). Should any of these listed websites become out of date, more information on any topic can be accessed using a web search engine. In addition, a comprehensive reference list has been provided for more detailed information (Section 4.8). The field of microscopy continues to advance, but the basic principles and practices of light and electron microscopy remain unchanged.

4.8 SUGGESTIONS FOR FURTHER READING

Abramowitz, M. (2003). Microscope Basics and Beyond. Melville, NY: Olympus of America. (Good well-illustrated primer on all aspects of basic light microscopy, also available online as a pdf file.)
Afzelius, B. A. and Maunsbach, A. B. (2004). Biological ultrastructure research: the first 50 years. Tissue Cell, 36, 83–94. (Ageless review of the early history of electron microscopy.)
Alberts, B., Johnson, A., Lewis, J., Raff, M., Roberts, K. and Walter, P. (2007). Molecular Biology of the Cell, 5th edn. New York: Garland Science. (Basic introduction to all forms of microscopy and live cell imaging for the cell biologist.)
Andrews, P. D., Harper, I. S. and Swedlow, J. R. (2002). To 5D and beyond: quantitative fluorescence microscopy in the postgenomic era. Traffic, 3, 29–36. (Review of multidimensional imaging, methods of coping with large data sets and international image databases.)
Baumeister, W. (2004). Mapping molecular landscapes inside cells. Biological Chemistry, 385, 865–872. (Review of electron tomography.)
Cox, G. C. (2006). Optical Imaging Techniques in Cell Biology. Boca Raton, FL: CRC Press. (Overview of the entire field of light microscopy.)
Damle, S., Hanser, B., Davidson, E. H. and Fraser, S. E. (2006). Confocal quantification of cis-regulatory reporter gene expression in living sea urchins. Developmental Biology, 299, 543–550. (Practical example of quantitative measurements in living cells.)
Darzacq, X. et al. (2009). Imaging transcription in living cells. Annual Review of Biophysics, 38, 173–196.
Dunn, G. A. and Jones, G. E. (2004). Cell motility under the microscope: Vorsprung durch Technik. Nature Reviews Molecular Cell Biology, 5, 667–672. (Review of techniques used to study cell motility.)
Evanko, D., Heinrichs, A. and Karlsson-Rosenthal, C. (eds.) (2009). Light Microscopy. Nature Milestones. www.nature.com/milestones/light-microscopy (Well-produced and complete review of all aspects of contemporary light microscopy.)
Frankel, F. (2002). Envisioning Science: The Design and Craft of the Science Image. Cambridge, MA: MIT Press. (Popular work on imaging with some great tips and tricks for the stereomicroscope.)
Giepmans, B. N. G., Adams, S. R., Ellisman, M. H. and Tsien, R. Y. (2006). The fluorescent toolbox for assessing protein location and function. Science, 312, 217–224. (A review of the characteristics and benefits of using fluorescent probes to study proteins.)
Hadjantonakis, A. K., Dickinson, M. E., Fraser, S. E. and Papaioannou, V. E. (2003). Technicolor transgenics: imaging tools for functional genomics in the mouse. Nature Reviews Genetics, 4, 613–625.
Heath, J. P. (2005). Dictionary of Microscopy. Chichester, UK: John Wiley.
Hoenger, A. and McIntosh, J. R. (2009). Probing the macromolecular organisation of cells by electron tomography. Current Opinion in Cell Biology, 21, 89–96.
Inoue, S. and Spring, K. (1997). Video Microscopy: The Fundamentals, 2nd edn. New York: Plenum Press. (The classic text on live cell imaging, video microscopy and general microscopy.)
Jaiswal, J. K. and Simon, S. M. (2007). Imaging single events at the cell membrane. Nature Chemical Biology, 3, 92–98. (Overview of high resolution methods of light microscopy including TIRF.)


Keller, P. J., Schmidt, A. D., Wittbrodt, J. and Stelzer, E. H. K. (2008). Reconstruction of zebrafish early embryonic development by scanned light sheet microscopy. Sciencexpress, www.sciencexpress.org, 9 October 2008. (Application of scanning light microscopy to image living zebrafish embryos – stunning movies of zebrafish embryogenesis available online.)
Knoops, K., Kikkert, M., van den Worm, S. H. E., Zevenhoven-Dobbe, J. C., van der Meer, Y., Koster, A. J., Mommaas, A. M. and Snijder, E. J. (2008). SARS–Coronavirus replication is supported by a reticulovesicular network of modified endoplasmic reticulum. PLoS Biology, 6, 1957–1974. (Cryo-electron tomography in action.)
Lichtman, J. W. and Fraser, S. E. (2001). The neuronal naturalist: watching neurons in their native habitat. Nature Neuroscience (Suppl.), 4, 1215–1220.
Livet, J., Weissman, T. A., Kang, H., Draft, R. W., Lu, J., Bennis, R. A., Sanes, J. R. and Lichtman, J. W. (2007). Transgenic strategies for combinatorial expression of fluorescent proteins in the nervous system. Nature, 450, 56–62. (Imaginative use of reporter gene technology to label multiple neurons in living brains.)
McGurk, L., Morrison, H., Keegan, L. P., Sharpe, J. and O’Connell, M. A. (2007). Three-dimensional imaging of Drosophila melanogaster. PLoS ONE, 2, E834. (Methods of three-dimensional imaging including confocal and optical projection tomography.)
Sedgewick, J. (2008). Scientific Imaging with PhotoShop: Methods, Measurement, and Output. Berkeley, CA: Pearson Education, Peachpit Press. (Practical manual on the use of PhotoShop for measuring and preparing images for publication.)
Shapiro, H. M. (2003). Practical Flow Cytometry, 4th edn. New York: John Wiley. (Wonderfully written book on basic fluorescence and flow cytometry.)
Spector, D. L. and Goldman, R. D. (2006). Basic Methods in Microscopy. Plainview, NY: Cold Spring Harbor Laboratory Press. (A good introduction to contemporary methods of imaging both fixed and living cells at both the light and electron microscope level.)
Swedlow, J. R., Lewis, S. E. and Goldberg, I. G. (2006). Modeling data across labs, genomes, space and time. Nature Cell Biology, 8, 1190–1194.
Swedlow, J. R., Goldberg, I. G., Eliceiri, K. W. and the OME Consortium (2009). Bioimage informatics for experimental biology. Annual Review of Biophysics, 38, 327–346.
Tomancak, P., Berman, B. P., Beaton, A., Weiszmann, R., Kwan, E., Hartenstein, V., Celniker, S. E. and Rubin, G. M. (2007). Global analysis of gene expression during Drosophila embryogenesis. Genome Biology, 8, R145.
Van Roessel, P. and Brand, A. H. (2002). Imaging into the future: visualizing gene expression and protein interaction with fluorescent proteins. Nature Cell Biology, 4, E15–E20. (Good primer on GFP and FRET.)
Volpi, E. V. and Bridger, J. M. (2008). FISH glossary: an overview of the fluorescence in situ hybridization technique. BioTechniques, 45, 385–409.
Wallace, W., Schaefer, L. H. and Swedlow, J. R. (2001). Workingperson’s guide to deconvolution in light microscopy. BioTechniques, 31, 1076–1097. (Comprehensive review of the deconvolution technique.)
Wilt, B. A., Burns, L. D., Tatt Wei Ho, E., Ghosh, K. K., Mukamel, E. A. and Schnitzer, M. J. (2009). Advances in light microscopy for neuroscience. Annual Review of Neuroscience, 32, 435–506. (Complete coverage of all modern methods of imaging including super-resolution methods.)
Zhang, J., Campbell, R. E., Ting, A. Y. and Tsien, R. Y. (2002). Creating new fluorescent probes for cell biology. Nature Reviews Molecular Cell Biology, 3, 906–918. (Review of the development of fluorescent probes of biological activity especially reporter molecules.)

5 Molecular biology, bioinformatics and basic techniques R. RAPLEY

5.1 Introduction
5.2 Structure of nucleic acids
5.3 Genes and genome complexity
5.4 Location and packaging of nucleic acids
5.5 Functions of nucleic acids
5.6 The manipulation of nucleic acids – basic tools and techniques
5.7 Isolation and separation of nucleic acids
5.8 Molecular biology and bioinformatics
5.9 Molecular analysis of nucleic acid sequences
5.10 The polymerase chain reaction (PCR)
5.11 Nucleotide sequencing of DNA
5.12 Suggestions for further reading

5.1 INTRODUCTION

The completion of the Human Genome Project (HGP) has been heralded as one of the major landmark events in science. The human genome contains the blueprint for human development and maintenance and may ultimately provide the means to understand human cellular and molecular processes in both health and disease. The genome is the full complement of DNA from an organism and carries all the information needed to specify the structure of every protein the cell can produce. The realisation that DNA lies behind all of the cell’s activities led to the development of what is termed molecular biology. Rather than a discrete area of biosciences, molecular biology is now accepted as a very important means of understanding and describing complex biological processes. The development of methods and techniques for studying processes at the molecular level has led to new and powerful ways of isolating, analysing, manipulating and exploiting nucleic acids. Moreover, to keep pace with the explosion in biological information the discipline termed bioinformatics has evolved and plays a vital role in current biosciences. The completion of the human genome project and numerous other genome projects has allowed the continued


development of new exciting areas of biological sciences such as biotechnology, genome mapping, molecular medicine and gene therapy. In considering the potential utility of molecular biology techniques it is important to understand the basic structure of nucleic acids and gain an appreciation of how this dictates the function in vivo and in vitro. Indeed many techniques used in molecular biology mimic in some way the natural functions of nucleic acids such as replication and transcription. This chapter is therefore intended to provide an overview of the general features of nucleic acid structure and function and describe some of the basic methods used in its isolation and analysis.

5.2 STRUCTURE OF NUCLEIC ACIDS

5.2.1 Primary structure of nucleic acids

DNA and RNA are macromolecular structures composed of regular repeating polymers formed from nucleotides. These are the basic building blocks of nucleic acids and are derived from nucleosides, which are composed of two elements: a five-carbon (pentose) sugar (2-deoxyribose in DNA and ribose in RNA), and a nitrogenous base. The carbon atoms of the sugar are designated ‘prime’ (1′, 2′, 3′, etc.) to distinguish them from the carbons of nitrogenous bases, of which there are two types, either a purine or a pyrimidine. A nucleotide, or nucleoside phosphate, is formed by the attachment of a phosphate to the 5′ position of a nucleoside by an ester linkage (Fig. 5.1). Such nucleotides can be joined together by the formation of a second ester bond by reaction between the phosphate of one nucleotide and the 3′ hydroxyl of another, thus generating a 5′ to 3′ phosphodiester bond between adjacent sugars; this process can be repeated indefinitely to give long polynucleotide molecules (Fig. 5.2). DNA has two such polynucleotide strands. Since each strand has a free 5′ hydroxyl group at one end and a free 3′ hydroxyl at the other end, each strand has a polarity or directionality. The polarity of the two strands of the molecule is in opposite directions, and thus DNA is described as an antiparallel structure (Fig. 5.3). The purine bases (composed of fused five- and six-membered rings), adenine (A) and guanine (G), are found in both RNA and DNA, as is the pyrimidine (a single six-membered ring) cytosine (C). The other pyrimidines are each restricted to one type of nucleic acid: uracil (U) occurs exclusively in RNA, whilst thymine (T) is limited to DNA. Thus it is possible to distinguish between RNA and DNA on the basis of the presence of ribose and uracil in RNA, and deoxyribose and thymine in DNA. However, it is the sequence of bases along a molecule that distinguishes one DNA (or RNA) from another. It is conventional to write a nucleic acid sequence starting at the 5′ end of the molecule, using single capital letters to represent each of the bases, e.g. CGGATCT. Note that there is usually no point in including the sugar or phosphate groups, since these are identical throughout the length of the molecule. Terminal phosphate groups can, when necessary, be indicated by use of a ‘p’; thus 5′ pCGGATCT 3′ indicates the presence of a phosphate on the 5′ end of the molecule.

Molecular biology, bioinformatics and basic techniques

Fig. 5.1 Structure of bases, nucleosides and nucleotides. Panels: purines (adenine, guanine); pyrimidines (cytosine; uracil, RNA only; thymine, DNA only); pentoses (ribose, deoxyribose); nucleosides (adenosine, deoxyadenosine); nucleotides (adenosine monophosphate).

Fig. 5.2 Polynucleotide structure. Labels: 5′ end; base; phosphodiester linkage; repeated many times to give polynucleotide; 3′ end.

Fig. 5.3 The antiparallel nature of DNA. One strand in a double helix runs 5′ to 3′, whilst the other strand runs in the opposite direction, 3′ to 5′. The strands are held together by hydrogen bonds between the bases.

5.2.2 Secondary structure of nucleic acids

The two polynucleotide chains in DNA are usually found in the shape of a right-handed double helix, in which the bases of the two strands lie in the centre of the molecule, with the sugar–phosphate backbones on the outside. A crucial feature of this double-stranded structure is that it depends on the sequence of bases in one strand being complementary to that in the other. A purine base attached to a sugar residue on one strand is always hydrogen bonded to a pyrimidine base attached to a sugar residue on the other strand. Moreover, adenine (A) always pairs with thymine (T), or with uracil (U) in RNA, via two hydrogen bonds, and guanine (G) always pairs with cytosine (C) via three hydrogen bonds (Fig. 5.4). When these conditions are met a stable double helical structure results in which the backbones of the two strands are, on average, a constant distance apart. Thus, if the sequence of one strand is known, that of the other strand can be deduced. The strands are designated as plus (+) and minus (−), and an RNA molecule complementary to the minus (−) strand is synthesised during transcription (Section 5.5.3). The base sequence may cause significant local variations in the shape of the DNA molecule, and these variations are vital for specific interactions between the DNA and various proteins to take place. Although the three-dimensional structure of DNA may vary, it generally adopts a double helical structure termed the B form or B-DNA in vivo. There are also other forms of right-handed DNA, such as A and C, which are formed when DNA fibres are subjected to different relative humidities (Table 5.1).
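The statement that the sequence of one strand fixes the sequence of the other can be illustrated with a short Python sketch. It assumes plain upper- or lower-case A, C, G and T input, and the function name complementary_strand is purely illustrative. Because the strands are antiparallel, the paired bases are reversed so that the result is also written 5′ to 3′.

```python
# Watson-Crick pairing rules from the text: A pairs with T, G pairs with C.
DNA_PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complementary_strand(strand_5_to_3: str) -> str:
    """Return the complementary DNA strand, also written 5' to 3'."""
    # Pair each base, then reverse, because the partner strand is antiparallel.
    paired = [DNA_PAIRS[base] for base in strand_5_to_3.upper()]
    return "".join(reversed(paired))


print(complementary_strand("CGGATCT"))  # AGATCCG
```

Applying the function twice returns the original sequence, which is one quick way to check the pairing table.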

Table 5.1 The various forms of DNA

DNA form   % humidity   Helix direction   Bases/turn of helix   Helix diameter (Å)
B          92%          RH                10                    19
A          75%          RH                11                    23
C          66%          RH                9.3                   19
Z          (Pu-Py)n     LH                12                    18

Notes: RH, right-handed helix; LH, left-handed helix; Pu, purine; Py, pyrimidine. Different forms of DNA may be obtained by subjecting DNA fibres to different relative humidities. The B form is the most common form of DNA whilst the A and C forms have been derived under laboratory conditions. The Z form may be produced with a DNA sequence made up from alternating purine and pyrimidine nucleotides.

Fig. 5.4 Base-pairing in DNA, showing the thymine–adenine and cytosine–guanine pairs and the constant distance (1.1 nm) between the C-1′ atoms of the deoxyriboses. C in a circle represents carbon at the 1′ position of deoxyribose.

The major distinguishing feature of B-DNA is that it has approximately 10 bases for one turn of the double helix; furthermore a distinctive major and minor groove may be identified (Fig. 5.5). In certain circumstances where repeated DNA sequences or motifs are found the DNA may adopt a left-handed helical structure termed Z-DNA.

Fig. 5.5 The DNA double helix. Labels: sugar–phosphate backbone; base-pairs at centre of double helix; 10 base-pairs per turn; strands run in opposite directions (antiparallel).

This form of DNA was first synthesised in the laboratory and is thought not to exist in vivo. The various forms of DNA serve to show that it is not a static molecule but dynamic and constantly in flux, and may be coiled, bent or distorted at certain times. Although RNA almost always exists as a single strand, it often contains sequences within the same strand that are self-complementary, and which can therefore base-pair if brought together by suitable folding of the molecule. A notable example is transfer RNA (tRNA) which folds up to give a clover-leaf secondary structure (Fig. 5.6).

5.2.3 Separation of double-stranded DNA

The two antiparallel strands of DNA are held together partly by the weak forces of hydrogen bonding between complementary bases, and partly by hydrophobic interactions between adjacent, stacked base-pairs, termed base-stacking. Little energy is needed to separate a few base-pairs, and so, at any instant, a few short stretches of DNA will be opened up to the single-stranded conformation. However, such stretches immediately pair up again at room temperature, so the molecule as a whole remains predominantly double-stranded. If, however, a DNA solution is heated to approximately 90 °C or above there will be enough kinetic energy to separate the DNA completely into single strands. This is termed denaturation and can be followed spectrophotometrically by monitoring the absorbance of light at 260 nm. The stacked bases of double-stranded DNA are less able to absorb light than the less constrained bases of single-stranded molecules, and so the absorbance of DNA at 260 nm increases as the DNA becomes denatured, a phenomenon known as the hyperchromic effect.

Fig. 5.6 Secondary structure of yeast tRNA-Phe (clover-leaf diagram; labelled regions: acceptor stem, D stem and D loop, anticodon stem and anticodon loop with the anticodon, variable loop, T stem and T loop, the 5′ end, and the 3′ end, which accepts phenylalanine). A single strand of 76 ribonucleotides forms four double-stranded ‘stem’ regions by base-pairing between complementary sequences. The anticodon will base-pair with UUU or UUC (both are codons for phenylalanine); phenylalanine is attached to the 3′ end by a specific aminoacyl tRNA synthetase. Several ‘unusual’ bases are present: D, dihydrouridine; T, ribothymidine; ψ, pseudouridine; Y, very highly modified, unlike any ‘normal’ base. mX indicates methylation of base X (m2X shows dimethylation); Xm indicates methylation of ribose on the 2′ position.

The absorbance at 260 nm may be plotted against the temperature of a DNA solution, which will indicate that little denaturation occurs below approximately 70 °C, but that further increases in temperature result in a marked increase in the extent of denaturation. Eventually a temperature is reached at which the sample is totally denatured, or melted. The temperature at which 50% of the DNA is melted is termed the melting temperature or Tm, and this depends on the nature of the DNA (Fig. 5.7). If several different samples of DNA are melted, it is found that the Tm is highest for those DNAs which contain the highest proportion of cytosine and guanine, and Tm can actually be used to estimate the percentage (C + G) in a DNA sample. This relationship between Tm and (C + G) content arises because cytosine and guanine form three hydrogen bonds when base-paired, whereas thymine and adenine form only two. Because of the differing numbers of hydrogen bonds in A–T and C–G pairs, sequences with a predominance of C–G pairs require greater energy to separate or denature them. The conditions required to separate a particular nucleotide sequence also depend on environmental conditions such as salt concentration.

Fig. 5.7 Melting curve of DNA. Absorbance at 260 nm and degree of denaturation (%) are plotted against temperature (°C); Tm is the temperature at which denaturation is 50%.

If melted DNA is cooled it is possible for the separated strands to reassociate, a process known as renaturation. However, a stable double-stranded molecule will only be formed if the complementary strands collide in such a way that their bases are paired precisely, and this is an unlikely event if the DNA is very long and complex (i.e. if it contains a large number of different genes). Measurements of the rate of renaturation can give information about the complexity of a DNA preparation. Strands of RNA and DNA will associate with each other, if their sequences are complementary, to give double-stranded, hybrid molecules. Similarly, strands of radioactively labelled RNA or DNA, when added to a denatured DNA preparation, will act as probes for DNA molecules to which they are complementary. This hybridisation of complementary strands of nucleic acids is very useful for isolating a specific fragment of DNA from a complex mixture. It is also possible for small single-stranded fragments of DNA (up to 40 bases in length), termed oligonucleotides, to hybridise to a denatured sample of DNA. This type of hybridisation is termed annealing and again is dependent on the base sequence of the oligonucleotide and the salt concentration of the sample.
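The dependence of melting and annealing behaviour on base composition can be made concrete with a small Python sketch. The first function simply computes the percentage (C + G) of a sequence. The second applies the widely used ‘Wallace rule’ (2 °C per A or T, 4 °C per G or C), which is not taken from this chapter; it is only a rough rule of thumb for short oligonucleotides and ignores salt concentration, which the text notes also matters. Both function names are illustrative.

```python
def gc_percent(seq: str) -> float:
    """Percentage of bases in the sequence that are G or C."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(oligo: str) -> int:
    """Rule-of-thumb Tm (deg C) for a short oligonucleotide: 2(A+T) + 4(G+C)."""
    oligo = oligo.upper()
    at = oligo.count("A") + oligo.count("T")
    gc = oligo.count("G") + oligo.count("C")
    return 2 * at + 4 * gc


oligo = "ATGCATGCATGCATGCATGC"  # hypothetical 20-base oligonucleotide
print(gc_percent(oligo))  # 50.0
print(wallace_tm(oligo))  # 60
```

Under this approximation a G/C-rich oligonucleotide always gives a higher estimate than an A/T-rich one of the same length, in line with the three versus two hydrogen bonds described above.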

5.3 GENES AND GENOME COMPLEXITY

5.3.1 Gene complexity

Each region of DNA which codes for a single RNA or protein is called a gene, and the entire set of genes in a cell, organelle or virus forms its genome. Cells and organelles may contain more than one copy of their genome. Genomic DNA from nearly all prokaryotic and eukaryotic organisms is also complexed with protein and termed chromosomal DNA. Each gene is located at a particular position along the chromosome, termed the locus, whilst the particular form of the gene is termed the allele. In mammalian DNA each gene is present in two allelic forms which may be identical (homozygous) or which may vary (heterozygous). It is thought that there are approximately 20 000 genes present in the human genome, although not all will be expressed in a given cell at the same time. However, processing events such as alternative splicing or RNA editing can increase the number of distinct proteins found in the cell to nearly 1 million, far more than the number of genes. The occurrence of different alleles at the same site in the genome is termed polymorphism. In general the more complex an organism the larger its genome, although this is not always the case since many higher organisms have non-coding sequences, some of which are repeated numerous times and termed repetitive DNA. In mammalian DNA repetitive sequences may be divided into low copy number and high copy number DNA. The latter is composed of repeat sequences that are dispersed throughout the genome and those that are clustered together. The repeat cluster DNA may be divided into so-called classical satellite DNA, minisatellite and microsatellite DNA, the latter being mainly composed of dinucleotide repeats (Table 5.2). These sequences are polymorphic: they vary between individuals (such variations are collectively termed polymorphisms) and form the basis of genetic fingerprinting.

Table 5.2 Repetitive satellite sequences found in DNA, and their characteristics

Types of repetitive DNA     Repeat unit size (bp)   Characteristics/motifs
Satellite DNA               5–200                   Large repeat unit range (Mb), usually found at centromeres
Minisatellite DNA
  Telomere sequence         6                       Found at the ends of chromosomes; G-rich sequence; repeat unit may span up to 20 kb
  Hypervariable sequence    10–60                   Repeat unit may span up to 20 kb
Microsatellite DNA          1–4                     Mononucleotide repeat of adenine; dinucleotide repeats common (CA). Usually known as VNTR (variable number tandem repeat)

Notes: bp, base-pairs; kb, kilobase-pairs.

5.3.2 Single nucleotide polymorphisms (SNPs)

A further important source of polymorphic diversity known to be present in genomes is the single nucleotide polymorphism or SNP (pronounced ‘snip’). SNPs are substitutions of one base at a precise location within the genome. Those that occur in coding regions are termed cSNPs. Estimates indicate that an SNP occurs once in every 300 bases and there are thought to be approximately 10 million in the human genome. Interest in SNPs lies in the fact that these polymorphisms may account for the differences in disease susceptibility, drug metabolism and response to environmental factors between individuals. Indeed there are now a number of initiatives to identify SNPs and produce genomic SNP maps. One initiative is the international HapMap project, which will enable a haplotype map of common sources of variation to be produced from groups of associated SNPs. This will potentially allow a set of so-called tag SNPs to be identified and provide an association between a haplotype and a disease.
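The two SNP estimates quoted above are consistent with each other, as a short Python calculation shows. It assumes the haploid human genome size of 3 × 10^6 kb quoted in Section 5.3.4 and treats the one-in-300 figure as a simple average spacing; the variable names are illustrative.

```python
HAPLOID_GENOME_BP = 3_000_000_000  # 3 x 10^6 kb, as quoted in Section 5.3.4
BASES_PER_SNP = 300                # average spacing quoted in the text

# Expected number of SNPs if they were spread evenly along the genome.
expected_snps = HAPLOID_GENOME_BP // BASES_PER_SNP
print(f"{expected_snps:,}")  # 10,000,000
```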

5.3.3 Chromosomes and karyotypes

Higher organisms may be identified by using the size and shape of their genetic material at a particular point in the cell division cycle, termed metaphase. At this point DNA condenses to form a number of very distinct chromosome structures. Various morphological characteristics of chromosomes may be identified at this stage, including the centromere and the telomere. The array of chromosomes from a given organism may also be stained with dyes such as Giemsa stain and subsequently analysed by light microscopy. The complete array of chromosomes in an organism is termed the karyotype. In certain genetic disorders aberrations in the size, shape and number of chromosomes may occur, and thus the karyotype may be used as an indicator of the disorder. Perhaps the best-known example is Down syndrome, in which three copies of chromosome 21 (trisomy 21) exist rather than the normal two.

5.3.4 Renaturation kinetics and genome complexity

When preparations of double-stranded DNA are denatured and allowed to renature, measurement of the rate of renaturation can give valuable information about the complexity of the DNA, i.e. how much information it contains (measured in base-pairs). The complexity of a molecule may be much less than its total length if some sequences are repetitive, but complexity will equal total length if all sequences are unique, appearing only once in the genome. In practice, the DNA is first cut randomly into fragments about 1 kb in length (Section 5.9), and is then completely denatured by heating above its Tm (Section 5.2.3). Renaturation at a temperature about 10 °C below the Tm is monitored either by the decrease in absorbance at 260 nm (the hypochromic effect), or by passing samples at intervals through a column of hydroxylapatite, which will adsorb only double-stranded DNA, and measuring how much of the sample is bound. The degree of renaturation after a given time will depend on C0, the concentration (in nucleotides per unit volume) of double-stranded DNA prior to denaturation, and t, the duration of the renaturation in seconds. For a given C0, it should be evident that a preparation of bacteriophage λ DNA (genome size 49 kb) will contain many more copies of the same sequence per unit volume than a preparation of human DNA (haploid genome size 3 × 10^6 kb), and will therefore renature far more rapidly, since there will be more molecules complementary to each other per unit volume in the case of λ DNA, and therefore more chance of two complementary strands colliding with each other.

In order to compare the rates of renaturation of different DNA samples it is usual to measure C0 and the time taken for renaturation to proceed half way to completion, t1/2, and to multiply these values together to give a C0t1/2 value. The larger the C0t1/2, the greater the complexity of the DNA; hence λ DNA has a far lower C0t1/2 than does human DNA. In fact, the human genome does not renature in a uniform fashion. If the extent of renaturation is plotted against log C0t (this is known as a C0t curve, often written Cot curve), it is seen that part of the DNA renatures quite rapidly, whilst the remainder is very slow to renature (Fig. 5.8). This indicates that some sequences have a higher concentration than others; in other words, part of the genome consists of repetitive sequences. These repetitive sequences can be separated from the single-copy DNA by passing the renaturing sample through a hydroxylapatite column early in the renaturation process, at a time which gives a low value of C0t. At this stage only the rapidly renaturing sequences will be double-stranded, and they will therefore be the only ones able to bind to the column.

Fig. 5.8 C0t (Cot) curve of human DNA. Renaturation (%) is plotted against log C0t, showing a rapidly renaturing and a slowly renaturing component. DNA was allowed to renature at 60 °C after being completely dissociated by heat. Samples were taken at intervals and passed through a hydroxylapatite column to determine the percentage of double-stranded DNA present. This percentage was plotted against log C0t (original concentration of DNA × time of sampling).
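The shape of a Cot (C0·t) curve for a single class of sequences can be sketched in Python. The sketch assumes ideal second-order kinetics, which the text does not state explicitly: the fraction still single-stranded is taken as 1/(1 + k·C0·t), with k = 1/(C0t1/2), so that renaturation is exactly half complete when C0·t equals C0t1/2. The function name and the example values are hypothetical.

```python
def fraction_renatured(c0t: float, c0t_half: float) -> float:
    """Fraction of DNA that is double-stranded at a given C0*t value.

    Assumes ideal second-order kinetics for one sequence class:
    single-stranded fraction = 1 / (1 + C0t / C0t_half).
    """
    ratio = c0t / c0t_half
    return ratio / (1.0 + ratio)


# Hypothetical sample with C0t1/2 = 2.0 (arbitrary units).
for c0t in (0.02, 0.2, 2.0, 20.0, 200.0):
    print(c0t, round(fraction_renatured(c0t, 2.0), 3))
```

A sample with a larger C0t1/2 reaches any given extent of renaturation at a proportionally larger C0·t, so on a log scale its curve is shifted to the right without changing shape. A genome containing both repetitive and single-copy DNA would be modelled, under the same assumption, as a weighted sum of such curves.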

5.3.5 The nature of the genetic code

DNA encodes the primary sequence of a protein by utilising sets of three nucleotides, termed a codon or triplet, to encode a particular amino acid. The four bases (A, C, G and T) present in DNA allow a possible 64 triplet combinations; however, since there are only 20 naturally occurring amino acids, more than one codon may encode an amino acid. This phenomenon is termed the degeneracy of the genetic code. With the exception of a limited number of differences found in mitochondrial DNA and one or two other species, the genetic code appears to be universal. In addition to coding for amino acids, particular triplet sequences also indicate the beginning (Start) and the end (Stop) of a particular gene. Only one start codon exists (ATG), which also codes for the amino acid methionine, whereas three dedicated stop codons are available (TAA, TAG and TGA) (Fig. 5.9). A sequence flanked by a start and a stop codon, containing a number of codons that may be read in-frame to represent a continuous protein sequence, is termed an open reading frame (ORF).

Fig. 5.9 The genetic code. Note that the codons in blue represent the start codon (ATG) and the three stop codons.
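The idea of reading codons in-frame from a start codon to a stop codon can be sketched in Python. This is a minimal illustration rather than a gene-finding method: it scans one strand only, uses ATG as the start codon and the standard stop codons TAA, TAG and TGA, and the function name first_orf is hypothetical.

```python
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}


def first_orf(dna: str) -> str:
    """Return the first open reading frame (start to stop codon inclusive), or ''."""
    dna = dna.upper()
    start = dna.find(START)
    while start != -1:
        # Read successive in-frame triplets beginning at this start codon.
        for i in range(start, len(dna) - 2, 3):
            if dna[i:i + 3] in STOPS:
                return dna[start:i + 3]
        # No in-frame stop codon found; try the next ATG.
        start = dna.find(START, start + 1)
    return ""


print(first_orf("CCATGGCTTGATT"))  # ATGGCTTGA
```

Here the reading frame is set by the ATG at the third base, giving the codons ATG, GCT and TGA; a sequence with no in-frame stop codon returns an empty string.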

5.4 LOCATION AND PACKAGING OF NUCLEIC ACIDS

5.4.1 Cellular compartments

In general, DNA in eukaryotic cells is confined to the nucleus and to organelles such as mitochondria or chloroplasts, which contain their own genome. The predominant RNA species are, however, normally located within the cytoplasm. The genetic information of cells and most viruses is stored in the form of DNA. This information is used to direct the synthesis of RNA molecules, which fall into three classes. Figure 5.10 indicates the locations of nucleic acids in eukaryotic cells.

Fig. 5.10 Location of DNA and RNA molecules in eukaryotic cells and the flow of genetic information. Labels: nucleus, nucleolus, plasma membrane, mitochondria/chloroplasts, ribosomes; DNA, small nuclear RNA, mRNA, tRNA, rRNA, proteins; transcription, translation.

• Messenger RNA (mRNA) contains sequences of ribonucleotides which code for the amino acid sequences of proteins. A single mRNA codes for a single polypeptide chain in eukaryotes, but may code for several polypeptides in prokaryotes.
• Ribosomal RNA (rRNA) forms part of the structure of ribosomes, which are the sites of protein synthesis. Each ribosome contains only three or four different rRNA molecules, complexed with a total of between 55 and 75 proteins.
• Transfer RNA (tRNA) molecules carry amino acids to the ribosomes, and interact with the mRNA in such a way that their amino acids are joined together in the order specified by the mRNA. There is at least one type of tRNA for each amino acid.

In eukaryotic cells alone a further group of RNA molecules, termed small nuclear RNA (snRNA), is present; these function within the nucleus and promote the maturation of mRNA molecules. All RNA molecules are associated with their respective binding proteins, which are essential for their cellular functions. Nucleic acids from prokaryotic cells are less well compartmentalised, although they serve similar functions.

5.4.2 The packaging of DNA

The DNA in prokaryotic cells resides in the cytoplasm, where it is associated with nucleoid proteins and is tightly coiled and supercoiled by topoisomerase enzymes to enable it to fit physically into the cell. By contrast, eukaryotic cells have many levels of packaging of the DNA within the nucleus, involving a variety of DNA binding proteins. First-order packaging involves the winding of the DNA around a core complex of four small proteins, termed histones (H2A, H2B, H3 and H4), each present twice. These are rich in the basic amino acids lysine and arginine and form a barrel-shaped core octamer structure. Approximately 180 bp of DNA is wound twice around the structure, which is termed a nucleosome. A further histone protein, H1, is found to associate with the outer surface of the nucleosome. The compacting effect of the nucleosome reduces the length of the DNA by a factor of six. Nucleosomes also associate to form a second order of packaging termed the 30 nm chromatin fibre, thus further reducing the length of the DNA by a factor of seven (Fig. 5.11). These structures may be further folded and looped through interaction with other non-histone proteins and ultimately form chromosome structures.

DNA is found closely associated with the nuclear lamina matrix, which forms a protein scaffold within the nucleus. The DNA is attached at certain positions within the scaffold, usually coinciding with origins of replication. Many other DNA binding proteins are also present, such as high mobility group (HMG) proteins, which assist in promoting certain DNA conformations during processes such as replication or active gene expression.

Fig. 5.11 Structure and composition of the nucleosome and chromatin. A nucleosome comprises the histone proteins H2A, H2B, H3 and H4, each present twice, and approximately 180 bp of DNA.
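The packaging figures quoted in this section can be combined in a short Python calculation. It multiplies the two compaction factors given in the text and estimates how many nucleosomes would be needed for one haploid human genome, assuming the genome size of 3 × 10^6 kb from Section 5.3.4 and one nucleosome per 180 bp with no gaps. The variable names are illustrative and the result is an order-of-magnitude estimate only.

```python
NUCLEOSOME_FOLD = 6                # length reduction from nucleosome formation (text)
FIBRE_FOLD = 7                     # further reduction in the 30 nm fibre (text)
BP_PER_NUCLEOSOME = 180            # DNA associated with each nucleosome (text)
HAPLOID_GENOME_BP = 3_000_000_000  # 3 x 10^6 kb, from Section 5.3.4

# The two orders of packaging act in succession, so their factors multiply.
combined_fold = NUCLEOSOME_FOLD * FIBRE_FOLD
nucleosomes = HAPLOID_GENOME_BP // BP_PER_NUCLEOSOME

print(combined_fold)         # 42
print(f"{nucleosomes:,}")    # 16,666,666
```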

5.5 FUNCTIONS OF NUCLEIC ACIDS

5.5.1 DNA replication

The double-stranded nature of DNA provides a means of replication during cell division since the separation of two DNA strands allows complementary strands to be synthesised upon them. Many enzymes and accessory proteins are required for in vivo replication, which in prokaryotes begins at a region of the DNA termed the origin of replication. DNA has to be unwound before any of the proteins and enzymes needed for replication can act, and this involves separating the double-helical DNA into single strands. This process is carried out by the enzyme DNA helicase. Furthermore, in order to prevent the single strands from re-annealing, small proteins termed single-stranded DNA binding proteins (SSBs) attach to the single DNA strands (Fig. 5.12). On each exposed single strand a short, complementary RNA chain termed a primer is first produced, using the DNA as a template. The primer is synthesised by an RNA polymerase enzyme known as a primase, which uses ribonucleoside triphosphates and itself requires no primer to function. Then DNA polymerase III (DNA pol III) also uses the original DNA as a template for synthesis of a DNA strand, using the RNA primer as a starting point. The primer is vital since it leaves an exposed 3′ hydroxyl group. This is necessary since DNA polymerase III can only add new nucleotides to the 3′ end and not the 5′ end of a nucleic acid. Synthesis of the DNA strand therefore occurs only in a 5′ to 3′ direction from the RNA primer. This DNA strand is usually termed the leading strand and provides the means for continuous DNA synthesis. Since the two strands of double-helical DNA are antiparallel, only one can be synthesised in a continuous fashion. Synthesis of the other strand must take place in a more complex way. The precise mechanism was worked out by Reiji Okazaki in the 1960s. Here the strand, usually termed the lagging strand, is produced in relatively short stretches of 1–2 kb termed Okazaki fragments. Synthesis is still in a 5′ to 3′ direction, using a separate RNA primer for each individual stretch. Thus, discontinuous synthesis of DNA takes place and allows DNA polymerase III to work in the 5′ to 3′ direction. The RNA primers are then removed by DNA polymerase I (DNA pol I), which has a 5′ to 3′ exonuclease activity, and the gaps are filled by the same enzyme acting as a polymerase. The separate fragments are joined together by DNA ligase to give a newly formed strand of DNA on the lagging strand (Fig. 5.13).

The replication of eukaryotic DNA is less well characterised, involves multiple origins of replication and is certainly more complex than that of prokaryotes; however, in both cases the process involves 5′ to 3′ synthesis of new DNA strands. The net result of the replication is that the original DNA is replaced by two molecules, each containing one ‘old’ and one ‘new’ strand; the process is therefore known as semiconservative replication. The ideas behind DNA synthesis, replication and the enzymes involved in them have been adopted in many molecular biology techniques and form the basis of many manipulations in genetic engineering.

Fig. 5.12 Initial events at the replication fork involving DNA unwinding. Labels: supercoiled double-stranded DNA; DNA helicase; replication fork; single-stranded DNA; single-stranded DNA binding proteins.

Fig. 5.13 DNA replication. Labels: origin of replication; direction of DNA replication; replication fork; leading strand; lagging strand; RNA primers; newly synthesised DNA strand. (a) Double-stranded DNA separates at the origin of replication. RNA polymerase synthesises short RNA primer strands complementary to both DNA strands. (b) DNA polymerase III synthesises new DNA strands in a 5′ to 3′ direction, complementary to the exposed, old DNA strands, and continuing from the 3′ end of each RNA primer. Consequently DNA synthesis is in the same direction as DNA replication for one strand (the leading strand) and in the opposite direction for the other (the lagging strand). RNA primer synthesis occurs repeatedly to allow the synthesis of fragments of the lagging strand. (c) As the replication fork moves away from the origin of replication, DNA polymerase III continues the synthesis of the leading strand, and synthesises DNA between RNA primers of the lagging strand. (d) DNA polymerase I removes RNA primers from the lagging strand and fills the resulting gaps with DNA. DNA ligase then joins the resulting fragments, producing a continuous DNA strand.

5.5.2 DNA protection and repair systems

Cellular growth and division require the correct and coordinated replication of DNA. Mechanisms that proofread replicated DNA sequences and maintain the integrity of those sequences are, however, complex and are only beginning to be elucidated for prokaryotic systems. Bacterial protection is afforded by a restriction–modification system based on differential methylation of host DNA, so as to distinguish it from foreign DNA such as that of viruses. The most common system is type II, which consists of a host DNA methylase and a restriction endonuclease that recognises short (4–6 bp) palindromic sequences and cleaves foreign unmethylated DNA at a particular target sequence. The enzymes involved in this process have been of enormous benefit for the manipulation and analysis of DNA, as indicated in Section 5.9. Repair systems allow the recognition of altered, mispaired or missing bases in double-stranded DNA and invoke an excision repair process. The systems characterised in bacteria are based on the length of repairable DNA during either replication (dam system) or general repair (uvr system). In some cases damage to DNA activates a protein termed RecA to produce an SOS response that includes the activation of many enzymes and proteins; however, this has yet to be fully characterised. The recombination–repair systems in eukaryotic cells may share some common features with those of prokaryotes, although the precise mechanism has yet to be established. Defects in DNA repair may result in the stable incorporation of errors into genomic sequences, which may underlie several genetic-based diseases.
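In this context a ‘palindromic’ sequence is one that reads the same 5′ to 3′ on both strands, i.e. it is identical to its own reverse complement. The Python sketch below tests this property. The recognition sequences used as examples (GAATTC for EcoRI and GGATCC for BamHI) are familiar cases that are not taken from this section, and the function name is illustrative.

```python
# Translation table for Watson-Crick complements of DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def is_palindromic_site(site: str) -> bool:
    """True if the sequence equals its own reverse complement."""
    site = site.upper()
    return site == site.translate(COMPLEMENT)[::-1]


print(is_palindromic_site("GAATTC"))  # True
print(is_palindromic_site("GATTAC"))  # False
```

Note that a sequence of odd length can never satisfy this test, because its central base would have to be its own complement.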

5.5.3 Transcription of DNA

Expression of genes is carried out initially by the process of transcription, whereby a complementary RNA strand is synthesised by an enzyme termed RNA polymerase from a DNA template encoding the gene. Most prokaryotic genes are made up of three regions. At the centre is the sequence which will be copied in the form of RNA, called the structural gene. To the 5′ side (upstream) of the strand which will be copied (the plus (+) strand) lies a region called the promoter, and downstream of the transcription unit is the terminator region. Transcription begins when DNA-dependent RNA polymerase binds to the promoter region and moves along the DNA to the transcription unit. At the start of the transcription unit the polymerase begins to synthesise an RNA molecule complementary to the minus (−) strand of the DNA, moving along this strand in a 3′ to 5′ direction, and synthesising RNA in a 5′ to 3′ direction, using ribonucleoside triphosphates. The RNA will therefore have the same sequence as the + strand of DNA, apart from the substitution of uracil for thymine. On reaching the stop site in the terminator region, transcription is stopped, and the RNA molecule is released. The numbering of bases in genes is a useful way of identifying key elements. Point or base +1 is the residue located at the transcription start site; positive numbers denote downstream (3′) regions, whilst negative numbers denote upstream (5′) regions (Fig. 5.14).

Fig. 5.14 Structure and nomenclature of a typical gene. Labels: promoter regions (‘upstream’); transcription start site (+1); translation start site; structural gene; stop site; terminator regions (‘downstream’); + strand (5′ to 3′) and − strand (3′ to 5′).

In eukaryotes, three different RNA polymerases exist, designated I, II and III. Messenger RNA is synthesised by RNA polymerase II, while RNA polymerases I and III catalyse the synthesis of rRNA (I), and tRNA and snRNA (III). Many non-expressed genes tend to have residues that are methylated, usually the C of a CG dinucleotide, and in general active genes tend to be hypomethylated. This is especially prevalent at the 5′ flanking regions and is a useful means of discovering and identifying new genes.
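The relationship between the two DNA strands and the RNA transcript described in this section can be checked with a short Python sketch. It assumes that both DNA strands are written 5′ to 3′ and contain only A, C, G and T; the function names and the six-base example are hypothetical.

```python
# Translation table for Watson-Crick complements of DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def rna_from_minus_strand(minus_5_to_3: str) -> str:
    """RNA (5' to 3') synthesised on the minus (template) strand."""
    # The RNA is complementary and antiparallel to the template,
    # with uracil in place of thymine.
    coding = minus_5_to_3.upper().translate(COMPLEMENT)[::-1]
    return coding.replace("T", "U")


def rna_from_plus_strand(plus_5_to_3: str) -> str:
    """RNA has the plus-strand sequence with U substituted for T."""
    return plus_5_to_3.upper().replace("T", "U")


plus = "ATGGCT"    # hypothetical plus (+) strand, 5' to 3'
minus = "AGCCAT"   # its complementary minus strand, also written 5' to 3'
print(rna_from_minus_strand(minus))  # AUGGCU
print(rna_from_plus_strand(plus))    # AUGGCU
```

Both routes give the same transcript, which is why gene sequences are conventionally quoted as the plus strand.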

5.5.4 Promoter and terminator sequences in DNA

Promoters are usually to the 5′ end or upstream of the structural gene and have been best characterised in prokaryotes such as Escherichia coli. They comprise two highly conserved sequence elements: the TATA box (consensus sequence ‘TATAAT’) which is centred approximately 10 bp upstream from the transcription initiation site (−10 in the gene numbering system), and a ‘GC-rich’ sequence which is centred about 25 bp upstream from the TATA box (−35). The GC element is thought to be important in the initial recognition and binding of RNA polymerase to the DNA, while the −10 sequence is involved in the formation of a transcription initiation complex (Fig. 5.15a). The promoter elements serve as recognition sites for DNA binding proteins that control gene expression and these proteins are termed transcription factors or trans-acting factors. These proteins have a DNA binding domain for interaction with promoters and an activation domain to allow interaction with other transcription factors. A well-studied example of a transcription factor is TFIID which binds to the −35 promoter sequence in eukaryotic cells. Gene regulation occurs in most cases at the level of transcription, and primarily by the rate of transcription initiation, although control may also be by modulation of mRNA stability, or at other levels such as translation.

Terminator sequences are less well characterised, but are thought to involve nucleotide sequences near the end of mRNA with the capacity to form a hairpin loop, followed by a run of U residues, which may constitute a termination signal for RNA polymerase.

In the case of eukaryotic genes numerous short sequences spanning several hundred bases may be important for transcription, compared to normally less than 100 bp for prokaryotic promoters. Particularly critical is the TATA box sequence, located approximately 35 bp upstream of the transcription initiation point in the majority of genes (Fig. 5.15b). This is analogous to the −10 sequence in prokaryotes. A number


Fig. 5.15 (a) Typical promoter elements found in a prokaryotic cell (e.g. E. coli). (b) Typical promoter elements found in eukaryotic cells. (c) Generalised scheme of binding of transcription factors to the promoter regions of eukaryotic cells. Following the binding of the transcription factors IID, IIA, IIB, IIE and IIF a pre-initiation complex is formed. RNA polymerase II then binds to this complex and begins transcription from the start point +1. (Diagram labels: (a) −35 TTGACA, −10 TATA, +1, structural gene; (b) enhancer at variable distance, −80 CAAT or G+C-rich, −35 TATA, +1, structural gene; (c) TFIID, TFIIA, TFIIB, TFIIE/F and RNA pol II, with −80 CAAT or G+C-rich, −35 TATA, +1 and structural gene.)

of other transcription factors also bind sequentially to form an initiation complex that includes RNA polymerase, subsequent to which transcription is initiated. In addition to the TATA box, a CAAT box (consensus GGCCAATCT) is often located at about −80 bp, which is an important determinant of promoter efficiency. Many upstream promoter elements (UPEs) have been described that are either general in their action or tissue (or gene) specific. GC elements that contain the sequence GGGCG may be present at multiple sites and in either orientation and are often associated with housekeeping genes such as those encoding enzymes involved in general metabolism. Some promoter sequence elements, such as the TATA box, are common to most genes, while others may be specific to particular genes or classes of genes. Of particular interest is a class of promoter first investigated in the virus SV40 and termed an enhancer. These sequences are distinguished from other promoter sequences by their unique ability to function over several kilobases either upstream


or downstream of a particular gene in an orientation-independent manner. Even at such great distances from the transcription start point they may increase transcription by several hundred-fold. The precise interactions between transcription factors, RNA polymerase or other DNA binding proteins and the DNA sequences they bind to may be identified and characterised by the technique of DNA footprinting (Section 6.8.3). For transcription in eukaryotic cells to proceed a number of transcription factors need to interact with the promoters and with each other. This sequential assembly is indicated in Fig. 5.15c and the resulting structure is termed a pre-initiation complex. Once this has been formed around the −35 TATA sequence RNA polymerase II is able to transcribe the structural gene and form a complementary RNA copy (Section 5.5.6).
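As an illustration of how the prokaryotic promoter elements of this section might be located in a sequence, the sketch below scans a + strand for a −35 box followed by a −10 box. Everything specific in it is an assumption made for the example rather than something stated in the text: the function names, the widely quoted consensus strings TTGACA and TATAAT, the permitted spacer of 16 to 18 bp, the tolerance of one mismatch per box, and the test sequence.

```python
# Illustrative promoter scan; consensus strings, spacer range and mismatch limit are assumptions.
MINUS_35 = "TTGACA"
MINUS_10 = "TATAAT"


def mismatches(a: str, b: str) -> int:
    """Count the positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_promoters(seq: str, max_mismatch: int = 1, spacers=range(16, 19)):
    """Return (start of -35 box, start of -10 box, spacer length) for each candidate."""
    hits = []
    for i in range(len(seq) - len(MINUS_35) + 1):
        if mismatches(seq[i:i + 6], MINUS_35) > max_mismatch:
            continue
        for spacer in spacers:
            j = i + 6 + spacer
            box = seq[j:j + 6]
            if len(box) == 6 and mismatches(box, MINUS_10) <= max_mismatch:
                hits.append((i, j, spacer))
    return hits


# Hypothetical sequence: filler, -35 box, 17 bp spacer, -10 box, filler.
sequence = "GCGCG" + "TTGACA" + "C" * 17 + "TATAAT" + "GCGCGCA"
print(find_promoters(sequence))  # [(5, 28, 17)]
```

Real promoters vary considerably around the consensus, so a scan of this kind only suggests candidates; experimental methods such as DNA footprinting are needed to confirm protein binding.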

5.5.5 Transcription in prokaryotes

Prokaryotic gene organisation differs from that found in eukaryotes in a number of ways. Prokaryotic genes are generally found as continuous coding sequences which are not interrupted. Moreover they are frequently found clustered into operons which contain genes that relate to a particular function such as the metabolism of a substrate or synthesis of a product. This is particularly evident in the best-known operon identified in E. coli, termed the lactose operon, where three genes, lacZ, lacY and lacA, share the same promoter and are therefore switched on and off at the same time. In this model the absence of lactose results in a repressor protein binding to an operator region upstream of the Z, Y and A genes, which prevents RNA polymerase from transcribing the genes (Fig. 5.16a). However the presence of lactose requires the genes to be transcribed to allow its metabolism. Lactose binds to the repressor protein and causes a conformational change in its structure. This prevents it binding to the operator and allows RNA polymerase to bind and transcribe the three genes (Fig. 5.16b). Transcription and translation in prokaryotes are also closely linked or coupled whereas in eukaryotic cells the two processes are distinct and take place in different cell compartments.
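The on/off behaviour of the lactose operon described above can be written as a simple truth table. The sketch below is a toy model of the repressor logic only; the function name and dictionary keys are hypothetical, and it deliberately ignores any other level of control.

```python
# Toy model of the lac repressor logic described in the text; names are illustrative.
def lac_operon_state(lactose_present: bool) -> dict:
    """Return whether the repressor sits on the operator and whether the genes are transcribed."""
    # Lactose binds the repressor and stops it binding the operator.
    repressor_on_operator = not lactose_present
    # RNA polymerase can transcribe lacZ, lacY and lacA only when the operator is free.
    genes_transcribed = not repressor_on_operator
    return {
        "repressor_on_operator": repressor_on_operator,
        "lacZ_lacY_lacA_transcribed": genes_transcribed,
    }


for lactose in (False, True):
    print(f"lactose present: {lactose} -> {lac_operon_state(lactose)}")
```

Because the three genes share one promoter and operator, a single boolean is enough to describe all of them in this simplified picture.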

5.5.6 Post-transcriptional processing

Transcription of a eukaryotic gene results in the production of a heterogeneous nuclear RNA transcript (hnRNA) which faithfully represents the entire structural gene (Fig. 5.17). Three processing events then take place. The first processing step involves the addition of a methylated guanosine residue (m⁷Gppp) termed a cap to the 5′ end of the hnRNA. This may be a signalling structure or aid in the stability of the molecule (Fig. 5.18). In addition, 150 to 300 adenosine residues termed a poly(A) tail are attached at the 3′ end of the hnRNA by the enzyme poly(A) polymerase. The poly(A) tail allows the specific isolation of eukaryotic mRNA from total RNA by affinity chromatography (Section 5.7.2); its presence is thought to confer stability on the transcript. Unlike prokaryotic transcripts those from eukaryotes have their coding sequence (expressed regions or exons) interrupted by non-coding sequence (intervening


Fig. 5.16 Lactose operon (a) in a state of repression (no lactose present) and (b) following induction by lactose. (Diagram labels: regulatory gene (R), promoter (P), operator (O) and the genes Z (β-galactosidase), Y (permease) and A (transacetylase); mRNA; repressor protein. In (a), RNA polymerase is blocked by repressor protein binding to the operator sequence. In (b), the inducer (lactose) forms an inducer–repressor complex that cannot bind the operator.)

regions or introns). Introns, whose boundaries with exons are generally marked by the sequence GU–AG, need to be removed or spliced out before the mature mRNA is formed (Fig. 5.18). The process of intron splicing is mediated by small nuclear RNAs (snRNAs) which exist in the nucleus as ribonucleoprotein particles. These are often found in


Fig. 5.17 Transcription of a typical eukaryotic gene to form heterogeneous nuclear RNA. (Diagram labels: promoter regions, start site, Exon1, Intron1, Exon2, Intron2, Exon3, stop site and poly(A) site on the double-stranded gene; transcription by RNA polymerase II gives an hnRNA, drawn 5′ to 3′, containing Exon1, Intron1, Exon2, Intron2 and Exon3.)

Fig. 5.18 Post-transcriptional modifications of heterogeneous nuclear RNA. (Diagram labels: hnRNA, drawn 5′ to 3′, with Exon1, Intron1, Exon2, Intron2 and Exon3; methylated G residue added to 5′ end; poly(A) tail added to 3′ end; removal of introns in splicing reaction; mRNA with 5′ methylated G, Exon1, Exon2, Exon3 and 3′ poly(A) tail.)

a large nuclear complex termed the spliceosome where splicing takes place. Introns are usually removed in a sequential manner from the 5′ to the 3′ end and their number varies between different genes. Some eukaryotic genes such as histone genes contain no introns whereas the gene for dystrophin, the gene mutated in Duchenne muscular dystrophy, contains over 70 introns. In some cases, however, the same hnRNA transcript may be processed in different ways to produce different mRNAs coding for different proteins in a process known as alternative splicing. Thus a sequence that constitutes an exon for one RNA species may be part of an excised intron in another. The particular type or amount of mRNA synthesised from a cell or cell type may be analysed by a variety of molecular biology techniques (Section 6.8.1).
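Splicing and alternative splicing can be pictured as joining selected segments of the hnRNA. The following sketch assumes that the exon coordinates are already known; the sequences, segment names and the function `splice` are hypothetical and chosen only so that each intron begins with GU and ends with AG, as described above.

```python
# Sketch of splicing with known exon coordinates; all sequences are made up for illustration.
def splice(hnrna: str, exons: list) -> str:
    """Join exon segments, given as (start, end) pairs (0-based, end-exclusive), in 5'->3' order."""
    return "".join(hnrna[start:end] for start, end in sorted(exons))


segments = [
    ("exon1", "AUGGCC"),
    ("intron1", "GUAAGUCCUAG"),
    ("exon2", "AAAGAC"),
    ("intron2", "GUGAGUUUCAG"),
    ("exon3", "UUCUAA"),
]

# Build the hnRNA and record where each segment lies within it.
hnrna = "".join(seq for _, seq in segments)
coords, position = {}, 0
for name, seq in segments:
    coords[name] = (position, position + len(seq))
    position += len(seq)

# Constitutive splicing keeps all three exons; the alternative form skips exon 2.
mrna_full = splice(hnrna, [coords["exon1"], coords["exon2"], coords["exon3"]])
mrna_skipped = splice(hnrna, [coords["exon1"], coords["exon3"]])

print(mrna_full)     # AUGGCCAAAGACUUCUAA
print(mrna_skipped)  # AUGGCCUUCUAA
```

In the second mRNA the sequence of exon 2 is removed together with the flanking introns, which is the situation described in the text where an exon of one RNA species forms part of an excised intron in another.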


5.5.7 Translation of mRNA

Messenger RNA molecules are read and translated into protein by complex RNA–protein particles termed ribosomes. The ribosomes are termed 70S or 80S depending on their sedimentation coefficient. Prokaryotic cells have 70S ribosomes whilst those of the eukaryotic cytoplasm are 80S. Ribosomes are composed of two subunits that are held apart by ribosomal binding proteins until translation proceeds. There are sites on the ribosome for the binding of one mRNA and two tRNA molecules and the translation process is in three stages.

• Initiation: involving the assembly of the ribosome subunits and the binding of the mRNA.
• Elongation: where specific amino acids are used to form polypeptides, this being directed by the codon sequence in the mRNA.
• Termination: which involves the disassembly of the components of translation following the production of a polypeptide.

Transfer RNA molecules are also essential for translation. Each of these is covalently linked to a specific amino acid, forming an aminoacyl tRNA, and each has a triplet of bases exposed which is complementary to the codon for that amino acid. This exposed triplet is known as the anticodon, and allows the tRNA to act as an ‘adapter’ molecule, bringing together a codon and its corresponding amino acid. The process of linking an amino acid to its specific tRNA is termed charging and is carried out by the enzyme aminoacyl tRNA synthetase.

In prokaryotic cells the ribosome binds to the 5′ end of the mRNA at a sequence known as a ribosome binding site, sometimes termed the Shine–Dalgarno sequence after its discoverers. In eukaryotes the situation is similar but involves a Kozak sequence located around the initiation codon. Following translation initiation the ribosome moves towards the 3′ end of the mRNA, allowing an aminoacyl tRNA molecule to base-pair with each successive codon, thereby carrying in amino acids in the correct order for protein synthesis. There are two sites for tRNA molecules in the ribosome, the A site and the P site, and when these sites are occupied, directed by the sequence of codons in the mRNA, the ribosome allows the formation of a peptide bond between the amino acids. This reaction is catalysed by the enzyme peptidyl transferase. When the ribosome encounters a termination codon (UAA, UGA or UAG) a release factor binds to the complex and translation stops, the polypeptide and its corresponding mRNA are released and the ribosome divides into its two subunits (Fig. 5.19).
A myriad of accessory initiation and elongation protein factors are involved in this process. In eukaryotic cells the polypeptide may then be subjected to post-translational modifications such as glycosylation and by virtue of specific amino acid signal sequences may be directed to specific cellular compartments or exported from the cell. Since the mRNA base sequence is read in triplets, an error of one or two nucleotides in positioning of the ribosome will result in the synthesis of an incorrect polypeptide. Thus it is essential for the correct reading frame to be used during translation. This is ensured


Fig. 5.19 Translation. Ribosome A has moved only a short way from the 5′ end of the mRNA, and has built up a dipeptide (on one tRNA) that is about to be transferred onto the third amino acid (still attached to tRNA). Ribosome B has moved much further along the mRNA and has built up an oligopeptide that has just been transferred onto the most recent aminoacyl tRNA. The resulting free tRNA leaves the ribosome and will receive another amino acid. The ribosome moves towards the 3′ end of the mRNA by a distance of three nucleotides, so that the next codon can be aligned with its corresponding aminoacyl tRNA on the ribosome. Ribosome C has reached a termination codon, has released the completed polypeptide, and has fallen off the mRNA. (Diagram labels: amino acid, tRNA, aminoacyl tRNA, anticodon, growing polypeptide, complete polypeptide released, ribosome falls off mRNA, ribosomes (A), (B) and (C), AUG, codon for 4th amino acid, termination codon, direction of translation 5′ to 3′.)

in prokaryotes by base-pairing between the Shine–Dalgarno sequence (Kozak sequence in eukaryotes) and a complementary sequence of one of the ribosome’s rRNAs, thus establishing the correct starting point for movement of the ribosome along the mRNA. However if a mutation such as a deletion/insertion takes place within the coding sequence it will also cause a shift of the reading frame and result in an aberrant polypeptide. Genetic mutations and polymorphisms are considered in more detail in Section 6.8.6.
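The importance of the reading frame can be demonstrated with a short translation routine. This is a minimal sketch, not a description of any particular software: the function name `translate` and the two example mRNAs are hypothetical, the codon table is the standard genetic code written with one-letter amino acid symbols, and translation is simply started at the first AUG.

```python
# Standard genetic code built from the base order U, C, A, G ("*" marks a termination codon).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(mrna: str) -> str:
    """Translate from the first AUG until a termination codon or the end of the mRNA."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for pos in range(start, len(mrna) - 2, 3):
        residue = CODON_TABLE[mrna[pos:pos + 3]]
        if residue == "*":  # UAA, UAG or UGA
            break
        protein.append(residue)
    return "".join(protein)


normal = "AUGAAAGGCUUUUAA"           # hypothetical mRNA: AUG AAA GGC UUU UAA
deletion = normal[:6] + normal[7:]   # one G deleted from the third codon

print(translate(normal))    # MKGF
print(translate(deletion))  # MKAF
```

After the single-base deletion every downstream codon is read in a different frame: the third residue changes and the original UAA termination codon is no longer read as a codon at all.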

5.5.8 Control of protein production – RNA interference

There are a number of mechanisms by which protein production is controlled; however the control may be either at the gene level or at the protein level. Typically this could include controlling levels of expression of mRNA, an increase or decrease in mRNA turnover, or controlling mRNA availability for translation. One recently discovered control mechanism that has also been adapted as a molecular biology technique to aid in the modulation of mRNA is termed RNA interference (RNAi). This involves the synthesis of short double-stranded RNA molecules which are cleaved into 21–23 nucleotide-long fragments to form an RNA-induced silencing complex (RISC). This complex potentially uses the short RNA molecules complementary to mRNA transcripts which, following hybridisation, allow an RNase to destroy the bound mRNA. The technique has important implications for medical conditions where, for example, increased levels of specific mRNA molecules in certain cancers and viral infections may be reduced using RNAi.
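The hybridisation step of RNAi relies on complementarity between the short RNA and its target mRNA. The sketch below shows one simple way to locate such a site; it assumes perfect complementarity over the whole 21-nucleotide guide, which is a simplification, and the sequences and function names are hypothetical.

```python
# Locating where a short guide RNA is complementary to an mRNA; sequences are made up.
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement_rna(rna: str) -> str:
    """Return the complementary RNA strand, written 5'->3'."""
    return "".join(RNA_COMPLEMENT[base] for base in reversed(rna))


def find_target_sites(mrna: str, guide: str) -> list:
    """Return 0-based start positions in the mRNA that pair perfectly with the guide."""
    target = reverse_complement_rna(guide)
    sites = []
    pos = mrna.find(target)
    while pos != -1:
        sites.append(pos)
        pos = mrna.find(target, pos + 1)
    return sites


target_region = "AUGGCUAGCUUACGGAUCCAA"           # 21 nt stretch of a hypothetical mRNA
mrna = "GGGAAACCC" + target_region + "UUUGGGCCC"
guide = reverse_complement_rna(target_region)     # guide strand complementary to the target

print(len(guide), find_target_sites(mrna, guide))  # 21 [9]
```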


5.6 THE MANIPULATION OF NUCLEIC ACIDS – BASIC TOOLS AND TECHNIQUES

5.6.1 Enzymes used in molecular biology

The discovery and characterisation of a number of key enzymes has enabled the development of various techniques for the analysis and manipulation of DNA. In particular the enzymes termed type II restriction endonucleases have come to play a key role in all aspects of molecular biology. These enzymes recognise certain DNA sequences, usually 4–6 bp in length, and cleave them in a defined manner. The sequences recognised are palindromic or of an inverted repeat nature. That is, they read the same in the 5′ to 3′ direction on both strands. When cleaved they leave a flush-ended or staggered (also termed a cohesive-ended) fragment depending on the particular enzyme used (Fig. 5.20). An important property of staggered ends is that those produced from different molecules by the same enzyme are complementary (or ‘sticky’) and so will anneal to each other. The annealed strands are held together only by hydrogen bonding between complementary bases on opposite strands. Covalent joining of ends on each of the two strands may be brought about by the enzyme DNA ligase (Section 6.2.2). This is widely exploited in molecular biology to enable the construction of recombinant DNA, i.e. the joining of DNA fragments from different sources. Approximately 500 restriction

(a)
Enzyme     Recognition sequence     Products
Hpa II     5′–CCGG–3′               5′–C        CGG–3′
           3′–GGCC–5′               3′–GGC        C–5′
Hae III    5′–GGCC–3′               5′–GG        CC–3′
           3′–CCGG–5′               3′–CC        GG–5′
BamHI      5′–GGATCC–3′             5′–G      GATCC–3′
           3′–CCTAGG–5′             3′–CCTAG      G–5′
Hpa I      5′–GTTAAC–3′             5′–GTT      AAC–3′
           3′–CAATTG–5′             3′–CAA      TTG–5′

(b)
Enzyme     Recognition sequence (↓ marks the position of cleavage)
EcoR I     G↓AATTC
Hind III   A↓AGCTT
Pvu II     CAG↓CTG
BamHI      G↓GATCC

Fig. 5.20 Recognition sequences of some restriction enzymes showing (a) full descriptions and (b) conventional representations. Arrows indicate positions of cleavage. Note that all the information in (a) can be derived from knowledge of a single strand of the DNA, so in (b) only one strand is shown, drawn 5′ to 3′; this is the conventional way of representing restriction sites.


Table 5.3 Types and examples of typical enzymes used in the manipulation of nucleic acids

Enzyme                   Specific example                  Use in nucleic acid manipulation
DNA polymerases          DNA pol I                         DNA-dependent DNA polymerase; 5′→3′ and 3′→5′ exonuclease activity
                         Klenow                            DNA pol I lacking 5′→3′ exonuclease activity
                         T4 DNA pol                        Lacks 5′→3′ exonuclease activity
                         Taq DNA pol                       Thermostable DNA polymerase used in PCR
                         Tth DNA pol                       Thermostable DNA polymerase with RT activity
                         T7 DNA pol                        Used in DNA sequencing
RNA polymerases          T7 RNA pol                        DNA-dependent RNA polymerase
                         T3 RNA pol                        DNA-dependent RNA polymerase
                         Qβ replicase                      RNA-dependent RNA polymerase, used in RNA amplification
Nucleases                DNase I                           Non-specific endonuclease that cleaves DNA
                         Exonuclease III                   DNA-dependent 3′→5′ stepwise removal of nucleotides
                         RNase A                           RNases used in mapping studies
                         RNase H                           Used in second strand cDNA synthesis
                         S1 nuclease                       Single-strand-specific nuclease
Reverse transcriptase    AMV-RT                            RNA-dependent DNA polymerase, used in cDNA synthesis
Transferases             Terminal transferase (TdT)        Adds homopolymer tails to the 3′ end of DNA
Ligases                  T4 DNA ligase                     Links 5′-phosphate and 3′-hydroxyl ends via phosphodiester bond
Kinases                  T4 polynucleotide kinase (PNK)    Transfers terminal phosphate groups from ATP to 5′-OH groups
Phosphatases             Alkaline phosphatase              Removes 5′-phosphates from DNA and RNA
Methylases               EcoRI methylase                   Methylates specific residues and protects from cleavage by restriction enzymes

Notes: PCR, polymerase chain reaction; RT, reverse transcriptase; cDNA, complementary DNA; AMV, avian myeloblastosis virus.


enzymes have been characterised that recognise over 100 different target sequences. A number of these, termed isoschizomers, recognise the same target sequence as one another, while other enzymes recognise different target sequences but produce the same staggered ends or overhangs. A number of other enzymes have proved to be of value in the manipulation of DNA, as summarised in Table 5.3, and are indicated at appropriate points within the text.
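The behaviour of type II restriction endonucleases lends itself to a short worked example. The sketch below checks that each recognition sequence is palindromic and cuts a linear DNA molecule at every site. The recognition sequences and cut positions are those shown for Hpa II, Hae III, BamHI and Hpa I in Fig. 5.20(a); the function names, the dictionary layout and the DNA sequence are hypothetical, and only the strand written 5′ to 3′ is modelled.

```python
# Sketch of a restriction digest on the strand written 5'->3'; the DNA sequence is made up.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Recognition sequence and number of bases before the cut, as shown in Fig. 5.20(a).
ENZYMES = {
    "HpaII": ("CCGG", 1),     # C/CGG, staggered
    "HaeIII": ("GGCC", 2),    # GG/CC, flush
    "BamHI": ("GGATCC", 1),   # G/GATCC, staggered
    "HpaI": ("GTTAAC", 3),    # GTT/AAC, flush
}


def reverse_complement(dna: str) -> str:
    """Return the complementary strand, written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(dna))


def is_palindromic(site: str) -> bool:
    """A site is palindromic if it reads the same 5'->3' on both strands."""
    return site == reverse_complement(site)


def digest(dna: str, enzyme: str) -> list:
    """Cut a linear DNA at every recognition site and return the fragments in order."""
    site, offset = ENZYMES[enzyme]
    fragments, start = [], 0
    pos = dna.find(site)
    while pos != -1:
        cut = pos + offset
        fragments.append(dna[start:cut])
        start = cut
        pos = dna.find(site, pos + 1)
    fragments.append(dna[start:])
    return fragments


dna = "AAGGATCCTTTTGGATCCAA"  # hypothetical molecule with two BamHI sites
print(all(is_palindromic(site) for site, _ in ENZYMES.values()))  # True
print(digest(dna, "BamHI"))  # ['AAG', 'GATCCTTTTG', 'GATCCAA']
```

Each internal BamHI fragment begins with GATC, the single-stranded overhang that allows fragments produced by the same enzyme to anneal to one another before ligation.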

5.7 ISOLATION AND SEPARATION OF NUCLEIC ACIDS

5.7.1 Isolation of DNA

The use of DNA for analysis or manipulation usually requires that it is isolated and purified to a certain extent. DNA is recovered from cells by the gentlest possible method of cell rupture to prevent the DNA from fragmenting by mechanical shearing. This is usually in the presence of EDTA which chelates the Mg²⁺ ions needed by the enzymes that degrade DNA, termed DNases. Ideally, cell walls, if present, should be digested enzymatically (e.g. lysozyme treatment of bacteria), and the cell membrane should be solubilised using detergent. If physical disruption is necessary, it should be kept to a minimum, and should involve cutting or squashing of cells, rather than the use of shear forces. Cell disruption (and most subsequent steps) should be performed at 4 °C, using glassware and solutions that have been autoclaved to destroy DNase activity.

After release of nucleic acids from the cells, RNA can be removed by treatment with ribonuclease (RNase) that has been heat-treated to inactivate any DNase contaminants; RNase is relatively stable to heat as a result of its disulphide bonds, which ensure rapid renaturation of the molecule on cooling. The other major contaminant, protein, is removed by shaking the solution gently with water-saturated phenol, or with a phenol/chloroform mixture, either of which will denature proteins but not nucleic acids. Centrifugation of the emulsion formed by this mixing produces a lower, organic phase, separated from the upper, aqueous phase by an interface of denatured protein. The aqueous solution is recovered and deproteinised repeatedly, until no more material is seen at the interface. Finally, the deproteinised DNA preparation is mixed with two volumes of absolute ethanol, and the DNA allowed to precipitate out of solution in a freezer. After centrifugation, the DNA pellet is redissolved in a buffer containing EDTA to inactivate any DNases present. This solution can be stored at 4 °C for at least a month. DNA solutions can be stored frozen although repeated freezing and thawing tends to damage long DNA molecules by shearing.

The procedure described above is suitable for total cellular DNA. If the DNA from a specific organelle or viral particle is needed, it is best to isolate the organelle or virus before extracting its DNA, since the recovery of a particular type of DNA from a mixture is usually rather difficult. Where a high degree of purity is required DNA may be subjected to density gradient ultracentrifugation through caesium chloride which is particularly useful for the preparation of plasmid DNA. A flow chart of DNA extraction is indicated in Fig. 5.21.


Fig. 5.21 General steps involved in extracting DNA from cells or tissues. (Flow chart, in order: homogenise cells/tissues (4 °C, sterile equipment); cellular lysis (detergent/lysozyme); chelating agents (EDTA/citrate); proteinase agents (proteinase K); phenol extraction (phenol/chloroform); alcohol precipitation (70%/100% ethanol); redissolve DNA (TE buffer, Tris–EDTA).)

It is possible to check the integrity of the DNA by agarose gel electrophoresis and determine the concentration of the DNA by using the fact that 1 absorbance unit at 260 nm equates to 50 µg ml⁻¹ of DNA and so:

\[ 50 \times A_{260} = \text{concentration of DNA sample } (\mu\text{g ml}^{-1}) \]

Contaminants may also be identified by scanning UV spectrophotometry from 200 nm to 300 nm. A ratio of 260 nm : 280 nm of approximately 1.8 indicates that the sample is free of protein contamination, which absorbs strongly at 280 nm.
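The spectrophotometric calculations above are easily scripted. The sketch below uses the conversion factor of 50 µg ml⁻¹ per absorbance unit for DNA given here and the factor of 40 µg ml⁻¹ given for RNA in Section 5.7.2. The function names, the optional dilution factor and the example readings are assumptions added for illustration, and a standard 1 cm light path is assumed.

```python
# Sketch of nucleic acid quantification from UV absorbance; readings are made-up examples.
FACTORS = {"DNA": 50.0, "RNA": 40.0}  # µg/ml per A260 unit, as quoted in the text


def concentration(a260: float, nucleic_acid: str = "DNA", dilution_factor: float = 1.0) -> float:
    """Return the concentration in µg/ml of the undiluted sample.

    `dilution_factor` is an added convenience (e.g. 20 for a 1-in-20 dilution).
    """
    return a260 * FACTORS[nucleic_acid] * dilution_factor


def purity_ratio(a260: float, a280: float) -> float:
    """Return the 260 nm : 280 nm absorbance ratio used to judge protein contamination."""
    return a260 / a280


# Hypothetical reading: a DNA sample diluted 1 in 20 gives A260 = 0.25 and A280 = 0.14.
print(concentration(0.25, "DNA", dilution_factor=20))  # 250.0
print(round(purity_ratio(0.25, 0.14), 2))              # 1.79
```

A ratio close to 1.8, as in this example, is the value the text associates with a DNA sample free of protein contamination.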

5.7.2 Isolation of RNA

The methods used for RNA isolation are very similar to those described above for DNA; however, RNA molecules are relatively short, and therefore less easily damaged by shearing, so cell disruption can be rather more vigorous. RNA is, however, very vulnerable to digestion by RNases which are present endogenously in various concentrations in certain cell types and exogenously on fingers. Gloves should therefore


Fig. 5.22 General steps involved in extracting RNA from cells or tissues. (Flow chart, in order: treat reagents with RNase inhibitors, e.g. diethylpyrocarbonate (DEPC); homogenise cells/tissues (4 °C, treated reagents); cellular lysis (detergent/lysozyme); RNA solvents (guanidinium thiocyanate); proteinase agents (proteinase K); phenol extraction (phenol/chloroform); alcohol precipitation (70%/100% ethanol); redissolve RNA.)

be worn, and a strong detergent should be included in the isolation medium to immediately denature any RNases. Subsequent deproteinisation should be particularly rigorous, since RNA is often tightly associated with proteins. DNase treatment can be used to remove DNA, and RNA can be precipitated by ethanol. One reagent in particular which is commonly used in RNA extraction is guanidinium thiocyanate which is both a strong inhibitor of RNase and a protein denaturant. A flow chart of RNA extraction is indicated in Fig. 5.22. It is possible to check the integrity of an RNA extract by analysing it by agarose gel electrophoresis. The most abundant RNA species, the rRNA molecules 23S and 16S for prokaryotes and 18S and 28S for eukaryotes, appear as discrete bands on the agarose gel and thus indicate that the other RNA components are likely to be intact. This is usually carried out under denaturing conditions to prevent secondary structure formation in the RNA. The concentration of the RNA may be estimated by using UV spectrophotometry. At 260 nm 1 absorbance unit equates to 40 µg ml⁻¹ of RNA and therefore:

\[ 40 \times A_{260} = \text{concentration of RNA sample } (\mu\text{g ml}^{-1}) \]


Fig. 5.23 Affinity chromatography of poly(A)+ RNA. (Diagram labels: cellular mRNA (heterogeneous size transcripts) with poly(A) tails is applied to a poly(dT) affinity column; poly(A)+ RNA binds to poly(dT); non-poly(A)+ RNA and DNA are washed through the column in high salt concentrations; poly(A)+ RNA is eluted by changing to low salt concentrations.)

Contaminants may also be identified in the same way as that for DNA by scanning UV spectrophotometry; however, in the case of RNA a 260 nm : 280 nm ratio of approximately 2 would be expected for a sample containing no protein (Section 5.8.1). In many cases it is desirable to isolate eukaryotic mRNA, which constitutes only 2–5% of cellular RNA, from a mixture of total RNA molecules. This may be carried out by affinity chromatography on oligo(dT)-cellulose columns. At high salt concentrations, the mRNA containing poly(A) tails binds to the complementary oligo(dT) molecules of the affinity column, and so mRNA will be retained; all other RNA molecules can be washed through the column by further high salt solution. Finally, the bound mRNA can be eluted using a low concentration of salt (Fig. 5.23). Nucleic acid species may also be subfractionated by more physical means such as electrophoretic or chromatographic separations based on differences in nucleic acid fragment sizes or physicochemical characteristics. Nanodrop spectrophotometer systems have also aided the analysis of nucleic acids in recent years, since they provide the full absorbance spectrum whilst requiring only a very small (microlitre) sample volume.


5.7.3 Automated and kit-based extraction of nucleic acids

Most of the current reagents used in molecular biology and the most common techniques can now be found in kit form or can be automated, and the extraction of nucleic acids by these means is no exception. The advantage of their use lies in the fact that the reagents are standardised and quality control tested providing a high degree of reliability. For example glass bead preparations for DNA purification have been used increasingly and with reliable results. Small compact column-type preparations such as QIAGEN columns are also used extensively in research and in routine DNA analysis. Essentially the same reagents for nucleic acid extraction may be used in a format that allows reliable and automated extraction. This is of particular use where a large number of DNA extractions are required. There are also many kit-based extraction methods for RNA; these in particular have overcome some of the problems of RNA extraction such as RNase contamination. A number of fully automated nucleic acid extraction machines are now employed in areas where high throughput is required, e.g. clinical diagnostic laboratories. Here the raw samples such as blood specimens are placed in 96- or 384-well microtitre plates and these follow a set computer-controlled processing pattern carried out robotically. Thus the samples are rapidly manipulated and extracted in approximately 45 min without any manual operations being undertaken.

5.7.4 Electrophoresis of nucleic acids

Electrophoresis in agarose or polyacrylamide gels is the most usual way to separate DNA molecules according to size. The technique can be used analytically or preparatively, and can be qualitative or quantitative. Large fragments of DNA such as chromosomes may also be separated by a modification of electrophoresis termed pulsed field gel electrophoresis (PFGE). The easiest and most widely applicable method is electrophoresis in horizontal agarose gels, followed by staining with ethidium bromide. This dye binds to DNA by insertion between stacked base pairs (intercalation), and it exhibits a strong orange/red fluorescence when illuminated with ultraviolet light (Fig. 5.24). Very often electrophoresis is used to check the purity and intactness of a DNA preparation or to assess the extent of an enzymatic reaction during, for example, the steps involved in the cloning of DNA. For such checks ‘minigels’ are particularly convenient, since they need little preparation, use small samples and give results quickly. Agarose gels can be used to separate molecules larger than about 100 bp. For higher resolution or for the effective separation of shorter DNA molecules polyacrylamide gels are the preferred method. When electrophoresis is used preparatively, the piece of gel containing the desired DNA fragment is physically removed with a scalpel. The DNA may be recovered from the gel fragment in various ways. This may include crushing with a glass rod in a small volume of buffer, using agarase to digest the agarose leaving the DNA, or by the process of electroelution. In this method the piece of gel is sealed in a length of dialysis tubing containing buffer, and is then placed between two electrodes in a tank containing more buffer. Passage of an electrical current between the electrodes causes


Fig. 5.24 The use of ethidium bromide to detect DNA. (Figure content: the chemical structure of ethidium bromide; ethidium bromide intercalates between the planar rings of the DNA double helix, and under ultraviolet irradiation the intercalating ethidium bromide fluoresces and the DNA becomes visible; a photograph of an agarose gel stained with ethidium bromide and illuminated with UV irradiation showing discrete DNA bands.)

DNA to migrate out of the gel piece, but it remains trapped within the dialysis tubing, and can therefore be recovered easily.

5.7.5 Automated analysis of nucleic acid fragments

Gel electrophoresis remains the established method for the separation and analysis of nucleic acids. However a number of automated systems using pre-cast gels and standardised reagents are available that are now very popular. These are especially useful in situations where a large number of samples or high-throughput analysis is required. In addition technologies such as Agilent's Lab-on-a-Chip have been developed that obviate the need to prepare electrophoretic gels. These employ microfluidic circuits constructed on small cassette units that contain interconnected micro-reservoirs. The sample is applied in one area and driven through microchannels under computer-controlled electrophoresis. The channels lead to reservoirs allowing, for example, incubation with other reagents such as dyes for a specified time. Electrophoretic separation is thus carried out in a microscale format. The small sample size minimises sample and reagent consumption and the units, being computer controlled, allow data


to be captured within a very short timescale. More recently alternative methods of analysis including high performance liquid chromatography based approaches have gained in popularity, especially for DNA mutation analysis. Mass spectrometry is also becoming increasingly used for nucleic acid analysis.

5.8 MOLECULAR BIOLOGY AND BIOINFORMATICS 5.8.1 Basic bioinformatics Bioinformatics is now an established and vital resource for molecular biology research and is also a mainstay of routine analysis of DNA. This increase in use of bioinformatics has been driven by the increase in genetic sequence information and the need to store, analyse and manipulate the data. There are now a huge number of sequences stored in genetic databases from a variety of organisms, including the human genome. Indeed the genetic information from various organisms is now an indispensable starting point for molecular biology research. The main primary databases include GenBank at the National Institutes of Health (NIH) in the USA, EMBL at the European Bioinformatics Institute (EBI) at Cambridge, UK and the DNA Database of Japan (DDBJ) at Mishima in Japan. These databases contain the nucleotide sequences which are annotated to allow easy identification. There are also many other databases such as secondary databases that contain information relating to sequence motifs, such as core sequences found in cytochrome P450 domains, or DNA-binding domains. Importantly all of the databases may be freely accessed over the internet. A number of these important databases and internet resources are listed in Table 5.4. Consequently the new expanding and exciting areas of bioscience research are those that analyse genome and cDNA sequence databases (genomics) and also their protein counterparts (proteomics). This is sometimes referred to as in silico research.

5.8.2 Analysing information using bioinformatics One of the most useful bioinformatics resources is termed BLAST (Basic Local Alignment Search Tool) located at the NCBI (www.ncbi.nlm.nih.gov). This allows a DNA sequence to be submitted via the internet in order to compare it to all the sequences contained within a DNA database. This is very useful since it is possible once a nucleotide sequence has been deduced by, for example, Sanger sequencing, to identify sequences of similarity. Indeed if human sequences are used and have already been mapped it is possible to locate their position to a particular chromosome using NCBI Map Viewer. Further resources such as ORF (open reading frame) finder allow a search to be undertaken for open reading frames, e.g. sequences beginning with a start codon (ATG) and continuing with a significant number of ‘coding’ triplets before a stop codon is reached. There are a number of other sequences that may be used to define coding sequences; these include ribosome binding sites, splice site junctions, poly(A) polymerase sequences and promoter sequences that lie outside the coding


Table 5.4 Nucleic acid and protein database resources available on the World Wide Web

Database or resource                                                   URL (uniform resource locator)

General DNA sequence databases
EMBL                          European Bioinformatics Institute
GenBank                       US genetic database resource
DDBJ                          Japanese genetic database

Protein sequence databases
Swiss-Prot                    European protein sequence database
UniProt TrEMBL                European protein sequence database

Protein structure databases
PDB                           Protein structure database

Genome project databases
Human Genome Database, USA
dbEST                         cDNA and partial sequences
Généthon                      Genetic maps based on repeat markers

regions. A number of bioinformatics resources such as GRAIL can be used to identify such features in a DNA sequence.
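The open reading frame search described above can be sketched in a few lines of Python. This is only an illustration of the principle (a start codon followed by coding triplets up to a stop codon), not the algorithm used by the NCBI ORF finder; the function name `find_orfs` and the `min_codons` threshold are hypothetical, and only the three forward reading frames of the supplied strand are scanned.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=3):
    """Return (start, end, orf) tuples for the three forward reading frames.

    start is 0-based, end is exclusive, and the stop codon is included.
    min_codons counts the coding triplets (including ATG) before the stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] == "ATG":
                # Read triplet by triplet until a stop codon is reached.
                for j in range(i, len(seq) - 2, 3):
                    if seq[j:j + 3] in STOP_CODONS:
                        if (j - i) // 3 >= min_codons:
                            orfs.append((i, j + 3, seq[i:j + 3]))
                        i = j  # resume the search after this stop codon
                        break
                else:
                    break  # no stop codon left in this frame
            i += 3
    return orfs

print(find_orfs("CCATGAAATTTGGGTAACC"))
```

A real search would also examine the complementary strand and would normally demand a much larger minimum number of coding triplets.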

5.9 MOLECULAR ANALYSIS OF NUCLEIC ACID SEQUENCES 5.9.1 Restriction mapping of DNA fragments Restriction mapping involves the size analysis of restriction fragments produced by several restriction enzymes individually and in combination (Section 5.6.1). The principle of this mapping is illustrated in Fig. 5.25, in which the restriction sites of two enzymes, A and B, are being mapped. Cleavage with A gives fragments 2 and 7 kb from a 9 kb molecule, hence we can position the single A site 2 kb from one end. Similarly, B gives fragments 3 and 6 kb, so it has a single site 3 kb from one end; but it is not possible at this stage to say if it is near to A’s site, or at the opposite end of the DNA. This can be resolved by a double digestion. If the resultant fragments are 2, 3 and 4 kb, then A and B cut at opposite ends of the molecule; if they are 1, 2 and 6 kb, the sites are near each other. Not surprisingly, the mapping of real molecules is rarely


Fig. 5.25 Restriction mapping of DNA. Measured sizes of fragments (kb) for each treatment: no digestion, 9; enzyme A, 2 + 7; enzyme B, 3 + 6; enzymes A + B, 2, 3 + 4 (alternative result 1, 2 + 6). Note that each experimental result and its interpretation should be considered in sequence, thus building up an increasingly unambiguous map.

as simple as this, and bioinformatic analysis of the restriction fragment lengths is usually needed to construct a map.
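The reasoning used for the two-enzyme example above can be expressed as a short Python sketch. It assumes a linear molecule with exactly one site for each enzyme; the function names `double_digest` and `map_two_sites` are hypothetical and are not taken from any mapping package. The A site is fixed at the shorter A fragment from the left end, and both possible positions of the B site are tested against the observed double digest.

```python
def double_digest(length, site_a, site_b):
    """Predict the sorted fragment sizes when both sites are cut."""
    cuts = sorted({0, site_a, site_b, length})
    return sorted(right - left for left, right in zip(cuts, cuts[1:]))

def map_two_sites(length, frags_a, frags_b, frags_ab):
    """Return the (site_a, site_b) positions consistent with all digests."""
    for frags in (frags_a, frags_b, frags_ab):
        if sum(frags) != length:
            raise ValueError("fragment sizes must add up to the full length")
    site_a = min(frags_a)  # orientation: measure A's site from the left end
    observed = sorted(frags_ab)
    return [(site_a, site_b)
            for site_b in sorted(set(frags_b))
            if double_digest(length, site_a, site_b) == observed]

# The 9 kb molecule from the text: A gives 2 + 7 kb, B gives 3 + 6 kb.
print(map_two_sites(9, (2, 7), (3, 6), (2, 3, 4)))  # [(2, 6)] opposite ends
print(map_two_sites(9, (2, 7), (3, 6), (1, 2, 6)))  # [(2, 3)] sites close together
```

With more enzymes, or several sites per enzyme, the number of candidate maps grows quickly, which is why computational analysis is usually needed for real molecules.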

5.9.2 Nucleic acid blotting methods Electrophoresis of DNA restriction fragments allows separation based on size to be carried out, however it provides no indication as to the presence of a specific, desired fragment among the complex sample. This can be achieved by transferring the DNA from the intact gel onto a piece of nitrocellulose or nylon membrane placed in contact with it. This provides a more permanent record of the sample since DNA begins to diffuse out of a gel that is left for a few hours. First the gel is soaked in alkali to render the DNA single stranded. It is then transferred to the membrane so that the DNA becomes bound to it in exactly the same pattern as that originally on the gel. This transfer, named a Southern blot after its inventor Ed Southern, can be performed electrophoretically or by drawing large volumes of buffer through both gel and membrane, thus transferring DNA from one to the other by capillary action (Fig. 5.26). The point of this operation is that the membrane can now be treated with a labelled DNA molecule, for example a gene probe (Section 5.9.4). This single-stranded DNA probe will hybridise under the right conditions to complementary fragments immobilised onto the membrane. The conditions of hybridisation, including the temperature and salt concentration, are critical for this process to take place effectively. This is usually referred to as


Fig. 5.26 Southern blot apparatus. Labelled components: weight, absorbent tissue, chromatography paper, nylon or nitrocellulose membrane, gel, chromatography paper and buffer.

the stringency of the hybridisation and it is particular for each individual gene probe and for each sample of DNA. A series of washing steps with buffer is then carried out to remove any unbound probe and the membrane is developed after which the precise location of the probe and its target may be visualised. It is also possible to analyse DNA from different species or organisms by blotting the DNA and then using a gene probe representing a protein or enzyme from one of the organisms. In this way it is possible to search for related genes in different species. This technique is generally termed zoo blotting. The same basic process of nucleic acid blotting can be used to transfer RNA from gels onto similar membranes. This allows the identification of specific mRNA sequences of a defined length by hybridisation to a labelled gene probe and is known as Northern blotting. It is possible with this technique to not only detect specific mRNA molecules but it may also be used to quantify the relative amounts of the specific mRNA. It is usual to separate the mRNA transcripts by gel electrophoresis under denaturing conditions since this improves resolution and allows a more accurate estimation of the sizes of the transcripts (Section 5.7.2). The format of the blotting may be altered from transfer from a gel to direct application to slots on a specific blotting apparatus containing the nylon membrane. This is termed slot or dot blotting and provides a convenient means of measuring the abundance of specific mRNA transcripts without the need for gel electrophoresis; it does not, however, provide information regarding the size of the fragments.

5.9.3 Design and production of gene probes The availability of a gene probe is essential in many molecular biology techniques yet in many cases is one of the most difficult steps. The information needed to produce a gene probe may come from many sources; however, the availability of bioinformatics resources and genetic databases has ensured that this is the usual starting point for gene probe design. In some cases it is possible to use related genes, that is from the same gene family, to gain information on the most useful DNA sequence to use as a probe. Similar proteins or DNA sequences but from different species may also provide a starting


Polypeptide: Phe Met Pro Trp His
Corresponding nucleotide sequences (5′ to 3′): TT(T/C) ATG CC(T/C/A/G) TGG CA(T/C)

Fig. 5.27 Oligonucleotide probes. Note that only methionine and tryptophan have unique codons. It is impossible to predict which of the indicated codons for phenylalanine, proline and histidine will be present in the gene to be probed, so all possible combinations must be synthesised (16 in the example shown).

point with which to produce a so-called heterologous gene probe. Although in some cases probes are already produced and cloned it is possible, armed with a DNA sequence from a DNA database, to chemically synthesise a single-stranded oligonucleotide probe. This is usually undertaken by computer-controlled gene synthesisers which link dNTPs (deoxyribonucleoside triphosphates) together based on a desired sequence. It is essential to carry out certain checks before probe production to determine that the probe is unique, that it cannot self-anneal and that it is not self-complementary, since any of these faults may compromise its use. Where little DNA information is available to prepare a gene probe it is possible in some cases to use the knowledge gained from analysis of the corresponding protein. Thus it is possible to isolate and purify proteins and sequence part of the N-terminal end or an internal region of the protein. From our knowledge of the genetic code, it is possible to predict the various DNA sequences that could code for the protein, and then synthesise appropriate oligonucleotide sequences chemically. Owing to the degeneracy of the genetic code most amino acids are coded for by more than one codon; therefore there will be more than one possible nucleotide sequence that could code for a given polypeptide (Fig. 5.27). The longer the polypeptide, the greater the number of possible oligonucleotides that must be synthesised. Fortunately, there is no need to synthesise a sequence longer than about 20 bases, since this should hybridise efficiently with any complementary sequences, and should be specific for one gene. Ideally, a section of the protein should be chosen which contains as many tryptophan and methionine residues as possible, since these have unique codons, and there will therefore be fewer possible base sequences that could code for that part of the protein. The synthetic oligonucleotides can then be used as probes in a number of molecular biology methods.
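The effect of codon degeneracy on probe design can be illustrated with a short Python sketch that enumerates every oligonucleotide able to encode the pentapeptide Phe-Met-Pro-Trp-His of Fig. 5.27. The codon table is restricted to these five amino acids, and the names `CODONS` and `degenerate_probes` are hypothetical.

```python
from itertools import product

# Standard genetic code, DNA coding-strand codons, for five amino acids only.
CODONS = {
    "Phe": ["TTT", "TTC"],
    "Met": ["ATG"],
    "Pro": ["CCT", "CCC", "CCA", "CCG"],
    "Trp": ["TGG"],
    "His": ["CAT", "CAC"],
}

def degenerate_probes(peptide):
    """List every nucleotide sequence that could encode the peptide."""
    choices = [CODONS[residue] for residue in peptide]
    return ["".join(codons) for codons in product(*choices)]

probes = degenerate_probes(["Phe", "Met", "Pro", "Trp", "His"])
print(len(probes))  # 16, i.e. 2 x 1 x 4 x 1 x 2
```

The count is simply the product of the number of codons for each residue, which is why regions rich in methionine and tryptophan keep the probe mixture small.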

5.9.4 Labelling DNA gene probe molecules An essential feature of a gene probe is that it can be visualised or labelled by some means. This allows any complementary sequence that the probe binds to be flagged up or identified. There are two main types of label used for gene probes: traditionally this has been carried out using radioactive labels, but gaining in popularity are non-radioactive labels.


Perhaps the most common radioactive label is 32-phosphorus (32P), although for certain techniques 35-sulphur (35S) and tritium (3H) are used. These may be detected by the process of autoradiography where the labelled probe molecule, bound to sample DNA, located for example on a nylon membrane, is placed in contact with an X-ray-sensitive film. Following exposure the film is developed and fixed just as a black-and-white negative. The exposed film reveals the precise location of the labelled probe and therefore the DNA to which it has hybridised. Non-radioactive labels are increasingly being used to label DNA gene probes. Until recently radioactive labels were more sensitive than their non-radioactive counterparts. However, recent developments have led to similar sensitivities which, when combined with their improved safety, have led to their greater acceptance. The labelling systems are either termed direct or indirect. Direct labelling allows an enzyme reporter such as alkaline phosphatase to be coupled directly to the DNA. Although this may alter the characteristics of the DNA gene probe it offers the advantage of rapid analysis since no intermediate steps are needed. However indirect labelling is at present more popular. This relies on the incorporation of a nucleotide which has a label attached. At present three of the main labels in use are biotin, fluorescein and digoxygenin. These molecules are covalently linked to nucleotides using a carbon spacer arm of 7, 14 or 21 atoms. Specific binding proteins may then be used as a bridge between the nucleotide and a reporter protein such as an enzyme. For example, biotin incorporated into a DNA fragment is recognised with a very high affinity by the protein streptavidin. This may either be coupled or conjugated to a reporter enzyme molecule such as alkaline phosphatase. 
This is able to convert a colourless substrate, p-nitrophenyl phosphate (PNPP), into a yellow-coloured compound, p-nitrophenol (PNP), and also offers a means of signal amplification. Alternatively labels such as digoxygenin incorporated into DNA sequences may be detected by monoclonal antibodies, again conjugated to reporter molecules such as alkaline phosphatase. Thus, rather than relying on autoradiography, which is necessary for radiolabels, the detection system uses a series of reactions that generate either a coloured product or light from a chemiluminescent reaction. This has important practical implications since autoradiography may take 1–3 days whereas colour and chemiluminescent reactions take minutes.

5.9.5 End labelling of DNA molecules The simplest form of labelling DNA is by 5′ or 3′ end-labelling. 5′ end-labelling involves a phosphate transfer or exchange reaction in which the 5′ phosphate of the DNA to be used as the probe is removed and a labelled phosphate, usually 32P, is added in its place. This is usually carried out using two enzymes; the first, alkaline phosphatase, is used to remove the existing phosphate group from the DNA. Following removal of the released phosphate from the DNA, a second enzyme, polynucleotide kinase, is added which catalyses the transfer of a phosphate group (32P-labelled) to the 5′ end of the DNA. The newly labelled probe is then purified, usually by chromatography through a Sephadex column, and may be used directly (Fig. 5.28).


Fig. 5.28 End-labelling of a gene probe at the 5′ end with alkaline phosphatase and polynucleotide kinase. Steps shown: (1) purify the gene probe fragment or synthesise an oligonucleotide; (2) treat the probe with alkaline phosphatase to remove the 5′ phosphate; (3) polynucleotide kinase transfers a phosphate group from the donor to the 5′ end of the probe; (4) the 5′ end of the probe is radiolabelled and the gene probe is purified.

Fig. 5.29 End-labelling of a gene probe at the 3′ end using terminal transferase. Steps shown: (1) synthesise an oligonucleotide or purify the gene probe fragment; (2) transfer a labelled dNTP to the 3′ end using terminal transferase; (3) the 3′ end of the probe is radiolabelled and the gene probe is purified. Note that the addition of a labelled dNTP at the 3′ end alters the sequence of the gene probe.

Labelling the other end of the DNA molecule, the 3′ end, is slightly less complex. Here a new labelled dNTP (e.g. [α-32P]dATP or a biotin-labelled dNTP) is added to the 3′ end of the DNA by the enzyme terminal transferase. Although this is a simpler reaction, a potential problem exists because a new nucleotide is added to the existing sequence, so the complete sequence of the DNA is altered, which may affect its hybridisation to its target sequence. End-labelling methods also suffer from the fact that only one label is added to each DNA molecule, so the probes have a lower specific activity than those produced by methods that incorporate label along the length of the DNA (Fig. 5.29).

5.9.6 Random primer labelling and nick translation The DNA to be labelled is first denatured and then placed under renaturing conditions in the presence of a mixture of many different random sequences of hexamers or hexanucleotides. These hexamers will, by chance, bind to the DNA sample wherever they encounter a complementary sequence and so the DNA will rapidly acquire an approximately random sprinkling of hexanucleotides annealed to it. Each of the hexamers can act as a primer for the synthesis of a fresh strand of DNA catalysed by DNA polymerase since it has an exposed 3′ hydroxyl group. The Klenow fragment of DNA polymerase is used for random primer labelling because it lacks a 5′ to


Fig. 5.30 Random primer gene probe labelling. Steps shown: (1) random primers are annealed to the single-stranded DNA gene probe; (2) DNA polymerase (Klenow) and dNTPs, one of which is labelled, are added; (3) a double-stranded labelled gene probe is produced. Random primers are incorporated and used as a start point for Klenow DNA polymerase to synthesise a complementary strand of DNA whilst incorporating a labelled dNTP at complementary sites.

3′ exonuclease activity. The Klenow fragment is prepared by cleavage of DNA polymerase I with subtilisin, giving a large enzyme fragment which has no 5′ to 3′ exonuclease activity, but which still acts as a 5′ to 3′ polymerase. Thus when the Klenow enzyme is mixed with the annealed DNA sample in the presence of dNTPs, including at least one which is labelled, many short stretches of labelled DNA will be generated (Fig. 5.30). In a similar way to random primer labelling the polymerase chain reaction may also be used to incorporate radioactive or non-radioactive labels (Section 5.11.4). A further traditional method of labelling DNA is by the process of nick translation. Low concentrations of DNase I are used to make occasional single-strand nicks in the double-stranded DNA that is to be used as the gene probe. DNA polymerase then fills in the nicks, using an appropriate dNTP, at the same time making a new nick to the 3′ side of the previous one (Fig. 5.31). In this way the nick is translated along the DNA. If labelled dNTPs are added to the reaction mixture, they will be used to fill in the nicks, and so the DNA can be labelled to a very high specific activity.

5.9.7 Molecular-beacon-based probes A more recent development in the design of labelled oligonucleotide hybridisation probes is that of molecular beacons. These probes contain a fluorophore at one end of the probe


Fig. 5.31 Nick translation. Steps shown: (1) one strand is nicked and a nucleotide removed by DNase I; (2) the gap is filled by a labelled nucleotide and the next nucleotide is removed by DNA polymerase I; (3) the nick moves from 5′ to 3′ as successive labelled nucleotides are incorporated. The removal of nucleotides and their subsequent replacement with labelled nucleotides by DNA polymerase I increases the label in the gene probe as nick translation proceeds.

and a quencher molecule at the other. The oligonucleotide has a stem–loop structure where the stems place the fluorophore and quencher in close proximity. The loop structure is designed to be complementary to the target sequence. When the stem–loop structure is formed the fluorophore is quenched by Förster or fluorescence resonance energy transfer (FRET), i.e. the energy is transferred from the fluorophore to the quencher and given off as heat. The elegance of these types of probe lies in the fact that upon hybridisation to a target sequence the stem opens and the fluorophore and quencher move apart; the quenching is then lost and emission of light occurs from the fluorophore upon excitation. These types of probe have also been used to detect the products of nucleic acid amplification systems such as the polymerase chain reaction (PCR) and have the advantage that it is unnecessary to remove the unhybridised probes.
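The two sequence requirements of a molecular beacon, arms that can pair to form the stem and a loop complementary to the target, can be checked with a minimal Python sketch. The function names and the example sequences are hypothetical, and the check ignores thermodynamic stability, which a real design would have to consider.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def has_stem(probe, stem_len):
    """True if the two ends of the probe can base-pair to close the hairpin."""
    return probe[:stem_len] == reverse_complement(probe[-stem_len:])

def loop_matches_target(probe, stem_len, target):
    """True if the loop region is complementary to part of the target strand."""
    loop = probe[stem_len:-stem_len]
    return reverse_complement(loop) in target

# Illustrative beacon: 6-base arms flanking a 17-base loop.
beacon = "GCGAGC" + "TTAGGCATCGATCCGTA" + "GCTCGC"
target = "AAAA" + reverse_complement("TTAGGCATCGATCCGTA") + "CCCC"
print(has_stem(beacon, 6), loop_matches_target(beacon, 6, target))
```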

5.10 THE POLYMERASE CHAIN REACTION (PCR) 5.10.1 Basic concept of the PCR The polymerase chain reaction or PCR is one of the mainstays of molecular biology. One of the reasons for the wide adoption of the PCR is the elegant simplicity of the


Fig. 5.32 The location of polymerase chain reaction (PCR) primers. PCR primers designed for sequences adjacent to the region to be amplified allow a region of DNA (e.g. a gene) to be amplified from a complex starting material of genomic template DNA. One primer is designed for each DNA strand flanking the target region. The primers are complementary to existing sequences, so some flanking sequence information must be known.

reaction and relative ease of the practical manipulation steps. Indeed combined with the relevant bioinformatics resources for its design and for determination of the required experimental conditions it provides a rapid means for DNA identification and analysis. It has opened up the investigation of cellular and molecular processes to those outside the field of molecular biology. The PCR is used to amplify a precise fragment of DNA from a complex mixture of starting material usually termed the template DNA and in many cases requires little DNA purification. It does require the knowledge of some DNA sequence information which flanks the fragment of DNA to be amplified (target DNA). From this information two oligonucleotide primers may be chemically synthesised each complementary to a stretch of DNA to the 3′ side of the target DNA, one oligonucleotide for each of the two DNA strands (Fig. 5.32). It may be thought of as a technique


Fig. 5.33 A simplified scheme of one PCR cycle that involves denaturation (ds DNA denatured by heating to > 94 °C), annealing (oligo primers bind to target sequences) and extension (Taq polymerase extends target sequences). ds, double-stranded.

analogous to the DNA replication process that takes place in cells since the outcome is the same: the generation of new complementary DNA stretches based upon the existing ones. It is also a technique that has replaced, in many cases, the traditional DNA cloning methods since it fulfils the same function, the production of large amounts of DNA from limited starting material; however, this is achieved in a fraction of the time needed to clone a DNA fragment (Chapter 6). Although not without its drawbacks the PCR is a remarkable development which is changing the approach of many scientists to the analysis of nucleic acids and continues to have a profound impact on core biosciences and biotechnology.

5.10.2 Stages in the PCR The PCR consists of three defined sets of times and temperatures termed steps: (i) denaturation, (ii) annealing and (iii) extension. The set of three steps is repeated 30–40 times, each repetition being termed a cycle (Fig. 5.33). In the first cycle the double-stranded template DNA is denatured by heating the reaction to above 90 °C. Within the complex DNA the region to be specifically amplified (target) is made accessible. The temperature is then lowered to 40–60 °C for the annealing step. The precise temperature is critical and each PCR system has to be defined and optimised. One useful technique for optimisation is touchdown PCR where a programmable cycler is used to incrementally decrease the annealing temperature until the optimum is derived. Reactions that are not optimised may give rise to other DNA products in addition to the specific target or may not produce any


amplified products at all. The annealing step allows the two oligonucleotide primers, which are present in excess, to hybridise to their complementary sites that flank the target DNA. The annealed oligonucleotides act as primers for DNA synthesis, since they provide a free 3′ hydroxyl group for DNA polymerase. The DNA synthesis step is termed extension and is carried out by a thermostable DNA polymerase, most commonly Taq DNA polymerase. DNA synthesis proceeds from both of the primers until the new strands have been extended along and beyond the target DNA to be amplified. It is important to note that, since the new strands extend beyond the target DNA, they will contain a region near their 3′ ends that is complementary to the other primer. Thus, if another round of DNA synthesis is allowed to take place, not only the original strands will be used as templates but also the new strands. Most interestingly, the products obtained from the new strands will have a precise length, delimited exactly by the two regions complementary to the primers. As the system is taken through successive cycles of denaturation, annealing and extension all the new strands will act as templates and so there will be an exponential increase in the amount of DNA produced. The net effect is to selectively amplify the target DNA and the primer regions flanking it (Fig. 5.34). One problem with early PCR reactions was that the temperature needed to denature the DNA also denatured the DNA polymerase. However the availability of a thermostable DNA polymerase enzyme isolated from the thermophilic bacterium Thermus aquaticus found in hot springs provided the means to automate the reaction. Taq DNA polymerase has a temperature optimum of 72 °C and survives prolonged exposure to temperatures as high as 96 °C and so is still active after each of the denaturation steps. 
The widespread utility of the technique is also due to the ability to automate the reaction and as such many thermal cyclers have been produced in which it is possible to program in the temperatures and times for a particular PCR reaction.
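Two points made in this section, that the final product is delimited exactly by the two primers and that the amount of product rises exponentially, can be illustrated with a short Python sketch. The function names are hypothetical, the template is an invented sequence, and the `efficiency` parameter (the fraction of templates copied in each cycle) is an assumption introduced for illustration; with perfect efficiency the number of copies doubles every cycle.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def amplicon(template, fwd_primer, rev_primer):
    """Return the product delimited by the two primers, or None.

    template is the top strand (5' to 3'). The forward primer matches the
    top strand; the reverse primer anneals to the top strand, so its
    reverse complement appears in the template sequence.
    """
    start = template.find(fwd_primer)
    if start == -1:
        return None
    site = reverse_complement(rev_primer)
    end = template.find(site, start + len(fwd_primer))
    if end == -1:
        return None
    return template[start:end + len(site)]

def copies_after(cycles, start_copies=1, efficiency=1.0):
    """Idealised copy number after a given number of cycles."""
    return start_copies * (1 + efficiency) ** cycles

template = "TTGACC" + "ATGCCGTA" + "CCCAAATTTGGG" + "GATTACAG" + "AACCGT"
print(amplicon(template, "ATGCCGTA", "CTGTAATC"))
print(copies_after(30))  # about 1.07e9 copies from one molecule at 100% efficiency
```

In practice the efficiency falls in later cycles as reagents are consumed, so the exponential phase does not continue indefinitely.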

5.10.3 PCR primer design and bioinformatics The specificity of the PCR lies in the design of the two oligonucleotide primers. These have to not only be complementary to sequences flanking the target DNA but also must not be self-complementary or bind each other to form dimers since both prevent DNA amplification. They also have to be matched in their GC content and have similar annealing temperatures. The increasing use of bioinformatics resources such as Oligo, Generunner and Genefisher in the design of primers makes the design and the selection of reaction conditions much more straightforward. These resources allow the sequences to be amplified, primer length, product size, GC content, etc. to be input and, following analysis, provide a choice of matched primer sequences. Indeed the initial selection and design of primers without the aid of bioinformatics would now be unnecessarily time-consuming. It is also possible to design primers with additional sequences at their 5′ end such as restriction endonuclease target sites or promoter sequences. However modifications such as these require that the annealing conditions be altered to compensate for the areas of non-homology in the primers. A number of PCR methods have been developed where either one of the primers or both are random. This gives rise to


Fig. 5.34 Three cycles in the PCR (cycles 1, 2 and 3 are shown). As the number of cycles in the PCR increases, the DNA strands that are synthesised and become available as templates are delimited by the ends of the primers. Thus specific amplification of the desired target sequence flanked by the primers is achieved. Primers are denoted as 5′ to 3′.


arbitrary priming in genomic templates but interestingly may give rise to discrete banding patterns when analysed by gel electrophoresis. In many cases this technique may be used reproducibly to identify a particular organism or species. This is sometimes referred to as random amplified polymorphic DNA (RAPD) and has been used successfully in the detection and differentiation of a number of pathogenic strains of bacteria. In addition primers can now be synthesised with a variety of labels such as fluorophores bound to them allowing easier detection and quantitation using techniques such as qPCR (Section 5.10.7).
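The primer checks described in Section 5.10.3 (GC content, matched annealing temperatures, self-complementarity and primer dimers) can be sketched in Python. This is not how Oligo, Generunner or Genefisher work internally; the function names are hypothetical, and the melting temperature uses the simple rule of thumb of 2 °C per A or T and 4 °C per G or C, which is an assumption not taken from the text and is only a rough guide for short oligonucleotides.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def gc_content(primer):
    """Fraction of bases that are G or C."""
    return (primer.count("G") + primer.count("C")) / len(primer)

def rough_tm(primer):
    """Rule-of-thumb melting temperature in degrees C (assumed formula)."""
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc

def is_self_complementary(primer):
    """True if the primer is identical to its own reverse complement."""
    return primer == reverse_complement(primer)

def three_prime_dimer(primer_a, primer_b, k=4):
    """True if the last k bases of primer_a can pair with part of primer_b."""
    return reverse_complement(primer_a[-k:]) in primer_b

def matched(primer_a, primer_b, max_tm_difference=5):
    """True if the two rough melting temperatures are close to each other."""
    return abs(rough_tm(primer_a) - rough_tm(primer_b)) <= max_tm_difference

print(gc_content("GGCCAATT"), rough_tm("GGCCAATT"))  # 0.5 24
print(is_self_complementary("GAATTC"))               # True
```

The thresholds `k` and `max_tm_difference` are arbitrary illustrative defaults; dedicated primer design software applies more detailed thermodynamic models.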

5.10.4 PCR amplification templates DNA from a variety of sources may be used as the initial source of amplification templates. It is also a highly sensitive technique and requires only one or two molecules for successful amplification. Unlike many manipulation methods used in current molecular biology the PCR technique is sensitive enough to require very little template preparation. The extraction from many prokaryotic and eukaryotic cells may involve a simple boiling step. Indeed the components of many extraction techniques such as SDS and proteinase K may adversely affect the PCR. The PCR may also be used to amplify RNA, a process termed RT–PCR (reverse transcriptase–PCR). Initially a reverse transcription reaction which converts the RNA to cDNA is carried out (Section 6.2.5). This reaction normally involves the use of the enzyme reverse transcriptase although some thermostable DNA polymerases used in the PCR such as Tth have a reverse transcriptase activity under certain buffer conditions. This allows mRNA transcription products to be effectively analysed. It may also be used to differentiate latent viruses (detected by standard PCR) or active viruses which replicate and thus produce transcription products and are thus detectable by RT–PCR (Fig. 5.35). In addition the PCR may be extended to determine relative amounts of a transcription product.

5.10.5 Sensitivity of the PCR The enormous sensitivity of the PCR system is also one of its main drawbacks since the very large degree of amplification makes the system vulnerable to contamination. Even a trace of foreign DNA, such as that even contained in dust particles, may be amplified to significant levels and may give misleading results. Hence cleanliness is paramount when carrying out PCR, and dedicated equipment and in some cases dedicated laboratories are used. It is possible that amplified products may also contaminate the PCR although this may be overcome by UV irradiation to damage already amplified products so that they cannot be used as templates. A further interesting solution is to incorporate uracil into the PCR and then treat the products with the enzyme uracil N-glycosylase (UNG) which degrades any PCR amplicons with incorporated uracil rendering them useless as templates. In addition most PCRs are now undertaken using hotstart. Here the reaction mixture is physically separated from the template or the enzyme: when the reaction begins mixing occurs and thus avoids any mispriming that may have arisen.


Fig. 5.35 Reverse transcriptase–PCR (RT–PCR): mRNA is converted to complementary DNA (cDNA) using the enzyme reverse transcriptase. The cDNA is then used directly in the PCR. [Diagram steps: extract poly(A)+ RNA; anneal poly(dT) primer to the 3′ poly(A) tail; add dNTPs and extend with reverse transcriptase to form cDNA; use cDNA directly in the PCR.]
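The relationship between an mRNA and its first-strand cDNA can be sketched in a few lines. This is a minimal illustration assuming an invented 12-base transcript; the function name is hypothetical. Because the poly(dT) primer anneals to the poly(A) tail, the cDNA written 5′ to 3′ begins with a run of T.

```python
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}


def first_strand_cdna(mrna: str) -> str:
    """Return the first-strand cDNA (written 5'->3') for an mRNA written 5'->3'.

    The cDNA is the reverse complement of the mRNA, written in DNA bases.
    """
    return "".join(RNA_TO_DNA[base] for base in reversed(mrna.upper()))


if __name__ == "__main__":
    mrna = "AUGGCCAAAAAA"  # invented example: short coding stretch plus poly(A) tail
    print(first_strand_cdna(mrna))  # TTTTTTGGCCAT
```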

5.10.6 Applications of the PCR Many traditional methods in molecular biology have now been superseded by the PCR and the applications for the technique appear to be unlimited. Some of the main techniques derived from the PCR are introduced in Chapter 6 while some of the main areas in which the PCR has been put to use are summarised in Table 5.5. The success of the PCR process has given impetus to the development of other amplification techniques that are based on either thermal cycling or non-thermal cycling (isothermal) methods. The most popular alternative to the PCR is termed the ligase chain reaction or LCR. This operates in a similar fashion to the PCR but a thermostable DNA ligase joins sets of primers together which are complementary to the target DNA. Following this a similar exponential amplification reaction takes place, producing amounts of DNA similar to those produced by the PCR. A number of alternative amplification techniques are listed in Table 5.6.

5.10.7 Quantitative PCR (qPCR) One of the most useful PCR applications is quantitative PCR or qPCR. This allows the PCR to be used as a means of identifying the initial concentrations of DNA or cDNA template used. Early qPCR methods involved the comparison of a standard or


Table 5.5 Selected applications of the PCR. A number of the techniques are described in the text of Chapters 5 and 6

Field or area of study | Application | Specific examples or uses
General molecular biology | DNA amplification | Screening gene libraries
Gene probe production | Production/labelling | Use with blots/hybridisations
RNA analysis | RT–PCR | Active/latent viral infections
Forensic science | Scenes of crime | Analysis of DNA from blood
Infection/disease monitoring | Microbial detection | Strain typing/analysis (RAPDs)
Sequence analysis | DNA sequencing | Rapid sequencing possible
Genome mapping studies | Referencing points in genome | Sequence-tagged sites (STS)
Gene discovery | mRNA analysis | Expressed sequence tags (EST)
Genetic mutation analysis | Detection of known mutations | Screening for cystic fibrosis
Quantification analysis | Quantitative PCR | 5′ nuclease (TaqMan assay)
Genetic mutation analysis | Detection of unknown mutations | Gel-based PCR methods (DGGE)
Protein engineering | Production of novel proteins | PCR mutagenesis
Molecular archaeology | Retrospective studies | Dinosaur DNA analysis
Single-cell analysis | Sexing or cell mutation sites | Sex determination of unborn
In situ analysis | Studies on frozen sections | Localisation of DNA/RNA

Notes: RT, reverse transcriptase; RAPDs, random amplified polymorphic DNA; DGGE, denaturing gradient gel electrophoresis.

control DNA template amplified with separate primers at the same time as the specific target DNA. However these types of quantitation rely on the assumption that all the reactions are identical, and so any factors affecting this may also affect the result. The introduction of thermal cyclers that incorporate the ability to detect the accumulation of DNA through fluorescent dyes binding to the DNA has rapidly transformed this area. In its simplest form a PCR is set up that includes a DNA-binding cyanine dye such as SYBR green. This dye binds to the minor groove of double-stranded DNA but not to single-stranded DNA, and so as amplicons accumulate during the PCR process SYBR green binds the double-stranded DNA proportionally and fluorescence emission of the dye can be detected following excitation. Thus the accumulation of DNA amplicons can be followed in real time during the reaction run. In order to quantitate unknown DNA templates a standard dilution series is prepared using DNA of known concentration. As the DNA accumulates during the early exponential phase of the reaction an arbitrary fluorescence threshold is chosen and the cycle at which each of the diluted DNA samples crosses it is recorded. This is termed the crossing threshold or Ct value. From the various Ct values a log


Table 5.6 Selected alternative amplification techniques to the PCR. Two broad methodologies exist that either amplify the target molecules such as DNA and RNA or detect the target and amplify a signal molecule bound to it

Technique | Type of assay | Specific examples or uses
Target amplification methods
Ligase chain reaction (LCR) | Non-isothermal, employs thermostable DNA ligase | Mutation detection
Nucleic acid sequence based amplification (NASBA) | Isothermal, involving use of RNA, RNase H/reverse transcriptase, and T7 RNA polymerase | Viral detection, e.g. HIV
Signal amplification methods
Branched DNA amplification (b-DNA) | Isothermal microwell format using hybridisation or target/capture probe and signal amplification | Mutation detection

Note: HIV, human immunodeficiency virus.

graph is prepared from which an unknown concentration can be deduced. Since SYBR green and similar DNA-binding dyes are non-specific, most qPCR cyclers have a built-in melting curve function to determine whether a correctly sized PCR product is present. This gradually increases the temperature of each tube until the double-stranded PCR product denatures or melts, and allows a precise although not definitive determination of the product. Confirmation of the product is usually obtained by DNA sequencing.
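The standard-curve calculation described above can be sketched as a straight-line fit of Ct against the logarithm of the starting concentration. The function names and the dilution data below are invented purely for illustration (they are not measurements), and ordinary least squares is assumed as the fitting method.

```python
import math


def fit_standard_curve(concentrations, ct_values):
    """Least-squares fit of Ct = intercept + slope * log10(concentration)."""
    xs = [math.log10(c) for c in concentrations]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def concentration_from_ct(ct, slope, intercept):
    """Read an unknown concentration off the fitted standard curve."""
    return 10 ** ((ct - intercept) / slope)


if __name__ == "__main__":
    # Hypothetical ten-fold dilution series (copies per reaction) and Ct values.
    standards = [1e5, 1e4, 1e3, 1e2]
    cts = [15.0, 18.3, 21.6, 24.9]
    slope, intercept = fit_standard_curve(standards, cts)
    print(f"slope={slope:.2f} intercept={intercept:.2f}")
    print(f"unknown with Ct 20.0: {concentration_from_ct(20.0, slope, intercept):.0f} copies")
```

The slope is negative because more starting template crosses the threshold at an earlier cycle.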

5.10.8 The TaqMan system In order to make qPCR specific a number of strategies may be employed that rely on specific hybridisation probes. One ingenious method is called the TaqMan assay or 5′ nuclease assay. Here the probe consists of an oligonucleotide labelled with a fluorescent reporter at one end of the molecule and a quencher at the other end. The PCR proceeds as normal and the oligonucleotide probe binds to the target sequence in the annealing step. As the Taq polymerase extends from the primer its 5′ exonuclease activity degrades the hybridisation probe and releases the reporter from the quencher. A signal is thus generated which increases in direct proportion to the number of starting molecules and fluorescence can be detected in real time as the PCR proceeds (Fig. 5.36). Although relatively expensive in comparison to other methods for determining expression levels it is simple, rapid and reliable and is now in use in many research and clinical areas. Further developments in probe-based PCR systems have also been used and include scorpion probe systems, amplifluor and real-time LUX probes.


Fig. 5.36 5′ Nuclease assay (TaqMan assay). PCR is undertaken with RQ probe (reporter/quencher dye). As R–Q are in close proximity, fluorescence is quenched. During extension by Taq polymerase the probe is cleaved as a result of Taq having 5′ nuclease activity. This cleaves R–Q probe and the reporter is released. This results in detectable increase in fluorescence and allows real-time PCR detection.

5.11 NUCLEOTIDE SEQUENCING OF DNA 5.11.1 Concepts of nucleic acid sequencing The determination of the order or sequence of bases along a length of DNA is one of the central techniques in molecular biology. Although it is now possible to derive amino acid sequence information with a degree of reliability it is frequently more convenient and rapid to analyse the DNA coding information. The precise usage of codons, information regarding mutations and polymorphisms and the identification of gene regulatory control sequences are also only possible by analysing DNA sequences. Two techniques have been developed for this, one based on an enzymatic method frequently termed Sanger sequencing after its developer, and a chemical method called Maxam and Gilbert, named for the same reason. At present Sanger


sequencing is by far the most popular method and many commercial kits are available for its use. However, there are certain occasions such as the sequencing of short oligonucleotides where the Maxam and Gilbert method is more appropriate. One absolute requirement for Sanger sequencing is that the DNA to be sequenced is in a single-stranded form. Traditionally this demanded that the DNA fragment of interest be inserted and cloned into a specialised bacteriophage vector termed M13 which is naturally single-stranded (Section 6.3.3). Although M13 is still universally used the advent of the PCR has provided the means not only to amplify a region of any genome or cDNA but also very quickly generate the corresponding nucleotide sequence. This has led to an explosion in the accumulation of DNA sequence information and has provided much impetus for gene discovery and genome mapping (Section 6.9). The Sanger method is simple and elegant and mimics in many ways the natural ability of DNA polymerase to extend a growing nucleotide chain based on an existing template. Initially the DNA to be sequenced is allowed to hybridise with an oligonucleotide primer, which is complementary to a sequence adjacent to the 3′ side of DNA within a vector such as M13 or in an amplicon. The oligonucleotide will then act as a primer for synthesis of a second strand of DNA, catalysed by DNA polymerase. Since the new strand is synthesised from its 5′ end, virtually the first DNA to be made will be complementary to the DNA to be sequenced. One of the dNTPs that must be provided for DNA synthesis is radioactively labelled with 32P or 35S, and so the newly synthesised strand will be labelled.

5.11.2 Dideoxynucleotide chain terminators The reaction mixture is then divided into four aliquots, representing the four dNTPs, A, C, G and T. In addition to all of the dNTPs being present in the A tube an analogue of dATP is added (2′,3′-dideoxyadenosine triphosphate (ddATP)) which is similar to A but has no 3′ hydroxyl group and so will terminate the growing chain since a 5′ to 3′ phosphodiester linkage cannot be formed without a 3′-hydroxyl group. The situation for tube C is identical except that ddCTP is added; similarly the G and T tubes contain ddGTP and ddTTP respectively (Fig. 5.37). Since the incorporation of ddNTP rather than dNTP is a random event, the reaction will produce new molecules varying widely in length, but all terminating at the same type of base. Thus four sets of DNA sequence are generated, each terminating at a different type of base, but all having a common 5′ end (the primer). The four labelled and chain-terminated samples are then denatured by heating and loaded next to each other on a polyacrylamide gel for electrophoresis. Electrophoresis is performed at approximately 70 °C in the presence of urea, to prevent renaturation of the DNA, since even partial renaturation alters the rates of migration of DNA fragments. Very thin, long gels are used for maximum resolution over a wide range of fragment lengths. After electrophoresis, the positions of radioactive DNA bands on the gel are determined by autoradiography. Since every band in the track from the ddATP sample must contain molecules which terminate at adenine, and those in the ddCTP terminate


Fig. 5.37 Sanger sequencing of DNA. [Diagram: the fragment to be sequenced, cloned in M13 phage, is annealed to a primer and incubated with DNA polymerase, four dNTPs (radioactive) and ddGTP; complementary second strands of different lengths are synthesised, each ending in ddG; the products are denatured to give single strands and run on a sequencing gel alongside the products of the ddCTP, ddATP and ddTTP reactions (tracks ddA, ddC, ddG, ddT); the sequence of the second strand is read from the autoradiograph.]

at cytosine, etc., it is possible to read the sequence of the newly synthesised strand from the autoradiogram, provided that the gel can resolve differences in length equal to a single nucleotide (Fig. 5.38). Under ideal conditions, sequences up to about 300 bases in length can be read from one gel.
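The logic of reading a dideoxy gel can be sketched as follows. This is a simplified model under stated assumptions: every possible termination product is taken to be present, fragment lengths are counted from the end of the primer, and the eight-base template is the stretch shown in Fig. 5.37. The function names are illustrative only.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def new_strand(template_3_to_5: str) -> str:
    """Strand synthesised 5'->3' on a template that is written 3'->5'."""
    return "".join(COMPLEMENT[base] for base in template_3_to_5)


def termination_lengths(strand: str) -> dict:
    """For each ddNTP tube, list the lengths of chains that can terminate in it."""
    lanes = {base: [] for base in "ACGT"}
    for length, base in enumerate(strand, start=1):
        lanes[base].append(length)
    return lanes


def read_gel(lanes: dict) -> str:
    """Read the bands from shortest to longest fragment, i.e. 5'->3'."""
    bands = sorted((length, base) for base, lengths in lanes.items() for length in lengths)
    return "".join(base for _, base in bands)


if __name__ == "__main__":
    template = "GCTCGCAT"  # written 3'->5'
    lanes = termination_lengths(new_strand(template))
    print(lanes)
    print(read_gel(lanes))  # CGAGCGTA
```

Each tube contributes bands only at the positions of its own base, so sorting all bands by length across the four tracks recovers the order of bases in the new strand.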

5.11.3 Direct PCR pyrosequencing Rapid PCR sequencing has also been made possible by the use of pyrosequencing. This is a sequencing-by-synthesis method in which a PCR template is hybridised to an oligonucleotide primer and incubated with DNA polymerase, ATP sulphurylase, luciferase and apyrase. During the reaction the first of the four dNTPs is added and, if it is incorporated, pyrophosphate (PPi) is released. The ATP sulphurylase converts the PPi to ATP, which drives the luciferase-mediated conversion of luciferin to oxyluciferin to generate light. Apyrase degrades any unincorporated dNTP and the ATP. This is followed by another round of dNTP addition. The resulting pyrogram provides an output of the sequence. The method provides short reads very quickly and is especially useful for the determination of mutations or SNPs.
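The way a pyrogram encodes sequence can be sketched with a small simulation. It assumes an idealised reaction in which the light signal is exactly proportional to the number of nucleotides incorporated at each addition, and a fixed A, C, G, T order of addition; the function names, the template and the dispensation order are illustrative assumptions.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def pyrogram(template_3_to_5: str, dispensation: str = "ACGT", rounds: int = 4) -> list:
    """Return (nucleotide, signal) pairs, one per dNTP addition.

    signal is the number of bases incorporated, which in this idealised
    model stands in for the amount of PPi released and light emitted.
    """
    to_synthesise = [COMPLEMENT[base] for base in template_3_to_5]
    position = 0
    signals = []
    for _ in range(rounds):
        for nucleotide in dispensation:
            run = 0
            # A homopolymer run is extended in a single addition.
            while position < len(to_synthesise) and to_synthesise[position] == nucleotide:
                run += 1
                position += 1
            signals.append((nucleotide, run))
    return signals


def sequence_from_pyrogram(signals: list) -> str:
    """Rebuild the synthesised strand from the peak heights."""
    return "".join(nucleotide * signal for nucleotide, signal in signals)


if __name__ == "__main__":
    peaks = pyrogram("CCGTA")  # template written 3'->5'
    print(peaks[:4])
    print(sequence_from_pyrogram(peaks))  # GGCAT
```

Additions that give no signal simply mean that the dispensed dNTP was not the next base required.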

Fig. 5.38 Autoradiograph of a DNA sequencing gel. Samples were prepared using the Sanger dideoxy method of DNA sequencing. Each set of four samples was loaded into adjacent tracks, indicated by A, C, G and T, depending on the identity of the dideoxyribonucleotide used for that sample. Two sets of samples were labelled with 35S (1 and 3) and one was labelled with 32P (2). It is evident that 32P generates darker but more diffuse bands than does 35S, making the bands nearer the bottom of the autoradiograph easy to see. However, the broad bands produced by 32P cannot be resolved near the top of the autoradiograph, making it impossible to read a sequence from this region. The much sharper bands produced by 35S allow sequences to be read with confidence along most of the autoradiograph and so a longer sequence of DNA can be obtained from a single gel. [Autoradiograph: three sets of tracks (1, 2 and 3), each in the order A, C, G, T, with an arrow showing the direction of electrophoretic movement.]


It is also possible to undertake nucleotide sequencing from double-stranded molecules such as plasmid cloning vectors and PCR amplicons directly. The double-stranded DNA must be denatured prior to annealing with primer. In the case of plasmids an alkaline denaturation step is sufficient; however, for amplicons this is more problematic and a focus of much research. Unlike plasmids, amplicons are short and reanneal rapidly. Preventing the reannealing process, or biasing the amplification towards one strand by using a primer ratio of 100 : 1, overcomes this problem to a certain extent. Denaturants such as formamide or DMSO have also been used with some success in preventing the reannealing of PCR strands following their separation. It is also possible to physically separate and retain one PCR strand by incorporating an affinity molecule such as biotin into one of the primers. Following PCR the strand carrying the affinity molecule may be removed by affinity chromatography with streptavidin, leaving the complementary PCR strand. This affinity purification is somewhat time-consuming but provides high-quality single-stranded DNA, derived from the PCR amplicon, for sequencing.

5.11.4 PCR cycle sequencing One of the most useful methods of sequencing PCR amplicons is termed PCR cycle sequencing. This is not strictly a PCR since it involves linear amplification with a single primer. Approximately 20 cycles of denaturation, annealing and extension take place. Radiolabelled or fluorescent-labelled dideoxynucleotides are then introduced in the final stages of the reaction to generate the chain-terminated extension products (Fig. 5.39). Automated direct PCR sequencing is increasingly being refined, allowing greater lengths of DNA to be analysed in one sequencing run and providing a very rapid means of analysing DNA sequences.

5.11.5 Automated fluorescent DNA sequencing Advances in fluorescent dye terminator and labelling chemistry have led to the development of high-throughput automated sequencing techniques. Essentially most systems involve the use of dideoxynucleotides labelled with different fluorochromes. Thus the label is incorporated into the ddNTP and this is used to carry out chain termination as in the standard reaction indicated in Section 5.11.1. The advantage of this modification is that since a different label is incorporated with each ddNTP it is unnecessary to perform four separate reactions. Therefore the four chain-terminated products are run on the same track of a denaturing electrophoresis gel. Each product with its base-specific dye is excited by a laser and the dye then emits light at its characteristic wavelength. A diffraction grating separates the emissions which are detected by a charge-coupled device (CCD) and the sequence is interpreted by a computer. The advantages of the technique include real-time detection of the sequence. In addition the lengths of sequence that may be analysed are in excess of 500 bp (Fig. 5.40). Capillary electrophoresis is increasingly being used for the detection of


Fig. 5.39 Simplified scheme of cycle sequencing. Linear amplification takes place with the use of labelled primers. During the extension and termination reaction, the chain terminator dideoxynucleotides are incorporated into the growing chain. This takes place in four separate reactions (A, C, G and T). The products are then run on a polyacrylamide gel and the sequence analysed. The scheme indicates the events that take place in the A reaction only. ds, double-stranded. [Diagram, one cycle: denaturation (ds DNA denatured by heating to > 94 °C); primer annealing reaction (labelled oligo anneals to target sequence); extension/termination reaction (Taq polymerase extends target sequences until chain terminator is added, e.g. ddA).]

sequencing products. This is where liquid polymers in thin capillary tubes are used obviating the need to pour sequencing gels and requiring little manual operation. This substantially reduces the electrophoresis run times and allows high throughput to be achieved. A number of large-scale sequence facilities are now fully automated using 96-well microtitre-based formats. The derived sequences can be downloaded automatically to databases and manipulated using a variety of bioinformatics resources.

5.11.6 Alternative DNA sequencing methods Developments in the technology of DNA sequencing have made whole-genome sequencing projects a realistic proposition within achievable timescales; indeed the first diploid genome sequence to be completed was that of Craig Venter, who pioneered high-throughput sequencing. This makes studies on genome variation and evolution viable, as evidenced by the 1000 Genomes Project which is providing high-resolution sequence analysis of genomes. This has been made possible not only by refinements in traditional automated sequencing but also by new developments such as sequencing by synthesis and the development of sequencing by hybridisation arrays. These methods are changing the way genome analysis is undertaken and make individual


Fig. 5.40 Automated fluorescent sequencing detection using single-lane gel and charge-coupled device. [Diagram: fluorescent chain termination products migrate down a single-lane gel past the detector; a laser excitation unit excites the dyes, a diffraction grating separates the emissions onto a charge-coupled device (CCD), and a computer carries out analysis and automated base calling.]

genome analysis a reality. Indeed more advanced methods using nanotechnology are in development and may provide an even more effective means of DNA sequencing.

5.11.7 Maxam and Gilbert sequencing Sanger sequencing is by far the most popular technique for DNA sequencing; however, an alternative technique developed at the same time may also be used. The chemical cleavage method of DNA sequencing developed by Maxam and Gilbert is often used for sequencing small fragments of DNA such as oligonucleotides, where Sanger sequencing is problematic. A radioactive label is added to either the 3′ or the 5′ ends of a double-stranded DNA sample (Fig. 5.41). The strands are then


Fig. 5.41 Maxam and Gilbert sequencing of DNA. Only modification and cleavage of deoxycytidine is shown, but three more portions of the end-labelled DNA would be modified and cleaved at G, G+A, and T+C, and the products would be separated on the sequencing gel alongside those from the C reactions. [Diagram: single-stranded DNA 5′-TACGCTCG-3′, labelled with 32P only at its 3′ end, is modified at C using hydrazine, which removes the base leaving ribosyl urea; cleavage at the modified bases using piperidine gives the labelled fragments G, TCG and GCTCG plus non-radioactive fragments; these are separated on a sequencing gel alongside products of the other modification/cleavage reactions.]

separated by electrophoresis under denaturing conditions, and analysed separately. DNA labelled at one end is divided into four aliquots and each is treated with chemicals which act on specific bases by methylation or removal of the base. Conditions are chosen so that, on average, each molecule is modified at only one position along its length; every base in the DNA strand has an equal chance of being modified. Following the modification reactions, the separate samples are cleaved by piperidine, which breaks phosphodiester bonds exclusively at the 5′ side of nucleotides whose base has been modified. The result is similar to that produced by the Sanger method, since each sample now contains radioactively labelled molecules of various lengths, all with one end in common (the labelled end), and with the other end cut at the same type of base. Analysis of the reaction products by electrophoresis is as described for the Sanger method.
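The set of labelled fragments expected from one base-specific reaction can be sketched as below. The example follows the C reaction of Fig. 5.41 and assumes a strand labelled at its 3′ end, a single modification per molecule, and loss of the modified nucleotide on cleavage; the function name is illustrative.

```python
def labelled_fragments(sequence_5_to_3: str, target_base: str) -> list:
    """3'-end-labelled fragments left after single-hit cleavage at target_base.

    The modified nucleotide is lost, so each labelled fragment starts one
    position to the 3' side of a target base and runs to the labelled end.
    Fragments are returned shortest first, as they would run on the gel.
    """
    fragments = [
        sequence_5_to_3[index + 1:]
        for index, base in enumerate(sequence_5_to_3)
        if base == target_base and index + 1 < len(sequence_5_to_3)
    ]
    return sorted(fragments, key=len)


if __name__ == "__main__":
    print(labelled_fragments("TACGCTCG", "C"))  # ['G', 'TCG', 'GCTCG']
```

Repeating this for the other base-specific reactions and ordering all fragments by length gives the ladder that is read from the sequencing gel.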

5.12 SUGGESTIONS FOR FURTHER READING
Augen, J. (2005). Bioinformatics in the Post-Genomic Era. Reading, MA: Addison-Wesley.
Brooker, R. J. (2005). Genetics Analysis and Principles, 2nd edn. New York: McGraw-Hill.
Hartwell, L. et al. (2008). Genetics: From Genes to Genomes, 3rd edn. New York: McGraw-Hill.
Lodish, H. et al. (2008). Molecular Cell Biology, 6th edn. San Francisco, CA: W. H. Freeman.
Lewin, B. (2007). Genes IX. Sudbury, MA: Jones & Bartlett.
Strachan, T. and Read, A. P. (2004). Human Molecular Genetics, 3rd edn. Oxford, UK: Bios.
Walker, J. M. and Rapley, R. (2008). Molecular Biomethods Handbook, 2nd edn. Totowa, NJ: Humana Press.

6 Recombinant DNA and genetic analysis
R. RAPLEY

6.1 Introduction
6.2 Constructing gene libraries
6.3 Cloning vectors
6.4 Hybridisation and gene probes
6.5 Screening gene libraries
6.6 Applications of gene cloning
6.7 Expression of foreign genes
6.8 Analysing genes and gene expression
6.9 Analysing whole genomes
6.10 Pharmacogenomics
6.11 Molecular biotechnology and applications
6.12 Suggestions for further reading

6.1 INTRODUCTION The considerable advances made in microarray, sequencing technologies and bioinformatics analysis are now beginning to provide true insights into the development and maintenance of cells and tissues. Indeed areas of analysis such as metabolomics, transcriptomics and systems biology are now well established and allow analysis of vast numbers of samples simultaneously. This type of large-scale parallel analysis is now the main driving force of biological discovery and analysis. However, the techniques of molecular biology and genetic analysis have their foundations in methods developed a number of decades ago. One of the main cornerstones on which molecular biology analysis was developed was the discovery of restriction endonucleases in the early 1970s which not only led to the possibility of analysing DNA more effectively but also provided the ability to cut different DNA molecules so that they could later be joined together to create new recombinant DNA fragments. The newly created DNA molecules heralded a new era in the manipulation, analysis and exploitation of biological molecules. This process, termed gene cloning, has enabled numerous discoveries and insights into gene structure, function and regulation. Since their


initial use the methods for the production of gene libraries have been steadily refined and developed. Although microarray analysis and the polymerase chain reaction (PCR) have provided short cuts to gene analysis there are still many cases where gene cloning methods are not only useful but are an absolute requirement. The following provides an account of the process of gene cloning and other methods based on recombinant DNA technology.

6.2 CONSTRUCTING GENE LIBRARIES 6.2.1 Digesting genomic DNA molecules Following the isolation and purification of genomic DNA it is possible to specifically fragment it with enzymes termed restriction endonucleases. These enzymes are the key to molecular cloning because of the specificity they have for particular DNA sequences. It is important to note that every copy of a given DNA molecule from a specific organism will give the same set of fragments when digested with a particular enzyme. DNA from different organisms will, in general, give different sets of fragments when treated with the same enzyme. By digesting complex genomic DNA from an organism it is possible to reproducibly divide its genome into a large number of small fragments, each approximately the size of a single gene. Some enzymes cut straight across the DNA to give flush or blunt ends. Other restriction enzymes make staggered single-strand cuts, producing short single-stranded projections at each end of the digested DNA. These ends are not only identical, but complementary, and will base-pair with each other; they are therefore known as cohesive or sticky ends. In addition the 5′ end projection of the DNA always retains the phosphate groups. Over 600 enzymes, recognising more than 200 different restriction sites, have been characterised. The choice of which enzyme to use depends on a number of factors. For example, a recognition sequence of 6 bp will occur, on average, every 4096 (4⁶) bases assuming a random sequence of each of the four bases. This means that digesting genomic DNA with EcoRI, which recognises the sequence 5′-GAATTC-3′, will produce fragments each of which is on average just over 4 kb. Enzymes with 8 bp recognition sequences produce much longer fragments. Therefore very large genomes, such as human DNA, are usually digested with enzymes that produce long DNA fragments. This makes subsequent steps more manageable, since a smaller number of those fragments need to be cloned and subsequently analysed (Table 6.1).
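The arithmetic behind the choice of enzyme, and the effect of a digest on a linear sequence, can be sketched as follows. The sketch assumes a random sequence with equal base frequencies for the expected spacing, and uses the EcoRI top-strand cut between G and AATTC; the function names and the 18-base test sequence are invented for illustration.

```python
def expected_fragment_length(site_length: int) -> int:
    """Average spacing of a recognition site in random sequence: 4**n bases."""
    return 4 ** site_length


def digest(sequence: str, site: str, cut_offset: int) -> list:
    """Cut a linear sequence at every occurrence of site.

    cut_offset is the number of bases of the site left on the 5' fragment
    (1 for EcoRI, which cuts G^AATTC on the top strand).
    """
    fragments = []
    start = 0
    index = sequence.find(site)
    while index != -1:
        cut = index + cut_offset
        fragments.append(sequence[start:cut])
        start = cut
        index = sequence.find(site, index + 1)
    fragments.append(sequence[start:])
    return fragments


if __name__ == "__main__":
    print(expected_fragment_length(6))  # 4096
    print(expected_fragment_length(8))  # 65536
    print(digest("AAGAATTCTTGAATTCGG", "GAATTC", 1))
```

Only the top strand is modelled here; the staggered cut on the complementary strand is what leaves the single-stranded AATT projection.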

6.2.2 Ligating DNA molecules The DNA products resulting from restriction digestion to form sticky ends may be joined to any other DNA fragments treated with the same restriction enzyme. Thus, when the two sets of fragments are mixed, base-pairing between sticky ends will result in the annealing together of fragments that were derived from different starting DNA. There will, of course, also be pairing of fragments derived from the same starting DNA


Table 6.1 Numbers of clones required for representation of DNA in a genome library

Species | Genome size (kb) | No. of clones required, 17 kb fragments | No. of clones required, 35 kb fragments
Bacteria (E. coli) | 4 000 | 700 | 340
Yeast | 20 000 | 3 500 | 1 700
Fruit fly | 165 000 | 29 000 | 14 500
Man | 3 000 000 | 535 000 | 258 250
Maize | 15 000 000 | 2 700 000 | 1 350 000

Fig. 6.1 Ligation of molecules with cohesive ends. Complementary cohesive ends base-pair, forming a temporary link between two DNA fragments. This association of fragments is stabilised by the formation of 3′ to 5′ phosphodiester linkages between cohesive ends, a reaction catalysed by DNA ligase. [Diagram: fragments produced by cleavage with BamHI carry 5′-phosphorylated GATC cohesive ends; DNA ligase + ATP joins them to regenerate the GGATCC/CCTAGG site.]

molecules, termed reannealing. All these pairings are transient, owing to the weakness of hydrogen bonding between the few bases in the sticky ends, but they can be stabilised by use of an enzyme termed DNA ligase in a process termed ligation. This enzyme, usually isolated from bacteriophage T4 and termed T4 DNA ligase, forms a covalent bond between the 5′ phosphate at the end of one strand and the 3′ hydroxyl of the adjacent strand (Fig. 6.1). The reaction, which is ATP dependent, is often carried out at 10 °C to lower the kinetic energy of molecules, and so reduce the chances of base-paired sticky ends parting before they have been stabilised by ligation. However, long reaction times are needed to compensate for the low activity of DNA ligase in the cold. It is also possible to join blunt ends of DNA molecules, although the efficiency of this reaction is much lower than sticky-ended ligations. Since ligation reconstructs the site of cleavage, recombinant molecules produced by ligation of sticky ends can be cleaved again at the ‘joins’, using the same restriction enzyme that was used to generate the fragments initially. In order to propagate digested DNA from an organism it is necessary to join or ligate that DNA with


Fig. 6.2 Outline of gene cloning. [Diagram steps: cut DNA containing desired gene and cut plasmid; ligate to give recombinant plasmid; transform bacteria; grow cells and select recombinant clones; select clone containing desired gene; grow cells to obtain required quantities of gene.]

a specialised DNA carrier molecule termed a vector (Section 6.3). Thus each DNA fragment is inserted by ligation into the vector DNA molecule, which allows the whole recombined DNA to then be replicated indefinitely within microbial cells (Fig. 6.2). In this way a DNA fragment can be cloned to provide sufficient material for further detailed analysis, or for further manipulation. Thus, all of the DNA extracted from an organism and digested with a restriction enzyme will result in a collection of clones. This collection of clones is known as a gene library.
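Why fragments cut with the same enzyme can anneal to one another, whatever their source, can be checked with a short sketch. It assumes each single-stranded projection is written 5′ to 3′ and that two projections pair only if one is the exact reverse complement of the other; the function names are illustrative, and GATC and AATT are the projections left by BamHI and EcoRI respectively.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(sequence: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(sequence))


def can_anneal(overhang_a: str, overhang_b: str) -> bool:
    """True if two single-stranded projections (both 5'->3') can base-pair."""
    return overhang_a == reverse_complement(overhang_b)


if __name__ == "__main__":
    print(can_anneal("GATC", "GATC"))  # BamHI end with BamHI end: True
    print(can_anneal("GATC", "AATT"))  # BamHI end with EcoRI end: False
```

Because the projections are palindromic, every end produced by one enzyme is complementary to every other end produced by that enzyme, which is also why reannealing of fragments from the same starting DNA competes with the formation of recombinants.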

6.2.3 Aspects of gene libraries There are two general types of gene library: a genomic library which consists of the total chromosomal DNA of an organism and a cDNA library which represents only the mRNA from a particular cell or tissue at a specific point in time (Fig. 6.3). The choice of the particular type of gene library depends on a number of factors, the most important being the final application of any DNA fragment derived from the library. If the ultimate aim is understanding the control of protein production for a particular gene or the analysis of its architecture, then genomic libraries must be used. However, if the goal is the production of new or modified proteins, or the determination of the tissue-specific expression and timing patterns, cDNA libraries are more appropriate. The main consideration in the construction of genomic or cDNA libraries is therefore


Fig. 6.3 Comparison of the general steps involved in the construction of genomic and complementary DNA (cDNA) libraries. [Diagram. Genome library: extract chromosomal DNA; digest chromosomal DNA with restriction endonuclease; insert each DNA fragment into vector (recombinant DNA); transform bacteria; clone each recombinant. cDNA library: extract mRNA; produce cDNA; insert each cDNA into vector (recombinant DNA); transform bacteria; clone each recombinant.]

the nucleic acid starting material. Since the genome of an organism is fixed, chromosomal DNA may be isolated from almost any cell type in order to prepare genomic libraries. In contrast, however, cDNA libraries only represent the mRNA being produced from a specific cell type at a particular time. Thus, it is important to consider carefully the cell or tissue type from which the mRNA is to be derived in the construction of cDNA libraries. There are a variety of cloning vectors available, many based on naturally occurring molecules such as bacterial plasmids or bacteria-infecting viruses. The choice of vector depends on whether a genomic library or cDNA library is constructed. The various types of vectors are explained in more detail in Section 6.3.

6.2.4 Genomic DNA libraries

Genomic libraries are constructed by isolating the complete chromosomal DNA from a cell, then digesting it into fragments of the desired average length with restriction endonucleases. This can be achieved by partial restriction digestion using an enzyme that recognises tetranucleotide sequences. Complete digestion with such an enzyme would produce a large number of very short fragments, but if the enzyme is allowed to cleave only a few of its potential restriction sites before the reaction is stopped, each DNA molecule will be cut into relatively large fragments. Average fragment size will depend on the relative concentrations of DNA and restriction enzyme, and in particular, on the conditions and duration of incubation (Fig. 6.4). It is also possible to produce fragments of DNA by physical shearing although the ends of the fragments


Fig. 6.4 Comparison of (a) partial and (b) complete digestion of DNA molecules at restriction enzyme sites (E).

may need to be repaired to make them flush-ended. This can be achieved by using a modified DNA polymerase termed Klenow polymerase. This is prepared by cleavage of DNA polymerase I with subtilisin, giving a large enzyme fragment which has no 5′ to 3′ exonuclease activity, but which still acts as a 5′ to 3′ polymerase. This will fill in any recessed 3′ ends on the sheared DNA using the appropriate dNTPs. The mixture of DNA fragments is then ligated with a vector, and subsequently cloned. If enough clones are produced there will be a very high chance that any particular DNA fragment such as a gene will be present in at least one of the clones. To keep the number of clones to a manageable size, fragments about 10 kb in length are needed for prokaryotic libraries, but the length must be increased to about 40 kb for mammalian libraries. It is possible to calculate the number of clones that must be present in a gene library to give a chosen probability of obtaining a particular DNA sequence. The formula is:

$$N = \frac{\ln(1 - P)}{\ln(1 - f)}$$

where N is the number of recombinants, P is the probability and f is the fraction of the genome in one insert. Thus for the E. coli DNA chromosome of 5 × 10⁶ bp and with an insert size of 20 kb the number of clones needed (N) would be approximately 1 × 10³ with a probability of 0.99.
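The library-size calculation above can be checked numerically. The following is a minimal sketch in Python; the function name `clones_required` and the rounding up to a whole clone are illustrative choices, not part of the original treatment.

```python
import math


def clones_required(probability: float, insert_bp: float, genome_bp: float) -> int:
    """Return N = ln(1 - P) / ln(1 - f), rounded up to a whole clone.

    probability -- desired chance P that a given sequence is in the library
    insert_bp   -- average insert size in base pairs
    genome_bp   -- genome size in base pairs
    """
    if not 0 < probability < 1:
        raise ValueError("probability must lie strictly between 0 and 1")
    if not 0 < insert_bp < genome_bp:
        raise ValueError("insert must be smaller than the genome")
    f = insert_bp / genome_bp  # fraction of the genome carried by one insert
    return math.ceil(math.log(1 - probability) / math.log(1 - f))


# Worked example from the text: E. coli, 5 x 10^6 bp genome, 20 kb inserts, P = 0.99
n_ecoli = clones_required(0.99, 20_000, 5e6)
print(n_ecoli)
```

For these values the result is a little over 1.1 × 10³ clones, in line with the order of magnitude quoted above. Raising P towards 1 or shrinking the insert size increases N sharply, which is why larger inserts are preferred for large genomes.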

6.2.5 cDNA libraries

There may be several thousand different proteins being produced in a cell at any one time, all of which have associated mRNA molecules. To identify any one of those


Fig. 6.5 Strategies for producing first-strand cDNA from mRNA. [Scheme: polyadenylated messenger RNA is primed with random primers, a specific primer or a poly(dT) primer; in each case reverse transcriptase, buffer and dNTPs give a cDNA–mRNA hybrid.]

mRNA molecules the clones of each individual mRNA have to be synthesised. Libraries that represent the mRNA in a particular cell or tissue are termed cDNA libraries. mRNA cannot be used directly in cloning since it is too unstable. However, it is possible to synthesise complementary DNA molecules (cDNAs) to all the mRNAs from the selected tissue. The cDNA may be inserted into vectors and then cloned. The production of cDNA is carried out using an enzyme termed reverse transcriptase, which is isolated from RNA-containing retroviruses. Reverse transcriptase is an RNA-dependent DNA polymerase, and will synthesise a first-strand DNA complementary to an mRNA template, using a mixture of the four dNTPs. There is also a requirement (as with all DNA polymerases) for a short oligonucleotide primer to be present (Fig. 6.5). With eukaryotic mRNA bearing a poly(A) tail, a complementary oligo(dT) primer may be used. Alternatively, random hexamers may be used, which anneal at random positions along the mRNAs in the mixture. Such primers provide a free 3′ hydroxyl group which is used as the starting point for the reverse transcriptase. Regardless of the method used to prepare the first-strand cDNA, one absolute requirement is high-quality undegraded mRNA (Section 5.7.2). It is usual to check the integrity of the RNA by gel electrophoresis (Section 5.7.4). Alternatively, a fraction of the extract may be used in a cell-free translation system, which, if intact mRNA is present, will direct the synthesis of proteins represented by the mRNA molecules in the sample (Section 6.7). Following the synthesis of the first DNA strand, a poly(dC) tail is added to its 3′ end, using terminal transferase and dCTP. This will also, incidentally, put a poly(dC) tail on


Fig. 6.6 Second-strand cDNA synthesis using the RNase H method. [Scheme: a poly(dT) primer is annealed to the 3′ poly(A) tail of the messenger RNA; reverse transcriptase, buffer and dNTPs produce the first strand; RNase H leaves gaps in the mRNA strand; DNA polymerase I utilises the primer–template complexes formed by RNase H to give double-stranded cDNA.]

the poly(A) of mRNA. Alkaline hydrolysis is then used to remove the RNA strand, leaving single-stranded DNA which can be used, like the mRNA, to direct the synthesis of a complementary DNA strand. The second-strand synthesis requires an oligo(dG) primer, base-paired with the poly(dC) tail, and is catalysed by the Klenow fragment of DNA polymerase I. The final product is double-stranded DNA, one of the strands being complementary to the mRNA. One further method of cDNA synthesis involves the use of RNase H. Here the first-strand cDNA synthesis is carried out as above with reverse transcriptase but the resulting mRNA–cDNA hybrid is retained. RNase H is then used at low concentrations to nick the RNA strand. The resulting nicks expose 3′ hydroxyl groups which are used by DNA polymerase as primers to replace the RNA with a second strand of cDNA (Fig. 6.6).
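The base-pairing logic behind first- and second-strand synthesis can be illustrated with a short sketch. This is a simplified model only: it ignores primers, tailing and enzyme behaviour, and the short mRNA sequence and function names are hypothetical.

```python
# Complement tables: RNA template -> DNA product, and DNA template -> DNA product
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}
DNA_TO_DNA = {"A": "T", "T": "A", "G": "C", "C": "G"}


def first_strand_cdna(mrna: str) -> str:
    """First-strand cDNA (written 5'->3') complementary to an mRNA (written 5'->3')."""
    return "".join(RNA_TO_DNA[base] for base in reversed(mrna.upper()))


def second_strand_cdna(first_strand: str) -> str:
    """Second strand (written 5'->3') complementary to the first-strand cDNA."""
    return "".join(DNA_TO_DNA[base] for base in reversed(first_strand.upper()))


# Hypothetical short transcript ending in a poly(A) tail
mrna = "AUGGCCUUAAAAAAAA"
first = first_strand_cdna(mrna)
second = second_strand_cdna(first)
print(first)   # begins with a run of T, where an oligo(dT) primer would have annealed
print(second)  # same sequence as the mRNA, with T in place of U
```

The second strand carries the same sequence as the original mRNA (with T replacing U), which is why the double-stranded cDNA can stand in for the unstable message during cloning.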

6.2.6 Treatment of blunt cDNA ends

Ligation of blunt-ended DNA fragments is not as efficient as ligation of sticky ends; therefore, with cDNA molecules, additional procedures are undertaken before ligation with cloning vectors. One approach is to add small double-stranded molecules with one internal site for a restriction endonuclease, termed nucleic acid linkers, to the cDNA. Numerous linkers are commercially available with internal restriction sites for


Fig. 6.7 Use of linkers. In this example, blunt-ended DNA is inserted into a specific restriction site on a plasmid, after ligation to a linker containing the same restriction site. [Scheme: a BamHI linker (5′-pGGGATCCC-3′ paired with 3′-CCCTAGGGp-5′) is blunt-end ligated to the blunt-ended DNA using DNA ligase; the linkered DNA and the plasmid are each cut with BamHI; the cohesive ends are then ligated using DNA ligase to give a plasmid containing the DNA insert.]

many of the most commonly used restriction enzymes. Linkers are blunt-end ligated to the cDNA but since they are added much in excess of the cDNA the ligation process is reasonably successful. Subsequently the linkers are digested with the appropriate restriction enzyme which provides the sticky ends for efficient ligation to a vector digested with the same enzyme. This process may be made easier by the addition of adaptors rather than linkers which are identical except that the sticky ends are preformed and so there is no need for restriction digestion following ligation (Fig. 6.7).
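The effect of adding linkers and then digesting can be sketched on a single strand. The example below uses the BamHI linker sequence shown in Fig. 6.7 and models BamHI as cutting after the first G of GGATCC; the cDNA sequence and function name are hypothetical, and only the top strand is followed.

```python
BAMHI_SITE = "GGATCC"
BAMHI_CUT_OFFSET = 1  # BamHI cuts G^GATCC, leaving a 5' GATC overhang


def digest_top_strand(seq: str, site: str = BAMHI_SITE,
                      offset: int = BAMHI_CUT_OFFSET) -> list:
    """Split the top strand at every occurrence of the recognition site."""
    fragments = []
    start = 0
    pos = seq.find(site)
    while pos != -1:
        cut = pos + offset
        fragments.append(seq[start:cut])
        start = cut
        pos = seq.find(site, pos + 1)
    fragments.append(seq[start:])
    return fragments


LINKER = "GGGATCCC"        # linker from Fig. 6.7 (top strand, 5'->3')
cdna = "ATGGCCTTAAGC"      # hypothetical blunt-ended cDNA with no internal BamHI site
construct = LINKER + cdna + LINKER  # product of blunt-end ligation at both ends

fragments = digest_top_strand(construct)
print(fragments)
```

The middle fragment is the cDNA flanked by linker remnants, and its top strand now starts with GATC, the cohesive end that pairs with a BamHI-cut vector. Note that any BamHI site inside the cDNA itself would be cut in the same way, which is one reason preformed adaptors can be more convenient.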

6.2.7 Enrichment methods for RNA

Frequently an attempt is made to isolate the mRNA transcribed from a desired gene within a particular cell or tissue that produces the protein in high amounts. Thus if the


Fig. 6.8 Scheme of analysing specific mRNA molecules by subtractive hybridisation. [Scheme: mRNA from subtractor cells is converted to first-strand cDNA; mRNA from target cells is mixed with this cDNA; mRNA common to both hybridises to the cDNA, and the unhybridised mRNA is analysed.]

cell or tissue produces a major protein of the cell, a large fraction of the total mRNA will code for the protein. An example of this is the β-cells of the pancreas, which contain high levels of pro-insulin mRNA. In such cases it is possible to precipitate polysomes which are actively translating the mRNA, by using antibodies to the ribosomal proteins; mRNA can then be dissociated from the precipitated ribosomes. More usually the mRNA required is only a minor component of the total cellular mRNA. In such cases total mRNA may be fractionated by size using sucrose density gradient centrifugation. Then each fraction is used to direct the synthesis of proteins using an in vitro translation system (Section 6.7).

6.2.8 Subtractive hybridisation

It is often the case that genes are transcribed in a specific cell type or differentially activated during a particular stage of cellular growth, often at very low levels. It is possible to isolate those mRNA transcripts by subtractive hybridisation. Usually the mRNA species common to the different cell types are removed, leaving the cell type- or tissue-specific mRNAs for analysis (Fig. 6.8). This may be undertaken by isolating the mRNA from the so-called subtractor cells and producing a first-strand cDNA (Section 6.2.5). The original mRNA from the subtractor cells is then degraded and the mRNA from the target cells isolated and mixed with the cDNA. All the complementary mRNA–cDNA molecules common to both cell types will hybridise, leaving the unbound mRNA which may be isolated and further analysed. A more rapid approach to analysing the differential expression of genes has been developed using the PCR. This technique, termed differential display, is explained in greater detail in Section 6.8.1.
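In logical terms subtractive hybridisation is a set difference: transcripts present in both cell types are removed and those unique to the target cells remain. The sketch below makes that explicit; the transcript names are invented for illustration and hybridisation is idealised as exact matching.

```python
def subtractive_hybridisation(target_mrnas, subtractor_mrnas):
    """Return target-cell transcripts that find no partner in the subtractor cDNA pool.

    Each subtractor mRNA is assumed to yield a first-strand cDNA that captures
    the identical transcript from the target cells (an idealisation).
    """
    subtractor_cdna_pool = set(subtractor_mrnas)
    return {mrna for mrna in target_mrnas if mrna not in subtractor_cdna_pool}


# Hypothetical transcript populations
target_cells = {"actin", "GAPDH", "tubulin", "gene_X", "gene_Y"}
subtractor_cells = {"actin", "GAPDH", "tubulin", "gene_Z"}

unhybridised = subtractive_hybridisation(target_cells, subtractor_cells)
print(sorted(unhybridised))
```

In practice hybridisation is never complete, so the unhybridised fraction is enriched for, rather than restricted to, target-specific transcripts.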

6.2.9 Cloning PCR products

While PCR has to some extent replaced cloning as a method for the generation of large quantities of a desired DNA fragment there is, in certain circumstances, still


Fig. 6.9 Cloning of PCR products using dA : dT cloning. [Scheme: a PCR product amplified with Taq DNA polymerase carries single A overhangs; it is ligated with T4 DNA ligase to a dT vector carrying single T overhangs, giving vector (dT) plus PCR insert.]

a requirement for the cloning of PCR-amplified DNA. For example, certain techniques such as in vitro protein synthesis are best achieved with the DNA fragment inserted into an appropriate plasmid or phage cloning vector (Section 6.7.1). Cloning methods for PCR follow closely the cloning of DNA fragments derived from the conventional manipulation of DNA. This may be achieved in one of two ways: blunt-ended or cohesive-ended cloning. Certain thermostable DNA polymerases such as Taq DNA polymerase and Tth DNA polymerase give rise to PCR products having a 3′ overhanging A residue. It is possible to clone the PCR product into dT vectors, an approach termed dA : dT cloning. This makes use of the fact that the terminal A residues can base-pair with vectors prepared with T residue overhangs, allowing efficient ligation of the PCR product (Fig. 6.9). The reaction is catalysed by DNA ligase as in conventional ligation reactions (Section 6.2.2). It is also possible to carry out cohesive-ended cloning with PCR products. In this case oligonucleotide primers are designed with a restriction endonuclease site incorporated into them. Since the complementarity of the primers needs to be absolute at the 3′ end, the 5′ end of the primer is usually the region for the location of the restriction site. This needs to be designed with care since the efficiency of digestion with certain restriction endonucleases decreases if extra nucleotides, not involved in recognition, are absent at the 5′ end. In this case the digestion and ligation reactions are the same as those undertaken for conventional reactions (Section 6.2.1).
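The primer layout described for cohesive-ended cloning (extra 5′ nucleotides, then the restriction site, then the template-matching region at the 3′ end) can be written out as a small sketch. The clamp sequence, template-matching region and function name are hypothetical; the number of extra nucleotides actually needed depends on the enzyme.

```python
def primer_with_restriction_site(annealing_region: str, site: str,
                                 clamp: str = "CGC") -> str:
    """Build a PCR primer (5'->3') as clamp + restriction site + annealing region.

    clamp            -- extra 5' nucleotides not involved in recognition
                        (hypothetical default; enzyme-dependent in practice)
    site             -- restriction endonuclease recognition sequence
    annealing_region -- 3' portion that must match the template exactly
    """
    return clamp.upper() + site.upper() + annealing_region.upper()


BAMHI_SITE = "GGATCC"
annealing = "ATGGCCTTAAGCGTTACG"  # hypothetical template-matching sequence

primer = primer_with_restriction_site(annealing, BAMHI_SITE)
print(primer)
```

Keeping the template-matching region at the 3′ end preserves efficient extension by the polymerase, while the site and clamp at the 5′ end are simply carried into the product during amplification.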


6.3 CLONING VECTORS

For the cloning of any molecule of DNA it is necessary for that DNA to be incorporated into a cloning vector. These are DNA elements that may be stably maintained and propagated in a host organism for which the vector has replication functions. A typical host organism is a bacterium such as E. coli, which grows and divides rapidly. Any vector with a replication origin that functions in E. coli will therefore replicate efficiently, together with any incorporated DNA. Cloning a DNA fragment into a vector thus amplifies the inserted foreign DNA and allows subsequent analysis to be undertaken. In this way the cloning process resembles the PCR, although there are some major differences between the two techniques. By cloning, it is possible not only to store a copy of any particular fragment of DNA, but also to produce unlimited amounts of it (Fig. 6.10). The vectors used for cloning vary in their complexity, their ease of manipulation, their selection and the amount of DNA sequence they can accommodate (the insert

Fig. 6.10 Production of multiple copies of a single clone from a stable gene bank or library. [Scheme: stable gene bank (gene library), each vector containing a different foreign DNA fragment → isolation of one clone from the library by gene library screening → amplification of the single clone for further analysis.]


Table 6.2 Comparison of vectors generally available for cloning DNA fragments

Vector      Host cell        Vector structure     Insert range (kb)
M13         E. coli          Circular virus       1–4
Plasmid     E. coli          Circular plasmid     1–5
Phage λ     E. coli          Linear virus         2–25
Cosmids     E. coli          Circular plasmid     35–45
BACs        E. coli          Circular plasmid     50–300
YACs        S. cerevisiae    Linear chromosome    100–2000

Notes: BAC, bacterial artificial chromosome; YAC, yeast artificial chromosome.

capacity). Vectors have in general been developed from naturally occurring molecules such as bacterial plasmids, bacteriophages or combinations of the elements that make them up, such as cosmids (Section 6.3.4). For gene library constructions there is a choice and trade-off between various vector types, usually related to ease of the manipulations needed to construct the library and the maximum size of foreign DNA insert of the vector (Table 6.2). Thus, vectors with the advantage of large insert capacities are usually more difficult to manipulate, although there are many more factors to be considered, which are indicated in the following treatment of vector systems.
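The insert ranges in Table 6.2 can be used as a first filter when choosing a vector for a given fragment size. The sketch below encodes the table as data; the function name is illustrative, and insert capacity is only one of the factors discussed in this section.

```python
# (vector, host cell, minimum insert in kb, maximum insert in kb), from Table 6.2
VECTORS = [
    ("M13", "E. coli", 1, 4),
    ("Plasmid", "E. coli", 1, 5),
    ("Phage λ", "E. coli", 2, 25),
    ("Cosmid", "E. coli", 35, 45),
    ("BAC", "E. coli", 50, 300),
    ("YAC", "S. cerevisiae", 100, 2000),
]


def candidate_vectors(insert_kb: float) -> list:
    """Return the names of vectors whose insert range covers the given size."""
    return [name for name, _host, low, high in VECTORS if low <= insert_kb <= high]


print(candidate_vectors(3))    # small fragment: several vector types are possible
print(candidate_vectors(40))   # cosmid-sized fragment
print(candidate_vectors(150))  # large fragment: artificial chromosomes
```

A size that falls between the tabulated ranges (for example 30 kb) returns an empty list, which reflects the nominal ranges in the table rather than a hard biological limit.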

6.3.1 Plasmids

Many bacteria contain an extrachromosomal element of DNA, termed a plasmid, which is a relatively small, covalently closed circular molecule, carrying genes for antibiotic resistance, conjugation or the metabolism of ‘unusual’ substrates. Some plasmids are replicated at a high rate by bacteria such as E. coli and so are excellent potential vectors. In the early 1970s a number of natural plasmids were artificially modified and constructed as cloning vectors, by a complex series of digestion and ligation reactions. One of the most notable plasmids, termed pBR322 after its developers Bolivar and Rodriguez (pBR), was widely adopted and illustrates the desirable features of a cloning vector as indicated below (Fig. 6.11).

• The plasmid is much smaller than a natural plasmid, which makes it more resistant to damage by shearing, and increases the efficiency of uptake by bacteria, a process termed transformation.
• A bacterial origin of DNA replication ensures that the plasmid will be replicated by the host cell. Some replication origins display stringent regulation of replication, in which rounds of replication are initiated at the same frequency as cell division. Most plasmids, including pBR322, have a relaxed origin of replication, whose activity is not tightly linked to cell division, and so plasmid replication will be


Fig. 6.11 Map and important features of pBR322. [Map: pBR322, 4.36 kb, showing the ampicillin resistance gene (ApR), the tetracycline resistance gene (TcR), the origin of replication (ORI) and restriction sites for EcoRI, HindIII, BamHI, SphI, SalI, PstI, ScaI, SspI, NdeI and PvuII.]





initiated far more frequently than chromosomal replication. Hence a large number of plasmid molecules will be produced per cell.
• Two genes coding for resistance to antibiotics have been introduced. One of these allows the selection of cells which contain plasmid: if cells are plated on medium containing an appropriate antibiotic, only those that contain plasmid will grow to form colonies. The other resistance gene can be used, as described below, for detection of those plasmids that contain inserted DNA.
• There are single recognition sites for a number of restriction enzymes at various points around the plasmid, which can be used to open or linearise the circular plasmid. Linearising a plasmid allows a fragment of DNA to be inserted and the circle closed. The variety of sites not only makes it easier to find a restriction enzyme that is suitable for both the vector and the foreign DNA to be inserted, but, since some of the sites are placed within an antibiotic resistance gene, the presence of an insert can be detected by loss of resistance to that antibiotic. This is termed insertional inactivation.

Insertional inactivation is a useful selection method for identifying recombinant vectors with inserts. For example, a fragment of chromosomal DNA digested with BamHI would be isolated and purified. The plasmid pBR322 would also be digested at a single site, using BamHI, and both samples would then be deproteinised to inactivate the restriction enzyme. BamHI cleaves to give sticky ends, and so it is possible to obtain ligation between the plasmid and digested DNA fragments in the presence of T4 DNA ligase. The products of this ligation will include plasmid containing a single fragment of the DNA as an insert, but there will also be unwanted products, such as


Fig. 6.12 Replica plating to detect recombinant plasmids. A sterile velvet pad is pressed onto the surface of an agar plate, picking up some cells from each colony growing on that plate. The pad is then pressed on to a fresh agar plate, thus inoculating it with cells in a pattern identical with that of the original colonies. Clones of cells that fail to grow on the second plate (e.g. owing to the loss of antibiotic resistance) can be recovered from their corresponding colonies on the first plate. [Scheme: colonies on an ampicillin plate are transferred with a velvet pad to a tetracycline replica plate and incubated; only cells with plasmid but without insert grow on tetracycline; colonies containing recombinant plasmid are recovered from the ampicillin plate.]

plasmid that has recircularised without an insert, dimers of plasmid, fragments joined to each other, and plasmid with an insert composed of more than one fragment. Most of these unwanted molecules can be eliminated during subsequent steps. The products of such reactions are usually identified by agarose gel electrophoresis (Section 5.7.4). The ligated DNA must now be used to transform E. coli. Bacteria do not normally take up DNA from their surroundings, but can be induced to do so by prior treatment with Ca²⁺ at 4 °C; they are then termed competent, since DNA added to the suspension of competent cells will be taken up during a brief increase in temperature termed heat shock. Small, circular molecules are taken up most efficiently, whereas long, linear molecules will not enter the bacteria. After a brief incubation to allow expression of the antibiotic resistance genes the cells are plated onto medium containing the antibiotic, e.g. ampicillin. Colonies that grow on these plates must be derived from cells that contain plasmid, since this carries the gene for resistance to ampicillin. It is not, at this stage, possible to distinguish between those colonies containing plasmids with inserts and those that simply contain recircularised plasmids. To do this, the colonies are replica plated, using a sterile velvet pad, onto plates containing tetracycline in their medium. Since the BamHI site lies within the tetracycline resistance gene, this gene will be inactivated by the presence of insert, but will be intact in those plasmids that have merely recircularised (Fig. 6.12). Thus colonies that grow on ampicillin but not on tetracycline must contain


Fig. 6.13 Map and important features of pUC18. [Map: pUC18, 2686 bp, showing the ampicillin resistance gene (ApR), the origin of replication (ORI), lacI, the control regions for lacZ, the lacZ β-galactosidase gene and, within it, the multiple cloning site (MCS) polylinker with sites for HindIII, SphI, PstI, HincII, AccI, SalI, BamHI, XmaI, SmaI, KpnI, SacI and EcoRI.]

plasmids with inserts. Since replica plating gives an identical pattern of colonies on both sets of plates, it is straightforward to recognise the colonies with inserts, and to recover them from the ampicillin plate for further growth. This illustrates the importance of a second gene for antibiotic resistance in a vector. Although recircularised plasmid can be selected against, its presence decreases the yield of recombinant plasmid containing inserts. If the digested plasmid is treated with the enzyme alkaline phosphatase prior to ligation, recircularisation will be prevented, since this enzyme removes the 5′ phosphate groups that are essential for ligation. Links can still be made between the 5′ phosphate of insert and the 3′ hydroxyl of plasmid, so only recombinant plasmids and chains of linked DNA fragments will be formed. It does not matter that only one strand of the recombinant DNA is ligated, since the nick will be repaired by bacteria transformed with these molecules. The valuable features of pBR322 have been enhanced by the construction of a series of plasmids termed pUC (produced at the University of California) (Fig. 6.13). There is an antibiotic resistance gene for ampicillin and an origin of replication for E. coli. In addition the most popular restriction sites are concentrated into a region termed the multiple cloning site (MCS). The MCS is also part of a gene in its own right and codes for a portion of a polypeptide called β-galactosidase. When the pUC plasmid has been used to transform the host cell E. coli the gene may be switched on by adding the inducer IPTG (isopropyl-β-D-thiogalactopyranoside). Its presence causes the enzyme β-galactosidase to be produced (Section 5.5.5). The functional enzyme is able to hydrolyse a colourless substance called X-gal (5-bromo-4-chloro-3-indolyl-β-D-galactopyranoside) into a blue insoluble material (5,5′-dibromo-4,4′-dichloroindigo) (Fig. 6.14). However, if the gene is disrupted by the insertion of a foreign


Fig. 6.14 Principle of blue/white selection for the detection of recombinant vectors. [Scheme: non-recombinant vector (no insert), induced with IPTG: the β-galactosidase gene is intact, X-gal is hydrolysed (white to blue) and a blue plaque results. Recombinant vector (DNA inserted in the MCS within the β-galactosidase gene), induced with IPTG: X-gal is not hydrolysed and a white plaque results.]

fragment of DNA, a non-functional enzyme results which is unable to carry out hydrolysis of X-gal. Thus, a recombinant pUC plasmid may be easily detected since it is white or colourless in the presence of X-gal, whereas an intact non-recombinant pUC plasmid will be blue since its gene is fully functional and not disrupted. This elegant system, termed blue/white selection, allows the initial identification of recombinants to be undertaken very quickly and has been included in a number of subsequent vector systems. This selection method and insertional inactivation of antibiotic resistance genes do not, however, provide any information on the character of the DNA insert, just the status of the vector. To screen gene libraries for a desired insert hybridisation to gene probes is required and this is explained in Section 6.5.
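The two screening schemes described in this section reduce to simple decision rules, sketched below. The function names and returned labels are illustrative; the pBR322 rule assumes, as in the example above, that the insert was cloned into the BamHI site within the tetracycline resistance gene.

```python
def classify_pbr322_colony(grows_on_ampicillin: bool, grows_on_tetracycline: bool) -> str:
    """Interpret replica-plating results for pBR322 with an insert site in the TcR gene."""
    if not grows_on_ampicillin:
        return "no plasmid"
    if grows_on_tetracycline:
        return "plasmid without insert"  # TcR gene intact, so merely recircularised
    return "recombinant plasmid"         # TcR gene inactivated by the insert


def classify_puc_colony(colour: str) -> str:
    """Interpret blue/white selection on medium containing IPTG and X-gal."""
    colour = colour.lower()
    if colour == "blue":
        return "non-recombinant"  # functional beta-galactosidase hydrolyses X-gal
    if colour in ("white", "colourless"):
        return "recombinant"      # lacZ portion disrupted by an insert in the MCS
    raise ValueError("expected 'blue', 'white' or 'colourless'")


print(classify_pbr322_colony(True, False))
print(classify_puc_colony("white"))
```

Both rules report only the status of the vector. As noted above, neither says anything about which DNA fragment has been inserted.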

6.3.2 Virus-based vectors

A useful feature of any cloning vector is the amount of DNA it may accept or have inserted before it becomes unviable. Inserts greater than 5 kb increase plasmid size to the point at which efficient transformation of bacterial cells decreases markedly, and so bacteriophages (bacterial viruses) have been adapted as vectors in order to propagate larger fragments of DNA in bacterial cells. Cloning vectors derived from λ bacteriophage are commonly used since they offer an approximately 16-fold advantage in cloning efficiency in comparison with the most efficient plasmid cloning vectors. Phage λ is a linear double-stranded phage approximately 49 kb in length (Fig. 6.15). It infects E. coli with great efficiency by injecting its DNA through the cell membrane. In the wild-type phage λ the DNA follows one of two possible modes of replication. Firstly, the DNA may become stably integrated into the E. coli chromosome where it lies dormant until a signal triggers its excision. This is termed the lysogenic life cycle. Alternatively, it may follow a lytic life cycle where the DNA is replicated


Fig. 6.15 The lysogenic and lytic cycles of bacteriophage λ. [Scheme: bacteriophage λ injects its DNA into an E. coli cell. Lytic path: early protein synthesis, bacterial DNA digested and new phage synthesised, late protein synthesis, host cell lysis. Lysogenic path: integration of phage DNA to form a prophage, cell division and replication of bacterial and phage DNA; induction returns the prophage to the lytic path.]

upon entry to the cell, phage head and tail proteins are synthesised rapidly and new functional phage are assembled. The phage are subsequently released from the cell by lysing the cell membrane to infect further E. coli cells nearby. At the extreme ends of the phage λ DNA are 12 bp sequences termed cos (cohesive) sites. Although they are


Fig. 6.16 Two strategies for producing in vitro packaging extracts for bacteriophage λ. [Scheme: single-strain mix: the λ virus has defective cos sites and has therefore lost the ability for viral packaging; capsid protein is isolated. Double-strain mix: one λ virus strain produces incomplete capsid protein E, the other produces defective capsid protein D; capsid proteins are isolated.]

asymmetric they are similar to restriction sites and allow the phage DNA to be circularised. Phage may be replicated very efficiently in this way, the result of which are concatemers of many phage genomes which are cleaved at the cos sites and inserted into newly formed phage protein heads. Much use of phage λ has been made in the production of gene libraries mainly because of its efficient entry into the E. coli cell and the fact that larger fragments of DNA may be stably integrated. For the cloning of long DNA fragments, up to approximately 25 kb, much of the non-essential λ DNA that codes for the lysogenic life cycle is removed and replaced by the foreign DNA insert. The recombinant phage is then assembled into pre-formed viral protein particles, a process termed in vitro packaging. These newly formed phage are used to infect bacterial cells that have been plated out on agar (Fig. 6.16). Once inside the host cells, the recombinant viral DNA is replicated. All the genes needed for normal lytic growth are still present in the phage DNA, and so multiplication of the virus takes place by cycles of cell lysis and infection of surrounding cells, giving rise to plaques of lysed cells on a background, or lawn, of bacterial cells. The viral DNA including the cloned foreign DNA can be recovered from the viruses from these plaques and analysed further by restriction mapping (Section 5.9.1) and agarose gel electrophoresis (Section 5.7.4). In general two types of λ phage vectors have been developed, λ insertion vectors and λ replacement vectors (Fig. 6.17). The λ insertion vectors accept less DNA than the replacement type since the foreign DNA is merely inserted into a region of the phage genome with appropriate restriction sites; common examples are λgt10 and λCharon16A. With a replacement vector a central region of DNA not essential for lytic growth (a stuffer fragment) is removed by a double digestion with, for example, EcoRI and BamHI. This leaves two DNA fragments termed right and left arms. The central stuffer fragment is replaced by inserting foreign DNA between the arms to form a functional recombinant λ phage. The most notable examples of λ replacement vectors are λEMBL and λZap.


Fig. 6.17 General schemes used for cloning in λ insertion and λ replacement vectors. cI857 is a temperature-sensitive mutation that promotes lysis at 42 °C after incubation at 37 °C. [Scheme: insertion vectors (λgt10): digest with one restriction enzyme (EcoRI) and insert the DNA fragment. Replacement vectors (λEMBL4): digest with two restriction enzymes (EcoRI and BamHI), remove the stuffer fragment and insert the DNA fragment.]

Fig. 6.18 General map of λZap cloning vector, indicating important areas of the vector. The multiple cloning site is based on the lacZ gene, providing blue/white selection based on the β-galactosidase gene. In between the initiator (I) site and terminator (T) site lie sequences encoding the phagemid Bluescript. [Map: cos sites at each end; capsid component genes A–J; the Bluescript phagemid region between I and T, containing the multiple cloning site (SacI, NotI, XbaI, SpeI, EcoRI, XhoI) within the lacZ gene, flanked by T3 and T7 promoters; cI857 (lytic control); DNA synthesis and host lysis genes.]

λZap is a commercially produced cloning vector that includes unique cloning sites clustered into a multiple cloning site (MCS) (Fig. 6.18). Furthermore the MCS is located within a lacZ region providing a blue/white screening system based on insertional inactivation. It is also possible to express foreign cloned DNA from this vector. This is a very useful feature of some λ vectors since it is then possible to screen for protein


Fig. 6.19 Single-stranded DNA rescue of phagemid from λZap. The single-stranded phagemid pBluescript SK may be excised from λZap by addition of helper phage. This provides the necessary proteins and factors for transcription between the I and T sites in the parent phage to produce the phagemid with the DNA cloned into the parent vector. [Scheme: λZap carrying DNA inserted into the multiple cloning site, within the phagemid region between I and T, infects E. coli; helper phage (e.g. M13R408) is added; the phagemid is excised from the λZap vector as Bluescript SK+/– (2.96 kb), carrying the f1 (+/–) origin, the ampicillin resistance gene (ApR), lacZ with the inserted DNA in the multiple cloning site, and the ColE1 plasmid origin.]

product rather than the DNA inserted into the vector. This screening is therefore undertaken with antibody probes directed against the protein of interest (Section 6.5.4). Another feature that makes this a useful cloning vector is the ability to produce RNA transcripts termed cRNA or riboprobes. This is possible because two promoters for RNA polymerase enzymes exist in the vector, a T7 and a T3 promoter, which flank the MCS (Section 6.4.2). One of the most useful features of λZap is that it has been designed to allow automatic excision in vivo of a small 2.9 kb colony-producing vector termed a phagemid, pBluescript SK (Section 6.3.3). This technique is sometimes termed single-stranded DNA rescue and occurs as the result of a process termed superinfection, in which helper phage are added to the cells, which are then grown for an additional period of approximately 4 h (Fig. 6.19).


Fig. 6.20 Genetic map and important features of bacteriophage vector M13. [Map: M13mp, 7.25 kb, showing the lacZ β-galactosidase gene with its control regions, lacI, the antibiotic resistance gene (ApR), the origin of replication (ORI), forward and reverse sequencing primer sites, and the multiple cloning site polylinker with sites for HindIII, SphI, PstI, HincII, AccI, SalI, BamHI, XmaI, SmaI, KpnI, SacI and EcoRI.]

The helper phage displaces a strand within the λZap which contains the foreign DNA insert. This is circularised and packaged as a filamentous phage similar to M13 (Section 6.3.3). The packaged phagemid is secreted from the E. coli cell and may be recovered from the supernatant. Thus the λZap vector allows a number of diverse manipulations to be undertaken without the necessity of recloning or subcloning foreign DNA fragments. Subcloning is sometimes necessary when a gene fragment cloned in a general-purpose vector needs to be inserted into a more specialised vector for the application of techniques such as in vitro mutagenesis or protein production (Section 6.6).

6.3.3 M13 and phagemid-based vectors

Much use has been made of single-stranded bacteriophage vectors such as M13 and vectors which have the combined properties of phage and plasmids, termed phagemids. M13 is a filamentous coliphage with a single-stranded circular DNA genome (Fig. 6.20). Upon infection of E. coli, the DNA replicates initially as a double-stranded molecule but subsequently produces single-stranded virions for infection of further bacterial cells (lytic growth). The nature of these vectors makes them ideal for techniques such as chain termination sequencing (Section 6.6.1) and in vitro mutagenesis (Section 6.6.3) since both require single-stranded DNA. M13 or phagemids such as pBluescript SK infect E. coli harbouring a male-specific structure termed the F-pilus (Fig. 6.21). They enter the cell by adsorption to this structure and once inside the phage DNA is converted to a double-stranded replicative form or


Fig. 6.21 Life cycle of bacteriophage M13. The bacteriophage virus enters the E. coli cell through the F-pilus. It then enters a stage where the circular single strands are converted to double strands. Rolling-circle replication then produces single strands, which are packaged and extruded through the E. coli cell membrane. [Figure labels: M13 adsorbs to E. coli through the F-pilus; single-stranded DNA (+ strand) enters the E. coli cell; RF; rolling circle replication; single-stranded DNA is assembled at the periplasm; M13 phage released into the medium without lysing E. coli cells.]

RF DNA. Replication then proceeds rapidly until some 100 RF molecules are produced within the E. coli cell. DNA synthesis then switches to the production of single strands and the DNA is assembled and packaged into the capsid at the bacterial periplasm. The bacteriophage DNA is then encapsulated by the major coat protein, gene VIII protein, of which there are approximately 2800 copies, with three to six copies of the gene III protein at one end of the particle. The extrusion of the bacteriophage through the bacterial periplasm results in a decreased growth rate of the E. coli cell rather than host cell lysis and is visible on a bacterial lawn as an area of clearing. Approximately 1000 packaged phage particles may be released into the medium in one cell division.

In addition to producing single-stranded DNA the coliphage vectors have a number of other features that make them attractive as cloning vectors. Since the bacteriophage DNA is replicated as a double-stranded RF DNA intermediate, a number of regular DNA manipulations may be performed such as restriction digestion, mapping and DNA ligation. RF DNA is prepared by lysing infected E. coli cells and purifying the supercoiled circular phage DNA with the same methods used for plasmid isolation. Intact single-stranded DNA packaged in the phage protein coat located in the supernatant may be precipitated with reagents such as polyethylene glycol, and the DNA purified with phenol/chloroform (Section 5.7.1). Thus the bacteriophage may act as a plasmid under certain circumstances and at other times produce DNA in the fashion of a virus. A widely used family of vectors derived from M13 is termed M13mp8/9,


Fig. 6.22 Design and orientation of polylinkers in M13 series. Only the main restriction enzymes are indicated. Each pair of sister vectors carries the same sites in opposite orientation.

M13 multiple cloning site/polylinker:
mp8/mp9: HindIII, PstI, HincII, AccI, SalI, BamHI, XmaI, SmaI, EcoRI
mp12/mp13: HindIII, PstI, HincII, AccI, SalI, XbaI, BamHI, XmaI, SmaI, SstI, EcoRI
mp18/mp19: HindIII, SphI, PstI, HincII, AccI, SalI, XbaI, BamHI, XmaI, SmaI, KpnI, SstI, EcoRI

mp18/19, etc., all of which have a number of highly useful features. All contain a synthetic MCS, which is located in the lacZ gene without disruption of the reading frame of the gene. This allows efficient selection to be undertaken based on the technique of blue/white screening (Section 6.3.1). As the series of vectors was developed the number of restriction sites was increased in an asymmetric fashion. Thus M13mp8, mp12 and mp18, and their sister vectors M13mp9, mp13 and mp19, which have the same MCS but in reverse orientation, carry progressively more restriction sites in the MCS, making the later vectors more useful since a greater choice of restriction enzymes is available (Fig. 6.22). However, one problem frequently encountered with M13 is the instability and spontaneous loss of inserts that are greater than 6 kb.

Phagemids are very similar to M13 and replicate in a similar fashion. One of the first phagemid vectors, pEMBL, was constructed by inserting a fragment of another phage termed f1, containing a phage origin of replication and elements for its morphogenesis, into a pUC8 plasmid. Following superinfection with helper phage the f1 origin is activated, allowing single-stranded DNA to be produced. The phage is assembled into a phage coat, extruded through the periplasm and secreted into the culture medium in a similar way to M13. Without superinfection the phagemid replicates as a pUC type plasmid and in the replicative form (RF) the DNA isolated is double-stranded. This allows further manipulations such as restriction digestion, ligation and mapping analysis to be performed. The pBluescript SK vector is also a phagemid and can be used in its own right as a cloning vector and manipulated as if it were a plasmid. It may, like M13, be used in nucleotide sequencing and site-directed mutagenesis, and it is also possible to produce RNA transcripts that may be used in the production of labelled cRNA probes or riboprobes (Section 6.4.2).

6.3.4 Cosmid based vectors

The way in which the phage λ DNA is replicated is of particular interest in the development of larger insert cloning vectors termed cosmids (Fig. 6.23). These are


Fig. 6.23 Scheme for cloning foreign DNA fragments in cosmid vectors. [Figure labels: cosmid with cos sites, restriction site and antibiotic resistance gene; foreign DNA to be cloned; E. coli bacterium with chromosomal DNA. Figure steps: digest cosmid and foreign DNA with the same restriction enzyme; ligation (fragments inserted randomly between cosmids); packaging in vitro only if the cos–cos distance is in the range 37 to 52 kb; infection of E. coli by phage (selection for antibiotic-resistant colonies); recombinant cosmid DNA recircularises via cos sites and replicates as a plasmid within E. coli.]

especially useful for the analysis of highly complex genomes and are an important part of various genome mapping projects (Section 6.9). The upper limit of the insert capacity of phage λ is approximately 21 kb. This is because of the requirement for essential genes and the fact that the maximum length


between the cos sites is 52 kb. Consequently cosmid vectors have been constructed that incorporate the cos sites from phage λ and also the essential features of a plasmid, such as the plasmid origin of replication, a gene for drug resistance, and several unique restriction sites for insertion of the DNA to be cloned. When a cosmid preparation is linearised by restriction digestion, and ligated to DNA for cloning, the products will include concatemers of alternating cosmid vector and insert. Thus the only requirement for a length of DNA to be packaged into viral heads is that it should contain cos sites spaced the correct distance apart; in practice this spacing can range between 37 and 52 kb. Such DNA can be packaged in vitro if phage head precursors, tails and packaging proteins are provided. Since the cosmid is very small, inserts of about 40 kb in length will be most readily packaged. Once inside the cell, the DNA recircularises through its cos sites, and from then onwards behaves exactly like a plasmid.
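The packaging constraint described above can be turned into a simple size check. The following is only a minimal sketch: it assumes that the cos-to-cos distance is simply the vector length plus the insert length, and the 5 kb vector size is a hypothetical value chosen for illustration rather than a figure from the text.

```python
COS_MIN_KB = 37.0  # lower cos-to-cos spacing quoted in the text
COS_MAX_KB = 52.0  # upper cos-to-cos spacing quoted in the text


def insert_window(vector_kb):
    """Return the (min, max) insert size in kb that keeps the cos-cos
    distance inside the packageable range for a vector of vector_kb.

    Assumption: cos-cos distance = vector length + insert length.
    """
    high = COS_MAX_KB - vector_kb
    if high <= 0:
        raise ValueError("vector alone exceeds the packaging limit")
    low = max(COS_MIN_KB - vector_kb, 0.0)
    return low, high


def is_packageable(vector_kb, insert_kb):
    """True if vector plus insert falls within the cos-cos range."""
    total = vector_kb + insert_kb
    return COS_MIN_KB <= total <= COS_MAX_KB


# Hypothetical 5 kb cosmid (vector size is an assumption, not from the text)
print(insert_window(5.0))         # (32.0, 47.0)
print(is_packageable(5.0, 40.0))  # True: 45 kb between cos sites
print(is_packageable(5.0, 20.0))  # False: 25 kb is too short to package
```

With a small vector the window sits around the 40 kb insert size mentioned above, which is why short ligation products are selected against at the packaging step.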

6.3.5 Large insert capacity vectors

The advantage of vectors that accept larger fragments of DNA than phage λ or cosmids is that fewer clones need to be screened when searching for the foreign DNA of interest. They have also had an enormous impact in the mapping of the genomes of organisms such as the mouse and are used extensively in the human genome mapping project (Section 6.9.3). Recent developments have allowed the production of large insert capacity vectors based on human artificial chromosomes, bacterial artificial chromosomes (BACs), mammalian artificial chromosomes (MACs) and the virus P1 (P1 artificial chromosomes, PACs). However, perhaps the most significant development is the vectors based on yeast artificial chromosomes (YACs).
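The statement that larger inserts mean fewer clones to screen can be illustrated with the widely used estimate N = ln(1 − P) / ln(1 − f), where f is the insert size as a fraction of the genome and P is the desired probability that a given sequence is present in the library. This formula is not given in the text and is introduced here as an assumption; the genome size and the 500 kb insert size below are hypothetical round numbers, whereas 21 kb and 40 kb are the phage λ and cosmid figures quoted earlier.

```python
import math


def clones_needed(genome_kb, insert_kb, probability=0.99):
    """Estimate the number of clones needed so that any given sequence is
    represented with the requested probability.

    Uses N = ln(1 - P) / ln(1 - f), with f = insert_kb / genome_kb.
    """
    if not 0 < probability < 1:
        raise ValueError("probability must be between 0 and 1")
    fraction = insert_kb / genome_kb
    if not 0 < fraction < 1:
        raise ValueError("insert must be smaller than the genome")
    return math.ceil(math.log(1 - probability) / math.log(1 - fraction))


GENOME_KB = 3_000_000  # hypothetical genome of about 3 x 10^9 bp

# 21 kb and 40 kb come from the text; 500 kb is a hypothetical large insert
for name, size_kb in [("phage lambda", 21), ("cosmid", 40), ("large insert", 500)]:
    print(name, size_kb, "kb:", clones_needed(GENOME_KB, size_kb), "clones")
```

The estimate ignores cloning bias and unclonable regions, so it is best read as a lower bound on the library size rather than a precise prediction.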

6.3.6 Yeast artificial chromosome (YAC) vectors

Yeast artificial chromosomes (YACs) are linear molecules composed of a centromere, telomere and a replication origin termed an ARS element (autonomous replicating sequence). The YAC is digested with restriction enzymes at the SUP4 site (a suppressor tRNA gene marker) and BamHI sites separating the telomere sequences (Fig. 6.24). This produces two arms and the foreign genomic DNA is ligated to produce a functional YAC construct. YACs are replicated in yeast cells; however, the external cell wall of the yeast needs to be removed to leave a spheroplast. These are osmotically unstable and also need to be embedded in a solid matrix such as agar. Once the yeast cells are transformed only correctly constructed YACs with associated selectable markers are replicated in the yeast strains. DNA fragments with repeat sequences are sometimes difficult to clone in bacterial-based vectors but may be successfully cloned in YAC systems. The main advantage of YAC-based vectors, however, is the ability to clone very large fragments of DNA. Thus the stable maintenance and replication of foreign DNA fragments of up to 2000 kb have been carried out in YAC vectors and they are the main vector of choice in the various genome mapping and sequencing projects (Section 6.9).


Fig. 6.24 Scheme for cloning large fragments of DNA into YAC vectors. [Figure labels: pYAC2 vector with TRP-ori-CEN, SmaI, SUP4, URA3, and two TEL sequences with BamHI sites. Figure steps: digest with BamHI and remove the DNA fragment; the DNA to be inserted is placed in the SmaI site between the left arm (TRP-ori-CEN) and the right arm (URA3) to give the yeast artificial chromosome construct; transformation of Saccharomyces cerevisiae.]

6.3.7 Vectors used in eukaryotes

The use of E. coli for general cloning and manipulation of DNA is well established; however, numerous developments have been made for cloning in eukaryotic cells. Plasmids used for cloning DNA in eukaryotic cells require a eukaryotic origin of replication and marker genes that will be expressed by eukaryotic cells. At present the two most important applications of plasmids to eukaryotic cells are for cloning in yeast and in plants. Although yeast has a natural plasmid, called the 2μ circle, this is too large for use in cloning. Plasmids such as the yeast episomal plasmid (YEp) have been created by genetic manipulation using replication origins from the 2μ circle, and by incorporating a gene which will complement a defective gene in the host yeast cell. If, for example, a strain of yeast is used which has a defective gene for the biosynthesis of an amino acid, an active copy of that gene on a yeast plasmid can be used as a selectable marker for the presence of that plasmid. Yeast, like bacteria, can be grown rapidly, and it is therefore well suited for use in cloning. Of particular use has been the creation of shuttle vectors which have origins of replication for yeast and bacteria such as E. coli.


Fig. 6.25 Scheme for cloning in plant cells using the Ti plasmid. [Figure labels: Ti plasmid pTiCch5 with T DNA, vir genes, ori, ocs, arc, tra, and regions for agropine synthesis, agropine catabolism and octopine catabolism; plant cell and nucleus. Figure steps: transform plant cell; integration of Ti DNA with the plant genome; plate onto agar; transformed callus; transfer to medium + hormones; transformed shoots; transformed plants.]

This means that constructs may be prepared rapidly in the bacteria and delivered into yeast for expression studies. The bacterium Agrobacterium tumefaciens infects plants that have been damaged near soil level, and this infection is often followed by the formation of plant tumours in the vicinity of the infected region. It is now known that A. tumefaciens contains a plasmid called the Ti plasmid, part of which is transferred into the nuclei of plant cells which are infected by the bacterium. Once in the nucleus, this DNA is maintained by integrating with the chromosomal DNA. The integrated DNA carries genes for the synthesis of opines (which are metabolised by the bacteria but not by the plants) and for tumour induction (hence ‘Ti’). DNA inserted into the correct region of the Ti plasmid will be transferred to infected plant cells, and in this way it has been possible to clone and express foreign genes in plants (Fig. 6.25). This is an essential prerequisite for the genetic engineering of crops.

6.3.8 Delivery of vectors into eukaryotes

Following the production of a recombinant molecule, the so-called construct is subsequently introduced into cells to enable it to be replicated a large number of times as the cells replicate. Initial recombinant DNA experiments were performed in bacterial


cells, because of their ease of growth and short doubling time. Gram-negative bacteria such as E. coli can be made competent for the introduction of extraneous plasmid DNA into cells (Section 6.3.1). The natural ability of bacteriophage to introduce DNA into E. coli has also been well exploited and results in 10–100-fold higher efficiency for the introduction of recombinant DNA compared to transformation of competent bacteria with plasmids. These well-established and traditional approaches are the reason why so many cloning vectors have been developed for E. coli. The delivery of cloning vectors into eukaryotic cells is, however, not as straightforward as that for the bacterium E. coli. It is possible to deliver recombinant molecules into animal cells by transfection. The efficiency of this process can be increased by first precipitating the DNA with Ca2+ or making the membrane permeable with divalent cations. High-molecular-weight polymers such as DEAE-dextran or polyethylene glycol (PEG) may also be used to maximise the uptake of DNA. The technique is rather inefficient although a selectable marker that provides resistance to a toxic compound such as neomycin can be used to monitor the success. Alternatively, DNA can be introduced into animal cells by electroporation. In this process the cells are subjected to pulses of a high-voltage gradient, causing many of them to take up DNA from the surrounding solution. This technique has proved to be useful with cells from a range of animal, plant and microbial sources. More recently the technique of lipofection has been used as the delivery method. The recombinant DNA is encapsulated by a core of lipid-coated particles which fuse with the lipid membrane of cells and thus release the DNA into the cell. Microinjection of DNA into cell nuclei of eggs or embryos has also been performed successfully in many mammalian cells. The ability to deliver recombinant molecules into plant cells is not without its problems.
Generally the outer cell wall of the plant must be stripped, usually by enzymatic digestion, to leave a protoplast. The cells are then able to take up recombinants from the supernatant. The cell wall can be regenerated by providing appropriate media. In cases where protoplasts have been generated transformation may also be achieved by electroporation. An even more dramatic transformation procedure involves propelling microscopically small titanium or gold pellet microprojectiles coated with the recombinant DNA molecule, into plant cells in intact tissues. This biolistic technique involves the detonation of an explosive charge which is used to propel the microprojectiles into the cells at a high velocity. The cells then appear to reseal themselves after the delivery of the recombinant molecule. This is a particularly promising technique for use with plants whose protoplasts will not regenerate whole plants.

6.4 HYBRIDISATION AND GENE PROBES

6.4.1 Cloned DNA probes

The increasing accumulation of DNA sequences in nucleic acid databases coupled with the availability of custom synthesis of oligonucleotides has provided a relatively straightforward means to design and produce gene probes and primers for PCR. Such


Fig. 6.26 Production of cRNA (riboprobes) using T3 RNA polymerase and phagemid vectors. [Figure labels: phagemid vector containing the gene probe, with T3 and T7 promoters, f1 ori, colE1 and resistance marker. Figure steps: the vector containing the gene probe is linearised by restriction digestion; T3 RNA polymerase is added and transcribes the gene probe sequence in the presence of labelled and unlabelled nucleotides; labelled cRNA probes (riboprobes) are synthesised.]

probes and primers are usually designed with bioinformatics software using sequence information from nucleic acid databases. Alternatively, gene family related sequences as indicated in Section 5.11 may also be successfully employed. However, there are many gene probes that have traditionally been derived from cDNA or from genomic sequences and which have been cloned into plasmid and phage vectors. These require manipulation before they may be labelled and used in hybridisation experiments. Gene probes may vary in length from 100 bp to a number of kilobases, although this is dependent on their origin. Many are short enough to be cloned into plasmid vectors and are useful in that they may be manipulated easily and are relatively stable both in transit and in the laboratory. The DNA sequences representing the gene probe are usually excised from the cloning vector by digestion with restriction enzymes and purified. In this way vector sequences which may hybridise non-specifically and cause high background signals in hybridisation experiments are removed. There are various ways of labelling DNA probes and these are described in Section 5.9.4.

6.4.2 RNA gene probes

It is also possible to prepare cRNA probes or riboprobes by in vitro transcription of gene probes cloned into a suitable vector. A good example of such a vector is the phagemid pBluescript SK since at each end of the multiple cloning site where the cloned DNA fragment resides are promoters for T3 or T7 RNA polymerase (Section 6.3.3). The vector is then made linear with a restriction enzyme and T3 or T7 RNA polymerase is used to transcribe the cloned DNA fragment. Provided a labelled NTP is added in the reaction a riboprobe labelled to a high specific activity will be produced (Fig. 6.26). One advantage of riboprobes is that they are single stranded and their sensitivity is generally regarded as


superior to cloned double-stranded probes indicated in Section 6.4.1. They are used extensively in in situ hybridisation and for identifying and analysing mRNA and are described in more detail in Section 6.8.
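Because the T3 and T7 promoters flank the cloned fragment and face each other, each polymerase copies a different strand of the insert. The sketch below illustrates this under a stated assumption: the insert is written 5'→3' reading away from the T7 promoter, so that T7 transcription gives RNA of the same sense as the written strand and T3 transcription gives the complementary strand. In a real clone the orientation depends on how the fragment was ligated, and the 10-base sequence is hypothetical.

```python
# Complement table for DNA bases
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return dna.upper().translate(_COMPLEMENT)[::-1]


def riboprobe(insert_top_strand, promoter):
    """Return the RNA (5'->3') transcribed from a cloned insert.

    Assumption: insert_top_strand is written 5'->3' reading away from the
    T7 promoter, so T7 yields that sequence and T3 yields the other strand.
    """
    dna = insert_top_strand.upper()
    if promoter == "T7":
        transcript = dna
    elif promoter == "T3":
        transcript = reverse_complement(dna)
    else:
        raise ValueError("promoter must be 'T7' or 'T3'")
    return transcript.replace("T", "U")  # RNA carries U in place of T


insert = "ATGGCCAAGT"  # hypothetical 10-base insert
print(riboprobe(insert, "T7"))  # AUGGCCAAGU
print(riboprobe(insert, "T3"))  # ACUUGGCCAU
```

Only the transcript complementary to the target mRNA will hybridise to it, so the choice of polymerase determines whether the probe detects the message or serves as a same-sense control.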

6.5 SCREENING GENE LIBRARIES

6.5.1 Colony and plaque hybridisation

Once a cDNA or genomic library has been prepared the next task is the identification of the specific fragment of interest. In many cases this may be more problematic than the library construction itself since many hundreds of thousands of clones may be in the library. One clone containing the desired fragment needs to be isolated from the library and therefore a number of techniques mainly based on hybridisation have been developed. Colony hybridisation is one method used to identify a particular DNA fragment from a plasmid gene library (Fig. 6.27). A large number of clones are grown up to form colonies on one or more plates, and these are then replica plated onto nylon membranes placed on solid agar medium. Nutrients diffuse through the membranes and allow colonies to grow on them. The colonies are then lysed, and liberated DNA is denatured and bound to the membranes, so that the pattern of colonies is replaced by an identical pattern of bound DNA. The membranes are then incubated with a prehybridisation mix containing non-labelled non-specific DNA such as salmon sperm DNA to block non-specific sites. Following this, denatured labelled gene probe is added. Under hybridising conditions the probe will bind only to cloned fragments containing at least part of its corresponding gene (Section 5.9.3). The membranes are then washed to remove any unbound probe and the binding detected by autoradiography of the membranes. If non-radioactive labels have been used then alternative methods of detection must be employed (Section 5.9.4). By comparison of the patterns on the autoradiograph with the original plates of colonies, those that contain the desired gene (or part of it) can be identified and isolated for further analysis. A similar procedure is used to identify desired genes cloned into bacteriophage vectors. In this case the process is termed plaque hybridisation.
It is the DNA contained in the bacteriophage particles found in each plaque that is immobilised on to the nylon membrane. This is then probed with an appropriately labelled complementary gene probe and the detection undertaken as for colony hybridisation.

6.5.2 PCR screening of gene libraries

In many cases it is now possible to use the PCR to screen cDNA or genomic libraries constructed in plasmids or bacteriophage vectors. This is usually undertaken with primers which anneal to the vector rather than the foreign DNA insert. The size of an amplified product may be used to characterise the cloned DNA and subsequent restriction mapping is then carried out (Fig. 6.28). The main advantage of the PCR over traditional hybridisation based screening is the rapidity of the technique, as PCR


Fig. 6.27 Colony hybridisation technique for locating specific bacterial colonies harbouring recombinant plasmid vectors containing desired DNA fragments. This is achieved by hybridisation to a complementary labelled DNA probe and autoradiography. [Figure steps: grow bacterial colonies harbouring plasmid on a master plate; nylon membrane with colonies; nylon membrane mixed with prehybridisation buffer in a sealed bag; prehybridisation solution replaced with complementary labelled DNA probe; wash unbound probe; nylon membrane exposed to X-ray film; autoradiography defines clones identified by probe; colonies defined by autoradiography are used for further analysis.]


Fig. 6.28 PCR screening of recombinant vectors. In this figure, the M13 non-recombinant has no insert and so the PCR undertaken with forward and reverse sequencing primers gives rise to a product 125 bp in length. The M13 recombinant with an insert of 100 bp will give rise to a PCR product of 125 bp + 100 bp = 225 bp and thus may be distinguished from the non-recombinant by analysis on agarose gel electrophoresis. [Figure labels: forward and reverse primers at the M13 multiple cloning site of the non-recombinant and of the recombinant carrying the 100 bp DNA insert; agarose gel electrophoresis with bands at 225 bp and 125 bp.]

screening may be undertaken in 3–4 h whereas it may be several days before detection by hybridisation is achieved. The PCR screening technique gives an indication of the size of the cloned insert rather than the sequence of the insert; however, PCR primers that are specific for a foreign DNA insert may also be used. This allows a more rigorous characterisation of clones from cDNA and genomic libraries.
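The size arithmetic behind vector-primed PCR screening can be written out as a short sketch. The 125 bp vector-only product and the 100 bp insert are the values used in Fig. 6.28; the function names and the 10 bp sizing tolerance are illustrative assumptions, since band sizes read from an agarose gel are only approximate.

```python
def expected_product_bp(vector_only_bp, insert_bp=0):
    """Expected PCR product size when both primers anneal to the vector:
    the vector-only product plus the length of any cloned insert."""
    return vector_only_bp + insert_bp


def classify_band(observed_bp, vector_only_bp, tolerance_bp=10):
    """Classify a band and estimate the insert size from its length.

    tolerance_bp is an assumed allowance for sizing error on the gel.
    Returns (label, estimated insert size in bp or None).
    """
    if abs(observed_bp - vector_only_bp) <= tolerance_bp:
        return "non-recombinant", 0
    if observed_bp > vector_only_bp:
        return "recombinant", observed_bp - vector_only_bp
    return "unexpected", None


print(expected_product_bp(125))        # 125 (no insert)
print(expected_product_bp(125, 100))   # 225 (100 bp insert)
print(classify_band(225, 125))         # ('recombinant', 100)
print(classify_band(125, 125))         # ('non-recombinant', 0)
```

As the text notes, this approach reports only the size of the insert; confirming its identity requires insert-specific primers or sequencing.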

6.5.3 Hybrid select/arrest translation

The difficulty of characterising clones and detecting a desired DNA fragment from a mixed cDNA library may be made simpler by two useful techniques termed hybrid select (release) translation or hybrid arrest translation. Following the preparation of a cDNA library in a plasmid vector the plasmid is extracted from part of each colony, and each preparation is then denatured and immobilised on a nylon membrane (Fig. 6.29). The membranes are soaked in total cellular mRNA, under stringent conditions (usually a temperature only a few degrees below Tm) in which hybridisation will occur only between complementary strands of nucleic acid. Hence each membrane will bind just one species of mRNA, since it has only one type of cDNA immobilised on it. Unbound mRNA is washed off the membranes, and then the bound mRNA is eluted and used to direct in vitro translation (Section 6.7). By immunoprecipitation or electrophoresis of the protein, the mRNA coding for a particular protein can be detected, and the clone containing its corresponding cDNA isolated. This technique is known as hybrid release translation. In a related method called


Fig. 6.29 General principles involved in the technique of hybrid select translation. [Figure steps: colonies from cDNA library on agar plate; isolate DNA from each clone, denature and bind to nylon membranes; add total mRNA under hybridising conditions; wash off unbound mRNA so that a single species is bound to each membrane; elute bound mRNA; use mRNA to direct in vitro protein synthesis; test for production of desired protein by procedures such as immunoprecipitation/electrophoresis.]


hybrid arrest translation a positive result is indicated by the absence of a particular translation product when total mRNA is hybridised with excess cDNA. This is a consequence of the fact that mRNA cannot be translated when it is hybridised to another molecule.

6.5.4 Screening expression cDNA libraries

In some cases the protein for which the gene sequence is required is partially characterised and in these cases it may be possible to produce antibodies to that protein. This allows immunological screening to be undertaken rather than gene hybridisation. Such antibodies are useful since they may be used as the probe if little or no gene sequence is available. In these cases it is possible to prepare a cDNA library in a specially adapted vector termed an expression vector which transcribes and translates any cDNA inserted into it. The protein is usually synthesised as a fusion with another protein such as β-galactosidase. Common examples of expression vectors are those based on bacteriophage such as λgt11 and λZap or plasmids such as pEX. The precise requirements for such vectors are identical to those for vectors which are dedicated to producing proteins in vitro and are described in Section 6.7.1. In some cases expression vectors incorporate inducible promoters which may be activated by, for example, increasing the temperature, allowing stringent control of expression of the cloned cDNA molecules (Fig. 6.30). The cDNA library is plated out and nylon membrane filters prepared as for colony/plaque hybridisation. A solution containing the antibody to the desired protein is then added to the membrane. The membrane is then washed to remove any unbound antibody and a further labelled antibody which is directed to the first antibody is applied. This allows visualisation of the plaque or colony that contains the cloned cDNA for that protein and this may then be picked from the agar plate and pure preparations grown for further analysis.

6.6 APPLICATIONS OF GENE CLONING

6.6.1 Sequencing cloned DNA

Most of the DNA sequencing now undertaken is based on the use of PCR products as the template; however, DNA fragments, including PCR products cloned into plasmid vectors, may be subjected to chain termination sequencing (Section 5.9.5). Because plasmids are double-stranded, further manipulation needs to be undertaken before this may be attempted. In these cases the plasmids are denatured, usually by alkali treatment. Although the plasmids containing the foreign DNA inserts may reanneal, the kinetics of the reaction are such that the strands remain single-stranded for a long enough period of time to allow the sequencing method to succeed. It is also possible to include denaturants such as formamide in the reaction to further prevent reannealing. In general, however, superior results may be gained by sequencing single-stranded DNA from M13 or single-stranded phagemids, which means that the cloned DNA of interest is usually subcloned into these vectors. A further modification


Fig. 6.30 Screening of cDNA libraries in expression vector λgt11. The cDNA inserted upstream from the gene for β-galactosidase will give rise to a fusion protein under induction (e.g. with IPTG). The plaques are then blotted onto a nylon membrane filter and probed with an antibody specific for the protein coded for by the cDNA. A secondary labelled antibody directed to the specific antibody can then be used to identify the location (plaque) of the cDNA. [Figure labels: recombinant λgt11 vector with cDNA insert, β-galactosidase and lac promoter. Figure steps: in vitro packaging; plate on bacterial lawn (master plate); induce production of fusion protein; overlay nylon filter; incubate filter with primary antibody; wash filter; incubate filter with labelled secondary antibody; detection of specific antibody; identification of cDNA in specific plaque; pick plaque for further analysis.]


that makes M13 useful in chain termination sequencing is the placement of universal priming sites at 20 or 40 bases from the start of the MCS. This allows any gene to be sequenced by using one universal primer since annealing of the primer prior to sequencing occurs outside the MCS and so is M13-specific rather than gene-specific. This obviates the need to synthesise new oligonucleotide primers for each new foreign DNA insert. A further, reverse priming site is also located at the opposite end of the polylinker allowing sequencing in the opposite orientation to be undertaken.

6.6.2 In vitro mutagenesis

One of the most powerful developments in molecular biology has been the ability to artificially create defined mutations in a gene and analyse the resulting protein following in vitro expression. Numerous methods are now available for producing site-directed mutations, many of which now involve the PCR. Commonly termed protein engineering, this process involves a logical sequence of analytical and computational techniques centred around a design cycle. This includes the biochemical preparation and analysis of proteins, the subsequent identification of the gene encoding the protein and its modification. The production of the modified protein and its further biochemical analysis completes the concept of rational redesign to improve a protein’s structure and function (Fig. 6.31). The use of design cycles and rational design systems is exemplified by the study and manipulation of subtilisin. This is a serine protease of broad specificity and of considerable industrial importance, being used in soap powder and in the food and leather industries. Protein engineering has been used to alter the specificity, pH profile and stability to oxidative, thermal and alkaline inactivation. Analysis of homologous thermophiles and their resistance to oxidation has also been improved. Engineered subtilisins of improved bleach resistance and wash performance are now used in many brands of washing powders. Furthermore, mutagenesis has played an important role in the re-engineering of important therapeutic proteins such as the Herceptin antibody, which has been used to successfully treat certain types of breast cancer.

6.6.3 Oligonucleotide-directed mutagenesis

The traditional method of site-directed mutagenesis demands that the gene be already cloned or subcloned into a single-stranded vector such as M13. Complete sequencing of the gene is essential to identify a potential region for mutation. Once the precise base change has been identified an oligonucleotide is designed that is complementary to part of the gene but has one base difference. This difference is designed to alter a particular codon, which, following translation, gives rise to a different amino acid and hence may alter the properties of the protein. The oligonucleotide and the single-stranded DNA are annealed and DNA polymerase is added together with the dNTPs. The primer for the reaction is the 3′ end of the oligonucleotide. The DNA polymerase produces a new complementary DNA strand to the existing one but which incorporates the oligonucleotide with the base mutation. The subsequent cloning of the recombinant produces multiple copies, half of which


Recombinant DNA and genetic analysis

Fig. 6.31 Protein design cycle used in the rational redesign of proteins and enzymes. [Figure: the protein engineering design cycle, linking protein purification, protein assay, protein production, copurification of ligand and protein, site-directed mutations, protein redesign and modelling, analysis techniques in solution, protein crystallography, molecular modelling, and comparison with known protein databases.]

contain a sequence with the mutation and half contain the wild-type sequence. Plaque hybridisation using the oligonucleotide as the probe is then used at a stringency that allows only those plaques containing a mutated sequence to be identified (Fig. 6.32). Further methods have also been developed which simplify the process of detecting the strands with the mutations.
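The design step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the gene fragment, the codon chosen and the flank length are hypothetical, and the single-stranded template is assumed to carry the coding strand, so that the oligonucleotide is the reverse complement of the mutated coding sequence.

```python
# Illustrative sketch only: the gene fragment, codon position and flank
# length are hypothetical, and the single-stranded template is assumed
# to carry the coding strand.
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built in TCAG order
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna):
    """Translate a coding strand codon by codon (stop codons shown as '*')."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def reverse_complement(dna):
    """Return the reverse complement of a DNA sequence."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def design_mutagenic_oligo(gene, codon_index, new_codon, flank=6):
    """Return (mutant gene, oligonucleotide) for a one-base codon change."""
    start = codon_index * 3
    old_codon = gene[start:start + 3]
    if sum(a != b for a, b in zip(old_codon, new_codon)) != 1:
        raise ValueError("this sketch only handles a one-base difference")
    mutant = gene[:start] + new_codon + gene[start + 3:]
    window = mutant[max(0, start - flank):start + 3 + flank]
    # The oligonucleotide anneals to the template, so it is the reverse
    # complement of the mutated coding-strand window.
    return mutant, reverse_complement(window)


wild_type = "ATGGCTTCTAAAGGTGAACTG"  # hypothetical gene fragment
mutant, oligo = design_mutagenic_oligo(wild_type, codon_index=2, new_codon="GCT")
print(translate(wild_type), "->", translate(mutant))
print("mutagenic oligonucleotide (5'->3'):", oligo)
```

In this toy case a single T to G change converts a serine codon (TCT) into an alanine codon (GCT). The oligonucleotide matches the template at every position except the one mutated base.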

6.6.4 PCR-based mutagenesis The PCR has been adapted to allow mutagenesis to be undertaken. This relies on a single-base mismatch between one of the PCR primers and the target DNA becoming incorporated into the amplified product following thermal cycling. The basic PCR mutagenesis system involves the use of two primary PCR reactions to produce two overlapping DNA fragments, both bearing the same mutation in the overlap region. The technique is termed overlap extension PCR. The two separate PCR products are made single-stranded and the overlap in sequence allows the products


6.6 Applications of gene cloning

Fig. 6.32 Oligonucleotide-directed mutagenesis. This technique requires a knowledge of nucleotide sequence, since an oligonucleotide may then be synthesised with the base mutation. Annealing of the oligonucleotide to complementary (except for the mutation) single-stranded DNA provides a primer for DNA polymerase to produce a new strand and thus incorporates the primer with the mutation. [Figure: an oligonucleotide primer with a predefined mutation is annealed to the single-stranded DNA template (M13); DNA polymerase synthesises the complementary strand with the primer (and mutation) incorporated; following transformation of E. coli, both the wild-type original strand and the strand with the new mutation are obtained.]

from each reaction to hybridise. Following this, one of the two hybrids bearing a free 3′ hydroxyl group is extended to produce a new duplex fragment. The other hybrid with a 5′ hydroxyl group cannot act as substrate in the reaction. Thus, the overlapped and extended product will now contain the directed mutation (Fig. 6.33). Deletions and insertions may also be created with this method, although the requirement for four primers and three PCR reactions limits the general applicability of the technique. A modification of the overlap extension PCR may also be used to construct directed mutations; this is termed megaprimer PCR. This method utilises three oligonucleotide primers to perform two rounds of PCR. A complete PCR product, the megaprimer, is made single-stranded and this is used as a large primer in a further PCR reaction with an additional primer. The above are all methods for creating rational defined mutations as part of a design cycle system. However, it is also possible to introduce random mutations into a gene and select for enhanced or new activities of the protein or enzyme it encodes. This accelerated form of artificial molecular evolution may be undertaken using


Fig. 6.33 Construction of a synthetic DNA fragment with a predefined mutation using overlap PCR mutagenesis. [Figure: two primary PCRs are carried out with primers labelled A, B and C; the PCR products are denatured and annealed; the overlap fragments with 3′ ends are then amplified by PCR with primers B and C.]

error-prone PCR, where deliberate and random mutations are introduced by a low-fidelity PCR amplification reaction. The resulting amplified gene is then translated and its activity assayed. This has already provided novel evolved enzymes such as a p-nitrobenzyl esterase which exhibits an unusual and surprising affinity for organic solvents. This accelerated evolutionary approach to protein engineering has been useful in the production of novel phage-displayed antibodies and in the development of antibodies with enzymic activities (catalytic antibodies).
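The way in which a low-fidelity amplification builds up a pool of random variants can be illustrated with a small simulation. The sketch below is deliberately simplified and rests on assumptions not made in the text: each cycle copies every molecule once, each base is miscopied independently with a fixed probability, and strand complementarity is ignored. The template and error rate are hypothetical.

```python
# Toy simulation of error-prone PCR. Assumptions (not from the text):
# every molecule is copied once per cycle, each base is miscopied
# independently with probability `error_rate`, and complementary
# strands are not modelled. The template sequence is hypothetical.
import random


def error_prone_copy(template, error_rate, rng):
    """Copy a sequence, substituting a random different base at each error."""
    copied = []
    for base in template:
        if rng.random() < error_rate:
            copied.append(rng.choice([b for b in "ACGT" if b != base]))
        else:
            copied.append(base)
    return "".join(copied)


def error_prone_pcr(template, cycles, error_rate, seed=0):
    """Return the pool of molecules after the given number of cycles."""
    rng = random.Random(seed)
    pool = [template]
    for _ in range(cycles):
        # Each cycle doubles the pool: originals plus one copy of each
        pool = pool + [error_prone_copy(seq, error_rate, rng) for seq in pool]
    return pool


template = "ATGGCTTCTAAAGGTGAACTGTTCACTGGT"
pool = error_prone_pcr(template, cycles=5, error_rate=0.02)
variants = {seq for seq in pool if seq != template}
print(len(pool), "molecules,", len(variants), "distinct mutant sequences")
```

Because mutations accumulate over successive cycles, the number of distinct variants in the pool depends on both the error rate and the number of cycles; each variant would then be expressed and assayed for activity.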

6.7 EXPRESSION OF FOREIGN GENES One of the most useful applications of recombinant DNA technology is the ability to artificially synthesise large quantities of natural or modified proteins in a host cell such as bacteria or yeast. The benefits of these techniques have been enjoyed for many years since the first insulin molecules were cloned and expressed in 1982 (Table 6.3). Contamination of other proteins such as the blood product factor VIII with infectious agents has also increased the need to develop effective vectors for in vitro expression


Table 6.3 A number of recombinant DNA-derived human therapeutic reagents

Therapeutic area            Recombinant product
Drugs                       Erythropoietin
                            Insulin
                            Growth hormone
                            Coagulation factors (e.g. factor VIII)
                            Plasminogen activator
Vaccines                    Hepatitis B
Cytokines/growth factors    GM-CSF
                            G-CSF
                            Interleukins
                            Interferons

Notes: GM-CSF, granulocyte–macrophage colony-stimulating factor; G-CSF, granulocyte colony-stimulating factor.

Fig. 6.34 Components of a typical prokaryotic expression vector. To produce a transcript (coding sequence) and translate it, a number of sequences in the vector are required. These include the promoter and ribosome-binding site (RBS). The activity of the promoter may be modulated by a regulatory gene (R), which acts in a way similar to that of the regulatory gene in the lac operon. T indicates a transcription terminator. [Figure labels: regulatory gene (R); promoter with –35 and –10 regions; RBS; coding sequence with start and stop; transcription terminator (T); ori; antibiotic resistance gene.]

of foreign genes. In general the expression of foreign genes is carried out in specialised cloning vectors (Fig. 6.34). However, it is possible to use cell-free transcription and translation systems that direct the synthesis of proteins without the need to grow and maintain cells. In vitro translation is carried out with the appropriate amino acids, ribosomes, tRNA molecules and isolated mRNA fractions. Wheat germ extracts or rabbit reticulocyte lysates are usually the systems of choice for in vitro translation. The resulting


Fig. 6.35 Expression PCR (E-PCR). This technique amplifies a target sequence with one primer that contains a transcriptional promoter, ribosome-binding site (RBS), untranslated leader region (UTR) and start codon. The other primer contains a stop codon. The amplified PCR products may be used in transcription and translation to produce a protein. [Figure: primer 1 carries the T7 promoter, UTR, RBS, a restriction cloning site and the start codon; primer 2 carries the stop codon and a restriction cloning site; the gene is amplified by PCR with primers 1 and 2 to give a PCR product with potential for protein production by transcription and translation.]

proteins may be detected by polyacrylamide gel electrophoresis or by immunological detection using western blotting. Recently oligonucleotide PCR primers have been designed to incorporate a promoter for RNA polymerase and a ribosome-binding site. When the so-called expression PCR (E-PCR) is carried out the amplified products are denatured and transcribed by RNA polymerase after which they are translated in vitro. The advantage of this system is that large amounts of specific RNA are synthesised thus increasing the yield of specific proteins (Fig. 6.35).
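The construction of the two E-PCR primers can be sketched as simple string assembly. The T7 promoter sequence used below is the commonly quoted one; the leader, ribosome-binding motif, spacer, gene fragment and annealing length are hypothetical placeholders chosen only to make the example run, and a real design would depend on the translation system used.

```python
# Sketch of E-PCR primer assembly. Only the T7 promoter is a commonly
# quoted sequence; the UTR, RBS motif, spacer, coding sequence and
# annealing length are hypothetical placeholders.

T7_PROMOTER = "TAATACGACTCACTATAGGG"
UTR = "AGACCACAACGGTTTCCC"   # placeholder untranslated leader
RBS = "AGGAGG"               # placeholder ribosome-binding motif
SPACER = "AAATTC"            # placeholder spacer before the start codon
STOP_CODON = "TAA"


def reverse_complement(dna):
    """Return the reverse complement of a DNA sequence."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def design_epcr_primers(coding_seq, anneal_len=18):
    """Return (primer 1, primer 2, predicted product) for a coding sequence.

    coding_seq is the coding strand from the start codon to the last
    sense codon, without a stop codon.
    """
    if not coding_seq.startswith("ATG"):
        raise ValueError("coding sequence must begin with the start codon ATG")
    leader = T7_PROMOTER + UTR + RBS + SPACER
    # Primer 1: regulatory elements followed by the gene-specific 3' end
    primer_1 = leader + coding_seq[:anneal_len]
    # Primer 2: anneals to the end of the gene and appends a stop codon
    primer_2 = reverse_complement(coding_seq[-anneal_len:] + STOP_CODON)
    product = leader + coding_seq + STOP_CODON
    return primer_1, primer_2, product


coding = "ATGGCTTCTAAAGGTGAACTGTTCACTGGT"  # hypothetical gene fragment
primer_1, primer_2, product = design_epcr_primers(coding)
print("primer 1:", primer_1)
print("primer 2:", primer_2)
print("product length:", len(product), "bp")
```

Only the 3′ portion of each primer anneals to the target; the 5′ extensions are incorporated into the product during amplification, which is what allows the PCR product itself to serve as a template for transcription and translation.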

6.7.1 Production of fusion proteins For a foreign gene to be expressed in a bacterial cell, it must have particular promoter sequences upstream of the coding region, to which the RNA polymerase will bind prior to transcription of the gene. The choice of promoter is vital for correct and efficient transcription since the sequence and position of promoters are specific to a particular host such as E. coli (Section 5.5.4). It must also contain a ribosome-binding site, placed just before the coding region. Unless a cloned gene contains both of these sequences, it will not be expressed in a bacterial host cell. If the gene has been produced via cDNA from a eukaryotic cell, then it will certainly not have any such sequences. Consequently, expression vectors have been developed which contain promoter and ribosome-binding sites positioned just before one or more restriction


sites for the insertion of foreign DNA. These regulatory sequences, such as that from the lac operon of E. coli, are usually derived from genes that, when induced, are strongly expressed in bacteria. Since the mRNA produced from the gene is read as triplet codons, the inserted sequence must be placed so that its reading frame is in phase with the regulatory sequence. This can be ensured by the use of three vectors which differ only in the number of bases between promoter and insertion site, the second and third vectors being respectively one and two bases longer than the first. If an insert is cloned in all three vectors then in general it will subsequently be in the correct reading frame in one of them. The resulting clones can be screened for the production of a functional foreign protein (Section 6.5.4). In some cases the protein is expressed as a fusion with a general protein such as β-galactosidase or glutathione-S-transferase (GST) to facilitate its recovery. It may also be tagged with a moiety such as a polyhistidine (6×His-tag) which binds strongly to a nickel-chelate-nitrilotriacetate (Ni-NTA) chromatography column. The usefulness of this method is that the binding is independent of the three-dimensional structure of the 6×His-tag and so recovery is efficient even under strong denaturing conditions, often required for membrane proteins and inclusion bodies (Fig. 6.36). The tags are subsequently removed by cleavage with a reagent such as cyanogen bromide and the protein of interest purified by protein biochemical methods such as chromatography and polyacrylamide gel electrophoresis. It is not only possible, but usually essential, to use cDNA instead of eukaryotic genomic DNA to direct the production of a functional protein by bacteria. This is because bacteria are not capable of processing RNA to remove introns, and so any foreign genes must be pre-processed as cDNA if they contain introns.
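The three-vector strategy for obtaining the correct reading frame can be checked with a short calculation. In the sketch below the vector leader sequences, the insert and the expected peptide are all hypothetical; the leaders are taken to run from the vector's start codon to the insertion site and differ in length by one and two bases, as described above.

```python
# Sketch of the three-vector reading-frame strategy. The leader
# sequences (vector start codon to insertion site), the insert and the
# expected peptide are hypothetical.
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built in TCAG order
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna):
    """Translate from the first base until a stop codon or the end."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Vectors 2 and 3 are one and two bases longer than vector 1
VECTOR_LEADERS = {
    "vector 1": "ATGACC",
    "vector 2": "ATGACCG",
    "vector 3": "ATGACCGG",
}


def vectors_in_frame(leaders, insert, expected_peptide):
    """Return the vectors whose fusion product contains the expected peptide."""
    return [name for name, leader in leaders.items()
            if expected_peptide in translate(leader + insert)]


# The insert begins two bases before its first complete codon (AAA)
insert = "GGAAAGGTGAACTGTAA"
for name, leader in VECTOR_LEADERS.items():
    print(name, translate(leader + insert))
print("in frame:", vectors_in_frame(VECTOR_LEADERS, insert, "KGEL"))
```

Since the three leaders cover all three possible phases, exactly one of them brings any given insert into frame, provided the insert is in the correct orientation.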
A further problem arises if the protein must be glycosylated, by the addition of oligosaccharides at specific sites, in order to become functional. Although the use of bacterial expression systems is somewhat limited for eukaryotic systems, there are a number of eukaryotic expression systems based on plant, mammalian, insect and yeast cells. These types of cells can perform such post-translational modifications, producing a correct glycosylation pattern and in some cases the correct removal of introns. It is also possible to include a signal or address sequence at the 5′ end of the mRNA which directs the protein to a particular cellular compartment or even out of the cell altogether into the supernatant. This makes the recovery of expressed recombinant proteins much easier since the supernatant may be drawn off while the cells are still producing protein. One useful eukaryotic expression system is based on the monkey COS cell line. These cells each contain a region derived from a monkey virus termed simian virus 40 (SV40). A defective region of the SV40 genome has been stably integrated into the COS cell genome. This allows the expression of a protein termed the large T antigen which is required for viral replication. When a recombinant vector having the SV40 origin of replication and carrying foreign DNA is inserted into the COS cells, viral replication takes place. This results in a high level of expression of foreign proteins. The disadvantages of this system are the ultimate lysis of the COS cells and the limited insert capacity of the vector. Much interest is also currently focussed on other modified viruses, vaccinia virus and baculovirus. These have been developed for high-level expression in mammalian cells and insect cells respectively. The vaccinia virus


Fig. 6.36 Recovery of proteins using 6×His-tag and Ni-NTA chromatography columns. [Figure: expression of protein fused with a 6×His-tag; binding of the His-tagged protein to a nickel-chelate-nitrilotriacetate (Ni-NTA) chromatography column; protein–His-tag cleavage; recovery and purification of protein.]

in particular has been used to correct the defective ion transport by introducing a wild-type cystic fibrosis gene into cells bearing a mutated cystic fibrosis (CFTR) gene. There is no doubt that the further development of these vector systems will enhance eukaryotic protein expression in the future.

6.7.2 Phage display techniques As a result of the production of phagemid vectors and as a means of overcoming the problems of screening large numbers of clones generated from genomic libraries of antibody genes, a method for linking the phenotype or expressed protein with the genotype has been devised. This is termed phage display, since a functional protein is


Fig. 6.37 Flow diagram indicating the main steps in the phage display technique. [Figure: amplify DNA sequence by PCR; clone the PCR fragment into the phage surface display vector (carrying the f1 origin and the genes for coat protein III and coat protein VIII); transform E. coli with the construct; superinfect E. coli with helper phage; expression of phage gene III–insert produces a coat protein III–protein fusion; during phage assembly the protein is displayed on the surface whilst its DNA is phage encoded.]

linked to a major coat protein of a coliphage whilst the single-stranded gene encoding the protein is packaged within the virion. The initial steps of the method rely on the PCR to amplify gene fragments that represent functional domains or subunits of a protein such as an antibody. These are then cloned into a phage display vector which is an adapted phagemid vector (Section 6.3.3) and used to transform E. coli. A helper phage is then added to provide accessory proteins for new phage molecules to be constructed. The DNA fragments representing the protein or polypeptide of interest are also transcribed and translated, but linked to the major coat protein g III. Thus when the phage is assembled the protein or polypeptide of interest is incorporated into the coat of the phage and displayed, whilst the corresponding DNA is encapsulated (Fig. 6.37). There are numerous applications for the display of proteins on the surface of bacteriophage viruses, bacteria and other organisms, and commercial organisations have been quick to exploit this technology. One major application is the analysis and production of engineered antibodies from which the technology was mainly developed. In general phage-based systems have a number of novel applications in terms of ease of selection rather than screening of antibody fragments, allowing analysis by methods such as affinity chromatography. In this way it is possible to generate large numbers of antibody heavy and light chain genes by PCR amplification and mix them in a random fashion. This recombinatorial library approach may allow new or novel partners to be formed


as well as naturally existing ones. This strategy is not restricted to antibodies and vast libraries of peptides may be used in this combinatorial chemistry approach to identify novel compounds of use in biotechnology and medicine. Phage-based cloning methods also offer the advantage of allowing mutagenesis to be performed with relative ease. This may allow the production of antibodies with affinities approaching those derived from the human or mouse immune system. This may be brought about by using an error-prone DNA polymerase in the initial steps of constructing a phage display library. It is possible that these types of libraries may provide a route to high-affinity recombinant antibody fragments that are difficult to produce by more conventional hybridoma fusion techniques. Surface display libraries have also been prepared for the selection of ligands, hormones and other polypeptides in addition to allowing studies on protein–protein or protein–DNA interactions or determining the precise binding domains in these receptor–ligand interactions.

6.7.3 Alternative display systems A number of display systems have been developed based on the original phage display technique. One interesting method is ribosome display, where a sequence or even a library of sequences is transcribed and translated in vitro. However, in the DNA library the sequences are fused to spacer sequences lacking a stop codon. During translation at the ribosome the protein protrudes from the ribosome and is locked in with the mRNA. The complex can be stabilised by adding salt. In this way it is possible to select the appropriate protein through binding to its ligand. Thus a high-affinity protein–ligand complex can be isolated which has the mRNA that originally encoded it. The mRNA may then be reverse transcribed into cDNA and amplified by PCR to allow further methods such as mutagenesis to be undertaken. A related technique, mRNA display, is similar except that the association between the protein and mRNA is through a more stable covalent puromycin link rather than the salt-induced link as in ribosome display. Further display systems, based on yeast or bacteria, have also been developed and provide powerful in vitro selection methods.

6.8 ANALYSING GENES AND GENE EXPRESSION 6.8.1 Identifying and analysing mRNA The levels and expression patterns of mRNA dictate many cellular processes and therefore there is much interest in the ability to analyse and determine levels of a particular mRNA. Technologies such as real-time or quantitative PCR and microchip expression arrays are currently being employed and refined for high throughput analysis. A number of other informative techniques have been developed that allow the fine structure of a particular mRNA to be analysed and the relative amounts of an RNA quantitated by non-PCR-based methods. This is important not only for gene regulation studies but may also be used as a marker for certain clinical disorders. Traditionally the Northern blot has been used for


Fig. 6.38 Steps involved in the ribonuclease protection assay (RPA). PAGE, polyacrylamide gel electrophoresis. [Figure: total RNA isolation; hybridisation of the labelled RNA probe and the specific mRNA; RNase digestion of unhybridised RNA; RNA purification and PAGE analysis of the protected RNA/probe alongside markers.]

detection of particular RNA transcripts by blotting extracted mRNA and immobilising it to a nylon membrane (Section 5.9.2). Subsequent hybridisation with labelled gene probes allows precise determination of the size and nature of a transcript. However, much use has been made of a number of nucleases that digest only single-stranded nucleic acids and not double-stranded molecules. In particular the ribonuclease protection assay (RPA) has allowed much information to be gained regarding the nature of mRNA transcripts (Fig. 6.38). In the RPA single-stranded mRNA is hybridised in solution to a labelled single-stranded RNA probe which is in excess. The hybridised part of the complex becomes protected whereas the unhybridised part of the probe made from RNA is digested with RNase A and RNase T1. The protected fragment may then be analysed on a high-resolution polyacrylamide gel. This method may give valuable information regarding the mRNA in terms of the precise structure of the transcript (transcription start site, intron/exon junctions, etc.). It is also quantitative and requires less RNA than a Northern blot. A related technique, S1 nuclease mapping, is similar although the unhybridised part of a DNA probe, rather than an RNA probe, is digested, this time with the enzyme S1 nuclease. The PCR has also had an impact on the analysis of RNA via the development of a technique known as reverse transcriptase–PCR (RT–PCR). Here the RNA is isolated and a first strand cDNA synthesis undertaken with reverse transcriptase; the cDNA is then used in a conventional PCR (Section 6.2.5). Under certain circumstances a number of thermostable DNA polymerases have reverse transcriptase activity which obviates the need to separate the two reactions and allows the RT–PCR to be carried out in one tube. One of the main benefits of RT–PCR is the ability to identify rare or low levels of mRNA transcripts with great sensitivity. This is especially useful when


Fig. 6.39 Representation of the detection of active viruses using RT–PCR. [Figure: mRNA is extracted from a cell with active virus and from a cell with latent virus; RT–PCR is performed on each with virus-specific primers; the products are analysed by agarose gel electrophoresis.]

detecting, for example, viral gene expression and furthermore provides a means of differentiating between latent and active virus (Fig. 6.39). The level of mRNA production may also be determined by using a PCR-based method, termed quantitative PCR (Section 5.10.7). In many cases the analysis of tissue-specific gene expression is required and again the PCR has been adapted to provide a solution. This technique, termed differential display, is also an RT–PCR-based system requiring that isolated mRNA be first converted into cDNA. Following this, one of the PCR primers, designed to anneal to a general mRNA element such as the poly(A) tail in eukaryotic cells, is used in conjunction with a combination of arbitrary 6–7 bp primers which bind to the 5′ end of the transcripts. Consequently this results in the generation of multiple PCR products with reproducible patterns (Fig. 6.40). Comparative analysis by gel electrophoresis of PCR products generated from different cell types therefore allows the identification and isolation of those transcripts that are differentially expressed. As with many PCR-based techniques, the time to identify such genes is dramatically reduced from the weeks that are required to construct and screen cDNA libraries to a few days.
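The comparative step in differential display amounts to finding bands present in one lane but absent from another. A minimal sketch is given below; the band sizes and tissue names are hypothetical, and bands are treated as identical only when their sizes match exactly, which a real gel analysis would relax with a size tolerance.

```python
# Sketch of comparing differential display band patterns. Band sizes
# (in bp) and tissue names are hypothetical; bands are matched by
# exact size only.


def compare_band_patterns(bands_a, bands_b):
    """Split two lanes of PCR product sizes into unique and shared bands."""
    a, b = set(bands_a), set(bands_b)
    return {
        "only_a": sorted(a - b),   # candidate transcripts specific to sample A
        "only_b": sorted(b - a),   # candidate transcripts specific to sample B
        "shared": sorted(a & b),   # transcripts amplified from both samples
    }


liver_bands = [120, 185, 240, 310, 455]
kidney_bands = [120, 240, 275, 455]
result = compare_band_patterns(liver_bands, kidney_bands)
print("liver only: ", result["only_a"])
print("kidney only:", result["only_b"])
print("shared:     ", result["shared"])
```

Bands found in only one lane would be excised from the gel and characterised further as candidate differentially expressed transcripts.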


Fig. 6.40 Analysis of gene expression using differential display PCR. [Figure: total cellular mRNA; cDNA synthesis; PCR amplification with an anchored primer annealing to the poly(A) tail and arbitrary primers of 6 to 7 bp used in various combinations; multiple PCR products separated by gel electrophoresis; autoradiograph; comparative analysis of differentially expressed genes.]

6.8.2 Analysing genes in situ Gross chromosomal changes are often detectable by microscopic examination of the chromosomes within a karyotype (Section 5.3). Single or restricted numbers of base substitutions, deletions, rearrangements or insertions are far less easily detectable but may induce similarly profound effects on normal cellular biochemistry. In situ hybridisation makes it possible to determine the chromosomal location of a particular gene fragment or gene mutation. This is carried out by preparing a radiolabelled DNA or RNA probe and applying this to a tissue or chromosomal preparation fixed to a microscope slide. Any probe that does not hybridise to complementary sequences is washed off and an image of the distribution or location of the bound probe is viewed by autoradiography (Fig. 6.41). Using tissue or cells fixed to slides it is also possible to carry out in situ PCR and qPCR. This is a highly sensitive technique where PCR is carried out directly on the tissue slide with the standard PCR reagents. Specially adapted thermal cycling machines are required to hold the slide preparations and allow the PCR to proceed.


Fig. 6.41 General scheme for in situ hybridisation. [Figure: cell fixation method (pretreatment of cell); post-fixation treatment (cell permeabilisation); addition of labelled probe (hybridisation conditions required); wash excess probe from section; detection of hybridised probe (radioactive/non-radioactive detection).]

This allows the localisation and identification of, for example, single copies of intracellular viruses and, in the case of qPCR, the determination of initial concentrations of nucleic acid. An alternative labelling strategy used in karyotyping and gene localisation is fluorescent in situ hybridisation (FISH). This method, sometimes termed chromosome painting, is based on in situ hybridisation but uses different gene probes labelled with different fluorochromes, each specific for a particular chromosome. The advantage of this method is that separate gene regions may be identified and comparisons made within the same chromosome preparation. The technique is also likely to be highly useful in genome mapping for ordering DNA probes along a chromosomal segment (Section 6.9).

6.8.3 Analysing promoter–protein interactions To determine potential transcriptional regulatory sequences, genomic DNA fragments may be cloned into specially devised promoter probe vectors. These contain sites for insertion of foreign DNA which lie upstream of a reporter gene. A number of reporter genes are currently used, including the lacZ gene encoding β-galactosidase, the CAT gene encoding chloramphenicol acetyl transferase (CAT) and the lux gene which produces luciferase and is determined in a bioluminescent assay. Fragments of DNA potentially containing a promoter region are cloned into the vector and the constructs


Fig. 6.42 Assay for promoters using the reporter gene for chloramphenicol acetyl transferase (CAT). [Figure: the candidate promoter is placed upstream of the CAT gene in pCAT; transfection of cells; transcription and translation produce CAT protein; lyse cells; incubate CAT protein at 37 °C with [14C]chloramphenicol and acetyl-CoA; on the autoradiograph an active promoter gives acetylated chloramphenicol, whereas an inactive promoter leaves chloramphenicol unmodified.]

transfected into eukaryotic cells. Any expression of the reporter gene will be driven by the foreign DNA which must therefore contain promoter sequences (Fig. 6.42). These plasmids and other reporter genes such as those using green fluorescent protein (GFP) or the firefly luciferase gene allow quantitation of gene transcription in response to transcriptional activators. The binding of a regulatory protein or transcription factor to a specific DNA site results in a complex that may be analysed by the technique termed gel retardation. Under gel electrophoresis the migration of a DNA fragment bound to a protein of a relatively large mass will be retarded in comparison to the DNA fragment alone. For gel retardation to be useful the region containing the promoter DNA element must be digested or mapped with a restriction endonuclease before it is complexed with the protein. The location of the promoter may then be defined by finding the position on the restriction map of the fragment that binds to the regulatory protein and therefore retards it during electrophoresis. One potential problem with gel retardation is the ability to define the precise nucleotide binding region of the protein, since this depends on the accuracy and detail of the restriction map and the convenience of the restriction sites. However it is a useful first step in determining the interaction of a regulatory protein with a DNA binding site. DNA footprinting relies on the fact that the interaction of a DNA-binding protein with a regulatory DNA sequence will protect that DNA sequence from degradation by an enzyme such as DNase I. The DNA regulatory sequence is first labelled at one end with a radioactive label and then mixed with the DNA-binding protein


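Fig. 6.43 Steps involved in DNA footprinting. [Figure: a restriction fragment or oligonucleotide (e.g. promoter region) is end-labelled with 32P and incubated with or without DNA-binding proteins (e.g. transcription factors) that occupy the protein-binding domain; both samples are treated with DNase I; where protein is bound it protects the DNA from digestion, whereas there is no protection in its absence; gel electrophoresis and autoradiography reveal the footprint.]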

(Fig. 6.43). DNase I is added and the reaction is carried out under conditions favouring partial digestion. This limited digestion ensures that a number of fragments are produced where the DNA is not protected by the DNA-binding protein. The region protected by the DNA-binding protein will remain undigested. All the fragments are then separated on a high-resolution polyacrylamide gel alongside a control digestion where no DNA-binding


Fig. 6.44 Yeast two-hybrid system (interaction trapping technique). Transcription factors have two domains, one for DNA binding (A) and the other to allow binding to further proteins (B). Thus a recombinant molecule is formed from a protein (C) as a fusion with the DNA-binding domain. It cannot, however, activate transcription alone. Genes from a cDNA library (D) are expressed as a fusion with the activator domain (B) but also cannot initiate transcription alone. When the two fractions are mixed together, transcription is initiated if the domains are complementary and expression of a reporter gene takes place. [Figure labels: A, DNA-binding domain; B, activation domain; C, specific protein fused with A (fusion protein production); D, specific protein fused with B (gene library production). When C and D interact, A and B allow transcription activation at the promoter and the reporter gene is expressed.]

protein is present. The autoradiograph of a gel will contain a ladder of bands representing the partially digested fragments. Where DNA has been protected no bands appear; this region or hole is termed the DNA footprint. The position of the protein-binding sequence within the DNA may be elucidated from the size of the fragments either side of the footprint region. Footprinting is a more precise method of locating a DNA–protein interaction than gel retardation; however, it also is unable to give any information as to the precise interaction or the contribution of individual nucleotides. In addition to the detection of DNA sequences that contribute to the regulation of gene expression an ingenious way of detecting the protein transcription factors has been developed. This is termed the yeast two-hybrid system. Transcription factors have two domains, one for DNA binding and the other to allow binding to further proteins (activation domain). These occur as part of the same molecule in natural transcription factors, for example TFIID (Section 5.5.4). However they may also be formed from two separate domains. Thus a recombinant molecule is formed encoding the protein under study as a fusion with the DNA-binding domain. It cannot however activate transcription. Genes from a cDNA library are expressed as a fusion with the activator domain; this also cannot initiate transcription. However, when the two fractions are mixed together transcription is initiated if the domains are complementary (Fig. 6.44). This is indicated


Table 6.4 Use of transgenic mice for investigation of selected human disorders

Gene/protein                                              Genetic lesion                            Disorder in humans
Tyrosine kinase (TK)                                      Constitutive expression of gene           Cardiac hypertrophy
HIV transactivator                                        Expression of HIV tat gene                Kaposi’s sarcoma
Angiotensinogen                                           Expression of rat angiotensinogen gene    Hypertension
Cholesterol ester transfer protein (CET protein)          Expression of CET gene                    Atherosclerosis
Hypoxanthine-guanine phosphoribosyl transferase (HPRT)    Inactivation of HPRT gene                 HPRT deficiency

by the transcription of a reporter gene such as the CAT gene. The technique is not just confined to transcription factors and may be applied to any protein system where interaction occurs.
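The reading of a DNA footprinting gel, as described earlier in this section, can also be expressed as a short calculation: fragment lengths present in the control lane but missing from the protein lane mark the protected region. The sketch below assumes an idealised gel in which the control digestion yields a band at every position; the fragment lengths and the protected region are hypothetical.

```python
# Sketch of locating a DNA footprint from two gel lanes. Assumes an
# idealised gel: the control lane has a band at every cleavage
# position, and band positions are given as fragment lengths (in
# nucleotides) measured from the labelled end. All values are
# hypothetical.


def find_footprint(control_bands, protein_bands):
    """Return (first, last) fragment length of the longest run of missing bands."""
    missing = sorted(set(control_bands) - set(protein_bands))
    if not missing:
        return None  # no protection detected
    runs = []
    start = previous = missing[0]
    for length in missing[1:]:
        if length == previous + 1:
            previous = length          # still inside the same gap
        else:
            runs.append((start, previous))
            start = previous = length  # a new gap begins
    runs.append((start, previous))
    # The footprint is taken as the longest contiguous gap in the ladder
    return max(runs, key=lambda run: run[1] - run[0])


control_lane = list(range(1, 61))
protein_lane = [n for n in control_lane if not 25 <= n <= 40]
print("footprint spans fragment lengths:", find_footprint(control_lane, protein_lane))
```

In practice the ladder is uneven because DNase I does not cut every position with equal efficiency, so the comparison with the control lane is essential for distinguishing true protection from naturally weak cleavage sites.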

6.8.4 Transgenics and gene targeting In many cases it is desirable to analyse the effect of certain genes and proteins in an organism rather than in the laboratory. Furthermore, the production of pharmaceutical products and therapeutic proteins is also desirable in a whole organism. This also has important consequences for the biotechnology and agricultural industry (Section 6.10) (Table 6.4). The introduction of foreign genes into germ line cells and the production of an altered organism is termed transgenics. There are two broad strategies for transgenesis. The first is direct transgenesis in mammals, whereby recombinant DNA is injected directly into the male pronucleus of a recently fertilised egg. This is then raised in a foster mother animal, resulting in offspring in which all cells are transgenic. Selective transgenesis is where the recombinant DNA is transferred into embryonic stem (ES) cells. The cells are then cultured in the laboratory and those expressing the desired protein selected and incorporated into the inner cell mass of an early embryo. The resulting transgenic animal is raised in a foster mother but in this case the transgenic animal is a mosaic or chimera since only a small proportion of the cells will be expressing the protein. The initial problem with both approaches is the random nature of the integration of the recombinant DNA into the genome of the egg or embryonic stem cells. This may produce proteins in cells where they are not required or disrupt genes necessary for correct growth and development. A refinement of this, however, is gene targeting, which involves the production of an altered gene in an intact cell, a form of in vivo mutagenesis as opposed to in vitro mutagenesis (Section 6.6.2). The gene is inserted into the genome of, for example, an ES cell by specialised viral-based vectors. The insertion is non-random, however, since homologous sequences exist on the vector and on the gene to be targeted.
Thus, homologous recombination may introduce a new genetic property to the cell, or inactivate an already existing one, termed gene knockout. Perhaps the most important aspect of these techniques is that they allow animal models of human diseases to be created. This is useful since the physiological and biochemical consequences of a disease are often complex and difficult to study, impeding the development of diagnostic and therapeutic strategies.

6.8.5 Modulating gene expression by RNAi There are a number of ways of experimentally changing the expression of genes. Traditional methods have focused on altering the levels of mRNA by manipulation of promoter sequences or of the levels of accessory proteins involved in the control of expression. In addition, methods that act after mRNA production have also been employed, such as antisense RNA, where a nucleic acid sequence complementary to an expressed mRNA is delivered into the cell. This antisense sequence binds to the mRNA and prevents its translation. A development of this theme, and a process that is found in a variety of normal cellular processes, is termed RNA interference (RNAi) and uses small RNA molecules such as microRNA. A number of RNAi-based techniques have been developed that allow the modulation of gene expression in certain cells. This type of cellular-based gene expression modulation will no doubt extend to many organisms in the next few years.
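The complementarity on which antisense RNA depends can be illustrated with a minimal Python sketch. It only derives the sequence that would base-pair with a stretch of mRNA; the target fragment and the function name are invented for illustration, and nothing here models delivery into the cell, stability or specificity.

```python
# Watson-Crick partners for RNA bases
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def antisense_rna(mrna: str) -> str:
    """Return the antisense RNA, written 5'->3', that base-pairs with `mrna`."""
    mrna = mrna.upper()
    unknown = set(mrna) - set("ACGU")
    if unknown:
        raise ValueError(f"unexpected bases: {sorted(unknown)}")
    # Complement every base, then reverse so that the result also reads 5'->3'.
    return mrna.translate(COMPLEMENT)[::-1]


target = "AUGGCCAUUGUAAUGGGCC"  # hypothetical fragment of an expressed mRNA
print("mRNA      5'-" + target + "-3'")
print("antisense 5'-" + antisense_rna(target) + "-3'")
```

Applying the function twice returns the original sequence, which is a convenient check that the complement and the reversal are both being carried out.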

6.8.6 Analysing genetic mutations Several types of mutation can occur in nucleic acids, either transiently or stably incorporated into the genome. During evolution, mutations may be inherited in one or both copies of a chromosome, resulting in polymorphisms within the population (Section 5.3). Mutations may potentially occur at any site within the genome; however, there are several instances whereby mutations occur in limited regions. This is particularly obvious in prokaryotes, where elements of the genome (termed hypervariable regions) undergo extensive mutation to generate large numbers of variants, by virtue of the high rate of replication of the organisms. Similar hypervariable sequences are generated in the normal antibody immune response in eukaryotes. Mutations may have several effects upon the structure and function of the genome. Some mutations have no detectable effect upon normal cellular functions and are termed conservative mutations. Examples are mutations that occur in intron sequences and therefore play no part in the final structure and function of the protein or its regulation. Alternatively, mutations may have profound effects upon normal cell function, for example by altering transcription rates or the sequence of mRNAs necessary for normal cellular processes. Mutations occurring within exons may alter the amino acid composition of the encoded protein by causing amino acid substitution or by changing the reading frame used during translation. Point mutations of this kind were traditionally detected by Southern blotting or, if a convenient restriction site was available, by restriction fragment length polymorphism (RFLP) analysis (Section 5.9). However, the PCR has been used to great effect in mutation detection since it is possible to use allele-specific oligonucleotide PCR (ASO–PCR), where two competing primers and one general primer are used in the reaction (Fig. 6.45).
One of the primers is directly complementary to the known point mutation whereas the other is a wild-type primer; that is, the primers are identical except for the terminal 3′ end base. Thus, if the DNA contains the point mutation only the primer with the complementary sequence will bind and be incorporated into the amplified DNA, whereas if the DNA is normal the wild-type primer is incorporated. The results of the PCR are analysed by agarose gel electrophoresis. A further modification of ASO–PCR has been developed where the primers are each labelled with a different fluorochrome. Since the primers are labelled differently a positive or negative result is produced directly without the need to examine the PCRs by gel electrophoresis. Various modifications now allow more than one PCR to be carried out at a time (multiplex PCR), and hence the detection of more than one mutation is possible at the same time.

Fig. 6.45 Point mutation detection using allele-specific oligonucleotide PCR (ASO–PCR). (Scheme: patient sample DNA is isolated; the disease gene is amplified with the wild-type primer (PCR W) and with the allele-specific oligonucleotide primer (PCR M); both PCRs are analysed by gel electrophoresis, with lanes M and W shown for homozygous mutation, homozygous wild-type and heterozygous samples.)

Where the mutation is unknown it is also possible to use a PCR system with a gel-based detection method termed denaturing gradient gel electrophoresis (DGGE). In this technique a sample DNA heteroduplex containing a mutation is amplified by the PCR, which is also used to attach a GC-rich sequence (the GC clamp) to one end of the heteroduplex. The mutated heteroduplex is identified by its altered melting properties during electrophoresis through a polyacrylamide gel which contains a gradient of denaturant such as urea. At a certain point in the gradient the heteroduplex will denature relative to a perfectly matched homoduplex and thus may be identified. The GC clamp maintains the integrity of the end of the duplex on passage through the gel (Fig. 6.46). The sensitivity of this and other mutation detection methods has been substantially increased by the use of PCR, and further techniques used to detect known or unknown mutations are indicated in Table 6.5. An extension of this principle is used in a number of detection methods employing denaturing high-performance liquid chromatography (dHPLC). In this approach, commonly known as wave technology, the detection of denatured single strands containing mismatches is rapid, allowing high-throughput analysis of samples.

Fig. 6.46 Detection of mutations using denaturing gradient gel electrophoresis (DGGE). (Scheme: sample DNA with mutation; PCR with GC clamp; separation by electrophoresis in a gradient of denaturant; duplex melting compared for mutated and normal DNA.)
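The logic of ASO–PCR can be illustrated with a short Python sketch. It is a simplified model rather than a primer-design tool: the 30-base sequence, the position of the mutation, the function names and the rule that a primer is extended only when every base, including the terminal 3′ base, matches the template are all assumptions made for illustration. The general primer is omitted.

```python
def make_aso_primers(wild_type, position, mutant_base, length=20):
    """Return (wild-type primer, mutant primer), both ending at `position` (0-based).

    The two primers copy the same stretch of sequence, so they are identical
    except for the terminal 3' base.
    """
    if position + 1 < length:
        raise ValueError("not enough upstream sequence for a primer of this length")
    start = position + 1 - length
    wt_primer = wild_type[start:position + 1]
    mut_primer = wt_primer[:-1] + mutant_base
    return wt_primer, mut_primer


def primer_extends(primer, allele, position):
    """Simplified rule: extension requires a perfect match up to the 3' base.

    The primer is compared with the strand it copies, which is equivalent to
    testing whether it anneals perfectly to the complementary strand.
    """
    start = position + 1 - len(primer)
    return allele[start:position + 1] == primer


def call_genotype(band_w, band_m):
    """Interpret the W (wild-type primer) and M (mutant primer) reactions."""
    if band_w and band_m:
        return "heterozygous"
    if band_m:
        return "homozygous mutation"
    if band_w:
        return "homozygous wild-type"
    return "no product"


def genotype_sample(alleles, wt_primer, mut_primer, position):
    """Run both reactions in silico on a pair of alleles and call the genotype."""
    band_w = any(primer_extends(wt_primer, a, position) for a in alleles)
    band_m = any(primer_extends(mut_primer, a, position) for a in alleles)
    return call_genotype(band_w, band_m)


# Hypothetical region carrying a G-to-A point mutation at index 24
WILD_TYPE = "ATGGCTAGCTAGGCTTACGATCGAGCTTAA"
POSITION, MUTANT_BASE = 24, "A"
MUTANT = WILD_TYPE[:POSITION] + MUTANT_BASE + WILD_TYPE[POSITION + 1:]

wt_primer, mut_primer = make_aso_primers(WILD_TYPE, POSITION, MUTANT_BASE)
for alleles in [(WILD_TYPE, WILD_TYPE), (WILD_TYPE, MUTANT), (MUTANT, MUTANT)]:
    print(genotype_sample(alleles, wt_primer, mut_primer, POSITION))
```

In practice a mismatched 3′ base reduces extension rather than abolishing it, which is why reaction conditions have to be optimised; the all-or-nothing rule used here is the idealised case.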

Table 6.5 Main methods of detecting mutations in DNA samples

Technique                                         Basis of method     Main characteristics of detection
Southern blotting                                 Gel based           Labelled probe hybridisation to DNA
Dot/slot blotting                                 Sample application  Labelled probe hybridisation to DNA
Allele-specific oligo-PCR (ASO–PCR)               PCR based           Oligonucleotide matching to DNA sample
Denaturing gradient gel electrophoresis (DGGE)    Gel/PCR based       Melting temperature of DNA strands
Single-stranded conformation polymorphism (SSCP)  Gel/PCR based       Conformation difference of DNA strands
Ligase chain reaction (LCR)                       Gel/automated       Oligonucleotide matching to DNA sample
DNA sequencing                                    Gel based           Nucleotide sequence analysis of DNA
DNA microchips                                    Glass chip based    Sample DNA hybridisation to oligo arrays

6.8.7 Detecting DNA polymorphisms Polymorphisms are particularly interesting elements of the human genome and may be used as the basis for differentiating between individuals. All humans carry repeats of sequences known as minisatellite DNA, of which the number of repeats varies between unrelated individuals. Hybridisation of probes that anneal to these sequences, detected by Southern blotting, provides the means to type and identify those individuals (Section 5.3). DNA fingerprinting is the collective term for two distinct genetic testing systems that use either ‘multilocus’ probes or ‘single-locus’ probes. The DNA fingerprinting probes first described were multilocus probes, so termed because they detect hypervariable minisatellites at multiple locations within the genome. In contrast, several single-locus probes were discovered which under specific conditions only detect the two alleles at a single locus and generate what have been termed DNA profiles because, unlike multilocus probes, the two-band pattern result is in itself insufficient to uniquely identify an individual.

Techniques based on the PCR have been coupled to the detection of minisatellite loci. The inherently larger size of such DNA regions was not best suited to PCR amplification; however, new PCR developments are beginning to allow this to take place. The discovery of polymorphisms within the repeating sequences of minisatellites has led to the development of a PCR-based method that distinguishes an individual on the basis of the random distribution of repeat types along the length of a person’s two alleles for one such minisatellite. Known as minisatellite variant repeat (MVR) analysis or digital DNA typing, this technique can lead to a simple numerical coding of the repeat variation detected. Potentially this combines the advantages of PCR sensitivity and rapidity with the discriminating power of minisatellite alleles. A number of interesting identification systems are therefore under development and evaluation.

Techniques for genetic detection of polymorphisms have been used in many cases of paternity testing and immigration control, and are becoming central factors in many criminal investigations. They are also valuable tools in plant biotechnology for cereal typing and in the field of pedigree analysis and animal breeding.
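The idea of a simple numerical coding in MVR analysis can be pictured with a small Python sketch. Everything specific in it is an assumption made for illustration: it supposes only two repeat variants, labelled a and t, it reads the two alleles together position by position, and it uses an arbitrary three-digit scheme. The example alleles are invented.

```python
# Assumed digit scheme for one repeat position read from both alleles:
# both 'a' -> 1, both 't' -> 2, one of each -> 3
DIPLOID_CODE = {
    ("a", "a"): "1",
    ("t", "t"): "2",
    ("a", "t"): "3",
    ("t", "a"): "3",
}


def diploid_code(allele_1, allele_2):
    """Encode two alleles (strings of 'a'/'t' repeat types) as a digit string."""
    digits = []
    for pair in zip(allele_1, allele_2):  # zip stops at the shorter allele
        if pair not in DIPLOID_CODE:
            raise ValueError(f"unknown repeat type in {pair}")
        digits.append(DIPLOID_CODE[pair])
    return "".join(digits)


def differences(code_x, code_y):
    """Count the positions at which two codes of equal length disagree."""
    if len(code_x) != len(code_y):
        raise ValueError("codes must be the same length to be compared")
    return sum(x != y for x, y in zip(code_x, code_y))


person_1 = diploid_code("aattat", "atatta")  # hypothetical alleles
person_2 = diploid_code("aattat", "atatat")
print(person_1, person_2, differences(person_1, person_2))
```

Because the result is a string of digits, two samples can be compared, stored or searched in a database without reference to a gel image, which is the practical attraction of a digital typing system.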

6.8.8 Microarrays and DNA microchips One firmly established area under rapid development in molecular biology is the use of microarrays or DNA microchips. These provide a radically different approach to current laboratory molecular biology research strategies in that large-scale analysis and quantification of genes and gene expression is possible simultaneously. A microarray consists of an ordered arrangement of potentially hundreds of thousands of DNA sequences such as oligonucleotides or cDNAs deposited onto a solid surface. The solid support may be either glass or silicon and currently the arrays are synthesised on or off the chip. They require complex fabrication methods similar to those used in producing computer microchips. Most commercial productions employ robotic ultrafine microarray deposition instruments which dispense volumes in the picolitre range. Alternatively, on-chip fabrication as used by Affymetrix builds up layers of nucleotides using a process borrowed from the computer industry termed photolithography. Here wafer-thin masks with holes allow photoactivation of specific dNTPs which are linked together at specific regions on the chip. The whole process allows layers of oligonucleotides to be built up with each nucleotide at each position being defined by computer.

The arrays themselves may represent a variety of nucleic acid material. This may be derived from the mRNA produced in a particular cell type, termed cDNA expression arrays, or may alternatively represent coding and regulatory regions of a particular gene or group of genes. A number of arrays are now available that may determine mutations in DNA, mRNA transcript levels or other polymorphisms such as SNPs. Sample DNA is placed on the array and any unhybridised DNA is washed off. The array is then scanned for patterns of hybridisation by detection of fluorescence signals. Any mutations or genetic polymorphisms in relevant genes may be rapidly analysed by computer interpretation of the resulting hybridisation pattern, and the mutation, transcript level or polymorphism defined. Indeed the collation and manipulation of data from microarrays presents as big a problem as fabricating the chips in the first place.
The potential of microarrays appears to be limitless and a number of arrays have been developed for the detection of various genetic mutations including those in the cystic fibrosis CFTR gene (cystic fibrosis transmembrane regulator) and the breast cancer gene BRCA1, and for the study of the human immunodeficiency virus (HIV). At present microarrays require DNA to be highly purified, which limits their applicability. However, as DNA purification becomes automated and microarray technology develops it is not difficult to envisage numerous laboratory tests on a single DNA microchip. This could be used for analysing not only single genes but also large numbers of genes or DNA representing microorganisms, viruses, etc. Since the potential for quantitation of gene transcription exists, expression arrays could also be used in defining a particular disease status. This technique may be very significant since it will allow large amounts of sequence information to be gathered very rapidly and assist in many fields of molecular biology, especially in large genome sequencing projects or in so-called resequencing projects where gene regions such as those containing potentially important polymorphisms require analysis in a number of samples. One current application of microarray technology is the generation of a catalogue of SNPs across the human genome. Estimates indicate that there are approximately 10 million SNPs and, importantly, 200 000 coding SNPs (cSNPs) that lie within genes and may point to the development of certain diseases. SNP analysis is therefore clearly a candidate for microarray analysis, and developments such as the Affymetrix Genome Wide SNP array enable the simultaneous analysis of nearly 1 million SNPs on one gene chip. In order to simplify the problem of the vast numbers of SNPs that need to be analysed the HapMap project currently analyses SNPs that are inherited as a block, and in theory as few as 500 000 SNPs will be required to genotype an individual.
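Computer interpretation of a hybridisation pattern can be pictured with a minimal Python sketch. It assumes, purely for illustration, that each SNP is represented on the array by one probe for allele A and one for allele B, that the scanner reports a fluorescence intensity for each, and that fixed cut-offs on the fraction of signal from allele A separate the three genotypes. The SNP names, intensities and thresholds are invented, and real array software uses considerably more elaborate statistical models.

```python
def call_snp(intensity_a, intensity_b, min_signal=200.0, low=0.3, high=0.7):
    """Call a genotype from the fluorescence of the two allele-specific probes.

    All thresholds are arbitrary illustrative values.
    """
    total = intensity_a + intensity_b
    if total < min_signal:
        return "no call"  # too little hybridisation to interpret
    fraction_a = intensity_a / total
    if fraction_a >= high:
        return "AA"
    if fraction_a <= low:
        return "BB"
    return "AB"


# Hypothetical scanner output: SNP -> (allele A signal, allele B signal)
scan = {
    "snp_001": (1800.0, 150.0),
    "snp_002": (900.0, 1000.0),
    "snp_003": (90.0, 1600.0),
    "snp_004": (40.0, 55.0),
}

calls = {snp: call_snp(a, b) for snp, (a, b) in scan.items()}
for snp, genotype in calls.items():
    print(snp, genotype)
```

The same loop applied to several hundred thousand probe pairs gives a sense of why data handling is described above as a problem comparable to fabricating the chips.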


An extension of microarray technology may also be used to analyse tissue sections. This process, termed tissue microarrays (TMA), uses tissue cores or biopsies from conventional paraffin-embedded tissues. Thousands of tissue cores are sliced and placed on a solid support such as glass where they may all be subjected to the same immunohistochemical staining process or analysis with gene probes using in situ hybridisation. As with DNA microarrays many samples may be analysed simultaneously, less tissue is required and greater standardisation is possible.

6.9 ANALYSING WHOLE GENOMES Perhaps the most ambitious project in biosciences is the initiative to map and completely sequence a number of genomes from various organisms. The mapping and sequencing of a number of the organisms indicated in Table 6.6, such as the bacterium E. coli, has been completed and many more are due for completion. The demands of such large-scale mapping and sequencing have provided the impetus for the development and refinement of even the most standard of molecular biology techniques such as DNA sequencing. It has also led to new methods of identifying the important coding sequences that represent proteins and enzymes. The use of bioinformatics to collate, annotate and publish the information on the World Wide Web has also been an enormous undertaking. The availability of an informative map of the human genome that may be analysed and studied in detail chromosome by chromosome, such as the Map Viewer (NCBI), is just one of the rapid developments in the field of genome analysis and bioinformatics. Such is the power and ease of use of these resources that it is now inconceivable to work without them.

Table 6.6 Current selected genome-sequencing projects

Organism                                 Genome size (Mb)
Bacteria     Escherichia coli            4.6
Yeast        Saccharomyces cerevisiae    14
Roundworm    Caenorhabditis elegans      100
Fruit fly    Drosophila melanogaster     165
Puffer fish  Fugu rubripes rubripes      400
Mouse        Mus musculus                3000

6.9.1 Physical genome mapping In terms of genome mapping a physical map is the primary goal. Genetic linkage maps have also been produced by determining the recombination frequency between two particular loci. YAC-based vectors essential for large-scale cloning contain DNA inserts that are on average 300 000 bp in length, which is longer by a factor of ten than the longest inserts in the clones used in early mapping studies. The development of vectors with large insert capacity has enabled the production of contigs. These are continuous overlapping cloned fragments that have been positioned relative to one another. Using these maps any cloned fragment may be identified and aligned to an area in one of the contig maps. In order to position cloned DNA fragments resulting from the construction of a library in a YAC or cosmid it is necessary to detect overlaps between the cloned DNA fragments. Overlaps are created because of the use of partial digestion conditions with a particular restriction endonuclease when constructing the libraries. This ensures that when each DNA fragment is cloned into a vector it has overlapping ends which theoretically may be identified and the clones positioned or ordered so that a physical map may be produced (Fig. 6.47).

Fig. 6.47 Physical mapping using continuous overlapping cloned fragments (contigs). In order to assign the position of cloned DNA fragments resulting from the construction of a library in a YAC or cosmid vector, overlaps are detected between the clone fragments. These are created because of the use of partial digestion conditions when the libraries are constructed. (Scheme: cosmid/YAC library; ordering of clones by contigs; overlapping sequences.)

In order to position the overlapping ends it is preferable to undertake DNA sequencing; however, due to the impracticality of this approach a fingerprint of each clone is produced by restriction enzyme mapping. Although this is not an unambiguous method of ordering clones it is useful when also applying statistical probabilities of the overlap between clones. In order to link the contigs, techniques such as in situ hybridisation may be used, or a probe may be generated from one end of a contig in order to screen a different disconnected contig. This method of probe production and identification is termed walking, and has been used successfully in the production of physical maps of E. coli and yeast genomes. This cycle of clone to fingerprint to contig is amenable to automation; however, the problem of closing the gaps between contigs remains very difficult.

In order to define a common way for all research laboratories to order clones and connect physical maps together, a PCR-based technique has been developed that relies on arbitrary landmarks termed sequence-tagged sites (STS). An STS is a small unique sequence of 200–300 bp that is amplified by PCR (Fig. 6.48). The uniqueness of the STS is defined by the PCR primers that flank it. A PCR with those primers is performed and if it results in selective amplification of the target region it may be defined as a potential STS marker. In this way, defining STS markers that lie approximately 100 000 bases apart along a contig map allows the ordering of those contigs. Thus, all groups working with clones have definable landmarks with which to order clones produced in their libraries.

Fig. 6.48 General scheme of the production of a functional STS marker. (Steps: isolate genomic cosmid clone; subclone DNA into M13 sequencing vector; sequence 400 to 500 bp from M13 clones; identify unique sequence by database searching; design primers for PCR (20 to 25 bp sequences); analyse amplification products. Functional STS markers will give a single product.)

Two clones that contain the same STS must overlap, and the STS may thus be used to order the clones in a contig. Clones containing the STS are usually detected by Southern blotting where the clones have been immobilised on a nylon membrane. Alternatively a library of clones may be divided into pools and each pool PCR screened. This is usually a more rapid method of identifying an STS within a clone and further refinement of the PCR-based screening method allows the identification of a particular clone within a pool (Fig. 6.49). STS elements may also be generated from variable regions of the genome to produce a polymorphic marker that may be traced through families along with other DNA markers and located on a genetic linkage map. These polymorphic STSs are useful since they may serve as markers on both a physical map and a genetic linkage map for each chromosome and therefore provide a useful marker for aligning the two types of map.

Fig. 6.49 The derivation of an STS marker. An STS is a small unique sequence of between 200 and 300 bp that is amplified by PCR and allows ordering along a contig map. Such sequences are definable landmarks with which to order clones produced in genome libraries and usually lie approximately 100 000 bp apart. (Scheme: genomic YAC clone (150 kb); overlapping cosmid contigs; physical map (5 to 10 kb); STS marker; 200 to 300 bp PCR-STS.)
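The way shared STS markers order clones can be sketched in a few lines of Python. The clone names and STS hits are invented, and the method shown (linking any two clones that share an STS, then walking along the chain from an end clone) is a deliberately simple stand-in for the statistical methods used in real mapping projects. It assumes that each clone overlaps at most two neighbours and reports an error otherwise.

```python
from itertools import combinations


def overlap_graph(sts_hits):
    """Link every pair of clones that share at least one STS marker."""
    graph = {clone: set() for clone in sts_hits}
    for a, b in combinations(sts_hits, 2):
        if sts_hits[a] & sts_hits[b]:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def order_contigs(sts_hits):
    """Group clones into contigs and list each contig in linear order."""
    graph = overlap_graph(sts_hits)
    unvisited = set(graph)
    contigs = []
    while unvisited:
        # Collect one connected set of overlapping clones (a contig).
        component, stack = set(), [min(unvisited)]
        while stack:
            clone = stack.pop()
            if clone not in component:
                component.add(clone)
                stack.extend(graph[clone] - component)
        unvisited -= component

        # Walk from an end clone (one with at most one neighbour).
        ends = sorted(c for c in component if len(graph[c]) <= 1)
        if not ends:
            raise ValueError("clones form a loop; cannot order them linearly")
        path, current = [], ends[0]
        while current is not None:
            path.append(current)
            onward = sorted(n for n in graph[current] if n not in path)
            current = onward[0] if onward else None
        if len(path) != len(component):
            raise ValueError("a clone overlaps more than two others; order is ambiguous")
        contigs.append(path)
    return contigs


# Hypothetical PCR screening results: clone -> STS markers amplified from it
hits = {
    "cloneA": {"STS1", "STS2"},
    "cloneB": {"STS2", "STS3"},
    "cloneC": {"STS3", "STS4"},
    "cloneD": {"STS7"},
    "cloneE": {"STS7", "STS8"},
}
print(order_contigs(hits))
```

The two contigs found here remain unconnected because no STS is shared between them, which mirrors the gap-closing problem described above. The orientation of each contig is arbitrary: the walk simply starts from the end clone whose name sorts first.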

6.9.2 Gene discovery and localisation A number of disease loci have been identified and located to certain chromosomes. This has been facilitated by the use of in situ mapping techniques such as FISH. In fact a number of genes have been identified and the protein determined where little was initially known about the gene except for its location. This method of gene discovery is known as positional cloning and was instrumental in the isolation of the CFTR gene responsible for the disorder cystic fibrosis (Fig. 6.50).

Fig. 6.50 The scheme of identification of a disease gene by positional cloning. (Steps: characterise disease phenotype; identify marker linked to gene; isolate disease gene by mapping; identify and characterise disease gene; identify function of protein encoded by gene.)

The genes that are actively expressed in a cell at any one time are estimated to be as little as 10% of the total. The remaining DNA is packaged and serves an as yet unknown function. Investigations have found that certain active genes may be identified by the presence of so-called HTF (HpaII tiny fragments) islands often found at the 5′ end of genes. These are CpG-rich sequences that are not methylated and form tiny fragments on digestion with the restriction enzyme HpaII. A further gene discovery method that has been used extensively in the past few years is a PCR-based technique giving rise to a product termed an expressed sequence tag (EST). This represents part of a putative gene for which a function has yet to be assigned. It is carried out on cDNA by using primers that bind to an anchor sequence such as a poly(A) tail and primers which bind to sequences at the 5′ end of the gene. The products of such PCRs may subsequently be used to map the putative gene to a chromosomal region or be used as probes to search a genomic DNA library for the remaining parts of the gene. This type of information can be visualised using bioinformatics and useful information determined in a process termed data mining. Much interest currently lies in ESTs since they may represent a short cut to gene discovery.

A further gene isolation system that uses adapted vectors, termed exon trapping or exon amplification, may be used to identify exon sequences. Exon trapping requires the use of a specialised expression vector that will accept fragments of genomic DNA containing sequences for splicing reactions to take place. Following transfection of a eukaryotic cell line a transcript is produced that may be detected by using specific primers in an RT–PCR. This indicates the nature of the foreign DNA by virtue of the splicing sequences present. A list of further techniques that aid in the identification of a potential gene-encoding sequence is indicated in Table 6.7.
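The link between CpG-rich sequence and HpaII tiny fragments can be explored with a short Python sketch. HpaII recognises the sequence CCGG, so CpG-rich DNA tends to contain many closely spaced sites. The code counts CCGG sites, lists the fragment lengths expected from a complete digest, and computes two statistics commonly used to describe CpG islands: GC content and the observed/expected CpG ratio. The example sequences are invented, and the code looks only at sequence; it knows nothing about methylation, which in the cell determines whether HpaII can cut.

```python
def gc_content(seq):
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_obs_exp(seq):
    """Observed CpG count divided by the count expected from base composition."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c * g)


def hpaii_sites(seq):
    """0-based start positions of every CCGG recognition site."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 3) if seq[i:i + 4] == "CCGG"]


def hpaii_fragments(seq):
    """Fragment lengths after complete digestion (cut between the two Cs)."""
    cuts = [i + 1 for i in hpaii_sites(seq)]
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


# Invented sequences: one island-like, one AT-rich and CpG-poor
island_like = "GCGCCGGCGTCCGGACGCGCCGGTTCGCGCCGGAGCGC"
bulk_like = "ATTTAGCATTAAGTCTTAACATGTTTAAGCTATTTACA"

for name, seq in [("island-like", island_like), ("bulk-like", bulk_like)]:
    print(name, round(gc_content(seq), 2), round(cpg_obs_exp(seq), 2),
          hpaii_fragments(seq))
```

A region that combines a high GC content, an observed/expected ratio well above that of bulk DNA and several short HpaII fragments is the kind of sequence the text describes as an HTF island.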

6.9.3 Genome mapping projects As a result of the technological advances in large-scale DNA sequencing indicated in Chapter 5 it is now possible not only to map the genomes of various species but also to determine their sequence reliably and rapidly. The genomes of hundreds of species have been determined and the number is increasing each month. Sequencing and mapping of the human genome was completed ahead of schedule and has provided many new insights into gene function and gene regulation. It was also a collaborative effort that engaged many scientific research groups around the world and has given rise to many scientific, technical, financial and ethical debates. One interesting issue is the sequencing of the whole genome in relation to the coding sequences. Much of the human genome appears to be non-coding and composed of repetitive sequences.

Table 6.7 Techniques used to determine putative gene-encoding sequences

Identification method                                                          Main details
Zoo blotting (cross-hybridisation)                                             Evolutionary conservation of DNA sequences that suggests functional significance
Homology searching                                                             Gene database searching for gene family-related sequences
Identification of CpG islands                                                  Regions of hypomethylated CpG frequently found 5′ to genes in vertebrate animals
Identification of open reading frames (ORFs), promoters, splice sites and RBS  DNA sequences scanned for consensus sequences by computer
Northern blot hybridisation                                                    mRNA detection by binding to labelled gene probes
Exon trapping technique                                                        Artificial RNA splicing assay for exon identification
Expressed sequence tags (ESTs)                                                 cDNAs amplified by PCR that represent part of a gene

Notes: RBS, ribosome binding site; cDNA, complementary DNA.
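As an illustration of the computer scanning listed in Table 6.7, the Python sketch below searches the three forward reading frames of a DNA sequence for open reading frames, defined here simply as an ATG followed in frame by a stop codon. The minimum length and the test sequence are arbitrary assumptions; real gene-prediction programs also scan the reverse strand and weigh promoters, splice sites and other signals.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=3):
    """Return (start, end, frame) for each ORF on the forward strand.

    `start` is the index of the A of ATG, `end` is one past the stop codon,
    and `min_codons` counts the codons before the stop (ATG included).
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # look for the next ORF in this frame
    return orfs


sequence = "CCATGGCTAAATTTTGACC"  # hypothetical fragment
for start, end, frame in find_orfs(sequence):
    print(frame, start, end, sequence[start:end])
```

Raising `min_codons` is the simplest way of discarding the short ORFs that arise by chance in any long sequence.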

Estimates indicate that as little as 10% of the genome appears to encode enzymes and proteins. Current estimates equate this to approximately 20 000 genes which are important for human cellular development and maintenance. However it is the understanding of the complete function of many of the genes and their variants coupled with their interaction that now provides a major challenge. It also points to the fact that there is an extensive use of alternative splicing where exons are essentially mixed and matched to form different mRNA and thus different proteins. The study further aims to understand and possibly provide the eventual means of treating some of the 4000 genetic diseases in addition to other diseases whose inheritance is multifactorial. In this respect there are a number of specific genome projects such as the Cancer Genome Anatomy Project (CGAP) which aims to understand the part certain mutations play in the development of tumours.

6.10 PHARMACOGENOMICS As a result of the developments in genomics new methods of providing targeted drug treatment are beginning to be developed. This area is linked to the proposal that it is possible to identify those people who react in a specific way to drug treatment by identifying their genetic make-up. In particular SNPs may provide a key marker of potential disease development and reaction to a particular treatment. A simple example that has been known for some time is the reaction to a drug used to treat a particular type of childhood leukaemia. Successful treatment of the majority of patients may be achieved with 6-mercaptopurine. A number of patients do not respond well, and in some cases it may even be fatal to administer this drug. This is now known to be due to a mutation in the gene encoding the enzyme that metabolises the drug. Thus, it is possible to analyse patient DNA prior to administration of a drug to determine what the likely response will be. The technology to deduce a patient’s genotype is already developed and indicated in Section 6.8.7. It is also now possible to analyse SNPs, which may also correlate with certain disease processes, in a microarray type format. This opens up the possibility of assigning a pharmacogenetic profile at birth, in much the same way as blood typing, for later treatment. A further possibility is the determination of likely susceptibility to a disease based on genetic information. A number of companies including the Icelandic genetics company deCODE are able to provide personal genetic information based on modelling and analysis of disease genes in large population studies for certain conditions such as diabetes.

6.11 MOLECULAR BIOTECHNOLOGY AND APPLICATIONS It is a relatively short period of time since the early 1970s when the first recombinant DNA experiments were carried out. However, huge strides have been made not only in the development of molecular biology techniques but also in their practical application. The molecular basis of disease and the new areas of genetic analysis and gene therapy hold great promise. In the past medical science relied on the measurement of protein and enzyme markers which reflected disease states. It is possible now not only to detect such abnormalities at an earlier stage using mRNA techniques but also in some cases to predict such states using genome analysis. The complete mapping and sequencing of the human genome and the development of techniques such as DNA microchips will certainly accelerate such events. Perhaps even more difficult is the elucidation of diseases that are multifactorial and involve a significant contribution from environmental factors. One of the best-studied examples of this type of disease is cancer. Molecular genetic analysis has allowed a discrete set of cellular genes, termed oncogenes, to be defined which play key roles in such events. These genes and their proteins are also major points in the cell cycle and are intimately involved in cell regulation. A number of these are indicated in Table 6.8. In a number of cancers well-defined molecular events have been correlated with mutations in these oncogenes and therefore in the corresponding protein. It is already possible to screen and predict the fate of some disease processes at an early stage, a point which itself raises significant ethical dilemmas. In addition to understanding cellular processes both in normal and disease states great promise is also held in drug discovery and molecular gene therapy. A number of genetically engineered therapeutic proteins and enzymes have been developed and are already having an impact on disease management. In addition the correction of disorders at the gene level (gene therapy) is also under way and perhaps is one of the most startling applications of molecular biology to date. A number of these developments are indicated in Table 6.9.

Table 6.8 General classification of oncogenes and their cellular and biochemical functions

Oncogene                               Example           Main details
G-proteins                             H-, K- and N-ras  GTP-binding protein/GTPase
Growth factors                         sis, int-2, hst   β-chain of platelet-derived growth factor (PDGF)
Growth factor receptors                erbB              Epidermal growth factor receptor (EGFR)
                                       fms               Colony-stimulating factor-1 receptor
Protein kinases                        abl, src          Protein tyrosine kinases
                                       mos, ras          Protein serine kinases
Nucleus-located transcription factors  myc               DNA-binding protein
                                       myb               DNA-binding protein
                                       jun, fos          DNA-binding protein

Table 6.9 A number of selected examples of targets for gene therapy

Disorder                            Defect                          Gene target                                             Target cell
Emphysema                           Deficiency (α1-AT)              α1-Antitrypsin (α1-AT)                                  Liver cells
Gaucher disease (storage disorder)  GC deficiency                   Glucocerebrosidase                                      GC fibroblasts
Haemoglobinopathies                 Thalassaemia                    β-Globin                                                Fibroblasts
Lesch–Nyhan syndrome                Metabolic deficiency            Hypoxanthine guanine phosphoribosyl transferase (HPRT)  HPRT cells
Immune system disorder              Adenosine deaminase deficiency  Adenosine deaminase (ADA)                               T and B cells

Table 6.10 Current selected plant/crops modified by genetic manipulation

Crop or plant           Genetic modification
Canola (oil seed rape)  Insect resistance, seed oil modification
Maize                   Herbicide tolerance, resistance to insects
Rice                    Modified seed storage protein, insect resistance
Soya bean               Tolerance to herbicide, modified seed storage protein
Tomato                  Modified ripening, resistance to insects and viruses
Sunflower               Modified seed storage protein


The production of modified crops and animals for farming and as producers of important therapeutic proteins is also one of the most exciting developments of molecular biology. This has allowed the production of modified crops, improving their resistance to environmental factors and their stability (Table 6.10). The production of transgenic animals also holds great promise for improved livestock quality, low-cost production of pharmaceuticals and disease-free or disease-resistant strains. In the future this may overcome such factors as contamination with agents such as BSE. There is no doubt that improved methods of producing livestock by whole-animal cloning will also be a major benefit. All of these developments do however require debate and the many ethical considerations that arise from them require careful consideration.

6.12 SUGGESTIONS FOR FURTHER READING

Augen, J. (2005). Bioinformatics in the Post-Genomic Era. Reading, MA: Addison-Wesley.
Brooker, R. J. (2005). Genetics Analysis and Principles, 2nd edn. McGraw-Hill.
Brown, T. A. (2006). Gene Cloning and DNA Analysis. Oxford, UK: Wiley–Blackwell.
Primrose, S. B. and Twyman, R. (2006). Principles of Gene Manipulation and Genomics. Oxford, UK: Wiley–Blackwell.
Strachan, T. and Read, A. P. (2004). Human Molecular Genetics, 3rd edn. Oxford, UK: Bios.
Walker, J. M. and Rapley, R. (2008). Molecular Biomethods Handbook, 2nd edn. Totowa, NJ: Humana Press.
Watson, J. D., Caudy, A. A., Myers, R. M. and Witkowski, J. A. (2007). Recombinant DNA: Genes and Genomes. San Francisco, CA: W. H. Freeman.

7 Immunochemical techniques R. BURNS

7.1 Introduction
7.2 Making antibodies
7.3 Immunoassay formats
7.4 Immuno microscopy
7.5 Lateral flow devices
7.6 Epitope mapping
7.7 Immunoblotting
7.8 Fluorescent-activated cell sorting (FACS)
7.9 Cell and tissue staining techniques
7.10 Immunocapture polymerase chain reaction (PCR)
7.11 Immunoaffinity chromatography (IAC)
7.12 Antibody-based biosensors
7.13 Therapeutic antibodies
7.14 The future uses of antibody technology
7.15 Suggestions for further reading

7.1 INTRODUCTION

The immune system of mammals has evolved over millions of years and provides an incredibly elegant protection system which is capable of responding to infective challenges as they arise. The system is fluid-based and both the cells of immunity and their products are transported throughout the body, primarily in the blood and secondarily through fluid within the tissues and organs themselves. All areas of the body are protected by immunity apart from the central nervous system including the brain and eyes. There are several cell types involved in immune responses, each with a role to play and each controlled by chemical mediators known as cytokines. This control is essential as the immune system is such a powerful tool it needs careful management to ensure its effective operation. Both over- and under-activity could have fatal consequences.

All vertebrates have advanced immune systems which show the similarities that you would expect from our common evolutionary past. The more advanced the
vertebrate the more complex the immune system. Fish and amphibians have fairly rudimentary immunity with the most sophisticated being found in mammals. The immune system is broadly additive; more complex animals have elements analogous to those found in primitive species but have extra features as well. For the purposes of this chapter we will focus on the mammalian immune system although the use of birds for antibody production will be discussed in Section 7.1.2.

Immunity is monitored, delivered and controlled by specialised cells all derived from stem cells in the bone marrow. There are motile macrophages which move around the body removing debris and foreign materials, and two lineages of lymphocyte, B and T, which provide immediate killing potential but also provide the mechanism for the production of antibodies. There are also assorted other cells whose function is to rush to areas of the body where a breach of security has occurred and deliver potent chemicals capable of sterilising and neutralising any foreign bodies.

For mammalian immunity to function effectively it is vital that the cells of the system can recognise the difference between self and non-self. There are three ways in which this is achieved. Firstly, mammals have a pre-programmed ability to recognise and immediately act against substances derived from fungal and bacterial microorganisms. This is mediated through a series of biological chemicals known as the complement system which are capable of adhering to and killing bacteria, fungi and some viruses. Secondly, the immune system is capable of recognising when a substance is close to but not quite the same as self. This is a response based on ‘generic’ circulating antibodies which are able to discriminate between self and non-self. Lastly, every individual has a unique ‘signature’ which is caused by a pattern of molecules on every cell surface. Cells of the immune system read this unique code and any cells differing from the authorised version are targeted and destroyed through a T-cell-mediated response. These three systems do not operate in isolation; they form a cohesive network of surveillance in which all of the cell types co-operate to provide the most appropriate response to any breach of security.

In mammals the first line of defence against attack is the skin and any breaches of it are responded to by the cells of the immune system even though no foreign material is present. This response is mediated by cell messengers known as cytokines which can be released from damaged tissue or cells of the immune system near to the site of injury.

For the purposes of this chapter it is the antibody response in mammals that will be focused on, as antibodies are the molecules that we are able to harness for our uses where a specific protein sequence or molecular structure has to be identified. As previously mentioned it is impossible to discuss one area of cellular immunity in isolation and so reference will be made to how the rest of the immune system contributes to the manufacture of antibodies by mammals. An antigen is a substance capable of causing an immune response leading to the production of antibodies, and antigens are also the targets to which antibodies will bind. Antibodies are antigen specific and will only bind to the antigen that initiated their production.


7.1.1 Development of the immune system

Mammalian embryos develop an immune system before birth which is capable of providing the newborn with immediate protection. Additional defences are acquired from maternal milk and this covers the period during which the juvenile immune system matures to deal with the requirements of the organism after weaning takes place. For an immune system to function effectively it must be organic in its ability to react to situations as they arise and the mammalian system has an extremely elegant way of dealing with this.

The cells of the mammalian immune system are descended from distinct lineages derived initially from stem cells, and those producing antibodies are known as B cells (also called B lymphocytes). These cells have the ability to produce antibodies which recognise specific molecular shapes. Cells of the immune system known as macrophages, dendritic cells and other antigen-presenting cells (APCs) have the ability to recognise ‘foreign’ substances (antigens) within the body and will attack and digest them when encountered. The majority of antigens encountered by the body are from viruses, bacteria, parasites and fungi, all of which may infect an individual. All of these organisms have proteins and other substances that will be antigenic (behave as an antigen) when encountered by the immune system. Organisms that infect or invade the body are known as pathogens and will have many antigens within their structures.

The presenting cells process antigens into small fragments and present them to the B cells. The fragments contain epitopes which are typically about 15 amino acid residues in size. This corresponds to the size that the antibody binding site can bind to. After ingesting the antigen fragments the B cells recruit ‘help’ in the form of cytokines from T cells which stimulates cell division and secretion. Each B cell that was capable of binding an antigen fragment and has ingested it will then model an antibody on the shape of an epitope and start to secrete it into the blood.

During embryonic development the immune system has to learn what is self and what is foreign. Failure to do this would lead to self-destruction or would lead to an inability to mount an immune response to foreign substances. During development this is achieved by selective clonal deletion (see Fig. 7.1) of self-recognising B cells. Early in the development of the immune system, B cell lineages randomly reassort the antibody-creating genes to produce a ‘starter pack’ of B cells that will respond to a huge number of molecular shapes. These cells have these randomly produced antibodies bound to their surface ready to bind should an antigen fit the antibody-binding site. These provide crude but instant protection to a large number of foreign substances immediately after birth. They also are the basis for the B cells that will provide protection for the rest of the animal’s life.

However, within the population of randomly produced B cells are a number which will be responsive to self-antigens, which are extremely dangerous as they could lead to the destruction of parts of the animal’s body. Embryos are derived exclusively from cells derived from the fusion of egg and sperm. There are no cells derived from the mother within the embryonic sac in the uterus and so everything can be regarded as being immunologically part of ‘self’. Any B cells that start to divide within the embryo prior to birth are responding to ‘self’ antigens and are destroyed as potentially dangerous. This selective clonal deletion is

Fig. 7.1 Clonal deletion. (Panel labels: self antigens bound by B cells, cells die by apoptosis; self antigens not bound by B cells, quiescent B cell waiting for activation by foreign antigen.)

fundamental to the development of the immune system and without it the organism could not continue to develop. The remainder of the B cells that have not undergone cell division will only recognise non-self antigens and are retained within the bone marrow of the animal as a quiescent cell population waiting for stimulation from passing stimulated antigen-presenting cells.

Stimulation of B cells requires the presence of both macrophages and T lymphocytes. T cells are descended from an alternative lymphocyte lineage to the B cells and are responsible for ‘helping’ and ‘suppressing’ the immune response. T cells also undergo clonal selection during development to ensure that they do not recognise self antigens. In addition they are positively selected to ensure that they do recognise proteins of the major histocompatibility complex (MHC) found on cell surfaces. The balance of appropriate immune response is governed by the interplay of T and B lymphocytes along with macrophages and other antigen-presenting cells to ensure that an individual is protected but not endangered by inappropriate responses.

After birth, exposure to foreign materials will cause an immediate response resulting in antibody production and secretion by B cells. The antibody binds to the target and marks it as foreign and it is then removed by the body. Macrophages are responsible for much of the removal of foreign material which they ingest by phagocytosis (an uptake system that some cells use to transport particles from outside the surface membrane into the cell body). The material is then digested and exported to the cell surface as small fragments (antigens) which are then presented to passing B cells. Should a B cell carry an antibody that binds the antigen then it will take the antigen from the macrophages and this causes a number of intracellular changes known as B cell activation. B cell activation involves the recruitment of T cells which stimulates cell growth and metabolism. B cell activation may also occur without the presence of macrophages or other presenting cells when the lymphocyte is directly


exposed to antigens. The stimulation of the B cells leads to two major changes apart from antibody secretion. It leads to a larger population of cells being retained in the bone marrow that recognise the antigen. These are known as memory cells as they have the ability to recognise and rapidly respond should the antigen be encountered again. Binding antigen leads to cell division and antibody secretion but it also causes the cells to refine the quality of the antibody they produce. Avidity is the strength with which the antibody sticks to the antigen and affinity is the ‘fit’ of the antibody shape to the target. Both of these can be improved after the B cells have been stimulated but require more than one exposure to the antigen. The process is known as affinity maturation and is characterised by a change in antibody type from predominantly low-affinity pentameric (five molecules linked together) immunoglobulin M (IgM) to the high-affinity immunoglobulin G (IgG).

Other antibody types may be produced in specific tissues and in response to particular antigens. For example parasites in the intestines often induce high levels of IgE in the gut mucosa (innermost layer of the gut which secretes large amounts of mucus).

After several encounters with an antigen a background level of specific antibody will be found in the animal’s blood along with a population of memory cells capable of rapidly responding to its presence by initiating high levels of antibody secretion. This status is known as immune and is the basis of both artificial immunisations for protection against disease and also for the production of antibodies for both diagnostic and therapeutic use.
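The clonal deletion step described in this section can be pictured with a toy model. The sketch below is illustrative only and makes several simplifying assumptions that are not part of the text: each B cell carries a single receptor represented by an integer ‘shape’, the set of self shapes is chosen arbitrarily, and a cell counts as self-reactive only when its shape is exactly equal to a self shape. All function and variable names are hypothetical.

```python
import random


def make_repertoire(n_cells, n_shapes, rng):
    """Give each B cell one randomly chosen receptor 'shape' (an integer label)."""
    return [rng.randrange(n_shapes) for _ in range(n_cells)]


def clonal_deletion(repertoire, self_shapes):
    """Keep only the B cells whose receptor does not match any self shape."""
    return [shape for shape in repertoire if shape not in self_shapes]


rng = random.Random(0)            # fixed seed so the sketch is repeatable
self_shapes = set(range(100))     # assumption: shapes 0-99 stand for 'self'
repertoire = make_repertoire(10_000, 1_000, rng)
survivors = clonal_deletion(repertoire, self_shapes)

print(len(repertoire) - len(survivors), "self-reactive cells deleted")
print(len(survivors), "quiescent cells retained")
```

The surviving list plays the part of the quiescent population that waits for foreign antigen; the model says nothing about activation, memory cells or affinity maturation.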

7.1.2 Harnessing the immune system for antibody production

There are two major classes of antibodies used in immunochemistry: these are polyclonal and monoclonal. Polyclonal antibodies are produced in animals by injecting them with antigens. They are derived from the animal blood serum and their name means that they have been produced by many clones. This refers to the fact that the B cells that have made them will be producing antibodies to many different epitopes on the antigen and will involve the secretion of antibody by many B cell clones. Polyclonal antibodies are essentially a population of antibody molecules contributed by many B cell clones.

Monoclonal antibodies are produced by animal cells artificially in tissue culture, and as their name suggests the antibody produced comes from a single cell clone. The cells that make them are known as hybridomas and are produced from the fusion of a cancer cell line and B cells. Monoclonal antibodies are epitope specific whereas polyclonal antibodies are antigen specific. This difference is fundamental to the way in which they can be used for both diagnostics and therapeutics.

Mammals will produce antibodies to practically any foreign material that is introduced into their bodies providing it has a molecular weight greater than 5 000 Da. The only restriction to this is antigens that are closely related to substances found in the animal itself. Many mammalian proteins and other biochemical substances are highly conserved and are antigenically very similar in many species. This can lead to problems in producing antibodies for diagnostic and therapeutic use. The immune system is incapable of mounting a response to ‘self’ as discussed earlier and because of this, some antigens may not be able to produce an antibody response


in some species. Providing that the antigen is large enough and that it does not resemble proteins in the host animal then antibodies can be produced to a huge number of substances which can be used in all branches of diagnostics and therapeutics. There are three types of antibodies that can be produced: these are polyclonal, monoclonal and recombinant. Each of these antibody types has advantages but also limitations and should be viewed as complementary to each other as each has specific areas where they are particularly useful. Polyclonal antibodies are produced in a number of animal species. Antibodies are generated by immunising the host with the substance of interest usually three or four times. Blood is collected on a number of occasions and the antibody fraction purified from the blood serum. The exception to this is chicken polyclonal antibodies which are harvested from eggs. Generally, larger animals are used since antibody is harvested from the blood of the animal and bigger volumes can be obtained from larger species. Historically, the first antibodies produced artificially for diagnostic purposes were polyclonal. They are the cheapest of antibodies to produce and have many uses in diagnostics. They have limited use in therapeutics, however, as there are problems in that they themselves can be antigenic when injected into other animals. There are exceptions to this and neutralising antibodies to snake venom and prophylactic (reducing risk of infection) antiviral injections fall into this category. Polyclonal antibodies are cheap to produce, robust but less specific than other antibodies and will have variable qualities depending on the batch and specific donor animal. Monoclonal antibodies are secreted by mammalian cells grown in synthetic medium in tissue culture. The cells that produce them are known as hybridomas and are usually derived from donor mouse or rat lymphocytes. 
Human monoclonal antibodies are also available but they are produced by different methodologies to the rodent ones. The murine system was first described in 1975 when Köhler and Milstein published their work. Monoclonal antibodies have radically altered the possible uses for antibodies in both diagnostics and therapeutics. The basis of the technology is the creation of the hybridoma by fusing antibody-secreting B lymphocytes from a donor animal to a tumour cell line. B lymphocytes have a limited lifespan in tissue culture but the hybridoma has immortality conferred by the tumour parent and continues to produce antibody. Each hybridoma is derived from a single tumour cell and a single lymphocyte and this has to be ensured by cloning. Cell cloning is the process where single cells are grown into colonies, in isolation from each other so that they can be assessed and the best chosen for future development. Once cloned, the cell lines are reasonably stable and can be used to produce large quantities of antibody which they secrete into the tissue culture medium that they are grown in. The antibody they produce has the qualities that the parent lymphocyte had and it is this uniqueness that makes monoclonal antibodies so useful.

During immunisation the B cells are presented with antigen fragments by macrophages and other antigen-presenting cells and each cell then produces a specific antibody to the fragment it has been presented with. The specific site that the antibody recognises is known as an epitope which is approximately 15 amino acids in size. There are thousands of potential epitopes on the antigen. The cell fusion process generates many


hundreds of hybridoma clones, each making an individual antibody. The most important part of making hybridomas is the screening process that is used to select those of value. Monoclonal antibodies are epitope specific and so it is important that the screening process takes this into account to ensure that antibodies selected have the correct qualities needed for the final intended use. They can be used for human and veterinary therapeutics although they are antigenic if used unmodified. Monoclonal antibodies can be processed to modify antigenicity to make them more useful in therapeutics. Mouse hybridomas can also be engineered so that the antibodies that they make have human sequences in them. These humanised antibodies have been used very successfully for treating a range of human conditions including breast cancer, lymphoma and the rejection symptoms after organ transplantation. Monoclonal antibodies are more expensive to produce than their polyclonal counterparts but have qualities that can make them more valuable. They are highly specific and reasonably robust but may be less avid than polyclonal antibodies. They are produced from established cell lines in tissue culture and should show little in the way of batch variation. Recombinant antibodies are produced by molecular methodologies and are expressed in a number of systems, both prokaryotic and eukaryotic. Attempts have also been made to express antibodies in plants and this has had some success. The idea of producing antibody in crop plant species such as potato is very attractive as the costs of growing are negligible and the amounts of antibody produced could be very large. Two basic methods can be used to produce recombinant antibodies. Existing DNA libraries can be used to produce bacteriophage expressing antibody fragments on their surface. Useful antibodies can be identified by assay and the bacteriophage producing it then used to transfect the antibody DNA into a prokaryotic host cell type. 
The antibody can then be produced in culture by the recombinant cells. The antibodies produced are monoclonal but do not have the full structure of those expressed by animals or cell lines derived from them. They are less robust and as they are much smaller than native antibodies it may not be possible to modify them without losing binding function. The great advantage of using this system is the speed with which antibodies can be generated, generally in a matter of weeks. Typically the timescale for producing monoclonal antibodies from cell fusions is about 6 months. Antibodies can also be generated from donor lymphocyte (B cell) DNA. The highest concentration of B cells is found in the spleen after immunisation and so this is the tissue usually used for DNA extraction. The antibody-coding genes are then selectively amplified by polymerase chain reaction (PCR) and then transfected (inserted into DNA) into a eukaryotic cell line. Usually a resistance gene is co-transfected so that only recombinant cells containing antibody genes will grow in culture. The cells chosen for this work are often those most easily grown in culture and may be rodent or other mammalian lines. Chinese hamster ovary (CHO) cells are often used for this and have become the industry standard amongst biotechnology companies. Yeasts, filamentous algae and insect cells have all also been used as recipients for antibody genes with varying degrees of success.
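As a rough illustration of why a single antigen offers so many potential epitopes, the sketch below counts the overlapping stretches of 15 residues in a protein sequence. It assumes, purely for simplicity, that an epitope is a contiguous run of residues, so it ignores epitopes formed by folding; the sequence is invented and the function name linear_windows is hypothetical.

```python
def linear_windows(sequence, size=15):
    """Return every contiguous stretch of `size` residues in `sequence`."""
    if size <= 0:
        raise ValueError("size must be positive")
    # A sequence of n residues contains n - size + 1 overlapping windows.
    return [sequence[i:i + size] for i in range(len(sequence) - size + 1)]


# Hypothetical 40-residue sequence, made up purely for illustration.
antigen = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRV"
windows = linear_windows(antigen)
print(len(antigen), "residues give", len(windows), "overlapping 15-residue windows")
```

Because the count grows with sequence length, a large protein or a whole pathogen carrying many proteins quickly reaches very large numbers of candidate sites, which is why screening of hybridoma clones matters so much.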


7.1.3 Antibody structure and function

Antibodies as they are found in nature are all based on a Y-shaped molecule consisting of four polypeptide chains held together by disulphide bonds (see Fig. 7.2). There are two pairs of chains, known as heavy (H) and light (L); each member of the pair is identical. Functionally the base of the Y is known as the constant region and the tips of the arms are the variable region. The amino acid structure in the constant region is fairly fixed in an individual but varies between animal species. The amino acid structure in the variable region is composed of between 110 and 130 amino acids and it is variations in these that form the different binding sites of the antibodies. The ends of both the heavy and light chains are variable and the antigen-binding site is formed by a combination of the two. The variable part of the antibody contains two further areas, the framework and hypervariable regions. There are three hypervariable and four framework regions per binding site. The hypervariable regions are structurally supported by the framework regions and form the area of direct contact with the antigen.

Antibodies can be fragmented by enzymatic degradation and the subunits produced are sometimes used to describe portions of the antibody molecule. Treatment with the enzyme papain gives rise to three fragments: two antigen-binding fragments (Fab) and one constant fragment (Fc). The enzyme digests the molecule at the hinge region and the resulting Fab fragments retain their antigen-binding capability. The Fc fragment has no binding region and has no practical use. Fab fragments are sometimes prepared and used for some immunochemical applications. Their smaller size may mean that they can bind to antigens in certain situations where the larger native molecule would have difficulty binding.

Fig. 7.2 Immunoglobulin G. (Labels: antigen-binding site; variable regions VH and VL; Fv; Fab; hinge; constant regions CL, CH1, CH2 and CH3; Fc; heavy (H) and light (L) chains.)

Fig. 7.3 Immunoglobulin classes. (Panels: immunoglobulin G; immunoglobulin A; immunoglobulin E on mast cell; immunoglobulin M; immunoglobulin D on B cell.)

Treatment with the enzyme pepsin gives rise to one double antigen-binding fragment (F(ab′)2) and multiple Fc fragments. The enzyme digests the Fc until it reaches the hinge region of the antibody which is protected by a disulphide bond. F(ab′)2 fragments have two binding sites and can be used in place of native antibody molecules.

There are five major classes of antibody molecule, also known as immunoglobulins (Ig); IgG is the commonest and it is characterised by its Y-shaped structure. The other classes of antibody are immunoglobulins M, A, D and E (Fig. 7.3).

Immunoglobulin M (IgM) is produced early in immune responses. It is produced by immature and newly activated B lymphocytes that have been exposed to an antigen for the first time. It is found on the surface of B cells frequently in association with IgD. Structurally it is formed from five IgG-like units in a ring complexed by a J chain. It may also be found as a hexamer without the J chain. The molecule tends to have low affinity and poor avidity to antigen. It is much less specific and will react to a range of antigens without immunisation having taken place. It is known as ‘natural antibody’ as a result. Its production rises dramatically after first exposure to antigen and is characteristic of the primary immune response. It is generally only found in serum as its large size prevents it from crossing tissue boundaries. The pentameric form is particularly useful for complexing antigen such as bacteria into aggregates either for disposal or for further processing by the immune system. Cells secreting IgM can progress to IgG production, in time, if the animal is challenged again by the antigen. This progression to IgG production is known as affinity maturation and requires maturation of the cells to memory cell status.
After several encounters with an antigen a background level of specific antibody will be found in the animal’s blood along with a population of B memory cells capable of rapidly responding to its presence by initiating high levels of antibody secretion. This status is known as immune and is the basis of both artificial immunisations for protection against disease and also for the production of antibodies for both diagnostic and therapeutic use. A status of hyperimmunity may be reached after repeated exposure to an antigen leading to extremely high levels of circulating


antibody. Hyperimmunity carries risk, as additional exposure to antigen can lead to anaphylactic shock due to the overwhelmingly large immune response. Conversely, a total loss of immune response to an antigen can occur after repeated immunisation as a state of immune tolerance to the antigen is reached. Acquired immune tolerance is a response to overstimulation by an antigen and is characterised by a loss of circulating B cells reactive to the antigen and also by a loss of T cell response to the antigen. This can be used therapeutically to protect individuals against allergic responses. Immunoglobulin A (IgA) is a dimeric form of immunoglobulin essentially with two IgG molecules placed end to end with the binding sites facing outwards. They are complexed with a J chain. It is predominantly found in secretions from mucosa and is resistant to enzyme degradation due to its structure. It is primarily concerned with protection of the mucosal surface of the mouth, nose, eyes, digestive tract and genitourinary system. It is produced by B cells resident in the mucosa and is directly secreted into the fluids associated with the individual tissues. It is of little use in immunochemistry as it cannot be purified easily and is prone to spontaneous aggregation. Occasionally a hybridoma is derived secreting antibodies with this isotype and it may be that this is the only source of a rare antibody. In this case an indirect assay may be developed using the tissue culture supernatant derived from the hybridoma along with a specific anti-IgA antibody–enzyme conjugate. Immunoglobulin D (IgD) is an antibody resembling IgG and is found on the surface of immature B cells along with IgM. It is a cellular marker which indicates that an immature B cell is ready to mount an immune response and may be responsible for the migration of the cells from the spleen into the blood. It is used by the macrophages to identify cells to which they can present antigen fragments. 
Immunoglobulin E (IgE) also resembles IgG structurally and is produced in response to allergens and parasites. It is secreted by B lymphocytes and attaches itself to the surface of specialised cells known as mast cells. Exposure to allergen and its subsequent binding to the IgE molecules on the cell surface cause the antibodies to cross-link and move together in the cell membrane. This cross-linking causes the cell to degranulate, releasing histamine. Histamine is responsible for the symptoms suffered by individuals as a result of exposure to allergens.

Immunoglobulin G and to a lesser extent IgM are the only two antibodies that are of practical use in immunochemistry. IgG is the antibody of choice used for development of assays as it is easily purified from serum and tissue culture medium. It is very robust and can be modified by labelling with marker molecules (see Section 7.4) without losing function. It can be stored for extended periods of time at 4 °C or lower.

Occasionally antigens will not generate IgG responses in vivo and instead IgM is produced. This is caused by the antigen being unable to activate the B cells fully and as a result no memory cells being produced. Such antigens are often highly glycosylated and it is the large number of sugar residues that block the full activation of the B cells. IgM can be used for assay development but is more difficult to work with. IgM molecules tend to be unstable and are difficult to label without losing function as the binding sites become blocked by the proximity of each other. This is known as steric hindrance. They can be used directly from cell tissue culture supernatant in assays with an appropriate secondary anti-IgM enzyme conjugate.
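The relationship between immunoglobulin class and the number of antigen-binding sites can be tallied in a few lines. The sketch below is illustrative only: it assumes two binding sites per Y-shaped unit, takes the number of units per molecule from the descriptions in this section, and uses hypothetical names (UNITS_PER_MOLECULE, binding_sites).

```python
# Number of Y-shaped units per molecule, following the descriptions in this section.
UNITS_PER_MOLECULE = {
    "IgG": 1,
    "IgD": 1,
    "IgE": 1,
    "IgA": 2,             # dimeric form
    "IgM (pentamer)": 5,
    "IgM (hexamer)": 6,
}

# Assumption: one binding site at the tip of each arm of the Y.
SITES_PER_UNIT = 2


def binding_sites(ig_class):
    """Theoretical maximum number of antigen-binding sites for a class."""
    return UNITS_PER_MOLECULE[ig_class] * SITES_PER_UNIT


for ig_class in UNITS_PER_MOLECULE:
    print(ig_class, binding_sites(ig_class))
```

These figures are upper limits. As noted above for IgM, neighbouring sites can block one another, so the number of sites available in practice may be lower.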


7.2 MAKING ANTIBODIES

All methods used in immunochemistry rely on the antibody molecule or derivatives of it. Antibodies can be made in various ways and the choice of which method to use is very much dependent on the final assay format. For an antibody to be of use it has to have a defined specificity, affinity and avidity as these are the qualities that determine its usefulness in the method to be used. There are considerable cost differences in producing the various antibody types and it is important to remember that the most expensive product is not always the best.

7.2.1 Polyclonal antibody production Polyclonal antibodies are raised in appropriate donor animals, generally rabbits for smaller amounts and sheep or goats for larger quantities. Occasionally rats or mice can be used for small research quantities of antibody. It is important that animals are sourced from reputable suppliers and that they are housed and managed according to domestic welfare legislation. Usually antigens are mixed with an appropriate adjuvant prior to immunising the animals. Adjuvants are substances which increase the immunogenicity of the antigen and are used to reduce the amount of antigen required as well as stimulate specific immunity to it. Adjuvants may be chemicals such as detergents and oils or complex proprietary products containing bacterial cell walls or preparations of them. Preimmune blood samples are taken to provide baseline IgG levels (Fig. 7.4). Immunisations are spaced at intervals to maximise antibody production usually at least 4–6 weeks apart although the first two may be given within 14 days. Blood samples are taken 10 days after the immunisation programme is complete and the serum tested for specific activity to antigen by a method such enzyme-linked immunosorbent assay (ELISA) (see Section 7.3). Usually a range of doubling serum dilutions are made (1/100–1/12 800) and tested against the antigen. Serum from a satisfactory course of immunisations will detect antigen at 1/6400 dilution indicating high levels of circulating antibody. Once a high level of circulating antibody is detected in test bleeds then donations can be taken. Animal welfare legislation governs permissible amounts and frequency of bleeds. Donations can be taken until the antibody titre begins to drop and if necessary the animal can be immunised again and a second round of donations taken. Blood donations are allowed to clot and the serum collected. Individual bleeds may be kept separate or pooled to provide a larger volume of standard product. 
Serum can be stored at 4 °C, or at lower temperatures for longer periods. It is also possible to produce antibodies in chicken eggs. Avian immunoglobulin is known as immunoglobulin Y (IgY) and chickens secrete it into eggs to provide protection for the developing embryo. This can be utilised for effective polyclonal antibody production. The chickens are immunised three or four times with the antigen and the immune status monitored by test bleeds. Eggs are collected and can yield up to 50 mg antibody per yolk. The antibody has to be purified from the egg yolks prior to use and a number of proprietary kits can be used to do this. Occasionally antigens that give a poor response in mammals can give much higher yields in chickens.
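
The doubling dilution series used to test bleeds can be written out explicitly. The short Python sketch below is illustrative only: the function names, the absorbance readings and the cut-off of 0.2 are assumptions made for the example and do not come from any particular assay. It lists the reciprocal dilutions from 1/100 to 1/12 800 and reports the end-point titre as the highest dilution still giving a reading above the cut-off.

```python
def doubling_dilutions(first=100, last=12800):
    """Reciprocal dilutions 1/first, 1/(2 x first), ... up to and including 1/last."""
    series = []
    reciprocal = first
    while reciprocal <= last:
        series.append(reciprocal)
        reciprocal *= 2
    return series


def endpoint_titre(readings, cutoff):
    """Highest reciprocal dilution whose reading exceeds the cut-off, or None."""
    positives = [dilution for dilution, absorbance in readings.items()
                 if absorbance > cutoff]
    return max(positives) if positives else None


series = doubling_dilutions()
# Hypothetical ELISA absorbance readings, one per dilution (illustration only).
absorbances = [2.1, 1.9, 1.6, 1.2, 0.9, 0.6, 0.35, 0.12]
readings = dict(zip(series, absorbances))
titre = endpoint_titre(readings, cutoff=0.2)
print(series)
print("End-point titre: 1/%d" % titre)
```

With these made-up readings the serum is still positive at 1/6400 but not at 1/12 800, which corresponds to the satisfactory response described above.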

Fig. 7.4 Immunisation schedule. Flowchart: pre-immunisation bleed, tested by ELISA to establish basic immunity; immunisation 1, day 0; immunisation 2, day 14; immunisation 3, day 44; test bleed, day 55; test quality of serum by ELISA; immunisation 4, day 65; if poor, immunise again; if good, collect donations.

Small quantities of very pure polyclonal antibodies can be produced in rats and mice in ascitic fluid. Ascites is a mammalian response to a tumour within the peritoneum (cavity containing the intestines). Fluid similar to plasma is secreted into the cavity of the animal and contains very high levels of the antibodies that the animal is currently secreting in its blood. Animals are immunised with the antigen of interest and once a high serum level is detected then ascitic fluid production is induced. Non-secretory myeloma cells such as NS-0 are introduced into the peritoneal cavity of the animal by injection and allowed to grow there. The presence of the tumour cells causes the animal to produce ascitic fluid which contains high levels of immunoglobulins to the original antigen. The fluid is removed by aspiration with a syringe and needle usually on three or four occasions over a month or so.

7.2.2 Monoclonal antibody production Mice are usually the donor animal of choice for monoclonal antibody production although rats and other rodent species may be used. They are cheap to buy and house, and easy to manage and handle. The limitation on using other species is the availability of a suitable tumour partner for performing fusions. Balb/C is the usual mouse strain used for monoclonal antibody production and most of the tumour cell lines used for fusion are derived from this mouse. Females are usually used as they can be housed together without too much aggression.

Fig. 7.5 Monoclonal antibody production. Flowchart: immunise mice; harvest spleen cells; grow and harvest myeloma cells; cell fusion with PEG; test hybridomas; discard negative cells; clone positive cells; establish cell banks; grow cells for antibody production.

Mice are immunised, usually three or four times over the course of 3–4 months, by the intraperitoneal route using antigen mixed with an appropriate adjuvant (Fig. 7.5). Test bleeds can be taken to monitor the immune status of the animals. Once the mice are sufficiently immune they are left for 2–3 months to ‘rest’. This is important as the cells that will be used for the hybridoma production are memory B cells and require the rest period to become quiescent. Mice are sacrificed and the spleens removed; a single spleen will provide sufficient cells for two or three cell fusions. Three days prior to cell fusion the partner cell line NS-0 is cultured to provide a log phase culture. If rat hybridomas are to be made then the fusion partner Y3 or its derivative Y0 can be used. Cell fusions can be carried out by a number of methods but one of the most commonly used is fusion by centrifugation in the presence of polyethylene glycol (PEG). Then 26 × 10⁶ cells of spleen and fusion partner are mixed together in a centrifuge tube. A quantity of PEG is added to solubilise the cell membranes and the fusion is carried out by gentle centrifugation. The PEG is removed from the cells by dilution with culture medium and the cells are placed into 96-well tissue culture plates at a cell density of 10 × 10⁵ per well. From experience, these cell numbers will produce only a single hybrid cell capable of growth in each well.

Fusion partners are required to have a defective enzyme pathway to allow selection after cell fusion. NS-0 lacks the enzyme hypoxanthine-guanine phosphoribosyl transferase (HGPRT), which prevents it from using a nucleoside salvage pathway when the primary pathway is disabled by the use of the antifolate drug aminopterin. The tissue culture additive HAT, which contains hypoxanthine, aminopterin and thymidine, is used to select for hybridomas after cell fusion. They inherit an intact nucleoside salvage pathway from the spleen cell parent which allows them to grow in the presence of aminopterin. Unfused NS-0 cells are unable to assimilate nucleosides and die after a few days. Unfused spleen cells are unable to divide more than a few times in tissue culture and will die after a few weeks. Two weeks after the cell fusion the only cells surviving in tissue culture are hybridomas.

The immunisation process ensures that many of the spleen cells that have fused will be secreting antibody to the antigen; however, this cannot be relied upon and rigorous screening is required to ensure that the hybridomas selected are secreting an antibody of interest. Screening is often carried out by ELISA but other antibody assessing methods may be used. It is important that hybridomas are assessed more than once as they can lose the ability to secrete antibody after a few cell divisions. This occurs as chromosomes are lost during division to return the hybridoma to its normal chromosome quota.

Once hybridomas have been selected they have to be cloned to ensure that they are stable. Cloning involves the derivation of cell colonies from individual cells grown isolated from each other. In limiting dilution cloning, a cell count is carried out and dilutions of cells in media made. The aim is to ensure that only one cell is present in each well of the tissue culture plate. The plates are incubated for 7 days and cell growth assessed after this time. Colonies derived from single cells are then tested for antibody production by ELISA. It is essential to clone cell lines to ensure that they are truly monoclonal. It is desirable that a cell line should exhibit 100% cloning efficiency in terms of antibody secretion but some cell lines are inherently unstable and will always produce a small number of non-secretory clones. Provided such cell lines are not subcultured excessively the problem may not be too great, although it is usual to reclone these lines regularly to ensure that cultures are never too far from an authenticated clone.
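
Limiting dilution relies on chance to place single cells in wells, so it can be useful to estimate how often a well showing growth really started from one cell. The sketch below assumes that cells distribute between wells according to a Poisson distribution and that every plated cell grows; both are simplifying assumptions, and the function names are purely illustrative.

```python
import math


def well_fractions(mean_cells_per_well):
    """Poisson fractions of wells receiving 0, exactly 1, or more than 1 cell."""
    m = mean_cells_per_well
    empty = math.exp(-m)
    single = m * math.exp(-m)
    multiple = 1.0 - empty - single
    return empty, single, multiple


def clonal_fraction_of_growing_wells(mean_cells_per_well):
    """Of the wells that received at least one cell, the fraction that received exactly one."""
    empty, single, _ = well_fractions(mean_cells_per_well)
    return single / (1.0 - empty)


# Compare a sparse, an intermediate and a dense seeding (mean cells per well).
for mean in (0.3, 1.0, 3.0):
    empty, single, multiple = well_fractions(mean)
    print(mean, round(empty, 3), round(single, 3), round(multiple, 3),
          round(clonal_fraction_of_growing_wells(mean), 3))
```

Under these assumptions, seeding at a mean of 0.3 cells per well leaves most wells empty, but roughly 86% of the wells that do show growth would be expected to derive from a single cell. Seeding at a mean of one cell per well lowers that figure to under 60%, which is one reason why recloning is usual.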
It is very important to know the antibody isotype of the hybridomas as discussed previously and a number of commercial kits are available to do this. Most are based on lateral flow technology which will be discussed later in this chapter. Once the isotype of the antibody is established and it is clonally stable then cultures can be grown to provide both cell banks and antibody for use in testing or for reagent development. Record-keeping is absolutely vital so that the pedigree of every cell line is known. It is also very important to be vigilant in handling and labelling flasks to prevent cross-contamination of cell lines. It is usual to name cell lines and use the clone and subclone number as part of the name. One such naming system used is: / . Other naming systems are used and it is up to the individual to find one that suits them best.

7.2.3 Freezing cells Cell lines are frozen to provide a source of inoculum for future cultures. Cells cannot be grown indefinitely in culture as the required incubator space would be impractical in most tissue culture laboratories. Additionally, although established cell lines should be stable it is known that long-term culture leads to cellular instability and the increased risk of cellular change. Cells stored at the low temperatures achieved using liquid nitrogen vapour are stable for many years and can be resuscitated successfully after decades. Cells are transferred into a specialist medium prior to freezing to protect them both as the temperature is lowered and as it is raised on thawing. Serum containing 10% dimethyl sulphoxide (DMSO) works well as a freezing medium although serum-free media can be used if required. Cells must be in perfect health and in log phase prior to freezing. A typical freezing should contain around 1 × 10⁶ cells and this can be assessed by performing a cell count using a counting chamber. A confluent 25 cm² tissue culture T flask will contain approximately this many cells and for many applications it may not be necessary to carry out a cell count. The cells are harvested from the flask by tapping to dislodge them and pelleted by centrifugation to remove the culture medium. The cells are then resuspended in 1.0 ml freezing medium chilled to 4 °C, placed into a cryogenic vial and transferred to a cell freezing container. The freezing container contains butan-1-ol which, when placed into a −70 °C freezer, controls the rate of freezing to 1 °C per minute. The gradual freezing is necessary so that as the ice forms within the cells it does so as a glass and not as crystals which would expand and damage the cell structure. The cells are left for a minimum of 24 h and a maximum of 72 h prior to transfer into cryogenic storage. Transfer to liquid nitrogen storage must be rapid to prevent thawing of the cells. It is imperative that the vials are permanently marked and that the storage locations within the cryogenic vessel are noted for future retrieval.
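
Where a cell count is carried out before freezing, the arithmetic is simple. The sketch below assumes an improved Neubauer-type counting chamber in which each large square holds 0.1 microlitre, so that one cell per square corresponds to 10⁴ cells per ml. The chamber geometry, the counts and the twofold dilution are assumptions chosen for illustration, not details given in the text.

```python
# Assumed chamber geometry: each large square is 1 mm x 1 mm x 0.1 mm deep,
# i.e. 0.1 microlitre, so one cell per square corresponds to 1e4 cells per ml.
CELLS_PER_ML_PER_COUNT = 1e4


def cells_per_ml(square_counts, dilution_factor=1.0):
    """Cell concentration from counts in several large squares of the chamber."""
    mean_count = sum(square_counts) / len(square_counts)
    return mean_count * dilution_factor * CELLS_PER_ML_PER_COUNT


def suspension_volume_ml(target_cells, concentration_per_ml):
    """Volume of suspension that contains the target number of cells."""
    return target_cells / concentration_per_ml


# Hypothetical counts from four large squares, sample diluted twofold before counting.
counts = [48, 52, 50, 50]
concentration = cells_per_ml(counts, dilution_factor=2)
volume = suspension_volume_ml(1e6, concentration)
print(concentration, volume)
```

In this example the suspension holds 1 × 10⁶ cells per ml, so 1 ml would be pelleted for each vial before resuspension in freezing medium.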

7.2.4 Cell banking Cell banks are established from known positive clones and are produced in a way that maximises reproducibility between frozen cell stocks and minimises the risk of cellular change (Fig. 7.6). A positive clone derived from a known positive clone is rapidly expanded in tissue culture until enough cells are present to produce 12 vials of frozen cells simultaneously. This is the master cell bank and is stored at −196 °C under liquid nitrogen vapour. The working cell bank is then derived from the master cell bank. One of the frozen vials from the master cell bank is thawed and rapidly grown until there are enough cells to produce 50 vials of frozen cells simultaneously. This is the working cell bank and it is also stored at −196 °C under liquid nitrogen vapour. This strategy ensures that all of the vials of the working cell bank are identical. All of the vials of the master cell bank are also identical and if a new working bank is required then it can be made from another vial from the master. Cell banks work well if managed correctly but record-keeping is vital for their operation. A cell bank derived in this way will provide 550 working vials (11 master vials each giving 50 working vials, with the twelfth master vial held back) before the process of deriving a new master cell bank is required. If a new master cell bank is required this is produced by thawing and cloning from the last master cell bank vial and selecting a positive clone for expansion.
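
The figure of 550 working vials follows directly from the bank sizes. As a minimal sketch (the function and parameter names are illustrative), the calculation keeps one master vial in reserve for deriving the next master cell bank:

```python
def working_vials(master_vials=12, vials_per_working_bank=50,
                  reserved_for_new_master=1):
    """Working vials obtainable before a new master cell bank must be derived."""
    if reserved_for_new_master > master_vials:
        raise ValueError("cannot reserve more master vials than exist")
    working_banks = master_vials - reserved_for_new_master
    return working_banks * vials_per_working_bank


total = working_vials()
print(total)
```

Changing the bank sizes in the call shows how quickly capacity scales; for example, a 20-vial master bank with the same working bank size would give 950 working vials.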

Fig. 7.6 Master and working cell banks. Flowchart: hybridoma cell line; clone cells; ELISA test, then choose one clone to expand; 12 simultaneous freezings for master cell bank; thaw last freezing and grow cells; 50 simultaneous freezings for working cell bank.

7.2.5 Antibodies to small molecules The immune system will recognise foreign proteins and peptides provided that they have a molecular weight (mw) greater than about 2 000 Da (although above 5 000 Da is optimal). The magnitude of the response increases with molecular weight. If an antibody is to be made to a molecule smaller than 2 000 Da then it has to be conjugated to a carrier molecule to effectively increase its size above the threshold for immune surveillance. These small molecules are known as haptens and may be peptides, organic molecules or other small chemicals. They are usually conjugated to a protein such as albumin, keyhole limpet haemocyanin or thyroglobulin and then used to immunise animals for antibody production (Fig. 7.7). If a polyclonal antibody is being made it is advisable to change the carrier protein at least once in the immunisation procedure as this favours more antibody being made to the hapten and less to each of the carrier proteins. If a monoclonal antibody is being made then the carrier protein can be the same throughout the immunisations. When screening hybridomas for monoclonal antibody production it is necessary to screen against the hapten and carrier separately. Any antibodies responding to both should be discarded as these will be recognising the junction between the hapten and carrier and will not recognise the native hapten.
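
The screening rule for hapten hybridomas can be expressed as a simple filter. In the sketch below the clone names, signal values and cut-off are all hypothetical, and the choice to keep only clones that are positive against the hapten and negative against the carrier is an interpretation of the rule described above.

```python
def hapten_specific_clones(readings, cutoff):
    """Return clones that react with the hapten but not with the carrier.

    readings maps clone name -> (hapten_signal, carrier_signal).
    """
    selected = []
    for clone, (hapten_signal, carrier_signal) in readings.items():
        binds_hapten = hapten_signal > cutoff
        binds_carrier = carrier_signal > cutoff
        if binds_hapten and not binds_carrier:
            selected.append(clone)
    return sorted(selected)


# Hypothetical ELISA signals for four clones (illustration only).
screen = {
    "1A3": (1.8, 0.1),    # hapten only: keep
    "2B7": (1.5, 1.4),    # both: discard
    "3C2": (0.1, 1.9),    # carrier only: discard
    "4D9": (0.05, 0.08),  # neither: discard
}
kept = hapten_specific_clones(screen, cutoff=0.3)
print(kept)
```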

Fig. 7.7 Making antibodies to haptens. Panels: hapten not recognised by immune system; hapten conjugated to carrier protein; hapten conjugate recognised by immune system; antibodies made that recognise conjugate and also hapten by itself.

7.2.6 Anti-idiotype antibodies The binding site of an anti-idiotype antibody is a copy of an epitope. They are made by deriving a primary monoclonal antibody to the epitope of interest, usually a cell membrane receptor or other important binding site. These primary antibodies are then themselves used as antigen to produce secondary antibodies, some of which will recognise the binding site on the primary antibody. These are the anti-idiotype antibodies and they have the unique quality that their binding site structurally resembles the original epitope. They themselves can be used as vaccines as the immune response raised to them will cross-react with the native original epitope. Some human cancers have cell surface receptors that are unique to them and these can be used as a target for antibody therapy. Anti-idiotype antibodies raised to the cell receptors are used to immunise the patient. The resulting antibodies made by the patient bind to the receptors on the tumour cells allowing the immune system to recognise and destroy the tumour. The method has had some success in the treatment of ovarian and bowel tumours.

7.2.7 Phage display for development of antibody fragments Bacteriophages, or phage as they are commonly known, are viruses that infect and replicate within bacteria. They can be engineered by molecular methods to express proteins, and provided the coding sequence is fused to the coat protein gene the foreign protein will be expressed on the virus surface. It is possible to isolate the variable (V) antibody coding genes from various sources and insert these into the phage, resulting in single-chain variable (scFv) antigen-binding fragments. Whole antibodies are too large and complex to be expressed by this system but the scFv fragments can be used for diagnostic purposes. The DNA used in this process may come from immunised mouse B cells or from libraries derived from naive mice (or other species). The V genes are cloned into the phage producing a library which is then assessed for specific activity. It is important to isolate clones that have the specific activity that is required and this can be done by immobilising the antigen onto a solid surface and then adding phage clones to the immobilised antigen. The clones that bind to the antigen are desirable and those that do not bind are washed away. This technique is known as panning, after the technique used by nineteenth-century gold prospectors who washed gravel from rivers using shallow pans that retained gold fragments. Once clones have been selected for antibody expression they can be multiplied in their host bacterium in liquid culture. The scFv fragments can be harvested using proprietary extraction kits and used to develop ELISA and other immunoassays.

7.2.8 Growing hybridomas for antibody production Cell growth and storage is carried out for the development of cell banks but hybridomas are primarily grown for their products, monoclonal antibodies. All monoclonal antibodies are secreted into the tissue culture medium in which the cells are growing. There are a number of ways that cells can be grown to maximise antibody yield, reduce media costs and simplify purification of the product from tissue culture medium. The simplest method for antibody production is static bulk cultures of cells growing in T flasks. T flasks are designed for tissue culture and have various media capacities and cell culture surface areas. For most applications a production run is between 250 ml and 1000 ml medium. Most cell lines produce between 4 and 40 mg of antibody per litre so the size of the production run is based on requirement. The cells from a working cell bank vial are thawed rapidly into 15 ml medium containing 10% foetal bovine serum and placed in an incubator at 37 °C supplemented with 5% CO₂. Once cell division has started, the flask sizes are increased using medium supplemented with 5% foetal bovine serum until the desired volume is reached. Once the working volume has been achieved the cells are left to divide until all nutrients are utilised and cell death occurs. Usually the timespan for this is around 10 days. The cell debris can then be removed by centrifugation and the antibody harvested from the tissue culture medium. For some applications the antibody can be used in this form without further processing.

Monoclonal antibodies can also be produced in ascitic fluid in mice. As described previously, cells can be grown in the peritoneal cavities of mice. Nude mice have no T cells and because of this have poor immune systems. They are often used for ascitic fluid production as they do not mount an immune response to the implanted cells.
The mice should not be immunised prior to use as it is important that the only antibody present in the ascitic fluid is derived from the implanted cells. Hybridoma cells are injected into the peritoneum of the mice and allowed to grow there. These cells are secretory and produce high levels of monoclonal antibody in the ascitic fluid. The fluid is harvested by aspiration with a syringe and needle.

A number of in vitro bioreactor systems have been developed to produce high yields of monoclonal antibody in small volumes of fluid, mimicking ascitic fluid production (Fig. 7.8). All of them rely on physically separating the cells from the culture medium by a semipermeable membrane which allows nutrient transfer but prevents monoclonal antibody from crossing. The culture medium can be changed to maximise cell growth and health, and fluid can be removed from around the cells to harvest antibody. Some are based on a rotating cylinder with a cell-growing compartment at one end separated from the media container by a membrane. Others have capillary systems formed from membrane running through the cell culture compartment and in these the medium is pumped through the cartridge to facilitate nutrient and gas exchange. These systems do produce high yields of antibody but can be problematic to set up and run. They are ideal where large quantities of monoclonal antibody are needed and space is at a premium. They are, however, prone to contamination by yeasts and great care must be exercised when handling them. Cells are grown in bioreactors for up to 6 weeks so the clone used must be stable and it is advisable to carry out studies on long-term culture prior to embarking on this form of culture. The major advantage of bioreactor culture is that the antibody is produced in high concentration without the presence of media components, making it easy to purify. Total quantities per bioreactor run may be several hundred milligrams to gram quantities.

Fig. 7.8 Bioreactors for antibody production. Two designs are shown: a rotating bioreactor utilising two compartments to separate growth medium from cells and antibodies (labels: harvesting port, media vessel, cell growth vessel, air filter); and a hollow fibre bioreactor utilising a capillary network to separate growth medium from cells and antibodies (labels: semipermeable membrane, harvesting port, media flow through capillary spaces, cell growth around capillaries).
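
The static T-flask figures quoted earlier in this section (runs of 250–1000 ml and 4–40 mg of antibody per litre) can be turned into a rough planning calculation. The sketch below is illustrative only; the function names and the example requirement of 10 mg from a line producing 20 mg per litre are assumptions.

```python
def expected_yield_mg(volume_ml, titre_mg_per_litre):
    """Antibody expected from a static culture of the given volume."""
    return volume_ml / 1000.0 * titre_mg_per_litre


def volume_needed_ml(target_mg, titre_mg_per_litre):
    """Culture volume required to obtain the target amount of antibody."""
    return target_mg / titre_mg_per_litre * 1000.0


low = expected_yield_mg(250, 4)      # smallest run, poorest producer
high = expected_yield_mg(1000, 40)   # largest run, best producer
needed = volume_needed_ml(10, 20)    # hypothetical requirement
print(low, high, needed)
```

On these figures a single static run spans roughly 1 to 40 mg of antibody, which puts the several hundred milligrams to gram quantities of a bioreactor run into perspective.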

7.2.9 Antibody purification The choice of method used for the purification of antibodies depends very much on the fluid that they are in. Antibody can be purified from serum by salting out with antichaotropic ions in the form of saturated ammonium sulphate. This preferentially precipitates the antibody fraction at around 60% saturation and provides a rapid method for IgG purification. This method does not work well in tissue culture supernatant as media components such as ferritin are co-precipitated. Ammonium sulphate precipitation may be used as a preparatory method prior to further chromatographic purification.
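
When the salt is added as a saturated solution, the volume needed to reach a given percentage saturation follows from a simple mixing balance. The sketch below assumes that volumes are additive and that the sample and the saturated solution are at the same temperature; the function name and the 10 ml example are illustrative.

```python
def saturated_volume_to_add(sample_ml, target_fraction, initial_fraction=0.0):
    """Volume of saturated ammonium sulphate to add to reach the target saturation.

    Fractions are expressed between 0 and 1 (0.60 means 60% saturation).
    Derived from: (initial * V + added) / (V + added) = target.
    """
    if not 0.0 <= initial_fraction < target_fraction < 1.0:
        raise ValueError("need 0 <= initial < target < 1")
    return sample_ml * (target_fraction - initial_fraction) / (1.0 - target_fraction)


added = saturated_volume_to_add(10.0, 0.60)
print(round(added, 2))
```

For 10 ml of serum, about 15 ml of saturated solution gives 60% saturation, since 15 ml out of a final 25 ml is 60%.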

Fig. 7.9 Affinity chromatography. Steps shown: antibody in medium added to column; antibody binds to protein A/G on beads in column; column washed to remove contaminants; elution buffer added to release antibody.

Tissue culture supernatant is often concentrated before purification to reduce the volume of liquid. Tangential flow devices and centrifugal concentrators may be used to reduce the volume to 10% of the starting amount. The smaller volume of liquid makes antibody purification by affinity chromatography much easier (Fig. 7.9). Antibodies from both polyclonal and monoclonal sources can be purified by similar means. In both cases the antibody type is IgG which allows purification by protein A/G affinity chromatography. Proteins A and G are derived from bacterial cells and have the ability to reversibly bind IgG molecules. Binding to the column occurs at neutral pH and the pure antibody fraction can be eluted at pH 2.0. Fractions are collected and neutralised back to pH 7.0. Antibody-containing fractions are identified by spectrophotometry using absorbance at 280 nm (specific wavelength for protein absorbance) and are pooled. A solution of protein at 1 mg cm⁻³ will give an absorbance reading of 1.4 at 280 nm. This can be used to calculate the amount of antibody in specific aliquots after purification. Purified antibody should be adjusted to 1 mg cm⁻³ and kept at 4 °C, or −20 °C for long-term storage. It is usual to add 0.02% sodium azide to the antibody solution as this increases shelf-life by suppressing the growth of adventitious microorganisms. Antibodies can be stored for several years at 4 °C and for decades if kept below −20 °C without losing activity.
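
The absorbance figure quoted above gives a quick way to estimate antibody concentration and the dilution needed to reach 1 mg cm⁻³. The sketch below assumes a 1 cm path length and uses made-up readings; the function names are illustrative.

```python
A280_OF_1_MG_PER_ML = 1.4  # figure quoted in the text; 1 cm path length assumed


def concentration_mg_per_ml(a280, dilution_factor=1.0):
    """Antibody concentration from an A280 reading of a (possibly diluted) sample."""
    return a280 * dilution_factor / A280_OF_1_MG_PER_ML


def diluent_to_reach_target_ml(volume_ml, concentration, target=1.0):
    """Volume of buffer to add so that the solution reaches the target concentration."""
    if concentration < target:
        raise ValueError("sample is already below the target concentration")
    return volume_ml * (concentration / target - 1.0)


# Hypothetical pooled fraction: 2.0 ml, read at A280 = 0.70 after tenfold dilution.
conc = concentration_mg_per_ml(0.70, dilution_factor=10)
total_mg = conc * 2.0
buffer_ml = diluent_to_reach_target_ml(2.0, conc)
print(round(conc, 2), round(total_mg, 2), round(buffer_ml, 2))
```

Here the pool contains about 5 mg cm⁻³, or 10 mg in total, and adding about 8 ml of buffer would bring it to the recommended 1 mg cm⁻³.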

7.2.10 Antibody modification Antibodies can be labelled for use in assays such as ELISA by the addition of a marker enzyme such as horseradish peroxidase (HRP) or alkaline phosphatase (AP). Other enzymes such as urease have been used but HRP and AP are by far the most popular. Linkage is achieved by simple chemistry to provide stable antibody–enzyme conjugates. Glutaraldehyde is a cross-linking compound and conjugation to HRP is carried out in two stages. Firstly the glutaraldehyde is coupled to reactive amino groups on the enzyme. The HRP–glutaraldehyde is then purified by gel permeation chromatography and added to the antibody solution. The glutaraldehyde reacts with amino groups on the antibody forming a strong link between the antibody and HRP. Alkaline phosphatase can be linked to antibody by glutaraldehyde using a one-step conjugation. The linkage is achieved through amino groups on the antibody and on the enzyme coupled with the glutaraldehyde. A number of proprietary labelling reagents are also available for making antibody–enzyme conjugates.

Fluorescent labels can also be added for use in immunofluorescent assays; usually fluorescein is the molecule of choice. Fluorescein isothiocyanate (FITC) is often the derivative used to label antibodies. FITC is a fluorescein molecule with an isothiocyanate reactive group (–N=C=S) replacing one of the hydrogens. This derivative is reactive towards primary amines on proteins and will readily react with antibodies to produce fluorescent conjugates.

It is also possible to link antibodies to gold particles for use in immunosorbent electron microscopy (ISEM) and lateral flow devices. Gold particles are prepared by citrate reduction of auric acid. The size of particle is predictable and can be controlled by pH manipulation. The gold particles are reactive and will bind antibodies to their surface forming immunogold. The immunogold particles are stable and can be stored at 4 °C until required.

Lanthanides (rare earth elements) can also be used as labels and have the advantage that a single assay can be used to detect two or three different antibody bindings. The lanthanides are attached to the antibodies as a chelate. The commonest of the chelating compounds used is diethylenetriamine pentaacetate (DTPA). Each lanthanide fluoresces at a different light frequency and so multiple assays can be carried out and the individual reactions visualised by the use of a variable wavelength spectrophotometer.

Antibodies can also be attached to latex particles either by passive adsorption or to reactive groups or attachment molecules on the surface of the latex. These can be used either as the solid phase for an immunoassay or as markers in lateral flow devices. Magnetised latex particles are available allowing the easy separation of latex particle/antibody/antigen complex from a liquid phase. Latex and magnetic particles may be purchased which have protein A covalently attached to their surface. Protein A binds the Fc portion of the antibody which orientates the molecules with the binding sites facing outwards.

7.3 IMMUNOASSAY FORMATS The first immunoassay formats described were methods based on the agglutination reaction (Fig. 7.10). The reaction between antibody and antigen can be observed when agglutination occurs and is characterised either by a gel formation in a liquid phase or as an opaque band in an agar plate assay. Agglutination only occurs when there is the right amount of antibody and antigen present. It relies on the fact that, as an antibody has two binding sites, each of them can be bound to a different antigen particle. As this happens, bridges are formed by antibody molecules spanning two antigen molecules. The resulting lattice forms a stable structure where antigen and antibody particles are suspended in solution by their attachment to each other. For this to take place there must be a precise amount of antigen and antibody present and this is known as equivalence. If too much antibody is present then each antigen molecule will bind multiple antibodies and the meshwork will not develop. If too much antigen is present then each antibody will bind only one antigen particle and no lattice will form. For this reason a dilution series of antibody is often made and a measure of antigen concentration can be made from the end point at which agglutination occurs. Modifications of the agglutination reaction involve the use of antibody bound to red blood cells or latex particles which allow the reaction to be observed more easily in a liquid phase. Agglutination immunoassays are still used as they provide rapid results with the minimum of equipment. They are commonly used for the detection of viral antigens in blood serum. In commercial tests the antibody concentration in the reagent is provided at a working dilution known to produce a positive for the normal range of antigen concentration.

Fig. 7.10 Agglutination reaction. Panels: antibody excess, no agglutination; agglutination at point of equivalence; antigen excess, no agglutination.

The Ouchterlony double diffusion agar plate method is the commonest gel-based assay system used (Fig. 7.11). Wells are cut in an agar plate and loaded with samples and an antibody solution. The antigens and the antibody diffuse through the agar and if the antibody recognises antigen within the gel then a precipitin band is formed. A diffusion gradient is formed through the agar as the reagents progress and so there is no requirement for dilutions to be made to ensure that agglutination occurs.

Fig. 7.11 Ouchterlony double diffusion plate. Labels: antigen; antibody; precipitin band. The band occurs where antigen and antibody form an immunoprecipitate.

Polyclonal antibodies can be made to different subspecies of bacteria and these will recognise surface epitopes on them. Because the subspecies are related then they will share some surface epitopes and the closer they are related the more they will share. Antibodies made to one organism will therefore react to a greater or lesser degree to a related organism depending on how many surface epitopes they share. This has given rise to a systematic method of identification known as serotyping which is based on the reaction of microorganisms to antibodies. The system works well with closely related organisms but is not definitive, as it only assesses surface markers. The pattern of precipitin bands obtained to reference antibodies is specific and can be used to assign samples into serotypes. The method was used until very recently to characterise Salmonella strains as their pathogenicity can be assessed according to their relatedness to known strains. Salmonellas from food and water samples were tested by this method to establish if they had been the cause of food poisoning incidents. Agglutination reactions using sensitised erythrocytes (red blood cells) or latex particles are carried out in liquid phase usually in small tubes or more recently in round-bottom microtitre plates. As discussed before, the agglutination reaction only occurs at the point of equivalence. A positive agglutination test appears cloudy to the eye as the erythrocytes or latex particles are suspended in solution. A negative result is characterised by a ‘button’ at the bottom of the reaction vessel which is formed from non-reacted particles. A negative result may be obtained from excess antigen or antibody, as the binding reaction favours the production of small aggregates of antigen/marker particles rather than the agglutination gel.

7.3.1 Enzyme immunosorbent assays By far, the vast majority of immunoassays carried out fall into the category of enzyme immunosorbent assays. These are routinely used for the diagnosis of infectious agents such as viruses, and other substances in blood. The antigen is the substance or agent to be measured. In this technique the antigen is immobilised on to a solid phase, either the reaction vessel or a bead. The most commonly used solid phase is the enzyme-linked immunosorbent assay (ELISA) plate. Immobilisation is achieved by the use of a coating antibody which actively traps antigen to the solid phase. A second antibody (antibody–enzyme conjugate) which is labelled with a reporter enzyme is allowed to bind to the immobilised antigen. The enzyme substrate is then added to the antigen/antibody/enzyme complex and a reaction, usually involving a colour change, is seen (Fig. 7.12, see also colour section). There are many permutations of this method but all of them rely on the antibody–antigen complex being formed and its presence being confirmed by the reaction of the reporter enzyme. These assays rely on a stepwise addition of layers with each one being linked to the one before. The antigen is central to the assay as it provides the bridge between the solid phase and the signal-generating molecule. Without antigen, the antibody–enzyme conjugate cannot be bound to the solid phase and no signal can be generated. The coating antibody also concentrates the antigen from the sample: it binds the antigen irreversibly and so the coating layer continues to capture available antigen until saturation has been reached. This is particularly useful when testing for low levels of antigens in fluids such as blood serum.

Fig. 7.12 A microtitre plate. (See also colour plate.)


Fig. 7.13 TAS ELISA. Steps shown: ELISA plate coated with antibody; sample incubated on plate; antigen trapped by antibody; secondary antibody incubated on plate; anti-species antibody conjugate incubated on plate; substrate added to plate, causing colour change in positive wells.

7.3.2 Triple antibody sandwich ELISA (TAS ELISA)

Triple antibody sandwich (TAS) ELISA, also known as indirect ELISA, is a widely used method (Fig. 7.13). It is often used to identify antibodies in patient blood which may be there as the result of infection. As with other immunoassays, layers of reagents are built up, each dependent on the binding of the previous one. The system is used to test patient blood for the presence of hepatitis B virus (HBV) antibodies as a diagnostic test for this disease. In this test HBV coating antibody is bound to the wells of a microtitre plate and HBV coat protein is added to them. The live virus is not used as antigen as this would be too dangerous to use in the laboratory. HBV coat protein is made synthetically, specifically for use as antigen in this type of test. After incubation and washing, patient serum is added; any antibodies it contains react with the antigen. Anti-human antibody conjugated to an enzyme marker is then added, which will bind to the patient antibodies. Substrate is then added to identify samples which were positive. The test works well for the diagnosis of HBV infection and is also used to ensure that blood donations given for transfusion are free from this virus.

7.3.3 Double antibody sandwich ELISA (DAS ELISA)

Double antibody sandwich (DAS) ELISA is probably the most widely used immunochemical technique in diagnostics (Fig. 7.14). It is rapid, robust and reliable, and can be performed and the results interpreted with minimal training. The principle is the same as other ELISA techniques in that the antigen is immobilised to a solid phase by a primary antibody and detected with a second antibody which has been labelled with a marker enzyme. The antigen creates a bridge between the two antibodies and the presence of the enzyme causes a colour change in the chromogenic (colour-producing) substrate. The marker enzyme used is usually either horseradish peroxidase (HRP) or alkaline phosphatase (AP). Other enzymes have been used and claims have been made for increased sensitivity but this is at the expense of more complex substrates and buffers. In some systems the enzyme is replaced with a radioactive label and this format is known as the immunoradiometric assay (IRMA).

Fig. 7.14 DAS ELISA. Steps shown: ELISA plate coated with antibody; sample incubated on plate; antigen trapped by antibody; antibody/enzyme conjugate incubated on plate; substrate added to plate, causing colour change in positive wells.

DAS ELISA is used extensively in horticulture and agriculture to ensure that plant material is free of virus. Potato tubers that are to be used as seed for growing new crops have to be free of potato viruses and screening for this is carried out by DAS ELISA. There are many potato viruses but potato leafroll virus (PLRV) in particular causes considerable problems. PLRV antibodies are coated onto the wells of ELISA plates and then the sap to be tested is added. After incubation, the plates are washed and PLRV antibody conjugated to alkaline phosphatase is added. The plates are incubated and, after washing, substrate is used to identify the positive wells. The system again requires the presence of the antigen (PLRV) for the sandwich of antibodies to be built up.
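The final read-out step of a plate assay such as this can be sketched in code. The example below is a minimal illustration rather than a standard protocol: it assumes that each well has been read as an absorbance value and that a well is called positive when its absorbance exceeds the mean of the negative-control wells plus three standard deviations. The cutoff rule, the function name `call_positive_wells` and all absorbance values are assumptions made for illustration.

```python
from statistics import mean, stdev


def call_positive_wells(sample_abs, negative_control_abs, n_sd=3.0):
    """Return the cutoff and the names of wells whose absorbance exceeds it.

    Assumed rule (not from the text): cutoff = mean + n_sd * SD of the
    negative-control wells.
    """
    cutoff = mean(negative_control_abs) + n_sd * stdev(negative_control_abs)
    positives = sorted(well for well, a in sample_abs.items() if a > cutoff)
    return cutoff, positives


# Hypothetical absorbance readings: virus-free sap controls and test tubers.
negative_controls = [0.08, 0.10, 0.09, 0.11]
samples = {"A1": 0.09, "A2": 1.25, "A3": 0.12, "A4": 0.64}

cutoff, positives = call_positive_wells(samples, negative_controls)
print(f"cutoff = {cutoff:.3f}; positive wells: {positives}")
```

In practice the cutoff rule would be set during assay validation; the point of the sketch is only that positive wells are identified relative to the background given by antigen-free controls.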

7.3.4 Enhanced ELISA systems

The maximum sensitivity of ELISA is in the picomole range and there have been many attempts to push the detection limit of assays below this. The physical limitations are based on the dynamics of the double binding event and the subsequent generation of signal above the background substrate value. Most workers have concentrated their efforts on the amplification of signal. Antibody binding cannot be improved as it is primarily a random event modified by the individual avidities of the antibodies themselves. Some improvement in some assays can be made by temperature modification as some antibodies may perform better at specific temperatures.

There are two basic ways that signal can be amplified in ELISA. The first is to bind more enzyme by using multivalent attachment molecules. Systems using the avidin–biotin binding system allow amplification through this route. Avidin is tetravalent (i.e. it has four binding sites for biotin) and several biotin molecules can be attached to each antibody, and it is this multivalency that produces the amplification. The detection antibody is labelled with biotin and the reporter enzyme with avidin. The high affinity and multivalency of the reagents allow larger complexes of enzyme to be linked to the detection antibody, producing an increase in substrate conversion and improved colour development in positive samples.

The alternative amplification step is to enhance the substrate reaction, usually by using a double enzyme system. The primary enzyme bound to the antigen catalyses a change in the second enzyme which then generates signal. For example, alkaline phosphatase conjugated to the secondary antibody can be used to drive a secondary reaction involving NADP dephosphorylation to NAD, which is then reduced to NADH by alcohol dehydrogenase. This in turn creates a loop in which a tetrazolium salt is reduced as the NADH returns to NAD. The tetrazolium salt is chromogenic when in the reduced state. The cyclic nature of the reaction causes the amplification and increases the observed colour development.

Both of these methods will increase the signal generated but may also increase the background reaction. Some claims have been made for ‘supersubstrates’ which work directly with enzyme–antibody conjugates but there is usually little in the way of true gain if standard curves are calculated for the various substrate types.
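Why amplification does not always give a true gain can be shown with a small calculation. The sketch below compares the signal-to-background ratio before and after amplification; the gain factors and absorbance values are hypothetical, and treating amplification as a simple multiplication is an assumption made for illustration.

```python
def signal_to_background(signal, background):
    """Ratio of the absorbance in a positive well to that in a blank well."""
    return signal / background


def amplify(signal, background, signal_gain, background_gain):
    """Apply separate (assumed) gain factors to signal and background."""
    return signal * signal_gain, background * background_gain


# Hypothetical absorbances for an unamplified assay.
plain_signal, plain_background = 0.40, 0.05

# Case 1: amplification raises signal and background equally.
s1, b1 = amplify(plain_signal, plain_background, 4.0, 4.0)
# Case 2: amplification raises signal more than background.
s2, b2 = amplify(plain_signal, plain_background, 4.0, 2.0)

print("unamplified ratio:", round(signal_to_background(plain_signal, plain_background), 2))
print("equal gain ratio:", round(signal_to_background(s1, b1), 2))
print("selective gain ratio:", round(signal_to_background(s2, b2), 2))
```

Only the second case improves the ability to distinguish a weak positive from a blank, which is why enhanced systems are best judged from standard curves rather than from raw colour intensity.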

7.3.5 Competitive ELISA

Competitive ELISA is used in assays for small molecules such as hormones in blood samples where often only a single epitope is present on the antigen (Fig. 7.15). It is quantitative when used in conjunction with a standard curve. The principle is based on competition between the natural antigen (hormone) to be tested for and an enzyme-conjugated form of the antigen which is the detection reagent. The test sample and a defined amount of enzyme-conjugated antigen are mixed together and placed into the coated wells of a microtitre plate. The antigen and the conjugated form of it compete for the available spaces on the coating antibody layer. The more natural antigen present, the more it will displace (compete out) the conjugated form, leading to a reduction in enzyme bound to the plate. The relationship of substrate colour development is therefore inverted: the more natural antigen bound, the lower the signal generated.

This form of ELISA is routinely used for testing blood samples for thyroxin. Thyroxin is a hormone that is responsible for regulating metabolic rate, and deficiencies (hypothyroidism) and excesses (hyperthyroidism) of it will slow or speed up the metabolism. Deficient patients can be given additional thyroxin if required but it is important to establish the baseline level before treating the condition. Competitive ELISA is used for this as an accurate measure of the circulating level of the hormone can be made from a standard curve of known dilutions. In some assays the enzyme is replaced with a radioactive label and this form of competitive assay is known as the radioimmunoassay (RIA).
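The inverted relationship and the use of a standard curve can be sketched as follows. This is a minimal illustration under stated assumptions: the standards and their absorbances are hypothetical, the units are arbitrary, and the curve is interpolated linearly in log10(concentration) between neighbouring standards, which is a simplification of the curve-fitting models used in practice.

```python
import math


def concentration_from_absorbance(absorbance, standards):
    """Interpolate an unknown concentration from a competitive standard curve.

    `standards` is a list of (concentration, absorbance) pairs. In a
    competitive assay absorbance falls as concentration rises, so the curve
    is read 'backwards'. Interpolation is linear in log10(concentration)
    between the two bracketing standards (an assumed, simplified model).
    """
    pts = sorted(standards)  # ascending concentration, descending absorbance
    for (c_lo, a_hi), (c_hi, a_lo) in zip(pts, pts[1:]):
        if a_lo <= absorbance <= a_hi:
            frac = (a_hi - absorbance) / (a_hi - a_lo)
            log_c = math.log10(c_lo) + frac * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_c
    raise ValueError("absorbance lies outside the range of the standards")


# Hypothetical hormone standards: (concentration in nmol/L, absorbance).
standards = [(10, 1.80), (30, 1.40), (100, 0.90), (300, 0.45), (1000, 0.15)]

low = concentration_from_absorbance(1.60, standards)
high = concentration_from_absorbance(0.30, standards)
print(f"A = 1.60 -> {low:.1f} nmol/L; A = 0.30 -> {high:.1f} nmol/L")
```

The sample with the higher absorbance is assigned the lower concentration, and readings outside the range of the standards are rejected rather than extrapolated.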


Fig. 7.15 Competitive ELISA. Steps shown: plate coated with antibody; sample and conjugated antigen added; native and conjugated antigen compete for coating antibody; substrate added to generate coloured product; standard curve (absorbance against concentration) used to interpret results.

Competitive ELISA using conjugated antibody can also be used to quantify levels of circulating antibody in test serum. The antigen to which the antibody will attach is coated directly on to the solid phase. The test serum and the conjugated form of antibody are mixed together and added to the reaction wells. The conjugated and test antibody then compete to bind to the antigen. The level of antibody can again be determined from the reduction in signal observed on addition of the substrate.

7.3.6 Dissociation enhanced lanthanide fluorescence immunoassay (DELFIA)

DELFIA is a time-resolved fluorometric assay which relies on the unique properties of lanthanide chelate antibody labels. The lanthanides will generate a fluorescent signal when stimulated with light of a specific wavelength. The signal generated has a long decay time, so it can be measured after short-lived background fluorescence has faded, which improves the discrimination between positive and negative samples. DELFIA offers a signal enhancement greater than that possible from conventional enzyme-linked assays. The lanthanide chelates are conjugated onto the secondary antibody and, as there are a number of lanthanides which can be used, each with a unique signal, multiplexing (more than one test carried out in the same reaction vessel) is possible. The assay is carried out similarly to standard ELISA and may be competitive or non-competitive. The assay is concluded by adding an enhancement solution which causes dissociation of the lanthanide from the antibody molecules. The signal is generated by stimulating the lanthanide with light of a specific wavelength and measuring the resulting fluorescence. If europium is used the stimulatory wavelength is 340 nm, and the fluorescence is emitted at 615 nm.
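The benefit of a long-lived signal can be illustrated numerically. The sketch below models both the lanthanide emission and the background fluorescence as simple exponential decays and integrates each over a measurement window that opens after a delay. The lifetimes, amplitudes and window settings are assumed values chosen for illustration, not instrument specifications.

```python
import math


def gated_signal(amplitude, lifetime_us, delay_us, window_us):
    """Integral of amplitude * exp(-t / lifetime) from delay to delay + window."""
    start = math.exp(-delay_us / lifetime_us)
    end = math.exp(-(delay_us + window_us) / lifetime_us)
    return amplitude * lifetime_us * (start - end)


# Assumed illustrative values: short-lived background fluorescence that is
# initially far brighter than the long-lived lanthanide emission.
BACKGROUND = {"amplitude": 1.0e6, "lifetime_us": 0.01}
LANTHANIDE = {"amplitude": 1.0, "lifetime_us": 700.0}

for delay in (0.0, 400.0):
    label = gated_signal(delay_us=delay, window_us=400.0, **LANTHANIDE)
    noise = gated_signal(delay_us=delay, window_us=400.0, **BACKGROUND)
    print(f"delay {delay:5.0f} us: label = {label:7.2f}, background = {noise:.3g}")
```

With no delay the background dominates the reading; once the window opens after the delay, the background has decayed to a negligible level while a substantial part of the lanthanide signal remains.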

7.4 IMMUNO MICROSCOPY

7.4.1 Immunofluorescent (IF) microscopy

Immunofluorescent (IF) microscopy uses antibodies conjugated to fluorescent markers to locate specific structures on specimens and allows them to be visualised by illuminating them with ultraviolet light. Fluorescein and rhodamine are the usual labels used but alternative markers are available. Fluorescein produces a green fluorescence and rhodamine a red one. Microscopes equipped to carry out IF have dual sources of light, allowing the operator to view the specimen under white light before illuminating with ultraviolet to look for specific fluorescence. The technique is particularly useful for looking at surface markers on eukaryotic cells but is also used as a whole-cell staining technique in bacteriology. Membrane studies on whole mammalian cells can be undertaken and the migration, endocytosis (uptake of membrane-bound particles by cells) and fate of labelled receptors studied in real time. Bound receptors in cell membranes frequently migrate to one end of the cell prior to being endocytosed. This phenomenon is known as capping and is easily viewed in living cells using antibodies specific to cell membrane receptors labelled with a fluorescent marker.

7.4.2 Immunosorbent electron microscopy

Immunosorbent electron microscopy (ISEM) is a diagnostic technique used primarily in virology. Virus-specific antibodies conjugated to gold particles are used to visualise virus particles on electron microscopes. The gold is electron-dense and is seen as a dark shadow against the light background of the specimen field. The technique can be used with both transmission and scanning systems. If gold-labelled primary antibodies are not available then anti-IgG–gold conjugated antibodies can be used with the primary antibody in a double antibody system. Both monoclonal and polyclonal antibodies can be used for ISEM depending on the required specificity.

7.5 LATERAL FLOW DEVICES

Lateral flow devices (LFD) are used as rapid diagnostic platforms allowing almost instant results from fluid samples (Fig. 7.16). They are simple to use and contain all of the required components within the strip itself. They are usually supplied as a plastic cassette with a port for applying the sample and an observation window for viewing the result. The technology is based on a solid phase consisting of a nitrocellulose or polycarbonate membrane which has a detection zone coated with a trapping antibody. The detection antibody is conjugated to a solid coloured marker, usually latex or colloidal gold, and is stored in a fibre pad which acts as a reservoir. The solid phase has a layer of transparent plastic overlaying it, leaving a very narrow gap which will draw liquid by capillarity.

Fig. 7.16 Lateral flow device. Labels shown: port for sample application; reagent pad containing antibody-labelled latex beads; capture zone coated with antibody which will bind antigen/antibody/bead complex. Sample moves down device by capillarity until capture zone is reached. If antigen is present a coloured line is formed.

The sample is applied to the reservoir pad through the sample port, where it can react with the conjugated antibody if the specific antigen is present. The liquid then leaves the reservoir and travels along the solid phase to the location of the trapping antibody. If the sample contains the specific antigen then it will react with both the conjugated and the trapping antibodies. This results in a coloured line if the sample is positive. If the sample is negative then no coloured line develops. The system lends itself to multiplexing and up to three different antigens can be tested for simultaneously with appropriate trapping antibodies and different coloured marker particles. The technology has been applied to home pregnancy testing and various other ‘self-diagnostic’ kits. Lateral flow devices are also used by police forces and regulatory authorities for the rapid identification of recreational drugs.

7.6 EPITOPE MAPPING

Epitope mapping is carried out to establish where on the target protein the antibody binds. The method works well with new monoclonal antibodies where it may be necessary to know the precise epitope to which binding occurs. This, however, can only be performed where the epitope is linear. A linear epitope is formed by amino acids lying adjacent to each other and the antibody binds to the structure that they form. Non-linear epitopes are formed from non-adjacent amino acids when they interact with each other in space, as is found in helical or hairpin structures.

To carry out epitope mapping the amino acid sequence of the target protein must be known. The sequence is then used to design and make synthetic peptides, each around 15 amino acid residues in length and overlapping with the previous one by about five residues. The synthetic peptides are then coated on to the wells of microtitre plates or onto nitrocellulose membranes and reacted with the antibody of interest. The reaction is visualised by using a secondary antibody–enzyme conjugate and substrate. From the reaction to the peptides and the position of the sequence in the native protein it is possible to predict where the epitope lies and also what its sequence is.
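The peptide design step lends itself to a short worked example. The sketch below splits a sequence into 15-residue peptides that overlap by five residues and then predicts the epitope as the stretch shared by the reactive peptides. The protein sequence, the function names and the choice of reactive peptides are hypothetical; taking the region common to all reactive peptides is one simple way to interpret the result, not the only one.

```python
def overlapping_peptides(sequence, length=15, overlap=5):
    """Split a protein sequence into peptides that overlap their predecessor.

    Returns (start_index, peptide) pairs; start_index is 0-based. A final
    peptide is added if needed so that the C-terminus is always covered.
    """
    step = length - overlap
    last_start = max(len(sequence) - length, 0)
    starts = list(range(0, last_start + 1, step))
    if starts[-1] != last_start:
        starts.append(last_start)
    return [(s, sequence[s:s + length]) for s in starts]


def shared_region(sequence, reactive_starts, length=15):
    """Return the stretch of sequence common to all antibody-reactive peptides."""
    begin = max(reactive_starts)
    end = min(s + length for s in reactive_starts)
    return sequence[begin:end] if begin < end else ""


# Hypothetical 40-residue target protein (one-letter amino acid codes).
protein = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRV"
peptides = overlapping_peptides(protein)
for start, peptide in peptides:
    print(f"{start + 1:>3}-{start + len(peptide):<3} {peptide}")

# Suppose the antibody reacted with the peptides starting at residues 11 and 21
# (0-based starts 10 and 20): the epitope is predicted to lie in their overlap.
epitope = shared_region(protein, [10, 20])
print("predicted epitope region:", epitope)
```

If only a single peptide reacts, the same function returns that whole peptide, and shorter peptides would be needed to narrow the epitope further.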

7.7 IMMUNOBLOTTING

This technique is also known as western blotting and is used to identify proteins from samples after electrophoresis. The sample may be a tissue homogenate, an extract of cells or material from another biological source. The sample may be electrophoresed under reducing or non-reducing conditions until separation is achieved. This is usually visualised by staining with a general protein stain. The separated proteins are transferred onto a nitrocellulose or polyvinylidene difluoride (PVDF) membrane either passively or by using an electroblotter. The membrane is treated with a protein-blocking solution to prevent non-specific binding of antibody to the membrane itself. Popular blocking compounds are dried milk or bovine serum albumin.

Either direct or indirect antibody systems can be used but often indirect methods are used for reasons of cost. Directly conjugating primary antibodies may be expensive and so very often an anti-species enzyme conjugate is used. For indirect labelling the membrane is incubated in primary antibody solution and, after washing, it is treated with a solution of a secondary antibody–enzyme conjugate. Both peroxidase and alkaline phosphatase have substrates that will produce a solid colour reaction on the blot where the antibodies have bound. The substrate reaction can be stopped after optimum colour development, and the blots dried and stored for reference. Further details can be found in Section 10.3.8. The method is particularly useful during development of new antibodies as part of epitope mapping studies.

7.8 FLUORESCENT ACTIVATED CELL SORTING (FACS)

Fluorescent activated cell sorting (FACS) machines are devices that are capable of separating populations of cells into groups of cells with similar characteristics based on antibody binding (Fig. 7.17). The technique is used on live cells and allows recovery and subsequent culture of the cells after separation. Many cell markers are known which identify subsets of cell types and specific antibodies to them are available.


Fig. 7.17 Fluorescent activated cell sorting. Stages shown: cells labelled with fluorescent marker; cells passed through aperture to create aerosol; reading laser; electromagnetic sorter; separated cell populations.

The method can be used quite successfully to separate normal from abnormal cells in bone marrow samples from patients with leukaemia. This can be used as a method of cleaning the marrow prior to autologous (from the patient themselves) marrow transplantation. The technique is also used for diagnostic tests where the numbers of cell subtypes need to be known. This is of particular use when looking at blood-borne cells such as lymphocytes where the ratios of cell types can be of diagnostic significance. For example, in HIV infection the numbers of specific T cell subtypes are of great diagnostic significance in the progress of the infection to AIDS.

The cells are labelled with antibodies to specific cell markers, which themselves carry a fluorescent label, and are then passed through a narrow gauge needle to produce an aerosol. The droplet size is adjusted so that each one should contain only one cell. The aerosol is then passed through a scanning laser which allows detection of the fluorescent label. The droplets have a surface charge and can be deflected by an electromagnet based on their fluorescent label status. The system relies on computer control to sort the cells effectively into labelled and non-labelled populations. The desired cell population can then be recovered and subsequently counted or cultured if desired. More than one label can be used simultaneously so that multiple sorting can be undertaken.
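The sorting decision made for each droplet, and the cell-type ratio derived from it, can be sketched as follows. This is a simplified illustration: the two-colour labelling scheme, the fixed intensity threshold and the readings are all assumptions, and real instruments apply operator-defined gates to many more parameters.

```python
from collections import Counter


def sort_events(events, threshold):
    """Assign each droplet to a population from its two fluorescence readings.

    `events` is a list of (green, red) intensities, one pair per single-cell
    droplet. A channel counts as labelled when its intensity is at or above
    `threshold` (an assumed, fixed gate).
    """
    populations = Counter()
    for green, red in events:
        key = ("green+" if green >= threshold else "green-",
               "red+" if red >= threshold else "red-")
        populations[key] += 1
    return populations


# Hypothetical readings: green marks one T cell subtype, red marks another.
events = [(950, 20), (870, 35), (15, 910), (30, 25), (990, 40), (22, 880)]
counts = sort_events(events, threshold=500)

green_only = counts[("green+", "red-")]
red_only = counts[("green-", "red+")]
print(f"green-labelled: {green_only}, red-labelled: {red_only}, "
      f"ratio: {green_only / red_only:.2f}")
```

Using two labels gives four possible populations per droplet, which is the basis of the multiple sorting mentioned above.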

7.9 CELL AND TISSUE STAINING TECHNIQUES

There are many antibodies available that recognise receptors on, and structural proteins in, cells and tissues and these can be of use diagnostically. Generally immunostaining is carried out on fixed tissues but this is not always the case as it may be important to observe a dynamic event only seen in living cells. Different antibodies may be required for living and fixed tissues for the same protein, as fixation may destroy the structure of the epitopes in some cases.

Fixed tissues are prepared by standard histological methods. The tissue is fixed with a preservative which kills the cells but maintains structure and makes the cell membranes permeable. The sample is embedded in wax or epoxy resin and fine slices are taken using a microtome and mounted onto microscope slides. The antibodies that are used for immunopathology may carry enzymes, fluorescent markers or labels such as gold particles. They may also be unconjugated and in this case would require a secondary antibody conjugate and solid substrate to visualise them. It is important to remember that enzymes such as alkaline phosphatase may be endogenous (found naturally) in mammalian tissue samples and their activity is not easily blocked. Often horseradish peroxidase is used as an alternative. Any endogenous peroxidase activity in the sample can be blocked by treating the sample with hydrogen peroxide.

Antibodies may recognise structural proteins within the cells and can access them in fixed tissues through the permeabilised membranes. More than one antibody can be used to produce a composite stain with more than one colour of marker being used. Combinations of fluorescent and enzyme staining may also be used but this has to be carried out sequentially. Fluorescent stains can also be used in conjunction with standard histological stains viewed with a microscope equipped with both white and ultraviolet light. Fluorescence will decay in time and although anti-quench products can be used the specimens should not be considered to be permanent. Photographs can be taken of slides through the microscope and kept as a permanent record.

7.10 IMMUNOCAPTURE POLYMERASE CHAIN REACTION (PCR)

Immunocapture PCR is a hybrid method which uses the specificity of antibodies to capture antigen from the sample and the diagnostic power of PCR to provide a result. The method is of particular use in diagnostic virology, where the technique allows the capture of virus from test samples and subsequent diagnosis by PCR. It is useful where levels of virus are low, such as in water samples and other non-biological sources. The technique can be carried out in standard PCR microtitre plates or in PCR tubes. The antibody is bound passively to the plastic of the plate or tube and the sample incubated afterwards (see Fig. 7.18). After washing to remove excess sample material the PCR reagents can be added and thermocycling carried out on the bound viral nucleic acid. RNA viruses will require an additional reverse transcription step prior to PCR (see also Section 6.8.1).
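The reason that PCR suits samples containing very little virus can be seen from the arithmetic of amplification. The sketch below assumes an idealised reaction in which the number of template copies grows by the same factor in every cycle; the starting copy number, cycle count and efficiency values are hypothetical.

```python
def copies_after_pcr(initial_copies, cycles, efficiency=1.0):
    """Template copies after PCR, assuming the same efficiency in every cycle.

    efficiency = 1.0 means perfect doubling; real reactions fall short of this
    and plateau in late cycles, which this sketch ignores.
    """
    return initial_copies * (1.0 + efficiency) ** cycles


# Hypothetical example: 10 captured viral genomes amplified for 30 cycles.
ideal = copies_after_pcr(10, 30)
reduced = copies_after_pcr(10, 30, efficiency=0.9)
print(f"ideal doubling: {ideal:.3g} copies; 90% efficiency: {reduced:.3g} copies")
```

Even a handful of captured genomes therefore yields a readily detectable quantity of product, provided the capture and washing steps have removed material that would inhibit the reaction.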

Fig. 7.18 Immunocapture PCR. Steps shown: plate coated with antibody; sample added and antigen captured; sample lysed to release nucleic acid, followed by PCR with specific primers.

7.11 IMMUNOAFFINITY CHROMATOGRAPHY (IAC)

Immunoaffinity chromatography can be used for a number of applications. The principle is based on the immobilisation of antibody onto a matrix, normally beads, which are then placed into a chromatography column. Antibody may be permanently linked to the beads by covalent linkage to reactive sites on a resin or bound using protein A or G. For most applications antibody is permanently bound to the column beads as this is a more stable linkage and allows repeated use of the columns following regeneration. IAC may be used as a clean-up method in analytical chemistry to extract small quantities of chemical residues such as pesticides from wastewater and other sources. The method also works well for the extraction of biological compounds such as hormones from patient samples.

The columns are made by reacting highly purified antibody (monoclonal or polyclonal) with the chromatography beads to form the affinity matrix. Harsh conditions have to be avoided as denaturation of the antibody molecules could occur. A number of proprietary resins are available which have reactive sites suitable for antibody immobilisation. The affinity matrix is loaded into chromatography columns prior to use. Antibody binding of antigen generally occurs best at around pH 7.4 but individual monoclonal antibodies may vary considerably from this pH. Once the sample has been loaded, the column should be washed to remove contaminating material from the sample fluid. Conditions for elution vary according to individual antibodies and antigens but pH 2.0 buffer, methanol and 10% acetonitrile have all been used successfully. The column can be regenerated after elution by incubating with pH 7.4 buffer. The technique works extremely well for clean-up and concentration of sample from dilute sources prior to additional analysis. Samples eluted from IAC columns may be tested further by high-performance liquid chromatography, ELISA or other analytical techniques.

7.12 ANTIBODY-BASED BIOSENSORS

A biosensor is a device that is composed of a biological element and a physicochemical transduction part which converts signal reception by the biological entity into an electrical impulse. A number of biosensor devices are available that use enzymes as the biological part of the device. The enzyme is used to catalyse a chemical reaction which generates an electrical charge at an electrode. Antibodies have the potential to be excellent biological molecules to use for this technology as they can be developed to detect virtually any molecule. The main problem with developing this technology with antibodies has been the lack of adequate physicochemical transduction systems. Three methods have been developed that will provide a signal from antibody binding and these are likely to produce a new generation of biosensors in the future.

Antibodies may be bound onto thin layers of gold which in turn are coated onto refractive glass slides. If the slides are illuminated at a precise angle with fixed-wavelength laser light then electron waves are produced on the surface of the gold. This is known as surface plasmon resonance and only occurs if the incident angle and wavelength of light are precisely right. If the antibody binds antigen then the surface plasmon resonance pattern is changed and a measurable change in emitted energy is observed.

Fibre optic sensors have also been developed which rely on the natural ability of biological materials to fluoresce with light at a defined frequency. The reaction vessel is coated with antibody and the fibre optic sensor used to illuminate and read light scatter from the vessel. The sample is then applied and the sample vessel washed. The fibre optic sensor is again used to illuminate and read backscatter from the vessel. Changes in the fluorescence will give a change in the observed returned light.

A third approach relies on changes in crystals as a result of surface molecules bound to them. Piezoelectric crystals generate a characteristic signature resonance when stimulated with an alternating current. The crystals are elastic and changes to their surface will produce a change in the signature resonance. The binding of antigen to antibody located on the surface of the crystal can be sufficient to alter the signature and therefore induce a signal indicating that antigen has been detected by antibody.
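For the piezoelectric approach, the size of the change in resonance can be estimated with the Sauerbrey relation, which is not discussed in the text and is introduced here as an assumption. It applies to a thin, rigid layer on a quartz crystal and becomes less accurate for soft biological layers in liquid. The crystal frequency and bound mass in the sketch below are hypothetical.

```python
import math

# Physical constants for AT-cut quartz (CGS units).
QUARTZ_DENSITY = 2.648            # g cm^-3
QUARTZ_SHEAR_MODULUS = 2.947e11   # g cm^-1 s^-2


def sauerbrey_shift(f0_hz, mass_per_area_g_cm2):
    """Resonant-frequency change (Hz) for a thin rigid layer bound to the crystal."""
    return (-2.0 * f0_hz ** 2 * mass_per_area_g_cm2
            / math.sqrt(QUARTZ_DENSITY * QUARTZ_SHEAR_MODULUS))


# Hypothetical example: 100 ng/cm^2 of antigen binding to a 5 MHz crystal.
shift = sauerbrey_shift(5e6, 100e-9)
print(f"frequency shift = {shift:.2f} Hz")
```

The shift is negative because added mass lowers the resonant frequency, and it is only a few hertz against a resonance of millions of hertz, which indicates how precisely the resonance has to be tracked.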

7.13 THERAPEUTIC ANTIBODIES

Therapeutic antibodies fall into a number of different classes but are all designed to bind to specific structures or molecules to alter cellular or systemic responses in vivo. The simplest of these are the inhibitory systemic (found throughout the body) antibodies that will bind to substances to render them ineffective. At their crudest, they consist of hyperimmune serum and are used to alleviate the symptoms of bites and stings from a number of poisonous animals. Antivenom produced in horses for treatment of snake bite is a good example of this. Hyperimmune serum derived from human patients who have had the disease has also been used prophylactically (to reduce the risk of disease) after exposure to pathogenic viruses. Hyperimmune serum is available to help to treat a number of pathogenic viral conditions such as West Nile Fever, AIDS and hepatitis B. These are used after exposure to the pathogen, for example by needle-stick injury, and help to reduce the risk of infection occurring.

The next class of therapeutic antibodies are those that bind bioactive molecules and reduce their effects in vivo. They are all monoclonal and have a number of targets which help to alleviate the symptoms of a number of human diseases. One of the major targets for this approach is the systemic cytokines which have been implicated in the progression of diseases such as arthritis; results using antibody therapy have been encouraging.

Monoclonal antibodies can also be used to reduce the numbers of specific cell types in vivo by binding to surface markers on them. The binding of the antibody to the cells alerts the immune system and causes the cells to be cleared from circulation. Chimeric (formed from two sources) mouse/human monoclonal antibodies consisting of mouse variable regions and human constant regions which are specific to the B cell marker CD20 have been used successfully for the treatment of systemic lupus erythematosus. This disease is characterised by the development of aberrant B cells secreting autoantibodies which cause a number of immune phenomena. The decrease in circulating B cells reduces the number producing the autoantibodies and alleviates some of the symptoms.

Agonistic (causing upregulation of a biological system) monoclonal antibodies are therapeutic antibodies which have the ability to influence living cells in vivo. They upregulate cellular systems by binding to surface receptor molecules. Normally, cell receptors are stimulated briefly by their ligand (substance that binds to them) and the resulting upregulation is also brief. Agonistic monoclonal antibodies bind to the receptor molecule and mimic its ligand, but have the capacity to remain in place for much longer than the natural molecule. This is because the cell finds it much more difficult to clear the antibody than it would the natural ligand. The action of agonistic antibodies is incredibly powerful as the internal system cascades that can be generated are potentially catastrophic for both the cell and the organism. Their use has been mainly restricted to induction of apoptosis (programmed cell death) in cancer cells and only where a known unique cellular receptor is being stimulated.

There are a number of therapeutic inhibitory antibodies available and all of them downregulate cellular systems by blocking the binding of ligand to receptor. They behave as competitive analogues of the natural ligand and have a long dwell time (the time they remain bound) on the receptor, increasing their potency. They may block the binding of hormones, cytokines and other cellular messengers. They have been used successfully for the management of some hormone-dependent tumours such as breast cancer and also for the downregulation of the immune system to help prevent rejection after organ transplantation.

These therapeutic antibody types need to be carefully engineered to make them effective as treatment agents. The avidity and affinity of the antibodies are critical to their therapeutic efficacy, as their specific binding ability determines the duration and specificity of their action. Additionally, they must not appear as ‘foreign’ to the immune system or they will be rapidly cleared by the body. Often, the original monoclonal antibody will have been derived using a mouse system and as a result is a murine antibody. These antibodies can be humanised by engineering the cells, retaining the murine binding site and replacing the constant region genes with human ones. The resulting antibodies escape immune surveillance but retain their effective binding capacity. Natural antibodies may remain in the circulation for up to 6 weeks but engineered antibodies survive a much shorter time. The shortened survival time is due to the humanisation, which still leaves a degree of murine antibody visible to the immune system. Each engineered therapeutic antibody has a different half-life in vivo and this factor is of great importance when baseline dosage is being established.

All of the currently used therapeutic antibodies may cause side effects in patients and so this line of therapy has only been exploited where the benefits outweigh the problems that may be encountered. Great success has been seen in the treatment of prostate cancer in men and breast cancer in women using humanised monoclonal antibodies which bind to hormone receptors on the tumour cells and inhibit their growth as a result.
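The importance of half-life for dosing can be illustrated with a simple calculation. The sketch below assumes first-order clearance, in which a fixed fraction of the circulating antibody is lost per unit time; this is a simplifying assumption, and the half-life values are hypothetical rather than figures for any particular antibody.

```python
import math


def fraction_remaining(elapsed_days, half_life_days):
    """Fraction of a dose still circulating, assuming first-order clearance."""
    return 0.5 ** (elapsed_days / half_life_days)


def days_until_fraction(target_fraction, half_life_days):
    """Days for the circulating level to fall to `target_fraction` of the dose."""
    return half_life_days * math.log(target_fraction) / math.log(0.5)


# Two hypothetical antibodies compared three weeks after a single dose.
for name, half_life in (("antibody A", 3.0), ("antibody B", 21.0)):
    left = fraction_remaining(21.0, half_life)
    print(f"{name}: half-life {half_life:4.1f} days, {left:.1%} remaining at day 21")
```

Under these assumptions the short-lived antibody has almost disappeared by the time half of the long-lived one remains, so the two would need very different dosing intervals to maintain the same circulating level.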

7.14 THE FUTURE USES OF ANTIBODY TECHNOLOGY Antibodies are incredibly useful molecules which can be designed to detect an almost limitless number of antigens. They are adaptable and will operate in many conditions. They can be used in both diagnostic and therapeutic scenarios. In the future there will be a rise in the availability of therapeutic antibodies both for the up- and downregulation of cellular and systemic responses. Cancer therapy and immune modulation of autoimmune phenomena are probably the two areas where greatest developments will take place. Biosensors for the detection of disease will become increasingly available as will multiple lab on a chip (LOC) formats. LOC devices are miniaturised devices that are capable of handling microscopic amounts of liquids and perform a number of laboratory assays in miniature. They are frequently fully automated and can give rapid results without the equipment normally required for laboratory assays. They have the added advantage that they can be used in field situations as they are becoming increasingly portable. The use of non-animal systems for antibody generation will be exploited more fully with more use of phage display and other DNA library based systems.

7.15 SUGGESTIONS FOR FURTHER READING

Burns, R. (ed.) (2005). Immunochemical Protocols, 3rd edn. Totowa, NJ: Humana Press.

Coligan, J. (2005). Short Protocols in Immunology. New York: John Wiley. (A good background book which gives detail of immunological protocols and how they can be used to investigate the immune system.)

Cruse, J. and Lewis, R. (2002). Illustrated Dictionary of Immunology, 2nd edn. Boca Raton, FL: CRC Press. (An excellent book which describes in detail immunological processes and how they interact. A good balance of text and graphics.)

Howard, G. and Kaser, M. (2007). Making and Using Antibodies: A Practical Handbook. Boca Raton, FL: CRC Press. (An excellent book which describes in detail methods for producing, validating, purifying, modifying and storing antibodies.)

Subramanian, G. (ed.) (2004). Antibodies, Volume 1, Production and Purification. Dordrecht: Kluwer Academic. (This book gives good coverage of methods for antibody production, purification, modification and storage.)

Wild, D. (2005). Immunoassay Handbook, 3rd edn. New York: Elsevier. (This book describes in detail background to many clinical immunoassays and how to design and validate them.)

8

Protein structure, purification, characterisation and function analysis J. WALKER

8.1 Ionic properties of amino acids and proteins
8.2 Protein structure
8.3 Protein purification
8.4 Protein structure determination
8.5 Proteomics and protein function
8.6 Suggestions for further reading

8.1 IONIC PROPERTIES OF AMINO ACIDS AND PROTEINS

Twenty amino acids varying in size, shape, charge and chemical reactivity are found in proteins and each has at least one codon in the genetic code (Section 5.3.5). Nineteen of the amino acids are α-amino acids (i.e. the amino group is attached to the carbon atom that is adjacent to the carboxyl group) with the general formula RCH(NH2)COOH, where R is an aliphatic, aromatic or heterocyclic group. The only exception to this general formula is proline, which is an imino acid in which the -NH2 group is incorporated into a five-membered ring. With the exception of the simplest amino acid glycine (R = H), all the amino acids found in proteins contain at least one asymmetric carbon atom and hence are optically active and have been found to have the L configuration. For convenience, each amino acid found in proteins is designated by either a three-letter abbreviation, generally based on the first three letters of its name, or a one-letter symbol, some of which are the first letter of the name. Details are given in Table 8.1. Since they possess both an amino group and a carboxyl group, amino acids are ionised at all pH values, i.e. a neutral species represented by the general formula does not exist in solution irrespective of the pH. This can be seen as follows:


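$$
\underset{\text{net positive charge}}{\mathrm{H_3N^+{-}CHR{-}COOH}}
\;\overset{\mathrm{p}K_{a1}}{\rightleftharpoons}\;
\underset{\text{zero net charge (zwitterion)}}{\mathrm{H_3N^+{-}CHR{-}COO^-}}
\;\overset{\mathrm{p}K_{a2}}{\rightleftharpoons}\;
\underset{\text{net negative charge}}{\mathrm{H_2N{-}CHR{-}COO^-}}
$$

(pH increases from left to right; R is attached to the α-carbon.)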

Table 8.1 Abbreviations for amino acids

Amino acid                      Three-letter code   One-letter code
Alanine                         Ala                 A
Arginine                        Arg                 R
Asparagine                      Asn                 N
Aspartic acid                   Asp                 D
Asparagine or aspartic acid     Asx                 B
Cysteine                        Cys                 C
Glutamine                       Gln                 Q
Glutamic acid                   Glu                 E
Glutamine or glutamic acid      Glx                 Z
Glycine                         Gly                 G
Histidine                       His                 H
Isoleucine                      Ile                 I
Leucine                         Leu                 L
Lysine                          Lys                 K
Methionine                      Met                 M
Phenylalanine                   Phe                 F
Proline                         Pro                 P
Serine                          Ser                 S
Threonine                       Thr                 T
Tryptophan                      Trp                 W
Tyrosine                        Tyr                 Y
Valine                          Val                 V

Thus at low pH values an amino acid exists as a cation and at high pH values as an anion. At a particular intermediate pH the amino acid carries no net charge, although it is still ionised, and is called a zwitterion. It has been shown that, in the crystalline state and in solution in water, amino acids exist predominantly as this zwitterionic form. This confers upon them physical properties characteristic of ionic compounds, i.e. high melting point and boiling point, water solubility and low solubility in organic solvents such as ether and chloroform. The pH at which the zwitterion predominates in aqueous solution is referred to as the isoionic point, because it is the pH at which the number of negative charges on the molecule produced by ionisation of the carboxyl group is equal to the number of positive charges acquired by proton acceptance by the amino group. In the case of amino acids this is equal to the isoelectric point (pI), since the molecule carries no net charge and is therefore electrophoretically immobile. The numerical value of this pH for a given amino acid is related to its acid strength (pKa values) by the equation:

$$
\mathrm{pI} = \frac{\mathrm{p}K_{a1} + \mathrm{p}K_{a2}}{2} \qquad (8.1)
$$

where pKa1 and pKa2 are equal to the negative logarithm of the acid dissociation constants, Ka1 and Ka2 (Section 1.3.2). In the case of glycine, pKa1 and pKa2 are 2.3 and 9.6, respectively, so that the isoionic point is 6.0. At pH values below this, the cation and zwitterion will coexist in equilibrium in a ratio determined by the Henderson–Hasselbalch equation (Section 1.3.3), whereas at higher pH values the zwitterion and anion will coexist in equilibrium. For acidic amino acids such as aspartic acid, the ionisation pattern is different owing to the presence of a second carboxyl group:

$$
\begin{aligned}
\underset{\text{cation (1 net positive charge)}}{\mathrm{HOOC{-}CH_2{-}CH(NH_3^+){-}COOH}}
&\;\overset{\mathrm{p}K_{a1}\,2.1}{\rightleftharpoons}\;
\underset{\text{zwitterion, pH 3.0 (isoionic point)}}{\mathrm{HOOC{-}CH_2{-}CH(NH_3^+){-}COO^-}} \\
&\;\overset{\mathrm{p}K_{a2}\,3.9}{\rightleftharpoons}\;
\underset{\text{anion (1 net negative charge)}}{\mathrm{^-OOC{-}CH_2{-}CH(NH_3^+){-}COO^-}} \\
&\;\overset{\mathrm{p}K_{a3}\,9.8}{\rightleftharpoons}\;
\underset{\text{anion (2 net negative charges)}}{\mathrm{^-OOC{-}CH_2{-}CH(NH_2){-}COO^-}}
\end{aligned}
$$

In this case, the zwitterion will predominate in aqueous solution at a pH determined by pKa1 and pKa2, and the isoelectric point is the mean of pKa1 and pKa2.


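In the case of lysine, which is a basic amino acid, the ionisation pattern is different again and its isoionic point is the mean of pKa2 and pKa3:

$$
\begin{aligned}
\underset{\text{cation (2 net positive charges)}}{\mathrm{H_3N^+{-}(CH_2)_4{-}CH(NH_3^+){-}COOH}}
&\;\overset{\mathrm{p}K_{a1}\,2.2}{\rightleftharpoons}\;
\underset{\text{cation (1 net positive charge)}}{\mathrm{H_3N^+{-}(CH_2)_4{-}CH(NH_3^+){-}COO^-}} \\
&\;\overset{\mathrm{p}K_{a2}\,9.0}{\rightleftharpoons}\;
\underset{\text{zwitterion, pH 9.8 (isoionic point)}}{\mathrm{H_3N^+{-}(CH_2)_4{-}CH(NH_2){-}COO^-}} \\
&\;\overset{\mathrm{p}K_{a3}\,10.5}{\rightleftharpoons}\;
\underset{\text{anion (1 net negative charge)}}{\mathrm{H_2N{-}(CH_2)_4{-}CH(NH_2){-}COO^-}}
\end{aligned}
$$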
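The isoionic points quoted above follow directly from equation 8.1, using the two pKa values that flank the zwitterion. The following is a minimal sketch of that arithmetic; the function names are illustrative only, and the pKa values are the approximate ones given in the text. The second function applies the Henderson–Hasselbalch relationship to give the ratio of deprotonated to protonated form of a single group.

```python
def isoionic_point(pka_below, pka_above):
    """Mean of the two pKa values on either side of the zwitterion (eq. 8.1)."""
    return (pka_below + pka_above) / 2


def base_to_acid_ratio(ph, pka):
    """Henderson-Hasselbalch: [deprotonated form] / [protonated form]."""
    return 10 ** (ph - pka)


# pKa values as quoted in the text
glycine = isoionic_point(2.3, 9.6)         # pKa1 and pKa2
aspartic_acid = isoionic_point(2.1, 3.9)   # pKa1 and pKa2
lysine = isoionic_point(9.0, 10.5)         # pKa2 and pKa3

print(f"glycine       {glycine:.2f}")
print(f"aspartic acid {aspartic_acid:.2f}")
print(f"lysine        {lysine:.2f}")

# Ratio of zwitterion to cation for glycine at pH 3.3 (one unit above pKa1)
print(f"zwitterion/cation at pH 3.3: {base_to_acid_ratio(3.3, 2.3):.1f}")
```

For glycine the mean is 5.95, which the text rounds to 6.0; for lysine it is 9.75.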

As an alternative to possessing a second amino or carboxyl group, an amino acid side chain may contain in the R of the general formula a quite different chemical group that is also capable of ionising at a characteristic pH. Such groups include a phenolic group (tyrosine), guanidino group (arginine), imidazolyl group (histidine) and sulphydryl group (cysteine) (Table 8.2). It is clear that the state of ionisation of the main groups of amino acids (acidic, basic, neutral) will be grossly different at a particular pH. Moreover, even within a given group there will be minor differences due to the precise nature of the R group. These differences are exploited in the electrophoretic and ion-exchange chromatographic separation of mixtures of amino acids such as those present in a protein hydrolysate (Section 8.4.2).

Proteins are formed by the condensation of the α-amino group of one amino acid with the α-carboxyl of the adjacent amino acid (Section 8.2). With the exception of the two terminal amino acids, therefore, the α-amino and carboxyl groups are all involved in peptide bonds and are no longer ionisable in the protein. Amino, carboxyl, imidazolyl, guanidino, phenolic and sulphydryl groups in the side chains are, however, free to ionise and of course there will be many of these. Proteins fold in such a manner that the majority of these ionisable groups are on the outside of the molecule, where they can interact with the surrounding aqueous medium. Some of these groups are located within the structure and may be involved in electrostatic attractions that help to stabilise the three-dimensional structure of the protein molecule. The relative numbers of positive and negative groups in a protein molecule influence aspects of its physical behaviour, such as solubility and electrophoretic mobility. The isoionic point of a protein and its isoelectric point, unlike that of an amino acid, are generally not identical.
This is because, by definition, the isoionic point is the pH at which the protein molecule possesses an equal number of positive and negative groups formed by the association of basic groups with protons and dissociation of acidic groups, respectively. In contrast, the isoelectric point is the pH at which the protein is electrophoretically immobile. In order to determine electrophoretic mobility experimentally, the protein must be dissolved in a buffered medium containing anions and cations, of low relative molecular mass, that are capable of binding to the multi-ionised protein. Hence the observed balance of charges at the isoelectric point could be due in


Table 8.2 Ionisable groups found in proteins

Amino acid group         pH-dependent ionisation                                   Approx. pKa
N-terminal α-amino       –NH3⁺ ⇌ –NH2 + H⁺                                         8.0
C-terminal α-carboxyl    –COOH ⇌ –COO⁻ + H⁺                                        3.0
Asp-β-carboxyl           –CH2COOH ⇌ –CH2COO⁻ + H⁺                                  3.9
Glu-γ-carboxyl           –(CH2)2COOH ⇌ –(CH2)2COO⁻ + H⁺                            4.1
His-imidazolyl           –CH2–imidazolium (ring NH⁺) ⇌ –CH2–imidazole (ring N) + H⁺   6.0
Cys-sulphydryl           –CH2SH ⇌ –CH2S⁻ + H⁺                                      8.4
Tyr-phenolic             –C6H4–OH ⇌ –C6H4–O⁻ + H⁺                                  10.1
Lys-ε-amino              –(CH2)4NH3⁺ ⇌ –(CH2)4NH2 + H⁺                             10.3
Arg-guanidino            –NH–C(=NH2⁺)–NH2 ⇌ –NH–C(=NH)–NH2 + H⁺                    12.5

part to there being more bound mobile anions (or cations) than bound cations (anions) at this pH. This could mask an imbalance of charges on the actual protein. In practice, protein molecules are always studied in buffered solutions, so it is the isoelectric point that is important. It is the pH at which, for example, the protein has minimum solubility, since it is the point at which there is the greatest opportunity for attraction between oppositely charged groups of neighbouring molecules and consequent aggregation and easy precipitation.
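The way in which the balance of positive and negative groups shifts with pH can be sketched numerically. The example below is an illustration only, not a method described in the text: it applies the Henderson–Hasselbalch relationship to each ionisable group using the approximate pKa values of Table 8.2, sums the fractional charges, and finds the pH of zero net charge by bisection. It assumes every group keeps its tabulated pKa regardless of its environment, that all cysteines are free, and that no buffer ions are bound, so the result approximates a theoretical isoionic point rather than an experimentally measured isoelectric point. The function names and one-letter sequence input are hypothetical.

```python
# Approximate pKa values from Table 8.2, keyed by one-letter code
POSITIVE_GROUPS = {"N-term": 8.0, "H": 6.0, "K": 10.3, "R": 12.5}
NEGATIVE_GROUPS = {"C-term": 3.0, "D": 3.9, "E": 4.1, "C": 8.4, "Y": 10.1}


def net_charge(sequence, ph):
    """Sum the fractional charges of the termini and ionisable side chains."""
    groups = ["N-term", "C-term"] + list(sequence)
    charge = 0.0
    for group in groups:
        if group in POSITIVE_GROUPS:
            # fraction of the group that is protonated (carries +1)
            charge += 1.0 / (1.0 + 10 ** (ph - POSITIVE_GROUPS[group]))
        elif group in NEGATIVE_GROUPS:
            # fraction of the group that is deprotonated (carries -1)
            charge -= 1.0 / (1.0 + 10 ** (NEGATIVE_GROUPS[group] - ph))
    return charge


def zero_charge_ph(sequence, low=0.0, high=14.0, tolerance=1e-4):
    """Bisect for the pH at which the calculated net charge is zero."""
    while high - low > tolerance:
        middle = (low + high) / 2
        if net_charge(sequence, middle) > 0:
            low = middle   # still net positive: the zero lies at higher pH
        else:
            high = middle
    return (low + high) / 2


for peptide in ("GG", "DEG", "KRG"):
    print(peptide, round(zero_charge_ph(peptide), 2))
```

Bisection is adequate here because the calculated net charge falls steadily as the pH rises, so there is exactly one crossing point between pH 0 and pH 14.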

8.2 PROTEIN STRUCTURE

Proteins are formed by condensing the α-amino group of one amino acid or the imino group of proline with the α-carboxyl group of another, with the concomitant loss of a molecule of water and the formation of a peptide bond.

$$
\mathrm{H_3N^+{-}CHR{-}COO^-} \;+\; \mathrm{H_3N^+{-}CHR'{-}COO^-}
\;\xrightarrow{\;-\mathrm{H_2O}\;}\;
\mathrm{H_3N^+{-}CHR{-}}\underbrace{\mathrm{CO{-}NH}}_{\text{peptide bond}}\mathrm{{-}CHR'{-}COO^-}
$$


The progressive condensation of many molecules of amino acids gives rise to an unbranched polypeptide chain. By convention, the N-terminal amino acid is taken as the beginning of the chain and the C-terminal amino acid as the end of the chain (proteins are biosynthesised in this direction). Polypeptide chains contain between 20 and 2 000 amino acid residues and hence have a relative molecular mass ranging between about 2 000 and 200 000. Many proteins have a relative molecular mass in the range 20 000 to 100 000. The distinction between a large peptide and a small protein is not clear. Generally, chains of amino acids containing fewer than 50 residues are referred to as peptides, and those with more than 50 are referred to as proteins. Most proteins contain many hundreds of amino acids (ribonuclease is an extremely small protein with only 103 amino acid residues) and many biologically active peptides contain 20 or fewer amino acids, for example oxytocin (9 amino acid residues), vasopressin (9), enkephalins (5), gastrin (17), somatostatin (14) and luteinising hormone (10).

The primary structure of a protein defines the sequence of the amino acid residues and is dictated by the base sequence of the corresponding gene(s). Indirectly, the primary structure also defines the amino acid composition (which of the possible 20 amino acids are actually present) and content (the relative proportions of the amino acids present). The peptide bonds linking the individual amino acid residues in a protein are both rigid and planar, with no opportunity for rotation about the carbon–nitrogen bond, as it has considerable double bond character due to the delocalisation of the lone pair of electrons on the nitrogen atom; this, coupled with the tetrahedral geometry around each α-carbon atom, profoundly influences the three-dimensional arrangement which the polypeptide chain adopts.

Secondary structure defines the localised folding of a polypeptide chain due to hydrogen bonding. It includes structures such as the α-helix and β-pleated sheet. Certain of the 20 amino acids found in proteins, including proline, isoleucine, tryptophan and asparagine, disrupt α-helical structures. Some proteins have up to 70% secondary structure but others have none.

Tertiary structure defines the overall folding of a polypeptide chain. It is stabilised by electrostatic attractions between oppositely charged ionic groups (–NH3⁺, –COO⁻), by weak van der Waals forces, by hydrogen bonding, hydrophobic interactions and, in some proteins, by disulphide (–S–S–) bridges formed by the oxidation of spatially adjacent sulphydryl groups (–SH) of cysteine residues (Fig. 8.1). The three-dimensional folding of polypeptide chains is such that the interior consists predominantly of non-polar, hydrophobic amino acid residues such as valine, leucine and phenylalanine. The polar, ionised, hydrophilic residues are found on the outside of the molecule, where they are compatible with the aqueous environment. However, some proteins also have hydrophobic residues on their outside and the presence of these residues is important in the processes of ammonium sulphate fractionation and hydrophobic interaction chromatography (Section 8.3.4).

Quaternary structure is restricted to oligomeric proteins, which consist of the association of two or more polypeptide chains held together by electrostatic attractions, hydrogen bonding, van der Waals forces and occasionally disulphide bridges. Thus disulphide bridges may exist within a given polypeptide chain (intra-chain) or


[Fig. 8.1 The formation of a disulphide bridge. Two cysteine sulphydryl groups (–SH) in juxtaposition in the same or different peptide chain(s) are oxidised to form a disulphide bridge (–S–S–).]

linking different chains (inter-chain). An individual polypeptide chain in an oligomeric protein is referred to as a subunit. The subunits in a protein may be identical or different: for example, haemoglobin consists of two α- and two β-chains, and lactate dehydrogenase of four (virtually) identical chains.

Traditionally, proteins are classified into two groups – globular and fibrous. The former are approximately spherical in shape, are generally water soluble and may contain a mixture of α-helix, β-pleated sheet and random structures. Globular proteins include enzymes, transport proteins and immunoglobulins. Fibrous proteins are structural proteins, generally insoluble in water, consisting of long cable-like structures built entirely of either helical or sheet arrangements. Examples include hair keratin, silk fibroin and collagen. The native state of a protein is its biologically active form.

The process of protein denaturation results in the loss of biological activity, decreased aqueous solubility and increased susceptibility to proteolytic degradation. It can be brought about by heat and by treatment with reagents such as acids and alkalis, detergents, organic solvents and heavy-metal cations such as mercury and lead. It is associated with the loss of organised (tertiary) three-dimensional structure and exposure to the aqueous environment of numerous hydrophobic groups previously located within the folded structure.

In enzymes, the specific three-dimensional folding of the polypeptide chain(s) results in the juxtaposition of certain amino acid residues that constitute the active site or catalytic site. Oligomeric enzymes may possess several such sites. Many enzymes also possess one or more regulatory site(s). X-ray crystallography studies have revealed that the active site is often located in a cleft that is lined with hydrophobic amino acid residues but which contains some polar residues.
The binding of the substrate at the catalytic site and the subsequent conversion of substrate to product involves different amino acid residues. Some oligomeric enzymes exist in multiple forms called isoenzymes or isozymes (Section 15.1.2). Their existence relies on the presence of two genes that give similar but not identical subunits. One of the best-known examples of isoenzymes is lactate dehydrogenase, which reversibly interconverts pyruvate and lactate. It is a tetramer and exists in five forms (LDH1 to 5) corresponding to the five permutations of arranging the two types of subunits (H and M), which differ only in a single amino acid substitution, into a tetramer:


Isoenzyme   Subunit composition
LDH1        H4
LDH2        H3M
LDH3        H2M2
LDH4        HM3
LDH5        M4
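The count of five isoenzymes is simply the number of ways of choosing four subunits from two types when the order of the subunits within the tetramer is ignored. A minimal sketch of that enumeration follows; the function name is illustrative only.

```python
from itertools import combinations_with_replacement


def tetramer_compositions(subunit_types="HM", size=4):
    """List the distinct subunit compositions of an oligomer, ignoring order."""
    names = []
    for combo in combinations_with_replacement(subunit_types, size):
        name = ""
        for subunit in subunit_types:
            count = combo.count(subunit)
            if count == 1:
                name += subunit
            elif count > 1:
                name += f"{subunit}{count}"
        names.append(name)
    return names


for number, composition in enumerate(tetramer_compositions(), start=1):
    print(f"LDH{number}: {composition}")
```

With two subunit types and four positions there are 4 + 1 = 5 compositions, matching LDH1 to LDH5.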

Each isoenzyme promotes the same reaction but has different kinetic constants (Km, Vmax), thermal stability and electrophoretic mobility. The tissue distribution of isoenzymes within an organism is frequently different, for example, in humans LDH1 is the dominant isoenzyme in heart muscle but LDH5 is the most abundant form in liver and muscle. These differences are exploited in diagnostic enzymology to identify specific organ damage, for example following myocardial infarction, and thereby aiding clinical diagnosis and prognosis.

8.2.1 Post-translational modifications

Proteins are synthesised at the ribosome and as the growing polypeptide chain emerges from the ribosome it folds up into its native three-dimensional structure. However, this is often not the final active form of the protein. Many proteins undergo modifications once they leave the ribosome, where one or more amino acid side chains are modified by the addition of a further chemical group; this is referred to as post-translational modification. Such changes include extensive modifications of the protein structure, for example the addition of chains of carbohydrates to form glycoproteins (see Section 8.4.4), where in some cases the final protein consists of more than 40% carbohydrate. Less dramatic, but equally important, modifications include the addition of a hydroxyl group to proline to produce hydroxyproline (found in the structure of collagen), or the phosphorylation of one or more amino acids (tyrosine, serine and threonine residues are all capable of being phosphorylated). Many cases are known, for example, where the addition of a single phosphate group (by enzymes known as kinases) can activate a protein molecule, and the subsequent removal of the phosphate group (by a phosphatase) can inactivate the molecule; protein phosphorylation reactions are a central part of intracellular signalling. Another example can be found in the post-translational modification of proline residues in the transcription factor HIF (the α subunit of the hypoxia-inducible factor), which is a key oxygen-sensing mechanism in cells. Many proteins therefore are not in their final active, biological form until post-translational modifications have taken place. Over 200 different post-translational modifications have been reported for proteins from microbial, plant and animal sources. Mass spectrometry is used to determine such modifications (see Section 9.5.5).

8.3 PROTEIN PURIFICATION 8.3.1 Introduction At first sight, the purification of one protein from a cell or tissue homogenate that will typically contain 10 000–20 000 different proteins, seems a daunting task. However,


in practice, on average, only four different fractionation steps are needed to purify a given protein. Indeed, in exceptional circumstances proteins have been purified in a single chromatographic step. Since the reason for purifying a protein is normally to provide material for structural or functional studies, the final degree of purity required depends on the purposes for which the protein will be used, i.e. you may not need a protein sample that is 100% pure for your studies. Indeed, to define what is meant by a ‘pure protein’ is not easy. Theoretically, a protein is pure when a sample contains only a single protein species, although in practice it is more or less impossible to achieve 100% purity. Fortunately, many studies on proteins can be carried out on samples that contain as much as 5–10% or more contamination with other proteins. This is an important point, since each purification step necessarily involves loss of some of the protein you are trying to purify. An extra (and unnecessary) purification step that increases the purity of your sample from, say, 90% to 98% may mean that you now have a purer protein, but insufficient protein for your studies. Better to have studied the sample that was 90% pure and have enough to work on! For example, a 90% pure protein is sufficient for amino acid sequence determination studies as long as the sequence is analysed quantitatively to ensure that the deduced sequence does not arise from a contaminant protein. Similarly, immunisation of a rodent to provide spleen cells for monoclonal antibody production (Section 7.2.2) can be carried out with a sample that is considerably less than 50% pure. As long as your protein of interest raises an immune response it matters not at all that antibodies are also produced against the contaminating proteins. For kinetic studies on an enzyme, a relatively impure sample can be used provided it does not contain any competing activities.
On the other hand, if you are raising a monospecific polyclonal antibody in an animal (see Section 7.2.1), it is necessary to have a highly purified protein as antigen, otherwise immunogenic contaminating proteins will give rise to additional antibodies. Equally, proteins that are to have a therapeutic use must be extremely pure to satisfy regulatory (safety) requirements. Clearly, therefore, the degree of purity required depends on the purpose for which the protein is needed.
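The cost of an additional purification step can be made concrete by multiplying the fractional recoveries of the individual steps. The sketch below is purely illustrative: the 80% recovery per step is a hypothetical figure chosen for the example, not a value taken from the text.

```python
def cumulative_yield(step_yields):
    """Overall fraction of the target protein recovered after all steps."""
    overall = 1.0
    for fraction in step_yields:
        overall *= fraction
    return overall


# Hypothetical recoveries: four steps, then a fifth, each retaining 80%
four_steps = cumulative_yield([0.8] * 4)
five_steps = cumulative_yield([0.8] * 5)
print(f"after four steps: {four_steps:.0%} of the starting protein")
print(f"after five steps: {five_steps:.0%} of the starting protein")
```

Under this assumption, a fifth step reduces the recovered protein from about 41% to about 33% of the starting amount, which is the trade-off against purity described above.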

8.3.2 The determination of protein concentration The need to determine protein concentration in solution is a routine requirement during protein purification. The only truly accurate method for determining protein concentration is to acid hydrolyse a portion of the sample and then carry out amino acid analysis on the hydrolysate (see Section 8.4.2). However, this is relatively time-consuming, particularly if multiple samples are to be analysed. Fortunately, in practice, one rarely needs decimal place accuracy and other, quicker methods that give a reasonably accurate assessment of protein concentrations of a solution are acceptable. Most of these (see below) are colorimetric methods, where a portion of the protein solution is reacted with a reagent that produces a coloured product. The amount of this coloured product is then measured spectrophotometrically and the amount of colour related to the amount of protein present by appropriate calibration. However, none of these methods is absolute,


since, as will be seen below, the development of colour is often at least partly dependent on the amino acid composition of the protein(s). The presence of prosthetic groups (e.g. carbohydrate) also influences colorimetric assays. Many workers prepare a standard calibration curve using bovine serum albumin (BSA), chosen because of its low cost, high purity and ready availability. However, it should be understood that, since the amino acid composition of BSA will differ from the composition of the sample being tested, any concentration values deduced from the calibration graph can only be approximate.

Ultraviolet absorption

The aromatic amino acid residues tyrosine and tryptophan in a protein exhibit an absorption maximum at a wavelength of 280 nm. Since the proportions of these aromatic amino acids in proteins vary, so too do extinction coefficients for individual proteins. However, for most proteins the extinction coefficient lies in the range 0.4–1.5; so for a complex mixture of proteins it is a fair approximation to say that a solution with an absorbance at 280 nm (A280) of 1.0, using a 1 cm pathlength, has a protein concentration of approximately 1 mg cm⁻³. The method is relatively sensitive, being able to measure protein concentrations as low as 10 μg cm⁻³, and, unlike colorimetric methods, is non-destructive, i.e. having made the measurement, the sample in the cuvette can be recovered and used further. This is particularly useful when one is working with small amounts of protein and cannot afford to waste any. However, the method is subject to interference by the presence of other compounds that absorb at 280 nm. Nucleic acids fall into this category, having an absorbance as much as 10 times that of protein at this wavelength. Hence the presence of only a small percentage of nucleic acid can greatly influence the absorbance at this wavelength. However, if the absorbances (A) at 280 and 260 nm wavelengths are measured it is possible to apply a correction factor:

$$
\text{Protein (mg cm}^{-3}\text{)} = 1.55\,A_{280} - 0.76\,A_{260}
$$

The great advantage of this protein assay is that it is non-destructive and can be measured continuously, for example in chromatographic column effluents. Even greater sensitivity can be obtained by measuring the absorbance of ultraviolet light by peptide bonds. The peptide bond absorbs strongly in the far ultraviolet, with a maximum at about 190 nm. However, because of the difficulties caused by the absorption by oxygen and the low output of conventional spectrophotometers at this wavelength, measurements are usually made at 205 or 210 nm. Most proteins have an extinction coefficient for a 1 mg cm⁻³ solution of about 30 at 205 nm and about 20 at 210 nm. Clearly therefore measuring at these wavelengths is 20 to 30 times more sensitive than measuring at 280 nm, and protein concentration can be measured to less than 1 μg cm⁻³. However, one disadvantage of working at these lower wavelengths is that a number of buffers and other buffer components commonly used in protein studies also absorb strongly at these wavelengths, so it is not always practical to work in this region. Nowadays all purpose-built column chromatography systems (e.g. fast protein liquid chromatography and high-performance liquid chromatography (HPLC)) have


in-line variable wavelength ultraviolet light detectors that monitor protein elution from columns.

Lowry (Folin–Ciocalteau) method

In the past this has been the most commonly used method for determining protein concentration, although it is tending to be replaced by the more sensitive methods described below. The Lowry method is reasonably sensitive, detecting down to 10 μg cm⁻³ of protein, and the sensitivity is moderately constant from one protein to another. When the Folin reagent (a mixture of sodium tungstate, molybdate and phosphate), together with a copper sulphate solution, is mixed with a protein solution, a blue-purple colour is produced which can be quantified by its absorbance at 660 nm. As with most colorimetric assays, care must be taken that other compounds that interfere with the assay are not present. For the Lowry method these include Tris, zwitterionic buffers such as Pipes and Hepes, and EDTA. The method is based on both the Biuret reaction, where the peptide bonds of proteins react with Cu²⁺ under alkaline conditions producing Cu⁺, which reacts with the Folin reagent, and the Folin–Ciocalteau reaction, which is poorly understood but essentially involves the reduction of phosphomolybdotungstate to hetero-polymolybdenum blue by the copper-catalysed oxidation of aromatic amino acids. The resultant strong blue colour is therefore partly dependent on the tyrosine and tryptophan content of the protein sample.

The bicinchoninic acid method

This method is similar to the Lowry method in that it also depends on the conversion of Cu²⁺ to Cu⁺ under alkaline conditions. The Cu⁺ is then detected by reaction with bicinchoninic acid (BCA) to give an intense purple colour with an absorbance maximum at 562 nm. The method is more sensitive than the Lowry method, being able to detect down to 0.5 μg protein cm⁻³, but perhaps more importantly it is generally more tolerant of the presence of compounds that interfere with the Lowry assay, hence the increasing popularity of the method.

The Bradford method

This method relies on the binding of the dye Coomassie Brilliant Blue to protein. At low pH the free dye has absorption maxima at 470 and 650 nm, but when bound to protein has an absorption maximum at 595 nm. The practical advantages of the method are that the reagent is simple to prepare and that the colour develops rapidly and is stable. Although it is sensitive down to 20 μg protein cm⁻³, it is only a relative method, as the amount of dye binding appears to vary with the content of the basic amino acids arginine and lysine in the protein. This makes the choice of a standard difficult. In addition, many proteins will not dissolve properly in the acidic reaction medium.

Kjeldahl analysis

This is a general chemical method for determining the nitrogen content of any compound. It is not normally used for the analysis of purified proteins or for monitoring column fractions but is frequently used for analysing complex solid samples and microbiological samples for protein content. The sample is digested by boiling


with concentrated sulphuric acid in the presence of sodium sulphate (to raise the boiling point) and a copper and/or selenium catalyst. The digestion converts all the organic nitrogen to ammonia, which is trapped as ammonium sulphate. Completion of the digestion stage is generally recognised by the formation of a clear solution. The ammonia is released by the addition of excess sodium hydroxide and removed by steam distillation in a Markham still. It is collected in boric acid and titrated with standard hydrochloric acid using methyl red–methylene blue as indicator. It is possible to carry out the analysis automatically in an autokjeldahl apparatus. Alternatively, a selective ammonium ion electrode may be used to directly determine the content of ammonium ion in the digest. Although Kjeldahl analysis is a precise and reproducible method for the determination of nitrogen, the determination of the protein content of the original sample is complicated by the variation of the nitrogen content of individual proteins and by the presence of nitrogen in contaminants such as DNA. In practice, the nitrogen content of proteins is generally assumed to be 16% by weight.

Example 1 PROTEIN ASSAY

Question A series of dilutions of bovine serum albumin (BSA) was prepared and 0.1 cm³ of each solution subjected to a Bradford assay. The increase in absorbance at 595 nm relative to an appropriate blank was determined in each case, and the results are shown in the table.

Concentration of BSA (mg cm⁻³)   A595
1.5                              1.40
1.0                              0.97
0.8                              0.79
0.6                              0.59
0.4                              0.37
0.2                              0.17

A sample (0.1 cm³) of a protein extract from E. coli gave an A595 of 0.84 in the same assay. What was the concentration of protein in the E. coli extract?

Answer If a graph of BSA concentration against A595 is plotted it is seen to be linear. From the graph, at an A595 of 0.84 it can be seen that the protein concentration of the E. coli extract is 0.85 mg cm⁻³.
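Three of the calculations described in this section can be sketched in a few lines: the A280/A260 correction for nucleic acid contamination, reading an unknown concentration off the BSA calibration data of Example 1, and converting a Kjeldahl nitrogen value to protein using the assumed 16% nitrogen content. The function names are illustrative only, and the calibration read-off uses straight-line interpolation between the two neighbouring standards as a stand-in for reading the value from a plotted graph.

```python
def protein_from_a280_a260(a280, a260):
    """Protein (mg cm^-3) from the correction formula quoted in the text."""
    return 1.55 * a280 - 0.76 * a260


def concentration_from_standards(standards, absorbance):
    """Interpolate linearly between the two standards that bracket a reading.

    standards: list of (concentration, absorbance) pairs.
    """
    points = sorted(standards, key=lambda pair: pair[1])
    for (c_low, a_low), (c_high, a_high) in zip(points, points[1:]):
        if a_low <= absorbance <= a_high:
            fraction = (absorbance - a_low) / (a_high - a_low)
            return c_low + fraction * (c_high - c_low)
    raise ValueError("absorbance lies outside the calibrated range")


def protein_from_nitrogen(nitrogen_mass, nitrogen_fraction=0.16):
    """Protein mass from Kjeldahl nitrogen, assuming 16% nitrogen by weight."""
    return nitrogen_mass / nitrogen_fraction


# BSA standards from Example 1: (mg cm^-3, A595)
bsa = [(1.5, 1.40), (1.0, 0.97), (0.8, 0.79), (0.6, 0.59), (0.4, 0.37), (0.2, 0.17)]

print(f"E. coli extract: {concentration_from_standards(bsa, 0.84):.2f} mg cm^-3")
print(f"A280 1.0, A260 0.5: {protein_from_a280_a260(1.0, 0.5):.2f} mg cm^-3")
print(f"16 mg nitrogen: {protein_from_nitrogen(16.0):.0f} mg protein")
```

The interpolated value for the E. coli extract is close to the 0.85 mg cm⁻³ read from the graph in the worked answer. Refusing to extrapolate beyond the highest or lowest standard is a deliberate choice, since a calibration is only valid over the range actually measured.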

8.3.3 Cell disruption and production of initial crude extract
The initial step of any purification procedure must, of course, be to disrupt the starting tissue to release proteins from within the cell. The means of disrupting the tissue will depend on the cell type (see Cell disruption, below), but thought must first be given to the composition of the buffer used to extract the proteins.


Protein structure, purification, characterisation and function analysis

Extraction buffer
Normally extraction buffers are at an ionic strength (0.1–0.2 M) and pH (7.0–8.0) that is considered to be compatible with that found inside the cell. Tris or phosphate buffers are most commonly used. However, in addition a range of other reagents may be included in the buffer for specific purposes. These include:

• An anti-oxidant: Within the cell the protein is in a highly reducing environment, but when released into the buffer it is exposed to a more oxidising environment. Since most proteins contain a number of free sulphydryl groups (from the amino acid cysteine) these can undergo oxidation to give inter- and intramolecular disulphide bridges. To prevent this, reducing agents such as dithiothreitol, β-mercaptoethanol, cysteine or reduced glutathione are often included in the buffer.

• Enzyme inhibitors: Once the cell is disrupted the organisational integrity of the cell is lost, and proteolytic enzymes that were carefully packaged and controlled within the intact cells are released, for example from lysosomes. Such enzymes will of course start to degrade proteins in the extract, including the protein of interest. To slow down unwanted proteolysis, all extraction and purification steps are carried out at 4 °C, and in addition a range of protease inhibitors is included in the buffer. Each inhibitor is specific for a particular type of protease, for example serine proteases, thiol proteases, aspartic proteases and metalloproteases. Common examples of inhibitors include: di-isopropylphosphofluoridate (DFP), phenylmethyl sulphonylfluoride (PMSF) and tosylphenylalanyl-chloromethylketone (TPCK) (all serine protease inhibitors); iodoacetate and cystatin (thiol protease inhibitors); pepstatin (aspartic protease inhibitor); EDTA and 1,10-phenanthroline (metalloprotease inhibitors); and amastatin and bestatin (exopeptidase inhibitors).

• Enzyme substrate and cofactors: Low levels of substrate are often included in extraction buffers when an enzyme is purified, since binding of substrate to the enzyme active site can stabilise the enzyme during purification processes. Where relevant, cofactors that otherwise might be lost during purification are also included to maintain enzyme activity so that activity can be detected when column fractions, etc. are screened.

• EDTA: This can be present to remove divalent metal ions that can react with thiol groups in proteins giving mercaptides.

\[ \mathrm{R{-}SH + Me^{2+} \longrightarrow R{-}S{-}Me^{+} + H^{+}} \]



• Polyvinylpyrrolidone (PVP): This is often added to extraction buffers for plant tissue. Plant tissues contain considerable amounts of phenolic compounds (both monomeric, such as p-hydroxybenzoic acid, and polymeric, such as tannins) that can bind to enzymes and other proteins by non-covalent forces, including hydrophobic, ionic and hydrogen bonds, causing protein precipitation. These phenolic compounds are also easily oxidised, predominantly by endogenous phenol oxidases, to form quinones, which are highly reactive and can combine with reactive groups in proteins causing cross-linking, and further aggregation and precipitation. Insoluble PVP (which mimics the polypeptide backbone) is therefore added to adsorb the phenolic compounds, which can then be removed by centrifugation. Thiol compounds (reducing agents) are also added to minimise the activity of phenol oxidases, and thus prevent the formation of quinones.

• Sodium azide: For buffers that are going to be stored for long periods of time, antibacterial and/or antifungal agents are sometimes added at low concentrations. Sodium azide is frequently used as a bacteriostatic agent.

Membrane proteins
Membrane-bound proteins (normally glycoproteins) require special conditions for extraction as they are not released by simple cell disruption procedures alone. Two classes of membrane proteins are identified. Extrinsic (or peripheral) membrane proteins are bound only to the surface of the membrane, normally via electrostatic and hydrogen bonds. These proteins are predominantly hydrophilic in nature and are relatively easily extracted either by raising the ionic concentration of the extraction buffer (e.g. to 1 M NaCl) or by changes of pH (e.g. to pH 3–5 or pH 9–12). Once extracted, they can be purified by conventional chromatographic procedures. Intrinsic membrane proteins are those that are embedded in the membrane (integral membrane proteins). These invariably have significant regions of hydrophobic amino acids (those regions of the protein that are embedded in the membrane, and associated with lipids) and have low solubility in aqueous buffer systems. Hence, once extracted into an aqueous polar environment, appropriate conditions must be used to retain their solubility. Intrinsic proteins are usually extracted with buffer containing detergents. The choice of detergent is mainly one of trial and error but can include ionic detergents such as sodium dodecyl sulphate (SDS), sodium deoxycholate and cetyl trimethylammonium bromide (CTAB), the zwitterionic detergent CHAPS, and non-ionic detergents such as Triton X-100 and Nonidet P-40.
Once extracted, intrinsic membrane proteins can be purified using conventional chromatographic techniques such as gel filtration, ion-exchange chromatography or affinity chromatography (using lectins). However, in each case it is necessary to include detergent in all buffers to maintain protein solubility. The level of detergent used is normally 10- to 100-fold less than that used to extract the protein, in order to minimise any interference of the detergent with the chromatographic process. Cell disruption Unless one is isolating proteins from extracellular fluids such as blood, protein purification procedures necessarily start with the disruption of cells or tissue to release the protein content of the cells into an appropriate buffer. This initial extract is therefore the starting point for protein purification. Clearly one chooses, where possible, a starting material that has a high level of the protein of interest. Depending on the protein being isolated one might therefore start with a microbial culture, plant tissue, or mammalian tissue. The last of these has generally been the tissue of choice where possible, owing to the relatively large amounts of starting material available. However, the ability to clone and overexpress genes for proteins from any source, in both bacteria and yeast, means that nowadays more and more protein purification protocols are starting with a microbial lysate. The different methods available for

disrupting cells are described below. Which method one uses depends on the nature of the cell wall/membrane being disrupted.

Mammalian cells
Mammalian cells are of the order of 10 µm in diameter and enclosed by a plasma membrane, weakly supported by a cytoskeleton. These cells therefore lack any great rigidity and are easy to disrupt by shear forces.

Plant cells
Plant cells are of the order of 100 µm in diameter and have a fairly rigid cell wall, comprising carbohydrate complexes and lignin or wax that surround the plasma membrane. Although the plasma membrane is protected by this outer layer, the large size of the cell still makes it susceptible to shear forces.

Bacteria
Bacteria have cell diameters of the order of 1 to 4 µm and generally have extremely rigid cell walls. Bacteria can be classified as either Gram positive or Gram negative depending on whether or not they are stained by the Gram stain (crystal violet and iodine). In Gram-positive bacteria (Fig. 8.2) the plasma membrane is surrounded by a thick shell of peptidoglycan (20–50 nm), which stains with the Gram stain. In Gram-negative bacteria (e.g. Escherichia coli) the plasma membrane is surrounded by a thin (2–3 nm) layer of peptidoglycan but this is compensated for by having a second outer membrane of lipopolysaccharide. The negatively charged lipopolysaccharide polymers interact laterally, being linked by divalent cations such as Mg²⁺. A number of Gram-negative bacteria secrete proteins into the periplasmic space.

Fig. 8.2 The structure of the cell wall of Gram-positive and of Gram-negative bacteria. LPS, lipopolysaccharide. [Figure labels. Gram negative (E. coli): outer membrane (LPS), 7 nm; peptidoglycan, 3 nm; periplasmic space, 7 nm; plasma (cytoplasmic) membrane, 7 nm. Gram positive: peptidoglycan, 20–50 nm; plasma membrane. Gram stain = crystal violet + iodine.]

Fungi and yeast
Filamentous fungi and yeasts have a rigid cell wall that is composed mainly of polysaccharide (80–90%). In lower fungi and yeast the polysaccharides are mannan and glucan. In filamentous fungi it is chitin cross-linked with glucans. Yeasts also have a small percentage of glycoprotein in the cell wall, and there is a periplasmic space between the cell wall and cell membrane. If the cell wall is removed the cell content, surrounded by a membrane, is referred to as a spheroplast.


Cell disruption methods

Blenders
These are commercially available, although a typical domestic kitchen blender will suffice. This method is ideal for disrupting mammalian or plant tissue by shear force. Tissue is cut into small pieces and blended, in the presence of buffer, for about 1 min to disrupt the tissue, and then centrifuged to remove debris. This method is inappropriate for bacteria and yeast, but a blender can be used for these microorganisms if small glass beads are introduced to produce a bead mill. Cells are trapped between colliding beads and physically disrupted by shear forces.

Grinding with abrasives
Grinding in a pestle and mortar, in the presence of sand or alumina and a small amount of buffer, is a useful method for disrupting bacterial or plant cells; cell walls are physically ripped off by the abrasive. However, the method is appropriate for handling only relatively small samples. The Dynomill is a large-scale mechanical version of this approach. The Dynomill comprises a chamber containing glass beads and a number of rotating impeller discs. Cells are ruptured when caught between colliding beads. A 600 cm³ laboratory scale model can process 5 kg of bacteria per hour.

Presses
The use of a press such as a French Press, or the Manton–Gaulin Press, which is a larger-scale version, is an excellent means for disrupting microbial cells. A cell suspension (50 cm³) is forced by a piston-type pump, under high pressure (10 000 psi, i.e. lbf in.⁻², approximately 69 MPa) through a small orifice. Breakage occurs due to shear forces as the cells are forced through the small orifice, and also by the rapid drop in pressure as the cells emerge from the orifice, which allows the previously compressed cells to expand rapidly and effectively burst. Multiple passes are usually needed to lyse all the cells, but under carefully controlled conditions it can be possible to selectively release proteins from the periplasmic space. The X-Press and Hughes Press are variations on this method; the cells are forced through the orifice as a frozen paste, often mixed with an abrasive. Both the ice crystals and the abrasive aid in disrupting the cell walls.

Enzymatic methods
The enzyme lysozyme, isolated from hen egg whites, cleaves peptidoglycan. The peptidoglycan cell wall can therefore be removed from Gram-positive bacteria (see Fig. 8.2) by treatment with lysozyme, and if carried out in a suitable buffer, once the cell wall has been digested the cell membrane will rupture owing to the osmotic effect of the suspending buffer. Gram-negative bacteria can similarly be disrupted by lysozyme but treatment with EDTA (to remove Ca²⁺, thus destabilising the outer lipopolysaccharide layer) and the inclusion of a non-ionic detergent to solubilise the cell membrane are also needed. This effectively permeabilises the outer membrane, allowing access of the lysozyme to the peptidoglycan layer. If carried out in an isotonic medium so that the cell membrane is not ruptured, it is possible to selectively release proteins from the periplasmic space.


Yeast can be similarly disrupted using enzymes to degrade the cell wall and either osmotic shock or mild physical force to disrupt the cell membrane. Enzyme digestion alone allows the selective release of proteins from the periplasmic space. The two most commonly used enzyme preparations for yeast are zymolyase and lyticase, both of which have β-1,3-glucanase activity as their major activity, together with a proteolytic activity specific for the yeast cell wall. Chitinase is commonly used to disrupt filamentous fungi. Enzymic methods tend to be used for laboratory-scale work, since for large-scale work their use is limited by cost.

Sonication
This method is ideal for a suspension of cultured cells or microbial cells. A sonicator probe is lowered into the suspension of cells and high frequency sound waves (>20 kHz) generated for 30–60 s. These sound waves cause disruption of cells by shear force and cavitation. Cavitation refers to areas where there is alternate compression and rarefaction, which rapidly interchange. The gas bubbles in the buffer are initially under pressure but, as they decompress, shock waves are released and disrupt the cells. This method is suitable for relatively small volumes (50–100 cm³). Since considerable heat is generated by this method, samples must be kept on ice during treatment.

8.3.4 Fractionation methods

Monitoring protein purification
As will be seen below, the purification of a protein invariably involves the application of one or more column chromatographic steps, each of which generates a relatively large number of test tubes (fractions) containing buffer and protein eluted from the column. It is necessary to determine how much protein is present in each tube so that an elution profile (a plot of protein concentration versus tube number) can be produced. Appropriate methods for detecting and quantifying protein in solution are described in Section 8.3.2. A method is also required for determining which tubes contain the protein of interest so that their contents can be pooled and the pooled sample progressed to the next purification step. If one is purifying an enzyme, this is relatively easy as each tube simply has to be assayed for the presence of enzyme activity. For proteins that have no easily measured biological activity, other approaches have to be used. If an antibody to the protein of interest is available then samples from each tube can be dried onto nitrocellulose and the antibody used to detect the protein-containing fractions using the dot blot method (Section 5.9.2). Alternatively, an immunoassay such as ELISA or radioimmunoassay (Section 7.3.1) can be used to detect the protein. If an antibody is not available, then portions from each fraction can be run on a sodium dodecyl sulphate–polyacrylamide gel and the protein-containing fraction identified from the appearance of the protein band of interest on the gel (Section 10.3.1).

An alternative approach that can be used for cloned genes that are expressed in cells is to express the protein as a fusion protein, i.e. one that is linked via a short peptide sequence to a second protein. This can have advantages for protein purification (see Section 8.3.5). However, it can also prove extremely useful for monitoring the purification of a protein that has no easily measurable activity. If the second protein is an enzyme that can be easily assayed (e.g. using a simple colorimetric assay), such as β-galactosidase, then the presence of the protein of interest can be detected by the presence of the linked β-galactosidase activity.

A successful fractionation step is recognised by an increase in the specific activity of the sample, where the specific activity of the enzyme relates its total activity to the total amount of protein present in the preparation:

\[ \text{specific activity} = \frac{\text{total units of enzyme in fraction}}{\text{total amount of protein in fraction}} \]

The measurement of units of an enzyme relies on an appreciation of certain basic kinetic concepts and upon the availability of a suitable analytical procedure. These are discussed in Section 15.2.2. The amount of enzyme present in a particular fraction is expressed conventionally not in terms of units of mass or moles but in terms of units based upon the rate of the reaction that the enzyme promotes. The international unit (IU) of an enzyme is defined as the amount of enzyme that will convert 1 µmole of substrate to product in 1 minute under defined conditions (generally 25 or 30 °C at the optimum pH). The SI unit of enzyme activity is defined as the amount of enzyme that will convert 1 mole of substrate to product in 1 second. It has units of katal (kat) such that 1 kat = 6 × 10⁷ IU and 1 IU = 1.7 × 10⁻⁸ kat. For some enzymes, especially those where the substrate is a macromolecule of unknown relative molecular mass (e.g. amylase, pepsin, RNase, DNase), it is not possible to define either of these units. In such cases arbitrary units are generally used that are based upon some observable change in a chemical or physical property of the substrate. For a purification step to be successful, therefore, the specific activity of the protein must be greater after the purification step than it was before. This increase is best represented as the fold purification:

\[ \text{fold purification} = \frac{\text{specific activity of fraction}}{\text{original specific activity}} \]

A significant increase in specific activity is clearly necessary for a successful purification step. However, another important factor is the yield of the step. It is no use having an increased specific activity if you lose 95% of the protein you are trying to purify. Yield is defined as follows:

\[ \text{yield} = \frac{\text{units of enzyme in fraction}}{\text{units of enzyme in original preparation}} \]
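As a minimal sketch of how the three definitions above (specific activity, fold purification and yield) and the IU/katal relationship are applied, the following Python helpers evaluate them for the homogenate and ammonium sulphate rows of Table 8.3. The function names are hypothetical and chosen only for illustration; the numbers are the total activity and total protein values quoted in that table.

```python
IU_PER_KATAL = 6e7  # 1 kat = 1 mol s^-1 = 6 x 10^7 micromol min^-1


def specific_activity(total_units, total_protein_mg):
    """Units of enzyme per mg of protein in a fraction."""
    return total_units / total_protein_mg


def fold_purification(sa_fraction, sa_original):
    """Ratio of a fraction's specific activity to that of the starting material."""
    return sa_fraction / sa_original


def percent_yield(units_fraction, units_original):
    """Percentage of the original enzyme units recovered in a fraction."""
    return 100.0 * units_fraction / units_original


def iu_to_katal(iu):
    """Convert international units to katal."""
    return iu / IU_PER_KATAL


# Table 8.3: homogenate and 45%-70% ammonium sulphate fraction
sa_homogenate = specific_activity(15300, 340000)
sa_salt_cut = specific_activity(12350, 103000)

print(round(sa_homogenate, 3))                                 # 0.045
print(round(fold_purification(sa_salt_cut, sa_homogenate), 1)) # 2.7
print(round(percent_yield(12350, 15300)))                      # 81
```

The printed values match the specific activity, purification factor and overall yield listed for these two fractions in Table 8.3.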

A yield of 70% or more in any purification step would normally be considered as acceptable. Table 8.3 shows how yield and specific activity vary during a purification schedule. Preliminary purification steps The initial extract, produced by the disruption of cells and tissue, and referred to at this stage as a homogenate, will invariably contain insoluble matter. For example, for mammalian tissue there will be incompletely homogenised connective and/or vascular tissue, and small fragments of non-homogenised tissue. This is most easily removed by filtering through a double layer of cheesecloth or by low speed (5 000 g)


Example 2 ENZYME FRACTIONATION

Question
A tissue homogenate was prepared from pig heart tissue as the first step in the preparation of the enzyme aspartate aminotransferase (AAT). Cell debris was removed by filtration and nucleic acids removed by treatment with polyethyleneimine, leaving a total extract (solution A) of 2 dm³. A sample of this extract (50 mm³) was added to 3 cm³ of buffer in a 1 cm pathlength cuvette and the absorbance at 280 nm shown to be 1.7.

(i) Determine the approximate protein concentration in the extract, and hence the total protein content of the extract.
(ii) One unit of AAT enzyme activity is defined as the amount of enzyme in 3 cm³ of substrate solution that causes an absorbance change at 260 nm of 0.1 min⁻¹. To determine enzyme activity, 100 mm³ of extract was added to 3 cm³ of substrate solution and an absorbance change of 0.08 min⁻¹ was recorded. Determine the number of units of AAT activity present per cm³ of extract A, and hence the total number of enzyme units in the extract.
(iii) The initial extract (solution A) was then subjected to ammonium sulphate fractionation. The fraction precipitating between 50% and 70% saturation was collected and redissolved in 120 cm³ of buffer (solution B). Solution B (5 mm³ (0.005 cm³)) was added to 3 cm³ of buffer and the absorbance at 280 nm determined to be 0.89 using a 1 cm pathlength cuvette. Determine the protein concentration, and hence total protein content, of solution B.
(iv) Solution B (20 mm³) was used to assay for AAT activity and an absorbance change of 0.21 min⁻¹ at 260 nm was recorded. Determine the number of AAT units cm⁻³ in solution B and hence the total number of enzyme units in solution B.
(v) From your answers to (i) to (iv), determine the specific activity of AAT in both solutions A and B.
(vi) From your answers to question (v), determine the fold purification achieved by the ammonium sulphate fractionation step.
(vii) Finally, determine the yield of AAT following the ammonium sulphate fractionation step.

Answer

(i) Assuming the approximation that a 1 mg protein cm⁻³ solution has an absorbance of 1.0 at 280 nm using a 1 cm pathlength cell, then we can deduce that the protein concentration in the cuvette is approximately 1.7 mg cm⁻³. Since 50 mm³ (0.05 cm³) of solution A was added to 3.0 cm³ then the solution A sample had been diluted by a factor of 3.05/0.05 = 61. Therefore the protein concentration of solution A is 61 × 1.7 mg cm⁻³ = 104 mg cm⁻³. Since there is 2 dm³ (2000 cm³) of solution A, the total amount of protein in solution A is 2000 × 104 = 208 000 mg or 208 g.
(ii) Since one enzyme unit causes an absorbance change of 0.1 per minute, there were 0.08/0.1 = 0.8 enzyme units in the cuvette. These 0.8 enzyme units came from the 100 mm³ of solution A that was added to the cuvette. Therefore in 100 mm³ of solution A there are 0.8 enzyme units, and in 1 cm³ of solution A there are 8.0 enzyme units. Since we have 2000 cm³ of solution A there is a total of 2000 × 8.0 = 16 000 enzyme units in solution A.
(iii) Using the same approach as in Example 2(i), the protein concentration of solution B is (3.005/0.005) × 0.89 = 601 × 0.89 = 535 mg cm⁻³. Therefore the total protein present in solution B = 120 × 535 = 64 200 mg.
(iv) Using the same approach as in Example 2(ii), there are 0.21/0.1 = 2.1 units of enzyme activity in the cuvette. These units came from the 20 mm³ that was added to the cell. Therefore, 20 mm³ (0.020 cm³) of solution B contains 2.1 enzyme units. Thus, 1 cm³ of solution B contains (1.0/0.02) × 2.1 = 105 units. Therefore, solution B has 105 units cm⁻³. Since there are 120 cm³ of solution B, total units in solution B = 120 × 105 = 12 600.
(v) For solution A, specific activity = 16 000/208 000 = 0.0769 units mg⁻¹. For solution B, specific activity = 12 600/64 200 = 0.196 units mg⁻¹.
(vi) Fold purification = 0.196/0.0769 = 2.55 (approx.).
(vii) Yield = (12 600/16 000) × 100% = 79%.
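The arithmetic of Example 2 can be checked with a short script. This is a sketch under the same approximation used in the answer (an A280 of 1.0 in a 1 cm cell corresponds to about 1 mg protein cm⁻³); the helper names are hypothetical. Because the script carries unrounded intermediate values, its results differ slightly from the worked answer, which rounds 103.7 to 104 mg cm⁻³ and 534.9 to 535 mg cm⁻³.

```python
def protein_mg_per_cm3(a280, sample_cm3, buffer_cm3):
    """Protein concentration of the undiluted sample (mg cm^-3).

    Assumes A280 = 1.0 corresponds to 1 mg protein cm^-3 (1 cm cell).
    """
    dilution = (sample_cm3 + buffer_cm3) / sample_cm3
    return a280 * dilution


def units_per_cm3(delta_a_per_min, sample_cm3, delta_a_per_unit=0.1):
    """Enzyme units per cm^3 of sample; one unit gives 0.1 absorbance change per min."""
    return (delta_a_per_min / delta_a_per_unit) / sample_cm3


# Solution A: 2000 cm^3; 0.05 cm^3 in 3 cm^3 buffer for A280; 0.1 cm^3 assayed
conc_a = protein_mg_per_cm3(1.7, 0.05, 3.0)
protein_total_a = conc_a * 2000
units_total_a = units_per_cm3(0.08, 0.1) * 2000

# Solution B: 120 cm^3; 0.005 cm^3 in 3 cm^3 buffer for A280; 0.02 cm^3 assayed
conc_b = protein_mg_per_cm3(0.89, 0.005, 3.0)
protein_total_b = conc_b * 120
units_total_b = units_per_cm3(0.21, 0.02) * 120

sa_a = units_total_a / protein_total_a
sa_b = units_total_b / protein_total_b
fold = sa_b / sa_a
yield_pct = 100.0 * units_total_b / units_total_a

print(round(conc_a, 1), round(conc_b, 1))  # 103.7 534.9
print(round(sa_a, 3), round(sa_b, 3))      # 0.077 0.196
print(round(fold, 2), round(yield_pct, 1)) # 2.54 78.8
```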

centrifugation. Any fat floating on the surface can be removed by coarse filtration through glass wool or cheesecloth. However, the solution will still be cloudy with organelles and membrane fragments that are too small to be conveniently removed by filtration or low speed centrifugation. These may not be much of a problem as they will often be lost in the preliminary stages of protein purification, for example during salt fractionation. However, if necessary they can be removed first by precipitation using materials such as Celite (a diatomaceous earth that provides a large surface area to trap the particles), Cell Debris Remover (CDR, a cellulose-based absorber), or any number of flocculants such as starch, gums, tannins or polyamines, the resultant precipitate being removed by centrifugation or filtration. It is tempting to assume that the cell extract contains only protein, but of course a range of other molecules is present such as DNA, RNA, carbohydrate and lipid as well as any number of small molecular weight metabolites. Small molecules tend to be removed later on during dialysis steps or steps that involve fractionation based on size (e.g. gel filtration) and therefore are of little concern. However, specific attention has to be paid at this stage to macromolecules such as nucleic acids and polysaccharides. This is particularly true for bacterial extracts, which are particularly viscous owing to the presence of chromosomal DNA. Indeed microbial extracts can be extremely difficult to centrifuge to produce a supernatant extract. Some workers include DNase I in the extraction buffer to reduce viscosity, the small DNA fragments generated being removed at later dialysis/gel filtration steps. Likewise RNA can be removed by treatment with RNase. DNA and RNA can also be removed by precipitation with protamine

Table 8.3 Example of a protein purification schedule

Fraction                  Volume   Protein conc.   Total protein   Activity(a)   Total activity   Specific activity   Purification   Overall yield(c)
                          (cm³)    (mg cm⁻³)       (mg)            (U cm⁻³)      (U)              (U mg⁻¹)            factor(b)      (%)
Homogenate                8 500    40              340 000         1.8           15 300           0.045               1              100
45%–70% (NH4)2SO4         530      194             103 000         23.3          12 350           0.12                2.7            81
CM-cellulose              420      19.5            8 190           25            10 500           1.28                28.4           69
Affinity chromatography   48       2.2             105.6           198           9 500            88.4                1 964          62
DEAE-Sepharose            12       2.3             27.6            633           7 600            275                 6 110          50

Notes: (a) The unit of enzyme activity (U) is defined as that amount which produces 1 µmole of product per minute under standard assay conditions. (b) Defined as: purification factor = (specific activity of fraction/specific activity of homogenate). (c) Defined as: overall yield = (total activity of fraction/total activity of homogenate). Reproduced with permission from Methods in Molecular Biology, 59, Protein Purification Protocols, ed. S. Doonan (1996), Humana Press Inc., Totowa, NJ.

sulphate. Protamine sulphate is a mixture of small, highly basic (i.e. positively charged) proteins, whose natural role is to bind to DNA in the sperm head. (Protamines are usually extracted from fish organs, which are obtained as a waste product at canning factories.) These positively charged proteins bind to negatively charged phosphate groups on nucleic acids, thus masking the charged groups on the nucleic acids and rendering them insoluble. The addition of a solution of protamine sulphate to the extract therefore precipitates most of the DNA and RNA, which can subsequently be removed by centrifugation. An alternative is to use polyethyleneimine, a synthetic long chain cationic (i.e. positively charged) polymer (molecular mass 24 kDa). This also binds to the phosphate groups in nucleic acids, and is very effective, precipitating DNA and RNA almost instantly. For bacterial extracts, carbohydrate capsular gum can also be a problem as this can interfere with protein precipitation methods. This is best removed by totally precipitating the protein with ammonium sulphate (see below) leaving the gum in solution. The protein can then be recovered by centrifugation and redissolved in buffer. However, if lysozyme (plus detergent) is used to lyse the cells (see Section 8.3.3) capsular gum will not be a problem as it is digested by the lysozyme. The clarified extract is now ready for protein fractionation steps to be carried out. The concentration of the protein in this initial extract is normally quite low, and in fact the major contaminant at this stage is water! The initial purification step is frequently based on solubility methods. These methods have a high capacity, can therefore be easily applied to large volumes of initial extracts and also have the advantage of concentrating the protein sample. 
Essentially, proteins that differ considerably in their physical characteristics from the protein of interest are removed at this stage, leaving a more concentrated solution of proteins that have more closely similar physical characteristics. The next stages, therefore, involve higher resolution techniques that can separate proteins with similar characteristics. Invariably these high resolution techniques are chromatographic. Which technique to use, and in which order, is more often than not a matter of trial and error. The final research paper that describes in four pages a three-step, four-day protein purification procedure invariably belies the months of hard work that went into developing the final ‘simple’ purification protocol! All purification techniques are based on exploiting those properties by which proteins differ from one another. These different properties, and the techniques that exploit these differences, are as follows. Stability Denaturation fractionation exploits differences in the heat sensitivity of proteins. The three-dimensional (tertiary) structure of proteins is maintained by a number of forces, mainly hydrophobic interactions, hydrogen bonds and sometimes disulphide bridges. When we say that a protein is denatured we mean that these bonds have by some means been disrupted and that the protein chain has unfolded to give the insoluble, ‘denatured’ protein. One of the easiest ways to denature proteins in solution is to heat them. However, different proteins will denature at different temperatures, depending on their different thermal stabilities; this, in turn, is a measure of the number of bonds holding the tertiary structure together. If the protein of interest is particularly heat stable, then heating the extract to a temperature at which the protein is stable yet other

proteins denature can be a very useful preliminary step. The temperature at which the protein being purified is denatured is first determined by a small-scale experiment. Once this temperature is known, it is possible to remove more thermolabile contaminating proteins by heating the mixture to a temperature 5–10 °C below this critical temperature for a period of 15–30 min. The denatured, unwanted protein is then removed by centrifugation. The presence of the substrate, product or a competitive inhibitor of an enzyme often stabilises it and allows an even higher heat denaturation temperature to be employed. In a similar way, proteins differ in the ease with which they are denatured by extremes of pH (<3 and >10). The sensitivity of the protein under investigation to extreme pH is determined by a small-scale trial. The whole protein extract is then adjusted to a pH that stays at least 1 pH unit short of the value at which the test protein is precipitated. More sensitive proteins will precipitate and are removed by centrifugation.


Solubility Proteins differ in the balance of charged, polar and hydrophobic amino acids that they display on their surfaces. Charged and polar groups on the surface are solvated by water molecules, thus making the protein molecule soluble, whereas hydrophobic residues are masked by water molecules that are necessarily found adjacent to these regions. Since solubility is a consequence of solvation of charged and polar groups on the surfaces of the protein, it follows that, under a particular set of conditions, proteins will differ in their solubilities. In particular, one exploits the fact that proteins precipitate differentially from solution on the addition of species such as neutral salts or organic solvents. It should be stressed here that these methods precipitate native (i.e. active) protein that has become insoluble by aggregation; we have not denatured the protein. Salt fractionation is frequently carried out using ammonium sulphate. As increasing salt is added to a protein solution, so the salt ions are solvated by water molecules in the solution. As the salt concentration increases, freely available water molecules that can solvate the ions become scarce. At this stage those water molecules that have been forced into contact with hydrophobic groups on the surface of the protein are the next most freely available water molecules (rather than those involved in solvating polar groups on the protein surface, which are bound by electrostatic interactions and are far less easily given up) and these are therefore removed to solvate the salt molecules, thus leaving the hydrophobic patches exposed. As the ammonium sulphate concentration increases, the hydrophobic surfaces on the protein are progressively exposed. Thus revealed, these hydrophobic patches cause proteins to aggregate by hydrophobic interaction, resulting in precipitation. 
The first proteins to aggregate are therefore those with the most hydrophobic residues on the surface, followed by those with fewer hydrophobic residues. Clearly the aggregates formed are made of mixtures of more than one protein. Individual identical molecules do not seek out each other, but simply bind to another adjacent molecule with an exposed hydrophobic patch. However, many proteins are precipitated from solution over a narrow range of salt concentrations, making this a suitably simple procedure for enriching the proteins of interest.

Organic solvent fractionation is based on differences in the solubility of proteins in aqueous solutions containing water-miscible organic solvents such as ethanol, acetone and butanol. The addition of organic solvent effectively ‘dilutes out’ the water present (reduces the dielectric constant) and at the same time water molecules are used up in hydrating the organic solvent molecules. Water of solvation is therefore removed from the charged and polar groups on the surface of proteins, thus exposing their charged groups. Aggregation of proteins therefore occurs by charge (ionic) interactions between molecules. Proteins consequently precipitate in decreasing order of the number of charged groups on their surface as the organic solvent concentration is increased.

Organic polymers can also be used for the fractional precipitation of proteins. This method resembles organic solvent fractionation in its mechanism of action but requires lower concentrations to cause protein precipitation and is less likely to cause protein denaturation. The most commonly used polymer is polyethylene glycol (PEG), with a relative molecular mass in the range 6000–20 000.

The fractionation of a protein mixture using ammonium sulphate is given here as a practical example of fractional precipitation. As explained above, as increasing amounts of ammonium sulphate are dissolved in a protein solution, certain proteins start to aggregate and precipitate out of solution. Increasing the salt strength results in further, different proteins precipitating out. A controlled pilot experiment is carried out in which the percentage of ammonium sulphate is increased stepwise, say from 10% to 20% to 30% and so on, the resultant precipitate at each step being recovered by centrifugation, redissolved in buffer and analysed for the protein of interest. In this way it is possible to determine a fractionation procedure that will give a significantly purified sample. In the example shown in Table 8.3, the original homogenate was brought to 45% ammonium sulphate and the precipitate recovered and discarded. The supernatant was then brought to 70% ammonium sulphate, the precipitate collected, redissolved in buffer, and kept, with the supernatant being discarded.
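The amount of solid salt needed for each cut in a procedure like the one above can be estimated before going to the bench. The sketch below is only an illustration: it assumes that the percentages are % saturation, and it uses a commonly quoted empirical approximation for 20 °C (grams per litre = 533 (S2 − S1) / (100 − 0.3 S2)) that does not come from this chapter. The function name and the 2 litre extract volume are hypothetical.

```python
def ammonium_sulphate_grams(s1, s2, volume_litres=1.0):
    """Grams of solid ammonium sulphate to take a solution from s1 % to s2 % saturation.

    Assumption (not from this chapter): the empirical 20 degC approximation
        grams per litre = 533 * (s2 - s1) / (100 - 0.3 * s2)
    """
    if not 0 <= s1 < s2 <= 100:
        raise ValueError("expected 0 <= s1 < s2 <= 100 (% saturation)")
    return volume_litres * 533.0 * (s2 - s1) / (100.0 - 0.3 * s2)


# Two-step cut mirroring the example: 0 -> 45 % (pellet discarded),
# then 45 -> 70 % (pellet kept). The extract volume is hypothetical, and the
# small volume increase caused by the first salt addition is ignored.
volume = 2.0
first_cut = ammonium_sulphate_grams(0, 45, volume)
second_cut = ammonium_sulphate_grams(45, 70, volume)
print(f"0 -> 45 %: add {first_cut:.0f} g")
print(f"45 -> 70 %: add {second_cut:.0f} g")
```

In practice published nomograms or tables for the working temperature would normally be preferred; the formula is shown only to make the stepwise logic explicit.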
This fractionation produced a purification factor of 2.7. As can be seen, a significant amount of protein has been removed at this step (237 000 mg of protein) while 81% of the total enzyme present was recovered, i.e. the yield was good. This step has clearly produced an enrichment of the protein of interest from a large volume of extract and at the same time has concentrated the sample.

Isoelectric precipitation fractionation is based upon the observation that proteins have their minimum solubility at their isoelectric point. At this pH there are equal numbers of positive and negative charges on the protein molecule; intermolecular repulsions are therefore minimised and protein molecules can approach each other. This therefore allows opposite charges on different molecules to interact, resulting in the formation of insoluble aggregates. The principle can be exploited either to remove unwanted protein, by adjusting the pH of the protein extract so as to cause the precipitation of these proteins but not that of the test protein, or to remove the test protein, by adjusting the pH of the extract to its pI. In practice, the former alternative is preferable, since some denaturation of the precipitated protein inevitably occurs.

Finally, an unusual solubility phenomenon can be utilised in some cases for protein purification from E. coli. Early workers who were overexpressing heterologous proteins in E. coli at high levels were alarmed to discover that, although their protein was expressed in high yield (up to 40% of the total cell protein), the protein aggregated to form insoluble particles that became known as inclusion bodies. Initially this was seen as a major impediment to the production of proteins in E. coli, the inclusion bodies effectively being a mixture of monomeric and polymeric denatured proteins formed by partial or incorrect folding, probably due to the reducing environment of the E. coli cytoplasm. However, it was soon realised that this phenomenon could be used to advantage in protein purification. The inclusion bodies can be separated from a large proportion of the bacterial cytoplasmic protein by centrifugation, giving an effective purification step. The recovered inclusion bodies must then be solubilised and denatured and subsequently allowed to refold slowly to their active, native configuration. This is normally achieved by heating in 6 M guanidinium hydrochloride (to denature the protein) in the presence of a reducing agent (to disrupt any disulphide bridges). The denatured protein is then either diluted in buffer or dialysed against buffer, at which time the protein slowly refolds. Although the refolding method is not always 100% successful, this approach can often produce protein that is 50% or more pure.

Having carried out an initial fractionation step such as that described above, one would then move towards using higher resolution chromatographic methods. Chromatographic techniques for purifying proteins are summarised in Table 8.4, and some of the more commonly used methods are outlined below. The precise practical details of each technique are discussed in Chapter 11.

Charge
Proteins differ from one another in the proportions of the charged amino acids (aspartic and glutamic acids, lysine, arginine and histidine) that they contain. Hence proteins will differ in net charge at a particular pH. This difference is exploited in ion-exchange chromatography (Section 11.6), where the protein of interest is bound onto a solid support material bearing charged groups of the opposite sign (ion-exchange resin).
Proteins with the same charge as the resin pass through the column to waste, after which bound proteins, containing the protein of interest, are selectively released from the column by gradually increasing the strength of salt ions in the buffer passing through the column or by gradually changing the pH of the eluting buffer. These ions compete with the protein for binding to the resin, the more weakly charged protein being eluted at the lower salt strength and the more strongly charged protein being eluted at higher salt strengths.

Size
Differences in size between proteins can be exploited in molecular exclusion (also known as gel filtration) chromatography. The gel filtration medium consists of a range of beads with slightly differing amounts of cross-linking and therefore slightly different pore sizes. The separation process depends on the different abilities of the various proteins to enter some, all or none of the beads, which in turn relates to the size of the protein (Section 11.7). The method has limited resolving power, but can be used to obtain a separation between large and small protein molecules and is therefore useful when the protein of interest is either particularly large or particularly small. This method can also be used to determine the relative molecular mass of a protein (Section 11.7.2) and for concentrating or desalting a protein solution (Section 11.7.2).

Affinity
Certain proteins bind strongly to specific small molecules. One can take advantage of this by developing an affinity chromatography system where the small molecule (ligand) is bound to an insoluble support. When a crude mixture of proteins containing the protein of interest is passed through the column, the ligand binds the protein to the matrix whilst all other proteins pass through the column. The bound protein can then be eluted from the column by changing the pH, increasing salt strength or passing through a high concentration of unbound free ligand. For example, the protein concanavalin A (con A) binds strongly to glucose. An affinity column using glucose as the ligand can therefore be used to bind con A to the matrix, and the con A can be recovered by passing a high concentration of glucose through the column. Affinity chromatography is covered in detail in Section 11.8.

Hydrophobicity
Proteins differ in the amount of hydrophobic amino acids that are present on their surface. This difference can be exploited in salt fractionation (see above) but can also be used in a higher resolution method using hydrophobic interaction chromatography (HIC) (Section 11.4.3). A typical column material would be phenyl-Sepharose, where phenyl groups are bonded to the insoluble support Sepharose. The protein mixture is loaded on the column in high salt (to ensure hydrophobic patches are exposed) where hydrophobic interaction will occur between the phenyl groups on the resin and hydrophobic regions on the proteins. Proteins are then eluted by applying a decreasing salt gradient to the column and should emerge from the column in order of increasing hydrophobicity. However, some highly hydrophobic proteins may not be eluted even in the total absence of salt. In this case it is necessary to add a small amount of water-miscible organic solvent such as propanol or ethylene glycol to the column buffer solution. This will compete with the proteins for binding to the hydrophobic matrix and will elute any remaining proteins.

Table 8.4 Summary of chromatographic techniques commonly used in protein purification

Technique | Property exploited | Resolution | Practical points | Further details
Hydrophobic interaction | Hydrophobicity | Medium | Can cope with high ionic strength samples, e.g. ammonium sulphate precipitates. Fractions are of varying pH and/or ionic strength. Medium yield. Commonly used in early stages of purification protocol. Unpredictable | Section 11.4.3
Ion exchange | Charge | Medium | Sample ionic strength must be low. Fractions are of varying pH and/or ionic strength. Medium yield. Commonly used in early stages of purification protocol | Section 11.6
Affinity | Biological function | High | Limited by availability of immobilised ligand. Elution may denature protein. Yield medium–low. Commonly used towards end of purification protocol | Section 11.8
Dye affinity | Structure and hydrophobicity | High | Necessary to carry out initial screening of a wide range of dye–ligand supports | Section 11.8.5
Covalent | Thiol groups | High | Specific for thiol-containing proteins. Limited by high cost and long (3 h) regeneration time | Section 11.8.6
Metal chelate | Imidazole, thiol, tryptophan groups | High | Expensive | Section 11.8.4
Exclusion | Molecular size | Low | Can give information about protein molecular weight. Good for desalting protein samples | Section 11.7

Capacity (only six values survive for the seven rows, so they are listed without row assignment): Medium; Medium–low; Medium–low; Medium (cost limited); High; High

8.3.5 Engineering proteins for purification
The ability to clone and overexpress genes for proteins using genetic engineering methodology has brought with it the ability to aid the purification process considerably by manipulating the gene of interest prior to expression. These manipulations are carried out either to ensure secretion of the proteins from the cell or to aid protein purification.

Ensuring secretion from the cell
For cloned genes that are being expressed in microbial or eukaryotic cells, there are a number of advantages in manipulating the gene to ensure that the protein product is secreted from the cell:





• To facilitate purification: Clearly if the protein is secreted into the growth medium, there will be far fewer contaminating proteins present than if the cells had to be ruptured to release the protein, when all the other intracellular proteins would also be present.
• Prevention of intracellular degradation of the cloned protein: Many cloned proteins are recognised as ‘foreign’ by the cell in which they are produced and are therefore degraded by intracellular proteases. Secretion of the protein into the culture medium should minimise this degradation.
• Reduction of the intracellular concentration of toxic proteins: Some cloned proteins are toxic to the cell in which they are produced and there is therefore a limit to the amount of protein the cell will produce before it dies. Protein secretion should prevent cell death and result in continued production of protein.
• To allow post-translational modification of proteins: Most post-translational modifications of proteins occur as part of the secretory pathway, and these modifications, for example glycosylation (see Section 8.4.4), are a necessary process in producing the final protein structure. Since prokaryotic cells do not glycosylate their proteins, many proteins have to be expressed in eukaryotic cells (e.g. yeast) rather than in bacteria.

The entry of a protein into a secretory pathway and its ultimate destination are determined by a short amino acid sequence (signal sequence) that is usually at the N terminus of the protein. For proteins going to the membrane or outside the cell the route is via the endoplasmic reticulum and Golgi apparatus, the signal sequence being cleaved off by a protease prior to secretion. For example, human γ-interferon has been secreted from the yeast Pichia pastoris using the protein’s native signal sequence. There are also a number of well-characterised yeast signal sequences (e.g. the α-factor signal sequence) that can be used to ensure secretion of proteins cloned into yeast.

Fusion proteins to aid protein purification
This approach requires an additional gene to be joined to the gene of the protein of interest such that the protein is produced as a fusion protein (i.e. linked to this second protein, or tag). As will be seen below, the purpose of this tag is to provide a means whereby the fusion protein can be selectively removed from the cell extract. The fusion protein can then be cleaved to release the protein of interest from the tag protein.
Clearly the amino acid sequence of the peptide linkage between tag and protein has to be carefully designed to allow chemical or enzymatic cleavage of this sequence. The following are just a few examples of the many different types of fusion proteins that have been used to aid protein purification.

Flag™
This is a short hydrophilic amino acid sequence that is attached to the N-terminal end of the protein, and is designed for purification by immunoaffinity chromatography.

Asp-Tyr-Lys-Asp-Asp-Asp-Asp-Lys-Protein

A monoclonal antibody against this Flag sequence is available on an immobilised support for use in affinity chromatography. The cell extract, which includes the Flag-labelled protein, is passed through the column where the antibody binds to the Flag-labelled protein, allowing all other proteins to pass through. This is carried out in the presence of Ca²⁺, since the binding of the Flag sequence to the monoclonal antibody is Ca²⁺ dependent. Once all unbound protein has been eluted from the column, the Flag-linked protein is released by passing EDTA through the column, which chelates the Ca²⁺. Finally the Flag sequence is removed by the enzyme enterokinase, which recognises the following amino acid sequence and cleaves C-terminal to the lysine residue:

N-Asp-Asp-Asp-Asp-Lys-C

Using this approach, granulocyte-macrophage colony-stimulating factor (GMCSF) was cloned in and secreted from yeast, and purified in a single step. GMCSF was produced in the cell as signal peptide-Flag-gene. The signal sequence used was the signal sequence for the outer membrane protein OmpA. The Flag-gene protein was thus secreted into the periplasm, the fusion protein purified, and finally the Flag sequence removed, as described above.

Glutathione affinity agarose
In this method the protein of interest is expressed as a fusion protein with the enzyme glutathione S-transferase. The cell extract is passed through a column of glutathione-linked agarose beads, where the enzyme binds to the glutathione. Once all unbound protein has been washed through the column, the fusion protein is eluted by passing reduced glutathione through the column. Finally, cleavage of the fusion protein is achieved using human thrombin, which recognises a specific amino acid sequence in the linker region.

Protein A
Protein A binds to the Fc region of the immunoglobulin G (IgG) molecule. The gene for the protein of interest is cloned fused to the protein A gene, and the fusion protein purified by affinity chromatography on a column of IgG-Sepharose. The bound fusion protein is then eluted using either high salt or low pH, to disrupt the binding between the IgG molecule and the protein A–protein fusion product. Protein A is then finally removed by treatment with 70% (v/v) formic acid for 2 days, which cleaves an acid-labile Asp-Pro bond in the linker region.

Poly(arginine)
This method requires the addition of a series of arginine residues to the C terminus of the protein to be purified. This makes the protein highly basic (positively charged at neutral pH). The cell extract can therefore be fractionated using cation-exchange chromatography.
Bound proteins are sequentially released from the column by applying a salt gradient, with the poly(Arg)-containing protein, because of its high overall positive charge, being the last to be eluted. The poly(Arg) tail is then removed by incubation with the enzyme carboxypeptidase B. Carboxypeptidase B is an exoprotease that sequentially removes arginine or lysine residues from the C terminus of proteins. The arginine residues are therefore sequentially removed from the C terminus, the removal of amino acid residues stopping when the ‘normal’ (i.e. non-arginine) C-terminal amino acid residue of the protein is reached.
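The effect of a poly(Arg) tail on net charge, and its removal by carboxypeptidase B, can be illustrated with a small calculation. This is a rough sketch under stated assumptions: the pKa values are approximate textbook-style figures that are not given in this chapter, the Henderson–Hasselbalch treatment ignores the local environment of each group, and the sequence, tail length and function names are hypothetical.

```python
# Approximate pKa values (assumed; real values depend on the local environment).
POSITIVE_PKA = {"K": 10.5, "R": 12.5, "H": 6.0}
NEGATIVE_PKA = {"D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}
N_TERMINUS_PKA = 9.0
C_TERMINUS_PKA = 3.1


def net_charge(sequence, ph):
    """Estimate the net charge of a one-letter-code sequence (Henderson-Hasselbalch)."""
    charge = 1.0 / (1.0 + 10 ** (ph - N_TERMINUS_PKA))   # protonated N terminus
    charge -= 1.0 / (1.0 + 10 ** (C_TERMINUS_PKA - ph))  # deprotonated C terminus
    for residue in sequence:
        if residue in POSITIVE_PKA:
            charge += 1.0 / (1.0 + 10 ** (ph - POSITIVE_PKA[residue]))
        elif residue in NEGATIVE_PKA:
            charge -= 1.0 / (1.0 + 10 ** (NEGATIVE_PKA[residue] - ph))
    return charge


def carboxypeptidase_b_trim(sequence):
    """Strip Arg/Lys from the C terminus until a different residue is reached."""
    return sequence.rstrip("RK")


protein = "MDEELLKAGSDEVFTA"  # hypothetical acidic protein
tagged = protein + "R" * 6    # hypothetical six-residue poly(Arg) tail
print(f"untagged net charge at pH 7: {net_charge(protein, 7.0):+.2f}")
print(f"tagged net charge at pH 7:   {net_charge(tagged, 7.0):+.2f}")
print("tail removed cleanly:", carboxypeptidase_b_trim(tagged) == protein)
```

Note that in this simple model a protein whose own C-terminal residue is arginine or lysine would also lose that residue, which is one reason the choice of tag has to take the native C terminus into account.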

8.4 PROTEIN STRUCTURE DETERMINATION

8.4.1 Relative molecular mass
There are three methods available for determining protein relative molecular mass, Mr, frequently referred to as molecular weight. The first two described here are quick and easy methods that will give a value to ±5–10%. For many purposes one simply needs a rough estimate of size and these methods are sufficient. The third method, mass spectrometry, requires expensive specialist instruments and can give accuracy to ±0.001%. This kind of accuracy is invaluable in detecting postsynthetic modification of proteins.

SDS-polyacrylamide gel electrophoresis (SDS-PAGE)
This form of electrophoresis, described in Section 10.3.1, separates proteins on the basis of their shape (size), which in turn relates to their relative molecular masses. A series of proteins of known molecular mass (molecular weight markers) are run on a gel on a track adjacent to the protein of unknown molecular mass. The distance each marker protein moves through the gel is measured and a calibration curve of log Mr versus distance moved is plotted. The distance migrated by the protein of unknown Mr is also measured, and from the graph its log Mr and hence Mr is calculated. The method is suitable for proteins covering a large Mr range (10 000–300 000). The method is easy to perform and requires very little material. If silver staining (Section 10.3.7) is used, as little as 1 ng of protein is required. In practice SDS-PAGE is the most commonly used method for determining protein Mr values.

Molecular exclusion (gel filtration) chromatography
The elution volume of a protein from a molecular exclusion chromatography column having an appropriate fractionation range is determined largely by the size of the protein such that there is a logarithmic relationship between protein relative molecular mass and elution volume (Section 11.7.1). By calibrating the column with a range of proteins of known Mr, the Mr of a test protein can be calculated. The method is carried out on HPLC columns (1 × 30 cm) packed with porous silica beads. Flow rates are about 1 cm³ min⁻¹, giving a run time of about 12 min, producing sharp, well-resolved peaks. A linear calibration line is obtained by plotting a graph of log Mr versus Kd for the calibrating proteins.
Kd is calculated from the following equation:

\[ K_d = \frac{V_e - V_o}{V_t - V_o} \]

where Vo is the volume in which molecules that are wholly excluded from the column material emerge (the excluded volume), Vt is the volume in which small molecules that can enter all the pores emerge (the included volume) and Ve is the volume in which the marker protein elutes. This method gives values that are accurate to ±10%.

Mass spectrometry
Using either electrospray ionisation (ESI) (Section 9.2.4) or matrix-assisted laser desorption ionisation (MALDI) (Section 9.3.8) intact molecular ions can be produced for proteins and hence their masses accurately measured by mass spectrometry. ESI produces molecular ions from molecules with molecular masses up to and in excess of 100 kDa, whereas MALDI produces ions from intact proteins up to and in excess of 200 kDa. In either case, only low picomole quantities of protein are needed. For example, αB2 crystallin gave a molecular mass value (20 200 ± 0.9), in excellent agreement with the deduced mass of 20 201. However, in addition about 10% of the analysed material produced an ion of mass 20 072.2. This showed that some of the purified protein molecules had lost their N-terminal amino acid (lysine). The deduced mass with the loss of N-terminal lysine was 20 072.8. Clearly mass spectrometry has the ability to provide highly accurate molecular mass measurements for proteins and peptides, which in turn can be used to deduce small changes made to the basic protein structure.
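Two of the calculations in this section can be sketched in a few lines. The first part computes Kd for a set of calibration proteins, fits the straight line of log Mr against Kd by least squares, and reads off the Mr of a test protein. The second part checks the mass expected after loss of an N-terminal lysine. The column volumes, calibration proteins and function names are hypothetical, and the average lysine residue mass of 128.17 is an assumed value that is not quoted in the text.

```python
import math


def kd(ve, vo, vt):
    """Distribution coefficient Kd = (Ve - Vo) / (Vt - Vo)."""
    return (ve - vo) / (vt - vo)


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_mr(ve, vo, vt, standards):
    """Estimate Mr from an elution volume, given (Ve, Mr) pairs for the standards."""
    xs = [kd(std_ve, vo, vt) for std_ve, _ in standards]
    ys = [math.log10(mr) for _, mr in standards]
    slope, intercept = fit_line(xs, ys)
    return 10 ** (slope * kd(ve, vo, vt) + intercept)


# Hypothetical column and calibration data (volumes in cm3).
VO, VT = 8.0, 20.0
STANDARDS = [(10.1, 150_000), (12.0, 66_000), (14.2, 29_000), (16.0, 12_400)]
print(f"Estimated Mr of test protein: {estimate_mr(13.0, VO, VT, STANDARDS):.0f}")

# Loss of an N-terminal lysine removes one lysine residue mass (assumed 128.17).
LYS_RESIDUE_MASS = 128.17
print(f"Deduced mass without N-terminal Lys: {20201 - LYS_RESIDUE_MASS:.1f}")
```

The same fitting routine applies to SDS-PAGE calibration if distance migrated is used in place of Kd.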

8.4.2 Amino acid analysis
The determination of which of the 20 possible amino acids are present in a particular protein, and in what relative amounts, is achieved by hydrolysing the protein to yield its component amino acids and identifying and quantifying them chromatographically. Hydrolysis is achieved by heating the protein with 6 M hydrochloric acid for 14 h at 110 °C in vacuo. Unfortunately, the hydrolysis procedure destroys or chemically modifies the asparagine, glutamine and tryptophan residues. Asparagine and glutamine are converted to their corresponding acids (Asp and Glu) and are quantified with them. Tryptophan is completely destroyed and is best determined spectrophotometrically on the unhydrolysed protein.

The amino acids in the protein hydrolysate are then separated chromatographically. Nowadays this is normally done using the method of precolumn derivatisation, followed by separation by reversed-phase HPLC. In this approach the amino acid hydrolysate is first treated with a molecule that (i) reacts with amino groups in amino acids, (ii) is hydrophobic, thus allowing separation of derivatised amino acids by reversed-phase HPLC and (iii) is easily detected by its ultraviolet absorbance or fluorescence. Reagents routinely used for precolumn derivatisation include o-phthalaldehyde and 6-aminoquinolyl-N-hydroxysuccinimidyl carbamate (AQC), which both produce fluorescent derivatives, and phenylisothiocyanate, which produces a phenylthiocarbamyl derivative that is detected by its absorbance at 254 nm. Analysis times can be as little as 20 min, and sensitivity is down to 1 pmole or less of amino acid.

8.4.3 Primary structure determination
For many years the amino acid sequence of a protein was determined from studies made on the purified protein alone. This in turn meant that sequence data available were limited to those proteins that could be purified in sufficiently large amounts. Knowledge of the complete primary structure of the protein was (and still is) a prerequisite for the determination of the three-dimensional structure of the protein, and hence an understanding of how that protein functions. However, nowadays the protein biochemist is normally satisfied with data from just a relatively short length of sequence either from the N terminus of the protein or from an internal sequence, obtained by sequencing peptides produced by cleavage of the native protein. The sequence data will then most likely be used for one of three purposes:



• To search sequence databases to see whether the protein of interest has already been isolated and can therefore be identified. For this type of search extremely short lengths of sequence (three to five residues), known as sequence tags, need to be used. Examples of this type of data search are given in Sections 8.5.1 and 9.5.2.
• To search for sequence homology using computerised databases in order to identify the function of the protein. For example, the search may show significant sequence identity with the amino acid sequence of some known protein tyrosine kinases, strongly suggesting that the protein is also a tyrosine kinase.
• To design an oligonucleotide probe for selecting appropriate clones from complementary DNA libraries. In this way the DNA coding for the protein can be isolated and the DNA sequence, and hence the protein sequence, determined. Obtaining a protein sequence in this way is far less laborious and time-consuming than having to determine the total protein sequence by analysis of the protein.

A further use of protein sequence data is in quality control in the biopharmaceutical industry. Many pharmaceutical companies produce products that are proteins, for example peptide hormones, antibodies, therapeutic enzymes, etc., and synthetic peptides also require analysis to confirm their identities. Sequence analysis, especially to determine sites and nature of postsynthetic modifications such as glycosylation, is necessary to confirm the structural integrity of these products.

Edman degradation
In 1950, Per Edman published a chemical method for the stepwise removal of amino acid residues from the N terminus of a peptide or protein. This series of reactions came to be known as the Edman degradation, and the method still remains the most effective chemical means for removing amino acid residues in a stepwise fashion from a polypeptide chain and thus determining the order of amino acids at the N terminus of a protein or peptide. However, the method is only infrequently used nowadays and will not be described in any detail here. Developments in the use of mass spectrometry over the past 20 years have led to mass spectrometry being the method of choice nowadays for determining protein sequences; it is discussed in more detail below and in Chapter 9.
Protein cleavage and peptide production
When studying proteins there are many occasions when one might wish to cleave a protein into peptide fragments (see, for example, peptide mass fingerprinting, Section 8.5.1). Peptides can be produced by either chemical or enzymatic cleavage of the native protein (see Table 8.5). Chemical methods tend to produce large fragments, as they cleave at the less common amino acids (often giving as few as two or three large peptides). Enzymatic methods tend to cleave adjacent to the more common amino acids (e.g. trypsin cleaves at every arginine and lysine residue in a protein), thus often producing as many as 50 or more peptides from a protein. Throughout this and other chapters, you will come across examples of where it is necessary to study peptide fragments of a protein.

Table 8.5 Specific cleavage of polypeptides

Reagent | Specificity
Enzymic cleavage
Chymotrypsin | C-terminal side of hydrophobic amino acid residues, e.g. Phe, Trp, Tyr, Leu
Endoproteinase Arg-C | C-terminal side of arginine
Endoproteinase Asp-N | Peptide bonds N-terminal to aspartate or cysteine residues
Endoproteinase Glu-C | C-terminal side of glutamate residues and some aspartate residues
Endoproteinase Lys-C | C-terminal side of lysine
Thermolysin | N-terminal side of hydrophobic amino acid residues excluding Trp
Trypsin | C-terminal side of arginine and lysine residues but Arg-Pro and Lys-Pro poorly cleaved
Chemical cleavage
BNPS skatole, N-bromosuccinimide or o-iodosobenzoate | C-terminal side of tryptophan residues
Cyanogen bromide | C-terminal side of methionine residues
Hydroxylamine | Asparagine–glycine bonds
2-Nitro-5-thiocyanobenzoate | N-terminal side of cysteine residues

Mass spectrometry
Because of the absolute requirement to produce ions in the gas phase for the analysis of any sample by mass spectrometry (MS), for many years MS analysis was applicable only to small, non-polar molecules (< 500 Mr). However, in the early 1980s the introduction of fast atom bombardment (FAB) MS allowed the analysis of large, charged molecules such as proteins and peptides to be achieved for the first time. The further development of more sophisticated methods such as electrospray ionisation (ESI) and matrix-assisted laser desorption ionisation (MALDI) (see Chapter 9) has led to the analysis of protein by mass spectrometry becoming routine. Although the Edman degradation still has occasional applications in protein structure analysis, mass spectrometry is now the method of choice for determining amino acid sequence data.

When peptides are fragmented by MS it is fortunate that cleavage occurs predominantly at the peptide bond (although it must be noted that other fragmentations, such as internal cleavages, secondary fragmentations, etc. do occur, thus complicating the mass spectrum). This means that the peptide fragments produced each differ sequentially by the mass of one amino acid residue. The amino acid sequence can thus be readily deduced. In particular, if side-chain modifications occur, these can also be observed due to the corresponding increase in mass difference. The use of mass spectrometry to obtain sequence data from proteins and peptides is described more fully in Section 9.5.

Tandem mass spectrometry (MS/MS or MS²) is also increasingly being used to obtain sequence data. A digest of the protein (e.g. with trypsin) is separated by MS. The ion corresponding to one peptide is selected in the first analyser and collided with argon gas in a collision cell to generate fragment ions. The fragment ions thus generated are then separated, according to mass, in a second analyser, identified, and the sequence determined as described in Section 9.5.2.

A further method, ladder sequencing, has been developed, and combines the Edman chemistry with MS. Edman sequencing is carried out using a mixture of phenylisothiocyanate (PITC) and phenylisocyanate (PIC) (at about 5% of the concentration of PITC). N-terminal amino groups that react with PIC are effectively blocked as they are not cleaved at the acid cleavage step. Consequently, at each cycle, approximately 5% of the protein molecules are blocked. Thus, after 20 to 30 cycles of Edman degradation, a nested set of peptides is produced, each differing by the loss of one amino acid. Analysis of these polypeptides using ESI or MALDI gives the molecular mass of each, and the difference in mass between successive members of the set identifies the lost amino acid residue.

Detection of disulphide linkages
For proteins that contain more than one cysteine residue it is important to determine whether, and if so how many, cysteine residues are joined by disulphide bridges. The most commonly used method involves the use of MS (Section 9.5.5). The native protein (i.e. with disulphide bridges intact) is cleaved with a proteolytic enzyme (e.g. trypsin) to produce a number of small peptides. The same experiment is also carried out on proteins treated with dithiothreitol (DTT), which reduces (cleaves) the disulphide bridges. MALDI spectra of the tryptic digest before and after reduction with DTT allow identification of disulphide-linked peptides. Linked peptides from the native protein will disappear from the spectrum of the reduced protein and reappear as two peptides of lower mass.
Knowledge of the exact mass of each of the two peptides, and knowledge of the cleavage site of the enzyme used, will allow easy identification of the two peptides from the known protein sequence. Thus, if the mass of two disulphide-linked peptides is $M$, and this is reduced to two separate chains of masses $A$ and $B$, respectively, then $A + B = M + 2$. The extra two mass units derive from the fact that reduction of the disulphide bond results in an increase in mass of +1 for each of the two cysteine residues:

\[ \text{–S–S–} \;\xrightarrow{\;2\mathrm{H}\;}\; \text{–SH} + \text{HS–} \]

Hydrophobicity profile

Having determined the amino acid sequence of a protein, analysis of the distribution of hydrophobic groups along the linear sequence can be used in a predictive manner. This requires the production of a hydrophobicity profile for the protein, which graphs the average hydrophobicity per residue against the sequence number. Averaging is achieved by evaluating, using a predictive algorithm, the mean hydrophobicity within a moving window that is stepped along the sequence from each residue to the next. In this way, a graph comprising a series of curves is produced and reveals areas of minima and maxima in hydrophobicity along the linear polypeptide chain. For membrane proteins, such profiles allow the identification of potential membrane-spanning segments. For example, an analysis of a thylakoid membrane protein revealed seven general regions of the protein


Protein structure, purification, characterisation and function analysis

sequence that contained spans of 20–28 amino acid residues, each of which contained predominantly hydrophobic residues flanked on either side by hydrophilic residues. These regions represent the seven membrane-spanning helical regions of the protein. For membrane proteins defining aqueous channels, hydrophilic residues are also present in the transmembrane section. Pores comprise amphipathic α-helices, the polar sides of which line the channel, whereas the hydrophobic sides interact with the membrane lipids. More advanced algorithms are used to detect these sequences, since such helices would not necessarily be revealed by simple hydrophobicity analysis.
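The moving-window averaging described above can be illustrated with a short Python sketch. The text does not name a particular hydrophobicity scale or algorithm, so the choices here are assumptions: the sketch uses the widely quoted Kyte–Doolittle hydropathy values, a 19-residue window and a cut-off of 1.6, and the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values (assumed scale; the text names none).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Mean hydropathy of each window, stepped one residue at a time."""
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [KD[aa] for aa in seq.upper()]
    running = sum(scores[:window])
    profile = [running / window]
    for i in range(window, len(scores)):
        running += scores[i] - scores[i - window]  # slide window by one
        profile.append(running / window)
    return profile


def candidate_membrane_spans(seq, window=19, threshold=1.6):
    """1-based (first, last) window-centre residues of runs above threshold."""
    profile = hydropathy_profile(seq, window)
    half = window // 2
    spans, start = [], None
    for i, value in enumerate(profile):
        if value >= threshold and start is None:
            start = i
        elif value < threshold and start is not None:
            spans.append((start + half + 1, i + half))
            start = None
    if start is not None:
        spans.append((start + half + 1, len(profile) + half))
    return spans


# Toy sequence: 20 leucines flanked by charged residues.
toy = "D" * 10 + "L" * 20 + "K" * 10
print(candidate_membrane_spans(toy))
```

Because the reported positions are window centres, the flagged range is narrower than the hydrophobic stretch itself; plotting the full profile, as the text describes, is usually more informative than a single cut-off. As noted above, amphipathic pore-forming helices would not necessarily be detected by this simple approach.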

8.4.4 Glycoproteins

Glycoproteins result from the covalent attachment of carbohydrate chains (glycans), both linear and branched in structure, to various sites on the polypeptide backbone of a protein. These post-translational modifications are carried out by cytoplasmic enzymes within the endoplasmic reticulum and Golgi apparatus. The amount of polysaccharide attached to a given glycoprotein can vary enormously, from as little as a few per cent to more than 60% by weight. Glycoproteins tend to be found in the serum and in cell membranes. The roles played by the carbohydrate moiety of glycoproteins include stabilisation of the protein structure, protection of the protein from degradation by proteases, control of protein half-life in blood, the physical maintenance of tissue structure and integrity, a role in cellular adhesion and cell–cell interaction, and acting as an important determinant in receptor–ligand binding. The major types of protein glycoconjugates are:

• N-linked;
• O-linked;
• glycosylphosphatidylinositol (GPI)-linked.

N-linked glycans are always linked to an asparagine residue side-chain (Fig. 8.3) at a consensus sequence Asn-X-Ser/Thr where X is any amino acid except proline. O-linked glycosylation occurs where carbohydrate is attached to the hydroxyl group of a serine or threonine residue (Fig. 8.3). However, there is no consensus sequence similar to that found for N-linked oligosaccharides. GPI membrane anchors are a more recently discovered modification of proteins. They are complex glycophospholipids that are covalently attached to a variety of externally expressed plasma membrane proteins. The role of this anchor is to provide a stable association of protein with the membrane lipid bilayer; GPI anchors will not be discussed further here.

There is considerable interest in the determination of the structure of O- and N-linked oligosaccharides, since glycosylation can affect both the half-life and function of a protein. This is particularly important when producing therapeutic glycoproteins by recombinant methods, as it is necessary to ensure that the correct carbohydrate structure is produced. It should be noted that prokaryotic cells do not produce glycoproteins, so cloned genes for glycoproteins need to be expressed in eukaryotic cells. The glycosylation of proteins is a complex subject. From one glycoprotein to another there are variations in the sites of glycosylation (e.g. only about


[Figure 8.3: structural diagrams of N-glycosylation (GlcNAc attached to the side-chain of Asn within the sequence Asn-X-Ser/Thr) and O-glycosylation (GalNAc attached to the side-chain oxygen of Ser or Thr).]

Fig. 8.3 The two types of oligosaccharide linkages found in glycoproteins.

30% of consensus sequences for N-linked attachments are occupied by polysaccharide; the nature of the secondary structure at this position also seems to play a role in deciding whether glycosylation takes place), variations in the type of amino acid–carbohydrate bond, variations in the composition of the sugar chains, and variations in the particular carbohydrate sequences and linkages in each chain. There are eight monosaccharide units commonly found in mammalian glycoproteins, although other less common units are also known to occur. These eight are N-acetyl neuraminic acid (NeuNAc), N-glycolyl neuraminic acid (NeuGc), D-galactose (Gal), N-acetyl-D-glucosamine (GlcNAc), N-acetyl-D-galactosamine (GalNAc), D-mannose (Man), L-fucose (Fuc) and D-xylose (Xyl). To further complicate the issue, within any population of molecules in a purified glycoprotein there can be considerable heterogeneity in the carbohydrate structure (glycoforms). This can include some molecules showing increased branching of sugar side-chains, reduced chain length and further addition of single carbohydrate units to the same polypeptide chain. The complete determination of the glycosylation status of a molecule clearly requires considerable effort. However, the steps involved are fairly straightforward and the following therefore provides a generalised (and idealised) description of the overall procedures used.
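Because the N-linked consensus sequence Asn-X-Ser/Thr (X being any residue except proline) is a simple pattern, candidate sites can be located directly in a protein sequence. The following Python sketch shows one way to do this; the function name and the test sequence are hypothetical, and the sites it reports are only potential sites, since, as noted above, only a minority of consensus sequences are actually occupied.

```python
import re

# Asn, then any residue except Pro, then Ser or Thr.
# The lookahead lets overlapping sequons (e.g. N-N-T-S) all be reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def potential_n_glycosylation_sites(seq):
    """Return 1-based positions of Asn residues in Asn-X-Ser/Thr sequons."""
    return [m.start() + 1 for m in SEQUON.finditer(seq.upper())]


# Hypothetical sequence: N2 and N9/N10 lie in sequons; N6 is followed by Pro.
print(potential_n_glycosylation_sites("MNGSANPTNNTS"))
```

No equivalent pattern search is possible for O-linked sites, as there is no consensus sequence for O-glycosylation.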


The first question to be asked about a purified protein is ‘Is it a glycoprotein?’ Glycoprotein bands in gels (e.g. on SDS-polyacrylamide gels) can be stained with cationic dyes such as Alcian Blue, which bind to negatively charged glycosaminoglycan side-chains, or by the periodic acid-Schiff reagent (PAS), where carbohydrate is initially oxidised by periodic acid then subsequently stained with Schiff’s reagent. However, although they are both carbohydrate specific (i.e. non-glycosylated proteins are not stained) both methods suffer from low sensitivity. A more sensitive, and informative, approach is to use the specific carbohydrate-binding proteins known as lectins. Blots from SDS-PAGE, dot blots of the glycoprotein sample, or the glycoprotein sample adsorbed onto the walls of a microtitre plate can be challenged with enzyme-linked lectins. Lectins that bind to the glycoprotein can be identified by the associated enzymic activity. By repeating the experiment with a range of different lectins, one can not only confirm the presence of a glycoprotein but also identify which sugar residues are, or are not, present. Having confirmed the presence of glycoprotein the following procedures would normally be carried out.

• Identification of the type and amount of each monosaccharide: Release of monosaccharides is achieved by hydrolysis in methanolic HCl at 80 °C for 18 h. The released monosaccharides can be separated and quantified by gas chromatography.
• Protease digestion to release glycopeptides: A protease is chosen that cleaves the glycoprotein into peptides and glycopeptides of ideally 5–15 amino acid residues. Glycopeptides are then fractionated by HPLC and the purified glycopeptides subjected to N-terminal sequence analysis to allow identification of the site of glycosylation.
• Oligosaccharide profiling: Oligosaccharide chains are released from the polypeptide backbone either chemically, for example by hydrazinolysis to release N-linked oligosaccharide, or enzymatically using peptide-N-glycosidase F (PNGase F), which cleaves sugars at the asparagine link, or using endo-α-N-acetylgalactosaminidase (O-glycanase), which cleaves O-linked glycans. These released oligosaccharides can then be separated either by HPLC or by high performance anion exchange chromatography (HPAEC).
• Structure analysis of each purified oligosaccharide: This requires the determination of the composition, sequence and nature of the linkages in each purified oligosaccharide. A detailed description is beyond the scope of this book, but would involve a mixture of complementary approaches including analysis by FAB-MS, gas chromatography-MS, lectin analysis following partial release of sugars and nuclear magnetic resonance (NMR) analysis.

8.4.5 Tertiary structure

The most commonly used method for determining protein three-dimensional structure is X-ray crystallography. A detailed description of the theory and methodology is beyond the scope of this book, requiring a detailed mathematical understanding of the process and computer analysis of the extensive data that are generated. The following is therefore a brief and idealised description of the overall process, and ignores the multitude of pitfalls and problems inherent in determining three-dimensional structures.



Clearly the first step must be to produce a crystal of the protein (a crystal should be thought of as a three-dimensional lattice of molecules). Protein crystallisation is


attempted using as homogeneous a preparation as possible, such preparations having a greater chance of yielding crystals than material that contains impurities. Because of our inadequate understanding of the physical processes involved in crystallisation, methods for growing protein crystals are generally empirical, but basically all involve varying the physical parameters that affect solubility of the protein (for example pH, ionic strength, temperature, presence of precipitating agents) to produce a state of supersaturation. The process involves extensive trial and error to find a procedure that results in crystals for a particular protein. Initially this involves a systematic screen of methods to identify those conditions that indicate crystallinity, followed by subsequent experiments that involve fine-tuning of these conditions. Basically, nucleation sites of crystal growth are formed by chance collisions of molecules forming molecular aggregates, and the probability that these aggregates will occur will be greater in a saturated solution. Clearly, to produce saturated solutions, tens of milligrams of protein are required. This used to represent a considerable challenge for other than the most abundant proteins, but nowadays genetic engineering methodology allows the overproduction of most proteins from cloned genes almost on demand. The following are some of the methods that have proved successful. (a) Dialysis. A state of supersaturation is achieved by dialysis of the protein solution against a solution containing a precipitant, or by a gradual change in pH or ionic strength. Because of frequent limitations on the amount of protein available, this approach often uses small volumes (up to about 50 mm³) for which a number of microdialysis techniques exist. (b) Vapour diffusion. This process relies on controlled equilibration through the vapour phase to produce supersaturation in the sample.
For example, in the hanging-drop method, a microdroplet (2–20 mm³) of protein is deposited on a glass coverslip; then the coverslip is inverted and placed over a sealed reservoir containing a precipitant solution, with the droplet initially having a precipitant concentration lower than that in the reservoir. Vapour diffusion will then gradually increase the concentration of the protein solution. Because of the small volumes involved this method readily lends itself to screening large numbers of different conditions. When produced, crystals may not be of sufficient size for analysis. In this case larger crystals can be obtained by using a small crystal to seed a supersaturated protein solution, which will result in a larger crystal. Once prepared, the crystal (which is extremely fragile) is mounted inside a quartz or glass capillary tube, with a drop of either mother liquor (the solution from which it was crystallised) or a stabilising solution drawn into one end of the capillary tube to prevent the crystal from drying out. The tube is then sealed and the crystal exposed to a beam of X-rays. Since the wavelength of X-rays is comparable to the planar separation of atoms in a crystal lattice, the crystal can be considered to act as a three-dimensional grating. The X-rays are therefore diffracted, interfering both in phase and out of phase to produce a diffraction pattern as shown in Fig. 8.4. The data collection technology necessary for recording the diffraction pattern is now highly sophisticated. Originally, conventional diffractometers and photographic film were used to detect diffracted X-rays. This involved wet developing of the film and subsequent digital scanning of the negative. Data collection by this method took many weeks. By contrast, modern area-detectors can collect data in under 24 h.




Fig. 8.4 X-ray diffraction frame of data from a crystal of herpes simplex virus type 1 thymidine kinase, complexed with substrate deoxythymidine, at 2 Å resolution. (Picture provided by John N. Champness, Matthew S. Bennett and Mark R. Sanderson of King’s College London.)



Unfortunately the diffraction pattern alone is insufficient to determine the crystal structure. Each diffraction maximum has both an amplitude and a phase associated with it, and both need to be determined. But the phases are not directly measurable in a diffraction experiment and must be estimated from further experiments. This is usually done by the method of multiple isomorphous replacement (MIR). The MIR method requires at least two further crystals of the protein (derivatives), each being crystallised in the presence of a different heavy-metal ion (e.g. Hg²⁺, Cu²⁺, Mn²⁺). Comparison of the diffraction patterns from the crystalline protein and the crystalline heavy-metal atom derivatives allows phases to be estimated. A more recent approach to producing a heavy-metal derivative is to clone the protein of interest into a methionine


Fig. 8.5 (Relaxed-eye stereo pair): A Cα-trace of herpes simplex virus type 1 thymidine kinase from a crystallographic study of a complex of the enzyme with one of its substrates, deoxythymidine. The enzyme is an α–β protein, having a five-stranded parallel β-sheet surrounded by 14 α-helices. The active site, occupied by deoxythymidine, is a volume surrounded by four of the helices, the C-terminal edge of the β-sheet and a short ‘flap’ segment; a sulphate ion occupies the site of the β-phosphate of the absent co-substrate ATP. (Short missing regions of chain indicate where electron density calculated from the X-ray data could not be interpreted.) (Picture provided by John N. Champness, Matthew S. Bennett and Mark R. Sanderson of King’s College London.)



auxotroph, and then grow this strain in the presence of selenomethionine (a selenium-containing analogue of methionine). Selenomethionine is therefore incorporated into the protein in the place of methionine, and the final purified and crystallised protein has the heavy atom selenium conveniently included in its structure. Diffraction data and phase information having been collected, these data are processed by computer to construct an electron density map. The known sequence of the protein is then fitted into the electron density map using computer graphics, to produce a three-dimensional model of the protein (Fig. 8.5). In the past there had been concern that the three-dimensional structure determined from the rigid molecules found in a crystal may differ from the true, more flexible, structure found in free solution. These concerns have been effectively resolved by, for example, diffusing substrate into an enzyme crystal and showing that the substrate is converted into product by the crystalline enzyme (there is sufficient mother liquor within the crystal to maintain the substrate in solution). In a more recent development, it is now becoming possible to determine the solution structure of proteins using NMR. At present the method is capable of determining the structure of a protein up to about 20 kDa (20 000 Da) but will no doubt be developed to study larger proteins. Although the time-consuming step of producing a crystal is obviated, the methodology and data analysis involved are at present no less time-consuming and complex than those for X-ray crystallography.


8.5 PROTEOMICS AND PROTEIN FUNCTION

In order to completely understand how a cell works, it is necessary to understand the function (role) of every single protein in that cell. The analysis of any specific disease (e.g. cancer) will also require us to understand what changes have taken place in the protein component of the cell, so that we can use this information to understand the molecular basis of the disease, and thus design appropriate drug therapies and develop diagnostic methods. (Just about every therapeutic drug that is currently in use has a protein as its target.) The completion of the Human Genome Project might suggest that it is not now necessary to study proteins directly, since the amino acid sequence of each protein can be deduced from the DNA sequence. This is not true for the following reasons:





• First, although the DNA in each cell type in the body is the same, different sets of genes are expressed in different tissues, and hence the protein component of a cell varies from cell type to cell type. For example, some proteins are found in nearly all cells (the products of the so-called house-keeping genes), such as those involved in glycolysis, whereas specific cell types such as kidney, liver, brain, etc. contain specific proteins unique to that tissue and necessary for the functioning of that particular tissue/organ. It is therefore only by studying the protein component of a cell directly that we can identify which proteins are actually present.
• Secondly, it is now appreciated that a single DNA sequence (gene) can encode multiple proteins. This can occur in a number of ways:
(i) Alternative splicing of the mRNA transcript.
(ii) Variation in the translation ‘stop’ or ‘start’ sites.
(iii) Frameshifting, where a different set of triplet codons is translated, to give a totally different amino acid sequence.
(iv) Post-translational modifications. The genome sequence defines the amino acid sequence of a protein, but tells us nothing of any post-translational modifications (Sections 8.2.1 and 9.5.5) that can occur once the polypeptide chain is synthesised at the ribosome. Up to 10 different forms (variants) of a single polypeptide chain can be produced by phosphorylation, glycosylation, etc.

The consequence of the above is that the total protein content of the human body is an order of magnitude more complex than the genome. The human genome sequence suggests there may be 30 000–40 000 genes (and hence proteins) whereas estimates of the actual number of proteins in human cells suggest possibly as many as 200 000 or even more. The dogma that one gene codes for one protein has been truly demolished!
From the above, it should be easy to appreciate the need to directly analyse the protein component of the cell, and the need for an understanding of the function of each individual protein in the cell. In recent years, development of new techniques (discussed below) has enhanced our ability to study the protein component of the cell and has led to the introduction of the terms proteome and proteomics. The total DNA composition


of a cell is referred to as the genome, and the study of the structure and function of this DNA is called genomics. By analogy, the proteome is defined as the total protein component of a cell, and the study of the structure and function of these proteins is called proteomics. The ultimate aim of proteomics is to catalogue the identity and amount of every protein in a cell, and determine the function of each protein. Earlier sections of this chapter and Chapter 11 describe the traditional, but still very valid approach to studying proteins, where individual proteins are extracted from tissue and purified so that studies can be made of the structure and function of the purified proteins. The subject of proteomics has developed from a different approach, where modern techniques allow us to view and analyse much of the total protein content of the cell in a single step. The development of these newer techniques has gone hand-in-hand with the development of techniques for the analysis of proteins by mass spectrometry, which has revolutionised the subject of protein chemistry. The cornerstone of proteomics has been two-dimensional (2-D) PAGE (described in Section 10.3) and the applications of this technique in proteomics are described below. However, although 2-D PAGE remains central to proteomics, the study of proteomics has stimulated the development of further methods for studying proteins and these will also be described below.

8.5.1 2-D PAGE

2-D PAGE has found extensive use in detecting changes in gene expression between two different biological states, for example comparing normal and diseased tissue. In this case, a 2-D gel pattern would be produced of an extract from a diseased tissue such as a liver tumour and compared with the 2-D gel pattern of an extract from normal liver tissue. The two gel patterns are then compared to see whether there are any differences between them. If it is found that a protein is present (or is absent) only in the liver tumour sample, then by identifying this protein we are directed to the gene for this protein and can thus try to understand why this gene is expressed (or not) in the diseased state. In this way it is possible to obtain an understanding of the molecular basis of diseases. This approach can be taken to study any disease process where normal and diseased tissue can be compared, for example arthritis, kidney disease, or heart valve disease. Under favourable circumstances up to 5000 protein spots can be identified on a large format 2-D gel. Thus with 2-D PAGE we now have the ability to follow changes in the expression of a significant proportion of the proteins in a cell or tissue type, rather than just one or two, which has been the situation in the past. The potential applications of proteome analysis are vast. Initially one must produce a 2-D map of the proteins expressed by an organism, tissue or cell under ‘normal’ conditions. This 2-D reference map and database can then be used to compare similar information from ‘abnormal’ or treated organisms, tissues or cells. For example, as well as comparing normal tissue with diseased tissue (as described above), we can:

• analyse the effects of drug treatment or toxins on cells;
• observe the changing protein component of the cell at different stages of tissue development;
• observe the response to extracellular stimuli such as hormones or cytokines;
• compare pathogenic and non-pathogenic bacterial strains;
• compare serum protein profiles from healthy individuals and Alzheimer or cancer patients to detect proteins, produced in the serum of patients, which can then be developed as diagnostic markers for diseases (e.g. by setting up an enzyme-linked immunosorbent assay (ELISA) to measure the specific protein).

As a typical example, a research group studying the toxic effect of drugs on the liver can compare the 2-D gel patterns from their ‘damaged’ livers with the normal liver 2-D reference map, thus identifying protein changes that occur as a result of drug treatment. The sheer complexity and amount of data available from 2-D gel patterns is daunting, but fortunately there is a range of commercial 2-D gel analysis software, compatible with personal computer workstations, which can provide both qualitative and quantitative information from gel patterns, and can also compare patterns between two different 2-D gels (see below). This has allowed the construction of a range of databases of quantitative protein expression in a range of tissue and cell types. For example, an extensive series of 2-DE databases, known as SWISS-2D PAGE, is maintained at Geneva University Hospital and is accessible via the World Wide Web at http://au.expasy.org/ch2d/. This facility therefore allows an individual laboratory to compare their own 2-D protein database with that in another laboratory. The comparison of two gel patterns is made by using any one of a number of software packages designed for this purpose. One of the more interesting approaches to comparing gel patterns is the use of the Flicker program, which is available on the Web at http://open2dprot.sourceforge.net/Flicker. This program superimposes the two 2-D patterns to be compared and then alternately, and rapidly, displays one pattern and then the other.
Spots that appear on both gel patterns (the majority) will be seen as fixed spots, but a spot that appears on one gel and not the other will be seen to flash (hence ‘flicker’). When one has compared two 2-DE patterns and identified any protein spot(s) of interest, it is then necessary to identify each specific protein. In the majority of cases this is done by peptide mass-fingerprinting. The spot of interest is cut out of the gel and incubated in a solution of the proteolytic enzyme trypsin, which cleaves the protein C-terminal to each arginine and lysine residue. In this way the protein is reduced to a set of peptides. This collection of peptides is then analysed by MALDI-MS (see Section 9.3.8) to give an accurate mass measurement for each of the peptides in the sample. This set of masses, derived from the tryptic digestion of the protein, is highly diagnostic for this protein, as no other protein would give the same set of peptide masses (fingerprint). Using Web-based programs such as Mascot or Protein Prospector this experimentally derived peptide mass-fingerprint is compared with databases of tryptic peptide mass-fingerprints generated from sequences of known proteins (or predicted sequences deduced from nucleotide sequences). If a match is found with a fingerprint from the database then the protein will be identified. However, sometimes results from peptide mass-fingerprinting can be ambiguous. In this case it is necessary to obtain some partial amino acid sequence data from one of the peptides. This is done by tandem mass spectrometry (MS/MS; Section 9.5),


[Figure 8.6: MS/MS spectrum plotted as relative intensity against m/z (300 to 1500), with labelled fragment ions at m/z 595.8, 758.8, 890.2, 1002.0, 1116.8, 1280.0 and 1553.6, residue assignments between adjacent peaks, and the annotation ‘Tag is (1116.8) YWS (1553.6)’.]

Fig. 8.6 Nano-ESI MS2 spectrum of m/z 890 from RBL spot 2 showing construction of a sequence tag. The y-axis shows relative intensity. (Courtesy of GlaxoSmithKline, Stevenage, UK.)

where one of the peptides separated for mass-fingerprinting is further fragmented in a second analyser, and from the fragmentation pattern sequence data can be deduced (mass spectrometry conveniently fragments peptides at the peptide bond, such that the difference in the mass of fragments produced can be related to the loss of specific amino acids; Section 9.5.2). These partial sequence data are then used to search the protein sequence databases for sequence identity. Universal databases are available that store information on all types of protein from all biological species. These databases can be divided into two categories: (i) databases that are a simple repository of sequence data, mostly deduced directly from DNA sequences, for example the TrEMBL database; and (ii) annotated databases where information in addition to the sequence is extracted by the biologist (the annotator) from the literature, review articles, etc., for example the SWISS-PROT database. An example of how sequence data can be produced is shown in Fig. 8.6. A lysate of 2 × 10⁶ rat basophil leukaemic (RBL) cells was separated by 2-D electrophoresis and spot 2 chosen for analysis. This spot was digested in situ using trypsin and the resultant peptides extracted. This sample was then analysed by tandem MS using a triple quadrupole instrument (ESI-MS2). MS of the peptide mixture showed a number of molecular ions relating to peptides. One of these (m/z 890) was selected for further analysis, being further fragmented in a quadrupole mass spectrometer to give fragment ions ranging from m/z 595.8 to 1553.6 (Fig. 8.6). The ions at m/z 1002.0, 1116.8, 1280.0, 1466.2 and 1553.6 are likely to be part of a Y ion series (see Fig. 8.6) as they appear at higher m/z than the precursor at m/z 890. The gap between adjacent Y ions is


Fig. 8.7 The PeptideSearch™ input form and search result based on data obtained from nano-ESI MS2 of m/z 890 from RBL spot 2. (Courtesy of GlaxoSmithKline, Stevenage, UK.)

related directly to an amino acid residue because the two flanking Y ions result from cleavage of two adjacent amide bonds. Therefore, with a knowledge of the relative molecular masses of each of the 20 naturally occurring amino acids, it is possible to determine the presence of a particular residue at any point within the peptide. The position of the assigned amino acid is deduced by virtue of the m/z ratio of the two ions. By reading several amino acids it was possible to assemble a sequence of amino acids, in this case (using the one-letter code) YWS. Database searching was then possible using the peptide mass (1778 Da), the position of the lower m/z Y ion (1116.8), the proposed amino acid sequence (YWS) and the higher Y ion at m/z 1553.6. This provides a sequence tag, which is written as (1116.8) YWS (1553.6). A search of the SWISS-PROT database (Fig. 8.7) showed just two ‘hits’ from 40 000 entries, suggesting the protein is glyceraldehyde-3-phosphate dehydrogenase. The full sequence of this peptide is LISWYDNEYGYSNR and the MS/MS fragmentation data give a perfect match. Other peptides in the sample can also be analysed in the same manner, confirming the identity of the protein.

A further development of 2-D PAGE has been the introduction of difference gel electrophoresis (DIGE). This again allows the comparison of protein components of similar mixtures, but has the advantage that only one 2-D gel has to be run rather than two. In this method the two samples to be compared are each treated with one of two different, yet structurally very similar, fluorescent dyes (Cy3 and Cy5). Each dye reacts with amino groups, so that each protein is fluorescently labelled by the dye binding to lysine residues and the N-terminal amino groups. The two protein solutions to be compared are then mixed and run on a single 2-D gel. Thus every protein in one


sample superimposes with its differentially labelled identical counterpart in the other sample. Scanning of the gel at the two different wavelengths that excite the two dye molecules reveals whether any individual spot is associated with only one dye molecule rather than two. Most spots will, of course, fluoresce at both wavelengths, but if a spot is associated with only one dye molecule then this tells us that the protein can have been present in only one of the extracts, and the wavelength at which it fluoresces tells us which extract it was originally in.
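Returning to the sequence-tag example of Fig. 8.6, the reading of residues from the gaps between adjacent Y ions can be sketched in Python. This is a simplified illustration rather than the procedure used by any particular search program: the function names are hypothetical, the ions are assumed to be singly charged members of one Y ion series, standard monoisotopic residue masses are used, and a tolerance of 0.5 Da is an arbitrary choice.

```python
# Monoisotopic residue masses in Da (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def read_sequence_tag(y_ions, tol=0.5):
    """Assign a residue to each gap between adjacent ions (low to high m/z).

    Ambiguous gaps are reported as e.g. 'L/I'; unexplained gaps as '?'.
    """
    ions = sorted(y_ions)
    calls = []
    for low, high in zip(ions, ions[1:]):
        gap = high - low
        matches = [aa for aa, mass in RESIDUE_MASS.items()
                   if abs(gap - mass) <= tol]
        calls.append("/".join(matches) if matches else "?")
    return calls


def format_tag(y_ions, tol=0.5):
    """Write the tag in the '(low m/z) residues (high m/z)' form."""
    ions = sorted(y_ions)
    return f"({ions[0]}) {''.join(read_sequence_tag(ions, tol))} ({ions[-1]})"


# Y ions quoted in the text for the peptide of m/z 890.
print(format_tag([1116.8, 1280.0, 1466.2, 1553.6]))
```

Leucine and isoleucine have identical masses, and lysine and glutamine differ by less than 0.04 Da, so gaps corresponding to these residues cannot be resolved at this tolerance and are reported as alternatives.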

8.5.2 Isotope-coded affinity tags (ICAT)

The isotope-coded affinity tag (ICAT) method uses mass spectrometry (rather than 2-D gels) to identify differences in the protein content of two complex mixtures. For example, the method can be used to identify protein differences between tumour and normal tissue, in the same way that 2-D PAGE can be used to address the same question (Section 8.5.1). This method uses two protein ‘tags’ that, whilst being in every other respect identical, differ slightly in molecular mass; hence one is ‘heavy’ and one is ‘light’. Both contain (a) a chemical group that reacts with the amino acid cysteine, and (b) a biotin group. In both molecules these groups are joined by a linker region, but in one case the linker contains eight hydrogen atoms and in the other eight deuterium atoms; one molecule (tag) is thus heavier than the other by 8 Da (see Fig. 9.26). One cell extract (e.g. from cancer cells) is treated with one tag (which binds to cysteine residues in all the proteins in the extract) and the second tag is used to treat the second extract (e.g. from normal cells). Both extracts are then treated with trypsin to produce mixtures of peptides, those peptides that contain cysteine having been ‘tagged’. The two extracts are then combined and an avidin column is used to affinity-purify the labelled peptides by binding to the biotin moiety. When released from the column this mixture of labelled peptides will contain pairs of identical peptides (derived from identical proteins) from the two cell extracts, the members of each pair differing in mass by 8 Da. Analysis of this peptide mixture by liquid chromatography–MS will then reveal a series of peptide mass signals, each one existing as a ‘pair’ of signals separated by eight mass units. These data will reveal the relative abundance of each peptide in the pair. Since most proteins present in the two samples originally being compared will be present at much the same levels, most peptide pairs will have equal signal strengths. However, for proteins that exist in greater or lesser amounts in one of the extracts, different signal strengths will be observed for each of the peptides in the pair, reflecting the relative abundance of this protein in the two samples. Further analysis of either member of such a pair via tandem mass spectrometry will provide some sequence data that should allow the protein to be identified. ICAT is discussed in more detail in Section 9.6.2.
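The search for signal pairs separated by eight mass units can be illustrated with a short Python sketch. The peak list is hypothetical, and the sketch assumes singly charged peptides that each carry a single tag, so that the pair spacing on the m/z axis is exactly 8; neither assumption comes from the text.

```python
TAG_MASS_DIFFERENCE = 8.0  # Da; heavy tag minus light tag, as stated in the text


def find_icat_pairs(peaks, tolerance=0.05):
    """Return (light m/z, heavy m/z, heavy/light intensity ratio) for each pair.

    peaks: list of (m/z, intensity) tuples. Singly charged peptides with one
    tag each are assumed (a simplification made for this sketch).
    """
    ordered = sorted(peaks)
    pairs = []
    for i, (light_mz, light_int) in enumerate(ordered):
        for heavy_mz, heavy_int in ordered[i + 1:]:
            if abs((heavy_mz - light_mz) - TAG_MASS_DIFFERENCE) <= tolerance:
                pairs.append((light_mz, heavy_mz, heavy_int / light_int))
    return pairs


# Hypothetical peak list: (m/z, intensity)
peaks = [
    (1000.5, 200.0), (1008.5, 210.0),   # pair of roughly equal abundance
    (1250.7, 50.0), (1258.7, 400.0),    # pair of unequal abundance
    (1400.2, 120.0),                    # unpaired signal
]
pairs = find_icat_pairs(peaks)
for light_mz, heavy_mz, ratio in pairs:
    print(f"{light_mz:.1f} / {heavy_mz:.1f}: heavy/light = {ratio:.2f}")
```

A ratio close to 1 corresponds to a protein present at similar levels in both extracts; a ratio far from 1 flags a candidate for follow-up by tandem mass spectrometry.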

8.5.3 Determining the function of a protein

Successfully applied, the methods described in the preceding section will have provided the amino acid sequence (or partial sequence) of a protein of interest. The next task is to identify the function and role of this protein. The first step is invariably to search the databases of existing protein sequences to find a protein or proteins that have sequence homology with the protein of interest (the homology method). This is


done using programs such as BLAST and PSI-BLAST. If sequence homology is found with a protein of known function, either from the same or different species, then this invariably identifies the function of the protein. However, this approach does not always work. For example, when the genome of the yeast Saccharomyces cerevisiae was completely sequenced in 1996, 6000 genes were identified. Of these, approximately 2000 coded for proteins that were already known to exist in yeast (i.e. had been purified and studied in previous years), 2000 had homology with known sequences and hence their function could be deduced by the homology method but 2000 could not be matched to any known genes, i.e. they were ‘new’, previously undiscovered genes. In these cases, there are a number of other computational methods that can be used to help to identify the protein’s function. These include:









• Phylogenic profile method: This method aims to identify any other protein(s) that has the same phylogenic profile (i.e. the same pattern of presence or absence) as the unknown protein, in all known genomes. If such proteins are found it is inferred that the unknown protein is involved in the same cellular process as these other protein(s) (i.e. they are said to have a functional link) and will give a strong clue as to the function of the unknown protein. This method is based on the premise that two proteins would not always both be inherited into a new species (or neither inherited) unless the two proteins have a functional link. At the time of writing there are over 100 published genome sequences that can be surveyed with this method. Fig. 8.8 shows a simple, hypothetical example, where just five genomes are analysed.
• Method of correlated gene neighbours: If two genes are found to be neighbours in several different genomes, a functional linkage may be inferred between the two proteins. The central assumption of this approach is based on the observation that functionally related genes in prokaryotes tend to be linked to form operons (e.g. the lac operon). Although operons are rare in eukaryotic species, it does appear that proteins involved in the same biological process/pathway within the cell have their genes situated in close proximity (e.g. within 500 bp) in the genome. Thus, if two genes are found to be in close proximity across a number of genomes, it can be inferred that the protein products of these genes have a functional linkage. This method is most robust for microbial genomics but works to some extent in human cells where operon-like clusters are also observed. As an example, this method correctly identified a functional link between eight enzymes in the biosynthetic pathway for the amino acid arginine in Mycobacterium tuberculosis.
• Analysis of fusion: This method is based on the observation that two genes may exist separately in one organism, whereas the genes are fused into a single multifunctional gene in another organism. The existence of the protein product of the fused gene, in which the two functions of the protein clearly interact (being part of the same protein molecule), suggests that in the first organism the two separate proteins also interact. It has been suggested that gene fusion events occur to reduce the regulational load of multiple interacting gene products.
• Protein–protein interactions: A further clue to identifying protein function can come from identifying protein–protein interactions, and methods to identify these are described in the next section.


        A    B    C    D    E
P1      1    1    1    0    0
P2      0    0    1    1    1
P3      1    0    1    1    0
P4      0    1    1    0    1
P5      1    1    0    0    1
P6      0    1    1    0    1
P7      1    0    0    1    0
P8      1    0    1    1    0

Fig. 8.8 Phylogenic profile method. Five genomes, A–E, are shown (e.g. E. coli, S. cerevisiae, etc.). The presence (1) or absence (0) of eight proteins (P1–P8) in each of these genomes is shown. It can be seen that proteins P3 and P8 have the same phylogenic profile and therefore may have a functional linkage. P4 and P6 are similarly linked.
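The comparison of profiles in Fig. 8.8 can be reproduced with a minimal Python sketch. The presence/absence values are those of the figure; the function name is an illustrative choice, and grouping only exactly identical profiles is a simplification (a real survey over many genomes would probably need to tolerate some mismatches).

```python
from collections import defaultdict

# Presence (1) or absence (0) of each protein in genomes A-E, from Fig. 8.8
profiles = {
    "P1": (1, 1, 1, 0, 0),
    "P2": (0, 0, 1, 1, 1),
    "P3": (1, 0, 1, 1, 0),
    "P4": (0, 1, 1, 0, 1),
    "P5": (1, 1, 0, 0, 1),
    "P6": (0, 1, 1, 0, 1),
    "P7": (1, 0, 0, 1, 0),
    "P8": (1, 0, 1, 1, 0),
}


def linked_groups(profiles):
    """Group proteins that share an identical phylogenic profile."""
    groups = defaultdict(list)
    for protein, profile in profiles.items():
        groups[profile].append(protein)
    # Only profiles shared by two or more proteins suggest a functional link
    return [sorted(members) for members in groups.values() if len(members) > 1]


print(linked_groups(profiles))  # [['P3', 'P8'], ['P4', 'P6']]
```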

8.5.4 Protein–protein interactions

Given the complex network of pathways that exist in the cell (signalling pathways, biosynthetic pathways, etc.), it is clear that all proteins must interact with other molecules to fulfil their role. Indeed, it is now apparent that proteins do not exist in isolation in the cell; proteins involved in a common pathway appear to exist in a loose interaction, sometimes referred to as a biomodule. Therefore, if one can identify an interaction between our unknown protein and a well-characterised protein, it can be inferred that the former has a function somehow related to the latter. For example, if the unknown protein is shown to interact with one or more proteins involved in the biosynthetic pathways for arginine, then this strongly suggests that the unknown protein is also involved in this pathway. Using this approach networks of


interacting proteins are being identified in individual organisms. This has led to the development of the Database of Interacting Proteins (DIP), which can be found at http://dip.doe-mbi.ucla.edu. Given the current fad for inventing new words ending in ‘ome’, some refer to these maps of protein interactions as the interactome.

One of the most widely used, and successful, methods for investigating protein–protein interaction is the yeast two-hybrid (Y2H) system, which exploits the modular architecture of transcription factors. A transcription factor gene (GAL4) is split into the coding regions for two domains, a DNA-binding domain and a trans-activation domain. Both these domains are expressed, each linked to a different protein (one being the unknown protein, the other a protein with which it may interact), in separate yeast cells, which are then mated to produce diploid cells (the two proteins being studied are often referred to as the bait and prey). If, in this diploid cell, the bait and prey proteins bind to each other, they will bring together the two domains of the transcription factor, which will then be active and will bind to the promoter of a reporter gene (e.g. the his gene), inducing its expression. Identification of cells expressing the reporter gene product is evidence that the bait and prey proteins interact. In practice, following mating, diploids are selected on deficient medium (in this case, medium deficient in histidine), so that only yeast cells expressing interacting proteins survive (as they are capable of synthesising histidine). Once such a positive interaction is identified, the two interacting open reading frames (ORFs) are simply identified by sequencing a small part of the protein gene.

Using this approach, all 6000 ORFs from S. cerevisiae were individually cloned as both bait and prey. When the pool of 6000 prey clones was screened against each of the 6000 bait clones, 691 interactions were identified, only 88 of which were previously known. This therefore gave an indication of the function of over 600 proteins whose function was previously unknown. On a much larger scale, the same approach was used to identify protein–protein interactions in the fruit fly, Drosophila melanogaster. All 14 000 predicted D. melanogaster ORFs were amplified using the polymerase chain reaction (PCR) and each cloned into two-hybrid bait and prey vectors. A total of 45 417 two-hybrid positive colonies were obtained, from which 10 021 protein interactions involving 4500 proteins were obtained. The yeast two-hybrid system is described in greater detail in Section 6.8.3.
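The inference that an unknown protein shares the function of its interaction partners can be sketched as a simple majority vote. This is only an illustration of the reasoning in this section, not a published algorithm; the protein names, interaction list and annotations are hypothetical labels chosen for the example.

```python
from collections import Counter


def infer_function(protein, interactions, annotations):
    """Suggest a function for `protein` from the annotations of its partners.

    interactions: list of (protein_a, protein_b) pairs, e.g. two-hybrid positives.
    annotations: dict mapping characterised proteins to a functional label.
    Returns the most common label among annotated partners, or None.
    """
    partners = set()
    for a, b in interactions:
        if a == protein:
            partners.add(b)
        elif b == protein:
            partners.add(a)
    votes = Counter(annotations[p] for p in partners if p in annotations)
    return votes.most_common(1)[0][0] if votes else None


# Hypothetical two-hybrid results and annotations
interactions = [
    ("ORF_X", "ArgEnzyme1"),
    ("ORF_X", "ArgEnzyme2"),
    ("ORF_X", "CycleKinase"),
    ("ArgEnzyme1", "ArgEnzyme2"),
]
annotations = {
    "ArgEnzyme1": "arginine biosynthesis",
    "ArgEnzyme2": "arginine biosynthesis",
    "CycleKinase": "cell cycle control",
}
print(infer_function("ORF_X", interactions, annotations))
```

A single partner is weak evidence on its own, so any such suggestion would need experimental confirmation.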

8.5.5 Protein arrays

A newly developing area for studying protein–protein interactions is the use of protein arrays (chips). Although the basic principle for screening and identifying interacting molecules is much the same as for DNA arrays (Section 6.8.8), the production of protein arrays is more technically demanding owing mainly to the difficulty of binding proteins to a surface and ensuring that the protein is not denatured at any stage of the assay procedure. In a protein array, proteins are immobilised as small spots (150–200 µm) onto a solid support (typically glass or a nitrocellulose membrane), using high precision contact printing (not unlike a dot-matrix printer) at a spot density of the order of 1500 spots cm⁻². A solution of the protein of unknown function is then incubated on


the array surface for a period of time, then washed off, and the position(s) where the protein has bound are identified (see below). Since it is known which protein was immobilised in each position of the chip, each pair of interacting proteins can be identified.

Saccharomyces cerevisiae again provides a good example of the successful use of this technology, where a protein array was used to identify yeast proteins that bind to the protein calmodulin (an important protein involved in calcium regulation). Five thousand eight hundred yeast ORFs were cloned into a yeast high copy expression vector, and each of the expressed proteins purified. Each protein was then spotted at high density onto nickel-coated glass microscope slides. Since each protein also contained a (His)6-Tag (which binds to nickel) introduced at the C terminus, proteins were attached to the surface in an orientated manner, the C terminus being linked to the nickel-coated glass through the (His)6 sequence, while the rest of the molecule was orientated away from the surface of the array and so was available for interaction with another protein. The array was then incubated in a solution of calmodulin that had been labelled with biotin. The calmodulin was then washed off and the positions where calmodulin had bound to the array were identified by incubating the array with a solution of fluorescently labelled avidin (the protein avidin binds strongly to the small-molecular-mass vitamin biotin: see Section 10.3.8). The use of ultraviolet light thus identified fluorescence where the screening molecules had bound. In total, 33 new proteins that bind calmodulin were discovered in this way.

Figure 8.9 (see also colour section) shows an interaction map of the yeast proteome. The authors constructed the map from published data on protein–protein interactions in yeast. The map contains 1584 proteins and 2358 interactions. Proteins are coloured according to their functional role, e.g. proteins involved in membrane fusion (blue), lipid metabolism (yellow), cell structure (green), etc. If one views the electronic version of this publication it is possible for the reader to zoom in and search for protein names and to read interactions more clearly.

Figure 8.10 (see also colour section) is a summary of Fig. 8.9 showing the number of interactions of proteins from each functional group with proteins of their own and other groups. The word function means the cellular role of the protein. Numbers in parentheses indicate, first, the number of interactions within a group and, secondly, the number of proteins within a group. Numbers on connecting lines indicate the numbers of interactions between proteins of the two connected groups. For example, in the upper left-hand corner, there are 77 interactions between the 21 proteins involved in membrane fusion and 141 proteins involved in vesicular transport. Looking at the bottom right of the diagram it can be seen that some proteins involved in RNA processing/modification not surprisingly also interact with proteins involved in RNA turnover, RNA splicing, RNA transcription and protein synthesis.
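The way a detailed interaction map is condensed into counts within and between functional groups can be sketched as follows. The proteins, groups and interactions are hypothetical and far smaller than the published yeast data set; the function name is an illustrative choice.

```python
from collections import Counter


def summarise_by_group(interactions, group_of):
    """Count interactions within each functional group and between pairs of groups.

    interactions: list of (protein_a, protein_b) pairs.
    group_of: dict mapping each protein to its functional group.
    """
    within = Counter()
    between = Counter()
    for a, b in interactions:
        group_a, group_b = group_of[a], group_of[b]
        if group_a == group_b:
            within[group_a] += 1
        else:
            # frozenset makes the pair of groups independent of order
            between[frozenset((group_a, group_b))] += 1
    return within, between


# Hypothetical miniature data set
group_of = {
    "a1": "membrane fusion",
    "a2": "membrane fusion",
    "b1": "vesicular transport",
    "c1": "RNA splicing",
}
interactions = [("a1", "a2"), ("a1", "b1"), ("a2", "b1"), ("b1", "c1")]

within, between = summarise_by_group(interactions, group_of)
print(dict(within))
for groups, count in between.items():
    print(" <-> ".join(sorted(groups)), count)
```

The `within` counts correspond to the first number in parentheses beside each group in the summary diagram, and the `between` counts to the numbers on the connecting lines.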

8.5.6 Systems biology

It can be seen from the section on proteomics that the study of proteins is moving away from methods that involve the purification and study of individual proteins. Nowadays proteins are more likely to be studied as a stained spot on a complex 2-D gel pattern, often present in as little as nanogram amounts, more often than not using


Fig. 8.9 An interaction map of the yeast proteome, assembled from published interactions (see text for details). (Courtesy of Benno Schwikowski, Peter Uetz and Stanley Fields. Reprinted with the permission of Nature Publishing Group.) (See also colour plate.)

analytical techniques such as mass spectrometry (see Chapter 9) and invariably requiring the interrogation of protein and genome sequence data on the Web (bioinformatics, Section 5.8). It is then necessary to determine which other proteins interact with the protein being studied. Proteomics is thus moving us away from studying proteins in isolation and encouraging us to consider the proteins in the cell as part


[Fig. 8.10 diagram: functional groups of yeast proteins shown as nodes, each labelled with (interactions within the group/proteins in the group) and joined by lines labelled with the number of interactions between the two groups.]

Fig. 8.10 A simplification of Fig. 8.9 identifying interactions between functional groups of proteins (see text for details). (Courtesy of Benno Schwikowski, Peter Uetz and Stanley Fields. Reprinted with the permission of Nature Publishing Group.) (See also colour plate.)

of a dynamic interacting system. This has led to the development of the concept of systems biology, which can be defined as the study of living organisms in terms of their underlying network structure rather than just their individual molecular components. Since systems biology requires a study of all interacting components in the cell the new high throughput and quantitative.

8.6 SUGGESTIONS FOR FURTHER READING

Cutler, P. (2004). Protein Purification Protocols. Totowa, NJ: Humana Press. (Detailed theory and practical procedures for a range of protein purification techniques.)
Walker, J. M. (2005). Proteomics Protocols Handbook. Totowa, NJ: Humana Press. (Theory and techniques of a spectrum of methods applied to proteomics.)
Nedelkov, D. (2006). New and Emerging Proteomics Techniques. New York: Humana Press. (In-depth details of a range of proteomics techniques.)
Thompson, J. D. (2008). Functional Proteomics. New York: Humana Press. (Comprehensive coverage of functional proteomics including protein analysis and mass spectrometry.)
Simpson, R. J., Adams, P. D. and Golemis, E. A. (2008). Basic Methods in Protein Purification and Analysis: A Laboratory Manual. New York: CSH Press. (A comprehensive collection of protein purification methods.)

9 Mass spectrometric techniques
A. AITKEN

9.1 Introduction
9.2 Ionisation
9.3 Mass analysers
9.4 Detectors
9.5 Structural information by tandem mass spectrometry
9.6 Analysing protein complexes
9.7 Computing and database analysis
9.8 Suggestions for further reading

9.1 INTRODUCTION

9.1.1 General

Mass spectrometry (MS) is an extremely valuable analytical technique in which the molecules in a test sample are converted to gaseous ions that are subsequently separated in a mass spectrometer according to their mass-to-charge (m/z) ratio and detected. The mass spectrum is a plot of the (relative) abundance of the ions at each m/z ratio. Note that it is the mass-to-charge ratio of the ions (m/z) and not the actual mass that is measured. If, for example, a biomolecule is ionised by the addition of one or more protons (H⁺ ions), the instrument measures the m/z after the addition of 1 Da for each proton when positive ions are measured, or the m/z after the subtraction of 1 Da for each proton lost when negative ions are measured.

The development of two ionisation techniques, electrospray (ESI) and matrix-assisted laser desorption/ionisation (MALDI), has enabled the accurate mass determination of high-molecular-mass compounds as well as low-molecular-mass molecules and has revolutionised the applicability of mass spectrometry to almost any biological molecule. Applications include the new science of proteomics as well as drug discovery. The latter includes combinatorial chemistry, where a large number of similar molecules (combinatorial libraries) are produced and analysed to find the most effective compounds from a group of related organic chemicals.

Mr is sometimes used to designate relative molar mass. Molecular weight (which is a force not a mass) is also frequently and incorrectly used. Mr is a relative measure and has no units. However, Mr is numerically equivalent to the mass, M, which does have units and the Dalton is frequently used (see Section 1.2.2). The essential features of all mass spectrometers are therefore:

• production of ions in the gas phase;
• acceleration of the ions to a specific velocity in an electric field;
• separation of the ions in a mass analyser; and
• detection of each species of a particular m/z ratio.

The instruments are calibrated with standard compounds of accurately known Mr values. In mass spectrometry the carbon scale is used with ¹²C = 12.000000. This level of accuracy is achievable in high-resolution magnetic sector double-focussing, accelerator mass spectrometers and Fourier transform mass spectrometers (Sections 9.3.5, 9.3.6 and 9.3.13). The mass analyser may separate ions by use of either a magnetic or an electrical field. Alternatively, the time taken for ions of different masses to travel a given distance in space is measured accurately in the time-of-flight (TOF) mass spectrometer (Section 9.3.8). Any material that can be ionised and whose ions can exist in the gas phase can be investigated by MS, remembering that very low pressures, i.e. high vacuum, in the region of 10⁻⁶ Torr are required (the Torr is a measure of pressure equal to 1 mm of mercury (133.3 Pa); atmospheric pressure is 760 Torr).

The majority of biological MS investigations on proteins, oligosaccharides and nucleic acids are carried out with quadrupole, quadrupole–ion trap and TOF mass spectrometers. In the organic chemistry/biochemistry area of analysis, the well-established magnetic sector mass spectrometers still find wide application and their main principles will also be described.

The treatment of mass spectrometry in this chapter will be strictly non-mathematical and non-technical. However, the intention is to give an overview of the types of instrumentation that will be employed, the main uses of each, complementary techniques and advantages/disadvantages of the different instruments and particular applications most suited to each type. Data analysis and sample preparation to obtain the best sensitivity for a particular type of compound will also be covered.
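The point that the instrument reports m/z rather than mass can be made concrete with a small Python sketch. The neutral mass of 10 000 Da is hypothetical, the function names are illustrative, and the proton mass is taken as 1.00728 Da rather than the rounded value of 1 Da used in the text.

```python
PROTON_MASS = 1.00728  # Da (a more precise value than the rounded 1 Da)


def mz_positive(neutral_mass, protons_added):
    """m/z of a positive ion formed by adding protons to a neutral molecule."""
    return (neutral_mass + protons_added * PROTON_MASS) / protons_added


def mz_negative(neutral_mass, protons_lost):
    """m/z of a negative ion formed by removing protons from a neutral molecule."""
    return (neutral_mass - protons_lost * PROTON_MASS) / protons_lost


mass = 10000.0  # hypothetical neutral mass in Da
for z in (1, 5, 10):
    print(f"{z}+ : m/z = {mz_positive(mass, z):.3f}   "
          f"{z}- : m/z = {mz_negative(mass, z):.3f}")
```

The same molecule therefore appears at very different positions on the m/z axis depending on how many charges it carries, which is why the charge state must be known before a mass can be assigned.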

9.1.2 Components of a mass spectrometer

All mass spectrometers are basically similar (Fig. 9.1). They consist of the following:

• A high vacuum system (10⁻⁶ torr or 1 µtorr): This includes turbomolecular pumps, diffusion pumps and rotary vane pumps.
• A sample inlet: This comprises a sample or target plate; a high-performance liquid chromatography (HPLC), gas chromatography (GC) or capillary electrophoresis system; solids probe; electron impact or direct chemical ionisation chamber.
• An ion source (to convert molecules into gas-phase ions): This can be MALDI; ESI; fast atom bombardment (FAB); electron impact or direct chemical ionisation.
• A mass filter/analyser: This can be TOF; quadrupole; ion trap; magnetic sector or ion cyclotron Fourier transform (the last is also actually a detector).
• A detector: This can be a conversion dynode, electron multiplier, microchannel plate or array detector.


[Fig. 9.1 diagram labels: high vacuum system; inlet; ion source; mass filter; detector; data system.]

Fig. 9.1 Basic components of mass spectrometers.

9.1.3 Vacuum system

All mass analysers operate under vacuum in order to minimise collisions between ions and air molecules. Without a high vacuum, the ions produced in the source will not reach the detector. At atmospheric pressure, the mean free path of a typical ion is around 52 nm; at 1 mtorr, it is 40 mm; and at 1 µtorr, it is 40 m. In most instruments, two vacuum pump types are used, e.g. a rotary vane pump (to produce the main reduction in pressure) followed by a turbomolecular pump or diffusion pump to produce the high vacuum. The rotary vane pump can be an oil pump to provide the initial vacuum (approximately 1 torr), while the turbomolecular pump provides the working high vacuum (1 mtorr to 1 ntorr). The turbomolecular pump is a high-speed gas turbine with interspersed rotors (moving blades) and stators (i.e. fixed or stationary blades) whose rotation forces molecules through the blade system.
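The dependence of mean free path on pressure can be checked with a short calculation. The sketch assumes that the mean free path is inversely proportional to pressure at constant temperature (the usual kinetic-theory result, not stated in the text) and scales from the figure of 52 nm at atmospheric pressure; the function name is an illustrative choice.

```python
ATMOSPHERE_TORR = 760.0            # atmospheric pressure in torr
MEAN_FREE_PATH_AT_ATM_M = 52e-9    # 52 nm at atmospheric pressure (from the text)


def mean_free_path_m(pressure_torr):
    """Mean free path in metres, assuming it scales as 1/pressure."""
    return MEAN_FREE_PATH_AT_ATM_M * ATMOSPHERE_TORR / pressure_torr


for label, pressure in (("1 torr", 1.0), ("1 mtorr", 1e-3), ("1 utorr", 1e-6)):
    print(f"{label:>8}: {mean_free_path_m(pressure):.3g} m")
```

At 1 mtorr (10⁻³ torr) this gives about 0.040 m (40 mm) and at 1 µtorr (10⁻⁶ torr) about 40 m, which is far longer than the flight path inside a typical instrument.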

9.2 IONISATION

Ions may be produced from a neutral molecule by removing an electron to produce a positively charged cation, or by adding an electron to form an anion. Both positive- and negative-ion mass spectrometry may be carried out but the methods of analysis in the following sections will be described mainly for positive-ion MS, since this is more common and the principles of separation and detection are essentially the same for both types of ion.

9.2.1 Electron impact ionisation (EI)

Electron impact ionisation (EI) is widely used for the analysis of metabolites, pollutants and pharmaceutical compounds, for example in drug testing programmes. It has major applications as a mass detector for gas chromatography (GC/MS, Section 11.9.3). A stream of electrons from a heated metal filament is accelerated to an energy of 70 eV (the electron volt, eV, is a measure of energy). Sample ionisation occurs when the electrons stream across a high vacuum chamber into which molecules of the substance to be analysed (analyte) are allowed to diffuse (Fig. 9.2). Interaction with the analyte results in either loss of an electron from the


[Fig. 9.2 diagram labels: heater/sample vaporiser; sample insertion; source chamber; ion repeller, +4 kV; filament; electron beam, 70 eV; ions; ion focussing; mass analyser and detector.]

Fig. 9.2 Electron impact source. Electrons are produced by thermionic emission from a filament of tungsten or rhenium. The filament current is typically 0.1 mA. Electrons are accelerated toward the ion source chamber (held at a positive potential equal to the accelerating voltage) and acquire an energy equal to the voltage between the filament and the source chamber, typically 70 eV. The electron trap is held at a fixed positive potential with respect to the source chamber. Gaseous analyte molecules are introduced into the path of the electron beam where they are ionised. Owing to the positive ion repeller voltage and the negative excitation voltage that produce an electric field in the source chamber, the ions leave the source through the ion exit slit and are analysed.

substance (to produce a cation) or electron capture (to produce an anion). The analyte must be in the vapour state in the electron impact source, which limits the applicability to biological materials below ca. 400 Da. Before the advent of electrospray and MALDI, the method did have some applicability to peptides, for example, whose volatility could be increased by chemical modification. A large amount of fragmentation of the sample is common, which may or may not be desirable depending on the information required.

Chemical bonds in organic molecules are formed by the pairing of electrons. Ionisation resulting in a cation requires loss of an electron from one of these bonds (effectively knocked out by the bombarding electrons), but it leaves a bond with a single unpaired electron. The product is a radical as well as being a cation, hence the representation as M•+, the (+) sign indicating the ionic state and the (•) a radical. Conversely, electron capture results in both an anion and the addition of an unpaired electron, and therefore a negatively charged radical, hence the symbol M•−. Such radical ions are termed molecular ions, parent ions or precursor ions and under the conditions of electron bombardment are relatively unstable. Their energy in excess of that required for ionisation has to be dissipated. This latter process results in the


precursor ion disintegrating into a number of smaller fragment ions that may be relatively unstable and further fragmentation may occur. This gives rise to a series of daughter ions or product ions, which are recorded as the mass spectrum. For the production of a radical cation, as it is not known where either the positive charge or the unpaired electron actually reside in the molecule, it has been the practice to place the dot signs outside the abbreviated bracket sign, ‘⌉’. The recent recommendation by IUPAC for mass spectrometry notation is to write the sign first followed by the superscripted dot, i.e. M+• or M−•.

When the precursor ion fragments, one of the products carries the charge and the other the unpaired electron, i.e. it splits into a radical and an ion. The product ions are therefore true ions and not radical ions. The radicals produced in the fragmentation process are neutral species and therefore do not take any further part in the mass spectrometry but are pumped away by the vacuum system. Only the charged species are accelerated out of the source and into the mass analyser. It is also important to recognise that almost all possible bond breakages can occur and any given fragment will arise both as an ion and a radical. The distribution of charge and unpaired electron, however, is by no means equal. The distribution depends entirely on the thermodynamic stability of the products of fragmentation. Furthermore, any fragment ion may break down further (until single atoms are obtained) and hence not many ions of a particular type may survive, resulting in a low signal being recorded. A simple example is given by n-butane (CH3CH2CH2CH3) and some of the major fragmentations are shown in Fig. 9.3a. The resultant EI spectrum is shown in Fig. 9.3b.
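The m/z values of the n-butane fragments can be worked out from their composition, as in the following Python sketch. It uses nominal (integer) atomic masses, assumes singly charged ions so that m/z equals the nominal mass, and neglects the mass of the lost electron; the fragment formulae listed are those expected from simple bond cleavage and loss of hydrogen, chosen here for illustration.

```python
NOMINAL_MASS = {"C": 12, "H": 1}  # nominal atomic masses in Da


def nominal_mz(carbons, hydrogens, charge=1):
    """Nominal m/z of a hydrocarbon ion CxHy carrying `charge` charges."""
    mass = carbons * NOMINAL_MASS["C"] + hydrogens * NOMINAL_MASS["H"]
    return mass / charge


# Formula -> (number of C atoms, number of H atoms)
fragments = {
    "C4H10 (molecular ion)": (4, 10),
    "C3H7": (3, 7),
    "C3H5": (3, 5),
    "C2H5": (2, 5),
    "C2H3": (2, 3),
    "CH3": (1, 3),
}
mz_values = {name: nominal_mz(c, h) for name, (c, h) in fragments.items()}
for name, mz in mz_values.items():
    print(f"{name:<22} m/z = {mz:.0f}")
```

These values (58, 43, 41, 29, 27 and 15) correspond to peaks labelled in the EI spectrum of n-butane in Fig. 9.3.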

9.2.2 Chemical ionisation

Chemical ionisation (CI) is used for a range of samples similar to those for EI. It is particularly useful for the determination of molecular masses, as high intensity molecular ions are produced due to less fragmentation. CI therefore gives rise to much cleaner spectra. The source is essentially the same as the EI source but it contains a suitable reagent gas such as methane (CH4) or ammonia (NH3) that is initially ionised by EI. The high gas pressure in the source results in ion–molecule reactions involving reagent gas ions (such as NH3⁺ and CH4⁺), some of which react with the analyte to produce analyte ions. The mass differences from the neutral parent compounds therefore correspond to these adducts.

9.2.3 Fast atom bombardment (FAB)

At the time of its development in the early 1980s, fast atom bombardment (FAB) revolutionised MS for the biologist. The important advance was that this soft ionisation technique, which leads to the formation of ions with low internal energies and little consequent fragmentation, permitted analysis of biomolecules in solution without prior derivatisation. The sample is mixed with a relatively involatile, viscous matrix such as glycerol, thioglycerol or m-nitrobenzyl alcohol. The mixture, placed on a probe, is introduced into the source housing and bombarded with an ionising beam of neutral

[Fig. 9.3 diagram: (a) fragmentation scheme for the n-butane molecular ion, with fragment ions labelled at m/z 43, 41, 29, 27, 15 and 14; (b) EI spectrum plotted as % relative abundance (0–100) against m/z (10–60), with labelled peaks at m/z 15, 27, 28, 29, 32, 41, 42, 43 (base peak) and 58.]

Fig. 9.3 Fragmentation pathways in n-butane and the EI spectrum. The pathway for fragmentation of n-butane is shown in (a) and the EI spectrum in (b). In the spectrum, the relative abundance is plotted from 0 to 100% where the largest peak is set at 100% (base peak). Spectra represented in this way are said to be normalised.

atoms (such as Ar, He, Xe) of high velocity. A later development was the use of a beam of caesium (Csþ) ions and the term liquid secondary ion mass spectrometry (LSIMS) was introduced to distinguish this from FAB–MS. Pseudomolecular ion species arise as either protonated or deprotonated entities (M þ H)þ and (M  H) respectively, which allows


Mass spectrometric techniques

positive and negative ion mass spectra to be determined. The term pseudomolecular implies the mass of the ion formed from a substance of a given mass by the gain or loss of one or more protons. Other charged adducts can also be formed, such as (M + Na)⁺ and (M + K)⁺.
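The relationship between a neutral mass M and the pseudomolecular and adduct ions named above can be sketched in a few lines of Python. This is an illustration only: the function name is hypothetical, the ion masses are standard monoisotopic reference values rather than figures from this chapter, and the neutral mass of 804 Da is simply a convenient test value.

```python
# Monoisotopic masses in Da (standard reference values, not taken from the text).
PROTON = 1.007276          # H+ : hydrogen atom minus one electron
SODIUM_ION = 22.989221     # Na+
POTASSIUM_ION = 38.963158  # K+


def pseudomolecular_ions(neutral_mass):
    """Return m/z of common singly charged species formed from a neutral M.

    Hypothetical helper for illustration; all ions carry a single charge,
    so m/z is numerically equal to the ion mass.
    """
    return {
        "(M+H)+": neutral_mass + PROTON,
        "(M-H)-": neutral_mass - PROTON,
        "(M+Na)+": neutral_mass + SODIUM_ION,
        "(M+K)+": neutral_mass + POTASSIUM_ION,
    }


ions = pseudomolecular_ions(804.0)  # 804 Da is an assumed example mass
for name, mz in ions.items():
    print(f"{name}: m/z {mz:.2f}")
```

A sodium adduct therefore appears about 22 m/z units above the protonated ion, which is how (M + Na)⁺ signals are usually recognised.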

9.2.4 Electrospray ionisation (ESI)

This involves the production of ions by spraying a solution of the analyte into an electrical field. It is a soft ionisation technique and enables the analysis of large intact (underivatised) biomolecules, such as proteins and DNA. The electrospray (ES) creates very small droplets of solvent-containing analyte. The essential principle in ES is that a spray of charged liquid droplets is produced by atomisation or nebulisation. Solvent (typically 50 : 50 water and organic solvent) is removed as the droplets enter the mass spectrometer. ESI is the result of the strong electric field (around 4 kV at the end of the capillary and 1 kV at the counter electrode) acting on the surface of the sample solution. As the solvent evaporates in the high-vacuum region, the droplet size decreases and eventually charged analyte (free of solvent) remains. Ionisation can occur at atmospheric pressure and this method is also sometimes referred to as atmospheric pressure ionisation (API).

The concentration of sample is usually around 1–10 pmol mm⁻³. Typical solvents are 50/50 acetonitrile (or methanol)/H₂O with 1% acetic acid or 0.1% formic acid. Ammonium hydroxide or trifluoroacetic acid (TFA, 0.02%) in 50/50 acetonitrile (or methanol)/H₂O can also be used. The organic acid (or the NH₄OH) aids ionisation of the analyte. At low pH, basic groups will be ionised. In the example of peptides these are the side groups of Lys, His, Arg and the N-terminal amino group. At alkaline pH the carboxylic acid side chains as well as stronger anions such as phosphate and sulphate groups will be ionised. The presence of organic solvent assists in formation of small droplets and facilitates evaporation. The flow rate into the source is normally around a few mm³ min⁻¹, although higher flow rates can be tolerated (up to 1 cm³ min⁻¹) if, for example, the solution is an eluant from on-line HPLC.
Smaller molecules usually produce singly charged ions but, in contrast to MALDI, multiply charged ions are frequently formed from larger biomolecules, resulting in m/z ratios that are sufficiently small to be observed in the quadrupole analyser. Thus masses of large intact proteins, DNA and organic polymers can also be accurately measured in electrospray MS although the limit of measurement is normally m/z 2000 or 3000. For example, proteins are normally analysed in the positive ion mode where charges are introduced by addition of protons. The number of basic amino acids in the protein (mainly lysine and arginine) determines the maximum number of charges carried by the molecule. The distribution of basic residues in most proteins is such that the multiple peaks, one for each (M + nH)ⁿ⁺ ion, are centred on m/z about 1000. In Fig. 9.6 a large protein with a mass of over 100 000 Da behaves as if it were multiple mass species around 1020 Da. For the species with 100 protons (H⁺), i.e. with 100 charges, z = 100 and m/z = 1027.6; therefore (M + 100H)¹⁰⁰⁺ = 1027.6. When the computer processes the data for the multiple peaks, the average for each set of peaks gives a mass determination to high accuracy. The peaks can be deconvoluted and presented as a single peak representing the Mr (in this example M = 102 658). A diagrammatic representation of the ESI source is shown in Fig. 9.4. A curtain or sheath gas (usually nitrogen) around the spray needle at a slow flow rate may be used to assist evaporation of the solvent at or below room temperature. This may be an advantage for thermally labile compounds.

Example 1 PROTEIN MASS DETERMINATION BY ESI

Question
A protein was isolated from human tissue and subjected to a variety of investigations. Relative molecular mass determinations gave values of approximately 12 000 by size exclusion chromatography and 13 000 by gel electrophoresis. After purification, a sample was subjected to electrospray ionisation mass spectrometry and the following data obtained.

m/z              773.9    825.5    884.3    952.3    1031.3
Abundance (%)    59       88       100      66       37

Given that $n_2 = (m_1 - 1)/(m_2 - m_1)$ and $M = n_2(m_2 - 1)$, and assuming that the only ions in the mixture arise by protonation, deduce an average molecular mass for the protein by this method.

Answer
Mr by exclusion chromatography = 12 000
Mr by gel electrophoresis = 13 000
Taking ESI peaks in pairs:

$m_1 - 1$    $m_2 - m_1$    $n_2$     $m_2 - 1$    M (Da)     z
951.3        79.0           12.041    1030.3       12406.6    12
883.3        68.0           12.989    951.3        12357.1    13
824.5        58.8           14.022    883.3        12385.7    14
772.9        51.6           14.978    824.5        12349.9    15

ΣM = 49 499.3 Da
Mean M = 12 374.8 Da
Note: Relative abundance values are not required for the determination of the mass.
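The pairwise calculation used in Example 1 can be sketched in Python as follows. Here m1 and m2 are taken to be adjacent peaks with m1 < m2, so that the ion at m2 carries n2 protons and the ion at m1 carries n2 + 1. The function name and the `adduct` parameter are illustrative assumptions; the value 1 for the proton follows the formula quoted in the example.

```python
def masses_from_adjacent_peaks(mz_values, adduct=1.0):
    """Estimate the neutral mass M from each pair of adjacent ESI peaks.

    For neighbouring peaks m1 < m2 that differ by one proton:
        n2 = (m1 - 1) / (m2 - m1)   charge on the ion at m2
        M  = n2 * (m2 - 1)          neutral mass
    Returns a list of (nearest integer charge, M) tuples.
    Hypothetical helper; assumes every ion arises by protonation only.
    """
    peaks = sorted(mz_values)
    results = []
    for m1, m2 in zip(peaks, peaks[1:]):
        n2 = (m1 - adduct) / (m2 - m1)
        results.append((round(n2), n2 * (m2 - adduct)))
    return results


peaks = [773.9, 825.5, 884.3, 952.3, 1031.3]  # m/z values from Example 1
estimates = masses_from_adjacent_peaks(peaks)
mean_mass = sum(mass for _, mass in estimates) / len(estimates)

for charge, mass in estimates:
    print(f"z = {charge:2d}  M = {mass:9.1f} Da")
print(f"Mean M = {mean_mass:.1f} Da")
```

Like the worked answer, the sketch multiplies by the unrounded n2; rounding n2 to the nearest integer before multiplying is a common alternative and gives slightly different values.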

9.3 MASS ANALYSERS

9.3.1 Introduction

Once ions are created and leave the ion source, they pass into a mass analyser, the function of which is to separate the ions and to measure their masses. (Remember, what is really measured is the mass-to-charge ratio (m/z) for each ion.) At any given moment, ions of a particular mass are allowed to pass through the analyser where they are counted by the detector. Subsequently, ions of a different mass are allowed to pass through the analyser and again the detector counts the number of ions. In this way, the analyser scans through a large range of masses. In the majority of instruments, a particular type of ionisation is coupled to a particular mass analyser that operates by a particular principle. That is, EI, CI and FAB are combined with magnetic sector instruments; ESI and its derivatives with quadrupole (or its variant, the ion trap); and MALDI with TOF detection.

[Figure 9.4 labels: sample solution; glass capillary ESI needle (±5 kV); solvent evaporation; heated capillary; mass analyser.]

Fig. 9.4 Electrospray ionisation source. The ESI creates very small droplets of solvent-containing analyte by atomisation or nebulisation as the sample is introduced into the source through the fine glass (or other material) hollow needle capillary. The solvent evaporates in the high-vacuum region as the spray of droplets enters the source. As the result of the strong electric field acting on the surface of the sample droplets, and electrostatic repulsion, their size decreases and eventually single species of charged analyte (free of solvent) remain. These may have multiple charges depending on the availability of ionisable groups.

9.3.2 Quadrupole mass spectrometry

The quadrupole analyser consists of four parallel cylindrical rods (Fig. 9.5). A direct current (DC) voltage and a superimposed radio frequency (RF) voltage are applied to each rod, creating a continuously varying electric field along the length of the analyser. Once in this field, ions are accelerated down the analyser towards the detector. The varying electric field is precisely controlled so that during each stage of a scan, ions of one particular mass-to-charge ratio pass down the length of the analyser. Ions with any other mass-to-charge value impact on the quadrupole rods and are not detected. By changing the electric field (scanning), ions of different m/z successively arrive at the detector. Quadrupoles can routinely analyse up to m/z 3000, which is extremely useful for biological MS since, as we have seen, proteins and other biomolecules normally give a charge distribution of m/z that is centred below this value (see Fig. 9.6). Note that hexapole and octapole devices are also used, for example to direct a beam into the next section of a triple quadrupole or into the ion trap, but the principle is the same.

[Figure 9.5 labels: ions; quadrupole rods; RF supply; DC supply (+/−); detector; ion that is unsuccessful at reaching detector.]

Fig. 9.5 Quadrupole analyser. The fixed (DC) and oscillating (RF) fields cause the ions to undergo complicated trajectories through the quadrupole filter. For a given set of fields, only certain trajectories are stable, which only allows ions of specific m/z to travel through to the detector. The efficiency of the quadrupole is impaired after a build-up of ions that do not reach the detector. Therefore a set of pre-filters is added to the quadrupole to remove the ions that would otherwise affect the main quadrupole.

[Figure 9.6: relative ion intensity (%) against m/z (900–1500); peaks labelled by charge state include A106 at m/z 969.4, A100 at 1027.6, A97 at 1059.4, A94 at 1093.1, A86 at 1194.6, A80 at 1284.0, A78 at 1317.3 and A76 at 1352.1; inset, deconvoluted spectrum with M = 102 658.3 ± 7.0.]

Fig. 9.6 Large intact protein mass accurately measured in electrospray MS. The species of ions are annotated by the charge state, e.g. with 99, 100, 101 charges, etc., and the associated m/z value. The inset shows the ‘deconvoluted spectrum’.
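As a rough illustration of how annotated charge states lead to a single deconvoluted mass, the sketch below converts a few of the labelled peaks of Fig. 9.6 back to a neutral mass using M = z(m/z − mass of a proton). The peak list is read from the figure annotations, the function name is hypothetical, and the simple averaging shown is an assumption for illustration rather than the algorithm used by instrument software.

```python
PROTON = 1.00728  # Da, mass of H+ (standard reference value)


def neutral_mass(mz, charge):
    """Neutral mass of a protein observed as the (M + zH)z+ ion at a given m/z."""
    return charge * (mz - PROTON)


# charge state -> m/z, as annotated on Fig. 9.6
annotated_peaks = {106: 969.4, 100: 1027.6, 97: 1059.4, 94: 1093.1}

masses = [neutral_mass(mz, z) for z, mz in annotated_peaks.items()]
mean_mass = sum(masses) / len(masses)

for (z, mz), mass in zip(annotated_peaks.items(), masses):
    print(f"z = {z:3d}  m/z = {mz:7.1f}  M = {mass:9.1f} Da")
print(f"Mean M = {mean_mass:.1f} Da")
```

Because the m/z labels are given to only one decimal place, each individual estimate can differ from the reported 102 658.3 Da by several daltons; averaging over many charge states is what narrows the uncertainty.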

[Figure 9.7 labels: ions from ion source and quadrupole; end electrodes; ring electrode; detector; diagrams 1 to 5.]

Fig. 9.7 Diagram of an ion trap. The ion trap contains three hyperbolic electrodes which form a cavity in a cylindrical device of around 5 cm diameter in which the ions are trapped (stored) and subsequently analysed. Each end-cap electrode has a small hole in the centre. Ions produced from the source enter the trap through the quadrupole and the entrance end-cap electrode. Potentials are applied to the electrodes to trap the ions (diagrams 1 and 2). The ring electrode has an alternating potential of constant radio frequency but variable amplitude. This results in a three-dimensional electrical field within the cavity. The ions are trapped in stable oscillating trajectories that depend on the potentials and the m/z of the ions. To detect these ions, the potentials are varied, resulting in the ion trajectories becoming unstable and the ions are ejected in the axial direction out of the trap in order of increasing m/z into the detector. A very low pressure of helium is maintained in the trap, which ‘cools’ the ions into the centre of the trap by low-speed collisions that normally do not result in fragmentation. These collisions merely slow the ions down so that during scanning, the ions leave quickly in a compact packet, producing narrower peaks with better resolution. In sequencing, all the ions are ejected except those of a particular m/z ratio that has been selected for fragmentation (see diagrams 3, 4 and 5). The steps are: (3) selection of precursor ion, (4) collision-induced dissociation of this ion, and (5) ejection and detection of the fragment ions.

9.3.3 Ion trap mass spectrometry

Ion trap mass spectrometers use ESI to produce ions, all of which are transferred into a device called an ion trap (Fig. 9.7), where they are subsequently measured almost simultaneously (within milliseconds). The trap must then be refilled with the ions that are arriving from the source. Therefore, although the trap does not measure 100% of all ions produced (this depends on the cycle time to refill the trap and then analyse the ions), it nevertheless gives a great improvement in sensitivity relative to quadrupole mass spectrometers, where at any given moment only ions of one particular m/z are detected. ESI–ion trap mass spectrometers have found wide application in the analysis of peptides and small biomolecules, for example in protein identification by tandem MS, liquid chromatography/mass spectrometry (LC/MS), combinatorial libraries, and rapid analysis in drug discovery and drug development.

[Figure 9.8: spectra plotted against m/z (300–800), with structures of the corresponding ions. Panels: Full MS, with (M + H)⁺ at m/z 615.3 and (M + Na)⁺ at 637.3; MS2, with ions labelled 597.6 (M − H₂O), 579.6 (M − 2H₂O), 561.2 (M − 3H₂O) and 448.3; MS3 of 561.2 and MS3 of 579.6, with labelled fragments including 412.4 and 394.3 (−3H₂O).]

Fig. 9.8 Structural analysis, MSⁿ in an ion trap. In this example, of a steroid-related compound, the structure can be analysed when the (M + H)⁺ ions at 615.3 are selected to be retained in the ion trap. These ions are subjected to collision-induced dissociation (CID) resulting in loss of the aliphatic sulphonate from the quaternary ammonium group and partial loss of some hydroxyl groups in the tandem MS (MS²) experiment. The major fragment ions (561.2 and 579.6) are further selected for CID (MS³) resulting in subsequent losses of more hydroxyl groups from specific parts of the steroid ring.

Ion trap MS permits structural information to be readily obtained (and sequence information in the case of polypeptides). Not only can MS–MS analysis be carried out but, owing to the high efficiency of each stage, further fragmentation of selected ions may also be carried out, to MS to the power n (MSⁿ) (Fig. 9.8). The instrument still allows accurate molecular mass determination to over 100 000 Da with a mass accuracy better than 0.01%. The MSⁿ procedure in an ion trap involves ejecting all ions that are stored in the trap, except those corresponding to the selected m/z value. To perform tandem MS (MS²) a collision gas is introduced (a low pressure of helium) and collision-induced dissociation (CID) occurs (Fig. 9.7). The fragment ions are then ejected in turn and the fragment spectrum determined. The process can be repeated successively: all the fragment ions stored in the trap are ejected except those corresponding to another selected m/z value. This fragment ion can then be further fragmented to obtain more structural information, as illustrated for the example shown in Fig. 9.8. This technique has a big advantage since no additional mass spectrometers or collision cells are required. The limitation is sensitivity, which decreases with each MS experiment, although the claimed record in an ion trap is currently MS¹⁴.

9.3.4 Nanospray and on-line tandem mass spectrometry

The sensitivity with ESI can be greatly improved with a reduction in flow rate. Nanospray is therefore the technique of choice for ultimate sensitivity when sample amounts are limited. There are two ways of achieving this, static and dynamic nanospray, and both are widely used. Flow rates in both nanospray techniques are in the order of tens of nl min⁻¹ (1 nl = 10⁻³ mm³), which leads to low sample consumption and a high signal-to-noise ratio.

Firstly, in static nanospray, glass needles are used with a very finely drawn out capillary tip (coated with gold to allow the needle to be held at the correct kV potential; see Fig. 9.4). The needles are filled with 1–2 mm³ of sample and accurately positioned at the entrance to the source. Closed-circuit television (CCTV) is used to determine accurately the position of the capillary. The solution is drawn into the source by electrostatic pressure, although a low pressure may be applied with an air-filled syringe behind the other (open) end of the needle if necessary.

In dynamic nanospray experiments, small-diameter microbore HPLC or capillary columns are also used to achieve separation at low flow rates. This can be combined with a stream splitter device that can further reduce flow rate (Section 11.9.3). The stream splitter can be used to divert a percentage of the solvent flow from the pump, say 99% to 99.9%, to waste and allow the remainder to pass through the column. This allows for much more accurate flow rates, since it is extremely difficult to pump directly and accurately at 0.5 mm³ min⁻¹ or even 50 nl min⁻¹ with a high-pressure pump. Therefore one can use a pump that functions more efficiently at flow rates of 50 to 500 mm³ min⁻¹ to pass 0.5 mm³ min⁻¹ or less into the micro column. Nanospray sources are used in triple quadrupole, ion trap and hybrid MALDI instruments.

Computer programs can be set up to perform tandem MS during the chromatographic separation on each component as it elutes from the column, if it gives a signal above a threshold that is set by the operator.
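The arithmetic behind the stream splitter can be sketched as below. The function name is hypothetical, and the pump flows and split percentages are simply the ends of the ranges quoted in the text, taken here in mm³ min⁻¹.

```python
def flow_to_column(pump_flow, percent_to_waste):
    """Flow reaching the column after a splitter diverts a percentage to waste.

    pump_flow is in mm^3 per minute; the result is in the same unit.
    Hypothetical helper for illustration only.
    """
    if not 0.0 <= percent_to_waste <= 100.0:
        raise ValueError("percent_to_waste must lie between 0 and 100")
    return pump_flow * (100.0 - percent_to_waste) / 100.0


# Both ends of the quoted ranges give roughly the same column flow.
low_end = flow_to_column(50.0, 99.0)    # 1% of 50 mm^3/min
high_end = flow_to_column(500.0, 99.9)  # 0.1% of 500 mm^3/min

print(f"50 mm3/min, 99% to waste:    {low_end:.3f} mm3/min to column")
print(f"500 mm3/min, 99.9% to waste: {high_end:.3f} mm3/min to column")
```

Either combination leaves about 0.5 mm³ min⁻¹ for the column, while the pump itself runs in the range where it delivers solvent reproducibly.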

9.3.5 Magnetic sector analyser

A magnetic sector analyser is shown diagrammatically in Fig. 9.9. The ions are accelerated by an electric field. The electric sector acts as a kinetic energy filter and allows only ions of a particular kinetic energy to pass, irrespective of the m/z. This greatly increases the resolution since the ions emerge from the electrostatic analyser (ESA) with the whole range of masses but the same kinetic energy. A given ion with the appropriate kinetic energy then enters the magnetic sector analyser. It will travel in a curved trajectory in the magnetic field with a radius depending on the m/z and the kinetic energy of the ion (the latter has already been selected). Thus only ions of a particular m/z will be detected at a particular magnetic field strength. The trajectory of the ions is through a sector of the circular poles of the magnet, hence the term magnetic sector. Figure 9.9 shows several possible trajectories for a given ion in the magnetic field. Only one set of ions will be focussed on the detector. If the field is changed, these ions will be defocussed because they will not be deflected to the correct extent. A new set of ions will be deflected and collected at the detector. By starting at either end of the magnet range the ions can be scanned from high to low mass or from low to high mass. This magnetic scanning is the most commonly used type of analysis in this instrument. Alternatively, the mass spectrum can be scanned electrically by varying the voltage, V, while holding the magnetic field, B, constant. This type of instrument is called a two-sector or double-focussing mass spectrometer, and resolving power to parts per million may be obtained.

[Figure 9.9 labels: ion source; beam of ions; electrostatic analyser; magnet; trajectories m/z1, m/z2, m/z3; detector.]

Fig. 9.9 Double-focussing magnetic sector mass spectrometer. The figure shows the ‘forward geometry’ arrangement where the electrostatic analyser is before the magnetic sector (known as EB; E for electric, B for magnetic). Similar results may be obtained if the reverse geometry (BE) type is used. The radial path followed by each ion is shown. By scanning the magnetic field, B, each ion of a particular m/z can be brought into the detector slit in turn.
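The way scanning B or V selects m/z can be made explicit with the standard relations for an ion of mass m and charge ze that has been accelerated through a potential V and is then deflected by a magnetic field B. The symbols r (radius of the trajectory), v (ion velocity) and e (elementary charge) are introduced here for illustration; the derivation is a textbook sketch and ignores the initial energy spread of the ions.

```latex
\begin{align}
  zeV  &= \tfrac{1}{2} m v^{2}
       && \text{(kinetic energy gained on acceleration)} \\
  Bzev &= \frac{m v^{2}}{r}
       && \text{(magnetic force supplies the centripetal force)} \\
  \frac{m}{z} &= \frac{e B^{2} r^{2}}{2V}
       && \text{(eliminating } v \text{)}
\end{align}
```

For a fixed radius r set by the geometry of the instrument, the m/z brought to the detector therefore rises with the square of B at constant V and falls as V is increased at constant B.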

9.3.6 Accelerator mass spectrometry

Accelerator mass spectrometry (AMS) has proved to be extremely useful for quantifying rare isotopes and has had a major impact in archaeology (to measure ¹⁴C) and geochronology. AMS can also measure radioisotopes such as ³H, ¹⁰Be, ²⁶Al, ³⁶Cl and ⁴¹Ca with attomole (10⁻¹⁸) to zeptomole (10⁻²¹) levels of sensitivity and very high precision. AMS has found increasing application in human microdosing studies in drug development. This enables metabolites to be measured in human plasma or urine after administration of low, pharmacologically relevant doses of labelled drugs. Among the many applications of AMS are long-term pharmacokinetic studies to determine low-dose and chronic effects and the analysis of molecular targets of neurotoxins (see Section 18.3.1).


9.3.7 Plasma desorption ionisation

Plasma desorption ionisation mass spectrometry (PDMS) was the first mass spectrometric technique able to analyse proteins and other large biomolecules (although only those of relatively low Mr, less than 35 000). The technique and the instruments developed for it are now obsolete, having been overtaken by the much more powerful, sensitive and accurate instruments described elsewhere in this chapter. PDMS instruments are, however, still in use in some laboratories and research publications still appear with mass spectra obtained on this instrument. A basic understanding of the principle is therefore worth including. The source of the plasma (atomic nuclei stripped of electrons) was radioactive californium, ²⁵²Cf, and two typical emission nuclei were the 100 MeV Ba²⁰⁺ and Tc¹⁸⁺, formed by the decay of the Cf, which are ejected in opposite directions, almost collinearly and with equal velocity. This is a pulsed technique, i.e. particles are emitted at discrete time intervals, and it requires a TOF mass detector. The plasma particle emitted in the opposite direction to that passing through the sample triggers a time counter, and the desorbed sample ions are accelerated electrically and detected as for other TOF analysers (Section 9.3.8).

9.3.8 MALDI, TOF mass spectrometry, MALDI-TOF

Matrix-assisted laser desorption ionisation (MALDI) produces gas phase protonated ions by excitation of the sample molecules from the energy of a laser transferred via a UV light-absorbing matrix. The matrix is a conjugated organic compound (normally a weak organic acid such as a derivative of cinnamic acid or dihydroxybenzoic acid) that is intimately mixed with the sample. Examples of MALDI matrix compounds and their application for particular biomolecules are shown in Table 9.1. These are chosen to absorb light maximally at the wavelength of the laser, typically a nitrogen laser at 337 nm or a neodymium/yttrium-aluminium-garnet (Nd-YAG) laser at 355 nm. The sample (1–10 pmol mm⁻³) is mixed with an excess of the matrix and dried on to the target plate, where they co-crystallise on drying. Pulses of laser light of a few nanoseconds duration cause rapid excitation and vaporisation of the crystalline matrix and the subsequent ejection of matrix and analyte ions into the gas phase (Fig. 9.10). This generates a plume of matrix and analyte ions that are analysed in a TOF mass analyser.

The particular advantage of MALDI is the ability to produce large mass ions with high sensitivity. MALDI is a very soft ionisation method that produces little fragmentation compared with some other ionisation methods. Since the molecular ions are produced with little fragmentation, it is a valuable technique for examining mixtures (see Fig. 9.14 and compare this to the more complex spectrum in Fig. 9.6). TOF is the best type of mass analyser to couple to MALDI, as this technique has a virtually unlimited mass range. Proteins and other macromolecules of Mr greater than 400 000 have been accurately measured. The principle of TOF is illustrated in Fig. 9.11 and the main components of the instrument are shown in Fig. 9.12.


Table 9.1 Examples of MALDI matrix compounds (structures not reproduced)

Compound                                                                                              Application
α-Cyano-4-hydroxycinnamic acid (CHCA)                                                                 Peptides <10 kDa (glycopeptides)
Sinapinic acid (3,5-dimethoxy-4-hydroxycinnamic acid) (SA)                                            Proteins >10 kDa
‘Super DHB’, mixture of 10% 5-methoxysalicylic acid (2-hydroxy-5-methoxybenzoic acid) with DHB        Proteins, glycosylated proteins
2,5-Dihydroxybenzoic acid (DHB) (gentisic acid)                                                       Neutral carbohydrates, synthetic polymers (oligos)
3-Hydroxypicolinic acid                                                                               Oligonucleotides
2-(4-Hydroxyphenylazo)benzoic acid (HABA)                                                             Oligosaccharides, proteins

Sample concentration for MALDI

Maximum sensitivity is achieved in MALDI–TOF if samples are diluted to a particular concentration range. If the sample concentration is unknown, a dilution series may be needed to produce a satisfactory sample/matrix spot of suitable concentration on the MALDI plate. Peptides and proteins seem to give best spectra at around 0.1 to 10 pmol mm⁻³ (Figs. 9.13, 9.14). Some proteins, particularly glycoproteins, may yield better results at concentrations up to 10 pmol mm⁻³. Oligonucleotides give better spectra at around 10 to 100 pmol mm⁻³ while polymers require a concentration around 100 pmol mm⁻³. (Note: 1 pmol mm⁻³ = 10⁻⁶ mol dm⁻³.)
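The unit conversion in the note, and the kind of dilution series mentioned above, can be sketched as follows. Both function names are hypothetical, and the tenfold series starting at 100 pmol mm⁻³ is an assumed example rather than a recommended protocol.

```python
def pmol_per_mm3_to_molar(concentration):
    """Convert pmol mm^-3 to mol dm^-3 (1 pmol = 1e-12 mol, 1 mm^3 = 1e-6 dm^3)."""
    return concentration * 1e-12 / 1e-6


def dilution_series(stock, factor, steps):
    """Concentrations obtained by repeatedly diluting a stock by a fixed factor."""
    return [stock / factor ** i for i in range(steps)]


molar = pmol_per_mm3_to_molar(1.0)        # close to 1e-6 mol dm^-3, i.e. micromolar
series = dilution_series(100.0, 10.0, 4)  # assumed tenfold series, in pmol mm^-3

print(f"1 pmol/mm3 = {molar:.1e} mol/dm3")
for conc in series:
    print(f"{conc:6.1f} pmol/mm3 = {pmol_per_mm3_to_molar(conc):.1e} mol/dm3")
```

Spotting each member of such a series with matrix gives a quick way of finding which concentration falls in the useful range for a sample of unknown concentration.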

[Figure 9.10 labels: (a) laser beam (337 nm); sample plate (+20 kV); variable grid; ground grid; plume of matrix and sample ions. (b) MALDI sample plate.]

Fig. 9.10 MALDI ionisation mechanism and MALDI–TOF sample plate. (a) The sample is mixed, in solution, with a ‘matrix’, the organic acid, in excess of the analyte (in a ratio between 1000 : 1 and 10 000 : 1) and transferred to the MALDI plate. An ultraviolet laser is directed to the sample (with a beam diameter of a few micrometres) for desorption. The laser radiation of a few nanoseconds’ duration is absorbed by the matrix molecules, causing rapid heating of the region around the area of laser impact and electronic excitation of the matrix. The immediate region of the sample explodes into the high vacuum of the mass spectrometer, creating gas phase protonated molecules of both the acid and the analyte. The laser flash ionises matrix molecules: neutrals (M) and matrix ions (MH)⁺, (M − H)⁻ and sample neutral fragments (A). Sample molecules are ionised by gas phase proton transfer from the matrix: MH⁺ + A → M + AH⁺ and (M − H)⁻ + A → (A − H)⁻ + M. The matrix serves as an absorbing medium for the ultraviolet light, converting the incident laser energy into molecular electronic energy, both for desorption and ionisation, and as a source of H⁺ ions to transfer to, and ionise, the analyte molecule. (b) A MALDI sample plate.

[Figure 9.11 labels: laser; sample on plate; 15–25 kV gradient; positive ions in the flight tube; detector.]

Fig. 9.11 Principle of time-of-flight (TOF). The ions enter the flight tube, where the lighter ions travel faster than the heavier ions to the detector. If the ions are accelerated with the same potential at a fixed point and a fixed initial time, the ions will separate according to their mass-to-charge ratios. This time of flight can be converted to mass. Typically a few hundred pulses of laser light are used, each of around a few nanoseconds’ duration, and the information is accumulated to build up a good spectrum. With the benefit of a camera that is used to follow the laser flashes one can move or ‘track’ the laser beam around the MALDI plate to find so-called sweet spots where the composition of co-crystallised matrix and sample is optimal for good sensitivity.
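The conversion between flight time and m/z can be sketched with the standard relation for an ion accelerated through a potential V and then drifting through a field-free tube of length L: zeV = ½mv², so t = L·√(m / 2zeV). The 20 kV potential matches the value shown in Fig. 9.10, but the 1 m tube length and the function names are assumptions made for illustration, and the time spent in the acceleration region is ignored.

```python
import math

E_CHARGE = 1.602176634e-19  # C, elementary charge
DALTON = 1.66053906660e-27  # kg, unified atomic mass unit


def flight_time(mz, accel_voltage=20e3, tube_length=1.0):
    """Drift time in seconds for an ion of the given m/z (Da per unit charge).

    Assumes ideal acceleration through accel_voltage (V) followed by a
    field-free tube of tube_length (m); both defaults are illustrative.
    """
    return tube_length * math.sqrt(mz * DALTON / (2.0 * E_CHARGE * accel_voltage))


def mz_from_time(time_s, accel_voltage=20e3, tube_length=1.0):
    """Invert flight_time: recover m/z from a measured drift time."""
    return 2.0 * E_CHARGE * accel_voltage * (time_s / tube_length) ** 2 / DALTON


for mz in (1000.0, 4000.0, 28000.0):
    print(f"m/z {mz:8.1f}: {flight_time(mz) * 1e6:6.1f} microseconds")
```

Since t is proportional to the square root of m/z, an ion with four times the m/z takes twice as long to reach the detector; in practice instruments are calibrated with standards of known mass rather than from first principles.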

[Figure 9.12 labels: samples on target plate; camera; laser pulses; grids; voltage gradient; ion gate; flight tube; reflector; linear detector; reflector detector; vacuum pumps; data system. Components are numbered 1 to 7 as in the caption.]

Fig. 9.12 MALDI–TOF instrument components. (1) Sample mixed with matrix is dried on the target plate, which is introduced into the high-vacuum chamber. (2) The camera allows viewing of the position of the laser beam, which can be tracked to optimise the signal. (3) The sample/matrix is irradiated with laser pulses. (4) The clock is started to measure time-of-flight. (5) Ions are accelerated by the electric field to the same kinetic energy and are separated according to mass as they fly through the flight tube. (6) Ions strike the detector either in linear (dashed arrow) or reflectron (full arrows) mode at different times, depending on their m/z ratio. (7) A data system controls instrument parameters, acquires signal versus time and processes the data.

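[Figure 9.13: two spectra plotted as % intensity against mass (m/z). Left: labelled peaks at m/z 1037.5707, 1168.6175, 1179.6641, 1193.7027, 1329.7604, 1434.8363, 1692.2220, 1703.9316, 1783.8597 and 1874.9790. Right: expanded isotope cluster at m/z 1168.6175, 1169.6206, 1170.6269, 1171.6166 and 1172.6194.]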

Fig. 9.13 Two examples of MALDI–TOF peptide spectra. The left-hand spectrum is from a protein digest mixture and the right-hand image is an expanded one of a small part of a spectrum showing 13C-containing forms (see Section 9.5.4).

9.3.9 Delayed extraction

In the first MALDI–TOF instruments, the ions in the plume of material generated by the laser pulse were continuously extracted by a high electrostatic field. Since this plume of material occupies a small but finite volume of space, ions arising at different places could have different energies. This energy spread (and fragmentation occurring during this initial extraction period) usually broadens the peak corresponding to any particular ion, which leads to lower mass accuracy. However, if extraction is delayed until all ions have formed, this spread is minimised. The procedure is known as delayed extraction (DE), whereby the ions are formed in either a weak field or no field during a predetermined time delay, and then extracted by the application of a high-voltage pulse. The degree of fragmentation of ions (Section 9.3.10) can also be controlled, to some extent, by the length of the time delay. Delayed extraction is illustrated in Fig. 9.15.

[Figure 9.14: % intensity against m/z (axis marked at 28 000, 29 000 and 30 000); labelled peaks at m/z 27803.4, 27872.1, 28109.0, 28215.8, 28313.2, 28423.5, 29281.7, 29434.4, 29616.8 and 29873.2.]

Fig. 9.14 MALDI–TOF spectrum of protein isoforms. The spectrum is almost exclusively singly charged ions representing the molecular ion species of the constituent proteins. Compare this spectrum with the electrospray spectrum of another protein (Fig. 9.6) where the multiply charged ions result in multiple peaks which would make it harder to interpret masses of mixtures. (I acknowledge the assistance of Bruker Daltonics who carried out the analysis.)

9.3.10 Post-source decay

Post-source decay (PSD) is the process of fragmentation that may occur after an ion (the precursor ion) has been extracted from the source. Many biological molecules, particularly peptides, give rise to ions that dissociate over a timespan of microseconds and most precursor ions will have been extracted before this dissociation is complete. The fragment ions generated will have the same velocity as the precursor and cause peak broadening and loss of resolution in a linear TOF analyser (Fig. 9.16). The problem is overcome by the use of a reflector.


Example 2 PEPTIDE MASS DETERMINATION (I) Question A peptide metabolite and an enzyme digest of it were analysed by a combination of mass spectrometric techniques giving the data listed below: (i) The peptide showed two signals at 3841.5 and 1741 in the MALDI–TOF. (ii) Five signals could be discerned when the peptide was introduced into a mass spectrometer via an electrospray ionisation source: m/z

498.2

581.1

697.1

871.2

1161.2

(iii) HPLC-MS of the digest indicated four components, the (M + H)+ data for the components being m/z = 176, 625, 1229 and 1508. The ions corresponding to the MS of the ‘625’ component appeared at m/z = 521, 406, 293, 130 and 113.
(iv) HPLC-MS–MS of the m/z = 406 ion of the ‘625’ component identified two ions at m/z = 378 and 336, and that of the m/z = 113 ion gave m/z = 85 and 57, in the product ion spectra.

Use the above data to compare and contrast the different ionisation methods, deduce a molecular mass for the peptide and determine a sequence for the ‘625’ component. Use the amino acid residue mass values in Table 9.2.

Answer

The data in (i) are m/z = 3481.5 and m/z = 1741. These data could represent either of the following possibilities:
(a) m/z = 3481.5 is (M + H)+ and m/z = 1741 is (M + 2H)2+, giving M = 3480.5
(b) m/z = 3481.5 is (2M + H)+ and m/z = 1741 is (M + H)+, giving M = 1740

Consideration of the data in (ii) allows a choice to be made between these two alternatives, using n2 = (m1 − 1)/(m2 − m1) and M = n2(m2 − 1).

m1 − 1    m2 − m1    n2        m2 − 1    M (Da)    z
870.2     290        3.0006    1160.2    3481.2    3
696.1     174.1      3.9982    870.2     3479.3    4
580.1     116        5.0000    696.1     3481.1    5
497.2     82.9       5.9975    580.1     3479.2    6

ΣM = 13920.8 Da
Mean M = 3480.2 Da

The mean M result confirms set (a) of the conclusions above concerning the data obtained from the MALDI experiments.

The data in (iii) indicate that four products arise from the enzymatic digest of the original peptide. As these products arise directly from the original, the sum of these masses will be related to the M of the peptide. Therefore

176 + 625 + 1229 + 1508 = 3538 Da

The difference between this mass and the M determined above is

3538 − 3480.2 = 57.8 ≈ 58 Da

The difference of 58 mass units is explained as follows. Each of the enzyme digest products is protonated (to be ‘seen’ in the mass spectrometer). Hence this accounts for 4 units. The remaining 54 unit increase arises from the enzymic hydrolysis. From a linear peptide, four products arise from three cleavage points (three cuts in a piece of string give four pieces). Each cleavage point requires the input of one water molecule (hydrolysis, H2O = 18). Three cleavage points require 3 × 18 = 54.

The m/z = 625, (M + H)+, peak was subjected to further mass spectrometry and sequence ions were observed.

m/z    Δ      aa
624
521    103    Cys
406    115    Asp
293    113    Ile/Leu
130    163    Tyr
113    17     Ile/Leu

The loss of 113 from the m/z = 406 ion indicates either Ile or Leu. MS2 shows consecutive losses of 28 (CO) and 42 (CH2=CH–CH3), which is indicative of Leu. The loss of 17 (not a sequence ion) from 130 confirms this as the C-terminal amino acid. The predicted sequence from the N-terminal end is

Cys - Asp - Leu - Tyr - Ile
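The bookkeeping used in Example 2 to relate the digest products to the intact peptide (one proton per observed product, one water per cleavage) can be written as a short Python sketch. The function name and the use of nominal integer masses are assumptions made for illustration only.

```python
WATER = 18   # nominal mass of H2O taken up at each hydrolysed peptide bond
PROTON = 1   # nominal mass of the proton carried by each (M + H)+ ion


def intact_mass_from_digest(mh_values):
    """Estimate M of a linear peptide from the (M + H)+ values of its digest products."""
    n = len(mh_values)
    # n products carry n protons; n pieces arise from n - 1 cleavage points
    return sum(mh_values) - n * PROTON - (n - 1) * WATER


# Digest products quoted in Example 2
print(intact_mass_from_digest([176, 625, 1229, 1508]))  # 3480
```

The result, 3480 Da, agrees with the mean M of 3480.2 Da obtained from the electrospray data to within the rounding of the nominal masses.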

The reflector

A reflector (or reflectron) is a type of ion mirror that provides higher resolution in MALDI–TOF. The reflector increases the overall path length for an ion and it corrects for minor variation in the energy spread of ions of the same mass. Both effects improve resolution. The device has a gradient electric field and the depth to which ions will penetrate this field, before reversal of direction of travel, depends upon their energy. Higher-energy ions will travel further and lower-energy ions a shorter distance. The flight times thus become focussed, while neutral fragments are unaffected by the deflection. Figure 9.16 shows a diagrammatic representation of a MALDI–TOF instrument that includes the facility for both linear and reflectron modes of ion collection. The reflectron improves resolution and mass accuracy and also allows structure and sequence information (in the case of peptides) to be obtained by PSD analysis.

Example 3 PEPTIDE MASS DETERMINATION (II)

Question

Consider the following mass spectrometric data obtained for a peptide metabolite.
(i) The MALDI spectrum showed two signals at m/z = 1609 and 805.
(ii) There were two significant signals in the positive ion trap MS mass spectrum at m/z = 805 and 827, the latter signal being enhanced on addition of sodium chloride.
(iii) Signals at m/z = 161.8, 202.0, 269.0 and 403.0 were observed when the sample was introduced into the mass spectrometer via an electrospray ionisation source.

Use these data to give an account of the ionisation methods used. Discuss the significance of the data and deduce a relative molecular mass for the metabolite. Use the amino acid residue mass values in Table 9.2.

Answer

(i) Signals in the MALDI spectrum were observed at m/z = 1609 and 805. These data could represent the following possibilities:
(a) m/z = 1609 (M + H)+ when m/z = 805 (M + 2H)2+ and m/z = 403 (M + 4H)4+, giving M = 1608 Da
(b) m/z = 1609 (2M + H)+ when m/z = 805 (M + H)+ and m/z = 403 (M + 2H)2+, giving M = 804 Da

(ii) The distinction between the above options can be made by considering the ion trap data. This mode of ionisation gave peaks at m/z = 805 and 827, the latter being enhanced on the addition of sodium chloride. This evidence suggests:
m/z = 805 (M + H)+
m/z = 827 (M + Na)+
giving M = 804 Da and supporting option (b) from the MALDI data.

(iii) The multiply charged ions observed in the electrospray ionisation method allow an average M to be calculated. Using the standard formula:

m1 − 1    m2 − m1    n2     m2 − 1    M (Da)    z
268.0     134        2.0    402.0     804       2
201.0     67         3.0    268.0     804       3
160.8     40.2       4.0    201.0     804       4

The molecular mass is clearly 804 Da, confirming the above conclusions.
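The standard formula used in these worked examples, n2 = (m1 − 1)/(m2 − m1) and M = n2(m2 − 1), lends itself to a short Python sketch. The function names are hypothetical, and the proton mass is taken as 1, as in the worked answers; m1 is the peak at lower m/z (higher charge) and m2 its neighbour at higher m/z.

```python
def deconvolute(peaks):
    """Return a list of (z, M) from adjacent peaks of an ESI charge-state series."""
    ordered = sorted(peaks)
    results = []
    for m1, m2 in zip(ordered, ordered[1:]):
        # m1 carries n2 + 1 charges, m2 carries n2 charges
        n2 = (m1 - 1) / (m2 - m1)
        mass = n2 * (m2 - 1)          # unrounded n2, as in the worked tables
        results.append((round(n2), mass))
    return results


def mean_mass(peaks):
    """Average the M estimates from every adjacent pair of peaks."""
    pairs = deconvolute(peaks)
    return sum(mass for _, mass in pairs) / len(pairs)


# Electrospray signals quoted in Example 3 (iii)
for z, mass in deconvolute([161.8, 202.0, 269.0, 403.0]):
    print(z, round(mass, 1))
```

Each adjacent pair of peaks gives an independent estimate of M, so averaging the estimates reduces the effect of measurement error in any single peak.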

Fig. 9.15 Delayed extraction (DE). (1) No applied electric field. The ions spread out. (2) Field applied. The potential gradient accelerates slow ions more than fast ones. (3) Slow ions catch up with faster ones at the detector.

Fig. 9.16 The MALDI–TOF reflector. Post-source decay (PSD) theory. (a) Fragment ions arising by PSD as well as the neutral fragments and the precursor ions have the same velocity and reach the detector simultaneously. This prevents a distinction between precursor and PSD fragment. (b) In the linear mode the charged fragments are not separated. (c) In the reflector mode, the fragment that does not retain the charge (neutral, denoted by 0) is not deflected in the reflector but the charged fragments (+) are deflected according to their m/z and a spectrum of the fragment (daughter) ions is recorded, albeit of a limited m/z range for each setting of the reflector voltages.

Sequencing peptides by PSD analysis in MALDI–TOF is less straightforward than tandem MS on a quadrupole ESI or ion trap instrument, and a large percentage of experiments are unsuccessful. At any given setting of the reflector/ion mirror, charged fragments of a particular range of m/z are focussed in the reflector (Fig. 9.16). Fragment ions of m/z above and below this narrow range are poorly focussed. Therefore, since only fragment ions of a limited mass range are focussed for a given mirror ratio in the reflector, a number of spectra are run at different settings and stitched together to generate a composite spectrum.

Types of MALDI sample plates

MALDI sample plate types that are available include 100-well stainless steel flat plates. These are good for multiple sample analysis where close external calibration is used, that is, where a compound or compounds of known molecular mass are placed on an adjacent spot to calibrate the instrument. It is also easier to see crystallisation of the matrix on this type of surface.

Four-hundred-spot Teflon-coated plates have particular application for concentrating sample for increased sensitivity. Owing to the very small diameter of the spots, it is difficult to spot accurately by hand, but these plates are good for automated sample spotting. The surface of the plate is exposed only in the centre of each spot; the sample therefore does not ‘wet’ the whole surface but concentrates itself into the centre of each spot as it dries.

Gold-coated plates with wells (2 mm diameter, see Fig. 9.10b) are good surfaces on which to contain the spread of sample and matrix when used with highly organic solvents, e.g. tetrahydrofuran (THF) preparations for polymers. They also allow on-plate reactions within the well with thiol-containing reagents that bind to the gold surface.

9.3.11 Novel hybrid instruments

There are a number of commercial developments of hybrid MS instruments that involve coupling an electrospray, ion trap or a MALDI ion source with a hybrid quadrupole orthogonal acceleration time-of-flight mass spectrometer (Fig. 9.17). This potentially leads to improved tandem MS performance from MALDI phase samples. The intention of the development of these instruments is to combine the best features of both types of ion source with the best features of all types of analyser in order to improve tandem MS capability and increase sensitivity. Hybrid magnetic sector instruments are also manufactured where the first mass spectrometer is a two-sector device and the second mass spectrometer is a quadrupole.

Fig. 9.17 Diagram of a hybrid quadrupole TOF MS. The diagram shown here does not represent any specific instrument from a particular manufacturer. The source may be an ion trap device, an electrospray or even a MALDI source (such as in the ‘MALDI Q-TOF’ from Micromass). Other hybrid instruments include the Bruker Daltonics ‘BioTOF III, ESI-Q-q-TOF System’ and the ‘QSTAR’ Hybrid LC/MS/MS from Applied Biosystems with an electrospray, nanospray or an optional MALDI source. The Shimadzu Biotech ‘AXIMA MALDI QIT TOF’ combines a MALDI source with an ion trap and reflectron TOF mass analyser.

9.3.12 Fourier-transform ion cyclotron resonance MS

The recent development of Fourier-transform ion cyclotron resonance (FT-ICR) mass spectrometry has great potential in analysis of a wide range of biomolecules. It is potentially the most sensitive mass spectrometric technique and has very high mass resolution; >10^6 is observable with most instruments. The instrument also allows tandem MS to be carried out. The ions can be generated by a variety of techniques, such as an ESI or a MALDI source. FT-ICR MS is based on the principle that ions orbiting in a magnetic field can be excited by radio frequency (RF) signals. As a result, the ions produce a detectable image current on the cell in which they are trapped. The time-dependent image current is Fourier transformed to obtain the component frequencies of the different ions, which correspond to their m/z (Fig. 9.18).
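The correspondence between orbital frequency and m/z can be illustrated with the standard cyclotron relation f = zeB/(2πm). This relation is not given in the text; the sketch below assumes it, together with CODATA values for the physical constants, and the function names are hypothetical.

```python
import math

E_CHARGE = 1.602176634e-19    # elementary charge in coulombs
DALTON = 1.66053906660e-27    # unified atomic mass unit in kilograms


def cyclotron_frequency_hz(mz, b_tesla):
    """Cyclotron frequency of an ion of given m/z (Da per charge) in a field B."""
    return E_CHARGE * b_tesla / (2 * math.pi * mz * DALTON)


def mz_from_frequency(freq_hz, b_tesla):
    """Invert the relation: recover m/z from a measured frequency."""
    return E_CHARGE * b_tesla / (2 * math.pi * freq_hz * DALTON)


# An ion of m/z 1000 in a 7 tesla magnet orbits at roughly 107 kHz
print(round(cyclotron_frequency_hz(1000, 7.0)))
```

Because frequency is inversely proportional to m/z at fixed field strength, each peak in the Fourier-transformed image current maps directly to one m/z value.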

9.3.13 Orbitrap mass spectrometer

The resolving power of FT-ICR MS is proportional to the strength of the magnetic field, so superconducting magnets (3–12 tesla) are required, which makes for high maintenance costs. The Orbitrap mass analyser is a lower-cost alternative to the high magnetic field FT-ICR mass spectrometer. Ions are trapped in the Orbitrap, where they undergo harmonic oscillations along the axis of an electric field (see Fig. 9.19). Their m/z values are derived from the frequency of these oscillations, which is measured non-destructively, and Fourier transforms are used to obtain the mass spectrum. The instrument has high mass resolution (up to 150 000), high mass accuracy (2–5 p.p.m.), an m/z range of >6000 and a dynamic range greater than 10^3, with sub-femtomole sensitivity.

Fig. 9.18 Schematic diagram of the Fourier-transform ion cyclotron resonance (FT-ICR) instrument. The technique involves trapping, excitation and detection of ions to produce a mass spectrum. The trapping plates that maintain the ions in orbit are at the front and back in the schematic. The excitation or transmitter plates, where the radio frequency (RF) pulse is given to the ions, are shown at each side, and the detector plates that detect the image current, which is Fourier transformed, are shown at the top and bottom. The sample source is normally electrospray (described in Section 9.2.4) or MALDI (see Section 9.3.8 and Fig. 9.10). The ions are focussed and transferred into the analyser cell under high vacuum. The analyser cell is a type of ion trap in a spatially uniform strong magnetic field which constrains the ions in a circular orbit, the frequency of which is determined by the mass and charge of the ion and the strength of the magnetic field (the radius of the orbit, rather than its frequency, depends on the velocity of the ion). While the ions are in these stable orbits between the detector electrodes they will not give a measurable signal. To obtain a signal, ions of a given m/z are excited to a wider orbit by applying an RF signal of a few milliseconds’ duration. One frequency excites ions of one particular m/z, which results in the ions producing a detectable image current. This time-dependent image current is Fourier transformed to obtain the component frequencies which correspond to the m/z of the different ions. The angular frequency measurements produce values for m/z. The mass spectrum is therefore determined to a very high mass resolution, since frequency can be measured more accurately than any other physical property. After excitation, the ions relax back to their previous orbits and high sensitivity can be achieved by repeating this process many times.

9.4 DETECTORS

9.4.1 Introduction

The ions from the mass analyser impinge on the surface of a detector where the charge is neutralised, either by collection or donation of electrons. An electric current flows that is amplified and ultimately converted into a signal that is processed by a computer. The total ion current (TIC) is the sum of the current carried by all the ions being detected at any given moment and is a very useful parameter to measure during on-line MS. A plot of ion current versus time complements the ultraviolet trace that is also normally recorded during the chromatography run. Unlike the ultraviolet trace, which depends on the absorbance of each component at the particular wavelength(s) set on the ultraviolet detector, the TIC is of course independent of the light-absorbing properties of a substance and depends only on its ionisability in the instrument.
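The definition of the TIC as the summed current of all ions detected at a given moment can be made concrete with a minimal Python sketch. The data layout (a list of scans, each a retention time paired with a list of m/z and intensity values) and the numbers are hypothetical and chosen only for illustration.

```python
def total_ion_current(scans):
    """Sum the intensities in each scan to give (retention time, TIC) points."""
    return [(rt, sum(intensity for _, intensity in peaks)) for rt, peaks in scans]


# Hypothetical scans: (retention time in min, [(m/z, intensity), ...])
scans = [
    (0.0, [(100.0, 5.0), (200.0, 7.0)]),
    (0.5, [(100.0, 50.0), (200.0, 20.0), (300.0, 10.0)]),
    (1.0, []),
]
for rt, tic in total_ion_current(scans):
    print(rt, tic)
```

Plotting the second value of each pair against the first gives the TIC trace that is compared with the ultraviolet trace from the chromatography run.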

Fig. 9.19 (a) Simplified schematic of an Orbitrap mass spectrometer with an electrospray ionisation (ESI) source. Ions are transferred from an ESI source (described in Section 9.2.4) through three stages of differential vacuum pumping using RF guide quadrupoles and octopoles which focus and guide the ions through the various parts of the instrument. The ions are stored in the linear ion trap then axially ejected to the C-trap where they are squeezed into a small cloud. ‘Bunches’ of ions are then injected into the Orbitrap analyser. The third quadrupole, which is pressurised to less than 10^-3 torr with collision gas, acts as an ion accumulator where ion/neutral collisions slow the ions, which pool in an axial potential well at the end of the quadrupole (the C-trap). The linear ion trap operates on a similar principle to the ion trap described in Section 9.3.3 and Fig. 9.7. MALDI (see Section 9.3.8 and Fig. 9.10) is an alternative ion source. (b) Detail of the Orbitrap analyser. In the Orbitrap, the ions are trapped in a radial electric field between a central and an outer cylindrical electrode. They orbit around the central electrode with axial oscillations. The superimposed harmonic oscillations in the Z direction are detected by measuring the image current at the outer electrode. The frequency of oscillation depends on the m/z ratio (it is inversely proportional to the square root of m/z) and is detected and processed by fast Fourier transform, as in FT-ICR MS.

Fig. 9.20 Conversion dynode and electron multiplier. (a and b) Each ion strikes the conversion dynode (which converts ions to electrons), which emits a number of electrons that travel to the next, higher-voltage dynode. The secondary electrons from the conversion dynode are accelerated and focussed onto a second dynode, which itself emits secondary electrons. Each electron then produces several more electrons. Amplification is achieved through the ‘cascading effect’ of secondary electrons from dynode to dynode that finally results in a measurable current at the end of the electron multiplier. The cascade of electrons continues until a sufficiently large current for normal amplification is obtained. A series of up to 10–20 dynodes (set at different potentials) provides an amplification gain of 10^6 or 10^7. In (b), the electrons released strike a scintillator or phosphor screen that is viewed by a photomultiplier.

9.4.2 Electron multiplier and conversion dynode

Electron multipliers are used as detectors for many types of mass spectrometers. These are frequently combined with a conversion dynode, which is a device to increase sensitivity. The ion beam from the mass analyser is focussed onto the conversion dynode, which emits electrons in direct proportion to the number of bombarding ions. A positive ion or a negative ion hits the conversion dynode, causing the emission of secondary particles containing secondary ions, electrons and neutral particles (see Fig. 9.20). These secondary particles are accelerated into the dynodes of the electron multiplier. They strike the dynodes with sufficient energy to dislodge electrons, which pass further into the electron multiplier, colliding with the dynodes, producing more and more electrons.
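The cascade can be approximated by an idealised model in which every dynode releases the same average number of secondary electrons per incident electron, so that the overall gain is that yield raised to the power of the number of dynodes. This uniform-yield model is an assumption made here for illustration, not a description of any particular detector, and the function names are hypothetical.

```python
def multiplier_gain(yield_per_dynode, n_dynodes):
    """Overall gain if each dynode multiplies the electron count by the same yield."""
    return yield_per_dynode ** n_dynodes


def yield_for_gain(target_gain, n_dynodes):
    """Average secondary-electron yield per dynode needed to reach a target gain."""
    return target_gain ** (1.0 / n_dynodes)


print(multiplier_gain(3, 13))               # 1594323, i.e. about 1.6 x 10^6
print(round(yield_for_gain(1e6, 12), 2))    # about 3.16 electrons per dynode
```

On this model, a gain of 10^6 to 10^7 from 10–20 dynodes requires only a few secondary electrons per incident electron at each stage.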

9.5 STRUCTURAL INFORMATION BY TANDEM MASS SPECTROMETRY

9.5.1 Introduction

As mentioned above, the newer ionisation techniques ESI and MALDI are soft ionisation techniques (as are FAB and its derivative techniques). In contrast to EI, they do not produce significant amounts of fragment ions. Therefore, in order to obtain structural information on biomolecules and sequence information (in the case of proteins and peptides), tandem MS has been developed. The technique can also be applied to obtain sequence information on oligosaccharides (see Sections 9.5.5 and 9.5.6) and oligonucleotides. Although it is unlikely that this method will ever replace DNA sequencing gels, it can be used to identify positions of modified or labelled bases that might not be picked up by the Sanger dideoxy sequencing method. Structural information can be obtained on almost any type of organic molecule, on an instrument that is suitable for that type of sample. This includes investigation of organic compounds on a magnetic sector MS where two double-focussing magnetic sector machines can be combined into a four-sector device coupled through a collision cell.

The general procedure is that a mixture of ions is generated in the ion source of the mass spectrometer as normal and the ions are allowed to pass through the first mass analyser, where an ion of a particular m/z is selected (but not detected). This ion then enters the collision cell and collides with an inert collision gas such as helium or argon. The kinetic energy of this ion is converted to vibrational energy and the ion fragments. This is known as collision-induced dissociation (CID) or collision-activated dissociation (CAD). The m/z values of the fragment ions are then determined in a second mass spectrometer (see Fig. 9.21 for an illustration of the principle in a quadrupole mass spectrometer). Collision cells may be placed in any of the field-free regions, leading to a wide variety of experimental methodologies for many different applications. For example, as well as in the triple quadrupole MS, this can be done in a hybrid instrument such as the Q-TOF (described in Section 9.3.11). Since the principles of tandem MS are similar for most instrument configurations, further discussion will focus on electrospray tandem MS. The procedure for obtaining structural and sequence information on polypeptides in ion trap MS has been described above (Section 9.3.3).

Fig. 9.21 Quadrupole MS sequencing. (a) An ion of a particular m/z value is selected in the first quadrupole, Q1 (MS-1, selection of precursor ion), as in Fig. 9.5, but instead of being detected, it passes through the second quadrupole, Q2, where it is subjected to collision with the collision gas. Q3 acts like a second quadrupole mass spectrometer, MS-2, to scan m/z to obtain a spectrum of the fragment ions. The collision cell, Q2, is frequently a radio frequency (RF)-only quadrupole containing the appropriate collision gas. No mass filtering occurs here; the RF merely constrains the ions to allow a greater number of collisions to occur. (b) The fragmentation depicted here is at the peptide bond and one of the fragments will retain the charge, resulting in either a y-series or a b-series ion (see Fig. 9.22).

9.5.2 Sequencing of proteins and peptides

The identification of proteins involves protease cleavage, mostly by trypsin. Owing to the specificity of this protease, tryptic peptides usually have basic groups at the N- and C-termini. Trypsin cleaves after lysine and arginine residues, both of which have basic side chains (an amino and a guanidino group respectively). This results in a large proportion of high-energy doubly charged positive ions that are more easily fragmented. The digestion of the protein into peptides is followed by identification of the peptides by mass-to-charge ratio (m/z), either as very accurate masses alone or by using a second fragmentation that gives ladders of fragments cleaved at the peptide bonds. Although a wide variety of fragmentations may occur, there is a predominance of peptide bond cleavage which gives rise to peaks in the spectrum that differ sequentially by the residue mass. The mass differences are thus used to reconstruct the amino acid sequence (primary structure) of the peptide (Table 9.2).

Different series of ions, a, b, c and x, y, z, may be recognised, depending on which fragment carries the charge. Ions x, y and z arise by retention of charge on the C-terminal fragment of the peptide. For example, the z1 ion is the first C-terminal residue; y1 also contains the NH group (15 atomic mass units greater) and x1 includes the carbonyl group; y2 comprises the first two C-terminal residues, and so on. The a, b and c ion series arise from the N-terminal end of the peptide, when the fragmentation results in retention of charge on these fragments. Figure 9.22a shows an idealised peptide subjected to fragmentation. Particular series will generally predominate so that the peptide may be sequenced from both ends by obtaining complementary data (Fig. 9.22b). In addition, ions can arise from side chain fragmentation, which enables a distinction to be made between isomeric amino acids such as leucine and isoleucine.

The protein is identified by searching databases of expected masses from all known peptides from every protein (or translations from DNA) and theoretical masses from fragmented peptides. Sensitivity of tandem MS has been claimed down to zeptomole level.
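The theoretical b and y ion masses used in such searches can be generated directly from a sequence. The Python sketch below does this for the peptide EFTPPGQAAYQK of Fig. 9.22b. It assumes standard monoisotopic residue masses, which are not the average masses listed in Table 9.2, and singly protonated fragments; the function and constant names are hypothetical.

```python
# Monoisotopic residue masses in Da (assumed standard values, not from Table 9.2)
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276
WATER = 18.010565


def b_ions(seq):
    """Singly charged b ions (N-terminal fragments), b1 first."""
    total, out = PROTON, []
    for aa in seq[:-1]:
        total += MONO[aa]
        out.append(total)
    return out


def y_ions(seq):
    """Singly charged y ions (C-terminal fragments), y1 first."""
    total, out = WATER + PROTON, []
    for aa in reversed(seq[1:]):
        total += MONO[aa]
        out.append(total)
    return out


peptide = "EFTPPGQAAYQK"
print([round(m, 2) for m in b_ions(peptide)])
print([round(m, 2) for m in y_ions(peptide)])
```

The values produced can be compared with the b1 to b11 and y1 to y11 labels in Fig. 9.22b; the sum of any complementary pair, such as b1 and y11, equals the mass of the (M + H)+ ion plus one proton.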

Table 9.2 Symbols and residue masses of the protein amino acids

Name             Symbol     Residue mass(a)   Side chain
Alanine          A, Ala     71.079            CH3-
Arginine         R, Arg     156.188           HN=C(NH2)-NH-(CH2)3-
Asparagine       N, Asn     114.104           H2N-CO-CH2-
Aspartic acid    D, Asp     115.089           HOOC-CH2-
Cysteine         C, Cys     103.145           HS-CH2-
Glutamine        Q, Gln     128.131           H2N-CO-(CH2)2-
Glutamic acid    E, Glu     129.116           HOOC-(CH2)2-
Glycine          G, Gly     57.052            H-
Histidine        H, His     137.141           Imidazole-CH2-
Isoleucine       I, Ile     113.160           CH3-CH2-CH(CH3)-
Leucine          L, Leu     113.160           (CH3)2-CH-CH2-
Lysine           K, Lys     128.17            H2N-(CH2)4-
Methionine       M, Met     131.199           CH3-S-(CH2)2-
Metsulphoxide    Met.SO     147.199           CH3-S(O)-(CH2)2-
Phenylalanine    F, Phe     147.177           Phenyl-CH2-
Proline          P, Pro     97.117            Pyrrolidone-CH-
Serine           S, Ser     87.078            HO-CH2-
Threonine        T, Thr     101.105           CH3-CH(OH)-
Tryptophan       W, Trp     186.213           Indole-NH-CH=C-CH2-
Tyrosine         Y, Tyr     163.176           4-OH-Phenyl-CH2-
Valine           V, Val     99.133            CH3-CH(CH3)-

Note: (a) Residue mass is the mass in a peptide bond, i.e. after loss of H2O when the peptide bond is formed. The numbers in bold in the residue mass column indicate amino acids that may be ambiguous in a sequence determined by tandem MS due to close similarity or identity in mass.

Fig. 9.22 Peptide fragment ion nomenclature and tandem MS spectrum of a peptide. (a) Charge may be retained by either the N- or C-terminal fragment, resulting in the a, b and c series of ions or x, y and z series respectively. Ions in the b and y series frequently predominate. Corresponding neutral fragments are of course not detected. (b) The sequence of the peptide from a mutant haemoglobin is: EFTPPGQAAYQK. The figure shows the tandem mass spectrum from collision-induced dissociation of the doubly charged (M + 2H)2+ precursor, m/z = 668.3. Cleavage at each peptide bond results in the b or y ions when the positive charge is retained by the fragment containing the N- or C-terminus of the peptide respectively (see inset).

Inset to Fig. 9.22b (m/z of the b and y ions for cleavage at each peptide bond):

b ion   m/z        Cleavage                  m/z        y ion
b1      130.05     E | FTPPGQAAYQK           1207.61    y11
b2      277.12     EF | TPPGQAAYQK           1060.54    y10
b3      378.17     EFT | PPGQAAYQK           959.50     y9
b4      475.22     EFTP | PGQAAYQK           862.44     y8
b5      572.27     EFTPP | GQAAYQK           765.39     y7
b6      629.29     EFTPPG | QAAYQK           708.37     y6
b7      757.35     EFTPPGQ | AAYQK           580.31     y5
b8      828.39     EFTPPGQA | AYQK           509.27     y4
b9      899.43     EFTPPGQAA | YQK           438.24     y3
b10     1062.49    EFTPPGQAAY | QK           275.17     y2
b11     1190.55    EFTPPGQAAYQ | K           147.11     y1

(M + H)+ = 1336.65

9.5.3 Comparison of MS and Edman sequencing

Edman degradation (Section 8.4.3) to obtain the complete sequence of a protein is uncommon nowadays since genomes are available to search with fragmentary sequences. Most intact proteins, if they are not processed from a secretory or propeptide form, are blocked at the N-terminus, most commonly with an acetyl group. Other amino terminal blocking includes fatty acylation, most commonly with a myristoyl (C14 fatty acid) group attached through a glycine residue, but many shorter-chain fatty acids are also known to occur. Cyclisation of glutamine to a pyroglutamyl residue and post-translational modification to N-terminal trimethylalanine and dimethylproline also occur. In the case of recombinant proteins over-expressed in E. coli, the initiator residue N-formyl methionine is often incompletely removed. All these modifications leave the N-terminal residue without a free proton on the alpha nitrogen and Edman chemistry cannot proceed. Mass spectrometry has therefore been essential for their correct structural identification. The protein sequencing instruments are still important for solid phase sequencing to identify post-translational modifications, in particular sites of phosphorylation, and a combination of microsequencing and mass spectrometry techniques is now commonly employed for complete covalent structure determination of proteins.

Example 4 PEPTIDE SEQUENCING (I)

Question

An oligopeptide obtained by tryptic digestion was investigated by ESI–MS and ion trap MS–MS, both in positive mode, and gave the following m/z data:

ESI         223.2  297.3
Ion trap    146  203  260  357  444  591  648  705  802  890

(i) Predict the sequence of the oligopeptide. Use the amino acid residue mass values in Table 9.2.
(ii) Determine the average molecular mass.
(iii) Identify the peaks in the ESI spectrum.

Note: Trypsin cleaves on the C-terminal side of arginine and lysine.

Answer

(i) The highest mass peak in the ion trap MS spectrum is m/z = 890, which represents (M + H)+. Hence M = 889 Da.

m/z    Δ      aa
146
203    57     Gly
260    57     Gly
357    97     Pro
444    87     Ser
591    147    Phe
648    57     Gly
705    57     Gly
802    97     Pro
889    87     Ser

The mass differences (Δ) between sequence ions represent the amino acid (aa) residue masses. The lowest mass sequence ion, m/z = 146, is too low for arginine and must therefore represent Lys + OH. The sequence in conventional order from the N-terminal end would be:

Ser-Pro-Gly-Gly-Phe-Ser-Pro-Gly-Gly-Lys

(ii) The summation of the residues = 889 Da, which is a check on the mass spectrometry value for M.

(iii) The m/z values in the ESI spectrum represent multiply charged species and may be identified as follows:
m/z = 223.2 (M + 4H)4+ from 889/223.2 = 3.98
m/z = 297.3 (M + 3H)3+ from 889/297.3 = 2.99
Remember that z must be an integer and hence values need to be rounded to the nearest whole number.
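Reading a sequence ladder from mass differences, as in Example 4, can be sketched in Python. The lookup table holds the residue masses of Table 9.2 rounded to the nearest integer; the function name and the 0.5 Da tolerance are assumptions made for illustration. Residues with the same nominal mass are reported together because the ladder alone cannot distinguish them.

```python
# Nominal residue masses (Table 9.2 values rounded to integers)
NOMINAL = {
    57: "Gly", 71: "Ala", 87: "Ser", 97: "Pro", 99: "Val", 101: "Thr",
    103: "Cys", 113: "Ile/Leu", 114: "Asn", 115: "Asp", 128: "Gln/Lys",
    129: "Glu", 131: "Met", 137: "His", 147: "Phe/Met.SO", 156: "Arg",
    163: "Tyr", 186: "Trp",
}


def read_ladder(mz_values, tol=0.5):
    """Assign a residue to each mass difference between consecutive sequence ions."""
    ordered = sorted(mz_values)
    calls = []
    for low, high in zip(ordered, ordered[1:]):
        delta = high - low
        matches = [name for mass, name in NOMINAL.items() if abs(delta - mass) <= tol]
        calls.append(matches[0] if matches else "?")
    return calls


# Sequence ions tabulated in the answer to Example 4
ladder = [146, 203, 260, 357, 444, 591, 648, 705, 802, 889]
print(read_ladder(ladder))
```

The residues are returned in order of increasing m/z, that is, from the C-terminal end in this example, so the list is reversed and the C-terminal lysine appended to give the conventional N-terminal to C-terminal sequence.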

Table 9.3 Mass differences due to isotopes in multiply charged peptides

Charge on peptide    Apparent mass     Mass difference between isotope peaks
Single charge        [(M + H)/1]       1 Da
Double charge        [(M + 2H)/2]      0.5 Da
Triple charge        [(M + 3H)/3]      0.33 Da
n charges            [(M + nH)/n]      1/n Da

Fig. 9.23 Spectra of a multiply charged peptide. Finding the charge state of a peptide involves zooming in on a particular part of the mass spectrum to obtain a detailed image of the mass differences between different peaks that arise from the same biomolecule, due to isotopic abundance. This is mainly due to 12C and its 13C isotope, as described in the text. Four panels of % intensity versus m/z are shown, with the following labelled peaks:
+1 (1296.69): 1296.65, 1297.64, 1298.64
+2 (648.85): 648.82, 649.32, 649.81, 650.31
+3 (432.90): 432.93, 433.26, 433.58
+4 (324.93): 324.95, 325.17, 325.42, 325.68

9.5.4 Carbon isotopes and finding the charge state of a peptide

Since the mass detector operates on the basis of mass-to-charge ratio (m/z), mass assignment is normally made assuming a single charge per ion (i.e. m/z = M + 1 in positive ion mode). However, since there is around 1.1% 13C natural abundance, with increasing size, peptides will have a greater chance of containing at least one 13C and two 13C, etc. A peptide of 20 residues has approximately equal peak heights of the ‘all 12C peptide’ and of the peptide with one 13C. A singly charged peptide will show adjacent peaks differing by one mass unit; a doubly charged peptide will show adjacent peaks differing by half a mass unit and so on (Fig. 9.23 and Table 9.3). In the example illustrated, the peptide has a mass calculated from its sequence as 1295.69. The experimentally derived values are, for the singly charged ion, [(M + H)/1] = 1296.65 and for the doubly charged ion, [(M + 2H)/2] = 648.82. The technique is particularly useful for determining which are the high-energy doubly charged tryptic peptides, for tandem MS.

For elements such as chlorine, the isotopic abundance is approximately 3 : 1, 35Cl : 37Cl. If a compound contains a single chlorine atom, two ion species will be observed, with peak intensities in an approximate ratio of 3 : 1. If a compound contains two chlorine atoms then three peaks will be seen.
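The rule in Table 9.3, that isotope peaks of an ion carrying n charges are spaced 1/n apart, can be turned into a small Python sketch for assigning charge state. The function names are hypothetical, and the proton mass of 1.00728 Da is an assumed standard value; the peak lists are those labelled in Fig. 9.23.

```python
def charge_from_isotopes(peaks):
    """Estimate the charge state from the mean spacing of adjacent isotope peaks."""
    ordered = sorted(peaks)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    mean_gap = sum(gaps) / len(gaps)
    return round(1.0 / mean_gap)


def neutral_mass(mz, z, proton=1.00728):
    """Convert an observed m/z at charge z to the mass of the neutral peptide."""
    return z * mz - z * proton


clusters = {
    "+1": [1296.65, 1297.64, 1298.64],
    "+2": [648.82, 649.32, 649.81, 650.31],
    "+3": [432.93, 433.26, 433.58],
    "+4": [324.95, 325.17, 325.42, 325.68],
}
for label, peaks in clusters.items():
    z = charge_from_isotopes(peaks)
    print(label, z, round(neutral_mass(min(peaks), z), 2))
```

Every cluster should return a neutral mass close to the value of 1295.69 calculated from the sequence, which is a useful internal check that the charge states have been assigned consistently.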

Example 5 PEPTIDE SEQUENCING (II)

Question

Determine the primary structure of the oligopeptide that gave the following, positive mode, MS–MS data:

m/z    149  305  442  529  617

Use the amino acid residue mass values in Table 9.2.

Answer

m/z = 617 (M + H)+

m/z    Δ      aa
149
305    156    Arg
442    137    His
529    87     Ser
616    87     Ser

Conventional order for the sequence would be: Ser-Ser-His-Arg-?

It is important to note that no assignment has been given for the remaining m/z = 149. It may not in fact be a sequence ion and more information would be required, such as an accurate molecular mass of the oligopeptide, in order to proceed further. It is, however, possible to speculate as to the nature of this ion. If the m/z = 149 ion is the C-terminal amino acid then it would end in -OH and be 17 mass units greater than the corresponding residue mass. The difference between 149 and 17 is 132, which is extremely close to methionine, so this amino acid remains a possibility to end the chain.

9.5.5 Post-translational modification of proteins Many chemically distinct types of post-translational modification of proteins are known to occur. These include the wide variety of acylations at the N-terminus of proteins (mentioned above) as well as acylations at the C-terminus and at internal sites. In this section, examples of the application of MS techniques employed for analysis of glycosylation, phosphorylation and disulphide bonds are given. An up-to-date list of the broad chemical diversity of known modifications and the side chains of the amino acids to which they are attached is on the website

387

9.5 Structural information by tandem mass spectrometry

Example 6 PEPTIDE MASS DETERMINATION (III) Question An unknown peptide and an enzymatic digest of it were analysed by mass spectrometric and chromatographic methods as follows: (i) MALDI–TOF mass spectrometry of the peptide gave two signals at m/z ¼ 3569 and 1785; (ii) MALDI–TOF of the hydrolysate showed signals at m/z ¼ 766, 891, 953 and 1016; (iii) the data obtained from analysis of the peptide using coupled HPLC–MS operating through an electrospray ionisation source were m/z ¼ 510.7, 595.7, 714.6, 893.0 and 1190.3; (iv) when the hydrolysate was analysed by HPLC, four distinct components could be discerned. Explain what information is available from these observations and determine a molecular mass, using the amino acid residue mass values in Table 9.2, for the unknown peptide.

Answer

(i) Signals from MALDI–TOF were observed at m/z = 3569 and 1785. These data could represent either of the following possibilities:
(a) m/z = 3569 is (M + H)+ and m/z = 1785 is (M + 2H)2+, giving M = 3568;
(b) m/z = 3569 is (2M + H)+ and m/z = 1785 is (M + H)+, giving M = 1784.

(ii) It is possible to distinguish between these two options by considering the MALDI–TOF of the products of hydrolysis. Four m/z values were obtained: 766, 891, 953 and 1016. Each is a protonated species and the sum of these masses, 3626, will be of the order of the M of the original peptide. The value of this sum supports option (a) in (i) above.

(iii) Electrospray ionisation data represent multiply charged ions. Using the standard formula the mean M may be obtained.

m1 − 1    m2 − m1    n2        m2 − 1    M (Da)    z
892.0     297.3      3.0003    1189.3    3568.3    3
713.6     178.4      4.0000    892.0     3568.0    4
594.7     118.9      5.0016    713.6     3569.2    5
509.7     85.0       5.9964    594.7     3566.1    6

ΣM = 14271.6 Da
Mean M = 3567.9 Da

This more precise value confirms the conclusions found above. For an explanation of the mass difference between Mr and the sum of the hydrolysate products, refer to the answer to Example 2.

(iv) The data in (iv) are confirmatory chromatographic evidence that only four hydrolysis products were obtained.
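The arithmetic in part (iii) can be reproduced with a short script. This is a sketch under the assumptions implied by the table headings, namely n2 = (m1 − 1)/(m2 − m1) and M = n2(m2 − 1) for adjacent peaks m1 < m2, with a nominal proton mass of 1. The function names are hypothetical. The second function is one reading of the mass difference noted in part (ii): it assumes each fragment carries one extra proton and that each of the three cleaved peptide bonds has taken up one water (18 Da).

```python
PROTON = 1.0  # nominal proton mass, as used in the worked example
WATER = 18.0  # nominal mass of water added at each hydrolysed bond


def charge_and_mass(m1, m2):
    """For adjacent electrospray peaks m1 < m2, return (n2, M), where n2 is
    the (unrounded) charge on the m2 ion and M the deduced molecular mass."""
    n2 = (m1 - PROTON) / (m2 - m1)
    return n2, n2 * (m2 - PROTON)


def deconvolute(peaks):
    """Return (m2, rounded charge, M) for each adjacent pair of peaks."""
    peaks = sorted(peaks)
    rows = []
    for m1, m2 in zip(peaks, peaks[1:]):
        n2, mass = charge_and_mass(m1, m2)
        rows.append((m2, round(n2), mass))
    return rows


def intact_mass_from_fragments(fragment_mh):
    """Estimate M of the intact peptide from the (M + H)+ values of its
    hydrolysis products: remove one proton per fragment and one water
    per cleaved bond (assumption stated in the lead-in)."""
    neutral = sum(mz - PROTON for mz in fragment_mh)
    return neutral - WATER * (len(fragment_mh) - 1)


rows = deconvolute([510.7, 595.7, 714.6, 893.0, 1190.3])
mean_mass = sum(mass for _, _, mass in rows) / len(rows)
for m2, z, mass in rows:
    print(f"m/z {m2:7.1f}  z = {z}  M = {mass:.1f} Da")
print(f"Mean M = {mean_mass:.1f} Da")
print(intact_mass_from_fragments([766, 891, 953, 1016]))
```

The charges come out as 6, 5, 4 and 3 for the peaks at 595.7, 714.6, 893.0 and 1190.3, and the mean M agrees with the tabulated 3567.9 Da. Under the stated assumption, the four hydrolysate signals also reduce to 3568, which is consistent with option (a).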


Mass spectrometric techniques

‘Delta Mass’, which is a database of protein post-translational modifications that can be found at http://www.abrf.org/index.cfm/dm.home. There are hyperlinks to references to the modifications.

Protein phosphorylation and identification of phosphopeptides
Phosphate is reversibly covalently attached to eukaryotic proteins in order to regulate activity (Section 15.5.4). The modified residues are usually O-phosphoserine, O-phosphothreonine and O-phosphotyrosine, but many other amino acids in proteins can be phosphorylated: O-phospho-Asp, S-phospho-Cys, N-phospho-Arg, N-phospho-His and N-phospho-Lys. Analysis of modified peptides by mass spectrometry is essential to confirm the exact location and number of phosphorylated residues, especially if no 32P or other radiolabel is present. Identification of either positive or negative ions may yield more information, depending on the mode of ionisation and fragmentation of an individual peptide. Phosphopeptides may give better spectra in the negative ion mode since they have a strong negative charge due to the phosphate group. Phosphopeptides may not run well on MALDI–TOF, and methods have been developed for this type of instrument that compare spectra recorded before and after dephosphorylation of the peptide mixture with phosphatases.

Mass spectrometry of glycosylation sites and structures of the sugars
The attachment points of N-linked (through asparagine) and O-linked (through serine or threonine) glycosylation sites and the structures of the complex carbohydrates can be determined by MS. The loss of each monosaccharide unit of distinct mass can be interpreted to reconstruct the glycosylation pattern (see example in Fig. 9.24). The ‘GlycoMod’ website, part of the ExPASy suite, provides valuable assistance in interpretation of the spectra. GlycoMod is a tool that can predict the possible oligosaccharide structures that occur on proteins from their experimentally determined masses. The program can be used for free or derivatised oligosaccharides and for glycopeptides. Another algorithm, GlycanMass, also part of the ExPASy suite, can be used to calculate the mass of an oligosaccharide structure from its oligosaccharide composition. GlycoMod and GlycanMass are found at http://us.expasy.org/tools/glycomod/ and http://us.expasy.org/tools/glycomod/glycanmass.html respectively.

Identification of disulphide linkages by mass spectrometry
Mass spectrometry is also used in the location of disulphide bonds in a protein. Identification of the position of the disulphide linkages involves the fragmentation of proteins into peptides under low pH conditions to minimise disulphide exchange. Proteases with active site thiols (e.g. papain, bromelain) should be avoided. Pepsin and cyanogen bromide are particularly useful. The disulphide-linked peptide fragments are separated and identified under mild oxidising conditions by HPLC–MS. The separation is repeated after reduction with reagents such as mercaptoethanol and dithiothreitol (DTT) to cleave –S–S– bonds, and the products are reanalysed as before. Peptides that were disulphide linked disappear from the spectrum and reappear at the appropriate positions for the individual components.


[Figure 9.24: (a) PSD mass spectrum, x-axis mass (m/z); labelled peaks at 56.5, 185.4, 203.3, 208.2, 231.3, 244.1, 331.1, 349.1, 365.2, 393.4, 406.4 (= 393.4 + NH), 534.4, 550.4, 569.0, 696.5 and 876.7. (b) Fragmentation scheme for Fucose–Galactose–N-Acetyl-glucosamine–Galactose–Glucose with theoretical fragment masses 187 (169), 349/331, 551/533, (715)/697, (758), (731), (596)/568, 393/365 and 231/203.]

Fig. 9.24 MALDI–TOF PSD MS of carbohydrates. (a) PSD MS spectrum of the carbohydrate Fuc1–2Gal1–3GlucNAc1–3Gal1–4Glc using 2,5-dihydroxybenzoic acid (DHB) as matrix. On careful inspection of the spectrum one can observe a number of abrupt changes in baseline corresponding to where the PSD spectra have been stitched together. The peak at 876.7 Da is due to the mass of the intact molecule as a sodium adduct, i.e. the parent ion at m/z 876.7 is the [M + Na]+ ion. (Spectrum courtesy of Dr Andrew Cronshaw.) (b) Interpretation of the spectrum. Experimentally derived fragment masses are mainly within 1 Da of the theoretical. The masses in parentheses were not seen in this experiment.
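The idea of reading a glycosylation pattern from successive monosaccharide losses can be sketched in a few lines. The example below is a minimal illustration and not how GlycoMod works: it takes four peaks labelled in Fig. 9.24(a) (203.3, 365.2, 569.0 and 876.7), which on one reading of panel (b) form a ladder from the sodiated reducing-end glucose up to the [M + Na]+ ion, and assigns each gap to one or two residues. The average residue masses for hexose, N-acetylhexosamine and deoxyhexose (fucose) are standard values; the function names and the 1 Da tolerance are hypothetical.

```python
from itertools import combinations_with_replacement

# Average residue masses (Da) of monosaccharides, i.e. sugar minus water.
RESIDUE = {"Hex": 162.14, "HexNAc": 203.19, "dHex": 146.14}


def assign_gap(gap, tolerance=1.0, max_units=2):
    """Return every combination of up to `max_units` residues whose summed
    mass lies within `tolerance` Da of the observed gap."""
    matches = []
    for n in range(1, max_units + 1):
        for combo in combinations_with_replacement(sorted(RESIDUE), n):
            if abs(sum(RESIDUE[r] for r in combo) - gap) <= tolerance:
                matches.append(combo)
    return matches


def read_ladder(peaks, tolerance=1.0):
    """Assign the gap between each pair of adjacent peaks in a ladder."""
    peaks = sorted(peaks)
    return [
        (low, high, assign_gap(high - low, tolerance))
        for low, high in zip(peaks, peaks[1:])
    ]


# Peaks read from Fig. 9.24(a); the 731 fragment was not observed, so the
# last gap has to span two residues.
for low, high, residues in read_ladder([203.3, 365.2, 569.0, 876.7]):
    print(f"{low:6.1f} -> {high:6.1f}: {residues}")
```

The gaps are assigned to a hexose, an N-acetylhexosamine, and a hexose plus deoxyhexose, which matches the order Glc, Gal, GlcNAc, then Gal and Fuc in the caption. Mass alone cannot distinguish galactose from glucose, which is why the residues are named only by class.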

9.5.6 Selected ion monitoring Selected ion monitoring (SIM) is typically used to look for ions that are characteristic of a target compound or family of compounds. This technique has particular application for on-line chromatography/MS where the instruments can be set up to monitor selected ion masses as the components elute successively from, for example, the capillary LC or reverse-phase HPLC column (Sections 11.3.3 and 11.9.3). Detection programmes or algorithms that are set up to carry out tandem MS on each component as it elutes from a chromatography column can be adapted to enable selective detection of many types of post-translationally modified peptides. This technique can selectively detect low-mass fragment ions that are characteristic markers that identify the presence of post-translational modifications such as phosphorylation, glycosylation, sulphation and acylation in any particular peptide. For example, phosphopeptides can be identified by production of phosphate-specific fragment ions of 63 Da (PO2−) and 79 Da (PO3−) by collision-induced dissociation during negative ion HPLC–ES MS. Glycopeptides can be identified by characteristic fragment ions including hexose+ (163 Da) and N-acetylhexosamine+ (204 Da). Phosphoserine- and phosphothreonine-containing peptides can also be identified by a process known as neutral loss scanning where these peptides show loss of 98 Da by β-elimination of H3PO4 (Fig. 9.25).
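The marker ions and the neutral loss described above lend themselves to a simple screening routine. The sketch below is illustrative only: the marker masses (63, 79, 163 and 204 Da) and the 98 Da loss are taken from the text, while the function names, the 0.5 Da tolerance and the negative-ion fragment list are hypothetical. The precursor (1346.48) and the fragment values in the neutral loss example are read from the peak labels of Fig. 9.25.

```python
# Nominal marker-ion masses quoted in the text, grouped by ion mode.
MARKERS = {
    "negative": {"PO2-": 63.0, "PO3-": 79.0},
    "positive": {"hexose+": 163.0, "N-acetylhexosamine+": 204.0},
}
NEUTRAL_LOSS_H3PO4 = 98.0  # Da, lost from phosphoserine/phosphothreonine


def marker_ions(fragments, mode, tolerance=0.5):
    """Return the names of marker ions found among the fragment m/z values."""
    return [
        name
        for name, marker_mz in MARKERS[mode].items()
        if any(abs(f - marker_mz) <= tolerance for f in fragments)
    ]


def shows_neutral_loss(precursor_mz, fragments, charge=1, tolerance=0.5):
    """True if a fragment sits 98/charge below the precursor m/z."""
    expected = precursor_mz - NEUTRAL_LOSS_H3PO4 / charge
    return any(abs(f - expected) <= tolerance for f in fragments)


# Hypothetical negative-ion fragment list containing both phosphate markers.
print(marker_ions([62.9, 79.0, 150.2], "negative"))

# Precursor and MS2 fragment values as labelled in Fig. 9.25.
print(shows_neutral_loss(1346.48, [1248.45, 1056.35, 928.30]))
```

For a multiply charged precursor the loss appears at 98/z on the m/z scale, which is why the charge is a parameter; a doubly charged phosphopeptide would show a shift of 49.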

9.6 ANALYSING PROTEIN COMPLEXES Mass spectrometry is frequently used to identify partner proteins that interact with a particular protein of interest. Interacting proteins can be isolated by a number of methods, including immunoprecipitation of tagged proteins from transfected cells, affinity chromatography and surface plasmon resonance. Surface plasmon resonance (SPR) technology (Section 13.3) has widespread application in biomolecular interaction analysis. During characterisation of protein–ligand and protein–protein interactions, samples bound to the Biacore chips, from which interaction kinetic data are also obtained, can now be analysed directly by MALDI–TOF MS (see Sections 13.3 and 17.3.2). Direct analysis of protein complexes by mass spectrometry is also possible. As well as accurate molecular weight determination of large biopolymers such as proteins of mass greater than 400 kDa, intact virus particles of Mr 40 × 10⁶ (40 MDa) have been analysed using ESI–TOF. An icosahedral virus consisting of a single-stranded RNA surrounded by a homogeneous protein shell with a total mass of 6.5 × 10⁶ Da and a rod-shaped RNA virus with a total mass of 40.5 × 10⁶ Da were studied on an ESI–TOF hybrid mass spectrometer.

9.6.1 Sample preparation and handling Mass analysis by ES–MS and MALDI–TOF is affected, seriously in some cases, by the presence of particular salts, buffers and detergents. Keratin contamination from flakes of skin and hair can be a major problem, particularly when handling gels and slices; therefore gloves and laboratory coats must be worn. Work on a clean surface in a hood with an air filter if possible, and use a dedicated box of clean polypropylene microcentrifuge tubes tested to confirm that they do not leach out polymers, mould release agents, plasticisers, etc. Sample clean-up to remove or reduce levels of buffer salts, EDTA, DMSO, non-ionic and ionic detergents (e.g. SDS) etc. can be achieved by dilution, washing, drop dialysis and ion exchange resins. If one is analysing samples by MALDI–TOF, on-plate washing can remove buffers and salts. Sample clean-up can also be achieved by pipette tip chromatography (Section 11.2.5). This consists of a miniature C18 reverse-phase chromatography column, packed in a 10 mm3 pipette tip. The sample, in a buffer containing little or no organic solvent, is loaded into the tip with a few up-and-down movements of the pipette piston to ensure complete binding of the sample. Most of the contaminants described above do not bind, whereas the sample is trapped on the reverse-phase material and is then eluted with a solvent of high organic content (typically 50–75% acetonitrile). This is particularly applicable for clean-up of samples after in-gel digestion of protein bands separated on SDS-PAGE. Coomassie Brilliant Blue dye is also removed by this procedure. The technique can be used to concentrate samples and fractionate a mixture. Purification

[Figure 9.25: MS, MS2 and MS3 spectra of the phosphopeptide, x-axes m/z. Labelled ions include (M+H)+ at 1346.48 with its 13C isotope peak at 1347.49 and (M+Na)+ at 1368.45 in the MS spectrum; 1248.45 in the MS2 spectrum; and b- and y-series fragments in the MS3 spectrum, including b3 405.95, b7 799.28, b8 928.23, b9 1056.36, b10 1127.38, y5 547.10, y7 730.18, y8 843.23 and y9 956.36.]

Fig. 9.25 MS identification of phosphopeptides. Sequence is YEILNSPEKAC where SP is phosphoserine. The MS2 and MS3 spectra are shown. The first tandem MS experiment mainly results in loss of H3PO4, 98 Da. Particular problems may also be associated with electrospray mass spectrometry of phosphopeptides, where a high level of Na+ and K+ adducts is regularly seen.


[Figure 9.26: structure of the ICAT reagent, showing the biotin affinity tag, the linker (isotope-labelled positions marked X) and the reactive group (Y).]

Fig. 9.26 Structure of the ICAT reagent. The ICAT reagent is in two forms, heavy (eight deuterium atoms) and light (no deuterium). The reagent has three elements: an affinity tag (biotin), to isolate ICAT-labelled peptides; a linker in two forms that has stable isotopes incorporated; and a reactive group (Y) with specificity toward thiol groups or other functional groups in proteins (e.g. SH, NH2, COOH). The heavy reagent is D8-ICAT (where X is deuterium) and the light reagent is D0-ICAT (where X is hydrogen). Two protein mixtures representing two different cell states are treated with the isotopically light and heavy ICAT reagents; an ICAT reagent is covalently attached to each cysteine residue in every protein. The protein mixtures are combined and proteolysed, and the ICAT-labelled peptides are isolated on an avidin column utilising the biotin tag. Peptides are separated by microbore HPLC. Since each pair of ICAT-labelled peptides is chemically identical they are easily visualised because they co-elute, with an 8 Da mass difference. The ratios of the original amounts of proteins from the two cell states are strictly maintained in the peptide fragments. The relative quantification is determined by the ratio of the peptide pairs. The protein is identified by database searching with the sequence information from tandem MS analysis by selecting peptides that show differential expression between samples.

can also be carried out to specifically bind one particular component in a mixture. Immobilised metal ion affinity columns are used to enrich phosphopeptides.

9.6.2 Quantitative analysis of complex protein mixtures by mass spectrometry Proteome analysis (described in Section 8.5) involves the following basic steps:

• run a gel (one-dimensional (1D) or two-dimensional (2D)),
• stain,
• scan to identify spots of interest,
• excise gel spots,
• extract and digest proteins,
• mass analyse the resulting peptides,
• search database.

The initial separation of proteins currently relies on gel electrophoresis, which has a number of limitations including the difficulty in analysing all the proteins expressed due to huge differences in expression levels. Although thousands of proteins can be reproducibly separated on one 2D gel from approximately 1 mg of tissue/biopsy or biological fluid, the dynamic range of protein expression can be as high as nine orders of magnitude. One development that has helped to overcome some of the problems is the isotope-coded affinity tag (ICAT) strategy for quantifying differential protein expression. The heavy and light forms of the sulphydryl (thiol)-specific ICAT reagent (whose structure is illustrated in Fig. 9.26) are used to derivatise proteins in respective samples isolated from cells or tissues in different states. The two samples are combined and proteolysed, normally with trypsin, for reasons explained above. The labelled peptides are purified by affinity chromatography utilising the biotin group on the ICAT reagent, then analysed by MS on either LC–MS/MS (including ion trap) or MALDI–TOF instruments. The relative intensities of the ions from the two isotopically tagged forms of each specific peptide indicate their relative abundance. These pairs of peptides are easily detected because they co-elute from reverse-phase microcapillary liquid chromatography (RP–µLC) and contain eight mass units of difference due to the two forms of the ICAT tag. An initial MS scan identifies the peptides from proteins that show differential expression by measuring relative signal intensities of each ICAT-labelled peptide pair. Peptides of interest are then selected for sequencing by tandem MS and the particular protein from which a peptide originated can be identified by database searching the tandem MS spectral data.
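The pairing step of the ICAT strategy can be sketched as follows. This is a minimal illustration under simplifying assumptions that are not part of the text: every ion is singly charged and every peptide carries exactly one labelled cysteine, so that the light and heavy forms differ by exactly 8 on the m/z scale. The peak list, the function name and the tolerance are hypothetical.

```python
ICAT_SHIFT = 8.0  # Da between D0- and D8-labelled forms of one peptide


def icat_pairs(peaks, shift=ICAT_SHIFT, tolerance=0.05):
    """Find light/heavy pairs in a list of (m/z, intensity) peaks from one
    co-eluting scan and return (light m/z, heavy m/z, heavy/light ratio).

    Simplification: singly charged ions with one ICAT label each. Peptides
    with n cysteines or charge z would differ by 8 * n / z instead.
    """
    peaks = sorted(peaks)
    pairs = []
    for i, (light_mz, light_int) in enumerate(peaks):
        for heavy_mz, heavy_int in peaks[i + 1:]:
            if abs(heavy_mz - light_mz - shift) <= tolerance:
                pairs.append((light_mz, heavy_mz, heavy_int / light_int))
    return pairs


# Hypothetical scan: two ICAT pairs and one unpaired peak.
scan = [
    (1000.50, 2000.0),
    (1008.50, 4000.0),
    (1234.60, 900.0),
    (1242.61, 300.0),
    (1500.00, 500.0),
]
for light, heavy, ratio in icat_pairs(scan):
    print(f"{light:.2f} / {heavy:.2f}: heavy/light = {ratio:.2f}")
```

In this made-up scan the first peptide is twice as abundant in the heavy-labelled sample and the second is three times less abundant; peptides with ratios far from 1 are the ones that would be selected for tandem MS.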

9.6.3 iTRAQ An alternative method for quantitative analysis of protein mixtures is to use the iTRAQ reagents from Applied Biosystems. These are a set of four isobaric (same mass) reagents which are amine-specific and yield labelled peptides that are identical in mass, and hence indistinguishable in single MS mode, but which produce strong, diagnostic, low-mass tandem MS signature ions allowing quantitation of up to four different samples (or triplicate analyses plus control of the same sample) simultaneously (Fig. 9.27). Information on post-translational modifications is not lost. Since all peptides are tagged, proteome coverage is expanded, and since multiple peptides per protein are analysed, confidence and quantitation are improved. Because the tags are isobaric, mixing the multiple proteome samples together does not increase the complexity of either the MS or the tandem MS data, and since there is no signal splitting in either mode, low-level analysis is enhanced as a result of the signal amplification. The protocol involves reduction, alkylation and digestion with trypsin of the protein samples in parallel, in an amine-free buffer system (Fig. 9.28). The resulting peptides are labelled with the iTRAQ reagents. The samples are then combined and, depending on sample complexity, they may be directly analysed by LC–MS/MS after one-step elution from a cation-exchange column to remove reagent by-products. Alternatively, to reduce overall peptide complexity of the sample mixture, fractionation can be carried out on the cation-exchange column by stepwise elution of part of the complex mixture. Very recently the company has launched a kit with eight different isobaric tagging reagents (the principle is the same).
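Relative quantitation from the reporter ions can be illustrated with a short sketch. It assumes the four reporter masses run from 114.1 to 117.1 in steps of 1 and the balance groups from 31 to 28, so that each tag totals 145.1 Da as stated in Fig. 9.27. The spectrum, the function name, the choice of the 114.1 channel as reference and the tolerance are all hypothetical.

```python
REPORTERS = (114.1, 115.1, 116.1, 117.1)  # reporter ion m/z values
BALANCE = (31.0, 30.0, 29.0, 28.0)        # matching balance group masses

# Each reporter/balance pair should give the same total tag mass (145.1 Da),
# which is what makes the four reagents isobaric in single MS mode.
assert all(abs(r + b - 145.1) < 1e-6 for r, b in zip(REPORTERS, BALANCE))


def reporter_ratios(spectrum, reference=114.1, tolerance=0.05):
    """Sum the intensity found at each reporter m/z in one tandem MS
    spectrum (a list of (m/z, intensity) pairs) and express it relative
    to the reference channel."""
    intensity = {
        reporter: sum(i for mz, i in spectrum if abs(mz - reporter) <= tolerance)
        for reporter in REPORTERS
    }
    if intensity[reference] == 0:
        raise ValueError("no signal in the reference reporter channel")
    return {r: i / intensity[reference] for r, i in intensity.items()}


# Hypothetical low-mass region of a tandem MS spectrum of one labelled peptide.
spectrum = [
    (114.11, 1000.0),
    (115.11, 2000.0),
    (116.11, 500.0),
    (117.11, 1000.0),
    (175.12, 300.0),  # an unrelated fragment ion, ignored by the function
]
print(reporter_ratios(spectrum))
```

Because quantitation is read from the tandem MS spectrum rather than from paired precursor peaks, the same routine would apply unchanged to every peptide selected for fragmentation, and the ratios for peptides from the same protein could then be averaged.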

9.6.4 Stable isotope labelling with amino acids in cell culture Alternatives to the above include stable isotope labelling with amino acids in cell culture (SILAC) which is useful in investigation of signalling pathways and protein interactions. As the name implies, this technique involves metabolically labelling protein samples in cell cultures that are grown with different stable isotopically labelled amino acids such as 13C and/or 15N lysine and arginine. It is also possible to use 12C and 13C leucine.


[Figure 9.27: iTRAQ reagent structure, with the following labels: reporter group, labelled with 13C and/or 15N (mass 114.1 to 117.1); balance group with 12C, 13C and/or 18O (mass 31 to 28); amine-reactive group; total mass of isobaric tag = 145.1 Da; the site at which tandem MS fragmentation occurs is marked.]

Fig. 9.27 iTRAQ reagent structure. The iTRAQ reagents consist of a charged reporter group, a peptide reactive group (an NHS, n-hydroxysuccinimide) which reacts with amino groups and a neutral balance group. The last part maintains an overall mass of 145. The term ‘isobaric’ is defined as two or more species that have the same mass. The peptide reactive group covalently links an iTRAQ reagent isobaric tag with each lysine side chain and N-terminus of a polypeptide, labellin