Developing Bioinformatics Computer Skills will help biologists, researchers, and students develop a structured approach to biological data and the computer tools they'll need to analyze it.
Part I: Introduction
Chapter 1. Biology in the Computer Age
Section 1.1. How Is Computing Changing Biology?
Section 1.2. Isn't Bioinformatics Just About Building Databases?
Section 1.3. What Does Informatics Mean to Biologists?
Section 1.4. What Challenges Does Biology Offer Computer Scientists?
Section 1.5. What Skills Should a Bioinformatician Have?
Section 1.6. Why Should Biologists Use Computers?
Section 1.7. How Can I Configure a PC to Do Bioinformatics Research?
Section 1.8. What Information and Software Are Available?
Section 1.9. Can I Learn a Programming Language Without Classes?
Section 1.10. How Can I Use Web Information?
Section 1.11. How Do I Understand Sequence Alignment Data?
Section 1.12. How Do I Write a Program to Align Two Biological Sequences?
Section 1.13. How Do I Predict Protein Structure from Sequence?
Section 1.14. What Questions Can Bioinformatics Answer?
Chapter 2. Computational Approaches to Biological Questions
Section 2.1. Molecular Biology's Central Dogma
Section 2.2. What Biologists Model
Section 2.3. Why Biologists Model
Section 2.4. Computational Methods Covered in This Book
Section 2.5. A Computational Biology Experiment
Part II: The Bioinformatics Workstation
Chapter 3. Setting Up Your Workstation
Section 3.1. Working on a Unix System
Section 3.2. Setting Up a Linux Workstation
Section 3.3. How to Get Software Working
Section 3.4. What Software Is Needed?
Chapter 4. Files and Directories in Unix
Section 4.1. Filesystem Basics
Section 4.2. Commands for Working with Directories and Files
Section 4.3. Working in a Multiuser Environment
Chapter 5. Working on a Unix System
Section 5.1. The Unix Shell
Section 5.2. Issuing Commands on a Unix System
Section 5.3. Viewing and Editing Files
Section 5.4. Transformations and Filters
Section 5.5. File Statistics and Comparisons
Section 5.6. The Language of Regular Expressions
Section 5.7. Unix Shell Scripts
Section 5.8. Communicating with Other Computers
Section 5.9. Playing Nicely with Others in a Shared Environment
Part III: Tools for Bioinformatics
Chapter 6. Biological Research on the Web
Section 6.1. Using Search Engines
Section 6.2. Finding Scientific Articles
Section 6.3. The Public Biological Databases
Section 6.4. Searching Biological Databases
Section 6.5. Depositing Data into the Public Databases
Section 6.6. Finding Software
Section 6.7. Judging the Quality of Information
Chapter 7. Sequence Analysis, Pairwise Alignment, and Database Searching
Section 7.1. Chemical Composition of Biomolecules
Section 7.2. Composition of DNA and RNA
Section 7.3. Watson and Crick Solve the Structure of DNA
Section 7.4. Development of DNA Sequencing Methods
Section 7.5. Genefinders and Feature Detection in DNA
Section 7.6. DNA Translation
Section 7.7. Pairwise Sequence Comparison
Section 7.8. Sequence Queries Against Biological Databases
Section 7.9. Multifunctional Tools for Sequence Analysis
Chapter 8. Multiple Sequence Alignments, Trees, and Profiles
Section 8.1. The Morphological to the Molecular
Section 8.2. Multiple Sequence Alignment
Section 8.3. Phylogenetic Analysis
Section 8.4. Profiles and Motifs
Chapter 9. Visualizing Protein Structures and Computing Structural Properties
Section 9.1. A Word About Protein Structure Data
Section 9.2. The Chemistry of Proteins
Section 9.3. Web-Based Protein Structure Tools
Section 9.4. Structure Visualization
Section 9.5. Structure Classification
Section 9.6. Structural Alignment
Section 9.7. Structure Analysis
Section 9.8. Solvent Accessibility and Interactions
Section 9.9. Computing Physicochemical Properties
Section 9.10. Structure Optimization
Section 9.11. Protein Resource Databases
Section 9.12. Putting It All Together
Chapter 10. Predicting Protein Structure and Function from Sequence
Section 10.1. Determining the Structures of Proteins
Section 10.2. Predicting the Structures of Proteins
Section 10.3. From 3D to 1D
Section 10.4. Feature Detection in Protein Sequences
Section 10.5. Secondary Structure Prediction
Section 10.6. Predicting 3D Structure
Section 10.7. Putting It All Together: A Protein Modeling Project
Section 10.8. Summary
Chapter 11. Tools for Genomics and Proteomics
Section 11.1. From Sequencing Genes to Sequencing Genomes
Section 11.2. Sequence Assembly
Section 11.3. Accessing Genome Information on the Web
Section 11.4. Annotating and Analyzing Whole Genome Sequences
Section 11.5. Functional Genomics: New Data Analysis Challenges
Section 11.6. Proteomics
Section 11.7. Biochemical Pathway Databases
Section 11.8. Modeling Kinetics and Physiology
Section 11.9. Summary
Part IV: Databases and Visualization
Chapter 12. Automating Data Analysis with Perl
Section 12.1. Why Perl?
Section 12.2. Perl Basics
Section 12.3. Pattern Matching and Regular Expressions
Section 12.4. Parsing BLAST Output Using Perl
Section 12.5. Applying Perl to Bioinformatics
Chapter 13. Building Biological Databases
Section 13.1. Types of Databases
Section 13.2. Database Software
Section 13.3. Introduction to SQL
Section 13.4. Installing the MySQL DBMS
Section 13.5. Database Design
Section 13.6. Developing Web-Based Software That Interacts with Databases
Chapter 14. Visualization and Data Mining
Section 14.1. Preparing Your Data
Section 14.2. Viewing Graphics
Section 14.3. Sequence Data Visualization
Section 14.4. Networks and Pathway Visualization
Section 14.5. Working with Numerical Data
Section 14.6. Visualization: Summary
Section 14.7. Data Mining and Biological Information
Bibliography
Section Biblio.1. Unix
Section Biblio.2. SysAdmin
Section Biblio.3. Perl
Section Biblio.4. General Reference
Section Biblio.5. Bioinformatics Reference
Section Biblio.6. Molecular Biology/Biology Reference
Section Biblio.7. Protein Structure and Biophysics
Section Biblio.8. Genomics
Section Biblio.9. Biotechnology
Section Biblio.10. Databases
Section Biblio.11. Visualization
Section Biblio.12. Data Mining
Structure of This Book
We've arranged the material in this book so that you can read it from start to finish or skip around, digesting later sections before earlier ones. It's divided into four parts:
Part I
Chapter 1 defines bioinformatics as a discipline, delves into a bit of history, and provides a brief tour of what the book covers and why.
Chapter 2 introduces the core concepts of bioinformatics and molecular biology and the technologies and research initiatives that have made increasing amounts of biological data available. It also covers the ever-growing list of basic computer procedures every biologist should know.
Part II
Chapter 3 introduces Unix, then moves on to the basics of installing Linux on a PC and getting software up and running.
Chapter 4 covers the ins and outs of moving around a Unix filesystem, including file hierarchies, naming schemes, commonly used directory commands, and working in a multiuser environment.
Chapter 5 explains many Unix commands users will encounter on a daily basis, including commands for viewing, editing, and extracting information from files; regular expressions; shell scripts; and communicating with other computers.
Part III
Chapter 6 is about the art of finding biological information on the Web. The chapter covers search engines and searching, where to find scientific articles and software, how to use online information sources, and the public biological databases.
Chapter 7 begins with a review of molecular evolution and then moves on to cover the basics of pairwise sequence-analysis techniques such as predicting gene location, global and local alignment, and local alignment-based searching against databases using BLAST and FASTA. The chapter concludes with coverage of multifunctional tools for sequence analysis.
Chapter 8 moves on to study groups of related genes or proteins. It covers strategies for multiple sequence alignment with tools such as ClustalW and Jalview, then discusses tools for phylogenetic analysis and for constructing profiles and motifs.
Chapter 9 covers 3D analysis of proteins and the tools used to compute their structural properties. The chapter begins with a review of protein chemistry and quickly moves to a discussion of web-based protein structure tools; structure classification, alignment, and analysis; solvent accessibility and solvent interactions; and computing physicochemical properties of proteins. The chapter concludes with structure optimization and a tour through protein resource databases.
Chapter 10 covers the tools that determine the structures of proteins from their sequences. The chapter discusses feature detection in protein sequences, secondary structure prediction, and 3D structure prediction. It concludes with an example project in protein modeling.
Chapter 11 puts it all together. Up to now we've covered tools and techniques for analyzing single sequences or structures, and for comparing multiple sequences of single-gene length. This chapter discusses some of the datatypes and tools that are becoming available for studying the integrated function of all the genes in a genome, including sequencing an entire genome, accessing genome information on the Web, annotating and analyzing whole genome sequences, and emerging technologies and proteomics.
Part IV
Chapter 12 shows you how a programming language such as Perl can help you sift through mountains of data to extract just the information you require. It won't teach you to program in Perl, but the chapter gives you a brief introduction to the language and includes examples to start you on your way toward learning to program.
Chapter 13 is an introduction to database concepts. It covers the types of databases used in biological research, the database software that builds them, database languages (in particular, the SQL language), and developing web-based software that interacts with databases.
Chapter 14 covers the computational tools and techniques that allow you to make sense of your results. The first part of the chapter introduces programs that are used to visualize data arising from bioinformatics research. They range from general-purpose plotting and statistical packages for numerical data, such as Grace and gnuplot, to programs such as TeXshade that are dedicated to presenting sequence and structural information in an interpretable form. The second part of the chapter presents tools for data mining (the process of finding, interpreting, and evaluating patterns in large sets of data) in the context of applications in bioinformatics.