From xmlpipedb



Why create a database?

DNA microarrays are used to measure changes in gene expression in response to stimuli. The sample DNA is labeled with a dye and hybridized to a chip, which is then scanned with a laser that measures the intensity of the dye; the result is a spreadsheet listing every gene label and its measured expression value. Microarrays from treated and untreated samples are compared, and the differences in expression levels are analyzed to reveal the effects of the treatment on the species' genome.


In many experiments, researchers already know which genes they want to focus on and extract that specific data from a spreadsheet that can contain data on thousands of genes. To look at genome-wide effects, however, researchers would have to identify, categorize, and place all of those genes into pathways manually, which would require a great deal of time and effort. Fortunately, programs such as GenMAPP and MAPPFinder automate this work, but they require a GenMAPP-specific species gene database. Currently, most GenMAPP gene databases are built from Ensembl, which is limited mostly to animal species and is sensitive to changes in flat file formats. This poses a problem for species not included in Ensembl (e.g., bacterial species), but it can be resolved by creating new databases with XMLPipeDB.

XMLPipeDB is an open source tool chain for building relational databases from XML sources. Using XML files from UniProt, one can easily create a species database for use in GenMAPP or MAPPFinder and use it to review and analyze microarray data.
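To make the XML-to-relational idea concrete, here is a minimal sketch using only Python's standard library. It is not XMLPipeDB's implementation (this page does not describe its internals); it simply parses a tiny, heavily simplified UniProt-style fragment and loads each entry's accession and ordered locus name into an in-memory SQLite table. The sample XML, the placeholder accessions `XMPL01`/`XMPL02`, the `proteins` table, and the `load_entries` helper are all hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Default namespace used in UniProt XML files.
NS = {"up": "http://uniprot.org/uniprot"}

# Hypothetical, heavily simplified UniProt-style input; XMPL01/XMPL02 are placeholders.
SAMPLE_XML = """<uniprot xmlns="http://uniprot.org/uniprot">
  <entry>
    <accession>XMPL01</accession>
    <gene><name type="ordered locus">PA0001</name></gene>
  </entry>
  <entry>
    <accession>XMPL02</accession>
    <gene><name type="ordered locus">PA0002</name></gene>
  </entry>
</uniprot>"""


def load_entries(xml_text):
    """Parse UniProt-style XML and load one (accession, locus) row per entry."""
    root = ET.fromstring(xml_text)
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE proteins (accession TEXT PRIMARY KEY, locus TEXT)")
    for entry in root.findall("up:entry", NS):
        # The first <accession> child is the entry's primary accession.
        accession = entry.findtext("up:accession", namespaces=NS)
        # Ordered locus names (e.g. PA0001) identify PAO1 genes.
        locus = entry.findtext("up:gene/up:name[@type='ordered locus']", namespaces=NS)
        conn.execute("INSERT INTO proteins VALUES (?, ?)", (accession, locus))
    conn.commit()
    return conn


if __name__ == "__main__":
    db = load_entries(SAMPLE_XML)
    for row in db.execute("SELECT accession, locus FROM proteins ORDER BY locus"):
        print(row)
```

A real pipeline would map many more UniProt elements (cross-references, GO annotations, and so on) onto many tables, but the shape of the work is the same: walk the XML tree and emit relational rows.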

P. aeruginosa PAO1

Pseudomonas aeruginosa PAO1 is an opportunistic human pathogen that is resistant to most antibiotics and disinfectants. It is specifically linked to lung infections in cystic fibrosis patients, and research has found it to be hypermutable and highly versatile. Like all bacterial species, P. aeruginosa is susceptible to DNA and structural damage by reactive oxygen species (ROS) such as hydrogen peroxide. These oxygen radicals attack DNA at either the sugar or the base, which can cause sugar fragmentation, strand breakage, and base loss. Host defense cells in the human body secrete these reactive oxygen species to attack P. aeruginosa and prevent further infection. In turn, P. aeruginosa has developed defense mechanisms that prevent and correct this DNA damage (Imlay and Linn, 1988). P. aeruginosa responds to ROS damage with two main regulons: the SOS regulon, which is induced by DNA damage, and the oxidant response regulon, which is directly induced by the presence of ROS (Palma et al., 2004). The SOS regulon consists of the GO system, which identifies and fixes specific mutations, and the mismatch repair (MMR) system, which fixes incorrectly paired bases (Ciofu et al., 2005).

Chang et al. (2005) conducted a DNA microarray experiment on this pathogen in which they exposed 5 treatment samples to 1 mM hydrogen peroxide for 20 minutes and compared the results to 4 control samples. This low concentration of H2O2 is ideal for inducing mode-one killing, in which cells die from DNA damage. During mode-one killing, the SOS regulon is often impaired, making DNA mutations caused by H2O2 more damaging because they cannot be repaired (Imlay and Linn, 1988).

Analyzing this data with GenMAPP and MAPPFinder would reveal which pathways were altered in this experiment. However, such an analysis was impossible because no GenMAPP Gene Database existed for P. aeruginosa strain PAO1.

Solution: Create a GenMAPP compatible gene database for P. aeruginosa using XMLPipeDB's GenMAPP builder.

Database Creation

Before a database could be created for P. aeruginosa with GenMAPP Builder, the program's code had to be modified to include a custom species profile and additional information about the species. The files necessary for database creation were then gathered, including:

  • The UniProt proteome set in XML format, Release 105, Jan 19, 2010
  • The Gene Ontology Associations (GOA) file, Proteome Set 79.0 released Jan 21, 2010
  • The current Gene Ontology annotations (in obo-xml format), released Jan 28, 2010

These files were loaded into the PostgreSQL database using the modified GenMAPP Builder, and the P. aeruginosa data were then exported as a Gene Database (.gdb) file, the format read by GenMAPP and MAPPFinder.


To validate the new database, we used Tally Engine, XMLPipeDB Match, and Microsoft Access. We found that the Gene Database included some gene IDs that were either not from the PAO1 strain or not in the correct format, so we modified the species profile to leave those IDs out of the database. Once these IDs were excluded, all table ID counts matched.
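The exclusion and validation steps can be pictured as a format filter followed by a per-table count comparison. The sketch below is illustrative only: it assumes that valid PAO1 identifiers are ordered locus names of the form `PA` plus four digits, optionally followed by a decimal suffix such as `.1`. The actual rule in the species profile is not stated here, and `filter_pao1_ids`, `mismatched_tables`, and the table names are hypothetical stand-ins for what Tally Engine and XMLPipeDB Match report.

```python
import re

# ASSUMPTION: shape of a PAO1 ordered locus name, e.g. "PA0001" or "PA2134.1".
# The real rule used in the GenMAPP Builder species profile may differ.
PAO1_LOCUS = re.compile(r"PA\d{4}(\.\d+)?")


def filter_pao1_ids(gene_ids):
    """Split IDs into those that look like PAO1 locus names and those to exclude."""
    kept, excluded = [], []
    for gene_id in gene_ids:
        if PAO1_LOCUS.fullmatch(gene_id):
            kept.append(gene_id)
        else:
            excluded.append(gene_id)
    return kept, excluded


def mismatched_tables(source_counts, gdb_counts):
    """Return the tables whose ID counts differ between the source data and the .gdb."""
    tables = set(source_counts) | set(gdb_counts)
    return sorted(t for t in tables if source_counts.get(t, 0) != gdb_counts.get(t, 0))


if __name__ == "__main__":
    # PA14_00010 mimics an ID from another strain; pa0003 has the wrong format.
    kept, excluded = filter_pao1_ids(["PA0001", "PA2134.1", "PA14_00010", "pa0003"])
    print(kept, excluded)
    # Hypothetical table names and counts.
    print(mismatched_tables({"proteins": 3, "loci": 3}, {"proteins": 3, "loci": 4}))
```

Validation succeeds when `mismatched_tables` returns an empty list, which corresponds to "all table ID counts matched" above.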


With the database completed, we were able to successfully run the original dataset published by Chang et al. (2005) through GenMAPP and MAPPFinder and analyze the results.

Microarray Data Analysis

We downloaded the raw microarray data file published by Chang et al. (2005) and took the log2 of the calculated fold change for each gene. We then calculated a t statistic for each gene to determine whether its change in expression was significant.
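For readers unfamiliar with the per-gene test, here is a minimal sketch. It assumes each gene has one fold change (treated/control) per replicate and computes a one-sample t statistic of the log2 ratios against zero, t = mean / (s / sqrt(n)). The page does not say which t-test variant was used, and `log2_fold_changes` and `gene_t_statistic` are hypothetical names. Turning t into a p-value additionally requires the t distribution with n - 1 degrees of freedom (for example from SciPy), which is left out here to keep the sketch dependency-free.

```python
import math
import statistics


def log2_fold_changes(fold_changes):
    """Convert linear fold changes (treated / control) to log2 ratios."""
    return [math.log2(fc) for fc in fold_changes]


def gene_t_statistic(log_ratios):
    """One-sample t statistic testing whether the mean log2 ratio differs from 0."""
    n = len(log_ratios)
    if n < 2:
        raise ValueError("need at least two replicates")
    mean = statistics.mean(log_ratios)
    sd = statistics.stdev(log_ratios)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("replicates are identical; t is undefined")
    return mean / (sd / math.sqrt(n))


if __name__ == "__main__":
    # Hypothetical replicate fold changes for a single gene.
    ratios = log2_fold_changes([2, 4, 2, 4])
    print(round(gene_t_statistic(ratios), 3))  # prints 5.196
```

Working in log2 makes up- and down-regulation symmetric (a 2-fold increase is +1, a 2-fold decrease is -1), which is what makes a test against zero meaningful.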

These results were largely consistent with those published by Chang et al. (2005): 26 of our 30 most significantly changed genes matched the published ones. However, when we compared our most downregulated genes, only 1 of 30 matched the results published by Chang et al. (2005). This discrepancy warrants further investigation.
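The comparison with the published gene lists amounts to counting how many IDs appear in both top-30 lists. A minimal sketch, in which `top_n_overlap` is a hypothetical helper and the locus IDs are invented for illustration:

```python
def top_n_overlap(ours, published, n=30):
    """Return how many IDs the top-n entries of two ranked lists share, and which ones."""
    shared = set(ours[:n]) & set(published[:n])
    return len(shared), sorted(shared)


if __name__ == "__main__":
    # Invented locus IDs, each list ranked from most to least changed.
    ours = ["PA0001", "PA0002", "PA0003"]
    published = ["PA0003", "PA0009", "PA0001"]
    print(top_n_overlap(ours, published, n=3))  # prints (2, ['PA0001', 'PA0003'])
```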

We then filtered the data for genes with a log2 fold change greater than 0.25 or less than -0.25, and compared the number of genes found to be significantly changed at different p-values to decide which criterion for change to use for our GenMAPP color set.

  • Significant at a p-value of 0.05: 1638
  • Significant at a p-value of 0.01: 802
  • Significant at a p-value of 0.001: 250
  • Significant at a p-value of 0.0001: 56

We decided to use 0.01 as our p-value cutoff, giving us 802 significantly changed genes.
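The threshold comparison above can be reproduced with a short filter-and-count step. This sketch assumes each gene is represented as a (log2 fold change, p-value) pair and uses strict inequalities for both cutoffs; `count_significant` and the toy data are hypothetical, so the counts it prints bear no relation to the 1638/802/250/56 reported above.

```python
def count_significant(genes, fc_cutoff=0.25, p_cutoffs=(0.05, 0.01, 0.001, 0.0001)):
    """Count genes with |log2 FC| > fc_cutoff that also pass each p-value cutoff."""
    # First keep only genes whose change in either direction exceeds the fold-change cutoff.
    changed = [p for log_fc, p in genes if abs(log_fc) > fc_cutoff]
    # Then count how many of those are significant at each p-value threshold.
    return {cutoff: sum(1 for p in changed if p < cutoff) for cutoff in p_cutoffs}


if __name__ == "__main__":
    # Hypothetical (log2 fold change, p-value) pairs.
    toy = [(1.2, 0.0005), (-0.8, 0.004), (0.1, 0.0001), (-0.3, 0.03), (0.5, 0.2)]
    print(count_significant(toy))
```

Here the gene with a log2 fold change of 0.1 is dropped by the fold-change filter even though its p-value is tiny, which is exactly the kind of gene this two-criterion approach is meant to exclude.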

We then created a GenMAPP color set using this criterion and ran our dataset through MAPPFinder. We filtered the resulting files to show GO terms with a Z score greater than 2 and a p-value less than 0.05, and then applied more specific parameters to the significantly increased and significantly decreased sets to obtain the 11 most changed gene groups (GO terms) for each.
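The GO-term filtering can be expressed the same way. This sketch assumes each MAPPFinder result row has already been read into a record with the columns shown in the tables below (GO ID, name, number changed, percent changed, Z score, p-value); the field names, `GoResult`, and `top_go_terms` are hypothetical. The defaults mirror the cutoffs described in this section (Z > 2, p < 0.05, top 11), the percent-changed floor is passed per set (17 for increased, 40 for decreased), and the 4 to 100 range on the number changed is the one described for the decreased set.

```python
from dataclasses import dataclass


@dataclass
class GoResult:
    go_id: int
    name: str
    num_changed: int
    pct_changed: float
    z_score: float
    p_value: float


def top_go_terms(results, min_pct, min_changed=4, max_changed=100,
                 min_z=2.0, max_p=0.05, top=11):
    """Keep terms passing the Z/p and size cutoffs, then return the top ones by Z score."""
    kept = [r for r in results
            if r.z_score > min_z and r.p_value < max_p
            and r.pct_changed > min_pct
            and min_changed <= r.num_changed <= max_changed]
    return sorted(kept, key=lambda r: r.z_score, reverse=True)[:top]


if __name__ == "__main__":
    rows = [
        GoResult(9076, "histidine family amino acid biosynthetic process", 10, 66.7, 8.626, 0.0),
        GoResult(4312, "fatty-acid synthase activity", 5, 50.0, 5.051, 0.0),
        GoResult(9127, "purine nucleoside monophosphate biosynthetic process", 5, 41.7, 4.446, 0.0),
        GoResult(1, "hypothetical term with too few genes", 3, 60.0, 3.0, 0.0),
        GoResult(2, "hypothetical term with low percent changed", 10, 20.0, 4.0, 0.0),
    ]
    for r in top_go_terms(rows, min_pct=40):
        print(r.go_id, r.name, r.z_score)
```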

Significantly Increased

Further filtered so that the percentage of genes in each group that are significantly changed is greater than 17%

GOID | GO Name | # Changed | % Changed | Z Score | P value
33554 | cellular response to stress | 11 | 17.7 | 3.37 | 0
4518 | nuclease activity | 10 | 20.4 | 3.729 | 0
9081 | branched chain family amino acid metabolic process | 5 | 26.3 | 3.328 | 0
46677 | response to antibiotic | 6 | 23.1 | 3.245 | 0
16616 | oxidoreductase activity, acting on the CH-OH group of donors, NAD or NADP as acceptor | 9 | 18.75 | 3.236 | 0
6310 | DNA recombination | 6 | 21.4 | 3.025 | 0
3995 | acyl-CoA dehydrogenase activity | 7 | 18.9 | 2.878 | 0
4527 | exonuclease activity | 5 | 21.7 | 2.798 | 0
9072 | aromatic amino acid family metabolic process | 6 | 19.4 | 2.728 | 0
9084 | glutamine family amino acid biosynthetic process | 5 | 20.8 | 2.684 | 0
16903 | oxidoreductase activity, acting on the aldehyde or oxo group of donors | 5 | 17.2 | 2.188 | 0

We found that the most significantly changed GO terms were often closely related, so we grouped them accordingly. Many of the most increased genes appeared to be part of the P. aeruginosa SOS regulon, a group of biological mechanisms that respond to DNA damage caused by oxidative stress.

Reactive oxygen species are effective defenses against pathogens because they target and damage the pathogen's DNA, preventing the pathogen from functioning or reproducing successfully. The increased nuclease, DNA polymerase, and amino acid biosynthesis activities likely reflect the pathogen's attempt to repair existing DNA damage and prevent further damage.

Significantly Decreased

Further filtered so that the percentage of genes in each group that are significantly changed is greater than 40%

GOID | GO Name | # Changed | % Changed | Z Score | P value
9076 | histidine family amino acid biosynthetic process | 10 | 66.7 | 8.626 | 0
105 | histidine biosynthetic process | 10 | 66.7 | 8.626 | 0
6547 | histidine metabolic process | 10 | 47.6 | 6.918 | 0
9075 | histidine family amino acid metabolic process | 10 | 47.6 | 6.918 | 0
46040 | IMP metabolic process | 5 | 55.6 | 5.42 | 0
6188 | IMP biosynthetic process | 5 | 55.6 | 5.42 | 0
4312 | fatty-acid synthase activity | 5 | 50 | 5.051 | 0
9127 | purine nucleoside monophosphate biosynthetic process | 5 | 41.7 | 4.446 | 0
9168 | purine ribonucleoside monophosphate biosynthetic process | 5 | 41.7 | 4.446 | 0
9126 | purine nucleoside monophosphate metabolic process | 5 | 41.7 | 4.446 | 0
9167 | purine ribonucleoside monophosphate metabolic process | 5 | 41.7 | 4.446 | 0

Many of the most downregulated gene groups were related to metabolic processes. Of our 11 most significantly decreased GO terms (obtained by filtering the results file so that percent changed is greater than 40 and number changed is between 4 and 100), 5 had to do with nucleoside biosynthetic processes and 3 had to do with IMP (inosine monophosphate) biosynthetic processes; IMP is the precursor of the purine nucleotides AMP and GMP. These results are corroborated by those published in Palma et al. (2004), who also found nucleotide, fatty acid, and polyamine synthesis to be downregulated and suggested that this reflects changes in the cell's physiology resulting from DNA damage rather than part of the cellular response to H2O2.

These decreased GO terms are definitely an area for further research. Even though amino acid and protein biosynthesis activities were shown to be increased, production of specific metabolites, such as histidine, IMP, and purines, was shown to be decreased.


With a GenMAPP/MAPPFinder-compatible Gene Database for P. aeruginosa, it is possible to create visual representations of biological pathways that show how individual genes in each pathway are affected. In the histidine biosynthesis pathway (figure: Histidine.PNG), most of the genes involved were downregulated (shown in green), while very few were unaffected (shown in grey).


  • A species profile for Pseudomonas aeruginosa was added to GenMAPP builder in order to produce a working gene database for the species.
  • We compared our results to those of Chang et al. (2005) and observed the same general trends among increased genes, but not among decreased genes.
  • Future work: Look into why production of specific amino acids and proteins is decreased during oxidative stress when overall amino acid production is increased.

Annotated Bibliography

Chang, W., D. A. Small, F. Toghrol, and W. E. Bentley. "Microarray analysis of Pseudomonas aeruginosa reveals induction of pyocin genes in response to hydrogen peroxide." BMC Genomics. 6 (2005): 115. Web.

This was the original article selected for its microarray data. It was the primary source of comparison for the results obtained by running the data through our newly created database, and it gave us a starting point for deciding which of our results to investigate further.

Palma, M., D. DeLuca, S. Worgall, and L. E. N. Quadri. "Transcriptome Analysis of the Response of Pseudomonas aeruginosa to Hydrogen Peroxide." Journal of Bacteriology. 186.1 (2004): 248-252. Web.

This article also focused on whole-genome analysis of P. aeruginosa's response to oxidative stress caused by H2O2. It gave us a source against which to compare our analysis and results, which was especially helpful when analyzing our downregulated genes.

Ciofu, O., B. Riis, T. Pressler, H. E. Poulsen, and N. Hoiby. "Occurrence of Hypermutable Pseudomonas aeruginosa in Cystic Fibrosis Patients Is Associated with the Oxidative Stress Caused by Chronic Lung Inflammation." Antimicrobial Agents and Chemotherapy. 49.6 (2005): 2276-2282. Web.

This article provided us with specific genes and gene groups involved in DNA and oxidative damage repair, as well as a more biological explanation of how reactive oxygen species attack bacterial DNA.

Imlay, J. A. and S. Linn. "DNA Damage and Oxygen Radical Toxicity." Science. 240.4857 (1988): 1302-1309. Web.

This article was especially helpful for understanding the biochemistry behind P. aeruginosa's response to oxidative stress. It described the different ways in which reactive oxygen species attack DNA and provided an explanation for why iron regulators were affected in this experiment.

Data Sources
