David Nicholson
dnicholson329@gmail.com • 412-607-6313 • 11291 Chatterly Loop Apt 104, Manassas VA 20109
Summary
Data Scientist with 13+ years of programming experience and 5+ years of experience in data analytics and visualization. Most of my data analytics experience has utilized language models and document embeddings to gain further insight into biomedical research.
Skills & Proficiency
GitHub • Python • R • SQL • Data Visualization • Machine Learning • Deep Learning • Transformers • Autoencoders • Clustering • Natural Language Processing • Topic Modeling • Bayesian Modeling • Data Structures • Data Engineering • GCP Data Engineering • Databases • Vector Databases • Embeddings • Algorithms • Parallel Processing • XML Parsing • Web Development
Education
Doctor of Philosophy (Ph.D.), Genomics and Computational Biology; University of Pennsylvania (Philadelphia, PA)
Postbaccalaureate Program (Penn Prep); University of Pennsylvania (Philadelphia, PA)
Bachelor of Science, Computer Science; University of Maryland Baltimore County (Baltimore, MD)
Publications
- Unmasking The Language Of Science Through Textual Analyses On Biomedical Preprints And Published Papers
Nicholson, D. N. (2022)
- Changing Word Meanings in Biomedical Literature Reveal Pandemics and New Technologies
Nicholson, D. N., Alquaddoomi, F., Rubinetti, V., Greene, C. S. (2023)
- Characterization of the Genome and Silk-gland Transcriptomes of Darwin’s Bark Spider (Caerostris darwini)
Babb, P. L., Gregorič, M., Lahens, N. F., Nicholson, D. N., Hayashi, C. Y., Higgins, L., Kuntner, M., Agnarsson, I., Voight, B. F. (2022)
- Examining Linguistic Shifts between Preprints and Publications
Nicholson, D. N., Rubinetti, V., Hu, D., Thielk, M., Hunter, L. E., Greene, C. S. (2022)
- Expanding a Database-derived Biomedical Knowledge Graph via Multi-Relation Extraction from Biomedical Abstracts
Nicholson, D. N., Himmelstein, D. S., Greene, C. S. (2022)
- Constructing Knowledge Graphs and Their Biomedical Applications
Nicholson, D. N., Greene, C. S. (2020)
- The Nephila clavipes Genome Highlights the Diversity of Spider Silk Genes and their Complex Expression
Babb, P. L., Lahens, N. F., Correa-Garhwal, S. M., Nicholson, D. N., Kim, E. J., Hogenesch, J. B., Kuntner, M., Higgins, L., Hayashi, C. Y., Agnarsson, I., Voight, B. F. (2017)
Professional Experience
Data Scientist
Digital Science & Research Solutions, Ltd.
June 2022 - Present
- Constructed a textual processing pipeline to detect drug treatment sentences within biomedical text for pharmaceutical clients.
- Benchmarked vector databases to identify which had the fastest access time for document embedding vectors.
- Used deep-learning-based language models and dimensionality reduction algorithms to uncover research topics and trends for government funders and clients.
- Maintained dashboards that summarized research results and trends for government funders and clients.
Graduate Research Scientist
University of Pennsylvania
August 2016 - June 2022
- Designed and implemented parallel processing pipelines that achieved a 3x speed-up when analyzing terabytes of biomedical text.
- Used weak supervision for a 1.5x speed-up when training deep learning models (recurrent neural networks and transformers) to extract biomedical relationships from biomedical text.
- Applied a k-nearest-neighbor model to power a web service that shows scientists which journals are linguistically most similar to a preprint of interest.
- Applied time series analysis to detect over 20,000 timepoints at which words changed their semantic meaning.
Postbaccalaureate Researcher
University of Pennsylvania
June 2015 - June 2016
- Used hypothesis testing (hypergeometric test) to discover over 1,000 protein domains that are easily targetable by small molecules and drugs.
- Constructed a bioinformatics pipeline that efficiently discovers novel motifs in the Golden Orb-weaver spider genome.
Undergraduate Researcher
University of Maryland Baltimore County (UMBC)
September 2013 - May 2015
- Characterized population-level transcriptional regulation by assisting with the creation of a bioinformatic pipeline that quantifies transcription factor enrichment in metagenomic data.
Summer Research Intern
University of Pennsylvania (SUIP)
June 2014
- Created a Perl pipeline that used Mendelian Randomization and Approximate Bayesian Computation to test whether elevated triglyceride levels cause heart disease.
Summer Research Intern
Bioinformatics and Integrative Genomics (BIG)
Harvard University and Massachusetts Institute of Technology
June 2013
- Explored a more efficient way to track DNA samples by using single nucleotide polymorphism (SNP) information as a DNA barcode for patient samples that have undergone different gene sequencing workflows.
Summer Research Intern
University of Pittsburgh and Carnegie Mellon University (TecBio)
June 2012
- Evaluated machine learning algorithms to determine the best strategies for identifying lung cancer in patients as early as possible.
Teaching Experience
Advisor for a rotation student for Research Lab
University of Pennsylvania (Genomics and Computational Biology Program)
Sept 2020 - Dec 2020
- Guided a rotation student in planning and executing a research project that analyzed biomedical abstracts to model disease-gene trajectories through time.
Student Advisor for 1st and 2nd Year Ph.D. Students
University of Pennsylvania (Genomics and Computational Biology Program)
August 2018 - August 2020
- Advised first- and second-year Ph.D. students on which classes to take in the fall and spring semesters.
- Advised first-year Ph.D. students on the mechanics of lab rotations.
Python BootCamp/Teaching Assistant
University of Pennsylvania (Genomics and Computational Biology Program)
August 2019
- Assisted in teaching Ph.D. students how to program in Python.
Advanced Computational Biology/Tutor
University of Pennsylvania (Genomics and Computational Biology Program)
April 2019 - May 2019
- Assisted a Ph.D. student in learning advanced computational biology topics.
- Topics ranged from machine learning algorithms to various statistical algorithms.
Python BootCamp/Teaching Assistant
University of Pennsylvania (Genomics and Computational Biology Program)
September 2017
- Assisted in teaching Ph.D. students how to program in Python.
Data Structures/Tutor
University of Maryland Baltimore County (UMBC)
Sept 2013 - May 2014
- Tutored and assisted students in studying and programming data structures ranging from binary search trees to hash tables.
Math Tutor
University of Maryland Baltimore County (UMBC)
Sept 2012 - May 2013
- Worked in walk-in tutoring sessions for university services.
- Tutored students in math classes from Algebra I to Calculus II.
Calculus I and II/Learning Assistant
University of Maryland Baltimore County (UMBC)
Sept 2012 - May 2013
- Helped the professor assist students with homework during office hours.
Honors
Appointed trainee on T32 Computational Genetics, June 2019 - August 2021
- National Human Genome Research Institute (NHGRI)
Meyerhoff Scholar (M23), August 2011 - June 2015
- University of Maryland Baltimore County
MARC U*STAR Scholar, Sept 2014 - June 2015
- University of Maryland Baltimore County
National Security Agency (NSA) Scholar, Sept 2012 - June 2014
- University of Maryland Baltimore County
Thomson Reuters Award HackMIT, October 2014
ABRCMS Poster Presentation Award, November 2013
Presentations
Changing Word Meanings in Biomedical Literature Reveal Pandemics and New Technologies (ISCB Rocky), December 2022
- 1-Hour Talk
Elsevier’s Labs Online Lecture services, October 2021
- 30-Minute Talk
Institute for Biomedical Informatics (IBI) Annual Retreat, December 2020
- Poster Presentation
Cold Spring Harbor Biological Data Science Symposium, November 2020
- Lightning Talk and Poster Presentation
National Human Genome Research Institute National Trainee Meeting, March 2020
- Poster Presentation
Seminar at Elsevier, December 2019
- Invited Speaker
International Society for Computational Biology (ISCB) Rocky, December 2019
- Poster Presentation
Institute for Biomedical Informatics (IBI) Annual Retreat, June 2019
- Poster Presentation
International Society for Computational Biology (ISCB) Rocky, December 2018
- Poster Presentation
Computational Systems for Integrative Genomics (CSIG), July 2017
- Lightning Talk
UMBC’s Undergraduate Research and Creative Achievement Day, Spring 2014
- Poster Presentation
Attended Harvard’s Biomedical Science Careers Student Conference, April 2014
Participated in MIT’s Quantitative Biology Workshop, Jan 2014
Annual Biomedical Research Conference for Minority Students, November 2013