

David Nicholson

dnicholson329@gmail.com • 412-607-6313 • 11291 Chatterly Loop Apt 104, Manassas VA 20109

Summary

Data Scientist with 13+ years of programming experience and 5+ years of experience in data analytics and visualization. Most of my data analytics work has used language models and document embeddings to gain deeper insight into biomedical research.

Skills & Proficiency

GitHub • Python • R • SQL • Data Visualization • Machine Learning • Deep Learning • Transformers • Autoencoders • Clustering • Natural Language Processing • Topic Modeling • Bayesian Modeling • Data Structures • Data Engineering • GCP Data Engineering • Databases • Vector Databases • Embeddings • Algorithms • Parallel Processing • XML Parsing • Web Development

Professional Experience

Data Scientist
Digital Science & Research Solutions, Ltd.
June 2022 - Present
40 Hours/Week
$134,000/Year

  • Built a text-processing pipeline that detects cancer drug treatments in a 64k-document biomedical corpus for pharmaceutical clients.
  • Benchmarked vector databases to determine which one best handles 10M+ document embedding vectors.
  • Combined a deep-learning language model with a dimensionality-reduction algorithm to uncover research topics and trends for government funders and other clients.
  • Maintained dashboards summarizing research results and trends for government funders and other clients.

Graduate Research Scientist
University of Pennsylvania
August 2016 - June 2022
60 Hours/Week
$34,000/Year

  • Designed and implemented parallel processing pipelines that achieved a 3x speed-up when analyzing terabytes of biomedical text.
  • Used weak supervision to achieve a 1.5x speed-up in training deep learning models (recurrent neural networks and transformers) that extract relationships from biomedical text.
  • Applied a k-nearest-neighbor model to build a web service that identifies journals linguistically similar to a scientist's preprint of interest.
  • Applied time series analysis to identify over 20,000 timepoints at which words changed their semantic meaning.

Publications

Education

Doctor of Philosophy (Ph.D.), Genomics and Computational Biology; University of Pennsylvania (Philadelphia, PA)

Postbaccalaureate Program (Penn Prep); University of Pennsylvania (Philadelphia, PA)

Bachelor of Science, Computer Science; University of Maryland Baltimore County (Baltimore, MD)