Jack

Clinical Data/Platform Engineer in Brooklyn, NY, he/him

About

With over 15 years of experience manipulating data in Python and a collective decade working side by side with clinical psych researchers, I develop collection, transformation, and storage solutions that support the complete data life cycle. I've worked my way through every stage, from collection to publication, and even a bit of grant writing. My approach emphasizes portability, scalability, and fault tolerance: I develop infrastructure as code, abstract configuration, and lean on templating, containerization, and automation (Terraform being a personal favorite) whenever possible.

I’ve also maintained a personal interest in financial market mechanics, and I'm passionate about open data initiatives and data-driven reporting. I pursue both through my personal project, Spotlight, an aspiring bespoke data aggregation and tracking platform intended to support journalists and simplify civic engagement.

Work Experience

2022 — Now
  • Oversee all data and related technology for 6 active clinical trials and more than 10 years of prior trials, including compute infrastructure, database management, ETL/ELT pipelines, and web service development and hosting

  • Cloud transition: built hybrid multicloud lakehouse with MinIO, Azure, and S3 to feed remote storage for ClickHouse, ETL staging, Git Annex, and a Nextcloud user access layer, modernizing data infrastructure while maintaining HIPAA compliance, high security standards, and scalability at minimal cost

  • Implement and manage DAG scheduler platform to orchestrate Python/Bash/SQL ETL/ELT pipelines configured with S3 event triggers and Slack notifications

  • Stood up self-managed Elasticsearch, Kibana, Filebeat, and Logstash to dynamically index system metrics and files, providing efficient data aggregation, querying, and KPI visualization, which lowered the overhead of maintaining diverse data sources and formats and helped locate missing data

  • Automate fMRI processing and time-series alignment of longitudinal data

  • Model clinical assessment data for relational and graph databases (ArangoDB)

  • Develop and host full-stack JavaScript web applications enabling daily remote data collection from subject mobile devices, increasing data resolution and leading to publication of novel findings in Nature Mental Health

  • Wrote cluster provisioning IaC using Terraform, Make, Bash, and systemd templates for RHEL CoreOS to produce a faster, lighter Kubernetes alternative with a low attack surface

  • Develop and maintain multi-tenant virtual workspace solution offering scalable and portable computing environments with RBAC data access and pre-configured analysis software

  • Create visual dashboards of data and system metrics in Kibana and Apache Superset

2020 — 2022
New York, NY
  • Created and managed data processing/ETL pipelines for Docker Swarm neuro-analysis cluster using Python, JavaScript, Bash, and AWS Lambda

  • Automated retrieval and preprocessing of neuroimaging datasets

  • Built automated ML-driven fMRI QC pipeline to detect quality degradation in real time, with Slack notifications

  • Oversaw day-to-day technical operation of MRI research lab environment

2016 — 2020
New York, NY
  • Collected and managed high-quality multimodal imaging data on human subjects for clinical research studies

  • Debugged experimenter code and collaborated with GE engineers to solve MRI system failures; consulted with researchers to optimize data collection and storage

  • Assisted with preprocessing and loading of data between collection site, databases, and analysis development cluster

  • Devoted free time to studying JavaScript progressive web app development, cloud architecture, distributed computing, and DevOps techniques

Projects

Ongoing

Data aggregation platform built on Next.js, GraphQL, ClickHouse, Superset, Kafka, NiFi, and Airflow, deployable with Terraform to a distributed Nomad environment

2021

Assisted in optimizing the protocol for acquiring neuromelanin-differentiating MRI data on human subjects, and provided brain images for the article

2018

Developed imaging strategy for a post-mortem substantia nigra sample, which was used to correlate histological analysis of neuromelanin levels in brain tissue with an imaging-based biomarker

Side Projects

Ongoing

Translate emotional sentiment across mediums. Have a voice diary entry converted to a song, photograph, or semantic analysis and played back to you.

Ongoing

Tell Vapetaper whenever you get a new vape, or replenish a consumable for any habit you want to track, and it will interpolate your consumption patterns and generate visualizations and stats.

Education

2023 — 2023
ETL and Data Pipelines with Shell, Airflow and Kafka at IBM
Online
2015 — 2016
Auburn

Full participation audit -- completed in top 5% of the class. Taught by Dr. Steven Shapiro

Awards

2013

Granted travel expenses to attend and present at IEEE International Power Modulator and High Voltage Conference

Speaking

2011
Led monthly talks between representatives from various engineering firms and the student body at IEEE Student Body President
Auburn, AL

Volunteering

2016 — 2018
Brooklyn, NY

Invented eco-friendly alternatives to popular photodeveloping formulas and supplied photo lab with house-made solutions, cutting operating costs while minimizing ecological impact

Contact

GitHub
LinkedIn
Website