Open Postdoctoral position, faculty mentor Summer Han

Important Info

Faculty Sponsor First name:

Summer

Faculty Sponsor Last Name:

Han

Stanford Departments and Centers:

Neurosurgery

Epidemiology and Population Health

Biomedical Data Science

Postdoc Appointment Term:

2 years (can be extended)

Appointment Start Date:

December 1, 2025 (Flexible)

Group or Departmental Website:

http://med.stanford.edu/summerhanlab.html

https://med.stanford.edu/cancer/research/shared-resources/data-science.html

How to Submit Application Materials:

Please email the required application materials to:

Summer Han, Ph.D. (summer.han@stanford.edu)

Director of the Cancer Data Science Shared Resources Core for the Stanford Cancer Institute

Associate Professor of Medicine, Neurosurgery, and Epidemiology

Quantitative Sciences Unit

Stanford Center for Biomedical Informatics Research (BMIR)

Stanford University School of Medicine

Does this position pay above the required minimum?:

Yes. The expected base pay range for this position is listed in the Pay Range field. The pay offered to the selected candidate will be determined based on factors including (but not limited to) the qualifications of the selected candidate, budget availability, and internal equity.

Pay Range:

$77,000 - $80,000

Postdoctoral Fellow in Large Language Models and Electronic Phenotyping in Cancer

We are seeking a highly motivated Postdoctoral Research Fellow with expertise in large language models (LLMs) and electronic phenotyping to join the Cancer Data Science Core at the Stanford Cancer Institute, a team focused on advancing cancer research through data-driven approaches. The Core is directed by Dr. Summer Han, Associate Professor at Biomedical Informatics Research, and co-directed by Dr. Allison Kurian, Professor in Oncology. The fellow will work on projects that apply state-of-the-art LLMs to unstructured electronic health record (EHR) data to identify cancer phenotypes, treatment patterns, and disease progression. This includes integrating LLMs with structured data sources to develop robust computational phenotyping algorithms and scalable models for real-world evidence generation. The role involves both method development and applied research, with opportunities to publish in leading journals, present at top conferences, and contribute to open-source tools. Collaboration with clinicians, data scientists, and machine learning experts is an essential component of the position.

Strong candidates will have a background in machine learning and natural language processing (NLP), with a demonstrated ability to work with large language models (LLMs) and unstructured clinical text from electronic health records (EHRs). Desired technical skills include prompt engineering, few-shot and zero-shot learning, parameter-efficient fine-tuning methods such as LoRA and adapter-based tuning, and retrieval-augmented generation (RAG) approaches. Familiarity with LLM architectures (e.g., GPT, BERT, T5), Transformer-based modeling, and clinical NLP is highly desirable. Experience with cloud platforms such as Google Cloud Platform (GCP) or Microsoft Azure, and tools like BigQuery, Vertex AI, or Azure ML Studio is a plus. Candidates should also demonstrate strong skills in Python (for ML/NLP tasks) and R (for statistical modeling or data analysis), as both will be actively used in the research workflow. Importantly, this position requires the ability to deeply engage with clinical free-text data—often complex, ambiguous, and domain-specific—to develop effective prompts and modeling strategies. The successful candidate should be comfortable working interactively with chart reviewers (e.g., medical students, residents, or fellows) who create ground-truth labels from manual EHR reviews and be invested in understanding the clinical context that underlies phenotype definitions and labeling decisions.

Required Qualifications:

A Ph.D. in biomedical informatics, data science, computer science, statistics, or a related field is required.
The candidate must have demonstrated proficiency in machine learning, natural language processing, and working with large-scale health datasets.
Strong programming skills in both Python and R are required.
A record of peer-reviewed publications, strong written and verbal communication skills, and the ability to work independently and collaboratively within multidisciplinary teams are essential.

Required Application Materials:

A cover letter and a short description of research interests
CV
Contact information for three referees