
At PNNL, our core capabilities are divided among major departments that we refer to as Directorates. Each Directorate focuses on a specific area of scientific research or other function and has its own leadership team and dedicated budget.
Our Science & Technology directorates include National Security, Earth and Biological Sciences, Physical and Computational Sciences, and Energy and Environment. In addition, we have an Environmental Molecular Sciences Laboratory, a Department of Energy, Office of Science user facility housed on the PNNL campus.
The National Security Directorate (NSD) drives science-based, mission-focused solutions to take on complex, real-world threats to our nation and the world.
The AI and Data Analytics Division, part of NSD, combines profound domain expertise and creative integration of advanced hardware and software to deliver computational solutions that address complex data and analytic challenges. Working in multidisciplinary teams, we connect foundational research to engineering to operations, providing the tools to innovate quickly and field results faster. Our strengths are integrated across the data analytics lifecycle, from data acquisition and management to analysis and decision support.
We are seeking an exceptional Lead Software Engineer to architect and build next-generation AI systems at PNNL, spanning agentic AI platforms, petabyte-scale data orchestration, and real-time intelligence processing, that define the future of national security technology. This role uniquely combines deep technical leadership in scalable system design with hands-on expertise in modern AI/ML engineering, requiring someone who can operate with both strategic vision and tactical excellence.
Who You Are
You're an experienced engineer who seamlessly bridges infrastructure, AI/ML systems, and production-grade software development. You've built highly scalable systems from scratch, led technical initiatives that matter, and have a track record of transforming complex problems into tractable solutions. You're as comfortable architecting distributed systems that process terabytes per hour as you are fine-tuning LLMs or building developer tooling. You bring startup agility to mission-critical work where failure isn't an option.
What You'll Build
AI-Native Systems & Platforms
Scalable Infrastructure & Data Systems
Mission-Critical Production Systems
Technical Leadership
Technical Knowledge, Skills, and Abilities
Technical Leadership & Engineering Excellence
Demonstrated fluency in Python and proficiency in at least one additional language (C#/.NET, Go, C++) with ability to architect solutions and guide language selection decisions across complex, multi-language codebases
Proven track record of establishing and championing software engineering best practices including version control strategies, comprehensive automated testing frameworks, code quality standards, and technical documentation across engineering teams
Expert-level proficiency in designing and implementing sophisticated CI/CD pipelines with ability to define DevOps strategies, build/release processes, and deployment architectures that ensure reliable, secure, and efficient software delivery at scale
Seasoned practitioner with ability to lead engineering teams in defining technical specifications, architectural patterns, and system designs for microservices, distributed systems, and large-scale applications while strategically leveraging AI assist tools to accelerate team productivity and drive innovation
AI/ML Systems Architecture & Implementation
Proven experience architecting, implementing, and deploying production-grade agentic AI systems with multi-step reasoning, autonomous workflows, and decision-making capabilities into operational environments at scale
Deep practical expertise with deep learning frameworks (PyTorch, TensorFlow, JAX) and LLM orchestration platforms (LangChain, LlamaIndex, LangGraph) with ability to design complex AI applications, custom chains, retrieval systems, and agent-based architectures
Advanced expertise in LLM optimization techniques including fine-tuning methodologies (LoRA/PEFT, QLoRA), retrieval-augmented generation (RAG) system design, prompt engineering strategies, and comprehensive evaluation frameworks
Comprehensive understanding of the end-to-end machine learning lifecycle with proven ability to architect and build production ML platforms including feature engineering pipelines, model serving infrastructure, monitoring, and automated retraining systems
Cloud Architecture & Distributed Systems
Demonstrated expertise architecting and deploying enterprise-scale applications across cloud platforms (AWS, Azure, GCP) with ability to design multi-cloud strategies and advanced proficiency in containerization (Docker) and orchestration technologies (Kubernetes) including Infrastructure as Code practices
Expert ability to architect and implement sophisticated event-driven systems using message brokers (Kafka, RabbitMQ), pub/sub patterns, and serverless functions with consideration for exactly-once semantics, ordering guarantees, and failure handling
Mastery of cloud-native API design patterns including RESTful principles, GraphQL schemas, and gRPC services with proven experience establishing API standards, versioning strategies, and microservice communication patterns for large-scale distributed systems
Deep understanding of data storage architecture including relational databases (PostgreSQL, MySQL), NoSQL systems (MongoDB, DynamoDB, Cassandra), and data warehouses (Redshift, Snowflake, BigQuery) with ability to design polyglot persistence strategies optimized for specific workload characteristics
Data Platform Engineering & Distributed Processing
Mastery of cloud-native data pipeline architectures including ETL/ELT design patterns, orchestration frameworks (Airflow, Prefect, Step Functions), and cloud services (AWS Glue, Lambda, Azure Data Factory) with ability to architect enterprise-scale data platforms
Expert knowledge of distributed data storage systems (S3, Redshift, Delta Lake, PostgreSQL, MongoDB, OpenSearch, Databricks) with proven ability to design data lakehouse architectures and advanced proficiency with distributed computing frameworks (Spark/Databricks, Kafka, Flink, Ray)
Demonstrated expertise deploying and optimizing scalable ML workloads on distributed platforms using Kubernetes, Ray clusters, or Spark with deep understanding of data modeling principles including schema design, normalization/denormalization strategies, and data quality frameworks
Proven ability to architect petabyte-scale data systems with appropriate partitioning strategies, indexing approaches, and query optimization patterns while mastering data format selection (Parquet, Avro, ORC, Delta, Iceberg) for optimal compression, performance, and schema evolution
What Makes This Role Unique
You'll operate at the intersection of cutting-edge AI research and production systems engineering, building platforms that process the nation's most sensitive intelligence data while mentoring teams and shaping technical strategy. You'll work with startup agility on problems of national importance, rapidly iterating solutions that push the boundaries of what's possible in AI and distributed systems.
National Interest Project Examples