Full Time Remote
6 days ago
The Opportunity

At insitro, we are shaping the future of drug discovery by combining human biology and machine learning to discover new therapeutics. At the heart of our strategy is a true partnership between disciplines and areas of expertise to produce large, high-quality data sets that will drive machine learning and yield key biological insights.

You will be joining insitro’s first satellite hub, which is emerging in Poland. Though the role is based in Poland, this is a remote position and you may work from your home office. You’ll be working in the highly collaborative and dynamic research environment of a biotech startup, where we aim to advance the rate of scientific discovery using purpose-built solutions. You will work closely with a very talented team of distinguished scientists and engineers.

We offer both the long-term stability that comes with significant funding and the mentality and work style of an early-stage startup. You will therefore directly contribute to shaping insitro’s culture, strategic direction, and outcomes, with many opportunities for significant and diverse impact across several functions and disciplines. This is an exciting time to join us in leading the way to better medicines through the integration of computer science, machine learning, and biology at scale. Come join us, and help make a difference to patients!

The Role

Data Engineering plays an essential role in insitro’s approach to rethinking drug development, shaping the foundations of a truly data-driven and integrated drug discovery approach. We are seeking talented and highly motivated software engineers to join our newly formed, remote engineering office in Poland.

As a Software Engineer in the team, you will:

- Work closely with a team of cross-functional scientists and software engineers to identify challenges and find solutions to improve our lab data applications and integrations.
- Design and implement a rigorous set of applications and data processing pipelines that interact with high-throughput biology automation platforms, with performance and scale in mind.
- Design, implement, and maintain scalable backends and intuitive frontends for capturing, extracting, integrating, and analyzing large volumes of scientific and lab operational data, such as high-content microscopy and sequencing data.
- Build tools and pipelines to automate the end-to-end lifecycle of Machine Learning applications.
- Get exposed to fascinating science and contribute to building state-of-the-art machine learning infrastructure to advance our expertise in biology and disease modeling.
- Evaluate new technologies, practices, and vendors that could increase scientific capabilities and/or efficiencies.
- Ensure that solutions fit appropriately into our information ecosystem and preserve the integrity of insitro’s data architecture.

About You

In building our team, we look for people who share the collaborative, rigorous, and scientific spirit of our culture. The successful candidate will possess:

- BS, MS, or Ph.D. in computer science, statistics, mathematics, physics, engineering, or equivalent practical experience
- Passion for solving complex technical problems and building durable and efficient software architectures
- Ability to communicate effectively and collaborate with people of diverse backgrounds and job functions
- Demonstrated ability to write high-quality, production-ready code (readable, well-tested, with well-designed APIs) in a general-purpose programming language (such as Python, Java, Scala, C/C++, or Go)
- Curiosity to learn our current tech stack, which includes Python, Django, React, TypeScript, PostgreSQL, and Docker, all running on AWS infrastructure
- Experience in all aspects of the software development process, including requirements gathering, architecture, design, implementation, release, and maintenance
- Experience working with medium-sized datasets (100TB+), ETL pipelines, and data warehouses
- Biology background is not required

Nice to Have

- Experience working with lab and scientist stakeholders in a health tech or genomics field
- Experience building automation and tools for Machine Learning applications (MLOps)
- Experience architecting or implementing a high-throughput Laboratory Information Management System (LIMS)
- Demonstrated ability to develop novel data engineering methods that go beyond applying existing frameworks and patterns, and to apply problem-solving skills to address complex challenges
- Experience with Agile Scrum software development methodologies
- Familiarity with cloud computing services (AWS or GCP)
- Familiarity with web services and application frameworks (Django, Flask)