Senior Data Engineer

Software Development

United States - NY, New York

Medidata: Powering Smarter Treatments and Healthier People

Medidata is leading the digital transformation of life sciences, creating hope for millions of patients. Medidata helps generate the evidence and insights that pharmaceutical, biotech, medical device, and diagnostics companies and academic researchers use to accelerate value, minimize risk, and optimize outcomes. More than one million registered users across 1,900+ customers and partners access the world’s most trusted platform for clinical development, commercial, and real-world data. Medidata, a Dassault Systèmes company, is headquartered in New York City and has offices around the world to meet the needs of its customers. Follow us @medidata.






Your Mission:

  • Member of the data engineering team responsible for data aggregation, transformation, modeling and delivery for both client usage and internal data science teams

  • Full-stack design, development, and operation of core data capabilities like data lake, data warehouse, data marts and data pipelines

  • Contribute to the team's roadmap and project planning process, partnering with stakeholders to develop business objectives and translate those into action

  • Work with data architects to develop data flows and align to platform integration standards

  • Build data flows for data acquisition, aggregation, and modeling, using both batch and streaming paradigms

  • Consolidate/join datasets to create easily consumable, consistent, holistic information

  • Empower other data teams, data scientists and data analysts to be as self-sufficient as possible by building core capabilities as services and developing reusable library code

  • Ensure the efficiency, quality, and resiliency of the core data platform


Your Competencies:

  • Analytically minded and detail-oriented: you actually like working with data, looking for patterns and outliers, establishing data models, and finding the best answers to business & technology problems

  • Expertise in data engineering languages such as Java, Scala, Python, SQL

  • Data modeling experience; you've designed and implemented data marts, data warehouses or other large-scale data management systems; you have experience with Dimensional and Data Vault data modeling

  • Experience working with cloud data warehouses such as Snowflake Computing, AWS Redshift or Azure SQL Data Warehouse

  • Experience building ETL and data pipelines, both with traditional ETL solutions like Pentaho, Informatica, SSIS, and Talend and with code-oriented systems like Spark, Airflow, or similar

  • Cloud-oriented with strong understanding of SaaS models

  • Experience operating in a secure networking environment and leveraging separate production support and SRE teams is a plus

  • Excellent technical documentation and writing skills

  • You have a bias towards automation and an Agile/Lean mindset, and you embrace DevOps culture

  • Familiarity with streaming/messaging technologies like Kafka, Kinesis, Spark Streaming

  • Familiarity with visualizing data with Tableau, Business Objects, QuickSight, Power BI, Spotfire, and similar tools

  • Great customer focus and strong technical troubleshooting skills

  • Proficiency in statistics and data science is a nice-to-have, and interest in learning these is even better

  • Experience with clinical trial data is not required, but an interest in learning and understanding it is a must

  • Hadoop/Spark and Graph/RDF/Ontologies experience is a plus


Your Education & Experience:

  • Undergraduate or graduate degree in a technical or scientific field, such as Computer Science, Engineering, Mathematics, or similar

  • 5+ years professional experience as a data engineer, software engineer, data analyst, data scientist, or related role



Medidata is making a real difference in the lives of patients everywhere by accelerating critical drug and medical device development, enabling life-saving drugs and medical devices to get to market faster. Our products sit at the convergence of the Technology and Life Sciences industries, one of the most exciting areas for global innovation. Nine of the top 10 best-selling drugs in 2017 were developed on the Medidata platform.


Medidata Solutions has powered more than 17,000 clinical trials, giving us the largest collection of clinical trial data in the world. With this asset, we pioneer innovative, advanced applications and intelligent data analytics, bringing an unmatched level of quality and efficiency to clinical trials and enabling treatments to reach waiting patients sooner.

COVID Statement

Medidata requires all U.S. employees to be fully vaccinated against COVID-19 and to provide documentation of full vaccination, unless qualified for an accommodation as determined by Medidata, consistent with applicable law. Although accommodation requests will be considered (and granted where appropriate/possible), it may be determined that a candidate is unable to adequately perform the essential functions of the position without imposing an undue hardship on Medidata due to customer requirements, staffing needs, or other business reasons.




Medidata Solutions, Inc. is an Equal Opportunity Employer. Medidata Solutions provides equal employment opportunities to all employees and applicants for employment without regard to race, color, religion, gender, sexual orientation, gender identity, national origin, age, disability status, protected veteran status, or any other characteristic protected by the law. Medidata Solutions complies with applicable state and local laws governing non-discrimination in employment in every location in which the company has facilities. 
