Data Engineer

Information Technology

United States - NY, New York

Requisition ID: 529510

Medidata: Power Smarter Treatments and Healthier People

Medidata is leading the digital transformation of life sciences, creating hope for millions of patients. Medidata helps generate the evidence and insights to help pharmaceutical, biotech, medical device and diagnostics companies, and academic researchers accelerate value, minimize risk, and optimize outcomes. More than one million registered users across 1,900+ customers and partners access the world's most trusted platform for clinical development, commercial, and real-world data. Medidata, a Dassault Systèmes company, is headquartered in New York City and has offices around the world to meet the needs of its customers. Discover more at www.medidata.com and follow us @medidata.

Our Team:

Our team is responsible for Medidata’s Unified Data Platform for Product Operations. Our internal stakeholders need to have visibility into how our products are performing, how our customers are using our products, how our people are executing and how our strategy of making our products most valuable for our customers is being executed.

We’re building our unified platform on top of a few core principles and technologies:

A single unified view of data. We bridge the gap between business and internal product teams by speaking a ubiquitous domain language. By exposing our domain model as APIs and self-service analytics, we let our teams consistently access data across the platform without needing to know the underlying details of how this data is stored.

This is an awesome opportunity to build an innovative platform for Product Metrics and Business KPIs. It is a strategic project and a critical component of the company’s growth strategy. The platform will securely provide raw, aggregated, and transformed data to measure business growth drivers and the customer value chain, both internally and externally, through a unified and integrated portal. It will play a core role in the company’s short- and long-term strategy, enhancing and leveraging core products to expand into critical areas of DCTs.

What we are looking for:

  • Develops and maintains scalable data pipelines and builds out new API integrations to support continuing increases in data volume and complexity.
  • Collaborates with analytics and business teams to improve data models that feed business intelligence tools, increasing data accessibility and fostering data-driven decision making across the organization.
  • Implements processes and systems to monitor data quality, ensuring production data is always accurate and available for key stakeholders and business processes that depend on it.
  • Writes unit/integration tests, contributes to engineering wiki, and documents work.
  • Performs data analysis required to troubleshoot data-related issues and assists in their resolution.
  • Works closely with a team of frontend and backend engineers, product managers, and analysts.
  • Develops Python ETL processes and writes SQL queries.
  • Defines company data assets (data models) and the Spark, Spark SQL, and Hive SQL jobs that populate them.
  • Designs data integrations and data quality framework.
  • Designs and evaluates open source and vendor tools for data lineage.
  • Works closely with all business units and engineering teams to develop strategy for long-term data platform architecture.
  • Knowledge of best practices and IT operations in an always-up, always-available service
  • Experience with or knowledge of Agile Software Development methodologies
  • Excellent problem solving and troubleshooting skills
  • Process oriented with great documentation skills
  • Excellent oral and written communication skills with a keen sense of customer service

Education & Experience

  • BS or MS degree in Computer Science or a related technical field
  • 4+ years of Python or Java development experience
  • 4+ years of SQL experience (NoSQL experience is a plus)
  • 4+ years of experience with schema design and dimensional data modeling
  • Ability to manage and communicate data warehouse plans to internal clients
  • Experience designing, building, and maintaining data processing systems
  • Experience working with either a MapReduce or an MPP system at any size/scale
  • 3 or more years of experience with Python, SQL, and data visualization/exploration tools
  • Familiarity with the AWS ecosystem, specifically Redshift and RDS

Equal Employment Opportunity

In order to provide equal employment and advancement opportunities to all individuals, employment decisions at Medidata are based on merit, qualifications and abilities. Medidata is committed to a policy of non-discrimination and equal opportunity for all employees and qualified applicants without regard to race, color, religion, gender, sex (including pregnancy, childbirth or medical or common conditions related to pregnancy or childbirth), sexual orientation, gender identity, gender expression, marital status, familial status, national origin, ancestry, age, disability, veteran status, military service, application for military service, genetic information, receipt of free medical care, or any other characteristic protected under applicable law. Medidata will make reasonable accommodations for qualified individuals with known disabilities, in accordance with applicable law.

Covid Statement

Our Company requires all U.S. employees to be fully vaccinated against COVID-19 and to provide documentation of full vaccination, unless qualified for a medical, religious or state-required accommodation or otherwise exempt consistent with applicable law. Although accommodation requests will be considered (and granted where appropriate/possible), it may be determined that a candidate is unable to adequately perform the essential functions of the position without imposing an undue hardship due to customer requirements, staffing needs, or other business reasons. Definition of full vaccination: employees are considered fully vaccinated two weeks after their second dose in a 2-dose series or two weeks after a single-dose vaccine.
