Data Engineer - Responsibilities:
- Create and maintain optimal data pipeline architecture
- Develop time-series data pipelines
- Assemble large, complex data sets that meet functional and non-functional business requirements
- Identify, design, and implement internal process improvements, such as automating manual processes and optimizing data delivery
- Build the data pipelines required for optimal extraction, transformation, and loading of data from a wide variety of data sources using PySpark
- Build analytics tools that use the data pipeline to provide actionable insights (nice to have)
- Performance-tune and optimize data pipelines on Spark/Palantir Foundry
- Follow development standards and implement object maturity standards
- Create and maintain documentation (e.g., business requirements, design documents) for handover to Operations Support
Must Have:
- Python
- Pandas
- PySpark
- APIs
- Knowledge of distributed computing to optimize Spark data pipelines
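To make the pipeline work concrete for candidates, here is a minimal sketch of the kind of time-series transformation step the role involves, written with pandas (a listed must-have). The column names `ts` and `value`, and the hourly aggregation, are hypothetical illustrations, not requirements from this posting; a production version would typically run the equivalent logic in PySpark across a cluster.

```python
import pandas as pd

def transform_readings(raw: pd.DataFrame) -> pd.DataFrame:
    """Resample raw time-series readings to hourly means.

    Illustrative only: the column names 'ts' and 'value' are
    hypothetical, not taken from the job description.
    """
    df = raw.copy()
    df["ts"] = pd.to_datetime(df["ts"])          # parse timestamps
    df = df.set_index("ts").sort_index()         # build a time-series index
    hourly = df["value"].resample("1h").mean()   # aggregate to hourly means
    return hourly.reset_index()

# Example usage with toy data
raw = pd.DataFrame({
    "ts": ["2024-01-01 00:10", "2024-01-01 00:50", "2024-01-01 01:30"],
    "value": [10.0, 20.0, 30.0],
})
result = transform_readings(raw)
# hour 00:00 mean = 15.0, hour 01:00 mean = 30.0
```

The same resample-and-aggregate pattern maps directly onto a Spark `groupBy` over a truncated timestamp when the data no longer fits on one machine.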
Nice to Have:
- TypeScript
- Spark ML
- scikit-learn
- Experience with the Palantir Foundry platform
- Front-end tools: Power BI, Tableau