Aricent
PySpark/Databricks Engineer - Big Data Technologies
Job Location
India
Job Description
Job: PySpark/Databricks Engineer. Open for multiple locations with WFO and WFH.

Job Description: We are looking for a PySpark solutions developer and data engineer who can design and build solutions for one of our Fortune 500 client programs, which aims to build a data-standardized, curation-based Hadoop cluster. This high-visibility, fast-paced key initiative will integrate data across internal and external sources, provide analytical insights, and integrate with the customer's critical systems.

Key Responsibilities:
- Design, build, and unit test applications on the Spark framework in Python.
- Build PySpark-based applications for both batch and streaming requirements, which requires in-depth knowledge of most Hadoop and NoSQL databases as well.
- Develop and execute data pipeline testing processes and validate business rules and policies.
- Optimize the performance of Spark applications in Hadoop using configurations around Spark Context, Spark SQL, DataFrame, and Pair RDDs.
- Optimize performance for data access requirements by choosing the appropriate native Hadoop file formats (Avro, Parquet, ORC, etc.) and compression codecs.
- Design and build real-time applications using Apache Kafka and Spark Streaming.
- Build integrated solutions leveraging Unix shell scripting, RDBMS, Hive, the HDFS file system, HDFS file types, and HDFS compression codecs.
- Build data tokenization libraries and integrate them with Hive and Spark for column-level obfuscation.
- Process large amounts of structured and unstructured data, including integrating data from multiple sources.
- Create and maintain an integration and regression testing framework on Jenkins, integrated with Bitbucket and/or Git repositories.
- Participate in the agile development process, and document and communicate issues and bugs relating to data standards in scrum meetings.
- Work collaboratively with onsite and offshore teams.
- Develop and review technical documentation for delivered artifacts.
- Solve complex data-driven scenarios and triage defects and production issues.
- Learn, unlearn, and relearn concepts with an open and analytical mindset.
- Participate in code releases and production deployments.
- Challenge and inspire team members to achieve business results in a fast-paced and quickly changing environment.

Qualifications:
- BE/B.Tech/B.Sc. in Computer Science, Statistics, or Econometrics from an accredited college or university.
- Minimum 3 years of extensive experience in the design, build, and deployment of PySpark-based applications.
- Expertise in handling complex, large-scale Big Data environments (preferably 20 TB).
- Minimum 3 years of experience with Hive, YARN, and HDFS, preferably on Hortonworks Data Platform.
- Good implementation experience with OOP concepts.
- Hands-on experience writing complex SQL queries and exporting and importing large amounts of data using utilities.
- Ability to build abstracted, modularized, reusable code components.
- Hands-on experience generating and parsing XML and JSON documents and REST API requests/responses.

(ref:hirist.tech)
Location: India
Posted Date: 10/9/2024
Contact Information
Contact: Human Resources, Aricent