Ara Resources Pvt Ltd
Big Data Engineer - Spark/Hadoop
Job Location
Mumbai, India
Job Description
About The Company:
Ara's client is a premier IT & ITeS organization with a 50-year history and a pan-India footprint. A pioneer and leader in the Indian ICT industry, the company has more than 4,000 employees spread across 100 locations in India and adds value to every industry through its products, services, and offerings.
The Role:
The Data Engineer will be responsible for designing, building, and optimizing large-scale data pipelines using big data technologies and cloud platforms. The role involves working with structured and unstructured datasets, integrating data from various sources, and implementing batch and real-time data processing solutions.
Key Responsibilities:
- Working with big data technologies such as Hadoop, Hive, Presto, Spark, HQL, Elasticsearch, and YARN.
- Working with any of the cloud environments (GCP/AWS/Azure); cloud certification (GCP Data Engineer, AWS SA, or Azure Data Engineer) is a plus.
- Working with Python, Spark (with Scala) or PySpark, and SQL stored procedures.
- Building batch and streaming data pipelines from various data sources (RDBMS, NoSQL, IoT/telemetry data, APIs) to data lakes or data warehouses.
- Building streaming data ingestion pipelines using Apache Kafka and cloud-based services (such as AWS IoT Core/Kinesis/MSK, GCP Pub/Sub, etc.).
- Using ETL tools such as Apache Spark and cloud-native services (GCP Dataflow/Dataproc/Data Fusion, AWS Glue, Azure Data Factory).
- Integrating API data using methods such as Postman, cURL, and Python libraries.
- Working with various data lakes and data warehouses, both cloud-based and on-prem open source.
- Developing incremental data pipelines into NoSQL databases (MongoDB, AWS DynamoDB, Azure Cosmos DB, GCP Bigtable, or GCP Firestore).
- Working with structured/unstructured datasets and different file formats such as Avro, Parquet, JSON, CSV, XML, and text files.
- Scheduling jobs using orchestrators, preferably Airflow.
- Setting up IAM, Data Catalog, logging, and monitoring using cloud or open-source services.
- Developing dashboards using a BI tool (Power BI/Tableau/Qlik) is a plus.
- Developing web crawlers, such as social media crawlers, is a plus.
Skills Required:
- Big Data Technologies: Hadoop, Hive, Presto, Spark, HQL, Elasticsearch, YARN.
- Cloud Computing: Experience with GCP, AWS, or Azure; cloud certification (GCP Data Engineer, AWS SA, or Azure Data Engineer) is a plus.
- Programming & Scripting: Python, Scala, PySpark, SQL stored procedures.
- Data Pipeline Development: Batch and streaming data pipelines; integration from RDBMS, NoSQL, IoT/telemetry data, and APIs to data lakes/warehouses; Apache Kafka and cloud-based services (AWS IoT Core/Kinesis/MSK, GCP Pub/Sub, etc.).
- ETL & Data Processing: Apache Spark, cloud-native services (GCP Dataflow/Dataproc/Data Fusion, AWS Glue, Azure Data Factory).
- API & Data Integration: API integration using Postman, cURL, Python libraries.
- Data Storage & Management: Data lakes and data warehouses (BigQuery, Redshift, Snowflake, Synapse, S3, GCS, Blob); NoSQL databases (MongoDB, AWS DynamoDB, Azure Cosmos DB, GCP Bigtable/Firestore).
- Data Formats & Processing: Structured/unstructured data; file formats: Avro, Parquet, JSON, CSV, XML, text files.
- Security & Monitoring: IAM setup, Data Catalog, logging, and monitoring (cloud or open-source tools).
Bonus Skills (Good to Have):
- BI and dashboard development (Power BI, Tableau, Qlik).
- Web crawlers (e.g., social media crawlers).
Experience:
- Any graduation
- 6-8 years
(ref:hirist.tech)
Location: Mumbai, IN
Posted Date: 2/21/2025
Contact Information
Contact: Human Resources, Ara Resources Pvt Ltd