Executive Summary
The client, OnePlace, partnered with Peritos Solutions to modernize and automate its data integration and transformation processes using AWS Glue. The initiative aimed to build a serverless, scalable, and secure data engineering framework that could streamline ETL (Extract, Transform, Load) operations, automate metadata management, and enhance data governance.
Before this project, OnePlace faced challenges in managing and transforming large datasets from multiple sources due to manual data pipelines and inconsistent governance. Through the AWS Glue implementation, Peritos Solutions established a fully automated ETL ecosystem, enabling OnePlace to perform seamless data ingestion, transformation, and cataloging, all integrated with other AWS services such as Amazon S3, Amazon RDS, Amazon Redshift, and Amazon Athena.
This AWS Glue solution not only optimized OnePlace’s operational efficiency but also aligned with AWS best practices for security, cost optimization, and scalability, meeting the criteria for AWS Glue Service Delivery.
About Client
OnePlace is a technology-driven organization focused on leveraging data to enhance decision-making, reporting, and operational intelligence. The company manages a growing data ecosystem comprising multiple data sources, analytics tools, and business applications.
To address scalability challenges and improve data automation, OnePlace collaborated with Peritos Solutions, an AWS Advanced Consulting Partner, to design and implement a serverless data platform using AWS Glue. The goal was to replace manual data workflows with a secure, automated ETL framework that ensures accuracy, consistency, and governance.
Project Background – Data Modernization through AWS Glue
OnePlace’s data operations were previously dependent on traditional ETL tools that lacked automation and flexibility. These manual processes caused data silos, inconsistent data quality, and delayed reporting.
Peritos Solutions proposed an AWS Glue-based serverless ETL framework to transform OnePlace’s data landscape. The solution automated schema detection, data cataloging, and transformation pipelines, with end-to-end visibility through Amazon CloudWatch and centralized governance through the Glue Data Catalog.
This transformation allowed OnePlace to manage large-scale data workloads with minimal operational overhead while maintaining compliance, traceability, and cost efficiency.

Objectives of the Engagement
- Establish a serverless, automated data integration framework using AWS Glue.
- Replace legacy ETL pipelines with scalable and efficient Glue jobs.
- Implement centralized metadata management using AWS Glue Data Catalog.
- Enable cross-service data integration with S3, RDS, and Redshift.
- Ensure security, governance, and compliance through IAM, encryption, and monitoring.
Scope & Requirements
Scope
The project’s scope included the design, deployment, and optimization of AWS Glue components to enable seamless data flow across OnePlace’s AWS environment.
Key deliverables included:
- AWS Glue Crawlers for automated schema detection.
- Glue Jobs for data transformation and enrichment.
- Glue Workflows for end-to-end orchestration.
- Centralized Data Catalog for metadata governance.
- CloudWatch monitoring and alerting integration.
Requirements
Functional:
- Automated ETL pipeline creation and scheduling.
- Dynamic schema detection and updates.
- Integration with Amazon Redshift and Athena for analytics.
Non-Functional:
- Serverless architecture for scalability.
- Secure access control with IAM.
- Centralized monitoring and auditing.
- Cost efficiency and fault tolerance.

Solution Overview
Business Problem Addressed
OnePlace’s existing ETL infrastructure was time-intensive and not scalable. Manual intervention led to delays, higher costs, and inconsistent data quality.
Proposed AWS Glue-Based Solution
Peritos Solutions implemented an end-to-end AWS Glue solution integrating multiple data sources and automating data transformation and cataloging. Using Glue Crawlers, Jobs, and Workflows, the entire ETL process became event-driven, reducing human intervention and operational latency.
Key Benefits
- Serverless data integration with zero infrastructure management.
- Automated data discovery and schema management.
- Faster and more reliable data transformation pipelines.
- Improved data governance through centralized metadata.
- Seamless analytics enablement through Athena and QuickSight integration.
Implementation
Architecture Overview
The architecture consisted of:
- Data Sources: S3, RDS, and on-premises data via secure connectors.
- ETL Layer: AWS Glue Crawlers, Jobs, and Workflows.
- Data Catalog: Centralized schema and metadata management.
- Analytics Layer: Athena and QuickSight for visualization.
- Monitoring & Logging: CloudWatch for logs, metrics, and alerts.
Technology Stack
- AWS Services: Glue, S3, RDS, Redshift, CloudWatch, Lambda, Secrets Manager, IAM.
- Security: KMS encryption, IAM roles whose assumption requires MFA, and cross-account logging.
- Automation: CI/CD pipelines with AWS CodePipeline and CodeBuild.
AWS Glue Components Implemented
- Glue Crawlers: Automated schema discovery for S3 and RDS datasets.
- Glue Jobs: ETL scripts built using PySpark to clean, normalize, and enrich data.
- Glue Workflows: Orchestration for dependency-based execution.
- Data Catalog: Managed metadata, table schemas, and data lineage.
- Triggers: Event-driven execution using CloudWatch and EventBridge.
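As a rough illustration of how the crawler, job, and trigger components listed above might be wired together, the sketch below builds the request parameters that boto3's Glue client accepts for `create_crawler` and `create_trigger`. Every resource name used with it (role, database, bucket path, workflow, job) is a hypothetical placeholder rather than OnePlace's actual configuration, and the schema change policy shown is one reasonable choice, not necessarily the one used in the project.

```python
"""Sketch: request parameters for a Glue crawler and a conditional trigger.

All resource names passed to these helpers are hypothetical placeholders.
"""


def crawler_params(name, role_arn, database, s3_path):
    """Build keyword arguments for boto3's glue.create_crawler()."""
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": s3_path}]},
        # Update table definitions in place when the source schema changes,
        # and only log (never drop) tables whose source objects disappear.
        "SchemaChangePolicy": {
            "UpdateBehavior": "UPDATE_IN_DATABASE",
            "DeleteBehavior": "LOG",
        },
    }


def trigger_params(name, workflow, crawler, job):
    """Build keyword arguments for glue.create_trigger().

    The trigger starts `job` once `crawler` has finished successfully.
    """
    return {
        "Name": name,
        "WorkflowName": workflow,
        "Type": "CONDITIONAL",
        "Predicate": {
            "Conditions": [
                {
                    "LogicalOperator": "EQUALS",
                    "CrawlerName": crawler,
                    "CrawlState": "SUCCEEDED",
                }
            ]
        },
        "Actions": [{"JobName": job}],
        "StartOnCreation": True,
    }


def apply(glue_client, crawler_kwargs, trigger_kwargs):
    """Send both requests; glue_client is expected to be boto3.client("glue")."""
    glue_client.create_crawler(**crawler_kwargs)
    glue_client.create_trigger(**trigger_kwargs)
```

Keeping the parameter builders separate from the API calls makes the configuration easy to review and unit test without AWS credentials. The workflow named in the trigger would need to exist first, for example via `create_workflow`.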
Security and Compliance
- IAM policies applied with least privilege.
- Glue roles restricted to authorized services only.
- KMS encryption applied for data at rest, with TLS protecting data in transit.
- CloudTrail enabled for audit trails and compliance verification.
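To make the least-privilege point above concrete, here is a minimal sketch of an IAM policy document for a Glue job role, scoped to a single S3 prefix and a single KMS key. The bucket name, prefix, and key ARN are hypothetical, and the action list is an assumption about what a typical read/write ETL job needs; a real role would be tailored to each job.

```python
import json


def glue_job_policy(bucket, prefix, kms_key_arn):
    """Return an IAM policy document scoped to one S3 prefix and one KMS key."""
    clean_prefix = prefix.strip("/")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Object-level access only under the given prefix.
                "Sid": "ReadWriteDataPrefix",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/{clean_prefix}/*"],
            },
            {
                # Listing is a bucket-level action, so restrict it by prefix.
                "Sid": "ListBucketUnderPrefix",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
                "Condition": {"StringLike": {"s3:prefix": [f"{clean_prefix}/*"]}},
            },
            {
                # Only the one key that encrypts this data set.
                "Sid": "UseDataKey",
                "Effect": "Allow",
                "Action": ["kms:Decrypt", "kms:GenerateDataKey"],
                "Resource": [kms_key_arn],
            },
        ],
    }


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    policy = glue_job_policy(
        "example-data-lake",
        "curated/",
        "arn:aws:kms:us-east-1:123456789012:key/00000000-0000-0000-0000-000000000000",
    )
    print(json.dumps(policy, indent=2))
```

No statement uses a wildcard action or a wildcard resource, which is the property a least-privilege review would typically check first.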
Runbook and Troubleshooting Scenarios
Routine Operational Tasks
- Daily monitoring of Glue job metrics and DPU utilization.
- Reviewing failed job logs and rerunning based on SLA thresholds.
- Verifying Data Catalog updates and schema integrity.
- Checking Glue job triggers and workflow dependencies.
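The rerun step in the routine tasks above could be supported by a small helper like the following sketch. It takes records shaped like the `JobRuns` entries returned by boto3's `glue.get_job_runs()` and picks the failed runs that are still worth retrying. The retry limit, the SLA threshold, and the rule that ties them together are hypothetical; the source does not state the actual SLA policy. The `Attempt` field is treated as zero-based, which is an assumption about the API response.

```python
def runs_to_retry(job_runs, max_attempts=3, sla_seconds=3600):
    """Select the IDs of failed runs that are worth rerunning.

    job_runs: list of dicts with Id, JobRunState, Attempt, and
    ExecutionTime (seconds), as in glue.get_job_runs()["JobRuns"].
    """
    retryable_states = {"FAILED", "TIMEOUT", "ERROR"}
    selected = []
    for run in job_runs:
        if run.get("JobRunState") not in retryable_states:
            continue  # still running, stopped on purpose, or succeeded
        # Attempt is treated as zero-based here (assumption).
        if run.get("Attempt", 0) + 1 >= max_attempts:
            continue  # retry budget exhausted; escalate instead
        if run.get("ExecutionTime", 0) > sla_seconds:
            continue  # a rerun of similar length would miss the SLA window
        selected.append(run["Id"])
    return selected


if __name__ == "__main__":
    sample = [
        {"Id": "jr_1", "JobRunState": "FAILED", "Attempt": 0, "ExecutionTime": 120},
        {"Id": "jr_2", "JobRunState": "SUCCEEDED", "Attempt": 0, "ExecutionTime": 100},
    ]
    print(runs_to_retry(sample))
```

The selected IDs identify runs whose jobs could then be restarted with `start_job_run`; runs that are filtered out would go to manual review.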
Common Troubleshooting Scenarios
- Job Failures Due to Schema Drift: Re-run Glue Crawler, refresh Data Catalog, and update ETL script mapping.
- Performance Degradation: Tune Spark configurations and increase DPU allocation.
- Connection Errors: Validate IAM permissions, VPC configurations, and network paths.
- Data Quality Issues: Use Glue DynamicFrames and Deequ (the open-source AWS Labs library) for validation.
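For the schema drift scenario above, a quick way to see what changed before updating the ETL mapping is to diff the columns the job expects against the columns the crawler wrote to the Data Catalog. The sketch below assumes both sides are lists of `{"Name", "Type"}` dicts, the shape found under `Table.StorageDescriptor.Columns` in `glue.get_table()` responses. The function name and the sample columns are hypothetical.

```python
def schema_drift(expected, actual):
    """Compare two column lists shaped like Glue Data Catalog columns.

    Returns the column names that were added, removed, or changed type,
    comparing names and types case-insensitively.
    """
    exp = {c["Name"].lower(): c["Type"].lower() for c in expected}
    act = {c["Name"].lower(): c["Type"].lower() for c in actual}
    return {
        "added": sorted(set(act) - set(exp)),
        "removed": sorted(set(exp) - set(act)),
        "retyped": sorted(n for n in set(exp) & set(act) if exp[n] != act[n]),
    }


if __name__ == "__main__":
    # Hypothetical columns for illustration only.
    expected = [
        {"Name": "id", "Type": "bigint"},
        {"Name": "amount", "Type": "double"},
        {"Name": "region", "Type": "string"},
    ]
    actual = [
        {"Name": "id", "Type": "bigint"},
        {"Name": "amount", "Type": "string"},
        {"Name": "created_at", "Type": "timestamp"},
    ]
    print(schema_drift(expected, actual))
```

An empty result in all three lists suggests the failure has another cause, such as permissions or connectivity.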

Deployment Readiness Checklist
Testing
- Unit, integration, and system testing of all Glue jobs.
- Validation of schema mapping, data accuracy, and job success rates.
Automation
- CI/CD pipelines integrated for job versioning and automated deployment.
- Security scans embedded in build pipelines.
Documentation
- Deployment runbook, rollback plan, and configuration details maintained.
Monitoring & Validation
- Glue job metrics and alerts verified in CloudWatch.
- Post-deployment validation ensured job stability.
Evidence: Deployment logs, Glue job screenshots, and automation reports attached to project documentation.
Cost Optimization and Performance Tuning
- Used Glue 3.0 for faster job performance and improved scaling.
- Optimized DPU allocation and job parallelism.
- Leveraged job bookmarks for incremental data loads.
- Enabled data partitioning in S3 for query efficiency.
- Monitored spend through AWS Cost Explorer and adjusted scheduling.
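The DPU tuning described above comes down to simple arithmetic: Glue bills by DPU-hours, so cost scales with the number of DPUs times the billed runtime. The sketch below shows that calculation under stated assumptions. The rate is passed in rather than hard-coded because pricing varies by region and over time, and the one-minute minimum billing period is an assumed default that should be checked against current AWS pricing.

```python
def estimate_job_cost(dpus, runtime_seconds, rate_per_dpu_hour, min_billed_seconds=60):
    """Estimate the cost of one Glue job run.

    dpus: number of DPUs allocated to the run.
    runtime_seconds: wall-clock execution time of the run.
    rate_per_dpu_hour: price per DPU-hour (look up the current regional rate).
    min_billed_seconds: minimum billed duration (assumed default of 60).
    """
    billed_seconds = max(runtime_seconds, min_billed_seconds)
    return dpus * (billed_seconds / 3600) * rate_per_dpu_hour


if __name__ == "__main__":
    placeholder_rate = 0.44  # illustrative value only, not a quoted price
    before = estimate_job_cost(dpus=10, runtime_seconds=3600, rate_per_dpu_hour=placeholder_rate)
    after = estimate_job_cost(dpus=20, runtime_seconds=1500, rate_per_dpu_hour=placeholder_rate)
    print(f"before tuning: {before:.2f}, after tuning: {after:.2f}")
```

Doubling the DPUs only pays off when it cuts runtime by more than half, which is why allocation changes are best judged against measured run times rather than assumed speedups.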
Challenges and Resolutions
| Challenge | Resolution |
| --- | --- |
| Schema evolution from multiple data sources | Automated schema updates via Glue Crawlers |
| Long-running ETL jobs | Spark job optimization and dynamic partitioning |
| Data duplication in catalogs | Automated Data Catalog cleanup and versioning |
| Integration with legacy databases | Implemented secure JDBC connections and Glue connections |
| Monitoring job failures | Integrated CloudWatch alerts with email/SNS notifications |
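One common way to implement the job-failure alerting in the last row is an EventBridge rule that matches Glue job state-change events and forwards them to an SNS topic. The sketch below uses that technique as an illustration; the source does not say which mechanism was used. The rule name is hypothetical, and the small matcher covers only exact-value matching, a subset of what EventBridge patterns support.

```python
import json

# Event pattern for Glue job runs that end unsuccessfully.
FAILURE_PATTERN = {
    "source": ["aws.glue"],
    "detail-type": ["Glue Job State Change"],
    "detail": {"state": ["FAILED", "TIMEOUT", "STOPPED"]},
}


def rule_params(name):
    """Build keyword arguments for boto3's events.put_rule()."""
    return {
        "Name": name,
        "EventPattern": json.dumps(FAILURE_PATTERN),
        "State": "ENABLED",
    }


def matches(pattern, event):
    """Check an event against the exact-value subset of EventBridge patterns."""
    for key, allowed in pattern.items():
        if key not in event:
            return False
        if isinstance(allowed, dict):
            # Nested pattern: recurse into the matching nested event field.
            if not isinstance(event[key], dict) or not matches(allowed, event[key]):
                return False
        elif event[key] not in allowed:
            return False
    return True


def alert_message(event):
    """Format a short notification body from a Glue job state-change event."""
    detail = event["detail"]
    return (
        f"Glue job {detail['jobName']} run {detail['jobRunId']} "
        f"ended in state {detail['state']}"
    )
```

The local `matches` helper exists only so the pattern can be checked in a unit test; in AWS the matching is done by EventBridge itself, and the SNS topic would be attached to the rule with `put_targets`.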
Project Completion
Duration
- Implementation Phase: May 2024 – September 2024
- Support Phase: October 2024 – Present
Deliverables
- AWS Glue Data Catalog, Crawlers, Jobs, and Workflows.
- CloudWatch dashboards for Glue performance monitoring.
- Operational Runbook and Troubleshooting Guide.
- Deployment Readiness Checklist and Evidence Reports.
- CI/CD pipelines for Glue job automation.
Support
Post-implementation support covers two months at 20 hours per month and includes operational support, bug fixes, and performance optimization.
Next Phase
- Integrate with AWS Lake Formation for enhanced data governance.
- Implement data lineage tracking and metadata versioning.
- Expand to real-time streaming ETL using AWS Glue and Kinesis.
- Develop monitoring dashboards using QuickSight for Glue job analytics.
- Conduct quarterly optimization reviews for cost and performance improvements.