Data Observability (Internship)
Azure VM, Docker Compose, OpenMetadata

Evaluating and Enhancing Data Observability
Supporting Documents
Introduction
During my four-month internship at Algorhythm, a data consultancy firm based in Belgium, I worked on a project to evaluate Data Observability tools and strengthen data governance. This page describes how the project was carried out, its impact on Algorhythm, and what I learned along the way.
Ⅰ. About Algorhythm
Algorhythm is a data consultancy firm specializing in data strategy, data science, data architecture, data engineering, and data governance. Its services include extending client teams with data experts, delivering end-to-end projects, and setting up data infrastructure. The firm works with a range of cloud and data platforms (Azure, AWS, Google Cloud, Oracle, Snowflake) and front-end tools (Power BI, Tableau, Qlik), tailoring solutions to each client's data needs.
Ⅱ. Internship Project
Project Description
The internship's primary goal was to conduct a comparative study of Data Observability tools, evaluate their capabilities, and determine the best fit for Algorhythm's managed services. The project evolved to include the development of a Dockerized modern data stack pipeline and the implementation of a Proof of Concept (POC) using OpenMetadata.
Project Approach and Accomplishments
Week 1-3: Objectives and Requirements Gathering
- Conduct meetings with stakeholders
- Document detailed requirements
- Identify objectives for data observability
Week 2-4: Market Research
- Conduct literature review
- Analyze features of various tools
- Shortlist tools for evaluation
Week 4-5: Evaluation Criteria Development
- Define metrics for tool assessment
- Create classification table
- Validate criteria with stakeholders
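A classification table of this kind can be reduced to a weighted score per tool. The sketch below is illustrative only: the criteria names, weights, tool names (`tool_a`, `tool_b`), and ratings are all hypothetical placeholders, not the criteria or results from the project.

```python
# Hypothetical criteria and weights (assumptions for illustration); weights sum to 1.0.
WEIGHTS = {
    "data_quality_tests": 0.30,
    "lineage": 0.25,
    "alerting": 0.25,
    "integration_effort": 0.20,
}


def weighted_score(ratings, weights=WEIGHTS):
    """Combine per-criterion ratings (1 to 5) into one weighted score."""
    missing = set(weights) - set(ratings)
    if missing:
        raise ValueError(f"missing ratings for: {sorted(missing)}")
    return sum(weights[name] * ratings[name] for name in weights)


def rank_tools(candidates, weights=WEIGHTS):
    """Return tool names ordered from highest to lowest weighted score."""
    return sorted(
        candidates,
        key=lambda tool: weighted_score(candidates[tool], weights),
        reverse=True,
    )


# Placeholder tools with made-up ratings, only to show the mechanics.
candidates = {
    "tool_a": {"data_quality_tests": 4, "lineage": 5, "alerting": 3, "integration_effort": 4},
    "tool_b": {"data_quality_tests": 3, "lineage": 2, "alerting": 5, "integration_effort": 3},
}

if __name__ == "__main__":
    for tool in rank_tools(candidates):
        print(tool, round(weighted_score(candidates[tool]), 2))
```

Keeping the weights in one dictionary makes it easy to revisit them with stakeholders and see how the ranking shifts.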
Week 5-10: Building the Sandbox Environment
- Configure Dockerized modern data stack
- Integrate tools like Airflow, Airbyte, dbt, Spark, Hive Metastore, S3, and Postgres DWH
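When wiring this many services together, startup order matters, which is roughly what `depends_on` expresses in a Docker Compose file. The sketch below derives one valid order from a dependency map using Python's standard `graphlib` module (Python 3.9+). The dependencies listed are assumptions about a typical setup of these tools, not the actual configuration used in the project.

```python
from graphlib import TopologicalSorter

# Assumed dependencies: each service maps to the services it needs running first.
# This is a guess at a typical layout, not the project's real Compose file.
DEPENDS_ON = {
    "postgres_dwh": set(),
    "s3": set(),
    "hive_metastore": {"postgres_dwh", "s3"},
    "spark": {"hive_metastore", "s3"},
    "airbyte": {"postgres_dwh"},
    "dbt": {"postgres_dwh"},
    "airflow": {"airbyte", "dbt", "spark"},
}


def startup_order(depends_on):
    """Return a service order in which every dependency starts before its dependents."""
    # TopologicalSorter raises graphlib.CycleError if the dependencies are circular.
    return list(TopologicalSorter(depends_on).static_order())


if __name__ == "__main__":
    for position, service in enumerate(startup_order(DEPENDS_ON), start=1):
        print(position, service)
```

Several orders can be valid for the same map; the only guarantee is that no service appears before something it depends on.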
Week 7-11: POC Implementation
- Configure OpenMetadata
- Run comprehensive tests
- Evaluate real-time alert capabilities
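The kind of test and alert evaluated in a POC like this can be illustrated with two common observability checks: a null-ratio test on a column and a freshness test on a table. This is a plain-Python sketch and does not use the OpenMetadata API; the function names, thresholds, and sample rows are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def null_ratio(rows, column):
    """Fraction of rows where the column is missing or None."""
    if not rows:
        return 0.0
    return sum(1 for row in rows if row.get(column) is None) / len(rows)


def run_checks(rows, column, last_loaded_at, now,
               max_null_ratio=0.1, max_age=timedelta(hours=24)):
    """Return a list of alert messages; an empty list means every check passed."""
    alerts = []
    ratio = null_ratio(rows, column)
    if ratio > max_null_ratio:
        alerts.append(
            f"null ratio {ratio:.0%} in '{column}' exceeds {max_null_ratio:.0%}"
        )
    age = now - last_loaded_at
    if age > max_age:
        alerts.append(f"table is stale: last load was {age} ago")
    return alerts


# Made-up sample data: one of four rows has no email.
sample_rows = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": None},
    {"id": 3, "email": "c@example.com"},
    {"id": 4, "email": "d@example.com"},
]

if __name__ == "__main__":
    now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
    last_load = now - timedelta(hours=30)
    for message in run_checks(sample_rows, "email", last_load, now):
        print("ALERT:", message)
```

In a real deployment the alert list would be routed to a notification channel rather than printed, and the thresholds would be agreed per table.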
Week 11-13: Vendor Engagement
- Contact vendors of shortlisted tools
- Gather pricing and feature information
- Compare costs and benefits
Week 12-14: Analysis and Documentation
- Analyze POC results
- Compile comprehensive report
- Present recommendations
Project Impact and Future Recommendations
The project enhanced data quality, monitoring, and governance within Algorhythm's managed services and provided practical insights into implementing OpenMetadata. Recommended next steps are to deploy the solution in a Kubernetes environment, integrate it with external data warehouses such as Redshift, BigQuery, and Snowflake, and implement advanced security measures. Continuous training and support for staff will help ensure the tool is used effectively, while regular monitoring and adjustment will keep it functioning well.
Ⅲ. Personal Reflection
This internship has been a transformative experience, both professionally and personally. It gave me the opportunity to apply theoretical knowledge in a real-world setting and gain hands-on experience with current data management tools. I developed technical skills in data observability tools, Docker, OpenMetadata, and various data stack components. I also strengthened my project management, problem-solving, and ability to adapt to new tools and technologies. On a personal level, I improved my communication skills and grew more confident in managing and executing complex projects.
Ⅳ. Conclusion
The internship at Algorhythm was an invaluable experience that significantly enhanced my professional and personal growth. The project exceeded its initial objectives, delivering practical insights and enhancing the company's data observability capabilities. This experience has equipped me with essential skills and knowledge, preparing me for future challenges in the field of data consultancy.