Latest news with #PySpark


Business Wire
12-06-2025
- Business
- Business Wire
Affinity Solutions' Consumer Purchase Insights is Now Accessible via AWS Clean Rooms to Enable Advertisers and Publishers to Securely Drive Outcomes Measurement
NEW YORK--(BUSINESS WIRE)--Affinity Solutions, the leading consumer purchase insights company, today announced it is teaming up with Amazon Web Services (AWS) to provide advertisers and publishers with privacy-enhancing controls to securely access Affinity's Consumer Purchase Insights using AWS Clean Rooms. Advertisers and publishers will have more control over how they activate, measure, and optimize campaigns by securely collaborating with Affinity via AWS Clean Rooms and receiving access to fully permissioned consumer purchase insights from more than 95 million consumers, representing over 86 billion transactions—all without sharing or copying one another's underlying data.

Forward-thinking companies seek to generate unique data-driven insights to clearly measure outcomes. To do so, they often need to strategically collaborate with their partners to analyze their collective data sets. However, with increasing privacy regulations and consumer expectations for data protection, companies and their partners need solutions to collaborate safely, without compromising data privacy. AWS Clean Rooms provides a secure and controlled environment that helps Affinity Solutions and its customers safely integrate Affinity's deterministic transaction data with other data sets, without having to share or reveal raw data.

Affinity's Consumer Purchase Insights provides a complete and granular view of spending behaviors across brands and categories to inform a wide array of growth strategies. With this expanded relationship, advertisers and publishers can now match and measure their first-party data with Consumer Purchase Insights within a trusted environment, providing a more comprehensive view of the customer journey.

'As first-party data remains critical for advertisers to drive business growth, having the right safeguards in place to leverage that data responsibly is essential,' said Ken Barbieri, Senior Vice President, Business Development at Affinity Solutions. 'By expanding our collaboration with AWS, we're empowering advertisers and publishers to measure real-world outcomes with greater speed, precision, and confidence, while providing tools for customers to comply with today's data-privacy standards.'

Additionally, advertisers and publishers can measure and model campaign outcomes with greater flexibility, applying their own custom ML models and methodologies in AWS Clean Rooms. With AWS Clean Rooms ML, companies can apply privacy-enhancing controls to safeguard their proprietary data and ML models with partners while generating predictive insights—all without sharing or copying one another's raw data or models. Using PySpark in AWS Clean Rooms, companies and their partners can bring their own PySpark code and libraries and run advanced analyses across large datasets without having to share underlying data or proprietary analysis methods. These capabilities help marketers analyze Affinity's Consumer Purchase Insights alongside their own proprietary outcomes data, creating opportunities to enhance the measurement of their return on ad spend. Furthermore, the always-on capability enables brands already storing their data on AWS to seamlessly analyze their datasets using AWS Clean Rooms, accelerating the path to actionable campaign insights and measurable ROI.

'Our industry is continuously raising the bar on privacy standards and the technology used to gain insights while providing mechanisms for customers to protect their data. We're pleased Affinity Solutions is leveraging AWS Clean Rooms to offer their customers a secure environment where they can analyze Affinity Solutions' transaction data alongside their own first-party data, driving valuable insights and potentially improving business outcomes,' said Eric Saccullo, Senior Business Development Manager, Applied AI Solutions for Advertising & Marketing at AWS.

To learn more about Affinity Solutions, please visit

About Affinity Solutions

Affinity Solutions is the leading consumer purchase insights company. We provide a complete view of U.S. and U.K. consumer spending, across and between brands, via exclusive access to fully permissioned data from over 150 million debit and credit cards. Our proprietary AI technology, Comet™, transforms these purchase signals into actionable insights for business and marketing leaders to drive optimal outcomes and build lasting customer relationships. Visit us at to discover how we're shaping the future of consumer purchase insights.
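For readers curious what such an analysis might look like in practice, below is a minimal PySpark sketch of the kind of match-and-measure job the release describes: joining an advertiser's first-party exposure data with purchase insights to estimate return on ad spend. All table paths and column names are hypothetical, and the sketch deliberately omits the AWS Clean Rooms collaboration setup, analysis-template packaging, and privacy controls that the service itself applies.

```python
# Illustrative sketch only: hypothetical inputs, no Clean Rooms-specific setup.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("roas_sketch").getOrCreate()

# Hypothetical inputs; in a real collaboration these would be the configured,
# permissioned tables exposed to the job rather than open S3 paths.
exposures = spark.read.parquet("s3://advertiser-bucket/campaign_exposures/")  # campaign_id, match_key, ad_spend
purchases = spark.read.parquet("s3://provider-bucket/purchase_insights/")     # match_key, txn_amount

# Total spend per campaign, computed from the advertiser's own data.
spend = exposures.groupBy("campaign_id").agg(F.sum("ad_spend").alias("total_spend"))

# Sales attributed to exposed consumers: dedupe exposures per consumer, then
# match against purchase signals and aggregate to campaign level, so only
# aggregate outcomes (not row-level records) leave the analysis.
sales = (
    exposures.select("campaign_id", "match_key").dropDuplicates()
    .join(purchases, on="match_key", how="inner")
    .groupBy("campaign_id")
    .agg(
        F.sum("txn_amount").alias("attributed_sales"),
        F.countDistinct("match_key").alias("matched_consumers"),
    )
)

roas = sales.join(spend, on="campaign_id").withColumn(
    "roas", F.col("attributed_sales") / F.col("total_spend")
)
roas.show()
```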


India.com
26-04-2025
- Business
- India.com
Transforming Data Landscapes: A Conversation with Raghu Gopa
Raghu Gopa is a seasoned data engineering professional with over 12 years of experience in data warehousing and ETL development. With a Master's in Information Assurance from Wilmington University, Raghu balances rich theoretical knowledge with hands-on experience. His career has spanned diverse domains in which he has demonstrated expertise at the highest levels in the design, development, and implementation of cutting-edge data solutions.

Q1: Why data engineering and cloud technologies?

A: I was drawn to how organizations extract insights from data and use them to make strategic decisions. The idea of raw data being transformed into actionable insights for business value fascinated me. At the same time, cloud technology was becoming the prevailing way to manage and process data. The combination of lower infrastructure costs and the ability to build scalable, flexible solutions that process petabyte-scale information was something I wanted to pursue. I'm excited about creating synergy between technology and business needs so that organizations can truly be data-driven.

Q2: What methodology do you apply when migrating an on-premises data warehouse to a cloud platform?

A: It takes a balance of technical and business understanding. I begin with a deep analysis of the current data architecture, mapping dependencies, performance bottlenecks, and business-critical processes. From there I work out a phased migration plan that minimizes disruption while capturing the maximum benefit from cloud services. The on-premises functionality is replicated, and AWS services such as Lambda, Step Functions, Glue, and EMR are used to enhance the design of the pipelines. One of my most successful projects was building direct loading from a PySpark framework into Snowflake, which improved data management operational efficiency by 90%. Migration should be viewed as modernization and optimization of the entire data ecosystem rather than just a lift-and-shift exercise.

Q3: How do you ensure data quality and governance in a large-scale data project?

A: Data quality and governance are must-haves for any successful data project. I put validation frameworks in place at different levels of the data pipeline, performing thorough quality checks on structure and business rules as well as referential checks on constraints. For governance, I implement data lineage tracking, access controls, and audit mechanisms, along with encryption and masking of sensitive information such as PII. One project achieved 100% data accuracy and consistency by integrating these quality and governance practices directly into the PySpark framework. I firmly believe quality and governance need to be built in from the beginning rather than bolted on later.
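To make the patterns in Q2 and Q3 concrete, here is a condensed PySpark sketch of a validation step feeding a direct load into Snowflake. The table names, validation rules, and connection settings are hypothetical, and the write assumes the Snowflake Spark connector is available on the cluster; this is an illustration of the approach described, not Raghu's actual framework.

```python
# Hypothetical validate-then-load pipeline: quality checks in PySpark,
# validated rows written straight to Snowflake via the Spark connector.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("validate_and_load").getOrCreate()

orders = spark.read.parquet("s3://raw-zone/orders/")        # hypothetical source
customers = spark.read.parquet("s3://raw-zone/customers/")  # hypothetical reference table

# Structural and business-rule checks: required fields present, amounts positive.
valid = orders.filter(
    F.col("order_id").isNotNull()
    & F.col("customer_id").isNotNull()
    & (F.col("order_amount") > 0)
)

# Referential check: every order must point at a known customer.
valid = valid.join(customers.select("customer_id"), on="customer_id", how="left_semi")

# Quarantine rejected rows for review rather than silently dropping them.
rejected = orders.subtract(valid)
rejected.write.mode("append").parquet("s3://raw-zone/orders_rejected/")

# Direct load into Snowflake (connection details are illustrative only).
sf_options = {
    "sfURL": "myaccount.snowflakecomputing.com",
    "sfUser": "etl_user",
    "sfPassword": "***",
    "sfDatabase": "ANALYTICS",
    "sfSchema": "SALES",
    "sfWarehouse": "LOAD_WH",
}

(valid.write.format("net.snowflake.spark.snowflake")
    .options(**sf_options)
    .option("dbtable", "ORDERS_CLEAN")
    .mode("append")
    .save())
```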
Q4: What challenges have you faced when working with big data technologies, and how did you overcome them?

A: One of the biggest challenges has been optimizing performance while managing costs. Big data systems can quickly become inefficient without careful architecture. I've addressed this by implementing partitioning strategies in Hive and Snowflake, pushing computations down with Snowpark, and optimizing Spark applications with proper resource allocation. Another significant challenge was integrating real-time and batch processing systems. To solve this, I implemented solutions using Kafka and Spark Streaming, creating a unified data processing framework. By converting streaming data into RDDs and processing it in near real time, we were able to provide up-to-date insights while maintaining system reliability. The key to overcoming these challenges has been continual learning and experimentation. The big data landscape evolves rapidly, and staying ahead requires a commitment to testing new approaches and refining existing solutions.

Q5: How do you collaborate with cross-functional teams to ensure data solutions meet business requirements?

A: Effective collaboration begins with establishing a common language between technical and business teams. I serve as a translator, helping business stakeholders articulate their needs in terms that can guide technical implementation while explaining technical constraints in business-relevant terms. Regular communication is essential. I establish structured feedback loops through agile methodologies, including sprint reviews and demonstrations of incremental progress. This helps maintain alignment and allows for course correction when needed. One of my key achievements has been developing Power BI and Tableau dashboards that connect to Snowflake, giving business users intuitive access to complex data insights. By involving stakeholders in the design process, we ensured the dashboards addressed their actual needs rather than what we assumed they wanted. This approach has consistently resulted in higher user adoption and satisfaction.

Q6: What tools and technologies do you find most impactful in your data engineering toolkit?

A: Great question; my toolkit is constantly changing, yet a core set of technologies has almost always remained in it. In the AWS ecosystem, Glue for ETL, Lambda for serverless execution, and S3 for cost-effective storage form the backbone of many solutions I build. For data processing, PySpark is the most flexible tool; its scalability and flexible APIs help me efficiently process both structured and semi-structured data. Snowflake leads innovation in the data warehouse space by separating compute from storage, allowing resources to scale dynamically with the workload. Airflow and Control-M are my tools for orchestrating and scheduling pipelines with complex dependencies to guarantee reliable execution. From there it is on to visualization: Power BI and Tableau translate sophisticated data into operational insights for business users. Ultimately it's not about specific tools but whether you can assemble the right combination of technologies to solve a business problem while leaving yourself options for the future.

Q: How do you approach performance optimization?

A: Optimization is an art and a science at the same time. I begin with a data-driven approach, establishing baselines and identifying bottlenecks through profiling and monitoring. That includes reviewing query execution plans, resource utilization, and data flow across the various stages of the pipeline. For Spark programs, tuning partition sizes, minimizing data shuffling, and sizing executor resources correctly are important. In database-centric setups, we implement the right indexing strategies, query optimization, and caching mechanisms. One of the trickiest optimizations I've done used Snowpark to push computations down to Snowflake's processing engine and minimize data movement. I also design data models around the expected access patterns, whether that means denormalizing for analytic workloads or leveraging strategic partitioning for faster query response. Performance optimization is an ongoing process, not an end in itself. We set up monitoring solutions to catch early signs of performance degradation so that we can tune proactively rather than troubleshoot reactively.
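As a concrete illustration of the Spark-side tuning mentioned above (shuffle-partition control, broadcast joins to avoid shuffles, and explicit executor sizing), a sketch might look like the following. The configuration values, paths, and column names are assumptions for illustration, not recommendations for any particular workload.

```python
# Illustrative tuning knobs; real values depend on data volume and cluster size.
from pyspark.sql import SparkSession, functions as F

spark = (
    SparkSession.builder.appName("tuning_sketch")
    # Executor sizing normally comes from spark-submit or cluster config;
    # shown here only to make the knobs explicit.
    .config("spark.executor.memory", "8g")
    .config("spark.executor.cores", "4")
    .config("spark.sql.shuffle.partitions", "200")  # match partition count to data volume
    .getOrCreate()
)

transactions = spark.read.parquet("s3://lake/transactions/")  # large fact table (hypothetical)
merchants = spark.read.parquet("s3://lake/merchants/")        # small dimension table (hypothetical)

# Broadcasting the small dimension avoids shuffling the large fact table.
enriched = transactions.join(F.broadcast(merchants), on="merchant_id", how="left")

# Repartition by the write key so output files are evenly sized and can be
# partition-pruned on read.
(enriched.repartition("txn_date")
    .write.mode("overwrite")
    .partitionBy("txn_date")
    .parquet("s3://lake/transactions_enriched/"))
```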
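The Kafka and Spark Streaming integration described in the answer to Q4 could be sketched as below. Where the answer refers to converting streams into RDDs with the older DStream API, this example uses Spark Structured Streaming instead, since it shares the DataFrame API with batch jobs. The topic name, event schema, broker address, and checkpoint path are hypothetical, and the job assumes the Spark Kafka connector package is available on the cluster.

```python
# Hypothetical near-real-time aggregation over a Kafka topic of order events.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("streaming_sketch").getOrCreate()

event_schema = StructType([
    StructField("order_id", StringType()),
    StructField("amount", DoubleType()),
    StructField("event_time", TimestampType()),
])

# Read micro-batches from Kafka; the value column arrives as bytes and is parsed as JSON.
events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")
    .option("subscribe", "orders")
    .load()
    .select(F.from_json(F.col("value").cast("string"), event_schema).alias("e"))
    .select("e.*")
)

# Near-real-time revenue per minute, with a watermark to bound state for late events.
per_minute = (
    events.withWatermark("event_time", "10 minutes")
    .groupBy(F.window("event_time", "1 minute"))
    .agg(F.sum("amount").alias("revenue"))
)

query = (
    per_minute.writeStream.outputMode("update")
    .format("console")  # a real pipeline would write to a sink such as S3 or a warehouse
    .option("checkpointLocation", "s3://lake/checkpoints/orders_per_minute/")
    .start()
)
query.awaitTermination()
```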
Q7: Do you have any advice for someone wanting to become a data engineer?

A: Master the fundamentals first: database design, SQL, and programming. The surrounding technologies will change over time, but the value of these core skills will remain. Learn concepts such as data modeling, ETL, and data quality before stepping into the big data frameworks. Become proficient in at least one of the popular programming languages in data engineering, such as Python or Scala. Get hands-on experience with real projects; open data sets available online are a good starting point. Stay curious and keep expanding your knowledge, because the field is growing fast; be ready to spend time exploring new technologies and the latest developments. Subscribe to industry blogs and communities, and consider pursuing certifications such as AWS Solutions Architect. Finally, work on communication. The best data engineers connect the dots between technical implementation and business value by explaining complex concepts to all stakeholders in an organization in simple terms.

Q8: How do you see data engineering changing in the coming years?

A: The transformational trend is the blending of traditional data warehouse and data lake approaches into hybrid architectures, now called data lakehouses, which combine the structure and performance of warehouses with the flexibility and scalability of lakes. Several more changes will follow. Much of the routine work in data pipeline development, optimization, and maintenance will be handled by intelligent automation, so the real change for data engineers will be a shift toward higher-value activities such as architecture design and business enablement. The separation between batch and real-time processing will continue to fade, with unified processing frameworks becoming the norm. AI/ML capabilities will be embedded more deeply within these platforms, enabling even more sophisticated analysis and prediction on the data. Finally, as the discipline matures and companies become increasingly aware of what good data governance really means, governance, security, and privacy are likely to become even bigger aspects of how data engineering is done.

Q9: What has been your most challenging project, and what did you learn from it?

A: Among several difficult projects, one that stands out involved migrating a complex on-premises data warehouse to AWS while simultaneously modernizing the architecture for real-time analysis. The system supported key business functions, so extended downtime had to be avoided and dual environments had to be maintained throughout the migration. We faced many technical challenges, including data type incompatibilities and performance issues with early pipeline designs. An approaching hardware lease expiration added further pressure by effectively squeezing the project timeline.
Our migration strategy succeeded because it was methodical: prioritizing critical data flows, building adequate testing frameworks, and monitoring with fine granularity. We never stopped communicating with stakeholders about what was realistic and kept them informed of progress in a timely manner. The overarching lesson was how critically important it is to remain resilient and adaptable. No matter how well you plan, something unexpected will come along, so building an architecture that is flexible to change, along with a problem-solving mindset, is essential. I also took home a lesson about incremental delivery: focus on bringing business value in incremental chunks instead of attempting a 'big bang' migration. The experience taught me that an excellent technical solution is not enough; a crystal-clear stakeholder management strategy is essential, with proper communication and a process for balancing the ideal solution against practical constraints.

About Raghu Gopa

Raghu Gopa is a data engineering professional with over 12 years of experience across multiple industries. Holding a Master's in Information Assurance from Wilmington University, he specializes in areas such as data warehousing, ETL processes, and cloud migration strategy. With strong knowledge of AWS services, the Hadoop ecosystem, and modern data processing frameworks such as Spark, Raghu, an AWS Solutions Architect, combines technical expertise with business sense to deliver data solutions that drive organizational success.