
Quality by Design

Automating validation of standards-based data, end-to-end

By: Julia Zhang and Sue Dubman

Genzyme

Organizations make strategic and operational decisions based on data. Poor quality data can negatively influence how a company is perceived in the marketplace, so it is critical that data quality be given the highest priority. With the development of data standards in the biopharmaceutical industry, businesses now have more opportunities to ensure data quality.


Using standards can increase process efficiency and effectiveness, saving time and money across the clinical trial data life cycle and improving compliance. Implementing standards, while a goal and a trend in clinical development, presents key challenges, including how to leverage different standards across the development life cycle. Genzyme has implemented, or is in the process of implementing, CDISC standards end-to-end, including PROTOCOL, CDASH, LAB, SDTM, ADaM and Controlled Terminology, with BRIDG as the underlying information model. To leverage these standards, we are building a Metadata Repository (MDR) to govern data collection, data processing and data submission and to support the use of the different standards enterprise-wide. To increase efficiency and effectiveness, a data validation tool is also needed: one that improves data quality, ensures that data provided by Genzyme’s many partners or by internal teams match all specified requirements, and builds in ‘quality by design.’


Here we discuss Genzyme’s vision for automating validation of CDISC-developed standards, end-to-end, as well as validation of company-specific standards. We also address how such a tool can facilitate strategic sourcing efforts.


Importance of Data Quality


Data quality isn’t just about the data. It is about people’s understanding of what it is, what it means, and how it should be used. If you can’t trust the data:


what else will you base your decisions on?

how much effort will be wasted checking and rechecking results to figure out which is correct?

how much effort will be spent on investigations, root cause analysis and the corrective actions required to fix data problems?

 

Consider an analogy: if you leave for the airport for an international trip but forget to bring your passport, how far can you go? As our enterprises begin their inevitable journey toward real-time competitiveness, that passport is data quality. Poor quality data will:


increase costs through wasted resources, 

increase costs through need to correct and deal with reported errors (rework),

increase costs through inability to optimize business processes,

lose revenue through customer dissatisfaction,

lose revenue through lowered employee morale, and

lose revenue through poorer decision making.

 

This is an event-driven, process-oriented world and quality data will be essential to success.


Data Quality Principles


Data quality plays an important role in our business world as well as in daily life. Data quality is not linear and has many dimensions, like accuracy, completeness, consistency, timeliness and auditability. Having data quality on only one or two dimensions is as good as having ‘no quality.’ There are many factors that influence data quality, such as data design, data process, data governance, data validation, and more. We address some of these principles below:


Data Design 


Designing data is about discovering and completely defining your application’s data characteristics and processes. It is a process of gradual refinement, from the coarse question “What data does your application require?” to the precise data structures and processes that provide it. With a good data design, your application’s data access is fast and easily maintained, and the design can gracefully accept future data enhancements. The process of data design includes identifying the data, defining specific data types and storage mechanisms, and ensuring data integrity through business rules and other run-time enforcement mechanisms. A good data design also addresses availability, manageability, performance, reliability, scalability and security.
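To make this concrete, the sketch below shows one way a variable definition could carry its data type, storage limit and business rules together, with run-time enforcement. The class, field names and checks are our illustrative assumptions, not the Genzyme or CDISC metadata model; AESEV and its severity code list are borrowed from SDTM controlled terminology.

    from dataclasses import dataclass
    from typing import Optional

    # Illustrative variable definition; the schema itself is hypothetical.
    @dataclass(frozen=True)
    class VariableDef:
        name: str                      # variable name, e.g. "AESEV"
        data_type: type                # storage type, e.g. str or float
        max_length: Optional[int]      # physical storage limit, if any
        codelist: Optional[frozenset]  # controlled terminology, if any
        required: bool                 # business rule: a value must be present

        def check(self, value) -> bool:
            """Run-time enforcement of the declared integrity rules."""
            if value in (None, ""):
                return not self.required
            if not isinstance(value, self.data_type):
                return False
            if self.max_length is not None and len(str(value)) > self.max_length:
                return False
            return self.codelist is None or value in self.codelist

    aesev = VariableDef("AESEV", str, 8,
                        frozenset({"MILD", "MODERATE", "SEVERE"}), required=True)
    print(aesev.check("MODERATE"))  # True
    print(aesev.check("minor"))     # False: not in the code list

Keeping the rules beside the type and storage information lets the same definition drive both the physical design and downstream validation.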


Data Governance


Data Governance can be defined as:


a set of processes that ensures that important data assets are formally managed throughout the enterprise,

a system of decision rights and accountabilities for information-related processes, executed according to agreed-upon models that describe who can take what actions with what information, and when, under what circumstances, using which methods,

a quality control discipline for assessing, managing, using, improving, monitoring, maintaining and protecting organizational information,

putting people in charge of fixing and preventing issues with data so that the enterprise can become more efficient, and

using technology when necessary in many forms to help aid the process.

 

Data Governance describes an evolutionary process for a company, altering the company’s way of thinking and setting up the processes to handle information so that it may be utilized by the entire organization. It ensures that data can be trusted and that people can be made accountable for any business impact of low data quality. To ensure data quality, data governance processes need to be developed. 


Data Validation


Data validation consists of the processes and technologies involved in ensuring that data values conform to business requirements and acceptance criteria. It uses routines, often called “validation rules” or “check routines,” that check the correctness, meaningfulness and security of data entered into the system. The rules may be implemented through the automated facilities of a data dictionary or by including explicit validation logic in application programs.
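A minimal sketch of such check routines, with the rules expressed as data rather than scattered program logic, might look like the following. The rule IDs and sample records are invented; USUBJID, AESTDTC and the ISO 8601 date convention come from SDTM.

    import re

    # Each rule: (id, description, predicate over one record).
    # Rule IDs are hypothetical, not an official CDISC check list.
    RULES = [
        ("GZ001", "Subject identifier must be present",
         lambda r: bool(r.get("USUBJID"))),
        ("GZ002", "Start date must be ISO 8601 (YYYY-MM-DD)",
         lambda r: re.fullmatch(r"\d{4}-\d{2}-\d{2}", r.get("AESTDTC", "")) is not None),
    ]

    def validate(records):
        """Return one issue per failed rule per record."""
        return [(i, rule_id, description)
                for i, record in enumerate(records)
                for rule_id, description, predicate in RULES
                if not predicate(record)]

    data = [
        {"USUBJID": "STUDY1-001", "AESTDTC": "2010-02-15"},   # passes both rules
        {"USUBJID": "",           "AESTDTC": "15-FEB-2010"},  # fails both rules
    ]
    print(validate(data))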


What can help us to appropriately implement these data quality principles in the business process? We think it is important and useful to leverage industry ‘common sense’ by implementing CDISC standards as a basis for data design and validation.


Standards in Clinical Trial Development


The CDISC standards have been developed to support the streamlining of processes within medical research, from the production of clinical research protocols through reporting and/or regulatory submission, warehouse population and/or archiving, and post-marketing studies and safety surveillance. The table below indicates the CDISC standards supporting clinical development.



Using standards proactively in clinical trial development has business benefits, as illustrated below:


facilitates the understanding, characterization and management of data over its lifecycle,

allows multiple systems to exchange information, and to evaluate the relationships among data items automatically, without a human having to tell a computer how the items are related,

allows multiple systems to move and share data transparently between functions within organizations, suppliers, external partners and regulatory bodies,

supports integrated information analysis resulting in better design of targeted therapies, 

provides improved efficiency in the planning and conduct of clinical trials, 

improves communication and enhances coordination between clinical trial sites, sponsors and regulatory agencies,

streamlines business processes from protocol through reporting, and reduces time and cost of clinical trials,

improves data quality in terms of both efficiency and effectiveness,

enhances compliance and reduces risks,

makes outsourcing easier to manage,

provides a technical foundation for data integration and establishes data consistency, and

decreases learning periods over time.

 


With all these listed advantages for using standards, Genzyme has implemented or is in the process of implementing CDISC standards end-to-end. At the same time, we are building a Metadata Repository, which will help us to develop consistent and reliable means to apply standards in our business processes, and leverage information throughout the enterprise.


Data Design Workflow


There are many factors that impact data quality, as we stated above. Implementing standards appropriately streamlines business processes and enhances data quality. At Genzyme, we have developed a data governance process and are setting the process for data design from protocol development and data collection to reporting and submission. The figure below sketches our data design workflow based on developed standards.



Data Validation Tool


Data quality is an area fraught with tough challenges; for instance, the actual damage done by dirty data is not always clearly visible. High quality clinical trial data are critical for the pharmaceutical industry, benefiting patients, sponsors and regulatory agencies. To improve data quality, data validation is a must-have process: it ensures that correct metadata are used in data collection, transmission and derivation, and it identifies outliers and data errors.


Data validation can generally be defined as a systematic process that compares a body of data to the requirements in a set of documented acceptance criteria. With the many standards initiatives under way at Genzyme, we have implemented, or plan to implement, a standard protocol, standard CRFs, a standard central lab, SDTM/ADaM and many other standards. All of these efforts will help Genzyme improve data quality. However, to ensure that data quality is consistent with our standards as specified, a data validation tool is needed. This tool can provide data quality checks based on the implemented standards and metrics to gauge our data quality.


The vision and ultimate goals resulting from this Data Validation Tool (DVT) are to:


be able to check CRF, Central Lab, SDTM and ADaM data, as well as define.xml files, against CDISC standards and Genzyme-specific requirements, to ensure that Genzyme receives, produces and submits quality data,

align with the Genzyme Metadata Repository to ensure metadata validation, and

automate and streamline data validation processes.

 

The figure below indicates the basic business workflow from clinical data collection and data process to regulatory submission using standards with a DVT:


All standards used should be from a central Metadata Repository (MDR). This will ensure that the harmonized standards including terminology can be used effectively and efficiently throughout the entire lifecycle of a clinical program.
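As one small example of metadata-level validation against centrally managed standards, the sketch below reads the variable names declared in a define.xml file and compares them with the columns of a received dataset. The file path and column set are placeholders; only the standard ODM 1.3 namespace used by define.xml is assumed.

    import xml.etree.ElementTree as ET

    ODM = "{http://www.cdisc.org/ns/odm/v1.3}"  # ODM 1.3 namespace

    def defined_variables(define_path):
        """Variable names declared as ItemDef elements in define.xml."""
        root = ET.parse(define_path).getroot()
        return {item.get("Name") for item in root.iter(ODM + "ItemDef")}

    # Placeholder inputs: a local define.xml and the columns actually received.
    received_columns = {"STUDYID", "USUBJID", "RFSTDTC", "EXTRA1"}
    declared = defined_variables("define.xml")

    print("Received but not declared:", received_columns - declared)
    print("Declared but not received:", declared - received_columns)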


The figure below summarizes the DVT’s basic capabilities, potential users and some requirements. For business applications, data validation can be implemented through declarative data integrity rules or through procedure-based business rules. Data that does not conform to these rules will negatively affect business process execution, so data validation should start with the business process definition and the set of business rules within that process. To develop a successful data validation tool, we need to set up a validation governance process and develop Genzyme-specific data validation requirements based on our business needs and processes.
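The sketch below contrasts the two rule styles: a declarative integrity rule states what must hold and is interpreted by a generic engine, while a procedural business rule encodes cross-field logic directly as code. All field names, code lists and ranges here are hypothetical.

    from datetime import date

    # Declarative style: constraints are data, applied by one generic routine.
    DECLARATIVE_RULES = {
        "AGE":   {"type": int, "min": 0, "max": 120},
        "ARMCD": {"type": str, "codelist": {"PLACEBO", "DRUG_A"}},
    }

    def check_declarative(record):
        issues = []
        for field, spec in DECLARATIVE_RULES.items():
            value = record.get(field)
            if not isinstance(value, spec["type"]):
                issues.append(f"{field}: wrong or missing type")
            elif "codelist" in spec and value not in spec["codelist"]:
                issues.append(f"{field}: not in code list")
            elif "min" in spec and not spec["min"] <= value <= spec["max"]:
                issues.append(f"{field}: out of range")
        return issues

    # Procedural style: a business rule written as ordinary code.
    def check_procedural(record):
        """Treatment end must not precede treatment start."""
        if record["trt_end"] < record["trt_start"]:
            return ["treatment end date precedes start date"]
        return []

    rec = {"AGE": 45, "ARMCD": "DRUG_A",
           "trt_start": date(2010, 1, 5), "trt_end": date(2010, 1, 2)}
    print(check_declarative(rec) + check_procedural(rec))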



The data validation rules will be based on industry-standard data checks that have already been developed, such as the SDTM, ADaM, define.xml and SEND validation checks, as well as CDASH and Central Lab validation checks that are still to be developed. In addition, we will develop company-specific validation requirements based on Genzyme’s business needs. The DVT can act as a gatekeeper for outsourced study activities: before delivering data to Genzyme, CROs, EDC vendors and other third parties will be asked to load their data into the validation tool. If the data pass the DVT checks, they are delivered to Genzyme; if not, the load is returned to the sender, who must correct the reported issues and ensure the data satisfy the requirements. The DVT can also help improve communication among different functional groups, CROs, EDC vendors and Central Labs. In addition, we plan to add further features, such as customized GUIs that will allow users to add study-specific checks. The figure below shows how the DVT can be used:
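A rough sketch of this gateway idea, with industry-standard and company-specific check sets composed into a single accept-or-return decision, follows. Every function, check and study name here is hypothetical; real check sets would implement the published SDTM, ADaM and define.xml rules.

    # Hypothetical check functions; each returns a list of issue strings.
    def usubjid_present(dataset):
        return [f"row {i}: missing USUBJID"
                for i, row in enumerate(dataset) if not row.get("USUBJID")]

    def company_study_ids(dataset):
        known = {"GZ-001", "GZ-002"}  # hypothetical Genzyme study codes
        return [f"row {i}: unknown STUDYID"
                for i, row in enumerate(dataset) if row.get("STUDYID") not in known]

    INDUSTRY_CHECKS = [usubjid_present]   # stand-in for SDTM/ADaM rule sets
    COMPANY_CHECKS = [company_study_ids]  # stand-in for Genzyme requirements

    def gateway(dataset):
        """Accept the delivery only if every check passes; otherwise return it."""
        issues = [issue for check in INDUSTRY_CHECKS + COMPANY_CHECKS
                  for issue in check(dataset)]
        return ("delivered to Genzyme" if not issues else "returned to sender",
                issues)

    delivery = [{"STUDYID": "GZ-001", "USUBJID": ""}]
    print(gateway(delivery))  # ('returned to sender', ['row 0: missing USUBJID'])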



The DVT’s use in the data management functional area illustrates the workflow in more detail. Assume that before delivering data to the Statistical Programming group for data derivation and analysis, a Data Manager (DM) loads “clean” data into the DVT, where “clean” means the data pass all specified edit checks. The DM can be an in-house data manager or one from a CRO, EDC vendor, etc. If the loaded data pass validation, the DM sends the data to the Statistical Programming group for derivation and analysis. If they do not, the loader has to resolve the reported issues and then reload the data into the DVT. This process can enhance communication between the DM and the Statistical Programmer, clarify responsibilities and improve data quality.

This is a customized tool, and we plan to build it on OpenCDISC Validator technology in phases. The first phase will validate against all implemented standards; later phases will add customized requirements, such as therapeutic area-specific or study-specific checks and a GUI.
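The load/fix/reload handoff described above might be orchestrated along the following lines. Here run_dvt_checks is a hypothetical stand-in for the real validator engine, assumed to return an empty issue list once the data are clean; the simulated results are for illustration only.

    # Simulated validator output: one failing attempt, then a clean one.
    _RESULTS = iter([["AESTDTC not ISO 8601 in row 12"], []])

    def run_dvt_checks(dataset):
        """Hypothetical stand-in for the DVT engine; returns open issues."""
        return next(_RESULTS)

    def deliver(dataset, max_attempts=3):
        """Load, validate and reload until the DVT reports no issues."""
        for attempt in range(1, max_attempts + 1):
            issues = run_dvt_checks(dataset)
            if not issues:
                print("delivered to Statistical Programming")
                return True
            print(f"attempt {attempt}: {len(issues)} issue(s) returned to loader")
            # in practice the data manager resolves the issues and reloads here
        return False

    deliver([{"USUBJID": "STUDY1-001"}])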


Using a well-designed Data Validation Tool will help us to reduce many potential risks in data processes, such as:


risk of accepting poor quality data from CROs,

risk of analyzing poor quality data,

risk of submitting low quality data to regulatory agencies,

risk of rework or duplicated effort on similar data issues due to the lack of a data validation process,

an inefficient, or nonexistent, data validation process, and

not using resources smartly.

 

The Data Validation Tool will help us improve data quality and streamline communication. Thus it will improve the efficiency and effectiveness of our business processes and deliver business ROI.

 

Julia Zhang is associate director, Global Biomedical Informatics at Genzyme. She can be reached at julia.zhang@genzyme.com. 


Sue Dubman is senior director, Global Biomedical Informatics Standards and Architecture at Genzyme. She can be reached at sue.dubman@genzyme.com.
