CareLex Specification V1.0 for eTMF

CareLex Specification V1.0 has been released.  View it at

What is an eTMF system and why do you need one?

eTMF.Org Whitepaper

July 2014 update:  

The global standards group OASIS open has published the OASIS eTMF Standard draft specification.  Visit for details.

History and Background

Every organization involved in clinical trials in the BioPharma industry maintains a trial master file or ‘TMF’ comprised of thousands of pages of regulatory documents required for each clinical trial. For the majority of clinical trials, these regulatory documents are primarily paper documents kept centrally in physical file cabinets. These paper-based TMFs are a centralized set of documents that are typically used to support and comply with applicable regulatory requirements and Good Clinical Practices. Traditionally the required documents, document names, document classification scheme and document content requirements vary from sponsor to sponsor, creating a high degree of variability and inconsistency in the TMF. Many sponsors established internal TMF standards; however, for many sponsors and clinical trial stakeholders, no classification scheme or 'content model' for the content or documents in a TMF existed.

In 2008 a working group was formed to create a TMF reference model to help classify TMF documents. In 2010 the group released a TMF reference model for paper TMFs. This model was a good starting point to capture some commonly used Trial Master File document names and document descriptions.

However, many organizations involved in BioPharma clinical trials want to move from paper-based document management systems contained in file cabinets to online electronic document management systems where documents are stored online in electronic archives. Organizations using electronic content management systems can achieve higher levels of clinical trial productivity, cost savings and can improve clinical trial quality while trimming clinical trial study timelines.

Keys to Success in Moving from Paper-based TMF Archives to Electronic TMF Archives

In order to move toward an all-electronic TMF or eTMF system, organizations typically use an Enterprise Content Management system (ECM) to manage clinical trial regulatory documents. The ECM-based eTMF provides automated methods and workflows to collect, classify, index, archive and report on documents and content. Digital signatures may be used to minimize paper-based signature capture, minimize handling processes and significantly cut mail and overnight delivery expenses. At the base of any ECM system is a schema or classification system, document tagging terms or 'metadata,' and a database, also known as a repository, which retains the eTMF electronic documents for search, reporting and other management tasks.

While the paper-based TMF reference model is a great starting point for managing paper-based TMFs, it lacks several core foundational components that would make it suitable for use as a schema for an eTMF.

An effective eTMF system model builds on the following foundational components:

  • Machine readable classification scheme - The ability of a computer to read the classification scheme and to use it to create the online electronic TMF repository enables consistency, productivity and interoperability
  • Published, standards-based terms available in machine readable format
  • Automated digital signature capture option to minimize paper handling
  • Automated document audit trail and workflow history
  • Based on web standards - Most ECM systems support XML, HTTP or other web standards to exchange, view and manage eTMF content
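As a rough illustration of what "machine readable" means here, the sketch below parses a small XML classification scheme and derives repository folder paths from it. The element names, attribute names and sample entries are hypothetical and are not taken from any published eTMF standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical classification scheme. Element and attribute names are
# illustrative only and do not come from a published eTMF standard.
SCHEME_XML = """
<classification name="Example eTMF">
  <category id="01" name="Trial Management">
    <contentType id="01.01" name="Trial Master File Plan"/>
    <contentType id="01.02" name="Monitoring Plan"/>
  </category>
  <category id="02" name="Site Management">
    <contentType id="02.01" name="Investigator CV"/>
  </category>
</classification>
"""


def repository_paths(xml_text):
    """Return one folder path per content type defined in the scheme."""
    root = ET.fromstring(xml_text.strip())
    paths = []
    for category in root.findall("category"):
        for content_type in category.findall("contentType"):
            paths.append(
                f"{root.get('name')}/"
                f"{category.get('id')} {category.get('name')}/"
                f"{content_type.get('id')} {content_type.get('name')}"
            )
    return paths


if __name__ == "__main__":
    for path in repository_paths(SCHEME_XML):
        print(path)
```

Because the scheme is data rather than a human-only spreadsheet, the same file could in principle drive folder creation in any ECM that can read XML.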

In order to gain the benefits offered by electronic automation of paper-based processes in clinical trial regulatory document management, it's important to consider the underlying foundational schema or 'content model' that will be used to implement an ECM for eTMFs. While the paper-based TMF Reference model is useful as a starting point to create an eTMF content model, it lacks many of the core foundational components highlighted above. This makes the TMF Reference model unsuitable for use as a content model in eTMF deployment. The paper-based TMF Reference model has no provision for digital signatures and no specification for how files should be tagged with metadata or how TMF repositories and archives can be exchanged or made searchable on an internet/intranet. The paper-centric TMF Reference model is human readable but not machine readable, so it cannot be imported directly into an ECM system. Even if the paper-centric TMF Reference model were converted to a machine readable format, it would still not provide the foundational components necessary to support electronic document handling workflows, search and interoperability.

Business Rationale for moving from Paper TMFs to automated eTMF systems

Organizations today operate at ‘internet speed’ and users expect to access, exchange and manage documents electronically, anytime, anywhere, from any device. While paper copies of TMF documents may exist in parallel with the electronic TMF record as a backup, most businesses will access and primarily use the electronic version of the TMF. The eTMF archive will therefore become the primary Trial Master File resource. The business rationale and justifications for moving from paper-based TMFs to eTMF systems include the following:

  • Reduce business risk - systems provide confidence that you have met agency regulatory compliance requirements
  • Enhanced document quality - automated systems have been proven to make fewer errors than manual paper handling processes; ability to implement automated quality control processes
  • Improved team productivity - sharing, viewing documents anytime, anywhere from any device is faster than manual paper retrieval
  • Reduced auditing and reporting costs - automated reporting and retrieval of ECM based systems can significantly reduce auditing and reporting labor and travel costs

Stakeholders in the clinical trial, including agencies, sponsors, investigators and clinical research organizations, increasingly expect to have online and offline access to the clinical trial eTMF archive to accelerate document review, compliance and audit, quality control and trial oversight. Additionally, as clinical trials expand their presence to engage, recruit and enroll patients via online social networks, web-based document content will become increasingly important to capture in eTMF archives.

In order to deliver on business expectations and rationale for moving from paper to eTMF systems, eTMF systems should enable organizations to automate the manual processes surrounding the collection, archival, management and quality control of documents and content.  The foundational components, features and benefits to consider before implementing any eTMF system are outlined below.

What is an eTMF System?

What is an electronic Trial Master File system? At its most basic level, an eTMF system is the application of an electronic document system or a content management system to automate manual paper-based Trial Master File processes. In an eTMF, documents and content are stored centrally on a computer server, typically using a document management system, and exchanged with others through either a corporate intranet or a secure internet connection that can be accessed with auditable security. Use of an eTMF system can provide cost savings, time savings and risk reduction for organizations conducting clinical trials.

The key to cost efficiency, interoperability and flexibility in any eTMF system is the use of a standards-based machine readable content model which includes at a minimum the following: 1) Content classification scheme: a flexible content classification scheme that supports addition of organization-specific content classifications without renumbering; 2) Vocabulary terms: published metadata and content classification terms based on widespread use and adoption by industry groups; 3) Web standards-based syntax for exchange and sharing of content archive documents and information.
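One way to picture "addition of organization-specific content classifications without renumbering" is a scheme in which locally added content types take identifiers from a reserved range, so that existing identifiers never change. The sketch below assumes such a convention; the reserved range starting at 900, the identifier format and the sample entries are all hypothetical.

```python
# Hypothetical convention: organization-specific content types are numbered
# from 900 upward inside a category, so standard identifiers are never touched.
LOCAL_START = 900


def add_local_type(model, category_id, name):
    """Add an organization-specific content type and return its new identifier."""
    types = model[category_id]
    local_numbers = [
        int(type_id.split(".")[1])
        for type_id in types
        if int(type_id.split(".")[1]) >= LOCAL_START
    ]
    next_number = max(local_numbers, default=LOCAL_START - 1) + 1
    new_id = f"{category_id}.{next_number}"
    types[new_id] = name
    return new_id


if __name__ == "__main__":
    # Illustrative model: category id -> {content type id -> name}
    model = {"02": {"02.01": "Investigator CV", "02.02": "FDA 1572"}}
    print(add_local_type(model, "02", "Site Parking Instructions"))
    print(sorted(model["02"]))
```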

Standards are needed for 1) eTMF content search and 2) computer-to-computer interoperability or exchange. First, in order to ensure agency compliance and the ability to exchange documents with stakeholders, agencies and study partners, terminology or naming standards must be followed when naming TMF document types (also referred to as Content Types). To the extent possible, document metadata, or data about the documents, needs to be based on regulatory and industry standards and should allow for implementation flexibility. Second, the electronic organization of the eTMF archive hierarchy needs to be based on an industry standard that allows for automated computer 'readability.' While business people can read spreadsheet files with eTMF terms, web servers cannot easily read or parse the content of a spreadsheet or a document. Any eTMF content archive must be based on an internet standard adopted by the World Wide Web Consortium (W3C) for internet-based information exchange, import, export and search. The eTMF content archive should allow electronic exchange from online as well as offline archive sources such as a DVD or USB memory device. Just as with a content management system, eTMF content archives need to allow interoperability with existing enterprise applications, databases and external systems such as agency eSubmission gateways.

Below we outline the critical business requirements, components and standards that comprise a basic eTMF system.

eTMF System Benefits

Regulatory compliance is a core requirement of most businesses, but especially so for the BioPharma and Health Science industries, which are highly regulated by government agencies. Businesses are dealing with new regulatory compliance issues by implementing software applications to help automate and track compliance. By implementing a comprehensive eTMF system that automates the capture and management of TMF documents and records, organizations can prevent unnecessary risk and can often realize clinical trial cost savings over manual paper handling processes.

There are many reasons that businesses may wish to put an effective eTMF management application in place:

  • Growth in Regulations: State, Federal and industry regulations continue to grow and evolve
  • Risk Management: Significant risks and penalties for non-compliance, including fines, and customer lawsuits
  • Product Quality: Enhanced product quality through easier audits and management
  • Accelerate Clinical Trials: Electronic document sharing with clinical trial stakeholders: e.g., Investigators, agencies, and clinical research centers can help resolve issues faster and accelerate clinical trial milestones
  • Cost Savings: Save document mail and overnight courier costs; save document physical storage costs; save administrative staff document handling and management costs
  • Time Savings: Anytime, anywhere access to documents helps move business operations forward faster than manual, paper-based processes

Core Components of an eTMF System

The core components of an eTMF system are based on a combination of: 1) Standards-based core metadata and a published TMF taxonomy, ideally based on NCI/NIH, CDISC and HL7 term databases; 2) A flexible taxonomy or hierarchical model including document classifications, document types, document descriptions, and metadata; 3) A standards-based electronic content management system that allows customization of the eTMF taxonomy for addition of new document types and metadata; 4) A fully customizable search facility that allows intelligent, precise search across one or more eTMF archives; 5) Interoperability that allows anyone to open, view, or edit an eTMF archive without the need to purchase the application that created the archive.

Core eTMF system requirements:




Document acquisition primarily involves accepting and processing documents or content. In an eTMF, documents are acquired electronically and stored electronically. Documents may be acquired from the web or email or via automated business processes.  To eliminate paper from a clinical trial study, electronic signing using digital signatures from authenticated users is often used.   Digital signatures are accepted in place of wet signatures in most countries worldwide including the USA and the EU, thereby averting the need to scan a document.  Where paper is still used for wet-signed documents or other non-digital content items, conversion from paper documents to electronic document images is done via scanners or multifunction printers.  Optical character recognition (OCR) software is sometimes used, whether integrated into the hardware or as stand-alone software, in order to convert digital images into machine readable and searchable text. Optical mark recognition (OMR) software is sometimes used to extract values of check-boxes or bubbles. Capture may also involve accepting electronic documents and other computer-based files.


Classification routes each document to the proper eTMF classification for indexing. It is used mainly as a preparation for indexing.


Indexing is the process of adding unique document identifiers so that documents can be rapidly retrieved from the system. Document indexes are comprised of metadata which is retrieved from a pre-defined index classification topology. Often some level of indexing can be automated by utilizing a database to lookup metadata attributes.  When automated workflows, digital signing and all digital processes are used, often indexing and classification can be automated, saving labor and processing time.


Store electronic documents. Storage of the documents often includes management of those same documents; where they are stored, for how long, migration of the documents from one storage media to another (hierarchical storage management) and eventual document destruction.


Compliance rules capture document collection requirements, policies and procedures. For example, an FDA 1572 document must be collected for each investigator in a clinical trial. Compliance rules ensure that the right documents are collected in the eTMF according to pre-set rules. Often the eTMF compliance rules are expressed as part of an SOP or standard operating procedure. A compliance officer or regulatory document officer may be responsible for developing and implementing required compliance policies and procedures.  
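A compliance rule such as "an FDA 1572 must be collected for each investigator" can be expressed as a simple check over the documents already in the archive. The sketch below is one minimal way to do that; the `Document` record and the sample names are hypothetical and not drawn from any particular eTMF product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    """Hypothetical minimal record of a collected document."""
    doc_type: str
    investigator: str


def missing_per_investigator(required_type, investigators, documents):
    """Return the investigators for whom the required document type is absent."""
    collected = {
        doc.investigator for doc in documents if doc.doc_type == required_type
    }
    return sorted(set(investigators) - collected)


if __name__ == "__main__":
    investigators = ["Dr. Adams", "Dr. Baker", "Dr. Chen"]
    documents = [
        Document("FDA 1572", "Dr. Adams"),
        Document("Investigator CV", "Dr. Baker"),
        Document("FDA 1572", "Dr. Chen"),
    ]
    print(missing_per_investigator("FDA 1572", investigators, documents))
```

In practice the rule set would come from the organization's SOPs rather than being hard-coded.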

eTMF applications as used in clinical trials in the USA are subject to regulatory compliance under FDA 21 CFR Part 11 regulations and should be independently validated and audited for adherence to FDA rules related to security and electronic signing.   At the time of this article, SureTrial eTMF by SureClinical was the only commercial eTMF application with integrated electronic signing to receive independent FDA Part 11 compliance validation.

Document Quality

If a document is to be distributed electronically in a regulatory environment, then the document should be quality checked through a pre-defined quality control process. Acceptance sampling using standards-based processes such as ASTM-E105 to sample incoming document batches is one such method of document quality control to help ensure document integrity and quality for large batches of documents.
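The sampling step can be sketched as drawing a simple random sample from an incoming batch and accepting the batch if the number of defective documents stays within a limit. This is only an illustration of the general idea of probability sampling; the sample size, the defect limit and the quality check are assumptions and are not taken from the ASTM practice itself.

```python
import random


def draw_sample(batch, sample_size, seed=None):
    """Draw a simple random sample (without replacement) from a batch."""
    if sample_size > len(batch):
        raise ValueError("sample_size cannot exceed the batch size")
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return rng.sample(list(batch), sample_size)


def accept_batch(sample, passes_check, max_defects):
    """Accept the batch if the sample has at most max_defects failures."""
    defects = sum(1 for doc in sample if not passes_check(doc))
    return defects <= max_defects


if __name__ == "__main__":
    # Hypothetical batch: each document records whether its scan is legible.
    batch = [{"id": n, "legible": n % 10 != 0} for n in range(1, 101)]
    sample = draw_sample(batch, sample_size=20, seed=42)
    print(accept_batch(sample, lambda doc: doc["legible"], max_defects=2))
```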


Auditability of the eTMF is twofold: 1) Audit trail: system access, login and user activity should be auditable for all system resource usage (21 CFR Part 11), along with document workflow history including date, event (e.g., approved, submitted, created, modified), source of event and person involved; 2) Document compliance audits: Internal document compliance audits play a key role in document compliance and quality. Auditors should have online access to the eTMF documents and reports to review the eTMF archive with the goal of identifying potential violations of policies and procedures. Policies and procedures should specifically document the scope, frequency, and procedures of audits. Audits should be both routine and event-based.
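The workflow history described above (date, event, source of event and person involved) maps naturally onto an append-only log of immutable records. The following is a minimal in-memory sketch; the class names, field names and event list are assumptions, and a real system would persist entries in tamper-evident storage.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Event names taken from the examples in the text; the set is illustrative.
ALLOWED_EVENTS = {"created", "submitted", "approved", "modified"}


@dataclass(frozen=True)
class AuditEvent:
    timestamp: datetime
    document_id: str
    event: str
    source: str
    person: str


class AuditTrail:
    """Append-only, in-memory audit trail (hypothetical sketch)."""

    def __init__(self):
        self._events = []

    def record(self, document_id, event, source, person):
        if event not in ALLOWED_EVENTS:
            raise ValueError(f"unknown event: {event}")
        entry = AuditEvent(
            datetime.now(timezone.utc), document_id, event, source, person
        )
        self._events.append(entry)
        return entry

    def history(self, document_id):
        """Return the events for one document in the order they were recorded."""
        return [e for e in self._events if e.document_id == document_id]


if __name__ == "__main__":
    trail = AuditTrail()
    trail.record("DOC-001", "created", "web upload", "jsmith")
    trail.record("DOC-001", "approved", "workflow", "mlee")
    for entry in trail.history("DOC-001"):
        print(entry.timestamp.isoformat(), entry.event, entry.person)
```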


A set of standard preconfigured document management reports should be offered by the eTMF. Often a user can subscribe to the eTMF to receive these reports via email. Examples include reports listing documents captured by document type, by site, by investigator or other person, or by category. Reports of missing documents by site, by document type, and by person are also useful to provide proactive notice that a document has not yet been collected or is missing.
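A "documents captured by ..." report is essentially a count of documents grouped by one metadata field. The sketch below shows that grouping with hypothetical field names and sample records.

```python
from collections import Counter


def captured_counts(documents, field):
    """Count captured documents grouped by one metadata field."""
    return Counter(doc[field] for doc in documents)


if __name__ == "__main__":
    # Hypothetical captured documents with illustrative field names.
    documents = [
        {"doc_type": "FDA 1572", "site": "Site 101"},
        {"doc_type": "Investigator CV", "site": "Site 101"},
        {"doc_type": "FDA 1572", "site": "Site 202"},
    ]
    for site, count in sorted(captured_counts(documents, "site").items()):
        print(site, count)
```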

Search and Retrieval

Retrieval returns electronic documents from storage. Although the notion of retrieving a particular document is simple, retrieval in the electronic context can be quite complex and powerful. Simple retrieval of individual documents can be supported by allowing the user to specify the unique document identifier, and having the system use the basic index (or a non-indexed query on its data store) to retrieve the document. More flexible retrieval allows the user to specify partial search terms involving the document identifier, document type, and/or parts of the expected metadata. This would typically return a list of documents which match the user's search terms. Some systems provide the capability to specify a Boolean expression containing multiple keywords or example phrases expected to exist within the documents' contents. The retrieval for this kind of query may be supported by previously built indexes, or may perform more time-consuming searches through the documents' contents to return a list of the potentially relevant documents.
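The three retrieval styles described above (lookup by identifier, partial match on metadata, and a Boolean keyword query over content) can be sketched over a small in-memory repository. The repository layout, field names and sample documents are hypothetical, and the content search here is a plain scan rather than a prebuilt index.

```python
# Hypothetical in-memory repository: document id -> metadata and extracted text.
REPOSITORY = {
    "DOC-001": {
        "metadata": {"doc_type": "FDA 1572", "site": "Site 101"},
        "text": "Statement of Investigator signed and approved",
    },
    "DOC-002": {
        "metadata": {"doc_type": "Investigator CV", "site": "Site 101"},
        "text": "Curriculum vitae of the principal investigator",
    },
    "DOC-003": {
        "metadata": {"doc_type": "Monitoring Plan", "site": "Site 202"},
        "text": "Monitoring visits approved by sponsor",
    },
}


def find_by_id(repository, doc_id):
    """Simple retrieval by unique document identifier."""
    return repository.get(doc_id)


def find_by_metadata(repository, **terms):
    """Partial, case-insensitive match on every supplied metadata field."""
    matches = []
    for doc_id, doc in repository.items():
        metadata = doc["metadata"]
        if all(
            str(value).lower() in str(metadata.get(key, "")).lower()
            for key, value in terms.items()
        ):
            matches.append(doc_id)
    return sorted(matches)


def find_by_keywords(repository, all_of=(), none_of=()):
    """Boolean query: every all_of keyword present AND no none_of keyword present."""
    matches = []
    for doc_id, doc in repository.items():
        text = doc["text"].lower()
        has_all = all(word.lower() in text for word in all_of)
        has_none = not any(word.lower() in text for word in none_of)
        if has_all and has_none:
            matches.append(doc_id)
    return sorted(matches)


if __name__ == "__main__":
    print(find_by_metadata(REPOSITORY, site="101"))
    print(find_by_keywords(REPOSITORY, all_of=["approved"], none_of=["investigator"]))
```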


Many document management systems offer content integration and exchange capabilities. Open standards allow some level of integration with other software and systems. Most recently, major enterprise document management vendors collaborated on a new specification to enable easier integration and web-based exchange of enterprise documents and records. The Content Management Interoperability Services specification (CMIS) is a standard for improving interoperability between Enterprise Content Management systems. OASIS approved CMIS as an OASIS Specification on May 1, 2010. While CMIS can be used as a transport for communicating document information between systems, it does not specify any format for an archive's metadata, document type names or other core schema. The eTMF ontology is intended to resolve this, allowing eTMF archives to be easily exchanged, imported or exported via the CMIS standard.


Metadata attributes or 'tags' are typically assigned to each document type/content type. These tags are used to capture data values for each document for classification, search and reporting. The metadata and the metadata values are then stored either alongside the document in the eTMF archive or embedded in the document itself as metadata tags. Examples of standards-based metadata are document archive metadata such as ‘Date’ or ‘Creator’ from Dublin Core metadata, or ‘Site ID’ to identify a study site, from NCI thesaurus. Metadata values are the data stored with the metadata attribute. The document management system may also extract metadata from the document automatically or prompt the user to add metadata. Some systems also use optical character recognition on scanned images, or perform text extraction on electronic documents. The resulting extracted text can be used to assist users in locating documents by identifying probable keywords or providing for full text search capability, or can be used on its own. Extracted text can also be stored as a component of metadata, stored with the image, or separately as a source for searching document collections.
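As an illustration of metadata stored alongside a document, the sketch below writes a small XML record that uses the Dublin Core elements for creator and date. The `urn:example:etmf` namespace, the `record` and `siteId` element names and the sample values are hypothetical; in particular, no NCI Thesaurus code is shown for the site identifier.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set namespace
EX_NS = "urn:example:etmf"                  # hypothetical namespace for this sketch

ET.register_namespace("dc", DC_NS)
ET.register_namespace("ex", EX_NS)


def metadata_record(creator, date, site_id):
    """Serialize one document's metadata as an XML string."""
    record = ET.Element(f"{{{EX_NS}}}record")
    ET.SubElement(record, f"{{{DC_NS}}}creator").text = creator
    ET.SubElement(record, f"{{{DC_NS}}}date").text = date
    # 'siteId' is an illustrative element name, not a term from any standard.
    ET.SubElement(record, f"{{{EX_NS}}}siteId").text = site_id
    return ET.tostring(record, encoding="unicode")


if __name__ == "__main__":
    print(metadata_record("J. Smith", "2014-07-01", "101"))
```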

eTMF Archive Format

A published document archive format that allows BioPharma content archives to be exchanged and archived in both physical (e.g., CD or DVD-ROM) and web-based formats. Documents in the archive which are used for eSubmissions must be in PDF format. PDF formatted documents cannot be easily altered, are secure and support embedding of digital signatures. The PDF format is accepted by US and European agencies and is typically used as a format for documents in eTMF online repositories or offline archives. PDF documents and metadata record content can be stored in the online document repository, or optionally in a separate file for offline access, such as within an encrypted .zip file archive package.
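An offline archive package of the kind described above can be sketched as a .zip file holding the documents plus a manifest of their metadata. The folder layout, the `manifest.json` name and the sample content are assumptions made for illustration. Note that Python's standard `zipfile` module cannot create encrypted archives, so encryption would need a separate tool and is not shown.

```python
import io
import json
import zipfile


def build_archive(documents):
    """Build an in-memory .zip package.

    documents maps a file name to a (pdf_bytes, metadata_dict) pair.
    """
    buffer = io.BytesIO()
    manifest = []
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name, (pdf_bytes, metadata) in documents.items():
            path = f"documents/{name}"
            archive.writestr(path, pdf_bytes)
            manifest.append({"file": path, "metadata": metadata})
        # Hypothetical manifest file listing every document and its metadata.
        archive.writestr("manifest.json", json.dumps(manifest, indent=2))
    return buffer.getvalue()


if __name__ == "__main__":
    # Placeholder bytes stand in for real PDF content in this sketch.
    package = build_archive(
        {"fda-1572.pdf": (b"%PDF- placeholder", {"doc_type": "FDA 1572"})}
    )
    with zipfile.ZipFile(io.BytesIO(package)) as archive:
        print(archive.namelist())
```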

eTMF Content Model

An eTMF content model is a published, flexible, machine readable hierarchy of classifications, metadata terms, and relationships that acts like an electronic filing organization plan. The eTMF content model allows seamless automated creation of eTMF repositories and archives. Based on published vocabularies, terms and classifications, the eTMF content model supports automated content classification, digital signatures, audit trails, automated workflows and web-based information exchange. eTMF content models should be sharable and network discoverable. To facilitate web-based clinical trial content interoperability, semantic technologies are often used to share information ontologies. eTMF content models can be expressed as web sharable ontologies, allowing organizations to share information electronically through the internet. The first eTMF content model ontology was developed by CareLex and published at the National Center for Biomedical Ontology's site, NCBO BioPortal. The CareLex eTMF content model ontology utilizes and links to the National Cancer Institute's Thesaurus ontology, NCIT.

While the features above are currently recommended for an eTMF system, few if any commercial solutions support every feature outlined above as of the publication date of this white paper.  The main roadblock to delivery of a complete eTMF system as outlined above is the availability of a published, machine readable, standards-based eTMF content model that incorporates the foundational components referenced previously.

To enable productivity in an eTMF content management system, content model standards are necessary.  For example, if there were no standards for email, it would be impossible for a user to email a document to another and expect that they could receive it or even view it. Similarly, without standards for eTMF content models, it will be difficult, time consuming and expensive for industry members to exchange clinical trial eTMF repositories and archives.


Adoption of electronic document management processes is becoming essential to business productivity, cost savings and shortened BioPharma product development timelines. The key to implementation of interoperable eTMF systems is use of a standards-based content model, a standards-based vocabulary, and web standards-based technologies. is a non-profit industry group that was formed to develop technology solutions to the eTMF obstacles, by using standards-based terms, technologies and systems in an open, collaborative process. In order to assist the industry in moving forward with eTMFs, one of its first initiatives, will publish a proposal for core metadata and a shared content model schema to be freely used by industry. is established as an open forum where industry members can freely contribute ideas, publications, software and solutions. There is no fee to be a member and any and all contributions are welcome.


1.  Len Asprey, Rolf Green, Michael Middleton: Integrative Document and Content Management Systems Architecture. Encyclopedia of Database Technologies and Applications 2005: 291-297

2.  TMF Reference model Excel Spreadsheet, Accessed Aug 16 2011


Zack Schmidt, BS/MBA

Exec Director, CareLex

