Features

AI & Drug Substance Pricing

Is artificial intelligence the ultimate tool for assessing whether the price paid for your drug substance or related precursors is fair?

In only a few industries is the cost of goods sold as low as it is in pharmaceuticals, especially when considering only variable or cash costs. For instance, our previous analysis1 revealed that in Western countries, the cost of the drug substance for branded, patented prescription products typically accounts for less than 5% of the ex-manufacturer price of the formulated product. This is a remarkably small proportion compared to most other industries.

As a result, except in cases involving large daily dosages, highly complex molecular structures, or where the product offers only limited therapeutic advantages compared to generics, the cost of the drug substance is rarely a significant barrier to the development and launch of new pharmaceuticals. It typically does not prevent companies from achieving the 75-80% margin threshold they expect for their new products.

However, this doesn’t mean that the economics of obtaining the drug substance are irrelevant, regardless of the sourcing approach or the product development phase. During clinical trials, the active substance is a significant cost factor. The key difference lies in the metrics used: for in-house production, the focus is on internal costs, while for outsourced production through third-party vendors or CDMOs, the concern shifts to the net price paid to the supplier.

Like any other business, pharmaceutical companies aim to optimize their access to drug substances not only for products in the commercial phase but also increasingly during early clinical trials. This is especially critical for start-ups with no recurring revenue, where access to the drug substance can be a significant cost. In this context, companies seek to strike the best balance between key performance indicators (KPIs) such as overall cost, flexibility, and supply security, among other factors.

To evaluate the KPIs related to drug substance access, the pharmaceutical industry has traditionally relied on the knowledge and experience accumulated over years of in-house drug substance production.

These competencies have enabled companies to estimate the production costs for a given molecule, whether expressed as variable, cash, or total costs. Such analysis has played a key role in evaluating and selecting offers from various CDMOs approached for the supply of the product in question.

This type of analysis often compares the company’s estimated variable or cash costs with the price offers from CDMOs. These offers are typically calculated to cover the full costs, plus a markup that provides the vendor with a “fair” risk-adjusted return.

CDMOs take a similar approach when developing pricing offers in response to requests for proposals. Given the pharmaceutical company’s deep knowledge of drug substance production, the vendor’s cost structure is as transparent as that of regulated utilities, whose prices are set on a cost-plus basis. However, unlike utilities that enjoy capped returns and protection from competition through regulatory barriers, CDMOs typically face multiple competitors vying for the same customers.

The resulting competition among potential suppliers effectively imposes a ceiling on the markup over full costs and the returns they can achieve.

Nowadays, many pharmaceutical companies are increasingly moving away from conducting their own drug substance activities and are extensively, if not systematically, outsourcing these functions to CDMOs. The chemistries performed in-house are often restricted to discovery and pre-clinical development stages, typically not exceeding lab or kilogram-scale production.

Biopharma and emerging pharma companies, which account for 66% of all clinical trials2 and have been responsible for over 70% of the new molecular entities (NMEs) approved by the FDA in recent years, almost always utilize a fully virtual supply chain model.

Outsourcing drug substance production offers several benefits, but it can also lead to a loss of in-house knowledge and insights into production economics. For a pharmaceutical company with limited hands-on experience in large-scale synthesis and its complexities, accurately estimating the cost of a new drug substance or its intermediate becomes challenging. This makes it difficult to determine if the pricing proposals from CDMOs are fair and sustainable.

Production costs and target prices are usually estimated using a bottom-up analysis based on clearly documented assumptions. This method involves defining each step in one or more synthesis routes, including chemical reactions and physical operations like filtration, distillation, or crystallization. For every step, the consumption of inputs, cycle time, manpower, and utility requirements are estimated. The cost for each step is calculated by multiplying these inputs’ unit costs by their consumption, and the total production cost is derived by summing the costs of all steps.
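The bottom-up logic described above can be sketched in a few lines of Python. Every step, consumption figure, and unit cost below is a hypothetical placeholder, not real process data:

```python
# Minimal sketch of a bottom-up cost estimate. All figures are
# hypothetical placeholders, not real process data.

# Each synthesis step lists its input consumption per kg of product.
steps = {
    "step_1_reaction": {"reagent_kg": 2.5, "solvent_kg": 8.0, "labor_h": 1.2},
    "step_2_crystallization": {"solvent_kg": 4.0, "labor_h": 0.8, "utilities_kwh": 15.0},
}

# Assumed unit costs (USD per unit of each input).
unit_costs = {"reagent_kg": 40.0, "solvent_kg": 2.0, "labor_h": 60.0, "utilities_kwh": 0.12}

def step_cost(consumption):
    """Cost of one step: sum of unit cost x consumed quantity over its inputs."""
    return sum(unit_costs[inp] * qty for inp, qty in consumption.items())

# Total production cost is the sum over all steps.
total_cost = sum(step_cost(c) for c in steps.values())
print(f"Estimated production cost: {total_cost:.2f} USD/kg")
```

A real analysis would, of course, carry many more steps and input categories, but the arithmetic is exactly this: unit cost times consumption, summed per step and then across steps.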

At first glance, this analysis might appear straightforward and easily manageable for those with basic knowledge in chemistry, process engineering, and financial modeling. However, achieving the level of confidence needed for a well-informed business decision is far more complex in practice.

As always, the principle of “garbage in, garbage out” applies: the quality of the analysis depends critically on the strength of the assumptions used.

This involves defining the most feasible synthesis route and sequence of steps, considering their practical implementation at a large scale, including factors like cycle times and environmental impact. Concurrently, it is essential to estimate yields and consumption rates for reagents, raw materials, solvents, utilities, and other ancillary materials. This also includes evaluating yield and losses at each physical step, such as filtration or distillation. After estimating the input utilization factors, it is necessary to determine the unit cost or price for each factor based on objective criteria. This process can be complex, especially for indirect costs such as equipment cleaning, changeovers, or wastewater disposal.

An additional complication arises from uncertainties due to the limited data available for specific products and their associated processes. This lack of data often results from the minimal experimentation typically conducted during the early development of compounds.

In practice, companies experienced in large-scale synthesis—whether through their own operations or by employing staff with relevant experience—have accumulated valuable knowledge over the years. This expertise allows them to estimate the cost or price of their products based on their own capabilities and experience.

Their approach involves using standard unit costs from past experience for each step of the process. These costs are then adjusted to account for the specific characteristics of the process and product, including factors such as the scale and complexity of the molecule.

Variable production costs—such as raw materials, utilities, and waste disposal—are typically calculated from the market prices of available raw materials, taking into account the number and type of reaction steps, solvent requirements, and yields estimated from the literature. Fixed costs, including direct labor and plant overheads, are determined by multiplying the estimated utilization time of each major piece of equipment by its associated hourly cost, expressed in USD per gallon or cubic meter of reactor capacity. To determine the fully loaded production economics and estimate the expected market price, non-cash costs such as depreciation, plus a mark-up for the required return on capital, must be added to the cash production costs.

This type of analysis has several applications. It offers a semi-quantitative assessment of a product’s financial viability, sets concrete cost reduction targets for process optimization, and provides an objective basis for evaluating the price competitiveness of offers from various vendors.

Clearly, conducting such an analysis is both time-consuming and resource-intensive due to the extensive experience required. This has led to the search for simpler and quicker methods to estimate the cost and price of new products.

The article by Peter Hart and Jude Sommerfeld3 discusses methods for estimating the cost of “specialty” chemicals at an industrial scale based on laboratory-scale prices. These methods use the following logarithmic equation:

log P = log a + b · log Q

In this equation, P represents the price, Q represents the volume, and a and b are parameters empirically determined for each chemical. While this approach may seem straightforward, its practical application is limited. It is typically suited for existing products available at the lab scale, which is often not the case for newly developed molecules by pharmaceutical companies.
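Taking antilogarithms, the equation is equivalent to P = a · Q^b, which a short script can illustrate. The parameter values below are purely illustrative, not fitted to any real chemical:

```python
import math

def scaled_price(q, a, b):
    """Hart/Sommerfeld-style scaling: log P = log a + b * log Q,
    i.e. P = a * Q**b. Parameters a and b are chemical-specific and
    empirically fitted; the values used below are illustrative only."""
    return a * q ** b

# Hypothetical fit: a = 500 USD/kg at Q = 1 kg, b = -0.4 (price falls with volume).
a, b = 500.0, -0.4
for q in (1, 10, 100):
    p = scaled_price(q, a, b)
    # Check the log-linear form directly.
    assert math.isclose(math.log(p), math.log(a) + b * math.log(q))
    print(f"Q = {q:>3} kg  ->  P = {p:7.1f} USD/kg")
```

With a negative exponent b, the model reproduces the familiar pattern of unit prices declining as order volume grows.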

Another approach, often called “the Pfizer method,” estimates production costs and target prices by counting the number of covalent chemical bonds and stereochemically defined centers formed during the synthesis process. This count is then multiplied by a factor that varies based on the scale, volume, and potency of the compound. This factor is determined from a dataset that includes offers from external suppliers and costs estimated from the company’s own industrial operations, linking these to the compound’s structure and features.
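A minimal sketch of this counting logic follows; the bond and stereocenter counts and the per-unit factor are entirely hypothetical, since the real factors come from a proprietary dataset:

```python
def complexity_count_estimate(bonds_formed, stereocenters_set, factor):
    """Hypothetical sketch of a complexity-count price estimate in the
    spirit of the 'Pfizer method': bonds and stereocenters formed during
    the synthesis are counted and multiplied by an empirical factor.
    The factor would in practice depend on scale, volume, and potency;
    the value used below is purely illustrative."""
    complexity = bonds_formed + stereocenters_set
    return complexity * factor

# e.g. 8 covalent bonds and 2 stereocenters formed, at a notional
# 150 USD/kg per complexity unit for this scale:
print(complexity_count_estimate(8, 2, 150.0))  # 1500.0
```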

Although this methodology is very simple to apply, that simplicity brings notable shortcomings: it captures only some of the cost determinants and leaves out other important factors.

To address the limitations of methods based on past experience or simplified shortcuts, there is growing interest in using recently developed analytical tools like big data and artificial intelligence. These tools are being explored to predict the target cost and price of new products by drawing analogies with existing products.

This approach involves providing an artificial intelligence system with the structural formulas and other physicochemical parameters of a wide range of molecules, formatted in a machine-readable form such as “SMILES” (Simplified Molecular Input Line Entry System), along with their sales prices.

The AI system is designed and trained to identify correlations between chemical structures and prices. The goal is to leverage the system’s experience to predict the price of a new molecule by inputting its CAS number. The system calculates the price using the correlations it learned from the dataset used during its training.

In theory, using such a system would allow even the least experienced company to quickly determine a price or cost range for its new chemical products.

An interesting example of this approach is provided by Kwabena Afori-Atta and Clayton Springer,4 who developed a Random Forest regression model for estimating development-scale product costs. They trained the model on price and structural data from over 300,000 products.

Although the model demonstrates some correlation between structures and prices, it needs further refinement to become a reliable predictive tool for new products. For instance, it struggles to explain and predict the significant price variations observed between structures that are only subtly different.

These limitations may have several underlying causes, including inconsistencies in the dataset used to train the system, particularly for laboratory-scale chemicals. Catalog prices for these chemicals often bear little relation to their structures, which introduces bias; this bias may stem from the small volumes involved and the significant impact of repackaging costs on the pricing structure.

To the best of our knowledge, the use of artificial intelligence systems to predict prices and costs for new drug substances and related intermediates at a larger scale has been limited by the scarcity of reliable and consistent data. Prices for these types of products are often closely guarded secrets, which hinders the development of accurate predictive models.

To address this issue, we have developed an AI model using a dataset of approximately three thousand products for which commercial offers are available. This model aims to predict the target price and cost of new structures.

The logic of this model is illustrated schematically in Figure 1.


Figure 1. Model structure

The dataset, which includes SMILES and prices for various products—primarily advanced intermediates and some active ingredients—was used to train AI systems employing predictive Random Forest and XGBoost models. The training process involved bootstrapping, where the dataset was randomly divided into smaller samples to assess the consistency of the models.
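The bootstrapping step can be sketched with the standard library alone. The dataset below is a synthetic stand-in, and a simple per-sample statistic replaces the actual model fitting:

```python
import random

random.seed(0)

# Hypothetical (SMILES, price) records standing in for the real dataset.
dataset = [(f"SMILES_{i}", 100.0 + i) for i in range(1000)]

def bootstrap_samples(data, n_samples, sample_size):
    """Draw n_samples resamples with replacement, as used to assess
    how consistent the fitted models are across random subsets."""
    return [random.choices(data, k=sample_size) for _ in range(n_samples)]

samples = bootstrap_samples(dataset, n_samples=5, sample_size=200)

# In the real workflow, one model (Random Forest or XGBoost) would be
# fitted per resample; here the mean price per resample stands in as
# the statistic whose stability is checked across samples.
means = [sum(price for _, price in s) / len(s) for s in samples]
print([round(m, 1) for m in means])
```

If the fitted models (or here, the stand-in statistics) diverge sharply between resamples, that signals the dataset is too small or too heterogeneous to support stable predictions.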

After training, the SMILES of the product whose price is to be estimated is entered. To prevent bias from comparisons between unrelated structures, the Tanimoto/Jaccard algorithm is used to measure molecular similarity: only products with a Tanimoto coefficient greater than 0.7 are considered, where the coefficient ranges from 0 (no similarity) to 1 (a perfect match).
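The Tanimoto/Jaccard filter itself is simple to implement. In the sketch below, fingerprints are hypothetical sets of "on" bit indices rather than real SMILES-derived fingerprints:

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto/Jaccard similarity between two fingerprints, represented
    as sets of 'on' bit indices: |A ∩ B| / |A ∪ B|. Ranges from 0
    (no shared bits) to 1 (identical fingerprints)."""
    if not fp_a and not fp_b:
        return 1.0
    return len(fp_a & fp_b) / len(fp_a | fp_b)

# Hypothetical fingerprints; in practice these would be derived from SMILES.
query = {1, 4, 7, 9, 12, 15}
candidates = {
    "product_A": {1, 4, 7, 9, 12, 20},  # close analogue (5 shared bits of 7)
    "product_B": {2, 5, 30, 41},        # unrelated structure (no shared bits)
}

# Keep only products above the 0.7 similarity cut-off used by the model.
similar = {name for name, fp in candidates.items() if tanimoto(query, fp) > 0.7}
print(similar)  # {'product_A'}
```

Here product_A scores 5/7 ≈ 0.71 and passes the cut-off, while product_B scores 0 and is excluded from the price estimation.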

The system then estimates the price of the product based on the patterns it learned during training on the dataset filtered by the Tanimoto algorithm. To validate the system’s accuracy, its estimated prices for known commercial products were compared with their actual prices.

As shown in Figure 2, which plots actual prices of selected products on the horizontal axis and the prices estimated by the system on the vertical axis, the correlation coefficient between actual and AI-derived prices is approximately 70%. Notably, the largest discrepancies occur among the lowest-priced products.


Figure 2. Example of results
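The validation statistic is a plain Pearson correlation between actual and estimated prices, computed as in this pure-Python sketch; the price pairs below are invented for illustration and do not come from the study:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length series,
    e.g. actual vs. model-estimated prices."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical actual vs. model-estimated prices (USD/kg).
actual    = [120.0, 340.0, 85.0, 990.0, 410.0, 55.0]
estimated = [150.0, 300.0, 140.0, 870.0, 460.0, 95.0]
print(f"r = {pearson_r(actual, estimated):.2f}")
```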

Although a correlation coefficient of 70% is not high enough to guarantee precise price predictions, it still indicates that the approach is valid and useful.

We plan to further refine the model by expanding the reference dataset used for training the AI. This will involve including a broader range of products and possibly adding extra parameters for each product, such as production scale, quote date, and location. These enhancements aim to improve the model’s accuracy and increase confidence in its predictive capabilities. 

References
  1. Drug Substance Cost – A Non-Issue?; Michele Jermini and Enrico Polastro – Contract Pharma, March 2020.
  2. Global Trends in R&D; IQVIA Institute – February 2024.
  3. Cost Estimation of Specialty Chemicals from Laboratory-Scale Prices; Peter W. Hart and Jude T. Sommerfeld – Cost Engineering, Vol. 39 (3), March 1997.
  4. The Price Is Right: Predicting Reagent Prices; Kwabena Afori-Atta and Clayton Springer – ChemRxiv, 2021.


Dr. Michele Jermini is the Managing Director of Exeris; michele.jermini@exeris.com

Dr. Paul Hanselmann is Managing Director of ChemSynthDesign; paul.hanselmann@rhone.ch

Dr. Enrico Polastro is a Vice-President of Arthur D. Little; polastro.enrico@adlittle.com
