
Data Math

Data Processing Through Pure Math

We're building a software framework for processing data stored in a database through pure Math constructs, focusing initially on the medical device and pharmaceutical space. We represent the entire lifecycle of a medical device or pharmaceutical product as a series of vectors. We then use pure Math to analyze the vectors and to predict future outcomes for each device and drug.

Our framework leverages the Word2vec concept from the Natural Language Processing ("NLP") space. Word2vec is designed to identify Semantic as well as Syntactic word similarity.

This document outlines a Reference Implementation of this approach, built on Open Data related to pharmaceutical products and medical devices.

Step 01: build a representation model to transform database-hosted information for each device and drug into a multi-dimensional vector space ("Data Vectors"), with time on the X axis serving both as a key dimension of the representation and as the predictor variable.

Step 02: train models on our Regulatory Repository.

Step 03: use the trained models to compare Data Vectors and:

  a.) identify similarities between different Data Vectors
  b.) predict the future outcome of a single Data Vector based on the trajectory of similar Data Vectors
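
As a minimal sketch of the comparison in Step 03, the snippet below scores two Data Vectors with cosine similarity, a measure commonly used to compare Word2vec embeddings. Whether the framework uses this measure is an assumption, and the three device vectors (yearly adverse-event counts) are hypothetical illustration data.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Convention chosen for this sketch: an all-zero vector is similar to nothing.
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical Data Vectors: adverse-event counts per year for three devices.
device_a = [0, 2, 5, 9]
device_b = [0, 4, 10, 18]  # same trend as device_a, twice the volume
device_c = [9, 5, 2, 0]    # opposite trend

sim_ab = cosine_similarity(device_a, device_b)
sim_ac = cosine_similarity(device_a, device_c)
```

Cosine similarity ignores overall magnitude, so `device_a` and `device_b` score as near-identical even though their volumes differ. If volume matters, a distance-based measure may be a better fit.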

Data Math should be able to identify and predict functional as well as prescription similarity. That is, for each device or drug on the market, it compares that product's Data Vector with those of other products and predicts the product's future performance and its risk of being recalled.
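
One way to read "predict from the trajectory of similar products" is a nearest-neighbor forecast. The sketch below is an assumption rather than the framework's actual model: it finds the reference trajectories whose early years are closest to the target and averages their value for the following year. The function name `predict_next`, the parameter `k`, and all data are hypothetical.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def predict_next(target, references, k=2):
    """Average the next-year value of the k reference trajectories whose
    first len(target) years are closest to the target."""
    n = len(target)
    # Only references that extend beyond the target's history can inform a forecast.
    candidates = [r for r in references if len(r) > n]
    if not candidates:
        raise ValueError("no reference trajectory is longer than the target")
    ranked = sorted(candidates, key=lambda r: euclidean(target, r[:n]))
    nearest = ranked[:k]
    return sum(r[n] for r in nearest) / len(nearest)

# Hypothetical yearly adverse-event counts for products with a longer history.
references = [
    [1, 2, 4, 8],
    [1, 3, 5, 9],
    [7, 6, 5, 1],
]
target = [1, 2, 5]  # a newer product with three years of data

forecast = predict_next(target, references, k=2)
```

Here the first two references are closest to the target, so the forecast is the mean of their fourth-year values.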

Given the very narrow scope of the datasets of interest, we expect the Data Vectors to have low complexity and a small number of dimensions.

Data Representation as Vectors

We represent a dataset as a series of vectors in the 3D space defined by the X axis (which always represents time) and the Y and Z axes.

For example, in the case of a medical device:

  • each regulatory event is placed along the X axis representing years
  • adverse events are represented as bars of different colors and shades, each mapped to the year in which the adverse event took place
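
The mapping of events to years can be sketched as a simple binning step. This is a minimal illustration under assumptions: the function name `yearly_counts` and the event dates are hypothetical, and real records would carry more than a date.

```python
from collections import Counter
from datetime import date

def yearly_counts(event_dates, first_year, last_year):
    """Count events per calendar year, one slot per year on the X axis.
    Years without events get an explicit 0; dates outside the range are ignored."""
    counts = Counter(d.year for d in event_dates)
    return [counts.get(year, 0) for year in range(first_year, last_year + 1)]

# Hypothetical adverse-event dates for a single device.
adverse_events = [date(2019, 3, 1), date(2019, 11, 20), date(2021, 6, 5)]

series = yearly_counts(adverse_events, 2018, 2021)
```

Each element of `series` corresponds to the height of one bar at that year's position on the X axis.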

Data Categories to Represent

These are the initial categories of Open Data to analyze and "vectorize."

Data Category | Description
Chemical Compound(s) | Chemical structure of the "active ingredient" included in every pharmaceutical product.
Literature / Publications | Published articles, papers, and evidence about the specific chemical compound.
Clinical Trial(s) | Scientific trials conducted to verify the safety and efficacy of a particular compound.
Intellectual Property | Patents.
Regulatory Pathway | Filings presented to Health Authorities (EMA, FDA) to request marketing approval.
Reimbursement | Payments made by the US Centers for Medicare & Medicaid Services (CMS) to Medicare providers for a drug, treatment, or device.
Open Payments | Payments from manufacturers to healthcare providers.
Adverse Events | Unintended, undesirable effects that a device or drug has on a patient.
Recalls | Withdrawal of a product from the market.

Data Quadrants

Visualize a 3D space where there is a plane perpendicular to the X axis (yellow surface below).

Then divide the plane into 04 quadrants.

The X axis is red. The Y axis is green. The Z axis is blue.

These are the data categories to represent in each of the 04 Data Quadrants:

Quadrant | Description | Data Categories | Purpose
-Y, -Z | Sources | Chemical Compounds; Literature / Publications; Patents / IPR | Represent data that documents the sources of the compound.
+Y, -Z | Approval Process | Clinical Trial(s); Regulatory Pathway | We believe that the way a drug is tested and approved (the number and type of clinical trials and approval submissions) has an impact on the drug's future clinical performance.
+Y, +Z | Payments | Reimbursement; Open Payments; Grants | We believe that the type and amount of payments and reimbursements have predictive power over a drug's future financial performance.
-Y, +Z | Impact | Adverse Events; Recalls | Document adverse events for drugs currently on the market, and represent recalls for drugs no longer on the market.
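
The quadrant assignment can be sketched as a lookup from data category to the signs of Y and Z. The quadrant signs and category names below come from the table. The `place` function, and the choice to put a value at the same magnitude on both Y and Z, are assumptions made only for illustration.

```python
# Sign of (Y, Z) for each Data Quadrant, as listed in the table.
QUADRANT_SIGNS = {
    "Sources": (-1, -1),
    "Approval Process": (+1, -1),
    "Payments": (+1, +1),
    "Impact": (-1, +1),
}

# Data category -> Data Quadrant, as listed in the table.
CATEGORY_QUADRANT = {
    "Chemical Compounds": "Sources",
    "Literature / Publications": "Sources",
    "Patents / IPR": "Sources",
    "Clinical Trial(s)": "Approval Process",
    "Regulatory Pathway": "Approval Process",
    "Reimbursement": "Payments",
    "Open Payments": "Payments",
    "Grants": "Payments",
    "Adverse Events": "Impact",
    "Recalls": "Impact",
}

def place(category, year, magnitude):
    """Return an (x, y, z) point: x is the year, and the signs of y and z
    select the category's quadrant. Using one magnitude for both y and z
    is a hypothetical layout choice, not part of the source design."""
    if magnitude < 0:
        raise ValueError("magnitude must be non-negative")
    sign_y, sign_z = QUADRANT_SIGNS[CATEGORY_QUADRANT[category]]
    return (year, sign_y * magnitude, sign_z * magnitude)

point = place("Adverse Events", 2020, 3.0)
```

Keeping the two lookups separate means a category can be moved to another quadrant without touching the geometry.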

Contact us

Please contact us for more details.