May 2, 2019

The marketer’s data dictionary: 50 terms you need to know

Your response to the original Marketer’s Data Dictionary was so positive that we have now enhanced it with 14 brand new terms. Enjoy!

Slay the jargon dragons at your company with The Marketer’s Data Dictionary: our user-friendly guide to 50 terms you need to know.

When it comes to data, accessibility is everything.

So, we’ve put together a data dictionary to help marketers make sense of complicated martech language and slay the jargon dragons standing in the way of meaningful, inclusive conversations about data at work.

Types of data

1st party data

1st party data is the data you have collected and are able to use.

2nd party data

2nd party data is the data you can receive from agreed partners.

3rd party data

3rd party data is the data collected from everything and everywhere else, well beyond your own interactions.

We’ve expanded on first-, second- and third-party data here.

Persistent ID

A consistent identifier (e.g. a customer number or email address) used to follow a customer across different devices and channels: think mobile web, in-app and desktop.

De-identified ID

Data that cannot be associated with a specific individual. “De-identification” refers to removing personal information like an email address or a name.

Hashed data

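Hashed data is data that has been run through a one-way function, producing a fixed-length string that cannot practically be turned back into the original value. Strictly speaking, hashing is not encryption, because there is no key that reverses it. For example, an email address in hashed form looks like BB8C71F261C69B19446FD88243F8E579820C5D536CCD2572A5D284EEF6081D0.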

When a Customer Data Platform (CDP) like Lexer sends email addresses to Google to create an audience, we send the hashed emails and Google matches them against its own database of hashed emails, so no readable email addresses are transferred.
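To make the idea of hashed emails concrete, here is a minimal sketch in Python. It assumes SHA-256 as the hash function and a simple trim-and-lowercase normalization step. Both are illustrative assumptions, not a description of the exact process Lexer or Google uses, and the email address is made up.

```python
import hashlib


def hash_email(email: str) -> str:
    """Return the SHA-256 hex digest of a normalized email address."""
    # Assumption: normalize by trimming whitespace and lowercasing, so the
    # same address always produces the same hash on both sides of a match.
    normalized = email.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


# Hypothetical address, used purely for illustration.
digest = hash_email("  Jane.Doe@Example.com ")
print(digest)
```

Because the same input always produces the same digest, two parties can compare hashes to find shared customers without exchanging the addresses themselves.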


Personally Identifiable Information (PII)

Personally Identifiable Information (PII) is data like an email address, name, date of birth, physical address, or phone number that can be used to confidently identify a specific person. (It’s important you handle it correctly, especially on social media.) It is possible to securely unify digital behaviors to an identified person using Lexer Tag.

Cookie data

Think of cookies as the crumbs you leave behind for the internet to trace your customer journey.

A cookie is a small amount of data generated by a website and saved by a user’s web browser. Its purpose is to remember information about a specific user, storing your logins, helping you pick up where you left off and personalizing your experience.

SKU level data

SKU stands for stock keeping unit, a way of keeping track of products by assigning them a specific number. Think of it as the Dewey decimal system for retail businesses. This number denotes category, size, style and color.

NPS data

The Net Promoter Score is an index ranging from -100 to 100 that measures the willingness of customers to recommend a company’s products or services to others. It is used as a proxy for gauging a customer’s overall satisfaction with a company’s product or service and a customer’s loyalty to the brand.

Lexer NPS allows customer service agents to send personalized NPS surveys on social and attribute scores to full profiles of each individual.
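The score itself is simple arithmetic. The sketch below uses the standard NPS convention, which the text above does not spell out: respondents rate you from 0 to 10, those scoring 9 or 10 count as promoters, those scoring 0 to 6 count as detractors, and the score is the percentage of promoters minus the percentage of detractors. The sample ratings are made up.

```python
def net_promoter_score(ratings):
    """NPS = % promoters (9-10) minus % detractors (0-6), on 0-10 ratings."""
    if not ratings:
        raise ValueError("at least one rating is required")
    promoters = sum(1 for r in ratings if r >= 9)
    detractors = sum(1 for r in ratings if r <= 6)
    return 100 * (promoters - detractors) / len(ratings)


# Hypothetical survey responses: 5 promoters, 2 passives, 3 detractors.
sample_ratings = [10, 9, 9, 8, 7, 6, 3, 10, 9, 5]
nps = net_promoter_score(sample_ratings)
print(nps)  # 20.0
```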

Public data

Public data is information that can be freely used, re-used and re-distributed by anyone with no existing local, national or international legal restrictions on access or usage. One example is census data released by the Australian Bureau of Statistics. This article has 33 fascinating examples of public data.

Demographic Data

Statistical data on individuals and households usually collected by a census. Experian ConsumerView is a powerful source of demographic data: a comprehensive data set on 80% of the population. Our partnership with Experian gives our clients access to insights on household income, gender, education, occupation, relationship status, decision making, Mosaic® segments and more.

Mastercard Advertising Insights

Mastercard Advertising Insights identifies consumer segments based on aggregated and anonymized spend data within each postal code and category, derived from billions of Mastercard transactions. Lexer has partnered with Mastercard to make this powerful source of data enrichment available to our clients through our CDP.

Bringing data together


The process of bringing together or integrating customer data from a range of sources to form a comprehensive profile of a customer, by connecting data points using deterministic or probabilistic matching.

Deterministic matching

Deterministic matching looks for an exact match between two different pieces of data. For example, an email address or phone number in two data sets can be used to make an exact match of two records.
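As a rough illustration, the sketch below joins two small, made-up data sets on email address. The record layouts, field names and normalization step are assumptions for the example, not a description of any particular platform.

```python
def normalize(email):
    """Trim and lowercase so trivially different spellings still match exactly."""
    return email.strip().lower()


def deterministic_match(left, right, key="email"):
    """Pair records from two data sets whose key values match exactly."""
    index = {normalize(record[key]): record for record in left}
    return [
        (index[normalize(record[key])], record)
        for record in right
        if normalize(record[key]) in index
    ]


# Hypothetical CRM and e-commerce records.
crm = [
    {"id": "C1", "email": "Jane.Doe@example.com"},
    {"id": "C2", "email": "sam@example.com"},
]
ecommerce = [
    {"order": "O-17", "email": "jane.doe@example.com "},
    {"order": "O-18", "email": "alex@example.com"},
]

matches = deterministic_match(crm, ecommerce)
print(matches)
```

Only the first e-commerce order is matched, because it is the only one whose email also appears in the CRM data.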

Probabilistic matching

Probabilistic or “fuzzy” matching calculates the likelihood of a match by scoring a range of data points. For example, two customer records with the same address and date of birth are 99.9% likely to be the same person, but two records that share only a common name, like John Smith, are not very likely to be the same person. Usually, a combination of 2-3 data points like address, DOB, name, and transactional data is used.
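A scoring system of this kind might be sketched as follows. The weights and the threshold are invented for illustration; a real system would tune them against known matches and would usually compare fields more loosely than simple equality.

```python
# Hypothetical weights: how much a matching field contributes to the score.
WEIGHTS = {"address": 45, "dob": 45, "name": 10}
# Hypothetical cut-off above which two records are treated as one person.
THRESHOLD = 80


def match_score(a, b):
    """Add up the weights of every field that is present and equal in both records."""
    return sum(
        weight
        for field, weight in WEIGHTS.items()
        if a.get(field) and a.get(field) == b.get(field)
    )


def is_probable_match(a, b):
    return match_score(a, b) >= THRESHOLD


record_1 = {"name": "John Smith", "address": "1 Main St", "dob": "1980-01-01"}
record_2 = {"name": "J. Smith", "address": "1 Main St", "dob": "1980-01-01"}
record_3 = {"name": "John Smith", "address": "9 High St", "dob": "1975-06-30"}

print(match_score(record_1, record_2), is_probable_match(record_1, record_2))
print(match_score(record_1, record_3), is_probable_match(record_1, record_3))
```

Sharing an address and date of birth clears the threshold even though the names differ, while sharing only a common name does not.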

Data cleansing

Data cleansing involves detecting and correcting corrupt or inaccurate data. An example would be removing the value of ‘John’ from a column called ‘Age’.
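The ‘John’ in the ‘Age’ column example could be handled with a check like the one below. This is a minimal sketch: the rows are made up, and the 0 to 120 plausibility range is an assumption.

```python
def clean_age(value):
    """Return the age as an int, or None if the value is not a plausible age."""
    try:
        age = int(str(value).strip())
    except ValueError:
        # Values like 'John' cannot be read as a whole number.
        return None
    # Assumption: ages outside 0-120 are treated as data errors.
    return age if 0 <= age <= 120 else None


# Hypothetical raw rows containing a few bad values.
raw_rows = [
    {"name": "Ana", "age": "34"},
    {"name": "Raj", "age": "John"},
    {"name": "Li", "age": " 51 "},
    {"name": "Bo", "age": "-4"},
]

clean_rows = [{**row, "age": clean_age(row["age"])} for row in raw_rows]
print(clean_rows)
```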

Data enrichment

The process of creating a richer view of each customer record by adding data from external sources. Lexer enriches each customer record with valuable data from partners like Experian, Roy Morgan, and Mastercard to give you extra data points like their Mosaic profile or purchasing habits.

Extract, transform, and load (ETL)

ETL is a process used in data warehousing to prepare data for use in reporting or analytics. The data is taken from somewhere, shaped, and loaded into a database. It’s one of the initial stages in Lexer’s data onboarding process, which cleans client data before it is loaded as attributes and identities into our platform.
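The three steps can be sketched with nothing but Python’s standard library. The CSV content, column names and table layout below are hypothetical, and an in-memory SQLite database stands in for a real warehouse. This is not Lexer’s onboarding pipeline, just the general shape of the process.

```python
import csv
import io
import sqlite3

# Hypothetical raw export with inconsistent formatting and a missing email.
RAW_CSV = """email,total_spend
Jane.Doe@Example.com , 120.50
sam@example.com,80
,15.00
"""


def extract(text):
    """Extract: read the raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: normalize emails, convert spend to numbers, drop unusable rows."""
    cleaned = []
    for row in rows:
        email = row["email"].strip().lower()
        if not email:
            continue  # no identifier, so the row cannot be attached to a customer
        cleaned.append((email, float(row["total_spend"])))
    return cleaned


def load(rows, conn):
    """Load: write the shaped rows into a database table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customers (email TEXT PRIMARY KEY, total_spend REAL)"
    )
    conn.executemany("INSERT OR REPLACE INTO customers VALUES (?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
loaded = conn.execute(
    "SELECT email, total_spend FROM customers ORDER BY email"
).fetchall()
print(loaded)
```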

Housing data

Data lakes

Data Lakes store data in its raw format: unstructured, inconsistent and not easily queried. Companies can build data lakes by using Infrastructure-as-a-Service (IaaS) clouds including Amazon Web Services (AWS) and Microsoft Azure.

Data warehouse

A massive database where the structure is defined before the data is captured. Think a ginormous Excel spreadsheet, with rows and column titles specified in advance. Data Warehouses are easier to analyze than data lakes but generally require technical data skills and specialized software.

Analyzing data

Machine learning

Machine learning is a method of data analysis where systems can learn from data, identify patterns and make decisions with minimal human intervention. Two common examples are regression analysis, which looks at trends in data to predict what happens next, and classification, which predicts the class of a given data point (e.g. this email matches spam filters, so I will send it to your spam folder).

Customer churn modeling

Churn modeling is a method of identifying which data points indicate that someone is likely to stop buying from you, in other words, their risk of churning. It’s a classic application of predictive techniques like regression analysis and classification, and it helps you take action to prevent churn and improve retention rates.

Lifetime value

Lifetime value is how much money a customer has spent with you, adding up all spend across channels and removing discounts and refunds. This can be a challenge if you don’t have all transaction data in one place – this is where a CDP comes in really handy. Sometimes lifetime value can also mean how much a customer will spend with you in the future. Click here to learn how to measure customer lifetime value.
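The historical version of the calculation is a sum per customer. A minimal sketch, assuming each transaction records a gross amount, a discount and a refund (the field names and figures are made up):

```python
from collections import defaultdict

# Hypothetical transactions from two channels.
transactions = [
    {"customer": "C1", "channel": "online", "amount": 120.0, "discount": 20.0, "refund": 0.0},
    {"customer": "C1", "channel": "in-store", "amount": 80.0, "discount": 0.0, "refund": 30.0},
    {"customer": "C2", "channel": "online", "amount": 50.0, "discount": 5.0, "refund": 0.0},
]


def lifetime_value(transactions):
    """Sum each customer's spend across channels, net of discounts and refunds."""
    totals = defaultdict(float)
    for t in transactions:
        totals[t["customer"]] += t["amount"] - t["discount"] - t["refund"]
    return dict(totals)


ltv = lifetime_value(transactions)
print(ltv)  # {'C1': 150.0, 'C2': 45.0}
```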

RFM model (Recency, frequency & monetary model)

RFM modeling ranks customers by the recency and frequency of their purchases and how much they’ve spent with you in the past. It’s great for identifying high-value customers for loyalty campaigns and re-engaging lapsed customers.
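Here is one simple way to build the three RFM measures and rank customers by them. The orders are made up, and the ranking rule (most recent first, then most frequent, then highest spend) is a deliberately simple assumption. Many RFM models instead score each dimension separately, for example from 1 to 5, and combine the scores.

```python
from datetime import date


def rfm_table(orders, today):
    """Summarize each customer's recency (days), frequency and monetary value."""
    summary = {}
    for customer, order_date, amount in orders:
        s = summary.setdefault(
            customer, {"last": order_date, "frequency": 0, "monetary": 0.0}
        )
        s["last"] = max(s["last"], order_date)
        s["frequency"] += 1
        s["monetary"] += amount
    return {
        customer: {
            "recency_days": (today - s["last"]).days,
            "frequency": s["frequency"],
            "monetary": s["monetary"],
        }
        for customer, s in summary.items()
    }


def rank_customers(table):
    """Assumed rule: most recent first, then most frequent, then highest spend."""
    return sorted(
        table,
        key=lambda c: (
            table[c]["recency_days"],
            -table[c]["frequency"],
            -table[c]["monetary"],
        ),
    )


# Hypothetical orders: (customer, order date, amount).
orders = [
    ("C1", date(2019, 4, 20), 60.0),
    ("C1", date(2019, 3, 1), 40.0),
    ("C2", date(2018, 11, 5), 300.0),
    ("C3", date(2019, 4, 28), 25.0),
]

table = rfm_table(orders, today=date(2019, 5, 2))
ranking = rank_customers(table)
print(ranking)  # ['C3', 'C1', 'C2']
```

In this toy data, C2 has spent the most but has not bought in months, which makes them a candidate for a re-engagement campaign rather than a loyalty one.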


Lookalike audiences

A way of taking an audience and expanding it to include people with similar qualities. Lookalikes may be used for prospecting and to ensure you’re reaching the largest and most relevant audience possible.

Attribution model

An attribution model is a way of deciding how much credit each touchpoint gets for the leads and conversions it helped to bring in. Google Analytics’ Last Interaction model is an example of this: it assigns 100% of the credit to the final touchpoint (e.g. a click) before a conversion.
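The last-interaction rule is easy to express in code. This is a sketch of the rule as described above, not of Google Analytics’ implementation, and the channel names are made up.

```python
def last_interaction_credit(touchpoints):
    """Give 100% of the conversion credit to the final touchpoint in the journey."""
    if not touchpoints:
        return {}
    credit = {channel: 0.0 for channel in touchpoints}
    credit[touchpoints[-1]] = 1.0
    return credit


# Hypothetical customer journey, in order, ending in a conversion.
journey = ["paid_social", "email", "paid_search"]
credit = last_interaction_credit(journey)
print(credit)  # {'paid_social': 0.0, 'email': 0.0, 'paid_search': 1.0}
```

Other models change only how the credit is shared out, for example splitting it evenly across all touchpoints.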


Python

One of the more accessible coding languages, Python is really popular in data science and machine learning. Python is used by major tech companies like Google, Instagram, Netflix, and Dropbox.

It’s a great way to query and transform data, and Lexer uses it in our ETL process.

Structured Query Language (SQL)

SQL is an abbreviation for structured query language and pronounced either see-kwell or as separate letters. SQL is a standardized query language for requesting information from a database.
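For a feel of what a SQL request looks like, the sketch below runs one query against a small in-memory SQLite database from Python. The table name, columns and figures are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("C1", 60.0), ("C1", 40.0), ("C2", 300.0)],
)

# The SQL itself: total spend per customer, biggest spender first.
query = """
SELECT customer, SUM(amount) AS total_spend
FROM orders
GROUP BY customer
ORDER BY total_spend DESC
"""

rows = conn.execute(query).fetchall()
print(rows)  # [('C2', 300.0), ('C1', 100.0)]
```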

Using data in your day-to-day

Audience / customer onboarding

Audience onboarding involves uploading your customer data to an outside platform like Facebook and having them match it to their database.

Programmatic advertising

Programmatic advertising is the automated process of buying and selling ad space using a few tools including a Demand Side Platform (DSP). To learn more about how a DSP fits into your marketing stack, check out Navigating Martech: DSP.

Single customer view

A comprehensive view of a customer across all channels. Single customer view is the end goal of bringing various customer data points together. A CDP achieves this rapidly and allows marketers and customer service teams to use this data to deliver personalization and contextualized customer care.

Multichannel marketing

In multichannel marketing, a brand may use different channels to interact with customers, but each channel is managed separately and with a different strategy.

Omnichannel marketing

Omnichannel marketing seamlessly connects all consumer touchpoints to create a consistent and progressive customer experience. It is centered around customer-centric measurements like lifetime value and loyalty, and you need to measure key omnichannel metrics to ensure a successful omnichannel strategy. Excited? Read our guide to the Top 5 benefits of omnichannel marketing.

Cost per click

Cost Per Click (CPC) refers to the actual price you pay for each click in your pay-per-click (PPC) marketing campaigns. CPC is one of the most commonly tracked indicators of success for advertising campaigns. Click here to learn more about how to measure and improve your marketing effectiveness.


Cost per thousand impressions

CPM means cost per thousand impressions (the M stands for mille, Latin for thousand), a marketing term used to denote the price of 1,000 ad impressions online. CPM is a commonly used marker of success when comparing campaign results.

If a website publisher charges $2.00 CPM, that means an advertiser must pay $2.00 for every 1,000 impressions of its ad.
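The arithmetic behind CPM and CPC fits in two small functions. The campaign figures below are made up for illustration.

```python
def cpm_cost(impressions, cpm):
    """Total cost of a campaign priced per 1,000 impressions."""
    return impressions / 1000 * cpm


def cost_per_click(total_cost, clicks):
    """Average price paid for each click."""
    if clicks == 0:
        raise ValueError("cost per click is undefined with zero clicks")
    return total_cost / clicks


# Hypothetical campaign: 250,000 impressions at a $2.00 CPM, producing 400 clicks.
total = cpm_cost(250_000, 2.00)
cpc = cost_per_click(total, 400)
print(total, cpc)  # 500.0 1.25
```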

Search Engine Marketing (SEM)

Search Engine Marketing involves using paid advertisements in search results to increase visibility. Companies bid on keywords that they think their customers might type into Google when looking for their product or service, so their ads appear in search results.

Service Level Agreement (SLA)

A service level agreement (SLA) is a contract between a service provider and the end-user that defines the level of service expected.

SLA performance is an important measure of customer satisfaction, and we provide it simply and quickly through Lexer's CDP-powered customer service tool.


When it comes to Martech, the number of tools and abbreviations out there can be pretty overwhelming, which is why we developed Navigating Martech: The Complete Guide to the key platforms in the industry today. For now, let’s dive into five commonly used systems.

Customer Relationship Management (CRM)

Manages the sales and service history with every known customer. Originating from 1:1 sales workflows, a CRM can provide cross-channel contact history and servicing tools.

Data Onboarder

Expert in connecting email addresses (PII – known customers) to cookies (unknown prospects), so marketers can create addressable segments to target or suppress across the digital advertising ecosystem.

Tag Management System (TMS)

Manages the integration of tags from third-party software into owned digital properties.

Data Management Platform (DMP)

A DMP provides a centralized dataset that aggregates cookie browsing behavior (unknown prospects) to create large, de-identified audiences for ad targeting across digital channels.

Dynamic Creative Optimisation (DCO)

A system that analyses multiple data points on each visitor or email recipient and selects the ideal creative to serve from a library of dynamic or pre-created messages – all in real-time.

Key Players

CDP Institute

The Customer Data Platform Institute is a vendor-neutral organization dedicated to helping marketers manage customer data. Have a read of Lexer's content on their guest blog here.

CDP Resource

Led by data rockstar Todd Belcher, the CDP Resource is another great read for understanding the increasingly crowded Martech space. We were thrilled to be named as one to watch by these guys, and you can read their recent spotlight on us here.


We’re passionate about data compliance, security, and management. We’re also certified and regularly audited and love helping our clients do the same. To learn more about our approach read our Privacy and Information Security policy.


GDPR

The General Data Protection Regulation (GDPR) is a European Union regulation designed to give individuals greater control of their personal data. It is centered on individuals and places a hefty responsibility on organizations to provide greater transparency and accountability.


Infosec

Short for information security, Infosec refers to the processes and tools companies use to protect their data.

ISO 27001

ISO 27001 is a global information security standard for an information security management system or ISMS. (We’re certified!)


SFTP

SFTP stands for SSH File Transfer Protocol (often called Secure File Transfer Protocol) and is a way of safely sending data over a secure connection. (If you’re sending PII or sensitive customer data over email, you may be breaking laws and you are putting your customers’ data at risk.)


SOC 2

SOC 2 is an audit that checks you’re securely managing your data. It focuses specifically on controls around unusual system activity (for example, logins from unusual devices or locations), authorized and unauthorized system configuration changes, and user access levels.

Amazon S3

A cloud computing storage option that groups huge amounts of data into buckets that you can query through the Amazon Web Services (AWS) API.

Slay the jargon dragons at your company

We hope our data dictionary helps you slay the jargon dragon at your company, leading to more meaningful and inclusive discussions about how data can drive value at every level.

To learn more about making data part of your every day, have a read of the 2018 Data Culture Study. You can also dive even deeper into the martech space with Navigating Martech: The Complete Guide.

Elizabeth Burnam
Content Marketing Specialist
Elizabeth Burnam is a content marketer and a poet at heart. She has a degree in Professional Writing and experience developing high-impact marketing assets for a broad range of industries. Outside of work, she enjoys reading, painting, people-watching, and exploring the natural wonders of Vermont.