How to Handle PII When Using AI: Best Practices

Quick Definition: Personally Identifiable Information (PII) is any data that can identify a person, such as a name, address, or Social Security number. Linked or linkable data, such as a birth date, gender, or location, can also identify a person when combined with other information.
AI tools are awesome, and they've changed the way we work. They automate boring tasks, analyze huge datasets without breaking a sweat, and generate decent code in seconds. The problem is that if you aren't careful, you could be sharing sensitive information. It only takes one copy-and-paste into a public AI system to leak data accidentally.
Sharing PII with an AI system can be a really serious problem. Regulators enforce laws like HIPAA and Europe's GDPR with massive fines every year, and the potential cost of a fine isn't worth the risk. This guide will help you figure out what is safe to enter into an AI app, and what you definitely shouldn't.
What Is PII, and Why Does It Matter?
PII is information that can be used to identify a unique individual, but it includes more than a person's name or Social Security number.
Direct Identifiers: These are identifiers that point directly to a specific person, such as full names, Social Security numbers, phone numbers, email addresses, or passport information.
Indirect Identifiers: This kind of information doesn't mean much on its own, but it can become PII when linked with other data. Think of things like birth dates, ZIP codes, medical records, and IP addresses.
It might seem harmless to paste data into an AI system that doesn't directly name a person, but there's more to it than that. Research conducted in 2000, using 1990 census data, found that a birth date, gender, and ZIP code were sufficient to uniquely identify 87% of the US population. Later research showed that supposedly de-identified datasets can often be re-identified once enough extra data points are linked together.
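To make that concrete, here's a minimal Python sketch, using toy, made-up records, that counts how many rows in a "de-identified" dataset are still unique on just birth date, gender, and ZIP code. It's the same idea behind a k-anonymity check.

```python
from collections import Counter

# Toy "de-identified" records: no names, but quasi-identifiers remain.
records = [
    {"birth_date": "1984-03-12", "gender": "F", "zip": "97204"},
    {"birth_date": "1984-03-12", "gender": "F", "zip": "97209"},
    {"birth_date": "1991-07-01", "gender": "M", "zip": "97204"},
    {"birth_date": "1984-03-12", "gender": "F", "zip": "97204"},
]

# Count how many records share each (birth_date, gender, zip) combination.
combos = Counter((r["birth_date"], r["gender"], r["zip"]) for r in records)

# Any combination that appears exactly once points at a single person,
# so those rows fail even the weakest (k=2) anonymity threshold.
unique_rows = [combo for combo, count in combos.items() if count == 1]
print(f"{len(unique_rows)} of {len(records)} records are uniquely identifiable")
```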
PII really matters, especially for businesses that handle data. Data breaches can expose PII and reveal enough information about customers, suppliers, or employees to make them targets for phishing attempts or identity theft. Leaked PII also erodes customer trust and inflicts reputational damage that companies find hard to recover from.
Then there are legal issues to worry about. GDPR and HIPAA both impose eye-watering fines on companies that expose personal or medical data. Even worse, PII that gets fed into a public AI platform could end up as part of the training data for future AI models.
How AI Interacts with PII
AI systems don't just read your data and then forget it. Data gets processed, stored, and sometimes shared with other systems. AI works with data in ways you might not have even thought of, which makes sharing PII with an AI tool a serious security risk.
AI Data Processing
When you enter a prompt into a service like ChatGPT, the data you send is processed immediately, and you receive a response. But during that conversation, your data could be doing a lot more than you think: it could be held in processing queues, retained in training datasets indefinitely, or even transmitted across servers in different countries.
Some major AI companies say that they don't train on your data, or they offer an option to exclude your data from training. Others are not that clear about it, which is why you should always read and understand the fine print of your user agreement, regardless of which AI provider you choose.
Storage and Retention Risks
Some AI systems retain data longer than they should (from a compliance and regulatory perspective). Using a local AI model reduces some of these risks, but your conversation histories that contain PII could still be stored on your local machine. These conversations will remain on your system until you manually delete them, and the longer PII data is stored, the higher the risk of a data breach.
Public AI tools also connect to other systems and services through complex integrations. Whenever you enter PII into an AI platform, you risk that data being duplicated and stored across different systems, and each new connection in the chain is a potential point of leakage.
How to Protect PII in AI Systems
Protecting PII in an AI system isn't a single fix that solves everything. It takes multiple layers of protection, and you need to stay on top of all of them.
Minimize Data Collection
The best way to keep AI tools from using your PII is to never enter it into a system you don't control. If you plan on using an AI system with sensitive data, be fully aware of its data collection policies and retention settings, and before you send anything, ask yourself whether you really need to share PII to get the results you want.
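A practical guardrail is a quick scan of every prompt before it leaves your machine. The sketch below is illustrative only; the regex patterns are deliberately naive, and real PII detection should use a vetted DLP tool or library rather than a handful of regexes.

```python
import re

# Naive, illustrative patterns only. Real PII detection needs a vetted
# library or DLP service; these regexes will miss plenty.
PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}

def pii_found(prompt: str) -> list[str]:
    """Return the PII types detected in a prompt, if any."""
    return [name for name, pattern in PII_PATTERNS.items() if pattern.search(prompt)]

prompt = "Follow up with Jane at jane.doe@example.com or 503-555-0142."
hits = pii_found(prompt)
if hits:
    print(f"Blocked: prompt appears to contain {', '.join(hits)}")
```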
Anonymize and Mask Data
Before you enter data into a public AI system, anonymize it first by using a trusted, local AI instance or approved enterprise model that you are allowed to use with PII. Data anonymization replaces information that could identify an individual with synthetic values. A person’s name is assigned a label like ‘Employee_A’, and contact information is replaced with randomized numbers or removed altogether.
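Here's a minimal pseudonymization sketch in that spirit. It assumes you already know which names appear in the text; reliable name detection in free text is a much harder problem, and real pipelines lean on dedicated redaction tools.

```python
import re

def pseudonymize(text: str, known_names: list[str]) -> tuple[str, dict[str, str]]:
    """Replace known names with stable labels like 'Employee_A' and
    redact phone-number-shaped strings entirely."""
    labels = (f"Employee_{letter}" for letter in "ABCDEFGHIJKLMNOPQRSTUVWXYZ")
    mapping: dict[str, str] = {}
    for name in known_names:
        label = next(labels)
        mapping[name] = label
        text = text.replace(name, label)
    # Drop contact numbers rather than trying to generate fake ones.
    text = re.sub(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b", "[REDACTED]", text)
    return text, mapping

masked, mapping = pseudonymize(
    "Maria Chen (503-555-0142) approved the refund for Raj Patel.",
    known_names=["Maria Chen", "Raj Patel"],
)
print(masked)   # Employee_A ([REDACTED]) approved the refund for Employee_B.
print(mapping)  # keep the mapping locally to interpret results later
```

Keeping the mapping on your side means you can still make sense of the AI's output without the AI ever seeing a real name.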
Limit Data Retention
If you have control over data retention on a local AI system, make sure you delete data as soon as you're done with it. If you use local AI tools like LM Studio or Ollama, check how session history is handled by default so you aren't keeping data when you shouldn't.
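If your local tool stores conversations as files, a small scheduled cleanup script can enforce a retention window. The directory below is a hypothetical placeholder; the real location varies by tool, version, and operating system, so confirm it before pointing a deletion script at anything.

```python
import time
from pathlib import Path

# Hypothetical path: check where your local AI tool actually stores
# chat history before running this against real files.
HISTORY_DIR = Path.home() / ".local-ai-tool" / "conversations"
MAX_AGE_DAYS = 30

if HISTORY_DIR.exists():
    cutoff = time.time() - MAX_AGE_DAYS * 86_400  # 86,400 seconds per day
    for chat_file in HISTORY_DIR.glob("*.json"):
        if chat_file.stat().st_mtime < cutoff:
            chat_file.unlink()  # permanently deletes the stale conversation
            print(f"Deleted {chat_file.name}")
```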
Use Secure AI Platforms
Before you set up your AI system, ensure it follows security best practices. Look for features such as encryption of data in transit and at rest, and for local deployment options where possible. The AI tool you choose needs to comply with whichever regulations apply to your industry, and it must let you delete data permanently.
Free AI tools don't always follow best practices, which means they could be training on your data, sharing it with third parties, or storing it indefinitely. Enterprise AI platforms are pricey, but they are much better at protecting your data because they give you some control over how it is handled.
Restrict Access and Permissions
Limit access to AI systems that process PII, and use controls such as role-based access that grant or deny permissions based on user profiles. Users who only need access for occasional tasks should have that access revoked when they're done.
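Here's a minimal sketch of role-based permissions with time-boxed grants, so occasional access lapses on its own instead of waiting for manual cleanup. In production you'd lean on your identity provider's built-in RBAC rather than hand-rolled checks; the roles and actions below are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# Illustrative roles and actions; adapt to your own platform.
ROLE_PERMISSIONS = {
    "analyst": {"query_model"},
    "admin": {"query_model", "view_logs", "manage_users"},
}

# Each user gets a role plus an expiry, so temporary access revokes itself.
USER_GRANTS = {
    "jordan": ("analyst", datetime.now(timezone.utc) + timedelta(days=7)),
    "sam": ("admin", datetime.max.replace(tzinfo=timezone.utc)),
}

def can(user: str, action: str) -> bool:
    grant = USER_GRANTS.get(user)
    if grant is None:
        return False
    role, expires = grant
    if datetime.now(timezone.utc) >= expires:
        return False  # grant lapsed: access is denied by default
    return action in ROLE_PERMISSIONS.get(role, set())

print(can("jordan", "query_model"))  # True, for the next 7 days
print(can("jordan", "view_logs"))    # False: analysts can't read logs
```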
Multi-Factor Authentication (MFA) is a must-have security feature for logging into your AI platform, so make sure it is enabled and ready to go before you make an AI system available company-wide.
How to Prompt and Input Data Responsibly
How you interact with your AI has knock-on effects for the safety of your data. Here's how to prompt and input data the right way:
Avoid Sharing PII in Prompts: Never paste names, phone numbers, or medical records into an AI session unless you know the system is under your control and the data can be deleted when you're done.
Use Test Data: AI models do a good job of generating convincing synthetic or dummy data. If you are testing an AI model and need data that simulates PII, create a prompt that generates fake names, phone numbers, and other required data, or use a data-generation library (see the sketch after this list).
Monitor AI Logs: Review your logs regularly and look for suspicious patterns that could indicate insider threats or compromised accounts accessing your AI system.
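Prompting a model for dummy data works, but you don't strictly need an AI for it. A library like Faker can generate the same kind of synthetic records deterministically; this sketch assumes you've run `pip install faker`.

```python
from faker import Faker  # third-party library: pip install faker

fake = Faker()
Faker.seed(42)  # seed it so test runs produce the same dummy records

# These records look like PII but belong to no real person.
test_records = [
    {
        "name": fake.name(),
        "email": fake.email(),
        "phone": fake.phone_number(),
        "zip": fake.postcode(),
    }
    for _ in range(3)
]
for record in test_records:
    print(record)
```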
AI, Compliance, and Legal Considerations: What to Know
Regulations around AI and compliance are changing fast. To stay compliant, you need to keep up. Here's how:
Know Your Regulations: Every industry has its own rules that must be followed. Healthcare must follow HIPAA; companies that process European citizens' data must follow GDPR; and financial institutions face multiple overlapping regulations.
Vendor Compliance: The AI company that you use has to comply with certain standards depending on the kind of data that you want them to handle. If you plan to use sensitive data on their platform, you need to know where the data is stored, how long it is retained, and what access controls are in place.
Maintain Documentation: Record all data usage by your AI systems. This includes records of the PII you collect, why you collect it, and how long you plan to store it, plus documentation of your security measures, incident response procedures, and data handling. It isn't fun to maintain, but it really helps during audits, so keeping it up to date is essential.
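What that documentation looks like is up to your compliance program. As one illustrative option, you could keep records of processing as structured data; the field names below are assumptions, not a mandated GDPR or HIPAA schema.

```python
from dataclasses import dataclass
from datetime import date

# Illustrative schema only; align the fields with whatever your
# compliance program and regulators actually require.
@dataclass
class ProcessingRecord:
    dataset: str
    pii_fields: list[str]
    purpose: str
    legal_basis: str
    retention_days: int
    last_reviewed: date

record = ProcessingRecord(
    dataset="support_tickets",
    pii_fields=["customer_name", "email"],
    purpose="Summarize tickets with an approved enterprise AI model",
    legal_basis="legitimate interest",
    retention_days=90,
    last_reviewed=date(2025, 1, 15),
)
print(record)
```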
How to Build a Culture of Data Privacy
An organization's culture is carried by its people. For AI and PII safety, that means helping employees understand why protecting personal data matters. These steps will help ensure your culture is part of the solution:
Employee Training: Employee awareness should always be the primary focus. You can't just schedule training once a year; data privacy needs to be a topic that comes up regularly in meetings, memos, and emails.
Regular Audits: Auditing is a huge part of AI safety. Check for PII in your data and audit user permissions regularly. Ensure that each person's access level remains the right fit for their work.
Transparency with Users: Users who log in to your systems need to know what data you collect and how you use it. Whether it is an internal AI system employees use for work tasks or an AI service you offer to users, transparency is essential to maintain trust.
Conclusion
Handling PII in AI systems has become a normal part of business operations. As more people adopt these systems, the risk of data leaks and exposure of confidential information continues to grow. Regulations are catching up, and staying compliant is the price of staying in the market.
To make the most of this new technology, you need to strike a balance between innovation and data responsibility. Using AI doesn't mean saying goodbye to data privacy; if anything, privacy matters more than ever. Put the right procedures and safety measures in place before you roll out AI within your organization or offer it to customers as a service.
Want to learn more about AI compliance? Get started with AI Compliance training to build skills that protect both your organization and the people who trust you with their data.
AI Compliance Checklist
Complete this checklist before pasting anything into an AI tool:
- PII required? Confirm the task actually needs personal data before you share any.
- Tool risk assessment: check the provider's training, retention, and data-sharing policies.
- Data sanitization: anonymize or mask names, contact details, and other identifiers.
- Technical & account controls: MFA enabled and role-based access in place.
- Organizational compliance: applicable regulations identified and documentation up to date.
Final Gut-Check Rule
When in doubt, do NOT paste it. There is almost always a safe way to achieve the same result using synthetic or heavily anonymized data.