Text Redaction
The Text Redaction feature of SipPulse AI identifies and redacts (masks or replaces) sensitive information in text. It is intended for organizations that handle personal data and must comply with data privacy regulations such as GDPR, HIPAA, and LGPD.
Using pattern recognition and configurable rules, you can protect confidential data before storing, processing, or sharing it.
Interactive Playground
The Text Redaction Playground offers an intuitive interface to experiment with and validate redaction capabilities in real time:
- Language Selection: Choose the language of the text to be processed. The language selection influences the sensitive data entities available for redaction.
- Text Input: Enter or paste the text you want to redact.
- Entity Selection: Select the specific entities to be detected and redacted (e.g., CPF, Email, Name). The available entities are adapted according to the selected language.
- Numeric Sequence Configuration: For the "Number" entity, set the minimum length of the digit sequence to be considered for redaction (e.g., redact only numbers with 3 or more digits).
- Search and Replace Rules:
  - Define specific text patterns (literals or basic regular expressions, depending on the Playground implementation) to search for in the text.
  - Specify the replacement text for each pattern found.
  - Add multiple rules for complex or customized redaction scenarios.
- Processing and Visualization: Click to redact and instantly see the redacted result, allowing quick adjustments to the settings.
- Code Generation: Get code samples in cURL, Python, and JavaScript, pre-configured with the parameters used in the Playground, making API integration easier.
The Playground is an essential tool to understand the behavior of the redaction engine, test different configurations, and prepare for integration into your applications.
Redaction Entities by Language
The redaction engine supports different sets of detectable entities depending on the text language.
Portuguese (pt)
The following entities can be redacted in Portuguese texts:
- EMAIL: Email addresses (e.g., usuario@dominio.com → [EMAIL])
- URL: Web addresses and links (e.g., https://site.com.br → [URL])
- CNPJ: Brazilian company registration numbers (e.g., 00.000.000/0001-00 → [CNPJ])
- CPF: Brazilian individual taxpayer numbers (e.g., 123.456.789-00 → [CPF])
- PERSON: Person names (e.g., João da Silva → [PERSON])
- CREDIT_CARD_NUMBER: Credit card numbers (e.g., 4111111111111111 → [CREDIT_CARD_NUMBER])
- DATE_TIME: Dates and times (e.g., 14/05/2025 10:30 → [DATE_TIME])
- LOCATION: Place names, cities, states, countries (e.g., São Paulo → [LOCATION])
- IP_ADDRESS: Internet Protocol addresses (e.g., 192.168.0.1 → [IP_ADDRESS])
- NUMBER: Numeric sequences, configurable by length (parameter sequence).
English (en)
The following entities can be redacted in English texts:
- EMAIL_ADDRESS: Email addresses (e.g., user@domain.com → [EMAIL_ADDRESS])
- URL: Web addresses and links (e.g., https://website.com → [URL])
- US_SSN: US Social Security Numbers (e.g., 000-00-0000 → [US_SSN])
- CREDIT_CARD_NUMBER: Credit card numbers (e.g., 4111111111111111 → [CREDIT_CARD_NUMBER])
- PERSON: Person names (e.g., John Doe → [PERSON])
- DATE_TIME: Dates and times (e.g., May 14, 2025 10:30 AM → [DATE_TIME])
- LOCATION: Place names, cities, states, countries (e.g., New York → [LOCATION])
- IP_ADDRESS: Internet Protocol addresses (e.g., 192.168.0.1 → [IP_ADDRESS])
- NUMBER: Numeric sequences, configurable by length (parameter sequence).
To get the exact and up-to-date list of entities supported for a specific language, use the API endpoint /v1/anonymize/entities/{language}.
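Because the two lists use different names for some entities (EMAIL in Portuguese, EMAIL_ADDRESS in English), code that handles both languages may want to translate a generic request into the names valid for each one. The following is a minimal sketch, not part of the SipPulse AI API: the table is a snapshot copied from the lists in this section, and the names SUPPORTED_ENTITIES, ALIASES, and resolve_entities are hypothetical. The entities endpoint remains the authoritative source.

```python
# Snapshot of the entity lists documented above (assumption: may lag the API).
SUPPORTED_ENTITIES = {
    "pt": ["EMAIL", "URL", "CNPJ", "CPF", "PERSON", "CREDIT_CARD_NUMBER",
           "DATE_TIME", "LOCATION", "IP_ADDRESS", "NUMBER"],
    "en": ["EMAIL_ADDRESS", "URL", "US_SSN", "CREDIT_CARD_NUMBER", "PERSON",
           "DATE_TIME", "LOCATION", "IP_ADDRESS", "NUMBER"],
}

# Hypothetical aliases so callers can use either spelling of the email entity.
ALIASES = {
    "pt": {"EMAIL_ADDRESS": "EMAIL"},
    "en": {"EMAIL": "EMAIL_ADDRESS"},
}


def resolve_entities(language: str, wanted: list[str]) -> list[str]:
    """Return the requested entities that are listed for the given language."""
    if language not in SUPPORTED_ENTITIES:
        raise ValueError(f"No entity list recorded for language {language!r}")
    supported = SUPPORTED_ENTITIES[language]
    aliases = ALIASES.get(language, {})
    resolved: list[str] = []
    for name in wanted:
        name = aliases.get(name, name)  # translate to the per-language name
        if name in supported and name not in resolved:
            resolved.append(name)       # keep order, skip duplicates
    return resolved


if __name__ == "__main__":
    print(resolve_entities("en", ["EMAIL", "CPF", "PERSON"]))
    # ['EMAIL_ADDRESS', 'PERSON']  (CPF is not listed for English)
    print(resolve_entities("pt", ["EMAIL", "CPF", "PERSON"]))
    # ['EMAIL', 'CPF', 'PERSON']
```

Entities that are silently dropped here (such as CPF for English text) could instead be logged or rejected, depending on how strict the calling application needs to be.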
Consuming via REST API
Integrate text redaction directly into your applications and workflows.
List Available Entities by Language
To check which redaction entities can be detected for a specific language:
Endpoint: GET /v1/anonymize/entities/{language}
Path Parameter:
- language (string, required): The language code (e.g., pt, en, es).
# Example to get entities for Portuguese
# The api-key header uses double quotes so the shell expands $SIPPULSE_API_KEY
curl -X GET 'https://api.sippulse.ai/v1/anonymize/entities/pt' \
  -H "api-key: $SIPPULSE_API_KEY" \
  -H 'Accept: application/json'
import os
import json

import requests


def get_anonymization_entities(language: str, api_key: str) -> list | None:
    """Fetches available redaction entities for a language."""
    url = f"https://api.sippulse.ai/v1/anonymize/entities/{language}"
    headers = {"api-key": api_key, "Accept": "application/json"}
    try:
        # A timeout keeps the call from hanging indefinitely on network issues
        response = requests.get(url, headers=headers, timeout=30)
        response.raise_for_status()
        return response.json()
    except requests.exceptions.HTTPError as e:
        print(f"API error: {e.response.status_code} - {e.response.text}")
    except Exception as e:
        print(f"An error occurred: {e}")
    return None


if __name__ == "__main__":
    api_key = os.getenv("SIPPULSE_API_KEY")
    if api_key:
        entities_pt = get_anonymization_entities("pt", api_key)
        if entities_pt:
            print("Entities for Portuguese (pt):")
            print(json.dumps(entities_pt, indent=2, ensure_ascii=False))
        entities_en = get_anonymization_entities("en", api_key)
        if entities_en:
            print("\nEntities for English (en):")
            print(json.dumps(entities_en, indent=2, ensure_ascii=False))
    else:
        print("Error: SIPPULSE_API_KEY environment variable not set.")
// Node.js with fetch
async function getAnonymizationEntities(language, apiKey) {
  const url = `https://api.sippulse.ai/v1/anonymize/entities/${language}`;
  const headers = { "api-key": apiKey, "Accept": "application/json" };
  try {
    const response = await fetch(url, { headers });
    if (!response.ok) {
      throw new Error(`API error: ${response.status} - ${await response.text()}`);
    }
    return await response.json();
  } catch (error) {
    console.error(`Failed to fetch entities for ${language}:`, error);
    return null;
  }
}

// Usage example:
// (async () => {
//   const apiKey = process.env.SIPPULSE_API_KEY;
//   if (apiKey) {
//     const entitiesPt = await getAnonymizationEntities("pt", apiKey);
//     if (entitiesPt) console.log("Entities PT:", JSON.stringify(entitiesPt, null, 2));
//     const entitiesEn = await getAnonymizationEntities("en", apiKey);
//     if (entitiesEn) console.log("Entities EN:", JSON.stringify(entitiesEn, null, 2));
//   } else {
//     console.error("SIPPULSE_API_KEY not set.");
//   }
// })();
Sample Response (JSON for pt):
[
"EMAIL",
"URL",
"CNPJ",
"CPF",
"PERSON",
"CREDIT_CARD_NUMBER",
"DATE_TIME",
"LOCATION",
"IP_ADDRESS",
"NUMBER"
]
Redact Text
To perform redaction on a block of text:
Endpoint: POST /v1/anonymize
Request Body (JSON):
{
  "text": "string",           // Required: The text to be redacted.
  "entities": ["string"],     // Required: List of entities to be detected and redacted. Use the strings returned by the /entities/{language} endpoint.
  "sequence": 0,              // Optional, default 0: Minimum digit sequence length for the "NUMBER" entity. If > 0, only numbers with 'sequence' or more digits are redacted. If 0 or not provided, number redaction may use a default; it does not apply when "NUMBER" is not in 'entities'.
  "language": "string",       // Required: Language code of the text (e.g., "pt", "en").
  "search_and_replace": [     // Optional: List of search and replace rules.
    {
      "search": "string",     // Text pattern to search for.
      "replace": "string",    // Replacement text.
      "case_sensitive": true  // Boolean, optional, default true: Whether the search is case sensitive.
    }
  ]
}
Response (JSON):
- Success (200 OK):
  {
    "text": "string" // The text with sensitive information redacted.
  }
- Error: JSON response with appropriate HTTP status code and error details.
# Example of redaction in Portuguese
# The api-key header uses double quotes so the shell expands $SIPPULSE_API_KEY
curl -X POST 'https://api.sippulse.ai/v1/anonymize' \
  -H "api-key: $SIPPULSE_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{
    "text": "Contato: João Silva, email joao.silva@exemplo.com.br, CPF 111.222.333-44. Telefone comercial 99999-1234 e código do cliente ABC001.",
    "entities": ["PERSON", "EMAIL", "CPF", "NUMBER"],
    "sequence": 5,
    "language": "pt",
    "search_and_replace": [
      {
        "search": "ABC001",
        "replace": "[COD_CLIENTE]",
        "case_sensitive": true
      }
    ]
  }'
import os

import requests


def redact_text(
    text: str,
    entities: list,
    language: str,
    api_key: str,
    sequence: int = 0,
    search_and_replace_rules: list | None = None,
) -> dict | None:  # API response is {"text": "string"}
    """Performs text redaction using the SipPulse AI API."""
    url = "https://api.sippulse.ai/v1/anonymize"
    headers = {"api-key": api_key, "Content-Type": "application/json"}
    payload = {
        "text": text,
        "entities": entities,
        "language": language,
        "sequence": sequence,
    }
    if search_and_replace_rules:
        payload["search_and_replace"] = search_and_replace_rules
    try:
        # json= serializes the payload; the timeout avoids hanging indefinitely
        response = requests.post(url, headers=headers, json=payload, timeout=30)
        response.raise_for_status()
        return response.json()  # Returns {"text": "redacted..."}
    except requests.exceptions.HTTPError as e:
        print(f"API error: {e.response.status_code} - {e.response.text}")
    except Exception as e:
        print(f"An error occurred: {e}")
    return None


if __name__ == "__main__":
    api_key = os.getenv("SIPPULSE_API_KEY")
    if not api_key:
        print("Error: SIPPULSE_API_KEY not set.")
    else:
        sample_text_pt = "Meu nome é Maria Oliveira, email maria.oliveira@teste.com e meu CPF é 987.654.321-00. O pedido número 87654 será entregue."
        entities_to_redact = ["PERSON", "EMAIL", "CPF", "NUMBER"]
        # Add a custom replacement rule
        custom_rules = [
            {"search": "pedido número", "replace": "protocolo", "case_sensitive": False}
        ]
        redacted_result = redact_text(
            text=sample_text_pt,
            entities=entities_to_redact,
            language="pt",
            api_key=api_key,
            sequence=5,  # Redact numbers with 5 or more digits
            search_and_replace_rules=custom_rules,
        )
        if redacted_result:
            print("--- Redacted Text ---")
            print(redacted_result.get("text"))  # Access the "text" key from the returned JSON
            # Information such as entities_found and usage is not present in the simplified response
            # print("\n--- Entities Found ---")
            # if redacted_result.get("entities_found"):
            #     for entity in redacted_result["entities_found"]:
            #         print(f"- Type: {entity['type']}, Text: \"{entity['text']}\"")
            # print(f"\nUsage: {redacted_result.get('usage')}")
// Node.js with fetch
async function redactText({
  text,
  entities,
  language,
  apiKey,
  sequence = 0,
  searchAndReplaceRules = [],
}) { // API response is {text: string}
  const url = "https://api.sippulse.ai/v1/anonymize";
  const headers = { "api-key": apiKey, "Content-Type": "application/json" };
  const payload = {
    text,
    entities,
    language,
    sequence,
    search_and_replace: searchAndReplaceRules,
  };
  try {
    const response = await fetch(url, {
      method: "POST",
      headers,
      body: JSON.stringify(payload),
    });
    if (!response.ok) {
      throw new Error(`API error: ${response.status} - ${await response.text()}`);
    }
    return await response.json(); // Returns {text: "redacted..."}
  } catch (error) {
    console.error("Failed to redact text:", error);
    return null;
  }
}

// Usage example:
// (async () => {
//   const apiKey = process.env.SIPPULSE_API_KEY;
//   if (!apiKey) {
//     console.error("SIPPULSE_API_KEY not set.");
//     return;
//   }
//   const result = await redactText({
//     text: "Contact John Doe at john.doe@example.com or call 555-0100. SSN: 000-00-1234.",
//     entities: ["PERSON", "EMAIL_ADDRESS", "US_SSN", "NUMBER"],
//     language: "en",
//     apiKey,
//     sequence: 7, // Redact only numbers with 7 or more digits
//     searchAndReplaceRules: [{ search: "Contact", replace: "Reach out to", case_sensitive: false }]
//   });
//   if (result && result.text) { // Check if result and result.text exist
//     console.log("Redacted Text:", result.text);
//   } else if (result) {
//     console.log("Result:", result); // In case the structure is unexpectedly different
//   }
// })();
Best Practices
- Be Specific with Entities: Select only the entities that are truly sensitive for your use case to avoid excessive redaction of useful information.
- Test Numeric Sequence Configuration: Adjust the sequence parameter for the "NUMBER" entity carefully. A value too low may redact irrelevant numbers, while a value too high may leave sensitive numbers exposed.
- Use search_and_replace Carefully: While powerful, search_and_replace based on literal strings or simple regex should be thoroughly tested to avoid unwanted replacements. Consider case sensitivity (case_sensitive).
- Monitor Usage: Track usage and costs associated with the redaction service through your SipPulse AI dashboard.
- Combine with Other Security Measures: Text redaction is one layer of protection. Combine it with other data security practices such as access control, encryption, and data retention policies.
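One way to test search_and_replace rules before sending them to the API is to preview them locally against sample text. The sketch below is an approximation, not the service's implementation: it assumes rules are applied in order and matched as literal strings, and the function name preview_rules is hypothetical. If the service interprets a pattern as a regular expression, its results may differ.

```python
import re


def preview_rules(text: str, rules: list[dict]) -> str:
    """Apply search_and_replace rules locally, treating 'search' as a literal."""
    for rule in rules:
        # case_sensitive defaults to true, matching the documented default
        flags = 0 if rule.get("case_sensitive", True) else re.IGNORECASE
        pattern = re.compile(re.escape(rule["search"]), flags)
        # A function replacement keeps any backslashes in 'replace' literal
        text = pattern.sub(lambda _match, repl=rule["replace"]: repl, text)
    return text


if __name__ == "__main__":
    rules = [
        {"search": "pedido número", "replace": "protocolo", "case_sensitive": False},
        {"search": "abc001", "replace": "[COD_CLIENTE]"},  # case sensitive by default
    ]
    print(preview_rules("Pedido Número 87654, código ABC001", rules))
    # protocolo 87654, código ABC001
```

In this run the second rule does not fire because "abc001" differs in case from "ABC001", which is the kind of surprise a local preview is meant to surface.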
Frequently Asked Questions (FAQ)
Is redaction reversible?
No. The redaction process replaces sensitive data with placeholders (e.g., [CPF], [EMAIL]) or with the text defined in the search_and_replace rules. SipPulse AI does not store the original data in a way that allows reversal of redaction via the API.
How does "NUMBER" redaction with sequence work?
If you include "NUMBER" in the entities list and set sequence to, for example, 3, the system will look for sequences of three or more consecutive digits and replace them with a placeholder like [NUMBER]. Numbers with fewer than sequence digits will not be affected by this specific rule.
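The rule described above can be approximated locally with a regular expression, which is useful for choosing a sequence value against sample data. This is a sketch of the described behavior only, not the engine itself: the function name redact_numbers is hypothetical, and the real service may handle separators differently or let other entities (such as CPF) take precedence over NUMBER.

```python
import re


def redact_numbers(text: str, sequence: int, placeholder: str = "[NUMBER]") -> str:
    """Replace runs of at least `sequence` consecutive digits with a placeholder."""
    if sequence < 1:
        raise ValueError("sequence must be 1 or greater for this local preview")
    # Builds a pattern such as [0-9]{3,} when sequence is 3
    pattern = re.compile(rf"[0-9]{{{sequence},}}")
    return pattern.sub(placeholder, text)


if __name__ == "__main__":
    print(redact_numbers("Sala 12, ramal 345, pedido 87654", 3))
    # Sala 12, ramal [NUMBER], pedido [NUMBER]
    print(redact_numbers("Telefone 99999-1234", 5))
    # Telefone [NUMBER]-1234
```

The second call shows why the threshold matters: the hyphen splits the phone number into a five-digit run and a four-digit run, so only the first part is replaced when sequence is 5.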