Advanced Text Generation with Large Language Models (LLMs)
Large Language Models (LLMs) are sophisticated neural networks trained on vast amounts of textual data. This training allows them to generate coherent, contextually relevant, and creative responses to textual instructions (prompts). On the SipPulse AI platform, we offer access to a diverse selection of LLMs from leading global developers such as OpenAI, Google, Anthropic, Meta, Qwen, and Deepseek, among others. Each model has distinct characteristics in terms of performance, cost, and specialization.
For a complete overview of all models, their detailed technical specifications, and the pricing structure, please refer to our official Pricing page.
Interactive Playground
The Text Generation Playground (access here) is a user-friendly web interface designed to facilitate experimentation and evaluation of each LLM's behavior:
Model Selection
- Browse and choose any LLM listed in the selector.
Parameter Configuration
- Control crucial parameters such as Temperature (randomness), Max Tokens (response size), Top P (nucleus sampling), Frequency Penalty, and Presence Penalty (repetition penalties). See the Parameter Guide for details on each parameter.
- Only the controls supported by the chosen model are displayed, ensuring precise configuration.
Prompt and Message Creation
- Define a System Message to instruct the LLM on the tone, style, persona, or specific rules it should follow in its responses.
- Insert a sequence of user and assistant messages to simulate complex conversations and test the model's ability to maintain context.
- To learn how to create effective prompts and system messages, consult our Prompt Engineering Guide for LLMs.
Execution and Visualization
- Get instant feedback. The model's responses are displayed immediately after each prompt submission.
Code Generation
- With a click on "View Code", the Playground automatically generates code snippets in cURL, Python, and JavaScript.
- These examples include the exact model and parameters you configured, ready to be copied and pasted into your projects.
- Easily select the desired language using the tabs at the top of the code modal.
The Playground is a valuable tool both for users without programming experience who want to understand the potential of LLMs and for experienced developers looking to quickly validate different configurations and models before implementation.
Consumption via REST API
Integrate the power of LLMs directly into your applications, custom scripts, and automated workflows through calls to our RESTful endpoint.
For details, see How to use the API.
# Example request to complete a text
# Export SIPPULSE_API_KEY in your environment (or replace $SIPPULSE_API_KEY inline).
# Set "stream" to true to receive the response in parts (streaming).
curl -X POST 'https://api.sippulse.ai/v1/llms/completion' \
  -H 'Content-Type: application/json' \
  -H "api-key: $SIPPULSE_API_KEY" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [
      { "role": "system", "content": "You are an AI assistant specialized in space history." },
      { "role": "user", "content": "Describe in detail the importance of the Apollo 11 mission." }
    ],
    "temperature": 0.7,
    "max_tokens": 250,
    "stream": false
  }'
import os
import requests
import json
def generate_text_completion(messages: list, model: str = "gpt-4o-mini", temperature: float = 0.7, max_tokens: int = 250, stream: bool = False) -> dict:
    """
    Calls the /v1/llms/completion endpoint to generate text with an LLM.

    Args:
        messages: List of messages (conversation history).
        model: Identifier of the model to use.
        temperature: Controls the randomness of the output.
        max_tokens: Maximum number of tokens to generate.
        stream: If true, the response is sent in parts (streaming).

    Returns:
        Dictionary containing the API response, or None on error.
    """
    api_url = "https://api.sippulse.ai/v1/llms/completion"
    api_key = os.getenv("SIPPULSE_API_KEY")
    if not api_key:
        raise ValueError("The SIPPULSE_API_KEY environment variable is not defined.")

    headers = {
        "Content-Type": "application/json",
        "api-key": api_key
    }
    payload = {
        "model": model,
        "messages": messages,
        "temperature": temperature,
        "max_tokens": max_tokens,
        "stream": stream
    }

    response = None
    try:
        response = requests.post(api_url, headers=headers, json=payload)
        response.raise_for_status()  # Raises an exception for error responses (4xx or 5xx)
        return response.json()
    except requests.exceptions.RequestException as e:
        print(f"API request error: {e}")
        if response is not None:
            print(f"Error details: {response.text}")
        return None

if __name__ == "__main__":
    convo_messages = [
        {"role": "system", "content": "You are an AI assistant specialized in space history."},
        {"role": "user", "content": "Describe in detail the importance of the Apollo 11 mission."}
    ]
    completion_result = generate_text_completion(convo_messages, model="gpt-4o-mini")
    if completion_result:
        # The response structure may vary depending on the model and whether stream=true.
        # Generally, the generated content is in completion_result['choices'][0]['message']['content'].
        print(json.dumps(completion_result, indent=2, ensure_ascii=False))
// Example using the Fetch API in Node.js or the browser
async function getTextCompletion(messages, model = "gpt-4o-mini", temperature = 0.7, maxTokens = 250, stream = false) {
  const apiUrl = "https://api.sippulse.ai/v1/llms/completion";
  const apiKey = process.env.SIPPULSE_API_KEY; // Make sure SIPPULSE_API_KEY is set in the environment

  if (!apiKey) {
    throw new Error("The SIPPULSE_API_KEY environment variable is not defined.");
  }

  try {
    const response = await fetch(apiUrl, {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        "api-key": apiKey
      },
      body: JSON.stringify({
        model,
        messages,
        temperature,
        max_tokens: maxTokens,
        stream
      })
    });

    if (!response.ok) {
      const errorBody = await response.text();
      throw new Error(`API Error: ${response.status} ${response.statusText} - ${errorBody}`);
    }

    return response.json();
  } catch (error) {
    console.error("Failed to call the completion API:", error);
    throw error;
  }
}

// Usage example
const conversationMessages = [
  { role: "system", content: "You are an AI assistant specialized in space history." },
  { role: "user", content: "Describe in detail the importance of the Apollo 11 mission." }
];

getTextCompletion(conversationMessages)
  .then(result => console.log(JSON.stringify(result, null, 2)))
  .catch(error => console.error(error));
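The stream flag in the examples above only toggles the request; consuming a streamed response requires reading the body incrementally instead of parsing a single JSON document. The sketch below is a minimal Python example that assumes the endpoint emits OpenAI-style server-sent events ("data: {...}" chunks, ending with "data: [DONE]") when stream is true; confirm the exact wire format in the API reference before relying on it.

# Hypothetical streaming sketch: assumes OpenAI-style SSE chunks when stream=true.
import os
import json
import requests

def stream_text_completion(messages, model="gpt-4o-mini"):
    api_url = "https://api.sippulse.ai/v1/llms/completion"
    api_key = os.getenv("SIPPULSE_API_KEY")
    if not api_key:
        raise ValueError("The SIPPULSE_API_KEY environment variable is not defined.")

    headers = {"Content-Type": "application/json", "api-key": api_key}
    payload = {"model": model, "messages": messages, "stream": True}

    # stream=True tells requests to read the body incrementally instead of all at once.
    with requests.post(api_url, headers=headers, json=payload, stream=True) as response:
        response.raise_for_status()
        for line in response.iter_lines(decode_unicode=True):
            if not line or not line.startswith("data:"):
                continue
            data = line[len("data:"):].strip()
            if data == "[DONE]":  # assumed end-of-stream sentinel
                break
            chunk = json.loads(data)
            # Assumed chunk shape: OpenAI-style delta with partial content.
            delta = chunk.get("choices", [{}])[0].get("delta", {}).get("content", "")
            print(delta, end="", flush=True)

stream_text_completion([
    {"role": "user", "content": "Summarize the Apollo 11 mission in two sentences."}
])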
Available Models
To keep your application up to date with the LLMs enabled for your organization, use the /v1/llms/models endpoint. This allows your application to adapt dynamically to the available models without requiring manual code updates.
# Lists all LLM models available for your API key
curl -X GET 'https://api.sippulse.ai/v1/llms/models' \
  -H "api-key: $SIPPULSE_API_KEY"
import os
import requests
import json
def list_available_models() -> dict:
    """
    Retrieves the list of LLMs available to the organization associated with the API key.
    """
    api_url = "https://api.sippulse.ai/v1/llms/models"
    api_key = os.getenv("SIPPULSE_API_KEY")
    if not api_key:
        raise ValueError("The SIPPULSE_API_KEY environment variable is not defined.")

    headers = {"api-key": api_key}

    response = None
    try:
        response = requests.get(api_url, headers=headers)
        response.raise_for_status()
        return response.json()
    except requests.exceptions.RequestException as e:
        print(f"API request error while listing models: {e}")
        if response is not None:
            print(f"Error details: {response.text}")
        return None

if __name__ == "__main__":
    models_data = list_available_models()
    if models_data:
        print("Available Models:")
        print(json.dumps(models_data, indent=2, ensure_ascii=False))
// Example using the Fetch API to list models
async function listAvailableModels() {
  const apiUrl = "https://api.sippulse.ai/v1/llms/models";
  const apiKey = process.env.SIPPULSE_API_KEY;

  if (!apiKey) {
    throw new Error("The SIPPULSE_API_KEY environment variable is not defined.");
  }

  try {
    const response = await fetch(apiUrl, {
      method: "GET",
      headers: { "api-key": apiKey }
    });

    if (!response.ok) {
      const errorBody = await response.text();
      throw new Error(`API error when listing models: ${response.status} ${response.statusText} - ${errorBody}`);
    }

    return response.json();
  } catch (error) {
    console.error("Failed to call the API to list models:", error);
    throw error;
  }
}

listAvailableModels()
  .then(models => console.log("Available Models:", JSON.stringify(models, null, 2)))
  .catch(error => console.error(error));
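As a sketch of the "dynamic adaptation" idea described above, the snippet below checks whether a preferred model is currently enabled before requesting a completion, falling back to the first model returned otherwise. It assumes the list_available_models() and generate_text_completion() helpers from the earlier Python examples are in scope, and it assumes the models response can be reduced to a flat list of model name strings; adapt extract_model_names() to the real payload returned by /v1/llms/models.

# Sketch: pick a model dynamically from /v1/llms/models before calling the completion endpoint.
import json

PREFERRED_MODEL = "gpt-4o-mini"  # illustrative choice

def extract_model_names(models_data):
    # Assumption: the payload is either a list, or a dict with the list under
    # a key such as "models" or "data". Adjust to the actual response structure.
    if isinstance(models_data, dict):
        models_data = models_data.get("models") or models_data.get("data") or []
    names = []
    for item in models_data:
        names.append(item if isinstance(item, str) else item.get("name") or item.get("id"))
    return [name for name in names if name]

models_data = list_available_models()
if models_data:
    names = extract_model_names(models_data)
    if not names:
        raise RuntimeError("No models returned by /v1/llms/models.")
    model = PREFERRED_MODEL if PREFERRED_MODEL in names else names[0]
    result = generate_text_completion(
        [{"role": "user", "content": "Hello!"}],
        model=model
    )
    print(json.dumps(result, indent=2, ensure_ascii=False))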
OpenAI SDK
For developers familiar with the official OpenAI SDK, SipPulse AI offers a simplified integration. Simply configure the baseURL of the OpenAI client to point to our compatible endpoint: https://api.sippulse.ai/v1/openai.
This allows you to utilize all the functionalities and conventions of the OpenAI SDK, while the requests are processed by the SipPulse AI infrastructure, leveraging our selection of models and optimizations.
# Example usage with the OpenAI Python SDK
import os
from openai import OpenAI

# Configure the OpenAI client to use the SipPulse AI endpoint
client = OpenAI(
    api_key=os.environ.get("SIPPULSE_API_KEY"),
    base_url="https://api.sippulse.ai/v1/openai"
)

try:
    chat_completion = client.chat.completions.create(
        model="gpt-4o-mini",  # Or any other compatible model available on SipPulse AI
        messages=[
            {"role": "system", "content": "You are a helpful assistant who loves puns."},
            {"role": "user", "content": "Tell me a joke about programming."}
        ],
        temperature=0.6,
        max_tokens=100
    )
    print(chat_completion.choices[0].message.content)
except Exception as e:
    print(f"An error occurred: {e}")
// Example usage with the OpenAI JavaScript SDK
import OpenAI from "openai";

const openai = new OpenAI({
  apiKey: process.env.SIPPULSE_API_KEY, // Your SipPulse AI key
  baseURL: "https://api.sippulse.ai/v1/openai" // SipPulse AI compatible endpoint
});

async function main() {
  try {
    const response = await openai.chat.completions.create({
      model: "gpt-4o-mini", // Or any other compatible model
      messages: [
        { role: "system", content: "You are a helpful assistant who loves puns." },
        { role: "user", content: "Tell me a joke about programming." }
      ],
      temperature: 0.6,
      max_tokens: 100
    });
    console.log(response.choices[0].message.content);
  } catch (error) {
    console.error("An error occurred:", error);
  }
}

main();
Benefit: This approach lets you keep the familiarity and conveniences of the OpenAI SDK, such as type definitions and helper methods, while your calls are routed and executed through the SipPulse AI platform. You can switch between OpenAI models and other models offered by SipPulse AI, provided they are supported by the compatibility endpoint.
Parameter Guide
Fine-tuning generation parameters is essential to shape the LLM's responses according to your needs. Understanding the impact of each parameter allows you to optimize the quality, creativity, and relevance of the outputs.
temperature
- What it is: Controls the level of randomness or "creativity" of the model's output. Higher values (e.g., 1.0) make the output more random and diverse, while lower values (e.g., 0.2) make it more deterministic, focused, and predictable.
- Common range: 0.0 to 2.0. Some models may have different ranges.
- Use cases:
  - 0.0 – 0.3: Ideal for tasks that require high precision, factual answers, and consistency, such as information extraction, direct question answering (QA), factual translation, or code generation.
  - 0.7 – 1.3: Recommended for creative tasks such as story writing, brainstorming ideas, summarizing texts with a touch of originality, or generating more natural and varied dialogues.
  - > 1.3: May lead to very incoherent, "nonsense," or excessively creative responses. Use with caution, generally for exploring extreme creativity, generating abstract text, or "breaking" repetitive patterns.
max_tokens
- What it is: Defines the maximum number of tokens (text units, which can be words, parts of words, or characters) that the model can generate in the response.
- Importance: Crucial for controlling the length of the output and managing costs, as many models charge per token.
- When to use: Adjust according to the desired response length.
  - Short, concise answers (e.g., classification, yes/no answers): use a low value (e.g., 5-50).
  - Detailed answers, articles, stories: increase the value (e.g., 500-2000), respecting the model's context limit.
- Attention: The total number of tokens (input prompt tokens + output max_tokens) cannot exceed the model's maximum context limit (e.g., 4096, 8192, or 128000 tokens, depending on the model).
frequency_penalty
- What it is: Penalizes tokens that have already appeared in the response, proportionally to how frequently they occur. Positive values decrease the likelihood of the model repeating the same words or phrases excessively.
- Common range: -2.0 to 2.0.
- When to use:
  - Positive values (e.g., 0.1 to 1.0): Useful for reducing monotony and literal repetition, making the text more diverse and interesting.
  - Negative values: May increase the repetition of certain terms, which is rarely desirable but can be used experimentally to emphasize keywords.
presence_penalty
- What it is: Penalizes tokens that have already appeared in the response, regardless of frequency. Once a token appears, it receives a fixed penalty if it reappears. Positive values encourage the model to introduce new topics or concepts.
- Common range: -2.0 to 2.0.
- When to use:
  - Positive values (e.g., 0.1 to 1.0): Useful for preventing the model from fixating on a single topic or set of ideas, encouraging the exploration of different concepts within the same response. Helps increase thematic breadth.
top_p (Nucleus Sampling)
- What it is: A sampling technique that controls the diversity of the output. The model considers only the smallest set of tokens whose cumulative probability exceeds the top_p value, and randomly chooses the next token from this "nucleus".
- Range: 0.0 to 1.0. A value of 1.0 means that all tokens are considered.
- When to use:
  - Lower values (e.g., 0.1 to 0.5): Restrict the choice to high-probability tokens, resulting in more focused, conservative, and predictable responses.
  - Higher values (e.g., 0.9 to 1.0, with 0.9 being a common value): Allow a wider range of tokens, leading to more creative and diversified but potentially less coherent or more surprising responses.
- Relationship with temperature: top_p is an alternative to temperature for controlling randomness. It is generally recommended to adjust one or the other, rather than both drastically at the same time. For example, using temperature=1.0 and top_p=0.2 can produce unexpected results. Many prefer to set temperature to 1.0 and control randomness only with top_p, or vice versa.
top_k (Top-K Sampling)
- What it is: Restricts the selection of the next token to the k most probable tokens at each generation step. The model then chooses randomly (usually weighted by their probabilities) among these k tokens.
- Range: Positive integer (e.g., 1, 10, 50).
- When to use:
  - Low values (e.g., 1 to 10): Make the output more predictable and less diversified, focusing on the most obvious tokens. top_k=1 results in purely "greedy" sampling, always choosing the most probable token, which can lead to repetitive or uninspired responses.
  - High values (e.g., 50 or more): Allow more diversity, approaching the effect of a high top_p.
- Comparison with top_p: top_p is often preferred over top_k because it dynamically adapts the number of tokens to consider based on the probability distribution, while top_k uses a fixed number. If the probability distribution is very "flat" (many tokens with similar probability), top_k can be either too restrictive or too permissive.
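To make these trade-offs concrete, here is a short sketch contrasting a "factual" and a "creative" configuration against the completion endpoint. The values are illustrative rather than recommendations, and it is assumed that top_p, frequency_penalty, and presence_penalty are accepted as top-level request fields alongside temperature and max_tokens; only pass the parameters the chosen model actually supports.

# Illustrative parameter presets (example values, not recommendations).
import os
import requests

api_url = "https://api.sippulse.ai/v1/llms/completion"
headers = {"Content-Type": "application/json", "api-key": os.getenv("SIPPULSE_API_KEY")}
question = [{"role": "user", "content": "Write about the Apollo 11 mission."}]

factual_payload = {
    "model": "gpt-4o-mini",
    "messages": question,
    "temperature": 0.2,   # low randomness: precise, consistent answers
    "max_tokens": 150     # keep the answer short
}

creative_payload = {
    "model": "gpt-4o-mini",
    "messages": question,
    "temperature": 1.0,        # more varied, "creative" output
    "top_p": 0.9,              # sample from the high-probability nucleus
    "frequency_penalty": 0.5,  # discourage literal repetition
    "presence_penalty": 0.5,   # encourage new topics
    "max_tokens": 600          # allow a longer, more elaborate answer
}

for label, payload in [("factual", factual_payload), ("creative", creative_payload)]:
    response = requests.post(api_url, headers=headers, json=payload)
    response.raise_for_status()
    print(f"--- {label} ---")
    # Response structure may vary by model; see the note in the REST API example.
    print(response.json()["choices"][0]["message"]["content"])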
The best way to master these parameters is through practical experimentation. Use the Text Generation Playground to test different combinations and observe their effects in real-time. This will allow you to develop an intuition about how to adjust them to achieve the ideal results for each specific application.