Advanced Text Generation with Large Language Models (LLMs)

Large Language Models (LLMs) are sophisticated neural networks trained on vast amounts of textual data. This training enables them to generate coherent, contextually relevant, and creative responses to textual instructions (prompts). On the SipPulse AI platform, we offer access to a diverse selection of LLMs from leading global developers such as OpenAI, Google, Anthropic, Meta, Qwen, and Deepseek, among others. Each model has distinct characteristics in terms of performance, cost, and specialization.

For a complete overview of all models, their detailed technical specifications, and the pricing structure, please refer to our official Pricing page.

Interactive Playground

The Text Generation Playground (access here) is a user-friendly web interface designed to facilitate experimentation and evaluation of each LLM's behavior.

Model Selection

  • Browse and choose any LLM listed in the selector.

Parameter Configuration

  • Control parameters such as temperature, max_tokens, top_p, and others. See the Parameter Guide for details.
  • Important: Each model supports different parameters. The Playground automatically displays only the relevant controls for the selected model.

Dynamic Parameters by Model

Reasoning models (like GPT-5, o1) use special parameters like reasoning_effort instead of temperature. Traditional models (GPT-4o, Claude) use temperature, top_p, max_tokens. Select a model in the Playground to see its available parameters.
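As a rough illustration, the shape of the request body changes with the model family. The sketch below is illustrative only: the parameter names follow the description above, but the exact accepted values (such as "medium" for reasoning_effort) are assumptions; use "View Code" in the Playground to confirm what each model accepts.

python
# Illustrative payloads only; check the Playground's generated code for the
# exact parameters each model accepts.
traditional_payload = {
    "model": "gpt-4o",  # traditional model: sampling parameters apply
    "messages": [{"role": "user", "content": "Summarize the Apollo 11 mission."}],
    "temperature": 0.7,
    "top_p": 1.0,
    "max_tokens": 250,
}

reasoning_payload = {
    "model": "gpt-5",  # reasoning model: reasoning_effort replaces temperature
    "messages": [{"role": "user", "content": "Summarize the Apollo 11 mission."}],
    "reasoning_effort": "medium",  # assumed value; consult the model's parameter list
}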

Prompt and Message Creation

  • Define a System Message to instruct the LLM on the tone, style, persona, or specific rules it should follow in its responses.
  • Insert a sequence of user and assistant messages to simulate complex conversations and test the model's ability to maintain context.
  • To learn how to create effective prompts and system messages, consult our Prompt Engineering Guide for LLMs.

Execution and Visualization

  • Get instant feedback. The model's responses are displayed immediately after each prompt submission.

Code Generation

  • With a click on "View Code", the Playground automatically generates code snippets in cURL, Python, and JavaScript.
  • These examples include the exact model and parameters you configured, ready to be copied and pasted into your projects.
  • Easily select the desired language using the tabs at the top of the code modal.

The Playground is a valuable tool both for users without programming experience who want to understand the potential of LLMs, and for experienced developers looking to quickly validate different configurations and models before implementation.

Keyboard Shortcuts

| Shortcut | Action |
| --- | --- |
| Cmd/Ctrl + Enter | Send message |
| Shift + Enter | New line (without sending) |

Conversation Management

The Playground offers controls for managing your test conversation:

  • Edit messages: Click the pencil icon to edit any sent message
  • Toggle role: Switch between "user" and "assistant" to simulate different scenarios
  • Clear conversation: Use the "Clear" button to restart the conversation
  • Multiple messages: Add multiple messages before running to test complex contexts

Model Information

The selected model card displays useful information:

  • Context window: Maximum context size in tokens
  • Pricing: Cost per million tokens (input/output)
  • "Agent" badge: Indicates the model supports tool calling

Consumption via REST API

Integrate the power of LLMs directly into your applications, custom scripts, and automated workflows through calls to our RESTful endpoint.

For details, see How to use the API.

bash
# Example request to complete a text
# Replace $SIPPULSE_API_KEY with your API key (exported in your shell).
# Set "stream" to true to receive the response in parts (streaming).
curl -X POST 'https://api.sippulse.ai/v1/llms/completion' \
  -H 'Content-Type: application/json' \
  -H "api-key: $SIPPULSE_API_KEY" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [
      { "role": "system", "content": "You are an AI assistant specialized in space history." },
      { "role": "user",   "content": "Describe in detail the importance of the Apollo 11 mission." }
    ],
    "temperature": 0.7,
    "max_tokens": 250,
    "stream": false
  }'
python
import os
import requests
import json

def generate_text_completion(messages: list, model: str = "gpt-4o-mini", temperature: float = 0.7, max_tokens: int = 250, stream: bool = False) -> dict:
  """
  Calls the /v1/llms/completion endpoint to generate text with an LLM.

  Args:
    messages: List of messages (conversation history).
    model: Model identifier to be used.
    temperature: Controls the randomness of the output.
    max_tokens: Maximum number of tokens to be generated.
    stream: If true, the response will be sent in parts.

  Returns:
    Dictionary containing the API response.
  """
  api_url = "https://api.sippulse.ai/v1/llms/completion"
  api_key = os.getenv("SIPPULSE_API_KEY")

  if not api_key:
    raise ValueError("The SIPPULSE_API_KEY environment variable is not defined.")

  headers = {
    "Content-Type": "application/json",
    "api-key": api_key
  }
  payload = {
    "model": model,
    "messages": messages,
    "temperature": temperature,
    "max_tokens": max_tokens,
    "stream": stream
  }

  response = None
  try:
    response = requests.post(api_url, headers=headers, json=payload)
    response.raise_for_status()  # Raises an exception for error responses (4xx or 5xx)
    return response.json()
  except requests.exceptions.RequestException as e:
    print(f"API request error: {e}")
    if response is not None:  # response stays None if the request never completed
      print(f"Error details: {response.text}")
    return None

if __name__ == "__main__":
  convo_messages = [
    {"role": "system", "content": "You are an AI assistant specialized in space history."},
    {"role": "user", "content": "Describe in detail the importance of the Apollo 11 mission."}
  ]
  completion_result = generate_text_completion(convo_messages, model="gpt-4o-mini")

  if completion_result:
    # The response structure may vary depending on the model and if stream=true
    # Generally, the generated content is in completion_result['choices'][0]['message']['content']
    print(json.dumps(completion_result, indent=2, ensure_ascii=False))
javascript
// Example using the Fetch API in Node.js or browser
async function getTextCompletion(messages, model = "gpt-4o-mini", temperature = 0.7, maxTokens = 250, stream = false) {
  const apiUrl = "https://api.sippulse.ai/v1/llms/completion";
  const apiKey = process.env.SIPPULSE_API_KEY; // Make sure SIPPULSE_API_KEY is in the environment

  if (!apiKey) {
    throw new Error("The SIPPULSE_API_KEY environment variable is not defined.");
  }

  try {
    const response = await fetch(apiUrl, {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        "api-key": apiKey
      },
      body: JSON.stringify({
        model,
        messages,
        temperature,
        max_tokens: maxTokens,
        stream
      })
    });

    if (!response.ok) {
      const errorBody = await response.text();
      throw new Error(`API Error: ${response.status} ${response.statusText} - ${errorBody}`);
    }
    return response.json();
  } catch (error) {
    console.error("Failed to call the completion API:", error);
    throw error;
  }
}

// Usage example
const conversationMessages = [
  { role: "system", content: "You are an AI assistant specialized in space history." },
  { role: "user",   content: "Describe in detail the importance of the Apollo 11 mission." }
];

getTextCompletion(conversationMessages)
  .then(result => console.log(JSON.stringify(result, null, 2)))
  .catch(error => console.error(error));

Response Structure

The API returns an object with the generated response, usage information, and performance metrics:

json
{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "The Apollo 11 mission was a historic milestone..."
      },
      "finish_reason": "stop"
    }
  ],
  "created": 1704067200000,
  "model": "gpt-4o-mini",
  "usage": {
    "input_tokens": 45,
    "output_tokens": 120,
    "total_tokens": 165
  },
  "performance": {
    "delay": 150,
    "execution_time": 1200,
    "relative_execution_time": 100.0
  }
}

| Field | Description |
| --- | --- |
| choices | Array of responses generated by the model |
| choices[].message | Assistant message with role and content |
| choices[].finish_reason | Termination reason: stop, length, or tool_calls |
| usage.input_tokens | Tokens consumed by the input prompt |
| usage.output_tokens | Tokens generated in the response |
| performance.execution_time | Execution time in milliseconds |
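
In most integrations you only need the generated text. Assuming the non-streaming response shown above, a minimal extraction looks like this:

python
# Extract the assistant's text from a non-streaming completion response.
# 'data' is the parsed JSON body returned by the API (e.g. response.json()).
content = data["choices"][0]["message"]["content"]
finish_reason = data["choices"][0]["finish_reason"]  # "stop", "length", or "tool_calls"
print(content)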

Streaming Responses

For real-time responses as they are generated, set stream: true. The API returns chunks in Server-Sent Events (SSE) format, ideal for chat interfaces.

When to Use Streaming

| Scenario | Recommendation |
| --- | --- |
| Interactive chat | ✅ Use streaming |
| Batch processing | ❌ Don't use |
| Long responses | ✅ Use streaming |
| Simple integration | ❌ Don't use |

Chunk Format

Each chunk follows the SSE format:

data: {"choices":[{"delta":{"content":"text"}}]}
data: {"choices":[{"delta":{"content":" partial"}}]}
data: [DONE]

| Field | Description |
| --- | --- |
| delta.content | Generated text fragment |
| delta.role | Present only in the first chunk |
| delta.tool_calls | Present when there are tool calls in streaming |

Complete Examples

javascript
async function streamCompletion(messages) {
  const response = await fetch('https://api.sippulse.ai/v1/llms/completion', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'api-key': process.env.SIPPULSE_API_KEY
    },
    body: JSON.stringify({
      model: 'gpt-4o-mini',
      messages,
      stream: true
    })
  });

  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let fullContent = '';

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    const chunk = decoder.decode(value, { stream: true }); // stream: true handles multi-byte characters split across chunks
    const lines = chunk.split('\n');

    for (const line of lines) {
      if (line.startsWith('data: ') && line !== 'data: [DONE]') {
        try {
          const data = JSON.parse(line.slice(6));
          const content = data.choices[0]?.delta?.content || '';
          fullContent += content;
          process.stdout.write(content); // Display in real-time
        } catch (e) {
          // Ignore malformed lines
        }
      }
    }
  }

  return fullContent;
}
python
import requests
import json
import os

def stream_completion(messages):
    response = requests.post(
        'https://api.sippulse.ai/v1/llms/completion',
        headers={
            'Content-Type': 'application/json',
            'api-key': os.environ['SIPPULSE_API_KEY']
        },
        json={
            'model': 'gpt-4o-mini',
            'messages': messages,
            'stream': True
        },
        stream=True
    )

    full_content = ''

    for line in response.iter_lines():
        if line:
            line_text = line.decode('utf-8')
            if line_text.startswith('data: ') and line_text != 'data: [DONE]':
                try:
                    data = json.loads(line_text[6:])
                    content = data['choices'][0]['delta'].get('content', '')
                    full_content += content
                    print(content, end='', flush=True)
                except json.JSONDecodeError:
                    pass

    return full_content

Streaming with Tool Calls

In streaming mode, tool calls are sent incrementally:

data: {"choices":[{"delta":{"tool_calls":[{"index":0,"id":"call_abc","function":{"name":"get_weather"}}]}}]}
data: {"choices":[{"delta":{"tool_calls":[{"index":0,"function":{"arguments":"{\"ci"}}]}}]}
data: {"choices":[{"delta":{"tool_calls":[{"index":0,"function":{"arguments":"ty\": \"NYC\"}"}}]}}]}
data: [DONE]

You must accumulate the arguments fragments chunk by chunk until you receive [DONE]; only then is the arguments string complete JSON.
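
A minimal sketch of that accumulation, assuming each SSE line has already been parsed into a dict shaped like the chunks above:

python
import json

def accumulate_tool_calls(parsed_chunks):
    """Merge streamed tool-call fragments, keyed by their 'index' field."""
    calls = {}  # index -> {"id": ..., "name": ..., "arguments": ...}
    for chunk in parsed_chunks:
        delta = chunk["choices"][0].get("delta", {})
        for tc in delta.get("tool_calls", []):
            entry = calls.setdefault(tc["index"], {"id": "", "name": "", "arguments": ""})
            if "id" in tc:
                entry["id"] = tc["id"]
            fn = tc.get("function", {})
            if "name" in fn:
                entry["name"] = fn["name"]
            entry["arguments"] += fn.get("arguments", "")
    # Only after [DONE] are the argument strings complete JSON.
    return {i: {**c, "arguments": json.loads(c["arguments"] or "{}")}
            for i, c in calls.items()}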

Final Chunk with Token Usage

The last chunk before [DONE] includes usage information:

json
{
  "choices": [{"delta": {}, "finish_reason": "stop"}],
  "usage": {
    "input_tokens": 45,
    "output_tokens": 120
  }
}

Tool Calling (Function Calling)

Models with the tools capability support tool calling: the model can request the execution of external functions and use their results to generate responses.

Check Compatibility

Verify if the model supports tools by checking the resources array in the /v1/llms/models response. If resources contains "tools", the model supports tool calling.
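
For example, a quick lookup against the models endpoint (a sketch; the full listing example appears in the Available Models section below):

python
import os
import requests

def model_supports_tools(model_name: str) -> bool:
    """Return True if the given model lists 'tools' among its resources."""
    response = requests.get(
        "https://api.sippulse.ai/v1/llms/models",
        headers={"api-key": os.environ["SIPPULSE_API_KEY"]},
    )
    response.raise_for_status()
    for model in response.json():
        if model["name"] == model_name:
            return "tools" in model.get("resources", [])
    return False  # model not found in the listing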

Defining Tools

json
{
  "model": "gpt-4o",
  "messages": [
    { "role": "user", "content": "What's the weather forecast in New York?" }
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Gets the weather forecast for a city",
        "parameters": {
          "type": "object",
          "properties": {
            "city": {
              "type": "string",
              "description": "City name"
            }
          },
          "required": ["city"]
        }
      }
    }
  ]
}

Response with Tool Call

When the model decides to use a tool, the response includes tool_calls:

json
{
  "choices": [{
    "message": {
      "role": "assistant",
      "content": null,
      "tool_calls": [{
        "id": "call_abc123",
        "type": "function",
        "function": {
          "name": "get_weather",
          "arguments": "{\"city\": \"New York\"}"
        }
      }]
    },
    "finish_reason": "tool_calls"
  }]
}

Sending Tool Result

After executing the tool in your system, send the result back:

json
{
  "model": "gpt-4o",
  "messages": [
    { "role": "user", "content": "What's the weather forecast in New York?" },
    {
      "role": "assistant",
      "tool_calls": [{
        "id": "call_abc123",
        "type": "function",
        "function": { "name": "get_weather", "arguments": "{\"city\": \"New York\"}" }
      }]
    },
    {
      "role": "tool",
      "tool_call_id": "call_abc123",
      "content": "{\"temperature\": 72, \"condition\": \"sunny\", \"humidity\": 45}"
    }
  ]
}

The model then generates a response using the tool data:

json
{
  "choices": [{
    "message": {
      "role": "assistant",
      "content": "In New York, the temperature is 72°F with sunny skies and 45% humidity."
    },
    "finish_reason": "stop"
  }]
}
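
Putting the three steps together, here is a minimal round-trip sketch in Python. The get_weather function is a stand-in for your own implementation; everything else mirrors the payloads above.

python
import json
import os
import requests

API_URL = "https://api.sippulse.ai/v1/llms/completion"
HEADERS = {"Content-Type": "application/json", "api-key": os.environ["SIPPULSE_API_KEY"]}

def get_weather(city: str) -> str:
    # Stand-in for a real weather lookup in your system.
    return json.dumps({"temperature": 72, "condition": "sunny", "humidity": 45})

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Gets the weather forecast for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string", "description": "City name"}},
            "required": ["city"],
        },
    },
}]

messages = [{"role": "user", "content": "What's the weather forecast in New York?"}]

# 1. Ask the model; it may answer directly or request a tool call.
first = requests.post(API_URL, headers=HEADERS,
                      json={"model": "gpt-4o", "messages": messages, "tools": tools}).json()
assistant_msg = first["choices"][0]["message"]

if assistant_msg.get("tool_calls"):
    messages.append(assistant_msg)
    # 2. Execute each requested tool and append its result as a "tool" message.
    for call in assistant_msg["tool_calls"]:
        args = json.loads(call["function"]["arguments"])
        messages.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": get_weather(**args),
        })
    # 3. Call again so the model can answer using the tool output.
    second = requests.post(API_URL, headers=HEADERS,
                           json={"model": "gpt-4o", "messages": messages, "tools": tools}).json()
    print(second["choices"][0]["message"]["content"])
else:
    print(assistant_msg["content"])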

Error Handling

HTTP Codes

| Code | Error | Description |
| --- | --- | --- |
| 400 | bad_request | Malformed request or invalid parameters |
| 401 | unauthorized | Invalid or missing API key |
| 404 | model_not_found | Model doesn't exist or is unavailable |
| 429 | rate_limit_exceeded | Request limit exceeded |
| 500 | internal_error | Internal server error |
| 503 | service_unavailable | Service temporarily unavailable |
| 504 | timeout | Timeout exceeded |

Error Structure

json
{
  "error": {
    "code": "model_not_found",
    "message": "The model 'gpt-99' does not exist or is not available for your organization."
  }
}

Common Errors and Solutions

| Code | Common Cause | Solution |
| --- | --- | --- |
| invalid_api_key | Incorrect API key | Check your key in Settings > API Keys |
| model_does_not_support_json_schema | Model incompatible with structured output | Use a model with structured_output: true |
| context_length_exceeded | Prompt too long | Reduce message size or use a model with a larger context window |
| rate_limit_exceeded | Too many requests | Wait or integrate your own API key |

Integrate Your Own API Key

If you frequently hit rate limits, consider integrating your own API key from providers like OpenAI or Anthropic. This allows you to use your own limits and direct billing.

Error Handling Example

javascript
try {
  const response = await fetch('https://api.sippulse.ai/v1/llms/completion', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json', 'api-key': apiKey },
    body: JSON.stringify({ model, messages })
  });

  if (!response.ok) {
    const error = await response.json();

    switch (error.error?.code) {
      case 'rate_limit_exceeded':
        // Implement retry with backoff
        await sleep(5000);
        return retry();
      case 'context_length_exceeded':
        // Truncate messages
        return retryWithShorterContext();
      default:
        throw new Error(error.error?.message || 'Unknown error');
    }
  }

  return response.json();
} catch (e) {
  console.error('API Error:', e.message);
}
python
import os
import time

import requests

api_key = os.environ["SIPPULSE_API_KEY"]

def call_with_retry(messages, max_retries=3):
    for attempt in range(max_retries):
        try:
            response = requests.post(
                'https://api.sippulse.ai/v1/llms/completion',
                headers={'api-key': api_key, 'Content-Type': 'application/json'},
                json={'model': 'gpt-4o-mini', 'messages': messages}
            )

            if response.status_code == 429:
                # Rate limited - wait and retry
                time.sleep(2 ** attempt)
                continue

            response.raise_for_status()
            return response.json()

        except requests.exceptions.RequestException as e:
            if attempt == max_retries - 1:
                raise
            time.sleep(2 ** attempt)

Available Models

To keep your application up-to-date with the LLMs enabled for your organization, use the /v1/llms/models endpoint. This allows your application to dynamically adapt to available models without requiring manual code updates.

bash
# Lists all LLM models available for your API key
curl -X GET 'https://api.sippulse.ai/v1/llms/models' \
  -H "api-key: $SIPPULSE_API_KEY"
python
import os
import requests
import json

def list_available_models() -> dict:
  """
  Retrieves the list of LLMs available for the organization associated with the API key.
  """
  api_url = "https://api.sippulse.ai/v1/llms/models"
  api_key = os.getenv("SIPPULSE_API_KEY")

  if not api_key:
    raise ValueError("The SIPPULSE_API_KEY environment variable is not defined.")

  headers = { "api-key": api_key }

  response = None
  try:
    response = requests.get(api_url, headers=headers)
    response.raise_for_status()
    return response.json()
  except requests.exceptions.RequestException as e:
    print(f"API request error for listing models: {e}")
    if response is not None:  # response stays None if the request never completed
      print(f"Error details: {response.text}")
    return None

if __name__ == "__main__":
  models_data = list_available_models()
  if models_data:
    print("Available Models:")
    print(json.dumps(models_data, indent=2, ensure_ascii=False))
javascript
// Example using the Fetch API to list models
async function listAvailableModels() {
  const apiUrl = "https://api.sippulse.ai/v1/llms/models";
  const apiKey = process.env.SIPPULSE_API_KEY;

  if (!apiKey) {
    throw new Error("The SIPPULSE_API_KEY environment variable is not defined.");
  }

  try {
    const response = await fetch(apiUrl, {
      method: "GET",
      headers: { "api-key": apiKey }
    });

    if (!response.ok) {
      const errorBody = await response.text();
      throw new Error(`API error when listing models: ${response.status} ${response.statusText} - ${errorBody}`);
    }
    return response.json();
  } catch (error) {
    console.error("Failed to call the API to list models:", error);
    throw error;
  }
}

listAvailableModels()
  .then(models => console.log("Available Models:", JSON.stringify(models, null, 2)))
  .catch(error => console.error(error));

Response Example

json
[
  {
    "name": "gpt-4o",
    "status": "active",
    "execution_type": "cloud",
    "resources": ["tools", "json_schema"]
  },
  {
    "name": "gpt-4o-mini",
    "status": "active",
    "execution_type": "cloud",
    "resources": ["tools", "json_schema"]
  },
  {
    "name": "claude-sonnet-4-20250514",
    "status": "active",
    "execution_type": "cloud",
    "resources": ["tools", "json_schema"]
  }
]

| Field | Description |
| --- | --- |
| name | Model identifier for API usage |
| status | Model status: active, inactive, or deprecated |
| execution_type | Execution type (always cloud) |
| resources | Array of capabilities: tools, json_schema |

Resources Field

The resources field indicates capabilities supported by the model:

  • "tools" - Supports tool calling (function calling)
  • "json_schema" - Supports structured output with JSON Schema

For detailed model information (context window, pricing, parameters), see the Pricing page.

OpenAI SDK

For developers familiar with the official OpenAI SDK, SipPulse AI offers a simplified integration. Simply configure the baseURL of the OpenAI client to point to our compatible endpoint: https://api.sippulse.ai/v1/openai.

This allows you to utilize all the functionalities and conventions of the OpenAI SDK, while the requests are processed by the SipPulse AI infrastructure, leveraging our selection of models and optimizations.

python
# Example usage with the OpenAI Python SDK
import os
from openai import OpenAI

# Configure the OpenAI client to use the SipPulse AI endpoint
client = OpenAI(
  api_key=os.environ.get("SIPPULSE_API_KEY"),
  base_url="https://api.sippulse.ai/v1/openai"
)

try:
  chat_completion = client.chat.completions.create(
    model="gpt-4o-mini", # Or any other compatible model available on SipPulse AI
    messages=[
      {"role": "system", "content": "You are a helpful assistant who loves puns."},
      {"role": "user", "content": "Tell me a joke about programming."}
    ],
    temperature=0.6,
    max_tokens=100
  )
  print(chat_completion.choices[0].message.content)
except Exception as e:
  print(f"An error occurred: {e}")
javascript
// Example usage with the OpenAI JavaScript SDK
import OpenAI from "openai";

const openai = new OpenAI({
  apiKey: process.env.SIPPULSE_API_KEY, // Your SipPulse AI key
  baseURL: "https://api.sippulse.ai/v1/openai" // SipPulse AI compatible endpoint
});

async function main() {
  try {
    const response = await openai.chat.completions.create({
      model: "gpt-4o-mini", // Or any other compatible model
      messages: [
        { role: "system", content: "You are a helpful assistant who loves puns." },
        { role: "user",   content: "Tell me a joke about programming." }
      ],
      temperature: 0.6,
      max_tokens: 100
    });

    console.log(response.choices[0].message.content);
  } catch (error) {
    console.error("An error occurred:", error);
  }
}

main();

Benefit: This approach lets you keep the familiarity and conveniences of the OpenAI SDK, such as type definitions and helper methods, while your calls are routed through and executed on the SipPulse AI platform. You can switch between OpenAI models and the other models SipPulse AI offers, provided they are supported by the compatibility endpoint.

Parameter Guide

Generation parameters vary according to the selected model. The Playground automatically displays only the relevant controls for the chosen model.

Parameters by Model Type

  • Reasoning models (GPT-5, o1): use reasoning_effort instead of temperature
  • Traditional models (GPT-4o, Claude): use temperature, top_p, max_tokens

Common Parameters

| Parameter | Range | Description |
| --- | --- | --- |
| temperature | 0-2 | Controls randomness. Low = focused, high = creative |
| max_tokens | 1-128k | Maximum number of tokens in the response |
| top_p | 0-1 | Nucleus sampling. Alternative to temperature |
| top_k | 1-100 | Limits selection to the K most probable tokens |
| frequency_penalty | -2 to 2 | Penalizes token repetition |
| presence_penalty | -2 to 2 | Encourages new topics |

Usage Tips

  • Precise tasks (code, extraction): use low temperature (0.0-0.3)
  • Creative tasks (writing, brainstorming): use higher temperature (0.7-1.0)
  • Adjust temperature or top_p, not both at the same time
  • The prompt plus the response (bounded by max_tokens) must fit within the model's context window; see the sketch below
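
Applying these tips, payloads for a precise extraction task versus a creative writing task might look like this (a sketch; the values are illustrative starting points, not requirements):

python
# Illustrative payloads following the tips above (values are suggestions).
precise_payload = {
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Extract all dates from the text below as JSON."}],
    "temperature": 0.1,  # low temperature: deterministic, focused output
    "max_tokens": 300,
}

creative_payload = {
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Write a short story about a lunar colony."}],
    "temperature": 0.9,  # higher temperature: more varied, creative output
    "max_tokens": 800,   # leave top_p at its default when adjusting temperature
}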

The best way to understand these parameters is to experiment in the Playground.