Working with Guardrails

Guardrails are essential components of Kosmoy Studio that allow you to define and enforce safety guidelines for your AI applications. They help ensure responsible AI behavior by detecting and mitigating potential risks, such as toxic language or prompt injection attempts.

Guardrail Types

Kosmoy Studio currently supports the following guardrail types:

  • Toxic Language: Detects and blocks offensive content, including sexual content, hate and discrimination, violence and threats, dangerous and criminal content, and self-harm. You can configure it to monitor user inputs, LLM outputs, or both.
  • Prompt Injection: Detects and prevents attempts to manipulate the AI model’s behavior through malicious prompts.

Creating a Guardrail

The specific steps to create a guardrail vary slightly depending on the type you choose. However, the general process is as follows:

  1. Navigate to Guardrails: From the Kosmoy Studio home page, click on the “Guardrails” menu in the left-hand navigation bar.

  2. Add a New Guardrail: Click the “+ ADD” button located in the upper right corner of the Guardrails section.

  3. Select Guardrail Type: Choose the desired guardrail type from the dropdown menu (e.g., “Toxic Language,” “Prompt Injection”).

  4. Configure Guardrail: Follow the specific configuration steps for the selected guardrail type, as detailed in the subsections below.

  5. Name and Describe the Guardrail: Give your guardrail a unique name and an optional description.

  6. Review and Create: Review the guardrail configuration, then click “Create”.

Guardrail Creation Workflows

Creating a Toxic Language Guardrail

  1. Select Guardrail Type: Choose “Toxic Language” from the guardrail type dropdown.
  2. Configure Guardrail:
    • Model: Select the model that will be used to power the Toxic Language detection.
    • Preferred Response Message: (Optional) Define a custom response message to be displayed when the guardrail blocks a message.
    • Offensive Content Selection: Choose the types of offensive content you want to block (e.g., Sexual, Hate and Discrimination, Violence and Threats, Dangerous and Criminal Content, Self-Harm).
    • Monitoring: Select whether to monitor Inputs (user messages), Outputs (LLM responses), or both.
  3. Click “Next”: Proceed to the next step.
  4. Name and Describe: Give your guardrail a unique name and an optional description.
  5. Review and Create: Review the configuration and click “Create”.
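
The settings collected in this workflow can be pictured as a small record. The sketch below is only an illustration of that shape, not Kosmoy Studio's API: the class names, field names, and validation rules (including the assumption that at least one content category must be selected) are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, List, Optional


class Category(Enum):
    """Offensive content categories listed in the configuration step."""
    SEXUAL = "Sexual"
    HATE_AND_DISCRIMINATION = "Hate and Discrimination"
    VIOLENCE_AND_THREATS = "Violence and Threats"
    DANGEROUS_AND_CRIMINAL = "Dangerous and Criminal Content"
    SELF_HARM = "Self-Harm"


class Monitoring(Enum):
    """Which side of the conversation the guardrail inspects."""
    INPUTS = "inputs"
    OUTPUTS = "outputs"
    BOTH = "both"


@dataclass
class ToxicLanguageGuardrail:
    """Hypothetical record mirroring the fields of the creation form."""
    name: str
    model: str
    categories: FrozenSet[Category]
    monitoring: Monitoring = Monitoring.BOTH
    response_message: Optional[str] = None  # optional custom block message
    description: str = ""                   # optional description

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means the record is complete."""
        problems = []
        if not self.name.strip():
            problems.append("name is required")
        if not self.model.strip():
            problems.append("model is required")
        if not self.categories:
            # Assumption for this sketch: at least one category must be chosen.
            problems.append("select at least one offensive content category")
        return problems
```

Keeping the categories and monitoring modes as enumerations means a typo in a category name fails immediately instead of silently producing a guardrail that blocks nothing.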

Creating a Prompt Injection Guardrail

  1. Select Guardrail Type: Choose “Prompt Injection” from the guardrail type dropdown.
  2. Configure Guardrail:
    • Model: Select the model that will be used to power the Prompt Injection detection.
    • Preferred Response Message: (Optional) Define a custom response message to be displayed when the guardrail blocks a message.
  3. Click “Next”: Proceed to the next step.
  4. Name and Describe: Give your guardrail a unique name and an optional description.
  5. Review and Create: Review the configuration and click “Create”.

Guardrail Cards

The Guardrails section displays each created guardrail as a card. Each card shows:

  • Guardrail Icon: An icon representing a guardrail.
  • Guardrail Name: The name you assigned to the guardrail.
  • Description: The description you provided for the guardrail.
  • Edit Icon (Pencil): Click this icon to update the guardrail’s name, description, or configuration (only if the guardrail is not in use).
  • Delete Icon (Trash Bin): Click this icon to remove the guardrail (only if the guardrail is not in use).

Guardrail Usage Restrictions

You cannot edit or delete a guardrail if it is currently referenced by other entities within Kosmoy Studio. This includes being used in:

  • Gateways
  • Any other Kosmoy Studio component that references guardrails.

Before attempting to edit or delete a guardrail, ensure it is not actively used in any of these areas. You may need to modify or remove the guardrail from those components first.
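
The restriction amounts to a reference check that runs before any edit or delete. The following is a minimal sketch of that rule under stated assumptions: the `GuardrailRegistry` class, its method names, and the error type are all hypothetical and do not describe how Kosmoy Studio is implemented.

```python
from typing import Dict, Set


class GuardrailInUseError(Exception):
    """Raised when an edit or delete targets a guardrail that is still referenced."""


class GuardrailRegistry:
    """Hypothetical in-memory registry that tracks which components use a guardrail."""

    def __init__(self) -> None:
        self._descriptions: Dict[str, str] = {}
        self._references: Dict[str, Set[str]] = {}

    def create(self, name: str, description: str = "") -> None:
        if name in self._descriptions:
            raise ValueError(f"guardrail name already exists: {name}")
        self._descriptions[name] = description
        self._references[name] = set()

    def attach(self, name: str, component: str) -> None:
        """Record that a component (for example a Gateway) references the guardrail."""
        self._references[name].add(component)

    def detach(self, name: str, component: str) -> None:
        """Remove a component's reference to the guardrail."""
        self._references[name].discard(component)

    def _ensure_unused(self, name: str) -> None:
        used_by = self._references[name]
        if used_by:
            raise GuardrailInUseError(
                f"{name} is in use by: {', '.join(sorted(used_by))}"
            )

    def update_description(self, name: str, description: str) -> None:
        self._ensure_unused(name)  # edits are blocked while referenced
        self._descriptions[name] = description

    def delete(self, name: str) -> None:
        self._ensure_unused(name)  # deletes are blocked while referenced
        del self._descriptions[name]
        del self._references[name]

    def exists(self, name: str) -> bool:
        return name in self._descriptions
```

In this model, detaching the guardrail from every component that uses it is the only way to make it editable or deletable again, which matches the order of operations described above.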

Updating a Guardrail

You can update a registered guardrail’s name, description, and certain configuration parameters, provided no other component currently references it.

  1. Navigate to Guardrails: From the Kosmoy Studio home page, click on the “Guardrails” icon in the left-hand navigation bar.
  2. Locate the Guardrail Card: Find the card for the guardrail you want to update.
  3. Click the Edit (Pencil) Icon: This will open the update dialog.
  4. Modify Guardrail: Update the guardrail’s configuration as needed. You can change the model and the preferred response message. Depending on the guardrail type, you can also enable or disable individual offensive content categories (Toxic Language) and change the monitoring settings.
  5. Click “Save”: Save the changes.

If you attempt to edit a guardrail that is currently in use, a warning banner appears at the top of the screen and the modification is blocked.

Removing a Guardrail

You can remove a registered guardrail if it’s no longer needed. However, you cannot delete a guardrail that is currently referenced by any other component.

  1. Navigate to Guardrails: From the Kosmoy Studio home page, click on the “Guardrails” icon in the left-hand navigation bar.
  2. Locate the Guardrail Card: Find the card for the guardrail you want to remove.
  3. Click the Delete (Trash Bin) Icon: This will trigger a confirmation prompt.
  4. Confirm Deletion: Confirm that you want to delete the guardrail.

If you attempt to delete a guardrail that is currently in use, a modal will appear, preventing the deletion and explaining that the guardrail is in use.

Warning: Deleting a guardrail is a permanent action and cannot be undone. Ensure that the guardrail is not being referenced by any other component before proceeding.

Using Guardrails

Guardrails are used in other parts of Kosmoy Studio to define the safety guidelines for your AI applications. For example, when creating a Gateway, you can select one or more guardrails to be applied to all traffic flowing through that Gateway.
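
To make the idea of applying guardrails to all traffic concrete, here is a minimal conceptual sketch. It is not Kosmoy Studio code: the `Guardrail` and `Gateway` classes are hypothetical, and the keyword-based `toy_detector` is a stand-in for the model-based detection that a real guardrail would use.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Guardrail:
    """Hypothetical guardrail: a detector plus monitoring flags."""
    name: str
    is_violation: Callable[[str], bool]
    check_inputs: bool = True
    check_outputs: bool = True
    response_message: str = "This message was blocked by a guardrail."


@dataclass
class Gateway:
    """Hypothetical gateway that runs every attached guardrail around an LLM call."""
    guardrails: List[Guardrail]
    llm: Callable[[str], str]

    def handle(self, user_message: str) -> str:
        # Check the user input first, so a blocked input never reaches the LLM.
        for guardrail in self.guardrails:
            if guardrail.check_inputs and guardrail.is_violation(user_message):
                return guardrail.response_message
        reply = self.llm(user_message)
        # Then check the LLM output before returning it to the user.
        for guardrail in self.guardrails:
            if guardrail.check_outputs and guardrail.is_violation(reply):
                return guardrail.response_message
        return reply


def toy_detector(text: str) -> bool:
    """Placeholder detector; a real guardrail would call a detection model."""
    return "ignore previous instructions" in text.lower()
```

Checking inputs before the LLM call and outputs after it corresponds to the Inputs, Outputs, or both monitoring choice: in this sketch, a guardrail set to inputs only would have `check_outputs=False`.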
