Creating Projects and adding rows

Creating the project

To create a project, we first need a client instance:

from caplena import Client, resources
client = Client(api_key="YOUR_API_KEY")

Next, we’ll build the project’s columns, which define the schema of the rows to be added:

from caplena.models.projects import (
    NonTTAColumnDefinition,
    NonTTAColumnType,
    TTAColumnDefinition,
    TTAColumnType,
)

columns = [
    NonTTAColumnDefinition(
        ref="id",  # ref is a unique identifier for the column in the project
        name="Survey Response ID", # name is what is shown in the User Interface
        type=NonTTAColumnType.numerical,
    ),
    TTAColumnDefinition(
        ref="nps_why",
        name="Why did you give this rating?",
        type=TTAColumnType.text_to_analyze,
        description="Please explain the rating in a few sentences.",
        topics=[],
    ),
]

Now we’re ready to create the project:

from caplena.models.projects import ProjectLanguage, ProjectSettings

project_settings = ProjectSettings(
    name="NPS Study",
    language=ProjectLanguage.EN,
    columns=columns,
    tags=["NPS"],
).model_dump(exclude_none=True)

new_project = client.projects.create(**project_settings)

Optionally, we can pass translation_engine="google_translate" to client.projects.create to translate rows automatically using Google Translate.

The newly created new_project has a generated unique identifier new_project.id. The schema can be inspected using new_project.columns.

Appending rows

We can now proceed to add rows to the project. A maximum of 20 rows can be added per request, so we need to batch our data.

In this example, we’ll generate some fake rows. In your application, you might instead read from a database, another API, or a CSV file. The ordering of columns within a row does not matter, since columns are referenced by their ref.

from caplena.models.projects import (
    MultipleRowPayload,
    RowPayload,
    NonTTACell,
    TTACell,
)

# generate fake rows
rows = MultipleRowPayload(
    rows=[
        RowPayload(
            columns=[
                NonTTACell(ref="id", value=i),
                TTACell(ref="nps_why", value=f"Row {i}", topics=[]),
            ]
        ) for i in range(100)
    ]
).model_dump()["rows"]

# batch rows, we'll use numpy for this
import numpy as np

n_batches = int(np.ceil(len(rows) / 20))  # compute the number of batches needed
row_batches = np.array_split(rows, n_batches)  # do the batching

new_rows = []
for row_batch in row_batches:
    new_rows.append(new_project.append_rows(rows=list(row_batch))) # need to cast to list from ndarray
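If you’d rather avoid the numpy dependency, the same batching can be done with plain list slicing. This is a minimal sketch; the `batch` helper is a name we introduce here, not part of the Caplena SDK:

```python
def batch(items, size=20):
    """Split items into consecutive chunks of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]

# e.g. 100 rows -> 5 batches of 20 rows each
row_batches = batch(list(range(100)))
```

Each chunk is already a plain list, so no cast from ndarray is needed before passing it to append_rows.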

This process can take a while. To monitor its status, use the task_id property from the RowsAppend response and call get_append_status:

import time

# Check append statuses one by one using their task IDs:
for append_task in new_rows:
    while new_project.get_append_status(task_id=append_task.task_id).status == 'in_progress':
        time.sleep(10)

# OR check all append statuses from the project:
all_tasks = new_project.get_append_status()
for task in all_tasks.tasks:
    if task['status'] == 'in_progress':
        pass  # upload not ready yet
    elif task['status'] == 'failed':
        pass  # handle a failed task
    elif task['status'] == 'timed_out':
        pass  # handle a timed-out task
    elif task['status'] == 'succeeded':
        pass  # handle a successful task

Once all upload tasks have succeeded, the data is uploaded to Caplena and ready to be analyzed!
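The polling loops above can be factored into a small reusable helper. This is a sketch under the assumption, matching the examples above, that the status string 'in_progress' marks an unfinished task; `wait_for` is a name we introduce here, not part of the Caplena SDK:

```python
import time

def wait_for(check, interval=10, timeout=600):
    """Poll check() until it returns a status other than 'in_progress',
    sleeping `interval` seconds between polls, up to `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        status = check()
        if status != 'in_progress':
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("append task did not finish in time")
        time.sleep(interval)
```

For example, wait_for(lambda: new_project.get_append_status(task_id=task.task_id).status) blocks until the given task leaves 'in_progress' and returns its terminal status.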