Guide: How to process documents using the SenseTask API

To authorize, use your API Key:

import os
import requests
import glob

API_KEY = <the api key>
SERVICE_URI = 'https://api.sensetask.com/api/v1'
FOLDER_PATH = <the folder to upload>

def upload_folder():
    files = list(glob.iglob('{}/*'.format(FOLDER_PATH)))

    documents = []
    for file_path in files:

        # get the metadata
        file_name, file_extension = os.path.splitext(os.path.basename(file_path))
        file_size = os.path.getsize(file_path)

        file_extension = file_extension.lower()

        if file_extension == '.pdf':
            file_type = 'application/pdf'
        elif file_extension == '.png':
            file_type = 'image/png'
        elif file_extension in ['.jpeg', '.jpg']:
            file_type = 'image/jpg'
        else:
            # the file type is not supported so it should not be uploaded
            continue

        # create the files in the sense filesystem
        # first get a valid file signature

        file_signature = requests.post(
            '{}/files/'.format(SERVICE_URI),
            json={
                "fileName": file_name,
                "fileType": file_type,
                "fileSize": file_size
            },
            headers={'ApiKey': API_KEY}).json()


        # then put the file

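        # open the file in a context manager so the handle is closed after the upload
        with open(file_path, 'rb') as file_handle:
            requests.put(file_signature['signedUrl'], data=file_handle)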
        print('uploaded {}'.format(file_name))

        # prepare the document batch payload
        documents.append({
            "fileName": file_name,
            "fileType": file_type,
            "fileSize": file_size,
            "fileUri": file_signature['fileUri']
        })

    # create the job
    job = requests.post(
        '{}/jobs/'.format(SERVICE_URI),
        json={ "documentCount": len(documents), "parentNode": <the folder id> },
        headers={'ApiKey': API_KEY}).json()

    # create all the documents in the job with a single batch request

    requests.post(
        '{}/documents/batch/{}/'.format(SERVICE_URI, job['_id']),
        json={ "data": documents },
        headers={'ApiKey': API_KEY}).json()


if __name__ == "__main__":
    upload_folder()

Make sure to replace <the api key> with your API key.

SenseTask uses API keys to allow access to the API.

SenseTask expects the API key to be included in every API request to the server, in a header that looks like the following:

ApiKey: <the api key>

You must replace <the api key> with your personal API key.

You can view and manage your API keys in SenseTask under Settings > Developers > Api Keys. Keep your API key secure: do not share it in public repositories or client-side applications.
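One way to keep the key out of source code is to read it from the environment when building the header. The following is a minimal sketch, not part of the SenseTask API: the helper name build_auth_headers and the environment variable name SENSETASK_API_KEY are assumptions chosen for illustration.

```python
import os


def build_auth_headers(env_var="SENSETASK_API_KEY"):
    """Return the ApiKey header, reading the key from an environment variable.

    The variable name is a hypothetical convention, not one defined by SenseTask.
    """
    api_key = os.environ.get(env_var)
    if not api_key:
        raise RuntimeError(
            "Set the {} environment variable to your API key".format(env_var))
    return {"ApiKey": api_key}
```

The returned dictionary can be passed as the headers argument of the requests calls shown in the guide, in place of the inline {'ApiKey': API_KEY} value.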

This tutorial will demonstrate how to process all the eligible documents in a folder.

First, it will securely upload the files in your account's Sense storage. After that, it will create a Job object using the Jobs API that will hold all the documents in this processing batch. Finally, the document objects will be created using the Documents Batch API.

Once this is complete, the processing job has already started. You can check the processing status for the queue via the SenseTask web app. If webhook endpoints have been added using the Webhook Endpoints API, your server should receive notifications for the configured events.

Webhook Endpoints

Setup

You can configure webhook endpoints from Settings > Developers > Webhooks, which provides a user interface for registering your webhook endpoints.

Attributes

The webhook endpoint object

    {
        "_id":"61b0eb9c9e2efc000964eacc",
        "events":[
            "document.processed",
            "document.ocr_complete"
        ],
        "uri":"https://example.com/my/webhook/endpoint",
        "description":null,
        "isEnabled":true,
        "addedBy":"5eb133b52b622d000871cac3",
        "createdAt":"2021-12-08T17:30:05.000Z",
        "updatedAt":"2021-12-08T17:30:05.000Z"
    }

_id string
The id of the target webhook endpoint.

events string list
The event types that the webhook endpoint will be notified for.

uri uri
The webhook endpoint uri.

description string (optional)
The description for the webhook uri.

isEnabled boolean (optional)
true by default. Set it to false if you want to temporarily suspend the webhook endpoint.

secret string (optional)
A shared secret that protects your webhook endpoint against unauthorized use. Each request made to the webhook endpoint carries this value in the x-sense-secret header, so your server should check that the header matches the value set here.
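The secret check described above could be implemented on the receiving server roughly as follows. This is a sketch under assumptions: the function name is_trusted_request is hypothetical, and headers is assumed to be a plain dictionary of the incoming request headers, however your web framework exposes them.

```python
import hmac


def is_trusted_request(headers, expected_secret):
    """Return True if the x-sense-secret header matches the configured secret."""
    # HTTP header names are case-insensitive, so normalize before the lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    received = normalized.get("x-sense-secret", "")
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(received.encode(), expected_secret.encode())
```

A request that fails this check can be rejected before its body is parsed.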

Guide: Use polling to check the document processing status

To authorize, use your API Key:

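import time

import requests

API_KEY = <the api key>
SERVICE_URI = 'https://api.sensetask.com/api/v1'

# the id of the first document created with the document ingestion api;
# `res` is the response returned by that earlier request
document_id = res.json()['items'][0]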

processed_statuses = [
    'processed',
    'waiting_for_review',
    'postponed',
    'in_review',
    'review_complete',
    'exported',
    'edited',
    'in_edit'
]

while True:
    res = requests.get(f"{SERVICE_URI}/documents/{document_id}/beta/", headers={'ApiKey': API_KEY}).json()
    if res['status'] in processed_statuses:
        print('processing complete')
        break

    print('processing ', res['status'])

    time.sleep(10)

Make sure to replace <the api key> with your API key.

While we recommend using webhooks to detect when a document is processed, that may not be possible for every use case. This is a simple example of how to periodically check the document status until processing is complete.
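The loop in this guide runs until the document reaches a processed status. If you would rather give up after a while, the same idea can be wrapped in a helper with a timeout. This is a sketch, not part of the SenseTask API: wait_until_processed, fetch_status, and the default interval and timeout values are assumptions for illustration. fetch_status is any callable that returns the current status string, for example a function that performs the GET request shown in the guide and returns the status field of the response.

```python
import time


def wait_until_processed(fetch_status, done_statuses,
                         interval=10, timeout=600, sleep=time.sleep):
    """Poll fetch_status() until it returns a status in done_statuses.

    Raises TimeoutError if no such status is seen within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        if status in done_statuses:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(
                "document still in status {!r} after {} seconds".format(
                    status, timeout))
        # wait before asking again, to avoid flooding the API with requests
        sleep(interval)
```

Passing the status source as a callable keeps the helper independent of the HTTP client, so it can be exercised without network access.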

Events

Available events

    document.ocr_complete
    document.processed

Events notify you that something happened on your account. For example, when a document finishes processing, a document.processed event is triggered and sent to every webhook endpoint configured for it.

NOTE The document.ocr_complete event is currently in preview mode and is available only to processing pipelines that consist solely of OCR.

The Event Object

The event object

{
    "objectId": "61b1f014d9c0950009fe304b",
    "type": "document.processed",
    "data": {
        "entities": [
            {
                "payload": "<the entity information>"
            }
        ]
    }
}

Attributes

objectId string
The id of the target object (e.g., document, job, member)

type string
The event type (e.g. document.ocr_complete, document.processed)

data object
Object containing the data associated with the event.
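A webhook receiver typically parses the event body and routes it by its type attribute. The following is a minimal sketch assuming the body is the JSON event object shown above. The names handle_event and handlers are hypothetical, and the shape of data beyond what the sample shows is not assumed.

```python
import json


def handle_event(raw_body, handlers):
    """Parse an event payload and call the handler registered for its type.

    handlers maps an event type string to a callable taking (object_id, data).
    Returns the handler's result, or None when no handler is registered.
    """
    event = json.loads(raw_body)
    handler = handlers.get(event.get("type"))
    if handler is None:
        # ignore event types this server does not handle
        return None
    return handler(event["objectId"], event.get("data", {}))
```

For example, a handlers dictionary with a single "document.processed" entry would act on processed documents and silently skip document.ocr_complete events.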