Guide: How to process documents using the SenseTask API
To authorize, use your API Key:
import glob
import os

import requests

API_KEY = '<the api key>'
SERVICE_URI = 'https://api.sensetask.com/api/v1'
FOLDER_PATH = '<the folder to upload>'


def upload_folder():
    files = list(glob.iglob('{}/*'.format(FOLDER_PATH)))
    documents = []
    for file_path in files:
        # get the metadata
        file_name, file_extension = os.path.splitext(os.path.basename(file_path))
        file_size = os.path.getsize(file_path)
        file_extension = file_extension.lower()
        if file_extension == '.pdf':
            file_type = 'application/pdf'
        elif file_extension == '.png':
            file_type = 'image/png'
        elif file_extension in ['.jpeg', '.jpg']:
            file_type = 'image/jpg'
        else:
            # the file type is not supported so it should not be uploaded
            continue

        # create the files in the sense filesystem
        # first get a valid file signature
        file_signature = requests.post(
            '{}/files/'.format(SERVICE_URI),
            json={"fileName": file_name, "fileType": file_type, "fileSize": file_size},
            headers={'ApiKey': API_KEY}
        ).json()
        # then put the file, closing the file handle once the upload is done
        with open(file_path, 'rb') as file_handle:
            requests.put(file_signature['signedUrl'], data=file_handle)
        print('uploaded {}'.format(file_name))

        # prepare the document batch payload
        documents.append({
            "fileName": file_name,
            "fileType": file_type,
            "fileSize": file_size,
            "fileUri": file_signature['fileUri']
        })

    # create the job
    job = requests.post(
        '{}/jobs/'.format(SERVICE_URI),
        json={"documentCount": len(documents), "parentNode": '<the folder id>'},
        headers={'ApiKey': API_KEY}
    ).json()

    # create the documents in the job with a single batch request
    requests.post(
        '{}/documents/batch/{}/'.format(SERVICE_URI, job['_id']),
        json={"data": documents},
        headers={'ApiKey': API_KEY}
    ).json()


if __name__ == "__main__":
    upload_folder()
Make sure to replace <the api key> with your API key.
SenseTask uses API keys to allow access to the API.
SenseTask expects the API key to be included in every API request to the server, in a header that looks like the following:
ApiKey: <the api key>
You can view and manage your API keys in SenseTask under Settings > Developers > Api Keys. Please keep your API key secure. Don't share it in public repositories or client-side applications.
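One way to keep the key out of source code is to read it from the environment and build the header in a single place. The following is a minimal sketch, not part of the SenseTask API: the helper name auth_headers and the environment variable SENSETASK_API_KEY are assumptions made for illustration.

```python
import os


def auth_headers(api_key=None):
    """Build the ApiKey header expected on every SenseTask API request."""
    # SENSETASK_API_KEY is a hypothetical variable name chosen for this sketch
    api_key = api_key or os.environ.get('SENSETASK_API_KEY')
    if not api_key:
        raise ValueError('no API key provided')
    return {'ApiKey': api_key}
```

The returned dictionary can be passed as the headers argument of each request, for example requests.get(url, headers=auth_headers()).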
This tutorial will demonstrate how to process all the eligible documents in a folder.
First, it will securely upload the files to your account's Sense storage. After that, it will create a Job object using the Jobs API that will hold all the documents in this processing batch. Finally, it will create the document objects using the Documents Batch API.
Once this is complete, the processing job has already started. You can check the processing status for the queue via the SenseTask web app. If webhook endpoints have been added using the Webhook Endpoints API, your server should receive notifications for the configured events.
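In the upload example, a file counts as eligible when its extension is .pdf, .png, .jpeg or .jpg. As a rough sketch, the same check can be written as a lookup table. The names SUPPORTED_TYPES and eligible_file_type are illustrative only, and the type values are copied from the example rather than from a separate specification.

```python
import os

# Extension-to-type mapping copied from the upload example in this guide
SUPPORTED_TYPES = {
    '.pdf': 'application/pdf',
    '.png': 'image/png',
    '.jpeg': 'image/jpg',
    '.jpg': 'image/jpg',
}


def eligible_file_type(file_path):
    """Return the file type to send for file_path, or None if it should be skipped."""
    extension = os.path.splitext(file_path)[1].lower()
    return SUPPORTED_TYPES.get(extension)
```

A caller can skip any file for which the function returns None, which matches the continue branch of the example.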
Webhook Endpoints
Setup
You can configure webhook endpoints from Settings > Developers > Webhooks, which provides a user interface for registering your webhook endpoints.
Attributes
The webhook endpoint object
{
  "_id": "61b0eb9c9e2efc000964eacc",
  "events": ["document.processed", "document.ocr_complete"],
  "uri": "https://example.com/my/webhook/endpoint",
  "description": null,
  "isEnabled": true,
  "addedBy": "5eb133b52b622d000871cac3",
  "createdAt": "2021-12-08T17:30:05.000Z",
  "updatedAt": "2021-12-08T17:30:05.000Z"
}
_id string
The id of the target webhook endpoint.
events string list
The event types that the webhook endpoint will be notified for.
uri uri
The webhook endpoint uri.
description string (optional)
The description for the webhook uri.
isEnabled boolean (optional)
true by default. Set it to false if you want to temporarily suspend the webhook endpoint.
secret string (optional)
A way to secure your webhook endpoints against unauthorized use. Requests made to the webhook endpoint carry the value set here in the x-sense-secret header, so your server can check that header and reject requests that do not match.
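On the receiving side, the check can be a constant-time comparison of the header against the secret you configured. This is a minimal sketch under the assumption that your web framework exposes the request headers as a dictionary-like object. The function name is_authorized is illustrative and not part of SenseTask.

```python
import hmac


def is_authorized(headers, expected_secret):
    """Return True if the x-sense-secret header matches the configured secret."""
    # HTTP header names are case-insensitive, so normalize before the lookup
    normalized = {name.lower(): value for name, value in headers.items()}
    received = normalized.get('x-sense-secret', '')
    # compare_digest avoids leaking information through comparison timing
    return hmac.compare_digest(received.encode(), expected_secret.encode())
```

A handler would typically respond with an error status and skip processing when this returns False.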
Guide - Use polling to check the document processing status
To authorize, use your API Key:
import time

import requests

API_KEY = '<the api key>'
SERVICE_URI = 'https://api.sensetask.com/api/v1'

# the id from the first document created with the document ingestion api
# (res is the response returned by that earlier request)
document_id = res.json()['items'][0]

processed_statuses = [
    'processed', 'waiting_for_review', 'postponed', 'in_review',
    'review_complete', 'exported', 'edited', 'in_edit'
]

while True:
    document = requests.get(
        f"{SERVICE_URI}/documents/{document_id}/beta/",
        headers={'ApiKey': API_KEY}
    ).json()
    if document['status'] in processed_statuses:
        print('processing complete')
        break
    print('processing ', document['status'])
    # wait 10 seconds before checking again
    time.sleep(10)
Make sure to replace <the api key> with your API key.
While we recommend using webhooks to detect when a document is processed, that may not be possible for every use case. This is a simple example of how to periodically check the document status until processing is complete.
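The loop in the example runs until the document reaches a processed status. If you would rather give up after a while, a deadline can be added. The following is a sketch only: wait_for_status is a hypothetical helper, fetch_status stands for any function that returns the current status string, and the default interval and timeout are arbitrary choices.

```python
import time


def wait_for_status(fetch_status, done_statuses, interval=10, timeout=600,
                    sleep=time.sleep):
    """Call fetch_status() until it returns a status in done_statuses.

    Raises TimeoutError if waiting another interval would pass the timeout
    (both values are in seconds).
    """
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        if status in done_statuses:
            return status
        if time.monotonic() + interval > deadline:
            raise TimeoutError(f'still {status!r} after {timeout} seconds')
        sleep(interval)
```

With the polling example, fetch_status could be a small function that performs the GET request and returns the status field of the response, and done_statuses would be the processed_statuses list.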
Events
Available events
document.ocr_complete
document.processed
Events notify you that something happened on your account. For example, when a document's processing is finished, a document.processed event is triggered and sent to every webhook endpoint configured for it.
NOTE
The document.ocr_complete event is currently in preview mode and is only available to processing pipelines that only have OCR.
The Event Object
The event object
{
  "objectId": "61b1f014d9c0950009fe304b",
  "type": "document.processed",
  "data": {
    "entities": [
      {"payload": "<the entity information>"}
    ]
  }
}
Attributes
objectId string
The id of the target object (e.g., document, job, member)
type string
The event type (e.g., document.ocr_complete, document.processed)
data object
Object containing data related to the event.
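A webhook receiver usually routes each event by its type attribute. The sketch below assumes that the request body delivered to your endpoint is the JSON event object shown above, which this page does not state explicitly. The names handle_event and handlers are illustrative.

```python
import json


def handle_event(body, handlers):
    """Parse a webhook request body and call the handler registered for its type.

    Returns the handler's result, or None if no handler is registered.
    """
    event = json.loads(body)
    handler = handlers.get(event['type'])
    if handler is None:
        # ignore event types this endpoint does not care about
        return None
    return handler(event['objectId'], event.get('data', {}))
```

For example, handlers could map 'document.processed' to a function that stores the entities found in data for the document identified by objectId.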