<?xml version="1.0" encoding="utf-8"?>
  <rss version="2.0">
    <channel>
      <title>krausshalt</title>
      <link>https://krausshalt.com</link>
      <description>krausshalt.com - Max's personal portfolio</description>
      <language>en-US</language>
      <item>
    <title>Scheduled jobs in Kubernetes</title>
    <description>
      <![CDATA[
        <p>Sometimes it's necessary to have work done not on demand (e.g. a web service reacting to requests) but rather on a schedule.</p>
<p>What first came to my mind was a constantly running service doing the same thing cron does on *nix systems. But since we are using Kubernetes as our container orchestrator, I figured it might already have something in place for my use case, and in fact it does!</p>
<p>In the Kubernetes world this feature is called a "CronJob" and it behaves much like its UNIX counterpart.</p>
<p>Boiled down to the basics it's just:</p>
<ul>
<li>a Name</li>
<li>a cron schedule string (I'm always using <a href="https://crontab.guru/">https://crontab.guru/</a> to make sure I'm doing it right)</li>
<li>a Docker image</li>
<li>a command (or list of arguments) to run</li>
</ul>
<p>The only thing to consider is what should happen if the schedule fires while the previous run has not yet completed. Kubernetes offers three options, known as concurrency policies, to choose from:</p>
<ul>
<li><code>Allow</code> (default): The cron job allows concurrently running jobs</li>
<li><code>Forbid</code>: The cron job does not allow concurrent runs; if it is time for a new job run and the previous job run hasn’t finished yet, the cron job skips the new job run</li>
<li><code>Replace</code>: If it is time for a new job run and the previous job run hasn’t finished yet, the cron job replaces the currently running job run with a new job run</li>
</ul>
<p>Source: <a href="https://kubernetes.io/docs/tasks/job/automated-tasks-with-cron-jobs/">https://kubernetes.io/docs/tasks/job/automated-tasks-with-cron-jobs/</a></p>
<p>And last but not least, an example of one of our jobs which has been running reliably in production for months:</p>
<pre><code>apiVersion: batch/v1
kind: CronJob
metadata:
  name: job-foo-bar-thing
spec:
  schedule: "*/5 * * * *"
  concurrencyPolicy: Forbid
  successfulJobsHistoryLimit: 1
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: job-foo-bar-thing
            image: {{DOCKER_IMAGE}}
            imagePullPolicy: Always
            env:
              - name: PORT
                value: "80"
              - name: NODE_ENV
                value: {{NODE_ENV}}
              - name: LOG_LEVEL
                value: {{LOG_LEVEL}}

            args:
            - npm
            - start
          restartPolicy: Never
</code></pre>
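<p>Deploying and inspecting it then works with the usual kubectl commands (the manifest filename and the one-off job name below are just placeholders):</p>
<pre><code># Create or update the CronJob from the manifest
kubectl apply -f cronjob.yaml

# Show schedule, last run time and whether it's suspended
kubectl get cronjob job-foo-bar-thing

# Watch the Jobs it spawns
kubectl get jobs --watch

# Trigger a one-off run without waiting for the schedule
kubectl create job manual-run-1 --from=cronjob/job-foo-bar-thing
</code></pre>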

      ]]>
    </description>
    <link>https://krausshalt.com/notes/scheduled-jobs-in-kubernetes</link>
    <author>Maximilian Krauss</author>
    <guid isPermaLink="false">scheduled-jobs-in-kubernetes</guid>
    <pubDate>Fri, 07 Jan 2022 00:00:00 GMT</pubDate>
  </item>
  <item>
    <title>Async iterators with MongoDB</title>
    <description>
      <![CDATA[
        <h3>Asynchronous batched iterable for (mongo) cursors. When one is not enough and all is too much</h3>
<p>Async iterators landed in node.js v10 as an experimental feature, and they have since made their way into the current LTS and stable versions of node.js.</p>
<p>I never had a real use case for them until recently, when we had to process a lot of data from MongoDB without dumping the whole collection into memory.</p>
<p>The first attempt just uses the regular mongo cursor's <code>forEach</code> to process the documents one after the other:</p>
<pre><code>collection.find(...).forEach(document => process(document)
    .then(...) // Does not block, so promises arent possible at all
    .catch(...)
)
</code></pre>
<p>This works, but it neither awaits the promises nor is it very efficient. The second approach was my very first use of the new async iterator syntax, simply iterating over the cursor:</p>
<pre><code>for await (const document of cursor.find(...)) {
  await process(document)
}
</code></pre>
<p>Surprisingly this just works out of the box, because the mongo cursor exposes an asynchronous <code>next()</code> method via the async iterable protocol, which is all that's required to loop through the collection.
That's a nice solution, but it has a drawback: what if <code>process(document)</code> takes some time and slows down the processing of the documents?</p>
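<p>To see why this works, here is a toy cursor over an in-memory array (a hypothetical stand-in, not the real mongo driver): any object implementing the async iterable protocol, i.e. a <code>[Symbol.asyncIterator]()</code> method returning an object with an async <code>next()</code>, can be consumed with <code>for await ... of</code>:</p>
<pre><code>function makeCursor(items) {
  let i = 0
  return {
    [Symbol.asyncIterator]() {
      return {
        // Resolves to { value, done } objects, one per item
        async next() {
          return i >= items.length
            ? { done: true, value: undefined }
            : { done: false, value: items[i++] }
        }
      }
    }
  }
}

async function main() {
  for await (const document of makeCursor(['a', 'b', 'c'])) {
    console.log(document) // prints a, b, c - one per iteration
  }
}

main()
</code></pre>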
<p>Now I need a batch of documents that can be processed in parallel, but not all of them at the same time, which would blow up memory. To the ...</p>
<h3>Bat(ch) Mobile</h3>
<p>I <a href="https://max.krauss.io/async-iterators-with-mongo-db/#:~:text=I-,wrote,-a%20small%20node">wrote a small node module</a> which does exactly that: it fetches N items from a cursor (which does not have to be a mongo cursor), yields the batch of items, awaits their processing, and so on until the cursor is exhausted. The example from above now looks like this:</p>
<pre><code>const { getBatchedIterableFromCursor } = require('batch-mobile')

const cursor = collection.find(...)
for await (const batchOfItems of getBatchedIterableFromCursor(cursor, 100)) {
  await process(batchOfItems) // this is now an array of items
}
</code></pre>
<p>The only changes are that <code>document</code> is now an array of documents, plus one additional function call where the size of the batch is specified (the default is 200).</p>
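<p>The module's source is linked above; just as an illustration, a minimal sketch of such a batched iterable (my simplified assumption of how it could look, not the actual batch-mobile code) boils down to an async generator:</p>
<pre><code>// Wraps any cursor-like object whose async next() resolves to
// null/undefined once exhausted (like the mongo driver's cursor does)
// into an async iterable yielding arrays of up to batchSize items
async function* getBatchedIterableFromCursor(cursor, batchSize = 200) {
  let batch = []
  let item
  while ((item = await cursor.next()) != null) {
    batch.push(item)
    if (batch.length === batchSize) {
      yield batch
      batch = []
    }
  }
  if (batch.length) yield batch // flush the last partial batch
}
</code></pre>
<p>Inside the loop the batch can then be fanned out in parallel, e.g. with <code>await Promise.all(batchOfItems.map(process))</code>.</p>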

      ]]>
    </description>
    <link>https://krausshalt.com/notes/async-iterators-with-mongo-db</link>
    <author>Maximilian Krauss</author>
    <guid isPermaLink="false">async-iterators-with-mongo-db</guid>
    <pubDate>Wed, 04 Mar 2020 00:00:00 GMT</pubDate>
  </item>
    </channel>
  
  </rss>