Since our beginnings in the French robotics cup, we have wanted to make our robot smarter. Two of the main goals were to make it able to dynamically change its strategy during the match in response to unforeseen situations, and to correctly display the number of points it “thinks” it has scored.

A long time ago (I think it might have been in 2018), we sat down at a table during the inter-UT meeting at UTC and created the basis of a simple framework to start tackling the problem. The idea was to define a strategy as a series of actions, themselves composed of sequential series of moves. Each action represents a task the robot has to accomplish during the match, while its moves describe imperatively how to do so. Here is an example of a strategy written in C:

ACTION_REGISTER(SOLAR1, 20, 15, false);
ACTION_REGISTER(SOLAR2, 20, 15, false);
ACTION_REGISTER(COLLECT_PLANTS, 15, 3, false);
ACTION_REGISTER(PUT_PLANTS_IN_GARDEN, 15, 8, false);
ACTION_REGISTER(BACK_TO_BASE, 5, 13, true);

The syntax of the ACTION_REGISTER macro is as follows:

ACTION_REGISTER(
	SOLAR1, // name of the action
	20,     // expected duration, in seconds
	15,     // number of points
	false   // is the action critical?
);

The first argument is the name of the action, which is pretty straightforward: it corresponds to the name of the function that carries out the action in the code. The third field is the number of points we can expect if the action succeeds.

Meanwhile, the second and fourth arguments are related to what we call the scheduler. The above actions are meant to be carried out sequentially, but what happens if something goes wrong? Say the first action fails for whatever reason: should we try to perform it again, or should we try the others and come back to it later? In our case, the scheduler is the function responsible for answering these questions. The second parameter informs the scheduler of the expected duration of an action in seconds, and the fourth one indicates whether an action is critical, meaning the scheduler should always reserve enough time to attempt it. This was implemented for “easy wins”: actions so easy that attempting them seems essential (going back to base is usually such an action).

When we had the idea of the scheduler, we anticipated that we would write a first simple version (scheduler-v1) and that we could improve it later. And that’s what happened! Except the simple scheduler-v1 stayed there for 7 years… but finally, we did our job and we are replacing it with the 2025 scheduler-v2.

Prototyping

To prototype the new and improved scheduler, we needed a fast prototyping environment, something that C is… pretty bad at. So we opted for Common Lisp, which is ideal for this type of exploration (I already hear everyone crying in the back. “Why not Python??” they say, horrified. To which I reply: (not (fun 'python))).

Before talking about the new and improved scheduler, let’s go back to scheduler-v1. Try not to laugh when you see the code! Let’s start with some simple data structures to mimic the C code:

;; An "action" mimicking the C Action structure
(defstruct action
  (name)
  (duration-s)
  (reward)
  (performed nil)
  (tries-nb 0)
  (critical))

;; Game state description
(defstruct game-state
  (match-duration-s 100)
  (time-s 0)
  (earned-points 0))
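
To make the examples below concrete, here is how the C strategy from earlier could be mirrored with these structures (a small sketch; the `*actions*` and `*state*` names are introduced here for illustration):

;; The C strategy from earlier, expressed with our Lisp structures
(defparameter *actions*
  (list (make-action :name "SOLAR1" :duration-s 20 :reward 15 :critical nil)
        (make-action :name "SOLAR2" :duration-s 20 :reward 15 :critical nil)
        (make-action :name "COLLECT_PLANTS" :duration-s 15 :reward 3 :critical nil)
        (make-action :name "PUT_PLANTS_IN_GARDEN" :duration-s 15 :reward 8 :critical nil)
        (make-action :name "BACK_TO_BASE" :duration-s 5 :reward 13 :critical t)))

;; A fresh match state: 100 s long, 0 s elapsed, 0 points
(defparameter *state* (make-game-state))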

And now the scheduler, which is a function that takes the current state of the match and the available actions, and returns the next action to perform:

(defun remaining-critical-actions-duration (state actions)
  ;; Only count the critical actions that have not been performed yet
  (apply #'+ (mapcar #'action-duration-s
                     (remove-if #'action-performed (remove-if-not #'action-critical actions)))))

;; scheduler-v1
;; ============
(defun get-next-action (state actions)
  "Return the next action to perform according to STATE and the possible ACTIONS."
  (let ((remaining-duration (- (game-state-match-duration-s state)
                               (game-state-time-s state)))
        (remaining-critical-duration
          (remaining-critical-actions-duration state actions)))
    (first ; < pick the first of the filtered actions
     ;; we filter the actions that...
     (remove-if
      (lambda (action)
        (or
         ;; ... were already performed
         (action-performed action) 
         ;; ... were tried 3 times already
         (>= (action-tries-nb action) 3)
         ;; ... are non critical and starve critical actions of time
         (and (not (action-critical action))
              (< (- remaining-duration (action-duration-s action))
                 remaining-critical-duration))))
      actions))))
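
With the example `*actions*` and `*state*` defined earlier, the first call unsurprisingly returns the first pending action (a hypothetical REPL interaction):

(get-next-action *state* *actions*)
;; => #S(ACTION :NAME "SOLAR1" :DURATION-S 20 :REWARD 15 ...)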

As you can see, version 1 is pretty naive. The scheduler performs actions in order and, should one fail, will retry it at most 3 times when time is available. It always tries to reserve enough time for critical actions. What could go wrong?

Well, for one, trying actions three times before giving up can lose the robot a lot of time¹. The concept of critical actions seems logical, but what if another combination of actions yielded more points? We had to do better.

When writing the new version of the scheduler, we started from the following principles:

  • The scheduler should keep the relative ordering of actions. There are often reasons for that ordering: for example, going back to base should always be the last action that earns points. While it would be possible to declare ordering constraints explicitly, we wanted to keep things simple for the tired programmer who has to write a strategy at 4 AM.
  • The scheduler should try to maximize score as much as possible.

How can we approach that problem while keeping things simple? Since we want to keep ordering between actions, we can represent a solution to the problem as a binary list. For example:

(1 0 1 1 1)

means we skip the second action (0) but perform all the other ones (1). If we have enough time on our hands to undertake all the actions, great! Otherwise, we can search the space of solutions for the optimal series of actions: the one that maximizes the obtained points. This is akin to a search in a binary tree: going left could mean skipping an action, going right keeping it. We write a recursive backtracking algorithm that explores this tree and prunes unrealistic branches (for example, a series of actions that would take too long given the remaining time). Here it is in Lisp:

(defun todo-action-p (actions action)
  "Returns `t' if ACTION is a candidate for being performed, `nil' otherwise."
  (let ((last-performed-action-i
          (position (car (last (remove-if-not #'action-performed actions))) actions))
        (action-i (position action actions)))
    ;; The action is a candidate if...
    (and
     ;; ... it was not performed yet
     (not (action-performed action))
     ;; ... AND it has not been tried twice already
     (< (action-tries-nb action) 2)
     ;; ... AND it has not been skipped already, as the algorithm
     ;;         preserves ordering
     (> action-i (if (null last-performed-action-i) -1 last-performed-action-i)))))

;; scheduler-v2
;; ============
(defun get-next-action (state actions &optional choice-list)
  "Return the next action to perform according to STATE and the possible
ACTIONS."
  (let* ((remaining-duration (- (game-state-match-duration-s state)
                                (game-state-time-s state)))
         (todo-actions (remove-if-not (lambda (action) (todo-action-p actions action))
                                      actions))
         (todo-actions-duration (apply #'+ (mapcar #'action-duration-s todo-actions)))
         (chosen-todo-actions (loop for action in todo-actions
                                    for choice in choice-list
                                    when (= choice 1)
                                      collect action))
         (chosen-todo-actions-duration
           (apply #'+ (mapcar #'action-duration-s chosen-todo-actions))))
    (cond
      ;; We can fit all actions in the remaining time: simply proceed
      ;; in order. This avoids exploring the whole tree of
      ;; possibilities
      ((>= remaining-duration todo-actions-duration)
       (list (+ (game-state-earned-points state)
                (apply #'+ (mapcar #'action-reward todo-actions)))
             (first todo-actions)))
      ;; The current solution is impossible because it takes too much
      ;; time: return current score
      ((< remaining-duration chosen-todo-actions-duration)
       ;; As a last ditch effort, we try the action even though we
       ;; anticipate it won't work (task_ctrl should correctly stop
       ;; the robot anyway). We don't increment score since it is
       ;; likely this action won't succeed, to prioritize branches
       ;; that have a higher chance to terminate.
       (list (game-state-earned-points state) (car (last chosen-todo-actions))))
      ;; We arrived at a leaf node: return path reward
      ((= (length choice-list) (length todo-actions))
       (list (+ (game-state-earned-points state)
                (loop for action in todo-actions
                      for choice in choice-list
                      sum (* choice (action-reward action))))
             nil))
      ;; General case: explore
      (t
       (let* (;; Explore what happens if we skip the next action
              (skip-next (get-next-action state actions (append choice-list '(0))))
              ;; Explore what happens if we keep the next action
              (keep-next (get-next-action state actions (append choice-list '(1)))))
         ;; Keep the best result
         (if (> (first skip-next) (first keep-next))
             skip-next
             (list (first keep-next) (nth (length choice-list) todo-actions))))))))
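
With the same example strategy on a fresh state, all five actions fit in the 100 seconds of the match (75 seconds in total), so the scheduler takes the fast path and returns the expected final score along with the first action (again a hypothetical REPL interaction):

(get-next-action *state* *actions*)
;; => (54 #S(ACTION :NAME "SOLAR1" ...))  ; 15+15+3+8+13 = 54 expected points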

This implementation has no depth limit, so it explores the whole tree of choices and always returns the series of actions that maximizes the score given the remaining time. The price is a worst-case exponential number of explored branches in the number of remaining actions, which stays manageable for the handful of actions in a typical strategy.

Is scheduler-v2 better? A simple experiment

The new scheduler looks good, but is it actually better than the old one? That is, does it yield more points? Of course we cannot be sure without testing it in real conditions, but we can run some simulations to see what to expect. To do so, we have to make some assumptions. Let’s suppose the programmer estimates the action durations with decent accuracy: we will sample each duration from a uniform distribution centered on the programmer’s guess, with a maximum deviation of 3 seconds. Let’s also assume that actions fail with x% probability.

With these assumptions, we can generate a bunch of strategies, “simulate” a match, and see which scheduler gets us the most points. We generate strategies with the following function:

(defun generate-strategy ()
  (let ((i 0)
        (match-duration-s 100)
        (actions (list)))
    (loop while (< (apply #'+ (mapcar #'action-duration-s actions))
                   (- match-duration-s 10))
          do (progn (push (make-action :name (format nil "ACTION~A" i)
                                       ;; action durations between 10 and 30 seconds
                                       :duration-s (+ 10 (random 21))
                                       ;; rewards between 1 and 20
                                       :reward (+ 1 (random 20)))
                          actions)
                    (incf i 1)))
    (reverse (cdr actions))))
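
The match simulation itself is then a loop that asks the scheduler for the next action and simulates its outcome. Here is a minimal sketch of one simulated attempt under the assumptions above (hypothetical code, not the exact benchmark):

;; Simulate one attempt at ACTION: the real duration is drawn uniformly
;; within ±3 s of the programmer's guess, and the attempt fails with
;; probability FAILURE-RATE. Returns `t' on success, `nil' on failure.
(defun simulate-attempt (state action failure-rate)
  (incf (game-state-time-s state) (+ (action-duration-s action) -3 (random 7)))
  (incf (action-tries-nb action))
  (when (>= (random 1.0) failure-rate)
    (setf (action-performed action) t)
    (incf (game-state-earned-points state) (action-reward action))
    t))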

Figure: score difference between `scheduler-v1` and `scheduler-v2` as the failure rate varies.

We can see that, under the assumptions above, scheduler-v2 always comes out on top. The difference between the two schedulers grows as the failure rate increases, reaching up to a 6-point difference at a 30% failure rate.

Translating the program to C

The above Lisp scheduler is written recursively, as that is the most natural way of writing backtracking algorithms. To include it in the robot’s C codebase, we had to rewrite it in an iterative style to control stack memory more precisely. For example, we added an option to limit the depth of the scheduler’s search, in case it becomes too greedy for the good of the robot.
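
For illustration, here is a sketch of the iterative idea, kept in Lisp for readability (the real C code differs, and `best-choice-list` and its MAX-DEPTH parameter are hypothetical names): an explicit worklist of partial choice lists replaces the recursion, and the depth cap bounds how much of the tree we explore.

;; Depth-limited, iterative exploration of the choice tree (sketch).
;; Only the first MAX-DEPTH candidate actions are considered.
(defun best-choice-list (state actions &optional (max-depth 8))
  (let* ((todo (remove-if-not (lambda (a) (todo-action-p actions a)) actions))
         (depth (min max-depth (length todo)))
         (remaining (- (game-state-match-duration-s state) (game-state-time-s state)))
         (best-score -1)
         (best-choices nil)
         (stack (list nil))) ; worklist of partial choice lists
    (loop while stack
          do (let ((choices (pop stack)))
               (if (< (length choices) depth)
                   ;; Internal node: push both children (skip and keep)
                   (progn (push (append choices '(0)) stack)
                          (push (append choices '(1)) stack))
                   ;; Leaf: score the kept actions if they fit in time
                   (let ((duration 0) (score 0))
                     (loop for a in todo
                           for c in choices
                           when (= c 1)
                             do (incf duration (action-duration-s a))
                                (incf score (action-reward a)))
                     (when (and (<= duration remaining) (> score best-score))
                       (setf best-score score
                             best-choices choices))))))
    (values best-choices best-score)))

Since the worklist never holds more than depth + 1 partial lists at once, the memory usage is easy to bound, which is exactly what we want on the robot.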

  1. This actually happened to us in 2024 during our match versus Coffee Machine. Our robot timed out during an early action and lost a lot of time.