Meet our next explorers
Planetary robotics is usually built around one extraordinary machine. That machine is protected, driven carefully, and asked to do everything. It is a beautiful engineering answer, but it also concentrates a whole mission into one point of failure. A trapped wheel, a lost link, a corrupted memory sector or a bad landing can turn a decade of preparation into a very small explored patch of ground.
Swarm robotics starts from a different instinct. Nature rarely solves exploration with one perfect individual. Ant-colony optimisation grew from the way local trail reinforcement can become a useful search process. Honeybee swarms make collective decisions by sending many scouts into the world and letting evidence accumulate. Even slime mould network studies show how simple local reinforcement can produce efficient, fault-tolerant transport structures.
The research question behind this page is whether that kind of distributed behaviour can be useful for a planetary mission. The simulator models rovers, aerial scouts and relay nodes on Mars-like terrain, then compares six coordination policies under energy limits, line-of-sight communication and agent failures. The goal is not just to cover a grid. It is to understand whether exploration can stay valuable when the world is rough and the team is imperfect.
The strongest results came from the two proposed policies. ARES reached 80% coverage fastest, while HERMES reached 95% coverage fastest and produced the highest mean scientific return. Both stayed almost unchanged after losing half the swarm.
The work is less about speed alone and more about useful autonomy
The central question was not whether many robots can paint a map faster than one robot. In a clean simulator, that would be an easy win. The more interesting question is whether the swarm can stay useful while it is being pulled in several directions at once: move quickly, avoid wasting energy, keep a communication backbone alive, visit scientific targets and recover when members disappear.
That is why the dissertation measures time to coverage, final coverage, science score, robustness and message cost together. A policy that covers the world quickly but leaves every rover isolated is not mission-ready. A policy that finds every science target but discovers the terrain too slowly is also incomplete. Useful autonomy lives in the tension between those measures.
The research questions were written to keep that balance visible. They separate coverage from resilience and scalability, then ask whether the proposed policies actually improve the shape of the mission rather than only the final percentage printed at the end of a run.
Can a heterogeneous swarm reach high coverage and scientific return under terrain, energy and communication constraints?
Can the swarm remain useful after catastrophic agent loss?
Are the proposed methods meaningfully better than established baselines?
How do swarm size and role composition trade speed against communication cost?
A controlled simulator, built as five small layers
The simulator was deliberately layered so the same terrain, communication and energy assumptions could be reused for every algorithm. That made the comparison cleaner: when one algorithm outperformed another, the difference came from the policy rather than a hidden change in the world underneath it.
The terrain layer turns Mars-like surfaces into a grid with enough structure to matter. Each cell can represent open regolith, rougher highland, crater material, hazards or a science target. Those labels change traversal cost, path choice, line-of-sight communication and the value of visiting a location. A robot crossing an easy plain should not behave like a robot pushing along a crater rim.
In a flight-grade version, the terrain layer would be grounded in orbital products such as Mars Orbiter Laser Altimeter data and high-resolution landing-site imagery. This portfolio version uses the Perseverance landing site at Jezero Crater as a visual and conceptual anchor, then abstracts the surface into research-friendly terrain profiles. It does not pretend to be a flight terrain product, but it preserves the useful shape of that world: clustered rough regions, sparse scientific opportunities and places where communication can become fragile.
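As a sketch, the terrain layer can be read as a cell-type lookup that drives traversal cost. The type names and cost values below are illustrative assumptions, not the simulator's actual parameters:

```python
from enum import Enum

class Terrain(Enum):
    REGOLITH = "regolith"   # open plain, cheap to cross
    HIGHLAND = "highland"   # rougher ground, more energy per step
    CRATER = "crater"       # steep crater material, costly and link-blocking
    HAZARD = "hazard"       # never entered
    SCIENCE = "science"     # carries extra value when visited

# Illustrative per-step energy costs (arbitrary units, assumed values).
STEP_COST = {
    Terrain.REGOLITH: 1.0,
    Terrain.HIGHLAND: 2.5,
    Terrain.CRATER: 4.0,
    Terrain.SCIENCE: 1.0,
}

def traversal_cost(cell: Terrain) -> float:
    """Energy cost of entering a cell; hazards are impassable."""
    if cell is Terrain.HAZARD:
        return float("inf")
    return STEP_COST[cell]
```

The point of the lookup is that path planners and the utility terms later in the page can all read cost from one place, so "a robot pushing along a crater rim" is slower by construction rather than by special-case code.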
Above that terrain sits the agent layer. The default swarm is 60% rovers, 25% scouts and 15% relays. Rovers are the slow scientific workers. Scouts are faster and better at pulling the frontier outward. Relays are the quiet infrastructure, creating bridges where terrain would otherwise split the team into isolated clusters.
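The 60/25/15 default can be expressed as a small allocation helper. This is a sketch of the split only; the real assignment logic is not shown here:

```python
def role_split(n_agents: int, mix=(0.60, 0.25, 0.15)) -> dict:
    """Split a swarm into rovers, scouts and relays.

    Scouts and relays are rounded from their fractions; any remainder
    goes to rovers so the team size is always preserved.
    """
    scouts = round(n_agents * mix[1])
    relays = round(n_agents * mix[2])
    rovers = n_agents - scouts - relays
    return {"rover": rovers, "scout": scouts, "relay": relays}
```

For a 20-agent team this yields 12 rovers, 5 scouts and 3 relays, matching the default mix.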
The final layer is the evaluation harness. It keeps the policies honest by applying the same 2,000-step budget, the same energy assumptions, the same failure events and the same message accounting. That is what lets the results read like an experiment rather than a visual demo.
Algorithms borrowed from nature, then made practical
The baseline algorithms are intentionally varied. The single rover gives the conventional reference: a careful sweep by one agent. ACO is inspired by ant foraging, where local trail reinforcement helps a group discover useful paths without a central map. Greedy Nearest asks a simpler question at every step: where is the closest useful frontier? Market-Based coordination treats tasks like an auction, with agents bidding for work according to cost and expected value.
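The Greedy Nearest rule is simple enough to state in a few lines. This sketch assumes grid coordinates and Manhattan distance, which may differ from the simulator's actual metric:

```python
def greedy_nearest(pos, frontiers, targets):
    """Greedy Nearest baseline: head for the closest useful cell.

    Frontier cells and science targets compete on distance alone;
    there is no accounting for terrain cost, relays or teammates.
    """
    candidates = list(frontiers) + list(targets)
    if not candidates:
        return None  # nothing useful left to chase
    return min(candidates,
               key=lambda c: abs(c[0] - pos[0]) + abs(c[1] - pos[1]))
```

The omissions are the point: everything this rule ignores (terrain, links, crowding) is exactly what the proposed policies add back in.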
Nature-inspired design is useful here because it gives a vocabulary for decentralised pressure. Ant systems show how positive feedback can accelerate discovery, but also why too much reinforcement can make a swarm overcommit. Honeybee decision-making shows the value of many scouts proposing options before the group settles. Physarum network experiments show that useful paths can be strengthened while weak paths fade, creating networks that balance efficiency and fault tolerance.
The point is not to copy nature literally: a Mars swarm cannot rely on pheromones, waggle dances or living tissue. What transfers is the control pattern: local evidence, limited communication, repeated small decisions, diversity of proposals and graceful degradation when the team is damaged.
That framing shaped the algorithms. ACO gives the page a classical swarm baseline. Market-Based coordination makes allocation explicit. Greedy Nearest is a useful stress test because it shows how far a very simple local rule can go. ARES and HERMES then ask whether those ingredients can be made more mission-aware: still distributed in the tactical layer, but more careful about science return, terrain cost and communication structure.
| Policy | Coordination idea | Useful behaviour |
|---|---|---|
| Single Rover | One agent follows a coverage sweep. | Simple reference for conventional exploration. |
| ACO | Agents reinforce useful paths through pheromone-like memory. | Good distributed coverage with nature-inspired path bias. |
| Market-Based | Agents bid for tasks according to local cost. | Efficient allocation with relatively low communication overhead. |
| Greedy Nearest | Agents chase the nearest frontier or science target. | Strong science collection, but slower high-coverage completion. |
| ARES | Sector frontier search with science auctions and active relays. | Fast early coverage while keeping science and connectivity in view. |
| HERMES | ARES plus a small adaptive mission-mode controller. | Fast 95% coverage and the highest mean science return. |
ARES plans the swarm; HERMES changes the mission mood
ARES, Adaptive Relay-backed Ergodic Search, was designed to keep the strengths of biological swarm behaviour without making the system mysterious. It divides the map into soft outward search sectors, so the swarm does not collapse onto one attractive frontier. A sector is a preference rather than a wall: an agent can leave it for science collection, obstacle avoidance, relay support or fault recovery.
At each replanning step, ARES builds a candidate set from frontiers, science targets and probing moves. Candidate cells are scored by information gain, scientific value, sector alignment, relay usefulness, travel distance, terrain cost, peer separation and connectivity risk. That gives every agent a reason to move, but not the same reason. A scout may push into unknown space, a rover may take a slower route toward a high-value sample, and a relay may hold a position that looks unproductive until the rest of the team needs the link.
The relay behaviour is especially important because, in a planetary setting, communication is bounded by the terrain. A perfectly covered map is less useful if half the swarm cannot report what it has seen. ARES therefore treats relay support as an active planning term rather than a clean-up step. Relays are encouraged to support corridors where scouts and rovers are likely to stretch the mesh, which keeps the map growing without letting the swarm tear itself into isolated pieces.
HERMES, Hybrid Exploration, Relay-Mesh and Science Learning, keeps the ARES tactical layer and adds a small online controller above it. It uses a bounded form of Q-learning to choose the current mission mode from aggregate signals such as coverage, science return, connectivity, relay support, remaining energy and failure pressure.
Those modes are intentionally readable: expand the explored region, collect science, strengthen relays, recover after loss, or sweep the remaining boundary. Choosing a mode changes the weights inside the ARES utility function. In practice, that means HERMES can make the same swarm feel more curious, more cautious, more infrastructure-focused or more completion-focused depending on what the run needs.
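A minimal version of that controller, assuming tabular Q-learning over coarsely binned signals. The bin widths, mode names and hyperparameters here are illustrative, not the tuned values:

```python
import random

MODES = ["expand", "collect_science", "strengthen_relays", "recover", "sweep"]

class ModeController:
    """Bounded tabular Q-learning over a binned mission state."""

    def __init__(self, alpha=0.1, gamma=0.9, epsilon=0.05):
        self.q = {}  # (state, mode) -> estimated value
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def bin_state(self, coverage, science, connectivity, energy, failures):
        """Coarse bins keep the table small and the modes interpretable."""
        b = lambda x: min(int(x * 4), 3)  # four bins per normalised signal
        return (b(coverage), b(science), b(connectivity), b(energy), b(failures))

    def choose(self, state):
        """Epsilon-greedy mode selection."""
        if random.random() < self.epsilon:
            return random.choice(MODES)
        return max(MODES, key=lambda m: self.q.get((state, m), 0.0))

    def update(self, state, mode, reward, next_state):
        """Standard one-step Q-learning update."""
        best_next = max(self.q.get((next_state, m), 0.0) for m in MODES)
        old = self.q.get((state, mode), 0.0)
        self.q[(state, mode)] = (1 - self.alpha) * old + self.alpha * (
            reward + self.gamma * best_next)
```

The chosen mode would then be mapped to a weight vector for the ARES utility, so learning changes priorities rather than motion commands.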
For a planetary system, interpretability is not a nice extra: if the swarm makes a strange decision, the mission team needs to understand whether it was avoiding a communication shadow, chasing a science target, recovering from a failed rover or simply running out of useful frontier. HERMES was built to adapt without becoming a black box: the learned part chooses the mood of the mission, while the local motion remains auditable through the ARES utility function.
ARES utility
$$C_i(t) = F_i(t) \cup P_i(t) \cup S_i(t)$$

$$c_i^*(t) = \arg\max_{c \in C_i(t)} U_i^{\text{ARES}}(c, t)$$

$$U_i^{\text{ARES}}(c, t) = w_I I(c) + w_S S(c) + w_A A_i(c) + w_R R_i(c) - w_D d_i(c) - w_T \tau(c) - w_P P_i(c) - w_L L_i(c)$$
The candidate set combines frontier cells, probing moves and science targets. The utility then rewards information gain, science, sector alignment and relay value while penalising distance, terrain difficulty, crowding and link risk.
HERMES mode update
$$z_t = \text{bin}(C_t, S_t, K_t, E_t, \Phi_t)$$

$$Q(z_t, m_t) \leftarrow (1 - \alpha)\, Q(z_t, m_t) + \alpha \left[ r_t + \gamma \max_{m'} Q(z_{t+1}, m') \right]$$

$$c_i^*(t) = \arg\max_{c \in C_i(t)} U_i^{\text{ARES}}(c, t; W_{m_t})$$
HERMES bins the mission state, chooses a readable mode, then applies that mode as a weight vector inside ARES. The learned part changes what the swarm cares about next; it does not replace the inspectable tactical planner.
The proposed swarms moved the useful part of the mission earlier
The single-rover reference covered 33.0% of the map within the 2,000-step budget. The swarm policies all approached complete coverage, but the interesting difference was how quickly value arrived. ARES reached 80% coverage in 155.6 steps on average, a much earlier transition into useful map knowledge than the baselines. HERMES reached 95% coverage in 365.9 steps and produced the highest mean science score.
The charts should be read as trajectories, not just final scores. A method that reaches 100% coverage eventually may still be operationally weaker if it spends too long in the uncertain early phase. On a real mission, early information matters: it helps the team route around hazards, choose which targets deserve attention and return science while the hardware is still healthy.
This is the useful part of the mission moving earlier. The proposed swarms do not simply finish with a cleaner final number; they give the operator a larger and more scientifically meaningful picture while the simulated hardware is still young in the run. That matters because physical missions are not neutral about time. Dust, mechanical wear, thermal stress, localisation drift and battery degradation all make late science less certain than early science.
Greedy Nearest is a useful comparison because it collects science aggressively. It performs well on scientific return, but it pays for that behaviour with the slowest 95% coverage time and the highest message overhead among the swarm methods. ARES and HERMES are more balanced: they move quickly, keep relays in mind and still collect almost all available science.
ARES is the sharper early explorer. It reaches 80% and 90% coverage before every other policy in the headline run, which fits its design: sector pressure keeps the swarm spatially spread, while science auctions stop the frontier search from ignoring valuable targets. HERMES is the stronger closer. Its mission-mode controller helps the swarm shift from broad expansion into science, relay support and final boundary work, which is why it reaches 95% coverage fastest and finishes with the highest mean science return.
| Algorithm | C(2000) | T80 (steps) | T90 (steps) | T95 (steps) | Messages | Science score |
|---|---|---|---|---|---|---|
| Single Rover | 33.0% | DNR | DNR | DNR | 0 | 1005.3 |
| Market-Based | 100% | 416.8 | 571.4 | 671.4 | 1559.4 | 1624.0 |
| ACO | 100% | 430.8 | 571.5 | 693.9 | 2731.9 | 2354.7 |
| Greedy Nearest | 100% | 473.3 | 679.6 | 843.0 | 3254.7 | 2711.6 |
| ARES | 99.7% | 155.6 | 244.0 | 480.1 | 2122.6 | 2725.5 |
| HERMES | 99.8% | 164.5 | 259.1 | 365.9 | 2241.3 | 2729.3 |

DNR: did not reach that coverage level within the 2,000-step budget.
The surprising part was not that the swarm survived, but how little it changed
The severe test removed 50% of the agents on Rocky Highland. The dissertation target was a fault index of at least 0.70 after that loss. ARES reached Φ(0.5) = 0.996 and HERMES reached Φ(0.5) = 0.998. That number can look almost too neat, so the absolute values matter as well: HERMES still covered 99.6% of the map, reached 95% coverage in 533.4 steps and retained the best science return in the sweep.
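One plausible reading of the fault index is the fraction of nominal performance retained after the loss. The definition below is an assumption for illustration; the dissertation's exact formula may differ:

```python
def fault_index(metric_after_loss: float, metric_nominal: float) -> float:
    """Assumed fault index: share of nominal performance retained
    after attrition, clipped to [0, 1]."""
    if metric_nominal == 0:
        return 0.0
    return min(metric_after_loss / metric_nominal, 1.0)
```

Under this reading, retaining 99.6% of nominal coverage after losing half the swarm gives an index of 0.996, comfortably above the 0.70 target.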
The useful finding is not simply that there were still enough robots left to finish. It is that the policies did not need to become dramatically different after the failure. The same ingredients that helped before the loss still helped afterward: spatial spread, local candidate scoring, science-aware allocation and relay support. When part of the swarm disappeared, the remaining agents already had a structure that could stretch into the missing space.
HERMES has a small advantage in the failure sweep because its mission-mode controller can respond to the new state of the team. If connectivity weakens, relay support becomes more valuable. If enough of the map is already known, final sweeping can become more important than broad exploration. If science return has fallen behind, the controller can bias the ARES layer toward collection without rewriting the low-level planner.
ARES remains impressive because it is deterministic and inspectable. It does not need a learned policy to keep the mission coherent after attrition. The sector pressure and utility penalties still stop the remaining robots from crowding one patch of terrain, and the relay terms still discourage decisions that would strand useful agents beyond communication range.
Adding robots buys speed, then sends the communication bill upward
Scaling tells the other half of the story. Increasing the team from 10 to 100 agents made every swarm policy faster on Rocky Highland. HERMES dropped from 1198.2 steps to 209.4 steps for 95% coverage; ARES fell from 1096.8 to 274.0. That is the obvious appeal of a swarm: parallelism turns waiting time into coverage, and a larger team can search several terrain pockets at once.
The cost is communication. More robots create more local coordination, more relay support, more task claims and more message passing. That is not automatically bad, but it changes the design problem. A planetary swarm cannot be judged only by how quickly it fills a map. It also has to respect bandwidth, power, scheduling windows and the fact that some links may exist only intermittently.
This is why the proposed swarms keep communication inside the algorithm rather than treating it as a post-processing metric. ARES scores connectivity and relay support during target selection. HERMES can shift into a relay-supporting mode when the mesh looks fragile. Both policies accept that the fastest local move is sometimes the wrong move for the whole mission.
There is also a role-composition tradeoff. More scouts make the frontier grow quickly, but they do not replace the scientific work of rovers. More relays make the mesh safer, but every relay is an agent not directly collecting science. More rovers improve sample return, but they can make the team slower and more dependent on carefully maintained links. The 60/25/15 default was a balanced baseline, not a claim that one mix is universally best.
Where next?
The solar system is almost insultingly large. The distances involved do not really fit in the human mind; they have to be reasoned about mathematically, abstractly, at arm's length. And yet, somehow, we keep sending things out into it, one careful mission at a time, learning slowly. That pace made sense when we were just finding our feet. It is harder to justify now.
As exploration extends further, to more moons, more surfaces, more questions worth asking, the old model of a single highly capable machine doing everything starts to feel like the wrong unit of thought. The frontier is not a single site anymore. It is vast, and the science worth doing is scattered across all of it. Thinking at that scale means rethinking what an exploration system even looks like.
This work is a small attempt to sit with that problem seriously. What does it look like to explore not carefully but broadly, to trade individual precision for collective coverage, to let a system absorb loss and keep going? The simulation is bounded and the hardware challenges are real. But the core argument survives: swarms can do things solo missions structurally cannot, and that gap only grows as the targets get more ambitious.
Thinking in large numbers is uncomfortable. It requires letting go of the idea that we can monitor everything, control everything, account for everything in advance. But that discomfort might be exactly the right place to start. The next era of exploration will not be won by pouring decades of development into a single machine. It will be shaped by learning to think, and to explore, at a scale we cannot yet imagine.