Meet our next explorers
Planetary robotics has mostly been organised around one extraordinary machine. That is a beautiful engineering answer: build the rover carefully, protect it obsessively, drive it with great patience and ask it to be geologist, cartographer, photographer and lab assistant all at once. The weakness is just as obvious. A trapped wheel, a lost link, a bad landing ellipse or one deeply annoying memory fault can concentrate a decade of planning into a disappointingly small patch of ground.
Swarm robotics comes from a less heroic instinct. In nature, exploration is rarely handed to one flawless individual. Ant-colony optimisation formalised the idea that local trail reinforcement can become a competent search process. Honeybee swarms show how many partial reports can accumulate into a collective choice. Even slime mould network studies are useful here, because they show how repeated local reinforcement can produce transport networks that are both efficient and fairly tolerant of damage.
This project asks how far that logic can be pushed toward a planetary mission. The simulator models rovers, aerial scouts and relay nodes on Mars-inspired terrain, then compares six coordination policies under planetary constraints: energy limits, line-of-sight communication and permanent agent failures. Coverage matters, of course, but I treated it as the floor of the problem. The harder question is whether the search remains scientifically useful when the landscape is awkward and the team is no longer intact.
The two proposed policies carried most of the argument. ARES reached 80% coverage fastest, HERMES reached 95% coverage fastest and produced the highest mean scientific return, and both remained remarkably steady after half the swarm was removed. That last result is the most interesting: it suggests the behaviour had something more valuable than raw speed, a structure that could forgive damage.
The work is less about speed alone, and more about useful autonomy
I did not want the dissertation to become a fancy way of proving that many robots can colour in a map faster than one robot. In a clean simulator that result is almost baked in. The more useful test is a lot messier: can the swarm move quickly while saving energy, keeping a communication backbone alive, visiting science targets and recovering when some of its members disappear?
For that reason, the evaluation keeps time to coverage, final coverage, science score, robustness and message cost in the same frame. A policy can look amazing on coverage while quietly stranding useful agents beyond the relay mesh. Another can chase science so aggressively that it learns the terrain too late. The interesting behaviour sits in those tradeoffs, where the boring engineering constraints have a vote on the outcome.
The research questions were written to keep that balance visible. They separate coverage, resilience and scalability, then ask whether ARES and HERMES change the shape of the mission rather than only the last percentage printed at the end of a run. A pretty final number is nice; a mission that becomes useful earlier is more convincing.
Can a heterogeneous swarm reach high coverage and scientific return under terrain, energy and communication constraints?
Can the swarm remain useful after catastrophic agent loss?
Are the proposed methods meaningfully better than established baselines?
How do swarm size and role composition trade speed against communication cost?
A controlled simulator, built as five small layers
I built the simulator in layers because the comparison only means anything if every policy inherits the same world. Terrain, communication, energy accounting, failures and time budget are shared underneath the algorithms. When one method performs better, the difference should come from the decision policy, not from a conveniently friendlier planet hiding below it.
The terrain layer turns Mars-like surfaces into a grid with enough structure to matter. A cell can be open regolith, rough highland, crater material, hazard or science target, and those labels affect traversal cost, path choice, line-of-sight communication and the value of visiting the location. A rover crossing an easy plain should not make the same decisions as one picking its way along a crater rim.
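A minimal sketch of how terrain labels could feed traversal cost. The five labels come from the description above; the `Terrain` and `path_cost` names and every cost value are illustrative assumptions, not the simulator's actual numbers.

```python
from enum import Enum


class Terrain(Enum):
    OPEN_REGOLITH = "open_regolith"
    ROUGH_HIGHLAND = "rough_highland"
    CRATER = "crater"
    HAZARD = "hazard"
    SCIENCE_TARGET = "science_target"


# Hypothetical per-cell traversal costs (assumed values, for illustration only).
TRAVERSAL_COST = {
    Terrain.OPEN_REGOLITH: 1.0,
    Terrain.ROUGH_HIGHLAND: 2.5,
    Terrain.CRATER: 4.0,
    Terrain.HAZARD: float("inf"),  # treated here as impassable
    Terrain.SCIENCE_TARGET: 1.0,
}


def path_cost(path):
    """Sum the traversal cost of every cell along a path."""
    return sum(TRAVERSAL_COST[cell] for cell in path)


# Six easy cells across a plain versus four cells along a crater rim.
plain_route = [Terrain.OPEN_REGOLITH] * 6
rim_route = [
    Terrain.OPEN_REGOLITH,
    Terrain.CRATER,
    Terrain.CRATER,
    Terrain.SCIENCE_TARGET,
]

print(path_cost(plain_route))  # shorter in cost despite being longer in cells
print(path_cost(rim_route))
```

With these assumed costs the longer plain route is still cheaper than the short rim route, which is the kind of difference a planner has to weigh.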
The terrain model uses orbital products such as Mars Orbiter Laser Altimeter data and high-resolution landing-site imagery. The version shown in the demo uses the Perseverance landing site at Jezero Crater as its anchor, then abstracts the surface into research-friendly terrain profiles.
Above the map is the agent layer. The default team is 60% rovers, 25% scouts and 15% relays, which gave the runs a practical mix of scientific labour, fast survey and communications support. Rovers are slow and useful. Scouts pull the frontier outward. Relays do the unglamorous work of keeping the team from becoming a set of disconnected little adventures.
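As a rough illustration, the 60/25/15 mix can be turned into whole-agent counts for any team size. The `split_roles` helper and its largest-remainder rounding are assumptions made for this sketch; the text does not say how the simulator rounds fractional agents.

```python
# Default role mix from the text, expressed as integer percentages.
DEFAULT_MIX = (("rover", 60), ("scout", 25), ("relay", 15))


def split_roles(n_agents, mix=DEFAULT_MIX):
    """Split n_agents into roles using largest-remainder rounding (assumed rule)."""
    shares = [(role, n_agents * pct) for role, pct in mix]
    # Integer part of each share.
    counts = {role: share // 100 for role, share in shares}
    leftover = n_agents - sum(counts.values())
    # Hand leftover agents to the roles with the largest fractional part.
    by_remainder = sorted(shares, key=lambda rs: rs[1] % 100, reverse=True)
    for role, _ in by_remainder[:leftover]:
        counts[role] += 1
    return counts


print(split_roles(20))   # {'rover': 12, 'scout': 5, 'relay': 3}
print(split_roles(100))  # {'rover': 60, 'scout': 25, 'relay': 15}
```

Integer percentages keep the arithmetic exact, so the counts always sum to the team size.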
The evaluation harness is the boring part, which is precisely why it matters. Every policy faces the same 2,000-step budget, the same energy assumptions, the same failure events and the same message accounting. That discipline is what lets the results read as an experiment rather than a nice visual demo with cool Mars colours.
Algorithms borrowed from nature, then made practical
The baselines were chosen to give the proposed methods something honest to push against. The single rover is the conventional reference: one agent performing a careful lawnmower-like sweep. ACO is inspired by ant foraging, where local trail reinforcement helps a group discover useful paths without a central map. Greedy Nearest is intentionally blunt, asking which useful frontier or target is closest right now. Market-Based coordination treats tasks like an auction, with agents bidding according to local cost and expected value.
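To make the auction idea concrete, here is a minimal single-round sketch in which each agent bids expected value minus travel cost and the highest bids win. The `run_auction` function, the Manhattan-distance cost and the example values are all assumptions for illustration; the Market-Based baseline in the simulator may differ.

```python
def run_auction(agents, tasks):
    """Assign at most one task per agent, highest bid first.

    agents: {agent_name: (row, col)}
    tasks:  {task_name: ((row, col), expected_value)}
    """
    bids = []
    for agent, (ar, ac) in agents.items():
        for task, ((tr, tc), value) in tasks.items():
            cost = abs(ar - tr) + abs(ac - tc)  # Manhattan distance as local cost
            bids.append((value - cost, agent, task))

    bids.sort(reverse=True)  # best bid first

    assignment = {}
    taken = set()
    for _, agent, task in bids:
        if agent in assignment or task in taken:
            continue  # agent already busy or task already claimed
        assignment[agent] = task
        taken.add(task)
    return assignment


agents = {"rover_1": (0, 0), "scout_1": (5, 5)}
tasks = {"frontier_A": ((1, 0), 3.0), "science_B": ((5, 4), 10.0)}
print(run_auction(agents, tasks))
```

In this example the scout wins the nearby science target and the rover takes the adjacent frontier, even though the science target is worth more to both.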
I found the biological literature useful mainly because it gives a language for decentralised pressure. Ant systems show how positive feedback can accelerate discovery, and also how too much reinforcement can make a group overcommit. Honeybee decision-making is a good reminder that many scouts proposing imperfect options can be better than one agent trying to know everything. Physarum network experiments show how useful paths can strengthen while weak paths fade, producing networks that balance efficiency and fault tolerance in a pleasantly weird way.
I did not try to copy biology literally. A Mars swarm cannot rely on pheromones, waggle dances or living tissue. What transfers is the control pattern: local evidence, limited communication, repeated small decisions, diversity of proposals and graceful degradation when the team is damaged.
That framing shaped the comparison. ACO supplies the classical swarm reference. Market-Based coordination makes allocation explicit. Greedy Nearest is a useful stress test because it shows how far a very simple local rule can get before its limits show. ARES and HERMES take those ingredients and make them more mission-aware: still distributed at the tactical level, but more careful about science return, terrain cost and communication structure.
| Policy | Coordination idea | Useful behaviour |
|---|---|---|
| Single Rover | One agent works through a coverage sweep. | A deliberately plain reference for conventional exploration. |
| ACO | Agents reinforce useful paths through pheromone-like memory. | Distributed coverage with a strong path-following bias. |
| Market-Based | Agents bid for tasks using local cost and expected value. | Clear allocation logic with relatively restrained communication. |
| Greedy Nearest | Agents chase the nearest frontier or science target. | Strong science collection, although high coverage arrives later. |
| ARES | Sector frontier search with science auctions and active relays. | Fast early coverage without forgetting science or the relay mesh. |
| HERMES | ARES with a small adaptive mission-mode controller above it. | The quickest 95% coverage and the highest mean science return. |
ARES plans the swarm; HERMES changes the mission mood
ARES, Adaptive Relay-backed Ergodic Search, was my attempt to keep the useful parts of swarm behaviour without turning the system into a mystery box. It divides the map into soft outward search sectors, so the team is encouraged to spread instead of collapsing onto the same attractive frontier. A sector is a preference, not a prison: an agent can leave it for science collection, obstacle avoidance, relay support or fault recovery.
At each replanning step, ARES builds a candidate set from frontiers, science targets and probing moves. Candidate cells are scored using information gain, scientific value, sector alignment, relay usefulness, travel distance, terrain cost, peer separation and connectivity risk. Every agent gets a reason to move, but not the same reason. A scout may push into unknown space, a rover may accept a slower route toward a valuable target, and a relay may hold a position that looks dull until the rest of the team suddenly needs the link.
Relay behaviour is particularly important on planetary terrain, where communication is shaped by range, timing and line of sight. A beautifully covered map loses a lot of its charm if half the agents cannot report what they saw. ARES treats relay support as part of planning rather than a clean-up chore, so relays are encouraged to sit near corridors where scouts and rovers are likely to stretch the mesh.
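A simple way to picture the line-of-sight constraint is to walk a straight ray between two antennas over a height grid and check whether terrain rises above it. This is only a sketch: the `has_line_of_sight` function, the `mast` height and the toy grids are assumptions, and the simulator's actual link model (including range and timing) is not described in enough detail to reproduce.

```python
def has_line_of_sight(height, a, b, mast=1.0):
    """Return True if no intermediate cell rises above the ray from a to b.

    height: 2D list of terrain heights
    a, b:   (row, col) cells holding the two agents
    mast:   assumed antenna height above the ground
    """
    (r0, c0), (r1, c1) = a, b
    steps = max(abs(r1 - r0), abs(c1 - c0))
    if steps == 0:
        return True
    h0 = height[r0][c0] + mast
    h1 = height[r1][c1] + mast
    for k in range(1, steps):
        t = k / steps
        r = round(r0 + t * (r1 - r0))
        c = round(c0 + t * (c1 - c0))
        ray_height = h0 + t * (h1 - h0)  # height of the straight ray here
        if height[r][c] > ray_height:
            return False  # terrain blocks the link
    return True


flat = [[0.0, 0.0, 0.0, 0.0, 0.0]]
ridge = [[0.0, 0.0, 5.0, 0.0, 0.0]]
print(has_line_of_sight(flat, (0, 0), (0, 4)))   # True
print(has_line_of_sight(ridge, (0, 0), (0, 4)))  # False
```

A relay parked on the ridge cell would restore the link in the second case, which is the sort of dull-looking position the planner needs to value.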
HERMES, Hybrid Exploration, Relay-Mesh and Science Learning, keeps the ARES tactical layer and adds a small online controller above it. It uses a bounded form of Q-learning to choose the current mission mode from aggregate signals such as coverage, science return, connectivity, relay support, remaining energy and failure pressure.
The modes are intentionally readable: expand the explored region, collect science, strengthen relays, recover after loss, or sweep the remaining boundary. Choosing a mode changes the weights inside the ARES utility function. In practice, HERMES can make the same swarm behave as if it is more curious, more cautious, more infrastructure-focused or more interested in finishing the job.
Interpretability is not decoration in a planetary system. If the swarm makes a strange choice, the mission team needs to know whether it was avoiding a communication shadow, chasing a high-value target, recovering from a failed rover or running out of useful frontier. HERMES was built to adapt without becoming opaque: the learned layer chooses the mission posture, while the local motion remains auditable through the ARES utility function.
ARES utility
$$C_i(t) = F_i(t) \cup P_i(t) \cup S_i(t)$$
$$c_i^*(t) = \arg\max_{c \in C_i(t)} U_i^{\mathrm{ARES}}(c, t)$$
$$U_i^{\mathrm{ARES}}(c, t) = w_I I(c) + w_S S(c) + w_A A_i(c) + w_R R_i(c) - w_D d_i(c) - w_T \tau(c) - w_P P_i(c) - w_L L_i(c)$$
ARES starts with frontiers, probes and science targets, then scores each option against what the agent can actually afford to do. The useful cells are rewarded, but distance, rough ground, crowding and fragile links are allowed to push back.
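The utility above can be sketched as a weighted sum over candidate features followed by an argmax. The structure follows the eight terms in the formula; the `Candidate` fields, the weight values and the two example candidates are hypothetical and chosen only to show the mechanics.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    info_gain: float      # I(c)
    science: float        # S(c)
    sector_align: float   # A_i(c)
    relay_use: float      # R_i(c)
    distance: float       # d_i(c)
    terrain_cost: float   # tau(c)
    peer_crowding: float  # P_i(c)
    link_risk: float      # L_i(c)


# Hypothetical weights; the dissertation's tuned values are not given here.
WEIGHTS = {"I": 1.0, "S": 1.0, "A": 0.5, "R": 0.5,
           "D": 0.1, "T": 0.2, "P": 0.3, "L": 0.4}


def ares_utility(c, w):
    """Rewards minus penalties, term by term as in the ARES utility."""
    reward = (w["I"] * c.info_gain + w["S"] * c.science
              + w["A"] * c.sector_align + w["R"] * c.relay_use)
    penalty = (w["D"] * c.distance + w["T"] * c.terrain_cost
               + w["P"] * c.peer_crowding + w["L"] * c.link_risk)
    return reward - penalty


def best_candidate(candidates, w):
    """Pick the candidate cell with the highest utility."""
    return max(candidates, key=lambda c: ares_utility(c, w))


frontier = Candidate("frontier", 0.8, 0.0, 1.0, 0.0, 4.0, 1.0, 0.0, 0.5)
science = Candidate("science", 0.2, 1.5, 0.0, 0.0, 6.0, 2.0, 0.0, 0.0)
print(best_candidate([frontier, science], WEIGHTS).name)  # science
```

With these assumed numbers the science target wins despite being further away over rougher ground, because its scientific value outweighs the added distance and terrain penalties.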
HERMES mode update
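$$z_t = \operatorname{bin}(C_t, S_t, K_t, E_t, \Phi_t)$$
$$Q(z_t, m_t) \leftarrow (1 - \alpha)\, Q(z_t, m_t) + \alpha \left[ r_t + \gamma \max_{m'} Q(z_{t+1}, m') \right]$$
$$c_i^*(t) = \arg\max_{c \in C_i(t)} U_i^{\mathrm{ARES}}(c, t; W_{m_t})$$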
HERMES reduces the mission state into coarse bins, selects a readable operating mode, and passes that mode into ARES as a weight change. The learning layer nudges the priorities; the local planner remains something a human can inspect.
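A minimal tabular sketch of the mode update, assuming all five signals are normalised to [0, 1] and binned into three levels. The mode names paraphrase the list given earlier; `bin_state`, `ModeController`, the epsilon-greedy choice and the hyperparameters are assumptions, and whatever bounding the "bounded" Q-learning applies is not specified, so it is left out.

```python
import random
from collections import defaultdict

MODES = ["expand", "collect_science", "strengthen_relays",
         "recover", "sweep_boundary"]


def bin_state(coverage, science, connectivity, energy, failure, n_bins=3):
    """Reduce five signals in [0, 1] to a tuple of coarse bin indices."""
    def to_bin(x):
        return min(int(x * n_bins), n_bins - 1)
    return tuple(to_bin(x) for x in
                 (coverage, science, connectivity, energy, failure))


class ModeController:
    def __init__(self, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
        self.q = defaultdict(float)  # Q[(state, mode)], starts at 0
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = random.Random(seed)

    def choose(self, z):
        """Epsilon-greedy mode selection (an assumed exploration rule)."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(MODES)
        return max(MODES, key=lambda m: self.q[(z, m)])

    def update(self, z, m, reward, z_next):
        """Q(z,m) <- (1 - alpha) Q(z,m) + alpha [r + gamma max_m' Q(z',m')]."""
        best_next = max(self.q[(z_next, m2)] for m2 in MODES)
        self.q[(z, m)] = ((1 - self.alpha) * self.q[(z, m)]
                          + self.alpha * (reward + self.gamma * best_next))


ctrl = ModeController(alpha=0.5, epsilon=0.0)
z = bin_state(0.2, 0.1, 0.9, 0.8, 0.0)
z_next = bin_state(0.5, 0.1, 0.9, 0.7, 0.0)
ctrl.update(z, "expand", reward=1.0, z_next=z_next)
print(ctrl.choose(z))  # expand
```

The chosen mode would then select a weight set for the ARES utility, so the learned layer only ever changes priorities, never the local planner itself.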
The proposed swarms moved the useful part of the mission earlier
The single-rover reference covered 33.0% of the map within the 2,000-step budget. The swarm policies all moved close to complete coverage, so the more revealing question became when useful information arrived. ARES reached 80% coverage in 155.6 steps on average, which moved the run into meaningful map knowledge much earlier than the baselines. HERMES reached 95% coverage in 365.9 steps and produced the highest mean science score.
The charts are worth reading as trajectories rather than scoreboards. A method that eventually reaches 100% coverage can still be operationally weak if it spends too long wandering through the uncertain early phase. In a real mission, early information helps route around hazards, choose which targets deserve attention and return science while the hardware is still healthy.
That timing is the main result for me. The proposed swarms give the operator a larger and more scientifically meaningful picture while the simulated hardware is still fresh. Physical missions are not neutral about time: dust, mechanical wear, thermal stress, localisation drift and battery degradation all make late science less certain than early science.
Greedy Nearest is a useful comparison because it collects science aggressively. It performs well on scientific return, but pays for that behaviour with the slowest 95% coverage time and the highest message overhead among the swarm methods. ARES and HERMES are more balanced. They move quickly, keep relays in mind and still collect almost all available science.
ARES is the sharper early explorer. It reaches 80% and 90% coverage before every other policy in the headline run, which fits its design: sector pressure keeps the swarm spatially spread, while science auctions stop frontier search from ignoring valuable targets. HERMES is the stronger closer. Its mission-mode controller helps the swarm shift from broad expansion into science, relay support and final boundary work, which is why it reaches 95% coverage fastest and finishes with the highest mean science return.
| Algorithm | Coverage at 2,000 steps | T80 (steps to 80%) | T90 (steps to 90%) | T95 (steps to 95%) | Messages | Science score |
|---|---|---|---|---|---|---|
| Single Rover | 33% | DNR | DNR | DNR | 0 | 1005.3 |
| Market-Based | 100% | 416.8 | 571.4 | 671.4 | 1559.4 | 1624 |
| ACO | 100% | 430.8 | 571.5 | 693.9 | 2731.9 | 2354.7 |
| Greedy Nearest | 100% | 473.3 | 679.6 | 843 | 3254.7 | 2711.6 |
| ARES | 99.7% | 155.6 | 244 | 480.1 | 2122.6 | 2725.5 |
| HERMES | 99.8% | 164.5 | 259.1 | 365.9 | 2241.3 | 2729.3 |
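For readers who want to reproduce the timing columns, a time-to-coverage metric can be read off a per-step coverage trajectory as the first step at which a threshold is met. The `time_to_coverage` helper and the two toy trajectories below are made up for illustration; the table reports means over runs, which this sketch does not attempt to recreate.

```python
def time_to_coverage(trajectory, threshold):
    """Return the first step where coverage >= threshold, or None if never reached."""
    for step, coverage in enumerate(trajectory):
        if coverage >= threshold:
            return step
    return None  # the threshold was not reached within the budget


# Toy coverage fractions per step (illustrative only, not simulator output).
swarm_run = [0.0, 0.30, 0.60, 0.82, 0.91, 0.96]
rover_run = [0.0, 0.10, 0.20, 0.33]

print(time_to_coverage(swarm_run, 0.80))  # 3
print(time_to_coverage(swarm_run, 0.95))  # 5
print(time_to_coverage(rover_run, 0.80))  # None
```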
The surprising part was not that the swarm survived, but how little it changed
The harshest test removed 50% of the agents on Rocky Highland. The dissertation target was a fault index of at least 0.70 after that loss; ARES reached Φ(0.5) = 0.996 and HERMES reached Φ(0.5) = 0.998. Those numbers look almost suspiciously tidy, so the underlying performance matters as well: HERMES still covered 99.6% of the map, reached 95% coverage in 533.4 steps and retained the best science return in the sweep.
What interested me was the lack of drama after the failure. The policies did not need a completely different operating style once half the team vanished. The same ingredients that helped before the loss still helped afterward: spatial spread, local candidate scoring, science-aware allocation and relay support. When part of the swarm disappeared, the remaining agents already had a structure that could stretch into the missing space.
HERMES has a small advantage in the failure sweep because its mission-mode controller can respond to the new state of the team. Weak connectivity can make relay support more valuable. A mostly known map can make final sweeping more important than broad exploration. A poor science score can push the ARES layer toward collection without rewriting the low-level planner.
ARES is still impressive because it is deterministic and inspectable. It does not need a learned policy to keep the mission coherent after attrition. Sector pressure and utility penalties keep the remaining robots from crowding one patch of terrain, while the relay terms still discourage decisions that would strand useful agents beyond communication range.
Adding robots buys speed, then sends the networking bill upward
Scaling tells the other half of the story. Increasing the team from 10 to 100 agents made every swarm policy faster on Rocky Highland. HERMES dropped from 1198.2 steps to 209.4 steps for 95% coverage; ARES fell from 1096.8 to 274.0. This is the obvious attraction of a swarm: parallelism turns waiting time into coverage, and a larger team can work several terrain pockets at once.
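The scaling figures can be turned into a speedup and a rough parallel efficiency. The step counts are the ones quoted above; the `speedup` and `parallel_efficiency` helpers, and the choice to define efficiency as speedup divided by the growth in team size, are my own framing rather than a metric from the dissertation.

```python
def speedup(t_small_team, t_large_team):
    """How many times faster the larger team reaches the coverage target."""
    return t_small_team / t_large_team


def parallel_efficiency(t_small_team, n_small, t_large_team, n_large):
    """Speedup divided by the growth in team size (1.0 would be ideal scaling)."""
    return speedup(t_small_team, t_large_team) / (n_large / n_small)


# Steps to 95% coverage on Rocky Highland, 10 agents versus 100 agents.
hermes = (1198.2, 209.4)
ares = (1096.8, 274.0)

print(round(speedup(*hermes), 1))  # 5.7
print(round(speedup(*ares), 1))    # 4.0
print(round(parallel_efficiency(hermes[0], 10, hermes[1], 100), 2))  # 0.57
print(round(parallel_efficiency(ares[0], 10, ares[1], 100), 2))      # 0.4
```

Ten times the robots buys roughly a fourfold to sixfold speedup here, so each added agent contributes less than the one before it.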
The cost is communication. More robots mean more local coordination, more relay support, more task claims and more message passing. That is not automatically a failure, but it changes the design problem. A planetary swarm has to respect bandwidth, power, scheduling windows and the awkward fact that some links may only exist intermittently. Radios, sadly, do not care how elegant the coverage curve looks.
That is why the proposed swarms keep communication inside the algorithm rather than treating it as a post-processing metric. ARES scores connectivity and relay support during target selection. HERMES can shift into a relay-supporting mode when the mesh looks fragile. Both policies accept that the fastest local move can be the wrong move for the whole mission.
Role composition matters too. More scouts make the frontier grow quickly, but they do not replace the scientific work of rovers. More relays make the mesh safer, but every relay is an agent that is not directly collecting science. More rovers improve sample return, but they can make the team slower and more dependent on carefully maintained links. The 60/25/15 default is a balanced baseline, not a claim that one mixture wins everywhere.
Where next?
The solar system is almost insultingly large. The distances involved do not really fit in the human mind, so we reason about them mathematically, abstractly, at arm's length. And yet we keep sending machines into that scale one careful mission at a time, learning slowly and expensively. That pace made sense while planetary exploration was finding its feet. It is harder to defend as the questions get bigger.
As exploration extends to more moons, more surfaces and more questions worth asking, the old model of a single highly capable machine doing everything starts to feel like the wrong unit of thought. The frontier is not a single site. It is huge, uneven and inconveniently scattered, with valuable science sitting in many places at once. Thinking at that scale means rethinking what an exploration system even looks like.
This work is a small attempt to sit with that problem seriously. What does it look like to explore broadly, to trade some individual precision for collective coverage, and to let a system absorb loss without treating every failure as mission-ending? The simulation is bounded, and the hardware challenges are real. Still, the central argument holds: swarms can do things solo missions structurally cannot, and that gap becomes more important as the targets become more ambitious.
Thinking in large numbers is uncomfortable. It means letting go of the idea that we can monitor every decision, control every route and account for every contingency in advance. That discomfort might be the right place to begin. The next era of exploration will still need brilliant individual machines, but it may also depend on learning to trust coordinated systems that can search, fail, recover and keep going at a scale a single rover was never meant to carry.