This paper picks up roughly where the last one left off. We had built models that could flag women likely to disengage from ARMMAN's mMitra program — good enough to deploy, and validated in the field. But there was an uncomfortable gap between "here's a list of 500 high-risk women" and "here's who your three health workers should call today."
Resources at an NGO aren't infinite. You can't intervene on everyone the model flags. So which 20 do you actually call? And tomorrow, given what happened today, which 20 do you call then? Prediction alone doesn't answer that. You need a planning layer on top — something that reasons about who is most likely to respond to an intervention, not just who is most likely to drop out.
That's the problem this paper tackles, using a framework called Restless Multi-Armed Bandits.
A Quick Recap of the Prediction Part
The setup is the same as before: ARMMAN's mMitra sends automated voice calls with maternal health information to enrolled women throughout pregnancy and after delivery. About 40% of enrolled women fail to really engage — listening for less than 30 seconds on more than half the calls they pick up. Health workers can call these women personally to encourage continued engagement, but they can only do so many per day.
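The engagement criterion is simple enough to state in code. A minimal sketch of that rule (the function name, and the choice to count a woman with no answered calls as disengaged, are my own assumptions, not the paper's):

```python
def is_low_engaging(call_durations_sec, threshold_sec=30):
    """Flag a beneficiary as low-engaging: she listened for under
    `threshold_sec` on more than half of the calls she answered."""
    if not call_durations_sec:
        return True  # assumption: no answered calls counts as disengaged
    short = sum(1 for d in call_durations_sec if d < threshold_sec)
    return short > len(call_durations_sec) / 2
```

So a woman who listened for 10, 20, and 60 seconds across three answered calls is flagged (two of three calls were short), while 60, 70, and 10 seconds is not.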
In this paper we rebuilt the prediction model on a slightly different formulation — using one month of call history instead of two, which lets you flag at-risk women earlier. The architecture is the same ReNDiP (bidirectional LSTM + demographic features) from the previous work. It gets 74.4% recall on low-engaging beneficiaries from just the first month of data, which is good enough to be useful.
We also ran a proper randomized pilot — the Pilot Service Quality Improvement Study (PSQIS) — to actually measure whether interventions help. Four groups: no intervention (control), SMS only, phone call only, and a hybrid SMS-then-call strategy. The results were clear.
6,563 predicted at-risk women, split across four groups. Measured by the fraction showing high engagement in the 15 weeks post-intervention:
Control: 23.3% · SMS only: 27.7% · Hybrid: 29.7% · Phone call: 37.6%
A personal phone call produced a 61% relative improvement over no intervention. SMS barely moved the needle.
The hybrid strategy was interesting: send an SMS first, wait six weeks, then only call the women who didn't respond. The idea was to conserve expensive call resources for the cases that really needed it. It worked moderately well but didn't beat just calling — probably because the six-week wait meant some women had already drifted too far by the time the call happened.
The Planning Problem
Here's the thing about a list of predicted at-risk women: not all of them will respond to an intervention equally. Some are in a temporary slump — they'll re-engage soon whether you call or not. Others are genuinely on their way out and a call will pull them back. Others still are disengaged for structural reasons (no time, wrong phone number, network issues) and no amount of calling will change that. You want to spend your limited calls on the second group.
A prediction model can't tell you this. It tells you who's at risk. It doesn't tell you who's movable.
This is where Restless Multi-Armed Bandits come in. The RMAB framework is built for exactly this situation: you have a large number of "arms" (here, beneficiaries), each evolving according to their own dynamics, and at each time step you can only pull a limited number of them. You want to pick the subset that maximizes your total reward over time.
In our formulation, each woman is an arm. Her state is simple: either engaging (E2C, the engagement-to-call ratio, above 0.5 in the past month) or not engaging. Pulling an arm means calling her. The reward is +1 when she's engaging, −1 when she's not. The system learns, from real intervention data, what the transition probabilities look like — how likely is a not-engaging woman to become engaging if you call her? If you don't?
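With two states and two actions, estimating those transition probabilities from logged data is just counting. A toy sketch of the idea, with made-up triples (the function and data are illustrative, not the paper's code):

```python
from collections import defaultdict

def estimate_transitions(triples):
    """Estimate P[a][s][s'] from observed (state, action, next_state)
    triples, where state 1 = engaging, 0 = not, and action 1 = called."""
    counts = defaultdict(lambda: [0, 0])
    for s, a, s_next in triples:
        counts[(a, s)][s_next] += 1
    P = [[[0.0, 0.0] for _ in range(2)] for _ in range(2)]
    for (a, s), c in counts.items():
        total = c[0] + c[1]
        P[a][s] = [c[0] / total, c[1] / total]
    return P

# Toy data: a non-engaging woman who is called re-engages 2 times in 3;
# left alone, she stays non-engaging.
data = [(0, 1, 1), (0, 1, 1), (0, 1, 0), (0, 0, 0), (0, 0, 0), (1, 0, 1)]
P = estimate_transitions(data)
# P[1][0] ≈ [1/3, 2/3]: chance a called, non-engaging woman re-engages
```

The catch, covered below, is that any one woman contributes only a handful of triples, which is what forces the clustering step.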
Whittle Indices: A Practical Trick for an Intractable Problem
Finding the optimal policy for an RMAB is technically PSPACE-hard — it doesn't scale. The standard solution is the Whittle Index, a heuristic from 1988 that turns out to work remarkably well in practice. The idea: instead of solving the full joint problem, assign each arm an index — a single number reflecting how valuable it is to intervene on that arm right now. Then just pick the top-k arms by index.
Intuitively, the Whittle Index captures the answer to: "how much passive reward would I need to be offered before I'd prefer not intervening on this arm?" A high index means the arm badly wants pulling. A low index means it'll probably be fine on its own.
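For a two-state arm like this one, that question can be answered numerically: binary-search for the subsidy at which value iteration makes the passive action break even with calling. A sketch under toy parameters (the discount factor, transition matrices, search bounds, and the indexability assumption are all mine, not the paper's):

```python
import numpy as np

def value_iteration(P, r, subsidy, beta=0.9, tol=1e-6):
    """Solve the single-arm MDP where the passive action earns an
    extra `subsidy`. P[a][s][s'] are transitions, r[s] state rewards."""
    V = np.zeros(2)
    while True:
        Q_passive = r + subsidy + beta * P[0] @ V
        Q_active = r + beta * P[1] @ V
        V_new = np.maximum(Q_passive, Q_active)
        if np.max(np.abs(V_new - V)) < tol:
            return Q_passive, Q_active
        V = V_new

def whittle_index(P, r, state, beta=0.9):
    """Smallest subsidy at which passivity becomes (weakly) preferred
    in `state`, found by binary search (assumes indexability)."""
    lo, hi = -5.0, 5.0
    for _ in range(60):
        mid = (lo + hi) / 2
        Qp, Qa = value_iteration(P, r, mid, beta)
        if Qa[state] > Qp[state]:
            lo = mid  # subsidy too small: still worth calling
        else:
            hi = mid
    return (lo + hi) / 2

# Toy dynamics: calling a non-engaging woman helps a lot (0.1 -> 0.7
# chance of re-engaging); calling an engaging one helps only a little.
P = np.array([
    [[0.9, 0.1],    # passive, not engaging
     [0.4, 0.6]],   # passive, engaging
    [[0.3, 0.7],    # called, not engaging
     [0.1, 0.9]],   # called, engaging
])
r = np.array([-1.0, 1.0])

idx_not_engaging = whittle_index(P, r, state=0)
idx_engaging = whittle_index(P, r, state=1)
```

With these numbers the not-engaging state gets the higher index, matching the intuition: the top-k policy spends its calls where a call changes the trajectory most.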
One practical challenge: to learn the MDP transition probabilities, you need enough data per beneficiary, and with tens of thousands of women you don't have nearly enough transitions per person. The fix is clustering — group women with similar demographic and call-history profiles together, estimate shared MDP parameters for each cluster, then use K-means to merge clusters that have similar parameters. The Whittle Index computation then scales with the number of clusters, not the number of women.
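The merging step is ordinary K-means over each group's estimated transition parameters. A minimal numpy sketch with toy parameter vectors (the two-number parameterization per group is illustrative, not the paper's exact feature set):

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Minimal k-means; each row of X is one group's parameter vector."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers

# Each row: [P(re-engage | not called), P(re-engage | called)] for one
# demographic/call-history group (toy numbers).
params = np.array([
    [0.10, 0.70], [0.12, 0.68],   # groups that respond to calls
    [0.10, 0.15], [0.08, 0.12],   # groups that barely move
])
labels, _ = kmeans(params, k=2)
# Groups with similar dynamics share a label, hence a shared Whittle index
```

After this, the index computation runs once per cluster rather than once per woman, which is what makes it tractable at ARMMAN's scale.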
Did It Work?
The evaluation here is necessarily limited — you'd need a dedicated new intervention study to fully test a sequential planning policy, and that's expensive to run. What we could do was retrospective: use the PSQIS data, compute Whittle Indices for everyone, take the top 100 ranked by the model, and check how many of them actually exhibited high engagement post-intervention in the call group versus the control group.
The gap was meaningful: 33.8% of the RMAB's top-100 picks in the call group became high-engagers. In the control group — the same top-100 picks, but no intervention — only 19.9% did. The model was selecting women who responded to being called, not just women who would've re-engaged anyway.
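The retrospective check itself boils down to a few lines: rank by index, take the top k within each study group, compare engagement rates. An illustrative sketch with toy numbers (not the paper's data or code):

```python
def topk_engagement_rate(records, group, k=100):
    """Engagement rate among the k highest-Whittle-index women in one
    study group. `records` holds (whittle_index, group, engaged) tuples."""
    in_group = [r for r in records if r[1] == group]
    top = sorted(in_group, key=lambda r: r[0], reverse=True)[:k]
    return sum(r[2] for r in top) / len(top)

# Toy records: (index, study group, engaged post-intervention?)
records = [
    (0.9, "call", 1), (0.8, "call", 1), (0.1, "call", 0),
    (0.9, "control", 0), (0.8, "control", 1), (0.1, "control", 1),
]
call_rate = topk_engagement_rate(records, "call", k=2)        # 1.0
control_rate = topk_engagement_rate(records, "control", k=2)  # 0.5
```

The call-versus-control gap among the top-ranked women is exactly the quantity the 33.8% versus 19.9% comparison measures.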
It's a preliminary result and the paper is honest about that. But the signal is real.
The Bigger Picture
What I find most satisfying about this line of work is how the three pieces fit together. The first paper built the prediction model. This paper adds the deployment validation and the planning layer on top. Together they form something that looks like an actual pipeline a health worker could use: a dashboard showing who's at risk, ranked by who's most likely to benefit from a call, updated as new call data comes in.
The RMAB framing also opens up directions that pure prediction can't touch — like learning to sequence interventions over time, or incorporating cost differences between SMS and phone calls into the optimization. The NGO world is full of resource constraints that look exactly like bandit problems once you squint at them right.
Selective Intervention Planning using Restless Multi-Armed Bandits to Improve Maternal and Child Health Outcomes
Siddharth Nishtala, Lovish Madaan, Aditya Mate, Harshavardhan Kamarthi, et al.
IIT Madras & Google Research & Harvard University & ARMMAN · arXiv:2103.09052