[chart/redis-ha] split-brain-fix does nothing when sentinel can't find a master #397

@kishoregv

Description

We ran into a situation where our redis-ha cluster got into a split-brain state during a node disruption, and the fix-split-brain.sh sidecar did nothing about it. After digging into the script, I think I found why.

What's happening

When sentinel can't agree on a master (quorum is broken), sentinel get-master-addr-by-name returns an empty string. The main loop in fix-split-brain.sh checks for two cases:

identify_master   # sets $MASTER via sentinel get-master-addr-by-name

if [ "$MASTER" = "$ANNOUNCE_IP" ]; then
    # "I'm supposed to be master" - check that local redis agrees

elif [ "${MASTER}" ]; then
    # "Someone else is master" - check that local redis is replicating from the right node

fi
# Nothing here for when $MASTER is empty

When $MASTER comes back empty:

  • First if is false ("" doesn't equal our announce IP)
  • elif is also false (empty string is falsy in shell)
  • So the script just... sleeps and loops. No log, no warning, no recovery attempt.
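The fall-through is easy to demonstrate in isolation. This is a minimal reduction of the branch logic above (the `ANNOUNCE_IP` value is made up for the demo); with an empty `$MASTER`, neither condition matches and the loop body is a no-op:

```shell
#!/bin/sh
# Reduction of the fix-split-brain.sh branch logic, not the full script.
MASTER=""                 # what get-master-addr-by-name yields with no quorum
ANNOUNCE_IP="10.0.0.5"    # hypothetical pod announce IP
BRANCH="none"

if [ "$MASTER" = "$ANNOUNCE_IP" ]; then
    BRANCH="self-is-master"
elif [ "${MASTER}" ]; then
    BRANCH="other-is-master"
fi

echo "taken branch: $BRANCH"
```

Running it prints `taken branch: none` — the exact case where the sidecar should act is the one case with no branch.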

This is exactly the scenario where you'd most want the split-brain fix to kick in, but it's completely inert.

Why this matters

  • No visibility - There's nothing in the logs to tell you sentinel lost quorum. The sidecar just silently keeps looping.
  • No recovery - Redis nodes stay in whatever broken state they're in. We had to manually intervene.
  • False sense of security - The container stays up and passes health checks, so everything looks fine from a monitoring perspective.

How to reproduce

  1. Deploy redis-ha with 3 replicas and splitBrainDetection.enabled: true
  2. Wait for things to stabilize
  3. Break quorum - e.g. kill 2 of 3 sentinel processes, or partition the network so sentinels can't talk to each other
  4. Watch the split-brain-fix container logs: nothing gets printed for the empty-master case
  5. Redis nodes may now be in an inconsistent state with no automatic recovery

What I'd expect instead

At minimum, the script should log a warning when sentinel returns empty so operators know something is wrong. Ideally it should also:

  • Check the local redis role as a diagnostic
  • After some number of consecutive empty responses, try sentinel reset to kick off re-election
  • Maybe write a status file that a readiness probe could check
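The status-file idea could be as small as this sketch (the file path, the `ok`/`no-master` values, and the probe command are all hypothetical — nothing in the chart provides them today):

```shell
#!/bin/sh
# Sketch: the sidecar records its last known state; a probe reads it.
STATUS_FILE="${STATUS_FILE:-/tmp/split-brain-status}"

write_status() {
    # Overwrite with the latest state so the probe sees current health.
    echo "$1" > "$STATUS_FILE"
}

# Happy path would write "ok"; on an empty master the loop would write:
write_status "no-master"

# A readiness probe could then be:
#   exec: ["sh", "-c", "grep -qx ok /tmp/split-brain-status"]
grep -qx ok "$STATUS_FILE" && echo "ready" || echo "not ready"
```

With `no-master` in the file, the probe fails and Kubernetes takes the pod out of rotation instead of letting it pass health checks while broken.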

Suggested fix

Add an else branch:

# Initialize once, before the main loop:
EMPTY_MASTER_COUNT=0

if [ "$MASTER" = "$ANNOUNCE_IP" ]; then
    # existing logic...
elif [ "${MASTER}" ]; then
    # existing logic...
else
    EMPTY_MASTER_COUNT=$((EMPTY_MASTER_COUNT + 1))
    echo "$(date) WARNING: sentinel returned no master (attempt ${EMPTY_MASTER_COUNT}). Quorum may be broken."
    redis_role    # script helper; sets $ROLE from the local redis
    echo "  Local redis role: ${ROLE:-unknown}"

    if [ "${EMPTY_MASTER_COUNT}" -ge "${MAX_EMPTY_MASTER_RETRIES:-5}" ]; then
        echo "$(date) ERROR: No master from sentinel after ${EMPTY_MASTER_COUNT} checks. Resetting sentinel."
        redis-cli -h "${SERVICE}" -p "${SENTINEL_PORT}" sentinel reset "${MASTER_GROUP}" || true
        EMPTY_MASTER_COUNT=0
    fi
fi
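To sanity-check the proposed else branch, here is a stub harness that simulates three loop iterations with an empty master (`redis_role` and the sentinel reset call are stubbed out; the variable names mirror the script but are assumptions):

```shell
#!/bin/sh
# Harness for the proposed else branch; sentinel/redis calls are stubbed.
EMPTY_MASTER_COUNT=0
MAX_EMPTY_MASTER_RETRIES=3
MASTER=""                 # simulate sentinel returning no master
ANNOUNCE_IP="10.0.0.5"    # hypothetical pod announce IP

redis_role() { ROLE="slave"; }                      # stub: real helper asks local redis
sentinel_reset() { echo "sentinel reset issued"; }  # stub for the redis-cli call

i=0
while [ $i -lt 3 ]; do
    if [ "$MASTER" = "$ANNOUNCE_IP" ]; then
        :   # existing "I'm master" logic
    elif [ "${MASTER}" ]; then
        :   # existing "someone else is master" logic
    else
        EMPTY_MASTER_COUNT=$((EMPTY_MASTER_COUNT + 1))
        echo "WARNING: no master (attempt ${EMPTY_MASTER_COUNT})"
        redis_role
        if [ "${EMPTY_MASTER_COUNT}" -ge "${MAX_EMPTY_MASTER_RETRIES}" ]; then
            sentinel_reset
            EMPTY_MASTER_COUNT=0
        fi
    fi
    i=$((i + 1))
done
```

Three consecutive empty responses produce three warnings, one reset, and a counter back at zero — i.e. the escalation fires exactly once per retry window instead of on every loop tick.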

Could be controlled with a new values.yaml param like splitBrainDetection.maxEmptyMasterRetries: 5.
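For reference, the knob might look like this in values.yaml (the `splitBrainDetection.enabled` key exists today; `maxEmptyMasterRetries` is the proposed addition and this shape is a sketch):

```yaml
splitBrainDetection:
  enabled: true
  # proposed: consecutive empty get-master-addr-by-name replies to
  # tolerate before issuing `sentinel reset`
  maxEmptyMasterRetries: 5
```

The chart would presumably surface it to the sidecar as an env var (e.g. `MAX_EMPTY_MASTER_RETRIES`), which is why the snippet above defaults it with `${MAX_EMPTY_MASTER_RETRIES:-5}`.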

Related issues

This feels like part of a pattern of silent failures in this script: in each case, the script either does nothing or does the wrong thing, and there's no log output to help you figure out what happened.

Environment

  • redis-ha 4.35.10
  • Kubernetes 1.28
  • 3 replicas, default sentinel config
