← homepage

I Audited My Own iOS App's Supabase Backend and Found More Than I Bargained For

The RLS Problem Nobody Talks About Clearly

Row Level Security in Postgres is the kind of thing that sounds simple and isn't. You enable it on a table and you think you're done. You're not done.

Supabase's security advisor flagged several tables as "RLS enabled, no policies." What that means in practice: RLS is on, so the table is locked down, but because there's no policy defined, every query returns nothing. Zero rows. It's not a data exposure problem, it's a silent failure. A user signs in, queries their data, gets nothing back, and you have no idea why.

That's the benign version.

The worse version is what I found on some of my tables: WITH CHECK (true) on the anon role. That's a policy that says anyone, authenticated or not, can write rows. I put that in during development to get things moving. It never came out. That's the kind of thing that bites you six months after you shipped.

The actual OWASP classification for this is API1: Broken Object Level Authorization. It's been at the top of their API security list for years because it keeps happening. A 2021 Parler breach exposed 70 terabytes of user data partly because their API didn't enforce object-level authorization checks. Users could request other users' data just by changing an ID in the URL. My WITH CHECK (true) policy isn't the same thing exactly, but it's in the same neighborhood. Writing data without being who you say you are.

My fix was straightforward once I knew what to look for: every table that handles user data needs auth.uid() = user_id in the policy, both for reads and writes. The Supabase docs make this look easy. It is easy, once you understand that "enable RLS" and "write a policy" are two separate steps and both are required.

The Supabase security advisor isn't something most tutorials mention. It sits in the dashboard under a tab most people walk past, and honestly it should probably be harder to ignore than it is. I didn't run it because some article told me to. I ran it because somewhere in the middle of a refactoring pass on Holy Scroll, I kept seeing the same category of problem surface. Not crashes, not broken features. Architecture that felt wrong the more I looked at it. Tables I wasn't sure were protected correctly, RPC functions whose permission model I couldn't reconstruct from memory, comments in my own code like "ideally we should use the authenticated client here, but this operation is naturally tied to the device." I was tracking everything in Fizzzy.do at the time, and there was this growing cluster of tasks that kept getting pushed down because none of them were user-facing. None of them were broken. They just weren't right.

That's when I sat down and actually tried to understand the architecture, not just patch what was visibly wrong. The report came back with ten advisory categories across my production project. Ten, on a Bible app over a thousand real users. And looking back at each finding, every single one traced back to a decision I had made, usually under time pressure (sometimes being lazy or feeling smarter than I thought I was), that felt reasonable in the moment and only looked like a problem once I was forced to examine it from the outside.


RLS Is Not Just a Toggle

Row Level Security is the thing Supabase tutorials introduce in the first five minutes and then move past before you really absorb what it does. The pitch is clean: enable it on a table, write a policy, Postgres enforces access at the row level based on who's calling. What the pitch leaves out is that there are at least three distinct ways to implement it and still end up with a misconfigured database.

The first one I hit: I enabled RLS on my daily_verse_delivery table and never wrote any policies. What happens in that situation is not that reads and writes proceed freely. What happens is an implicit full deny. Every SELECT, INSERT, and UPDATE gets blocked for every role, including authenticated users. The table had 13 rows from before I turned RLS on. After that point, nothing new was being written. My cron job was silently failing to log delivery records, catching the exception, printing a warning, and moving on. Zero user-facing symptoms. I only found it when I looked at the row count in the advisor output and wondered why a table I was actively writing to had stopped growing.

This is how Postgres behaves, and it's the correct behavior. But when you're the only person on the project, there's no one running nightly anomaly checks. Silent failure is expensive.

The second issue was more instructive. I had WITH CHECK (true) on INSERT and UPDATE policies for the anon role on my devices and notification_events tables. The reasoning is right there in comments I left myself: device registration happens before a user signs in. Someone opens Holy Scroll for the first time, hasn't created an account, the app needs to register their device so push notifications work. So I wrote a policy that lets the anon role insert without any predicate check.

What I hadn't fully thought through was what "anon" actually means in Supabase's model. It doesn't mean "this specific iOS device running my app." It means any HTTP request that presents the anon key. My anon key is in Release.xcconfig, which gets compiled into the app binary, which means it's extractable by anyone who downloads the app from the App Store. That's not a secret by design. Supabase is explicit that the anon key is public-facing and that RLS is what protects your data. But WITH CHECK (true) means there is no predicate. Anyone with the key, from anywhere, can insert rows into my devices table with no validation that the request came from an iOS device running Holy Scroll rather than a script running on a laptop.

In January 2026, an AI social platform called Moltbook had its entire Supabase database publicly exposed because RLS was disabled on their tables. Email addresses, authentication tokens, and API keys for 1.5 million accounts were accessible to anyone with the anon key and a basic HTTP client. The root cause is the same category of mistake: having RLS on paper without actually thinking through what the policies permit. The scale of what happened to them relative to what my configuration exposed is different. The mechanism is the same.

The third issue I want to name here is the SECURITY DEFINER views. I had three of them: user_monthly_stats, user_total_stats, and admin_user_analytics. A view defined with SECURITY DEFINER runs as the view owner rather than the querying user, which means RLS on the underlying tables is bypassed entirely. The owner in Supabase is typically the postgres superuser. So any authenticated user who had SELECT on admin_user_analytics could query aggregated data across all 722 users in my database, regardless of what RLS said about the tables underneath it. I'd created that view months earlier to inspect data in the dashboard and never went back to audit who could actually reach it from the API layer.


The 22 That Actually Kept Me Up

The finding that made me sit with it longest was a category called anon_security_definer_function_executable. Functions that are both SECURITY DEFINER and callable by the anon role without any authentication. I had 22 of them.

SECURITY DEFINER on a Postgres function means it executes with the privileges of the function owner, not the caller. I thought I was making things more controlled by using it. Combined with being accessible to the anon role, the actual effect is the opposite: the function bypasses RLS, runs with elevated privileges, and requires no session token to invoke. The anon key alone is sufficient.

The worst one was migrate_anonymous_data. This RPC handled a real product problem: when someone uses Holy Scroll anonymously for a while and then signs in with Apple, their reading history should transfer over. The function signature was migrate_anonymous_data(p_user_id uuid, p_device_id text). It accepted any UUID as a parameter and reassigned anonymous analytics rows to that user's account.

The vulnerability has a name. OWASP calls it Broken Object Level Authorization, API1 in the API Security Top 10. The pattern is simple: an endpoint accepts an object ID as a parameter without verifying the caller has any right to act on that object. Here, the object is a user account, the identifier is a UUID, and the action is claiming ownership of their data. And because the function was SECURITY DEFINER and callable by anon, this required no authentication at all. Just the public anon key and a valid user UUID.

Parler's 2021 data scrape is the canonical example of what this looks like at scale. Their API accepted sequential post IDs with no authentication and no rate limiting. Someone wrote a script that walked the ID space in order and downloaded 70 terabytes of content, including posts users had deleted, which were only soft-deleted in the database and still fully accessible through the API. The specific vulnerability class in my case is an accepted-parameter IDOR rather than sequential enumeration, but the underlying principle is identical: an endpoint accepted an identifier without checking who was asking.

The correct fix is structural, not just a permission revoke. The function should never accept p_user_id as a parameter at all. Derive the caller's identity from auth.uid() on the server side and run the function as SECURITY INVOKER so RLS applies:

CREATE OR REPLACE FUNCTION
migrate_anonymous_data(p_device_id text)
SECURITY INVOKER
LANGUAGE plpgsql
AS $$
BEGIN
UPDATE user_daily_stats
SET user_id = auth.uid()
WHERE device_id = p_device_id AND user_id IS NULL;
END;
$$;

auth.uid() returns null for unauthenticated requests. The WHERE clause matches nothing. The function becomes useless without a valid session. The same pattern applies to every function that was previously accepting a user identifier as a trusted input: replace the parameter with a server-side context lookup.

There were also 17 functions flagged for a mutable search_path. The attack surface here is a schema injection: if an attacker can plant an object in a schema that resolves before public in Postgres's search path, they can intercept your function calls by shadowing them. The remediation is one line added to each function definition, SET search_path = public, which pins the resolution context. It's invisible in a code review, it's the kind of thing you'd never think to check, and Supabase's advisor caught all 17.


Sign In With Apple: The Part I Actually Built Correctly

I want to spend some time here because it's easy for a piece like this to read as a list of things that went wrong. The Sign in with Apple implementation is the part of Holy Scroll's auth story I'm most confident in, and understanding why it's correct is as useful as understanding why the database wasn't.

Apple uses an OIDC flow. They act as the identity provider, issue a signed JWT called an identity token, and your backend validates that token to confirm the user is who they say they are. The mechanism that prevents replay attacks is the nonce.

Before initiating sign-in, the app generates a cryptographically random 32-byte nonce using SecRandomCopyBytes, hashes it with SHA-256, and sends the hash to Apple as part of the authorization request. Apple embeds that hash in the identity token it signs. Supabase validates the token server-side and checks that the nonce matches. An intercepted token in transit is useless without the original nonce, which never left the device.

The fallback in my randomNonceString() function, for the rare case where SecRandomCopyBytes fails, originally used a trimmed UUID().uuidString. I eventually changed it. A UUID has around 122 bits of entropy if generated correctly, which is technically acceptable, but prefix(32) on a UUID string gives you the first 32 characters of a hex-dash formatted string, and the structure of that format is predictable in ways that reduce the effective entropy. The change was marginal in practice. The principle matters: when you're generating values that gate authentication, every bit of entropy you don't preserve is entropy you chose to discard.

One edge case that catches a lot of people with Apple Sign In: Apple only returns the user's name on the first authentication. Every sign-in after that returns nil for the name fields. A naive upsert on every sign-in overwrites your stored names with null. I left a comment in the upsert function: "SMART upsert, preserves existing data when Apple doesn't provide names." What that means in practice is fetching the existing profile first, merging new data over only the non-nil fields, and then writing back. More network calls than a straight upsert, but the only way to not silently destroy your users' display names on their second login.


Session Storage: The Bug That Lived in a Comment

Somewhere during a debugging session, I added two lines to the session monitoring loop:

UserDefaults.standard.set(session.accessToken, forKey: "supabaseAccessToken") UserDefaults.standard.set(session.refreshToken, forKey: "supabaseRefreshToken")

I already had these tokens being written to the Keychain through SessionKeychainStore. The UserDefaults write was for quick inspection during development. It never got cleaned up.

The distinction between those two storage locations matters more than it looks. UserDefaults on iOS has a default file protection class of NSFileProtectionCompleteUntilFirstUserAuthentication, which means the data is decryptable any time after the device's first post-boot unlock, including while the screen is locked and background processes are running. Keychain items with kSecAttrAccessibleWhenUnlocked are only accessible while the device is actively in use. An unencrypted iTunes backup includes UserDefaults. A refresh token sitting in that backup is a long-lived credential that doesn't rotate the way an access token does.

This is iOS's version of the localStorage versus httpOnly cookie problem on the web. The secure storage option exists. I built it. I then added a second storage path for convenience and left it there.

Related: I had unconditional print() calls throughout the authentication flow logging user IDs, full names, and emails, with no #if DEBUG guard. On iOS, that output goes to the unified logging system and is accessible over USB via Console.app without a jailbreak. Crash reports submitted through MetricKit or Xcode Organizer can include recent log output. None of that belongs in a release build. The fix is wrapping every print() that touches a name, email, or identifier in a debug guard, or replacing it with os_log with the appropriate privacy annotations.


The Dual-Write Bug I Introduced While Trying to Be Careful

This one's embarrassing in a very specific way.

I was worried about session persistence. I wanted to make sure that when Supabase updated the session tokens, they got stored somewhere durable. So in startSessionMonitoring(), I set up auth state change listeners, and when a new session came in I wrote the tokens to... both UserDefaults and the Keychain.

The intent was right. The execution created a problem.

UserDefaults is not encrypted. On iOS, it's stored at Library/Preferences/<bundle-id>.plist and it gets backed up to iCloud by default. So my session refresh tokens, which are equivalent to long-lived credentials, were sitting in a location that iCloud backup could pick up and that any process with the app's bundle ID entitlement on a compromised device could read.

NSFileProtectionCompleteUntilFirstUserAuthentication is the file protection class that applies after first unlock. It's not the highest protection level. For tokens, you want kSecAttrAccessibleWhenUnlocked in the Keychain, where data is encrypted at rest and only accessible when the device is unlocked.

My SessionRestorationService does read from the Keychain correctly. The session restoration path is fine. The problem is the startSessionMonitoring() dual-write creating a second, less secure copy of the same credentials. Those should be Keychain-only.


The Notification Console

At some point curl commands aren't the right interface for sending push notifications to real users. I built a Next.js admin portal that runs locally behind Tailscale. The access model has three independent layers. Reaching the portal requires enrollment in my tailnet, which requires my explicit approval. Once inside, the portal requires a valid Supabase session. Even with a valid session, calling the Edge Function that actually delivers notifications requires an x-admin-secret header that's separate from the Supabase credentials entirely. Any one of those three layers failing blocks the action.

The architecture of the full pipeline:

pg_cron (hourly)
-> send-daily-verse Edge Function  [auth:
service_role OR x-admin-secret]
-> send-push-notification Edge Function  [auth:
x-admin-secret]
-> APNs

Portal (Next.js, Tailscale)
-> Server Action  [auth: Supabase session]
-> send-push-notification Edge Function  [auth:
x-admin-secret]
-> APNs

Two paths into the same pipe. The cron path runs automatically on an hourly schedule. The portal path is deliberate, manual sends.

The server action uses supabase.auth.getUser(), not supabase.auth.session(). The distinction is that getUser() makes a live API call to validate the token, while session() reads from local state and trusts whatever's cached. For an admin operation, live validation is the only acceptable choice. A replayed or expired session should not pass. I left a comment in the code referencing OWASP Broken Access Control specifically because I knew I'd rationalize the simpler version the next time I looked at it.

Zod validates every input field before the Edge Function sees anything: types, value ranges, cross-field refinements. The schema rejects invalid image URLs, badge counts below zero, relevance scores outside 0 to 1, and payloads with neither a title nor content_available. The Edge Function has its own validation on top of that. Redundant validation at multiple layers isn't over-engineering when you're building tooling that broadcasts to everyone.


Two bugs in the delivery layer I found after shipping:

The first one was that apns-push-type was hardcoded to "alert" in the Edge Function despite accepting an apns_push_type parameter in the request body. My daily verse function sends content_available: 1 to trigger a background app refresh when the verse arrives. Apple's APNs spec requires apns-push-type: background for those payloads. Sending content_available with apns-push-type: alert doesn't generate an error. Apple accepts the notification and then delivers it at its own discretion. In Low Power Mode and certain Focus states it silently drops it. I spent longer than I'd like to admit assuming the problem was in my timezone-aware hour-matching logic. It was a hardcoded string I forgot to wire to the parameter I was already parsing.

The second was that I was calling push_enabled = false on a device for both APNs 410 and 400 responses. A 410 (Gone) means the device token is permanently dead, the device was reset or the app was uninstalled. Disabling it is correct. A 400 (Bad Request) means I sent a malformed payload. The device is fine. I made an error on my end. Tombstoning the device on a 400 means a single malformed broadcast silently disables every device it reaches. I caught this after a test send where a field had the wrong format and several sandbox devices stopped receiving notifications. The connection took a while to surface.


The APNs Bugs I Didn't Notice Until I Looked

Two bugs in my send-push-notification Edge Function that I want to document because they're subtle.

First: the apns-push-type header is hardcoded to "alert" regardless of what the caller passes. APNs uses this header to route notifications correctly. Background notifications need "background". Sending a background notification with apns-push-type: alert violates APNs requirements and Apple can reject or throttle it. I have a parameter to accept different push types but I never wired it to the actual header.

Second: I'm marking a device's push_enabled column as false on both 410 and 400 response codes from APNs. These mean very different things. A 410 is "this device token is permanently invalid and you should stop sending to it." A 400 is a bad request, usually a malformed payload or wrong push type. My malformed payload shouldn't permanently disable push for a real user, but that's what happens. A user who gets caught by a 400 response will silently stop receiving notifications until they reinstall.

The APNs JWT is also regenerated on every function invocation. The ES256 token has a 60-minute lifespan. Generating a new one for every push adds unnecessary latency and burns compute. Module-level caching with a 50-minute TTL is the right pattern.


Account Deletion, and the Part I Shipped Knowing It Was Incomplete

I'll be direct about this one. When I built the account deletion flow, I knew it wasn't complete. My deleteAccount() function deletes the user's row from user_profiles and calls signOut(). There is a comment in the code that I wrote: "Note: Auth user record may still exist in auth.users table. To fully delete auth user, consider setting up a database trigger or using admin API."

That comment was honest self-documentation from a moment where I was trying to ship the delete account screen because Apple's App Store review guidelines require one, I had a release queued, and the full implementation wasn't done. It went on the Fizzzy.do board. It stayed there through several releases because from the outside, the behavior looked correct. The sign-out happened, the UI reset, the account appeared gone from the app. What persisted invisibly was the auth record in Supabase's auth.users table, along with the user's email address and Apple subject identifier.

Under GDPR Article 17, the right to erasure means all of it, not just the profile row. This is exactly the kind of gap that accumulates when you're moving fast without a project manager or a compliance review. I'm not making excuses for it. I'm naming it because every solo developer I know has some version of this: a comment in the code that says "this isn't right yet" that has been there long enough it's started to feel permanent.

The correct implementation is a server-side Edge Function that calls supabase.auth.admin.deleteUser(). The client can't call the admin API directly because it requires the service role key. An Edge Function with the same authenticated gate as everything else handles both the profile deletion and the auth record deletion. The iOS client calls the function, the function handles cleanup completely, and the user is actually gone.


I I did my undergrad in cybersecurity. I work as a cybersecurity professional. I can tell you the formal definition of Broken Object Level Authorization, explain what SECURITY DEFINER does at the Postgres privilege model level, and describe why NSFileProtectionCompleteUntilFirstUserAuthentication is weaker than it sounds.

What school didn't give me was the experience of having 22 functions callable without authentication in a production database with real users in it and having to figure out which ones were actually exploitable and which ones were just noisy. Some of these were things I knew about abstractly but hadn't fixed. Some were things I did intentionally during development that never got cleaned up. One of them (the dual-write) I introduced while actively trying to be more secure, which is humbling.

Building Holy Scroll is where theory stopped being theory. The gap between knowing what RLS is supposed to do and shipping WITH CHECK (true) because you're focused on getting device registration working and you'll tighten the policy later, that gap only exists in production.. The difference between those outcomes and a quiet afternoon fixing advisor warnings is usually not sophistication. It's whether someone looked at the thing from the outside before someone else did.

Holy Scroll is a better app for having looked.


Holy Scroll is available on the App Store.

Sources: