How to Vet a Custom Software Dev Shop: 12 Checks

You've got a real problem: a workflow that's held together with spreadsheets and a Zapier account that's about to hit its next pricing tier. You need software built, but you've been burned before — or you've heard the horror stories. A $60k engagement that shipped six months late with code nobody wants to touch. The question isn't whether to hire out. It's how to vet the shop so you don't end up as the war story someone else tells at a founder dinner.
This is the checklist I use when clients ask me to sanity-check a vendor before signing. It's opinionated on purpose. If a shop can't answer these cleanly, keep interviewing.
The 12-point checklist (use this in your first call)
Before you compare pricing or look at logos, you need signal on whether this team ships production code that other engineers can maintain. Here is the exact list to walk through on discovery calls:
- Show me the last thing you shipped. Not a case study PDF — the actual live product, with the client's permission to reference it.
- Who writes the code? Named engineers, in-house or offshore, seniority mix. If they can't tell you, it's a body-shop with a rotating cast.
- What's your stack, and why? They should have opinions. "We use whatever the client wants" means they use whatever is cheapest to staff this week.
- How do you handle scope changes? Fixed-bid, T&M, or capped T&M: each suits a different project shape, so ask which one they'd propose for yours and why.
- Show me your Git workflow. Trunk-based? Feature branches? PR reviews by whom? This tells you if code review is real or performative.
- What's your test coverage baseline? No answer, or "we test manually," is a red flag for anything above a landing page.
- What does handover look like? Docs, Loom walkthroughs, credentials transfer, a 30-day bug window — all of this should be a template, not improvised.
- Who owns the IP and repos? You should. In writing. Day one.
- What happens if I fire you mid-project? A confident shop has a clean answer. A shady one gets weird here.
- Give me two references I can call — one happy, one that ended badly. The second answer is the tell. Every shop has a project that went sideways. The mature ones will discuss it.
- What's your typical burn rate on maintenance? Post-launch, most projects need 10–20% of build cost annually just to stay alive. If they haven't thought about this, you're their guinea pig.
- Can I start with a 2-week paid trial? A discovery sprint, a spike, an audit — anything low-commitment. Vendors who refuse small engagements are usually optimizing for their sales cycle, not your outcome.
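The maintenance check above is easy to turn into a budget line before the first call. Here is a minimal Python sketch, assuming the 10–20% annual range quoted in the checklist; the function name and the $120k build cost are illustrative, not taken from any vendor's pricing.

```python
def annual_maintenance_range(build_cost: float,
                             low_pct: int = 10,
                             high_pct: int = 20) -> tuple[float, float]:
    """Return the (low, high) yearly maintenance budget for a given build cost.

    The 10-20% defaults are the rule of thumb from the checklist, not a quote.
    """
    if build_cost < 0:
        raise ValueError("build_cost must be non-negative")
    return (build_cost * low_pct / 100, build_cost * high_pct / 100)


# Hypothetical example: a $120k build.
low, high = annual_maintenance_range(120_000)
print(f"Plan for ${low:,.0f} to ${high:,.0f} per year in maintenance")
```

If a vendor's maintenance estimate lands far outside that band, ask what they are assuming that you are not.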
The rest of this post goes deep on the checks that matter most and where they typically break.
Agencies vs. freelancers vs. AI build platforms
The right vendor depends on your project shape, not on which category sounds most legitimate. Here's the honest breakdown:
| Dimension | Agency (10-100 people) | Freelancer / small studio | AI build platform |
|---|---|---|---|
| Best for | Multi-year products, regulated industries, when you need account management | Well-scoped MVPs, one-off tools, internal apps | Automations, integrations, internal ops workflows |
| Typical price range (US market) | $75–$250/hr blended | $60–$180/hr | Subscription + usage, often 5–10× cheaper than an agency for equivalent scope |
| Speed to first shipped feature | 4–8 weeks (staffing + kickoff) | 1–3 weeks | Days |
| Failure mode | Junior devs on your project, senior on the pitch | Bus factor of 1, disappears when they get busy | Breaks on edge cases; needs someone who understands the underlying systems |
| IP + code ownership | Usually clean, contracts are mature | Depends heavily on the individual | You own the workflow config; the platform runs the runtime |
| Maintenance cost | High and ongoing | Depends on the person's availability | Baked into the subscription |
Most SMBs pick the wrong category before they even start evaluating. If your project is "build me a mobile app for my restaurant loyalty program" — that's an agency or a senior freelancer. If your project is "route inbound leads to the right salesperson, enrich them, and log everything in HubSpot" — you don't need a dev shop at all. That's an automation problem.
The mistake is applying agency-scale process (RFPs, statements of work, Gantt charts) to a problem that a well-configured platform solves in a week.
Pricing models: what each one actually means
Every pricing model shifts risk somewhere. Fixed-bid moves risk to the vendor and produces defensive scoping. T&M moves it to you and produces open-ended budgets. Capped T&M is the compromise most senior teams prefer.
Fixed-bid. Good for tightly-scoped, well-understood work. Bad for anything with unknowns — which is most software. Vendors bidding fixed-price on ambiguous scope are either padding heavily (30–50% is common) or they'll come back for change orders. The classic tell: a suspiciously round number in the proposal.
Time and materials. Honest, but only works if you trust the vendor's estimation and reporting. Ask for weekly burn-down reports and named-engineer time logs, not a monthly invoice with "development services" as the line item.
Capped T&M. T&M up to a ceiling. Vendor eats overages, you get flexibility on scope trades. Best default for greenfield custom software.
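To make the capped T&M mechanics concrete, here is a small Python sketch of how an invoice splits once the ceiling is hit. The hours, rate, and cap are made-up numbers, and real contracts may define the cap per milestone rather than per project, so treat this as an illustration of the idea only.

```python
def capped_tm_invoice(hours: float, rate: float, cap: float) -> dict[str, float]:
    """Split T&M work into what the client pays and what the vendor absorbs."""
    worked = hours * rate          # value of the time actually logged
    billed = min(worked, cap)      # client never pays more than the ceiling
    return {
        "worked": worked,
        "billed": billed,
        "vendor_absorbs": worked - billed,
    }


# Hypothetical engagement: $150/hr with an $80k ceiling.
under_cap = capped_tm_invoice(hours=400, rate=150, cap=80_000)
over_cap = capped_tm_invoice(hours=600, rate=150, cap=80_000)
print(under_cap)  # billed equals time worked, nothing absorbed
print(over_cap)   # billed stops at the cap, vendor absorbs the rest
```

Under the cap, this behaves exactly like plain T&M. Over the cap, the overage shows up as the vendor's cost, which is why capped deals tend to produce more honest estimates.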
Retainer. Only makes sense once you're in maintenance mode. Retainers for build work are how agencies keep utilization high — you're paying for their capacity planning, not your outcomes.
Outcome-based / equity. Rare, and usually a red flag when offered upfront. Real outcome-based deals happen after a shop has already delivered something and wants deeper alignment.
Here's a rough sanity check for a mid-complexity build (say, a B2B SaaS with auth, billing, admin panel, and one non-trivial workflow):
- Solo senior freelancer: $25k–$60k (3–5 months, one person, high bus factor)
- Small studio (3–8 people): $60k–$180k (2–4 months, better process)
- Mid-size agency: $150k–$500k+ (3–6 months, PM + designers + devs)
- AI build platform + reviewer: $5k–$25k (weeks, if the shape fits the platform)
These ranges are wide because scope elasticity is the single biggest variable. Anyone quoting you within 10% before a discovery phase is guessing.
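As a quick way to use those ballpark figures, the sketch below checks which vendor categories a given budget falls into. The ranges are copied from the sanity check above; treating "$500k+" as open-ended is my assumption, and the function name is hypothetical.

```python
import math

# Ballpark ranges for a mid-complexity build, in USD, from the list above.
BALLPARK_USD = {
    "solo senior freelancer": (25_000, 60_000),
    "small studio": (60_000, 180_000),
    "mid-size agency": (150_000, math.inf),  # assumption: "$500k+" has no ceiling
    "AI build platform + reviewer": (5_000, 25_000),
}


def categories_in_budget(budget: float) -> list[str]:
    """Return every vendor category whose ballpark range contains the budget."""
    return [
        name
        for name, (low, high) in BALLPARK_USD.items()
        if low <= budget <= high
    ]


print(categories_in_budget(60_000))
print(categories_in_budget(20_000))
```

A budget that matches no category, or only the cheapest one, is a hint to revisit scope before you start collecting quotes.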
Stack red flags (and green flags)
The stack tells you more about the shop than the sales deck. A team that picks their stack based on hiring convenience will make a hundred other decisions the same way.
Green flags:
- They ask about your team's technical background before recommending a stack. If your ops person will eventually take over maintenance, the answer is different than if you'll hire engineers later.
- They default to boring, well-supported tools: Postgres, a mainstream web framework, a major cloud provider. Boring is the correct choice 90% of the time.
- They can articulate trade-offs. "We picked Next.js because your SEO matters and you want SSR; if you didn't, we'd have used something lighter."
- They mention observability early — logging, error tracking, uptime monitoring. Software that ships without these is software you can't debug at 2am.
Red flags:
- Bleeding-edge everything. If they're pitching you a stack built on tools that got their first stable release in the last six months, you're paying for their R&D.
- No opinion on infrastructure. "We'll deploy wherever you want" is fine for a static site, alarming for a real product.
- Custom frameworks or in-house libraries as core dependencies. When you eventually hire another team, this is what makes them quote a full rewrite.
- Microservices for a two-person startup. This is almost always resume-driven development.
Here's a quick smell test you can run on a proposed architecture. Ask the vendor to sketch the system in a single diagram, then ask:
1. What breaks first when we 10x traffic?
2. How do we debug a production issue at 3am?
3. What's the recovery plan if the database gets corrupted?
4. How does a new developer get from git clone to running locally?
If any answer is vague, hand-wavy, or "we'd figure that out when we get there," you have your signal.
Process: what "good" actually looks like
A good process is invisible when it's working and obvious when it's not. On a healthy engagement, you know what's shipping this week, what's blocked, and where the budget stands — without asking.
Here's what to look for in the first 30 days:
- Kickoff produces artifacts. Not just a Slack channel. A written brief, a milestone list with dates, a decision log, and access to a shared repo. If the first 30 days produce only meetings, run.
- Demos are weekly, minimum. Real running software, not Figma. If you're four weeks in and haven't seen a working screen, the project is already late.
- Written status, not verbal. A short Friday update — shipped, in-progress, blocked, decisions needed. Verbal-only status updates are how projects hide slippage until it's a crisis.
- You have direct access to engineers. Not through an account manager. Not through a PM who translates. If the shop won't let you talk to the people writing the code, you're paying agency margins for a black box.
A concrete example of a healthy weekly cadence:
- Monday: Planning. What ships this week, what's carried over.
- Tue–Thu: Async standup in Slack; blockers surfaced same-day.
- Thursday: Demo of the week's work on a staging environment.
- Friday: Written status: shipped / in-flight / blocked / decisions.
- Anytime: Direct DM access to the lead engineer for urgent questions.
If a shop resists this level of transparency and calls it "micromanagement," they're protecting margin, not your project.
Support, handover, and what happens after launch
Most projects fail at handover, not at launch. The code ships, the invoice clears, and six months later you're paying a second vendor 40% of the original build cost to figure out what the first one did.
A serious handover package includes:
- A README that gets a new developer running locally in under 30 minutes. Not "install dependencies and run": actual commands, actual environment variables (with a .env.example), actual gotchas.
- Architecture docs. One page. Boxes and arrows. What talks to what, where the data lives, where the secrets live.
- Runbook for the top 5 things that will break. "Payments failing" → check X, then Y, then Z. "Emails not sending" → check the SendGrid dashboard, then the queue.
- Deployment docs. How to ship a change. Who has access to production. How to roll back.
- Credential transfer, in writing. Every third-party account, every API key, every domain, every DNS record. Ownership transferred to your accounts, not held by the vendor.
- A defined bug window. Typically 30–90 days post-launch, at reduced or zero rate. Anything critical found in this window is on them.
Ask for a sample handover package from a past client (redacted) before you sign. If they don't have a template, they've never actually done this well.
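One handover item you can verify yourself is whether the .env.example actually documents every variable the app needs. Here is a minimal Python sketch, assuming simple KEY=value lines; the helper names and the sample variables are hypothetical, and real env files can have quoting rules this does not handle.

```python
def env_keys(text: str) -> set[str]:
    """Collect variable names from KEY=value lines, skipping blanks and comments."""
    keys = set()
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name = line.split("=", 1)[0].strip()
        if name.startswith("export "):
            name = name[len("export "):].strip()
        keys.add(name)
    return keys


def missing_from_example(env_text: str, example_text: str) -> set[str]:
    """Variables present in the working env file but absent from the template."""
    return env_keys(env_text) - env_keys(example_text)


# Hypothetical file contents for illustration.
working_env = "DATABASE_URL=postgres://localhost/app\nSENDGRID_API_KEY=abc123\n"
template = "# copy to .env and fill in\nDATABASE_URL=\n"
print(sorted(missing_from_example(working_env, template)))
```

Anything this prints is a variable a new developer would have to discover by trial and error, which is exactly the gap the 30-minute README target is meant to close.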
How BizFlowAI approaches this
We built BizFlowAI because the "hire a dev shop" answer is wrong for a huge slice of what SMBs actually need. Lead routing, invoice processing, customer support triage, cross-app data sync, internal dashboards — these aren't custom software problems anymore. They're configuration problems on top of AI models and a workflow engine, and paying agency rates to build them from scratch is money set on fire.
Our clients typically come to us after getting a $40k–$120k quote from a dev shop for something we ship in two to four weeks, with clear docs, real observability, and a runtime they don't have to maintain. If your project genuinely is custom software (a mobile app, a regulated platform, deep hardware integration), hire an agency and use the checklist above. If your project is "connect these systems and make decisions with AI in the middle," that's the shape of problem we solve, and we'll tell you honestly when yours isn't.
Making the call
Here's the decision tree I'd walk through with any founder asking me this in person:
- Is the project truly custom software (novel UX, complex domain logic, regulated data)? → Agency or senior freelancer, use the 12-point checklist.
- Is it automation, integration, or an internal ops workflow with AI in the loop? → AI build platform. Way cheaper, way faster, and honest about its limits.
- Is it a well-known pattern (CRM, e-commerce, appointment booking)? → Buy off-the-shelf first, customize second. Custom-building a CRM in 2026 is a choice, and usually a bad one.
- Do you have technical judgment on your side of the table? → If not, hire a fractional CTO for two weeks before you sign anything. $5k spent here saves $50k later.
The last one is the check most founders skip and regret. A senior engineer on your side, even part-time, changes every conversation you have with vendors. They'll spot the architectural hand-waves, push back on padded estimates, and know when a green flag is actually theater. If you can't afford one, at minimum bring a technical friend to the second call. Vendors behave differently when there's someone in the room who can call the bluff.
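The decision tree above can be written down as a few lines of Python, which makes the precedence explicit. This is a sketch under my own assumptions: the branches are checked in the order listed, the technical-judgment check applies on top of whichever category fits, and the fallback for a project that matches nothing is my addition. The class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Project:
    truly_custom: bool               # novel UX, complex domain logic, regulated data
    automation_or_integration: bool  # internal ops workflow with AI in the loop
    well_known_pattern: bool         # CRM, e-commerce, appointment booking
    has_technical_judgment: bool     # a senior engineer on your side of the table


def recommend(project: Project) -> list[str]:
    """Return the suggested next steps, in the order you would act on them."""
    steps = []
    if not project.has_technical_judgment:
        steps.append("Hire a fractional CTO for two weeks before signing anything")
    if project.truly_custom:
        steps.append("Agency or senior freelancer; run the 12-point checklist")
    elif project.automation_or_integration:
        steps.append("AI build platform")
    elif project.well_known_pattern:
        steps.append("Buy off-the-shelf first, customize second")
    else:
        # Assumption: not covered by the original list.
        steps.append("Clarify the project shape before picking a vendor category")
    return steps


lead_routing = Project(False, True, False, False)
for step in recommend(lead_routing):
    print("-", step)
```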
Pick the right category first. Then vet hard. Then start small.
Work with BizFlowAI
If you'd rather have this built for you, that's what we do: production AI automation for solo founders and small teams — agents, integrations, and document pipelines that actually ship.
Book a free discovery call — 30 minutes, we map the highest-ROI automation in your workflow. No pitch deck, just engineering.
Frequently asked questions
How do I vet a custom software development agency before signing a contract?
Walk through the 12-point checklist during discovery. The core questions cover who writes the code, what stack they use and why, their Git workflow, test coverage baseline, handover process, IP ownership, references (including one project that went badly), and whether they'll do a 2-week paid trial. Verify they can show live shipped products, not just case study PDFs. A shop that can't answer these cleanly is either a body shop or optimizing for its own sales cycle. Confident shops give clean answers to uncomfortable questions like 'what happens if I fire you mid-project?'
What pricing model should I use for custom software development?
Capped time and materials (T&M up to a ceiling) is the best default for greenfield custom software — the vendor eats overages while you keep scope flexibility. Fixed-bid only works for tightly-scoped, well-understood work and typically includes 30-50% padding. Pure T&M requires trust and weekly burn-down reports with named-engineer logs. Retainers only make sense in maintenance mode, and outcome-based or equity deals offered upfront are usually a red flag.
How much does custom software development cost?
For a mid-complexity B2B SaaS build with auth, billing, and admin panel: solo senior freelancers charge $25k-$60k, small studios of 3-8 people charge $60k-$180k, and mid-size agencies charge $150k-$500k+. AI build platforms with a reviewer can handle appropriate scopes for $5k-$25k. US hourly rates run $60-$180 for freelancers and $75-$250 blended for agencies. Anyone quoting within 10% before discovery is guessing.
Should I hire an agency, freelancer, or use an AI build platform?
Match the vendor to the project shape, not category prestige. Agencies fit multi-year products and regulated industries; freelancers fit well-scoped MVPs and internal tools; AI build platforms fit automations, integrations, and ops workflows at 5-10x lower cost. A restaurant loyalty mobile app needs an agency or senior freelancer, but 'route leads to salespeople and log to HubSpot' is an automation problem, not a dev shop problem. Applying agency-scale process to platform-solvable problems is the most common expensive mistake.
What are the biggest red flags in a software development proposal?
The main ones are bleeding-edge stacks built on tools that reached their first stable release within the last six months, no opinion on infrastructure or hosting, custom in-house frameworks as core dependencies, and microservices for a two-person startup (usually resume-driven development). Also watch for fixed-bid proposals with suspiciously round numbers, refusal to do a small paid trial, vague answers about handover and IP ownership, and vendors who can't name the specific engineers who will write your code. Weak observability planning, meaning no mention of logging, error tracking, or uptime monitoring, usually means software you can't debug once it's in production.