
You're going to get a lot of responses that say record-replay testing doesn't scale, is unmaintainable, etc. They're totally right.

But the thing is, early on when your app is in flux, neither does writing Selenium code. There's a pretty big truism in UI automation that writing UI tests before a UI freeze is a recipe for a shit-ton of pain. Coding to ids or xpaths only gets you yay far if the UI flow itself fundamentally changes.

But re-recording might be easy.

Don't use stuff like this for long-standing tests. Unless you architect your app just right so that IDs are always stable, and you never change the interface, it'll break and break in ways that you can only fix by a complete re-record. Plus the tests tend to be extremely timing-fragile because recording uses delays instead of synchronization points, so they just won't work in CI.

But do use stuff like this at your desk during bring-up, when the cost of a re-record is lower than the cost of a test rewrite and it's ok to re-run the thing if the first try craps out due to a timing glitch.

And from there, keep an open mind.

I went to a GTAC (Google testing conference) where a presentation made a very good argument--with numbers and everything--that for smaller projects with simple and more or less static UIs, and where the tests were all fundamentally scripted, there was almost no advantage to coding the tests later. Record-replay was the best way to go.

But I definitely don't think a system like Playwright fully replaces remote stubs like WebDriver and coding tests in the languages that can talk to it.

At some point you hit the issue that the login screen changed and now every single test is invalid. It's awfully nice if you used Page Object Model and you have a single-point-of-truth to fix it.

More to the point, test automation can do more than just execute a script repeatably. Randomized monkey testing is something you can really only do in code, ditto scenarios that need to talk to testing interfaces exposed from the app itself.

Glad you found a tool that resonates with you!




> I went to a GTAC (Google testing conference) where a presentation made a very good argument--with numbers and everything--that for smaller projects with simple and more or less static UIs, and where the tests were all fundamentally scripted, there was almost no advantage to coding the tests later. Record-replay was the best way to go.

I agree with you. Most people here are discussing recording vs. not recording, but I think many really lack a basic understanding of testing. A tool is still just a tool.

In my view, most websites (a.k.a. apps) can be treated like that, and maybe should be. There usually needs to be a testing strategy in place first: why and what are you testing? What do you want to confirm, what kinds of bugs are you looking for, and how much time and effort should go into testing?

We found out (we're in finance) that one essential ingredient was missing from our tests: the happy path. Without a single test for the end-to-end happy path, testing anything else becomes useless.

And in most cases the happy path can, and maybe should, be tested using a recorder, because that comes closer to what a real person does.


We're of a mind, for sure.

I was/am a pretty big fan of the Context-Driven Testing (https://context-driven-testing.com/) concept from James Bach and Cem Kaner, though I think Kaner later backed away from either it or Bach, not sure which. It was basically the testing version of the Agile Manifesto.

The idea in general of "THINK about what you're doing and the specific results you need and do what's OPTIMAL, not what's DOGMATIC" has guided my career for many years, both inside and outside of testing.


> It's awfully nice if you used Page Object Model and you have a single-point-of-truth to fix it.

I like fewer sources of truth, but I dislike trying to force everything into a "page" analogy. Try maintaining a library of commonly-used actions instead, like:

  logIn()
  registerNewUser()
  addItemToCart()
And maybe group them thematically.

The underlying intent is the same... reduce duplication, improve organization. But you won't be stuck wondering "what page am I on now?" in a single page application, or rifling through folder upon folder of nonsensical "page" files.

And even in the worst case, Ctrl-Shift-H still works.
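
For what it's worth, here's roughly what I mean, sketched in Playwright/TypeScript (the route, button text, and toast message are all made up for illustration):

    // actions/cart.ts -- result-oriented actions grouped by theme, not by page
    import { expect, type Page } from '@playwright/test';

    export async function addItemToCart(page: Page, sku: string) {
      await page.goto(`/product/${sku}`); // hypothetical route
      await page.getByRole('button', { name: 'Add to cart' }).click();
      await expect(page.getByText('Added to cart')).toBeVisible();
    }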


A long time ago I wrote an enterprise automation platform that abstracted things into a business-minded DSL like your example above, wired together with XML. It allowed a tools team to define new DSL terms, and tests could then be composed more simply, by intent. A workflow would look something like

    createUser
    logIn
    addCreditCard
    addItemToCart
    checkoutItem
It was pretty powerful, but it ended up being used more as a CD platform for testing and less for automated regression testing.


You can write fixtures for those in Playwright.
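
Something along these lines (a minimal sketch; the loggedInPage name, selectors, and credentials are placeholders, and it assumes a baseURL is configured):

    // fixtures.ts -- expose a logged-in page as a Playwright fixture
    import { test as base, type Page } from '@playwright/test';

    export const test = base.extend<{ loggedInPage: Page }>({
      loggedInPage: async ({ page }, use) => {
        await page.goto('/login');
        await page.getByLabel('Email').fill('user@example.com');
        await page.getByLabel('Password').fill('hunter2');
        await page.getByRole('button', { name: 'Log in' }).click();
        await use(page); // hand the ready-to-use page to the test
      },
    });

    // in a spec file:
    //   test('cart', async ({ loggedInPage }) => { ... });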


I could get into an even longer discussion on this, but basically when I brought up UI automation teams my advice was:

Use granular element-oriented functions (i.e. loginButton.click() or fillForm(name, pw) type stuff) for the very small part of the very few tests that are specifically exercising that portion of the UI.

Those you probably define traditionally for POM, as methods of the page (or functions in the page module depending on the language).
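
For example, something like this (a sketch; the labels and field names are placeholders):

    // pages/loginPage.ts -- granular, element-oriented, kept with the page
    import type { Page } from '@playwright/test';

    export class LoginPage {
      constructor(private readonly page: Page) {}

      async fillForm(name: string, pw: string) {
        await this.page.getByLabel('Username').fill(name);
        await this.page.getByLabel('Password').fill(pw);
      }

      async clickLoginButton() {
        await this.page.getByRole('button', { name: 'Log in' }).click();
      }
    }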

Use result-oriented functions (logIn(), registerNewUser()) whenever it's "travel" i.e. things you do to get to the start of your scenario, or a setup/cleanup task.

Those you do not keep with your pages, they live in modules organized by task or result. Plus they have to work from anywhere. They can leave you somewhere, if that's their defined result, but they should be callable from any UI state. By the same token, tests shouldn't assume how they used the UI to get there, again, unless that was defined as their result.

In other words, they're functions: black boxes. The biggest point there was "you can't change this function without preserving that contract, and you can't assume anything but that contract."

The advantage is that if you could wave a magic wand for setup, travel, or cleanup and get the result, the test would still work and be valid. IOW, you can select the most robust and direct way to accomplish those things, even going completely around the UI with cookie injection or whatever.
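
A rough sketch of what that kind of result-oriented function looks like, assuming a hypothetical /api/login endpoint and "session" cookie (both just for illustration):

    // tasks/auth.ts -- callable from any UI state; only the end state is the contract
    import type { Page } from '@playwright/test';

    export async function logIn(page: Page, user: string, pw: string) {
      // Go around the UI entirely: get a session via the API...
      const resp = await page.request.post('/api/login', { data: { user, pw } });
      const { token } = await resp.json();
      // ...and inject it, instead of driving the login form.
      await page.context().addCookies([
        { name: 'session', value: token, url: 'https://app.example.com' },
      ]);
      await page.goto('/'); // contract: we end up in the app, logged in
    }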

What most other teams I've had visibility into tend to do differently than I advised is they'll use the granular element-oriented POM functions everywhere except fixture setup/cleanup. They don't have the "travel" concept of things you have to do on the way to your scenario start, and they include that in the test scenario itself with granular element calls.

And travel is really all setup. But for some reason, when it's "set up yourself via login, option selection, loading a file, etc.", people's thought process goes out the window and they think it all needs to be strung together in the UI like a user would do it. But intelligently separating out the very small bit of "specific UI manipulation that causes a state change + verification of that change" that is the test, from everything else in the scenario that is setup/travel/cleanup, gives you much more maintainable tests.

Or even when they do separate them out, they're not really "result-oriented" functions. Instead they're "flow-oriented" macros that you couldn't replace with a magic wand, because the meat of the test assumes intermediate UI flows they performed rather than just end state, and they're written to be strung together in some coupled (and usually undocumented) way.

Then you have the systems that try to use the same functions for setup/cleanup and testing, caught in between the need for granularity and robustness. Those tend to get extra "doItThisWay" flags on their functions and stuff really goes to hell.

Gotta keep 'em separated!

TL;DR I agree with you, and even a few steps further.


That sounds similar to the Screenplay pattern [1].

This has the concept of actors, abilities, interactions, questions, and tasks. This allows good separation of concerns as well as much more user-focused tests.

[1] https://serenity-js.org/handbook/design/screenplay-pattern.h...


Oh, wow! I had been trying to develop a pattern like this a few years ago, ~2016 (at the time, called Action-Flow), but ended up shifting out of test as a primary focus before I could put polish on it and make it cohesive enough to publish.

I hadn't realized there was prior art to look at or potentially clone by mistake. I wonder how old this pattern is.

Thanks for showing me!


I watched a few-person test team blow itself up (people leaving, going toxic, doing minimum work or other things, ...) because they wanted to get it 'right from the start'. Many, many months in, it was still barely functional.


I've been on at least one team effort like that (not as the architect, in my defense--in fact, it was that failure and my somewhat frustrated ask to my manager to give me a couple of weeks head down to start it properly that launched the automation lead portion of my career).

I bet a team like that would be totally transformed by getting some early success under their belts, and giving valuable feedback early enough to actually be included in the whole process.

That's exactly what a QA team can use these tools for, especially early on--it can accelerate manual testing if nothing else, by people record-replaying at their desk for repeatability. Even if that won't work for verification, sometimes just recording the script that does all the setup you have to do for every test gets you really far--then you just take over manually from there.

In general, I think QA's sometimes-bad rap comes from putting in maximal effort and cost in trade for a very limited and usually unquantifiable increase in actual quality confidence, and that usually comes down to dogma. Most of the "traditional" ways of doing QA come down to writing maximum docs without any single point of truth (which therefore become maintenance hogs and/or just wrong), then spending a million bucks or more a year on a team whose whole job is to tell you, 99 times out of 100, that things look the same as yesterday.

It makes the sector a thankless grind from the inside, and makes it an expensive spreadsheet generator that sometimes blocks your releases for unclear reasons from the outside. Fun times.


Record-replay doesn't even work when running the same scenario 2 minutes later half the time - never mind in 2 months' time.

I tend to find in most apps that unless you directly change the front-end templates to give the elements you interact with better identifiers, and then deliberately use those identifiers, your browser tests will always end up an unreliable, sucky mess.


Agreed, giving it at least stable identifiers to play with makes a huge difference. With some front-end generators that may not be trivial, though I'd think Selenium being a thing (POM relies heavily on stable, unique identifiers, since "nth element past m" xpath queries are way more fragile) has probably made it a more standard ask.

Dynamically-created UIs are usually the hardest thing to deal with--if they get a different ID every time, record-replay is out the door, and even Selenium is a lot harder.

But personally, with a lot of automation architecture experience, I think you're exaggerating a little re: doesn't work 2 minutes later half the time. But it also depends on whether your app is driving some invisible external resource that makes the timing highly variable, etc. Even then you can usually "harden" the recording by putting in worst-case delays. It's just that then your test takes so long to run you can't CI it either.
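
To make the delay-vs-synchronization point concrete, roughly (a Playwright sketch; the route, selector, and button name are invented):

    import { test, expect } from '@playwright/test';

    test('export after the report renders', async ({ page }) => {
      await page.goto('/reports/42');

      // Recorded/"hardened" style: a worst-case delay -- slow and still flaky
      // await page.waitForTimeout(15000);

      // Synchronization-point style: wait on the actual condition instead
      await expect(page.locator('#report-table')).toBeVisible({ timeout: 15_000 });
      await page.getByRole('button', { name: 'Export' }).click();
    });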

It's really situational. What I'm arguing against more than anything is the knee-jerk reaction that it's never appropriate. It's just never enough usually. The fact that we tend to skip over the option entirely in testing is probably a blind spot and a mistake. Devs dabbling in testing as a side task definitely shouldn't ignore the option.


I don't write tests based on xpath or ids - it is too fragile. I just stick a special class on elements I want to target, like class="test-save-user-button". This allows changing the markup however you want without breaking tests. Just leave the test-* classes alone and you're fine.
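
Roughly like this in Playwright/TypeScript (the route is made up; the class is your example):

    import { test } from '@playwright/test';

    test('save user', async ({ page }) => {
      await page.goto('/users/new'); // hypothetical route
      // Markup: <button class="btn btn-primary test-save-user-button">Save</button>
      // The test-* class survives any restyling or restructuring around it.
      await page.locator('.test-save-user-button').click();
    });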


I was probably being a little too literal when I said xpath/id. Something unique like a custom class would be ideal, agreed. I just see that as a different kind of id that I find through an xpath query, which is what I really meant.

To your point, though, the advantage of the class is that other things in your stack probably don't have opinions about what class you tack on, whereas they might want to define your (actual) id for you. IIRC this was how I asked devs to get around dynamic-UI runtime issues where the ids were also auto-generated.


We add data-test-id, data-test-name, or data-test-value attributes (depending on the case) to our elements, and selectors are rock-solid.


Using data-test-name or data-test="name" for e2e automation is the right answer. Data attributes are not going to conflict with your classes or IDs, won't get mangled, and they're tolerant of DOM, style, and backend/form refactoring. They're essentially ignored by everything but your test framework.
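
Playwright, for one, has first-class support for this: getByTestId defaults to data-testid, but the attribute name is configurable (a sketch):

    // playwright.config.ts -- point getByTestId at your attribute of choice
    import { defineConfig } from '@playwright/test';

    export default defineConfig({
      use: { testIdAttribute: 'data-test-id' },
    });

    // Then, given <button data-test-id="save-user">Save</button>, a test can do:
    //   await page.getByTestId('save-user').click();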


I disagree completely.

E2E tests should mimic how users interact with the page, and users see no data attributes.

You can query by text, label, HTML semantics, or position.

Hell, even for clickable icons you should have alt text.
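
That's exactly what role/label locators in something like Playwright give you. A rough sketch (the labels, roles, and copy are made up):

    import { test, expect } from '@playwright/test';

    test('register a new user', async ({ page }) => {
      await page.goto('/register');
      // Interact the way a screen-reader user would: by label and accessible role/name
      await page.getByLabel('Email').fill('user@example.com');
      await page.getByLabel('Password').fill('hunter2');
      await page.getByRole('button', { name: 'Register' }).click();
      await expect(page.getByRole('heading', { name: 'Welcome' })).toBeVisible();
    });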


Minor copy changes shouldn't break your tests. If you're selecting elements by their content, the tests are much more brittle.

Nor should unimportant stylistic changes in class, or changes in element order/positioning, break our functional tests. Thus we give important elements data-attributes which are resilient to all forms of refactoring.


Let's agree to disagree I guess.

HTML has specific semantics, and I find that interacting with applications the same way a blind person would (ARIA labels, text, etc.) leads to much more solid tests.

Copy changes are far from common on things like labels and actionable items: you aren't changing "submit" to "send" every other week, or "pay" to "checkout" (moreover, such a button would have a meaningful ARIA role like "search" or "register").

And if you do, fixing the tests is generally very cheap and quick, so I see it as a non-issue.

Nothing will change my view that abusing data attributes (which are sometimes, but rarely, a necessary evil) leads to the huge amount of inaccessible, semantically incorrect, bloated HTML we see everywhere on the net.

Good tests will lead to better websites, and data-attributes do nothing to help in that direction.


Humans are faaar too smart to even try mimicking their behavior. All you can hope for is to make sure that the intended usage works fine.


I agree with all you have written; however, I feel it's important to reply to:

> for smaller projects with simple and more or less static UIs, and where the tests were all fundamentally scripted, there was almost no advantage to coding the tests later.

How many times in my career have I been asked to meaningfully design tests for basic static sites? Hardly ever.

Complex, multi-tenanted, many-usered hydras with dynamic client-side-rendered content? All the time.

The scenario in which that Google presentation suggests it's most effective is, in my experience, the rarest of all, especially if the site is complex enough to have someone with browser-automation coding experience/ability on the team in the first place.


It's been a lot of years, but some of the handwaving there was editorial on my side. I can't remember exactly which projects they propped up as their examples. That said, I wouldn't recommend record/replay for Gmail, for example, so I do believe they meant smaller and simpler ones.

I also think (but am hesitant to assign to them because obviously I built this up in my head over the last ten years too) that part of the argument was that this enabled testing earlier and for smaller projects than people normally bothered testing--for example, internal-facing tooling. Whether or not they said that, I think that is one of the big advantages.

Also, left unsaid, this was for situations where a very small number of critical path UI tests are sufficient for the moment and you're not re-recording a huge suite if something breaks. Of course, if you're familiar with the testing pyramid, you probably know that only having a small handful of critical-path tests running E2E via UI is your ideal, period. Most organizations who do it at all heavily over-test via UI automation.

Your point is well-taken, though. By the time your app is that complex, I'd say you're probably in the position of creating those long-standing tests I wouldn't recommend recording.



