Skip to content

Adds initial scaffolding for the split interface#2

Open
aryann wants to merge 13 commits into
mainfrom
scaffolding
Open

Adds initial scaffolding for the split interface#2
aryann wants to merge 13 commits into
mainfrom
scaffolding

Conversation

@aryann
Copy link
Copy Markdown
Collaborator

@aryann aryann commented May 10, 2026

The purpose of this PR is to start the discussion on the interface shape. Note the TODOs I've added in the code for some of the decisions we need to make.

We can either get this PR to a good shape and merge it, or merge it earlier and evolve it via future PRs.

@aryann aryann requested a review from bartholomaios May 10, 2026 00:44
@coveralls
Copy link
Copy Markdown

coveralls commented May 10, 2026

Coverage Report for CI Build 25686618867

Warning

No base build found for commit 26f82af on main.
Coverage changes can't be calculated without a base build.
If a base build is processing, this comment will update automatically when it completes.

Coverage: 100.0%

Details

  • Patch coverage: 12 of 12 lines across 1 file are fully covered (100%).

Uncovered Changes

No uncovered changes found.

Coverage Regressions

Requires a base build to compare against. How to fix this →


Coverage Stats

Coverage Status
Relevant Lines: 12
Covered Lines: 12
Line Coverage: 100.0%
Coverage Strength: 8.0 hits per line

💛 - Coveralls

Copy link
Copy Markdown

@bartholomaios bartholomaios left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should have an overload that takes an output iterator (like std::format and std::format_to).

I think the simplest case should output to std::vector without needing the "say" vector.

Fundamentally, I see 3 parts to this library:

  • Delimiter/Pattern concept with implementations for all the common cases.
  • A range adapter for more general use-cases that has better ergonomics than the current std::ranges::views::split.
  • Algorithms that eagerly evaluate for simple use-cases. This should include splitting to tuple-like.

Comment thread include/beman/str_split/todo.hpp Outdated

#include <string>
#include <string_view>

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should these go here? Should they be guarded by BEMAN_STR_SPLIT_USE_MODULES()?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like we are the first ones to use the exemplar, and I am not 100% sure where these includes should go.

The namespace in todo.hpp is fully within the else branch of the BEMAN_STR_SPLIT_USE_MODULES, which is confusing to me (that would mean the TODO code will not be included when modules are enabled).

More broadly, we need to better understand the purpose of todo.hpp (we'll need to delete it eventually, but it's not clear to me why it was included in the first place).

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks to me like all our headers need to:

  • Act like normal headers when modules aren't being used. Need to #include other headers in this case.
  • Contain just import beman.str_split when modules are enabled, except when included from the interface unit.
  • When included in the interface unit (str_split.cppm), expose the code, but rely on the import std; for standard library components.

Therefore (I think) the correct thing to do is:

#include <beman/str_split/config.hpp>

#if BEMAN_STR_SPLIT_USE_MODULES() && !defined(BEMAN_STR_SPLIT_INCLUDED_FROM_INTERFACE_UNIT)

import beman.str_split;

#else

#if BEMAN_STR_SPLIT_USE_MODULES()
// includes here
#endif

namespace beman::str_split {

// implementation here

} // namespace beman::str_split

#endif

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's a good example to follow: https://github.com/bemanproject/utf_view/blob/main/include/beman/utf_view/to_utf_view.hpp

We'll want to guard std:: includes using !BEMAN_STR_SPLIT_USE_MODULES(). Our own headers should be included in the else branch of BEMAN_STR_SPLIT_USE_MODULES() && !defined(BEMAN_STR_SPLIT_INCLUDED_FROM_INTERFACE_UNIT).

// Ensures this constructor does not hijack copy and move construction which would fail to compile with a
// difficult-to-read wall of errors.
different_from<Range, split_by>)
constexpr explicit split_by(Range&& range) : delimiter_(std::ranges::begin(range), std::ranges::end(range)) {}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could constrain this with: std::constructable_from<std::string, std::from_range_t, R>. We wouldn't need the std::string_view constructor in that case.

constexpr explicit split_by(std::string_view delimiter) : delimiter_(delimiter) {}

// Constructor for the single character case.
constexpr explicit split_by(char delimiter) : delimiter_(1, delimiter) {}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer the single char case to be a separate delimiter (like abseil and std::string::find).

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

split_by_char?

Also, see my comment about alternatives below and see if that's an approach you'd want to take instead.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added split_by_char.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing we need to consider: should this be parameterized by CharT (with default == char) given that the inputs allow any char type?

@bartholomaios
Copy link
Copy Markdown

I'm going to take a first-pass at the range adapter today.

@aryann
Copy link
Copy Markdown
Collaborator Author

aryann commented May 10, 2026

I replied to some of your comments, but not all. I won't have time to work on this today. Do you want to merge this PR and make your changes on top? I agree with all your comments, so no objections on making the changes.

Please also see my TODO on modeling this as a series of function overloads. That's another direction that may produce better ergonomics and names.

@bartholomaios
Copy link
Copy Markdown

Here's how it could look without the implicit conversion operators. This is modeled after std::format_to and std::ranges::to.

template<class OutputIt, class CharT, class Traits, class Delimiter>
auto str_split_to(OutputIt to, std::basic_string_view<CharT, Traits> str, Delimiter&& delim) -> OutputIt;

template<class Container, class CharT, class Traits, class Delimiter>
auto str_split_to(std::basic_string_view<CharT, Traits> str, Delimiter&& delim) -> Container;

template<template<class...> class Container, class CharT, class Traits, class Delimiter>
auto str_split_to(std::basic_string_view<CharT, Traits> st, Delimiter&& delim) -> Container<std::basic_string_view<CharT, Traits>>;

template<class CharT, class Traits, class Delimiter>
auto str_split(std::basic_string_view<CharT, Traits> str, Delimiter&& delim) -> std::vector<std::basic_string_view<CharT, Traits>>;

@aryann
Copy link
Copy Markdown
Collaborator Author

aryann commented May 11, 2026

Here's how it could look without the implicit conversion operators. This is modeled after std::format_to and std::ranges::to.

template<class OutputIt, class CharT, class Traits, class Delimiter>
auto str_split_to(OutputIt to, std::basic_string_view<CharT, Traits> str, Delimiter&& delim) -> OutputIt;

template<class Container, class CharT, class Traits, class Delimiter>
auto str_split_to(std::basic_string_view<CharT, Traits> str, Delimiter&& delim) -> Container;

template<template<class...> class Container, class CharT, class Traits, class Delimiter>
auto str_split_to(std::basic_string_view<CharT, Traits> st, Delimiter&& delim) -> Container<std::basic_string_view<CharT, Traits>>;

template<class CharT, class Traits, class Delimiter>
auto str_split(std::basic_string_view<CharT, Traits> str, Delimiter&& delim) -> std::vector<std::basic_string_view<CharT, Traits>>;

I replaced the initial functions with these and added some basic tests + an example.

One downside is that you can only pass in basic_string_view. Type deduction fails with const char* and std::string. A non-exhaustive list of options (which we can explore in another PR):

  • Add explicit overloads for std::basic_string<CharT, Traits, Alloc> and const CharT*.
  • Make the input more generic and use concepts to ensure it's a range.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants