One of the best things about Cocoa is that it has very strong conventions. These conventions allow Cocoa programmers to reason about high-level designs without language support. In the case of containers, the conventions strongly encourage Cocoa programmers to think of these objects as values, despite the Objective-C language not having built-in support for value types. This strong convention is best observed in the advice we give to class designers: when exposing a container as a property, programmers are strongly encouraged to use the immutable version of the container, and to specify the "copy" attribute on the property. In Swift, we do have formal value types, and we should strongly consider formalizing the Cocoa conventions by making our containers value types. In theory, formalizing this semantic for containers could be disruptive to the day-to-day experience of Cocoa programmers, and therefore we have been doing research to show that this isn't the case. We will review the public API implications and the internal design implications.
The public API implications are rather simple. If we model containers as value types, we dramatically simplify the life of public API designers. We no longer need to specify "copy" on each and every property, and we also don't need to model immutability as a subclassing relationship (nor would we want to, for reasons this paper won't go into). Furthermore, computed properties don't need to make defensive copies in either their getter or setter. This is a huge win for safety and for performance, because we can defer these problems entirely to the compiler.
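As a quick illustration, consider a hypothetical Playlist class (not an existing API) written in present-day Swift syntax. Because Array is a value type, reading or assigning the property already hands the client an independent value, so neither a "copy" attribute nor a defensive copy is needed:

    class Playlist {
        // No "copy" attribute and no defensive copies needed:
        // Array already behaves as a value.
        var tracks: [String] = []
    }

    let playlist = Playlist()
    playlist.tracks = ["Intro", "Outro"]

    var snapshot = playlist.tracks   // snapshot is an independent value
    snapshot.append("Bonus")         // does not affect playlist.tracks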
Now, let's consider the implications for class implementors. We have been doing a lot of research to prove that value semantics would be a net win, with improved syntax and compile-time error detection, and with no noticeable performance downside.
In Swift, function parameters can be passed by value or by reference. The simpler (default) syntax is by value:
    func foo(_ c : Character, i : Int)   // define foo
    // ...
    foo(c, i)                            // call foo
If one wants foo to pass by reference, both the author of foo and the client of foo agree to use an alternate syntax, for each individual parameter:

    func foo(_ c : Character, i : [inout] Int)   // define foo
    // ...
    foo(c, &i)                                   // call foo
In the latter example above, the author of foo specifies that the second parameter is to be passed by reference, instead of by value. And if the caller of foo does not realize which arguments of foo the author of foo intends to modify (e.g. intends as out-parameters), then the Swift compiler will catch this miscommunication and flag a compile-time error.
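As a minimal sketch of the miscommunication the compiler catches (written with the present-day inout spelling rather than the [inout] attribute above; the error text is approximate):

    func increment(_ value: inout Int) {
        value += 1
    }

    var total = 0
    increment(total)    // error: passing value of type 'Int' to an inout parameter requires explicit '&'
    increment(&total)   // OK: the mutation is visible at the call site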
For a moment, imagine that Swift did not require modified syntax at the call site:
    func doWork() {
        // ...
        foo(c, i)
        bar(i)
        // ...
    }
Now, when a human reads the code for doWork(), it is ambiguous whether or not i is modified by foo, and thus difficult to know what value of i is sent to bar. One first has to look at the signature of foo to discover that it intends to modify its last argument:

    func foo(_ c : Character, i : [inout] Int)
Or alternatively just trust that the author of foo chose a sufficiently descriptive name for the function so that the human reader can tell in doWork() whether or not foo modifies i without looking up the signature of foo.
An even worse scenario would be for Swift to implicitly pass everything by reference:
    func foo(_ c : Character, i : Int)

Now in doWork:

    func doWork() {
        // ...
        foo(c, i)
        bar(i)
        // ...
    }
... we not only have to look at the signature of foo, we have to examine the entire definition of foo to discover whether we are sending a modified i to bar. And if foo calls any other functions with i, each of those function definitions would also have to be examined in order to answer this question. Effectively, if Swift were designed to implicitly pass i by reference in this example, the programmer would have to manually perform a whole-program analysis just to reason about the behavior of doWork.
Fortunately this is not how Swift was designed. Instead the programmer can analyze the behavior of doWork by looking only at the definition of doWork. Either i will be explicitly passed by reference, or it will be passed by value, and thus one immediately knows whether further uses of i are getting a modified value or not:

    func doWork() {
        // ...
        foo(c, i)
        bar(i)    // i is unchanged by foo
        // ...
    }

    func doOtherWork() {
        // ...
        foo(c, &i)
        bar(i)    // i is most likely modified by foo
        // ...
    }
The ability to locally reason about the behavior of doWork in the previous section applies to mutable containers such as Array and String just as much as it applies to the Int parameter.
For performance reasons, we need to have mutable containers. Otherwise operations such as repeatedly appending elements become too expensive. But we need to be able to locally reason about code which uses these containers. If we make Array a reference type, then Swift will compile this code:

    func foo(_ v : Array, i : Int)   // define foo

    func doWork() {
        // ...
        foo(v, i)
        bar(v)
        // ...
    }
But the human reader of this code will have to perform a non-local analysis to figure out what doWork is doing with the Array (whether or not it is modified in or under foo, and thus what value is used in bar), or rely on high-quality, strictly followed naming conventions such as those used in Cocoa. Now there is absolutely nothing wrong with high-quality, strictly followed naming conventions. But they aren't checked by the compiler. Having the compiler confirm that this argument can be modified, and that argument cannot, is a tremendous productivity booster.
If Array has value semantics, then we only need to glance at doWork() to discover that both foo and bar receive the same value in v (just like i).
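A sketch of that value-semantics version, in present-day syntax and with a stand-in bar, makes the guarantee visible:

    func foo(_ v: [Int], _ i: Int) {
        // foo works with its own value of v; it cannot modify the caller's array
    }

    func bar(_ v: [Int]) {
        print(v)
    }

    func doWork() {
        let v = [1, 2, 3]
        let i = 0
        foo(v, i)
        bar(v)   // guaranteed to receive the same value of v that foo did
    }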
Swift containers should have value semantics to enable programmers to locally reason about their code using compiler-enforced syntax.
I have spent the last couple of weeks looking at a lot of Apple's use of NSMutableArray, NSMutableDictionary and NSMutableSet. I've looked at Foundation, AppKit, CoreData, Mail, etc. Everything I've looked at has shown that the programmers are going out of their way to reduce and even eliminate sharing between containers (that is, to avoid reference semantics). I am seeing custom-written deep-copy methods:

    -(NSMutableArray *)mutableDeepCopy;
I am seeing factory functions that return mutable containers which share with nothing except a local variable inside the factory function:
    + (NSMutableDictionary*) defaultWindowStateForNode: (const TFENode&) inTarget
    {
        NSMutableDictionary* result = [NSMutableDictionary dictionary];
        // result is just filled, it shares with nothing...
        if (inTarget.IsVolume()
            && (inTarget.IsDiskImage() || inTarget.IsOnReadOnlyVolume())
            && inTarget != TFENodeFactory::GetHomeNode()
            && !inTarget.IsPlaceholderAliasTarget()) {
            [result setBoolFE: NO forKey: kShowToolbarKey];
        } else {
            [result setBoolFE: YES forKey: kShowToolbarKey];
        }
        [result setBoolFE: [[self class] shouldShowStatusBar]
                   forKey: UDefaults::StrKey(UDefaults::kShowStatusBar).PassNSString()];
        [result setBoolFE: [[self class] shouldShowTabView]
                   forKey: UDefaults::StrKey(UDefaults::kShowTabView).PassNSString()];
        [result addEntriesFromDictionary: [TBrowserContainerController defaultContainerStateForNode: inTarget]];
        return result;
    }
For the above factory function, it makes absolutely no difference whether value semantics or reference semantics is used for NSMutableDictionary.
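For instance, a heavily simplified Swift sketch of the same shape (the function and key names here are hypothetical): the dictionary is freshly built, filled, and returned, so no caller can observe sharing either way.

    func defaultWindowState(showToolbar: Bool) -> [String: Bool] {
        var result: [String: Bool] = [:]   // freshly built; shares with nothing
        result["ShowToolbar"] = showToolbar
        result["ShowStatusBar"] = true
        result["ShowTabView"] = true
        return result
    }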
On occasion, I am seeing NSMutable* being passed into functions just for the purpose of having in/out parameters, not creating any sharing within the function. For example:
    - (void)_checkPathForArchive:(DVTFilePath *)archivePath andAddToArray:(NSMutableArray *)archives
    {
        if ([self _couldBeArchivePath: archivePath]) {
            IDEArchive *archive = [IDEArchive archiveWithArchivePath:archivePath];
            if (archive != nil) {
                [archives addObject:archive];
            }
        }
    }
And called like this:
    [self _checkPathForArchive: archivePath andAddToArray: archives];
Such code is typically a private function called from only one or two places. Note that this is an example where the human reader depends greatly on a really good name to make the semantics clear: andAddToArray. In Swift such code is easily translated to something like:

    _checkPathForArchiveAndAddToArray(archivePath, &archives)
Note that now the code not only has a great name, but the compiler is helping the programmer ensure that the client and the function both agree that archives is an in-out argument, and, just as importantly, that archivePath is not to be modified within _checkPathForArchiveAndAddToArray. However, if archivePath is a mutable object with reference semantics, we lose the ability to locally reason about this code, even in Swift.
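For completeness, here is one possible shape for the value-semantics declaration that the '&' call above pairs with. This is a sketch only, with simple stand-in types (String in place of DVTFilePath, and a hypothetical Archive struct in place of IDEArchive):

    struct Archive {
        let path: String

        init?(path: String) {
            // pretend the archive-validation logic lives here
            guard !path.isEmpty else { return nil }
            self.path = path
        }
    }

    func _checkPathForArchiveAndAddToArray(_ archivePath: String,
                                           _ archives: inout [Archive]) {
        if let archive = Archive(path: archivePath) {
            archives.append(archive)
        }
    }

    // At the call site the '&' makes the out-parameter explicit:
    var archives: [Archive] = []
    _checkPathForArchiveAndAddToArray("SomeApp.xcarchive", &archives)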
The most worrisome code I see looks like this:
    - (_FilteredListInfo *)_backgroundSortNewFilteredMessages:(NSMutableArray *)messages
    {
        MCAssertNotMainThread();
        // lots and lots of code here involving messages... and then near the bottom:
        res.filteredMessages = messages; // share here!
        ...
        return res;
    }
That is, this method shares a mutable container! However, if you then go to the trouble of tracking down how this method is used, you find that messages is only ever bound to a temporary, or to a local variable in the calling function. So, actually, there is no permanent sharing after all.
But the programmer has to do non-local code analysis to figure out that the above is actually safe. Additionally there is nothing to stop this function from being misused in the future. This is just a ticking time bomb in the code.
It would be far easier to reason about this code if messages were passed, semantically, by value:

    func _backgroundSortNewFilteredMessages(_ messages : Array) -> _FilteredListInfo {
        // ...
        res.filteredMessages = messages   // copy here!
        // ...
        return res
    }
With reference counting and copy-on-write, this need not imply a performance penalty, especially when the argument is a temporary, as it is in this example.
I'm seeing the same patterns, over and over, after surveying tens of thousands of lines of code across multiple applications. Cocoa programmers are:

- writing their own deep-copy methods to avoid sharing,
- building mutable containers in factory functions that share with nothing outside the function,
- passing mutable containers into functions purely as in/out parameters, and
- in the rare cases where a mutable container is shared, sharing it only with a temporary or a local variable in the caller.
It appears to me that Cocoa programmers are already using value semantics for their containers, but they are getting no help at all from their programming environment in doing so. Instead they are relying solely on good coding conventions and very descriptive method names. Giving Swift containers value semantics will not force a painful paradigm shift on Cocoa programmers.
The most common concern I hear when talking about "large" objects having value semantics is performance. One understandably does not want to be copying tons of data around. The plan for Swift containers is to implement them with the reference-counted copy-on-write idiom. In doing so, the Swift programmer is freed from the tedious task of writing "deep copy" functions, and yet the type behaves just like an Int. Additionally, the type can be passed by reference, just like an Int, in the relatively few places where pass-by-reference is the desired behavior.
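To make the idiom concrete, here is a minimal sketch of a copy-on-write value type (the names are hypothetical and this is not the actual Array implementation), using the standard library's isKnownUniquelyReferenced:

    final class IntStorage {
        var elements: [Int] = []
    }

    struct IntStack {
        private var storage = IntStorage()

        mutating func push(_ value: Int) {
            // Copy the shared storage only when another value is also holding it.
            if !isKnownUniquelyReferenced(&storage) {
                let copy = IntStorage()
                copy.elements = storage.elements
                storage = copy
            }
            storage.elements.append(value)
        }

        var top: Int? { storage.elements.last }
    }

    var a = IntStack()
    a.push(1)
    let b = a    // cheap: only a reference is copied and retained
    a.push(2)    // a copies its storage here; b still sees only [1]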
In making Swift containers behave just like Int, not only is the overall Swift API simplified, but this is also a boon to generic algorithms. For example, one can confidently write a generic sorting algorithm that works for sequences of Int and for sequences of Array<Int>. The generic code need not concern itself with the question of whether the type has value semantics or reference semantics.
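A small sketch of what this buys generic code (the function name is hypothetical): the same in-place algorithm works identically whether the elements are Int or Array<Int>, because moving elements around never creates hidden sharing.

    func reverseInPlace<T>(_ items: inout [T]) {
        var low = 0
        var high = items.count - 1
        while low < high {
            items.swapAt(low, high)
            low += 1
            high -= 1
        }
    }

    var numbers = [1, 2, 3]
    reverseInPlace(&numbers)    // [3, 2, 1]

    var rows = [[1, 2], [3], [4, 5]]
    reverseInPlace(&rows)       // [[4, 5], [3], [1, 2]]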
Cocoa programmers already view and use containers as value types, even though to achieve this they have to rely only on strong coding conventions. They get no help from ObjC in this regard.
Swift can have both by-reference and by-value parameters, and the latter has the simpler syntax. This helps programmers locally reason about their code when using value types, by making it clear that most arguments passed to functions are not modified; results tend to be returned from the function rather than written back through its arguments.
In the relatively infrequent case that the programmer intends to modify a function argument (i.e. as an out-parameter), if value semantics are used, Swift makes the modification explicit in the client code through use of '&' at the call site. Thus the programmer can more easily spot the few cases where this is desired.
Performance is not compromised by formally adopting value semantics for containers in Swift.
Swift containers should have value semantics to enable programmers to locally reason about their code using compiler-enforced syntax.