Talk:Comparison of programming languages (string functions)
This article is rated B-class on Wikipedia's content assessment scale.
C function toupper() in UpperCase
This is misleading in the article. C doesn't have a function to uppercase a whole string. toupper() takes an int and returns an int, NOT a string. Its prototype:
int toupper(int c);
If c is a lowercase letter (a-z), toupper() returns the uppercase version (A-Z). Otherwise toupper() returns c unchanged. In the default "C" locale, toupper() does not convert international characters (those with character codes above 0x7F), like ă or ç. To uppercase a whole string you need to write a function something like this:
#include <ctype.h> // standard C header file with the prototype of toupper()
void UpperCaseAString(char *theString) // theString is a pointer to the first char of the string you want to uppercase.
{
    char *myCharPtr = theString; // myCharPtr is a pointer to char - initialize it to theString
    while (*myCharPtr != '\0') // C uses null-terminated strings. *myCharPtr is what's pointed to by myCharPtr
    {
        // Cast to unsigned char: passing a negative char value to toupper() is undefined behavior.
        *myCharPtr = (char)toupper((unsigned char)*myCharPtr);
        myCharPtr++; // myCharPtr is a pointer to type char, so it advances by sizeof(char), i.e. one byte.
    }
}
In C, strings are essentially pointers to a character, and they end at the first NUL ('\0') character. It would be worthwhile to explain what strings are in different languages. Senor Cuete (talk) 03:41, 10 May 2008 (UTC)Senor Cuete
The 1. should appear as a pound sign and the box is put there by Wiki's text engine. I didn't type it like that. Senor Cuete (talk) 03:44, 10 May 2008 (UTC)Senor Cuete
- The <syntaxhighlight lang="...">...</syntaxhighlight> tag should fix it. Ghettoblaster (talk) 12:43, 10 May 2008 (UTC)
Compare (integer result, fast/non-human ordering)
In the table row for C, why would you go through the hassle of writing your own function when you could call the C function strncmp?
#include <string.h>
int strncmp(const char *s1, const char *s2, size_t n);
Senor Cuete (talk) 00:52, 16 May 2008 (UTC)Senor Cuete
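A minimal sketch of that suggestion, assuming a plain byte-wise ordering is all the table row needs. The helper names compare_sign and compare_prefix_sign are hypothetical; strcmp() and strncmp() are the standard <string.h> functions, which only guarantee the sign of their result.

```c
#include <string.h>

/* Hypothetical helper: normalise strcmp()'s result to -1, 0 or 1.
 * The standard only promises a negative, zero or positive value. */
int compare_sign(const char *a, const char *b)
{
    int r = strcmp(a, b);
    return (r > 0) - (r < 0);
}

/* Hypothetical helper: the same, but comparing at most n characters. */
int compare_prefix_sign(const char *a, const char *b, size_t n)
{
    int r = strncmp(a, b, n);
    return (r > 0) - (r < 0);
}
```

The ordering is by character code, so it is the "non-human" ordering the section title describes rather than a locale-aware collation.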
substring
Shouldn't the table row for C just mention the C function strncpy?
#include <string.h>
char *strncpy(char *s1, const char *s2, size_t n);
Why concatenate when you can copy? Senor Cuete (talk) 00:53, 16 May 2008 (UTC)Senor Cuete
- Because strncpy() will not copy a null-terminator if the string is n or more characters long. --Spoon! (talk) 12:13, 16 May 2008 (UTC)
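To illustrate the point about the missing terminator, here is a hedged sketch of a substring helper built on strncpy(). The name substring and its parameters are hypothetical, and the caller is assumed to supply a destination buffer of at least len + 1 bytes.

```c
#include <string.h>

/* Hypothetical helper: copy up to len characters of src, starting at the
 * zero-based offset start, into dst. dst must have room for len + 1 bytes.
 * Returns dst. */
char *substring(char *dst, const char *src, size_t start, size_t len)
{
    size_t src_len = strlen(src);
    if (start > src_len)
        start = src_len;      /* clamp: the result is the empty string */
    strncpy(dst, src + start, len);
    dst[len] = '\0';          /* strncpy() adds no terminator when the
                                 source has len or more characters */
    return dst;
}
```

For example, substring(buf, "Hello, world", 7, 5) leaves "world" in buf, provided buf holds at least 6 bytes. Without the explicit dst[len] = '\0' the result would not be a valid C string in that case.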
strings vs lists
"In both Prolog and Erlang, a string is represented as a list (of character codes), therefore all list-manipulation procedures are applicable, though the latter also implements a set of such procedures that are string-specific."
I think this is the same for Haskell, should it also be noted? —Preceding unsigned comment added by 124.171.21.141 (talk) 00:20, 28 June 2008 (UTC)
Additional procedure/operators
Some further string manipulations for consideration:
- substring append and prepend: e.g. in Python: s+="ABD"
- replace substring:
  - by substring text: e.g. AWK gsub("Earthling","Martian",string)
  - by slice: s[3:4]="XY"
- insert substring at offset.
NevilleDNZ (talk) 08:17, 15 May 2009 (UTC)
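As one possible illustration of the last item, this is a rough sketch of "insert substring at offset" in C. The helper insert_at and its parameters are hypothetical, and a caller-supplied fixed-size buffer is assumed; it is not taken from the article.

```c
#include <string.h>

/* Hypothetical helper: insert ins into buf at the zero-based offset pos.
 * cap is the total size of buf in bytes. Returns buf on success, or NULL
 * if pos is past the end of the string or the result would not fit. */
char *insert_at(char *buf, size_t cap, size_t pos, const char *ins)
{
    size_t len = strlen(buf);
    size_t ins_len = strlen(ins);

    if (pos > len || len + ins_len + 1 > cap)
        return NULL;

    /* Shift the tail, including the terminating '\0', to make room. */
    memmove(buf + pos + ins_len, buf + pos, len - pos + 1);
    memcpy(buf + pos, ins, ins_len);
    return buf;
}
```

Inserting at pos equal to the string length behaves as an append, and inserting at 0 as a prepend, so the same helper covers the first item in the list as well.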
ASC
Came here looking for a Python equivalent to the ASC() function, which, in BASIC/VB6, returns the numeric value of the first character of a string.
Not exactly equivalent to any string function in any language which handles strings differently, but in BASIC it was a string function. —Preceding unsigned comment added by 203.206.162.148 (talk) 05:17, 22 June 2009 (UTC)
- It's called ORD() in many languages (since the character set / language / font may not be ASCII, but the idea is the same). This Wikipedia Page String Function comparison could use a section on (number to/from string, character to/from string) http://rosettacode.org/wiki/Character_code#Python --BrianFennell (talk) 22:37, 3 September 2009 (UTC)
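For comparison, a C sketch of the same character/number conversions (in Python the built-ins are ord() and chr()). The helper names asc and chr below are hypothetical, and returning -1 for an empty string is an assumption made for this sketch, not BASIC's behaviour.

```c
/* Hypothetical helper mirroring BASIC's ASC(): the numeric code of the
 * first character of s, or -1 for an empty string (an assumption of this
 * sketch). The cast to unsigned char keeps codes above 0x7F positive on
 * platforms where plain char is signed. */
int asc(const char *s)
{
    return s[0] != '\0' ? (unsigned char)s[0] : -1;
}

/* Hypothetical inverse, in the spirit of BASIC's CHR$(): store the
 * character with the given code in out as a one-character string. */
char *chr(char out[2], int code)
{
    out[0] = (char)code;
    out[1] = '\0';
    return out;
}
```

In C a character already is a small integer, so these helpers are only conversions between a string and its first element; on an ASCII system asc("A") is 65.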
substring, startpos, base?
Ark! The substring table does not list the base for startpos and endpos. Is the startpos=1 the first character in the parent string, or the second? —Preceding unsigned comment added by 203.206.162.148 (talk) 05:57, 22 June 2009 (UTC)
Square bracket as syntax
There is a problem here: sometimes the square brackets indicate an optional field: string(1[,n]), and sometimes they are part of the language: string[1,n].
That leaves the problem that we can't always see that part of the command is optional: string[1 /,n/]. —Preceding unsigned comment added by 203.206.162.148 (talk) 06:03, 22 June 2009 (UTC)
- I see that it's been Fixed now - thank you whoever :~) 203.206.162.148 (talk) 07:22, 14 January 2010 (UTC)
LUA missing as programming language
I noticed that Lua is missing from this page. I'm willing to add Lua examples (which might take some time), but there should be someone to cross-read them. Or are there reasons not to have Lua in the examples?
LUA string.find and string.gsub misplaced?
These functions work with pattern matching, not with plain strings (well, find can be forced to do so with additional options). There should be at least a comment about this. Bassklampfe (talk) 15:12, 30 November 2010 (UTC)
Removal of "Compare (integer result, fast/non-human ordering)"
I am removing the Compare (integer result, fast/non-human ordering) section, for the following reasons:
- This is not a common or primitive operation. Observe that of the languages listed, not one provides a built-in operator or standard library function to perform this type of comparison. Only one of the examples calls a single function, and that is in an uncommon third-party library. The rest are all implemented in terms of structural comparison of tuples (not a string operation at all) or sequential boolean OR (using the basic string comparison already detailed in the previous section). The section therefore does not in fact provide any new information about string functions at all. It merely describes an alleged optimisation technique. But...
- This is not even an optimisation in most cases. The complicated "fast" approaches given in the article all involved more operations than the straightforward standard approach, nullifying any speed improvement they might have brought. The OCaml and Ruby examples were particularly bad, since the "fast" versions actually involved allocating and freeing memory on the heap!
I ran some benchmarks in Perl and OCaml, and I was unable to find any cases where the "fast" version was not actually slower than the standard approach. In one case (OCaml, comparing short strings), the code given in the article was literally 33% slower than a straightforward String.compare!
It's possible that things might be different in other languages, and the technique might be generally faster in some very restricted circumstances (maybe when comparing very long strings that are very similar?), but it is clearly not something that anyone should be using without benchmarking it against their own data; and it's unlikely that string comparisons will frequently be enough of a bottleneck to justify this kind of micro-optimisation in the first place.
In short, this is not the kind of useful information that Wikipedia prides itself on spreading, and I don't think it belongs in this article. 87.194.117.80 (talk) 17:04, 26 July 2009 (UTC)
equivalence relation missing
This article deals with three ways to compare strings (equality, compare, and strcmp). This might have some issues:
- From my understanding, all three cover the same feature.
- This feature is not defined as long as lexicographical order is not defined.
- It is not clear whether this comparison is a low-level comparison or one based on equivalence.
For instance, how do you compare Montréal and Montréal (the two canonically equivalent UTF-16 Unicode forms)?
character | M | o | n | t | r | é | | a | l
---|---|---|---|---|---|---|---|---|---
UTF16 NFC | 004d | 006f | 006e | 0074 | 0072 | 00e9 | | 0061 | 006c
UTF16 NFD | 004d | 006f | 006e | 0074 | 0072 | 0065 | 0301 | 0061 | 006c
UTF16 NFD (code points) | M | o | n | t | r | e | ◌́ | a | l
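To make the question concrete, here is a small C sketch of the low-level case, using the code units from the table. The array and function names are hypothetical. It shows only what a code-unit comparison does and makes no attempt at normalization.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* "Montréal" as UTF-16 code units, copied from the table. */
static const uint16_t montreal_nfc[] = {
    0x004d, 0x006f, 0x006e, 0x0074, 0x0072, 0x00e9, 0x0061, 0x006c
};
static const uint16_t montreal_nfd[] = {
    0x004d, 0x006f, 0x006e, 0x0074, 0x0072, 0x0065, 0x0301, 0x0061, 0x006c
};

/* Hypothetical helper: low-level equality, code unit by code unit.
 * It knows nothing about Unicode canonical equivalence. */
int utf16_units_equal(const uint16_t *a, size_t a_len,
                      const uint16_t *b, size_t b_len)
{
    return a_len == b_len && memcmp(a, b, a_len * sizeof *a) == 0;
}
```

Called on the two arrays (lengths 8 and 9), this reports them as different even though they are canonically equivalent. An equivalence-based comparison would first have to bring both strings into the same normalization form, which standard C does not provide; a Unicode library such as ICU would be needed for that step.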
"Code" format
The "code" tags on the keywords in the tables (or perhaps other changes) have destroyed the formatting, making the tables almost illegible. If you go back a decade and look at the original tables, you'll see that the keywords are clearly delimited, making the tables clear and easy to read.
The present formatting makes the whole exercise almost worthless: if you can't read it easily, what's the point of having pages of text? — Preceding unsigned comment added by 203.206.162.148 (talk) 09:27, 18 July 2017 (UTC)
How-to guide
It seems to me that this article, as useful as it is, is outside of Wikipedia's scope, in light of the principle that Wikipedia is not a how-to guide, which is exactly what this article is. Largoplazo (talk) 10:00, 18 June 2020 (UTC)
Mentioning strtok as C/C++ way of splitting strings
char *strtok(char *restrict str, const char *restrict delim)
returns tokens (aka split strings). This is essentially what string.split does in most other languages, except that it doesn't allocate memory to store an array of tokens; instead it mutates the original string (replacing each delimiter that ends a token with '\0')[1] and returns the tokens in order of occurrence, one per call.
Also it never returns empty tokens[2].
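A short sketch of both properties, the in-place mutation and the skipping of empty tokens. The helper name count_tokens is hypothetical; only strtok() itself is standard.

```c
#include <string.h>

/* Hypothetical helper: count the tokens strtok() finds in str.
 * str is modified in place: the delimiter that ends each token is
 * overwritten with '\0'. */
size_t count_tokens(char *str, const char *delim)
{
    size_t count = 0;
    for (char *tok = strtok(str, delim); tok != NULL; tok = strtok(NULL, delim))
        count++;
    return count;
}
```

For a writable buffer holding "a,,b," and the delimiter ",", this returns 2 (the tokens "a" and "b"), where a split function in many other languages would also report the empty fields. Afterwards the buffer reads as just "a", because the first comma has been replaced by '\0'. Note also that strtok() keeps its position in hidden static state, so two tokenizations cannot be interleaved.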
"Proper" implementation of split (using strtok) is something like:
#include <stdlib.h>
#include <string.h>
struct stringArray {
    size_t size;
    char **strings;
};
struct stringArray splitString(char *restrict str, const char *restrict delim) {
    size_t count = 0, allocated = 1;
    char **strings = malloc(allocated * sizeof(char *));
    if (strings == NULL)
        abort();
    char *token = strtok(str, delim);
    while (token != NULL) {
        if (count >= allocated) { // Array is full: grow it before storing the next token
            strings = realloc(strings, (allocated *= 2) * sizeof(char *)); // Doubling reallocation, to provide acceptable performance
            if (strings == NULL)
                abort();
        }
        strings[count++] = token; // Tokens point into str; they are not copied
        token = strtok(NULL, delim);
    }
    if (count > 0 && allocated != count) { // Shrink to fit; skipped when empty, since realloc() with size 0 is not portable
        strings = realloc(strings, count * sizeof(char *));
        if (strings == NULL)
            abort();
    }
    return (struct stringArray){count, strings};
}