Question

Why does this assembly implementation of strcmp behave unexpectedly?

I've been writing my own naive implementation of the strcmp function in x86_64 assembly.
I've written a c test program to compare the behavior with the real strcmp from the C library.
Even if both function seems to return the same value, the condition
(strcmp(s1,s2) == my_strcmp(s1,s2)) seems to be false.
Weirdly, my_strcmp(s1,s2) == strcmp(s1,s2) seems to be true. What am I missing ?

While troubleshooting, i've compiled with my implementation of strcmp in C and the same test program, just to verify : it works well, all behaviors are expected.

my_strcmp.s :

; Function: int my_strcmp(const char *dest, const char *src);
; Arguments:
;       rdi             s1
;       rsi             s2
; Variable:
;       rcx             counter
; Returned value :

section .text
global my_strcmp

my_strcmp:
    ; Function prologue (optional)
    push rbp
    mov rbp, rsp

        xor rcx,rcx
loop:
        mov al, byte [rdi]
        mov bl, byte [rsi]
        cmp al, bl
        jne not_equal
        test al,al
        je null_byte_found
        
        inc rdi
        inc rsi
        jmp loop

not_equal:
        sub al, bl      
        movsx rax, al 
        jmp end
null_byte_found:
        xor rax,rax
        jmp end
end:
    ; Function epilogue (optional)
    mov rsp, rbp
    pop rbp
        ret

main.c ( the test program) :

#include <stdio.h>
#include <stdint.h>
#include <unistd.h>
#include <string.h>
extern int my_strcmp(const char *s1, const char *s2);

void    test_my_strcmp();

int main() {
        test_my_strcmp();
    return 0;
}

void    test_my_strcmp(){
        char s1[10] = "123456789";
        char s2[10] = "12345AAA";
        int a = strcmp(s1,s2);  
        int b = my_strcmp(s1,s2);       
        if ( a == b)
                printf("a == b\n");
        if (strcmp(s1,s2) == -11)
                printf("strcmp return value is -11\n");
        if (-11 == my_strcmp(s1,s2))
                printf("my_strcmp retun value is -11\n");
        if (strcmp(s1,s2) == my_strcmp(s1,s2))
                printf("OK1\n");
        if (my_strcmp(s1,s2) == strcmp(s1,s2))
                printf("OK2\n");
}

my_strcmp.c (just to verify) :

int my_strcmp(const char *s1, const char *s2){
        while (*s1 && *s2 && (*s1 == *s2))
        {
                s1++;
                s2++;
        }
        return (*s1 - *s2);
}

Building steps/ recreating the bug

After creating the 3 files my_strcmp.s, my_strcmp.c and main.c,

nasm -f elf64 -g my_strcmp.s -o my_strcmp_assembly.o
gcc -c my_strcmp.c
gcc main.c my_strcmp.o -o c_version.out
gcc main.c my_strcmp_assembly.o -o asm_version.out
./c_version.out && ./asm_version.out

C Version output (the expected output)

a == b
strcmp return value is -11
ft_strcmp retun value is -11
OK1
OK2

asm version output

a == b
strcmp return value is -11
ft_strcmp retun value is -11
OK2

gcc version : 11.4.0 (up to date) nasm version : 2.15.05 (up to date)

``

 3  124  3
1 Jan 1970

Solution

 7

You are using %bl as a temp register.

  1. This register is owned by the calling function (i.e. not yours).
  2. This must be preserved for the caller of your function by you.
  3. Either push/pop %rbx during your function prolog/epilog or use another register.

See: Section 3.2.1 of https://refspecs.linuxbase.org/elf/x86_64-abi-0.99.pdf for a list of registers and how they're used by caller/callee

2024-07-21
Craig Estey

Solution

 7

In addition to what @Craig Estey and problems commented elsewhere,return (*s1 - *s2); is a problem when *s1 and *s2 differ in sign. @Nate Eldredge

strcmp() performs as if each character was an unsigned char.

For all functions in this subclause, each character shall be interpreted as if it had the type unsigned char (and therefore every possible object representation is valid and has a different value). C2x dr § 7.26.1 4

A better re-invented strmcp()

int my_strcmp(const char *s1, const char *s2){
  const unsigned char *uc1 = (const unsigned char *) s1;
  const unsigned char *uc2 = (const unsigned char *) s2;

  while ((*uc1 == *uc2) && *uc1) {  // Do an unsigned access and compare
    uc1++;
    uc2++;
  }
  return ((*uc1 > *uc2) - (*uc1 < *uc2));  // No overflow possible
}

Pedantically, this also fixes potential overflow of subtracting 2 int values that may overflow and avoids (nearly extinct) problems with non-2's complement encoding of a signed char.

2024-07-21
chux - Reinstate Monica