Can this really not be fixed?
I still see this in various text that’s meant to be readable.
Usually ampersands are the biggest culprit. Is text just some really sacred data type that can’t be upgraded to include punctuation, yet can include the foreign-looking wingdings that try to stand in for it?
I’m just confused about why those characters have multi-character reference names that aren’t part of the regular alphabet or punctuation set either, and why those names still show up instead of the erroneous reference just being replaced with the actual character.
It’s 2026, just dig out this fossil and fix it already.
The wrong UTF encoding is usually the issue.
The characters are all in your own PC. The text data is actually just numbers, each one referencing the index of a character in a reference table.
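To make the "text is just numbers" point concrete, here is a minimal Python sketch. The sample word is my own pick, and the table in this case is Unicode, which is what Python strings use:

```python
# Each character of a string is an index (a "code point") into a table.
# The sample text is just an illustration; "\u00e9" is the letter e with an acute accent.
text = "caf\u00e9"

# ord() returns the table index of a single character.
code_points = [ord(ch) for ch in text]
print(code_points)  # [99, 97, 102, 233]

# chr() does the reverse lookup: index -> character.
rebuilt = "".join(chr(n) for n in code_points)
print(rebuilt == text)  # True
```

What lands in a file is those numbers turned into bytes by some encoding, which is where the choice of table starts to matter.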
Early on someone thought “let’s create a bunch of different reference tables and each country uses the one that is best for them so we don’t have to include every character in the world”.
But that thinking has a critical problem: when you write some text that will only be read within the country, you don’t need to keep track of which table you used because everyone will be using the same one. Soon you forget that there are other tables for other countries, so when you do send an international text using your table as a reference, the person on the other side will parse it using their own table and the resulting text will be different. And sometimes when this mixup happens, the index in the text may land on some internal control character in the other table that is not meant for rendering.
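A rough Python sketch of that mixup, assuming the sender writes UTF-8 and the reader guesses Windows-1252 or Latin-1 (the specific encodings and sample strings are my own choices for illustration):

```python
import unicodedata

# Sender writes "café" as UTF-8; the accented letter becomes two bytes.
original = "caf\u00e9"
raw = original.encode("utf-8")

# Reader wrongly parses those bytes with the Windows-1252 table:
# each of the two bytes is looked up as its own character.
garbled = raw.decode("cp1252")
print(garbled)  # cafÃ©

# If you know which wrong table was used, the damage can be undone.
repaired = garbled.encode("cp1252").decode("utf-8")
print(repaired == original)  # True

# Control-character case: the euro sign is byte 0x80 in Windows-1252,
# but index 0x80 in Latin-1 is a non-printing control character.
euro_bytes = "\u20ac".encode("cp1252")
misread = euro_bytes.decode("latin-1")
category = unicodedata.category(misread)
print(category)  # Cc (the Unicode category for control characters)
```

The repair step only works here because both wrong guesses are known. In the wild you usually have to guess which table the text went through.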
These days the problem is “mostly fixed” by the near-universal adoption of a single reference table (Unicode) that aims to include everything you may ever need (even a lot of emojis) - but this large table means that a character may need more bytes to represent its index, so the file size for the same text can be larger than it would be with a small non-universal table.
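For a feel of the size cost, here is a small Python sketch that counts bytes per character, assuming UTF-8 as the encoding of the universal table (the helper name `utf8_len` and the sample characters are mine):

```python
def utf8_len(s: str) -> int:
    """Number of bytes the string takes when encoded as UTF-8."""
    return len(s.encode("utf-8"))

# Sample characters: plain letter, accented letter, euro sign, emoji.
samples = ["A", "\u00e9", "\u20ac", "\U0001F605"]
sizes = [utf8_len(ch) for ch in samples]
print(sizes)  # [1, 2, 3, 4]

# For comparison, a single-byte national table spends 1 byte on the accented letter.
latin1_size = len("\u00e9".encode("latin-1"))
print(latin1_size)  # 1
```

So with UTF-8, plain English text costs nothing extra, and the growth only shows up for characters outside basic ASCII.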
Exactly. My point is to move to a single universal standard that is used by literally everything so this never happens. Just cut off everything that can’t be updated, and it can just sink or swim based on how well it can parse the new table.
Fuck all that ancient non-updateable shit. There’s no good reason that old table still exists, much less that it’s still possible to use it this side of 2000.
Obviously this has legacy problems, but fuck those systems, everyone gotta get new shit now, tough shit. The old table should be cause for new shit to fail compiling in the first place. Shouldn’t be possible to use it.
Let’s just make forward progress, and lose the chains.
That’s the joke. 😅