@REPLY: 1:153/7001 5d13cc9c
Common practice as definition is 7-Bit ascii not utf-8
@MSGID: 1:153/7001.2989 5d13d439
@CHRS: UTF-8 4
Hey Maurice!
Again I won't quote back but it is noticeably missing the '■э'
character at the beginning of the sentence, which happens to be 0x8d
in CP866. I am thinking this is a significant failure to communicate properly.
Life is good,
Maurice
... Don't cry for me I have vi.
--- GNU bash, version 5.0.7(1)-release (aarch64-raspi3b+-linux-gnu)
* Origin: Little Mikey's CanadARM - Ladysmith BC, Canada (1:153/7001.2989)
SEEN-BY: 103/705 153/7001 154/10 30 40 700 203/0 221/0 6 227/201 400
SEEN-BY: 229/310 426 240/5832 280/464 5003 5006 5555 292/854 310/31
SEEN-BY: 320/219 340/800 396/45 423/120 712/848 770/1 2452/250 3634/12
SEEN-BY: 5020/545
@PATH: 153/7001 154/10 280/464
Common practice as definition is 7-Bit ascii not utf-8
FTN Header versus actual message body conveying Unicode.
When I telnet to a SQL server that speaks Unicode only, it always
returns the following characters (pascal): #239#187#191
When I telnet to a web page that speaks Unicode, it too returns #239#187#191 plus the <!doctype html> etc.
So... would it not stand true that systems that are posting UTF8 do
the same introduction on the message body?
Then authors *know* it potentially has Unicode
and leave it damn well alone, and also parse it based upon UTF8
instead of 8bit char...
This is how I am coding things here, just based upon NexusSQL,
PremierSQL, MS SQL, Apache and Nexus Web Service. I do not have access
to my Oracle box nor the MySQL 5 server to see if they do the same
during the initial connection negotiation(s).
A quick google: It's the utf8 byte order mark. Some editors save the
BOM inside the file (in order to be used as a header) which regularly causes confusion because it is optional.
So, if we wanted to help enforce at a reader (or even tosser level)
how to handle, I would offer this up as a required BOM to the message
body that is UTF8.
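A minimal sketch of what a reader or tosser could do with such a BOM, assuming the body is available as raw bytes. The function names and the `cp437` fallback are hypothetical choices for illustration, not part of any FTN standard or existing tosser.

```python
import codecs
from typing import Tuple

# The three bytes written above in Pascal notation as #239#187#191.
UTF8_BOM = bytes([239, 187, 191])
assert UTF8_BOM == codecs.BOM_UTF8  # b"\xef\xbb\xbf"


def split_bom(body: bytes) -> Tuple[bool, bytes]:
    """Return (had_bom, body_without_bom) for a raw message body."""
    if body.startswith(UTF8_BOM):
        return True, body[len(UTF8_BOM):]
    return False, body


def decode_body(body: bytes, fallback: str = "cp437") -> str:
    """Decode as UTF-8 when a BOM is present, otherwise use the fallback.

    The fallback code page is a hypothetical default picked for this sketch.
    """
    had_bom, rest = split_bom(body)
    return rest.decode("utf-8") if had_bom else rest.decode(fallback)
```

Software that does not know about the BOM would show those three bytes as stray characters at the top of the message, which is the usual cost of an in-band marker.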
Just for the record, the message I am replying to -> MSGID: 1:1/123.0 5D164D19 <- never arrived at 1:153/7001 through the normal uplink 1:153/757. However it
was received via 1:154/10, which I am using as the main link for 1:153/7001.2989. Also it shows up on the Europoint which I use for verification purposes as well.
My guess is that since you are connected at 154/10, 153/7001 was
already listed in the seen by lines when it arrived here so it
was not sent to your node?
That would have been my guess as well except that it appears to be totally
random as to which msgs go missing. In the latest case it was a msg originating from afar, or sometimes one that originates from here ... 'here' being CanadARM
or Brain.
As far as I am concerned the missing 0x8d characters are in greater need of attention than missing msgs, although if they are missing elsewhere that might give it a higher priority. So far I seem to be the only one noticing.
Messages can and do arrive here by different paths. If a message
arrives here with your node in the seen bys you will not get a
copy of it, if it arrives by another path without your node in
the seen bys then you will. That is random, depends on who gets
here first.. :)
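The rule described above can be sketched roughly as follows. This is only an illustration of the SEEN-BY check, assuming 2D net/node addresses and the usual shorthand where a bare number reuses the previous net; the function names are made up for the example and no real tosser's code is being quoted.

```python
from typing import Iterable, Optional, Set


def parse_seen_by(lines: Iterable[str]) -> Set[str]:
    """Expand SEEN-BY lines into a set of 'net/node' strings.

    A bare number reuses the net of the previous entry, so
    '154/10 30 40' means 154/10, 154/30 and 154/40.
    """
    seen: Set[str] = set()
    for line in lines:
        net: Optional[str] = None
        for token in line.split():
            if token == "SEEN-BY:":
                continue
            if "/" in token:
                net, node = token.split("/", 1)
            else:
                node = token
            if net is not None:
                seen.add(f"{net}/{node}")
    return seen


def should_forward(link: str, seen_by: Set[str]) -> bool:
    """Skip any link that is already listed in the SEEN-BY set."""
    return link not in seen_by
```

With the SEEN-BY lines from the message at the top of this thread, `should_forward("153/7001", ...)` is false, so a copy arriving by that path would not be sent on to 153/7001.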
On 2019-06-28 02:01:09 +0000, Maurice Kinal -> Torsten Bamberg said:
FTN Header versus actual message body conveying Unicode.
When I telnet to a SQL server that speaks Unicode only, it always
returns the following characters (pascal): #239#187#191
When I telnet to a web page that speaks Unicode, it too returns
#239#187#191 plus the <!doctype html> etc.
So... would it not stand true that systems that are posting UTF8 do the
same introduction on the message body? Then authors *know* it
potentially has Unicode and leave it damn well alone, and also parse it based upon UTF8 instead of 8bit char...
This is how I am coding things here, just based upon NexusSQL,
PremierSQL, MS SQL, Apache and Nexus Web Service. I do not have access
to my Oracle box nor the MySQL 5 server to see if they do the same
during the initial connection negotiation(s).
A quick google: It's the utf8 byte order mark.
Some editors save the
BOM inside the file (in order to be used as a header) which regularly
causes confusion because it is optional.
So, if we wanted to help enforce at a reader (or even tosser level) how
to handle, I would offer this up as a required BOM to the message body
that is UTF8.
It's an idea. But that's not how *other* charsets/encodings work
So, if we wanted to help enforce at a reader (or even tosser
level) how to handle, I would offer this up as a required BOM to
the message body that is UTF8.
And why is that better than a header field ("control paragraph"
as defined in FTS-5003) which indicates UTF-8?
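For comparison, reading the header field is not much code either. A rough sketch, assuming the control line looks like the "@CHRS: UTF-8 4" at the top of this thread, where "@" stands in for the SOH (0x01) byte. The names `Chrs`, `parse_chrs` and `level_is_consistent` are hypothetical, and the level check assumes that 4 is the right level for UTF-8, as argued elsewhere in this thread.

```python
from typing import NamedTuple, Optional


class Chrs(NamedTuple):
    charset: str
    level: int


def parse_chrs(line: str) -> Optional[Chrs]:
    """Parse a CHRS control line such as '@CHRS: UTF-8 4'.

    Both the displayed '@' and a raw SOH (0x01) prefix are accepted.
    Returns None for any other line or a malformed CHRS line.
    """
    text = line.lstrip("\x01@").strip()
    if not text.upper().startswith("CHRS:"):
        return None
    fields = text[len("CHRS:"):].split()
    if len(fields) != 2:
        return None
    try:
        level = int(fields[1])
    except ValueError:
        return None
    return Chrs(fields[0].upper(), level)


def level_is_consistent(chrs: Chrs) -> bool:
    """Assumption for this sketch: UTF-8 should be announced with level 4."""
    return chrs.charset != "UTF-8" or chrs.level == 4
```

A reader could use `level_is_consistent` to flag a "CHRS: UTF-8 2" line rather than trust it blindly.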
Hallo Rob!
It's an idea. But that's not how *other* charsets/encodings work
Other than the existence of 8-bit characters, utf-8 is totally different from standard 8-bit character sets. If one is to scan msgs for 8-bit characters it won't help to decipher the message without knowing beforehand what the character set is, whereas with utf-8 it doesn't matter.
The "CHRS: UTF-8 4" is totally useless, especially when it is wrong, such as in "CHRS: UTF-8 2", which still happens.
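The point that utf-8 can be recognised without being told can be sketched like this. It is a heuristic, not a guarantee: a short run of 8-bit text in another character set could happen to be valid utf-8 by accident. The function name is made up for the example.

```python
def looks_like_utf8(body: bytes) -> bool:
    """True when the body has 8-bit bytes and all of them form valid UTF-8."""
    if body.isascii():
        return False  # pure 7-bit text: nothing to decide
    try:
        body.decode("utf-8")  # strict error handling is the default
    except UnicodeDecodeError:
        return False
    return True
```

No comparable test exists for telling one 8-bit code page from another, since every byte value is legal in most of them.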
So, if we wanted to help enforce at a reader (or even tosser
level) how to handle, I would offer this up as a required BOM to
the message body that is UTF8.
And why is that better than a header field ("control paragraph"
as defined in FTS-5003) which indicates UTF-8?
It isn't.
Well mannered implementations should gracefully handle this
situation
Hey Rob!
Well mannered implementations should gracefully handle this
situation
Do you mean like the 'ignoring' of 0x8d as opposed to deleting it?
Anyhow, for my part it really doesn't matter as there are no consequences for 'ignoring' ftsc standards with utf-8. With or without it, it is still utf8 ... unless of course there are any 0x8d trailing bytes, which will then be stripped, and then the kludge will be wrong.
I think that's a completely different issue.
I agree: tossers probably should not be stripping 0x8d's.
I don't think that has anything to do with "CHRS: UTF-8 2" vs
"CHRS: UTF-8 4".
Hallo Rob!
I think that's a completely different issue.
Not really. The fact that nothing is ever done about these "bugs" when raised with suitable evidence isn't going to change anything if my observations have been correct over the decades.
I agree: tossers probably should not be stripping 0x8d's.
Amen. There ought to be a law against it.
I don't think that has anything to do with "CHRS: UTF-8 2" vs
"CHRS: UTF-8 4".
It doesn't matter. 4, 2 or any other number will change nothing in the case of utf8. The 0x8d trailing bytes will all be stripped, as they will for all the 8-bit characters.
I posted a CP866 as an example and lo and behold the
single 16-bit character with the trailing byte 0x8d was deleted. The only difference is that in utf8 those bytes matter way more than they do in purely 8-bit land.
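A small demonstration of the damage, assuming a tosser that simply drops every 0x8d byte (the FTS-0001 "soft carriage return"). `strip_0x8d` is a stand-in written for this example, not code from any actual tosser.

```python
def strip_0x8d(body: bytes) -> bytes:
    """Mimic software that drops every 0x8D byte from a message body."""
    return body.replace(b"\x8d", b"")


# U+044D (Cyrillic small e) is two bytes in UTF-8; 0x8D is the trailing one.
utf8 = "\u044d".encode("utf-8")    # b"\xd1\x8d"
# U+041D (Cyrillic capital en) is the single byte 0x8D in CP866.
cp866 = "\u041d".encode("cp866")   # b"\x8d"

damaged = strip_0x8d(utf8)         # b"\xd1": a lead byte with nothing after it
try:
    damaged.decode("utf-8")
    still_valid = True
except UnicodeDecodeError:
    still_valid = False

lost = strip_0x8d(cp866)           # b"": the character is gone entirely
```

In the CP866 case one letter silently disappears. In the utf-8 case the leftover lead byte no longer decodes, so the damage can spread to how the rest of the line is displayed.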
I think that's a completely different issue.
Not really. The fact that nothing is ever done about these "bugs"
when raised with suitable evidence isn't going to change anything if
my observations have been correct over the decades.
If it's a problem for you, then switch software?
SBBSecho used to strip 0x8d's and since rev 3.113 (Apr-30-2019) no
longer does (at least, not by default).
See? "bugs" can be fixed, so long as someone has the source code.
I posted a CP866 as an example and lo and behold the single 16-bit
character with the trailing byte 0x8d was deleted. The only difference
is that in utf8 those bytes matter way more than they do in purely
8-bit land.
So find out what software is stripping the 0x8d's and start a campaign
to get that software "fixed".
If it's a problem for you, then switch software?
"bugs" can be fixed, so long as someone has the source code.
So find out what software is stripping the 0x8d's and start a
campaign to get that software "fixed".
That is why I am posting here in this particular echo.
Whoever/whatever is responsible is not saying, not that it would
really matter since there is no mechanism to force anyone to correct
their "bugs".
oh but there is... getting that mechanism into operation like
used to be done is a little harder to do these days for some
reason...
oh but there is... getting that mechanism into operation like used to
be done is a little harder to do these days for some reason...
I'll believe it when I see it.
Also I don't buy into the "good ol' days" when everything was peacho-keen-neato. I am sure some (most?) of that is selective
memory.
Anyhow I look forward to seeing the 0x8d bug die a horrible, miserable death. Make my day. :-)
the mechanism is the *Cs
i don't have a clue what you're speaking of
we have to find exactly which software it is causing it
the mechanism is the *Cs
So I brought this up in the wrong echo?
This seems like the place for these issues.
i don't have a clue what you're speaking of
I was just responding to the line that says, "getting that mechanism into operation like used to be done is a little harder to do these days for some
reason". It sounded like a "back in the good ol' days" type of thing. Sorry if I read that wrong.
we have to find exactly which software it is causing it
Good idea.
mark lewis wrote to Maurice Kinal <=-
have gone TITSUP (Total Inability To Support Usual Performance) for
You read The Register too, eh? :D