Forum: Local Yocal BBS

please quote these back

From Maurice Kinal@1:153/7001.2989 to Nancy Backus on Sunday, April 28, 2019 06:25:38

Hey Nancy!

Here are 4 utf8 characters that all contain a trailing 0x81 byte:
Á Ё с Ӂ

These 4 all contain a trailing 0x83 byte:
Ã σ Ѓ у

These 4 all contain a trailing 0x88 byte:
È ψ Ј ш

These 4 all contain a trailing 0x8d byte:
� � � �

These 4 all contain a trailing 0x8f byte:
Ï ď Џ я

These 4 all contain a trailing 0x90 byte:
Ð Ő А ѐ

These 4 all contain a trailing 0x98 byte:
Ø Ř И ј

Finally, these 4 all contain a trailing 0x9d byte:
Ý ŝ Н ѝ

I believe they cover all the holes in CP1250, CP1251 and CP1252. If my hunch is correct then the characters with trailing 0x81, 0x8d, 0x8f, 0x90 and 0x9d should end up with their trailing bytes stripped and therefore no longer be valid utf8 characters. If only the ones with 0x8d get their trailing bytes stripped then mark's hunch about the end of line is most likely correct. If none of them get stripped of their trailing bytes then ?!?!?!?!?!

Life is good,
Maurice

... Cybertoasts of note:
2020-01-01 is 248 days from now and falls on a Wednesday.
2024-11-05 is 2018 days from now and falls on a Tuesday.
--- GNU bash, version 5.0.7(1)-release (aarch64-raspi3b+-linux-gnu)
* Origin: Little Mikey's CanadARM - Ladysmith BC, Canada (1:153/7001.2989)
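
A rough way to check the two claims in the post above, offered as a sketch rather than anything the original poster ran. It assumes Python's bundled `cp1250`, `cp1251` and `cp1252` codecs reflect the Microsoft code pages; `SAMPLES`, `trailing_byte` and `codepage_holes` are names made up for illustration. The 0x8d row is left out because the archived copy no longer contains those characters.

```python
# Sample rows from the post, written as escapes so the source stays ASCII.
# Key: the trailing byte the post claims; value: the four characters.
SAMPLES = {
    0x81: "\u00c1\u0401\u0441\u04c1",
    0x83: "\u00c3\u03c3\u0403\u0443",
    0x88: "\u00c8\u03c8\u0408\u0448",
    0x8F: "\u00cf\u010f\u040f\u044f",
    0x90: "\u00d0\u0150\u0410\u0450",
    0x98: "\u00d8\u0158\u0418\u0458",
    0x9D: "\u00dd\u015d\u041d\u045d",
}


def trailing_byte(ch: str) -> int:
    """Return the last byte of the UTF-8 encoding of one character."""
    return ch.encode("utf-8")[-1]


def codepage_holes(codec: str) -> set:
    """Return the byte values in 0x80..0x9F that the codec refuses to decode."""
    holes = set()
    for value in range(0x80, 0xA0):
        try:
            bytes([value]).decode(codec)
        except UnicodeDecodeError:
            holes.add(value)
    return holes


def all_holes() -> set:
    """Union of the undefined bytes across the three Windows code pages."""
    return set().union(*(codepage_holes(c) for c in ("cp1250", "cp1251", "cp1252")))


if __name__ == "__main__":
    for expected, chars in SAMPLES.items():
        print(hex(expected), [hex(trailing_byte(ch)) for ch in chars])
    print(sorted(hex(b) for b in all_holes()))
```

With Python's codec tables, the holes in `cp1252` alone are the five bytes the post expects to be stripped, and the union over all three code pages is the same eight bytes the test message was built around.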

From Ozz Nixon@1:1/123 to Maurice Kinal on Monday, April 29, 2019 09:28:12

On 2019-04-28 06:25:38 +0000, Maurice Kinal -> Nancy Backus said:

Hey Nancy!

Here are 4 utf8 characters that all contain a trailing 0x81 byte:
Á Ё с Ӂ

These 4 all contain a trailing 0x83 byte:
Ã σ Ѓ у

These 4 all contain a trailing 0x88 byte:
È ψ Ј ш

These 4 all contain a trailing 0x8d byte:
�
�
�
�

These 4 all contain a trailing 0x8f byte:
Ï ď Џ я

These 4 all contain a trailing 0x90 byte:
Ð Ő А ѐ

These 4 all contain a trailing 0x98 byte:
Ø Ř И ј

Finally, these 4 all contain a trailing 0x9d byte:
Ý ŝ Н ѝ

I believe they cover all the holes in CP1250, CP1251 and CP1252. If my hunch is correct then the characters with trailing 0x81, 0x8d, 0x8f, 0x90 and 0x9d should end up with their trailing bytes stripped and therefore no longer be valid utf8 characters. If only the ones with 0x8d get their trailing bytes stripped then mark's hunch about the end of line is most likely correct. If none of them get stripped of their trailing bytes then ?!?!?!?!?!

Life is good,
Maurice

... Cybertoasts of note:
2020-01-01 is 248 days from now and falls on a Wednesday.
2024-11-05 is 2018 days from now and falls on a Tuesday.

--- ExchangeBBS NNTP Server v3.1/Linux64
* Origin: (1:1/123)

From Maurice Kinal@1:153/7001 to Ozz Nixon on Monday, April 29, 2019 14:34:07

Hey Ozz!

These 4 all contain a trailing 0x8d byte:

We have a winner! ... sort of. Do you know exactly where/what stripped those bytes?

Life is good,
Maurice

... Don't cry for me I have vi.
--- GNU bash, version 5.0.7(1)-release (x86_64-pc-linux-gnu)
* Origin: Little Mikey's Brain - Ladysmith BC, Canada (1:153/7001)

From Maurice Kinal@2:280/464.113 to Nancy Backus on Wednesday, May 01, 2019 12:51:38

Hallo Nancy!

[the above set came on four separate lines for at least one bbs]

That particular bbs would be substituting 0x0d for 0x8d while the rest are stripping the 0x8d from the utf8 characters completely, which of course completely corrupts them. In this example I only posted 4 16 bit characters with 0x8d as the trailing byte but there are definitely more that will get corrupted, not to mention the 24 bit and 32 bit characters. I posted two emoticons (32 bit) to Ozz that will get corrupted by the same bbses. They will no longer be valid utf8 characters whether just stripped or substituted with an ascii (C0) control code. The substituted ones will be more obvious as to what form of corruption occurs.

Sad times indeed. :::sigh:::

Life is good,
Maurice

... Cybertoasts of note:
2020-01-01 is 245 days from now and falls on a Wednesday.
2024-11-05 is 2015 days from now and falls on a Tuesday.
--- GNU bash, version 5.0.7(1)-release (x86_64-pc-linux-gnu)
* Origin: Little Mikey's EuroPoint - Ladysmith BC, Canada (2:280/464.113)
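
The two kinds of damage described above can be reproduced in a few lines. This is only a sketch: the function names are invented here, and the sample characters (U+040D and the emoji U+1F60D) are picked for illustration because their UTF-8 encodings end in 0x8d; they are not necessarily the ones posted in the thread.

```python
def strip_soft_cr(raw: bytes) -> bytes:
    """The 'lossy' version: drop every 0x8d byte."""
    return raw.replace(b"\x8d", b"")


def soft_cr_to_cr(raw: bytes) -> bytes:
    """The 'switcheroo' version: turn every 0x8d into a real CR (0x0d)."""
    return raw.replace(b"\x8d", b"\x0d")


def is_valid_utf8(raw: bytes) -> bool:
    """True if the bytes decode cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# A 2-byte character (d0 8d) and a 4-byte emoji (f0 9f 98 8d), space separated.
original = "\u040d \U0001f60d".encode("utf-8")

if __name__ == "__main__":
    for label, data in (
        ("original", original),
        ("stripped", strip_soft_cr(original)),
        ("substituted", soft_cr_to_cr(original)),
    ):
        print(label, data.hex(), "valid" if is_valid_utf8(data) else "INVALID")
```

Either way the lead bytes survive and the result no longer decodes as UTF-8. The substituted form additionally gains a line break after each damaged character, which is why that variant is easier to spot in a quote.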

From mark lewis@1:3634/12.73 to Maurice Kinal on Wednesday, May 01, 2019 09:45:16

On 2019 May 01 12:51:38, you wrote to Nancy Backus:

That particular bbs would be substituting 0x0d for 0x8d while the rest are stripping the 0x8d

FWIW1: there are two places where 0x8d may be acted on...
1. the tosser may strip by ignoring completely and skipping
2. the BBS may strip or convert to 0x0d while displaying the message
or when packaging it for offline mail

FWIW2: sbbsecho, the FTN tosser for Synchronet BBS, has just added the option to strip or not 0x8d characters... the default is to leave them alone and ignore them by not reacting to them at all... they are copied as they are and applies to both echomail and netmail... this is the good (proper?) form of ignoring for this situation ;)

----- snip -----
http://cvs.synchro.net/commitlog.ssjs#38249

Log Message:
Add option to strip so-called "Soft CRs" (0x8D) from incoming messages.
The default is off (no stripping). Previously, Soft-CRs were always stripped, but this behavior is now seen as an anachronism as CP437 char 141 is an important non-English language character and used as such in FidoNet msgs.
----- snip -----

)\/(ark

Always Mount a Scratch Monkey
Do you manage your own servers? If you are not running an IDS/IPS yer doin' it wrong...
... SystemD would be a good OS but it needs a decent init system.
---
* Origin: (1:3634/12.73)

From Maurice Kinal@2:280/464.113 to mark lewis on Wednesday, May 01, 2019 17:03:46

Hallo mark!

FWIW1: there are two places where 0x8d may be acted on...
1. the tosser may strip by ignoring completely and skipping

I am not sure what you mean by that but if the end result is what Nancy shows in her quotes of the trailing 0x8d in the 4 utf8 characters then that would be the "may strip" result which leaves only the leading byte ... or what I prefer it be called the masterbyte. :::evil grin:::

2. the BBS may strip or convert to 0x0d while displaying the

message or when packaging it for offline mail

Which is what happened on at least one BBS according to Nancy, which matches with what Ozz's quote of the same message showed. Either way that does not bode well since 0x8d is a well used trailing byte in utf8.

Also worth mentioning is that two of the supported codepages, IBM848 and IBM866, use 0x8d as the exact same character - "CYRILLIC CAPITAL LETTER EN" - while IBM850 uses 0x8d as the same character as IBM437 - "LATIN SMALL LETTER I WITH GRAVE".

An interesting aside: U+040D, known as "CYRILLIC CAPITAL LETTER I WITH GRAVE", will also work in the phrase, "It is all fun and games until someone loses an �.", since the trailing byte happens to be 0x8d. Although in this case the masterbyte :::snicker::: 0xd0 will survive, it is no longer a valid utf8 character without its needed trailing byte.

Life is good,
Maurice

... Don't cry for me I have vi.
--- GNU bash, version 5.0.7(1)-release (x86_64-pc-linux-gnu)
* Origin: Little Mikey's EuroPoint - Ladysmith BC, Canada (2:280/464.113)
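
The code page claims above are easy to confirm with Python's standard codecs, at least for the pages Python ships. A small sketch follows; `name_of_byte` and `utf8_hex` are illustrative helper names, and IBM848 is left out because, as far as I know, Python has no built-in codec for it.

```python
import unicodedata


def name_of_byte(value: int, codec: str) -> str:
    """Decode one byte with a legacy code page and return its Unicode name."""
    return unicodedata.name(bytes([value]).decode(codec))


def utf8_hex(ch: str) -> str:
    """Return the UTF-8 bytes of a character as space separated hex."""
    return " ".join(f"{b:02x}" for b in ch.encode("utf-8"))


if __name__ == "__main__":
    for codec in ("cp437", "cp850", "cp866"):
        print(codec, "0x8d ->", name_of_byte(0x8D, codec))
    print("U+040D", unicodedata.name("\u040d"), "->", utf8_hex("\u040d"))
```

So the same byte is a meaningful letter in every one of these code pages, and it is also the second half of U+040D in UTF-8, which is what makes unconditional stripping destructive.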

From mark lewis@1:3634/12.73 to Maurice Kinal on Wednesday, May 01, 2019 14:01:06

On 2019 May 01 17:03:46, you wrote to me:

FWIW1: there are two places where 0x8d may be acted on...
1. the tosser may strip by ignoring completely and skipping

I am not sure what you mean by that

there are two ways to "ignore" something when processing it...

1. a. read in data
b. find character to ignore
c. ignore it and do not write it to the output (aka skip)

2. a. read in data
b. find character to ignore
c. ignore it and write it to the output (true ignore)

a lot of code does #1 when it should do #2...

but if the end result is what Nancy shows in her quotes of the
trailing 0x8d in the 4 utf8 characters then that would be the "may
strip" result which leaves only the leading byte ... or what I prefer
it be called the masterbyte. :::evil grin:::

hehehe... remember, though, that in nancy's case, she sees the characters after

1. they are originally written on a BBS
or imported into the BBS from offline mail
2. scanned out of the BBS to FTN packets
3. transferred across the wire
4. tossed in to the BBS from FTN packets
5. they're packed into her offline mail format
6. opened on her end and displayed in her reader

numbers 2, 4, 5, and 6 could result in her not seeing certain characters... number 1 could if the import from the offline mail upload package filtered them by ignoring them...

2. the BBS may strip or convert to 0x0d while displaying the
message or when packaging it for offline mail

Which is what happened on at least one BBS according to Nancy, which matches with what Ozz's quote of the same message showed. Either way that does not bode well since 0x8d is a well used trailing byte in utf8.

very true...

Also worth mentioning is that two of the supported codepages, IBM848
and IBM866, use 0x8d as the exact same character - "CYRILLIC CAPITAL LETTER EN" - while IBM850 uses 0x8d as the same character as IBM437 - "LATIN SMALL LETTER I WITH GRAVE".

there is that, as well...

An interesting aside: U+040D, known as "CYRILLIC CAPITAL LETTER I WITH GRAVE", will also work in the phrase, "It is all fun and games until someone loses an �.", since the trailing byte happens to be 0x8d. Although in this case the masterbyte :::snicker::: 0xd0 will survive, it is no longer a valid utf8 character without its needed trailing byte.

keep on with the poking... it may result in some real good for the network one day ;)

)\/(ark

Always Mount a Scratch Monkey
Do you manage your own servers? If you are not running an IDS/IPS yer doin' it wrong...
... I'm the trombone joke in this echo (it was a joke why they let me play).
---
* Origin: (1:3634/12.73)
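
The difference between the two ways of "ignoring" a byte, and the idea of walking the message path to find where a byte goes missing, can be sketched as follows. Everything here is hypothetical: the stage names are placeholders loosely based on the numbered path in the message above, not real tosser or BBS code.

```python
SOFT_CR = 0x8D


def skip_filter(data: bytes) -> bytes:
    """Variant 1: find the byte, then leave it out of the output (aka skip)."""
    return bytes(b for b in data if b != SOFT_CR)


def true_ignore_filter(data: bytes) -> bytes:
    """Variant 2: find the byte, take no action, and copy it through."""
    out = bytearray()
    for b in data:
        # A soft CR gets no special handling; it is written like any other byte.
        out.append(b)
    return bytes(out)


def first_lossy_stage(data: bytes, stages):
    """Return the name of the first stage whose output differs from its input.

    `stages` is a list of (name, function) pairs applied in order.
    Returns None when the data survives every stage unchanged.
    """
    for name, stage in stages:
        result = stage(data)
        if result != data:
            return name
        data = result
    return None


if __name__ == "__main__":
    payload = "\u040d".encode("utf-8")  # d0 8d
    path = [
        ("scan", true_ignore_filter),
        ("toss", skip_filter),
        ("pack", true_ignore_filter),
    ]
    print(first_lossy_stage(payload, path))
```

In practice nobody gets to run each hop in isolation like this, which is why quoting a known test message back through different systems is the workable substitute.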

From Maurice Kinal@2:280/464.113 to mark lewis on Wednesday, May 01, 2019 19:08:31

Hallo mark!

1c. ignore it and do not write it to the output (aka skip)

That sounds like what I called stripping in the previous post. Basically it has the same effect if it isn't in the output.

a lot of code does #1 when it should do #2...

I am not convinced it "should" do either but perhaps in certain cases there could be characters that might do "harm" such as some ansi bbses do to user's terminals. I used to have to run reset after logging out of bbses to get things back to normal after telnetting to them. This is what led me to where I am today as far as offline messaging is concerned. I want it all and not just what some sysop thinks I want since they are always wrong and make very bad software choices as a rule. This "loses an i" bug is a prime example methinks, although I still think there may be issues with the holes in the MS codepages that haven't reared their ugly heads ... or at least none that I am aware of. I did see on one Russian site ages ago a document which was identified as KOI8-R but was actually CP1251, and although the characters were all Cyrillic they obviously were wrong. I downloaded it and used iconv to change it to CP1251 just to make sure. Bottom line is there is way too much that goes awry when dealing with differing codepages and stripping out codes will definitely cause harm to messages.

that in nancy's case, she sees the characters after

Yes but Nancy's editor is perfect for testing since we both know for a fact it does no harm, even to utf8 characters which she cannot 'properly' render but she can see the 8 bit hex codes as they map out to IBM437. If anything is amiss it is obvious when she quotes whatever is of concern back as is in the case of the "loses an i" bug.

keep on with the poking... it may result in some real good for
the network one day

You too. Your call on this particular issue was bang on. I now bow to the master.

Life is good,
Maurice

... Don't cry for me I have vi.
--- GNU bash, version 5.0.7(1)-release (x86_64-pc-linux-gnu)
* Origin: Little Mikey's EuroPoint - Ladysmith BC, Canada (2:280/464.113)
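
The mislabelled KOI8-R document mentioned above is a good illustration of why a wrong code page label can be hard to notice: the bytes still decode, and the result is still Cyrillic. The sketch below uses Python's codecs instead of iconv, and the sample word and function names are made up for the example.

```python
import unicodedata


def relabel(text: str, wrong: str, right: str) -> str:
    """Undo a wrong label: re-encode with the wrong codec, decode with the right one."""
    return text.encode(wrong).decode(right)


def all_cyrillic(text: str) -> bool:
    """True if every character's Unicode name starts with CYRILLIC."""
    return all(unicodedata.name(ch).startswith("CYRILLIC") for ch in text)


# A Russian greeting stored as CP1251 bytes (escapes keep the source ASCII).
intended = "\u041f\u0440\u0438\u0432\u0435\u0442"
raw = intended.encode("cp1251")

# What a reader sees when the file is wrongly announced as KOI8-R.
as_koi8 = raw.decode("koi8_r")

if __name__ == "__main__":
    print("read as KOI8-R:", as_koi8, "| all Cyrillic:", all_cyrillic(as_koi8))
    print("read as CP1251:", relabel(as_koi8, "koi8_r", "cp1251"))
```

No decoding error is ever raised here, so only a human reader (or a language-aware heuristic) would notice the text is wrong.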

From mark lewis@1:3634/12.73 to Maurice Kinal on Wednesday, May 01, 2019 18:21:50

On 2019 May 01 19:08:30, you wrote to me:

1c. ignore it and do not write it to the output (aka skip)

That sounds like what I called stripping in the previous post. Basically it has the same effect if it isn't in the output.

yes and that's a problem that some coders don't seem to understand...

a lot of code does #1 when it should do #2...

I am not convinced it "should" do either but perhaps in certain cases there could be characters that might do "harm" such as some ansi bbses do to user's terminals. I used to have to run reset after logging out of bbses to get things back to normal after telnetting to them.

that's them or your terminal simply not resetting things properly... i see it, too, when displaying ANSIs and have also seen it when displaying animated web graphics when breaking out of them to return to the terminal...

This is what led me to where I am today as far as offline messaging is

[...]

Bottom line is there is way too much that goes awry when dealing with differing codepages and stripping out codes will definitely cause harm to messages.

agreed... remember, though, that much of fidonet was created by hobbyists... that's not a bad thing, though... just they don't always consider or even know "all the things"...

that in nancy's case, she sees the characters after

Yes but Nancy's editor is perfect for testing since we both know for a fact it does no harm, even to utf8 characters which she cannot
'properly' render but she can see the 8 bit hex codes as they map out
to IBM437. If anything is amiss it is obvious when she quotes
whatever is of concern back as is in the case of the "loses an i" bug.

true but that may be dependent on which BBS she is using for her replies at that time... i don't know if she is using my offline capabilities here or not... i do have another user that cycles uploading messages between three or four BBSes... plus there's whatever path the messages may take from where they upload their messages... this path used to be reliable and easily used to track
down problematic systems but with "fidoweb" and going "against the grain" of fidonet and not sending duplicates, there are numerous paths that may be taken...

keep on with the poking... it may result in some real good for the
network one day

You too. Your call on this particular issue was bang on. I now bow
to the master.

i'm no master but thank you :) :blush:

)\/(ark

Always Mount a Scratch Monkey
Do you manage your own servers? If you are not running an IDS/IPS yer doin' it wrong...
... If you don't like me tailgating, buy bumper stickers with bigger words.
---
* Origin: (1:3634/12.73)

From Maurice Kinal@2:280/464.113 to mark lewis on Thursday, May 02, 2019 03:03:52

Hallo mark!

that may be dependent on which BBS she is using for her replies
at that time

Yes but both she and I determined ages ago that anything funky in our messages was due to the BBS and not her editor. Her editor (uEmacs) is perfect so we both know for a fact that if there is something oddball happening with messaging that it is likely the bbs at fault or something in between. She knows what I am talking about ... even when she doesn't. :-)

Anyhow she helped confirm the "loses an i" bug, both the lossy and the switcheroo versions.

Life is good,
Maurice

... Don't cry for me I have vi.
--- GNU bash, version 5.0.7(1)-release (x86_64-pc-linux-gnu)
* Origin: Little Mikey's EuroPoint - Ladysmith BC, Canada (2:280/464.113)

From mark lewis@1:3634/12.73 to Maurice Kinal on Thursday, May 02, 2019 00:30:00

On 2019 May 02 03:03:52, you wrote to me:

that may be dependent on which BBS she is using for her replies at
that time

Yes but both she and I determined ages ago that anything funky in our messages was due to the BBS and not her editor.

understood... i'm looking at the additional separation between the tossers and the BBSes... offline readers may also "adjust things" for display... editors are something else and would only affect what is written or quoted... it is possible that what is quoted is original or it could be what the reader modifies for display... i know you know these things... it is my own affliction making me cover the details...

Her editor (uEmacs) is perfect so we both know for a fact that if
there is something oddball happening with messaging that it is likely
the bbs at fault or something in between. She knows what I am talking about ... even when she doesn't. :-)

hehehe... yup! i remember your discussions about the editor... i remember that she has, basically, a *nix type environment but in DOS using packet drivers for her networking... like back in the w95 and KA9Q days ;)

Anyhow she helped confirm the "loses an i" bug, both the lossy and the switcheroo versions.

she's very well versed in finding bugs :)

)\/(ark

Always Mount a Scratch Monkey
Do you manage your own servers? If you are not running an IDS/IPS yer doin' it wrong...
... Violence is now mainly organized and governmental. - B. Russell.
---
* Origin: (1:3634/12.73)

From Maurice Kinal@2:280/464.113 to mark lewis on Thursday, May 02, 2019 19:55:58

Hallo mark!

i know you know these things... it is my own infliction making
me cover the details...

And in this case it paid off. I wasn't aware of the 0x8d being used the way it
apparently is being used (EOL) although I do recall from back in the so-called "good old days" IBM using 0x85 for a '\r\n' replacement. I haven't actually run across anything that uses it but have seen more than a few references to 0x85 and IBM.

like back in the w95 and KA9Q days ;)

It goes further back than that. More around 89-91-ish. If I am not mistaken uEmacs was a Finnish programmer's idea for making DOS more Unixie and the latest release was sometime around 1991. I think Nancy's version was earlier. Anyhow it definitely is a keeper given how well it still works and I can understand her reluctance to upgrade. Also I think hubby hacked it a bit if I recall correctly.

Life is good,
Maurice

... Don't cry for me I have vi.
--- GNU bash, version 5.0.7(1)-release (x86_64-pc-linux-gnu)
* Origin: Little Mikey's EuroPoint - Ladysmith BC, Canada (2:280/464.113)
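
On the 0x85 aside: in Unicode that value is the C1 control NEL (next line), and Python's `cp037` codec maps the EBCDIC newline byte 0x15 to it. Whether that is the IBM usage being remembered above is an assumption on my part. The sketch below only shows that 0x85 acts as a line break once text is treated as Unicode, while the raw byte does not; the helper names are illustrative.

```python
NEL = "\x85"  # U+0085, NEXT LINE


def ebcdic_newline() -> str:
    """Decode the EBCDIC newline byte (0x15) using Python's cp037 codec."""
    return bytes([0x15]).decode("cp037")


def split_text(text: str) -> list:
    """str.splitlines() treats U+0085 as a line boundary."""
    return text.splitlines()


def split_bytes(raw: bytes) -> list:
    """bytes.splitlines() only knows about CR and LF, so 0x85 is left alone."""
    return raw.splitlines()


if __name__ == "__main__":
    print(repr(ebcdic_newline()))
    print(split_text("one" + NEL + "two"))
    print(split_bytes(b"one\x85two"))
```

The parallel with the soft CR is that a byte in the 0x80 to 0x9f range gets treated as a line ending by one layer and as ordinary data by another.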

From Nancy Backus@1:229/452 to mark lewis on Saturday, May 04, 2019 01:12:08

Quoting mark lewis to Maurice Kinal on 01-May-2019 18:21 <=-

Yes but Nancy's editor is perfect for testing since we both know for a
fact it does no harm, even to utf8 characters which she cannot
'properly' render but she can see the 8 bit hex codes as they map out
to IBM437. If anything is amiss it is obvious when she quotes
whatever is of concern back as is in the case of the "loses an i" bug.

true but that may be dependent on which BBS she is using for her
replies at that time... i don't know if she is using my offline capabilities here or not...

So far I've not been sending through your offline (in the new iteration
of the bbs), but I have been reading messages to me online, and then downloading from the echoes I'm active in, including this one... I've
been answering pretty exclusively through Tiny's BBS, since its BW door
allows me to utilize the longer subject lines (and people like Maurice
often use that full capacity) without truncating subjects like using QWK
does (including in the BW reader)....

i do have another user that cycles uploading messages between three
or four BBSes...

And I know who that is... <G> I tend to usually answer specific echoes
on specific bbses, and only use the other bbses as a backup when access
is temporarily gone... and to keep that manageable, keep up with regular message packet downloads for whichever echoes are available at the
various bbses... which also serves as a double-check that messages are
flowing, both mine and overall... :)

plus there's whatever path the messages may take from where they
upload their messages... this path used to be reliable and easily
used to track down problematic systems but with "fidoweb" and going "against the grain" of fidonet and not sending duplicates, there
are numerous paths that may be taken...

Yup... and I'm just as glad to not have to really deal with that in my
lowly position of user.... ;)

ttyl neb

... System Error #303: Power not on.

--- EzyBlueWave V3.00 01FB001F
* Origin: Tiny's BBS - http://www.tinysbbs.com (1:229/452)

From Ozz Nixon@1:1/123 to Nancy Backus on Friday, May 24, 2019 19:37:08

Here are 4 utf8 characters that all contain a trailing 0x81 byte:
Á Ё с Ӂ

These 4 all contain a trailing 0x83 byte:
Ã σ Ѓ у

These 4 all contain a trailing 0x88 byte:
È ψ Ј ш

These 4 all contain a trailing 0x8d byte:
� � � �

Ï ď Џ я

These 4 all contain a trailing 0x90 byte:
Ð Ő А ѐ

These 4 all contain a trailing 0x98 byte:
Ø Ř И ј

Finally, these 4 all contain a trailing 0x9d byte:
Ý ŝ Н ѝ

Sorry for the delay on this thread - still in the process of moving to
Florida. Anyway, what I see in Unison NNTP Client are the UTF8 A I D N
or A I D O characters as it should have been. However, in PCBoard 16, I
see the CP437 8bit character plus the character you're trailing each line
with. My PCBoard terminal (fTelnet) would render the UTF8, however, the
header did not contain ^aCHRS: UTF8 so it stays in the
current state (CP437 as detected during the ANSI detection routine).

So, some environments render the UTF8 w/o the required CHRS signature.

Ozz
--- ExchangeBBS FTN Tosser/JAM v1.19.04 (Beta 4.09)
* Origin: (1:1/123)
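
The behaviour described above (render as UTF-8 only when a CHRS kludge says so, otherwise stay in CP437) could look roughly like this. It is a sketch under assumptions, not code from any of the systems mentioned: the identifier table, the CP437 fallback and the function names are all made up for illustration, and kludge lines are taken to start with 0x01 and end with CR.

```python
# Assumed mapping from CHRS identifiers to Python codec names.
CODECS = {
    "UTF-8": "utf-8",
    "UTF8": "utf-8",
    "CP437": "cp437",
    "CP850": "cp850",
    "CP866": "cp866",
}

KLUDGE = b"\x01CHRS:"


def declared_charset(body: bytes):
    """Return the identifier from the first CHRS kludge line, or None."""
    for line in body.split(b"\r"):
        if line.startswith(KLUDGE):
            fields = line[len(KLUDGE):].split()
            if fields:
                return fields[0].decode("ascii", "replace").upper()
    return None


def decode_body(body: bytes, fallback: str = "cp437") -> str:
    """Decode the visible text using the declared charset, else the fallback."""
    codec = CODECS.get(declared_charset(body), fallback)
    visible = [ln for ln in body.split(b"\r") if not ln.startswith(b"\x01")]
    return b"\r".join(visible).decode(codec, errors="replace")


if __name__ == "__main__":
    text = "\u040d".encode("utf-8")  # d0 8d
    print(repr(decode_body(b"\x01CHRS: UTF-8 4\r" + text)))
    print(repr(decode_body(text)))
```

Without the kludge the two bytes of U+040D come out as two separate CP437 characters, which matches the "CP437 8bit character plus the trailing character" effect reported from PCBoard.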

From Ozz Nixon@1:1/123 to Nancy Backus on Friday, May 24, 2019 23:38:23

On 2019-05-01 01:22:08 +0000, Nancy Backus -> Maurice Kinal said:

Quoting Maurice Kinal to Nancy Backus on 28-Apr-2019 06:25 <=-

Here you go.... :)

Here are 4 utf8 characters that all contain a trailing 0x81 byte:
Á Ё с Ӂ

These 4 all contain a trailing 0x83 byte:
Ã σ Ѓ у

These 4 all contain a trailing 0x88 byte:
È ψ Ј ш

These 4 all contain a trailing 0x8d byte:
� � � �

These 4 all contain a trailing 0x8f byte:
Ï ď Џ я

These 4 all contain a trailing 0x90 byte:
Ð Ő А ѐ

These 4 all contain a trailing 0x98 byte:
Ø Ř И ј

Finally, these 4 all contain a trailing 0x9d byte:
Ý ŝ Н ѝ

Per my PCBoard 16 reply, this is from Unison NNTP Client for MAC OSX. I
see the UTF8 characters perfectly... (or so I assume perfectly)...

Ozz

--- ExchangeBBS NNTP Server v3.1/Linux64
* Origin: (1:1/123)

From Nancy Backus@1:229/452 to Ozz Nixon on Wednesday, May 29, 2019 15:56:48

Quoting Ozz Nixon to Nancy Backus on 24-May-2019 19:37 <=-

It was Maurice that was playing with this... all I did was quote it back
to him as requested.... :) Hopefully he will have seen your two
messages from your two different sources.... ;) Your comments would
probably mean more to him... :)

ttyl neb

Here are 4 utf8 characters that all contain a trailing 0x81 byte:
Á Ё с Ӂ

These 4 all contain a trailing 0x83 byte:
Ã σ Ѓ у

These 4 all contain a trailing 0x88 byte:
È ψ Ј ш

These 4 all contain a trailing 0x8d byte:
� � � �

Ï ď Џ я

These 4 all contain a trailing 0x90 byte:
Ð Ő А ѐ

These 4 all contain a trailing 0x98 byte:
Ø Ř И ј

Finally, these 4 all contain a trailing 0x9d byte:
Ý ŝ Н ѝ

Sorry for the delay on this thread - still in the process of moving to Florida. Anyway, what I see in Unison NNTP Client are the UTF8 A I D N
or A I D O characters as it should have been. However, in PCBoard 16,
I see the CP437 8bit character plus the character you're trailing each
line with. My PCBoard terminal (fTelnet) would render the UTF8,
however, the header did not contain ^aCHRS: UTF8 so it assumes to stay
in the current state (CP437 as detected during the ANSI detection routine).
So, some environments render the UTF8 w/o the required CHRS signature.

Ozz
-!- ExchangeBBS FTN Tosser/JAM v1.19.04 (Beta 4.09)
! Origin: (1:1/123)

--- EzyBlueWave V3.00 01FB001F
* Origin: Tiny's BBS - telnet://tinysbbs.com:3023 (1:229/452)

From Maurice Kinal@1:153/7001.2989 to Nancy Backus on Thursday, May 30, 2019 21:23:31

Hey Nancy!

It was Maurice that was playing with this

We already know the answer.

Life is good,
Maurice

... Cybertoasts of note:
2020-01-01 is 216 days from now and falls on a Wednesday.
2024-11-05 is 1986 days from now and falls on a Tuesday.
--- GNU bash, version 5.0.7(1)-release (aarch64-raspi3b+-linux-gnu)
* Origin: Little Mikey's CanadARM - Ladysmith BC, Canada (1:153/7001.2989)
