Problem with xml file after upgrade to 1.6.1.3

An evolving, supported alternative to Rovi
Forum rules
★ Download the latest EPG123 here: http://epg123.garyan2.net <> Setup guide here: http://epg123.garyan2.net/downloads/epg123_Guide.pdf
Post Reply
tirebiter

Posts: 24
Joined: Mon Jul 03, 2017 8:33 pm
Location:

HTPC Specs: Show details

Problem with xml file after upgrade to 1.6.1.3

#1

Post by tirebiter » Sat Feb 13, 2021 1:06 am

Hi,

I have written a utility that loads all the TV shows from the epg123.xml (now epg123.xmltv) file into two SQL SERVER tables. One for channels and the other for shows. I then export that data into a text file which can be imported into Excel. I use this to peruse upcoming premiere shows and movies, among other things.

After the upgrade, the SQL Server script which imports the XML into the database stopped working. Here's why:

The old xml file had this format:
<channel id="EPG123.11331.schedulesdirect.org">
<display-name>WCBS</display-name>
<display-name>WCBS</display-name>
<display-name>2 WCBS</display-name>
<display-name>2</display-name>
<display-name>CBS</display-name>
<icon src="https://schedulesdirect-api20141201.s3. ... w_270h.png" />
</channel>

The new xml fil has this format:
<channel id="EPG123.11331.schedulesdirect.org">
<display-name>WCBS</display-name>
<display-name>2 WCBS</display-name>
<display-name>2</display-name>
<display-name>CBS</display-name>
<icon src="https://schedulesdirect-api20141201.s3. ... w_270h.png" width="360" height="270" />
</channel>

Note that the second <dislplay-name> tag from the old file is missing from the new file. Presumably, because they are identical. When they are different in the old file, the second tag is present in the new file.

My code relies on the fixed locations of these tags. Here's how it's being inserted into SQL Server:
ELECT *
into channels
FROM OPENXML (@hdoc, '/tv/channel', 2)
WITH (
id varchar(50) '@id',
chan1 varchar(50) 'display-name[1]',
chan2 varchar(50) 'display-name[2]',
chan3 varchar(50) 'display-name[3]',
chan4 float 'display-name[4]',
chan5 varchar(50) 'display-name[5]',
icon varchar(100))

How do I detect when one (or maybe more) <display-name> tags are not provided for a specific TV channel?

I know this isn't a normal request, but I really like using this utility.

Regards,
Dan

User avatar
garyan2

Posts: 5837
Joined: Fri Nov 27, 2015 7:23 pm
Location:

HTPC Specs: Show details

#2

Post by garyan2 » Sat Feb 13, 2021 1:40 am

I don't speak SQL, but can you detect which display-name contains digits only and count up 2 from there?
- Gary
Keeping WMC alive beyond January 2020. http://epg123.garyan2.net

NOYB

Posts: 88
Joined: Thu Sep 10, 2020 8:03 am
Location:

HTPC Specs: Show details

#3

Post by NOYB » Sat Feb 13, 2021 2:38 am

Count the number of <display-name> tags.
If there are 5 then continue as is.
If there are 4 then insert the second one with the contents of the first.

User avatar
garyan2

Posts: 5837
Joined: Fri Nov 27, 2015 7:23 pm
Location:

HTPC Specs: Show details

#4

Post by garyan2 » Sat Feb 13, 2021 2:49 am

The only problem with that is this last change to xmltv file to be compliant with the DTD, there could be multiple channel numbers in there so you could have 5, 7, 9, 11 or 4, 6, 8, 10. The last line is the affiliate, but if there is no affiliate for the channel then it is reduced by 1. Can't do odd/even counts.
- Gary
Keeping WMC alive beyond January 2020. http://epg123.garyan2.net

Space

Posts: 2220
Joined: Sun Jun 02, 2013 9:44 pm
Location:

HTPC Specs: Show details

#5

Post by Space » Sat Feb 13, 2021 5:40 am

What is the purpose of all those display-names? Do you really need them all for you purposes? Why not just load the first one in to the DB and ignore all the rest?

I don't know much about the XMLTV format, but I think it is strange to have multiple display-names that may differ in count for each channel. How do you tell which is which when they all have the same "display-name" identifier and you cannot rely on the position to determine which is which?

tirebiter

Posts: 24
Joined: Mon Jul 03, 2017 8:33 pm
Location:

HTPC Specs: Show details

#6

Post by tirebiter » Sat Feb 13, 2021 4:09 pm

NOYB wrote:
Sat Feb 13, 2021 2:38 am
Count the number of <display-name> tags.
If there are 5 then continue as is.
If there are 4 then insert the second one with the contents of the first.
I confess to not have a very strong knowledge on SQL Server's XML support. I don't know if I can look ahead and see how many tags there are, or what their content is. I am investigating that now. I'll report back with my findings.

tirebiter

Posts: 24
Joined: Mon Jul 03, 2017 8:33 pm
Location:

HTPC Specs: Show details

#7

Post by tirebiter » Sat Feb 13, 2021 4:18 pm

Space wrote:
Sat Feb 13, 2021 5:40 am
What is the purpose of all those display-names? Do you really need them all for you purposes? Why not just load the first one in to the DB and ignore all the rest?

I don't know much about the XMLTV format, but I think it is strange to have multiple display-names that may differ in count for each channel. How do you tell which is which when they all have the same "display-name" identifier and you cannot rely on the position to determine which is which?
The whole system is reference only for me. I eventually load the data into Excel so I have a comprehensive list of all the upcoming TV offerings that I can filter and sort to my purposes. While all the varied display-name tags aren't necessary for me, I particularly like the numeric one (the channel number by itself).

I don't think your second question is directed to me. I've wondered the same thing about why the same tag name was reused and how epg123 distinguishes between therm, but I'm sure there's a good reason, and it's not my concern.

tirebiter

Posts: 24
Joined: Mon Jul 03, 2017 8:33 pm
Location:

HTPC Specs: Show details

#8

Post by tirebiter » Sat Feb 13, 2021 4:28 pm

Here is my interim solution. I reverted my epg123 installation to the version with the fixed number of tags. I will continue to see if I can make the necessary checks on the tag counts/values in SQL Server, so I can use the current epg123. If not, I may install the older epg123 on a different PC for the sole purpose of providing input to my utility, and upgrade the PC that records shows on WMC so it has the latest and greatest.

I want to thank Gary and whoever assists him for this remarkable application, as well as his prompt and extremely insightful postings on this forum. Thanks to everyone else for suggestions and ideas.

NOYB

Posts: 88
Joined: Thu Sep 10, 2020 8:03 am
Location:

HTPC Specs: Show details

#9

Post by NOYB » Sat Feb 13, 2021 9:39 pm

garyan2 wrote:
Sat Feb 13, 2021 2:49 am
The only problem with that is this last change to xmltv file to be compliant with the DTD, there could be multiple channel numbers in there so you could have 5, 7, 9, 11 or 4, 6, 8, 10. The last line is the affiliate, but if there is no affiliate for the channel then it is reduced by 1. Can't do odd/even counts.
Under what conditions does the multiple channel numbers occur and how are the display-name entries ordered within the channel id tag?
A channel here, KGW 8 (8.1) which is also broadcast on 23 (8.10). The sub channels too.
But only the one I have included in the guide is in the xmltv.

User avatar
garyan2

Posts: 5837
Joined: Fri Nov 27, 2015 7:23 pm
Location:

HTPC Specs: Show details

#10

Post by garyan2 » Sun Feb 14, 2021 2:46 am

Your 8.1 (KGWDT) and 8.10 (KGWZLD) channels are considered different stations with different station IDs so they won't be in the same channel construct of the xmltv file.

The display-names from epg123 are constructed like the following:
  • Callsign
  • Name (if different from callsign)
  • Channel number + callsign (if configured to include channel numbers)
  • Channel number (if configured to include channel numbers)
If the station is on more than 1 channel number, then it will repeat the last 2 for each one.

Finally the last entry will be the affiliate broadcaster which is typically on for the OTA channels.

So for an example, my local cableco has ActionMAX HD (Pacific) on 3 channels, and they will all be within the channel "epg123.59950.schedulesdirect.org"
  • ACMXHDP
  • ActionMAX HD (Pacific)
  • 224 ACMXHDP
  • 224
  • 1222 ACMXHDP
  • 1222
  • 1224 ACMXHDP
  • 1224
- Gary
Keeping WMC alive beyond January 2020. http://epg123.garyan2.net

Space

Posts: 2220
Joined: Sun Jun 02, 2013 9:44 pm
Location:

HTPC Specs: Show details

#11

Post by Space » Sun Feb 14, 2021 3:12 am

Again, I don't understand how this can work. How is the program that reads in the XML supposed to know what is what in that list?

The only one you can be sure of is the first one, all the others appear to be optional, so you can have (for example):

callsign ABC
affiliate ABC+

Or you can have:

callsign ABC
name ABC+

They are identical without having any way of knowing the second entry for the first example is the affiliate and the second one for the second example is the name.

NOYB

Posts: 88
Joined: Thu Sep 10, 2020 8:03 am
Location:

HTPC Specs: Show details

#12

Post by NOYB » Sun Feb 14, 2021 3:28 am

I'm not seeming to get it either. Did a brief xmltv format search and didn't come across anything like that.

User avatar
garyan2

Posts: 5837
Joined: Fri Nov 27, 2015 7:23 pm
Location:

HTPC Specs: Show details

#13

Post by garyan2 » Sun Feb 14, 2021 3:39 am

From the xmltv.dtd

Code: Select all

Each channel has one id attribute, which must be unique and should
preferably be in the form suggested by RFC2838 (the 'broadcast'
element of the grammar in that RFC, in other words, a DNS-like name
but without any URI scheme).  Then one or more display names which are
shown to the user.  You might want a different display name for
different languages, but also you can have more than one name for the
same language.  Names listed earlier are considered 'more canonical'.

Since the display name is just there as a way for humans to refer to
the channel, it's acceptable to just put the channel number if it's
fairly universal among viewers of the channel.  But remember that this
isn't an official statement of what channel number has been
allocated, and the same number might be used for a different channel
somewhere else.

...

<!-- A user-friendly name for the channel - maybe even a channel
number.  List the most canonical / common ones first and the most
obscure names last.  The lang attribute follows RFC 1766.
-->
<!ELEMENT display-name (#PCDATA)>
<!ATTLIST display-name lang CDATA #IMPLIED>
The contents and order of the display-name elements, if even included, are up to the designer. There is no standard to dictate what will be first, second, or last. I just included patterns that I have seen other developers use and it will be up to the consuming software to determine which it wants to use, or none at all.

This is also why I made including the channel numbers optional to remove all the noise.
- Gary
Keeping WMC alive beyond January 2020. http://epg123.garyan2.net

Space

Posts: 2220
Joined: Sun Jun 02, 2013 9:44 pm
Location:

HTPC Specs: Show details

#14

Post by Space » Sun Feb 14, 2021 6:32 am

It seems that these values are meant for a human to look at and choose, so there is no built-in way to determine what is what other than the first one on the list is the most "canonical" and then they get less so.

So again, I would say, just use the first one and ignore all the others if you are not going to be passing them on to a human to pick manually (which in the case of loading this in to SQL, it appears you are not).

If you will, then I would just put them all in the same text field with a delimiter rather than giving each one it's own DB field. Alternatively, make X number of fields (maximum display-names you think there will be) and then populate them sequentially, leaving the rest null. Of course this means if you make 10 fields and there happens to be 11 display-names, you will have to drop one.

NOYB

Posts: 88
Joined: Thu Sep 10, 2020 8:03 am
Location:

HTPC Specs: Show details

#15

Post by NOYB » Sun Feb 14, 2021 6:51 pm

If include channel numbers is enabled then it should be possible to use regex to determine field type by content and position.

1st is Callsign

2nd is Name if content is not a channel number + callsign. Or if 3rd and 4th are a channel number + callsign and channel number pair.

channel number + callsign and channel number pairs.
...
...

last is Affiliate if content is not a channel number. Or if last -2 and -1 are a channel number + callsign and channel number pair.

Otherwise if include channel numbers is not enabled I see no way to distinguish between a name and affiliate field. Unless both exist, or neither exist.

Channel Number + Callsign RegEx

Code: Select all

/^\d+(\.\d+)?\s+<callsign>$/
String begins with one or more decimal digits: ^\d+
Optionally followed by a decimal point and one or more decimal digits: (\.\d+)?
Followed by one or more white spaces: \s+
Followed by the callsign to end of string: <callsign>$


Channel Number RegEx

Code: Select all

/^\d+(\.\d+)?$/
String begins with one or more decimal digits: ^\d+
Optionally followed by a decimal point and one or more decimal digits to end of string: (\.\d+)?$

Post Reply