1 year ago
#303006
Raj Raj
Format response content with python requests module
I have a web page that I can access from my server. Its contents are shown below.
xys.server.com - /xys/reports/
[To Parent Directory]
3/4/2021 6:09 AM <dir> All_Master
3/4/2021 6:09 AM <dir> Hartland
3/4/2021 6:09 AM <dir> Hauppauge
3/4/2021 6:09 AM <dir> Hazelwood
2/15/2019 7:41 AM 58224 NetBackup Retention and Full Backup Occupancy.xlsx
1/1/2022 11:00 AM 23959 OpsCenter_All_Master_Server_Backup_Report_01_01_2022_10_00_45_259_AM_49.zip
2/1/2022 11:00 AM 18989 OpsCenter_All_Master_Server_Backup_Report_01_02_2022_10_00_04_813_AM_4.zip
3/1/2022 11:00 AM 18969 OpsCenter_All_Master_Server_Backup_Report_01_03_2022_10_00_24_664_AM_17.zip
4/1/2021 10:00 AM 21709 OpsCenter_All_Master_Server_Backup_Report_01_04_2021_10_00_02_266_AM_31.zip
5/1/2021 10:00 AM 27491 OpsCenter_All_Master_Server_Backup_Report_01_05_2021_10_00_27_655_AM_11.zip
6/1/2021 10:00 AM 21260 OpsCenter_All_Master_Server_Backup_Report_01_06_2021_10_00_54_053_AM_19.zip
7/1/2021 10:00 AM 19898 OpsCenter_All_Master_Server_Backup_Report_01_07_2021_10_00_12_544_AM_42.zip
8/1/2021 10:00 AM 22642 OpsCenter_All_Master_Server_Backup_Report_01_08_2021_10_00_28_384_AM_25.zip
9/1/2021 10:00 AM 19426 OpsCenter_All_Master_Server_Backup_Report_01_09_2021_10_00_43_851_AM_70.zip
10/1/2021 10:01 AM 19149 OpsCenter_All_Master_Server_Backup_Report_01_10_2021_10_01_00_422_AM_7.zip
11/1/2021 10:00 AM 19638 OpsCenter_All_Master_Server_Backup_Report_01_11_2021_10_00_15_326_AM_20.zip
12/1/2021 11:00 AM 19375 OpsCenter_All_Master_Server_Backup_Report_01_12_2021_10_00_29_943_AM_13.zip
1/2/2022 11:00 AM 22281 OpsCenter_All_Master_Server_Backup_Report_02_01_2022_10_00_45_803_AM_37.zip
2/2/2022 11:00 AM 19435 OpsCenter_All_Master_Server_Backup_Report_02_02_2022_10_00_05_577_AM_71.zip
3/2/2022 11:00 AM 19380 OpsCenter_All_Master_Server_Backup_Report_02_03_2022_10_00_24_973_AM_90.zip
4/2/2021 10:00 AM 21411 OpsCenter_All_Master_Server_Backup_Report_02_04_2021_10_00_03_069_AM_56.zip
I need to get the contents of this page in a structured format. I am using the requests module, but the response is unstructured HTML and difficult to parse. My code is below.
import requests

req = requests.get(url)  # url holds the address of the directory listing page
print(req.content.decode('utf-8'))
The output looks like this:
<pre><A HREF="/webreports/">[To Parent Directory]</A><br><br> 3/4/2021 6:09 AM <dir> <A HREF="/webreports/admin/All_Master/">All_Master</A><br> 3/4/2021 6:09 AM <dir> <A HREF="/webreports/admin/Hartland/">Hartland</A><br> 3/4/2021 6:09 AM <dir> <A HREF="/webreports/admin/Hauppauge/">Hauppauge</A><br> 3/4/2021 6:09 AM <dir> <A HREF="/webreports/admin/Hazelwood/">Hazelwood</A><br> 2/15/2019 7:41 AM 58224 <A HREF="/webreports/admin/NetBackup%20Retention%20and%20Full%20Backup%20Occupancy.xlsx">NetBackup Retention and Full Backup Occupancy.xlsx</A><br> 1/1/2022 11:00 AM 23959 <A HREF="/webreports/admin/OpsCenter_All_Master_Server_Backup_Report_01_01_2022_10_00_45_259_AM_49.zip">OpsCenter_All_Master_Server_Backup_Report_01_01_2022_10_00_45_259_AM_49.zip</A><br> 2/1/2022 11:00 AM 18989 <A HREF="/webreports/admin/OpsCenter_All_Master_Server_Backup_Report_01_02_2022_10_00_04_813_AM_4.zip">OpsCenter_All_Master_Server_Backup_Report_01_02_2022_10_00_04_813_AM_4.zip</A><br> 3/1/2022 11:00 AM 18969 <A HREF="/webreports/admin/OpsCenter_All_Master_Server_Backup_Report_01_03_2022_10_00_24_664_AM_17.zip">OpsCenter_All_Master_Server_Backup_Report_01_03_2022_10_00_24_664_AM_17.zip</A><br> 4/1/2021 10:00 AM 21709 <A HREF="/webreports/admin/OpsCenter_All_Master_Server_Backup_Report_01_04_2021_10_00_02_266_AM_31.zip">OpsCenter_All_Master_Server_Backup_Report_01_04_2021_10_00_02_266_AM_31.zip</A><br> 5/1/2021 10:00 AM 27491 <A HREF="/webreports/admin/OpsCenter_All_Master_Server_Backup_Report_01_05_2021_10_00_27_655_AM_11.zip">OpsCenter_All_Master_Server_Backup_Report_01_05_2021_10_00_27_655_AM_11.zip</A><br> 6/1/2021 10:00 AM 21260 <A HREF="/webreports/admin/OpsCenter_All_Master_Server_Backup_Report_01_06_2021_10_00_54_053_AM_19.zip">OpsCenter_All_Master_Server_Backup_Report_01_06_2021_10_00_54_053_AM_19.zip</A><br> 7/1/2021 10:00 AM 19898 <A 
HREF="/webreports/admin/OpsCenter_All_Master_Server_Backup_Report_01_07_2021_10_00_12_544_AM_42.zip">OpsCenter_All_Master_Server_Backup_Report_01_07_2021_10_00_12_544_AM_42.zip</A><br> 8/1/2021 10:00 AM 22642 <A HREF="/webreports/admin/OpsCenter_All_Master_Server_Backup_Report_01_08_2021_10_00_28_384_AM_25.zip">OpsCenter_All_Master_Server_Backup_Report_01_08_2021_10_00_28_384_AM_25.zip</A><br> 9/1/2021 10:00 AM 19426 <A HREF="/webreports/admin/OpsCenter_All_Master_Server_Backup_Report_01_09_2021_10_00_43_851_AM_70.zip">OpsCenter_All_Master_Server_Backup_Report_01_09_2021_10_00_43_851_AM_70.zip</A><br> 10/1/2021 10:01 AM 19149 <A HREF="/webreports/admin/OpsCenter_All_Master_Server_Backup_Report_01_10_2021_10_01_00_422_AM_7.zip">OpsCenter_All_Master_Server_Backup_Report_01_10_2021_10_01_00_422_AM_7.zip</A><br> 11/1/2021 10:00 AM 19638 <A HREF="/webreports/admin/OpsCenter_All_Master_Server_Backup_Report_01_11_2021_10_00_15_326_AM_20.zip">OpsCenter_All_Master_Server_Backup_Report_01_11_2021_10_00_15_326_AM_20.zip</A><br> 12/1/2021 11:00 AM 19375 <A HREF="/webreports/admin/OpsCenter_All_Master_Server_Backup_Report_01_12_2021_10_00_29_943_AM_13.zip">OpsCenter_All_Master_Server_Backup_Report_01_12_2021_10_00_29_943_AM_13.zip</A><br> 1/2/2022 11:00 AM 22281 <A HREF="/webreports/admin/OpsCenter_All_Master_Server_Backup_Report_02_01_2022_10_00_45_803_AM_37.zip">OpsCenter_All_Master_Server_Backup_Report_02_01_2022_10_00_45_803_AM_37.zip</A><br> 2/2/2022 11:00 AM 19435 <A HREF="/webreports/admin/OpsCenter_All_Master_Server_Backup_Report_02_02_2022_10_00_05_577_AM_71.zip">OpsCenter_All_Master_Server_Backup_Report_02_02_2022_10_00_05_577_AM_71.zip</A><br> 3/2/2022 11:00 AM 19380 <A HREF="/webreports/admin/OpsCenter_All_Master_Server_Backup_Report_02_03_2022_10_00_24_973_AM_90.zip">OpsCenter_All_Master_Server_Backup_Report_02_03_2022_10_00_24_973_AM_90.zip</A><br> 4/2/2021 10:00 AM 21411 <A 
HREF="/webreports/admin/OpsCenter_All_Master_Server_Backup_Report_02_04_2021_10_00_03_069_AM_56.zip">OpsCenter_All_Master_Server_Backup_Report_02_04_2021_10_00_03_069_AM_56.zip</A><br> 5/2/2021 10:00 AM 24191 <A HREF="/webreports/admin/OpsCenter_All_Master_Server_Backup_Report_02_05_2021_10_00_28_556_AM_14.zip">OpsCenter_All_Master_Server_Backup_Report_02_05_2021_10_00_28_556_AM_14.zip</A><br> 6/2/2021 10:00 AM 21675 <A HREF="/webreports/admin/OpsCenter_All_Master_Server_Backup_Report_02_06_2021_10_00_54_962_AM_73.zip">OpsCenter_All_Master_Server_Backup_Report_02_06_2021_10_00_54_962_AM_73.zip</A><br> 7/2/2021 10:00 AM 19954 <A HREF="/webreports/admin/OpsCenter_All_Master_Server_Backup_Report_02_07_2021_10_00_13_058_AM_31.zip">OpsCenter_All_Master_Server_Backup_Report_02_07_2021_10_00_13_058_AM_31.zip</A><br> 8/2/2021 10:00 AM 21085 <A HREF="/webreports/admin/OpsCenter_All_Master_Server_Backup_Report_02_08_2021_10_00_28_778_AM_79.zip">OpsCenter_All_Master_Server_Backup_Report_02_08_2021_10_00_28_778_AM_79.zip</A><br> 9/2/2021 10:00 AM 19691 <A HREF="/webreports/admin/OpsCenter_All_Master_Server_Backup_Report_02_09_2021_10_00_44_294_AM_5.zip">OpsCenter_All_Master_Server_Backup_Report_02_09_2021_10_00_44_294_AM_5.zip</A><br> 10/2/2021 10:01 AM 23477 <A HREF="/webreports/admin/OpsCenter_All_Master_Server_Backup_Report_02_10_2021_10_01_00_793_AM_9.zip">OpsCenter_All_Master_Server_Backup_Report_02_10_2021_10_01_00_793_AM_9.zip</A><br> 11/2/2021 10:00 AM 2
This output is unstructured and hard to work with.
Please suggest a way to make this content more readable so that the data is easy to parse.
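For illustration, here is a minimal sketch of one possible approach. It assumes every entry follows the pattern visible in the output above: a date, a time, AM/PM, then either <dir> or a size in bytes, then an <A HREF="..."> link. The names ENTRY_RE, parse_listing and sample are hypothetical and used only for this example. It also assumes dates are month/day/year (the 2/15/2019 entry suggests so) and that the locale parses AM/PM in English.

```python
import re
from datetime import datetime

# One listing entry: date, time, AM/PM, "<dir>" or a byte count, then the link.
# \s+ between "<A" and "HREF" also tolerates a line break at that point.
ENTRY_RE = re.compile(
    r'(\d{1,2}/\d{1,2}/\d{4})\s+(\d{1,2}:\d{2})\s+(AM|PM)\s+'
    r'(<dir>|&lt;dir&gt;|\d+)\s+'
    r'<A\s+HREF="([^"]*)">([^<]*)</A>',
    re.IGNORECASE,
)


def parse_listing(html):
    """Return one dict per file or directory found in the listing HTML."""
    entries = []
    for date, time, ampm, size, href, name in ENTRY_RE.findall(html):
        is_dir = not size.isdigit()
        entries.append({
            # Assumption: the listing uses month/day/year dates.
            "modified": datetime.strptime(
                f"{date} {time} {ampm.upper()}", "%m/%d/%Y %I:%M %p"
            ),
            "is_dir": is_dir,
            "size": None if is_dir else int(size),
            "name": name,
            "href": href,
        })
    return entries


# Small sample copied from the output above; in practice pass req.text instead.
sample = (
    '<pre><A HREF="/webreports/">[To Parent Directory]</A><br><br>'
    ' 3/4/2021 6:09 AM <dir> '
    '<A HREF="/webreports/admin/All_Master/">All_Master</A><br>'
    ' 2/15/2019 7:41 AM 58224 '
    '<A HREF="/webreports/admin/NetBackup%20Retention%20and%20Full%20Backup%20Occupancy.xlsx">'
    'NetBackup Retention and Full Backup Occupancy.xlsx</A><br>'
)

for entry in parse_listing(sample):
    print(entry)
```

The "[To Parent Directory]" link is skipped because no date precedes it. The resulting list of dicts can be sorted, filtered, or written to CSV. If the server's markup differs from the sample, the pattern would need adjusting.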
python
httpwebresponse
0 Answers