1 year ago



Raj Raj

Format response content with the Python requests module

I have a web page that I can access from my server. It is a plain directory listing, and its contents are as follows:

xys.server.com - /xys/reports/
[To Parent Directory]

  3/4/2021  6:09 AM        <dir> All_Master
  3/4/2021  6:09 AM        <dir> Hartland
  3/4/2021  6:09 AM        <dir> Hauppauge
  3/4/2021  6:09 AM        <dir> Hazelwood
 2/15/2019  7:41 AM        58224 NetBackup Retention and Full Backup Occupancy.xlsx
  1/1/2022 11:00 AM        23959 OpsCenter_All_Master_Server_Backup_Report_01_01_2022_10_00_45_259_AM_49.zip
  2/1/2022 11:00 AM        18989 OpsCenter_All_Master_Server_Backup_Report_01_02_2022_10_00_04_813_AM_4.zip
  3/1/2022 11:00 AM        18969 OpsCenter_All_Master_Server_Backup_Report_01_03_2022_10_00_24_664_AM_17.zip
  4/1/2021 10:00 AM        21709 OpsCenter_All_Master_Server_Backup_Report_01_04_2021_10_00_02_266_AM_31.zip
  5/1/2021 10:00 AM        27491 OpsCenter_All_Master_Server_Backup_Report_01_05_2021_10_00_27_655_AM_11.zip
  6/1/2021 10:00 AM        21260 OpsCenter_All_Master_Server_Backup_Report_01_06_2021_10_00_54_053_AM_19.zip
  7/1/2021 10:00 AM        19898 OpsCenter_All_Master_Server_Backup_Report_01_07_2021_10_00_12_544_AM_42.zip
  8/1/2021 10:00 AM        22642 OpsCenter_All_Master_Server_Backup_Report_01_08_2021_10_00_28_384_AM_25.zip
  9/1/2021 10:00 AM        19426 OpsCenter_All_Master_Server_Backup_Report_01_09_2021_10_00_43_851_AM_70.zip
 10/1/2021 10:01 AM        19149 OpsCenter_All_Master_Server_Backup_Report_01_10_2021_10_01_00_422_AM_7.zip
 11/1/2021 10:00 AM        19638 OpsCenter_All_Master_Server_Backup_Report_01_11_2021_10_00_15_326_AM_20.zip
 12/1/2021 11:00 AM        19375 OpsCenter_All_Master_Server_Backup_Report_01_12_2021_10_00_29_943_AM_13.zip
  1/2/2022 11:00 AM        22281 OpsCenter_All_Master_Server_Backup_Report_02_01_2022_10_00_45_803_AM_37.zip
  2/2/2022 11:00 AM        19435 OpsCenter_All_Master_Server_Backup_Report_02_02_2022_10_00_05_577_AM_71.zip
  3/2/2022 11:00 AM        19380 OpsCenter_All_Master_Server_Backup_Report_02_03_2022_10_00_24_973_AM_90.zip
  4/2/2021 10:00 AM        21411 OpsCenter_All_Master_Server_Backup_Report_02_04_2021_10_00_03_069_AM_56.zip
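Each row of the listing above follows a fixed pattern: a date (M/D/YYYY), a time (h:mm AM/PM), either <dir> or a size in bytes, and the entry name, which may contain spaces. A minimal sketch, assuming every entry row looks exactly like the ones shown, that splits such text lines with a regular expression (ROW_RE and parse_listing_line are hypothetical names):

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern: date, time, "<dir>" or a byte count, then the name (may contain spaces).
ROW_RE = re.compile(
    r"^\s*(\d{1,2}/\d{1,2}/\d{4})\s+(\d{1,2}:\d{2}\s+[AP]M)\s+(<dir>|\d+)\s+(.+?)\s*$"
)


def parse_listing_line(line: str) -> Optional[Tuple[str, str, str, str]]:
    """Return (date, time, size_or_dir, name), or None for lines that are not entries."""
    m = ROW_RE.match(line)
    return m.groups() if m else None


rows = [
    "  3/4/2021  6:09 AM        <dir> All_Master",
    " 2/15/2019  7:41 AM        58224 NetBackup Retention and Full Backup Occupancy.xlsx",
    "[To Parent Directory]",
]
for r in rows:
    print(parse_listing_line(r))
```

The header line "[To Parent Directory]" does not match the pattern, so it comes back as None and can simply be skipped.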

I need to extract the contents of this page in a structured format. I am using the requests module, but the response is unstructured HTML and difficult to parse. My code is:

import requests

req = requests.get(url)  # url: address of the directory listing page
print(req.content.decode('utf-8'))

The output looks like this:

<pre><A HREF="/webreports/">[To Parent Directory]</A><br><br>  3/4/2021  6:09 AM        &lt;dir&gt; <A HREF="/webreports/admin/All_Master/">All_Master</A><br>  3/4/2021  6:09 AM        &lt;dir&gt; <A HREF="/webreports/admin/Hartland/">Hartland</A><br>  3/4/2021  6:09 AM        &lt;dir&gt; <A HREF="/webreports/admin/Hauppauge/">Hauppauge</A><br>  3/4/2021  6:09 AM        &lt;dir&gt; <A HREF="/webreports/admin/Hazelwood/">Hazelwood</A><br> 2/15/2019  7:41 AM        58224 <A HREF="/webreports/admin/NetBackup%20Retention%20and%20Full%20Backup%20Occupancy.xlsx">NetBackup Retention and Full Backup Occupancy.xlsx</A><br>  1/1/2022 11:00 AM        23959 <A HREF="/webreports/admin/OpsCenter_All_Master_Server_Backup_Report_01_01_2022_10_00_45_259_AM_49.zip">OpsCenter_All_Master_Server_Backup_Report_01_01_2022_10_00_45_259_AM_49.zip</A><br>  2/1/2022 11:00 AM        18989 <A HREF="/webreports/admin/OpsCenter_All_Master_Server_Backup_Report_01_02_2022_10_00_04_813_AM_4.zip">OpsCenter_All_Master_Server_Backup_Report_01_02_2022_10_00_04_813_AM_4.zip</A><br>  3/1/2022 11:00 AM        18969 <A HREF="/webreports/admin/OpsCenter_All_Master_Server_Backup_Report_01_03_2022_10_00_24_664_AM_17.zip">OpsCenter_All_Master_Server_Backup_Report_01_03_2022_10_00_24_664_AM_17.zip</A><br>  4/1/2021 10:00 AM        21709 <A HREF="/webreports/admin/OpsCenter_All_Master_Server_Backup_Report_01_04_2021_10_00_02_266_AM_31.zip">OpsCenter_All_Master_Server_Backup_Report_01_04_2021_10_00_02_266_AM_31.zip</A><br>  5/1/2021 10:00 AM        27491 <A HREF="/webreports/admin/OpsCenter_All_Master_Server_Backup_Report_01_05_2021_10_00_27_655_AM_11.zip">OpsCenter_All_Master_Server_Backup_Report_01_05_2021_10_00_27_655_AM_11.zip</A><br>  6/1/2021 10:00 AM        21260 <A HREF="/webreports/admin/OpsCenter_All_Master_Server_Backup_Report_01_06_2021_10_00_54_053_AM_19.zip">OpsCenter_All_Master_Server_Backup_Report_01_06_2021_10_00_54_053_AM_19.zip</A><br>  7/1/2021 10:00 AM        19898 <A 
HREF="/webreports/admin/OpsCenter_All_Master_Server_Backup_Report_01_07_2021_10_00_12_544_AM_42.zip">OpsCenter_All_Master_Server_Backup_Report_01_07_2021_10_00_12_544_AM_42.zip</A><br>  8/1/2021 10:00 AM        22642 <A HREF="/webreports/admin/OpsCenter_All_Master_Server_Backup_Report_01_08_2021_10_00_28_384_AM_25.zip">OpsCenter_All_Master_Server_Backup_Report_01_08_2021_10_00_28_384_AM_25.zip</A><br>  9/1/2021 10:00 AM        19426 <A HREF="/webreports/admin/OpsCenter_All_Master_Server_Backup_Report_01_09_2021_10_00_43_851_AM_70.zip">OpsCenter_All_Master_Server_Backup_Report_01_09_2021_10_00_43_851_AM_70.zip</A><br> 10/1/2021 10:01 AM        19149 <A HREF="/webreports/admin/OpsCenter_All_Master_Server_Backup_Report_01_10_2021_10_01_00_422_AM_7.zip">OpsCenter_All_Master_Server_Backup_Report_01_10_2021_10_01_00_422_AM_7.zip</A><br> 11/1/2021 10:00 AM        19638 <A HREF="/webreports/admin/OpsCenter_All_Master_Server_Backup_Report_01_11_2021_10_00_15_326_AM_20.zip">OpsCenter_All_Master_Server_Backup_Report_01_11_2021_10_00_15_326_AM_20.zip</A><br> 12/1/2021 11:00 AM        19375 <A HREF="/webreports/admin/OpsCenter_All_Master_Server_Backup_Report_01_12_2021_10_00_29_943_AM_13.zip">OpsCenter_All_Master_Server_Backup_Report_01_12_2021_10_00_29_943_AM_13.zip</A><br>  1/2/2022 11:00 AM        22281 <A HREF="/webreports/admin/OpsCenter_All_Master_Server_Backup_Report_02_01_2022_10_00_45_803_AM_37.zip">OpsCenter_All_Master_Server_Backup_Report_02_01_2022_10_00_45_803_AM_37.zip</A><br>  2/2/2022 11:00 AM        19435 <A HREF="/webreports/admin/OpsCenter_All_Master_Server_Backup_Report_02_02_2022_10_00_05_577_AM_71.zip">OpsCenter_All_Master_Server_Backup_Report_02_02_2022_10_00_05_577_AM_71.zip</A><br>  3/2/2022 11:00 AM        19380 <A HREF="/webreports/admin/OpsCenter_All_Master_Server_Backup_Report_02_03_2022_10_00_24_973_AM_90.zip">OpsCenter_All_Master_Server_Backup_Report_02_03_2022_10_00_24_973_AM_90.zip</A><br>  4/2/2021 10:00 AM        21411 <A 
HREF="/webreports/admin/OpsCenter_All_Master_Server_Backup_Report_02_04_2021_10_00_03_069_AM_56.zip">OpsCenter_All_Master_Server_Backup_Report_02_04_2021_10_00_03_069_AM_56.zip</A><br>  5/2/2021 10:00 AM        24191 <A HREF="/webreports/admin/OpsCenter_All_Master_Server_Backup_Report_02_05_2021_10_00_28_556_AM_14.zip">OpsCenter_All_Master_Server_Backup_Report_02_05_2021_10_00_28_556_AM_14.zip</A><br>  6/2/2021 10:00 AM        21675 <A HREF="/webreports/admin/OpsCenter_All_Master_Server_Backup_Report_02_06_2021_10_00_54_962_AM_73.zip">OpsCenter_All_Master_Server_Backup_Report_02_06_2021_10_00_54_962_AM_73.zip</A><br>  7/2/2021 10:00 AM        19954 <A HREF="/webreports/admin/OpsCenter_All_Master_Server_Backup_Report_02_07_2021_10_00_13_058_AM_31.zip">OpsCenter_All_Master_Server_Backup_Report_02_07_2021_10_00_13_058_AM_31.zip</A><br>  8/2/2021 10:00 AM        21085 <A HREF="/webreports/admin/OpsCenter_All_Master_Server_Backup_Report_02_08_2021_10_00_28_778_AM_79.zip">OpsCenter_All_Master_Server_Backup_Report_02_08_2021_10_00_28_778_AM_79.zip</A><br>  9/2/2021 10:00 AM        19691 <A HREF="/webreports/admin/OpsCenter_All_Master_Server_Backup_Report_02_09_2021_10_00_44_294_AM_5.zip">OpsCenter_All_Master_Server_Backup_Report_02_09_2021_10_00_44_294_AM_5.zip</A><br> 10/2/2021 10:01 AM        23477 <A HREF="/webreports/admin/OpsCenter_All_Master_Server_Backup_Report_02_10_2021_10_01_00_793_AM_9.zip">OpsCenter_All_Master_Server_Backup_Report_02_10_2021_10_01_00_793_AM_9.zip</A><br> 11/2/2021 10:00 AM        2
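Although the raw HTML is one long run, it is regular: each entry is "date time size-or-&lt;dir&gt; <A HREF=...>name</A><br>". A minimal sketch, assuming that pattern holds for the whole page, that walks the markup with the standard-library html.parser module (no third-party parser assumed) and collects one dict per link; ListingParser and META_RE are hypothetical names:

```python
import re
from html.parser import HTMLParser

# Text before each link: date, time, then "<dir>" or a size in bytes.
META_RE = re.compile(r"(\d{1,2}/\d{1,2}/\d{4})\s+(\d{1,2}:\d{2}\s+[AP]M)\s+(<dir>|\d+)")


class ListingParser(HTMLParser):
    """Hypothetical helper: collects one dict per <A HREF> entry in the listing."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True (the default) turns "&lt;dir&gt;" into "<dir>"
        self.entries = []
        self._before = ""  # text seen since the last <br>
        self._href = None  # href of the link currently open, if any
        self._name = ""

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <A HREF=...> arrives as "a"/"href".
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._name = ""
        elif tag == "br":
            self._before = ""

    def handle_data(self, data):
        if self._href is not None:
            self._name += data
        else:
            self._before += data

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            m = META_RE.search(self._before)
            if m:  # "[To Parent Directory]" has no date/size, so it is skipped
                date, time, size = m.groups()
                self.entries.append({
                    "date": date,
                    "time": time,
                    "is_dir": size == "<dir>",
                    "size": None if size == "<dir>" else int(size),
                    "name": self._name,
                    "href": self._href,
                })
            self._href = None


sample = (
    '<pre><A HREF="/webreports/">[To Parent Directory]</A><br><br>'
    '  3/4/2021  6:09 AM        &lt;dir&gt; <A HREF="/webreports/admin/All_Master/">All_Master</A><br>'
    ' 2/15/2019  7:41 AM        58224 <A HREF="/webreports/admin/NetBackup%20Retention%20and%20Full%20Backup%20Occupancy.xlsx">'
    'NetBackup Retention and Full Backup Occupancy.xlsx</A><br>'
)
parser = ListingParser()
parser.feed(sample)
parser.close()
for entry in parser.entries:
    print(entry)
```

With the code from the question, parser.feed(req.content.decode('utf-8')) would take the place of the print call. The href values are kept as-is (URL-encoded), so they can be joined to the server's base URL to download individual files.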

This is very unstructured.

Please suggest a way to turn this content into a structured, readable form (for example, one record per file with its date, size, and name) that is easy to parse.
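Once each entry is split into (date, time, size_or_dir, name), the strings can be turned into typed records and written out in a standard format such as CSV. A minimal sketch, assuming month/day/year order (the "2/15/2019" entry confirms month comes first) and using a few rows copied from the listing as sample input; to_record is a hypothetical helper:

```python
import csv
import io
from datetime import datetime

# Sample rows in the (date, time, size_or_dir, name) shape of the listing above.
rows = [
    ("3/4/2021", "6:09 AM", "<dir>", "All_Master"),
    ("1/1/2022", "11:00 AM", "23959", "OpsCenter_All_Master_Server_Backup_Report_01_01_2022_10_00_45_259_AM_49.zip"),
    ("4/1/2021", "10:00 AM", "21709", "OpsCenter_All_Master_Server_Backup_Report_01_04_2021_10_00_02_266_AM_31.zip"),
]


def to_record(date, time, size, name):
    """Hypothetical helper: convert one parsed row into typed fields."""
    # Assumes M/D/YYYY and a 12-hour clock with AM/PM, as in the listing.
    modified = datetime.strptime(f"{date} {time}", "%m/%d/%Y %I:%M %p")
    return {
        "modified": modified.isoformat(sep=" "),
        "is_dir": size == "<dir>",
        "size_bytes": None if size == "<dir>" else int(size),
        "name": name,
    }


# ISO timestamps sort correctly as plain strings.
records = sorted((to_record(*r) for r in rows), key=lambda rec: rec["modified"])

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["modified", "is_dir", "size_bytes", "name"])
writer.writeheader()
writer.writerows(records)
print(buf.getvalue())
```

Writing to a StringIO keeps the sketch self-contained; passing an open file object to csv.DictWriter instead would save the table to disk.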

python

httpwebresponse

0 Answers
