Description
This block of code is for Part 1, Step 1. The SF prefix refers to the SWL table. I fetched the HTML from the SWL mirror, used find() to locate the table rows that hold the data, and parsed each row's HTML string with regular expressions (regex). SF_arr accumulates the parsed fields, one list per row. Rows that end in "MovieView archive" and rows that end in "View archive" need separate patterns, so the second pattern is tried whenever the first one does not match. Once every row was parsed, I created the DataFrame from SF_arr with the appropriate columns.
[2]: SF_r = requests.get("https://cmsc320.github.io/files/top-50-solar-flares.html") #get request to access data from the SWL mirror
SF_root = BeautifulSoup(SF_r.content) #accesses the content from the SWL mirror
SF_arr = [] # 2D array for gathering the data
SF_regex = r"(\d+).*(X\d+\.*\d*).*(\d{4}/\d{2}/\d{2}).*(\d{4}).*(\d{2}:\d{2}).*(\d{2}:\d{2}).*(\d{2}:\d{2}).*(Movie).*(View\sarchive)" #regex for the data with MovieView Archive
SF_regex2 = r"(\d+).*(X\d+\.*\d*\+?).*(\d{4}/\d{2}/\d{2}).*(\d{4}).*(\d{2}:\d{2}).*(\d{2}:\d{2}).*(\d{2}:\d{2}).*(View\sarchive)" #regex for the data with View Archive
SF_rows = SF_root.find('div', id='SWL_Page').find('table').find('tbody').find_all('tr') #html with data for each row, looked up once
for row in SF_rows: #iterating through html with data for each row
    try: #try except is used in case SF_regex didn't match because of grouping MovieView Archive vs View Archive
        match = re.search(SF_regex, str(row)) #uses SF_regex for each row in the data with MovieView Archive
        SF_arr.append(list(match.groups())) #adds data from the 9 groups caught from SF_regex
    except AttributeError: #if SF_regex doesn't match the current html string because of the grouping with MovieView Archive and View Archive
        match = re.search(SF_regex2, str(row)) #uses SF_regex2 for each row in the data with View Archive
        SF_arr.append(list(match.groups())) #adds data from the 8 groups caught from SF_regex2
for i in range(len(SF_arr)): #iterates through the SF_arr
    if len(SF_arr[i]) == 9: #checks if an element (or an array in this case) has "Movie" and "View Archive" as separate elements
        SF_arr[i][7] = SF_arr[i][7] + SF_arr[i][8] #merges "Movie" and "View Archive" into "MovieView Archive"
        del SF_arr[i][8] #deletes the element with View Archive by position
#display DataFrame
display(SF_table)
0 1 X28 2003/11/04 0486 19:29 19:53 20:06
1 2 X20 2001/04/02 9393 21:32 21:51 22:03
2 3 X17.2 2003/10/28 0486 09:51 11:10 11:24
3 4 X17 2005/09/07 0808 17:17 17:40 18:03
4 5 X14.4 2001/04/15 9415 13:19 13:50 13:55
5 6 X10 2003/10/29 0486 20:37 20:49 21:01
6 7 X9.4 1997/11/06 8100 11:49 11:55 12:01
7 8 X9.3 2017/09/06 2673 11:53 12:02 12:10
8 9 X9 2006/12/05 0930 10:18 10:35 10:45
9 10 X8.3 2003/11/02 0486 17:03 17:25 17:39
10 11 X8.2 2017/09/10 2673 15:35 16:06 16:31
11 12 X7.1 2005/01/20 0720 06:36 07:01 07:26
12 13 X6.9 2011/08/09 1263 07:48 08:05 08:08
13 14 X6.5 2006/12/06 0930 18:29 18:47 19:00
14 15 X6.2 2005/09/09 0808 19:13 20:04 20:36
15 16 X6.2 2001/12/13 9733 14:20 14:30 14:35
16 17 X5.7 2000/07/14 9077 10:03 10:24 10:43
17 18 X5.6 2001/04/06 9415 19:10 19:21 19:31
18 19 X5.4 2012/03/07 1429 00:02 00:24 00:40
19 20 X5.4 2005/09/08 0808 20:52 21:06 21:17
20 21 X5.4 2003/10/23 0486 08:19 08:35 08:49
21 22 X5.3 2001/08/25 9591 16:23 16:45 17:04
22 23 X4.9 2014/02/25 1990 00:39 00:49 01:03
23 24 X4.9 1998/08/18 8307 22:10 22:19 22:28
24 25 X4.8 2002/07/23 0039 00:18 00:35 00:47
25 26 X4 2000/11/26 9236 16:34 16:48 16:56
26 27 X3.9 2003/11/03 0488 09:43 09:55 10:19
27 28 X3.9 1998/08/19 8307 21:35 21:45 21:50
28 29 X3.8 2005/01/17 0720 06:59 09:52 10:07
29 30 X3.7 1998/11/22 8384 06:30 06:42 06:49
30 31 X3.6 2005/09/09 0808 09:42 09:59 10:08
31 32 X3.6 2004/07/16 0649 13:49 13:55 14:01
32 33 X3.6 2003/05/28 0365 00:17 00:27 00:39
33 34 X3.4 2006/12/13 0930 02:14 02:40 02:57
34 35 X3.4 2001/12/28 9767 20:02 20:45 21:32
35 36 X3.3 2013/11/05 1890 22:07 22:12 22:15
36 37 X3.3 2002/07/20 0039 21:04 21:30 21:54
37 38 X3.3 1998/11/28 8395 04:54 05:52 06:13
38 39 X3.2 2013/05/14 1748 00:00 01:11 01:20
39 40 X3.1 2014/10/24 2192 21:07 21:41 22:13
40 41 X3.1 2002/08/24 0069 00:49 01:12 01:31
41 42 X3 2002/07/15 0030 19:59 20:08 20:14
42 43 X2.8 2013/05/13 1748 15:48 16:05 16:16
43 44 X2.8 2001/12/11 9733 07:58 08:08 08:14
44 45 X2.8 1998/08/18 8307 08:14 08:24 08:32
45 46 X2.7 2015/05/05 2339 22:05 22:11 22:15
46 47 X2.7 2003/11/03 0488 01:09 01:30 01:45
47 48 X2.7 1998/05/06 8210 07:58 08:09 08:20
48 49 X2.6 2005/01/15 0720 22:25 23:02 23:31
49 50 X2.6 2001/09/24 9632 09:32 10:38 11:09
movie
0 MovieView archive
1 MovieView archive
2 MovieView archive
3 MovieView archive
4 MovieView archive
5 MovieView archive
6 MovieView archive
7 MovieView archive
8 MovieView archive
9 MovieView archive
10 MovieView archive
11 MovieView archive
12 MovieView archive
13 MovieView archive
14 MovieView archive
15 MovieView archive
16 MovieView archive
17 MovieView archive
18 MovieView archive
19 MovieView archive
20 MovieView archive
21 MovieView archive
22 MovieView archive
23 View archive
24 MovieView archive
25 MovieView archive
26 MovieView archive
27 View archive
28 MovieView archive
29 MovieView archive
30 MovieView archive
31 MovieView archive
32 MovieView archive
33 MovieView archive
34 MovieView archive
35 MovieView archive
36 MovieView archive
37 MovieView archive
38 MovieView archive
39 MovieView archive
40 MovieView archive
41 MovieView archive
42 MovieView archive
43 MovieView archive
44 View archive
45 MovieView archive
46 MovieView archive
47 MovieView archive
48 MovieView archive
49 MovieView archive
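The row-parsing step for the SWL table can also be sketched with a single pattern in which the "Movie" prefix is optional, instead of two patterns and a try/except. This is only a minimal sketch: the sample rows use made-up, simplified markup (plain `<td>` cells), and `ROW_RE`, `SAMPLE_ROWS`, and `parse_row` are hypothetical names. The real page's markup may differ, so the pattern would need adjusting.

```python
import re

# Hypothetical, simplified row markup; the real SWL page may differ.
SAMPLE_ROWS = [
    "<tr><td>1</td><td>X28</td><td>2003/11/04</td><td>0486</td>"
    "<td>19:29</td><td>19:53</td><td>20:06</td><td>MovieView archive</td></tr>",
    "<tr><td>24</td><td>X4.9</td><td>1998/08/18</td><td>8307</td>"
    "<td>22:10</td><td>22:19</td><td>22:28</td><td>View archive</td></tr>",
]

ROW_RE = re.compile(
    r"<td>(\d+)</td>"                     # rank
    r"<td>(X\d+(?:\.\d+)?)</td>"          # classification, e.g. X28 or X17.2
    r"<td>(\d{4}/\d{2}/\d{2})</td>"       # date
    r"<td>(\d{4})</td>"                   # region
    r"<td>(\d{2}:\d{2})</td>"             # start time
    r"<td>(\d{2}:\d{2})</td>"             # maximum time
    r"<td>(\d{2}:\d{2})</td>"             # end time
    r"<td>((?:Movie)?View archive)</td>"  # "Movie" prefix is optional
)

def parse_row(row_html):
    """Return the eight captured fields as a list, or None if the row does not match."""
    match = ROW_RE.search(row_html)
    return list(match.groups()) if match else None

parsed = [parse_row(row) for row in SAMPLE_ROWS]
```

Because the optional prefix is captured together with "View archive", every parsed row has eight fields and no merge step is needed afterwards.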
↪%M’)) #the next three lines are for creating datetime objects with the␣
    SF_table.loc[k, 'start_time'] = start_datetime #the next three lines assign the strings from earlier to the start_time, maximum_time, and end_time columns temporarily
    SF_table.loc[k, 'maximum_time'] = max_datetime
    SF_table.loc[k, 'end_time'] = end_datetime
SF_table.rename(columns = {'start_time':'start_datetime', 'maximum_time':'max_datetime', 'end_time':'end_datetime'}, inplace = True) #renames start, max, and end time to start, max and end datetimes to accurately name these columns
display(SF_table) #displays the tidy SWL data
rank x_classification region start_datetime max_datetime
0 1 X28 0486 2003-11-04 19:29:00 2003-11-04 19:53:00
1 2 X20 9393 2001-04-02 21:32:00 2001-04-02 21:51:00
2 3 X17.2 0486 2003-10-28 09:51:00 2003-10-28 11:10:00
3 4 X17 0808 2005-09-07 17:17:00 2005-09-07 17:40:00
4 5 X14.4 9415 2001-04-15 13:19:00 2001-04-15 13:50:00
5 6 X10 0486 2003-10-29 20:37:00 2003-10-29 20:49:00
6 7 X9.4 8100 1997-11-06 11:49:00 1997-11-06 11:55:00
7 8 X9.3 2673 2017-09-06 11:53:00 2017-09-06 12:02:00
8 9 X9 0930 2006-12-05 10:18:00 2006-12-05 10:35:00
9 10 X8.3 0486 2003-11-02 17:03:00 2003-11-02 17:25:00
10 11 X8.2 2673 2017-09-10 15:35:00 2017-09-10 16:06:00
11 12 X7.1 0720 2005-01-20 06:36:00 2005-01-20 07:01:00
12 13 X6.9 1263 2011-08-09 07:48:00 2011-08-09 08:05:00
13 14 X6.5 0930 2006-12-06 18:29:00 2006-12-06 18:47:00
14 15 X6.2 0808 2005-09-09 19:13:00 2005-09-09 20:04:00
15 16 X6.2 9733 2001-12-13 14:20:00 2001-12-13 14:30:00
16 17 X5.7 9077 2000-07-14 10:03:00 2000-07-14 10:24:00
17 18 X5.6 9415 2001-04-06 19:10:00 2001-04-06 19:21:00
18 19 X5.4 1429 2012-03-07 00:02:00 2012-03-07 00:24:00
19 20 X5.4 0808 2005-09-08 20:52:00 2005-09-08 21:06:00
20 21 X5.4 0486 2003-10-23 08:19:00 2003-10-23 08:35:00
21 22 X5.3 9591 2001-08-25 16:23:00 2001-08-25 16:45:00
22 23 X4.9 1990 2014-02-25 00:39:00 2014-02-25 00:49:00
23 24 X4.9 8307 1998-08-18 22:10:00 1998-08-18 22:19:00
24 25 X4.8 0039 2002-07-23 00:18:00 2002-07-23 00:35:00
25 26 X4 9236 2000-11-26 16:34:00 2000-11-26 16:48:00
26 27 X3.9 0488 2003-11-03 09:43:00 2003-11-03 09:55:00
27 28 X3.9 8307 1998-08-19 21:35:00 1998-08-19 21:45:00
28 29 X3.8 0720 2005-01-17 06:59:00 2005-01-17 09:52:00
29 30 X3.7 8384 1998-11-22 06:30:00 1998-11-22 06:42:00
30 31 X3.6 0808 2005-09-09 09:42:00 2005-09-09 09:59:00
31 32 X3.6 0649 2004-07-16 13:49:00 2004-07-16 13:55:00
32 33 X3.6 0365 2003-05-28 00:17:00 2003-05-28 00:27:00
33 34 X3.4 0930 2006-12-13 02:14:00 2006-12-13 02:40:00
34 35 X3.4 9767 2001-12-28 20:02:00 2001-12-28 20:45:00
35 36 X3.3 1890 2013-11-05 22:07:00 2013-11-05 22:12:00
36 37 X3.3 0039 2002-07-20 21:04:00 2002-07-20 21:30:00
37 38 X3.3 8395 1998-11-28 04:54:00 1998-11-28 05:52:00
38 39 X3.2 1748 2013-05-14 00:00:00 2013-05-14 01:11:00
39 40 X3.1 2192 2014-10-24 21:07:00 2014-10-24 21:41:00
40 41 X3.1 0069 2002-08-24 00:49:00 2002-08-24 01:12:00
41 42 X3 0030 2002-07-15 19:59:00 2002-07-15 20:08:00
42 43 X2.8 1748 2013-05-13 15:48:00 2013-05-13 16:05:00
43 44 X2.8 9733 2001-12-11 07:58:00 2001-12-11 08:08:00
44 45 X2.8 8307 1998-08-18 08:14:00 1998-08-18 08:24:00
45 46 X2.7 2339 2015-05-05 22:05:00 2015-05-05 22:11:00
46 47 X2.7 0488 2003-11-03 01:09:00 2003-11-03 01:30:00
47 48 X2.7 8210 1998-05-06 07:58:00 1998-05-06 08:09:00
48 49 X2.6 0720 2005-01-15 22:25:00 2005-01-15 23:02:00
49 50 X2.6 9632 2001-09-24 09:32:00 2001-09-24 10:38:00
end_datetime
0 2003-11-04 20:06:00
1 2001-04-02 22:03:00
2 2003-10-28 11:24:00
3 2005-09-07 18:03:00
4 2001-04-15 13:55:00
5 2003-10-29 21:01:00
6 1997-11-06 12:01:00
7 2017-09-06 12:10:00
8 2006-12-05 10:45:00
9 2003-11-02 17:39:00
10 2017-09-10 16:31:00
11 2005-01-20 07:26:00
12 2011-08-09 08:08:00
13 2006-12-06 19:00:00
14 2005-09-09 20:36:00
15 2001-12-13 14:35:00
16 2000-07-14 10:43:00
17 2001-04-06 19:31:00
18 2012-03-07 00:40:00
19 2005-09-08 21:17:00
20 2003-10-23 08:49:00
21 2001-08-25 17:04:00
22 2014-02-25 01:03:00
23 1998-08-18 22:28:00
24 2002-07-23 00:47:00
25 2000-11-26 16:56:00
26 2003-11-03 10:19:00
27 1998-08-19 21:50:00
28 2005-01-17 10:07:00
29 1998-11-22 06:49:00
30 2005-09-09 10:08:00
31 2004-07-16 14:01:00
32 2003-05-28 00:39:00
33 2006-12-13 02:57:00
34 2001-12-28 21:32:00
35 2013-11-05 22:15:00
36 2002-07-20 21:54:00
37 1998-11-28 06:13:00
38 2013-05-14 01:20:00
39 2014-10-24 22:13:00
40 2002-08-24 01:31:00
41 2002-07-15 20:14:00
42 2013-05-13 16:16:00
43 2001-12-11 08:14:00
44 1998-08-18 08:32:00
45 2015-05-05 22:15:00
46 2003-11-03 01:45:00
47 1998-05-06 08:20:00
48 2005-01-15 23:31:00
49 2001-09-24 11:09:00
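The datetime columns above come from joining the flare's date string with each of its time strings. A minimal sketch of that step using only the standard library is below. The helper name `combine` is hypothetical, and the format string assumes the date and time look like the SWL values shown earlier (`2003/11/04`, `19:29`).

```python
from datetime import datetime

def combine(date_str, time_str):
    """Join a 'YYYY/MM/DD' date and an 'HH:MM' time into one datetime object."""
    return datetime.strptime(f"{date_str} {time_str}", "%Y/%m/%d %H:%M")

# Values taken from the first row of the SWL table.
start = combine("2003/11/04", "19:29")
maximum = combine("2003/11/04", "19:53")
end = combine("2003/11/04", "20:06")
```

Converting a result with `str()` gives the `2003-11-04 19:29:00` form that appears in the tidy table.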
This block of code is for Part 1, Step 3. I fetched the HTML from the NASA mirror, used find() to get the pre element that holds the data, split it into lines, and parsed each line with regular expressions (regex). Four patterns cover the cases in the data: links on both frequency columns with CME data present, a link on only one frequency with empty CME data, no frequency links with empty CME data, and links on both frequencies with empty CME data. NASA_data_arr accumulates the parsed fields for each row. Once every line was parsed, I created the DataFrame from that array with the appropriate columns.
[4]: NASA_r = requests.get("https://cmsc320.github.io/files/waves_type2.html") #get request to access data from the NASA mirror
NASA_root = BeautifulSoup(NASA_r.content) #accesses the content from the NASA mirror
NASA_htmlarr = str(NASA_root.find('pre')).splitlines() #gets html data and splits into multiple lines for ease in iterating
NASA_regex = r'(\d{4}/\d{2}/\d{2})\s*(\d{2}:\d{2})\s*(\d{2}/\d{2})\s*(\d{2}:\d{2}).*c2rdif_waves.html">(\d*\?*).*c3rdif_waves.html">(\d*\?*)</a>\s*(\w*\??-*)\s*(\w*-*\??)\s(-*[A-Z]*\d*\.*\d*).*rdif.html">(\d{2}/\d{2}).*(\d{2}:\d{2})\s*(\w*)\s*(-*&?\w*;?\w*).*htp.html">(\d*).*(PHTX)</a>\s*(.*)?' #first regex for the case that there are links for the data in both frequency columns and all data for the CME are filled
NASA_regex_one_link = r'(\d{4}/\d{2}/\d{2})\s*(\d{2}:\d{2})\s*(\d{2}/\d{2})\s*(\d{2}:\d{2}).*c2rdif_waves.html">(\d*\?*)</a>\s*(\d*)\s*(\w*)\s*(\d*)\s*(\w*\.\d)\s*(--/--)\s*(--:--)\s(-*)\s(-*)\s(-*).*(PHTX)</a>\s*(.*)?' #second regex for the case that one of the frequencies for a row has a link and the other one doesn't and all data for the CME are empty
NASA_regex_zero_links = r'(\d{4}/\d{2}/\d{2})\s*(\d{2}:\d{2})\s*(\d{2}/\d{2})\s*(\d{2}:\d{2})\s*(\d*\?*)\s*(\d*)\s*(\w*)\s*(\d*)\s*(\w*\.*\d*-*)\s*(--/--)\s*(--:--)\s(-*)\s(-*)\s(-*).*(PHTX)</a>\s*(.*)?' #third regex for the case that there are no links for the frequencies and all data for the CME are empty
NASA_regex_CME_empty = r'(\d{4}/\d{2}/\d{2})\s*(\d{2}:\d{2})\s*(\d{2}/\d{2})\s*(\d{2}:\d{2}).*c2rdif_waves.html">(\d*\?*).*c3rdif_waves.html">(\d*\?*)</a>\s*(\w*-*)\s*(\w*-*)\s*(\w*\.*\d*-*)\s*(--/--)\s*(--:--)\s(-*)\s(-*)\s(-*).*(PHTX)</a>\s*(.*)?' #final regex for the case that there are links for both frequencies and all data for the CME are empty
NASA_data_arr = [] # 2D array for gathering the data
del NASA_htmlarr[0:12] #the next three lines delete elements from the html data that do not have the data needed
del NASA_htmlarr[518:521]
del NASA_htmlarr[529:532]
for elem in NASA_htmlarr: #iterates through the html data
    match = re.search(NASA_regex, elem) #uses first regex to match for the main case
    if match is None: #if the first regex does not match a string
        if "c2r" in elem and "c3r" not in elem: #checks to see if only one of the links for the frequencies is present
            match = re.search(NASA_regex_one_link, elem) #uses second regex to match for the second case
        elif "c2r" not in elem and "c3r" not in elem: #checks to see if neither link for the frequencies is present
            match = re.search(NASA_regex_zero_links, elem) #uses third regex to match for the third case
        elif "c2r" in elem and "c3r" in elem: #checks to see if both links for frequencies are present (this check is necessary in case the match does not happen for empty CME data)
            match = re.search(NASA_regex_CME_empty, elem) #uses fourth regex to match for the fourth case
    NASA_data_arr.append(list(match.groups()[:15])) #adds data from the first 15 groups caught from any of the regex
NASA_table = pd.DataFrame(NASA_data_arr, columns = ['start_date', 'start_time', 'end_date', 'end_time', 'start_frequency', 'end_frequency', 'flare_location', 'flare_region', 'flare_classification', 'cme_date', 'cme_time', 'cme_angle', 'cme_width', 'cme_speed', 'plot'])
display(NASA_table)
start_date start_time end_date end_time start_frequency end_frequency
0 1997/04/01 14:00 04/01 14:15 8000 4000
1 1997/04/07 14:30 04/07 17:30 11000 1000
2 1997/05/12 05:15 05/14 16:00 12000 80
3 1997/05/21 20:20 05/21 22:00 5000 500
4 1997/09/23 21:53 09/23 22:16 6000 2000
.. … … … … … …
513 2017/09/04 20:27 09/05 04:54 14000 210
514 2017/09/06 12:05 09/07 08:00 16000 70
515 2017/09/10 16:02 09/11 06:50 16000 150
516 2017/09/12 07:38 09/12 07:43 16000 13000
517 2017/09/17 11:45 09/17 12:35 16000 900
flare_location flare_region flare_classification cme_date cme_time
0 S25E16 8026 M1.3 04/01 15:18
1 S28E19 8027 C6.8 04/07 14:27
2 N21W08 8038 C1.3 05/12 05:30
3 N05W12 8040 M1.3 05/21 21:00
4 S29E25 8088 C1.4 09/23 22:02
.. … … … … …
513 S10W12 12673 M5.5 09/04 20:12
514 S08W33 12673 X9.3 09/06 12:24
515 S09W92 ----- X8.3 09/10 16:00
516 N08E48 12680 C3.0 09/12 08:03
517 S08E170 ----- ---- 09/17 12:00
cme_angle cme_width cme_speed plot
0 74 79 312 PHTX
1 Halo 360 878 PHTX
2 Halo 360 464 PHTX
3 263 165 296 PHTX
4 133 155 712 PHTX
.. … … … …
513 Halo 360 1418 PHTX
514 Halo 360 1571 PHTX
515 Halo 360 3163 PHTX
516 124 96 252 PHTX
517 Halo 360 1385 PHTX
[518 rows x 15 columns]
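As a cross-check on the regex approach, a row of the NASA listing can also be parsed by stripping the HTML tags and splitting on whitespace. This is a different technique from the one used above, shown only as a sketch. The sample line is hypothetical (the href values and spacing are assumptions), and `TAG_RE`, `COLUMNS`, and `parse_line` are illustrative names. A plain whitespace split assumes no column is blank, so rows with an empty field would shift and need separate handling.

```python
import re

TAG_RE = re.compile(r"<[^>]+>")  # matches any HTML tag such as <a href="..."> or </a>

COLUMNS = [
    "start_date", "start_time", "end_date", "end_time",
    "start_frequency", "end_frequency", "flare_location", "flare_region",
    "flare_classification", "cme_date", "cme_time", "cme_angle",
    "cme_width", "cme_speed", "plot",
]

def parse_line(line):
    """Strip tags, split on whitespace, and map the first 15 fields to column names.

    Returns None when the line has fewer than 15 fields.
    """
    fields = TAG_RE.sub("", line).split()
    if len(fields) < len(COLUMNS):
        return None
    return dict(zip(COLUMNS, fields[:len(COLUMNS)]))

# Hypothetical line modelled on row 0 of the table above; hrefs are assumptions.
sample = (
    '1997/04/01 14:00 04/01 14:15 <a href="c2rdif_waves.html">8000</a> '
    '<a href="c3rdif_waves.html">4000</a> S25E16 8026 M1.3 '
    '<a href="rdif.html">04/01</a> 15:18 74 79 312 <a href="htp.html">PHTX</a>'
)
row = parse_line(sample)
```

The field-count check only catches lines that are too short; it does not detect a row where a blank column caused later values to slide left.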
[5]: NASA_table['is_halo'] = False #creates is_halo column with all False for the total # of rows
NASA_table['width_lower_bound'] = False #creates width_lower_bound column with all False for the total # of rows
for idx, row in NASA_table.iterrows(): #iterates rows in NASA DataFrame
    if row['start_frequency'] == "????" and row['end_frequency'] == "????": #checks if frequencies have a "????"
        NASA_table.loc[idx, 'start_frequency'] = np.nan
        NASA_table.loc[idx, 'end_frequency'] = np.nan
    if "back" in row['flare_location'].lower() or "-" in row['flare_location']: #checks if Back is in the flare_location or if it's empty
        NASA_table.loc[idx, 'flare_location'] = np.nan
    if "-" in row['flare_region']: #checks if flare_region is empty
        NASA_table.loc[idx, 'flare_region'] = np.nan
    if "-" in row['flare_classification']: #checks if flare_classification is empty
        NASA_table.loc[idx, 'flare_classification'] = np.nan
    if "-" in row['cme_date']: #checks if CME data is empty for the next 5 if statements
        NASA_table.loc[idx, 'cme_date'] = np.nan
    if "-" in row['cme_time']:
        NASA_table.loc[idx, 'cme_time'] = np.nan
    if "-" in row['cme_angle']:
        NASA_table.loc[idx, 'cme_angle'] = np.nan
    if "-" in row['cme_width'] or "" == row['cme_width']: #extra edge case for completely empty data in cme_width column
        NASA_table.loc[idx, 'cme_width'] = np.nan
    if "-" in row['cme_speed']:
        NASA_table.loc[idx, 'cme_speed'] = np.nan
    if row['cme_angle'] == "Halo": #checks for CME_angle being a halo
        NASA_table.loc[idx, 'is_halo'] = True #changes value from False to True
        NASA_table.loc[idx, 'cme_angle'] = pd.NA #assigns an NA value
    if ">" in row['cme_width']: #checks for a lower bound in the CME_width
        NASA_table.loc[idx, 'width_lower_bound'] = True #changes value from False to True
        NASA_table.loc[idx, 'cme_width'] = row['cme_width'].replace(">", "") #removes extra tags in the data
    start_datetime = datetime.strptime(row['start_date'] + " " + row['start_time'], '%Y/%m/%d %H:%M') #creates datetime object for start
    NASA_table.loc[idx, 'start_time'] = str(start_datetime) #puts in start datetime in string datetime format to the table
    try: #try except statement for if the time is 24:00
        end_datetime = datetime.strptime(row['end_date'] + " " + row['end_time'], '%m/%d %H:%M') #creates datetime object for end
        end_datetime = end_datetime.replace(year = start_datetime.year) #adds the year from the start datetime object to end
        NASA_table.loc[idx, 'end_time'] = str(end_datetime) #puts in end datetime in string datetime format to the table
    except ValueError: #strptime rejects 24:00, so parse the date with 00:00 instead
        end_datetime = datetime.strptime(row['end_date'] + " " + "00:00", '%m/%d %H:%M') #creates datetime object for end with 00:00
        end_datetime = end_datetime + timedelta(days=1) #adds a day to the end datetime
        end_datetime = end_datetime.replace(year = start_datetime.year) #adds the year from the start datetime object to end
        NASA_table.loc[idx, 'end_time'] = str(end_datetime) #puts in end datetime in string datetime format to the table
    if not pd.isna(NASA_table.loc[idx, 'cme_date']): #checks for nan values in CME_date
        cme_datetime = datetime.strptime(row['cme_date'] + " " + row['cme_time'], '%m/%d %H:%M') #creates datetime object for cme
        cme_datetime = cme_datetime.replace(year = start_datetime.year) #adds the year from the start datetime object to cme
        NASA_table.loc[idx, 'cme_time'] = str(cme_datetime) #puts in cme datetime in string datetime format to the table
NASA_table.drop(columns = ['start_date', 'end_date', 'cme_date'], inplace = True) #drops the date columns now that they are folded into the datetime columns
NASA_table.rename(columns = {'start_time':'start_datetime', 'cme_time':'cme_datetime', 'end_time':'end_datetime'}, inplace = True) #renames start, cme, and end time to start, cme and end datetimes to accurately name these columns
display(NASA_table) #displays tidy NASA data
start_datetime end_datetime start_frequency end_frequency
0 1997-04-01 14:00:00 1997-04-01 14:15:00 8000 4000
1 1997-04-07 14:30:00 1997-04-07 17:30:00 11000 1000
2 1997-05-12 05:15:00 1997-05-14 16:00:00 12000 80
3 1997-05-21 20:20:00 1997-05-21 22:00:00 5000 500
4 1997-09-23 21:53:00 1997-09-23 22:16:00 6000 2000
.. … … … …
513 2017-09-04 20:27:00 2017-09-05 04:54:00 14000 210
514 2017-09-06 12:05:00 2017-09-07 08:00:00 16000 70
515 2017-09-10 16:02:00 2017-09-11 06:50:00 16000 150
516 2017-09-12 07:38:00 2017-09-12 07:43:00 16000 13000
517 2017-09-17 11:45:00 2017-09-17 12:35:00 16000 900
flare_location flare_region flare_classification cme_datetime
0 S25E16 8026 M1.3 1997-04-01 15:18:00
1 S28E19 8027 C6.8 1997-04-07 14:27:00
2 N21W08 8038 C1.3 1997-05-12 05:30:00
3 N05W12 8040 M1.3 1997-05-21 21:00:00
4 S29E25 8088 C1.4 1997-09-23 22:02:00
.. … … … …
513 S10W12 12673 M5.5 2017-09-04 20:12:00
514 S08W33 12673 X9.3 2017-09-06 12:24:00
515 S09W92 NaN X8.3 2017-09-10 16:00:00
516 N08E48 12680 C3.0 2017-09-12 08:03:00
517 S08E170 NaN NaN 2017-09-17 12:00:00
cme_angle cme_width cme_speed plot is_halo width_lower_bound
0 74 79 312 PHTX False False
1 <NA> 360 878 PHTX True False
2 <NA> 360 464 PHTX True False
3 263 165 296 PHTX False False
4 133 155 712 PHTX False False
.. … … … … … …
513 <NA> 360 1418 PHTX True False
514 <NA> 360 1571 PHTX True False
515 <NA> 360 3163 PHTX True False
516 124 96 252 PHTX False False
517 <NA> 360 1385 PHTX True False
[518 rows x 14 columns]
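The end-time handling in the cell above (a month/day date with no year, and an occasional 24:00 time) can be sketched as a small standalone function. The name `parse_end` is hypothetical. The year rollover check at the end is an added assumption for events that start in late December and end in January; it is not something the cell above does.

```python
from datetime import datetime, timedelta

def parse_end(start, end_date, end_time):
    """Build the end datetime from 'MM/DD' and 'HH:MM', borrowing the year from start.

    A time of '24:00' is treated as 00:00 on the following day.
    """
    extra_day = end_time == "24:00"
    if extra_day:
        end_time = "00:00"
    month, day = (int(part) for part in end_date.split("/"))
    hour, minute = (int(part) for part in end_time.split(":"))
    end = datetime(start.year, month, day, hour, minute)
    if extra_day:
        end += timedelta(days=1)
    if end < start:
        # Assumption: an end before the start means the event crossed New Year.
        end = end.replace(year=start.year + 1)
    return end

same_day = parse_end(datetime(1997, 4, 1, 14, 0), "04/01", "14:15")
midnight = parse_end(datetime(2005, 1, 15, 23, 0), "01/16", "24:00")
new_year = parse_end(datetime(2003, 12, 31, 23, 0), "01/01", "02:00")
```

Building the datetime from integers avoids parsing a date without a year, so the year is correct from the start rather than patched in afterwards.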
This block of code is for Part 2, Question 1. Most flares in the SWL table can be matched to rows in the NASA table, with two caveats. First, some NASA flare classifications have a decimal point with no digit after it (such as X28.), so those flares do not match their SWL counterparts and are missing from the new table. Second, several flares share a classification but occur in different regions, so joining on classification alone pairs one NASA event with more than one SWL rank. The region numbers also differ between the two tables: region 0486 in the original top 50 solar flare table appears as 10486 in the NASA table. The project description shows inconsistent region numbers in its outputs for Step 1 and Step 2, so it was difficult to tell how to obtain the correct region numbers. To create the top 50 table, I renamed the x_classification column in the SWL table to flare_classification, did an inner join (merge) of the SWL and NASA tables on that column, and displayed the new table.
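One possible way to reduce the duplicate pairings described above is to join on the region as well as the classification. The sketch below is an alternative to the classification-only merge, not the method used in this notebook. It assumes the two region columns refer to the same active region when their last four digits agree (0486 and 10486), which should be verified against the data. The rows are a small subset copied from the tables above, and `region_key` is a hypothetical helper.

```python
import pandas as pd

# Small subset of rows copied from the SWL and NASA tables above.
swl = pd.DataFrame({
    "rank": ["7", "10", "11"],
    "flare_classification": ["X9.4", "X8.3", "X8.2"],
    "region": ["8100", "0486", "2673"],
})
nasa = pd.DataFrame({
    "start_datetime": ["1997-11-06 12:20:00", "2003-11-02 17:30:00", "2017-09-10 16:02:00"],
    "flare_classification": ["X9.4", "X8.3", "X8.3"],
    "flare_region": ["8100", "10486", None],
})

def region_key(series):
    """Keep the last four digits of a region number, zero padded (assumed key)."""
    return series.str[-4:].str.zfill(4)

swl["region_key"] = region_key(swl["region"])
nasa["region_key"] = region_key(nasa["flare_region"])
# Drop a trailing dot such as "X28." so classifications compare equal.
nasa["flare_classification"] = nasa["flare_classification"].str.rstrip(".")

# Joining on classification alone pairs both X8.3 NASA rows with rank 10.
by_class = nasa.merge(swl, on="flare_classification", how="inner")

# Joining on classification and region keeps only rows where both agree.
# Rows with a missing region are dropped first so missing keys cannot pair up.
matched = nasa.dropna(subset=["region_key"]).merge(
    swl, on=["flare_classification", "region_key"], how="inner"
)
```

The trade-off is that NASA rows with a missing region can no longer be matched at all, so this only helps when most regions are present.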
[6]: SF_table.rename(columns = {'x_classification':'flare_classification'}, inplace = True) #renamed SWL x_classification column to flare_classification
NASA_top_50_table = NASA_table.merge(SF_table, how = 'inner', on = 'flare_classification', indicator = False) #inner joined/merged SWL and NASA table
display(NASA_top_50_table) #displays top 50 table
start_datetime_x end_datetime_x start_frequency end_frequency
0 1997-11-06 12:20:00 1997-11-07 08:30:00 14000 100
1 1997-11-27 13:30:00 1997-11-27 14:00:00 14000 7000
2 1997-11-27 13:30:00 1997-11-27 14:00:00 14000 7000
3 2001-09-24 10:45:00 2001-09-25 20:00:00 7000 30
4 2001-09-24 10:45:00 2001-09-25 20:00:00 7000 30
5 2005-01-15 23:00:00 2005-01-17 00:00:00 3000 40
6 2005-01-15 23:00:00 2005-01-17 00:00:00 3000 40
7 1998-05-06 08:25:00 1998-05-06 08:35:00 14000 5000
8 1998-05-06 08:25:00 1998-05-06 08:35:00 14000 5000
9 1998-05-06 08:25:00 1998-05-06 08:35:00 14000 5000
10 2003-11-03 01:15:00 2003-11-03 01:25:00 3000 1500
11 2003-11-03 01:15:00 2003-11-03 01:25:00 3000 1500
12 2003-11-03 01:15:00 2003-11-03 01:25:00 3000 1500
13 2015-05-05 22:24:00 2015-05-05 23:14:00 14000 500
14 2015-05-05 22:24:00 2015-05-05 23:14:00 14000 500
15 2015-05-05 22:24:00 2015-05-05 23:14:00 14000 500
16 2000-07-14 10:30:00 2000-07-15 14:30:00 14000 80
17 2001-04-06 19:35:00 2001-04-07 01:50:00 14000 230
18 2001-08-25 16:50:00 2001-08-25 23:00:00 8000 170
19 2001-12-28 20:35:00 2001-12-29 03:00:00 14000 350
20 2001-12-28 20:35:00 2001-12-29 03:00:00 14000 350
21 2006-12-13 02:45:00 2006-12-13 10:40:00 12000 150
22 2006-12-13 02:45:00 2006-12-13 10:40:00 12000 150
23 2002-07-20 21:30:00 2002-07-20 22:20:00 10000 2000
24 2002-07-20 21:30:00 2002-07-20 22:20:00 10000 2000
25 2002-07-20 21:30:00 2002-07-20 22:20:00 10000 2000
26 2002-07-23 00:50:00 2002-07-23 04:00:00 11000 400
27 2002-08-24 01:45:00 2002-08-24 03:25:00 5000 400
28 2002-08-24 01:45:00 2002-08-24 03:25:00 5000 400
29 2003-05-28 01:00:00 2003-05-29 00:30:00 1000 200
30 2003-05-28 01:00:00 2003-05-29 00:30:00 1000 200
31 2003-05-28 01:00:00 2003-05-29 00:30:00 1000 200
32 2003-11-02 17:30:00 2003-11-03 01:00:00 12000 250
33 2017-09-10 16:02:00 2017-09-11 06:50:00 16000 150
34 2003-11-03 10:00:00 2003-11-03 12:30:00 6000 400
35 2003-11-03 10:00:00 2003-11-03 12:30:00 6000 400
36 2005-01-17 10:00:00 2005-01-17 10:35:00 6100 1500
37 2005-01-20 07:15:00 2005-01-20 16:30:00 14000 25
38 2005-09-09 19:45:00 2005-09-09 22:00:00 10000 50
39 2005-09-09 19:45:00 2005-09-09 22:00:00 10000 50
40 2006-12-06 19:00:00 2006-12-09 00:00:00 16000 30
41 2011-08-09 08:20:00 2011-08-09 08:35:00 16000 4000
42 2012-03-07 01:00:00 2012-03-08 19:00:00 16000 30
43 2012-03-07 01:00:00 2012-03-08 19:00:00 16000 30
44 2012-03-07 01:00:00 2012-03-08 19:00:00 16000 30
45 2013-05-13 16:15:00 2013-05-13 19:10:00 16000 300
46 2013-05-13 16:15:00 2013-05-13 19:10:00 16000 300
47 2013-05-13 16:15:00 2013-05-13 19:10:00 16000 300
48 2013-05-14 01:16:00 2013-05-14 08:20:00 16000 240
49 2014-02-25 00:56:00 2014-02-25 11:28:00 14000 100
50 2014-02-25 00:56:00 2014-02-25 11:28:00 14000 100
51 2017-09-06 12:05:00 2017-09-07 08:00:00 16000 70
flare_location flare_region flare_classification cme_datetime
0 S18W63 8100 X9.4 1997-11-06 12:10:00
1 N17E63 8113 X2.6 1997-11-27 13:56:00
2 N17E63 8113 X2.6 1997-11-27 13:56:00
3 S16E23 9632 X2.6 2001-09-24 10:30:00
4 S16E23 9632 X2.6 2001-09-24 10:30:00
5 N15W05 10720 X2.6 2005-01-15 23:06:00
6 N15W05 10720 X2.6 2005-01-15 23:06:00
7 S11W65 8210 X2.7 1998-05-06 08:29:00
8 S11W65 8210 X2.7 1998-05-06 08:29:00
9 S11W65 8210 X2.7 1998-05-06 08:29:00
10 N10W83 10488 X2.7 2003-11-03 01:59:00
11 N10W83 10488 X2.7 2003-11-03 01:59:00
12 N10W83 10488 X2.7 2003-11-03 01:59:00
13 N15E79 12339 X2.7 2015-05-05 22:24:00
14 N15E79 12339 X2.7 2015-05-05 22:24:00
15 N15E79 12339 X2.7 2015-05-05 22:24:00
16 N22W07 9077 X5.7 2000-07-14 10:54:00
17 S21E31 9415 X5.6 2001-04-06 19:30:00
18 S17E34 9591 X5.3 2001-08-25 16:50:00
19 S26E90 9756 X3.4 2001-12-28 20:30:00
20 S26E90 9756 X3.4 2001-12-28 20:30:00
21 S06W23 10930 X3.4 2006-12-13 02:54:00
22 S06W23 10930 X3.4 2006-12-13 02:54:00
23 S13E90 10039 X3.3 2002-07-20 22:06:00
24 S13E90 10039 X3.3 2002-07-20 22:06:00
25 S13E90 10039 X3.3 2002-07-20 22:06:00
26 S13E72 10039 X4.8 2002-07-23 00:42:00
27 S02W81 10069 X3.1 2002-08-24 01:27:00
28 S02W81 10069 X3.1 2002-08-24 01:27:00
29 S07W20 10365 X3.6 2003-05-28 00:50:00
30 S07W20 10365 X3.6 2003-05-28 00:50:00
31 S07W20 10365 X3.6 2003-05-28 00:50:00
32 S14W56 10486 X8.3 2003-11-02 17:30:00
33 S09W92 NaN X8.3 2017-09-10 16:00:00
34 N08W77 10488 X3.9 2003-11-03 10:06:00
35 N08W77 10488 X3.9 2003-11-03 10:06:00
36 N15W25 10720 X3.8 2005-01-17 09:54:00
37 N14W61 10720 X7.1 2005-01-20 06:54:00
38 S12E67 10808 X6.2 2005-09-09 19:48:00
39 S12E67 10808 X6.2 2005-09-09 19:48:00
40 S05E64 10930 X6.5 NaN
41 N17W69 11263 X6.9 2011-08-09 08:12:00
42 N17E27 11429 X5.4 2012-03-07 00:24:00
43 N17E27 11429 X5.4 2012-03-07 00:24:00
44 N17E27 11429 X5.4 2012-03-07 00:24:00
45 N11E85 11748 X2.8 2013-05-13 16:07:00
46 N11E85 11748 X2.8 2013-05-13 16:07:00
47 N11E85 11748 X2.8 2013-05-13 16:07:00
48 N08E77 11748 X3.2 2013-05-14 01:25:00
49 S12E82 11990 X4.9 2014-02-25 01:25:00
50 S12E82 11990 X4.9 2014-02-25 01:25:00
51 S08W33 12673 X9.3 2017-09-06 12:24:00
cme_angle cme_width cme_speed plot is_halo width_lower_bound rank
0 <NA> 360 1556 PHTX True False 7
1 98 91 441 PHTX False False 49
2 98 91 441 PHTX False False 50
3 <NA> 360 2402 PHTX True False 49
4 <NA> 360 2402 PHTX True False 50
5 <NA> 360 2861 PHTX True False 49
6 <NA> 360 2861 PHTX True False 50
7 309 190 1099 PHTX False False 46
8 309 190 1099 PHTX False False 47
9 309 190 1099 PHTX False False 48
10 304 65 827 PHTX False False 46
11 304 65 827 PHTX False False 47
12 304 65 827 PHTX False False 48
13 <NA> 360 715 PHTX True False 46
14 <NA> 360 715 PHTX True False 47
15 <NA> 360 715 PHTX True False 48
16 <NA> 360 1674 PHTX True False 17
17 <NA> 360 1270 PHTX True False 18
18 <NA> 360 1433 PHTX True False 22
19 <NA> 360 2216 PHTX True False 34
20 <NA> 360 2216 PHTX True False 35
21 <NA> 360 1774 PHTX True False 34
22 <NA> 360 1774 PHTX True False 35
23 <NA> 360 1941 PHTX True False 36
24 <NA> 360 1941 PHTX True False 37
25 <NA> 360 1941 PHTX True False 38
26 <NA> 360 2285 PHTX True False 25
27 <NA> 360 1913 PHTX True False 40
28 <NA> 360 1913 PHTX True False 41
29 <NA> 360 1366 PHTX True False 31
30 <NA> 360 1366 PHTX True False 32
31 <NA> 360 1366 PHTX True False 33
32 <NA> 360 2598 PHTX True False 10
33 <NA> 360 3163 PHTX True False 10
34 293 103 1420 PHTX False False 27
35 293 103 1420 PHTX False False 28
36 <NA> 360 2547 PHTX True False 29
37 <NA> 360 882 PHTX True False 12
38 <NA> 360 2257 PHTX True False 15
39 <NA> 360 2257 PHTX True False 16
40 NaN NaN NaN PHTX False False 14
41 <NA> 360 1610 PHTX True False 13
42 <NA> 360 2684 PHTX True False 19
43 <NA> 360 2684 PHTX True False 20
44 <NA> 360 2684 PHTX True False 21
45 <NA> 360 1850 PHTX True False 43
46 <NA> 360 1850 PHTX True False 44
47 <NA> 360 1850 PHTX True False 45
48 <NA> 360 2625 PHTX True False 39
49 <NA> 360 2147 PHTX True False 23
50 <NA> 360 2147 PHTX True False 24
51 <NA> 360 1571 PHTX True False 8
region start_datetime_y max_datetime end_datetime_y
0 8100 1997-11-06 11:49:00 1997-11-06 11:55:00 1997-11-06 12:01:00
1 0720 2005-01-15 22:25:00 2005-01-15 23:02:00 2005-01-15 23:31:00
2 9632 2001-09-24 09:32:00 2001-09-24 10:38:00 2001-09-24 11:09:00
3 0720 2005-01-15 22:25:00 2005-01-15 23:02:00 2005-01-15 23:31:00
4 9632 2001-09-24 09:32:00 2001-09-24 10:38:00 2001-09-24 11:09:00
5 0720 2005-01-15 22:25:00 2005-01-15 23:02:00 2005-01-15 23:31:00
6 9632 2001-09-24 09:32:00 2001-09-24 10:38:00 2001-09-24 11:09:00
7 2339 2015-05-05 22:05:00 2015-05-05 22:11:00 2015-05-05 22:15:00
8 0488 2003-11-03 01:09:00 2003-11-03 01:30:00 2003-11-03 01:45:00
9 8210 1998-05-06 07:58:00 1998-05-06 08:09:00 1998-05-06 08:20:00
10 2339 2015-05-05 22:05:00 2015-05-05 22:11:00 2015-05-05 22:15:00
11 0488 2003-11-03 01:09:00 2003-11-03 01:30:00 2003-11-03 01:45:00
12 8210 1998-05-06 07:58:00 1998-05-06 08:09:00 1998-05-06 08:20:00
13 2339 2015-05-05 22:05:00 2015-05-05 22:11:00 2015-05-05 22:15:00
14 0488 2003-11-03 01:09:00 2003-11-03 01:30:00 2003-11-03 01:45:00
15 8210 1998-05-06 07:58:00 1998-05-06 08:09:00 1998-05-06 08:20:00
16 9077 2000-07-14 10:03:00 2000-07-14 10:24:00 2000-07-14 10:43:00
17 9415 2001-04-06 19:10:00 2001-04-06 19:21:00 2001-04-06 19:31:00
18 9591 2001-08-25 16:23:00 2001-08-25 16:45:00 2001-08-25 17:04:00
19 0930 2006-12-13 02:14:00 2006-12-13 02:40:00 2006-12-13 02:57:00
20 9767 2001-12-28 20:02:00 2001-12-28 20:45:00 2001-12-28 21:32:00
21 0930 2006-12-13 02:14:00 2006-12-13 02:40:00 2006-12-13 02:57:00
22 9767 2001-12-28 20:02:00 2001-12-28 20:45:00 2001-12-28 21:32:00
23 1890 2013-11-05 22:07:00 2013-11-05 22:12:00 2013-11-05 22:15:00
24 0039 2002-07-20 21:04:00 2002-07-20 21:30:00 2002-07-20 21:54:00
25 8395 1998-11-28 04:54:00 1998-11-28 05:52:00 1998-11-28 06:13:00
26 0039 2002-07-23 00:18:00 2002-07-23 00:35:00 2002-07-23 00:47:00
27 2192 2014-10-24 21:07:00 2014-10-24 21:41:00 2014-10-24 22:13:00
28 0069 2002-08-24 00:49:00 2002-08-24 01:12:00 2002-08-24 01:31:00
29 0808 2005-09-09 09:42:00 2005-09-09 09:59:00 2005-09-09 10:08:00
30 0649 2004-07-16 13:49:00 2004-07-16 13:55:00 2004-07-16 14:01:00
31 0365 2003-05-28 00:17:00 2003-05-28 00:27:00 2003-05-28 00:39:00
32 0486 2003-11-02 17:03:00 2003-11-02 17:25:00 2003-11-02 17:39:00
33 0486 2003-11-02 17:03:00 2003-11-02 17:25:00 2003-11-02 17:39:00
34 0488 2003-11-03 09:43:00 2003-11-03 09:55:00 2003-11-03 10:19:00
35 8307 1998-08-19 21:35:00 1998-08-19 21:45:00 1998-08-19 21:50:00
36 0720 2005-01-17 06:59:00 2005-01-17 09:52:00 2005-01-17 10:07:00
37 0720 2005-01-20 06:36:00 2005-01-20 07:01:00 2005-01-20 07:26:00
38 0808 2005-09-09 19:13:00 2005-09-09 20:04:00 2005-09-09 20:36:00
39 9733 2001-12-13 14:20:00 2001-12-13 14:30:00 2001-12-13 14:35:00
40 0930 2006-12-06 18:29:00 2006-12-06 18:47:00 2006-12-06 19:00:00
41 1263 2011-08-09 07:48:00 2011-08-09 08:05:00 2011-08-09 08:08:00
42 1429 2012-03-07 00:02:00 2012-03-07 00:24:00 2012-03-07 00:40:00
43 0808 2005-09-08 20:52:00 2005-09-08 21:06:00 2005-09-08 21:17:00
44 0486 2003-10-23 08:19:00 2003-10-23 08:35:00 2003-10-23 08:49:00
45 1748 2013-05-13 15:48:00 2013-05-13 16:05:00 2013-05-13 16:16:00
46 9733 2001-12-11 07:58:00 2001-12-11 08:08:00 2001-12-11 08:14:00
47 8307 1998-08-18 08:14:00 1998-08-18 08:24:00 1998-08-18 08:32:00
48 1748 2013-05-14 00:00:00 2013-05-14 01:11:00 2013-05-14 01:20:00
49 1990 2014-02-25 00:39:00 2014-02-25 00:49:00 2014-02-25 01:03:00
50 8307 1998-08-18 22:10:00 1998-08-18 22:19:00 1998-08-18 22:28:00
51 2673 2017-09-06 11:53:00 2017-09-06 12:02:00 2017-09-06 12:10:00
This block of code is for Part 2 Question 2. The best row is row 0 in the table below, which is row 9 in the table above (the NASA_top_50_table). I matched rows on whether the end_datetimes and regions from the SWL and NASA tables are the same. Rows with the same end_datetimes had the same start_datetimes, but rows with the same start_datetimes do not necessarily have the same end_datetimes. For the regions, I assume a match requires exactly the same region number in both the NASA and SWL data, even though the project description is inconsistent about region values in Part 1. Some region numbers are nearly identical (e.g. 0486 vs 10486), but treating them as the same would not be a good assumption. Once the table was made, I chose row 0 since the start/end datetimes from the NASA table (start_datetime_x and end_datetime_x) were the closest to the start/end datetimes from the SWL table (start_datetime_y and end_datetime_y).
To create the best row table, I created an array to gather the data of the matched rows, iterated through the top 50 table, created datetime objects for the end_datetimes from the SWL and NASA data, checked whether the dates were equal and whether the region numbers were equal, and added the matching rows to the array. After that, I created a DataFrame from the gathered data.
[7]: NASA_best_row_arr = []  # 2D array for gathering rows with matched data
for idx, row in NASA_top_50_table.iterrows():  # iterates rows in top 50 table
    # next two lines extract datetime fields and convert them to datetime objects from the SWL and NASA data
    end_datetime_NASA = datetime.strptime(row['end_datetime_x'], '%Y-%m-%d %H:%M:%S')
    end_datetime_SF = datetime.strptime(row['end_datetime_y'], '%Y-%m-%d %H:%M:%S')
    # checks if dates and regions from the NASA and SWL data match
    if end_datetime_NASA.date() == end_datetime_SF.date() and row['flare_region'] == row['region']:
        # adds data to 2D array and uses tolist in order to not add a Series object
        NASA_best_row_arr.append(row.tolist())
# creates and displays best row DataFrame
NASA_best_row_table = pd.DataFrame(NASA_best_row_arr, columns=NASA_top_50_table.columns)
display(NASA_best_row_table)
start_datetime_x end_datetime_x start_frequency end_frequency
0 1998-05-06 08:25:00 1998-05-06 08:35:00 14000 5000
1 2001-08-25 16:50:00 2001-08-25 23:00:00 8000 170
flare_location flare_region flare_classification cme_datetime
0 S11W65 8210 X2.7 1998-05-06 08:29:00
1 S17E34 9591 X5.3 2001-08-25 16:50:00
cme_angle cme_width cme_speed plot is_halo width_lower_bound rank region
0 309 190 1099 PHTX False False 48 8210
1 <NA> 360 1433 PHTX True False 22 9591
start_datetime_y max_datetime end_datetime_y
0 1998-05-06 07:58:00 1998-05-06 08:09:00 1998-05-06 08:20:00
1 2001-08-25 16:23:00 2001-08-25 16:45:00 2001-08-25 17:04:00
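The matching and the "closest row" choice described above can also be sketched without an explicit loop. This is only an illustration under stated assumptions: the helper name `find_matches` and the `total_gap` column are hypothetical, the three sample rows are copied from the merged table above (rows 9, 18 and 33), the date comparison is on calendar day only (inferred from the output above, where the matched end times differ), and "closest" is measured as the sum of the absolute start and end gaps, which is one possible reading of the selection rule rather than the one used in the cell.

```python
import pandas as pd

# Rows 9, 18 and 33 from the merged table above; row 33 is a non-match.
sample = pd.DataFrame(
    {
        "start_datetime_x": ["1998-05-06 08:25:00", "2001-08-25 16:50:00", "2017-09-10 16:02:00"],
        "end_datetime_x": ["1998-05-06 08:35:00", "2001-08-25 23:00:00", "2017-09-11 06:50:00"],
        "flare_region": ["8210", "9591", None],
        "region": ["8210", "9591", "0486"],
        "start_datetime_y": ["1998-05-06 07:58:00", "2001-08-25 16:23:00", "2003-11-02 17:03:00"],
        "end_datetime_y": ["1998-05-06 08:20:00", "2001-08-25 17:04:00", "2003-11-02 17:39:00"],
    }
)


def find_matches(table: pd.DataFrame) -> pd.DataFrame:
    """Keep rows whose end dates and regions agree, sorted by time gap (hypothetical helper)."""
    cols = ("start_datetime_x", "end_datetime_x", "start_datetime_y", "end_datetime_y")
    times = {col: pd.to_datetime(table[col]) for col in cols}

    # Compare calendar dates only, and require exactly equal region values.
    same_day = times["end_datetime_x"].dt.date == times["end_datetime_y"].dt.date
    same_region = table["flare_region"] == table["region"]
    matched = table[same_day & same_region].copy()

    # Assumed closeness measure: |start_x - start_y| + |end_x - end_y|.
    matched["total_gap"] = (
        (times["start_datetime_x"] - times["start_datetime_y"]).abs()
        + (times["end_datetime_x"] - times["end_datetime_y"]).abs()
    )
    return matched.sort_values("total_gap")


matches = find_matches(sample)
best = matches.iloc[0]  # row with the smallest combined gap
print(matches[["region", "end_datetime_x", "end_datetime_y", "total_gap"]])
```

On these three rows the region 8210 row sorts first, which agrees with choosing row 0 of the best row table.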
I added a rank column to the NASA table. The best matching row has a rank of 239.
[8]: rank_arr = list(NASA_table.index)  # created an array for ranks, which is just numbers from 1-518
del rank_arr[0]  # deleted the first element since there is no rank 0
rank_arr.append(len(NASA_table))  # added the length of the NASA table since there are 518 rows total
NASA_table['rank'] = rank_arr  # created new rank column with the array
display(NASA_table)  # displays NASA dataframe with the rank column
start_datetime end_datetime start_frequency end_frequency
0 1997-04-01 14:00:00 1997-04-01 14:15:00 8000 4000
1 1997-04-07 14:30:00 1997-04-07 17:30:00 11000 1000
2 1997-05-12 05:15:00 1997-05-14 16:00:00 12000 80
3 1997-05-21 20:20:00 1997-05-21 22:00:00 5000 500
4 1997-09-23 21:53:00 1997-09-23 22:16:00 6000 2000
.. ... ... ... ...
513 2017-09-04 20:27:00 2017-09-05 04:54:00 14000 210
514 2017-09-06 12:05:00 2017-09-07 08:00:00 16000 70
515 2017-09-10 16:02:00 2017-09-11 06:50:00 16000 150
516 2017-09-12 07:38:00 2017-09-12 07:43:00 16000 13000
517 2017-09-17 11:45:00 2017-09-17 12:35:00 16000 900
flare_location flare_region flare_classification cme_datetime
0 S25E16 8026 M1.3 1997-04-01 15:18:00
1 S28E19 8027 C6.8 1997-04-07 14:27:00
2 N21W08 8038 C1.3 1997-05-12 05:30:00
3 N05W12 8040 M1.3 1997-05-21 21:00:00
4 S29E25 8088 C1.4 1997-09-23 22:02:00
.. ... ... ... ...
513 S10W12 12673 M5.5 2017-09-04 20:12:00
514 S08W33 12673 X9.3 2017-09-06 12:24:00
515 S09W92 NaN X8.3 2017-09-10 16:00:00
516 N08E48 12680 C3.0 2017-09-12 08:03:00
517 S08E170 NaN NaN 2017-09-17 12:00:00
cme_angle cme_width cme_speed plot is_halo width_lower_bound rank
0 74 79 312 PHTX False False 1
1 <NA> 360 878 PHTX True False 2
2 <NA> 360 464 PHTX True False 3
3 263 165 296 PHTX False False 4
4 133 155 712 PHTX False False 5
.. ... ... ... ... ... ... ...
513 <NA> 360 1418 PHTX True False 514
514 <NA> 360 1571 PHTX True False 515
515 <NA> 360 3163 PHTX True False 516
516 124 96 252 PHTX False False 517
517 <NA> 360 1385 PHTX True False 518
[518 rows x 15 columns]
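For comparison, a 1-based rank column like the one above can also be built directly from the row count, without editing a copy of the index. This is a minimal sketch: `add_rank` is a hypothetical helper, and the three-row frame only reuses the first three flare_classification values shown above so the result is easy to check. It assumes, as the cell above does, that rank is simply row order.

```python
import pandas as pd


def add_rank(table: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of the table with a 1-based 'rank' column in row order (hypothetical helper)."""
    ranked = table.copy()
    ranked["rank"] = range(1, len(ranked) + 1)  # 1, 2, ..., number of rows
    return ranked


# First three flare_classification values from the NASA table output above.
demo = pd.DataFrame({"flare_classification": ["M1.3", "C6.8", "C1.3"]})
ranked_demo = add_rank(demo)
print(ranked_demo)
```

Applied to the 518-row NASA table, this would give ranks 1 through 518, the same values as the list-based approach.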
This block of code is for Part 2 Question 3. For the analysis, I want to show whether the Top 50 solar flares tend to have Halo CMEs. To do this, I created two boolean arrays to gather the is_halo values from the Top 50 table and from the overall NASA table. I then created a bar graph to compare the number of solar flares with a Halo_CME among the Top 50 solar flares and among all of the solar flares. To get the quantity from each table, I used the len() function, which in this case gives the number of values that are True.
This plot is a bar graph of the number of solar flares with Halo_CMEs among the Top 50 solar flares and among all of the solar flares. The x-axis represents the table I’m using, and the y-axis goes up to 300 since there are a little less than 300 solar flares with Halo_CMEs in the overall NASA table. From the graph, I conclude that the Top 50 solar flares do not tend to have a Halo_CME in comparison to other solar flares, based on the large gap between the two bars. This conclusion rests on raw counts; the two tables differ greatly in size (52 rows in the Top 50 table versus 518 in the NASA table), so the bar heights are not proportions.
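As a hedged sketch of the counting step, the number of True is_halo values can be taken with a boolean sum, and the share of halo rows can be computed alongside it so that tables of different sizes can be compared. The helper `halo_summary` and the two small frames are made up for illustration; they are not the notebook's data, and the real counts would come from the Top 50 and NASA tables.

```python
import pandas as pd


def halo_summary(tables: dict) -> pd.DataFrame:
    """Count and share of is_halo == True per table (hypothetical helper)."""
    rows = []
    for name, table in tables.items():
        flags = table["is_halo"].astype(bool)
        rows.append(
            {
                "table": name,
                "halo_count": int(flags.sum()),  # True counts as 1
                "rows": len(flags),              # len() counts every row, True or False
                "halo_share": float(flags.mean()),
            }
        )
    return pd.DataFrame(rows).set_index("table")


# Made-up toy frames, only to show the shape of the result.
toy_small = pd.DataFrame({"is_halo": [True, False, True, True]})
toy_large = pd.DataFrame({"is_halo": [True, False, False, False, True]})

summary = halo_summary({"small": toy_small, "large": toy_large})
print(summary)
```

If matplotlib is installed, `summary["halo_count"].plot.bar()` would draw the counts as a bar graph, and plotting `halo_share` instead would compare rates rather than counts.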