************************************************************************* /* The purpose of this code is to calculate active share starting in 2020 using the CRSP database The process uses 19 benchmarks: S5, S4, S6, R1, R2, R3, and RM (plus the value and growth components of each) -The value and growth components of R3 are excluded */ ************************************************************************* ************************************************************************* *Specify the starting year scalar year_min = 2020 ************************************************************************* ************************************************************************* *Specify file locations *Output Directory local Output "C:\Users\tbriley\Desktop\Data Updates\CRSP Active Share\File Output" *CRSP Fund Holdings local CRSP_Holdings "C:\Users\tbriley\Desktop\Data Updates\CRSP Active Share\Files For Access\CRSP Holdings.dta" *CRSP Quarterly Fund Summary local CRSP_Summary "C:\Users\tbriley\Desktop\Data Updates\CRSP Active Share\Files For Access\CRSP Quarterly Fund Summary.dta" *Benchmark Holdings local Benchmark_Holdings "C:\Users\tbriley\Desktop\Data Updates\CRSP Active Share\Files For Access\Updated Research Benchmark Holdings - Through 2021M9 - Generated 7 Feb 2022.csv" *CRSP Fundno to WFICN Map local CRSP_Fundno_WFICN_Map "C:\Users\tbriley\Desktop\Data Updates\CRSP Active Share\Files For Access\WFICN CRSP Fundno Map.dta" ************************************************************************* ************************************************************************* *Set the output directory cd "`Output'" ************************************************************************* ************************************************************************* *Prepare the CRSP fundno to WFICN map *Open the base mapping file use "`CRSP_Fundno_WFICN_Map'", clear *Keep only what is needed keep wficn crsp_fundno drop if missing(wficn) drop if missing(crsp_fundno) *Drop any instances where the same CRSP fundno is assigned to two different WFICN quietly by crsp_fundno, sort: gen dup = cond(_N==1,0,_n) drop if dup > 0 drop dup *Save this file save CRSP_Fundno_WFICN_Mapping, replace ************************************************************************* ************************************************************************* *Create a mapping of WFICN to CRSP portno using CRSP fundno *Open the quarterly fund summary use "`CRSP_Summary'", clear *Keep only what is needed keep crsp_fundno crsp_portno caldt *Drop if missing CRSP portno drop if missing(crsp_portno) *Set the starting point drop if year(caldt) < year_min *Generate a Stata date variable gen ym = ym(year(caldt),month(caldt)) drop caldt *Merge in WFICN merge m:1 crsp_fundno using CRSP_Fundno_WFICN_Mapping drop if _merge != 3 drop _merge crsp_fundno erase CRSP_Fundno_WFICN_Mapping.dta *Drop any duplicates gduplicates drop *Drop any instances where the same CRSP portno and date combination is assigned to two different WFICN quietly by crsp_portno ym, sort: gen dup = cond(_N==1,0,_n) drop if dup > 0 drop dup *Drop any instances where the same WFICN and date combination is assigned to two different CRSP portno quietly by wficn ym, sort: gen dup = cond(_N==1,0,_n) drop if dup > 0 drop dup *Save this file save CRSP_Portno_WFICN_Mapping, replace ************************************************************************* ************************************************************************* *Prepare the fund holdings file *Open the base CRSP holdings use "`CRSP_Holdings'", clear *Keep only what is needed keep crsp_portno report_dt permco market_val *Keep only useful data points drop if missing(crsp_portno) | missing(report_dt) | missing(permco) | missing(market_val) drop if market_val <= 0 *Set the starting point drop if year(report_dt) < year_min *Generate a Stata date variable gen ym = ym(year(report_dt),month(report_dt)) drop report_dt *Keep only quarterly results keep if month(dofm(ym)) == 3 | month(dofm(ym)) == 6 | month(dofm(ym)) == 9 | month(dofm(ym)) == 12 *Move results to the permco level collapse (sum) market_val, by(crsp_portno ym permco) *Generate benchmark weights by crsp_portno ym, sort: egen Total_Assets = sum(market_val) gen Fund_Weight = market_val / Total_Assets drop market_val *Calculate a count of holdings gegen Holding_Count = count(Fund_Weight), by(crsp_portno ym) *Calcuate the max weight gegen Max_Weight = max(Fund_Weight), by(crsp_portno ym) *Save the general holdings information preserve keep crsp_portno ym Holding_Count Max_Weight Total_Assets gduplicates drop save General_Holdings_Information, replace restore drop Holding_Count Max_Weight Total_Assets *Save the individual holdings save Edited_CRSP_Holdings, replace ************************************************************************* ************************************************************************* *Prepare the benchmark holdings file *Open the benchmark holdings import delimited "`Benchmark_Holdings'", clear *Keep only certain benchmarks keep if bmk == "R1" | /// bmk == "R1G" | /// bmk == "R1V" | /// bmk == "R2" | /// bmk == "R2G" | /// bmk == "R2V" | /// bmk == "R3" | /// bmk == "RM" | /// bmk == "RMG" | /// bmk == "RMV" | /// bmk == "S5" | /// bmk == "S5G" | /// bmk == "S5V" | /// bmk == "S4" | /// bmk == "S4G" | /// bmk == "S4V" | /// bmk == "S6" | /// bmk == "S6G" | /// bmk == "S6V" *Create a Stata date variable tostring yyyymm, replace gen year = substr(yyyymm,1,4) gen month = substr(yyyymm,5,2) destring year month, replace gen ym = ym(year,month) drop yyyymm year month order bmk ym *Set the starting point drop if year(dofm(ym)) < year_min *Rename some variables rename bmk Benchmark rename wght Benchmark_Weight *Create a numeric ID for each benchmark preserve keep Benchmark gduplicates drop gen ID = _n save Benchmark_ID, replace restore merge m:1 Benchmark using Benchmark_ID drop _merge order Benchmark ID *Create separate benchmark files egen max = max(ID) scalar maxx = max drop max quietly forvalues i=1/`=scalar(maxx)' { preserve keep if ID == `i' drop ID Benchmark noisily save Benchmark_`i', replace restore } ************************************************************************* ************************************************************************* *Calculate active share relative to each benchmark *Open the fund holdings use Edited_CRSP_Holdings, clear *Begin the loop quietly forvalues i=1/`=scalar(maxx)' { *Preserve the fund holdings preserve *Merge in the benchmark holdings file fmerge m:1 permco ym using Benchmark_`i' drop if _merge == 2 drop _merge erase Benchmark_`i'.dta *Set the benchmark weight to zero for stocks not in the benchmark replace Benchmark_Weight = 0 if missing(Benchmark_Weight) *Calculate AS egen MIN = rowmin(Fund_Weight Benchmark_Weight) gegen sum = sum(MIN), by(crsp_portno ym) gen AS = 1 - sum drop sum MIN *Keep only what is neccessary keep crsp_portno ym AS gduplicates drop *Set the benchmark ID gen ID = `i' *Re-order the vars order crsp_portno ym ID *Save this file noisily save AS_`i', replace *Restore the fund holdings restore } *Erase the fund holdings erase Edited_CRSP_Holdings.dta ************************************************************************* ************************************************************************* *Create a singular active share file *Append the individual files together use AS_1, clear forvalues i=2/`=scalar(maxx)' { append using AS_`i' erase AS_`i'.dta } erase AS_1.dta *Bring back in the benchmark string ID merge m:1 ID using Benchmark_ID drop if _merge != 3 drop _merge ID erase Benchmark_ID.dta *Reshape the data tostring crsp_portno ym, replace gen DOUBLE = crsp_portno + "_" + ym drop crsp_portno ym rename AS AS_ reshape wide AS_, i(DOUBLE) j(Benchmark) string split DOUBLE, p(_) drop DOUBLE destring DOUBLE1 DOUBLE2, replace rename DOUBLE1 crsp_portno rename DOUBLE2 ym order crsp_portno ym ************************************************************************* ************************************************************************* *Combine the general holdings file with the active share data, add in WFICN, and perform some final prep *Merge in the general holdings file merge 1:1 crsp_portno ym using General_Holdings_Information drop _merge erase General_Holdings_Information.dta *Merge in WFICN merge 1:1 crsp_portno ym using CRSP_Portno_WFICN_Mapping drop if _merge == 2 drop _merge erase CRSP_Portno_WFICN_Mapping.dta *Remove the variable labels foreach var of varlist _all { label var `var' "" } *Reformat the date variable format %tm ym *Find the data range gegen min = min(ym) gegen max = max(ym) gen min_year = year(dofm(min)) gen min_month = month(dofm(min)) gen max_year = year(dofm(max)) gen max_month = month(dofm(max)) scalar minn_year = min_year scalar minn_month = min_month scalar maxx_year = max_year scalar maxx_month = max_month drop min_year min_month max_year max_month min max *Perform some final prep renvars _all, lower order crsp_portno wficn ym total_assets holding_count max_weight order as_*, last sort crsp_portno ym *Export the file as a CSV export delimited using "CRSP Active Share - `=scalar(minn_year)'m`=scalar(minn_month)' to `=scalar(maxx_year)'m`=scalar(maxx_month)' - Generated`c(current_date)'.csv", replace *Save the file as a Stata dataset save "CRSP Active Share - `=scalar(minn_year)'m`=scalar(minn_month)' to `=scalar(maxx_year)'m`=scalar(maxx_month)' - Generated`c(current_date)'.dta", replace *************************************************************************