* reading in the visit data
infile long mcode wk wt sfh using z:documents/teach/datasets/preglong.txt

* drop case corresponding to variable names
drop in 1

* format mcode so it will print correctly
format %9.0f mcode

* missing data patterns
list if mcode==. | wk==. | sfh==. | wt==.

* Now I want to check how those cases with missing data relate
* to other measurements for the same subject

* number of visits for each subject
egen nvisit= count(mcode), by(mcode)

* missing data patterns
egen nwk = count(wk), by(mcode)
egen nsfh = count(sfh), by(mcode)
egen nwt = count(wt), by(mcode)
sort mcode wk
list if nvisit!=nwk
list if nvisit!=nsfh
list if nvisit!=nwt

* visits with missing wk appear to be a duplicate of another visit
* visits with missing sfh or wt do not appear to be a duplicate
* so we will have to continue to consider them
drop if wk==.
drop nvisit nwk nsfh nwt
egen nvisit= count(mcode), by(mcode)
egen nwk = count(wk), by(mcode)
egen nsfh = count(sfh), by(mcode)
egen nwt = count(wt), by(mcode)
sort mcode wk

* week of first visit with a SFH measurement
egen enrolwkSFH= min(wk) if sfh!=., by(mcode)
gen visit1SFH=0
replace visit1SFH=1 if wk==enrolwkSFH

* verify only one case with visit1SFH==1 for each mcode
egen grbg= total(visit1SFH), by(mcode)
table grbg
drop grbg
summ visit1SFH if visit1SFH

* week of first visit with a wt measurement
egen enrolwkWT= min(wk) if wt!=., by(mcode)
gen visit1WT=0
replace visit1WT=1 if wk==enrolwkWT

* verify only one case with visit1WT==1 for each mcode
egen grbg= total(visit1WT), by(mcode)
table grbg
drop grbg
summ visit1WT if visit1WT

* total number of visits during weeks 20-24 and 25-30 with a SFH measurement
egen grbg= count(sfh) if wk>=20 & wk<=24, by(mcode)
egen nSFH20= mean(grbg), by(mcode)
replace nSFH20= 0 if nSFH20==.
drop grbg
egen grbg= count(sfh) if wk>=25 & wk<=30, by(mcode)
egen nSFH25= mean(grbg), by(mcode)
replace nSFH25= 0 if nSFH25==.
drop grbg
tabulate nSFH20 nSFH25 if visit1SFH

* minimum ratio of SFH to week during weeks 20-30
egen grbg= min(sfh / wk) if wk >= 20 & wk <= 30, by(mcode)
egen minSFHperWK= mean(grbg), by(mcode)
drop grbg
tabstat minSFHperWK if visit1SFH, col(stat) stat(n mean sd min q max)

* change in SFH per change in week during weeks 20-30
egen grbg= min(wk) if wk>=20 & wk<=30 & sfh!=., by(mcode)
egen firstSFH20wk= mean(grbg), by(mcode)
drop grbg
egen grbg= max(wk) if wk>=20 & wk<=30 & sfh!=., by(mcode)
egen lastSFH20wk= mean(grbg), by(mcode)
drop grbg
gen grbg= .
replace grbg= sfh if wk==firstSFH20wk
egen firstSFH20= mean(grbg), by(mcode)
replace grbg= .
replace grbg= sfh if wk==lastSFH20wk
egen lastSFH20= mean(grbg), by(mcode)
drop grbg
gen ratio2030= .
replace ratio2030= (lastSFH20 - firstSFH20) / (lastSFH20wk - firstSFH20wk) ///
    if (lastSFH20wk != firstSFH20wk)
tabstat ratio2030 if visit1SFH, col(stat) stat(n mean sd min q max)
list mcode wk sfh wt ratio2030 if ratio2030<0

* LS slope of SFH during weeks 20-30
egen grbg= total(sfh) if sfh!=. & wk>=20 & wk<=30, by(mcode)
egen Sy= mean(grbg), by(mcode)
drop grbg
egen grbg= total(wk) if sfh!=. & wk>=20 & wk<=30, by(mcode)
egen Sx= mean(grbg), by(mcode)
drop grbg
gen trash= wk^2
egen grbg= total(trash) if sfh!=. & wk>=20 & wk<=30, by(mcode)
egen Sxx= mean(grbg), by(mcode)
drop grbg trash
gen trash= sfh * wk
egen grbg= total(trash) if sfh!=. & wk>=20 & wk<=30, by(mcode)
egen Syx= mean(grbg), by(mcode)
drop grbg trash
* slope first holds n, the number of SFH measurements in weeks 20-30
* (this assumes wk is recorded in whole weeks, so no visit falls between 24 and 25)
gen slope= nSFH20 + nSFH25
replace slope= . if slope==0
* slope = (Syx - Sy*Sx/n) / (Sxx - Sx*Sx/n)
replace slope= (Syx - Sy * Sx / slope) / (Sxx - Sx * Sx / slope)
tabstat slope if visit1SFH, col(stat) stat(n mean sd min q max)
scatter slope ratio2030

* If we want the slope between successive measurements, we can use
* the rank function in egen to create new variables as shown
egen wkrank= rank(wk) if sfh!=. & wk>=20 & wk<=30, by(mcode)
egen grbg= mean(wk) if wkrank==1, by(mcode)
egen wk1= mean(grbg), by(mcode)
egen trash= mean(sfh) if wkrank==1, by(mcode)
egen sfh1= mean(trash), by(mcode)
drop grbg trash
egen grbg= mean(wk) if wkrank==2, by(mcode)
egen wk2= mean(grbg), by(mcode)
egen trash= mean(sfh) if wkrank==2, by(mcode)
egen sfh2= mean(trash), by(mcode)
drop grbg trash
egen grbg= mean(wk) if wkrank==3, by(mcode)
egen wk3= mean(grbg), by(mcode)
egen trash= mean(sfh) if wkrank==3, by(mcode)
egen sfh3= mean(trash), by(mcode)
drop grbg trash
gen delta12= (sfh2 - sfh1) / (wk2 - wk1)
gen delta23= (sfh3 - sfh2) / (wk3 - wk2)
egen mindelta= rowmin(delta12 delta23)

* Also construct whatever variables you want for weight

* Then we can merge these data with the data in pregout.txt
* I first get rid of all the variables I do not want.
* It is easier to use the command keep
keep if visit1SFH
keep mcode nvisit enrolwkSFH nSFH20 nSFH25 minSFHperWK ratio2030 slope mindelta
save z:documents/teach/datasets/pregsummary

* Now I read in the file pregout.txt
clear
infile long mcode ht age sga parity smoker bweight sex gesage using z:documents/teach/datasets/pregout.txt
drop in 1

* Now I merge
merge 1:1 mcode using z:documents/teach/datasets/pregsummary
drop _merge
save z:documents/teach/datasets/pregsummary, replace